Which model to use?

balmyfirflyer · May 23, 2026, 10:56pm

Hello there, I was wondering if someone could offer some advice about which model to use in my situation.

I have an image that was generated with Apple Image Playground. It is perfect for my needs, and I want to use this single image to generate other images of the same girl that is in the picture.

I have tried a few image-to-image models (using Draw Things), but they are not giving me anything close to the same girl in the existing image.

I want to do this totally offline, please. I am running a MacBook Air and am willing to use other software if suggestions are made, but I feel the problem is the model.

After doing some research, Flux 1 was suggested. I uploaded my image and experimented with the settings, trying to give the model no artistic license at all, but as I said, I got nothing close.

My goal is to create a series of images for a mock-up of a children’s book so that the artist gets the idea, but obviously I need a model that can keep a ‘character’ consistent as the same girl (or animal).

Any help that I can get would be greatly appreciated.

Thanks

John6666 · May 24, 2026, 7:45am

For that use case, I think the safest first step is to try a reference-image image editing model. This is a bit different from ordinary img2img. There are other methods too, but they usually involve more steps, more setup, or more local hardware constraints.

I would separate this into two different questions:

1. What workflow gives you the best chance of preserving the character?
2. What workflow can realistically run locally on your MacBook Air?

Those are related, but they are not the same question.

For the first question, I would not start with ordinary img2img. Normal img2img is useful, but it is not really designed for “keep this exact character and place her in many new scenes.”

In Diffusers terminology, img2img uses the input image as a starting point, and the strength value controls how much noise is added before denoising. Higher strength gives the model more freedom but moves away from the input; lower strength preserves the input more but makes large scene changes harder. See the Diffusers docs on image-to-image and the API note that strength=1 essentially ignores the input image: StableDiffusionImg2ImgPipeline.
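To make the strength tradeoff concrete, here is a minimal sketch of how a strength value maps to the number of denoising steps that actually run. It mirrors my understanding of how the Diffusers img2img pipelines pick their starting timestep, but it is a simplification: the function name `img2img_steps` is hypothetical, and the exact behavior can differ between pipelines and schedulers.

```python
from __future__ import annotations


def img2img_steps(num_inference_steps: int, strength: float) -> tuple[int, int]:
    """Return (skipped_steps, denoising_steps) for a given strength.

    Assumption: the pipeline runs roughly int(steps * strength) denoising
    steps on a noised copy of the input image and skips the rest.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    denoising_steps = min(int(num_inference_steps * strength), num_inference_steps)
    skipped_steps = num_inference_steps - denoising_steps
    return skipped_steps, denoising_steps


# Compare a conservative, a moderate, and a maximal strength at 50 steps.
for strength in (0.0, 0.5, 1.0):
    skipped, run = img2img_steps(50, strength)
    print(f"strength={strength}: skip {skipped} steps, denoise for {run} steps")
```

At strength 1.0 nothing is skipped, so the input image is fully noised and contributes almost nothing. At low strength only a few steps run, so the output stays close to the input. Neither end gives "same character, new scene", which is why plain img2img struggles here.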

For your use case, I would first test reference-image image editing models. They are closer to what you want than plain img2img.

First: test the upper bound online

I would use online demos first, mainly to see whether this kind of workflow can preserve your character well enough at all.

Good first tests are Qwen-Image-Edit-2511, Qwen-Image-Edit-2509, and FLUX.1 Kontext.

The reason I would try Qwen-Image-Edit-2511/2509 is that these are not just “make a similar image” models. They are image editing models with reference-image behavior. Qwen-Image-Edit-2511 explicitly mentions improved character consistency, reduced image drift, multi-person consistency, and integrated LoRA capabilities. Qwen-Image-Edit-2509 also documents multi-image editing examples such as person + scene and person + object.

FLUX.1 Kontext is also very relevant. It is designed for image editing from text instructions and image context, and the Dev model is described as focusing on image editing tasks, local/global edits, character preservation, style reference, and iterative editing.

However, I would treat these online demos as an upper-bound test, not as proof that the same workflow will be comfortable locally.

HF Spaces may be running on cloud GPUs, ZeroGPU, or other server-side hardware. Also, even if the Space page exists, the demo may be busy, paused, quota-limited, or temporarily broken at the moment you try it. Your MacBook Air is a separate constraint.

Important caveat: online upper bound != local MacBook Air workflow

This is probably the most important point.

The models that currently look strongest for reference-image character consistency tend to be large and memory-hungry. Smaller local workflows can still be useful, but they usually come with tradeoffs in quality, speed, resolution, and consistency.

I would avoid thinking of it as:

“If the Space works, I can just do the same thing locally.”

A better way to think about it is:

“If the Space works, then this type of workflow is promising. After that, I need to find the smallest local workflow that is good enough on my MacBook Air.”

Those are different problems.

For example, the model card for DFloat11/Qwen-Image-Edit-2509-DF11, which is a compressed version, says that it can run on a single 32GB GPU, or on a 24GB GPU with CPU offloading, while maintaining full model quality. That is not the same kind of target as an 8GB or 16GB MacBook Air.

Similarly, FLUX.1 Kontext Dev is a 12B image editing model. It is very relevant conceptually, but I would not assume it is a beginner-friendly local baseline for a MacBook Air.

First thing to check: unified memory

Before choosing a local workflow, I would check how much unified memory your MacBook Air has.

An 8GB Air, 16GB Air, and 24GB Air are very different targets for local image generation.
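The easiest way to check is About This Mac, or `sysctl -n hw.memsize` in Terminal. If you prefer a script, here is a small best-effort sketch using only the Python standard library. The helper names are hypothetical, and I am assuming that `os.sysconf` exposes `SC_PHYS_PAGES` and `SC_PAGE_SIZE` on your system; the function returns `None` if it does not.

```python
import os

GIB = 1024 ** 3


def bytes_to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / GIB


def total_memory_gib():
    """Best-effort total physical memory in GiB, or None if unavailable."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # sysconf or these names are not supported on this platform.
        return None
    if pages <= 0 or page_size <= 0:
        return None
    return bytes_to_gib(pages * page_size)


memory = total_memory_gib()
if memory is None:
    print("Could not detect total memory; check About This Mac instead.")
else:
    print(f"Total memory: about {memory:.1f} GiB")
```

On Apple Silicon this total is shared between the CPU and the GPU, so the model, the app, and macOS itself all compete for it.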

This matters more than people often expect. On Apple Silicon, tools can use the MPS backend, but memory pressure can still be a major issue. The Diffusers MPS documentation says Diffusers can use Apple Silicon through the PyTorch mps device, but also recommends attention slicing to reduce memory pressure, especially on systems with less than 64GB RAM or when generating larger-than-standard resolutions:

Diffusers MPS docs

So the local question is not only “does it run?” but also:

does it fit in memory?
is it unbearably slow?
does it require CPU offload?
does it swap heavily?
can you use the frontend you want?
can you use the adapter/control/reference workflow you want?
can you debug it if it fails?

The tighter the memory budget is, the more technical the workflow becomes.

If you want a Mac-local app: start with Draw Things

For a Mac user who wants local/offline generation, I would probably start with Draw Things rather than raw Python scripts.

Draw Things is a local/offline app for iPhone, iPad, and Mac. The App Store page lists support for modern model families such as SDXL, FLUX.1, Qwen Image, Z Image, and FLUX.2, as well as features such as LoRA training, ControlNet, inpainting, outpainting, pose editing, and importing community models/LoRAs.

The release notes are also interesting because Draw Things has been adding support for many of the relevant model families, including:

FLUX.1 Kontext Dev image editing tasks
Qwen Image
Qwen Image Edit
Qwen Image Edit 2509
Qwen Image Edit 2511
Kwai Kolors IP Adapter FaceID Plus
PuLID for FLUX.1
FLUX.2 series models

That makes Draw Things probably the easiest Mac-oriented place to start.

But “supported” does not mean “comfortable on every MacBook Air.”

For example, the Draw Things Qwen Image support article gives these approximate peak runtime VRAM numbers for Qwen Image 1.0 variants:

8-bit quantized model: about 16 GiB peak runtime VRAM, suggested for devices with 24 GiB or more total RAM
6-bit quantized model: about 11 GiB peak runtime VRAM, suggested for devices with 16 GiB or more total RAM
FP16/BF16: about 30 GiB peak runtime VRAM, suggested for much larger-memory devices

That page is about Qwen Image 1.0 rather than every Qwen Image Edit variant, so I would not over-apply the numbers mechanically. But it is still a useful warning: these modern image models can be heavy even when quantized.
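As a rough sketch, the numbers above can be turned into a simple lookup. The function name and table are hypothetical, the thresholds are only the ones quoted above for Qwen Image 1.0, and I have left out FP16/BF16 because no specific RAM threshold is given for it.

```python
from typing import Optional

# (label, approximate peak runtime GiB, suggested minimum total RAM in GiB)
# Values are the approximate Qwen Image 1.0 figures quoted above.
QWEN_IMAGE_VARIANTS = [
    ("8-bit quantized", 16, 24),
    ("6-bit quantized", 11, 16),
]


def suggest_variant(total_ram_gib: float) -> Optional[str]:
    """Return the largest listed variant whose suggested RAM fits, else None."""
    for label, _peak_gib, min_ram_gib in QWEN_IMAGE_VARIANTS:
        if total_ram_gib >= min_ram_gib:
            return label
    return None


for ram in (8, 16, 24):
    print(f"{ram} GiB total RAM -> {suggest_variant(ram)}")
```

A result of `None` for an 8 GiB machine is the practical point: on that kind of Air, a smaller model family is probably the better place to start.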

So I would phrase the local Draw Things path as:

Try Draw Things first if you want a Mac-local app. Start with smaller/quantized models and moderate resolutions. Do not start by assuming that the same Qwen/FLUX workflow you saw in an HF Space will be comfortable on a MacBook Air.

If Draw Things is not enough: ComfyUI Desktop

The more flexible route is ComfyUI, especially if you want to experiment with IP-Adapter, ControlNet, PuLID, multiple references, or custom workflows.

ComfyUI is powerful because you can build more exact workflows:

base model
reference image model
image encoder
IP-Adapter
FaceID
ControlNet
pose conditioning
inpainting
LoRAs
multiple reference images
upscalers

But this flexibility is also the problem. If you are not already familiar with local Stable Diffusion workflows, ComfyUI can become a troubleshooting project.

This becomes worse when memory is tight. You may need to care about:

model format
FP16 / BF16 / FP8 / GGUF / quantized versions
MPS compatibility
unsupported dtypes
CPU offloading
image encoder placement
ControlNet memory
VAE memory
resolution
custom node versions
whether a workflow was made for CUDA/NVIDIA rather than Apple Silicon

So I would consider ComfyUI the “more powerful but more technical” path.

Diffusers is possible, but I would not make it the first recommendation

There is also the raw Python route with Diffusers.

This is useful if you are comfortable with Python, PyTorch, model loading, dtype choices, MPS, and memory debugging.

But if the goal is to make a children’s book mockup and you are not already deep into local AI image workflows, I would not start here.

Local model/workflow options

Here is how I would roughly categorize the options.

| Option | Why it is relevant | MacBook Air realism | Suggested role |
| --- | --- | --- | --- |
| Qwen-Image-Edit-2511 | Improved character consistency, reduced drift, multi-person consistency, integrated LoRA capabilities | Low for local Air; likely heavy | Best current online upper-bound test |
| Qwen-Image-Edit-2509 | Multi-image editing, `person + scene`, `person + object`, character consistency improvements | Low for local Air; compressed versions still target large VRAM | Strong online test / reference model |
| FLUX.1 Kontext Dev | Image editing with reference context, character/object/style preservation | Low to medium; 12B model | Online upper-bound test |
| Draw Things | Local/offline Mac/iOS app with broad model support | Best Mac-friendly starting point, but memory-limited | First local app to try |
| ComfyUI Desktop | Very flexible node workflow | More technical, memory-sensitive | Advanced local route |
| Diffusers | Developer route with MPS support | Technical | Use only if comfortable with Python |
| SDXL + IP-Adapter / FaceID / PuLID | More realistic than Qwen/FLUX for local reference-image workflows | Maybe on 16GB/24GB, but still not lightweight | Local upper compromise |
| SD1.5 + IP-Adapter / FaceID | Lighter than SDXL | More realistic on low-memory Macs | Practical fallback |
| LoRA / DreamBooth LoRA | Stronger consistency if you have enough good training images | Training adds complexity | Later step, not first step |
| DiffusionBee | Simple Mac Stable Diffusion GUI | Easier, but less flexible for this exact problem | Simple fallback, not my main choice |

About SDXL + IP-Adapter / FaceID

If the full Qwen/FLUX editing models are too heavy locally, then a more realistic local compromise might be:

SDXL + IP-Adapter
SDXL + IP-Adapter FaceID
SDXL + PuLID
SDXL + ControlNet
SDXL Lightning/Turbo-style model variants
or, if SDXL is too heavy, SD1.5 + IP-Adapter / FaceID

This is probably a more realistic “local upper compromise” than Qwen-Image-Edit or FLUX.1 Kontext on a MacBook Air.

But I would not describe SDXL + IP-Adapter / FaceID as simple or lightweight.

It can involve several moving parts:

the base model
the IP-Adapter model
the image encoder
FaceID or InsightFace components
possibly a matching LoRA
possibly ControlNet
the frontend
MPS/backend compatibility
resolution/memory tradeoffs

So this may be more realistic locally than Qwen/FLUX, but it is still not necessarily beginner-friendly under memory pressure.

If the MacBook Air has only 8GB or 16GB unified memory, I would be prepared to fall back to SD1.5-based workflows. SD1.5 will usually have a lower ceiling than SDXL or the newer large editing models, but it may be much easier to run and iterate locally.

One reference image may not be enough

Even if the model is good, one image may not fully define the character.

A single front-facing image does not tell the model:

the side view
the back view
the full body proportions
the character’s expressions
how the character looks in different lighting
how the character looks in different poses
whether the outfit should stay fixed
whether the goal is “same identity” or “same illustration style”

This is especially important for a children’s book illustration style, because simplified/cartoon faces often contain less identity information than realistic portraits.

So I would not expect any model to guarantee perfect consistency from one image.

A more practical workflow might be:

1. Use Qwen-Image-Edit-2511 or FLUX.1 Kontext online to see what good reference-image editing can do.
2. Generate a few candidate images of the same character.
3. Keep only the ones that actually look like the same girl.
4. Make a small character sheet: front view, side view, full body, expressions.
5. Then try to reproduce a smaller version of the workflow locally.
6. If local consistency is still not good enough, consider LoRA/DreamBooth LoRA later.

Example test prompts

For the online model test, I would start with small changes first. Do not immediately change everything at once.

For example:

Use the girl in the reference image as the same character. Preserve her face, hairstyle, age, body proportions, outfit, color palette, and soft children's book illustration style. Change only the background to a sunny garden.

Then try a slightly larger change:

Use the girl in the reference image as the same character. Keep her facial features, hairstyle, age, proportions, and children's book illustration style consistent. Show her standing and waving in a sunny garden.

Then try a scene change:

Use the girl in the reference image as the same character. Preserve her identity, face, hairstyle, age, body proportions, and soft children's book illustration style. Place her in a cozy bedroom reading a picture book.

Then try a character sheet:

Create a character sheet of the same girl from the reference image. Show front view, side view, back view, and three facial expressions. Keep the same face, hairstyle, age, proportions, outfit, color palette, and children's book illustration style.

If the model cannot keep the girl recognizable under these tests, I would not expect it to work reliably for a whole picture book mockup.
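If you end up running many of these tests, it can help to keep the "preserve" part of the prompt fixed and vary only the requested change. Here is a minimal sketch of that idea. The names `PRESERVE`, `build_prompt`, and `TEST_CHANGES` are hypothetical, the wording is adapted from the prompts above, and this only builds text; it does not call any model.

```python
# Fixed clause describing what must stay the same across every test.
PRESERVE = (
    "Use the girl in the reference image as the same character. "
    "Preserve her face, hairstyle, age, body proportions, outfit, "
    "color palette, and soft children's book illustration style."
)

# Requested changes, ordered from smallest to largest.
TEST_CHANGES = [
    "Change only the background to a sunny garden.",
    "Show her standing and waving in a sunny garden.",
    "Place her in a cozy bedroom reading a picture book.",
]


def build_prompt(change: str) -> str:
    """Combine the fixed preservation clause with one requested change."""
    return f"{PRESERVE} {change.strip()}"


prompts = [build_prompt(change) for change in TEST_CHANGES]
for number, prompt in enumerate(prompts, start=1):
    print(f"Test {number}: {prompt}")
```

Keeping the preservation clause identical makes it easier to tell whether a failure comes from the model or from a change in wording.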

My practical recommendation

I would not start with LoRA.

LoRA/DreamBooth LoRA can be useful later, but training from one image is a bad starting point. It can overfit, and it does not solve the fact that the character is underdefined. It is better to first create or curate multiple consistent references.

My practical order would be:

1. Online test: Qwen-Image-Edit-2511, Qwen-Image-Edit-2509, FLUX.1 Kontext.
2. Mac-local app: Draw Things.
3. If you need more control: ComfyUI Desktop.
4. If the large editing models are too heavy locally: SDXL + IP-Adapter / FaceID / PuLID.
5. If SDXL is still too heavy: SD1.5 + IP-Adapter / FaceID.
6. If you eventually need stronger consistency: build a small character sheet first, then consider LoRA.

The key point is that HF Spaces can show you what this type of reference-image editing is capable of, but your local MacBook Air determines what is actually practical.

So I would test the high-end models online first, but plan the local workflow around your MacBook Air’s unified memory and your tolerance for troubleshooting.
