Which model to use?

Hello there, I was wondering if someone could offer some advice about which model to use in my situation

I have an image that was generated by Apple Image Playground. It is perfect for my needs, and I want to use this single image to generate other images of the same girl who is in the picture

I have tried a few image-to-image models (using Draw Things), but they are not giving me anything close to the girl in the existing image

I want to do this totally offline, please. I am willing to use other software (I am running a MacBook Air) if suggestions are made, but I feel the problem is the model

After I did some research, Flux 1 was suggested. I uploaded my image and experimented with the settings, trying to give the model no artistic license at all, but as I said, I got nothing close.

My goal is to create a series of images for a mockup of a children’s book so that the artist gets the idea, but obviously I need a model that can keep a ‘character’ that is the same girl (or animal)

Any help that I can get would be greatly appreciated

Thanks

For that use case, I think the safest first step is to try a reference-image image editing model. This is a bit different from ordinary img2img. There are other methods too, but they usually involve more steps, more setup, or more local hardware constraints.


I would separate this into two different questions:

  1. What workflow gives you the best chance of preserving the character?
  2. What workflow can realistically run locally on your MacBook Air?

Those are related, but they are not the same question.

For the first question, I would not start with ordinary img2img. Normal img2img is useful, but it is not really designed for “keep this exact character and place her in many new scenes.”

In Diffusers terminology, img2img uses the input image as a starting point, and the strength value controls how much noise is added before denoising. Higher strength gives the model more freedom but moves away from the input; lower strength preserves the input more but makes large scene changes harder. See the Diffusers docs on image-to-image and the API note that strength=1 essentially ignores the input image: StableDiffusionImg2ImgPipeline.
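To make the strength tradeoff concrete, here is a minimal sketch of how strength translates into denoising steps. As far as I understand it, this mirrors the step arithmetic in the Diffusers img2img pipelines, but the function name is mine and this is an illustration, not the library's code.

```python
from __future__ import annotations


def img2img_schedule(num_inference_steps: int, strength: float) -> tuple[int, int]:
    """Return (skipped_steps, denoising_steps) for an img2img run.

    A higher strength means more noise is added to the input image, so more
    of the schedule is actually run and less of the input survives.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Number of steps that are actually denoised.
    denoising_steps = min(int(num_inference_steps * strength), num_inference_steps)
    # Steps at the start of the schedule that are skipped entirely.
    skipped_steps = max(num_inference_steps - denoising_steps, 0)
    return skipped_steps, denoising_steps


for strength in (0.0, 0.5, 1.0):
    skipped, run = img2img_schedule(50, strength)
    print(f"strength={strength}: skip {skipped} steps, denoise for {run}")
```

At strength 1.0 nothing is skipped, which is why the input image is essentially ignored. At low strength most of the schedule is skipped, which is why the girl stays recognizable but the scene barely changes.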

For your use case, I would first test reference-image image editing models. They are closer to what you want than plain img2img.

First: test the upper bound online

I would use online demos first, mainly to see whether this kind of workflow can preserve your character well enough at all.

Good first tests are the online demos for Qwen-Image-Edit-2511, Qwen-Image-Edit-2509, and FLUX.1 Kontext.

The reason I would try Qwen-Image-Edit-2511/2509 is that these are not just “make a similar image” models. They are image editing models with reference-image behavior. Qwen-Image-Edit-2511 explicitly mentions improved character consistency, reduced image drift, multi-person consistency, and integrated LoRA capabilities. Qwen-Image-Edit-2509 also documents multi-image editing examples such as person + scene and person + object.

FLUX.1 Kontext is also very relevant. It is designed for image editing from text instructions and image context, and the Dev model is described as focusing on image editing tasks, local/global edits, character preservation, style reference, and iterative editing.

However, I would treat these online demos as an upper-bound test, not as proof that the same workflow will be comfortable locally.

HF Spaces may be running on cloud GPUs, ZeroGPU, or other server-side hardware. Also, even if the Space page exists, the demo may be busy, paused, quota-limited, or temporarily broken at the moment you try it. Your MacBook Air is a separate constraint.

Important caveat: online upper bound != local MacBook Air workflow

This is probably the most important point.

The models that currently look strongest for reference-image character consistency tend to be large and memory-hungry. Smaller local workflows can still be useful, but they usually come with tradeoffs in quality, speed, resolution, and consistency.

I would avoid thinking of it as:

“If the Space works, I can just do the same thing locally.”

A better way to think about it is:

“If the Space works, then this type of workflow is promising. After that, I need to find the smallest local workflow that is good enough on my MacBook Air.”

Those are different problems.

For example, DFloat11/Qwen-Image-Edit-2509-DF11 is a compressed version, yet it is still described as running on a single 32GB GPU, or on a 24GB GPU with CPU offloading, while maintaining full model quality. That is not the same kind of target as an 8GB or 16GB MacBook Air.

Similarly, FLUX.1 Kontext Dev is a 12B image editing model. It is very relevant conceptually, but I would not assume it is a beginner-friendly local baseline for a MacBook Air.
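A rough back-of-the-envelope calculation shows why a 12B model is a stretch. This sketch only counts the main model weights; it ignores text encoders, the VAE, activations, and whatever the OS needs, so real usage is higher. The 8-bit and 4-bit rows are illustrative quantization levels I picked, not specific releases.

```python
def weight_footprint_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30


# 12B parameters, as stated for FLUX.1 Kontext Dev.
for bits in (16, 8, 4):
    size = weight_footprint_gib(12e9, bits)
    print(f"12B params at {bits}-bit: ~{size:.1f} GiB for weights alone")
```

At 16-bit precision the weights alone come to roughly 22 GiB, which already exceeds the total unified memory of most MacBook Air configurations.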

First thing to check: unified memory

Before choosing a local workflow, I would check how much unified memory your MacBook Air has.

An 8GB Air, 16GB Air, and 24GB Air are very different targets for local image generation.
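If you are not sure which one you have, "About This Mac" shows it. For completeness, here is a small sketch that reads the same number from Python. It assumes the macOS `sysctl` key `hw.memsize`; the function names are mine, and the non-macOS branch is only a fallback so the script also runs elsewhere.

```python
import os
import platform
import subprocess


def total_memory_bytes() -> int:
    """Total physical memory in bytes (unified memory on Apple Silicon)."""
    if platform.system() == "Darwin":
        # hw.memsize reports the installed memory in bytes on macOS.
        result = subprocess.run(
            ["sysctl", "-n", "hw.memsize"],
            capture_output=True, text=True, check=True,
        )
        return int(result.stdout.strip())
    # Fallback for Linux and other POSIX systems.
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    print(f"Total memory: {to_gib(total_memory_bytes()):.0f} GiB")
```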

This matters more than people often expect. On Apple Silicon, tools can use the MPS backend, but memory pressure can still be a major issue. The Diffusers MPS documentation says Diffusers can use Apple Silicon through the PyTorch mps device, but also recommends attention slicing to reduce memory pressure, especially on systems with less than 64GB RAM or when generating larger-than-standard resolutions.

So the local question is not only “does it run?” but also:

  • does it fit in memory?
  • is it unbearably slow?
  • does it require CPU offload?
  • does it swap heavily?
  • can you use the frontend you want?
  • can you use the adapter/control/reference workflow you want?
  • can you debug it if it fails?

The tighter the memory budget is, the more technical the workflow becomes.
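As a concrete instance of the Diffusers MPS route mentioned above, here is a hedged sketch of loading a pipeline with attention slicing. The calls `DiffusionPipeline.from_pretrained`, `.to("mps")`, and `enable_attention_slicing()` are real Diffusers APIs, but the helper names are mine, `model_id` is a placeholder you have to supply, and the 512x512 threshold is my reading of "standard resolution", so treat it as an assumption.

```python
def needs_attention_slicing(total_ram_gib: float, width: int, height: int) -> bool:
    """Encode the rule of thumb: slice when RAM is under 64 GB or the image
    is larger than 512x512 (the threshold is an assumption)."""
    return total_ram_gib < 64 or width * height > 512 * 512


def load_pipeline_on_mps(model_id: str, total_ram_gib: float,
                         width: int = 512, height: int = 512):
    """Load a Diffusers pipeline on Apple Silicon (sketch, not tuned)."""
    # Imported lazily so the helper above works without PyTorch installed.
    import torch
    from diffusers import DiffusionPipeline

    if not torch.backends.mps.is_available():
        raise RuntimeError("PyTorch MPS backend is not available on this machine")

    pipe = DiffusionPipeline.from_pretrained(model_id)
    pipe = pipe.to("mps")
    if needs_attention_slicing(total_ram_gib, width, height):
        # Trades some speed for lower peak memory.
        pipe.enable_attention_slicing()
    return pipe
```

On any MacBook Air the first function returns True, so in practice attention slicing would always be switched on there.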

If you want a Mac-local app: start with Draw Things

For a Mac user who wants local/offline generation, I would probably start with Draw Things rather than raw Python scripts.

Draw Things is a local/offline app for iPhone, iPad, and Mac. The App Store page lists support for modern model families such as SDXL, FLUX.1, Qwen Image, Z Image, and FLUX.2, along with features such as LoRA training, ControlNet, inpainting, outpainting, pose editing, and importing community models/LoRAs.

The release notes are also interesting because Draw Things has been adding support for many of the relevant model families, including:

  • FLUX.1 Kontext Dev image editing tasks
  • Qwen Image
  • Qwen Image Edit
  • Qwen Image Edit 2509
  • Qwen Image Edit 2511
  • Kwai Kolors IP Adapter FaceID Plus
  • PuLID for FLUX.1
  • FLUX.2 series models

That makes Draw Things probably the easiest Mac-oriented place to start.

But “supported” does not mean “comfortable on every MacBook Air.”

For example, the Draw Things Qwen Image support article gives these approximate peak runtime VRAM numbers for Qwen Image 1.0 variants:

  • 8-bit quantized model: about 16 GiB peak runtime VRAM, suggested for devices with 24 GiB or more total RAM
  • 6-bit quantized model: about 11 GiB peak runtime VRAM, suggested for devices with 16 GiB or more total RAM
  • FP16/BF16: about 30 GiB peak runtime VRAM, suggested for much larger-memory devices

That page is about Qwen Image 1.0 rather than every Qwen Image Edit variant, so I would not over-apply the numbers mechanically. But it is still a useful warning: these modern image models can be heavy even when quantized.
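To show how I would read those numbers against a given machine, here is a small sketch. The figures are the ones quoted above for Qwen Image 1.0; the data structure and function name are mine. The FP16/BF16 row has no suggested RAM figure in what I quoted, so the sketch never recommends it automatically.

```python
# (variant, approx. peak runtime GiB, suggested minimum total RAM in GiB)
QWEN_IMAGE_VARIANTS = [
    ("FP16/BF16", 30, None),        # only "much larger-memory devices" is stated
    ("8-bit quantized", 16, 24),
    ("6-bit quantized", 11, 16),
]


def suggested_variants(total_ram_gib: float) -> list:
    """Return the variants whose suggested minimum RAM fits this machine."""
    return [
        name
        for name, _peak_gib, min_ram_gib in QWEN_IMAGE_VARIANTS
        if min_ram_gib is not None and total_ram_gib >= min_ram_gib
    ]


for ram in (8, 16, 24):
    print(f"{ram} GiB: {suggested_variants(ram) or 'nothing suggested'}")
```

For an 8 GiB Air the list is empty, which is the practical meaning of "supported does not mean comfortable."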

So I would phrase the local Draw Things path as:

Try Draw Things first if you want a Mac-local app. Start with smaller/quantized models and moderate resolutions. Do not start by assuming that the same Qwen/FLUX workflow you saw in an HF Space will be comfortable on a MacBook Air.

If Draw Things is not enough: ComfyUI Desktop

The more flexible route is ComfyUI, especially if you want to experiment with IP-Adapter, ControlNet, PuLID, multiple references, or custom workflows.

ComfyUI is powerful because you can build more exact workflows:

  • base model
  • reference image model
  • image encoder
  • IP-Adapter
  • FaceID
  • ControlNet
  • pose conditioning
  • inpainting
  • LoRAs
  • multiple reference images
  • upscalers

But this flexibility is also the problem. If you are not already familiar with local Stable Diffusion workflows, ComfyUI can become a troubleshooting project.

This becomes worse when memory is tight. You may need to care about:

  • model format
  • FP16 / BF16 / FP8 / GGUF / quantized versions
  • MPS compatibility
  • unsupported dtypes
  • CPU offloading
  • image encoder placement
  • ControlNet memory
  • VAE memory
  • resolution
  • custom node versions
  • whether a workflow was made for CUDA/NVIDIA rather than Apple Silicon

So I would consider ComfyUI the “more powerful but more technical” path.

Diffusers is possible, but I would not make it the first recommendation

There is also the raw Python route with Diffusers.

This is useful if you are comfortable with Python, PyTorch, model loading, dtype choices, MPS, and memory debugging.

But if the goal is to make a children’s book mockup and you are not already deep into local AI image workflows, I would not start here.

Local model/workflow options

Here is how I would roughly categorize the options.

| Option | Why it is relevant | MacBook Air realism | Suggested role |
|---|---|---|---|
| Qwen-Image-Edit-2511 | Improved character consistency, reduced drift, multi-person consistency, integrated LoRA capabilities | Low for local Air; likely heavy | Best current online upper-bound test |
| Qwen-Image-Edit-2509 | Multi-image editing, person + scene, person + object, character consistency improvements | Low for local Air; compressed versions still target large VRAM | Strong online test / reference model |
| FLUX.1 Kontext Dev | Image editing with reference context, character/object/style preservation | Low to medium; 12B model | Online upper-bound test |
| Draw Things | Local/offline Mac/iOS app with broad model support | Best Mac-friendly starting point, but memory-limited | First local app to try |
| ComfyUI Desktop | Very flexible node workflow | More technical, memory-sensitive | Advanced local route |
| Diffusers | Developer route with MPS support | Technical | Use only if comfortable with Python |
| SDXL + IP-Adapter / FaceID / PuLID | More realistic than Qwen/FLUX for local reference-image workflows | Maybe on 16GB/24GB, but still not lightweight | Local upper compromise |
| SD1.5 + IP-Adapter / FaceID | Lighter than SDXL | More realistic on low-memory Macs | Practical fallback |
| LoRA / DreamBooth LoRA | Stronger consistency if you have enough good training images | Training adds complexity | Later step, not first step |
| DiffusionBee | Simple Mac Stable Diffusion GUI | Easier, but less flexible for this exact problem | Simple fallback, not my main choice |

About SDXL + IP-Adapter / FaceID

If the full Qwen/FLUX editing models are too heavy locally, then a more realistic local compromise might be:

  • SDXL + IP-Adapter
  • SDXL + IP-Adapter FaceID
  • SDXL + PuLID
  • SDXL + ControlNet
  • SDXL Lightning/Turbo-style model variants
  • or, if SDXL is too heavy, SD1.5 + IP-Adapter / FaceID

This is probably a more realistic “local upper compromise” than Qwen-Image-Edit or FLUX.1 Kontext on a MacBook Air.

But I would not describe SDXL + IP-Adapter / FaceID as simple or lightweight.

It can involve several moving parts:

  • the base model
  • the IP-Adapter model
  • the image encoder
  • FaceID or InsightFace components
  • possibly a matching LoRA
  • possibly ControlNet
  • the frontend
  • MPS/backend compatibility
  • resolution/memory tradeoffs

So this may be more realistic locally than Qwen/FLUX, but it is still not necessarily beginner-friendly under memory pressure.

If the MacBook Air has only 8GB or 16GB unified memory, I would be prepared to fall back to SD1.5-based workflows. SD1.5 will usually have a lower ceiling than SDXL or the newer large editing models, but it may be much easier to run and iterate locally.

One reference image may not be enough

Even if the model is good, one image may not fully define the character.

A single front-facing image does not tell the model:

  • the side view
  • the back view
  • the full body proportions
  • the character’s expressions
  • how the character looks in different lighting
  • how the character looks in different poses
  • whether the outfit should stay fixed
  • whether the goal is “same identity” or “same illustration style”

This is especially important for a children’s book illustration style, because simplified/cartoon faces often contain less identity information than realistic portraits.

So I would not expect any model to guarantee perfect consistency from one image.

A more practical workflow might be:

  1. Use Qwen-Image-Edit-2511 or FLUX.1 Kontext online to see what good reference-image editing can do.
  2. Generate a few candidate images of the same character.
  3. Keep only the ones that actually look like the same girl.
  4. Make a small character sheet: front view, side view, full body, expressions.
  5. Then try to reproduce a smaller version of the workflow locally.
  6. If local consistency is still not good enough, consider LoRA/DreamBooth LoRA later.

Example test prompts

For the online model test, I would start with small changes first. Do not immediately change everything at once.

For example:

Use the girl in the reference image as the same character. Preserve her face, hairstyle, age, body proportions, outfit, color palette, and soft children's book illustration style. Change only the background to a sunny garden.

Then try a slightly larger change:

Use the girl in the reference image as the same character. Keep her facial features, hairstyle, age, proportions, and children's book illustration style consistent. Make her standing and waving in a sunny garden.

Then try a scene change:

Use the girl in the reference image as the same character. Preserve her identity, face, hairstyle, age, body proportions, and soft children's book illustration style. Place her in a cozy bedroom reading a picture book.

Then try a character sheet:

Create a character sheet of the same girl from the reference image. Show front view, side view, back view, and three facial expressions. Keep the same face, hairstyle, age, proportions, outfit, color palette, and children's book illustration style.

If the model cannot keep the girl recognizable under these tests, I would not expect it to work reliably for a whole picture book mockup.
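If you end up running many of these tests, it can help to build the prompts from one fixed "preserve" clause so that only the requested change varies between runs. This is just a convenience sketch; the function names and the exact template are mine, modeled on the first example prompt above.

```python
BASE = "Use the girl in the reference image as the same character."

TRAITS = [
    "face", "hairstyle", "age", "body proportions", "outfit",
    "color palette", "soft children's book illustration style",
]

# Ordered from the smallest change to the largest.
CHANGES = [
    "Change only the background to a sunny garden.",
    "Make her standing and waving in a sunny garden.",
    "Place her in a cozy bedroom reading a picture book.",
]


def join_traits(traits: list) -> str:
    """Join traits into an English list with a final 'and'."""
    if not traits:
        raise ValueError("at least one trait is required")
    if len(traits) == 1:
        return traits[0]
    if len(traits) == 2:
        return f"{traits[0]} and {traits[1]}"
    return ", ".join(traits[:-1]) + f", and {traits[-1]}"


def build_test_prompt(traits: list, change: str) -> str:
    """Combine the fixed identity clause with one requested change."""
    return f"{BASE} Preserve her {join_traits(traits)}. {change}"


for change in CHANGES:
    print(build_test_prompt(TRAITS, change))
```

Keeping the identity clause identical across runs makes it easier to tell whether a failure comes from the model or from a change in wording.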

My practical recommendation

I would not start with LoRA.

LoRA/DreamBooth LoRA can be useful later, but training from one image is a bad starting point. It can overfit, and it does not solve the fact that the character is underdefined. It is better to first create or curate multiple consistent references.

My practical order would be:

  1. Online test: Qwen-Image-Edit-2511, Qwen-Image-Edit-2509, FLUX.1 Kontext.
  2. Mac-local app: Draw Things.
  3. If you need more control: ComfyUI Desktop.
  4. If the large editing models are too heavy locally: SDXL + IP-Adapter / FaceID / PuLID.
  5. If SDXL is still too heavy: SD1.5 + IP-Adapter / FaceID.
  6. If you eventually need stronger consistency: build a small character sheet first, then consider LoRA.

The key point is that HF Spaces can show you what this type of reference-image editing is capable of, but your local MacBook Air determines what is actually practical.

So I would test the high-end models online first, but plan the local workflow around your MacBook Air’s unified memory and your tolerance for troubleshooting.