Does anyone know if Gemma 4 26b can be converted into an NVFP4 format with no vision tower? I’m wondering if such a configuration would allow it to fit comfortably fit on a 5060ti 16gb for use with Openclaw.
Even that card reports about ~16 GB model size and around ~18 GB minimum GPU memory for serving, before comfortable KV cache headroom.
On a 16GB card, it may load only with tight limits / offloading and then feel slow.
“No vision tower”
Gemma 4 26B-A4B is a multimodal architecture; removing vision tower is not a standard toggle in typical runtimes.
You can run text-only inference without sending images, but physically stripping vision components is model surgery and usually breaks compatibility unless specifically supported.
OpenClaw compatibility
OpenClaw is the orchestration layer; real support depends on backend/runtime kernels (vLLM/TensorRT/llama.cpp/Ollama path you use).
If your backend doesn’t support this NVFP4 format end-to-end, it won’t help.
Practical recommendation:
If you want reliability on 16GB today, use a text-focused quantized path with proven OpenClaw backend support.
If you want Gemma 4 26B NVFP4 specifically, expect experimentation and likely compromises (lower context, offload, slower throughput).