Squeeze Gemma 4 26b on a 4060ti with NVFP4

ThadB · April 29, 2026, 1:42pm

Hi All,

Does anyone know if Gemma 4 26b can be converted into an NVFP4 format with no vision tower? I’m wondering if such a configuration would allow it to fit comfortably fit on a 5060ti 16gb for use with Openclaw.

zroshot · April 30, 2026, 1:01pm

Good question. Short version: probably not “comfortably” on a 5060 Ti 16GB yet, at least not in a clean plug-and-play OpenClaw setup.

What matters:

NVFP4 availability

There are community NVFP4 checkpoints for Gemma 4 26B-A4B, but these are not the same as mainstream GGUF flows.
Example model card: CyberFitz/gemma-4-26B-A4B-it-NVFP4

VRAM headroom

Even that card reports about ~16 GB model size and around ~18 GB minimum GPU memory for serving, before comfortable KV cache headroom.
On a 16GB card, it may load only with tight limits / offloading and then feel slow.

“No vision tower”

Gemma 4 26B-A4B is a multimodal architecture; removing vision tower is not a standard toggle in typical runtimes.
You can run text-only inference without sending images, but physically stripping vision components is model surgery and usually breaks compatibility unless specifically supported.

OpenClaw compatibility

OpenClaw is the orchestration layer; real support depends on backend/runtime kernels (vLLM/TensorRT/llama.cpp/Ollama path you use).
If your backend doesn’t support this NVFP4 format end-to-end, it won’t help.

Practical recommendation:

If you want reliability on 16GB today, use a text-focused quantized path with proven OpenClaw backend support.
If you want Gemma 4 26B NVFP4 specifically, expect experimentation and likely compromises (lower context, offload, slower throughput).

Base model reference: google/gemma-4-26B-A4B-it
OpenClaw repo/docs entry point: openclaw/openclaw

Topic		Replies	Views
Successfully Running Gemma4-26B On-Prem? Looking to Discuss Deployment Struggles & Stable Setups Models	1	32	May 24, 2026
Building Local: My 2026 Headless AI Server Journey Beginners	6	173	April 24, 2026
Fine-tuning Gemma-4-E2B on MacBook M3 🤗Transformers	4	657	April 14, 2026
CPU offloading error scenario 🤗Transformers	11	250	April 27, 2026
Gemma 4 e4b latency optimisations Models	1	85	May 14, 2026

Squeeze Gemma 4 26b on a 4060ti with NVFP4

Related topics