Hi,
I tried to convert the paligemma2 3B-parameter model with 224 image resolution to ONNX using Optimum and got this error:
$ optimum-cli export onnx --model google/paligemma-3b-pt-224 paligemma-3b-pt-224_onnx/
KeyError: "Unknown task: image-text-to-text. Possible values are: audio-classification for AutoModelForAudioClassification, audio-frame-classification for AutoModelForAudioFrameClassification, audio-xvector for AutoModelForAudioXVector, automatic-speech-recognition for ('AutoModelForSpeechSeq2Seq', 'AutoModelForCTC'), depth-estimation for AutoModelForDepthEstimation, feature-extraction for AutoModel, fill-mask for AutoModelForMaskedLM, image-classification for AutoModelForImageClassification, image-segmentation for ('AutoModelForImageSegmentation', 'AutoModelForSemanticSegmentation', 'AutoModelForInstanceSegmentation', 'AutoModelForUniversalSegmentation'), image-to-image for AutoModelForImageToImage, image-to-text for ('AutoModelForVision2Seq', 'AutoModel'), mask-generation for AutoModel, masked-im for AutoModelForMaskedImageModeling, multiple-choice for AutoModelForMultipleChoice, object-detection for AutoModelForObjectDetection, question-answering for AutoModelForQuestionAnswering, reinforcement-learning for AutoModel, semantic-segmentation for AutoModelForSemanticSegmentation, text-to-audio for ('AutoModelForTextToSpectrogram', 'AutoModelForTextToWaveform'), text-generation for AutoModelForCausalLM, text2text-generation for AutoModelForSeq2SeqLM, text-classification for AutoModelForSequenceClassification, token-classification for AutoModelForTokenClassification, visual-question-answering for AutoModelForVisualQuestionAnswering, zero-shot-image-classification for AutoModelForZeroShotImageClassification, zero-shot-object-detection for AutoModelForZeroShotObjectDetection"
Please help if you have a solution. Is the "image-text-to-text" task available in Optimum? If yes, how do I use it?
Or is there an alternative method to convert the model to ONNX?
It seems that this error can be avoided by explicitly specifying a task (one of the supported tasks listed in the error message).
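To make "explicitly specifying a task" concrete, here is a minimal sketch that only assembles the `optimum-cli export onnx` command line with a `--task` flag. The helper name `build_export_command` and the output directory `clip_onnx/` are made up for illustration; the model and task are the CLIP ones from the linked issue. Whether the export then succeeds still depends on the exporter supporting that task for the model's architecture.

```python
from typing import List, Optional


def build_export_command(
    model_id: str, output_dir: str, task: Optional[str] = None
) -> List[str]:
    """Build the argument list for `optimum-cli export onnx`.

    Hypothetical helper, not part of Optimum. When `task` is None the
    `--task` flag is omitted and the exporter tries to infer the task.
    """
    cmd = ["optimum-cli", "export", "onnx", "--model", model_id]
    if task is not None:
        # Pass the task explicitly instead of relying on auto-detection.
        cmd += ["--task", task]
    cmd.append(output_dir)
    return cmd


# Example values: CLIP with one of the tasks its error message lists as supported.
cmd = build_export_command(
    "openai/clip-vit-large-patch14",
    "clip_onnx/",  # assumed output directory name
    task="zero-shot-image-classification",
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` to actually run the export, assuming `optimum-cli` is installed.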
opened 10:24AM - 12 Jul 24 UTC
onnx
### Feature request
I wonder if the task text-classification can be supported in the ONNX export for clip? I want to use the openai/clip-vit-large-patch14 model for zero-shot image classification (classification of images without pretraining, based on given candidate labels), but I get the following error:
ValueError Traceback (most recent call last)
File /home/danne00a/ZablageBlazeG/ZeroShotClassification/zeroshotclassifier.py:2
      1 #%%
----> 2 ort_model = ORTModelForSequenceClassification.from_pretrained(model_checkpoint, export=True)
File ~/mambaforge/envs/ZeroShot_Mamba_env/lib/python3.11/site-packages/optimum/onnxruntime/modeling_ort.py:669, in ORTModel.from_pretrained(cls, model_id, export, force_download, use_auth_token, cache_dir, subfolder, config, local_files_only, provider, session_options, provider_options, use_io_binding, **kwargs)
    620 @classmethod
    621 @add_start_docstrings(FROM_PRETRAINED_START_DOCSTRING)
    622 def from_pretrained(
(...)
    636 **kwargs,
    637 ):
    638 """
    639 provider (`str`, defaults to `"CPUExecutionProvider"`):
    640 ONNX Runtime provider to use for loading the model. See https://onnxruntime.ai/docs/execution-providers/ for
(...)
    667 `ORTModel`: The loaded ORTModel model.
    668 """
--> 669 return super().from_pretrained(
    670 model_id,
    671 export=export,
    672 force_download=force_download,
    673 use_auth_token=use_auth_token,
    674 cache_dir=cache_dir,
...
    274 )
    276 # TODO: Fix in Transformers so that SdpaAttention class can be exported to ONNX. `attn_implementation` is introduced in Transformers 4.36.
    277 if model_type in SDPA_ARCHS_ONNX_EXPORT_NOT_SUPPORTED and _transformers_version >= version.parse("4.35.99"):
ValueError: Asked to export a clip model for the task text-classification, but the Optimum ONNX exporter only supports the tasks feature-extraction, zero-shot-image-classification for clip. Please use a supported task. Please open an issue at https://github.com/huggingface/optimum/issues if you would like the task text-classification to be supported in the ONNX export for clip.
### Motivation
I'm struggling with the size of the openai/clip-vit-large-patch14 model, so I want to convert it to ONNX with Optimum!
### Your contribution
No ideas so far.
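The ValueError in the issue above already names the tasks the exporter accepts for clip. As a rough sketch, assuming the message keeps the wording shown in that traceback, the supported task names can be pulled out of the message text so that one of them can be passed explicitly. The function `supported_tasks_from_error` is hypothetical and not part of Optimum.

```python
import re
from typing import List


def supported_tasks_from_error(message: str) -> List[str]:
    """Extract task names from an exporter error of the form
    '... only supports the tasks a, b for <model_type>. ...'.

    Hypothetical helper; returns an empty list if the wording differs.
    """
    match = re.search(r"only supports the tasks (.+?) for [\w-]+\.", message)
    if match is None:
        return []
    return [task.strip() for task in match.group(1).split(",")]


# Error text copied from the traceback in the issue.
error_message = (
    "Asked to export a clip model for the task text-classification, but the "
    "Optimum ONNX exporter only supports the tasks feature-extraction, "
    "zero-shot-image-classification for clip. Please use a supported task."
)

tasks = supported_tasks_from_error(error_message)
print(tasks)  # ['feature-extraction', 'zero-shot-image-classification']
```

For the use case in the issue, zero-shot-image-classification is the listed task that matches; text-classification was requested only because `ORTModelForSequenceClassification` was used to load the model.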
I tried specifying one of the existing tasks, image-to-text, but that throws another error:
$ optimum-cli export onnx --model google/paligemma-3b-pt-224 --task image-to-text paligemma-3b-pt-224_onnx/
ValueError: Trying to export a paligemma model, that is a custom or unsupported architecture, but no custom onnx configuration was passed as custom_onnx_configs. Please refer to Export a model to ONNX with optimum.exporters.onnx for an example on how to export custom models. Please open an issue at GitHub · Where software is built if you would like the model type paligemma to be supported natively in the ONNX export.
Of course, some of the newer models are not supported, but I found a converted version of PaliGemma 2. Maybe the GitHub version of ONNX supports it.
The best way to find out is to ask the ONNX Community, who distribute it…