This gist contains three scripts:
- A basic inference example
- A script that shows how to quantize a sentence-transformers model for use with OpenVINO
- A script that shows how to evaluate a quantized model, comparing the INT8 model with the FP32 model
NOTE: The PR to add OpenVINO support to sentence-transformers has not been merged yet. For now, install sentence-transformers with:

```shell
pip install "git+https://github.com/helena-intel/sentence-transformers.git@helena/openvino-support"
```
To use sentence-transformers with OpenVINO, simply add `backend="openvino"` to the `SentenceTransformer()` model initialization. `model_name_or_path` can refer to a model_id on the Hugging Face Hub, or a path to a local directory with a compatible model. If a model_id with a PyTorch model is provided, it will be converted to OpenVINO on the fly.
```python
model = SentenceTransformer(model_name_or_path, backend="openvino")
```
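After initialization, the model is used like any other sentence-transformers model. A minimal sketch (the model id and sentences are just examples):

```python
from sentence_transformers import SentenceTransformer

# load a model from the Hugging Face Hub and run it with the OpenVINO backend
model = SentenceTransformer("BAAI/bge-base-en-v1.5", backend="openvino")

# encode() works exactly as with the default PyTorch backend
embeddings = model.encode(["OpenVINO runs this model", "on Intel hardware"])
print(embeddings.shape)
```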
You can save this OpenVINO model and load it directly:
```python
# load a model from the Hugging Face Hub and convert it to OpenVINO on the fly
model = SentenceTransformer("BAAI/bge-base-en-v1.5", backend="openvino")

# save the model
model.save("bge-base-en-v1.5-ov")

# load the saved OpenVINO model
model = SentenceTransformer("bge-base-en-v1.5-ov", backend="openvino")
```
To use an OpenVINO config, set `ov_config` in `model_kwargs`. `ov_config` can either be a dictionary with an OpenVINO config, or point to a .json file with an OpenVINO config:
model = SentenceTransformer("BAAI/bge-base-en-v1.5", backend="openvino", model_kwargs = {"ov_config": {"INFERENCE_PRECISION_HINT": "f32"})
model = SentenceTransformer("BAAI/bge-base-en-v1.5", backend="openvino", model_kwargs = {"ov_config": "ov_config.json"})
To use an Intel iGPU or dGPU for inference, set `model_kwargs["device"]` to `"GPU"`:
model = SentenceTransformer("BAAI/bge-base-en-v1.5", backend="openvino", model_kwargs = {"device": "GPU"})
NOTE: do not set the `device` argument of `SentenceTransformer()` directly; for OpenVINO, the device is passed through `model_kwargs` as shown above.
See the quantization and evaluation scripts in this gist for an example of how to quantize models for use with OpenVINO.