Transformers (ONNX) Embeddings
The TransformersEmbeddingModel is an EmbeddingModel implementation that locally computes sentence embeddings using a selected sentence transformer.
You can use any HuggingFace Embedding model.
It uses pre-trained transformer models, serialized into the Open Neural Network Exchange (ONNX) format.
The Deep Java Library and the Microsoft ONNX Java Runtime libraries are used to run the ONNX models and compute the embeddings in Java.
Prerequisites
To run things in Java, we need to serialize the tokenizer and the transformer model into the ONNX format.
Serialize with optimum-cli - One quick way to achieve this is to use the optimum-cli command-line tool.
The following snippet prepares a Python virtual environment, installs the required packages, and serializes (i.e. exports) the specified model using optimum-cli:
python3 -m venv venv
source ./venv/bin/activate
(venv) pip install --upgrade pip
(venv) pip install optimum onnx onnxruntime sentence-transformers
(venv) optimum-cli export onnx --model sentence-transformers/all-MiniLM-L6-v2 onnx-output-folder
The snippet exports the sentence-transformers/all-MiniLM-L6-v2 transformer into the onnx-output-folder folder. That folder includes the tokenizer.json and model.onnx files used by the embedding model.
In place of all-MiniLM-L6-v2 you can pick any Hugging Face transformer identifier or provide a direct file path.
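Before pointing the embedding model at an exported folder, it can help to confirm that the two files mentioned above are actually there. The following is a minimal sketch using only the JDK; the class and method names (OnnxExportCheck, missingFiles) are hypothetical and not part of Spring AI or optimum-cli.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class OnnxExportCheck {

    // Files the embedding model reads from the export folder, per the text above.
    static final List<String> REQUIRED = List.of("tokenizer.json", "model.onnx");

    /** Returns the required file names that are missing from the given folder. */
    static List<String> missingFiles(Path folder) {
        return REQUIRED.stream()
                .filter(name -> !Files.isRegularFile(folder.resolve(name)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Defaults to the folder name used in the export command above.
        Path folder = Path.of(args.length > 0 ? args[0] : "onnx-output-folder");
        List<String> missing = missingFiles(folder);
        if (missing.isEmpty()) {
            System.out.println("Export looks complete: " + folder.toAbsolutePath());
        } else {
            System.out.println("Missing files in " + folder.toAbsolutePath() + ": " + missing);
        }
    }
}
```

Run it with the export folder as the only argument, or with no argument to check onnx-output-folder in the current directory.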
Auto-configuration
Spring AI provides Spring Boot auto-configuration for the ONNX Transformer Embedding Model.
To enable it, add the following dependency to your project’s Maven pom.xml file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-transformers-spring-boot-starter</artifactId>
</dependency>
or to your Gradle build.gradle build file:
dependencies {
    implementation 'org.springframework.ai:spring-ai-transformers-spring-boot-starter'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file. Refer to the Repositories section to add these repositories to your build system.
To configure it, use the spring.ai.embedding.transformer.* properties.
For example, add this to your application.properties file to configure the client with the intfloat/e5-small-v2 text embedding model:
spring.ai.embedding.transformer.onnx.modelUri=https://huggingface.co/intfloat/e5-small-v2/resolve/main/model.onnx
spring.ai.embedding.transformer.tokenizer.uri=https://huggingface.co/intfloat/e5-small-v2/raw/main/tokenizer.json
The complete list of supported properties is:
Embedding Properties
Property | Description | Default |
---|---|---|
spring.ai.embedding.transformer.enabled | Enable the Transformer Embedding model. | true |
spring.ai.embedding.transformer.tokenizer.uri | URI of a pre-trained HuggingFaceTokenizer created by the ONNX engine (e.g. tokenizer.json). | onnx/all-MiniLM-L6-v2/tokenizer.json |
spring.ai.embedding.transformer.tokenizer.options | HuggingFaceTokenizer options such as ‘addSpecialTokens’, ‘modelMaxLength’, ‘truncation’, ‘padding’, ‘maxLength’, ‘stride’, ‘padToMultipleOf’. Leave empty to fall back to the defaults. | empty |
spring.ai.embedding.transformer.cache.enabled | Enable remote Resource caching. | true |
spring.ai.embedding.transformer.cache.directory | Directory path to cache remote resources, such as the ONNX models. | ${java.io.tmpdir}/spring-ai-onnx-model |
spring.ai.embedding.transformer.onnx.modelUri | Existing, pre-trained ONNX model. | onnx/all-MiniLM-L6-v2/model.onnx |
spring.ai.embedding.transformer.onnx.modelOutputName | The ONNX model’s output node name, which is used for the embedding calculation. | last_hidden_state |
spring.ai.embedding.transformer.onnx.gpuDeviceId | The GPU device ID to execute on. Only applicable if >= 0. Ignored otherwise. (Requires the additional onnxruntime_gpu dependency.) | -1 |
spring.ai.embedding.transformer.metadataMode | Specifies which parts of the Document’s content and metadata will be used for computing the embeddings. | NONE |
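The modelOutputName property names the model output that feeds the embedding calculation. The source does not say how that per-token output is reduced to a single sentence vector, so the sketch below assumes one common technique, masked mean pooling, purely for illustration. The class name, method name, and sample values are hypothetical.

```java
public class MeanPoolingSketch {

    /**
     * Averages the token vectors whose attention-mask entry is non-zero.
     * tokenEmbeddings has shape [tokens][dimensions]; attentionMask has one entry per token
     * (1 for a real token, 0 for padding).
     */
    static double[] meanPool(double[][] tokenEmbeddings, int[] attentionMask) {
        int dims = tokenEmbeddings[0].length;
        double[] pooled = new double[dims];
        int realTokens = 0;
        for (int t = 0; t < tokenEmbeddings.length; t++) {
            if (attentionMask[t] == 0) {
                continue; // padding tokens do not contribute to the sentence vector
            }
            for (int d = 0; d < dims; d++) {
                pooled[d] += tokenEmbeddings[t][d];
            }
            realTokens++;
        }
        if (realTokens == 0) {
            return pooled; // only padding: return the zero vector
        }
        for (int d = 0; d < dims; d++) {
            pooled[d] /= realTokens;
        }
        return pooled;
    }

    public static void main(String[] args) {
        // Made-up 2-dimensional token vectors; the third token is padding.
        double[][] tokens = {{1.0, 2.0}, {3.0, 4.0}, {100.0, 100.0}};
        int[] mask = {1, 1, 0};
        double[] sentence = meanPool(tokens, mask);
        System.out.println(sentence[0] + ", " + sentence[1]);
    }
}
```

If a model exposes its per-token output under a different name (for example token_embeddings), the same reduction would apply to that output instead.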
Errors and special cases
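If you see an error like ai.onnxruntime.OrtException: Supplied array is ragged, ..., enable tokenizer padding in your application.properties:
spring.ai.embedding.transformer.tokenizer.options.padding=true
If you get an error indicating that the expected model output name was not found, set the output name to one that your model provides, for example:
spring.ai.embedding.transformer.onnx.modelOutputName=token_embeddings
If you get an error like ... The ... Currently the only workaround is to copy the large ...
If you get an error related to GPU execution after setting spring.ai.embedding.transformer.onnx.gpuDeviceId, add the onnxruntime_gpu dependency to your build:
<dependency>
    <groupId>com.microsoft.onnxruntime</groupId>
    <artifactId>onnxruntime_gpu</artifactId>
</dependency>
Select the appropriate onnxruntime_gpu version based on your CUDA version (see the ONNX Java Runtime).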
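The "Supplied array is ragged" error that the tokenizer padding option addresses comes from batching token sequences of different lengths. The sketch below shows the idea of padding in plain Java: every sequence is extended to the length of the longest one so the batch becomes rectangular. It is an illustration only; the token IDs and the pad ID 0 are made-up values, and the real padding is done by the tokenizer, not by code like this.

```java
import java.util.Arrays;
import java.util.List;

public class PaddingSketch {

    /** Pads every sequence with padId up to the length of the longest sequence. */
    static long[][] padToLongest(List<long[]> sequences, long padId) {
        int maxLen = sequences.stream().mapToInt(s -> s.length).max().orElse(0);
        long[][] batch = new long[sequences.size()][];
        for (int i = 0; i < sequences.size(); i++) {
            long[] original = sequences.get(i);
            long[] padded = Arrays.copyOf(original, maxLen);   // copies and extends the array
            Arrays.fill(padded, original.length, maxLen, padId); // overwrite the tail with padId
            batch[i] = padded;
        }
        return batch;
    }

    public static void main(String[] args) {
        // Two tokenized sentences of different lengths (hypothetical token IDs).
        List<long[]> ragged = List.of(new long[] {101, 7592, 2088, 102}, new long[] {101, 102});
        long[][] rectangular = padToLongest(ragged, 0L);
        for (long[] row : rectangular) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```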
Manual Configuration
If you are not using Spring Boot, you can manually configure the ONNX Transformers Embedding Model.
To do so, add the spring-ai-transformers dependency to your project’s Maven pom.xml file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-transformers</artifactId>
</dependency>
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
Then create a new TransformersEmbeddingModel instance and use the setTokenizerResource(tokenizerJsonUri) and setModelResource(modelOnnxUri) methods to set the URIs of the exported tokenizer.json and model.onnx files (the classpath:, file: and https: URI schemes are supported).
If the model is not explicitly set, TransformersEmbeddingModel defaults to sentence-transformers/all-MiniLM-L6-v2:
Dimensions | 384 |
Avg. performance | 58.80 |
Speed | 14200 sentences/sec |
Size | 80MB |
The following snippet illustrates how to use the TransformersEmbeddingModel manually:
import java.util.List;
import java.util.Map;
import org.springframework.ai.transformers.TransformersEmbeddingModel;

TransformersEmbeddingModel embeddingModel = new TransformersEmbeddingModel();
// (optional) defaults to classpath:/onnx/all-MiniLM-L6-v2/tokenizer.json
embeddingModel.setTokenizerResource("classpath:/onnx/all-MiniLM-L6-v2/tokenizer.json");
// (optional) defaults to classpath:/onnx/all-MiniLM-L6-v2/model.onnx
embeddingModel.setModelResource("classpath:/onnx/all-MiniLM-L6-v2/model.onnx");
// (optional) defaults to ${java.io.tmpdir}/spring-ai-onnx-model
// Only the http/https resources are cached by default.
embeddingModel.setResourceCacheDirectory("/tmp/onnx-zoo");
// (optional) Set the tokenizer padding if you see errors like:
// "ai.onnxruntime.OrtException: Supplied array is ragged, ..."
embeddingModel.setTokenizerOptions(Map.of("padding", "true"));
// Required when the instance is created manually (not as a Spring bean)
embeddingModel.afterPropertiesSet();
List<List<Double>> embeddings = embeddingModel.embed(List.of("Hello world", "World is big"));
If you create an instance of TransformersEmbeddingModel manually, you must call the afterPropertiesSet() method after setting the properties and before using the client.
The first embed() call downloads the large ONNX model and caches it on the local file system.
Therefore, the first call might take longer than usual.
Use the #setResourceCacheDirectory(<path>) method to set the local folder where the ONNX models are stored.
The default cache folder is ${java.io.tmpdir}/spring-ai-onnx-model.
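To find where the default cache folder resolves on a given machine, the ${java.io.tmpdir} placeholder can be expanded with the JDK's system property of the same name. This is a small sketch for inspection only; the class and method names are hypothetical and it does not call any Spring AI API.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class CacheDirSketch {

    /** Resolves the default cache location named above against the JVM's temp directory. */
    static Path defaultCacheDir() {
        return Path.of(System.getProperty("java.io.tmpdir"), "spring-ai-onnx-model");
    }

    public static void main(String[] args) {
        Path cacheDir = defaultCacheDir();
        System.out.println("Default ONNX cache folder: " + cacheDir);
        System.out.println("Exists already: " + Files.isDirectory(cacheDir));
    }
}
```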
It is more convenient (and preferred) to create the TransformersEmbeddingModel as a Bean.
Then you don’t have to call afterPropertiesSet() manually.
@Bean
public EmbeddingModel embeddingModel() {
    return new TransformersEmbeddingModel();
}
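Once embeddings are available, a typical next step is to compare them. The following sketch computes the cosine similarity between two vectors in the List<Double> shape that the manual-configuration snippet above receives from embed(). It is plain Java with stand-in vectors rather than real model output, and the class and method names are hypothetical.

```java
import java.util.List;

public class CosineSimilaritySketch {

    /** Cosine similarity of two equally sized vectors; returns 0.0 if either has zero length. */
    static double cosine(List<Double> a, List<Double> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.size(); i++) {
            dot += a.get(i) * b.get(i);
            normA += a.get(i) * a.get(i);
            normB += b.get(i) * b.get(i);
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // similarity is undefined for a zero vector; 0.0 is a pragmatic choice
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Stand-in vectors; in practice these would be two entries of the list returned by embed(...).
        List<Double> first = List.of(0.1, 0.3, 0.5);
        List<Double> second = List.of(0.2, 0.1, 0.4);
        System.out.println("Cosine similarity: " + cosine(first, second));
    }
}
```

Values close to 1 indicate vectors pointing in nearly the same direction, which is usually read as the two sentences being semantically similar.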