This version is still in development and is not considered stable yet. For the latest snapshot version, please use Spring AI 1.0.0-SNAPSHOT.

Ollama also offers an OpenAI API-compatible endpoint. See the OpenAI API compatibility section to learn how to use the Spring AI OpenAI client to talk to an Ollama server.
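The sketch below shows roughly what OpenAI API compatibility means on the wire. It builds an OpenAI-style chat completion request for the /v1/chat/completions path of an Ollama server but does not send it, and it uses only java.net.http. This is not the Spring AI OpenAI client. The base URL http://localhost:11434, the mistral model name and the class name OllamaOpenAiCompatSketch are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds (but does not send) an OpenAI-style chat completion request
// aimed at Ollama's OpenAI-compatible endpoint. Assumes a server at the given base URL
// with the model already pulled; the message is assumed to need no JSON escaping.
final class OllamaOpenAiCompatSketch {

    static HttpRequest chatCompletion(String baseUrl, String model, String userMessage) {
        String body = String.format(
                "{\"model\": \"%s\", \"messages\": [{\"role\": \"user\", \"content\": \"%s\"}]}",
                model, userMessage);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/chat/completions"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

You could send this request with HttpClient against a running server. In a Spring AI application, you would normally let the OpenAI client do this by pointing its base URL at the Ollama server instead.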
Running ollama run llama3 for the first time downloads a 4.7GB model artifact.
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
Property | Description | Default
spring.ai.ollama.base-url | Base URL where the Ollama API server is running. | http://localhost:11434
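The base URL is the root for every Ollama endpoint. The helper below is hypothetical (OllamaBaseUrlSketch is not part of Spring AI). It derives endpoint URIs from the base URL and shows why the scheme matters: java.net.URI parses localhost:11434 as a URI whose scheme is "localhost" and which has no host. The /api/tags path used in the usage example is Ollama's endpoint for listing locally available models.

```java
import java.net.URI;

// Hypothetical helper: turn a configured base URL (spring.ai.ollama.base-url)
// into endpoint URIs such as /api/tags or /api/chat.
final class OllamaBaseUrlSketch {

    static URI endpoint(String baseUrl, String path) {
        URI base = URI.create(baseUrl);
        // "localhost:11434" parses with scheme "localhost" and no host, so insist on http(s).
        if (!"http".equals(base.getScheme()) && !"https".equals(base.getScheme())) {
            throw new IllegalArgumentException("Base URL needs an http:// or https:// scheme: " + baseUrl);
        }
        // Drop a trailing slash so "http://host:11434/" and "http://host:11434" behave the same.
        String trimmed = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        return URI.create(trimmed + path);
    }
}
```

For example, endpoint("http://localhost:11434/", "/api/tags") yields http://localhost:11434/api/tags.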

Property | Description | Default
spring.ai.ollama.chat.enabled | Enable the Ollama chat model. | true
spring.ai.ollama.chat.options.model | The name of the supported model to use. | mistral
spring.ai.ollama.chat.options.format | The format to return a response in. Currently, the only accepted value is json. | -
spring.ai.ollama.chat.options.keep_alive | Controls how long the model stays loaded in memory following the request. | 5m
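The sketch below shows roughly where the model, format and keep_alive settings end up. It assembles the JSON body of a non-streaming request to Ollama's native /api/chat endpoint. It assumes two things: that format and keep_alive are sent as top-level fields, and that the generation options in the next table are sent in a nested options object under Ollama's snake_case names (for example num_ctx). The class and method names are hypothetical, and real code should use a JSON library rather than string concatenation.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: JSON body for a non-streaming request to Ollama's /api/chat endpoint.
// Assumes the message needs no JSON escaping and that option values are numbers.
final class OllamaChatRequestSketch {

    static String toJson(String model, String userMessage, String format, String keepAlive,
                         Map<String, Number> options) {
        // Nested "options" object, e.g. {"num_ctx": 2048, "temperature": 0.8}
        String opts = options.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\": " + e.getValue())
                .collect(Collectors.joining(", ", "{", "}"));
        return "{\"model\": \"" + model + "\", "
                + "\"messages\": [{\"role\": \"user\", \"content\": \"" + userMessage + "\"}], "
                + "\"stream\": false, "
                + "\"format\": \"" + format + "\", "
                + "\"keep_alive\": \"" + keepAlive + "\", "
                + "\"options\": " + opts + "}";
    }
}
```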

Property | Description | Default
spring.ai.ollama.chat.options.numa | Whether to use NUMA. | false
spring.ai.ollama.chat.options.num-ctx | Sets the size of the context window used to generate the next token. | 2048
spring.ai.ollama.chat.options.num-batch | Maximum batch size for prompt processing. | 512
spring.ai.ollama.chat.options.num-gpu | The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable Metal support; 0 disables it. A value of -1 indicates that NumGPU should be set dynamically. | -1
spring.ai.ollama.chat.options.main-gpu | When using multiple GPUs, this option controls which GPU is used for small tensors, for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question uses slightly more VRAM to store a scratch buffer for temporary results. | 0
spring.ai.ollama.chat.options.low-vram | - | false
spring.ai.ollama.chat.options.f16-kv | - | true
spring.ai.ollama.chat.options.logits-all | Return logits for all tokens, not just the last one. Must be true for completions to return logprobs. | -
spring.ai.ollama.chat.options.vocab-only | Load only the vocabulary, not the weights. | -
spring.ai.ollama.chat.options.use-mmap | By default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed. However, if the model is larger than your total amount of RAM or if your system is low on available memory, using mmap might increase the risk of pageouts, negatively impacting performance. Disabling mmap results in slower load times but may reduce pageouts if you are not using mlock. Note that if the model is larger than the total amount of RAM, turning off mmap would prevent the model from loading at all. | null
spring.ai.ollama.chat.options.use-mlock | Lock the model in memory, preventing it from being swapped out when memory-mapped. This can improve performance, but it trades away some of the advantages of memory-mapping by requiring more RAM to run and potentially slowing down load times as the model loads into RAM. | false
spring.ai.ollama.chat.options.num-thread | Sets the number of threads to use during computation. By default, Ollama detects this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). 0 = let the runtime decide. | 0
spring.ai.ollama.chat.options.num-keep | - | 4
spring.ai.ollama.chat.options.seed | Sets the random number seed to use for generation. Setting this to a specific number makes the model generate the same text for the same prompt. | -1
spring.ai.ollama.chat.options.num-predict | Maximum number of tokens to predict when generating text (-1 = infinite generation, -2 = fill context). | -1
spring.ai.ollama.chat.options.top-k | Reduces the probability of generating nonsense. A higher value (e.g., 100) gives more diverse answers, while a lower value (e.g., 10) is more conservative. | 40
spring.ai.ollama.chat.options.top-p | Works together with top-k. A higher value (e.g., 0.95) leads to more diverse text, while a lower value (e.g., 0.5) generates more focused and conservative text. | 0.9
spring.ai.ollama.chat.options.tfs-z | Tail-free sampling reduces the impact of less probable tokens on the output. A higher value (e.g., 2.0) reduces the impact more, while a value of 1.0 disables this setting. | 1.0
spring.ai.ollama.chat.options.typical-p | - | 1.0
spring.ai.ollama.chat.options.repeat-last-n | Sets how far back the model looks to prevent repetition (0 = disabled, -1 = num-ctx). | 64
spring.ai.ollama.chat.options.temperature | The temperature of the model. Increasing the temperature makes the model answer more creatively. | 0.8
spring.ai.ollama.chat.options.repeat-penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) penalizes repetitions more strongly, while a lower value (e.g., 0.9) is more lenient. | 1.1
spring.ai.ollama.chat.options.presence-penalty | - | 0.0
spring.ai.ollama.chat.options.frequency-penalty | - | 0.0
spring.ai.ollama.chat.options.mirostat | Enable Mirostat sampling for controlling perplexity (0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0). | 0
spring.ai.ollama.chat.options.mirostat-tau | Controls the balance between coherence and diversity of the output. A lower value results in more focused and coherent text. | 5.0
spring.ai.ollama.chat.options.mirostat-eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate results in slower adjustments, while a higher learning rate makes the algorithm more responsive. | 0.1
spring.ai.ollama.chat.options.penalize-newline | - | true
spring.ai.ollama.chat.options.stop | Sets the stop sequences to use. When one of these patterns is encountered, the LLM stops generating text and returns. Multiple stop patterns may be set by specifying multiple separate stop parameters in a modelfile. | -
spring.ai.ollama.chat.options.functions | List of functions, identified by their names, to enable for function calling in a single prompt request. Functions with those names must exist in the functionCallbacks registry. | -
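Several of the options above (repeat-penalty, repeat-last-n, temperature, top-k, top-p, seed) are easiest to understand as steps applied to the model's next-token scores. The sketch below is a simplified, llama.cpp-style model of those steps, written for illustration. It is not Ollama's implementation and omits tfs-z, typical-p and Mirostat. The step order (penalty, temperature, top-k, top-p, draw) and the class name SamplingSketch are assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;
import java.util.stream.IntStream;

// Illustrative sketch of how several options act on next-token logits.
// Modeled loosely on llama.cpp-style samplers; NOT Ollama's actual code.
final class SamplingSketch {

    // repeat-penalty / repeat-last-n: lower tokens seen in the last n positions.
    static double[] applyRepeatPenalty(double[] logits, int[] history, int lastN, double penalty) {
        double[] out = logits.clone();
        boolean[] seen = new boolean[logits.length];
        for (int i = Math.max(0, history.length - lastN); i < history.length; i++) {
            seen[history[i]] = true;
        }
        for (int t = 0; t < out.length; t++) {
            if (seen[t]) {
                // With penalty > 1, dividing a positive logit or multiplying a negative one lowers it.
                out[t] = out[t] > 0 ? out[t] / penalty : out[t] * penalty;
            }
        }
        return out;
    }

    // temperature: softmax(logits / T). Lower T sharpens the distribution; T > 0 assumed.
    static double[] softmax(double[] logits, double temperature) {
        double max = Arrays.stream(logits).max().orElse(0.0);
        double[] p = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            p[i] = Math.exp((logits[i] - max) / temperature); // subtract max for numerical stability
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    // top-k, then top-p measured on the renormalized top-k distribution. k >= 1 assumed.
    static double[] topKTopP(double[] probs, int k, double topP) {
        Integer[] order = IntStream.range(0, probs.length).boxed().toArray(Integer[]::new);
        Arrays.sort(order, Comparator.comparingDouble(i -> -probs[i])); // most likely first
        int limit = Math.min(k, order.length);
        double massK = 0.0;
        for (int r = 0; r < limit; r++) {
            massK += probs[order[r]];
        }
        double[] kept = new double[probs.length];
        double cumulative = 0.0;
        double keptMass = 0.0;
        for (int r = 0; r < limit; r++) {
            int t = order[r];
            kept[t] = probs[t];
            keptMass += probs[t];
            cumulative += probs[t] / massK;
            if (cumulative >= topP) {
                break; // smallest prefix whose probability mass reaches top-p
            }
        }
        for (int i = 0; i < kept.length; i++) {
            kept[i] /= keptMass; // renormalize survivors; filtered tokens stay at 0
        }
        return kept;
    }

    // seed: the same seed yields the same sequence of draws.
    static int sample(double[] probs, Random rng) {
        double u = rng.nextDouble();
        double cumulative = 0.0;
        int last = -1;
        for (int t = 0; t < probs.length; t++) {
            if (probs[t] == 0.0) {
                continue;
            }
            last = t;
            cumulative += probs[t];
            if (u < cumulative) {
                return t;
            }
        }
        return last; // only reached through floating-point rounding
    }
}
```

For example, with repeat-penalty 1.1, a repeated token whose logit is 2.0 drops to about 1.82. Likewise, two java.util.Random instances created with the same seed pick the same token from the same distribution, which is the intuition behind the seed option.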

All properties prefixed with spring.ai.ollama.chat.options can be overridden at runtime by adding request-specific runtime options to the Prompt call.
In addition to the model-specific OllamaOptions, you can use a portable ChatOptions instance, created with ChatOptionsBuilder#builder().
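The runtime override rule can be pictured as a map merge. Start from the values configured under spring.ai.ollama.chat.options, then let every option set on the request replace the corresponding default, whether it comes from OllamaOptions or from a portable ChatOptions instance. The sketch below models options as plain maps keyed by property suffix. The class name OptionOverrideSketch and the convention that null means "not set" are assumptions, not Spring AI's API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of layering request-specific options over configured defaults.
// Assumed rule: a runtime value wins when it is set; null means "not set".
final class OptionOverrideSketch {

    static Map<String, Object> effectiveOptions(Map<String, Object> defaults,
                                                Map<String, Object> runtime) {
        Map<String, Object> merged = new HashMap<>(defaults); // never mutate the defaults
        runtime.forEach((key, value) -> {
            if (value != null) {
                merged.put(key, value);
            }
        });
        return merged;
    }
}
```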
Function calling requires Ollama 0.2.8 or newer.
Currently, the Ollama API (0.2.8) does not support function calling in streaming mode.
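The functions option and the streaming limitation can be sketched as a lookup against a registry of named callbacks. Every enabled name must resolve, and a streaming request that enables functions is rejected. FunctionRegistrySketch, the currentWeather example and the chosen exception types are hypothetical; Spring AI performs its own resolution through the functionCallbacks registry.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: resolve the names listed in spring.ai.ollama.chat.options.functions
// against a registry of callbacks. Not Spring AI's API.
final class FunctionRegistrySketch {

    static List<Function<String, String>> resolve(List<String> enabledNames,
                                                  Map<String, Function<String, String>> registry,
                                                  boolean streaming) {
        if (streaming && !enabledNames.isEmpty()) {
            // Mirrors the documented limitation of Ollama 0.2.8: no function calling while streaming.
            throw new IllegalStateException("Function calling is not supported in streaming mode");
        }
        return enabledNames.stream()
                .map(name -> {
                    Function<String, String> fn = registry.get(name);
                    if (fn == null) {
                        throw new IllegalArgumentException("No function registered under name: " + name);
                    }
                    return fn;
                })
                .collect(Collectors.toList());
    }
}
```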
Replace the base-url with your Ollama server URL.
The spring-ai-ollama dependency also provides access to the OllamaEmbeddingModel. For more information about the OllamaEmbeddingModel, refer to the Ollama Embedding Model section.
The OllamaApi is a low-level API and is not recommended for direct use. Use the OllamaChatModel instead.