watsonx.ai Chat

With watsonx.ai you can run various Large Language Models (LLMs) as a hosted service and generate text from them. Spring AI supports watsonx.ai text generation with WatsonxAiChatModel.

Prerequisites

You first need a SaaS instance of watsonx.ai (as well as an IBM Cloud account).

Refer to free-trial to try watsonx.ai for free.

More info can be found here

Auto-configuration

Spring AI provides Spring Boot auto-configuration for the watsonx.ai Chat Client. To enable it, add the following dependency to your project’s Maven pom.xml file:

<dependency>
   <groupId>org.springframework.ai</groupId>
   <artifactId>spring-ai-watsonx-ai-spring-boot-starter</artifactId>
</dependency>

or to your Gradle build.gradle file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-watsonx-ai-spring-boot-starter'
}

Chat Properties

Connection Properties

The prefix spring.ai.watsonx.ai is the property prefix that lets you connect to watsonx.ai.

| Property | Description | Default |
| --- | --- | --- |
| spring.ai.watsonx.ai.base-url | The URL to connect to | us-south.ml.cloud.ibm.com |
| spring.ai.watsonx.ai.stream-endpoint | The streaming endpoint | ml/v1/text/generation_stream?version=2023-05-29 |
| spring.ai.watsonx.ai.text-endpoint | The text endpoint | ml/v1/text/generation?version=2023-05-29 |
| spring.ai.watsonx.ai.project-id | The project ID | - |
| spring.ai.watsonx.ai.iam-token | The IBM Cloud account IAM token | - |
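Two of the connection properties in the table above (project-id and iam-token) have no default, so they have to be supplied by the application. The following is a minimal sketch of a fail-fast check for them, using only the JDK. The class WatsonxConnectionCheck and its methods are hypothetical helpers for illustration, not part of Spring AI, and the property values are placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not part of Spring AI): reports which connection
// properties without a documented default are still unset.
class WatsonxConnectionCheck {

    // The properties whose Default column is "-" in the table above.
    static final List<String> REQUIRED = List.of(
            "spring.ai.watsonx.ai.project-id",
            "spring.ai.watsonx.ai.iam-token");

    // Returns the required keys that are absent or blank.
    static List<String> missing(Properties props) {
        List<String> result = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.isBlank()) {
                result.add(key);
            }
        }
        return result;
    }

    // Parses text written in the application.properties format.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values; the IAM token is deliberately left empty.
        String applicationProperties = String.join("\n",
                "spring.ai.watsonx.ai.base-url=us-south.ml.cloud.ibm.com",
                "spring.ai.watsonx.ai.project-id=my-project-id",
                "spring.ai.watsonx.ai.iam-token=");
        System.out.println(missing(parse(applicationProperties)));
    }
}
```

The string passed to parse doubles as an example of what the corresponding entries in an application.properties file could look like.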

Configuration Properties

The prefix spring.ai.watsonx.ai.chat is the property prefix that lets you configure the chat model implementation for watsonx.ai.

| Property | Description | Default |
| --- | --- | --- |
| spring.ai.watsonx.ai.chat.enabled | Enable the watsonx.ai chat model. | true |
| spring.ai.watsonx.ai.chat.options.temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. | 0.7 |
| spring.ai.watsonx.ai.chat.options.top-p | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.2) will generate more focused and conservative text. | 1.0 |
| spring.ai.watsonx.ai.chat.options.top-k | Reduces the probability of generating nonsense. A higher value (e.g., 100) will give more diverse answers, while a lower value (e.g., 10) will be more conservative. | 50 |
| spring.ai.watsonx.ai.chat.options.decoding-method | Decoding is the process that a model uses to choose the tokens in the generated output. | greedy |
| spring.ai.watsonx.ai.chat.options.max-new-tokens | Sets the maximum number of tokens the LLM may generate. | 20 |
| spring.ai.watsonx.ai.chat.options.min-new-tokens | Sets the minimum number of tokens the LLM must generate. | 0 |
| spring.ai.watsonx.ai.chat.options.stop-sequences | Sets when the LLM should stop. For example, with ["\n\n\n"] the LLM terminates once it generates three consecutive line breaks. Stop sequences are ignored until the number of tokens specified by min-new-tokens has been generated. | - |
| spring.ai.watsonx.ai.chat.options.repetition-penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.8) will penalize repetitions more strongly, while a lower value (e.g., 1.1) will be more lenient. | 1.0 |
| spring.ai.watsonx.ai.chat.options.random-seed | To produce repeatable results, set the same random seed value every time. | randomly generated |
| spring.ai.watsonx.ai.chat.options.model | The identifier of the LLM to use. | google/flan-ul2 |
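To make the interplay of temperature, top-k and top-p more concrete, here is a toy sketch of one common way these parameters narrow the set of tokens eligible for sampling. It is an illustration only: the class SamplingSketch and its methods are hypothetical, the logits are made-up numbers, and the server-side implementation in watsonx.ai is not described in this document and may differ. These parameters typically matter only when the decoding method is sample rather than greedy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy illustration (not watsonx.ai's implementation) of how temperature,
// top-k and top-p can restrict the candidate tokens before sampling.
class SamplingSketch {

    // Converts raw scores to probabilities; a lower temperature sharpens
    // the distribution, a higher one flattens it.
    static double[] softmax(double[] logits, double temperature) {
        double max = Double.NEGATIVE_INFINITY;
        for (double logit : logits) {
            max = Math.max(max, logit);
        }
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp((logits[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    // Returns the token indices that remain eligible: at most topK tokens,
    // taken in order of decreasing probability, stopping once their
    // cumulative probability reaches topP.
    static List<Integer> candidates(double[] logits, double temperature, int topK, double topP) {
        double[] probs = softmax(logits, temperature);
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> probs[i]).reversed());

        List<Integer> kept = new ArrayList<>();
        double cumulative = 0.0;
        for (int index : order) {
            if (kept.size() >= topK || cumulative >= topP) {
                break;
            }
            kept.add(index);
            cumulative += probs[index];
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.0, -1.0}; // made-up scores for four tokens
        System.out.println(candidates(logits, 1.0, 50, 1.0)); // defaults from the table: nothing filtered
        System.out.println(candidates(logits, 1.0, 2, 1.0));  // top-k keeps the two most likely tokens
        System.out.println(candidates(logits, 0.5, 50, 0.8)); // low temperature plus top-p narrows further
    }
}
```

With the default values from the table (top-k 50, top-p 1.0) the sketch filters nothing for this small vocabulary; lowering any of the three parameters shrinks the candidate set, which matches the "more focused and conservative" behaviour described above.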

Runtime Options

The WatsonxAiChatOptions.java provides model configurations, such as the model to use, the temperature, the repetition penalty, etc.

On start-up, the default options can be configured with the WatsonxAiChatModel(api, options) constructor or the spring.ai.watsonx.ai.chat.options.* properties.

At run-time you can override the default options by adding new, request-specific options to the Prompt call. For example, to override the default temperature for a specific request:

ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        WatsonxAiChatOptions.builder()
            .withTemperature(0.4)
            .build()
    ));

In addition to the model-specific WatsonxAiChatOptions.java, you can use a portable ChatOptions instance, created with ChatOptionsBuilder#builder().

For more information, go to watsonx-parameters-info.
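The override behaviour described in this section can be pictured as a simple overlay: start from the defaults and let any option set on the request win. The sketch below shows that idea with plain maps. It is an assumption about the general pattern, not the actual merge code in Spring AI; OptionMergeSketch and the map keys are hypothetical names chosen for illustration, with default values taken from the configuration table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative overlay of request-specific options on start-up defaults.
// Not Spring AI's implementation; the keys are hypothetical.
class OptionMergeSketch {

    // Copies the defaults, then lets every option present on the request
    // replace the default with the same key.
    static Map<String, Object> merge(Map<String, Object> defaults, Map<String, Object> request) {
        Map<String, Object> merged = new LinkedHashMap<>(defaults);
        merged.putAll(request);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("model", "google/flan-ul2");
        defaults.put("temperature", 0.7);
        defaults.put("max-new-tokens", 20);

        // Only the temperature is overridden for this request.
        Map<String, Object> request = Map.of("temperature", 0.4);

        System.out.println(merge(defaults, request));
    }
}
```

Options the request does not mention (here the model and max-new-tokens) keep their start-up values, which is why a request only needs to name what it wants to change.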

Usage example

import java.util.stream.Collectors;

import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.watsonx.WatsonxAiChatModel;
import org.springframework.ai.watsonx.WatsonxAiChatOptions;
import org.springframework.beans.factory.annotation.Autowired;

public class MyClass {

    private static final String MODEL = "google/flan-ul2";
    private final WatsonxAiChatModel chatModel;

    @Autowired
    MyClass(WatsonxAiChatModel chatModel) {
        this.chatModel = chatModel;
    }

    // Blocking call: sends the user input and returns the generated text.
    public String generate(String userInput) {

        // Uses the same options class and builder as the Runtime Options example.
        WatsonxAiChatOptions options = WatsonxAiChatOptions.builder()
            .withModel(MODEL)
            .withDecodingMethod("sample")
            .withRandomSeed(1)
            .build();

        // The text is sent as a user message, not as a system message.
        Prompt prompt = new Prompt(userInput, options);

        var results = this.chatModel.call(prompt);

        return results.getResult().getOutput().getContent();
    }

    // Streaming call: collects all streamed chunks and joins them into one string.
    public String generateStream(String userInput) {

        WatsonxAiChatOptions options = WatsonxAiChatOptions.builder()
            .withModel(MODEL)
            .withDecodingMethod("greedy")
            .withRandomSeed(2)
            .build();

        Prompt prompt = new Prompt(userInput, options);

        // Wait until the stream has completed.
        var results = this.chatModel.stream(prompt).collectList().block();

        return results.stream()
            .map(response -> response.getResult().getOutput().getContent())
            .collect(Collectors.joining());
    }

}