Hugging Face Chat
Hugging Face Text Generation Inference (TGI) is a specialized deployment solution for serving Large Language Models (LLMs) in the cloud, making them accessible via an API. TGI provides optimized performance for text generation tasks through features like continuous batching, token streaming, and efficient memory management.
Text Generation Inference requires models to be compatible with its architecture-specific optimizations. While many popular LLMs are supported, not all models on Hugging Face Hub can be deployed using TGI. If you need to deploy other types of models, consider using standard Hugging Face Inference Endpoints instead.
For a complete and up-to-date list of supported models and architectures, see the Text Generation Inference supported models documentation.
Prerequisites
You will need to create an Inference Endpoint on Hugging Face and create an API token to access the endpoint. Further details can be found here.

The Spring AI project defines a configuration property named spring.ai.huggingface.chat.api-key that you should set to the value of the API token obtained from Hugging Face. There is also a configuration property named spring.ai.huggingface.chat.url that you should set to the inference endpoint URL obtained when provisioning your model in Hugging Face. You can find this URL on the Inference Endpoint's UI.
Exporting environment variables is one way to set these configuration properties:
export SPRING_AI_HUGGINGFACE_CHAT_API_KEY=<INSERT KEY HERE>
export SPRING_AI_HUGGINGFACE_CHAT_URL=<INSERT INFERENCE ENDPOINT URL HERE>
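These environment variables map onto the corresponding configuration properties through Spring Boot's relaxed binding, so setting the properties directly in application.properties works just as well:

spring.ai.huggingface.chat.api-key=<INSERT KEY HERE>
spring.ai.huggingface.chat.url=<INSERT INFERENCE ENDPOINT URL HERE>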
Add Repositories and BOM
Spring AI artifacts are published in Spring Milestone and Snapshot repositories. Refer to the Repositories section to add these repositories to your build system.
To help with dependency management, Spring AI provides a BOM (bill of materials) to ensure that a consistent version of Spring AI is used throughout the entire project. Refer to the Dependency Management section to add the Spring AI BOM to your build system.
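As a minimal sketch, importing the BOM in a Maven build typically looks like the following; ${spring-ai.version} is a placeholder property that you define (or replace with a concrete release version):

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>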
Auto-configuration
Spring AI provides Spring Boot auto-configuration for the Hugging Face Chat Client. To enable it, add the following dependency to your project's Maven pom.xml file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-huggingface-spring-boot-starter</artifactId>
</dependency>
or to your Gradle build.gradle build file:
dependencies {
    implementation 'org.springframework.ai:spring-ai-huggingface-spring-boot-starter'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
Chat Properties
The prefix spring.ai.huggingface is used to configure the chat model implementation for Hugging Face.
Property                           | Description                                          | Default
spring.ai.huggingface.chat.api-key | API key to authenticate with the Inference Endpoint. | -
spring.ai.huggingface.chat.url     | URL of the Inference Endpoint to connect to.         | -
spring.ai.huggingface.chat.enabled | Enable the Hugging Face chat model.                  | true
Sample Controller (Auto-configuration)
Create a new Spring Boot project and add the spring-ai-huggingface-spring-boot-starter to your pom (or gradle) dependencies.

Add an application.properties file, under the src/main/resources directory, to enable and configure the Hugging Face chat model:
spring.ai.huggingface.chat.api-key=YOUR_API_KEY
spring.ai.huggingface.chat.url=YOUR_INFERENCE_ENDPOINT_URL
Replace the api-key and url with your Hugging Face values.
This will create a HuggingfaceChatModel implementation that you can inject into your class. Here is an example of a simple @RestController class that uses the chat model for text generation.
import java.util.Map;

import org.springframework.ai.huggingface.HuggingfaceChatModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    private final HuggingfaceChatModel chatModel;

    @Autowired
    public ChatController(HuggingfaceChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/ai/generate")
    public Map<String, String> generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", this.chatModel.call(message));
    }
}
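With the application running, you can exercise the endpoint, for example with curl (assuming Spring Boot's default server port of 8080):

curl "http://localhost:8080/ai/generate?message=Tell%20me%20a%20joke"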
Manual Configuration
The HuggingfaceChatModel implements the ChatModel interface and uses the low-level API to connect to the Hugging Face inference endpoints.

Add the spring-ai-huggingface dependency to your project's Maven pom.xml file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-huggingface</artifactId>
</dependency>
or to your Gradle build.gradle build file:
dependencies {
    implementation 'org.springframework.ai:spring-ai-huggingface'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
Next, create a HuggingfaceChatModel and use it for text generation:

HuggingfaceChatModel chatModel = new HuggingfaceChatModel(apiKey, url);

ChatResponse response = chatModel.call(
        new Prompt("Generate the names of 5 famous pirates."));

System.out.println(response.getResult().getOutput().getContent());
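If you configure the model manually but still want to inject it elsewhere in your application, you can expose it as a Spring bean. A minimal sketch, reusing the constructor and the property names documented above (the @Configuration class itself is hypothetical):

import org.springframework.ai.huggingface.HuggingfaceChatModel;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HuggingfaceChatConfig {

    // Reuses the constructor shown above; the property names match
    // the ones listed in the Chat Properties table.
    @Bean
    public HuggingfaceChatModel huggingfaceChatModel(
            @Value("${spring.ai.huggingface.chat.api-key}") String apiKey,
            @Value("${spring.ai.huggingface.chat.url}") String url) {
        return new HuggingfaceChatModel(apiKey, url);
    }
}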