LLM clients
LLM clients are designed for direct interaction with LLM providers.
Each client implements the LLMClient interface, which provides methods for executing prompts and streaming responses.
You can use an LLM client when you work with a single LLM provider and don't need advanced lifecycle management. If you need to manage multiple LLM providers, use a prompt executor.
The table below shows the available LLM clients and their capabilities.
| LLM provider | LLMClient | Tool calling | Streaming | Multiple choices | Embeddings | Moderation | Model listing | Notes |
|---|---|---|---|---|---|---|---|---|
| OpenAI | OpenAILLMClient | ✓ | ✓ | ✓ | ✓ | ✓1 | ✓ | |
| Anthropic | AnthropicLLMClient | ✓ | ✓ | - | - | - | - | - |
| Google | GoogleLLMClient | ✓ | ✓ | - | ✓ | - | ✓ | |
| DeepSeek | DeepSeekLLMClient | ✓ | ✓ | ✓ | - | - | ✓ | OpenAI-compatible chat client. |
| OpenRouter | OpenRouterLLMClient | ✓ | ✓ | ✓ | - | - | ✓ | OpenAI-compatible router client. |
| Amazon Bedrock | BedrockLLMClient | ✓ | ✓ | - | ✓ | ✓2 | - | JVM-only AWS SDK client that supports multiple model families. |
| Mistral | MistralAILLMClient | ✓ | ✓ | ✓ | ✓ | ✓3 | ✓ | OpenAI-compatible client. |
| Alibaba | DashScopeLLMClient | ✓ | ✓ | ✓ | - | - | ✓ | OpenAI-compatible client that exposes provider-specific parameters (enableSearch, parallelToolCalls, enableThinking). |
| Ollama | OllamaClient | ✓ | ✓ | - | ✓ | ✓ | - | Local server client with model management APIs. |
Running a prompt
To run a prompt using an LLM client, perform the following steps:
- Create an LLM client that handles the connection between your application and the LLM provider.
- Call the execute() method with the prompt and model as arguments.
Here is an example that uses OpenAILLMClient to run prompts:
```kotlin
fun main() = runBlocking {
    // Create an OpenAI client
    val apiKey = System.getenv("OPENAI_API_KEY")
    val client = OpenAILLMClient(apiKey)

    // Create a prompt
    val prompt = prompt("prompt_name", LLMParams()) {
        // Add a system message to set the context
        system("You are a helpful assistant.")

        // Add a user message
        user("Tell me about Kotlin")

        // You can also add assistant messages for few-shot examples
        assistant("Kotlin is a modern programming language...")

        // Add another user message
        user("What are its key features?")
    }

    // Run the prompt
    val response = client.execute(prompt, OpenAIModels.Chat.GPT4o)

    // Print the response
    println(response)
}
```
The same example in Java:

```java
// Create an OpenAI client
String apiKey = System.getenv("OPENAI_API_KEY");
OpenAILLMClient client = new OpenAILLMClient(apiKey);

// Create a prompt
Prompt prompt = Prompt.builder("prompt_name")
    // Add a system message to set the context
    .system("You are a helpful assistant.")
    // Add a user message
    .user("Tell me about Kotlin")
    // You can also add assistant messages for few-shot examples
    .assistant("Kotlin is a modern programming language...")
    // Add another user message
    .user("What are its key features?")
    .build();

// Run the prompt
List<Message.Response> response = client.execute(prompt, OpenAIModels.Chat.GPT4o, Collections.emptyList());

// Print the response
System.out.println(response);
client.close();
```
Streaming responses
Note
Available for all LLM clients.
When you need to process responses as they are generated, you can use the executeStreaming() method in Kotlin or executeStreamingWithPublisher() in Java to stream the model output.
The streaming API provides different frame types:
- Delta frames (TextDelta, ReasoningDelta, ToolCallDelta) — incremental content that arrives in chunks
- Complete frames (TextComplete, ReasoningComplete, ToolCallComplete) — full content after all deltas are received
- End frame (End) — signals stream completion with a finish reason
For models that support reasoning (such as Claude Sonnet 4.5 or OpenAI o1), reasoning frames are emitted during streaming. See the Streaming API documentation for more details on working with frames.
```kotlin
// Set up the OpenAI client with your API key
val token = System.getenv("OPENAI_API_KEY")
val client = OpenAILLMClient(token)

val response = client.executeStreaming(
    prompt = prompt("stream_demo") { user("Stream this response in short chunks.") },
    model = OpenAIModels.Chat.GPT4_1
)

response.collect { frame ->
    when (frame) {
        is StreamFrame.TextDelta -> print(frame.text)
        is StreamFrame.ReasoningDelta -> print("[Reasoning] ${frame.text}")
        is StreamFrame.ToolCallComplete -> println("\nTool call: ${frame.name}")
        is StreamFrame.End -> println("\n[done] Reason: ${frame.finishReason}")
        else -> {} // Handle other frame types if needed
    }
}
```
The same example in Java, using the Reactive Streams Publisher API:

```java
// Set up the OpenAI client with your API key
String token = System.getenv("OPENAI_API_KEY");
OpenAILLMClient client = new OpenAILLMClient(token);

Prompt prompt = Prompt.builder("stream_demo")
    .user("Stream this response in short chunks.")
    .build();

Publisher<StreamFrame> response = client.executeStreamingWithPublisher(prompt, OpenAIModels.Chat.GPT4_1);

// Subscribe to the Publisher to consume frames
response.subscribe(new Subscriber<StreamFrame>() {
    private Subscription subscription;

    @Override
    public void onSubscribe(Subscription s) {
        this.subscription = s;
        s.request(Long.MAX_VALUE);
    }

    @Override
    public void onNext(StreamFrame frame) {
        switch (frame) {
            case StreamFrame.TextDelta delta ->
                System.out.print(delta.getText());
            case StreamFrame.ReasoningDelta reasoning ->
                System.out.print("[Reasoning] " + reasoning.getText());
            case StreamFrame.ToolCallComplete toolCall ->
                System.out.println("\nTool call: " + toolCall.getName());
            case StreamFrame.End end ->
                System.out.println("\n[done] Reason: " + end.getFinishReason());
            default -> {} // Handle other frame types
        }
    }

    @Override
    public void onError(Throwable t) {
        t.printStackTrace();
    }

    @Override
    public void onComplete() { }
});
```
Multiple choices
Note
Available for all LLM clients except GoogleLLMClient, BedrockLLMClient, and OllamaClient.
You can request multiple alternative responses from the model in a single call by using the executeMultipleChoices() method. It also requires the numberOfChoices LLM parameter to be set in the prompt being executed.
```kotlin
fun main() = runBlocking {
    val apiKey = System.getenv("OPENAI_API_KEY")
    val client = OpenAILLMClient(apiKey)

    val choices = client.executeMultipleChoices(
        prompt = prompt("n_best", params = LLMParams(numberOfChoices = 3)) {
            system("You are a creative assistant.")
            user("Give me three different opening lines for a story.")
        },
        model = OpenAIModels.Chat.GPT4o
    )

    choices.forEachIndexed { i, choice ->
        val text = choice.joinToString(" ") { it.content }
        println("Line #${i + 1}: $text")
    }
}
```
The same example in Java:

```java
String apiKey = System.getenv("OPENAI_API_KEY");
OpenAILLMClient client = new OpenAILLMClient(apiKey);

// Configure parameters (LLMParams constructor requires all 8 arguments in Java)
LLMParams params = new LLMParams(
    null, // temperature
    null, // maxTokens
    3,    // numberOfChoices
    null, // speculation
    null, // schema
    null, // toolChoice
    null, // user
    null  // additionalProperties
);

Prompt prompt = Prompt.builder("n_best")
    .system("You are a creative assistant.")
    .user("Give me three different opening lines for a story.")
    .build()
    .withParams(params);

// LLMChoice is a type alias for List<Message.Response>
List<List<Message.Response>> choices = client.executeMultipleChoices(
    prompt,
    OpenAIModels.Chat.GPT4o
);

for (int i = 0; i < choices.size(); i++) {
    List<Message.Response> choice = choices.get(i);
    StringBuilder text = new StringBuilder();
    for (Message.Response msg : choice) {
        text.append(msg.getContent()).append(" ");
    }
    System.out.println("Line #" + (i + 1) + ": " + text.toString().trim());
}
```
Listing available models
Note
Available for all LLM clients except AnthropicLLMClient, BedrockLLMClient, and OllamaClient.
To get a list of available model IDs supported by the LLM client, use the models() method:
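Here is a minimal Kotlin sketch. It assumes models() returns the model identifiers as plain strings; check the client's API reference for the exact return type:

```kotlin
fun main() = runBlocking {
    val client = OpenAILLMClient(System.getenv("OPENAI_API_KEY"))

    // Query the provider for the model IDs this client can use
    val models = client.models()
    models.forEach { println(it) }
}
```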
Embeddings
Note
Available for OpenAILLMClient, GoogleLLMClient, BedrockLLMClient, MistralAILLMClient, and OllamaClient.
You can convert text into embedding vectors by using the embed() method.
Choose an embedding model and pass your text to this method:
```kotlin
fun main() = runBlocking {
    val apiKey = System.getenv("OPENAI_API_KEY")
    val client = OpenAILLMClient(apiKey)

    val embedding = client.embed(
        text = "This is a sample text for embedding",
        model = OpenAIModels.Embeddings.TextEmbedding3Large
    )

    println("Embedding size: ${embedding.size}")
}
```
Moderation
Note
Available for OpenAILLMClient, BedrockLLMClient, MistralAILLMClient, and OllamaClient.
You can use the moderate() method with a moderation model to check whether a prompt contains inappropriate content:
```java
String apiKey = System.getenv("OPENAI_API_KEY");
OpenAILLMClient client = new OpenAILLMClient(apiKey);

Prompt prompt = Prompt.builder("moderation")
    .user("This is a test message that may contain offensive content.")
    .build();

ModerationResult result = client.moderate(prompt, OpenAIModels.Moderation.Omni);
System.out.println(result);
```
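The equivalent call in Kotlin, as a sketch (the exact shape of ModerationResult depends on the provider; here it is simply printed):

```kotlin
fun main() = runBlocking {
    val client = OpenAILLMClient(System.getenv("OPENAI_API_KEY"))

    val prompt = prompt("moderation") {
        user("This is a test message that may contain offensive content.")
    }

    // Check the prompt against OpenAI's moderation model
    val result = client.moderate(prompt, OpenAIModels.Moderation.Omni)
    println(result)
}
```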
Integration with prompt executors
Prompt executors wrap LLM clients and provide additional functionality, such as routing, fallbacks, and unified usage across providers. They are recommended for production use, as they offer flexibility when working with multiple providers.
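As a rough sketch of how clients plug into an executor, assuming a MultiLLMPromptExecutor class that maps providers to clients (verify the exact constructor against the prompt executor documentation):

```kotlin
// Sketch: route prompts to different providers through one executor.
// MultiLLMPromptExecutor and the provider-to-client pairing are assumptions
// to be checked against the prompt executor docs.
val executor = MultiLLMPromptExecutor(
    LLMProvider.OpenAI to OpenAILLMClient(System.getenv("OPENAI_API_KEY")),
    LLMProvider.Anthropic to AnthropicLLMClient(System.getenv("ANTHROPIC_API_KEY")),
)
```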