LLM clients

LLM clients are designed for direct interaction with LLM providers. Each client implements the LLMClient interface, which provides methods for executing prompts and streaming responses.

You can use an LLM client when you work with a single LLM provider and don't need advanced lifecycle management. If you need to manage multiple LLM providers, use a prompt executor.

The table below shows the available LLM clients and their capabilities. The * symbol indicates additional notes available in the Notes column.

| LLM provider | LLMClient | Tool calling | Streaming | Multiple choices | Embeddings | Moderation | Model listing | Notes |
|---|---|---|---|---|---|---|---|---|
| OpenAI | OpenAILLMClient | ✓ | ✓ | ✓ | ✓ | ✓* | ✓ | Supports moderation via the OpenAI Moderation API. |
| Anthropic | AnthropicLLMClient | ✓ | ✓ | ✓ | - | - | ✓ | |
| Google | GoogleLLMClient | ✓ | ✓ | - | ✓ | - | - | |
| DeepSeek | DeepSeekLLMClient | ✓ | ✓ | ✓ | - | - | ✓ | OpenAI-compatible chat client. |
| OpenRouter | OpenRouterLLMClient | ✓ | ✓ | ✓ | - | - | ✓ | OpenAI-compatible router client. |
| Amazon Bedrock | BedrockLLMClient | ✓ | ✓ | - | ✓ | ✓* | - | JVM-only AWS SDK client that supports multiple model families. Moderation requires Guardrails configuration. |
| Mistral | MistralAILLMClient | ✓ | ✓ | ✓ | ✓ | ✓* | ✓ | OpenAI-compatible client that supports moderation via the Mistral v1/moderations endpoint. |
| Alibaba | DashScopeLLMClient | ✓ | ✓ | ✓ | - | - | ✓ | OpenAI-compatible client that exposes provider-specific parameters (enableSearch, parallelToolCalls, enableThinking). |
| Ollama | OllamaClient | ✓ | ✓ | - | ✓ | ✓ | - | Local server client with model management APIs. |

Running a prompt

To run a prompt using an LLM client, perform the following:

  1. Create an LLM client that handles the connection between your application and LLM providers.
  2. Call the execute() method with the prompt and the model as arguments.

Here is an example that uses OpenAILLMClient to run prompts:

fun main() = runBlocking {
    // Create an OpenAI client
    val token = System.getenv("OPENAI_API_KEY")
    val client = OpenAILLMClient(token)

    // Create a prompt
    val prompt = prompt("prompt_name", LLMParams()) {
        // Add a system message to set the context
        system("You are a helpful assistant.")

        // Add a user message
        user("Tell me about Kotlin")

        // You can also add assistant messages for few-shot examples
        assistant("Kotlin is a modern programming language...")

        // Add another user message
        user("What are its key features?")
    }

    // Run the prompt
    val response = client.execute(prompt, OpenAIModels.Chat.GPT4o)
    // Print the response
    println(response)
}

Streaming responses

Note

Available for all LLM clients.

When you need to process responses as they are generated, you can use the executeStreaming() method to stream the model output:

// Set up the OpenAI client with your API key
val token = System.getenv("OPENAI_API_KEY")
val client = OpenAILLMClient(token)

val response = client.executeStreaming(
    prompt = prompt("stream_demo") { user("Stream this response in short chunks.") },
    model = OpenAIModels.Chat.GPT4_1
)

response.collect { event ->
    when (event) {
        is StreamFrame.Append -> print(event.text)
        is StreamFrame.ToolCall -> println("\nTool call: ${event.name}")
        is StreamFrame.End -> println("\n[done] Reason: ${event.finishReason}")
    }
}
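
If you need the complete response text after the stream finishes, you can accumulate the Append frames while collecting. Here is a minimal sketch that relies only on the StreamFrame types shown above:

val builder = StringBuilder()

client.executeStreaming(
    prompt = prompt("stream_demo") { user("Stream this response in short chunks.") },
    model = OpenAIModels.Chat.GPT4_1
).collect { frame ->
    // Keep only the text chunks; tool calls and the end-of-stream frame are ignored here
    if (frame is StreamFrame.Append) builder.append(frame.text)
}

println(builder.toString())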

Multiple choices

Note

Available for all LLM clients except GoogleLLMClient, BedrockLLMClient, and OllamaClient.

You can request multiple alternative responses from the model in a single call by using the executeMultipleChoices() method. The prompt you execute must also set the numberOfChoices parameter in its LLMParams.

fun main() = runBlocking {
    val apiKey = System.getenv("OPENAI_API_KEY")
    val client = OpenAILLMClient(apiKey)

    val choices = client.executeMultipleChoices(
        prompt = prompt("n_best", params = LLMParams(numberOfChoices = 3)) {
            system("You are a creative assistant.")
            user("Give me three different opening lines for a story.")
        },
        model = OpenAIModels.Chat.GPT4o
    )

    choices.forEachIndexed { i, choice ->
        val text = choice.joinToString(" ") { it.content }
        println("Line #${i + 1}: $text")
    }
}

Tip

The numberOfChoices parameter is part of LLMParams, so you set it when building the prompt, as shown in the example above.

Listing available models

Note

Available for all LLM clients except GoogleLLMClient, BedrockLLMClient, and OllamaClient.

To get a list of available model IDs supported by the LLM client, use the models() method:

fun main() = runBlocking {
    val apiKey = System.getenv("OPENAI_API_KEY")
    val client = OpenAILLMClient(apiKey)

    val ids: List<String> = client.models()
    ids.forEach { println(it) }
}

Embeddings

Note

Available for OpenAILLMClient, GoogleLLMClient, BedrockLLMClient, MistralAILLMClient, and OllamaClient.

You can convert text into embedding vectors using the embed() method. Choose an embedding model and pass your text to this method:

fun main() = runBlocking {
    val apiKey = System.getenv("OPENAI_API_KEY")
    val client = OpenAILLMClient(apiKey)

    val embedding = client.embed(
        text = "This is a sample text for embedding",
        model = OpenAIModels.Embeddings.TextEmbedding3Large
    )

    println("Embedding size: ${embedding.size}")
}
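
A typical next step is to compare two embeddings, for example with cosine similarity. The sketch below reuses the embed() call from the example above and assumes it returns the vector as a list of doubles; the cosineSimilarity helper is illustrative and not part of the client API:

// Illustrative helper, not part of the LLM client API
fun cosineSimilarity(a: List<Double>, b: List<Double>): Double {
    val dot = a.zip(b).sumOf { (x, y) -> x * y }
    val normA = kotlin.math.sqrt(a.sumOf { it * it })
    val normB = kotlin.math.sqrt(b.sumOf { it * it })
    return dot / (normA * normB)
}

fun main() = runBlocking {
    val client = OpenAILLMClient(System.getenv("OPENAI_API_KEY"))
    val model = OpenAIModels.Embeddings.TextEmbedding3Large

    // Embed two related texts (assumes embed() returns List<Double>)
    val first = client.embed(text = "Kotlin is a programming language", model = model)
    val second = client.embed(text = "Java is a programming language", model = model)

    println("Cosine similarity: ${cosineSimilarity(first, second)}")
}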

Moderation

Note

Available for OpenAILLMClient, BedrockLLMClient, MistralAILLMClient, and OllamaClient.

You can use the moderate() method with a moderation model to check whether a prompt contains inappropriate content:

fun main() = runBlocking {
    val apiKey = System.getenv("OPENAI_API_KEY")
    val client = OpenAILLMClient(apiKey)

    val result = client.moderate(
        prompt = prompt("moderation") {
            user("This is a test message that may contain offensive content.")
        },
        model = OpenAIModels.Moderation.Omni
    )

    println(result)
}

Integration with prompt executors

Prompt executors wrap LLM clients and provide additional functionality, such as routing, fallbacks, and unified usage across providers. They are recommended for production use, as they offer flexibility when working with multiple providers.
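
For example, a single client can be wrapped in an executor, and several clients can be combined behind one executor that routes requests by provider. The following is a minimal sketch; it assumes the SingleLLMPromptExecutor and MultiLLMPromptExecutor classes and the LLMProvider identifiers from the prompt executor API, and that the executor exposes an execute() method similar to the clients:

fun main() = runBlocking {
    val openAIClient = OpenAILLMClient(System.getenv("OPENAI_API_KEY"))
    val anthropicClient = AnthropicLLMClient(System.getenv("ANTHROPIC_API_KEY"))

    // Wrap a single client in an executor (assumed API)
    val singleExecutor = SingleLLMPromptExecutor(openAIClient)

    // Combine several clients behind one executor that routes by provider (assumed API)
    val multiExecutor = MultiLLMPromptExecutor(
        LLMProvider.OpenAI to openAIClient,
        LLMProvider.Anthropic to anthropicClient
    )

    // The executor is used like a client: pass a prompt and a model
    val response = multiExecutor.execute(
        prompt = prompt("executor_demo") { user("Hello!") },
        model = OpenAIModels.Chat.GPT4o
    )
    println(response)
}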