Prompt API

The Prompt API provides a comprehensive toolkit for interacting with Large Language Models (LLMs) in production applications. It offers:

  • Kotlin DSL for creating structured prompts with type safety.
  • Multi-provider support for OpenAI, Anthropic, Google, and other LLM providers.
  • Production features such as retry logic, error handling, and timeout configuration.
  • Multimodal capabilities for working with text, images, audio, and documents.

Architecture overview

The Prompt API consists of three main layers:

  • LLM clients: Low-level interfaces to specific providers (OpenAI, Anthropic, etc.).
  • Decorators: Optional wrappers that add functionality like retry logic.
  • Prompt executors: High-level abstractions that manage client lifecycle and simplify usage.
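
The sections below describe each layer in detail. As a minimal sketch of how the layers compose (assuming an OpenAI API key is available in the environment), a client can be wrapped in a retry decorator and then handed to an executor:

// Layer 1: a low-level client for a specific provider
val client = OpenAILLMClient(System.getenv("OPENAI_API_KEY"))

// Layer 2: an optional decorator that adds retry logic on transient errors
val resilientClient = RetryingLLMClient(client, RetryConfig.PRODUCTION)

// Layer 3: a high-level executor that manages the client and simplifies usage
val executor = SingleLLMPromptExecutor(resilientClient)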

Creating a prompt

The Prompt API uses a Kotlin DSL to create prompts. It supports the following types of messages:

  • system: Sets the context and instructions for the LLM.
  • user: Represents user input.
  • assistant: Represents LLM responses.

Here is an example of a simple prompt:

val prompt = prompt("prompt_name", LLMParams()) {
    // Add a system message to set the context
    system("You are a helpful assistant.")

    // Add a user message
    user("Tell me about Kotlin")

    // You can also add assistant messages for few-shot examples
    assistant("Kotlin is a modern programming language...")

    // Add another user message
    user("What are its key features?")
}

Multimodal inputs

In addition to text messages, Koog lets you send images, audio, video, and files to LLMs along with user messages. As with standard text-only prompts, you add media using the same DSL structure for prompt construction.

val prompt = prompt("multimodal_input") {
    system("You are a helpful assistant.")

    user {
        +"Describe these images"

        attachments {
            image("https://example.com/test.png")
            image(Path("/User/koog/image.png"))
        }
    }
}

Textual prompt content

To support various attachment types and keep a clear distinction between text and file inputs in a prompt, put text messages in a dedicated content parameter within a user message. To add file inputs, provide them as a list in the attachments parameter.

The general format of a user message that includes a text message and a list of attachments is as follows:

user(
    content = "This is the user message",
    attachments = listOf(
        // Add attachments
    )
)

File attachments

To include an attachment, provide the file in the attachments parameter, following the format below:

user(
    content = "Describe this image",
    attachments = listOf(
        Attachment.Image(
            content = AttachmentContent.URL("https://example.com/capture.png"),
            format = "png",
            mimeType = "image/png",
            fileName = "capture.png"
        )
    )
)

The attachments parameter takes a list of file inputs, where each item is an instance of one of the following classes:

  • Attachment.Image: image attachments, such as jpg or png files.
  • Attachment.Audio: audio attachments, such as mp3 or wav files.
  • Attachment.Video: video attachments, such as mpg or avi files.
  • Attachment.File: file attachments, such as pdf or txt files.

Each of the classes above takes the following parameters:

  • content (AttachmentContent, required): The source of the provided file content. For more information, see AttachmentContent.
  • format (String, required): The format of the provided file. For example, png.
  • mimeType (String, required only for Attachment.File): The MIME type of the provided file. For example, image/png.
  • fileName (String, optional): The name of the provided file, including the extension. For example, screenshot.png.

For more details, see API reference.

AttachmentContent

AttachmentContent defines the type and source of content that is provided as an input to the LLM. The following classes are supported:

  • AttachmentContent.URL(val url: String)

Provides file content from the specified URL. Takes the following parameter:

  • url (String, required): The URL of the provided content.

See also API reference.

  • AttachmentContent.Binary.Bytes(val data: ByteArray)

Provides file content as a byte array. Takes the following parameter:

  • data (ByteArray, required): The file content provided as a byte array.

See also API reference.

  • AttachmentContent.Binary.Base64(val base64: String)

Provides file content encoded as a Base64 string. Takes the following parameter:

  • base64 (String, required): The Base64 string containing the file data.

See also API reference.

  • AttachmentContent.PlainText(val text: String)

Tip

Applies only if the attachment type is Attachment.File.

Provides content from a plain text file (such as the text/plain MIME type). Takes the following parameter:

  • text (String, required): The content of the file.

See also API reference.
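
For example, binary content lets you attach a local file directly. The following sketch reads a PDF from a hypothetical path and attaches it as an Attachment.File with AttachmentContent.Binary.Bytes, using the parameters described above:

user(
    content = "Summarize this document",
    attachments = listOf(
        Attachment.File(
            // Hypothetical local path; the file is read into a byte array
            content = AttachmentContent.Binary.Bytes(Path("/User/koog/report.pdf").readBytes()),
            format = "pdf",
            mimeType = "application/pdf",
            fileName = "report.pdf"
        )
    )
)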

Mixed attachment content

In addition to providing different types of attachments in separate prompts or messages, you can also combine multiple types of attachments in a single user message:

val prompt = prompt("mixed_content") {
    system("You are a helpful assistant.")

    user {
        +"Compare the image with the document content."

        attachments {
            image(Path("/User/koog/page.png"))
            binaryFile(Path("/User/koog/page.pdf"), "application/pdf")
        }
    }
}

Choosing between LLM clients and prompt executors

When working with the Prompt API, you can run prompts by using either LLM clients or prompt executors. To choose between clients and executors, consider the following factors:

  • Use LLM clients directly if you work with a single LLM provider and do not require advanced lifecycle management. To learn more, see Running prompts with LLM clients.
  • Use prompt executors if you need a higher level of abstraction for managing LLMs and their lifecycle, or if you want to run prompts with a consistent API across multiple providers and dynamically switch between them. To learn more, see Running prompts with prompt executors.

Note

Both the LLM clients and prompt executors let you stream responses, generate multiple choices, and run content moderation. For more information, refer to the API Reference for the specific client or executor.

Running prompts with LLM clients

You can use LLM clients to run prompts if you work with a single LLM provider and do not require advanced lifecycle management. Koog provides dedicated LLM clients for the supported providers, such as OpenAILLMClient, AnthropicLLMClient, GoogleLLMClient, OllamaClient, and BedrockLLMClient.
To run a prompt using an LLM client, perform the following:

1) Create an LLM client that handles the connection between your application and the LLM provider. For example:

// Create an OpenAI client
val client = OpenAILLMClient(apiKey)

2) Call the execute method with the prompt and model as arguments.

// Execute the prompt
val response = client.execute(
    prompt = prompt,
    model = OpenAIModels.Chat.GPT4o  // You can choose different models
)

Here is an example that uses the OpenAI client to run a prompt:

fun main() {
    runBlocking {
        // Set up the OpenAI client with your API key
        val token = System.getenv("OPENAI_API_KEY")
        val client = OpenAILLMClient(token)

        // Create a prompt
        val prompt = prompt("prompt_name", LLMParams()) {
            // Add a system message to set the context
            system("You are a helpful assistant.")

            // Add a user message
            user("Tell me about Kotlin")

            // You can also add assistant messages for few-shot examples
            assistant("Kotlin is a modern programming language...")

            // Add another user message
            user("What are its key features?")
        }

        // Execute the prompt and get the response
        val response = client.execute(prompt = prompt, model = OpenAIModels.Chat.GPT4o)
        println(response)
    }
}

Note

The LLM clients let you stream responses, generate multiple choices, and run content moderation. For more information, refer to the API Reference for the specific client. To learn more about content moderation, see Content moderation.

Running prompts with prompt executors

While LLM clients provide direct access to providers, prompt executors offer a higher-level abstraction that simplifies common use cases and handles client lifecycle management. They are ideal when you need to:

  • Quickly prototype without managing client configuration.
  • Work with multiple providers through a unified interface.
  • Simplify dependency injection in larger applications.
  • Abstract away provider-specific details.

Executor types

Koog provides two main prompt executors:

  • SingleLLMPromptExecutor: Wraps a single LLM client for one provider. Use this executor if your agent only needs to switch between models within a single LLM provider.
  • MultiLLMPromptExecutor: Routes requests to multiple LLM clients by provider, with optional fallbacks to use when a requested provider is not available. Use this executor if your agent needs to switch between models from different providers.

Both are implementations of the PromptExecutor interface for executing prompts with LLMs.

Creating a single provider executor

To create a prompt executor for a specific LLM provider, perform the following:

1) Configure an LLM client for a specific provider with the corresponding API key:

val openAIClient = OpenAILLMClient(System.getenv("OPENAI_KEY"))

2) Create a prompt executor using SingleLLMPromptExecutor:

val promptExecutor = SingleLLMPromptExecutor(openAIClient)

Creating a multi-provider executor

To create a prompt executor that works with multiple LLM providers, do the following:

1) Configure clients for the required LLM providers with the corresponding API keys:

val openAIClient = OpenAILLMClient(System.getenv("OPENAI_KEY"))
val ollamaClient = OllamaClient()

2) Pass the configured clients to the MultiLLMPromptExecutor class constructor to create a prompt executor with multiple LLM providers:

val multiExecutor = MultiLLMPromptExecutor(
    LLMProvider.OpenAI to openAIClient,
    LLMProvider.Ollama to ollamaClient
)

Pre-defined prompt executors

For faster setup, Koog provides the following ready-to-use executor implementations for common providers:

  • Single provider executors that return SingleLLMPromptExecutor configured with a certain LLM client:

    • simpleOpenAIExecutor for executing prompts with OpenAI models.
    • simpleAzureOpenAIExecutor for executing prompts using Azure OpenAI Service.
    • simpleAnthropicExecutor for executing prompts with Anthropic models.
    • simpleGoogleAIExecutor for executing prompts with Google models.
    • simpleOpenRouterExecutor for executing prompts with OpenRouter.
    • simpleOllamaAIExecutor for executing prompts with Ollama.
  • Multi-provider executor:

    • DefaultMultiLLMPromptExecutor, which is an implementation of MultiLLMPromptExecutor that supports the OpenAI, Anthropic, and Google providers.

Here is an example of creating pre-defined single and multi-provider executors:

// Create an OpenAI executor
val promptExecutor = simpleOpenAIExecutor("OPENAI_KEY")

// Create a DefaultMultiLLMPromptExecutor with OpenAI, Anthropic, and Google LLM clients
val openAIClient = OpenAILLMClient("OPENAI_KEY")
val anthropicClient = AnthropicLLMClient("ANTHROPIC_KEY")
val googleClient = GoogleLLMClient("GOOGLE_KEY")
val multiExecutor = DefaultMultiLLMPromptExecutor(openAIClient, anthropicClient, googleClient)

Executing a prompt

The prompt executors provide methods to run prompts using various capabilities, such as streaming, multiple choices generation, and content moderation.

Here is an example of how to run a prompt with a specific LLM using the execute method:

// Execute a prompt
val response = promptExecutor.execute(
    prompt = prompt,
    model = OpenAIModels.Chat.GPT4o
)

This will run the prompt with the GPT4o model and return the response.

Note

The prompt executors let you stream responses, generate multiple choices, and run content moderation. For more information, refer to the API Reference for the specific executor. To learn more about content moderation, see Content moderation.
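
For illustration, a moderation call through a prompt executor might look like the following sketch. The exact signature and the moderation-capable model constant are assumptions here, so check the API Reference for the specific executor:

// Run content moderation on a prompt (sketch; verify the signature in the API Reference)
val moderationResult = promptExecutor.moderate(
    prompt = prompt,
    model = OpenAIModels.Moderation.Omni  // assumed moderation-capable model
)
println(moderationResult)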

Cached prompt executors

For repeated requests, you can cache LLM responses to optimize performance and reduce costs. Koog provides the CachedPromptExecutor, which is a wrapper around the PromptExecutor that adds caching functionality. It lets you store responses from previously executed prompts and retrieve them when the same prompts are run again.

To create a cached prompt executor, perform the following:

1) Create a prompt executor for which you want to cache responses:

val client = OpenAILLMClient(System.getenv("OPENAI_KEY"))
val promptExecutor = SingleLLMPromptExecutor(client)

2) Create a CachedPromptExecutor instance with the desired cache and provide the created prompt executor:

val cachedExecutor = CachedPromptExecutor(
    cache = FilePromptCache(Path("/cache_directory")),
    nested = promptExecutor
)

3) Run the cached prompt executor with the desired prompt and model:

val response = cachedExecutor.execute(prompt, OpenAIModels.Chat.GPT4o)

Now, when you run the same prompt with the same model multiple times, the response is retrieved from the cache.

Note

  • If you call executeStreaming() with the cached prompt executor, it produces a response as a single chunk.
  • If you call moderate() with the cached prompt executor, it forwards the request to the nested prompt executor and does not use the cache.
  • Caching of multiple choice responses is not supported.

Retry functionality

When working with LLM providers, you may encounter transient errors like rate limits or temporary service unavailability. The RetryingLLMClient decorator adds automatic retry logic to any LLM client.

Basic usage

Wrap any existing client with retry capability:

// Wrap any client with retry capability
val client = OpenAILLMClient(apiKey)
val resilientClient = RetryingLLMClient(client)

// Now all operations will automatically retry on transient errors
val response = resilientClient.execute(prompt, OpenAIModels.Chat.GPT4o)

Configuring retry behavior

Koog provides several predefined retry configurations:

  • DISABLED: 1 attempt (no retry). Intended for development and testing.
  • CONSERVATIVE: up to 3 attempts, 2 s initial delay, 30 s maximum delay. For normal production use.
  • AGGRESSIVE: up to 5 attempts, 500 ms initial delay, 20 s maximum delay. For critical operations.
  • PRODUCTION: up to 3 attempts, 1 s initial delay, 20 s maximum delay. Recommended default.

You can use them directly or create custom configurations:

// Use the predefined configuration
val conservativeClient = RetryingLLMClient(
    delegate = client,
    config = RetryConfig.CONSERVATIVE
)

// Or create a custom configuration
val customClient = RetryingLLMClient(
    delegate = client,
    config = RetryConfig(
        maxAttempts = 5,
        initialDelay = 1.seconds,
        maxDelay = 30.seconds,
        backoffMultiplier = 2.0,
        jitterFactor = 0.2
    )
)
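
Assuming the parameters follow standard exponential backoff semantics, the custom configuration above would wait roughly 1 s, 2 s, 4 s, and 8 s between its five attempts (never exceeding the 30-second cap), with up to about 20% random jitter added to each delay.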

Retryable error patterns

By default, the retry mechanism recognizes common transient errors:

  • HTTP status codes:

    • 429: Rate limit
    • 500: Internal server error
    • 502: Bad gateway
    • 503: Service unavailable
    • 504: Gateway timeout
    • 529: Anthropic overloaded
  • Error keywords:

    • rate limit
    • too many requests
    • request timeout
    • connection timeout
    • read timeout
    • write timeout
    • connection reset by peer
    • connection refused
    • temporarily unavailable
    • service unavailable

You can define custom patterns for your specific needs:

val config = RetryConfig(
    retryablePatterns = listOf(
        RetryablePattern.Status(429),           // Specific status code
        RetryablePattern.Keyword("quota"),      // Keyword in error message
        RetryablePattern.Regex(Regex("ERR_\\d+")), // Custom regex pattern
        RetryablePattern.Custom { error ->      // Custom logic
            error.contains("temporary") && error.length > 20
        }
    )
)

Retry with prompt executors

When working with prompt executors, you can wrap the underlying LLM client with a retry mechanism before creating the executor:

// Single provider executor with retry
val resilientClient = RetryingLLMClient(
    OpenAILLMClient(System.getenv("OPENAI_KEY")),
    RetryConfig.PRODUCTION
)
val executor = SingleLLMPromptExecutor(resilientClient)

// Multi-provider executor with flexible client configuration
val multiExecutor = MultiLLMPromptExecutor(
    LLMProvider.OpenAI to RetryingLLMClient(
        OpenAILLMClient(System.getenv("OPENAI_KEY")),
        RetryConfig.CONSERVATIVE
    ),
    LLMProvider.Anthropic to RetryingLLMClient(
        AnthropicLLMClient(System.getenv("ANTHROPIC_API_KEY")),
        RetryConfig.AGGRESSIVE  
    ),
    // The Bedrock client already has a built-in AWS SDK retry 
    LLMProvider.Bedrock to BedrockLLMClient(
        awsAccessKeyId = System.getenv("AWS_ACCESS_KEY_ID"),
        awsSecretAccessKey = System.getenv("AWS_SECRET_ACCESS_KEY"),
        awsSessionToken = System.getenv("AWS_SESSION_TOKEN"),
    )
)

Streaming with retry

Streaming operations can optionally be retried. This feature is disabled by default.

val config = RetryConfig(
    maxAttempts = 3
)

val client = RetryingLLMClient(baseClient, config)
val stream = client.executeStreaming(prompt, OpenAIModels.Chat.GPT4o)

Note

Streaming retry applies only to connection failures that occur before the first token is received. Once streaming begins, errors are passed through to preserve content integrity.

Timeout configuration

All LLM clients support timeout configuration to prevent hanging requests:

val client = OpenAILLMClient(
    apiKey = apiKey,
    settings = OpenAIClientSettings(
        timeoutConfig = ConnectionTimeoutConfig(
            connectTimeoutMillis = 5000,    // 5 seconds to establish connection
            requestTimeoutMillis = 60000    // 60 seconds for the entire request
        )
    )
)

Error handling

When working with LLMs in production, you need to implement error-handling strategies:

  • Use try-catch blocks to handle unexpected errors.
  • Log errors with context for debugging.
  • Implement fallback strategies for critical operations.
  • Monitor retry patterns to identify recurring or systemic issues.

Here is an example of comprehensive error handling:

try {
    val response = resilientClient.execute(prompt, model)
    processResponse(response)
} catch (e: Exception) {
    logger.error("LLM operation failed", e)

    when {
        e.message?.contains("rate limit") == true -> {
            // Handle rate limiting specifically
            scheduleRetryLater()
        }
        e.message?.contains("invalid api key") == true -> {
            // Handle authentication errors
            notifyAdministrator()
        }
        else -> {
            // Fall back to an alternative solution
            useDefaultResponse()
        }
    }
}