LLM parameters

This page provides details about LLM parameters in the Koog agentic framework. LLM parameters let you control and customize the behavior of language models.

Overview

LLM parameters are configuration options that let you fine-tune how language models generate responses. These parameters control aspects such as response randomness, length, format, and tool usage. By adjusting these parameters, you can optimize model behavior for different use cases, from creative content generation to deterministic structured output.

In Koog, the LLMParams class encapsulates LLM parameters and provides a consistent interface for configuring language model behavior. You can use LLM parameters in the following ways:

  • When creating a prompt:
val prompt = prompt(
    id = "dev-assistant",
    params = LLMParams(
        temperature = 0.7,
        maxTokens = 500
    )
) {
    // Add a system message to set the context
    system("You are a helpful assistant.")

    // Add a user message
    user("Tell me about Kotlin")
}

For more information about prompt creation, see Prompts.

  • When creating a subgraph:
val processQuery by subgraphWithTask<String, String>(
    tools = listOf(searchTool, calculatorTool, weatherTool),
    llmModel = OpenAIModels.Chat.GPT4o,
    llmParams = LLMParams(
        temperature = 0.7,
        maxTokens = 500
    ),
    runMode = ToolCalls.SEQUENTIAL,
    assistantResponseRepeatMax = 3,
) { userQuery ->
    """
    You are a helpful assistant that can answer questions about various topics.
    Please help with the following query:
    $userQuery
    """
}

For more information about existing subgraph types in Koog, see Predefined subgraphs. To learn how to create and implement your own subgraphs, see Custom subgraphs.

  • When updating a prompt in an LLM write session:
llm.writeSession {
    changeLLMParams(
        LLMParams(
            temperature = 0.7,
            maxTokens = 500
        )
    )
}

For more information about sessions, see LLM sessions and manual history management.

LLM parameter reference

The following table provides a reference of LLM parameters included in the LLMParams class and supported by all LLM providers that are available in Koog out of the box. For a list of parameters that are specific to some providers, see Provider-specific parameters.

Parameter Type Description
temperature Double Controls randomness in the output. Higher values, such as 0.7–1.0, produce more diverse and creative responses, while lower values produce more deterministic and focused responses.
maxTokens Integer Maximum number of tokens to generate in the response. Useful for controlling response length.
numberOfChoices Integer Number of alternative responses to generate. Must be greater than 0.
speculation String A speculative configuration string that influences model behavior. Supported only by certain models, but may greatly improve response speed and accuracy.
schema Schema Defines the structure for the model's response format, enabling structured outputs like JSON. For more information, see Schema.
toolChoice ToolChoice Controls tool calling behavior of the language model. For more information, see Tool choice.
user String Identifier for the user making the request, which can be used for tracking purposes.
additionalProperties Map<String, JsonElement> Additional properties that can be used to store custom parameters specific to certain model providers.
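
As a quick illustration of the reference parameters, the following sketch combines randomness, response length, choice count, and user tracking in a single configuration (the values are illustrative only):

// Illustrative values only
val referenceParams = LLMParams(
    temperature = 0.3,      // focused, mostly deterministic output
    maxTokens = 300,        // cap the response length
    numberOfChoices = 2,    // generate two alternative responses
    user = "user-1234"      // identifier used for tracking purposes
)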

For the default values of each parameter, see the corresponding LLM provider's documentation.

Schema

The Schema interface defines the structure for the model's response format. Koog supports JSON schemas, as described in the sections below.

JSON schemas

JSON schemas let you request structured JSON data from language models. Koog supports the following two types of JSON schemas:

1) Basic JSON Schema (LLMParams.Schema.JSON.Basic): Used for basic JSON processing capabilities. This format focuses primarily on nested data definitions and omits advanced JSON Schema features.

// Create parameters with a basic JSON schema
val jsonParams = LLMParams(
    temperature = 0.2,
    schema = LLMParams.Schema.JSON.Basic(
        name = "PersonInfo",
        schema = JsonObject(mapOf(
            "type" to JsonPrimitive("object"),
            "properties" to JsonObject(
                mapOf(
                    "name" to JsonObject(mapOf("type" to JsonPrimitive("string"))),
                    "age" to JsonObject(mapOf("type" to JsonPrimitive("number"))),
                    "skills" to JsonObject(
                        mapOf(
                            "type" to JsonPrimitive("array"),
                            "items" to JsonObject(mapOf("type" to JsonPrimitive("string")))
                        )
                    )
                )
            ),
            "additionalProperties" to JsonPrimitive(false),
            "required" to JsonArray(listOf(JsonPrimitive("name"), JsonPrimitive("age"), JsonPrimitive("skills")))
        ))
    )
)

2) Standard JSON Schema (LLMParams.Schema.JSON.Standard): Represents a standard JSON schema as defined at json-schema.org. This format is a proper subset of the official JSON Schema specification. Note that the supported flavor may vary across LLM providers, since not all of them support full JSON schemas.

// Create parameters with a standard JSON schema
val standardJsonParams = LLMParams(
    temperature = 0.2,
    schema = LLMParams.Schema.JSON.Standard(
        name = "ProductCatalog",
        schema = JsonObject(mapOf(
            "type" to JsonPrimitive("object"),
            "properties" to JsonObject(mapOf(
                "products" to JsonObject(mapOf(
                    "type" to JsonPrimitive("array"),
                    "items" to JsonObject(mapOf(
                        "type" to JsonPrimitive("object"),
                        "properties" to JsonObject(mapOf(
                            "id" to JsonObject(mapOf("type" to JsonPrimitive("string"))),
                            "name" to JsonObject(mapOf("type" to JsonPrimitive("string"))),
                            "price" to JsonObject(mapOf("type" to JsonPrimitive("number"))),
                            "description" to JsonObject(mapOf("type" to JsonPrimitive("string")))
                        )),
                        "additionalProperties" to JsonPrimitive(false),
                        "required" to JsonArray(listOf(JsonPrimitive("id"), JsonPrimitive("name"), JsonPrimitive("price"), JsonPrimitive("description")))
                    ))
                ))
            )),
            "additionalProperties" to JsonPrimitive(false),
            "required" to JsonArray(listOf(JsonPrimitive("products")))
        ))
    )
)

Tool choice

The ToolChoice class controls how the language model uses tools. It provides the following options:

  • LLMParams.ToolChoice.Named: the language model calls the specified tool. Takes a name string argument, which is the name of the tool to call.
  • LLMParams.ToolChoice.All: the language model calls all tools.
  • LLMParams.ToolChoice.None: the language model does not call tools and only generates text.
  • LLMParams.ToolChoice.Auto: the language model automatically decides whether to call tools and which tool to call.
  • LLMParams.ToolChoice.Required: the language model calls at least one tool.

Here is an example of using the LLMParams.ToolChoice.Named class to call a specific tool:

val specificToolParams = LLMParams(
    toolChoice = LLMParams.ToolChoice.Named(name = "calculator")
)
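
Similarly, you can let the model decide on its own or require it to call at least one tool. A minimal sketch using the options listed above:

// The model decides whether to call tools and which tool to call
val autoToolParams = LLMParams(
    toolChoice = LLMParams.ToolChoice.Auto
)

// The model must call at least one tool
val requiredToolParams = LLMParams(
    toolChoice = LLMParams.ToolChoice.Required
)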

Provider-specific parameters

Koog supports provider-specific parameters for some LLM providers. These parameters extend the base LLMParams class and add provider-specific functionality. The following classes provide parameters that are specific to a given provider:

  • OpenAIChatParams: Parameters specific to the OpenAI Chat Completions API.
  • OpenAIResponsesParams: Parameters specific to the OpenAI Responses API.
  • GoogleParams: Parameters specific to Google models.
  • AnthropicParams: Parameters specific to Anthropic models.
  • MistralAIParams: Parameters specific to Mistral models.
  • DeepSeekParams: Parameters specific to DeepSeek models.
  • OpenRouterParams: Parameters specific to OpenRouter models.
  • DashscopeParams: Parameters specific to Alibaba models.

Here is the complete reference of provider-specific parameters in Koog:

OpenAIChatParams

Parameter Type Description
audio OpenAIAudioConfig Audio output configuration when using audio-capable models. For more information, see the API documentation for OpenAIAudioConfig.
frequencyPenalty Double Penalizes frequent tokens to reduce repetition. Higher frequencyPenalty values result in larger variations of phrasing and reduced repetition. Takes a value in the range of -2.0 to 2.0.
logprobs Boolean If true, includes log-probabilities for output tokens.
parallelToolCalls Boolean If true, multiple tool calls can run in parallel. Particularly applicable to custom nodes or LLM interactions outside of agent strategies.
presencePenalty Double Discourages the model from reusing tokens that have already appeared in the output. Higher values encourage the introduction of new tokens and topics. Takes a value in the range of -2.0 to 2.0.
promptCacheKey String Stable cache key for prompt caching. OpenAI uses it to cache responses for similar requests.
reasoningEffort ReasoningEffort Specifies the level of reasoning effort that the model will use. For more information and available values, see the API documentation for ReasoningEffort.
safetyIdentifier String A stable and unique user identifier that may be used to detect users who violate OpenAI policies.
serviceTier ServiceTier OpenAI processing tier selection that lets you prioritize performance over cost or vice versa. For more information, see the API documentation for ServiceTier.
stop List<String> Strings that signal to the model that it should stop generating content when it encounters any of them. For example, to make the model stop generating content when it produces two newlines, specify the stop sequence as stop = listOf("\n\n").
store Boolean If true, the provider may store outputs for later retrieval.
topLogprobs Integer Number of top most likely tokens per position. Takes a value in the range of 0–20. Requires the logprobs parameter to be set to true.
topP Double Also referred to as nucleus sampling. Creates a subset of next tokens by adding tokens with the highest probability values to the subset until the sum of their probabilities reaches the specified topP value. Takes a value greater than 0.0 and lower than or equal to 1.0.
webSearchOptions OpenAIWebSearchOptions Configure web search tool usage (if supported). For more information, see the API documentation for OpenAIWebSearchOptions.
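
For example, a sampling-focused configuration for the Chat Completions API might look like the following sketch (the parameter names follow the table above; the values are illustrative only):

val openAIChatSamplingParams = OpenAIChatParams(
    temperature = 0.6,
    frequencyPenalty = 0.4,   // discourage repeated phrasing
    presencePenalty = 0.2,    // nudge the model toward new topics
    topP = 0.9,               // nucleus sampling
    stop = listOf("\n\n")     // stop after a blank line
)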

OpenAIResponsesParams

Parameter Type Description
background Boolean Run the response in the background.
include List<OpenAIInclude> Additional data to include in the model's response, such as sources of web search tool call or search results of a file search tool call. For detailed reference information, see OpenAIInclude in the Koog API reference. To learn more about the include parameter, see OpenAI's documentation.
logprobs Boolean If true, includes log-probabilities for output tokens.
maxToolCalls Integer Maximum total number of built-in tool calls allowed in this response. Takes a value equal to or greater than 0.
parallelToolCalls Boolean If true, multiple tool calls can run in parallel. Particularly applicable to custom nodes or LLM interactions outside of agent strategies.
promptCacheKey String Stable cache key for prompt caching. OpenAI uses it to cache responses for similar requests.
reasoning ReasoningConfig Reasoning configuration for reasoning-capable models. For more information, see the API documentation for ReasoningConfig.
safetyIdentifier String A stable and unique user identifier that may be used to detect users who violate OpenAI policies.
serviceTier ServiceTier OpenAI processing tier selection that lets you prioritize performance over cost or vice versa. For more information, see the API documentation for ServiceTier.
store Boolean If true, the provider may store outputs for later retrieval.
topLogprobs Integer Number of top most likely tokens per position. Takes a value in the range of 0–20. Requires the logprobs parameter to be set to true.
topP Double Also referred to as nucleus sampling. Creates a subset of next tokens by adding tokens with the highest probability values to the subset until the sum of their probabilities reaches the specified topP value. Takes a value greater than 0.0 and lower than or equal to 1.0.
truncation Truncation Truncation strategy when nearing the context window. For more information, see the API documentation for Truncation.
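
For example, the following sketch runs a response in the background and limits built-in tool calls (a sketch, assuming the parameter names above):

val backgroundResponseParams = OpenAIResponsesParams(
    background = true,          // run the response in the background
    maxToolCalls = 5,           // limit the number of built-in tool calls
    parallelToolCalls = true,   // allow tool calls to run in parallel
    store = false               // do not store outputs for later retrieval
)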

GoogleParams

Parameter Type Description
thinkingConfig GoogleThinkingConfig Controls whether the model should expose its chain-of-thought and how many tokens it may spend on it. For more information, see the API reference for GoogleThinkingConfig.
topK Integer Number of top tokens to consider when generating the output. Takes a value greater than or equal to 0 (provider-specific minimums may apply).
topP Double Also referred to as nucleus sampling. Creates a subset of next tokens by adding tokens with the highest probability values to the subset until the sum of their probabilities reaches the specified topP value. Takes a value greater than 0.0 and lower than or equal to 1.0.
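
A minimal sampling sketch for Google models, assuming the parameter names above:

val googleSamplingParams = GoogleParams(
    temperature = 0.8,
    topK = 40,    // consider the 40 most likely tokens
    topP = 0.95   // nucleus sampling
)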

AnthropicParams

Parameter Type Description
container String Container identifier for reuse across requests. Containers are used by Anthropic's code execution tool to provide a secure and containerized code execution environment. By providing the container identifier from a previous response, you can reuse containers across multiple requests, which preserves created files between requests. For more information, see Containers in Anthropic's documentation.
mcpServers List<AnthropicMCPServerURLDefinition> Definitions of MCP servers to be used in the request. Supports at most 20 servers. For more information, see the API reference for AnthropicMCPServerURLDefinition.
serviceTier ServiceTier Processing tier selection that lets you prioritize performance over cost or vice versa. For more information, see the API documentation for ServiceTier.
stopSequences List<String> Custom text sequences that cause the model to stop generating content. If matched, the value of stop_reason in the response is stop_sequence.
thinking AnthropicThinking Configuration for activating Claude's extended thinking. When activated, responses also include thinking content blocks. For more information, see the API reference for AnthropicThinking.
topK Integer Number of top tokens to consider when generating the output. Takes a value greater than or equal to 0 (provider-specific minimums may apply).
topP Double Also referred to as nucleus sampling. Creates a subset of next tokens by adding tokens with the highest probability values to the subset until the sum of their probabilities reaches the specified topP value. Takes a value greater than 0.0 and lower than or equal to 1.0.
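
For example, the following sketch combines sampling controls with a custom stop sequence for Anthropic models (illustrative values only):

val anthropicSamplingParams = AnthropicParams(
    temperature = 0.7,
    topK = 50,                                // consider the 50 most likely tokens
    topP = 0.9,                               // nucleus sampling
    stopSequences = listOf("END_OF_ANSWER")   // illustrative custom stop sequence
)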

MistralAIParams

Parameter Type Description
frequencyPenalty Double Penalizes frequent tokens to reduce repetition. Higher frequencyPenalty values result in larger variations of phrasing and reduced repetition. Takes a value in the range of -2.0 to 2.0.
parallelToolCalls Boolean If true, multiple tool calls can run in parallel. Particularly applicable to custom nodes or LLM interactions outside of agent strategies.
presencePenalty Double Discourages the model from reusing tokens that have already appeared in the output. Higher values encourage the introduction of new tokens and topics. Takes a value in the range of -2.0 to 2.0.
promptMode String Lets you toggle between the reasoning mode and no system prompt. When set to reasoning, the default system prompt for reasoning models is used. For more information, see Mistral's Reasoning documentation.
randomSeed Integer The seed to use for random sampling. If set, different calls with the same parameters and the same seed value will generate deterministic results.
safePrompt Boolean Specifies whether to inject a safety prompt before all conversations. The safety prompt is used to enforce guardrails and protect against harmful content. For more information, see Mistral's Moderation & Guardrailing documentation.
stop List<String> Strings that signal to the model that it should stop generating content when it encounters any of them. For example, to make the model stop generating content when it produces two newlines, specify the stop sequence as stop = listOf("\n\n").
topP Double Also referred to as nucleus sampling. Creates a subset of next tokens by adding tokens with the highest probability values to the subset until the sum of their probabilities reaches the specified topP value. Takes a value greater than 0.0 and lower than or equal to 1.0.
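
For example, to get reproducible results with the safety prompt enabled, you could use a sketch like the following (based on the parameters above; values are illustrative):

val mistralDeterministicParams = MistralAIParams(
    temperature = 0.3,
    randomSeed = 42,    // same parameters + same seed produce deterministic results
    safePrompt = true,  // inject the safety prompt before the conversation
    topP = 0.9
)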

DeepSeekParams

Parameter Type Description
frequencyPenalty Double Penalizes frequent tokens to reduce repetition. Higher frequencyPenalty values result in larger variations of phrasing and reduced repetition. Takes a value in the range of -2.0 to 2.0.
logprobs Boolean If true, includes log-probabilities for output tokens.
presencePenalty Double Discourages the model from reusing tokens that have already appeared in the output. Higher values encourage the introduction of new tokens and topics. Takes a value in the range of -2.0 to 2.0.
stop List<String> Strings that signal to the model that it should stop generating content when it encounters any of them. For example, to make the model stop generating content when it produces two newlines, specify the stop sequence as stop = listOf("\n\n").
topLogprobs Integer Number of top most likely tokens per position. Takes a value in the range of 0–20. Requires the logprobs parameter to be set to true.
topP Double Also referred to as nucleus sampling. Creates a subset of next tokens by adding tokens with the highest probability values to the subset until the sum of their probabilities reaches the specified topP value. Takes a value greater than 0.0 and lower than or equal to 1.0.

OpenRouterParams

Parameter Type Description
frequencyPenalty Double Penalizes frequent tokens to reduce repetition. Higher frequencyPenalty values result in larger variations of phrasing and reduced repetition. Takes a value in the range of -2.0 to 2.0.
logprobs Boolean If true, includes log-probabilities for output tokens.
minP Double Filters out tokens whose probability relative to the most likely token is below the defined minP value. Takes a value in the range of 0.0–1.0.
models List<String> List of allowed models for the request.
presencePenalty Double Discourages the model from reusing tokens that have already appeared in the output. Higher values encourage the introduction of new tokens and topics. Takes a value in the range of -2.0 to 2.0.
provider ProviderPreferences Includes a range of parameters that let you explicitly control how OpenRouter chooses which LLM provider to use. For more information, see the API documentation on ProviderPreferences.
repetitionPenalty Double Penalizes token repetition. Next-token probabilities for tokens that already appeared in the output are divided by the value of repetitionPenalty, which makes them less likely to appear again if repetitionPenalty > 1. Takes a value greater than 0.0 and lower than or equal to 2.0.
route String Request routing strategy to use.
stop List<String> Strings that signal to the model that it should stop generating content when it encounters any of them. For example, to make the model stop generating content when it produces two newlines, specify the stop sequence as stop = listOf("\n\n").
topA Double Dynamically adjusts the sampling window based on model confidence. If the model is confident (there are dominant high-probability next tokens), it keeps the sampling window limited to a few top tokens. If the confidence is low (there are many tokens with similar probabilities), it keeps more tokens in the sampling window. Takes a value in the range of 0.0–1.0 (inclusive). A higher value means greater dynamic adaptation.
topK Integer Number of top tokens to consider when generating the output. Takes a value greater than or equal to 0 (provider-specific minimums may apply).
topLogprobs Integer Number of top most likely tokens per position. Takes a value in the range of 0–20. Requires the logprobs parameter to be set to true.
topP Double Also referred to as nucleus sampling. Creates a subset of next tokens by adding tokens with the highest probability values to the subset until the sum of their probabilities reaches the specified topP value. Takes a value greater than 0.0 and lower than or equal to 1.0.
transforms List<String> List of context transforms. Defines how context is transformed when it exceeds the model's token limit. The default transformation is middle-out, which truncates from the middle of the prompt. Use an empty list for no transformations. For more information, see Message Transforms in the OpenRouter documentation.

DashscopeParams

Parameter Type Description
enableSearch Boolean Specifies whether to enable web search functionality. For more information, see Alibaba's Web search documentation.
enableThinking Boolean Specifies whether to enable thinking mode when using a hybrid thinking model. For more information, see Alibaba's documentation on Deep thinking.
frequencyPenalty Double Penalizes frequent tokens to reduce repetition. Higher frequencyPenalty values result in larger variations of phrasing and reduced repetition. Takes a value in the range of -2.0 to 2.0.
logprobs Boolean If true, includes log-probabilities for output tokens.
parallelToolCalls Boolean If true, multiple tool calls can run in parallel. Particularly applicable to custom nodes or LLM interactions outside of agent strategies.
presencePenalty Double Discourages the model from reusing tokens that have already appeared in the output. Higher values encourage the introduction of new tokens and topics. Takes a value in the range of -2.0 to 2.0.
stop List<String> Strings that signal to the model that it should stop generating content when it encounters any of them. For example, to make the model stop generating content when it produces two newlines, specify the stop sequence as stop = listOf("\n\n").
topLogprobs Integer Number of top most likely tokens per position. Takes a value in the range of 0–20. Requires the logprobs parameter to be set to true.
topP Double Also referred to as nucleus sampling. Creates a subset of next tokens by adding tokens with the highest probability values to the subset until the sum of their probabilities reaches the specified topP value. Takes a value greater than 0.0 and lower than or equal to 1.0.
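
For example, the following sketch enables web search and thinking mode for a hybrid thinking model (parameter names follow the table above; values are illustrative):

val dashscopeParams = DashscopeParams(
    temperature = 0.7,
    enableSearch = true,    // let the model use web search
    enableThinking = true   // enable thinking mode for hybrid thinking models
)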

The following example defines OpenRouter-specific LLM parameters using the OpenRouterParams class:

val openRouterParams = OpenRouterParams(
    temperature = 0.7,
    maxTokens = 500,
    frequencyPenalty = 0.5,
    presencePenalty = 0.5,
    topP = 0.9,
    topK = 40,
    repetitionPenalty = 1.1,
    models = listOf("anthropic/claude-3-opus", "anthropic/claude-3-sonnet"),
    transforms = listOf("middle-out")
)

Usage examples

Basic usage

// A basic set of parameters with a limited response length
val basicParams = LLMParams(
    temperature = 0.7,
    maxTokens = 150,
    toolChoice = LLMParams.ToolChoice.Auto
)

Reasoning control

You implement reasoning control through provider-specific parameters. When using the OpenAI Chat Completions API with models that support reasoning, use the reasoningEffort parameter to control how many reasoning tokens the model generates before producing a response:

val openAIReasoningEffortParams = OpenAIChatParams(
    reasoningEffort = ReasoningEffort.MEDIUM
)

In addition, when using the OpenAI Responses API in stateless mode, you can keep an encrypted history of reasoning items and send it to the model on every conversation turn. The encryption is done on the OpenAI side; to request encrypted reasoning tokens, set the include parameter in your requests to reasoning.encrypted_content. You can then pass the encrypted reasoning tokens back to the model in subsequent conversation turns.

val openAIStatelessReasoningParams = OpenAIResponsesParams(
    include = listOf(OpenAIInclude.REASONING_ENCRYPTED_CONTENT)
)

Custom parameters

To add custom parameters that are provider-specific and not supported in Koog out of the box, use the additionalProperties property, as shown in the example below.

// Add custom parameters for specific model providers
val customParams = LLMParams(
    additionalProperties = additionalPropertiesOf(
        "top_p" to 0.95,
        "frequency_penalty" to 0.5,
        "presence_penalty" to 0.5
    )
)

Setting and overriding parameters

The code sample below shows how to define a set of LLM parameters to use as defaults and then create another set that partially overrides values from the original set and adds new ones. This lets you define parameters that are common to most requests while adding more specific parameter combinations without repeating the common parameters.

// Define default parameters
val defaultParams = LLMParams(
    temperature = 0.7,
    maxTokens = 150,
    toolChoice = LLMParams.ToolChoice.Auto
)

// Create parameters with some overrides, using defaults for the rest
val overrideParams = LLMParams(
    temperature = 0.2,
    numberOfChoices = 3
).default(defaultParams)

The values in the resulting overrideParams set are equivalent to the following:

val overrideParams = LLMParams(
    temperature = 0.2,
    maxTokens = 150,
    toolChoice = LLMParams.ToolChoice.Auto,
    numberOfChoices = 3
)