LLM sessions and manual history management
This page provides detailed information about LLM sessions, including how to work with read and write sessions, manage conversation history, and make requests to language models.
Introduction
LLM sessions provide a structured way to interact with language models (LLMs). They manage the conversation history, handle requests to the LLM, and provide a consistent interface for running tools and processing responses.
Understanding LLM sessions
An LLM session represents a context for interacting with a language model. It encapsulates:
- The conversation history (prompt)
- Available tools
- Methods for making requests to the LLM
- Methods for updating the conversation history
- Methods for running tools
Sessions are managed by the AIAgentLLMContext class, which provides methods for creating read and write sessions.
Session types
The Koog framework provides two types of sessions:
- Write sessions (AIAgentLLMWriteSession): allow modifying the prompt and tools, making LLM requests, and running tools. Changes made in a write session are persisted back to the LLM context.
- Read sessions (AIAgentLLMReadSession): provide read-only access to the prompt and tools. They are useful for inspecting the current state without making changes.
The key difference is that write sessions can modify the conversation history, while read sessions cannot.
Session lifecycle
Sessions have a defined lifecycle:
- Creation: a session is created using llm.writeSession { ... } or llm.readSession { ... }.
- Active phase: the session is active while the lambda block is executing.
- Termination: the session is automatically closed when the lambda block completes.
Sessions implement the AutoCloseable interface, ensuring they are properly cleaned up even if an exception occurs.
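The lifecycle above can be illustrated with a small stand-alone sketch. It does not use Koog: ToySession, withToySession, and lifecycleDemo are hypothetical names invented for this example, and the only thing borrowed from this page is the idea of an AutoCloseable session that is closed when its block ends.

```kotlin
// Toy stand-in for a session; this is NOT the Koog implementation.
class ToySession(private val log: MutableList<String>) : AutoCloseable {
    private var closed = false

    fun send(text: String) {
        // Refuse any work once the session has been closed.
        check(!closed) { "Cannot use session after it was closed" }
        log += text
    }

    override fun close() {
        closed = true
    }
}

// Hypothetical helper that mirrors the shape of llm.writeSession { ... }.
fun <T> withToySession(log: MutableList<String>, block: ToySession.() -> T): T {
    val session = ToySession(log)
    try {
        return session.block()
    } finally {
        // Runs on normal completion and when the block throws.
        session.close()
    }
}

fun lifecycleDemo(): String {
    val log = mutableListOf<String>()
    // Returning `this` leaks the session out of its block, which is the mistake to avoid.
    val leaked = withToySession(log) {
        send("inside the block")
        this
    }
    return try {
        leaked.send("after the block")
        "no error"
    } catch (e: IllegalStateException) {
        e.message ?: "error"
    }
}
```

Calling lifecycleDemo() returns the error message raised by the second send, because the session was already closed when its block completed.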
Working with LLM sessions
Creating sessions
Sessions are created using extension functions on the AIAgentLLMContext class:
// Creating a write session
llm.writeSession {
    // Session code here
}

// Creating a read session
llm.readSession {
    // Session code here
}
These functions take a lambda block that runs within the context of the session. The session is automatically closed when the block completes.
Session scope and thread safety
Sessions use a read-write lock to ensure thread safety:
- Multiple read sessions can be active simultaneously.
- Only one write session can be active at a time.
- A write session blocks all other sessions (both read and write).
This ensures that the conversation history is not corrupted by concurrent modifications.
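The locking rules above can be sketched with the JVM's ReentrantReadWriteLock. This is a stand-in chosen for illustration only: this page does not say which lock Koog uses internally, and ToyContext and concurrentAppendDemo are hypothetical names.

```kotlin
import java.util.concurrent.locks.ReentrantReadWriteLock
import kotlin.concurrent.read
import kotlin.concurrent.write

// Toy context guarding a message list with a read-write lock; NOT Koog's implementation.
class ToyContext {
    private val lock = ReentrantReadWriteLock()
    private val messages = mutableListOf<String>()

    // Several threads may hold the read lock at the same time.
    // Readers get a snapshot copy, so they cannot modify the shared list.
    fun <T> readSession(block: (List<String>) -> T): T =
        lock.read { block(messages.toList()) }

    // The write lock is exclusive: it waits until no reader or writer holds the lock.
    fun <T> writeSession(block: (MutableList<String>) -> T): T =
        lock.write { block(messages) }
}

fun concurrentAppendDemo(): Int {
    val ctx = ToyContext()
    // Four threads each append 250 messages through write sessions.
    val writers = List(4) { id ->
        Thread {
            repeat(250) { n -> ctx.writeSession { it.add("writer $id message $n") } }
        }
    }
    writers.forEach { it.start() }
    writers.forEach { it.join() }
    return ctx.readSession { it.size }
}
```

Because every append runs under the exclusive write lock, no update is lost and concurrentAppendDemo() returns 1000. Without the lock, concurrent add calls on a plain mutable list could drop entries or throw.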
Accessing session properties
Within a session, you can access the prompt and tools:
llm.readSession {
    val messageCount = prompt.messages.size
    val availableTools = tools.map { it.name }
}
In a write session, you can also modify these properties:
llm.writeSession {
    // Modify the prompt
    updatePrompt {
        user("New user message")
    }

    // Modify the tools
    tools = newTools
}
Making LLM requests
Basic request methods
The most common methods for making LLM requests are:
- requestLLM(): makes a request to the LLM with the current prompt and tools, returning a single response.
- requestLLMMultiple(): makes a request to the LLM with the current prompt and tools, returning multiple responses.
- requestLLMWithoutTools(): makes a request to the LLM with the current prompt but without any tools, returning a single response.
- requestLLMForceOneTool: makes a request to the LLM with the current prompt and tools, forcing the use of one tool.
- requestLLMOnlyCallingTools: makes a request to the LLM that should be processed by only using tools.
Example:
llm.writeSession {
    // Make a request with tools enabled
    val response = requestLLM()

    // Make a request without tools
    val responseWithoutTools = requestLLMWithoutTools()

    // Make a request that returns multiple responses
    val responses = requestLLMMultiple()
}
How requests work
LLM requests are made when you explicitly call one of the request methods. The key points to understand are:
- Explicit invocation: requests only happen when you call methods like requestLLM(), requestLLMWithoutTools(), and so on.
- Immediate execution: when you call a request method, the request is made immediately, and the method blocks until a response is received.
- Automatic history update: in a write session, the response is automatically added to the conversation history.
- No implicit requests: the system does not make implicit requests; you need to explicitly call a request method.
Request methods with tools
When making requests with tools enabled, the LLM may respond with a tool call instead of a text response. The request methods handle this transparently:
llm.writeSession {
    val response = requestLLM()

    // The response might be a tool call or a text response
    if (response is Message.Tool.Call) {
        // Handle tool call
    } else {
        // Handle text response
    }
}
In practice, you typically do not need to check the response type manually, as the agent graph handles this routing automatically.
Structured and streaming requests
For more advanced use cases, the platform provides methods for structured and streaming requests:
- requestLLMStructured(): requests the LLM to provide a response in a specific structured format.
- requestLLMStructuredOneShot(): similar to requestLLMStructured() but without retries or corrections.
- requestLLMStreaming(): makes a streaming request to the LLM, returning a flow of response chunks.
Example:
llm.writeSession {
    // Make a structured request
    val structuredResponse = requestLLMStructured(myStructure)

    // Make a streaming request
    val responseStream = requestLLMStreaming()
    responseStream.collect { chunk ->
        // Process each chunk as it arrives
    }
}
Managing conversation history
Updating the prompt
In a write session, you can update the prompt (conversation history) using the updatePrompt method:
llm.writeSession {
    updatePrompt {
        // Add a system message
        system("You are a helpful assistant.")

        // Add a user message
        user("Hello, can you help me with a coding question?")

        // Add an assistant message
        assistant("Of course! What's your question?")

        // Add a tool result
        tool {
            result(myToolResult)
        }
    }
}
You can also completely rewrite the prompt using the rewritePrompt method:
llm.writeSession {
    rewritePrompt { oldPrompt ->
        // Create a new prompt based on the old one
        oldPrompt.copy(messages = filteredMessages)
    }
}
Automatic history update on response
When you make an LLM request in a write session, the response is automatically added to the conversation history:
llm.writeSession {
    // Add a user message
    updatePrompt {
        user("What's the capital of France?")
    }

    // Make a request – the response is automatically added to the history
    val response = requestLLM()

    // The prompt now includes both the user message and the model's response
}
This automatic history update is a key feature of write sessions: each later request sees the earlier responses without any extra bookkeeping on your side.
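To make the ordering of messages concrete, here is a toy model of the behaviour described above. It is a sketch only: ToyMessage, ToyWriteSession, and historyDemo are hypothetical names, the "model" is a plain function, and nothing here calls the Koog API.

```kotlin
// Toy message and session types; they only imitate the behaviour described above.
data class ToyMessage(val role: String, val content: String)

class ToyWriteSession(private val model: (List<ToyMessage>) -> String) {
    val history = mutableListOf<ToyMessage>()

    fun user(text: String) {
        history += ToyMessage("user", text)
    }

    // Sends the whole history to the model and records the reply before returning it.
    fun requestLLM(): ToyMessage {
        val reply = ToyMessage("assistant", model(history.toList()))
        history += reply
        return reply
    }
}

fun historyDemo(): List<String> {
    // Fake model: it only reports how many messages it was given.
    val session = ToyWriteSession { messages -> "I saw ${messages.size} message(s)" }

    session.user("What's the capital of France?")
    session.requestLLM() // the model sees 1 message

    session.user("And of Italy?")
    session.requestLLM() // the model sees 3 messages: user, assistant, user

    return session.history.map { "${it.role}: ${it.content}" }
}
```

After the two requests the toy history holds four entries, alternating user and assistant, and the second reply reports three messages because the first reply was already part of the history.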
History compression
For long-running conversations, the history can grow large and consume a lot of tokens. The platform provides methods for compressing history:
llm.writeSession {
    // Compress the history using a TLDR approach
    replaceHistoryWithTLDR(HistoryCompressionStrategy.WholeHistory, preserveMemory = true)
}
You can also use the nodeLLMCompressHistory node in a strategy graph to compress history at specific points.
For more information about history compression and compression strategies, see History compression.
Running tools in sessions
Calling tools
Write sessions provide several methods for calling tools:
- callTool(tool, args): calls a tool by reference.
- callTool(toolName, args): calls a tool by name.
- callTool(toolClass, args): calls a tool by class.
- callToolRaw(toolName, args): calls a tool by name and returns the raw string result.
Example:
llm.writeSession {
    // Call a tool by reference
    val result = callTool(myTool, myArgs)

    // Call a tool by name
    val result2 = callTool("myToolName", myArgs)

    // Call a tool by class
    val result3 = callTool(MyTool::class, myArgs)

    // Call a tool and get the raw result
    val rawResult = callToolRaw("myToolName", myArgs)
}
Parallel tool runs
To run multiple tools in parallel, write sessions provide extension functions on Flow:
llm.writeSession {
    // Run tools in parallel
    parseDataToArgs(data).toParallelToolCalls(MyTool::class).collect { result ->
        // Process each result
    }

    // Run tools in parallel and get raw results
    parseDataToArgs(data).toParallelToolCallsRaw(MyTool::class).collect { rawResult ->
        // Process each raw result
    }
}
This is useful for processing large amounts of data efficiently.
Best practices
When working with LLM sessions, follow these best practices:
- Use the right session type: use write sessions when you need to modify the conversation history and read sessions when you only need to read it.
- Keep sessions short: sessions should be focused on a specific task and closed as soon as possible to release resources.
- Handle exceptions: make sure to handle exceptions within sessions to prevent resource leaks.
- Manage history size: for long-running conversations, use history compression to reduce token usage.
- Prefer high-level abstractions: when possible, use the node-based API, for example nodeLLMRequest, instead of working with sessions directly.
- Be mindful of thread safety: remember that write sessions block other sessions, so keep write operations as short as possible.
- Use structured requests for complex data: when you need the LLM to return structured data, use requestLLMStructured instead of parsing free-form text.
- Use streaming for long responses: for long responses, use requestLLMStreaming to process the response as it arrives.
Troubleshooting
Session already closed
If you see an error such as "Cannot use session after it was closed", you are trying to use a session after its lambda block has completed. Make sure all session operations are performed within the session block.
History too large
If your history becomes too large and consumes too many tokens, use history compression techniques:
llm.writeSession {
    replaceHistoryWithTLDR(HistoryCompressionStrategy.FromLastNMessages(10), preserveMemory = true)
}
For more information, see History compression.
Tool not found
If you see errors about tools not being found, check that:
- The tool is correctly registered in the tool registry.
- You are using the correct tool name or class.
API documentation
For more information, see the full AIAgentLLMSession and AIAgentLLMContext reference.