LLM response caching
If you run repeated requests through a prompt executor, you can cache the LLM responses to improve performance and reduce costs.
In Koog, caching is available for all prompt executors through CachedPromptExecutor,
which is a wrapper around PromptExecutor that adds caching functionality.
It lets you store responses from previously executed prompts and retrieve them when the same prompts are run again.
To create a cached prompt executor, perform the following:
- Create a prompt executor for which you want to cache responses.
- Create a CachedPromptExecutor instance by providing the desired cache and the prompt executor you created.
- Run the created CachedPromptExecutor with the desired prompt and model.
Here is an example:
// Create a prompt executor
val client = OpenAILLMClient(System.getenv("OPENAI_KEY"))
val promptExecutor = SingleLLMPromptExecutor(client)
// Create a cached prompt executor
val cachedExecutor = CachedPromptExecutor(
    cache = FilePromptCache(Path("/cache_directory")),
    nested = promptExecutor
)
// Run the cached prompt executor
val response = cachedExecutor.execute(prompt, OpenAIModels.Chat.GPT4o)
Now you can run the same prompt with the same model multiple times. The response will be retrieved from the cache.
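To make this concrete, here is a minimal sketch that builds a prompt and runs it twice against the cached executor. The prompt id and messages are illustrative, the prompt construction assumes the standard Koog prompt DSL, and the runBlocking wrapper assumes the executor methods are suspending functions.
// Sketch only: the prompt id and messages are illustrative, and runBlocking is
// used because the executor's execute() call is suspending.
fun main() = runBlocking {
    val prompt = prompt("cached-example") {
        system("You are a helpful assistant.")
        user("Summarize the benefits of response caching in one sentence.")
    }

    // First call: the nested executor sends the request to the LLM and the
    // response is written to the cache.
    val first = cachedExecutor.execute(prompt, OpenAIModels.Chat.GPT4o)

    // Second call with the same prompt and model: the response is read from
    // the cache instead of calling the LLM again.
    val second = cachedExecutor.execute(prompt, OpenAIModels.Chat.GPT4o)

    println(first)
    println(second)
}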
Note
- If you call executeStreaming() with the cached prompt executor, it produces the response as a single chunk (see the sketch below).
- If you call moderate() with the cached prompt executor, it forwards the request to the nested prompt executor and does not use the cache.
- Caching of multiple choice responses (executeMultipleChoice()) is not supported.
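For illustration, here is a minimal sketch of the streaming behavior noted above. It assumes that executeStreaming() takes the same prompt and model arguments as execute() and returns a flow of string chunks, and that the call is made inside a coroutine; these are assumptions of the sketch, not guarantees of the caching layer.
// Sketch only: assumes executeStreaming(prompt, model) returns a flow of string
// chunks and that this code runs inside a coroutine (for example, runBlocking).
val chunks = mutableListOf<String>()
cachedExecutor.executeStreaming(prompt, OpenAIModels.Chat.GPT4o).collect { chunk ->
    chunks += chunk
}
// When the response is served from the cache, the full text arrives as a single chunk.
println("Received ${chunks.size} chunk(s)")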