LLM response caching

If you run repeated requests with a prompt executor, you can cache LLM responses to improve performance and reduce costs. In Koog, caching is available for all prompt executors, in both Kotlin and Java, through CachedPromptExecutor, a wrapper around PromptExecutor that adds caching. It stores responses to previously executed prompts and returns them when the same prompts are run again.
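Conceptually, the wrapper memoizes responses by prompt and model, and only calls the nested executor on a cache miss. The following self-contained Java sketch illustrates that pattern with hypothetical names (CachingExecutor and its fields are illustrative only, not Koog's API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Illustrative sketch of the caching pattern (not Koog's actual classes):
// the wrapper memoizes the nested executor's responses by (prompt, model) key.
public class CacheSketch {
    static class CachingExecutor {
        private final Map<String, String> cache = new HashMap<>();
        private final BiFunction<String, String, String> nested;
        int nestedCalls = 0;

        CachingExecutor(BiFunction<String, String, String> nested) {
            this.nested = nested;
        }

        String execute(String prompt, String model) {
            // Combine model and prompt into one cache key.
            String key = model + "\u0000" + prompt;
            return cache.computeIfAbsent(key, k -> {
                nestedCalls++; // only incremented on a cache miss
                return nested.apply(prompt, model);
            });
        }
    }

    public static void main(String[] args) {
        CachingExecutor executor = new CachingExecutor(
                (prompt, model) -> "response to: " + prompt);

        System.out.println(executor.execute("Hello", "gpt-4o")); // calls nested executor
        System.out.println(executor.execute("Hello", "gpt-4o")); // served from cache
        System.out.println("Nested calls: " + executor.nestedCalls); // prints 1
    }
}
```

The real CachedPromptExecutor works against Koog's PromptExecutor interface and a pluggable cache backend instead of a plain map, but the miss-then-store flow is the same.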

To create a cached prompt executor in Kotlin or Java, perform the following:

  1. Create a prompt executor for which you want to cache responses.
  2. Create a CachedPromptExecutor instance by providing the desired cache and the prompt executor you created.
  3. Run the created CachedPromptExecutor with the desired prompt and model.

Here is an example, first in Kotlin:

// Create a prompt
val prompt = prompt("test") {
    user("Hello")
}

// Create a prompt executor
val client = OpenAILLMClient(System.getenv("OPENAI_API_KEY"))
val promptExecutor = MultiLLMPromptExecutor(client)

// Create a cached prompt executor
val cachedExecutor = CachedPromptExecutor(
    cache = FilePromptCache(Path("path/to/your/cache/directory")),
    nested = promptExecutor
)

// Run cached prompt executor for the first time
// This will perform an actual LLM request
val firstTime = measureTimeMillis {
    val firstResponse = cachedExecutor.execute(prompt, OpenAIModels.Chat.GPT4o)
    println("First response: ${firstResponse.first().content}")
}
println("First execution took: ${firstTime}ms")

// Run cached prompt executor for the second time
// This will return the result immediately from the cache
val secondTime = measureTimeMillis {
    val secondResponse = cachedExecutor.execute(prompt, OpenAIModels.Chat.GPT4o)
    println("Second response: ${secondResponse.first().content}")
}
println("Second execution took: ${secondTime}ms")

The same example in Java:

// Create a prompt
Prompt prompt = Prompt.builder("test")
        .user("Hello")
        .build();

// Create a prompt executor
OpenAILLMClient client = new OpenAILLMClient(System.getenv("OPENAI_API_KEY"));
MultiLLMPromptExecutor promptExecutor = new MultiLLMPromptExecutor(client);

// Create a cached prompt executor
FilePromptCache cache = new FilePromptCache(Path.of("path/to/your/cache/directory"), null);
CachedPromptExecutor cachedExecutor = new CachedPromptExecutor(cache, promptExecutor, Clock.System.INSTANCE);

// Run cached prompt executor for the first time
// This will perform an actual LLM request
long start1 = System.nanoTime();
List<Message.Response> firstResponse = cachedExecutor.execute(prompt, OpenAIModels.Chat.GPT4o);
long firstTimeMs = (System.nanoTime() - start1) / 1_000_000L;
System.out.println("First response: " + firstResponse.getFirst().getContent());
System.out.println("First execution took: " + firstTimeMs + "ms");

// Run cached prompt executor for the second time
// This will return the result immediately from the cache
long start2 = System.nanoTime();
List<Message.Response> secondResponse = cachedExecutor.execute(prompt, OpenAIModels.Chat.GPT4o);
long secondTimeMs = (System.nanoTime() - start2) / 1_000_000L;
System.out.println("Second response: " + secondResponse.getFirst().getContent());
System.out.println("Second execution took: " + secondTimeMs + "ms");

The example produces the following output:

First response: Hello! It seems like we're starting a new conversation. What can I help you with today?
First execution took: 48ms
Second response: Hello! It seems like we're starting a new conversation. What can I help you with today?
Second execution took: 1ms

The second response is retrieved from the cache, which took only 1ms.

Note

  • If you call executeStreaming() in Kotlin or executeStreamingWithPublisher() in Java with the cached prompt executor, it produces a response as a single chunk.
  • If you call moderate() with the cached prompt executor in either Kotlin or Java, it forwards the request to the nested prompt executor and does not use the cache.
  • Caching of multiple choice responses (executeMultipleChoices()) is not supported in either Kotlin or Java.
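The first note follows from how the cache stores responses: a cache entry holds the complete response text, so on a cache hit there is nothing to stream incrementally. A minimal Java sketch of that behavior (streamFromCache is a hypothetical helper, not part of Koog's API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a cache hit only has the stored full response,
// so "streaming" it degenerates to emitting a single chunk.
public class StreamingSketch {
    static List<String> streamFromCache(String cachedResponse) {
        List<String> chunks = new ArrayList<>();
        chunks.add(cachedResponse); // whole cached response emitted at once
        return chunks;
    }

    public static void main(String[] args) {
        List<String> chunks = streamFromCache("Hello! What can I help you with today?");
        System.out.println("Number of chunks: " + chunks.size()); // prints 1
    }
}
```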