# Retrieval-augmented generation (RAG)

Beta

This feature is part of a beta module (`1.0.0-beta`). The API may change in future releases. See [module versioning](../module-versioning/) for details.

Koog provides building blocks for retrieval-augmented generation (RAG): embedding text, storing embedded documents, and retrieving the most relevant results for a query.

This page focuses on what is available in the current `rag` module and how to use it.

## What Koog provides today

The current RAG support is split into two modules:

- `rag-base`: common abstractions for retrieval, storage, search requests, filtering, and file/document providers
- `rag-vector`: local implementations that combine document embedding with vector storage

## Embedding and retrieving documents with EmbeddingStorage

The most complete out-of-the-box RAG flow uses `EmbeddingStorage` from the `rag-vector` module. It combines a `DocumentEmbedder` (which converts documents to vectors) with a `VectorStorageBackend` (which persists the vectors).

The steps are:

1. Create an `Embedder` backed by an embedding model (Ollama or OpenAI).
1. Create a `JVMTextDocumentEmbedder` that reads file content and delegates to the embedder.
1. Create an `EmbeddingStorage` with an in-memory or file-based backend.
1. Add documents with `add()`.
1. Search with `search(SimilaritySearchRequest(...))`.

```
// 1. Create an embedder backed by a local Ollama model
val embedder = LLMEmbedder(
    client = OllamaClient(),
    model = OllamaModels.Embeddings.NOMIC_EMBED_TEXT
)

// 2. Create a JVM document embedder that reads files and embeds their text
val documentEmbedder = JVMTextDocumentEmbedder(embedder)

// 3. Create an EmbeddingStorage with an in-memory backend
val storage = EmbeddingStorage(
    embedder = documentEmbedder,
    storage = InMemoryVectorStorageBackend()
)

// 4. Add documents to the storage
storage.add(
    listOf(
        Path.of("./docs/faq.txt"),
        Path.of("./docs/pricing.txt"),
        Path.of("./docs/getting-started.txt")
    )
)

// 5. Search for the most relevant documents
val results = storage.search(
    SimilaritySearchRequest(
        queryText = "How do I reset my password?",
        limit = 3,
        minScore = 0.5
    )
)

results.forEach { result ->
    println("${result.document} (score: ${result.score.value})")
}
```

```

```

## Providing relevance search as an agent tool (in agentic RAG)

Instead of injecting all retrieved documents into the prompt upfront, you can expose the RAG storage as a tool that the agent calls on demand. This gives the agent control over when and what to search for.

The example below wraps a `SearchStorage` (the base search interface that `EmbeddingStorage` implements) in a function annotated with `@Tool` and `@LLMDescription`, then registers it in a `ToolRegistry` for the agent to use.

```
// Define a tool that searches the RAG storage
@Tool
@LLMDescription("Search the knowledge base for documents relevant to a query. Returns the content of the most relevant documents.")
suspend fun searchKnowledgeBase(
    @LLMDescription("The search query describing what information you need")
    query: String,
    @LLMDescription("Maximum number of documents to return")
    count: Int
): String {
    val results = ragStorage.search(
        SimilaritySearchRequest(
            queryText = query,
            limit = count,
            minScore = 0.5
        )
    )

    if (results.isEmpty()) {
        return "No relevant documents found for: $query"
    }

    val response = StringBuilder("Found ${results.size} relevant documents:\n\n")
    results.forEachIndexed { index, result ->
        val content = Files.readString(result.document)
        response.append("Document ${index + 1}: ${result.document.fileName}")
        response.append(" (score: ${"%.2f".format(result.score.value)})\n")
        response.append("Content: $content\n\n")
    }
    return response.toString()
}

fun main() {
    runBlocking {
        // Register the search tool and create an agent
        val tools = ToolRegistry {
            tool(::searchKnowledgeBase.asTool())
        }

        val agent = AIAgent(
            toolRegistry = tools,
            promptExecutor = simpleOpenAIExecutor(apiKey),
            llmModel = OpenAIModels.Chat.GPT4o
        )

        val response = agent.run("What is your refund policy?")
        println("Agent response: $response")
    }
}
```

```

```

With this approach, the agent decides when to call the search tool based on the user's query. This is useful when the agent handles diverse requests and only some of them require knowledge base lookups.

## Available implementations

### Vector storage backends

- `InMemoryVectorStorageBackend`: stores vectors in memory; suitable for testing and prototypes
- `FileVectorStorageBackend`: persists vectors to disk for durability across restarts
- `JVMFileVectorStorageBackend`: JVM-specific file-based backend using `java.nio.file.Path`

### Document embedders

- `TextDocumentEmbedder`: generic document-to-text embedder parameterized by document and path types
- `JVMTextDocumentEmbedder`: JVM-specific embedder that reads files from `java.nio.file.Path`

### Combined storage implementations

- `EmbeddingStorage`: composes any `DocumentEmbedder` with any `VectorStorageBackend`
- `InMemoryDocumentEmbeddingStorage`: convenience shortcut for `EmbeddingStorage` + `InMemoryVectorStorageBackend`
- `FileDocumentEmbeddingStorage`: convenience shortcut for `EmbeddingStorage` + `FileVectorStorageBackend`
- `JVMFileDocumentEmbeddingStorage`: JVM file-based embedding storage
- `TextFileDocumentEmbeddingStorage`: file-based storage for text documents
- `JVMFileEmbeddingStorage`: JVM file-based storage for text documents

## Current limitations

The built-in flow is useful for local and reference implementations, but it is not yet a full production RAG platform.

Important limitations:

- the built-in implementations support similarity search only
- there is no built-in chunking pipeline in the `rag` module
- metadata-rich production record modeling is still limited
- production vector database integrations (Pinecone, Weaviate, pgvector, Milvus) are not provided in the current `rag` module

If you are building a custom backend, start from `rag-base` abstractions and implement your own storage adapter.

## Choosing where to start

Use `rag-vector` if:

- you want a local RAG prototype
- you want a simple reference implementation
- you want to experiment with embedding and retrieval flow inside Koog

Use `rag-base` if:

- you are building your own storage backend
- you want to integrate an external vector database
- you want to reuse the abstractions in another Koog module

## See also

- [Embeddings](../embeddings/)