Multimodal inputs

In addition to text, Koog lets you send images, audio, video, and files to LLMs as part of a user message. To add these attachments, use the corresponding functions inside the user message:

  • image(): Adds images (JPG, PNG, WebP, GIF).
  • audio(): Adds audio files (MP3, WAV, FLAC).
  • video(): Adds video files (MP4, AVI, MOV).
  • file() / binaryFile() / textFile(): Add documents (PDF, TXT, MD, etc.).

Each function supports two ways of configuring media content parameters, so you can:

  • Pass a URL or a file path to the function, which then sets the media content parameters automatically.
  • Create and pass a ContentPart object to the function for custom control over media content parameters.

Auto-configured attachments

If you pass a URL or a file path to the image(), audio(), video(), or file() functions, Koog automatically constructs the corresponding media content parameters based on the file extension.

The general format of the user message that includes a text message and a list of auto-configured attachments is as follows:

user {
    +"Describe these images:"

    image("https://example.com/test.png")
    image(Path("/User/koog/image.png"))

    +"Focus on the main subjects."
}

The + operator adds text content to the user message along with the media attachments.
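To make the idea of extension-based configuration concrete, here is a minimal sketch of how a format and MIME type could be derived from a URL or file path. It is not Koog's implementation: `MediaParams`, `knownTypes`, and `inferMediaParams` are hypothetical names introduced only for illustration, and the MIME strings are commonly used values for the formats listed on this page.

```kotlin
// Hypothetical helper, not part of Koog: holds the parameters that
// an attachment would typically carry.
data class MediaParams(val format: String, val mimeType: String)

// Assumed lookup table covering only the formats named on this page.
private val knownTypes = mapOf(
    "jpg" to "image/jpeg",
    "png" to "image/png",
    "webp" to "image/webp",
    "gif" to "image/gif",
    "mp3" to "audio/mpeg",
    "wav" to "audio/wav",
    "flac" to "audio/flac",
    "mp4" to "video/mp4",
    "avi" to "video/x-msvideo",
    "mov" to "video/quicktime",
    "pdf" to "application/pdf",
    "txt" to "text/plain",
    "md" to "text/markdown"
)

// Returns the inferred parameters, or null when the extension is missing or unknown.
fun inferMediaParams(pathOrUrl: String): MediaParams? {
    // Drop any query string or fragment, then keep the last path segment.
    val name = pathOrUrl.substringBefore('?').substringBefore('#').substringAfterLast('/')
    // An empty string means the name has no extension at all.
    val ext = name.substringAfterLast('.', missingDelimiterValue = "").lowercase()
    val mime = knownTypes[ext] ?: return null
    return MediaParams(format = ext, mimeType = mime)
}

fun main() {
    println(inferMediaParams("https://example.com/test.png"))
    println(inferMediaParams("/User/koog/page.pdf"))
    println(inferMediaParams("notes"))
}
```

When a file has no extension or an unusual one, a lookup like this has nothing to go on, which is the case where configuring the attachment explicitly is the safer choice.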

Custom-configured attachments

The ContentPart class lets you configure media content parameters for each attachment individually.

You can create a ContentPart object for each attachment, configure its parameters, and pass it to the corresponding image(), audio(), video(), or file() functions.

The general format of the user message that includes a text message and a list of custom-configured attachments is as follows:

user {
    +"Describe this image"
    image(
        ContentPart.Image(
            content = AttachmentContent.URL("https://example.com/capture.png"),
            format = "png",
            mimeType = "image/png",
            fileName = "capture.png"
        )
    )
}

Koog provides specialized ContentPart classes for each media type, for example ContentPart.Image and ContentPart.File.

All ContentPart types accept the following parameters:

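| Name     | Data type         | Required                  | Description                                                                          |
|----------|-------------------|---------------------------|--------------------------------------------------------------------------------------|
| content  | AttachmentContent | Yes                       | The source of the provided file content.                                             |
| format   | String            | Yes                       | The format of the provided file. For example, png.                                   |
| mimeType | String            | Only for ContentPart.File | The MIME type of the provided file. For example, image/png.                          |
| fileName | String            | No                        | The name of the provided file including the extension. For example, screenshot.png.  |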

Attachment content

AttachmentContent defines the type and source of content that is provided as input to the LLM:

  • URL of the provided content:

    AttachmentContent.URL("https://example.com/image.png")
    
    See also API reference.

  • File content as a byte array:

    AttachmentContent.Binary.Bytes(byteArrayOf(/* ... */))
    
    See also API reference.

  • File content as a Base64-encoded string containing file data:

    AttachmentContent.Binary.Base64("iVBORw0KGgoAAAANS...")
    
    See also API reference.

  • File content as plain text (for ContentPart.File only):

    AttachmentContent.PlainText("This is the file content.")
    
    See also API reference.
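The byte-array and Base64 options above carry the same data in two encodings. The sketch below uses only the JDK's `java.util.Base64` to show the relationship; `toBase64` and `fromBase64` are hypothetical helper names, and the sample bytes are the standard 8-byte PNG signature, used here only as a stand-in for real file content.

```kotlin
import java.util.Base64

// Encodes raw bytes as a standard (RFC 4648) Base64 string without line breaks.
fun toBase64(bytes: ByteArray): String = Base64.getEncoder().encodeToString(bytes)

// Decodes a Base64 string back to the original bytes.
fun fromBase64(text: String): ByteArray = Base64.getDecoder().decode(text)

fun main() {
    // The first eight bytes of every PNG file (the PNG signature).
    val pngSignature = byteArrayOf(0x89.toByte(), 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)

    val encoded = toBase64(pngSignature)
    println(encoded) // prints "iVBORw0KGgo="

    // Round trip: decoding yields the same bytes that were encoded.
    check(fromBase64(encoded).contentEquals(pngSignature))
}
```

The printed value begins with the same characters as the Base64 example in the list, because every Base64-encoded PNG starts with this signature. A string produced this way is the kind of value you would pass to AttachmentContent.Binary.Base64, while the raw array corresponds to AttachmentContent.Binary.Bytes.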

Mixed attachments

Attachments of different types do not have to go in separate prompts or messages. You can combine multiple attachments of different types in a single user message:

val prompt = prompt("mixed_content") {
    system("You are a helpful assistant.")

    user {
        +"Compare the image with the document content."
        image(Path("/User/koog/page.png"))
        binaryFile(Path("/User/koog/page.pdf"), "application/pdf")
        +"Structure the result as a table"
    }
}

Next steps

  • Run prompts with LLM clients if you work with a single LLM provider.
  • Run prompts with prompt executors if you work with multiple LLM providers.