Multimodal inputs
In addition to text messages, Koog also lets you send images, audio, video, and files to LLMs within the user message.
You can add these attachments to the user message by using the corresponding functions:
- image(): Adds images (JPG, PNG, WebP, GIF).
- audio(): Adds audio files (MP3, WAV, FLAC).
- video(): Adds video files (MP4, AVI, MOV).
- file() / binaryFile() / textFile(): Add documents (PDF, TXT, MD, etc.).
Each function supports two ways of configuring media content parameters, so you can:
- Pass a URL or a file path to the function, and it automatically handles media content parameters.
- Create and pass a ContentPart object to the function for custom control over media content parameters.
Auto-configured attachments
If you pass a URL or a file path to the image(), audio(), video(), or file() functions, Koog automatically constructs
the corresponding media content parameters based on the file extension.
The general format of the user message that includes a text message and a list of auto-configured attachments is as follows:
user {
    +"Describe these images:"
    image("https://example.com/test.png")
    image(Path("/User/koog/image.png"))
    +"Focus on the main subjects."
}
The + operator adds text content to the user message along with the media attachments.
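To illustrate what deriving parameters from a file extension involves, here is a minimal, self-contained Kotlin sketch. It is not Koog's implementation: `MediaParams`, `mimeByExtension`, and `inferMediaParams` are hypothetical names introduced for illustration, and the lookup table covers only a handful of the formats mentioned above.

```kotlin
// Hypothetical illustration only: this is NOT Koog's code.
// It shows the general idea of deriving a format and MIME type from a file extension.

// Hypothetical holder for the parameters an auto-configured attachment needs.
data class MediaParams(val format: String, val mimeType: String, val fileName: String)

// Small hand-picked lookup table; a real library would cover many more types.
val mimeByExtension: Map<String, String> = mapOf(
    "png" to "image/png",
    "jpg" to "image/jpeg",
    "jpeg" to "image/jpeg",
    "webp" to "image/webp",
    "gif" to "image/gif",
    "mp3" to "audio/mpeg",
    "wav" to "audio/wav",
    "mp4" to "video/mp4",
    "pdf" to "application/pdf",
    "txt" to "text/plain"
)

fun inferMediaParams(pathOrUrl: String): MediaParams {
    // Drop any query string or fragment, then keep only the last path segment.
    val fileName = pathOrUrl
        .substringBefore('?')
        .substringBefore('#')
        .substringAfterLast('/')

    // Everything after the last dot, lowercased; empty if the name has no dot.
    val extension = fileName.substringAfterLast('.', missingDelimiterValue = "").lowercase()

    // Fail loudly when the extension is unknown instead of guessing a type.
    val mimeType = mimeByExtension[extension]
        ?: throw IllegalArgumentException("Cannot infer media type for '$fileName'")

    return MediaParams(format = extension, mimeType = mimeType, fileName = fileName)
}
```

Under these assumptions, `inferMediaParams("https://example.com/test.png")` yields the format `png`, the MIME type `image/png`, and the file name `test.png`. An unknown or missing extension throws, which is the kind of case where configuring the attachment explicitly is the safer choice.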
Custom-configured attachments
The ContentPart class lets you configure media content parameters for each attachment individually.
You can create a ContentPart object for each attachment, configure its parameters,
and pass it to the corresponding image(), audio(), video(), or file() functions.
The general format of the user message that includes a text message and a list of custom-configured attachments is as follows:
user {
    +"Describe this image"
    image(
        ContentPart.Image(
            content = AttachmentContent.URL("https://example.com/capture.png"),
            format = "png",
            mimeType = "image/png",
            fileName = "capture.png"
        )
    )
}
Koog provides specialized ContentPart classes for each media type:
- ContentPart.Image: image attachments, such as JPG or PNG files.
- ContentPart.Audio: audio attachments, such as MP3 or WAV files.
- ContentPart.Video: video attachments, such as MP4 or AVI files.
- ContentPart.File: file attachments, such as PDF or TXT files.
All ContentPart types accept the following parameters:
| Name | Data type | Required | Description |
|---|---|---|---|
| content | AttachmentContent | Yes | The source of the provided file content. |
| format | String | Yes | The format of the provided file. For example, png. |
| mimeType | String | Only for ContentPart.File | The MIME Type of the provided file. For example, image/png. |
| fileName | String | No | The name of the provided file including the extension. For example, screenshot.png. |
Attachment content
AttachmentContent defines the type and source of content that is provided as input to the LLM:
- URL of the provided content. See also API reference.
- File content as a byte array. See also API reference.
- File content as a Base64-encoded string containing file data. See also API reference.
- File content as plain text (for ContentPart.File only). See also API reference.
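The byte-array and Base64 forms carry the same data, so you can convert between them when your input arrives in one form and you need the other. The sketch below assumes a JVM target and uses the JDK's `java.util.Base64`; `toBase64` and `fromBase64` are hypothetical helper names, not part of Koog.

```kotlin
import java.util.Base64

// Hypothetical helpers for illustration; they are not part of Koog.

// Encode raw file bytes as a standard (RFC 4648) Base64 string.
fun toBase64(bytes: ByteArray): String =
    Base64.getEncoder().encodeToString(bytes)

// Decode a standard Base64 string back into the original bytes.
fun fromBase64(encoded: String): ByteArray =
    Base64.getDecoder().decode(encoded)
```

For example, `toBase64("Koog".toByteArray())` returns `"S29vZw=="`, and passing that string to `fromBase64` gives back the original four bytes.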
Mixed attachments
In addition to providing different types of attachments in separate prompts or messages, you can also provide multiple and mixed types of attachments in a single user message:
val prompt = prompt("mixed_content") {
    system("You are a helpful assistant.")
    user {
        +"Compare the image with the document content."
        image(Path("/User/koog/page.png"))
        binaryFile(Path("/User/koog/page.pdf"), "application/pdf")
        +"Structure the result as a table"
    }
}
Next steps
- Run prompts with LLM clients if you work with a single LLM provider.
- Run prompts with prompt executors if you work with multiple LLM providers.