Structured data processing
Introduction
The Structured Data Processing API provides a way to ensure that responses from Large Language Models (LLMs) conform to specific data structures. This is crucial for building reliable AI applications where you need predictable, well-formatted data rather than free-form text.
This page explains how to use the Structured Data Processing API to define data structures, generate schemas, and request structured responses from LLMs.
Key components and concepts
The Structured Data Processing API consists of several key components:
- Data structure definition: Kotlin data classes marked with kotlinx.serialization annotations and LLM-specific annotations.
- JSON Schema generation: tools to generate JSON schemas from Kotlin data classes.
- Structured LLM requests: methods to request responses from LLMs that conform to the defined structures.
- Response handling: processing and validating the structured responses.
Defining data structures
The first step in using the Structured Data Processing API is to define your data structures using Kotlin data classes.
Basic structure
@Serializable
@SerialName("WeatherForecast")
@LLMDescription("Weather forecast for a given location")
data class WeatherForecast(
    @LLMDescription("Temperature in Celsius")
    val temperature: Int,
    @LLMDescription("Weather conditions (e.g., sunny, cloudy, rainy)")
    val conditions: String,
    @LLMDescription("Chance of precipitation in percentage")
    val precipitation: Int
)
Key annotations
- @Serializable: required for kotlinx.serialization to work with the class.
- @SerialName: specifies the name to use during serialization.
- @LLMDescription: provides a description of the class or field for the LLM.
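To make "conforming to a structure" concrete, the following minimal sketch decodes a JSON string into the data class using plain kotlinx.serialization. The JSON text is a hypothetical LLM output, and @LLMDescription is left out here (an assumption made only so that the block depends on nothing but kotlinx.serialization).

```kotlin
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

// Same shape as the WeatherForecast class above, without @LLMDescription,
// so the sketch needs only kotlinx.serialization on the classpath.
@Serializable
@SerialName("WeatherForecast")
data class WeatherForecast(
    val temperature: Int,
    val conditions: String,
    val precipitation: Int
)

fun main() {
    // Hypothetical LLM output that matches the structure.
    val raw = """{"temperature": 25, "conditions": "Sunny", "precipitation": 0}"""

    // Decoding succeeds only if every required field is present with the right type.
    val forecast = Json.decodeFromString(WeatherForecast.serializer(), raw)
    println(forecast) // WeatherForecast(temperature=25, conditions=Sunny, precipitation=0)
}
```

A response with a missing field or a wrongly typed value would make the decoding call throw instead of returning an object, which is the kind of failure that structured requests are meant to prevent.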
Supported features
The API supports a wide range of data structure features:
Nested classes
@Serializable
@SerialName("WeatherForecast")
data class WeatherForecast(
    // Other fields
    @LLMDescription("Coordinates of the location")
    val latLon: LatLon
) {
    @Serializable
    @SerialName("LatLon")
    data class LatLon(
        @LLMDescription("Latitude of the location")
        val lat: Double,
        @LLMDescription("Longitude of the location")
        val lon: Double
    )
}
Collections (lists and maps)
@Serializable
@SerialName("WeatherForecast")
data class WeatherForecast(
    // Other fields
    @LLMDescription("List of news articles")
    val news: List<WeatherNews>,
    @LLMDescription("Map of weather sources")
    val sources: Map<String, WeatherSource>
)
Enums
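An enum-typed field restricts a value to a fixed set of names. The following is a minimal sketch of how such a field behaves under kotlinx.serialization. The Conditions and DailyForecast types are hypothetical names introduced for illustration, and @LLMDescription is omitted so that the block has no dependency beyond kotlinx.serialization.

```kotlin
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable
import kotlinx.serialization.SerializationException
import kotlinx.serialization.json.Json

// Hypothetical enum: the JSON value must be one of these names.
@Serializable
@SerialName("Conditions")
enum class Conditions { Sunny, Cloudy, Rainy }

// Hypothetical data class with an enum-typed field.
@Serializable
@SerialName("DailyForecast")
data class DailyForecast(
    val temperature: Int,
    val conditions: Conditions
)

fun main() {
    // A known enum name decodes to the matching constant.
    val ok = Json.decodeFromString(
        DailyForecast.serializer(),
        """{"temperature":18,"conditions":"Cloudy"}"""
    )
    println(ok.conditions) // Cloudy

    // An unknown name is rejected with a SerializationException.
    val rejected = try {
        Json.decodeFromString(
            DailyForecast.serializer(),
            """{"temperature":18,"conditions":"Foggy"}"""
        )
        false
    } catch (e: SerializationException) {
        true
    }
    println(rejected) // true
}
```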
Polymorphism with sealed classes
@Serializable
@SerialName("WeatherAlert")
sealed class WeatherAlert {
    abstract val severity: Severity
    abstract val message: String

    @Serializable
    @SerialName("Severity")
    enum class Severity { Low, Moderate, Severe, Extreme }

    @Serializable
    @SerialName("StormAlert")
    data class StormAlert(
        override val severity: Severity,
        override val message: String,
        @LLMDescription("Wind speed in km/h")
        val windSpeed: Double
    ) : WeatherAlert()

    @Serializable
    @SerialName("FloodAlert")
    data class FloodAlert(
        override val severity: Severity,
        override val message: String,
        @LLMDescription("Expected rainfall in mm")
        val expectedRainfall: Double
    ) : WeatherAlert()
}
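With its default settings, kotlinx.serialization marks the concrete subclass of a sealed class with a discriminator property named type, whose value is the @SerialName of the subclass. The sketch below shows this round trip on a trimmed-down version of the hierarchy (the severity field and @LLMDescription are omitted for brevity). It illustrates kotlinx.serialization behavior only and makes no claim about how the schema generator describes the hierarchy to the LLM.

```kotlin
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

// Trimmed-down version of the WeatherAlert hierarchy above.
@Serializable
@SerialName("WeatherAlert")
sealed class WeatherAlert {
    abstract val message: String

    @Serializable
    @SerialName("StormAlert")
    data class StormAlert(
        override val message: String,
        val windSpeed: Double
    ) : WeatherAlert()

    @Serializable
    @SerialName("FloodAlert")
    data class FloodAlert(
        override val message: String,
        val expectedRainfall: Double
    ) : WeatherAlert()
}

fun main() {
    val alert: WeatherAlert = WeatherAlert.StormAlert("Strong winds expected", 90.0)

    // Encoding through the base serializer adds the "type" discriminator.
    val json = Json.encodeToString(WeatherAlert.serializer(), alert)
    println(json)
    println(json.contains("\"type\":\"StormAlert\"")) // true

    // Decoding through the base serializer restores the concrete subclass.
    val decoded = Json.decodeFromString(WeatherAlert.serializer(), json)
    println(decoded == alert) // true
}
```

A response for a polymorphic structure therefore has to carry the discriminator in addition to the subclass fields, which is one reason such structures are harder for a model to produce correctly.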
Generating JSON schemas
Once you have defined your data structures, you can generate JSON schemas from them using the JsonStructuredData class:
val weatherForecastStructure = JsonStructuredData.createJsonStructure<WeatherForecast>(
    schemaFormat = JsonSchemaGenerator.SchemaFormat.JsonSchema,
    examples = exampleForecasts,
    schemaType = JsonStructuredData.JsonSchemaType.SIMPLE
)
Schema format options
- JsonSchema: standard JSON Schema format.
- Simple: a simplified schema format that may work better with some models but has limitations, such as no polymorphism support.
Schema type options
The following schema types are supported:
- SIMPLE: a simplified schema type:
  - Supports only standard JSON fields
  - Does not support definitions, URL references, and recursive checks
  - Does not support polymorphism
  - Supported by a larger number of language models
  - Used for simpler data structures
- FULL: a more comprehensive schema type:
  - Supports advanced JSON Schema capabilities, including definitions, URL references, and recursive checks
  - Supports polymorphism: can work with sealed classes or interfaces and their implementations
  - Supported by fewer language models
  - Used for complex data structures with inheritance hierarchies
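The difference between inline fields and definitions with references can be illustrated with two hand-written schemas for the same nested object. This is a sketch built with kotlinx.serialization's JSON builders. It is not output of the schema generator, and the exact shape the generator produces for each schema type is not shown on this page. The keywords $defs and $ref are the standard JSON Schema mechanism for definitions and references.

```kotlin
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.buildJsonObject
import kotlinx.serialization.json.put
import kotlinx.serialization.json.putJsonObject

// Hand-written illustration only; not produced by the schema generator.

// Inline style: the nested object is spelled out where it is used.
val inlineSchema: JsonObject = buildJsonObject {
    put("type", "object")
    putJsonObject("properties") {
        putJsonObject("latLon") {
            put("type", "object")
            putJsonObject("properties") {
                putJsonObject("lat") { put("type", "number") }
                putJsonObject("lon") { put("type", "number") }
            }
        }
    }
}

// Definition style: the nested object is declared once under "$defs"
// and referenced through "$ref" wherever it is needed.
val refSchema: JsonObject = buildJsonObject {
    put("type", "object")
    putJsonObject("properties") {
        putJsonObject("latLon") { put("\$ref", "#/\$defs/LatLon") }
    }
    putJsonObject("\$defs") {
        putJsonObject("LatLon") {
            put("type", "object")
            putJsonObject("properties") {
                putJsonObject("lat") { put("type", "number") }
                putJsonObject("lon") { put("type", "number") }
            }
        }
    }
}

fun main() {
    println(inlineSchema)
    println(refSchema)
}
```

Definitions let one type be reused in several places, and they are what make recursive or polymorphic structures expressible. Models that accept only the inline style cannot be given such structures.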
Providing examples
You can provide examples to help the LLM understand the expected format:
val exampleForecasts = listOf(
    WeatherForecast(
        temperature = 25,
        conditions = "Sunny",
        precipitation = 0,
        // Other fields
    ),
    WeatherForecast(
        temperature = 18,
        conditions = "Cloudy",
        precipitation = 30,
        // Other fields
    )
)
Requesting structured responses
To request a structured response from an LLM, use the requestLLMStructured method within a writeSession:
val structuredResponse = llm.writeSession {
    this.requestLLMStructured(
        structure = weatherForecastStructure,
        fixingModel = OpenAIModels.Chat.GPT4o,
    )
}
Fixing model
The fixingModel parameter specifies the language model to use for re-parsing or error correction during retries. This makes a valid response more likely when the first attempt does not match the structure.
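The general idea of correcting a response and parsing it again can be sketched without any LLM call. In the block below, parseWithFixing and its fix parameter are hypothetical names: fix stands in for a request to the fixing model, and the loop is an assumption about the general technique, not a description of the library's internal implementation.

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.SerializationException
import kotlinx.serialization.json.Json

@Serializable
data class WeatherForecast(
    val temperature: Int,
    val conditions: String,
    val precipitation: Int
)

// Hypothetical helper: tries to decode `raw`. On failure, it passes the text and
// the error message to `fix` (a stand-in for the fixing model) and tries again,
// up to `retries` additional times. Returns null if every attempt fails.
fun parseWithFixing(
    raw: String,
    retries: Int,
    fix: (text: String, error: String) -> String
): WeatherForecast? {
    var text = raw
    repeat(retries + 1) { attempt ->
        try {
            return Json.decodeFromString(WeatherForecast.serializer(), text)
        } catch (e: SerializationException) {
            if (attempt < retries) text = fix(text, e.message ?: "unknown error")
        }
    }
    return null
}

fun main() {
    // Hypothetical model output with chatter around the JSON payload.
    val raw = """Sure! Here is the forecast: {"temperature":25,"conditions":"Sunny","precipitation":0}"""

    // Toy fixer: keeps only the text between the first '{' and the last '}'.
    val result = parseWithFixing(raw, retries = 2) { text, _ ->
        text.substring(text.indexOf('{'), text.lastIndexOf('}') + 1)
    }
    println(result) // WeatherForecast(temperature=25, conditions=Sunny, precipitation=0)
}
```

Bounding the number of retries keeps cost and latency predictable. A null result tells the caller that the response could not be repaired.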
Integrating with agent strategies
You can integrate structured data processing into your agent strategies:
val agentStrategy = strategy("weather-forecast") {
    val setup by nodeLLMRequest()

    // Requests a structured forecast and returns it as text
    val getStructuredForecast by node<Message.Response, String> { _ ->
        val structuredResponse = llm.writeSession {
            this.requestLLMStructured(
                structure = weatherForecastStructure,
                fixingModel = OpenAIModels.Chat.GPT4o,
            )
        }
        """
        Response structure:
        $structuredResponse
        """.trimIndent()
    }

    edge(nodeStart forwardTo setup)
    edge(setup forwardTo getStructuredForecast)
    edge(getStructuredForecast forwardTo nodeFinish)
}
Full code sample
Here is a full example of using the Structured Data Processing API:
// Note: Import statements are omitted for brevity
@Serializable
@SerialName("SimpleWeatherForecast")
@LLMDescription("Simple weather forecast for a location")
data class SimpleWeatherForecast(
    @LLMDescription("Location name")
    val location: String,
    @LLMDescription("Temperature in Celsius")
    val temperature: Int,
    @LLMDescription("Weather conditions (e.g., sunny, cloudy, rainy)")
    val conditions: String
)
val token = System.getenv("OPENAI_KEY") ?: error("Environment variable OPENAI_KEY is not set")
fun main(): Unit = runBlocking {
    // Create sample forecasts
    val exampleForecasts = listOf(
        SimpleWeatherForecast(
            location = "New York",
            temperature = 25,
            conditions = "Sunny"
        ),
        SimpleWeatherForecast(
            location = "London",
            temperature = 18,
            conditions = "Cloudy"
        )
    )
    // Generate JSON Schema
    val forecastStructure = JsonStructuredData.createJsonStructure<SimpleWeatherForecast>(
        schemaFormat = JsonSchemaGenerator.SchemaFormat.JsonSchema,
        examples = exampleForecasts,
        schemaType = JsonStructuredData.JsonSchemaType.SIMPLE
    )
    // Define the agent strategy
    val agentStrategy = strategy("weather-forecast") {
        val setup by nodeLLMRequest()
        val getStructuredForecast by node<Message.Response, String> { _ ->
            val structuredResponse = llm.writeSession {
                this.requestLLMStructured(
                    structure = forecastStructure,
                    fixingModel = OpenAIModels.Chat.GPT4o,
                )
            }
            """
            Response structure:
            $structuredResponse
            """.trimIndent()
        }
        edge(nodeStart forwardTo setup)
        edge(setup forwardTo getStructuredForecast)
        edge(getStructuredForecast forwardTo nodeFinish)
    }
    // Configure and run the agent
    val agentConfig = AIAgentConfig(
        prompt = prompt("weather-forecast-prompt") {
            system(
                """
                You are a weather forecasting assistant.
                When asked for a weather forecast, provide a realistic but fictional forecast.
                """.trimIndent()
            )
        },
        model = OpenAIModels.Chat.GPT4o,
        maxAgentIterations = 5
    )
    val runner = AIAgent(
        promptExecutor = simpleOpenAIExecutor(token),
        toolRegistry = ToolRegistry.EMPTY,
        strategy = agentStrategy,
        agentConfig = agentConfig
    )
    runner.run("Get weather forecast for Paris")
}
Best practices
- Use clear descriptions: provide clear and detailed descriptions using @LLMDescription annotations to help the LLM understand the expected data.
- Provide examples: include examples of valid data structures to guide the LLM.
- Handle errors gracefully: implement error handling for cases where the LLM does not produce a valid structure.
- Use appropriate schema types: select the schema format and type based on your needs and the capabilities of the LLM you are using.
- Test with different models: LLMs vary in their ability to follow structured formats, so test with multiple models if possible.
- Start simple: begin with simple structures and gradually add complexity as needed.
- Use polymorphism carefully: while the API supports polymorphism with sealed classes, it can be more challenging for LLMs to handle correctly.
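Graceful error handling covers two different failures: text that cannot be decoded at all, and text that decodes but carries implausible values. The sketch below separates the two with plain kotlinx.serialization. The validate function and its rules are hypothetical and only illustrate the kind of check an application might add after decoding.

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

@Serializable
data class WeatherForecast(
    val temperature: Int,
    val conditions: String,
    val precipitation: Int
)

// Hypothetical domain checks: a structurally valid response can still
// contain values that make no sense for the application.
fun validate(forecast: WeatherForecast): List<String> = buildList {
    if (forecast.precipitation !in 0..100) {
        add("precipitation must be between 0 and 100, got ${forecast.precipitation}")
    }
    if (forecast.conditions.isBlank()) {
        add("conditions must not be blank")
    }
}

fun main() {
    // Hypothetical response: well-formed JSON, but the percentage is out of range.
    val raw = """{"temperature":25,"conditions":"Sunny","precipitation":140}"""

    runCatching { Json.decodeFromString(WeatherForecast.serializer(), raw) }
        .onSuccess { forecast ->
            val problems = validate(forecast)
            if (problems.isEmpty()) println("OK: $forecast")
            else println("Invalid values: $problems")
        }
        .onFailure { error ->
            println("Could not parse response: ${error.message}")
        }
    // Prints: Invalid values: [precipitation must be between 0 and 100, got 140]
}
```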