Handling failures
This page describes how to handle failures for LLM clients and prompt executors using the built-in retry and timeout mechanisms.
Retry functionality
When working with LLM providers, transient errors like rate limits or temporary service unavailability may occur.
The RetryingLLMClient decorator adds automatic retry logic to any LLM client.
Basic usage
Wrap any existing client with the retry capability:
// Wrap any client with the retry capability
val client = OpenAILLMClient(apiKey)
val resilientClient = RetryingLLMClient(client)
// Now all operations will automatically retry on transient errors
val response = resilientClient.execute(prompt, OpenAIModels.Chat.GPT4o)
Configuring retry behavior
Koog provides several predefined retry configurations:
| Configuration | Max attempts | Initial delay | Max delay | Use case |
|---|---|---|---|---|
| RetryConfig.DISABLED | 1 (no retry) | - | - | Development and testing |
| RetryConfig.CONSERVATIVE | 3 | 2s | 30s | Normal production use |
| RetryConfig.AGGRESSIVE | 5 | 500ms | 20s | Critical operations |
| RetryConfig.PRODUCTION | 3 | 1s | 20s | Recommended default |
You can use them directly or create custom configurations:
// Use the predefined configuration
val conservativeClient = RetryingLLMClient(
    delegate = client,
    config = RetryConfig.CONSERVATIVE
)
// Or create a custom configuration
val customClient = RetryingLLMClient(
    delegate = client,
    config = RetryConfig(
        maxAttempts = 5,
        initialDelay = 1.seconds,
        maxDelay = 30.seconds,
        backoffMultiplier = 2.0,
        jitterFactor = 0.2
    )
)
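To give a feel for how these parameters interact, here is a minimal, self-contained sketch of a common exponential backoff scheme with jitter. It does not use Koog and is not taken from its source: the helper `backoffDelayMillis` is hypothetical, and the exact formula Koog applies (in particular, whether jitter is added before or after the cap) is an assumption.

```kotlin
import kotlin.math.min
import kotlin.math.pow
import kotlin.random.Random

// Hypothetical helper, not part of Koog: computes the wait before retry number `attempt` (1-based).
fun backoffDelayMillis(
    attempt: Int,
    initialDelayMillis: Long = 1_000,
    maxDelayMillis: Long = 30_000,
    backoffMultiplier: Double = 2.0,
    jitterFactor: Double = 0.2,
    random: Random = Random.Default
): Long {
    require(attempt >= 1) { "attempt must be 1 or greater" }
    // Exponential growth: initial, initial * m, initial * m^2, ...
    val exponential = initialDelayMillis * backoffMultiplier.pow(attempt - 1)
    // Cap the base delay at the configured maximum.
    val capped = min(exponential, maxDelayMillis.toDouble())
    // Assumption: jitter adds a random extra of up to jitterFactor * capped,
    // so that many clients do not retry at the same instant.
    val jitter = capped * jitterFactor * random.nextDouble()
    return (capped + jitter).toLong()
}

fun main() {
    // With maxAttempts = 5 there are at most four waits between attempts.
    for (attempt in 1..4) {
        println("retry $attempt: wait about ${backoffDelayMillis(attempt)} ms")
    }
}
```

Under these assumptions, the base delays for the custom configuration above would be 1s, 2s, 4s, and 8s, each extended by up to 20% of random jitter, and never growing beyond the 30s cap plus jitter.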
Retry error patterns
By default, the RetryingLLMClient recognizes common transient errors.
This behavior is controlled by RetryConfig.retryablePatterns.
Each pattern is a RetryablePattern that checks the error message from a failed request and determines whether the request should be retried.
Koog provides predefined retry configurations and patterns that work across all supported LLM providers. You can keep the defaults or customize them for your specific needs.
Pattern types
You can use the following pattern types and combine any number of them:
- RetryablePattern.Status: Matches a specific HTTP status code in the error message (such as 429, 500, or 502).
- RetryablePattern.Keyword: Matches a keyword in the error message (such as rate limit or request timeout).
- RetryablePattern.Regex: Matches a regular expression in the error message.
- RetryablePattern.Custom: Matches using custom logic defined in a lambda function.
If any pattern returns true, the error is considered retryable, and the LLM client can retry the request.
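The "any pattern matches" rule can be sketched without Koog as follows. The `SketchPattern` types and the `isRetryable` function are hypothetical stand-ins written for illustration; Koog's real RetryablePattern classes and their matching details (for example, case sensitivity or how status codes are located in the message) may differ.

```kotlin
// Hypothetical stand-ins for illustration only; not Koog's RetryablePattern classes.
sealed interface SketchPattern {
    fun matches(message: String): Boolean

    class Status(code: Int) : SketchPattern {
        // Assumption: match the code only as a standalone number, so 429 does not match "14290".
        private val regex = Regex("\\b$code\\b")
        override fun matches(message: String) = regex.containsMatchIn(message)
    }

    class Keyword(private val keyword: String) : SketchPattern {
        // Assumption: keywords are compared case-insensitively.
        override fun matches(message: String) = message.contains(keyword, ignoreCase = true)
    }

    class Matching(private val regex: Regex) : SketchPattern {
        override fun matches(message: String) = regex.containsMatchIn(message)
    }

    class Custom(private val predicate: (String) -> Boolean) : SketchPattern {
        override fun matches(message: String) = predicate(message)
    }
}

// An error is retryable as soon as one pattern matches its message.
fun isRetryable(message: String, patterns: List<SketchPattern>): Boolean =
    patterns.any { it.matches(message) }

fun main() {
    val patterns = listOf(
        SketchPattern.Status(429),
        SketchPattern.Keyword("rate limit"),
        SketchPattern.Matching(Regex("ERR_\\d+")),
        SketchPattern.Custom { it.contains("temporary") && it.length > 20 }
    )
    println(isRetryable("HTTP 429: Too Many Requests", patterns)) // prints "true"
    println(isRetryable("HTTP 401: invalid api key", patterns))   // prints "false"
}
```

Because the check is a logical OR over all patterns, adding a pattern can only widen the set of retried errors. Broad keywords are therefore worth reviewing so that permanent failures, such as authentication errors, are not retried.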
Default patterns
Unless you customize the retry configuration, the following patterns are used by default:
- HTTP status codes:
  - 429: Rate limit
  - 500: Internal server error
  - 502: Bad gateway
  - 503: Service unavailable
  - 504: Gateway timeout
  - 529: Anthropic overloaded
- Error keywords:
  - rate limit
  - too many requests
  - request timeout
  - connection timeout
  - read timeout
  - write timeout
  - connection reset by peer
  - connection refused
  - temporarily unavailable
  - service unavailable
These default patterns are defined in Koog as RetryConfig.DEFAULT_PATTERNS.
Custom patterns
You can define custom patterns for your specific needs:
val config = RetryConfig(
    retryablePatterns = listOf(
        RetryablePattern.Status(429),               // Specific status code
        RetryablePattern.Keyword("quota"),          // Keyword in error message
        RetryablePattern.Regex(Regex("ERR_\\d+")),  // Custom regex pattern
        RetryablePattern.Custom { error ->          // Custom logic
            error.contains("temporary") && error.length > 20
        }
    )
)
You can also append custom patterns to the default RetryConfig.DEFAULT_PATTERNS:
val config = RetryConfig(
    retryablePatterns = RetryConfig.DEFAULT_PATTERNS + listOf(
        RetryablePattern.Keyword("custom_error")
    )
)
Streaming with retry
Streaming operations can optionally be retried. This feature is disabled by default.
val config = RetryConfig(
maxAttempts = 3
)
val client = RetryingLLMClient(baseClient, config)
val stream = client.executeStreaming(prompt, OpenAIModels.Chat.GPT4o)
Note
Streaming retries only apply to connection failures that occur before the first token is received. After streaming has started, any errors will be passed through.
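The rule in this note can be illustrated with a small, self-contained sketch. It uses a plain Kotlin `Sequence` as a stand-in for a token stream and a hypothetical helper, `retryUntilFirstItem`. It is not Koog's implementation, and for brevity it does not wait between attempts as a real retry loop would.

```kotlin
// Hypothetical helper, not Koog's implementation: retries opening a stream
// only while no item has been delivered to the consumer yet.
fun <T> retryUntilFirstItem(maxAttempts: Int, openStream: () -> Sequence<T>): Sequence<T> = sequence {
    var attempt = 1
    while (true) {
        var started = false
        try {
            for (item in openStream()) {
                started = true
                yield(item)
            }
            return@sequence // The stream completed normally.
        } catch (e: Exception) {
            // Once an item has been emitted, or attempts are exhausted, pass the error through.
            if (started || attempt >= maxAttempts) throw e
            attempt++
        }
    }
}

fun main() {
    var opens = 0
    val tokens = retryUntilFirstItem(maxAttempts = 3) {
        opens++
        // Simulate two connection failures before the stream opens successfully.
        if (opens < 3) throw IllegalStateException("connection refused")
        sequenceOf("Hello", ", ", "world")
    }
    println(tokens.joinToString(""))
    println("stream opened $opens times")
}
```

Restarting after the first item would replay content the consumer has already seen, which is why a failure in the middle of a stream is surfaced to the caller instead of being retried.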
Retry with prompt executors
When working with prompt executors, you can wrap the underlying LLM client with a retry mechanism before creating the executor. To learn more about prompt executors, see Prompt executors.
// Single provider executor with retry
val resilientClient = RetryingLLMClient(
    OpenAILLMClient(System.getenv("OPENAI_API_KEY")),
    RetryConfig.PRODUCTION
)
val executor = SingleLLMPromptExecutor(resilientClient)
// Multi-provider executor with flexible client configuration
val multiExecutor = MultiLLMPromptExecutor(
    LLMProvider.OpenAI to RetryingLLMClient(
        OpenAILLMClient(System.getenv("OPENAI_API_KEY")),
        RetryConfig.CONSERVATIVE
    ),
    LLMProvider.Anthropic to RetryingLLMClient(
        AnthropicLLMClient(System.getenv("ANTHROPIC_API_KEY")),
        RetryConfig.AGGRESSIVE
    ),
    // The Bedrock client already has a built-in AWS SDK retry
    LLMProvider.Bedrock to BedrockLLMClient(
        identityProvider = StaticCredentialsProvider {
            accessKeyId = System.getenv("AWS_ACCESS_KEY_ID")
            secretAccessKey = System.getenv("AWS_SECRET_ACCESS_KEY")
            sessionToken = System.getenv("AWS_SESSION_TOKEN")
        },
    ),
)
Timeout configuration
All LLM clients support timeout configuration to prevent hanging requests.
You can specify timeout values for network connections when creating the client by using the ConnectionTimeoutConfig class:
val client = OpenAILLMClient(
    apiKey = apiKey,
    settings = OpenAIClientSettings(
        timeoutConfig = ConnectionTimeoutConfig(
            connectTimeoutMillis = 5000,   // 5 seconds to establish connection
            requestTimeoutMillis = 60000,  // 60 seconds for the entire request
            socketTimeoutMillis = 120000   // 120 seconds for data on the socket
        )
    )
)
Tip
For long-running or streaming calls, set higher values for requestTimeoutMillis and socketTimeoutMillis.
Error handling
When working with LLMs in production, you need to implement error handling, including:
- Try-catch blocks to handle unexpected errors.
- Logging errors with context for debugging.
- Fallbacks for critical operations.
- Monitoring retry patterns to identify recurring issues.
Here is an example of error handling:
fun main() {
    runBlocking {
        val logger = LoggerFactory.getLogger("Example")
        val resilientClient = RetryingLLMClient(
            OpenAILLMClient(System.getenv("OPENAI_API_KEY")),
            RetryConfig.PRODUCTION
        )
        val prompt = prompt("test") { user("Hello") }
        val model = OpenAIModels.Chat.GPT4o
        fun processResponse(response: Any) { /* implementation */ }
        fun scheduleRetryLater() { /* implementation */ }
        fun notifyAdministrator() { /* implementation */ }
        fun useDefaultResponse() { /* implementation */ }
        try {
            val response = resilientClient.execute(prompt, model)
            processResponse(response)
        } catch (e: Exception) {
            logger.error("LLM operation failed", e)
            when {
                e.message?.contains("rate limit") == true -> {
                    // Handle rate limiting specifically
                    scheduleRetryLater()
                }
                e.message?.contains("invalid api key") == true -> {
                    // Handle authentication errors
                    notifyAdministrator()
                }
                else -> {
                    // Fall back to an alternative solution
                    useDefaultResponse()
                }
            }
        }
    }
}