Streaming LLM provider support for Gestura
This module provides streaming capabilities for LLM responses, enabling real-time token-by-token delivery to the frontend with cancellation support.
Modules§
- pricing - Pricing per 1M tokens (input/output) for various providers. Prices are in USD and updated as of January 2026.
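Per-1M-token pricing reduces to a simple linear cost formula. A minimal sketch, assuming a hypothetical `ModelPricing` struct (the real `pricing` module's layout is not shown here):

```rust
/// Hypothetical per-1M-token price pair; illustrative only, the real
/// `pricing` module may use a different representation.
pub struct ModelPricing {
    pub input_per_mtok: f64,  // USD per 1M input tokens
    pub output_per_mtok: f64, // USD per 1M output tokens
}

/// Estimate request cost in USD from raw token counts.
pub fn estimate_cost(p: &ModelPricing, input_tokens: u64, output_tokens: u64) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * p.input_per_mtok
        + (output_tokens as f64 / 1_000_000.0) * p.output_per_mtok
}
```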
Structs§
- AnthropicStreamRequest - Stream a response from Anthropic Claude API
- CancellationToken - Cancellation token for streaming requests
- PublicNarration - Structured public narration content rendered between major loop events.
- TaskRuntimeSnapshot - Runtime-authored task scheduler snapshot streamed to UI surfaces.
- TaskRuntimeTaskView - Compact task view for runtime-authored task-state updates.
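A cancellation token for streaming is typically a cloneable flag that the streaming loop polls between chunks. A minimal sketch under that assumption (the real `CancellationToken` API may differ):

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};

/// Minimal cancellation-token sketch: clones share one atomic flag, so
/// any holder can cancel and the streaming loop can poll cheaply.
#[derive(Clone, Default)]
pub struct SimpleCancellationToken {
    cancelled: Arc<AtomicBool>,
}

impl SimpleCancellationToken {
    /// Request cancellation; visible to every clone of this token.
    pub fn cancel(&self) {
        self.cancelled.store(true, Ordering::SeqCst);
    }

    /// Check between chunks whether the stream should stop.
    pub fn is_cancelled(&self) -> bool {
        self.cancelled.load(Ordering::SeqCst)
    }
}
```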
Enums§
- CancellationDisposition - Requested interruption disposition for a streaming request.
- NarrationStage - Public-facing narration stage for brief between-tool updates.
- ShellOutputStream - Which output stream a shell chunk originated from.
- ShellProcessState - Lifecycle state of a shell process.
- ShellSessionState - Lifecycle state of a long-lived interactive shell session.
- StreamChunk - A chunk of streaming response
- TokenUsageStatus - Token usage status indicator for visual feedback
Functions§
- split_think_blocks - Split a complete assistant message into (user-facing text, optional thinking) based on `<think>...</think>` blocks.
- start_streaming - Start a streaming LLM request based on config.
- start_streaming_with_fallback - Start streaming with fallback to secondary provider on failure. Implements jittered exponential backoff with rate-limit-aware delay selection before falling back.
- stream_anthropic
- stream_gemini - Stream a response from Google Gemini API (Generative Language API).
- stream_ollama - Stream a response from Ollama local API
- stream_openai
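The `split_think_blocks` behavior described above can be sketched as a pure string scan. The real function's signature is not shown; this hypothetical version returns the visible text plus the concatenated contents of all `<think>` blocks:

```rust
/// Sketch of splitting `<think>...</think>` content out of an assistant
/// message. Hypothetical signature; the real `split_think_blocks` may
/// differ in name, return type, and edge-case handling.
pub fn split_think_blocks_sketch(message: &str) -> (String, Option<String>) {
    let mut visible = String::new();
    let mut thinking = String::new();
    let mut rest = message;
    while let Some(open) = rest.find("<think>") {
        visible.push_str(&rest[..open]);
        let after_open = &rest[open + "<think>".len()..];
        match after_open.find("</think>") {
            Some(close) => {
                thinking.push_str(&after_open[..close]);
                rest = &after_open[close + "</think>".len()..];
            }
            None => {
                // Unterminated block: treat the remainder as thinking.
                thinking.push_str(after_open);
                rest = "";
            }
        }
    }
    visible.push_str(rest);
    let visible = visible.trim().to_string();
    let thinking = thinking.trim().to_string();
    let thinking = if thinking.is_empty() { None } else { Some(thinking) };
    (visible, thinking)
}
```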
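The jittered exponential backoff with rate-limit-aware delay selection used by `start_streaming_with_fallback` can be sketched as a pure delay calculator. The constants and the exact policy here are assumptions, not taken from the real implementation; the jitter factor is passed in so a caller can supply a random value in `[0, 1]`:

```rust
/// Backoff-delay sketch: base * 2^attempt with full jitter and a cap,
/// where a server-provided Retry-After hint wins when present.
/// All constants are illustrative assumptions.
pub fn backoff_delay_ms(attempt: u32, retry_after_ms: Option<u64>, jitter: f64) -> u64 {
    const BASE_MS: u64 = 500;
    const CAP_MS: u64 = 30_000;
    if let Some(hint) = retry_after_ms {
        // Rate-limit-aware: honor the server's hint, but cap it.
        return hint.min(CAP_MS);
    }
    // Exponential growth, saturating and capped to avoid overflow.
    let exp = BASE_MS.saturating_mul(1u64 << attempt.min(16)).min(CAP_MS);
    // Full jitter: scale by a caller-supplied factor in [0, 1].
    (exp as f64 * jitter.clamp(0.0, 1.0)) as u64
}
```

Passing jitter as a parameter keeps the function deterministic and testable; production code would draw it from a RNG per retry.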