Everything under the hood.
Simple REST API. No SSML, no markup — the engine handles pacing and emphasis.
Audio Output
- Formats
- MP3, WAV, FLAC, OGG, raw PCM
- Sample Rates
- 22.05 kHz, 44.1 kHz, 48 kHz
- Bit Depth
- 16-bit, 24-bit
- Channels
- Mono (stereo on request)
- Voices
- Emily (general prose). Additional voices in development.
Platform
- Max Input
- 100,000 characters per request. Batch mode for full books.
- Latency
- Streaming: first audio in <500ms. Full render: ~1s per minute of output.
- Preprocessing
- Optional editorial text pipeline with custom pronunciation lexicon.
- Authentication
- Bearer token (API key)
- Rate Limits
- 1,000 requests/hour (higher limits available)
import { StarbeeLabs } from 'starbeelabs';
const client = new StarbeeLabs('your-api-key');
const audio = await client.speak({
text: "The decline of Rome was the natural and " +
"inevitable effect of immoderate greatness.",
voice: 'emily',
format: 'mp3',
sampleRate: 48000
});
// → AudioBuffer (48kHz, mono, 24-bit)
from starbeelabs import Client
client = Client("your-api-key")
audio = client.speak(
text="The decline of Rome was the natural and "
"inevitable effect of immoderate greatness.",
voice="emily",
format="mp3",
sample_rate=48000
)
# → bytes (MP3, 48kHz, mono)
curl https://api.starbeelabs.org/v1/speak \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{
"text": "The decline of Rome was the natural and inevitable effect of immoderate greatness.",
"voice": "emily",
"format": "mp3",
"sample_rate": 48000
}' \
-o output.mp3
# → output.mp3 (MP3, 48 kHz, mono)
Simple pricing.
Priced per hour of generated audio. No seat fees. No subscriptions.
Standard
- MP3 & WAV output
- 48 kHz, mono
- 1,000 req/hour
- Email support
Volume
- All formats (MP3, WAV, FLAC, OGG)
- 48 kHz, 24-bit
- 10,000 req/hour
- Custom pronunciation lexicon
Enterprise
- Dedicated infrastructure
- Batch processing (full books)
- Unlimited rate · SLA
- Contact for details
Publishing inquiries and early API access.
Voices
Trained for long-form prose — not short prompts or chatbots.
Available Now
- Emily
- General-purpose prose voice. Trained on thousands of hours of narrated nonfiction. Clear, authoritative, and warm — designed for extended listening without fatigue.
- Register Adaptation
- Emily adjusts register automatically based on content. Military dispatches sound formal. Travel narratives sound expansive. Legal texts sound deliberate. No manual tuning required.
- Pronunciation
- Built-in lexicon for historical names, places, and foreign terms. Gibbon's Theodora, Mommsen's Sulpicius — handled correctly out of the box.
In Development
- Additional Voices
- New voices are being trained for specific use cases: verse narration, dramatic dialogue, and documentary-style delivery. Each voice undergoes the same editorial training pipeline as Emily.
- Custom Voice Training
- Enterprise customers can commission a custom voice trained to their specifications. This includes register profiling, pronunciation tuning, and prosody calibration for your specific content domain.
- Voice Consistency
- The engine maintains consistent voice character across arbitrarily long texts. Chapter 1 and Chapter 170 sound like the same narrator. No drift, no degradation.
Output Formats
Production-quality audio in every major format.
Supported Formats
- MP3
- Variable bitrate up to 320 kbps. Ideal for distribution, audiobook platforms, and streaming. The most widely compatible option.
- WAV
- Uncompressed PCM audio. Use this when you need lossless quality for post-production, mastering, or archival.
- FLAC
- Lossless compression. Typically 50–60% of the size of WAV with bit-perfect reproduction. Good for storage-conscious lossless workflows.
- OGG Vorbis
- Open-source lossy format. Excellent quality-to-size ratio. Preferred for game engines and open-source projects.
- Raw PCM
- Headerless PCM data for custom pipelines. You specify sample rate, bit depth, and endianness. For applications that need direct buffer access.
Encoding Details
- Sample Rates
- 22.05 kHz (speech/podcast), 44.1 kHz (CD quality), or 48 kHz (broadcast/film). Default is 48 kHz.
- Bit Depth
- 16-bit (standard) or 24-bit (studio). 24-bit provides greater dynamic range for post-production work.
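As a rough illustration of the Raw PCM option, the sketch below wraps headerless PCM samples in a WAV container with Python's standard wave module and estimates uncompressed size from sample rate, bit depth, and channel count. The helper names pcm_size_bytes and pcm_to_wav are hypothetical, and the input is assumed to be little-endian signed PCM; the API's actual raw output layout may differ.

```python
import io
import wave


def pcm_size_bytes(seconds: float, sample_rate: int = 48000,
                   bit_depth: int = 16, channels: int = 1) -> int:
    """Uncompressed size: rate x bytes-per-sample x channels x duration."""
    return int(seconds * sample_rate * (bit_depth // 8) * channels)


def pcm_to_wav(pcm: bytes, sample_rate: int = 48000,
               bit_depth: int = 16, channels: int = 1) -> bytes:
    """Wrap headerless little-endian PCM in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bit_depth // 8)  # 2 bytes for 16-bit, 3 for 24-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


# One second of 16-bit mono silence at 48 kHz (stand-in for real audio).
silence = bytes(pcm_size_bytes(1.0))
wav_bytes = pcm_to_wav(silence)
```

At 48 kHz, 16-bit mono, one second of audio is 96,000 bytes before the WAV header is added, which is a quick way to budget storage for uncompressed output.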
Latency & Performance
Fast enough for real-time, thorough enough for books.
Streaming Mode
- Time to First Audio
- Under 500 milliseconds. The engine begins streaming audio chunks before the full render is complete. Suitable for interactive applications.
- Chunk Delivery
- Audio is delivered in sequential chunks via Server-Sent Events or chunked HTTP transfer. Each chunk is a valid audio fragment that can be played immediately.
- Connection
- Persistent HTTPS connections. Keep-alive supported for sequential requests without reconnection overhead.
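A minimal sketch of consuming chunked delivery over Server-Sent Events follows. The event framing (data: lines, blank line between events) is standard SSE, but the payload encoding is an assumption: this example pretends each event carries one base64-encoded audio chunk, which the text above does not specify. The function name iter_audio_chunks is hypothetical.

```python
import base64
from typing import Iterable, Iterator


def iter_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio from SSE events, one chunk per event.

    Assumption: each event's data field holds base64-encoded audio.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            # In SSE, a single optional space follows the colon.
            value = line[5:]
            data_parts.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_parts:
            # A blank line ends the event; multi-line data is joined by "\n".
            yield base64.b64decode("\n".join(data_parts))
            data_parts = []
        # Anything else (comments, other fields) is ignored in this sketch.


# Simulated stream standing in for a live HTTP response.
stream = [
    "data: " + base64.b64encode(b"chunk-1").decode() + "\n",
    "\n",
    ": keep-alive comment, ignored\n",
    "data: " + base64.b64encode(b"chunk-2").decode() + "\n",
    "\n",
]
audio = b"".join(iter_audio_chunks(stream))
```

Because each chunk is described as independently playable, a real client could hand every yielded chunk to a player instead of joining them first.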
Batch Mode
- Full Render Speed
- Approximately 1 second of processing per minute of output audio. A 10-minute chapter renders in roughly 10 seconds.
- Book-Length Processing
- Batch mode handles full books, submitted in requests of up to 100,000 characters each. The engine processes chapters sequentially, maintaining voice consistency and prosody continuity across chapter boundaries.
- Webhook Delivery
- For batch jobs, provide a webhook URL. The engine POSTs the completed audio file when rendering finishes. No polling required.
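If a manuscript is longer than the 100,000-character request limit, one way to prepare it is to pack whole paragraphs into chunks that stay under the limit. The sketch below does that greedily and also applies the stated render speed (about 1 second per minute of output) as a rough estimate. Both function names are hypothetical, and splitting on blank lines is an assumption about how the source text is formatted.

```python
from typing import List

MAX_CHARS = 100_000  # per-request input limit stated above


def split_for_requests(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Greedily pack paragraphs into chunks of at most `limit` characters."""
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        if len(paragraph) > limit:
            raise ValueError("paragraph exceeds the per-request limit")
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this paragraph would overflow: close the chunk.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def estimated_render_seconds(audio_minutes: float) -> float:
    """Roughly 1 second of processing per minute of output audio."""
    return audio_minutes * 1.0
```

Splitting at paragraph boundaries rather than at a fixed character offset avoids cutting a sentence in half, which would otherwise disturb pacing at the seam.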
Text Preprocessing
Editorial preparation that makes the difference between reading and narrating.
The Pipeline
- Editorial Normalization
- Footnote markers, chapter headings, epigraphs, block quotes — the pipeline recognizes document structure and adjusts pacing accordingly. A footnote doesn't sound like body text.
- Abbreviation Expansion
- Handles “cf.”, “viz.”, “ibid.”, regnal numbers (Henry VIII), dates, and scholarly notation. Expanded naturally, not robotically.
- Sentence Boundary Detection
- Disambiguates periods in abbreviations, initials, and decimals from sentence endings. “Dr. Johnson arrived at 3.15” is one sentence, not three.
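The sentence boundary rule can be sketched with a few heuristics. This is not the engine's actual pipeline, only an illustration under stated assumptions: a tiny hand-picked abbreviation list, a period followed by a digit treated as a decimal, and a single capital letter before a period treated as an initial (which would misfire on a sentence ending in "I.").

```python
import re
from typing import List

# Small illustrative list; a production lexicon would be far larger.
ABBREVIATIONS = {"dr", "mr", "mrs", "cf", "viz", "ibid"}


def split_sentences(text: str) -> List[str]:
    """Split on sentence-final periods, skipping abbreviations and decimals."""
    sentences: List[str] = []
    start = 0
    for match in re.finditer(r"\.", text):
        i = match.start()
        before = re.search(r"(\w+)$", text[:i])
        word = before.group(1) if before else ""
        next_char = text[i + 1:i + 2]
        if word.lower() in ABBREVIATIONS:
            continue  # "Dr." does not end a sentence
        if next_char.isdigit():
            continue  # "3.15" is a decimal, not a boundary
        if len(word) == 1 and word.isupper():
            continue  # single-letter initial such as "J."
        sentences.append(text[start:i + 1].strip())
        start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

With these rules, "Dr. Johnson arrived at 3.15" stays whole, whereas splitting on every period would cut it into three fragments.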
Pronunciation
- Golden Ibis Dictionary
- A curated pronunciation lexicon covering thousands of historical names, places, and foreign terms. Continuously expanded as new texts are processed.
- Custom Lexicon
- Volume and Enterprise tiers can upload a custom pronunciation dictionary. Useful for domain-specific terminology, character names, or house style preferences.
- Per-Book Review
- For publishing clients, our editorial team reviews pronunciation for each title before final render. A/B comparison sweeps catch edge cases the dictionary misses.
Authentication
Simple bearer token authentication. No OAuth, no sessions.
Getting Started
- API Keys
- Each account receives an API key upon registration. Include it as a Bearer token in the Authorization header of every request.
- Key Management
- Rotate keys at any time from your dashboard. Old keys are immediately invalidated. You can maintain multiple active keys for different environments (dev, staging, production).
- Security
- All API traffic is encrypted via TLS 1.3. Keys are stored hashed — we cannot retrieve your key after issuance. If lost, generate a new one.
Request Format
- Header
- Authorization: Bearer your-api-key
- Content Type
- All requests use application/json. Audio is returned as a binary response with the appropriate MIME type.
- Error Responses
- Standard HTTP status codes. 401 for invalid key, 429 for rate limit exceeded, 400 for malformed request. All errors include a JSON body with a human-readable message.
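For readers not using an SDK, the request format above can be sketched with Python's standard library. The example builds the request without sending it and maps the documented status codes to short descriptions. The helper names are hypothetical, and the JSON error field name "message" is an assumption; the text only says the body contains a human-readable message.

```python
import json
import urllib.request

API_URL = "https://api.starbeelabs.org/v1/speak"


def build_speak_request(api_key: str, text: str, voice: str = "emily",
                        audio_format: str = "mp3",
                        sample_rate: int = 48000) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying a Bearer token."""
    body = json.dumps({"text": text, "voice": voice,
                       "format": audio_format, "sample_rate": sample_rate})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def describe_error(status: int, body: bytes) -> str:
    """Combine the documented status codes with the JSON error body."""
    meaning = {400: "malformed request", 401: "invalid key",
               429: "rate limit exceeded"}.get(status, "unexpected error")
    try:
        # Assumption: the human-readable text lives under "message".
        detail = json.loads(body).get("message", "")
    except (ValueError, AttributeError):
        detail = ""
    return f"{status} {meaning}: {detail}" if detail else f"{status} {meaning}"


request = build_speak_request("your-api-key", "Hello.")
```

Passing the request to urllib.request.urlopen would send it; on failure, urllib raises HTTPError, whose code and body can be fed to describe_error.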
Rate Limits
Generous defaults. Higher limits on request.
Default Limits
- Standard Tier
- 1,000 requests per hour. Each request can contain up to 100,000 characters. This is enough to render roughly 15–20 hours of audio per hour of clock time.
- Volume Tier
- 10,000 requests per hour. Designed for production pipelines processing multiple titles concurrently.
- Burst Allowance
- Short bursts above your hourly limit are tolerated (up to 2x for 60 seconds). Sustained overages return 429 status codes.
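Since sustained overages return 429, a client will usually want to retry with increasing delays. The sketch below shows one common approach, exponential backoff; it is a client-side convention rather than anything the API prescribes. The send callable is a hypothetical stand-in for whatever issues the HTTP request and returns a (status, body) pair.

```python
import time
from typing import Callable, Tuple


def send_with_backoff(send: Callable[[], Tuple[int, bytes]],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep
                      ) -> Tuple[int, bytes]:
    """Call `send` until it stops returning 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Double the wait each time: 1s, 2s, 4s, ...
            sleep(base_delay * 2 ** attempt)
    return status, body
```

The sleep parameter is injectable so the retry logic can be exercised in tests without actually waiting.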
Enterprise & Custom
- Unlimited Rate
- Enterprise accounts have no rate limits. Requests are queued and processed on dedicated infrastructure with guaranteed throughput.
- Concurrency
- Standard: 5 concurrent requests. Volume: 20 concurrent. Enterprise: unlimited. Concurrent requests process in parallel for faster batch throughput.
- Monitoring
- Usage dashboards show request counts, audio hours generated, error rates, and average latency. Exportable as CSV for your own analytics.
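To stay within the concurrency limits above, a batch client can cap its worker pool at the tier's limit. This is a minimal sketch using the standard library; render is a hypothetical callable that submits one chapter and returns its audio bytes, and the CONCURRENCY table simply restates the Standard and Volume figures.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Limits stated above; Enterprise is unlimited and omitted here.
CONCURRENCY = {"standard": 5, "volume": 20}


def render_all(chapters: List[str], render: Callable[[str], bytes],
               tier: str = "standard") -> List[bytes]:
    """Render chapters in parallel without exceeding the tier's limit."""
    with ThreadPoolExecutor(max_workers=CONCURRENCY[tier]) as pool:
        # map() preserves input order, so audio returns in chapter order.
        return list(pool.map(render, chapters))
```

Threads suit this workload because each worker spends its time waiting on the network rather than computing.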
Code Examples
Three lines to your first audio. SDKs for JavaScript and Python, or use cURL directly.
import { StarbeeLabs } from 'starbeelabs';
const client = new StarbeeLabs('your-api-key');
const audio = await client.speak({
text: "The decline of Rome was the natural and " +
"inevitable effect of immoderate greatness.",
voice: 'emily',
format: 'mp3',
sampleRate: 48000
});
// → AudioBuffer (48kHz, mono, 24-bit)
from starbeelabs import Client
client = Client("your-api-key")
audio = client.speak(
text="The decline of Rome was the natural and "
"inevitable effect of immoderate greatness.",
voice="emily",
format="mp3",
sample_rate=48000
)
# → bytes (MP3, 48kHz, mono)
curl https://api.starbeelabs.org/v1/speak \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{
"text": "The decline of Rome was the natural and inevitable effect of immoderate greatness.",
"voice": "emily",
"format": "mp3",
"sample_rate": 48000
}' \
-o output.mp3
# → output.mp3 (MP3, 48 kHz, mono)
SDKs
- JavaScript / TypeScript
- Install: npm install starbeelabs. Works in Node.js 18+ and modern browsers (via fetch).
- Python
- Install: pip install starbeelabs. Python 3.8+. Async support via asyncio.
Direct API
- REST Endpoint
- POST https://api.starbeelabs.org/v1/speak
- Response
- Binary audio data with Content-Type: audio/mpeg (or appropriate MIME type). Stream with Accept: text/event-stream for chunked delivery.