Prosody Engine API

Send any text — a sentence, a chapter, or an entire book — and receive production-quality narration. Trained prosody, editorial pronunciation, and natural pacing.

48 kHz · 24-bit

Studio-quality output

<500ms Latency

Streaming first audio

100K Characters

Per request · batch mode

Custom Lexicon

Your pronunciation rules

Everything under the hood.

Simple REST API. No SSML, no markup — the engine handles pacing and emphasis.

Audio Output

Formats
MP3, WAV, FLAC, OGG, raw PCM
Sample Rates
22.05 kHz, 44.1 kHz, 48 kHz
Bit Depth
16-bit, 24-bit
Channels
Mono (stereo on request)
Voices
Emily (general prose). Additional voices in development.

Platform

Max Input
100,000 characters per request. Batch mode for full books.
Latency
Streaming: first audio in <500ms. Full render: ~1s per minute of output.
Preprocessing
Optional editorial text pipeline with custom pronunciation lexicon.
Authentication
Bearer token (API key)
Rate Limits
1,000 requests/hour (higher limits available)

Simple pricing.

Priced per hour of generated audio. No seat fees. No subscriptions.

Standard (Start Here)
$8 / hour of audio
  • MP3 & WAV output
  • 48 kHz, mono
  • 1,000 req/hour
  • Email support

Volume (Best Value)
$5 / hour of audio
  • All formats (MP3, WAV, FLAC, OGG)
  • 48 kHz, 24-bit
  • 10,000 req/hour
  • Custom pronunciation lexicon

Enterprise (Dedicated)
Custom pricing

Publishing inquiries and early API access.

Voices

Trained for long-form prose — not short prompts or chatbots.

Emily
Hathor
Cheops
Nunut
Bennu
Toth

Available Now

Emily
General-purpose prose voice. Trained on thousands of hours of narrated nonfiction. Clear, authoritative, and warm — designed for extended listening without fatigue.
Register Adaptation
Emily adjusts register automatically based on content. Military dispatches sound formal. Travel narratives sound expansive. Legal texts sound deliberate. No manual tuning required.
Pronunciation
Built-in lexicon for historical names, places, and foreign terms. Gibbon's Theodora, Mommsen's Sulpicius — handled correctly out of the box.

In Development

Additional Voices
New voices are being trained for specific use cases: verse narration, dramatic dialogue, and documentary-style delivery. Each voice undergoes the same editorial training pipeline as Emily.
Custom Voice Training
Enterprise customers can commission a custom voice trained to their specifications. This includes register profiling, pronunciation tuning, and prosody calibration for your specific content domain.
Voice Consistency
The engine maintains consistent voice character across arbitrarily long texts. Chapter 1 and Chapter 170 sound like the same narrator. No drift, no degradation.

Output Formats

Production-quality audio in every major format.

Supported Formats

MP3
Variable bitrate up to 320 kbps. Ideal for distribution, audiobook platforms, and streaming. The most widely compatible option.
WAV
Uncompressed PCM audio. Use this when you need lossless quality for post-production, mastering, or archival.
FLAC
Lossless compression. Typically 50–60% of the size of WAV with bit-perfect reproduction. Good for storage-conscious lossless workflows.
OGG Vorbis
Open-source lossy format. Excellent quality-to-size ratio. Preferred for game engines and open-source projects.
Raw PCM
Headerless PCM data for custom pipelines. You specify sample rate, bit depth, and endianness. For applications that need direct buffer access.

Encoding Details

Sample Rates
22.05 kHz (speech and podcasts), 44.1 kHz (CD quality), or 48 kHz (broadcast/film). Default is 48 kHz.
Bit Depth
16-bit (standard) or 24-bit (studio). 24-bit provides greater dynamic range for post-production work.
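
If you request raw PCM and later need a playable file, the headerless data can be wrapped in a WAV container once the sample rate, bit depth, and channel count are known. The following is a minimal sketch using Python's standard wave module. It assumes little-endian signed samples, which is the layout WAV uses for 16-bit and 24-bit audio. The function names are illustrative and are not part of any SDK.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 48000, bit_depth: int = 24,
               channels: int = 1) -> bytes:
    """Wrap headerless PCM in a WAV container.

    ASSUMPTION: samples are little-endian signed integers.
    """
    sample_width = bit_depth // 8  # bytes per sample: 2 for 16-bit, 3 for 24-bit
    frame_size = sample_width * channels
    if len(pcm) % frame_size != 0:
        raise ValueError("PCM length is not a whole number of frames")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


def pcm_duration_seconds(pcm: bytes, sample_rate: int = 48000,
                         bit_depth: int = 24, channels: int = 1) -> float:
    """Duration implied by the byte count: bytes / (rate * bytes_per_sample * channels)."""
    return len(pcm) / (sample_rate * (bit_depth // 8) * channels)


# One second of 48 kHz, 24-bit, mono silence is 48000 * 3 = 144,000 bytes.
silence = bytes(48000 * 3)
wav_bytes = pcm_to_wav(silence)
```

The same arithmetic gives a quick storage estimate: at 48 kHz, 24-bit, mono, uncompressed audio takes 144,000 bytes per second, or about 8.6 MB per minute.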

Latency & Performance

Fast enough for real-time, thorough enough for books.

Streaming Mode

Time to First Audio
Under 500 milliseconds. The engine begins streaming audio chunks before the full render is complete. Suitable for interactive applications.
Chunk Delivery
Audio is delivered in sequential chunks via Server-Sent Events or chunked HTTP transfer. Each chunk is a valid audio fragment that can be played immediately.
Connection
Persistent HTTPS connections. Keep-alive supported for sequential requests without reconnection overhead.

Batch Mode

Full Render Speed
Approximately 1 second of processing per minute of output audio. A 10-minute chapter renders in roughly 10 seconds.
Book-Length Processing
Batch mode is built for full books, with up to 100,000 characters per request. The engine processes chapters sequentially, maintaining voice consistency and prosody continuity across chapter boundaries.
Webhook Delivery
For batch jobs, provide a webhook URL. The engine POSTs the completed audio file when rendering finishes. No polling required.
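
Because each request is capped at 100,000 characters, a long manuscript may need to be divided before submission. One possible client-side approach is sketched below: pack whole paragraphs into chunks that stay under the limit. split_for_batch is a hypothetical helper, not an SDK function, and this page does not specify how multi-request books should be submitted, so treat it as an illustration only.

```python
from typing import List

MAX_CHARS = 100_000  # per-request input limit stated on this page


def split_for_batch(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters."""
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Hard-split any single paragraph that exceeds the limit on its own.
        # (This can cut mid-word; a real pipeline would split at a sentence end.)
        pieces = [paragraph[i:i + max_chars]
                  for i in range(0, len(paragraph), max_chars)] or [""]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting at paragraph boundaries keeps each request a self-contained stretch of prose, which is likely to matter for pacing at the seams between chunks.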

Text Preprocessing

Editorial preparation that makes the difference between reading and narrating.

The Pipeline

Editorial Normalization
Footnote markers, chapter headings, epigraphs, block quotes — the pipeline recognizes document structure and adjusts pacing accordingly. A footnote doesn't sound like body text.
Abbreviation Expansion
Handles “cf.”, “viz.”, “ibid.”, regnal numbers (Henry VIII), dates, and scholarly notation. Expanded naturally, not robotically.
Sentence Boundary Detection
Disambiguates periods in abbreviations, initials, and decimals from sentence endings. “Dr. Johnson arrived at 3.15” is one sentence, not three.
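
To make the sentence-boundary problem concrete, here is a small rule-based sketch. It is not the engine's actual method, which is not described here. It treats a period as a sentence end only when it is followed by whitespace and an uppercase letter or opening quote, and when the preceding word is neither a known abbreviation nor a single initial. The abbreviation list is an illustrative assumption.

```python
import re
from typing import List

# Illustrative abbreviation list; a real pipeline would use a much larger one.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "st", "cf", "viz", "ibid"}

# Terminal punctuation followed by whitespace and an uppercase letter or quote.
BOUNDARY = re.compile(r'[.!?]+(?=\s+["“A-Z])')


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, skipping periods in abbreviations and initials."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        # Word immediately before the punctuation, e.g. "Dr" in "Dr."
        preceding = re.search(r"(\w+)$", text[start:match.start()])
        word = preceding.group(1) if preceding else ""
        is_initial = len(word) == 1 and word.isupper()
        if match.group() == "." and (word.lower() in ABBREVIATIONS or is_initial):
            continue  # abbreviation or initial, not a sentence end
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Johnson arrived at 3.15"))
# → ['Dr. Johnson arrived at 3.15']
```

The decimal in "3.15" is never considered a boundary because the period is followed by a digit, not whitespace. Rules like these break down on cases such as a sentence that genuinely ends in an abbreviation, which is one reason a production pipeline needs more than a lookup table.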

Pronunciation

Golden Ibis Dictionary
A curated pronunciation lexicon covering thousands of historical names, places, and foreign terms. Continuously expanded as new texts are processed.
Custom Lexicon
Volume and Enterprise tiers can upload a custom pronunciation dictionary. Useful for domain-specific terminology, character names, or house style preferences.
Per-Book Review
For publishing clients, our editorial team reviews pronunciation for each title before final render. A/B comparison sweeps catch edge cases the dictionary misses.

Authentication

Simple bearer token authentication. No OAuth, no sessions.

Getting Started

API Keys
Each account receives an API key upon registration. Include it as a Bearer token in the Authorization header of every request.
Key Management
Rotate keys at any time from your dashboard; the key being replaced is invalidated immediately. You can maintain multiple active keys for different environments (dev, staging, production).
Security
All API traffic is encrypted via TLS 1.3. Keys are stored hashed — we cannot retrieve your key after issuance. If lost, generate a new one.

Request Format

Header
Authorization: Bearer your-api-key
Content Type
All requests use application/json. Audio is returned as a binary response with the appropriate MIME type.
Error Responses
Standard HTTP status codes. 401 for invalid key, 429 for rate limit exceeded, 400 for malformed request. All errors include a JSON body with a human-readable message.
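
For readers calling the API without an SDK, the request format above can be sketched with Python's standard library alone. The endpoint and JSON field names are taken from the cURL example on this page. The helper names (build_speak_request, describe_error, speak) are hypothetical, and the shape of the JSON error body is not specified here, so this sketch reports only the status code.

```python
import json
import urllib.error
import urllib.request

API_URL = "https://api.starbeelabs.org/v1/speak"  # endpoint given on this page


def build_speak_request(api_key: str, text: str, voice: str = "emily",
                        audio_format: str = "mp3",
                        sample_rate: int = 48000) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    payload = {"text": text, "voice": voice, "format": audio_format,
               "sample_rate": sample_rate}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def describe_error(status: int) -> str:
    """Map the status codes listed above to a short explanation."""
    return {
        400: "malformed request",
        401: "invalid API key",
        429: "rate limit exceeded",
    }.get(status, f"unexpected HTTP status {status}")


def speak(api_key: str, text: str) -> bytes:
    """Send the request and return the audio bytes, raising on API errors."""
    request = build_speak_request(api_key, text)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        raise RuntimeError(f"{err.code}: {describe_error(err.code)}") from err
```

Keeping request construction separate from sending makes it easy to inspect the headers and body in tests without touching the network.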

Rate Limits

Generous defaults. Higher limits on request.

Default Limits

Standard Tier
1,000 requests per hour. Each request can contain up to 100,000 characters. This is enough to render roughly 15–20 hours of audio per hour of clock time.
Volume Tier
10,000 requests per hour. Designed for production pipelines processing multiple titles concurrently.
Burst Allowance
Short bursts above your hourly limit are tolerated (up to 2x for 60 seconds). Sustained overages return 429 status codes.
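
A client that may exceed its limit should be prepared to back off when it sees a 429. The sketch below shows one common approach, exponential backoff. It assumes a caller-supplied send function that returns a (status, body) pair, and the helper name is hypothetical. Whether the API also returns a Retry-After header is not stated here, so the delays are chosen client-side.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(send: Callable[[], Tuple[int, bytes]],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call send() until it stops returning 429, doubling the wait each time."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            if status >= 400:
                raise RuntimeError(f"request failed with HTTP {status}")
            return body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    raise RuntimeError("still rate-limited after all retry attempts")
```

The sleep function is injectable so the retry logic can be exercised in tests without real delays.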

Enterprise & Custom

Unlimited Rate
Enterprise accounts have no rate limits. Requests are queued and processed on dedicated infrastructure with guaranteed throughput.
Concurrency
Standard: 5 concurrent requests. Volume: 20 concurrent. Enterprise: unlimited. Concurrent requests process in parallel for faster batch throughput.
Monitoring
Usage dashboards show request counts, audio hours generated, error rates, and average latency. Exportable as CSV for your own analytics.
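
The concurrency caps above translate naturally into a bounded worker pool on the client side. A minimal sketch, assuming a caller-supplied render function that sends one request and returns the audio bytes; render_all and TIER_CONCURRENCY are illustrative names, not SDK features.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Concurrency caps per tier as listed above; Enterprise is unlimited.
TIER_CONCURRENCY = {"standard": 5, "volume": 20}


def render_all(chunks: Sequence[str], render: Callable[[str], bytes],
               tier: str = "standard") -> List[bytes]:
    """Render chunks in parallel without exceeding the tier's concurrency cap.

    Results come back in the same order as the input chunks.
    """
    max_workers = TIER_CONCURRENCY[tier]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(render, chunks))
```

Because pool.map preserves input order, the returned audio segments can be concatenated or saved in chapter order without extra bookkeeping.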

Code Examples

Three lines to your first audio. SDKs for JavaScript and Python, or use cURL directly.

Starbee Labs — API
JavaScript

import { StarbeeLabs } from 'starbeelabs';

const client = new StarbeeLabs('your-api-key');

const audio = await client.speak({
  text: "The decline of Rome was the natural and inevitable effect of immoderate greatness.",
  voice: 'emily',
  format: 'mp3',
  sampleRate: 48000
});
// → AudioBuffer (MP3, 48kHz, mono)

Python

from starbeelabs import Client

client = Client("your-api-key")

audio = client.speak(
    text="The decline of Rome was the natural and inevitable effect of immoderate greatness.",
    voice="emily",
    format="mp3",
    sample_rate=48000
)
# → bytes (MP3, 48kHz, mono)

cURL

curl https://api.starbeelabs.org/v1/speak \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The decline of Rome was the natural and inevitable effect of immoderate greatness.",
    "voice": "emily",
    "format": "mp3",
    "sample_rate": 48000
  }' \
  -o output.mp3
# → output.mp3 (MP3, 48kHz, mono)

SDKs

JavaScript / TypeScript
npm install starbeelabs
Works in Node.js 18+ and modern browsers (via fetch).
Python
pip install starbeelabs
Python 3.8+. Async support via asyncio.

Direct API

REST Endpoint
POST https://api.starbeelabs.org/v1/speak
Response
Binary audio data with a Content-Type matching the requested format (audio/mpeg for MP3). Send Accept: text/event-stream to receive chunked streaming delivery.
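
When streaming over Server-Sent Events, the client has to turn a stream of text lines back into audio bytes. The event payload format is not documented on this page, so the sketch below makes a loud assumption: each event's data field carries one base64-encoded audio chunk. The line handling itself follows the standard SSE framing (data: fields, comment lines starting with a colon, a blank line ending each event). iter_sse_audio is a hypothetical helper.

```python
import base64
from typing import Iterable, Iterator, List


def iter_sse_audio(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio chunks from a stream of SSE text lines.

    ASSUMPTION: each event's data field is one base64-encoded audio chunk.
    """
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                # Base64 has no meaningful newlines, so join the parts directly.
                yield base64.b64decode("".join(data_parts))
                data_parts = []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # SSE strips one leading space after the colon
            data_parts.append(value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
```

In practice the lines would come from an open HTTP response, decoded as UTF-8 one line at a time, and each yielded chunk could be written to a file or handed to an audio player as it arrives.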