Starbee Labs

Everything under the hood.

Simple REST API. No SSML, no markup — the engine handles pacing and emphasis.

Audio Output

Formats: MP3, WAV, FLAC, OGG, raw PCM
Sample Rates: 22.05 kHz, 44.1 kHz, 48 kHz
Bit Depth: 16-bit, 24-bit
Channels: Mono (stereo on request)
Voices: Emily (general prose). Additional voices in development.

Platform

Max Input: 100,000 characters per request. Batch mode for full books.
Latency: First audio in under 500 ms when streaming. Full render takes about 1 s per minute of output.
Preprocessing: Optional editorial text pipeline with custom pronunciation lexicon.
Authentication: Bearer token (API key)
Rate Limits: 1,000 requests/hour (higher limits available)

Starbee Labs — API

JavaScript

import { StarbeeLabs } from 'starbeelabs';

const client = new StarbeeLabs('your-api-key');

// Top-level await requires an ES module context.
const audio = await client.speak({
  text: 'The decline of Rome was the natural and ' +
        'inevitable effect of immoderate greatness.',
  voice: 'emily',
  format: 'mp3',
  sampleRate: 48000
});

// → AudioBuffer (48 kHz, mono)

Python

from starbeelabs import Client

client = Client("your-api-key")

audio = client.speak(
    # Adjacent string literals are joined into a single string.
    text="The decline of Rome was the natural and "
         "inevitable effect of immoderate greatness.",
    voice="emily",
    format="mp3",
    sample_rate=48000,
)

# → bytes (MP3, 48 kHz, mono)

cURL

curl https://api.starbeelabs.org/v1/speak \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The decline of Rome was the natural and inevitable effect of immoderate greatness.",
    "voice": "emily",
    "format": "mp3",
    "sample_rate": 48000
  }' \
  -o output.mp3

# → output.mp3 (48 kHz, mono)

Simple pricing.

Per-hour of generated audio. No seat fees. No subscriptions.

Start Here

Standard

$8 / hour of audio

MP3 & WAV output
48 kHz, mono
1,000 req/hour
Email support

Best Value

Volume

$5 / hour of audio

All formats (MP3, WAV, FLAC, OGG)
48 kHz, 24-bit
10,000 req/hour
Custom pronunciation lexicon

Dedicated

Enterprise

Custom pricing

Dedicated infrastructure
Batch processing (full books)
Unlimited rate · SLA
Contact for details

Publishing inquiries and early API access.

Voices

Trained for long-form prose — not short prompts or chatbots.

Emily

Hathor

Khufu

Nunut

Bennu

Toth

Available Now

Emily: General-purpose prose voice. Trained on thousands of hours of narrated nonfiction. Clear, authoritative, and warm — designed for extended listening without fatigue.
Register Adaptation: Emily adjusts register automatically based on content. Military dispatches sound formal. Travel narratives sound expansive. Legal texts sound deliberate. No manual tuning required.
Pronunciation: Built-in lexicon for historical names, places, and foreign terms. Gibbon's Theodora, Mommsen's Sulpicius — handled correctly out of the box.

In Development

Additional Voices: New voices are being trained for specific use cases: verse narration, dramatic dialogue, and documentary-style delivery. Each voice undergoes the same editorial training pipeline as Emily.
Custom Voice Training: Enterprise customers can commission a custom voice trained to their specifications. This includes register profiling, pronunciation tuning, and prosody calibration for your specific content domain.
Voice Consistency: The engine maintains consistent voice character across arbitrarily long texts. Chapter 1 and Chapter 170 sound like the same narrator. No drift, no degradation.

Output Formats

Production-quality audio in every major format.

Supported Formats

MP3: Variable bitrate up to 320 kbps. Ideal for distribution, audiobook platforms, and streaming. The most widely compatible option.
WAV: Uncompressed PCM audio. Use this when you need lossless quality for post-production, mastering, or archival.
FLAC: Lossless compression. Typically 50–60% of the size of WAV with bit-perfect reproduction. Good for storage-conscious lossless workflows.

Encoding Details

OGG Vorbis: Open-source lossy format. Excellent quality-to-size ratio. Preferred for game engines and open-source projects.
Raw PCM: Headerless PCM data for custom pipelines. You specify sample rate, bit depth, and endianness. For applications that need direct buffer access.
Sample Rates: 22.05 kHz (speech/podcast), 44.1 kHz (CD quality), or 48 kHz (broadcast/film). Default is 48 kHz.
Bit Depth: 16-bit (standard) or 24-bit (studio). 24-bit provides greater dynamic range for post-production work.
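
Raw PCM carries no header, so the receiver must already know the sample rate, bit depth, and channel count. As a minimal sketch, the following wraps headerless PCM in a WAV container using Python's standard `wave` module. It assumes signed little-endian samples, which is what WAV expects for 16- and 24-bit audio; the function names are illustrative and not part of the Starbee Labs SDK.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 48000,
               bit_depth: int = 16, channels: int = 1) -> bytes:
    """Wrap headerless little-endian PCM samples in a WAV container."""
    if bit_depth % 8 != 0:
        raise ValueError("bit_depth must be a multiple of 8")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bit_depth // 8)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def pcm_bytes_per_second(sample_rate: int, bit_depth: int, channels: int = 1) -> int:
    """Uncompressed data rate: samples per second times bytes per sample times channels."""
    return sample_rate * (bit_depth // 8) * channels
```

At 48 kHz, 24-bit mono, `pcm_bytes_per_second` gives 144,000 bytes per second, or about 8.6 MB per minute of uncompressed audio.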

Latency & Performance

Fast enough for real-time, thorough enough for books.

Streaming Mode

Time to First Audio: Under 500 milliseconds. The engine begins streaming audio chunks before the full render is complete. Suitable for interactive applications.
Chunk Delivery: Audio is delivered in sequential chunks via Server-Sent Events or chunked HTTP transfer. Each chunk is a valid audio fragment that can be played immediately.
Connection: Persistent HTTPS connections. Keep-alive supported for sequential requests without reconnection overhead.
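
As a rough sketch of consuming a streamed response, the loop below reads a binary response body in fixed-size pieces and writes each piece out as soon as it arrives. It assumes the chunked HTTP transfer variant, where the body is plain audio bytes; the `stream_to_file` helper and the 4096-byte read size are illustrative choices, not part of the Starbee Labs SDK.

```python
import io
from typing import BinaryIO


def stream_to_file(response: BinaryIO, out: BinaryIO, chunk_size: int = 4096) -> int:
    """Copy a streamed response body to `out` piece by piece; return bytes written."""
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty read means the stream has ended
            break
        out.write(chunk)
        total += len(chunk)
    return total


# Stand-in for a live HTTP response so the example runs offline.
fake_response = io.BytesIO(b"\x00" * 10000)
sink = io.BytesIO()
written = stream_to_file(fake_response, sink)
```

With the standard library, the object returned by `urllib.request.urlopen(...)` can be passed as `response`, since it exposes the same `read(n)` method.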

Batch Mode

Full Render Speed: Approximately 1 second of processing per minute of output audio. A 10-minute chapter renders in roughly 10 seconds.
Book-Length Processing: Batch mode accepts book-length texts, submitted in requests of up to 100,000 characters each. The engine processes chapters sequentially, maintaining voice consistency and prosody continuity across chapter boundaries.
Webhook Delivery: For batch jobs, provide a webhook URL. The engine POSTs the completed audio file when rendering finishes. No polling required.
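
Because each request is capped at 100,000 characters, a long manuscript may need to be divided before submission. The sketch below packs whole paragraphs into pieces that stay under the cap. Whether batch mode expects client-side splitting like this is an assumption, and `split_for_requests` is a hypothetical helper, not an SDK function.

```python
from typing import List

MAX_CHARS = 100_000  # per-request input limit stated in the platform specs


def split_for_requests(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Group paragraphs (separated by blank lines) into chunks of at most `limit` characters."""
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        if len(para) > limit:
            raise ValueError("a single paragraph exceeds the request limit")
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= limit:
            current = candidate      # paragraph still fits in the current chunk
        else:
            chunks.append(current)   # close the chunk and start a new one
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph boundaries keeps sentences intact, so no request starts or ends mid-sentence.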

Text Preprocessing

Editorial preparation that makes the difference between reading and narrating.

The Pipeline

Editorial Normalization: Footnote markers, chapter headings, epigraphs, block quotes — the pipeline recognizes document structure and adjusts pacing accordingly. A footnote doesn't sound like body text.
Abbreviation Expansion: Handles “cf.”, “viz.”, “ibid.”, regnal numbers (Henry VIII), dates, and scholarly notation. Expanded naturally, not robotically.
Sentence Boundary Detection: Disambiguates periods in abbreviations, initials, and decimals from sentence endings. “Dr. Johnson arrived at 3.15” is one sentence, not three.
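
To illustrate the sentence boundary problem, here is a deliberately naive rule-based splitter. It is not the engine's method, only a sketch: it treats a period as a boundary when it is followed by whitespace and a capital letter, unless the preceding word is a known abbreviation or a single-letter initial. The abbreviation list is a small illustrative sample.

```python
import re
from typing import List

# Small illustrative list; a production system would need far more entries.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "st", "cf", "viz", "ibid"}

# Boundary candidate: closing punctuation followed by whitespace and a capital letter.
# A decimal such as "3.15" never matches because a digit follows the period.
_CANDIDATE = re.compile(r"[.!?](?=\s+[A-Z])")
_LAST_WORD = re.compile(r"(\w+)$")


def split_sentences(text: str) -> List[str]:
    sentences = []
    start = 0
    for match in _CANDIDATE.finditer(text):
        before = _LAST_WORD.search(text[start:match.start()])
        word = before.group(1).lower() if before else ""
        is_initial = len(word) == 1 and word.isalpha()
        if match.group() == "." and (word in ABBREVIATIONS or is_initial):
            continue  # the period belongs to an abbreviation or an initial
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Called on "Dr. Johnson arrived at 3.15", the function returns the text as a single sentence. Rules like these break down quickly on real prose, which is why a hand-written list is only a starting point.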

Pronunciation

Golden Ibis Dictionary: A curated pronunciation lexicon covering thousands of historical names, places, and foreign terms. Continuously expanded as new texts are processed.
Custom Lexicon: Volume and Enterprise tiers can upload a custom pronunciation dictionary. Useful for domain-specific terminology, character names, or house style preferences.
Per-Book Review: For publishing clients, our editorial team reviews pronunciation for each title before final render. A/B comparison sweeps catch edge cases the dictionary misses.

Authentication

Simple bearer token authentication. No OAuth, no sessions.

Getting Started

API Keys: Each account receives an API key upon registration. Include it as a Bearer token in the Authorization header of every request.
Key Management: Rotate keys at any time from your dashboard. Rotating a key invalidates the old value immediately. You can also maintain multiple active keys for different environments (dev, staging, production).
Security: All API traffic is encrypted via TLS 1.3. Keys are stored hashed — we cannot retrieve your key after issuance. If lost, generate a new one.

Request Format

Header: Authorization: Bearer your-api-key
Content Type: All requests use application/json. Audio is returned as a binary response with the appropriate MIME type.
Error Responses: Standard HTTP status codes. 401 for invalid key, 429 for rate limit exceeded, 400 for malformed request. All errors include a JSON body with a human-readable message.
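
The request format above can be sketched without the SDK, using only Python's standard library. The example builds an authenticated POST request and turns an error response into a short description. The `"message"` key in the error body is an assumption (the text only says the body contains a human-readable message), and both function names are hypothetical.

```python
import json
import urllib.request

API_URL = "https://api.starbeelabs.org/v1/speak"  # endpoint from the cURL example


def build_speak_request(api_key: str, text: str, voice: str = "emily",
                        audio_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request."""
    payload = json.dumps({"text": text, "voice": voice, "format": audio_format})
    return urllib.request.Request(
        API_URL,
        data=payload.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def explain_error(status: int, body: bytes) -> str:
    """Describe an error response.

    Assumes the JSON body stores its text under a "message" key (hypothetical).
    """
    meanings = {400: "malformed request", 401: "invalid API key",
                429: "rate limit exceeded"}
    try:
        detail = json.loads(body).get("message", "")
    except (ValueError, AttributeError):
        detail = ""  # body was not a JSON object
    label = meanings.get(status, "unexpected error")
    return f"{status} {label}: {detail}" if detail else f"{status} {label}"
```

Sending the request with `urllib.request.urlopen(req)` raises `urllib.error.HTTPError` on a 4xx response; that exception's `code` attribute and `read()` result can be passed straight to `explain_error`.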

Rate Limits

Generous defaults. Higher limits on request.

Default Limits

Standard Tier: 1,000 requests per hour. Each request can contain up to 100,000 characters. This is enough to render roughly 15–20 hours of audio per hour of clock time.
Volume Tier: 10,000 requests per hour. Designed for production pipelines processing multiple titles concurrently.
Burst Allowance: Short bursts above your hourly limit are tolerated (up to 2x for 60 seconds). Sustained overages return 429 status codes.
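
A client that may exceed its limit should be prepared to retry after a 429. The sketch below uses plain exponential backoff. The text does not say whether responses include a `Retry-After` header, so none is assumed; `send_with_backoff` and its parameters are illustrative, and `send` stands for any callable that performs one request and returns a status code and body.

```python
import time
from typing import Callable, Tuple


def send_with_backoff(send: Callable[[], Tuple[int, bytes]],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, bytes]:
    """Call `send` until it returns something other than 429, doubling the wait each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    return status, body  # still rate limited after the final attempt
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.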

Enterprise & Custom

Unlimited Rate: Enterprise accounts have no rate limits. Requests are queued and processed on dedicated infrastructure with guaranteed throughput.
Concurrency: 5 concurrent requests on Standard, 20 on Volume, unlimited on Enterprise. Concurrent requests process in parallel for faster batch throughput.
Monitoring: Usage dashboards show request counts, audio hours generated, error rates, and average latency. Exportable as CSV for your own analytics.
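
One way to stay inside a tier's concurrency limit on the client side is a bounded thread pool. This is a sketch, not SDK behavior: `render_all` is a hypothetical helper, and `render` stands for whatever function sends one request and returns the audio bytes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

STANDARD_CONCURRENCY = 5  # Standard tier limit; Volume allows 20


def render_all(chapters: Sequence[str],
               render: Callable[[str], bytes],
               max_concurrent: int = STANDARD_CONCURRENCY) -> List[bytes]:
    """Render chapters in parallel with at most `max_concurrent` calls in flight.

    Results are returned in the same order as `chapters`.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(render, chapters))
```

Because `pool.map` preserves input order, the returned audio can be concatenated chapter by chapter without extra bookkeeping.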

Code Examples

Three lines to your first audio. SDKs for JavaScript and Python, or use cURL directly.

SDKs

JavaScript / TypeScript: npm install starbeelabs
Works in Node.js 18+ and modern browsers (via fetch).
Python: pip install starbeelabs
Python 3.8+. Async support via asyncio.

Direct API

REST Endpoint: POST https://api.starbeelabs.org/v1/speak
Response: Binary audio data with Content-Type: audio/mpeg (or appropriate MIME type). Stream with Accept: text/event-stream for chunked delivery.