API Documentation
Complete reference for the TTSFM Text-to-Speech API. Free, simple, and powerful.
Overview
The TTSFM API provides a modern, OpenAI-compatible interface for text-to-speech generation. It supports multiple voices and audio formats, validates text length, and can automatically split long text and combine the resulting audio into a single file.
Base URL: http://tts.isp.skin/api/
Key Features
- 11 different voice options - Choose from alloy, echo, nova, and more
- Multiple audio formats - MP3, WAV, OPUS, AAC, FLAC, PCM support
- OpenAI compatibility - Drop-in replacement for OpenAI's TTS API
- Auto-combine feature - Automatically handles long text (>4096 chars) by splitting and combining audio
- Text length validation - Smart validation with configurable limits
- Real-time monitoring - Status endpoints and health checks
Authentication
API key authentication is optional. If the server is configured to require a key, include it in the request headers:
Authorization: Bearer YOUR_API_KEY
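As a minimal sketch of the optional authentication described above, the helper below adds the Authorization header only when a key is supplied. The function name `build_headers` is hypothetical and not part of TTSFM.

```python
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return JSON request headers, adding a Bearer token only if a key is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Same header shape as shown above: "Authorization: Bearer YOUR_API_KEY"
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```

Calling `build_headers()` without a key yields headers that suit a server with authentication disabled.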
Text Length Validation
TTSFM includes built-in text length validation to ensure compatibility with TTS models. The default maximum length is 4096 characters, but this can be customized.
Validation Options
- max_length: Maximum allowed characters (default: 4096)
- validate_length: Enable/disable validation (default: true)
- preserve_words: Avoid splitting words when chunking (default: true)
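To illustrate what max_length and preserve_words mean in practice, here is a rough sketch of word-preserving chunking. It is not TTSFM's actual splitting code: the function `split_text` is hypothetical, and it assumes that collapsing runs of whitespace into single spaces is acceptable.

```python
from typing import List


def split_text(text: str, max_length: int = 4096, preserve_words: bool = True) -> List[str]:
    """Split text into chunks of at most max_length characters (illustrative only)."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not preserve_words:
        # Hard cut every max_length characters, even in the middle of a word.
        return [text[i:i + max_length] for i in range(0, len(text), max_length)]

    chunks: List[str] = []
    current = ""
    for word in text.split():  # assumption: whitespace runs may be normalized
        # A single word longer than the limit has to be cut regardless.
        while len(word) > max_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_length])
            word = word[max_length:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_length:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

With `max_length=10`, the text "the quick brown fox jumps" is split at word boundaries into three chunks, none longer than ten characters.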
API Endpoints
GET /api/voices
Get list of available voices.
Response Example:
{
"voices": [
{
"id": "alloy",
"name": "Alloy",
"description": "Alloy voice"
},
{
"id": "echo",
"name": "Echo",
"description": "Echo voice"
}
],
"count": 6
}
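A small sketch of how a client might read this response, using only the Python standard library. The helper names `fetch_voices` and `voice_ids` are hypothetical; the URL and field names are taken from this page.

```python
import json
import urllib.request
from typing import Any, Dict, List

BASE_URL = "http://tts.isp.skin"  # host used in the examples on this page


def voice_ids(payload: Dict[str, Any]) -> List[str]:
    """Extract the voice IDs from a decoded /api/voices response body."""
    return [voice["id"] for voice in payload.get("voices", [])]


def fetch_voices(base_url: str = BASE_URL, timeout: float = 10.0) -> Dict[str, Any]:
    """GET /api/voices and decode the JSON body (performs a network request)."""
    with urllib.request.urlopen(f"{base_url}/api/voices", timeout=timeout) as resp:
        return json.load(resp)

# Usage (requires network access):
#   print(voice_ids(fetch_voices()))
```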
GET /api/formats
Get available audio formats for speech generation.
Available Formats
All format IDs listed below are accepted in requests, but the server returns only two encodings:
- mp3 - Returns actual MP3 format
- All other formats (opus, aac, flac, wav, pcm) - Mapped to WAV format
Response Example:
{
"formats": [
{
"id": "mp3",
"name": "MP3",
"mime_type": "audio/mp3",
"description": "MP3 audio format"
},
{
"id": "opus",
"name": "Opus",
"mime_type": "audio/wav",
"description": "Returns WAV format"
},
{
"id": "aac",
"name": "AAC",
"mime_type": "audio/wav",
"description": "Returns WAV format"
},
{
"id": "flac",
"name": "FLAC",
"mime_type": "audio/wav",
"description": "Returns WAV format"
},
{
"id": "wav",
"name": "WAV",
"mime_type": "audio/wav",
"description": "WAV audio format"
},
{
"id": "pcm",
"name": "PCM",
"mime_type": "audio/wav",
"description": "Returns WAV format"
}
],
"count": 6
}
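Because every format other than mp3 is returned as WAV, a client may want to pick the file extension from the format it will actually receive rather than the one it requested. The sketch below encodes the mapping stated above; `returned_format` is a hypothetical helper name.

```python
from typing import Tuple

# Format IDs accepted by the API, as listed in the response example above.
KNOWN_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def returned_format(requested: str) -> Tuple[str, str]:
    """Map a requested format ID to the (extension, MIME type) the server returns."""
    fmt = requested.lower()
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"unknown format: {requested!r}")
    if fmt == "mp3":
        return ("mp3", "audio/mp3")
    # Every other format is mapped to WAV internally.
    return ("wav", "audio/wav")
```

For example, a request for "opus" should be saved with a .wav extension.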
POST /api/validate-text
Validate text length and get splitting suggestions.
Request Body:
{
"text": "Your text to validate",
"max_length": 4096
}
Response Example:
{
"text_length": 5000,
"max_length": 4096,
"is_valid": false,
"needs_splitting": true,
"suggested_chunks": 2,
"chunk_preview": [
"First chunk preview...",
"Second chunk preview..."
]
}
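The following sketch shows one way to build this request and summarize the response with the standard library. The helper names are hypothetical, and the summary reads only the fields shown in the response example above.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://tts.isp.skin"  # host used in the examples on this page


def build_validate_request(text: str, max_length: int = 4096,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST /api/validate-text request."""
    body = json.dumps({"text": text, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/validate-text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_validation(result: Dict[str, Any]) -> str:
    """Turn a decoded validation response into a one-line summary."""
    if result["is_valid"]:
        return f"OK: {result['text_length']} of {result['max_length']} characters"
    return (
        f"Too long: {result['text_length']} > {result['max_length']}; "
        f"suggested chunks: {result['suggested_chunks']}"
    )

# Usage (requires network access):
#   with urllib.request.urlopen(build_validate_request("some text")) as resp:
#       print(summarize_validation(json.load(resp)))
```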
POST /api/generate
Generate speech from text.
Request Body:
{
"text": "Hello, world!",
"voice": "alloy",
"format": "mp3",
"instructions": "Speak cheerfully",
"max_length": 4096,
"validate_length": true
}
Parameters:
- text (required): Text to convert to speech
- voice (optional): Voice ID (default: "alloy")
- format (optional): Audio format (default: "mp3")
- instructions (optional): Voice modulation instructions
- max_length (optional): Maximum text length (default: 4096)
- validate_length (optional): Enable validation (default: true)
Response:
Returns the audio file with the matching Content-Type header.
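A minimal sketch of calling this endpoint from Python with the standard library. The helper names `build_generate_request` and `save_speech` are hypothetical; the payload fields and defaults follow the parameter list above.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://tts.isp.skin"  # host used in the examples on this page


def build_generate_request(text: str, voice: str = "alloy", audio_format: str = "mp3",
                           instructions: Optional[str] = None,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST /api/generate request."""
    payload = {"text": text, "voice": voice, "format": audio_format}
    if instructions:
        payload["instructions"] = instructions
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(request: urllib.request.Request, path: str, timeout: float = 60.0) -> str:
    """Send the request, write the audio bytes to path, and return the Content-Type."""
    with urllib.request.urlopen(request, timeout=timeout) as resp, open(path, "wb") as out:
        out.write(resp.read())
        return resp.headers.get("Content-Type", "")

# Usage (requires network access):
#   save_speech(build_generate_request("Hello, world!"), "hello.mp3")
```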
Python Package
Long Text Support
The TTSFM Python package includes built-in long text splitting functionality for developers who need fine-grained control:
from ttsfm import TTSClient, Voice, AudioFormat

# Create client
client = TTSClient()

# Generate speech from long text (automatically splits into separate files)
responses = client.generate_speech_long_text(
    text="Very long text that exceeds 4096 characters...",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3,
    max_length=2000,
    preserve_words=True
)

# Save each chunk as separate files
for i, response in enumerate(responses, 1):
    response.save_to_file(f"part_{i:03d}.mp3")
Developer Features:
- Manual Splitting: Full control over text chunking for advanced use cases
- Word Preservation: Maintains word boundaries for natural speech
- Separate Files: Each chunk is saved as an individual audio file
- CLI Support: Use the `--split-long-text` flag for command-line usage
POST /api/generate-combined
Generate a single combined audio file from long text. Automatically splits text into chunks, generates speech for each chunk, and combines them into one seamless audio file.
Request Body:
{
"text": "Very long text that exceeds the limit...",
"voice": "alloy",
"format": "mp3",
"instructions": "Optional voice instructions",
"max_length": 4096,
"preserve_words": true
}
Response:
Returns a single audio file containing all chunks combined seamlessly.
Response Headers:
- X-Chunks-Combined: Number of chunks that were combined
- X-Original-Text-Length: Original text length in characters
- X-Audio-Size: Final audio file size in bytes
POST /v1/audio/speech
Enhanced OpenAI-compatible endpoint with auto-combine feature. Automatically handles long text by splitting and combining audio chunks when needed.
Request Body:
{
"model": "gpt-4o-mini-tts",
"input": "Text of any length...",
"voice": "alloy",
"response_format": "mp3",
"instructions": "Optional voice instructions",
"speed": 1.0,
"auto_combine": true,
"max_length": 4096
}
Enhanced Parameters:
- auto_combine (boolean, default: true):
  - true: Automatically split long text and combine audio chunks into a single file
  - false: Return an error if text exceeds max_length (standard OpenAI behavior)
- max_length (integer, default: 4096): Maximum characters per chunk when splitting
Response Headers:
- X-Auto-Combine: Whether auto-combine was enabled (true/false)
- X-Chunks-Combined: Number of audio chunks combined (1 for short text)
- X-Original-Text-Length: Original text length (for long text processing)
- X-Audio-Format: Audio format of the response
- X-Audio-Size: Audio file size in bytes
Examples:
# Short text (works normally)
curl -X POST http://tts.isp.skin/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini-tts",
"input": "Hello world!",
"voice": "alloy"
}'
# Long text with auto-combine (default)
curl -X POST http://tts.isp.skin/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini-tts",
"input": "Very long text...",
"voice": "alloy",
"auto_combine": true
}'
# Long text without auto-combine (will error)
curl -X POST http://tts.isp.skin/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini-tts",
"input": "Very long text...",
"voice": "alloy",
"auto_combine": false
}'
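The same three cases can be sketched in Python. The helpers below are hypothetical: `will_be_rejected` is only a client-side prediction based on the auto_combine description above, and the fallback values used when a header is missing are assumptions.

```python
from typing import Any, Dict, Mapping


def speech_payload(text: str, voice: str = "alloy", auto_combine: bool = True,
                   max_length: int = 4096) -> Dict[str, Any]:
    """Build a request body for POST /v1/audio/speech."""
    return {
        "model": "gpt-4o-mini-tts",
        "input": text,
        "voice": voice,
        "response_format": "mp3",
        "auto_combine": auto_combine,
        "max_length": max_length,
    }


def will_be_rejected(payload: Mapping[str, Any]) -> bool:
    """Predict an error: text exceeds max_length while auto_combine is off."""
    return (not payload["auto_combine"]) and len(payload["input"]) > payload["max_length"]


def parse_combine_headers(headers: Mapping[str, str]) -> Dict[str, Any]:
    """Convert the X-* response headers into typed values.

    Pass a case-insensitive mapping (such as an HTTP response's headers) or
    use the exact header names documented above.
    """
    return {
        "auto_combine": headers.get("X-Auto-Combine", "").lower() == "true",
        "chunks_combined": int(headers.get("X-Chunks-Combined", "1")),  # assumed fallback
        "audio_format": headers.get("X-Audio-Format"),
        "audio_size": int(headers.get("X-Audio-Size", "0")),  # assumed fallback
    }
```

Checking `will_be_rejected` before sending lets a client switch auto_combine on, or split the text itself, instead of waiting for the error response.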
Use Cases:
- Long Articles: Convert blog posts or articles to single audio files
- Audiobooks: Generate chapters as single audio files
- Podcasts: Create podcast episodes from scripts
- Educational Content: Convert learning materials to audio
Example Usage:
# Python example
import requests

response = requests.post(
    "http://tts.isp.skin/api/generate-combined",
    json={
        "text": "Your very long text content here...",
        "voice": "nova",
        "format": "mp3",
        "max_length": 2000
    }
)

if response.status_code == 200:
    with open("combined_audio.mp3", "wb") as f:
        f.write(response.content)
    chunks = response.headers.get('X-Chunks-Combined')
    print(f"Combined {chunks} chunks into single file")
WebSocket Streaming
Real-time audio streaming for enhanced user experience. Get audio chunks as they're generated instead of waiting for the complete file.
Connection
// JavaScript WebSocket client
const client = new WebSocketTTSClient({
  socketUrl: 'http://tts.isp.skin',
  debug: true
});

// Connection events
client.onConnect = () => console.log('Connected');
client.onDisconnect = () => console.log('Disconnected');
Streaming TTS Generation
// Generate speech with real-time streaming
const result = await client.generateSpeech('Hello, WebSocket world!', {
  voice: 'alloy',
  format: 'mp3',
  chunkSize: 1024, // Characters per chunk

  // Progress callback
  onProgress: (progress) => {
    console.log(`Progress: ${progress.progress}%`);
    console.log(`Chunks: ${progress.chunksCompleted}/${progress.totalChunks}`);
  },

  // Receive audio chunks in real-time
  onChunk: (chunk) => {
    console.log(`Received chunk ${chunk.chunkIndex + 1}`);
    // Process or play audio chunk immediately
    processAudioChunk(chunk.audioData);
  },

  // Completion callback
  onComplete: (result) => {
    console.log('Streaming complete!');
    // result.audioData contains the complete audio
  }
});
WebSocket Events
Client → Server Events
| Event | Description | Payload |
|---|---|---|
| generate_stream | Start TTS generation | {text, voice, format, chunk_size} |
| cancel_stream | Cancel active stream | {request_id} |
Server → Client Events
| Event | Description | Payload |
|---|---|---|
| stream_started | Stream initiated | {request_id, timestamp} |
| audio_chunk | Audio chunk ready | {request_id, chunk_index, audio_data, duration} |
| stream_progress | Progress update | {progress, chunks_completed, total_chunks} |
| stream_complete | Generation complete | {request_id, total_chunks, status} |
| stream_error | Error occurred | {request_id, error, timestamp} |
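To show how the server events fit together, here is a transport-agnostic sketch that collects audio_chunk payloads and joins them once stream_complete arrives. The class `StreamAssembler` is hypothetical. It assumes chunk_index is zero-based, that audio_data has already been decoded to bytes, and that concatenating the chunks in order yields playable audio; none of this is confirmed by the table above.

```python
from typing import Any, Dict, Optional


class StreamAssembler:
    """Collect audio_chunk payloads for one request and join them in order."""

    def __init__(self, request_id: str) -> None:
        self.request_id = request_id
        self.chunks: Dict[int, bytes] = {}
        self.total_chunks: Optional[int] = None
        self.error: Optional[str] = None

    def handle(self, event: str, payload: Dict[str, Any]) -> None:
        """Feed one server event (name plus payload) into the assembler."""
        if payload.get("request_id") != self.request_id:
            return  # belongs to another stream, or carries no request_id
        if event == "audio_chunk":
            # Keyed by index so out-of-order delivery is harmless.
            self.chunks[payload["chunk_index"]] = payload["audio_data"]
        elif event == "stream_complete":
            self.total_chunks = payload["total_chunks"]
        elif event == "stream_error":
            self.error = payload["error"]

    @property
    def done(self) -> bool:
        """True once stream_complete was seen and every chunk has arrived."""
        return self.total_chunks is not None and len(self.chunks) == self.total_chunks

    def audio(self) -> bytes:
        """Return the chunks concatenated in index order."""
        if self.error:
            raise RuntimeError(self.error)
        if not self.done:
            raise ValueError("stream is not complete yet")
        return b"".join(self.chunks[i] for i in range(self.total_chunks))
```

Keeping the assembler independent of the socket library means the same logic can sit behind whichever WebSocket client delivers the events.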
Benefits
- Real-time feedback: Users see progress as audio generates
- Lower latency: First audio chunk arrives quickly
- Cancellable: Stop generation mid-stream if needed
- Efficient: Process chunks as they arrive
Example: Streaming Audio Player
// Create a streaming audio player
const audioChunks = [];
let isPlaying = false;

const streamingPlayer = await client.generateSpeech(longText, {
  voice: 'nova',
  format: 'mp3',

  onChunk: (chunk) => {
    // Store chunk
    audioChunks.push(chunk.audioData);

    // Start playback once at least three chunks are buffered
    if (!isPlaying && audioChunks.length >= 3) {
      startStreamingPlayback(audioChunks);
      isPlaying = true;
    }
  },

  onComplete: (result) => {
    // Ensure all chunks are played
    finishPlayback(result.audioData);
  }
});
Try It Out!
Experience WebSocket streaming in action at the WebSocket Demo or enable streaming mode in the Playground.