Prompt-to-Speech

Overview

Prompt-to-Speech (PTTS) generates conversational speech from natural-language prompts, letting you interact with AI voices. It is particularly useful for dynamic conversations with AI agents or NPCs.

Prerequisites

Before starting, you'll need:

A valid API key
A Speaker ID (see Voice Selection)
Prompts describing what you want to discuss with the AI voice

Basic Usage (Text-based prompting)

The basic PTTS process involves sending a prompt that will guide the conversation and speech generation:

import requests

url = "https://api.replicastudios.com/v2/speech/ptts"

# Request payload
payload = {
    "user_prompt": "When shall we approach the Rubicon?",
    "system_prompt": "You are Julius Caesar and know his history - as it is now yours.",
    "conversation_history": {
        "messages": [
            {
                "role": "user",
                "target": "assistant",
                "text": "Ave Caesar, morituri te salutant.",
                "additional_context": None
            },
            {
                "role": "assistant",
                "target": "user",
                "text": "Ave my general, let me know that your resolve is firm.",
                "additional_context": None
            }
        ]
    },
    "speech": {
        "speaker_id": "9b1f5c24-a18b-4b9e-a785-b3a3b3b8751a",  # Replace with your chosen Speaker ID
        "model_chain": "latest",
        "language_code": "en",
        "global_pace": 1.0,
        "global_pitch": 0,
        "global_volume": 0
    }
}

headers = {
    "Content-Type": "application/json",
    "X-Api-Key": "..."  # Replace with your API key
}

response = requests.post(url, json=payload, headers=headers)
print(response.json())

Audio-based prompting

PSTS (Prompt Speech-to-Speech) lets you use an audio recording as your prompt instead of text. The system interprets the recording and responds to it while maintaining the conversation context.

import requests

url = "https://api.replicastudios.com/v2/speech/psts"

# Request payload
payload = {
    "user_prompt_audio": "data:audio/wav;base64,UklGRiQAAABXQVZFZm10IBAAAAA...",
    "system_prompt": "You are Julius Caesar and know his history - as it is now yours.",
    "speech": {
        "speaker_id": "9b1f5c24-a18b-4b9e-a785-b3a3b3b8751a",
        "model_chain": "latest",
        "language_code": "en"
    }
}

headers = {
    "Content-Type": "application/json",
    "X-Api-Key": "..."  # Replace with your API key
}

response = requests.post(url, json=payload, headers=headers)
print(response.json())
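
The user_prompt_audio value in the request above is a Base64 data URI. As a minimal sketch of how such a value could be produced from a local recording, the helpers below use only the Python standard library. The function names are hypothetical and not part of the API; the MIME types come from the supported formats listed in this guide.

```python
import base64
from pathlib import Path

# MIME types taken from the supported-formats list in this guide.
MIME_BY_SUFFIX = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".ogg": "audio/ogg",
    ".webm": "audio/webm",
}


def audio_bytes_to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw audio bytes as a Base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def audio_file_to_data_uri(path: str) -> str:
    """Read an audio file and return it as a data URI (hypothetical helper)."""
    file_path = Path(path)
    suffix = file_path.suffix.lower()
    # Reject unknown extensions before touching the file system.
    if suffix not in MIME_BY_SUFFIX:
        raise ValueError(f"Unsupported audio format: {suffix!r}")
    return audio_bytes_to_data_uri(file_path.read_bytes(), MIME_BY_SUFFIX[suffix])
```

The returned string can then be assigned to "user_prompt_audio" in the payload. Choosing the MIME type from the file extension is a simplification; it does not inspect the file contents.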

Prompt Audio Requirements

Format: Base64-encoded audio data URI (data:audio/[format];base64,...)

Supported formats:
- WAV (audio/wav)
- MP3 (audio/mpeg)
- OGG (audio/ogg)
- WebM (audio/webm)
Duration: Up to 30 seconds
Sample rate: 16 kHz or higher
Channels: Mono or stereo
Clear speech without background noise
Single speaker only
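
Some of these requirements can be checked locally before sending a request. The sketch below covers WAV input only, using Python's standard wave module, and tests duration, sample rate, and channel count against the limits listed above. The function names are hypothetical; other formats would need a separate decoder, and the noise and single-speaker requirements are not checked here.

```python
import io
import wave

MAX_DURATION_SECONDS = 30.0  # "Up to 30 seconds"
MIN_SAMPLE_RATE_HZ = 16_000  # "16 kHz or higher"
ALLOWED_CHANNELS = (1, 2)    # mono or stereo


def check_wav_prompt(wav_bytes: bytes) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / sample_rate

    problems = []
    if duration > MAX_DURATION_SECONDS:
        problems.append(f"duration {duration:.1f}s exceeds {MAX_DURATION_SECONDS:.0f}s")
    if sample_rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {sample_rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if channels not in ALLOWED_CHANNELS:
        problems.append(f"{channels} channels; expected mono or stereo")
    return problems


def make_silent_wav(seconds: float, sample_rate: int, channels: int = 1) -> bytes:
    """Build a silent 16-bit WAV in memory, useful for trying out the checker."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * channels * int(seconds * sample_rate))
    return buffer.getvalue()
```

For example, check_wav_prompt(make_silent_wav(1, 8000)) reports a single problem, because 8 kHz is below the minimum sample rate.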

Additional Parameters

Speech parameters for the generated audio can be customized through the speech object in your payload. They behave just as they do in standard speech generation:

{
    "user_prompt": "When shall we approach the Rubicon?",
    "system_prompt": "You are Julius Caesar and know his history - as it is now yours.",
    "speech": {
        "speaker_id": "9b1f5c24-a18b-4b9e-a785-b3a3b3b8751a",
        "model_chain": "latest",
        "language_code": "en",
        "global_pace": 1.2,
        "global_pitch": 0.5,
        "global_volume": 0.5,
        "voice_preset_id": "custom-preset-uuid"
    }
}

For more details on these parameters, see [Global Controls](docs/global-controls.md).

Conversation History

The conversation history is a list of messages used to maintain the conversation context. It is passed as an object whose messages key holds a list of dictionaries, one per message. Each dictionary contains a role key and a target key, each set to either assistant or user, a text key containing the message text, and an additional_context key.

All messages adhere to the following format:

{
    "role": "assistant" | "user",
    "target": "assistant" | "user",
    "text": string,
    "additional_context": null | string  // Currently unused
}
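
As an illustration of this format, the sketch below builds history messages in Python. The helper names are hypothetical. It also assumes that target is always the opposite party to role, which matches the request example in Basic Usage but is not required by the format itself.

```python
from typing import Optional

VALID_ROLES = ("user", "assistant")


def make_message(role: str, text: str, additional_context: Optional[str] = None) -> dict:
    """Build one history message in the format shown above."""
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {VALID_ROLES}, got {role!r}")
    # Assumption: each message is addressed to the other party.
    target = "assistant" if role == "user" else "user"
    return {
        "role": role,
        "target": target,
        "text": text,
        "additional_context": additional_context,  # currently unused
    }


def make_history(turns) -> dict:
    """Wrap (role, text) pairs in the {"messages": [...]} object used in requests."""
    return {"messages": [make_message(role, text) for role, text in turns]}


history = make_history([
    ("user", "Ave Caesar, morituri te salutant."),
    ("assistant", "Ave my general, let me know that your resolve is firm."),
])
```

The resulting history object has the same shape as the conversation_history value in the Basic Usage request.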

Writing Effective Prompts

When writing prompts, consider:

1. User Prompt

This is your main question/conversation starter (or a continuation of a previous conversation)
Be specific about what you want to discuss
Example: "I wish to learn more about your teachings of stoicism"

2. System Prompt (Optional)

Sets the context and persona for the AI voice
Helps shape the character's knowledge and personality
Example: "You are a wise mentor giving advice to a student, in the style of Marcus Aurelius"

3. Conversation History (Optional)

Provides context from previous interactions
Helps maintain consistency in tone and content
Useful for multi-turn conversations
Can be passed on from the response of a previous request
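
The three parts above can be combined into a request payload as sketched below. The function name is hypothetical, and the sketch assumes that the optional fields may simply be left out of the payload when unused; this guide marks them as optional but does not say whether they should be omitted or sent as null.

```python
from typing import Optional


def build_ptts_payload(
    user_prompt: str,
    speaker_id: str,
    system_prompt: Optional[str] = None,
    conversation_history: Optional[dict] = None,
) -> dict:
    """Assemble a text-prompt payload from a user prompt and optional context."""
    if not user_prompt.strip():
        raise ValueError("user_prompt must not be empty")

    payload = {
        "user_prompt": user_prompt,
        "speech": {
            "speaker_id": speaker_id,
            "model_chain": "latest",
            "language_code": "en",
        },
    }
    # Assumption: optional fields are omitted rather than sent as null.
    if system_prompt is not None:
        payload["system_prompt"] = system_prompt
    if conversation_history is not None:
        payload["conversation_history"] = conversation_history
    return payload


payload = build_ptts_payload(
    "I wish to learn more about your teachings of stoicism",
    speaker_id="your-speaker-id",  # placeholder, replace with your chosen Speaker ID
    system_prompt="You are a wise mentor giving advice to a student",
)
```

The payload can then be sent with requests.post(url, json=payload, headers=headers), as in Basic Usage.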

Best Practices

1. Prompt Length

Keep prompts clear and concise
Focus on the most important aspects of the conversation
Avoid overly complex or contradictory statements

2. Natural Language

Use natural language rather than technical terms
Write prompts as you would speak to another person
Be clear about what you want to discuss

3. Testing

Experiment with different prompt phrasings
Start with simple prompts and iterate
Save successful prompts for reuse with similar content

For troubleshooting common issues, refer to our Troubleshooting Guide.
