Prompt-to-Speech

Overview

Prompt-to-Speech (PTS) lets you interact with AI voices by sending natural-language prompts and receiving conversational speech in response. This is particularly useful for dynamic conversations with AI agents or NPCs.

Prerequisites

Before starting, you'll need:

  • A valid API key

  • A Speaker ID (see Voice Selection)

  • Prompts describing what you want to discuss with the AI voice

Basic Usage (Text-based prompting)

The basic PTTS (Prompt Text-to-Speech) process sends a text prompt that guides both the conversation and the speech generation:

```python
import requests

url = "https://api.replicastudios.com/v2/speech/ptts"

# Request payload
payload = {
    "user_prompt": "When shall we approach the Rubicon?",
    "system_prompt": "You are Julius Caesar and know his history - as it is now yours.",
    "conversation_history": {
        "messages": [
            {
                "role": "user",
                "target": "assistant",
                "text": "Ave Caesar, morituri te salutant.",
                "additional_context": None
            },
            {
                "role": "assistant",
                "target": "user",
                "text": "Ave my general, let me know that your resolve is firm.",
                "additional_context": None
            }
        ]
    },
    "speech": {
        "speaker_id": "9b1f5c24-a18b-4b9e-a785-b3a3b3b8751a",  # Replace with your chosen Speaker ID
        "model_chain": "latest",
        "language_code": "en",
        "global_pace": 1.0,
        "global_pitch": 0,
        "global_volume": 0
    }
}

headers = {
    "Content-Type": "application/json",
    "X-Api-Key": "...",  # Replace with your API key
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # Raise an exception on 4xx/5xx responses
print(response.json())
```
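If you send several prompts, it can help to assemble the request body in one place. The helper below is a minimal sketch, not part of any official SDK. The name build_ptts_payload is hypothetical. The sketch also assumes the endpoint accepts requests that omit system_prompt, conversation_history, and the global_* controls. This page describes the first two as optional, and the audio-based example omits the controls.

```python
from typing import Any, Optional


def build_ptts_payload(
    user_prompt: str,
    speaker_id: str,
    system_prompt: Optional[str] = None,
    messages: Optional[list[dict[str, Any]]] = None,
    model_chain: str = "latest",
    language_code: str = "en",
) -> dict[str, Any]:
    """Assemble a PTTS request body (hypothetical helper).

    Optional fields are left out entirely when not provided, on the
    assumption that the service treats them as optional.
    """
    payload: dict[str, Any] = {
        "user_prompt": user_prompt,
        "speech": {
            "speaker_id": speaker_id,
            "model_chain": model_chain,
            "language_code": language_code,
        },
    }
    if system_prompt is not None:
        payload["system_prompt"] = system_prompt
    if messages:
        # The history is wrapped in an object with a "messages" list.
        payload["conversation_history"] = {"messages": list(messages)}
    return payload


payload = build_ptts_payload(
    "When shall we approach the Rubicon?",
    "9b1f5c24-a18b-4b9e-a785-b3a3b3b8751a",  # Replace with your chosen Speaker ID
    system_prompt="You are Julius Caesar and know his history - as it is now yours.",
)
```

The result can be passed to requests.post exactly as in the request above.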

Audio-based prompting

PSTS (Prompt Speech-to-Speech) lets you use an audio recording as your prompt instead of text. The system interprets the spoken prompt and responds to it while maintaining the conversation context.

```python
import requests

url = "https://api.replicastudios.com/v2/speech/psts"

# Request payload
payload = {
    "user_prompt_audio": "data:audio/wav;base64,UklGRiQAAABXQVZFZm10IBAAAAA...",
    "system_prompt": "You are Julius Caesar and know his history - as it is now yours.",
    "speech": {
        "speaker_id": "9b1f5c24-a18b-4b9e-a785-b3a3b3b8751a",  # Replace with your chosen Speaker ID
        "model_chain": "latest",
        "language_code": "en"
    }
}

headers = {
    "Content-Type": "application/json",
    "X-Api-Key": "...",  # Replace with your API key
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # Raise an exception on 4xx/5xx responses
print(response.json())
```
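The user_prompt_audio value is a base64 data URI. One way to produce it from a local recording is sketched below. The function name audio_file_to_data_uri and the table mapping file extensions to MIME types are illustrative assumptions. The MIME types themselves come from the formats listed under Prompt Audio Requirements. The helper only encodes the file and does not check duration, sample rate, or channel count.

```python
import base64
from pathlib import Path

# Extension -> MIME type for the prompt formats this page lists as supported.
SUPPORTED_AUDIO_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".ogg": "audio/ogg",
    ".webm": "audio/webm",
}


def audio_file_to_data_uri(path: str) -> str:
    """Read an audio file and return it as data:audio/<format>;base64,<data>."""
    suffix = Path(path).suffix.lower()
    mime_type = SUPPORTED_AUDIO_TYPES.get(suffix)
    if mime_type is None:
        raise ValueError(f"Unsupported audio format: {suffix or '(no extension)'}")
    # Base64-encode the raw bytes; the result is ASCII-safe text.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

For example, payload["user_prompt_audio"] = audio_file_to_data_uri("prompt.wav"), where prompt.wav is a placeholder for your own recording.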

Prompt Audio Requirements

  • Format: Base64-encoded data URI (data:audio/[format];base64,...)

  • Supported formats:

    • WAV (audio/wav)

    • MP3 (audio/mpeg)

    • OGG (audio/ogg)

    • WebM (audio/webm)

  • Duration: Up to 30 seconds

  • Sample rate: 16 kHz or higher

  • Channels: Mono or stereo

  • Clear speech without background noise

  • Single speaker only
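You can check some of these requirements before uploading. The sketch below uses Python's standard wave module, so it only handles uncompressed PCM WAV files. The function check_wav_prompt is a hypothetical helper. Speech clarity and the single-speaker rule cannot be checked this way and still need a human ear.

```python
import wave

MAX_DURATION_S = 30.0        # "Duration: Up to 30 seconds"
MIN_SAMPLE_RATE_HZ = 16_000  # "Sample rate: 16 kHz or higher"


def check_wav_prompt(path: str) -> list[str]:
    """Return a list of requirement violations for a WAV prompt (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        # Duration in seconds = frame count / frames per second.
        duration = wav.getnframes() / rate if rate else 0.0
    if duration > MAX_DURATION_S:
        problems.append(f"duration {duration:.1f} s exceeds {MAX_DURATION_S:.0f} s")
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if channels not in (1, 2):
        problems.append(f"{channels} channels; expected mono or stereo")
    return problems
```

An empty list means the file passed these particular checks, not that the service will accept it.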

Additional Parameters

You can customize the speech parameters of the generated audio through the speech object in your payload. These parameters behave the same way as in standard speech generation:

```json
{
    "user_prompt": "When shall we approach the Rubicon?",
    "system_prompt": "You are Julius Caesar and know his history - as it is now yours.",
    "speech": {
        "speaker_id": "9b1f5c24-a18b-4b9e-a785-b3a3b3b8751a",
        "model_chain": "latest",
        "language_code": "en",
        "global_pace": 1.2,
        "global_pitch": 0.5,
        "global_volume": 0.5,
        "voice_preset_id": "custom-preset-uuid"
    }
}
```

For more details on these parameters, see [Global Controls](docs/global-controls.md).
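To keep these settings in one typed place, you could mirror the speech object in a small dataclass. This is a client-side sketch, not an official type. SpeechSettings is a hypothetical name, and the field names come from the examples on this page. Dropping unset optional fields assumes the service then falls back to its own defaults.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class SpeechSettings:
    """Hypothetical client-side mirror of the "speech" object."""

    speaker_id: str
    model_chain: str = "latest"
    language_code: str = "en"
    global_pace: Optional[float] = None
    global_pitch: Optional[float] = None
    global_volume: Optional[float] = None
    voice_preset_id: Optional[str] = None

    def to_dict(self) -> dict[str, Any]:
        # Omit optional fields that were never set (assumes the service applies
        # defaults). "is not None" keeps explicit zeros such as global_pitch=0.
        return {key: value for key, value in asdict(self).items() if value is not None}


speech = SpeechSettings(
    "9b1f5c24-a18b-4b9e-a785-b3a3b3b8751a",  # Replace with your chosen Speaker ID
    global_pace=1.2,
    global_pitch=0.5,
    global_volume=0.5,
)
payload = {"user_prompt": "When shall we approach the Rubicon?", "speech": speech.to_dict()}
```

The filter tests for None rather than truthiness, so an explicit 0 for pitch or volume is still sent.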

Conversation History

The conversation history maintains context across turns. It is an object whose messages key holds a list of message objects. Each message has a role (assistant or user) and a target (the party the message is addressed to, also assistant or user). It also has the message text and an additional_context field that is currently unused.

All messages adhere to the following format:

```
{
    "role": "assistant" | "user",
    "target": "assistant" | "user",
    "text": string,
    "additional_context": null | string  // Currently unused
}
```
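A small helper can keep messages in this shape. In the example request for text-based prompting, each message's target is the other party, so the sketch below defaults target that way. The name make_message is hypothetical. The helper sets additional_context to None because the field is currently unused.

```python
from typing import Any, Optional

ROLES = ("assistant", "user")


def make_message(role: str, text: str, target: Optional[str] = None) -> dict[str, Any]:
    """Build one conversation-history message in the format shown above."""
    if role not in ROLES:
        raise ValueError(f"role must be one of {ROLES}, got {role!r}")
    if target is None:
        # Default: address the message to the other party, as in the examples.
        target = "user" if role == "assistant" else "assistant"
    if target not in ROLES:
        raise ValueError(f"target must be one of {ROLES}, got {target!r}")
    return {"role": role, "target": target, "text": text, "additional_context": None}


history = {
    "messages": [
        make_message("user", "Ave Caesar, morituri te salutant."),
        make_message("assistant", "Ave my general, let me know that your resolve is firm."),
    ]
}
```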

Writing Effective Prompts

When writing prompts, consider:

1. User Prompt

  • This is your main question/conversation starter (or a continuation of a previous conversation)

  • Be specific about what you want to discuss

  • Example: "I wish to learn more about your teachings of stoicism"

2. System Prompt (Optional)

  • Sets the context and persona for the AI voice

  • Helps shape the character's knowledge and personality

  • Example: "You are a wise mentor giving advice to a student, à la Marcus Aurelius"

3. Conversation History (Optional)

  • Provides context from previous interactions

  • Helps maintain consistency in tone and content

  • Useful for multi-turn conversations

  • Can be reused from the response of a previous request
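If you track the conversation on your side instead, one approach is to append each completed exchange to the history before the next request. In the sketch below, append_turn and its optional max_messages trimming are assumptions, not documented API behavior. reply_text is the assistant's reply, which you would need to extract from the response; this page does not show the response's exact shape.

```python
from typing import Any, Optional


def append_turn(
    history: dict[str, Any],
    user_text: str,
    reply_text: str,
    max_messages: Optional[int] = None,
) -> dict[str, Any]:
    """Return a new conversation_history with one user/assistant exchange appended.

    The input history is not modified. If max_messages is set, only the most
    recent messages are kept (a hypothetical trimming policy).
    """
    messages = list(history.get("messages", []))  # copy so the caller's list is untouched
    messages.append(
        {"role": "user", "target": "assistant", "text": user_text, "additional_context": None}
    )
    messages.append(
        {"role": "assistant", "target": "user", "text": reply_text, "additional_context": None}
    )
    if max_messages is not None:
        # max(..., 0) keeps the slice correct when max_messages is 0 or exceeds the length.
        messages = messages[max(len(messages) - max_messages, 0):]
    return {"messages": messages}


history = {"messages": []}
history = append_turn(
    history,
    "Ave Caesar, morituri te salutant.",
    "Ave my general, let me know that your resolve is firm.",
)
# Next request: payload["conversation_history"] = history
```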

Best Practices

1. Prompt Length

  • Keep prompts clear and concise

  • Focus on the most important aspects of the conversation

  • Avoid overly complex or contradictory statements

2. Natural Language

  • Use natural language rather than technical terms

  • Write prompts as you would speak to another person

  • Be clear about what you want to discuss

3. Testing

  • Experiment with different prompt phrasings

  • Start with simple prompts and iterate

  • Save successful prompts for reuse with similar content

For troubleshooting common issues, refer to our Troubleshooting Guide.