Speech-to-Speech

Overview

Speech-to-Speech (STS) allows you to convert existing audio of someone speaking into speech using a Replica voice. This is a three-step process:

1. Create a speech job

2. Upload your source audio

3. Start the speech generation

Prerequisites

Before starting, you'll need:

  • A valid API key

  • A Performer Style ID (see Voice Selection)

  • Source audio of someone speaking (supported formats: WAV, MP3, OGG)

Step 1: Create Speech Job

First, create a speech job with a POST request. The response contains the job ID and the details needed to upload your source audio:

import requests

url = "https://api.replicastudios.com/v2/speech/sts"

# Initial payload to get upload URL
payload = {
    "performer_style_id": "07e62901-72c4-46e5-b009-aa0938d749df",  # Replace with your chosen Performer Style ID
    "model_chain": "vox_1_0",
}

headers = {
    "Content-Type": "application/json",
    "X-Api-Key": "..."  # Replace with your API key
}

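# Create the job; the response carries the job ID and the upload details
response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # Stop early on HTTP errors
upload_data = response.json()
s3_data = upload_data["additional_fields"]["upload_url"]  # Contains "url" and "fields"
job_id = upload_data["uuid"]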
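
If the request is rejected, the response body may not have the shape shown above, and the key lookups would fail with a bare KeyError. The sketch below wraps the lookups in a helper that raises a clearer error instead. The helper name parse_create_response and the sample response are illustrative assumptions; only the keys uuid, additional_fields, upload_url, url, and fields come from the example above.

```python
def parse_create_response(upload_data):
    """Return (job_id, upload_url, upload_fields) from a create-job response."""
    try:
        s3_data = upload_data["additional_fields"]["upload_url"]
        return upload_data["uuid"], s3_data["url"], s3_data["fields"]
    except (KeyError, TypeError) as exc:
        # A missing key or a null field means the body is not a job description
        raise ValueError(f"Unexpected create-job response: {upload_data!r}") from exc


# Made-up response body with the same shape as the one used above
sample = {
    "uuid": "job-123",
    "additional_fields": {
        "upload_url": {"url": "https://example.com/upload", "fields": {"key": "abc"}}
    },
}
job_id, upload_url, upload_fields = parse_create_response(sample)
```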

Step 2: Upload Source Audio

Now upload your audio file to the provided URL:

# Upload the audio file
with open("source_audio.wav", "rb") as audio_file:
    upload_response = requests.post(
        s3_data['url'],
        data=s3_data['fields'],
        files={'file': audio_file}
    )

# A successful form upload may return 200, 201, or 204, so accept any 2xx status
if not upload_response.ok:
    print("Upload failed:", upload_response.status_code, upload_response.text)

Step 3: Start Speech Generation

Once your audio is uploaded, you can start the speech generation:

# Start the speech generation
start_url = f"https://api.replicastudios.com/v2/speech/sts/{job_id}/start"
start_response = requests.post(start_url, headers=headers)

if start_response.status_code == 200:
    result = start_response.json()
    print("Job Status:", result["state"])  # IN_PROGRESS
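
Starting the job only reports that it is in progress, so the caller still has to wait for it to finish. A minimal polling sketch follows, assuming the job status can be fetched as in the Complete Example (GET /v2/speech/{job_id}). The function name wait_for_job and the interval and timeout defaults are illustrative assumptions, not part of the API.

```python
import time


def wait_for_job(fetch_job, interval=1.0, timeout=300.0):
    """Call fetch_job() until the job leaves IN_PROGRESS or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        if job["state"] != "IN_PROGRESS":
            return job  # Finished or failed; the caller inspects job["state"]
        if time.monotonic() >= deadline:
            raise TimeoutError("Speech job did not finish before the timeout")
        time.sleep(interval)


# Possible usage with the variables from the steps above:
# job = wait_for_job(
#     lambda: requests.get(
#         f"https://api.replicastudios.com/v2/speech/{job_id}", headers=headers
#     ).json()
# )
```

Passing the fetch step in as a callable keeps the loop independent of the HTTP client and makes the timeout behaviour easy to exercise without network access.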

Additional Parameters

You can customize the speech generation by including additional parameters in your initial payload:

{
  "performer_style_id": "07e62901-72c4-46e5-b009-aa0938d749df",
  "model_chain": "vox_1_0",
  "global_pace": 1.2,
  "global_pitch": 0.5,
  "global_volume": 0.5
}

For more details on these parameters, see Global Controls.
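
When only some of these controls are needed, the payload can be assembled so that unset options are left out of the request. The sketch below assumes that omitted parameters fall back to the API defaults; the function name build_sts_payload is hypothetical, and no value ranges are checked here because they are documented in Global Controls.

```python
def build_sts_payload(performer_style_id, model_chain="vox_1_0",
                      global_pace=None, global_pitch=None, global_volume=None):
    """Build the create-job payload, including only the controls that were set."""
    payload = {
        "performer_style_id": performer_style_id,
        "model_chain": model_chain,
    }
    optional = {
        "global_pace": global_pace,
        "global_pitch": global_pitch,
        "global_volume": global_volume,
    }
    # Drop unset options so the request carries only explicit choices
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


payload = build_sts_payload("07e62901-72c4-46e5-b009-aa0938d749df", global_pace=1.2)
```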

Best Practices

1. Audio Quality

  • Use clear, high-quality recordings

  • Minimize background noise

2. Language Support

  • In STS, the language_code parameter is used mainly for project organization.

  • The language of the source audio will be maintained in the conversion.

Limitations

  • There is no direct limit on audio duration, but each upload is limited to 50MB.

  • Supported input formats: WAV, MP3, OGG

  • Audio should contain clear speech without background music or noise
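
The format and size limits above can be checked locally before a job is created. This is a minimal sketch: it judges the format by file extension only, and it assumes that 50MB means 50 × 1024 × 1024 bytes. The function name check_source_audio and both constants are hypothetical.

```python
import os

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # Assumption: "50MB" read as 50 MiB
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg"}


def check_source_audio(path):
    """Return a list of problems; an empty list means the file passes these checks."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the 50MB upload limit")
    return problems
```

A caller could skip job creation whenever the returned list is non-empty. The check does not inspect the audio itself, so it cannot detect background noise or a mislabeled file.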

Complete Example

Here's a complete example including error handling:

import requests
import time

def generate_speech_from_audio(audio_file_path, performer_style_id, api_key):
    base_url = "https://api.replicastudios.com/v2"
    headers = {
        "Content-Type": "application/json",
        "X-Api-Key": api_key
    }
    
    # Step 1: Create the speech job / get upload URL
    try:
        create_response = requests.post(
            f"{base_url}/speech/sts",
            headers=headers,
            json={
                "performer_style_id": performer_style_id,
                "model_chain": "vox_1_0"
            },
        )
        create_response.raise_for_status()
        
        job_data = create_response.json()
        s3_data = job_data["additional_fields"]["upload_url"]
        job_id = job_data["uuid"]
        
        # Step 2: Upload audio file
        with open(audio_file_path, "rb") as audio_file:
            upload_response = requests.post(
                s3_data['url'],
                data=s3_data['fields'],
                files={'file': audio_file}
            )
            upload_response.raise_for_status()
        
        # Step 3: Start generation
        start_response = requests.post(
            f"{base_url}/speech/sts/{job_id}/start",
            headers=headers,
            json={
                "extensions": ["wav"]
            }
        )
        start_response.raise_for_status()
        
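        result = start_response.json()

        # Poll until the job leaves the IN_PROGRESS state
        while result["state"] == "IN_PROGRESS":
            time.sleep(1)  # Pause between polls to avoid flooding the API
            poll_response = requests.get(f"{base_url}/speech/{job_id}", headers=headers)
            poll_response.raise_for_status()
            result = poll_response.json()

        # The job can fail at start or during processing, so check after polling
        if result["state"] == "ERROR":
            error = result.get("additional_fields", {}).get("error")
            print(f"Error during speech generation: {error}")
            return None

        return result["url"]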
        
    except requests.exceptions.RequestException as e:
        print(f"Error during speech generation: {e}")
        return None

# Usage
api_key = "..."
performer_style_id = "07e62901-72c4-46e5-b009-aa0938d749df"
audio_file = "source_audio.wav"

result_url = generate_speech_from_audio(audio_file, performer_style_id, api_key)
if result_url:
    print(f"Generated audio available at: {result_url}")
else:
    print("Speech generation failed")

For troubleshooting common issues, refer to our Troubleshooting Guide.