Speech-to-Speech
Overview
Speech-to-Speech (STS) allows you to convert existing audio of someone speaking into speech using a Replica voice. This is a three-step process:
1. Create a speech job
2. Upload your source audio
3. Start the speech generation
Prerequisites
Before starting, you'll need:
- A valid API key
- A Performer Style ID (see Voice Selection)
- Source audio of someone speaking (supported formats: WAV, MP3, OGG)
Step 1: Create Speech Job
First, you need to obtain an upload URL. This is done through a POST request to create a speech job:
import requests

url = "https://api.replicastudios.com/v2/speech/sts"

# Initial payload to get upload URL
payload = {
    "performer_style_id": "07e62901-72c4-46e5-b009-aa0938d749df",  # Replace with your chosen Performer Style ID
    "model_chain": "vox_1_0",
}
headers = {
    "Content-Type": "application/json",
    "X-Api-Key": "..."  # Replace with your API key
}

# Get the upload URL
response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # Stop here if the request was rejected
upload_data = response.json()
s3_data = upload_data["additional_fields"]["upload_url"]  # Upload URL plus form fields
job_id = upload_data["uuid"]  # Needed to start the job in Step 3
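The snippet above indexes the response directly, so a missing key surfaces as a bare KeyError. Below is a minimal sketch of a more defensive parse. It assumes the response shape used in this guide (`uuid`, plus `additional_fields.upload_url` holding `url` and `fields`). The helper name `parse_create_response` and the sample values are hypothetical and not part of the Replica API.

```python
def parse_create_response(job_data):
    """Return (job_id, upload_url, upload_fields) from a create-job response.

    Assumes the response shape shown in this guide. Raises ValueError with a
    descriptive message if an expected key is missing or has the wrong type.
    """
    try:
        job_id = job_data["uuid"]
        s3_data = job_data["additional_fields"]["upload_url"]
        return job_id, s3_data["url"], s3_data["fields"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"Unexpected create-job response ({exc!r})") from exc


# Example with a made-up response body (all values are placeholders)
sample = {
    "uuid": "job-123",
    "additional_fields": {
        "upload_url": {"url": "https://example.com/upload", "fields": {"key": "abc"}}
    },
}
job_id, upload_url, upload_fields = parse_create_response(sample)
```

In the real flow, `job_data` would be `response.json()` from the create request.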
Step 2: Upload Source Audio
Now upload your audio file to the provided URL:
# Upload the audio file
with open("source_audio.wav", "rb") as audio_file:
    upload_response = requests.post(
        s3_data["url"],
        data=s3_data["fields"],
        files={"file": audio_file},
    )

# Treat any 2xx status as success: presigned form uploads
# may answer 204 No Content rather than 200
if not upload_response.ok:
    print("Upload failed:", upload_response.text)
Step 3: Start Speech Generation
Once your audio is uploaded, you can start the speech generation:
# Start the speech generation
start_url = f"https://api.replicastudios.com/v2/speech/sts/{job_id}/start"
start_response = requests.post(start_url, headers=headers)

if start_response.status_code == 200:
    result = start_response.json()
    print("Job Status:", result["state"])  # IN_PROGRESS
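Because a started job reports IN_PROGRESS, the caller has to poll until the state changes. The following is a minimal sketch of a polling loop with a timeout. The function name `wait_for_job`, its parameters, and the "DONE" state in the example are assumptions made for illustration. Only IN_PROGRESS and ERROR appear as state names in this guide.

```python
import time


def wait_for_job(fetch_job, timeout=300.0, interval=1.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_job() until the job leaves IN_PROGRESS or the timeout expires.

    fetch_job is any zero-argument callable that returns the job as a dict
    with a "state" key. sleep and clock are injectable to ease testing.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job()
        if job["state"] != "IN_PROGRESS":
            return job
        if clock() >= deadline:
            raise TimeoutError(f"Job still IN_PROGRESS after {timeout} seconds")
        sleep(interval)


# Example with a fake job that finishes on the third poll.
# "DONE" is a placeholder, not a documented state name.
_states = iter(["IN_PROGRESS", "IN_PROGRESS", "DONE"])
final = wait_for_job(lambda: {"state": next(_states)}, sleep=lambda seconds: None)
```

With requests, `fetch_job` could be `lambda: requests.get(f"https://api.replicastudios.com/v2/speech/{job_id}", headers=headers).json()`, which is the endpoint polled in the complete example on this page.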
Additional Parameters
You can customize the speech generation by including additional parameters in your initial payload:
{
    "performer_style_id": "07e62901-72c4-46e5-b009-aa0938d749df",
    "model_chain": "vox_1_0",
    "global_pace": 1.2,
    "global_pitch": 0.5,
    "global_volume": 0.5
}
For more details on these parameters, see Global Controls.
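One way to assemble this payload in Python is to include an optional control only when the caller sets it, so the API's own defaults apply otherwise. This is a sketch only. The helper name `build_sts_payload` is hypothetical, and it does not validate value ranges because this page does not state them.

```python
def build_sts_payload(performer_style_id, model_chain="vox_1_0",
                      global_pace=None, global_pitch=None, global_volume=None):
    """Build the create-job payload, omitting optional controls left as None."""
    payload = {
        "performer_style_id": performer_style_id,
        "model_chain": model_chain,
    }
    optional = {
        "global_pace": global_pace,
        "global_pitch": global_pitch,
        "global_volume": global_volume,
    }
    # Keep only the controls the caller actually provided
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


payload = build_sts_payload("07e62901-72c4-46e5-b009-aa0938d749df", global_pace=1.2)
```

The result can be passed as `json=payload` in the create request from Step 1.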
Best Practices
1. Audio Quality
   - Use clear, high-quality recordings
   - Minimize background noise
2. Language Support
   - In STS, the language_code parameter serves mainly for project organization.
   - The language of the source audio is preserved in the conversion.
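Recording quality cannot be judged from metadata alone, but for the Audio Quality points it can help to look at a file's basic properties before uploading. The sketch below uses Python's standard `wave` module and works only for PCM WAV files. The function name `describe_wav` is hypothetical, and no thresholds are applied because this page does not specify any.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file as a dict."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

For example, `describe_wav("source_audio.wav")` returns the channel count, sample rate, bit depth, and duration of the source file.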
Limitations
- Audio is not directly limited by duration, but there is an upload limit of 50 MB per request.
- Supported input formats: WAV, MP3, OGG
- Audio should contain clear speech without background music or noise
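The size and format limits above can be checked locally before a job is created. The following is a rough sketch, not an official client feature. The name `check_source_audio` is hypothetical, the size limit is assumed to mean 50,000,000 bytes, and only the file extension is inspected, not the actual encoding.

```python
from pathlib import Path

# Assumption: "50 MB" is read as 50,000,000 bytes; the service's exact cutoff may differ
MAX_UPLOAD_BYTES = 50 * 1000 * 1000
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg"}


def check_source_audio(path):
    """Return a list of problems found with the file; an empty list means none."""
    path = Path(path)
    problems = []
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"Unsupported extension: {path.suffix or '(none)'}")
    if not path.is_file():
        problems.append(f"File not found: {path}")
    elif path.stat().st_size > MAX_UPLOAD_BYTES:
        problems.append(f"File is larger than {MAX_UPLOAD_BYTES} bytes")
    return problems
```

Calling `check_source_audio("source_audio.wav")` before Step 1 and stopping if the list is non-empty avoids creating a job whose upload would be rejected.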
Complete Example
Here's a complete example including error handling:
import requests
import time


def generate_speech_from_audio(audio_file_path, performer_style_id, api_key):
    base_url = "https://api.replicastudios.com/v2"
    headers = {
        "Content-Type": "application/json",
        "X-Api-Key": api_key
    }

    # Step 1: Create the speech job / get upload URL
    try:
        create_response = requests.post(
            f"{base_url}/speech/sts",
            headers=headers,
            json={
                "performer_style_id": performer_style_id,
                "model_chain": "vox_1_0"
            },
        )
        create_response.raise_for_status()
        job_data = create_response.json()
        s3_data = job_data["additional_fields"]["upload_url"]
        job_id = job_data["uuid"]

        # Step 2: Upload audio file
        with open(audio_file_path, "rb") as audio_file:
            upload_response = requests.post(
                s3_data['url'],
                data=s3_data['fields'],
                files={'file': audio_file}
            )
        upload_response.raise_for_status()

        # Step 3: Start generation
        start_response = requests.post(
            f"{base_url}/speech/sts/{job_id}/start",
            headers=headers,
            json={
                "extensions": ["wav"]
            }
        )
        start_response.raise_for_status()
        result = start_response.json()
        print(result)
        if result["state"] == "ERROR":
            print(f"Error during speech generation: {result['additional_fields']['error']}")
            return None

        # Poll for the completion of the job
        while result["state"] == "IN_PROGRESS":
            time.sleep(1)  # Pause between polls to avoid flooding the API
            poll_response = requests.get(f"{base_url}/speech/{job_id}", headers=headers)
            poll_response.raise_for_status()
            result = poll_response.json()

        # None if the finished job carries no URL (for example, if it ended in ERROR)
        return result.get("url")
    except requests.exceptions.RequestException as e:
        print(f"Error during speech generation: {e}")
        return None

# Usage
api_key = "..."
performer_style_id = "07e62901-72c4-46e5-b009-aa0938d749df"
audio_file = "source_audio.wav"

result_url = generate_speech_from_audio(audio_file, performer_style_id, api_key)
if result_url:
    print(f"Generated audio available at: {result_url}")
else:
    print("Speech generation failed")
For troubleshooting common issues, refer to our Troubleshooting Guide.