Create expressive speech by combining a clear transcript, a voice choice, and performance direction.
Write the Transcript
Enter the exact words you want Gemini TTS to speak, including speaker names when creating dialogue.
Choose a Voice
Select a prebuilt voice that fits the desired mood, such as bright, upbeat, firm, breathy, smooth, or warm.
Direct the Performance
Use natural language to describe the speaker profile, scene, tone, accent, pacing, and emotional delivery.
Generate and Refine
Preview the generated audio, then adjust the transcript or direction until the spoken result matches your intent.
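The four steps above can be sketched in code. This is a minimal illustration, not the Gemini TTS API: `SpeechRequest`, `build_prompt`, `dialogue`, and the voice name `example-voice` are hypothetical names introduced here, and placing the direction ahead of the transcript is an assumption about how the two might be combined into one prompt.

```python
from dataclasses import dataclass


@dataclass
class SpeechRequest:
    """Hypothetical container for the three inputs described above."""
    transcript: str  # exact words to be spoken
    voice: str       # name of a prebuilt voice (placeholder value below)
    direction: str   # natural-language performance direction


def dialogue(lines: list[tuple[str, str]]) -> str:
    """Format (speaker, text) pairs as 'Speaker: text' lines."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in lines)


def build_prompt(request: SpeechRequest) -> str:
    """Put the direction ahead of the transcript, separated by a blank line."""
    direction = request.direction.strip()
    transcript = request.transcript.strip()
    if not transcript:
        raise ValueError("transcript must not be empty")
    if not direction:
        # No direction given: the prompt is just the transcript.
        return transcript
    return f"{direction}\n\n{transcript}"


request = SpeechRequest(
    transcript=dialogue([
        ("Ana", "Did you hear the news?"),
        ("Ben", "Not yet. Tell me!"),
    ]),
    voice="example-voice",  # assumed placeholder, not a real voice name
    direction="Two friends whispering excitedly in a quiet library, quick pacing.",
)
prompt = build_prompt(request)
print(prompt)
```

The resulting `prompt` and `request.voice` would then be handed to whichever TTS client you use. That call is left out because its details depend on the SDK. To refine the result, change `direction` or `transcript` and rebuild the prompt.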
Traditional text-to-speech often reads text clearly but gives creators limited control over the performance behind the words.

Gemini TTS is designed to recite text exactly while giving you fine-grained, natural-language control over style, sound, and dialogue delivery.

Use Gemini TTS when you need accurate script reading plus richer control over how the speech should feel.