Add text-to-speech

Text-to-speech is a form of assistive technology that converts text strings into speech sounds using an artificial voice. In addition to improving the accessibility of your experiences for players with vision, mobility, or cognitive disabilities, TTS allows you to generate speech dynamically so that you don't have to pre-record audio for all possible narrative scenarios.

Using the Gingerbread House - Start .rbxl file as a starting place and Gingerbread House - Text-to-Speech as a reference, this tutorial shows you how to add both basic and context-aware TTS audio to your experiences, including guidance on:

  • Triggering TTS for common gameplay scenarios that will never change, such as UI interactions and tutorials.
  • Configuring TTS so that it adapts to player actions, environmental status, or flexible objectives.

If at any point you become stuck in the process, you can use Gingerbread House - Text-to-Speech as a reference to compare your progress.

Audio Objects

To create TTS audio, it's important to understand the audio objects that you will be working with throughout this tutorial. There are five main types of audio objects for TTS:

  • The AudioTextToSpeech object converts text strings into speech sounds.
  • The AudioEmitter object is a virtual speaker that emits audio into the 3D environment.
  • The AudioListener object is a virtual microphone that picks up audio from the 3D environment.
  • The AudioDeviceOutput object represents a physical hardware device within the real world, such as a speaker or headphones.
  • Wires carry audio streams from one object to another.

All of these audio objects work together to emit TTS sound in response to player actions. Let's take a look at how this works in practice for 3D audio using an example of a player wearing a headset while playing an experience with their laptop:

  • The AudioTextToSpeech loads and converts text into audio whenever a player touches a part near a non-playable character (NPC).
  • The AudioEmitter emits a stream of the TTS audio from the NPC into the 3D environment.
  • A Wire carries the stream from the AudioTextToSpeech to the AudioEmitter so that the stream comes out of the NPC.
  • The character's child AudioListener object listens to that sound within the 3D environment and feeds it back to their headset.
  • The AudioDeviceOutput object carries the sound from the AudioListener to the player's physical speaker, or in this case, their headphones.
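
The first three links of this chain can also be assembled from a script rather than the Explorer. The following is a minimal sketch under stated assumptions, not the sample's method: buildSpeechChain and its create parameter are hypothetical names, and create exists only so the function can be exercised with stand-in tables outside Studio. Inside an experience it falls back to Instance.new.

```lua
-- Hypothetical helper: builds AudioTextToSpeech -> Wire -> AudioEmitter under `parent`.
-- `create` is an assumption added for testing; in Roblox it defaults to Instance.new.
local function buildSpeechChain(parent, text, create)
    create = create or Instance.new

    -- The speech generator that converts `text` into audio.
    local speech = create("AudioTextToSpeech")
    speech.Text = text
    speech.Parent = parent

    -- The virtual speaker that emits the audio into the 3D environment.
    local emitter = create("AudioEmitter")
    emitter.Parent = parent

    -- The wire carries the generated stream from the speech object to the emitter.
    local wire = create("Wire")
    wire.SourceInstance = speech
    wire.TargetInstance = emitter
    wire.Parent = parent

    return speech, emitter, wire
end
```

In a script, you might call buildSpeechChain(npcPart, "Hello!") with a reference to a part near the NPC, then call Play() on the returned AudioTextToSpeech. The listener and device output ends of the chain are not created by this sketch.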

The following sections dive deeper and reference these objects as you learn how to play both basic and context-aware audio. As you review these objects with the upcoming techniques, you can more accurately predict how to capture and feed sound from the experience to the player.

Basic TTS

Basic TTS is the most common form of text-to-speech, in which the artificial voice reads a text string regardless of player or environment context. This means that whenever the player triggers the TTS audio, the words and the way that the artificial voice reads them remain consistent no matter the state of the player, their actions, or environmental status.

This form of TTS is useful in most gameplay scenarios, such as players interacting with UI menus, tutorials, or routine NPC interactions like vendor offerings or enemy barks. Roblox supplies the following types of voices that you can experiment with for any of these interactions:

VoiceId  Voice description
1        British male
2        British female
3        United States male #1
4        United States female #1
5        United States male #2
6        United States female #2
7        Australian male
8        Australian female
9        Retro voice #1
10       Retro voice #2
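
If you choose voices from a script, a small lookup table can stand in for bare numbers. This sketch assumes only the IDs listed above; VOICE_DESCRIPTIONS and findVoiceId are hypothetical names, not part of the Roblox API.

```lua
-- Voice IDs and descriptions as listed in this tutorial.
local VOICE_DESCRIPTIONS = {
    [1] = "British male",
    [2] = "British female",
    [3] = "United States male #1",
    [4] = "United States female #1",
    [5] = "United States male #2",
    [6] = "United States female #2",
    [7] = "Australian male",
    [8] = "Australian female",
    [9] = "Retro voice #1",
    [10] = "Retro voice #2",
}

-- Hypothetical helper: returns the ID whose description matches, or nil if none does.
local function findVoiceId(description)
    for id, name in pairs(VOICE_DESCRIPTIONS) do
        if name == description then
            return id
        end
    end
    return nil
end

print(findVoiceId("British female")) -- 2
```

Before assigning the result to an AudioTextToSpeech object, check the API reference for the value type that its VoiceId property expects, and convert with tostring() if needed.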

To recreate the basic 3D TTS audio in the sample Gingerbread House - Text-to-Speech place file:

  1. Enable a default listener that's attached to your player character.
    1. In the Explorer window, select the SoundService.
    2. In the Properties window, set DefaultListenerLocation to Character. When you run the experience, the engine automatically:
      • Creates an AudioListener under each player character's Humanoid.RootPart so that you can hear sounds shift in your real-world speakers according to the position and scale of sound sources within the experience.
      • Creates an AudioDeviceOutput under SoundService.
  2. In the Explorer window, navigate to Workspace > DialogueVolume, then:
    1. Insert an AudioTextToSpeech object to create an audio speech generator for the volume around the snowman.
    2. Insert an AudioEmitter object to emit a positional stream from DialogueVolume.
    3. Insert a Wire object to carry the stream from the audio speech generator to the audio emitter.
  3. Select the AudioTextToSpeech object, then in the Properties window:
    1. Set Text to "Collect every single last gumdrop to open my home!"
    2. Set VoiceId to 2 to set the artificial voice to emulate a British female.
    3. Set the Volume to 3 to play the audio at a high volume so you hear the TTS sound over other audio sources within the experience.
  4. Select the Wire, then in the Properties window:
    1. Set SourceInstance to your new AudioTextToSpeech to specify that you want the wire to carry audio from this specific audio speech generator.
    2. Set TargetInstance to your new AudioEmitter to specify that you want the wire to carry audio to this specific audio emitter within the volume.
  5. Back in the Explorer window, navigate to StarterPlayer > StarterCharacterScripts, then insert a LocalScript, rename it PlayBasicTTSAudioWhenInVolume, and paste the following code into the local script:

local Workspace = game:GetService("Workspace")
local Players = game:GetService("Players")

-- Wait for the character's Humanoid and Animator, and for the dialogue volume.
local humanoid = script.Parent:WaitForChild("Humanoid")
local volumeDetector = Workspace:WaitForChild("DialogueVolume")
local trigger = humanoid:WaitForChild("Animator")

local debounce = false
local localPlayer = Players.LocalPlayer

volumeDetector.Touched:Connect(function(hit)
    -- Ignore touches while the audio is still playing.
    if debounce then
        return
    end

    -- Only respond when the touching part belongs to the local player's character.
    local hitCharacter = hit:FindFirstAncestorWhichIsA("Model")
    local hitPlayer = Players:GetPlayerFromCharacter(hitCharacter)
    if hitPlayer ~= localPlayer then
        return
    end

    debounce = true
    local audioTextToSpeech = volumeDetector.AudioTextToSpeech
    audioTextToSpeech:Play()
    audioTextToSpeech.Ended:Wait()
    debounce = false
end)

This script starts by getting the Workspace and Players services so it can reference their children and functionality. For each player character that loads or respawns back into the experience, the script waits for:

  • The character's Humanoid and Animator objects.
  • The volume object in the workspace named DialogueVolume.

When anything collides with the volume, the Touched event handler function first returns early if debounce is already true. Otherwise, it gets the first ancestor of the colliding BasePart that's a Model, which is the character model when the part belongs to a character. If that character belongs to the local player, the function then:

  • Sets debounce to true.
  • Plays and waits for the TTS audio to end.
  • Sets debounce back to false.

Setting debounce from false to true to false again after the basic TTS audio finishes playing is a debounce pattern that prevents the audio from repeatedly triggering as players continuously collide with the volume. For more information on this debounce pattern, see Debounce - Detect collisions.
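
The debounce pattern can be tried outside Studio. The following is a minimal sketch in plain Lua, where makeDebounced is a hypothetical wrapper and coroutines stand in for overlapping Touched events and for the wait on the Ended event.

```lua
-- Hypothetical wrapper: ignores calls that arrive while `action` is still running.
local function makeDebounced(action)
    local busy = false
    return function(...)
        if busy then
            return false -- a previous call has not finished yet
        end
        busy = true
        action(...) -- may yield, like audioTextToSpeech.Ended:Wait()
        busy = false
        return true
    end
end

-- Simulate overlapping touches with coroutines instead of engine events.
local plays = 0
local onTouched = makeDebounced(function()
    plays = plays + 1
    coroutine.yield() -- stands in for waiting on the Ended event
end)

local firstTouch = coroutine.wrap(onTouched)
firstTouch() -- starts "playing" and pauses at the yield
local secondResult = coroutine.wrap(onTouched)() -- arrives while busy, so it is ignored
firstTouch() -- the "audio" ends and the guard is released

print(plays, secondResult) -- plays is 1; the second touch returned false
```

If the wrapped action can raise an error, the flag stays set in this sketch, so a production version may want to reset it in a protected call.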

  6. Playtest the experience to hear the instructional character dialogue when your player character touches the volume around the snowman.

You can further experiment with this audio by setting the Text, VoiceId, Pitch, and Speed properties to new values. The generated speech changes accordingly, without the need to record and upload a new audio file for each scenario.

Context-aware TTS

Context-aware TTS is a more advanced form of text-to-speech in which the artificial voice reads a text string in relation to the player, the state of their environment, or gameplay status. This means that whenever the player triggers the TTS audio, the words and the way the artificial voice reads them adapt accordingly.

This form of TTS is useful for gameplay scenarios that change over time, such as directional audio cues, objective status, or unique NPC interactions. Because context-aware TTS must reflect the current state of the game to stay accurate, you must configure gameplay elements so that you can track their status as players navigate through the environment and complete gameplay objectives.

While there are many ways to accomplish this task, the sample uses custom attributes to track the color and location of each gumdrop that the player must collect in order to enter the gingerbread house. For more information on attributes, see Properties and attributes.
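
Attributes can be set in the Properties window or from a script. The following is a minimal sketch of the script route, assuming the attribute names and yellow gumdrop values used in this tutorial; applyAttributes and YELLOW_GUMDROP are hypothetical names.

```lua
-- Hypothetical helper: writes each name/value pair onto `instance` as a custom attribute.
local function applyAttributes(instance, attributes)
    for name, value in pairs(attributes) do
        instance:SetAttribute(name, value)
    end
end

-- Attribute values for the yellow gumdrop, as used in this tutorial.
local YELLOW_GUMDROP = {
    ColorDescription = "yellow",
    HintOrder = 0,
    LocationDescription = "by the waterfall",
}
```

In Studio, you could call applyAttributes(gumdrop, YELLOW_GUMDROP), where gumdrop is a reference to the yellow gumdrop part, and then read a value back with gumdrop:GetAttribute("ColorDescription").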


Each gumdrop object has attributes that describe its color and location in the environment.
[Images: close-up views of the yellow, green, and red gumdrops]

To recreate the context-aware 3D TTS audio in the sample Gingerbread House - Text-to-Speech place file:

  1. In the Explorer window, navigate to Workspace > Gumdrops.

  2. Configure three custom attributes to track the yellow gumdrop.

    1. Select the yellow gumdrop, then in the Properties window, navigate to the Attributes section, then click the plus icon. A pop-up dialog displays.
    2. In the Name field, input ColorDescription.
    3. In the Type dropdown menu, select string.
    4. Click the Save button.
    5. Set the new ColorDescription attribute to yellow.
    6. Using this process, create two more attributes using the following values.
    Name                 Type    Value
    HintOrder            number  0
    LocationDescription  string  by the waterfall
  3. Configure three custom attributes to track the green gumdrop.

    1. In the Explorer window, select the green gumdrop.
    2. In the Properties window, create three attributes using the following values.
    Name                 Type    Value
    ColorDescription     string  green
    HintOrder            number  1
    LocationDescription  string  on the ledge
  4. Configure three custom attributes to track the red gumdrop.

    1. In the Explorer window, select the red gumdrop.
    2. In the Properties window, create three attributes using the following values.
    Name                 Type    Value
    ColorDescription     string  red
    HintOrder            number  2
    LocationDescription  string  behind the fence
  5. In the Explorer window, navigate to Workspace > HintVolume, then:

    1. Insert an AudioTextToSpeech object to create an audio speech generator for the volume around the reindeer.
    2. Insert an AudioEmitter object to emit a positional stream from HintVolume.
    3. Insert a Wire object to carry the stream from the audio speech generator to the audio emitter.
  6. Select the Wire, then in the Properties window:

    1. Set SourceInstance to your new AudioTextToSpeech to specify that you want the wire to carry audio from this specific audio speech generator.
    2. Set TargetInstance to your new AudioEmitter to specify that you want the wire to carry audio to this specific audio emitter within the volume.
  7. Back in the Explorer window, navigate to StarterPlayer > StarterCharacterScripts, then insert a LocalScript, rename it PlayContextTTSAudioWhenInVolume, and paste the following code into the local script:


local Workspace = game:GetService("Workspace")
local Players = game:GetService("Players")

-- Wait for the character's Humanoid and Animator, and for the hint volume.
local humanoid = script.Parent:WaitForChild("Humanoid")
local volumeDetector = Workspace:WaitForChild("HintVolume")
local trigger = humanoid:WaitForChild("Animator")

local debounce = false
local localPlayer = Players.LocalPlayer

-- Returns the gumdrops whose Active attribute is set, sorted by their HintOrder attribute.
local function getRemainingGumdrops(): {Part}
    local gumdropsFolder = Workspace.Gumdrops
    local gumdrops = gumdropsFolder:GetChildren()
    local remainingGumdrops = {}
    for _, gumdrop in gumdrops do
        if gumdrop:GetAttribute("Active") then
            remainingGumdrops[#remainingGumdrops + 1] = gumdrop
        end
    end
    table.sort(remainingGumdrops, function(a, b)
        return a:GetAttribute("HintOrder") < b:GetAttribute("HintOrder")
    end)
    return remainingGumdrops
end

-- Builds the hint message from the sorted list of remaining gumdrops.
local function getReindeerHint(remainingGumdrops: {Part}): string
    if #remainingGumdrops == 0 then
        return "There are no gumdrops left. Check inside the house."
    end

    local nextGumdrop = remainingGumdrops[1]
    local colorDescription = nextGumdrop:GetAttribute("ColorDescription")
    local locationDescription = nextGumdrop:GetAttribute("LocationDescription")
    local message = #remainingGumdrops > 1
        and "There are " .. #remainingGumdrops .. " gumdrops left. Look for the " .. colorDescription .. " one " .. locationDescription .. "."
        or "There is one gumdrop left. It's " .. colorDescription .. " and it's " .. locationDescription .. "."
    return message
end

volumeDetector.Touched:Connect(function(hit)
    -- Ignore touches while the audio is still playing.
    if debounce then
        return
    end

    -- Only respond when the touching part belongs to the local player's character.
    local hitCharacter = hit:FindFirstAncestorWhichIsA("Model")
    local hitPlayer = Players:GetPlayerFromCharacter(hitCharacter)
    if hitPlayer ~= localPlayer then
        return
    end

    print("Player touched volume.")
    debounce = true

    -- Update the text before playing so that the speech reflects the current game state.
    local remainingGumdrops = getRemainingGumdrops()
    local message = getReindeerHint(remainingGumdrops)
    local audioTextToSpeech = volumeDetector.AudioTextToSpeech
    audioTextToSpeech.Text = message
    audioTextToSpeech:Play()
    audioTextToSpeech.Ended:Wait()
    debounce = false
end)

This script starts by getting the Workspace and Players services so it can reference their children and functionality. For each player character that loads or respawns back into the experience, the script waits for:

  • The character's Humanoid and Animator objects.
  • The volume object in the workspace named HintVolume.

The script's getRemainingGumdrops() function returns a list of gumdrop Part objects within the Gumdrops folder and filters out any part that the player has collected. The remaining gumdrops are sorted in a specific order according to each gumdrop's HintOrder attribute.

The script's getReindeerHint(remainingGumdrops) function takes that list from getRemainingGumdrops() and returns a string message that tells the player how many gumdrops remain, along with the color and location description of the next gumdrop to collect. If the player has already collected every gumdrop, the message instead tells them where to go next.
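
To see how the filtering, sorting, and message selection interact, the following is an engine-free sketch in plain Lua. It uses plain tables in place of Part objects. The field names (active, hintOrder, color, location) and the function names getRemaining and buildHint are assumptions that mirror the script's attributes and functions, not part of the sample.

```lua
-- Keeps only active gumdrops and sorts them by hint order.
local function getRemaining(gumdrops)
    local remaining = {}
    for _, gumdrop in ipairs(gumdrops) do
        if gumdrop.active then
            table.insert(remaining, gumdrop)
        end
    end
    table.sort(remaining, function(a, b)
        return a.hintOrder < b.hintOrder
    end)
    return remaining
end

-- Chooses one of three messages based on how many gumdrops remain.
local function buildHint(remaining)
    if #remaining == 0 then
        return "There are no gumdrops left. Check inside the house."
    end
    local nextGumdrop = remaining[1]
    if #remaining > 1 then
        return "There are " .. #remaining .. " gumdrops left. Look for the "
            .. nextGumdrop.color .. " one " .. nextGumdrop.location .. "."
    end
    return "There is one gumdrop left. It's " .. nextGumdrop.color
        .. " and it's " .. nextGumdrop.location .. "."
end

-- Stand-in data: the yellow gumdrop has already been collected.
local gumdrops = {
    { color = "red", location = "behind the fence", hintOrder = 2, active = true },
    { color = "yellow", location = "by the waterfall", hintOrder = 0, active = false },
    { color = "green", location = "on the ledge", hintOrder = 1, active = true },
}

print(buildHint(getRemaining(gumdrops)))
-- There are 2 gumdrops left. Look for the green one on the ledge.
```

Because the list is sorted before the message is built, the hint always points at the remaining gumdrop with the lowest hint order, regardless of the order in which the folder returns its children.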

When anything collides with the volume, the Touched event handler function first returns early if debounce is already true. Otherwise, it gets the first ancestor of the colliding BasePart that's a Model, which is the character model when the part belongs to a character. If that character belongs to the local player, the function then:

  • Sets debounce to true.
  • Gets the list of remaining gumdrops and the string message describing the next gumdrop the player needs to find.
  • Converts the string into audio using the AudioTextToSpeech object, plays the audio, and waits for the audio to end.
  • Sets debounce back to false.

Setting debounce from false to true to false again after the context-aware TTS audio finishes playing is a debounce pattern that prevents the audio from repeatedly triggering as players continuously collide with the volume. For more information on this debounce pattern, see Debounce - Detect collisions.

  8. Playtest the experience to hear contextual hints when your player character touches the volume around the reindeer. The TTS audio changes according to the number and color of gumdrops the player character has already found in the environment.