Author Topic: How to get consecutive voice clips to sound natural  (Read 703 times)
Foolish Mortals
Level 0

« on: September 26, 2019, 11:35:58 AM »

In the game I'm working on I have to play several short consecutive voice-clips to form a complete sentence. Example (each <> bracket is a different voice clip):

"<Bob here,> <we're at>  <some town> <and are on our way to> <some city>."

Stitching different voice clips together like this sounds stilted and disconnected, because there are unnatural pauses when switching clips and the speaker's pitch and tone change from one clip to the next.

My current efforts include two methods for removing the unnatural pauses:

1. starting the next clip early if a silence is detected at the end of the preceding clip
2. skipping the first few milliseconds of the new clip up to the first detected 'sound'.

These work OK at removing the unnatural pauses, but deciding what counts as 'silence' is difficult, especially when dealing with multiple voice actors and microphones.
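One possible way to make the silence check less dependent on the actor and microphone is to measure silence relative to each clip's own loudest frame rather than against a fixed level. The sketch below is language-agnostic logic written in Python, not Unity code; the function names, the 256-sample frame length and the -35 dB threshold are all assumptions chosen for illustration and would need tuning by ear.

```python
import math

def frame_rms(samples, frame_len):
    """Return the RMS level of each consecutive frame of `frame_len` samples."""
    levels = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return levels

def speech_bounds(samples, frame_len=256, rel_threshold_db=-35.0):
    """Return (first, last) sample indices bounding the audible part of a clip.

    A frame counts as sound when its RMS is within `rel_threshold_db` of the
    loudest frame in the same clip, so the cutoff follows each recording's
    own level instead of a single global value (assumed values, tune by ear).
    """
    levels = frame_rms(samples, frame_len)
    peak = max(levels, default=0.0)
    if peak == 0.0:
        return 0, 0  # empty or entirely silent clip
    cutoff = peak * 10 ** (rel_threshold_db / 20.0)
    loud = [i for i, level in enumerate(levels) if level >= cutoff]
    first = loud[0] * frame_len
    last = min(len(samples), (loud[-1] + 1) * frame_len)
    return first, last
```

With these bounds, `first` is the number of samples to skip at the start of a clip (method 2), and `len(samples) - last` is how many samples early the following clip could be started (method 1). The bounds could be computed once when a clip is loaded and then cached.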

How could I make stitching together voice-clips sound more natural? Any advice would be appreciated. This has to be done in real-time inside the game (I'm using Unity if that matters), and can't be pre-processed or done ahead of time.

Level 3

Iron Synth Chef & Voltage Architect

« Reply #1 on: September 30, 2019, 04:40:25 PM »

Interesting question...
Are you stuck with the current audio recordings?
I could imagine that if the lines are recorded/spoken in a certain way, with deliberate pauses etc., it could sound natural, but unfortunately I don't have any advice beyond that.

Richard Kain
Level 10

« Reply #2 on: October 15, 2019, 10:22:41 AM »

The tone isn't something that you can really edit extensively. But the pitch and the timing are both manageable. Also, the delay you are experiencing when switching samples might be due to decompression. For speech that needs to avoid delays it is common to store those files as WAV files, with as little compression as possible. That is the easiest way to avoid delays when loading an audio file up for playback.

If you record your audio samples with a relatively normal pitch, you can use an "average" pitch value to apply to those samples during playback. Most modern audio systems allow you to adjust the pitch of audio samples at run-time. If there is too much variance between the pitch of each file, you could also apply a per-sample pitch adjustment. This would require a bit more setup for each instance, but should improve your results.
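As a rough illustration of the per-sample adjustment, the sketch below turns a measured average pitch for each clip into a playback-rate multiplier that nudges every clip toward the group average. It is written in Python as language-agnostic logic; the clip names, the pitch values in Hz and the 6% clamp are hypothetical, and the pitches are assumed to have been measured some other way. In Unity the result could be assigned to `AudioSource.pitch`, keeping in mind that this also changes playback speed.

```python
def pitch_multipliers(clip_pitches_hz, max_shift=0.06):
    """Map each clip's measured average pitch to a playback-rate multiplier.

    The target is the mean pitch of all clips. Each multiplier is clamped to
    1 +/- max_shift (an assumed limit) so that no clip is sped up or slowed
    down by more than that fraction.
    """
    target = sum(clip_pitches_hz.values()) / len(clip_pitches_hz)
    multipliers = {}
    for name, pitch in clip_pitches_hz.items():
        ratio = target / pitch
        multipliers[name] = max(1.0 - max_shift, min(1.0 + max_shift, ratio))
    return multipliers

# Hypothetical measurements for three clips of the example sentence.
example = pitch_multipliers({"bob_here": 100.0, "were_at": 110.0, "some_town": 120.0})
```

Clamping is a design choice: a clip far from the average is only partly corrected, which trades pitch consistency against audible speed changes.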
Level 1

« Reply #3 on: April 06, 2020, 06:58:02 AM »


I gather you already have everything recorded? I only ask because the best way to deal with things like this is at the source: working with the voice actor to make natural phrases that can easily be switched around. It's not uncommon in game VO.
For instance, you could smooth the gap between two files by incorporating a connective 'err...' or '*cough*'. Not too over-the-top or too often, or it'll sound unnatural.
Another common way to get around this is by using DSP processing or other sound effects to break up the dialog. Very easy if the VO is delivered through a radio, just use static.

Ultimately human ears are very good at working out if speech sounds natural or not. But not perfect. I've done lots of work as a dialog editor, and it's staggering how much splicing and pasting you can get away with. But it does have to be on a case-by-case basis. No two sentences have the same rhythm, no two speakers have the same intonation, even if it's the same person delivering the same line.

What you could do is put the phrase together in a DAW like Reaper or an editing tool like Audition and craft the most natural sounding clip of it. Then re-export the separate files to be played back in-engine. Personally I find a spectrogram to be far superior to a waveform when editing speech.

Hope that helps a bit
Level 0

« Reply #4 on: May 13, 2020, 10:11:18 AM »

If not already mentioned, try to silence any breath or inhalation sounds when recording, before saving out the final clips of voice audio.