Course:Phys341 2020/Voice source-filter

From UBC Wiki

Perceiving Sounds

Sound is what we typically perceive when pressure fluctuates in the surrounding air (or another acoustic medium, such as water or helium gas). Some properties of sound depend on the acoustic medium, such as how quickly pressure fluctuations travel through it and how resistant it is to such fluctuations. In other words, sound is perceived when pressure fluctuations impinge on the eardrum and are converted into neural impulses that send a signal to the brain.

An acoustic waveform is a record of sound-producing pressure fluctuations over time.

Sound produced at one place sets up a sound wave that travels through the acoustic medium. The pressure fluctuation propagates through any medium that is elastic enough to allow molecules to crowd together and move apart, and it travels as alternating waves of compression and rarefaction.

Types of Sounds

Simple Periodic Waves

Also known as sine waves, these result from simple harmonic motion. Simple periodic waves rarely occur in natural speech, although vocal cord vibration can come close to sinusoidal. The complex sounds that occur in speech are combinations of sine waves that add up to complex waves.

Complex Periodic Waves (What We See in Natural Speech)

Like sine waves, these have a repeating pattern, but they are composed of at least two sine waves. The rate at which the complex pattern repeats is called the fundamental frequency. Sometimes the frequencies of the component waves can be read directly off the graph, but typically a Fast Fourier Transform is used to find the sine wave components that could have produced the complex wave.
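The decomposition described above can be sketched in a few lines. The example below is only an illustration: it builds a complex wave from two assumed components (100 Hz and 300 Hz, chosen arbitrarily) and recovers them with a direct discrete Fourier transform, which gives the same result as a Fast Fourier Transform but more slowly. The function names and the sample rate are assumptions made for this sketch.

```python
import cmath
import math

def complex_wave(freqs_amps, sample_rate, n_samples):
    """Sum of sine waves; freqs_amps is a list of (frequency_hz, amplitude)."""
    return [
        sum(a * math.sin(2 * math.pi * f * n / sample_rate) for f, a in freqs_amps)
        for n in range(n_samples)
    ]

def dft_amplitudes(samples, sample_rate):
    """Return {frequency_hz: amplitude} for the bins below the Nyquist frequency."""
    n = len(samples)
    spectrum = {}
    for k in range(n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        spectrum[k * sample_rate / n] = 2 * abs(coeff) / n
    return spectrum

sample_rate = 1000  # samples per second (assumed)
n_samples = 100     # 0.1 s of signal, so the frequency bins are 10 Hz apart

# Complex periodic wave: a 100 Hz component plus a weaker 300 Hz component.
wave = complex_wave([(100, 1.0), (300, 0.5)], sample_rate, n_samples)

# Keep only the bins that carry a noticeable amount of energy.
spectrum = dft_amplitudes(wave, sample_rate)
components = {f: round(a, 3) for f, a in spectrum.items() if a > 0.01}
print(components)
```

The combined pattern in this sketch repeats 100 times per second, so its fundamental frequency is 100 Hz, and the analysis returns the two sine waves that were added together.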

Aperiodic Waves (In Some Speech Sounds)

They do not have a repeating pattern. An aperiodic sound can be white noise, which has a random waveform, or a transient.

White noise is like radio static or like wind blowing through trees. It is very similar to a type of speech sound that we call fricatives.

Transients are sudden clanks and bursts that produce a sudden pressure fluctuation that is neither repeated nor sustained over time (e.g. door slams and balloon pops).

Acoustic Filters

Acoustic filters block or pass the components of a sound according to their frequencies.

Low-pass Filters

A low-pass filter blocks the high-frequency components of a wave and lets through the low-frequency components.

High-pass Filters

A high-pass filter blocks the low-frequency components of a wave and lets through the high-frequency components.

Band-pass Filters

Band-pass filters model some aspects of articulation and hearing. A band-pass filter has two cutoff frequencies (one at the low end and one at the high end) and lets through the components that lie between them.
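The three filter types can be compared with a small sketch. It treats a sound as a table of frequency components and applies idealized filters that pass or block each component outright; real acoustic filters attenuate gradually around the cutoff. The function names, cutoff values, and the example spectrum are all assumptions made for illustration.

```python
def low_pass(components, cutoff_hz):
    """Keep components at or below the cutoff frequency."""
    return {f: a for f, a in components.items() if f <= cutoff_hz}

def high_pass(components, cutoff_hz):
    """Keep components at or above the cutoff frequency."""
    return {f: a for f, a in components.items() if f >= cutoff_hz}

def band_pass(components, low_cutoff_hz, high_cutoff_hz):
    """Keep components between the two cutoff frequencies."""
    return {f: a for f, a in components.items()
            if low_cutoff_hz <= f <= high_cutoff_hz}

# Hypothetical sound: frequency in Hz mapped to amplitude.
spectrum = {100: 1.0, 500: 0.8, 1500: 0.6, 4000: 0.3}

low = low_pass(spectrum, 1000)         # passes 100 Hz and 500 Hz
high = high_pass(spectrum, 1000)       # passes 1500 Hz and 4000 Hz
band = band_pass(spectrum, 300, 2000)  # passes 500 Hz and 1500 Hz
print(low, high, band)
```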

The Aerodynamics of Speech

Figure 1. A mid-sagittal view of the human head illustrating the parts that are involved in articulating speech. The processes involved in the production of speech are also depicted in the diagram.

The air in the vocal tract is subjected to relatively large-scale and relatively slow pressure changes. This air occupies a certain volume, can be at a range of pressures, and can be static or moving with a particular velocity and volume velocity.

Figure 1 is a diagram of the human head showing the parts that are responsible for the production of speech.

Vocal Tract Pressures

The maximum vocal tract pressure can be generated only when expiration is attempted against resistance. Possible pressures range from -100 to +160 cm H2O.

In order to keep vocal folds vibrating, subglottal pressure (pressure below the glottis, initiated in the lungs) has to be at least 2 cm H2O greater than supraglottal pressure.
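The threshold above amounts to a simple comparison, sketched below. The function name and the example pressures are hypothetical; only the 2 cm H2O minimum difference comes from the text.

```python
MIN_PRESSURE_DROP_CM_H2O = 2.0  # minimum subglottal excess stated in the text

def can_sustain_voicing(subglottal_cm_h2o, supraglottal_cm_h2o):
    """True if the pressure drop across the glottis is large enough for vibration."""
    return subglottal_cm_h2o - supraglottal_cm_h2o >= MIN_PRESSURE_DROP_CM_H2O

# Hypothetical values: an open vocal tract versus one where pressure has
# built up above the glottis.
open_tract = can_sustain_voicing(8.0, 0.0)
built_up = can_sustain_voicing(8.0, 7.0)
print(open_tract, built_up)
```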

Pressure and Volume

According to Boyle's law, the pressure and volume of a fixed amount of gas at constant temperature have an inverse relationship. The greater the volume, the lower the pressure, and vice versa.

Phonetic pressures are usually stated as the excess over atmospheric pressure, which is taken to be around 1030 cm H2O.
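A short worked example may help connect Boyle's law to pressures stated as an excess over atmospheric pressure. The sketch below assumes a closed volume of air that starts at atmospheric pressure and is compressed by 1%; the volumes and function names are hypothetical, and only the 1030 cm H2O figure comes from the text.

```python
ATMOSPHERIC_CM_H2O = 1030.0  # atmospheric pressure as given in the text

def pressure_after_volume_change(p1_abs, v1, v2):
    """Boyle's law: p1 * v1 = p2 * v2, using absolute pressures."""
    return p1_abs * v1 / v2

def excess_pressure(p_abs):
    """Express an absolute pressure as the excess over atmospheric pressure."""
    return p_abs - ATMOSPHERIC_CM_H2O

v1 = 3000.0  # initial volume in cm^3 (hypothetical)
v2 = 2970.0  # volume after a 1% compression

p2 = pressure_after_volume_change(ATMOSPHERIC_CM_H2O, v1, v2)
excess = excess_pressure(p2)
print(round(p2, 1), round(excess, 1))
```

In this sketch a 1% reduction in volume raises the pressure by roughly 10 cm H2O above atmospheric, which is why small volume changes are enough to drive speech.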

Source-Filter Theory and Tube Models

Source-Filter Theory of Speech Production

The source-filter theory states that speech is produced in two stages: a sound source is generated at the glottis, and the vocal tract then filters that signal into the speech sounds we perceive in everyday life.

The frequency at which the vocal folds vibrate is called the fundamental frequency (f0), which relates to the perceived pitch of one's voice.

Figure 2. This is the single tube model that is used for vowel production. The smaller tube at the back is from the glottis to the back of the oral cavity, with length labelled as lb. The bigger tube in the front is from the back of the mouth to the opening of the lips, with its length labelled as lf.

Single Tube Model

This tube model is primarily used for vowel production. It shows that the vocal tract has resonant frequencies just as a tube does, under the assumption that the cross-sectional area of the vocal tract is uniform from the glottis to the lips.

As Figure 2 shows, a closed-open tube is used to model vowel production. The tube is closed at the glottis; air pushed out from the lungs is filtered by the vocal tract and exits at the lips (the open end).

The resonant frequencies, which are also called formants, can be calculated with the formula Fn = (2n-1)c/(4L), where n is the formant number, c is the speed of sound, and L is the length of the tube. The formants correspond to the quality of the vowel produced, and the lowest three formants distinguish different vowels from each other.
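The formula can be evaluated directly. The sketch below assumes a tube length of 17.5 cm and a speed of sound of 35,000 cm/s; both are illustrative values rather than figures given on this page, and the function name is hypothetical.

```python
def tube_resonances(length_cm, n_formants=3, speed_of_sound_cm_s=35000.0):
    """Resonances of a closed-open tube: Fn = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4 * length_cm)
        for n in range(1, n_formants + 1)
    ]

# Assumed vocal tract length of 17.5 cm.
formants = tube_resonances(17.5)
print(formants)  # [500.0, 1500.0, 2500.0]

# A shorter tube (assumed 14 cm) has higher resonances.
shorter = tube_resonances(14.0)
print(shorter)   # [625.0, 1875.0, 3125.0]
```

Because L is in the denominator, a shorter tube raises every formant by the same factor.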

Vocal fold vibration supplies pulses of acoustic energy, which cause the air to vibrate differently in different parts of the cavity. However, the sound source and the filter are independent of each other: the formant frequencies depend on the shape of the vocal tract, while pitch depends on the rate of vibration of the vocal folds, so resonance is independent of pitch.
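The independence of source and filter can be illustrated with a small sketch. The source contributes harmonics at whole-number multiples of f0, while the filter's resonances are computed from the tube length alone. The f0 values, the 17.5 cm tube length, the 35,000 cm/s speed of sound, and the function names are all assumptions made for illustration.

```python
def harmonics(f0_hz, max_hz):
    """Source: harmonics at whole-number multiples of f0, up to max_hz."""
    return [f0_hz * k for k in range(1, int(max_hz // f0_hz) + 1)]

def tube_resonances(length_cm, n_formants=3, speed_of_sound_cm_s=35000.0):
    """Filter: closed-open tube resonances, Fn = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4 * length_cm)
        for n in range(1, n_formants + 1)
    ]

# Two hypothetical voices with different pitch but the same vocal tract shape.
low_voice = harmonics(100, 1000)   # 100, 200, ..., 1000 Hz
high_voice = harmonics(200, 1000)  # 200, 400, ..., 1000 Hz

# The filter does not take f0 as an input, so the formants are the same
# whichever voice excites the tube.
formants = tube_resonances(17.5)
print(low_voice, high_voice, formants)
```

Changing f0 alters the spacing of the harmonics, but the resonances that shape them stay where the tube length puts them.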

References

Johnson, K. (2012). Acoustic and Auditory Phonetics. Wiley-Blackwell.