Course:Phys341 2020/Voice recording

Sound, particularly speech and music, can be recorded mechanically, electronically, magnetically or digitally. The techniques invented so far can be divided into analog and digital.^[1]

Analog recording relies on a microphone diaphragm to measure the acoustic pressure changes, which are represented on a mechanical medium, such as grooves on a cylinder or disc. Magnetic tape recording uses an electrical current produced by the microphone diaphragm to magnetize distinct areas on a magnetically-coated plastic tape.

Digital recording uses sampling to represent the acoustic data captured by the microphone. Amplitude levels are sampled at distinct time intervals and this information is stored in binary sequences.

History

Early Era^[2]

The earliest system for recording sound was patented in 1857 in Paris by Édouard-Léon Scott de Martinville. His ‘Phonautograph’ traced a line whose shape represented the acoustic vibrations, but it could not reproduce sound. In 1877 Charles Cros, also a Parisian, wrote a breakthrough paper on recording sound onto a cylinder using oscillation and a screw.

Later that year, before Cros could develop his prototype, the American inventor Thomas Edison, who had been working on sound recording independently, invented the ‘Phonograph’, which recorded sound onto a tinfoil cylinder and could also play it back. Edison’s first recording was of the nursery rhyme ‘Mary Had a Little Lamb’.

In 1887 the cylinder was improved by researchers at Alexander Graham Bell’s Volta Laboratory, whose ‘Graphophone’ machine used wax-covered cardboard cylinders.

Emile Berliner made a huge leap forward in 1888 when his ‘Gramophone’ replaced the cylinder with a flat disc upon which sound was recorded in grooves.

In 1902 the mass production of sound recordings was made possible by new moulding techniques, and in 1904 Enrico Caruso became the world’s first music recording star, with over a million pounds in royalties earned from the London-based ‘Gramophone Company’.

In 1899 Valdemar Poulsen developed a method for magnetically recording sound onto wire, tape and disc. His recording from the Paris World Fair of 1900 captures the speech of Austrian Emperor Franz Joseph and is the oldest surviving magnetic recording.

Electrical and Magnetic Era^[3]

In the 1920s, researchers at Bell Laboratories who had previously worked on telephone technology published a paper that became the foundation of the Western Electric recording system (also known as Westrex). The frequency range and audio quality captured by electrical recording outperformed any of the older acoustic techniques, and all recording studios switched to electrical microphones.

By the end of the 1920s, 16-inch lacquer-coated discs had become the standard for broadcast and home recording. They remained the dominant sound storage medium until the mass adoption of magnetic tape in the 1970s. Before WWII, these discs had a frequency response of up to 8,500 Hz; however, the military need to detect higher frequencies in submarine warfare pushed the range to 14,000 Hz by the end of the war.

In 1931 Alan Blumlein developed the stereo sound system, still in use today, which recreates the spatial and directional characteristics of sound by recording with multiple microphones.

Recording made further advances in Germany in the 1930s when Fritz Pfleumer used sheets of paper with iron oxide and lacquer as magnetic tape for recording. Magnetic tapes containing Bruckner’s 9th symphony recorded in Berlin in 1944 were recovered by the Allies post-war, and served as the basis for the invention of the American ‘Ampex’ model 200 and the British EMI BTR 1 machine, which revolutionized recording.

Radio star Bing Crosby began to pre-record the sound for his shows rather than performing them live, and was so impressed with the Ampex 200 magnetic tape recorder that he invested in the company. Soon, the company began producing machines which could record multiple audio tracks.^[4]

This allowed different elements of the recording to be literally cut and pasted, and their respective levels to be modified independently. Disney’s Fantasia was the first movie to use four-track sound. By the 1960s, all major recording studios were using multi-track recording, with bands such as the Beatles and the Rolling Stones among the first to use it.

Although record discs had switched materials from wax to shellac to vinyl, by the 1950s all discs were being made from original recordings done on magnetic tape. In 1963 the cassette tape was invented by Philips, and a decade later it would become the dominant form of music recording distribution.

Digital Era

In 1937 Englishman Alec Reeves developed a system of transmitting the “human voice in electronically coded sequences of digital pulses.” This is known as Pulse-Code Modulation (PCM), and serves as the basis for modern digital recording.^[5]

In 1971, Nippon Columbia released the first commercial digital recording using its Denon brand system. In 1972 it launched the DN-024E, with 8 channels, 13-bit resolution and a sample rate of 47.25 kHz. In 1977 the US company 3M developed its own digital recorder, featuring a 16-bit system and a 50 kHz sampling rate. Two years later, the European recording house Decca Records released its own version, which settled on 48 kHz and 18-bit resolution.^[6]

In 1982 Philips and Sony released the digital Compact Disc (CD), and just as with cassettes, it took a decade for it to become the new dominant mass-selling recording medium worldwide.

Digital Recording

Digital Pulse-Code Modulation^[7]

All modern sound recording is stored using digital Pulse-Code Modulation (PCM) in computer files, on CDs, and on digital devices. PCM is used to represent the recorded analog levels in binary form (ones and zeros), which can be processed by computers.

The continuous analog signal is measured at regular intervals, and each measurement is known as a sample. Each sample is then quantized, that is, rounded to the nearest of a fixed set of amplitude levels. For example, distinguishing 16 amplitude levels requires 4 bits per sample. The quantized data points are then encoded into binary values, which can be digitally stored or broadcast without any further loss in sound quality. Digital recording quality depends on the sample rate and the bit depth.
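The sampling, quantizing and encoding steps can be illustrated with a short Python sketch. It is a minimal illustration rather than a description of any particular recorder: the function name `pcm_encode`, the assumption that the signal is normalized to the range -1 to 1, and the choice of a 1 Hz sine wave sampled 8 times per second are all hypothetical.

```python
import math

def pcm_encode(samples, bits):
    """Quantize samples in [-1.0, 1.0] to 2**bits levels and return binary code words."""
    levels = 2 ** bits
    codes = []
    for x in samples:
        # Map [-1, 1] onto the integer range 0 .. levels-1 and round to the nearest level.
        index = round((x + 1.0) / 2.0 * (levels - 1))
        index = max(0, min(levels - 1, index))  # clip out-of-range input
        codes.append(format(index, f"0{bits}b"))
    return codes

# Sample one cycle of a 1 Hz sine wave at 8 samples per second (assumed values).
sample_rate = 8
samples = [math.sin(2 * math.pi * n / sample_rate) for n in range(sample_rate)]

# 4 bits per sample gives 16 possible amplitude levels.
codes = pcm_encode(samples, bits=4)
print(codes)
```

Each string in `codes` is one 4-bit code word; the peak of the sine wave maps to `1111` and the trough to `0000`.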

The sample rate is the number of samples taken per second, and is measured in Hertz. To capture a given frequency, the sample rate must be at least twice that frequency. Human hearing ranges from 20 Hz to 20 kHz, so a sample rate of at least 40 kHz is needed. CDs and the most common digital audio files (e.g. mp3s) are distributed at 44.1 kHz, and digital recordings are usually made at around 48 kHz, although the most advanced recording equipment can record at up to 192 kHz.^[8]
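The link between sample rate and audible range can be made concrete with a small sketch. A sampled signal can only represent frequencies up to half the sample rate; a pure tone above that limit shows up as a lower "alias" frequency instead. The helper name `alias_frequency` and the example tones are assumptions chosen for illustration.

```python
def alias_frequency(tone_hz, sample_rate_hz):
    """Return the apparent frequency of a pure tone after sampling at sample_rate_hz."""
    folded = tone_hz % sample_rate_hz
    # Frequencies above half the sample rate fold back below it.
    return min(folded, sample_rate_hz - folded)

cd_rate = 44_100
nyquist_hz = cd_rate / 2  # highest frequency a CD can represent

in_range = alias_frequency(15_000, cd_rate)      # below the limit: unchanged
out_of_range = alias_frequency(30_000, cd_rate)  # above the limit: folds down

print(nyquist_hz, in_range, out_of_range)
```

A 15 kHz tone is preserved, while a 30 kHz tone sampled at 44.1 kHz would appear at 14.1 kHz, which is why recorders filter out content above half the sample rate before sampling.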

Bit depth is the number of bits used to represent each sample. Standard CD audio uses 16-bit samples, while higher-quality players support up to 24-bit. Quantizing the amplitude at each sample requires mathematical rounding, which slightly distorts the original sound. This rounding error is known as quantization noise, and the ratio of the signal level to the noise level is the Signal-to-Noise Ratio (SNR). There is little need to increase bit depth beyond 24 bits, because at that point the quantization noise is already smaller than the other noise in the recording chain. Besides SNR, the bit depth affects the dynamic range of the recording, which is the range between its maximum and minimum amplitude levels. A 16-bit integer depth corresponds to a dynamic range of about 96 dB, while 24-bit recording has a dynamic range of about 144 dB.^[9]
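The dynamic range figures quoted above follow from the number of amplitude levels available. With N bits there are 2^N levels, and expressing that ratio in decibels gives 20·log10(2^N), or roughly 6.02 dB per bit. A minimal sketch, in which the function name `dynamic_range_db` is an assumption:

```python
import math

def dynamic_range_db(bit_depth):
    """Ratio between the largest and smallest representable amplitude, in dB."""
    return 20 * math.log10(2 ** bit_depth)

for bits in (16, 24):
    print(bits, round(dynamic_range_db(bits), 1))
```

This gives about 96.3 dB for 16 bits and about 144.5 dB for 24 bits, matching the rounded values of 96 dB and 144 dB.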

Voice Recording Effects

The ability to edit audio data digitally makes a number of techniques available for editing and mastering sound, especially the human voice.

Equalization (EQ)

EQ editing involves changing the relative levels of the frequencies in the recording. Not only does this change the dynamic range of the recording, but it can also change the shape of the sound wave, for example from a sawtooth shape to a smoother one when the higher frequencies are reduced. The human vocal tract generates frequencies in the low-mids, so if the low-mid levels are increased the voice gains a boomier effect, whereas if they are lowered the voice recording is thinned out.^[10]
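As a rough illustration of changing the relative levels of frequencies, the sketch below splits a signal into a low band and a high band and rescales each one. It is a deliberately simplified two-band equalizer, not the method used by any particular mixing tool; the function name `two_band_eq` and the `smoothing` parameter are assumptions.

```python
def two_band_eq(samples, low_gain, high_gain, smoothing=0.1):
    """Split the signal into low and high bands and rescale each band."""
    output = []
    low = 0.0
    for x in samples:
        # One-pole low-pass filter: 'low' follows the slowly varying part of x.
        low += smoothing * (x - low)
        high = x - low  # whatever the low-pass filter removed
        output.append(low_gain * low + high_gain * high)
    return output

# A constant (very low frequency) input and a rapidly alternating (high frequency) input.
slow = [1.0] * 200
fast = [1.0 if n % 2 == 0 else -1.0 for n in range(200)]

boosted_slow = two_band_eq(slow, low_gain=2.0, high_gain=1.0)
boosted_fast = two_band_eq(fast, low_gain=2.0, high_gain=1.0)
print(boosted_slow[-1], boosted_fast[-1])
```

With the low band boosted by a factor of two, the slowly varying signal ends up close to twice its original level, while the rapidly alternating signal is left almost unchanged.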

Autotune

Autotune is meant to alter, or 'correct', the pitch of an audio recording in real time. It works by converting the recorded time series into a frequency spectrum using a Fast Fourier Transform (FFT), correcting the pitch, and then converting the result back into a time series to be played through the digital-to-analog converter of the speaker which is playing the sound.^[11]
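The first two stages, finding the pitch in the frequency spectrum and choosing the corrected pitch, can be sketched as follows. This is only an illustration under several assumptions: it uses a slow, naive discrete Fourier transform in place of an FFT, it snaps to the equal-tempered scale with A4 = 440 Hz, and it stops at computing the required pitch ratio rather than resynthesizing the audio. The names `dominant_frequency` and `nearest_semitone` are hypothetical.

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest bin in a naive DFT."""
    n = len(samples)
    best_bin, best_magnitude = 0, 0.0
    for k in range(1, n // 2):  # skip the DC bin, stop below half the sample rate
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        if abs(total) > best_magnitude:
            best_bin, best_magnitude = k, abs(total)
    return best_bin * sample_rate / n

def nearest_semitone(freq_hz, reference_hz=440.0):
    """Snap a frequency to the closest note of the equal-tempered scale."""
    semitones = round(12 * math.log2(freq_hz / reference_hz))
    return reference_hz * 2 ** (semitones / 12)

# A slightly sharp 450 Hz tone, 0.1 s long (assumed test signal).
sample_rate = 4000
tone = [math.sin(2 * math.pi * 450 * i / sample_rate) for i in range(400)]

detected = dominant_frequency(tone, sample_rate)
target = nearest_semitone(detected)
ratio = target / detected  # factor by which the pitch would need to be shifted
print(detected, target, ratio)
```

Here the detected 450 Hz tone is closest to the note A at 440 Hz, so a pitch corrector would need to lower the pitch by the printed ratio.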

Compression

Compression is the process of limiting the dynamic range by bringing the peaks and troughs of the amplitude levels closer together. This gives a uniform effect to a voice recording and ensures that the listener perceives a whisper and a yell at nearly equal loudness on the recording.^[12]
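A compressor is usually described by a threshold and a ratio: levels above the threshold are scaled down so that, for a ratio of 4, every 4 dB of input above the threshold yields only 1 dB of output. The sketch below applies that rule to individual samples. It is a simplification that ignores the attack and release times of real compressors; the function name `compress` and the default values are assumptions.

```python
import math

def compress(sample, threshold_db=-20.0, ratio=4.0):
    """Reduce the level of a sample above threshold_db by the given ratio."""
    magnitude = abs(sample)
    if magnitude == 0.0:
        return 0.0
    level_db = 20 * math.log10(magnitude)
    if level_db <= threshold_db:
        return sample  # quiet samples pass through unchanged
    # Above the threshold, every `ratio` dB of input yields 1 dB of output.
    compressed_db = threshold_db + (level_db - threshold_db) / ratio
    return math.copysign(10 ** (compressed_db / 20), sample)

whisper = compress(0.05)  # about -26 dB, below the threshold
yell = compress(1.0)      # 0 dB, well above the threshold
print(whisper, yell)
```

The quiet sample is unchanged while the loud one is reduced from 0 dB to -15 dB, so the gap between the two shrinks. In practice the whole signal is then amplified to bring the overall level back up.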

Layering

Beyond the ability to record and play multiple audio tracks simultaneously, layering involves digitally copying the recording and then "stacking" the copies by playing them simultaneously across multiple tracks. This is also known as the Chorus effect because it is used especially on the chorus section of a song, where it adds weight to the vocals and creates a sense of space. Doubled layers are usually mixed at lower amplitude levels than the main track, which allows the multiple tracks to sound like one single source to the listener rather than multiple voices.^[13]
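Stacking quieter copies under a main track can be sketched as a simple mix. The small per-copy delays used here are an assumption added for illustration (with no delay at all, stacking identical copies would only make the track louder), as are the function name `layer` and the gain of 0.5.

```python
def layer(track, delays, gain=0.5):
    """Mix a track with delayed, quieter copies of itself."""
    length = len(track) + max(delays, default=0)
    mix = list(track) + [0.0] * (length - len(track))
    for delay in delays:
        for i, x in enumerate(track):
            mix[i + delay] += gain * x  # each copy sits below the main track
    return mix

# A single click as the main track, with two copies 1 and 3 samples later.
main = [1.0, 0.0, 0.0, 0.0]
layered = layer(main, delays=[1, 3], gain=0.5)
print(layered)
```

The main click keeps its full amplitude, and each copy appears at half amplitude at its own delay.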

Reverb

Reverb is an effect added to a voice recording which gives a sense of space, for example replicating the echo effect of a large room. This effect changes the timbre of the sound without changing the pitch.^[14]
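One of the simplest building blocks for artificial reverb is a feedback comb filter, which feeds a delayed, attenuated copy of the output back into itself to produce a decaying train of echoes. The sketch below shows only this single building block, not a full room simulation; the function name `comb_reverb` and its parameters are assumptions.

```python
def comb_reverb(samples, delay, feedback=0.5, tail=0):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay]."""
    padded = list(samples) + [0.0] * tail  # extra room for the echoes to ring out
    output = []
    for n, x in enumerate(padded):
        echo = feedback * output[n - delay] if n >= delay else 0.0
        output.append(x + echo)
    return output

# A single click followed by silence, with echoes every 2 samples.
impulse = [1.0, 0.0, 0.0]
wet = comb_reverb(impulse, delay=2, feedback=0.5, tail=4)
print(wet)
```

Each echo arrives two samples after the previous one at half its amplitude, which mimics sound reflecting repeatedly between the walls of a room.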

[1] "Sound recording and reproduction". Wikipedia.

[2] "History of Recording". EMI Archive Trust.

[3] Roger Beardsley, Daniel Leech-Wilkinson. "A Brief History of Recording to ca. 1950". AHRC Research Centre for the History and Analysis of Recorded Music.

[4] "The History of Audio Recording". Los Sendros Studios.

[5] Roger Beardsley, Daniel Leech-Wilkinson. "A Brief History of Recording to ca. 1950". AHRC Research Centre for the History and Analysis of Recorded Music.

[6] Fine, Thomas. "The Dawn of Commercial Digital Recording" (PDF). Audio Engineering Society.

[7] "Pulse Code Modulation" (PDF). Sonoma State University.

[8] "Sampling (signal processing)". Wikipedia.

[9] "Audio bit depth". Wikipedia.

[10] Hawkins, Dashiel. https://www.sonarworks.com/blog/learn/math-and-science-in-audio-mixing/

[11] Waltham, Chris. "Autotune". PHYS 341, University of British Columbia.

[12] Brown, Griffin (December 17, 2018). "Mixing Vocals: What Makes a Professional Vocal Sound?". Izotope.

[13] Brown, Griffin (December 17, 2018). "Mixing Vocals: What Makes a Professional Vocal Sound?". Izotope.

[14] Cox, Trevor. "Why does music sound better with reverb?". The Sound Blog.
