Project 10: About Autotune

From UBC Wiki

Auto-Tune

Auto-Tune (commonly autotune) is an audio-processing tool used by musicians and audio technicians to alter the pitch of recordings for corrective or artistic purposes. Introduced by Antares Audio Technologies in 1997, it made an immediate impact on the music industry, but it truly exploded in popularity two years later, when Cher's "Believe" showcased the distorted, robotic vocal effect the technology could produce. Artists such as T-Pain and Kanye West later made that effect ubiquitous in pop music. Since then, Auto-Tune has become a staple of the music industry and one of the most widely used audio-processing programs ever.[1] Although it is perhaps the most recognizable pitch-correction software, it is just one of many such tools.

Background

Before Auto-Tune's invention, pitch correction of audio was a tedious, complicated process that required an analogue approach.[2] Auto-Tune was invented in 1997 by Andy Hildebrand, the original CEO of Antares, which trademarked and owns the technology.[3] It was revolutionary because it solved the most time-consuming problem of pitch correction: the inability to do it in real time. Before this technology, there was no simple way to "bypass" the relationship between frequency and wavelength. In other words, there was no convenient way to adjust the pitch of a sound without also changing its duration, as happens when a tape is played back faster or slower.
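
The coupling between pitch and duration can be seen in a few lines of Python. The sketch below is a minimal illustration, assuming a synthetic sine tone and hypothetical helper names (`play_faster`, `count_cycles`): it mimics speeding up a tape by reading the samples twice as fast, which doubles the frequency but also halves the length of the clip.

```python
import math


def play_faster(samples, speed):
    """Mimic speeding up a tape: read the input `speed` times faster,
    using linear interpolation between neighbouring samples."""
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        pos += speed
    return out


def count_cycles(samples):
    """Count upward zero crossings: about one per cycle of a pure tone."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)


if __name__ == "__main__":
    rate = 8000  # samples per second (assumed)
    tone = [math.sin(2 * math.pi * 220 * n / rate) for n in range(rate)]  # 1 s at 220 Hz
    faster = play_faster(tone, 2.0)
    for name, clip in (("original", tone), ("2x speed", faster)):
        seconds = len(clip) / rate
        print(f"{name}: {seconds:.2f} s, about {count_cycles(clip) / seconds:.0f} Hz")
```

The printed frequency roughly doubles while the duration halves: with simple playback-speed changes there is no way to move one without the other.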

A geophysicist turned software engineer, Hildebrand used the concept of autocorrelation to devise an algorithm that identifies the pitch of a sound so that it can be adjusted and "corrected".[4] An audio processor used this algorithm to alter the frequency of a sustained note (more broadly, a segment of sound whose waveform repeats periodically), while artificially compensating for the change in duration that this shift in frequency would otherwise cause. This allowed engineers to fix botched notes that were sung slightly sharp or flat, or notes that "warbled" in and out of tune as the singer tried to hold them. It quickly became invaluable to singers and their technicians, as it removed the need to re-record countless takes of the same verse and allowed artists to put more emphasis on the emotive parts of their performances.[5]
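
The general idea of autocorrelation pitch detection can be sketched as follows. This is not Hildebrand's algorithm; it is a minimal illustration assuming a clean, steady tone and a hypothetical `detect_pitch` helper. The waveform is compared with delayed copies of itself, and the delay (lag) at which it matches best is taken as one period.

```python
import math


def make_sine(freq_hz, sample_rate, n_samples):
    """Generate a pure test tone, standing in for a sustained sung note."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(n_samples)]


def detect_pitch(samples, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency from the lag with the largest
    autocorrelation, i.e. the shift at which the waveform best matches itself."""
    min_lag = int(sample_rate / fmax)                          # shortest period considered
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)   # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Sum of products of the signal with a copy of itself delayed by `lag`.
        score = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # The best lag is one period in samples; frequency is its reciprocal.
    return sample_rate / best_lag


if __name__ == "__main__":
    tone = make_sine(110.0, 8000, 1024)  # A2
    print(round(detect_pitch(tone, 8000), 1))  # close to 110; whole-sample lags limit precision
```

Because the lag is a whole number of samples, the estimate is only as fine as the sample spacing allows; a real-time system would also need a much faster formulation than this double loop.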

History & Function

In principle, all pitch-correction technologies attempt to do the same thing: adjust the pitch of a musical tone without changing the overall "sound" of the note. However, they all face a central problem:

(Figure: pitch correction applied to a wavelength.)

For a sound to be heard as a musical note with a definite pitch, it needs to be periodic: a sound wave that repeats at a certain frequency. A note's pitch corresponds to the frequency of that wave; a higher frequency means a higher pitch. It also means a shorter wavelength. Wavelengths are measured in space (cm), but for the purposes of hearing they are experienced in terms of time, as the period of each cycle (T = 1/f). So raising the pitch of a hypothetical audio clip by simply compressing its waveform would shorten every cycle of the recording, and thereby shorten the whole clip. Auto-Tune attempts to solve this problem by adjusting the frequency of a particular sound without changing its overall runtime.
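
One crude way to raise the pitch while keeping the runtime is to treat the signal as a string of cycles, reshape each cycle to the new, shorter period, and reuse cycles to fill the leftover time. The sketch below is a hypothetical illustration of that idea, loosely related to time-domain methods such as PSOLA (pitch-synchronous overlap-add). It assumes a steady tone with a known whole-number period in samples, the function names are invented for the example, and it is not Auto-Tune's proprietary algorithm.

```python
import math


def resample_cycle(cycle, new_len):
    """Stretch or squeeze one cycle to new_len samples by linear interpolation.
    Indexing wraps around because each cycle flows into the next."""
    old_len = len(cycle)
    out = []
    for j in range(new_len):
        pos = j * old_len / new_len
        i = int(pos)
        frac = pos - i
        out.append(cycle[i] * (1 - frac) + cycle[(i + 1) % old_len] * frac)
    return out


def retune_keep_duration(samples, period, new_period):
    """Hypothetical cycle-by-cycle pitch shift that preserves length.

    Cuts the input into cycles of `period` samples, reshapes each to
    `new_period` samples, and reuses (or skips) cycles so the output has
    exactly as many samples as the input."""
    cycles = [samples[i:i + period] for i in range(0, len(samples) - period + 1, period)]
    out = []
    while len(out) < len(samples):
        # Use the source cycle nearest in time, so the note's evolution is kept.
        src = cycles[min(len(out) // period, len(cycles) - 1)]
        out.extend(resample_cycle(src, new_period))
    return out[:len(samples)]


def count_cycles(samples):
    """Count upward zero crossings: about one per cycle of a pure tone."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)


if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2 * math.pi * 100 * n / rate) for n in range(rate)]  # 100 Hz, period 80
    retuned = retune_keep_duration(tone, period=80, new_period=40)      # target 200 Hz
    print(len(tone) / rate, "s ->", len(retuned) / rate, "s")
    print(count_cycles(tone), "cycles ->", count_cycles(retuned), "cycles")
```

The duration stays at one second while the number of cycles roughly doubles; each source cycle is used twice, playing the role of the "synthetic" cycles that fill in the lost time. A practical system would also window and overlap neighbouring cycles and track a period that changes from moment to moment.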

Consider this extreme scenario: an artist records some vocals in a baritone register and decides they want the same thing in a soprano register. Baritone voices fall in the A2-A4 range, whereas sopranos sing in the C4-C6 range. If the baritone holds a note at the lowest end of the register for 5 s, it would measure 110 Hz (A2). If the target note is a C6, the frequency would have to rise from 110 Hz to about 1046.5 Hz, roughly 9.5 times higher. Done naively, however, the duration of the note would shrink by the same factor, from the original 5 s to about 0.53 s. Since the sound travels through air, there is no way to change the wave speed (about 345 m/s here), so it remains constant. Using the relationship between wave speed, frequency, and wavelength (v = fλ), one determines that the property of the sound that changes is the wavelength: an A2 note has a λ of 313.64 cm, and when the sound is transformed to a C6, λ shortens all the way to 32.97 cm. This is where Auto-Tune comes in. To preserve the original length of the clips being modified, Auto-Tune creates new, synthetic wavelengths that "fill in" for the time that is lost when raising the frequency and shortening the wavelength.
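
The numbers in this example can be checked with a short calculation. The sketch assumes equal temperament with A4 = 440 Hz and a speed of sound of 345 m/s, the value implied by the 313.64 cm wavelength quoted for 110 Hz; the "naive duration" assumes the note is simply sped up.

```python
# Worked numbers for the A2 -> C6 example.
# Assumptions: equal temperament with A4 = 440 Hz, and a speed of sound of
# 345 m/s, the value implied by the 313.64 cm wavelength quoted for 110 Hz.
SPEED_OF_SOUND_CM_S = 34_500.0
A4_HZ = 440.0


def note_freq(semitones_from_a4):
    """Equal-tempered frequency a given number of semitones from A4."""
    return A4_HZ * 2 ** (semitones_from_a4 / 12)


def wavelength_cm(freq_hz):
    """v = f * wavelength, so wavelength = v / f."""
    return SPEED_OF_SOUND_CM_S / freq_hz


a2 = note_freq(-24)           # two octaves below A4
c6 = note_freq(15)            # C6 is 15 semitones above A4
ratio = c6 / a2               # factor by which the frequency must rise
naive_duration = 5.0 / ratio  # a 5 s note, if simply sped up

print(f"A2 = {a2:.1f} Hz, wavelength {wavelength_cm(a2):.2f} cm")  # 110.0 Hz, 313.64 cm
print(f"C6 = {c6:.1f} Hz, wavelength {wavelength_cm(c6):.2f} cm")  # 1046.5 Hz, 32.97 cm
print(f"ratio {ratio:.2f}, naive duration {naive_duration:.2f} s")  # 9.51, 0.53 s
```

Put another way, 5 s of A2 contains 550 cycles, while 5 s of C6 needs about 5,230; the difference is what the synthetic wavelengths have to supply.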

However, it is not so simple. Like the vast majority of musical sounds produced by instruments both natural and synthesized, human voices do not produce pure tones. Rather, a sung note is composed of a fundamental frequency plus a number of harmonics at integer multiples of it. This is what gives the sound its timbre, arguably the most important feature that makes the note "sound" as it does, and that is not even taking into account the natural imperfections that can mar a vocal performance.[6] Pitch-correction technologies have to ensure that what they do to a sound does not affect its timbre either. That means somehow shifting every component of the original sound to make a new sound with a different pitch (frequency) that is still equivalent in "shape": timbre, length, and thus overall "sound". Considering the full range of harmonics, that would mean adjusting frequencies reaching into the tens of thousands of hertz. This is what held pitch-correction technology back for so long. It was theoretically possible: one could analyze a segment of sound, decompose it into its component frequencies using the Fourier theorem, shift each component in proportion to the fundamental, add synthetic wavelengths to maintain the duration, and finally reconstruct a new, pitch-perfect sound. In practice, however, doing so would be a fool's errand and a massive waste of time. Early audio engineers recognized this and invented numerous methods to capture the "perfect recording", but it was the vocoder that pioneered the technology Auto-Tune was built on. The vocoder, a technology still used every day in devices such as mobile phones, analyzes and synthesizes sound.
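
The brute-force approach described above (decompose the sound, move each harmonic in proportion to the fundamental, resynthesize) can be sketched for an idealized tone. This is an illustration only: it assumes a perfectly steady tone whose harmonics all complete whole cycles in the analysis window, models timbre simply as the relative harmonic amplitudes, and ignores phase. The helper names are hypothetical.

```python
import cmath
import math

RATE = 8000  # samples per second (assumed)
N = 800      # 0.1 s window: every frequency used below fits a whole number of cycles


def harmonic_tone(f0, amplitudes):
    """Idealized note: harmonics at 1*f0, 2*f0, ... with the given amplitudes."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * f0 * n / RATE) for k, a in enumerate(amplitudes))
        for n in range(N)
    ]


def amplitude_at(samples, freq):
    """Single-frequency discrete Fourier transform: amplitude of the `freq` Hz component."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / RATE) for n, x in enumerate(samples))
    return 2 * abs(acc) / len(samples)


def harmonic_profile(samples, f0, count):
    """Measured amplitudes of the first `count` harmonics of f0 (a crude 'timbre')."""
    return [amplitude_at(samples, (k + 1) * f0) for k in range(count)]


def shift_by_resynthesis(samples, f0, new_f0, count):
    """Decompose, move every harmonic in proportion to the fundamental, rebuild.
    Phases are discarded, which is harmless only for this idealized tone."""
    return harmonic_tone(new_f0, harmonic_profile(samples, f0, count))


if __name__ == "__main__":
    low = harmonic_tone(200.0, [1.0, 0.5, 0.25])       # 200, 400, 600 Hz
    high = shift_by_resynthesis(low, 200.0, 300.0, 3)  # 300, 600, 900 Hz
    print([round(a, 3) for a in harmonic_profile(low, 200.0, 3)])   # [1.0, 0.5, 0.25]
    print([round(a, 3) for a in harmonic_profile(high, 300.0, 3)])  # [1.0, 0.5, 0.25]
```

Even this toy version has to measure and rebuild every harmonic separately; a real voice has many more components, changing constantly, which is why the approach was impractical.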
To slightly oversimplify things, a vocoder can be tuned to a specific set of frequencies (or ranges of them), and when a sound is played into it, it captures the frequencies of the recording that fall within those ranges. In doing so, it can filter out unwanted frequencies and, by transcribing the audio into a new sound, produce a fixed-tone result with a heavily distorted, robotic character. Auto-Tune's technology uses this concept of tuning an audio processor to respond to desired frequencies, but it is able to do so in real time. Its algorithm isolates and "corrects" every wavelength of a sound (that is, every moment) and tunes them all to a new frequency without losing the idiosyncrasies that give each wavelength its shape and the sound its timbre. As an early tester described it:

"Auto-Tune is the first plug-in that has the ability to correct intonation problems for out-of-tune vocals or instruments in real time. Although Auto-Tune does this by varying pitch up or down, it is not a pitch-shifting plug-in in the traditional sense. In fact, Auto-Tune changes the pitch of tracks in small increments, usually no more than 30-80 cents. The way it works is simple -- at least on the surface. Pitch is measured by analyzing repeating cycles of audio. This measured pitch is continuously compared to a user-controlled scale. The output pitch is adjusted to be more in tune with the scale note."[7]

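The tester's description (measure the pitch, compare it with a scale, nudge the output toward the scale note) can be sketched as follows. The C major scale, the A4 = 440 Hz reference, and the function names are assumptions for illustration; the correction is expressed in cents (hundredths of a semitone), the unit the tester uses.

```python
import math

A4_HZ = 440.0
# Illustrative "user-controlled scale": pitch classes of C major (C=0, C#=1, ..., B=11).
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}


def freq_to_note(freq):
    """Continuous MIDI-style note number (69 = A4); 0.01 of a step is one cent."""
    return 69 + 12 * math.log2(freq / A4_HZ)


def note_to_freq(note):
    """Inverse of freq_to_note for equal temperament."""
    return A4_HZ * 2 ** ((note - 69) / 12)


def correct_pitch(freq, scale=C_MAJOR):
    """Compare a measured pitch with the scale and return the nearest scale
    note's frequency plus the correction needed, in cents."""
    measured = freq_to_note(freq)
    candidates = [n for n in range(math.floor(measured) - 12, math.ceil(measured) + 13)
                  if n % 12 in scale]
    target = min(candidates, key=lambda n: abs(n - measured))
    return note_to_freq(target), 100 * (target - measured)


if __name__ == "__main__":
    # A singer aiming for A4 but landing sharp at 450 Hz.
    target_hz, cents = correct_pitch(450.0)
    print(f"pull to {target_hz:.1f} Hz, correction {cents:.1f} cents")  # 440.0 Hz, -38.9 cents
```

Notes outside the chosen scale are never targets: at 455 Hz the nearest chromatic note would still be A4, but A#4 would be skipped even if it were closer, because it is not in C major.
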
From its inception, Auto-Tune was intended to work just as this tester described: for minor changes. As he concluded in his review, Auto-Tune "[is] A marvelous tool for Pro Tools III or Cubase VST users who are serious about correcting intonation problems in vocals or other solo instruments." This review was written in 1997,[7] and for the next two years the tool was used as "intended". Then the vocal effect that has become synonymous with the name of the technology was used to massive success in Cher's "Believe", which topped the charts in 1999. The artificially enhanced sound it gave the voice was a "distinctive sonic effect the program create[d] when dialed away from 'natural'- sounding parameters toward an impossibly fast, pitch-jumping effect. On this setting, the maximally constrained voice takes no time to travel from pitch to pitch; there are no messy 'human' transitions."[6] In this sense, calling this specific effect "autotune", as popular culture does, is a misnomer: the effect is simply the result of an exaggerated use of the technology, not its intended design.
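
The "impossibly fast, pitch-jumping" setting comes down to how quickly the corrected pitch is allowed to move toward its target note. The sketch below is hypothetical: the one-pole smoothing model and the `retune_ms` parameter are assumptions, not Antares's implementation. With `retune_ms = 0` the output jumps between notes instantly, with no "human" transition.

```python
import math


def glide_toward(targets, frame_ms, retune_ms):
    """Hypothetical retune-speed control: smooth the corrected pitch (in
    semitones) toward each frame's target note with a one-pole filter.
    retune_ms <= 0 means the output jumps instantly: the robotic effect."""
    if not targets:
        return []
    if retune_ms <= 0:
        return list(targets)
    # Fraction of the remaining distance covered in each frame.
    alpha = 1 - math.exp(-frame_ms / retune_ms)
    out, current = [], targets[0]
    for t in targets:
        current += alpha * (t - current)
        out.append(current)
    return out


if __name__ == "__main__":
    # A singer moves from A4 (note 69) to B4 (note 71); one target per 10 ms frame.
    targets = [69] * 5 + [71] * 10
    fast = glide_toward(targets, frame_ms=10, retune_ms=0)
    slow = glide_toward(targets, frame_ms=10, retune_ms=50)
    print(fast[4:8])                         # [69, 71, 71, 71]: no transition at all
    print([round(x, 2) for x in slow[4:8]])  # a gradual, more "human" glide
```

Larger `retune_ms` values leave more of the singer's natural slides intact; pushing it to zero produces the stepped, robotic sound heard on "Believe".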


When Cher's "Believe" reached No. 1 in 1999, it sparked a revolution that forced people to question and re-evaluate the relationship between the authenticity and emotional quality of a recording. Twenty years later and counting, there is no denying the impact it has made on the industry.

References

  1. Reynolds, Simon (September 17, 2018). "How Auto-Tune Revolutionized the Sound of Popular Music".
  2. Hughes, Diane (May/Jun 2015). "Technological pitch correction: controversy, contexts, and considerations". Journal of Singing. 71 (5): 587–594.
  3. Since then, Antares has continued to lead the industry and develop the technology.
  4. Auto-Tune Pro User Guide. Antares Audio Technologies.
  5. Katz, Mark (2010). Capturing Sound: How Technology Has Changed Music. University of California Press. p. 51.
  6. Provenzano, Catherine (2018). Auto-Tune, Labor, and the Pop-Music Voice. Oxford Scholarship Online. p. 6. ISBN 9780199985227.
  7. Green, Adam (July 1997). "Keyboard Reports: AnTares Auto-Tune". Keyboard Magazine. Retrieved Mar 16, 2021.