PHYS341/2024/Project25


Audio File Resolution and Digital Distortion



First Draft:

VIDEO SCRIPT

Let's explore the different forms of distortion that take place when audio is encoded in low-resolution formats.

We'll hear how low-resolution audio files sound distorted by listening to audio clips and analyzing the properties of the waveforms they produce. Specifically, we'll look at their frequency spectrum graphs and their time graphs. We'll also talk about why this digital distortion occurs.

Moreover, we'll put to the test the claim that higher sample rates must always be better.


When encoding lossless audio files like Waveform Audio File Format (WAV) or Free Lossless Audio Codec (FLAC), we have two factors to consider:

- The first is the bit depth. The bit depth describes the number of bits we can use to represent the amplitude of a sample at a particular point. A lower bit depth limits us to a smaller range of values to describe the amplitude, whereas a larger bit depth lets us capture nuances more accurately by allowing a larger range of values to assign to each amplitude.
- The next factor to consider is the sample rate. When representing audio digitally, we need enough of these sample points to reconstruct a sound wave for playback. Since each wave needs at least two sample points per cycle to mark its highest peak and its lowest trough, we usually go for sample rates at least twice as high as 20 kHz, the highest frequency human ears are capable of hearing. So when we look at a number like 44.1 kHz, what this means is that we have 44,100 sample points per second to represent our audio waveform. A higher sample rate allows us to accurately capture all information within the audible range, while a lower sample rate cannot reproduce higher frequencies. Lower sample rates can also cause a form of digital distortion we'll be talking about called aliasing (both factors are made concrete in the sketch after this list).
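
To make these two parameters concrete, here is a minimal sketch of both ideas. It's written in Python with NumPy; the script itself doesn't specify any tools, so the library choice and the example values are our assumptions, not part of the project.

```python
import numpy as np

def make_sine(freq_hz, sample_rate, duration_s=1.0):
    """Sample a sine wave with sample_rate points per second."""
    t = np.arange(int(sample_rate * duration_s)) / sample_rate
    return np.sin(2 * np.pi * freq_hz * t)

def quantize(signal, bit_depth):
    """Round each amplitude to the nearest representable level."""
    levels = 2 ** (bit_depth - 1)        # e.g. 16 bits -> 32,768 levels per polarity
    return np.round(signal * levels) / levels

# A 1 kHz tone at CD resolution: 44,100 samples per second, 16-bit amplitudes
cd_quality = quantize(make_sine(1000, 44100), 16)

# The same tone crushed to 4 bits: only 16 distinct amplitude values remain
crushed = quantize(make_sine(1000, 44100), 4)
```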

Now what is digital distortion? Digital distortion refers to a process by which the limits of how we encode audio digitally alter the reproduced sound wave. These limits are unique to the digital world, so we won't typically see these effects reproduced in analog distortion.

There are two main ways we can create digital distortion in our sounds.


Bit crushing and noise


For example, we can try reducing the bit depth and see what this does to the quality of our sound.

*plays various audio samples*


As you can hear, our audio doesn't sound very pleasant when we reduce the bit depth. As we reduce the bit depth, we give our encoder a smaller range of values to work with when assigning values for the amplitude. What ends up happening is that some amplitudes are rounded to the nearest distinct value that can be represented. To keep this rounding error from sounding harsh, encoders typically fill the gaps with a small amount of random noise, a technique called dithering. And as you probably heard, lower bit depths sound a lot noisier.
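
Here is a rough sketch of what a bit crusher with dithering does, again in Python with NumPy. The TPDF (triangular) dither shown is a common choice, but it's our assumption; the clips above may have used something else.

```python
import numpy as np

def bitcrush(signal, bit_depth, dither=True):
    """Reduce bit depth, optionally adding dither noise before rounding."""
    step = 1.0 / (2 ** (bit_depth - 1))  # gap between representable amplitudes
    if dither:
        # TPDF dither: random noise about one quantization step wide, which
        # decorrelates the rounding error so it sounds like smooth hiss
        noise = (np.random.uniform(-0.5, 0.5, signal.shape)
                 + np.random.uniform(-0.5, 0.5, signal.shape)) * step
        signal = signal + noise
    return np.round(signal / step) * step

t = np.arange(44100) / 44100
tone = 0.8 * np.sin(2 * np.pi * 440 * t)
noisy_4bit = bitcrush(tone, 4)           # audibly hissy, as in the clips
```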

Now if we take a look at the frequency spectrum graphs of the waveforms and compare them, you'll notice that the bit-crushed audio has a lot of non-harmonic frequencies added to the waveform.
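
As a sanity check on that claim, here is a self-contained sketch (the 440 Hz tone and the 50 Hz exclusion band around it are arbitrary example values) that measures how much spectral energy shows up away from the fundamental once a tone is bit-crushed:

```python
import numpy as np

t = np.arange(44100) / 44100
tone = 0.8 * np.sin(2 * np.pi * 440 * t)      # clean 440 Hz tone, one second
step = 1.0 / (2 ** 3)                          # 4-bit quantization step
crushed = np.round(tone / step) * step         # bit-crushed copy (no dither)

freqs = np.fft.rfftfreq(len(t), d=1 / 44100)
off_tone = np.abs(freqs - 440) > 50            # everything but the fundamental
for name, sig in (("clean", tone), ("crushed", crushed)):
    energy = np.abs(np.fft.rfft(sig))[off_tone].sum()
    print(name, "off-tone spectral energy:", round(float(energy), 1))
```

The crushed tone should print a far larger number: that extra energy is the non-harmonic content visible on the graph.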


Aliasing

Now let's look at what happens when we reduce the sample rate.

*plays various audio samples*


What's immediately apparent is how radically the reproduced frequency range narrows, simply because we no longer have enough sample points to capture the higher frequencies.

However, lower sample rates also result in what is known as aliasing. Aliasing is caused by the digital limits on how high-frequency information can be recorded. As we said before, files with lower sample rates can't capture higher frequencies. When a frequency above half the sample rate is recorded anyway, it gets reconstructed as a different frequency that doesn't accurately represent what our audio is supposed to sound like. Let's see how this works by taking a sine wave sample, pitching it up, and watching what happens on this spectral graph.

*shows demonstration*

As you can see on this graph, once the pitch passes half the sample rate, frequencies start to reflect back down into our audio, and when we recreate this at a lower sample rate, these reflections appear sooner and multiply. These reflected frequencies typically aren't harmonically related to the original, so they produce an unwanted distortion effect that can really mess up our audio if taken to the extreme.
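
The reflection follows simple arithmetic: every frequency above half the sample rate folds back around that limit. Here is a small sketch, with made-up example frequencies, that computes where a pitched-up tone actually lands:

```python
def aliased_frequency(true_freq, sample_rate):
    """Return the frequency we actually hear after sampling (folds at fs/2)."""
    folded = true_freq % sample_rate        # aliases repeat every sample_rate Hz
    nyquist = sample_rate / 2
    return folded if folded <= nyquist else sample_rate - folded

# Pitch a tone up past the Nyquist limit of an 8 kHz sample rate:
for f in (3000, 4000, 5000, 6000, 7000):
    print(f, "Hz is reproduced as", aliased_frequency(f, 8000), "Hz")
# 5000 Hz -> 3000 Hz, 7000 Hz -> 1000 Hz: the reflections on the spectral graph
```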

*plays audio sample*


So even when we're capturing information at 44.1 kHz, that's not always sufficient to prevent aliasing from occurring.

Thankfully, aliasing only occurs when audio isn't recorded, or sounds aren't synthesized, at a high enough sample rate. One way audio engineers and producers overcome it is through oversampling: a digital synthesizer produces its audio at a higher sample rate, giving high-frequency content enough room to exist without aliasing. The synthesizer then brings the sample rate back down so as not to take up unnecessary processing power or disk space.

The same can be done when recording audio with a mic. Recording at a sample rate of, say, 96 kHz allows for a more accurate capture of the sound. The audio can then be exported to a smaller 44.1 kHz file without losing any audible information, since frequencies above the new Nyquist limit are filtered out before downsampling and 44.1 kHz still lets us reproduce the entire audible frequency range.
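
Here is a minimal sketch of that oversample-then-downsample workflow, assuming SciPy's polyphase resampler as the downsampling tool; the 30 kHz overtone is an invented stand-in for content that would otherwise alias.

```python
import numpy as np
from scipy.signal import resample_poly

hi_rate, cd_rate = 96000, 44100
t = np.arange(hi_rate) / hi_rate         # one second at the higher rate

# "Synthesize" something bright at 96 kHz: a tone plus an overtone at 30 kHz,
# which fits under the 48 kHz Nyquist limit but not under 22.05 kHz
signal_96k = np.sin(2 * np.pi * 1000 * t) + 0.3 * np.sin(2 * np.pi * 30000 * t)

# resample_poly low-pass filters below the new Nyquist limit before decimating,
# so the 30 kHz overtone is removed instead of folding back into the audio
signal_44k = resample_poly(signal_96k, up=147, down=320)  # 96000 * 147/320 = 44100
```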


Last section: The relationship between oversampling and quality