What is audio? Discussing sample rate and bit depth

In this post we will talk about one of the first things you should know before you start making music on a computer: what is audio? We will try to present the fundamentals of audio without going into too much detail, so that we don’t cause confusion.
Sound is nothing more than a wave transmitted through a medium such as air. There are two ways to capture this and reproduce it: the analog way and the digital way.
Analog encoding works by using some physical property as an "analog" of the sound wave. For example, we can use the groove cut into a phonograph record or the magnetic fluctuations on a tape. Analog recordings are continuous.
Digital recordings are separated into discrete steps. These steps are encoded as numbers when we capture the waveform, and then they are decoded when we play it back.
In digital audio we have two fundamental properties: sample rate and bit depth.
Sample rate determines the highest frequency that can be reproduced.
Bit depth determines dynamic range.

A digitally sampled waveform
The sample rate we use for CDs is 44,100 Hz (see Hertz on Wikipedia). What this means is that when we record something, for example an electric guitar plugged directly into our audio interface, or an acoustic guitar through a microphone, we take 44,100 "pieces" (samples) of the sound per second. Sampling is not a notion that is limited to audio. It also occurs, for example, in statistics, and digital audio theory has a lot to do with statistics. When a statistician chooses a sample of 1,000 people from a population of 10,000 and gives them a questionnaire, the aim is to make valid claims about the whole population (of 10,000 people) from those 1,000.
The same happens when capturing audio. Sound is continuous; it does not come in discrete steps. However, when we sample a sound source, we take measurements at discrete steps and use them to recreate the sound we heard.
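To make the idea of taking 44,100 "pieces" per second concrete, here is a minimal Python sketch. It assumes an idealized sound source, a pure 440 Hz sine tone, and the function name sample_sine is made up for this illustration. It is not how an audio interface works internally, only the arithmetic behind it.

```python
import math

SAMPLE_RATE = 44100  # samples per second, the CD rate mentioned in the text

def sample_sine(frequency_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Measure a sine wave at evenly spaced instants (hypothetical helper)."""
    n_samples = int(round(duration_s * sample_rate))
    # Sample number n is taken at time n / sample_rate seconds.
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate)
            for n in range(n_samples)]

one_second = sample_sine(440.0, 1.0)
print(len(one_second))   # number of samples taken in one second
print(one_second[:5])    # the first few amplitude readings
```

Each number in the list is one snapshot of the continuous wave; playback hardware uses these snapshots to rebuild the sound.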

Digital sampling occurs in discrete, quantized, steps as shown above
The human ear can hear sounds ranging from about 20 Hz to 20,000 Hz. According to the Nyquist-Shannon sampling theorem, in order to reproduce a waveform, the sampling rate must be at least twice the highest frequency in the waveform. This means that, since we want to keep frequencies up to 20,000 Hz, the upper limit of hearing, we need a sampling rate of at least 40,000 Hz.
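The arithmetic of the sampling theorem is short enough to write down. The sketch below is only an illustration; the function names are invented for this post.

```python
def minimum_sample_rate(highest_frequency_hz):
    """Lowest sampling rate that can still represent the given frequency."""
    return 2 * highest_frequency_hz

def nyquist_frequency(sample_rate_hz):
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2

print(minimum_sample_rate(20000))  # rate needed to cover the range of hearing
print(nyquist_frequency(44100))    # highest frequency a CD can represent
```

The second value, half the sampling rate, is what is usually called the Nyquist frequency.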
However, there is an issue called aliasing, which causes frequencies that exceed the Nyquist frequency (half the sampling rate) to appear as "aliases" at lower frequencies, causing inharmonic digital distortion. Aliasing will be covered in another article and is not so important for now; I mention it only to get to my next point.
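For the curious, here is a small Python sketch of where an alias lands. It assumes an ideal sampler with no filtering in front of it, and the function name alias_frequency is made up for this example.

```python
def alias_frequency(frequency_hz, sample_rate_hz):
    """Frequency that a sampled tone appears at after sampling.

    Tones below half the sample rate come back unchanged; tones above it
    are "folded" back into the range from 0 to half the sample rate.
    """
    nearest_multiple = round(frequency_hz / sample_rate_hz) * sample_rate_hz
    return abs(frequency_hz - nearest_multiple)

for tone in (10000, 25000, 30000):
    print(tone, "Hz is heard as", alias_frequency(tone, 44100), "Hz")
```

A 25,000 Hz tone, which nobody could hear, would come back as a clearly audible 19,100 Hz tone if it were allowed to reach the sampler.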
CDs use a sampling rate of 44,100 Hz and not 40,000 Hz. The reason for this is that, in order to avoid aliasing, we use filters that attenuate the frequencies above the limit of human hearing before they are sampled. Due to the nature of filters, it is impossible to create one with a vertical drop (like a cliff) that passes everything up to 20,000 Hz and blocks everything from 20,001 Hz. Instead, filters are like slopes. Sampling at 44,100 Hz puts the Nyquist frequency at 22,050 Hz, which leaves the filter about 2,000 Hz of room to roll off above 20,000 Hz. The exact figure of 44,100 was also chosen because of technical limitations at the time the CD medium was first created.
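To see what "filters are like slopes" means in numbers, here is a sketch using the simplest textbook low-pass filter, a first-order one. This is an assumption made purely for illustration: the filters in real converters are far steeper, but they too roll off gradually rather than dropping like a cliff.

```python
import math

def first_order_lowpass_gain_db(frequency_hz, cutoff_hz):
    """Gain in dB of an ideal first-order low-pass filter (illustrative only)."""
    magnitude = 1.0 / math.sqrt(1.0 + (frequency_hz / cutoff_hz) ** 2)
    return 20.0 * math.log10(magnitude)

CUTOFF_HZ = 20000  # upper limit of hearing, taken from the text

for f in (1000, 20000, 22050, 40000, 200000):
    gain = first_order_lowpass_gain_db(f, CUTOFF_HZ)
    print(f"{f:>6} Hz: {gain:6.2f} dB")
```

The attenuation grows smoothly with frequency: about 3 dB at the cutoff, about 7 dB an octave higher, and about 20 dB ten times higher. There is no point at which the signal suddenly disappears.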

A typical low-pass filter with a typical slope
So, let’s now move on to bit depth.
Bit depth determines dynamic range, the span between the quietest and the loudest level a recording can represent. This matters in three ways. First, we can capture harmonics and other quiet details of the sound source that would otherwise be too quiet to register. Secondly, there is more room above the quietest level for the signal itself: each extra bit adds approximately 6 dB of dynamic range. Thirdly, all things in nature (audio included) have a certain degree of noise. A higher dynamic range means that the noise floor can sit much further below the sound source. For example, if you have only 6 dB of dynamic range and you capture a guitar, the guitar can be at most 6 dB above the noise floor, so you lose much of its clarity. However, if you have 96 dB of dynamic range, it becomes much easier to get the guitar much louder than the noise floor.
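The "6 dB per bit" rule of thumb comes from a short formula: the ratio between the largest and smallest step of an n-bit number is 2 to the power n, and in decibels that is 20 times the base-10 logarithm of the ratio. A minimal sketch, with a function name invented for this post:

```python
import math

def dynamic_range_db(bit_depth):
    """Theoretical dynamic range of linear PCM audio at a given bit depth."""
    return 20.0 * math.log10(2 ** bit_depth)

for bits in (1, 8, 16, 24):
    print(f"{bits:>2} bits: {dynamic_range_db(bits):6.2f} dB")
```

One bit gives about 6.02 dB, 16 bits about 96.3 dB, and 24 bits about 144.5 dB. Real equipment adds its own noise, so these are upper limits rather than what you will measure in practice.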
Digital pictures offer a better way to demonstrate this.

In the picture you see the immense difference between 1 and 24 bits. As you add more bits, the picture gets clearer, but only in the 24-bit case, where we have about 16.7 million colors, do we get a clear representation of the picture. The same thing goes for audio.
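The same effect can be shown with numbers instead of pictures. The sketch below rounds a sine wave to the nearest level available at a given bit depth and reports the worst rounding error. It assumes a simple uniform quantizer over the range -1 to 1, and the names quantize and worst_error are made up for this example.

```python
import math

def quantize(value, bit_depth):
    """Snap a value in [-1, 1] to the centre of one of 2**bit_depth levels."""
    levels = 2 ** bit_depth
    step = 2.0 / levels
    index = min(int((value + 1.0) // step), levels - 1)
    return -1.0 + (index + 0.5) * step

def worst_error(bit_depth, n_points=1000):
    """Largest rounding error over one cycle of a sine wave."""
    wave = [math.sin(2 * math.pi * n / n_points) for n in range(n_points)]
    return max(abs(x - quantize(x, bit_depth)) for x in wave)

print(2 ** 24, "levels (or colors) at 24 bits")
for bits in (1, 8, 16, 24):
    print(f"{bits:>2} bits: worst error {worst_error(bits):.10f}")
```

With 1 bit there are only two possible values, so the wave turns into a crude square shape; with every added bit the worst error is roughly halved.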
So, in this post we explained the two most fundamental properties of digital audio: sample rate and bit depth. To recap, sample rate determines the highest frequency that can be reproduced, while bit depth determines dynamic range, which translates into clarity and room above the noise floor.
The CD format uses a sample rate of 44,100 Hz, whose Nyquist frequency of 22,050 Hz lies above the upper limit of human hearing, and a bit depth of 16 bits, which provides about 96 dB of dynamic range.
You may have come across some notions you didn’t understand, such as aliasing, loudness or decibel. We will cover these in upcoming posts, so stay tuned!