How digital sound works

A lot of us have preconceptions about the differences between analog and digital sound. Here are some of the facts.

Speaking of Zeroes and Ones ...

Among audiophiles, the analog vs. digital debate rages without end. I, like a lot of other musicians and music fans, have my own preferences--I own many more LPs than CDs, and have paid dearly to record some of my bands' music onto 2-inch tape instead of direct to hard drive. But included in those preferences are some preconceptions. You've heard it before: digital music sounds "colder" or "cleaner" or "more sterile" because it's delivering a stream of 0s and 1s, instead of a pure sound wave. Or something like that.

Audio professionals don't use terms like these, largely because they're subjective and imprecise, and sometimes inaccurate. Recently, one of these professionals presented the best explanation of analog vs. digital sound that I've ever heard. Here's a super-condensed version of an already simplified explanation.

A pure tone can be represented as a perfect sine wave. Each point on the sine wave has two values: its height (amplitude) and its point in time. Looking at the overall wave, the distance between the top and bottom of that wave corresponds to the volume of that tone. The distance between one peak and the next is the period of the wave; the number of peaks per second is the frequency, which we hear as the pitch of that tone. The closer together the peaks, the higher the pitch.
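To make the sine-wave picture concrete, here is a minimal Python sketch. The function names and the 440 Hz example tone are my own choices for illustration, not anything from the explanation I heard.

```python
import math

def sine_sample(amplitude, frequency_hz, t_seconds):
    """Height of a pure tone at one point in time."""
    return amplitude * math.sin(2 * math.pi * frequency_hz * t_seconds)

def period_seconds(frequency_hz):
    """Time between one peak and the next (the inverse of frequency)."""
    return 1.0 / frequency_hz

# Hypothetical example: a 440 Hz tone with a peak height of 1.0.
print(period_seconds(440))                    # about 0.00227 seconds
print(sine_sample(1.0, 440, 1 / (4 * 440)))   # a quarter period in: at the peak
```

Doubling the frequency halves the time between peaks, which is why higher notes look like tighter waves.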

But pure tones don't occur in nature. Think of a roomful of people all saying their names at the exact same time. At any given moment, there are dozens of individual voices, all at different pitches and volumes.

Imagine a microphone recording this incident: at any given moment, the diaphragm of that microphone can only be at one position. So, at each moment, it's responding to the combined air pressure of all the noises in the room and presenting a single value for it. A stream of such values, over time, can be charted on a graph. But instead of a perfect sine wave, it appears as a complicated squiggly line. There's only one value at any given moment, but over the course of a second or so, the changes in that value give you the overall character of the sound, which your ear interprets as a group of people saying their names. Or, in a more relevant example, a group of musicians playing instruments.
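Here is a rough Python sketch of that idea. It assumes the individual sounds simply add together at the microphone, and it stands in three made-up pure tones for the voices; real voices are far messier.

```python
import math

def mixed_value(tones, t_seconds):
    """Single combined value of several (amplitude, frequency_hz) tones at one moment."""
    return sum(a * math.sin(2 * math.pi * f * t_seconds) for a, f in tones)

# Hypothetical "voices": three tones at different pitches and volumes.
voices = [(0.5, 110.0), (0.3, 220.0), (0.2, 330.0)]

# One value per moment; charted over time, these form the squiggly line.
squiggle = [mixed_value(voices, n / 8000) for n in range(8)]
print(squiggle)
```

However many voices go in, only one number comes out for each moment.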

In recording, the question is: how do you capture that changing value? Tape uses tiny magnetizable bits of metal. Very roughly, the more of those pieces aligned in the same direction, the higher the value. (Remember, we've taken time out of the equation because this is one point in the tape, so all that's left to measure is the value itself.) Even if there's no sound, the metal pieces are still passing by the head, which creates tape noise. Other types of noise arise from irregularities in the surface of the tape (modulation and asperity), or from trying to force more signal onto the tape than it can handle (oversaturation). For various reasons--physics and years of conditioning--these types of noise can sound acceptable, or even desirable, to many people.

In digital recording, a program takes samples of that sound. (Strictly speaking, the sampling is done by a piece of hardware called an analog-to-digital converter, but the logic is the same.) If you imagine the graph of the squiggly line, it's plotted against a theoretical top (the highest value the program can record) and bottom (the lowest possible value). The program asks a series of questions to determine the value of the sound at the moment it's taking the sample. The first question: is the value above or below the halfway point? Let's say it's above, which the program records as a "1." The second question: if you cut that upper half in half again, is the value above or below the new midpoint? Above=1, below=0. And so on, with each question halving the remaining range, until you've narrowed it enough to come up with a number that's close to the real value.
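The "above or below" questions can be sketched in a few lines of Python. This is only an illustration of the halving idea, with hypothetical function names and a made-up range of -1.0 to 1.0; real converters and file formats encode their numbers differently.

```python
def successive_approximation(value, low, high, bits):
    """Ask 'above or below the midpoint?' once per bit and record the answers."""
    digits = []
    for _ in range(bits):
        mid = (low + high) / 2
        if value >= mid:
            digits.append("1")   # above: keep the upper half
            low = mid
        else:
            digits.append("0")   # below: keep the lower half
            high = mid
    return "".join(digits)

def decode(code, low, high):
    """Replay the answers and return the middle of the range they narrow down to."""
    for digit in code:
        mid = (low + high) / 2
        if digit == "1":
            low = mid
        else:
            high = mid
    return (low + high) / 2

code = successive_approximation(0.3, -1.0, 1.0, 4)
print(code)                     # 1010
print(decode(code, -1.0, 1.0))  # 0.3125
```

With only four questions, the value 0.3 comes back as 0.3125. Each extra question halves the remaining range, and so halves the largest possible error.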

Bit depth represents the number of questions that the program asks at each moment. (It's often confused with bitrate, which is the number of bits recorded or played back per second.) So, 16-bit, which is standard CD-quality sound, asks 16 questions, and each value is represented by a 16-digit binary number, something like 0100010101101111. Some professional audio programs record at 24-bit, which lets them ask eight more "above or below" questions per moment, making each measurement 256 times finer.
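The arithmetic behind those figures is short enough to check in Python. The function names here are mine, and the decibel figure uses the common rule of thumb of about 6 dB per bit, which ignores real-world converter noise.

```python
import math

def levels(bits):
    """Number of distinct values that a given number of yes/no questions can tell apart."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Rule-of-thumb ratio between the largest and smallest step, in decibels."""
    return 20 * math.log10(levels(bits))

print(levels(16))                 # 65536
print(levels(24))                 # 16777216
print(levels(24) // levels(16))   # 256
print(dynamic_range_db(16))       # about 96 dB
```

The eight extra questions multiply the number of possible values by 2 to the 8th power, which is where the 256 comes from.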

Sample rate represents the number of samples the program takes in a second. There's a hard and fast rule (the Nyquist theorem) stating that the sample rate must be more than twice the highest frequency you want to record. If a frequency higher than half the sample rate is allowed to enter the sound stream, it will show up as a false, lower frequency, an effect known as aliasing. These false frequencies generally have no musical relationship to the original notes. While some forms of analog distortion sound good to some ears, this kind of digital distortion doesn't. So, a filter is used to make sure that these high frequencies aren't recorded.
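Here is a small Python sketch of where the false frequency lands. It assumes an idealized sampler with no filter at all, fed a single pure tone, and the function names are hypothetical.

```python
def nyquist_limit(sample_rate_hz):
    """Half the sample rate: frequencies at or above this cannot be captured correctly."""
    return sample_rate_hz / 2

def alias_frequency(frequency_hz, sample_rate_hz):
    """Frequency a pure tone appears to have after unfiltered sampling."""
    folded = frequency_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(nyquist_limit(44100))            # 22050.0
print(alias_frequency(10000, 44100))   # 10000 (below the limit, so unchanged)
print(alias_frequency(30000, 44100))   # 14100 (a false tone well inside hearing range)
```

A 30,000 Hz tone that nobody could hear turns into a 14,100 Hz tone that plenty of people can, which is why the filter goes in front of the sampler rather than after it.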

Standard CD sound is 44.1 kHz, or 44,100 samples per second, which means the highest frequency it can theoretically record is 22.05 kHz; once the filter is taken into account, the practical limit is about 20 kHz. Most professional audio programs are capable of recording at 48 kHz, and 96 kHz isn't unheard of.

After all these samples are taken, you end up with a very fine stairstep wave. On playback, a digital-to-analog converter and a smoothing filter translate it back into a continuous electrical signal that audio equipment can then play through a speaker or headphones.

Analog lovers might argue that, although the human ear cannot hear tones above about 20 kHz (or 22 kHz, or even 48 kHz), those tones nonetheless affect the character of the overall signal at any given moment. Therefore, eliminating those tones makes the recording "inaccurate." Digital lovers might respond that there are far more opportunities for distortion in analog forms of recording and playback.

 
