Sound bite: Despite Pono's promise, experts pan HD audio
By raising $4.3 million on Kickstarter, Neil Young's startup shows an appetite for better sound quality. The only hitch: experts say there's little point going beyond CD quality.
Pono Music's roaring success on Kickstarter, raising $4.3 million so far, shows that thousands of people believe better audio quality is worth paying for.
The company -- backed by star musician Neil Young and selling a $400 digital audio player along with accompanying music -- promises people will hear a difference between Pono Music and ordinary music that's "surprising and dramatic." The company's promise is based in part on music files that can contain more data than not only conventional MP3 files, but also compact discs.
There's no doubt that highly compressed music files, played over tinny laptop speakers or cheap earbuds, leave a lot of room for improvement. But outdoing CD quality? That's a harder sell.
Established players in the music business have tried for years with technology like DVD-A (DVD-Audio) and SACD (Super Audio Compact Disc). But even overlooking their general lack of commercial success, experts say there's not much point. In other words, yes, the CD audio format that Philips and Sony introduced in 1982 really is good enough.
Just as some skeptics think 4K TVs are wasted on human eyes, which mostly can't perceive an image quality improvement over mainstream HD 1080p under normal viewing conditions, others think CD audio technology that's now more than three decades old is actually very well matched to human hearing abilities. For playback, they're fine with two key aspects of CD audio encoding: its 16-bit dynamic range, which means audio is measured with a precision of 65,536 levels, and its 44.1kHz "sampling" frequency that means those levels are measured 44,100 times each second.
"From a scientific point of view, there's no need to go beyond," said Bernhard Grill, leader of Fraunhofer Institute's audio and multimedia division and one of the creators of the MP3 and AAC audio compression formats. "It's always nice to have higher numbers on the box, and 24 bits sounds better than 16 bits. But practically, I think people should much more worry about speakers and room acoustics."
Pono's recordings will range from CD-quality 16-bit/44.1kHz to 24-bit/192kHz "ultra-high resolution." To house the data, Pono follows in the footsteps of the digital audiophile industry by sticking with a file format called FLAC (Free Lossless Audio Codec) that compresses files for smaller sizes but not to the degree of alternatives including MP3 and AAC that throw away some of the original data. The company also is betting its success on a player with better electronics and a catalog of HD music designed to let listeners hear music true to its original sound in the recording studio.
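To put those two ends of Pono's range in perspective, here is a rough sketch of how much raw data each format carries per second. It assumes uncompressed two-channel (stereo) audio; the function name is made up for illustration, and FLAC compression would shrink both figures by an amount that depends on the music.

```python
def pcm_bits_per_second(sample_rate_hz, bit_depth, channels=2):
    """Raw (uncompressed) data rate. Stereo is an assumption, not a Pono spec."""
    return sample_rate_hz * bit_depth * channels

cd_rate = pcm_bits_per_second(44_100, 16)    # CD quality
hd_rate = pcm_bits_per_second(192_000, 24)   # "ultra-high resolution"

print(f"CD quality:   {cd_rate:,} bits per second")
print(f"24-bit/192kHz: {hd_rate:,} bits per second")
print(f"Ratio: {hd_rate / cd_rate:.2f}x the raw data")
```

By this arithmetic, the highest-resolution files carry roughly six and a half times the raw data of a CD before any compression is applied.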
Pono is joining earlier efforts to improve audio quality in the "HD audio" movement also called high-resolution or high-definition audio. The idea behind the movement is that more data allows a higher dynamic range -- the span between the loudest and quietest passages of music -- and comes closer to the detail of live, original sound. (Pono executives couldn't be reached for comment for this story.)
The High Fidelity Pure Audio consortium is a newer effort to popularize HD audio with a version of Blu-ray discs, but Pono's digital player skips discs altogether in favor of downloads. That means it's adding to existing high-resolution digital download efforts such as HDtracks, iTrax, eClassical, and HighResAudio. That's closer to Sony's high-resolution audio effort, which includes products such as a $999 home audio player that stores data on a 500GB hard drive.
There can be a real difference with HD audio recordings, because the music industry has an incentive to put more effort into recording, editing, and creating the master version of the final product. But because of limits in source material, human hearing, and playback conditions, there's no guarantee a downloaded HD audio track will sound better.
A prominent part of the case against high-resolution audio is a 2007 study by E. Brad Meyer and David Moran of the Boston Audio Society that concluded listeners couldn't tell the difference between SACD and DVD-A music on the one hand and CD-quality versions of the same recordings on the other.
In that experiment's 554 tests, listeners correctly identified whether they were hearing the SACD or DVD-A recording or a CD-quality version of it only 49.8 percent of the time -- in other words, they didn't do better than randomly guessing. To ensure that higher-quality recordings for the audiophile market weren't a factor, Moran and Meyer created CD versions from the higher-resolution originals.
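For readers who want to check the "no better than guessing" reading, here is a small sketch of the arithmetic. It assumes 276 correct answers, a count derived here by rounding 49.8 percent of 554 rather than one quoted in this story, and it asks how often pure coin-flipping would do at least that well.

```python
import math

TRIALS = 554
CORRECT = round(0.498 * TRIALS)  # assumption: derived from the 49.8 percent figure

def chance_of_at_least(correct, trials):
    """Probability that fair coin-flip guessing gets `correct` or more right."""
    favourable = sum(math.comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

p_value = chance_of_at_least(CORRECT, TRIALS)
print(f"{CORRECT} of {TRIALS} correct; guessing does this well "
      f"{p_value:.0%} of the time")
```

A result that random guessing matches or beats more than half the time gives no statistical sign that listeners could hear a difference.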
"Our test results indicate that all of these recordings could be released on conventional CDs with no audible difference," they concluded. "They would not, however, find such a reliable conduit to the homes of those with the systems and listening habits to appreciate them."
Another high-profile non-believer is Christopher "Monty" Montgomery, an engineer who writes codec software for the Xiph.Org Foundation and who works for Firefox developer Mozilla. The most prominent part of his effort is a video arguing that CD quality sound is good enough.
Montgomery's video, illustrated with lucid demonstrations and backed by a blog post, persuasively debunks misconceptions such as the idea that encoding music digitally reduces it to a series of jagged stairsteps instead of the original smooth curves.
Montgomery and his allies have yet to persuade everyone on two points: that 16-bit resolution is sufficient, and that a 44.1kHz sampling rate is sufficient.
"Monty is wrong. Twenty-four bits does matter -- but for a very small sliver of the music business," said Mark Waldrep, an audio engineer who's founder and chief executive of AIX Records and iTrax.com and who focuses on high-resolution audio -- including efforts of his own to debunk some claims. And of the sampling frequency he said, "I'd rather err on having those frequencies in the signal rather than assuming we don't need them."
But Grill thinks any purported benefit would be lost in the real world. "The limiting factor is the loudspeaker, the room acoustics, and the human ear," he said.
One thing everyone agrees on, though, is that the debate is mostly irrelevant for the mass market for music. There, the industry actually is moving away from HD audio: instead of concentrating on maximum dynamic range by accommodating music that spans both quiet and loud passages, "loudness wars" have pushed audio engineers to actually reduce music's dynamic range.
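A toy sketch can show what reducing dynamic range means in numbers. The power-law curve below is a hypothetical stand-in chosen for simplicity; real mastering compressors and limiters work differently and are not described in this story. The point is only that quiet passages get much louder while the peak level stays put.

```python
import math

def rms(samples):
    """Root-mean-square level, a rough proxy for average loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def squash(sample, exponent=0.5):
    """Hypothetical power-law curve: boosts small values, leaves 1.0 at 1.0."""
    return math.copysign(abs(sample) ** exponent, sample)

PERIOD = 100  # samples per cycle of the test tone
quiet = [0.1 * math.sin(2 * math.pi * n / PERIOD) for n in range(1000)]
loud = [1.0 * math.sin(2 * math.pi * n / PERIOD) for n in range(1000)]
track = quiet + loud
squashed = [squash(s) for s in track]

peak_before = max(abs(s) for s in track)
peak_after = max(abs(s) for s in squashed)
quiet_peak_after = max(abs(s) for s in squashed[:1000])

print(f"Peak level:    {peak_before:.2f} -> {peak_after:.2f}")
print(f"Average (RMS): {rms(track):.2f} -> {rms(squashed):.2f}")
print(f"Quiet passage peak: 0.10 -> {quiet_peak_after:.2f}")
```

The gap between the quiet and loud passages narrows, which is exactly the dynamic range that higher bit depths are meant to preserve.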
"You amplify the low-volume components so everything is loud," Grill said. By making otherwise quiet parts of music louder, the average loudness increases without violating broadcasting power limits so radio stations can outdo quieter rivals.
Loudness escalation also means music producers can better cope with the practical limits of listening to music in noisy environments like cars, airplanes, or public transit. But it means that even decades-old music technology has more than enough dynamic range. (High-end magnetic tape could handle 13-bit dynamic range, and the best tape cassettes a 9-bit range, Montgomery said.)
"We have to be responsible in making recordings that use the dynamic range and not have these loudness wars," said Sean Olive, director of acoustic research at Harman International and president of the Audio Engineering Society.
Offsetting this is a good trend that's come along with the HD movement: a new focus on high-quality recording, editing, and mastering.
"We're hoping this HD audio music debate raises awareness of sound quality," Olive said. "With HD recordings, [audio engineers] generally take more care in how they make recordings and how they mix them."
The stairstep fallacy
One thing that triggered Montgomery's video was a response to a blog post that argued 24-bit, 192kHz recordings "make no sense." Though there's disagreement among experts on that point, Montgomery was surprised to find some disagreement on something that for experts isn't even an issue: the stairstep.
When digital music is encoded, the first step is to record the level of the music's audio signal. CD's 16-bit design means sound data is described by 65,536 (2 to the 16th power) different levels. (And as mentioned earlier, the level is sampled at a frequency of 44.1 kilohertz.)
The original audio signal is a smooth curve whose amplitude continuously rises and falls. The digital signal, though, is often shown as a jagged rising and falling stairstep, because each sample is at a discrete point that corresponds to each flat step. The trouble is that many think that jagged stairstep visualization is actually how digital music is played back -- that it only looks like a smooth curve if you unfocus your eyes for a fuzzed-out view.
The stairstep misconception still is in evidence. "Digital noise still resides in the flat parts of each step," a guide at PCRecording.com says. Even loudspeaker manufacturer Bowers & Wilkins' 24-bit music explanation declares that a digital signal "is simply unable to trace the analogue waveform smoothly." Even Sony prominently displays a highly misleading stairstep diagram in its effort to show the advantages of HD audio.
In reality, though, digital music playback reconstructs the original smooth audio signal from each sample point. There's no stairstep involved, just a series of audio levels that encode a unique curve (unique as long as the audio pitches stay within a certain range, a limit digital equipment obeys).
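The reconstruction idea can be sketched with the textbook sinc-interpolation formula. This is a mathematical idealization, not a description of the filters inside any particular player, and the 5kHz test tone and variable names are choices made for this example. The sketch compares the true signal at a point halfway between two samples with a stairstep guess (holding the previous sample) and with the smooth reconstruction.

```python
import math

SAMPLE_RATE = 44_100
TONE_HZ = 5_000       # arbitrary test tone, well below half the sample rate
NUM_SAMPLES = 2_001

def tone(t):
    """The original smooth signal; t is measured in sample periods."""
    return math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE)

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

samples = [tone(n) for n in range(NUM_SAMPLES)]

def reconstruct(t):
    """Ideal reconstruction: every sample contributes one sinc-shaped pulse."""
    return sum(s * sinc(t - n) for n, s in enumerate(samples))

t = 1000.5  # halfway between sample 1000 and sample 1001
true_value = tone(t)
stairstep_error = abs(samples[1000] - true_value)
smooth_error = abs(reconstruct(t) - true_value)

print(f"Stairstep guess is off by      {stairstep_error:.4f}")
print(f"Smooth reconstruction is off by {smooth_error:.6f}")
```

The small leftover error in the smooth version comes from using a finite run of samples in this sketch, not from the sampling itself.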
Digital audio constraints
There are real constraints with converting analog sounds into digital signals. One of them, quantization error, happens when an actual sound level at a particular moment is between the discrete levels that digital audio can record. Left alone, quantization error can produce noticeable problems with audio.
Happily, the audible distortion that quantization error causes can be eliminated with a technique called dithering, whose drawback is a small increase in noise.
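Here is a minimal sketch of that trade-off, under stated assumptions: a 1kHz tone whose peak is only 0.4 of one quantization step, a simple rounding quantizer, and triangular dither made by adding two random values (one common choice, not necessarily what any given product uses). Without dither the tone rounds away to pure silence; with dither it survives, buried in a little noise.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable
SAMPLE_RATE = 44_100
AMPLITUDE = 0.4  # in quantization steps: too quiet to reach the next level

signal = [AMPLITUDE * math.sin(2 * math.pi * 1_000 * n / SAMPLE_RATE)
          for n in range(SAMPLE_RATE)]

def quantize(x):
    """Round to the nearest whole level."""
    return math.floor(x + 0.5)

def triangular_dither():
    """Sum of two uniform random values, spanning -1 to +1 step."""
    return random.uniform(-0.5, 0.5) + random.uniform(-0.5, 0.5)

plain = [quantize(s) for s in signal]
dithered = [quantize(s + triangular_dither()) for s in signal]

def tone_strength(output):
    """How much of the original tone is present in the output (1.0 = all)."""
    return (sum(o * s for o, s in zip(output, signal))
            / sum(s * s for s in signal))

print(f"Tone surviving without dither: {tone_strength(plain):.2f}")
print(f"Tone surviving with dither:    {tone_strength(dithered):.2f}")
```

The dithered output contains only the values -1, 0, and 1, yet on average it tracks the original tone.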
"I am a fan of dither," said Jayant Data, associate vice president of audio research and development at THX, a company specializing in cinema sound and home theater. "The dither makes absolutely no difference to a full-amplitude sine wave [a component of music played loud]. You only hear it for [quiet] low-amplitude signals -- but that's the signal it actually helps. There's no free lunch."
Another constraint is that high-frequency sounds -- those with high pitches -- are limited by a principle called the Nyquist-Shannon sampling theorem. Specifically, a sound can only be recorded by digital recording equipment whose sampling frequency is more than double the sound's pitch. That means CD audio, with a 44.1kHz sampling rate, is limited to recording sounds with a pitch below 22.05kHz.
But a 22kHz limit isn't so bad considering that the generally accepted range for human hearing is about 20Hz to 20kHz.
High-resolution audio fans generally aim for a sampling rate of 96kHz or 192kHz, though sometimes 48kHz is an option, too. A 192kHz sampling frequency can record a maximum pitch of 96kHz, and 96kHz can record up to 48kHz sounds.
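The halving rule, and what goes wrong when it is ignored, can be sketched in a few lines. The 30kHz tone below is an arbitrary example above CD audio's limit. Sampled at 44.1kHz without any filtering, it yields exactly the same numbers as an upside-down 14.1kHz tone, so the recording cannot tell the two apart. Real converters filter out such frequencies first.

```python
import math

def nyquist_limit_hz(sample_rate_hz):
    """Highest pitch a given sampling rate can represent, per the theorem."""
    return sample_rate_hz / 2

def sample_tone(freq_hz, sample_rate_hz, count=100):
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

limits = {rate: nyquist_limit_hz(rate)
          for rate in (44_100, 48_000, 96_000, 192_000)}
for rate, limit in limits.items():
    print(f"{rate / 1000:g}kHz sampling -> up to {limit / 1000:g}kHz")

# A tone above the limit masquerades as a lower one (aliasing).
above_limit = sample_tone(30_000, 44_100)
alias = sample_tone(44_100 - 30_000, 44_100)  # 14.1kHz
worst_mismatch = max(abs(a + b) for a, b in zip(above_limit, alias))
print(f"Largest difference from inverted 14.1kHz tone: {worst_mismatch:.1e}")
```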
That's needless, according to Dan Lavry of digital audio product maker Lavry Engineering. "Clearly it has nothing to do with more bandwidth: the instruments make next to no 96kHz sound, the microphones don't respond to it, the speakers don't produce it, and the ear cannot hear it," he said.
Again, Waldrep prefers to capture the extra octave, though. "If somebody says you can't hear that, I'm not concerned with that. But, as a format that's easy to deliver and cost effective, why not? Why would you roll that off just because somebody says I can't tell the difference?"
One reason is that playback of ultrasonic sounds can produce an "uncontrolled spray" of audible distortion problems lower down in the audible range, Montgomery said. Better not to record them in the first place than to try to screen them out during playback.
The older people get, the more they lose their ability to hear the high frequencies, too -- a fact that amuses wags who point out that the people who've lived long enough to afford expensive audiophile equipment are also those worst equipped to actually hear any advantage.
"Once you're up to [age] 18 or 19, you're well within the capability of 44.1kHz," Grill said.
Jayant believes we still haven't read the final word on human hearing, a sense that's extremely expensive to carefully test. But even being open-minded about the merits of high sampling rates only goes so far, he said: "I'm probably one of the few people who still listens to SACD and DVD-Audios. I have to say, I don't know if it makes a big difference."
Greater bit depth
When dealing with the amplitude of an audio signal -- its strength, varying from moment to moment -- bit depth is an important measurement. Using more bits essentially means that a signal with greater dynamic range can be captured, where dynamic range is the difference between the quietest and loudest sounds.
CDs encode audio with a depth of 16 bits, a number that corresponds to 65,536 separate levels. High-resolution audio fans want more, though -- usually 20 or 24 bits.
Dynamic range is measured in decibels when it comes to judging audio, and the CD's 16-bit range corresponds to 96 decibels, or 96dB.
"Twenty-four bits is a number rarely achieved in practice. You typically get equipment only to 20 bits," said Laurie Fincham, senior vice president of audio research and development at THX. "It's a very large dynamic range and very close to human hearing."
Grill said you'd need extraordinary circumstances to be able to use more than 96dB. Even in the quietest rooms, the human body's noise accounts for about 30 or 40 decibels, so putting CD's dynamic range on top of that spans nearly to 140dB. "That's getting close to sound pressure level a jet engine produces close up," Grill said. There are some loudspeakers that can produce sound at that level, but they've got speaker cones 2 meters across, he said, and that's just one of many practical constraints to music playback.
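The decibel figures in this section follow from a standard rule of thumb: each extra bit doubles the number of levels and adds about 6dB. The sketch below works that out for the bit depths mentioned here, then stacks CD's range on top of a 40dB noise floor, the upper end of Grill's estimate. The function names are invented for the example.

```python
import math

def levels(bits):
    return 2 ** bits

def dynamic_range_db(bits):
    """Ratio of the largest level to the smallest step, in decibels."""
    return 20 * math.log10(levels(bits))

for bits in (16, 20, 24):
    print(f"{bits} bits: {levels(bits):,} levels, "
          f"about {dynamic_range_db(bits):.0f}dB")

NOISE_FLOOR_DB = 40  # upper end of the 30-to-40dB figure cited above
loudest_needed = NOISE_FLOOR_DB + dynamic_range_db(16)
print(f"CD range on top of a {NOISE_FLOOR_DB}dB floor: "
      f"about {loudest_needed:.0f}dB")
```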
"Whatever you do, your loudspeakers and room acoustics will be the weakest link in the chain," Grill said. "All the other components are so good these days that if you have a decent amplifier costing a couple hundred dollars, and any decent CD player, you will be able to produce quality way beyond what speakers are able to reproduce."
Garbage in, garbage out
There's one domain where HD audio is accepted as a good idea: the recording studio. Using a high sampling rate and bit depth means more latitude in setting volume levels and less risk of noise that can accumulate when combining many recordings into a single piece.
"The professional audio market favors even the slightest improvements in quality that come with high definition audio," said Tony Cariddi, marketing director for Avid's Pro Audio tools. "Even if stereo masters get compressed for final consumer use, what you put in greatly influences what you get out."
The problem is taking the HD audio philosophy too far, as some with HD audio gear and music to sell do. "To the extent they're selling high-quality, well mastered recordings, they're not silly," Xiph's Montgomery said. "When they attribute the superiority of anything they're selling to the higher resolution, that is indeed silly."
But even if there's plenty of scoffing at Pono's means, those in the audio business appreciate the values it's trying to promote during a time of highly compressed streaming music and loudness wars.
"A CD, really well made and recorded, sounds fantastic. I don't think we're even living up to that standard right now," said Harman's Olive. "We're hoping this HD audio music debate raises awareness of sound quality."