
And because clocks are so fundamental to the entire digital recording, processing and playback chain, from the moment sound captured by a microphone is turned into a number until it’s turned back into sound in the living room, a little clock error can create a lot of trouble. Jitter lurks in the bushes every step of the way. Furthermore, the steady progress of digital music towards higher sampling rates and increased resolution per sample unfortunately also brings inherently greater sensitivity to jitter, as will be explained presently.


Although the topic sounds somewhat complicated, perhaps esoteric, it really isn’t and can be understood fairly readily if the overall effects are broken down into a few main categories. Firstly let’s look at the effects of random jitter, which is essentially natural and usually not correlated with the music. Secondly there’s deterministic jitter, which is systematic, repetitive or related to the signal in various ways. And thirdly there is the role of the computer interface and the S/PDIF data link layer protocol. We’ll also explore how much this really matters in today’s crop of USB interfaces and DACs. Let’s walk through the fundamentals.


Jitter relating to slew rate and aperture time: Analogue electrical waveforms change over time, as we all know. For a given sample word resolution (for example 16 or 24 bits), one needs to initiate a conversion, whether incoming analogue to digital or outgoing digital to analogue, within a rather small amount of time to recreate the signal accurately. This interval is the timing aperture during which the signal must be sampled. The amount of time available varies with the slew rate of the signal and the desired resolution of the conversion. The aperture time is actually independent of the sampling frequency (e.g. 44.1kHz), provided the Nyquist limit has been met for a given bandwidth.


Although we’ll be discussing jitter mainly from the perspective of converting analogue to digital, the reverse conversion back to analogue is governed by similar principles with a few additional wrinkles. There’s jitter at both ends of the production chain and the errors generally accumulate to degrade the sonic results.


Let’s see how the aperture time is determined in the first place. A 16-bit conversion divides the incoming voltage, for example a nominal 1.0V sine wave, into 65,536 discrete levels. Each unit or least significant bit (LSB) in the resulting 16-bit data word represents about 15μV. An incoming waveform with a frequency of 1kHz slews through this 15μV or 1 LSB in just 25ns (nanoseconds). Thus the aperture time for 1kHz has to be less than 25ns or the signal will have moved on to some new value, which is sampled instead and distorts the results. Conversely, if the sampling is too early, the same problem arises: the waveform is still not being captured at the right place. Aperture time shortens linearly with increasing frequency, so 10kHz conversions need to be initiated within a mere 2.5ns window, ten times smaller than for 1kHz.
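
The level count and LSB size quoted here are easy to check. The following is a minimal sketch, assuming the 1.0V in the example is the converter's full-scale range; the function names are illustrative only.

```python
def quantization_levels(bits):
    """Number of discrete levels an N-bit word can represent."""
    return 2 ** bits


def lsb_volts(full_scale_volts, bits):
    """Voltage step represented by one least significant bit."""
    return full_scale_volts / quantization_levels(bits)


levels = quantization_levels(16)
step_uv = lsb_volts(1.0, 16) * 1e6  # assumption: 1.0 V spans the full range
print(f"{levels:,} levels, 1 LSB = {step_uv:.2f} uV")
# prints "65,536 levels, 1 LSB = 15.26 uV"
```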


One may now begin to appreciate that even though the audio spectrum comprises relatively low frequencies compared to, say, television broadcasts, when one examines the details of the signal and how it’s handled in a practical system, small units of time are definitely a major part of the landscape. As discussed above, if the clock controlling the conversion randomly moves forwards or backwards by more than the aperture time, the sample will be taken from the wrong part of the waveform. The primary result of this error is a lower signal-to-noise ratio (SNR) across the entire spectrum.


In the case of the 24-bit word lengths increasingly in vogue these days, the incoming waveforms are now in theory quantized into a staggering 16,777,216 discrete levels, each representing a voltage step 256 times finer than that of formerly adequate 16-bit samples. The valid aperture shrinks by the same factor of 256: at 10kHz it is now around 10 picoseconds, or just 1/2500th of the 25ns available to a 16-bit system at 1kHz.
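
To see how these apertures relate to one another, the sketch below scales the 25ns figure quoted for 16 bits at 1kHz. It assumes, as the text suggests, that the aperture shrinks in inverse proportion to both the signal frequency and the number of quantization levels. The function name and reference constants are illustrative and not taken from Donald's analysis.

```python
REF_APERTURE_S = 25e-9   # figure quoted in the text for 16 bits at 1 kHz
REF_FREQ_HZ = 1_000.0
REF_BITS = 16


def scaled_aperture_s(freq_hz, bits):
    """Scale the reference aperture, assuming it is inversely proportional
    to signal frequency and to the number of quantization levels."""
    freq_factor = REF_FREQ_HZ / freq_hz
    level_factor = 2.0 ** (REF_BITS - bits)
    return REF_APERTURE_S * freq_factor * level_factor


for freq_hz, bits in [(1e3, 16), (10e3, 16), (1e3, 24), (10e3, 24)]:
    aperture_ps = scaled_aperture_s(freq_hz, bits) * 1e12
    print(f"{bits}-bit at {freq_hz / 1e3:g} kHz: {aperture_ps:,.1f} ps")
```

Under that assumption the 10ps figure corresponds to 24 bits at 10kHz, while 24 bits at 1kHz comes out nearer 98ps.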


Some readers may be asking how this can be, since surely it’s not possible to complete a 16-bit conversion in 25ns, let alone a 24-bit conversion in 10ps. They’re absolutely correct. It’s not possible. Instead, exotic 'sample and hold' circuitry is used to capture a snippet of analogue data during this tiny interval (the sample) and keep it at a constant level (the hold) long enough for the A/D converter to figure out what number to assign. Then it’s time to sample the waveform once again.


The first graph shows how the binary digits or bits per sample (which determine the quantization accuracy), signal frequency and aperture time all interrelate. This graph and the ones following were kindly prepared by my audio colleague Donald Herman Jr. in Colorado, who developed them using the Matlab numerical analysis and measurement suite. Don works on advanced telecommunications systems and in this context 10ps is just the beginning. In telecom, extraordinarily low-noise clocks are critically important.

Higher-resolution quantization for recording and playback puts enormous demands upon clock accuracy and stability. And as discussed, the situation gets proportionately worse for higher frequencies.
Randomized jitter of this sort, even at substantially less than 1ns, can reduce the effective dynamic range by 10 to 20dB in the midrange and treble, which is highly undesirable.
Such random noise is typically the result of 'nature fights back' realities, meaning it’s very hard to control or eliminate. The sources according to Don are thermal noise (the hotter things get, the more electrons move around in ways the designer did not have in mind); shot noise (relating to the statistics of electrons and other elementary particles); and 1/f noise (nature has more low frequencies than high frequencies). One can however keep these sources to a minimum by taking exceptional care with layout, grounding and power supply design.


Quantitatively, all these random jitter factors work together to degrade the SNR considerably. With 5ns of aperture jitter, 16-bit quantization and a simple 10kHz test tone, Donald’s analysis shows that the noise floor is already rising to nearly audible levels. With more natural, that is more complex, waveforms, the SNR may degrade even more.
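
A common textbook approximation, not necessarily the model behind Donald's graphs, puts the jitter-limited SNR of a full-scale sine wave at -20·log10(2π·f·σ), where σ is the RMS jitter. The sketch below applies it. It treats the jitter figures in the text as RMS values, which is an assumption, and uses the usual 6.02N + 1.76dB estimate for ideal quantization noise.

```python
import math


def jitter_snr_db(freq_hz, rms_jitter_s):
    """SNR ceiling set by random sampling jitter for a full-scale sine."""
    return -20.0 * math.log10(2.0 * math.pi * freq_hz * rms_jitter_s)


def quantization_snr_db(bits):
    """Ideal SNR of an N-bit quantizer for a full-scale sine."""
    return 6.02 * bits + 1.76


def combined_snr_db(*snrs_db):
    """Combine independent noise sources by adding their noise powers."""
    noise_power = sum(10.0 ** (-snr / 10.0) for snr in snrs_db)
    return -10.0 * math.log10(noise_power)


ideal = quantization_snr_db(16)
for jitter_s in (1e-9, 5e-9):
    limit = jitter_snr_db(10e3, jitter_s)
    total = combined_snr_db(ideal, limit)
    print(f"{jitter_s * 1e9:g} ns at 10 kHz: jitter limit {limit:.1f} dB, "
          f"combined {total:.1f} dB (ideal 16-bit: {ideal:.1f} dB)")
```

With these assumptions, 5ns of jitter caps a 10kHz tone at roughly 70dB, well short of the roughly 98dB that ideal 16-bit quantization allows. Even 1ns costs about 14dB, in line with the 10 to 20dB range mentioned earlier.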


Deterministic jitter: This is a more pernicious form described in some detail in my review of the Weiss INT 202. In this case non-random periodic clock jitter creates unintended modulation at potentially audible levels within the audio spectrum. Depending upon its period and amplitude, jitter may create a host of sidebands that concentrate energy in an unnaturally narrow range. These spurs are typically not part of any natural harmonic structure of music as they have arbitrary relationships with the signal and may be more apparent than their power spectral density alone suggests.
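
The sideband mechanism can be reproduced numerically. The sketch below samples a 10kHz tone with a clock whose edges wobble sinusoidally at 1kHz by 5ns peak, then measures the spurs at 9kHz and 11kHz with a single-bin DFT. All parameter values and function names are illustrative assumptions. The closing comparison uses the standard small-angle result for phase modulation, which puts each first-order sideband at π·f·Δt relative to the tone.

```python
import cmath
import math


def jittered_sine(n_samples, fs_hz, tone_hz, jitter_hz, jitter_peak_s):
    """Sample a unit sine at instants displaced by sinusoidal clock jitter."""
    samples = []
    for n in range(n_samples):
        ideal_t = n / fs_hz
        actual_t = ideal_t + jitter_peak_s * math.sin(2 * math.pi * jitter_hz * ideal_t)
        samples.append(math.sin(2 * math.pi * tone_hz * actual_t))
    return samples


def bin_magnitude(samples, fs_hz, freq_hz):
    """Magnitude of a single DFT bin at freq_hz (must fall on a bin centre)."""
    n_samples = len(samples)
    k = round(freq_hz * n_samples / fs_hz)
    acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
              for n, x in enumerate(samples))
    return abs(acc)


FS, TONE, JITTER_FREQ, JITTER_PEAK = 48_000.0, 10_000.0, 1_000.0, 5e-9
x = jittered_sine(4800, FS, TONE, JITTER_FREQ, JITTER_PEAK)
carrier = bin_magnitude(x, FS, TONE)
for side_hz in (TONE - JITTER_FREQ, TONE + JITTER_FREQ):
    level_db = 20 * math.log10(bin_magnitude(x, FS, side_hz) / carrier)
    print(f"sideband at {side_hz / 1e3:g} kHz: {level_db:.1f} dB relative to the tone")

# Small-angle prediction for phase modulation: sideband/carrier = pi * f * dt_peak
predicted_db = 20 * math.log10(math.pi * TONE * JITTER_PEAK)
print(f"predicted: {predicted_db:.1f} dB")
```

Unlike random jitter, which spreads its energy across the noise floor, this periodic jitter puts all of its energy into two discrete spurs whose spacing from the tone is set by the jitter frequency rather than by anything musical.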


Sources of deterministic jitter include internal and external electromagnetic interference (EMI), ripple in power supplies, contamination and coupling from clocks in other devices and so on. Compared to the naturally occurring sources of random jitter, these artifacts are somewhat more under one’s control.


Computer interface: A great deal of marketing hype surrounds the discussion of which interface (USB or FireWire) is best for getting digital music out of a computer and into a DAC or other external device. FireWire is a rich peer-to-peer network technology but no longer common on even high-spec personal computers despite its recording-industry heritage and many technical strengths. USB is architecturally simpler in that it requires a bus master but it has of course become, well, universal. Although each technology has a variety of advantages and disadvantages, the most important capability of any such interface with respect to reducing jitter is allowing the downstream device to contain the master clock and control the rate at which the computer sends data. 


In the case of USB, its asynchronous mode lets the Audiophilleo act as the clock master: its own clock sets the pace and the computer delivers data at the rate the device asks for. In synchronous, streaming or adaptive modes, the computer is the master and the downstream slave device depends upon USB packet timing information generated by the computer interface, which usually has very high levels of jitter for audio purposes. Your mouse or keyboard couldn’t care less but a DAC does. This arrangement is especially sensitive to cabling and other electromechanical issues, in addition to the limitations of using USB packet framing as a clock.
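
Why one side has to follow the other can be illustrated with a toy model. The sketch below is not how any particular product or the USB audio specification is implemented; it is a simplified assumption in which the device plays samples at its own fixed clock rate and the host sends whole samples once per 1ms frame, either matching a rate reported by the device or blindly using the nominal rate. All names and values are hypothetical.

```python
def simulate_buffer(frames, device_rate_hz, feedback, nominal_rate_hz=44_100.0,
                    start_fill=200.0):
    """Track DAC buffer fill (in samples) over successive 1 ms USB frames.

    Toy model only: the device plays samples at its own fixed clock rate,
    while the host sends whole samples once per frame.
    """
    fill = start_fill
    owed = 0.0  # fractional samples carried over to the next frame
    for _ in range(frames):
        played = device_rate_hz / 1000.0
        # With feedback the host matches the rate the device reports;
        # without it the host simply sends at the nominal rate.
        owed += played if feedback else nominal_rate_hz / 1000.0
        sent = int(owed)
        owed -= sent
        fill += sent - played
    return fill


# Hypothetical device clock running about 113 ppm faster than the host expects.
DEVICE_RATE = 44_105.0
print("with feedback   :", round(simulate_buffer(60_000, DEVICE_RATE, True), 1))
print("without feedback:", round(simulate_buffer(60_000, DEVICE_RATE, False), 1))
```

In this model the buffer stays put when the host follows the device, and runs dry within a minute when it does not (a negative fill level stands for an underrun). A real adaptive device avoids the underrun by bending its own clock to track the host's packet timing, which is exactly where the jitter described above gets in.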


S/PDIF data layer protocol: This legacy protocol has inherent weaknesses which are somewhat similar to those of synchronous USB. The upstream transmitting device and its clocks are the master and the downstream device once again has to extract timing information from an external bit stream that is not under its control. S/PDIF uses biphase mark code (BMC) encoding to embed timing information in the bit stream. The downstream device recognizes the BMC codes and uses them as the basis for generating the sample clock. Any jitter present must be dealt with by the DAC; cabling, connectors etc. are actively involved in this scenario and thus introduce additional variables that may ultimately degrade clock accuracy.
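
Biphase mark coding itself is simple enough to show in a few lines. In the sketch below every bit cell starts with a transition and a 1 adds a second transition in the middle of the cell. The function names are illustrative, and the decoder assumes the cell boundaries are already known; S/PDIF framing is not modelled.

```python
def bmc_encode(bits, start_level=0):
    """Biphase mark code: return two half-cell line levels per data bit."""
    level = start_level
    half_cells = []
    for bit in bits:
        level ^= 1                 # every bit cell begins with a transition
        half_cells.append(level)
        if bit:
            level ^= 1             # a 1 adds a transition in mid-cell
        half_cells.append(level)
    return half_cells


def bmc_decode(half_cells):
    """Recover data bits: halves that differ mean 1, equal halves mean 0."""
    pairs = zip(half_cells[0::2], half_cells[1::2])
    return [int(first != second) for first, second in pairs]


data = [1, 0, 1, 1]
line = bmc_encode(data)
print(line)              # [1, 0, 1, 1, 0, 1, 0, 1]
print(bmc_decode(line))  # [1, 0, 1, 1]
```

Because the line changes state at least once per bit cell regardless of the data, the receiver can recover a clock from the transitions alone. The flip side is that any timing disturbance on those edges feeds straight into the recovered clock.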
