Digital Audio at the National Library of Canada

Gilles St-Laurent
Network Notes #49
ISSN 1201-4338
Music Division
National Library of Canada

August 8, 1997

1. Introduction

"Mary had a little lamb" were the first words recorded by Thomas Edison in 1877 on his newest invention, the phonograph. Although the technology was crude, its underlying recording and playback concepts were to remain unchanged until the advent of digital technology over a century later. Digital audio technology has profoundly affected communications and has revolutionized the art and science of recording and manipulating sound.

The National Library is fortunate to have state-of-the-art digital audio processing equipment -- technology which has greatly influenced day-to-day conservation and document delivery activities.

2. Digitizing Audio

Sound is created by a vibrating surface (string, reed, drum skin, etc.) which produces sound waves of compressing and decompressing air. To make an analog recording of sound waves, a microphone is used to transfer the continuous change in air pressure into a continuous change in voltage, i.e., an analogous voltage image of the original vibration. For analog storage, a transducer ¹ is used to transfer the continuous change in voltage into a continuous physical change of a groove wall of an LP, or into a continuous change of particle alignment within a cassette tape. At playback, these continuous changes in groove wall or particle alignments are then transferred back into a continuous voltage image by the player's transducers. It is this voltage image, when presented to the input of an analog to digital converter, which is digitized.

2.1 Sampling Rate

The accuracy (the resolution) of sound reproduction in digital audio is determined by two factors: sampling rate and digital word length.

Digital recordings, rather than being a continuous physical image of changes in electrical voltage, are based on a series of discrete electrical voltage measurements. For example, on a CD, the electrical voltage image is measured 44 100 times per second; this is its "sampling rate". At a particular time the voltage might (for argument's sake) be 0.5 volts out of a maximum of 1 volt. 1/44 100th of a second later the voltage might be 0.505 volts, the following 1/44 100th of a second 0.509 volts, and so on.
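As a rough illustration of sampling, the sketch below measures a hypothetical 1 kHz sine-wave voltage 44 100 times per second. The function name, the test tone and the 0 to 1 volt range are assumptions made for this example and do not describe any particular converter.

```python
import math

SAMPLE_RATE = 44_100  # CD sampling rate, in measurements per second


def sample_voltage(frequency_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return discrete voltage readings (0 to 1 volt) of a sine tone."""
    count = round(duration_s * sample_rate)
    readings = []
    for n in range(count):
        t = n / sample_rate  # time of the n-th measurement, in seconds
        # Centre the wave on 0.5 V so that it swings between 0 V and 1 V.
        readings.append(0.5 + 0.5 * math.sin(2 * math.pi * frequency_hz * t))
    return readings


readings = sample_voltage(1_000, 0.01)  # 10 ms of a 1 kHz tone
print(len(readings))   # number of measurements taken in 10 ms
print(readings[:3])    # the first three voltage readings
```

Each entry in the list corresponds to one measurement taken 1/44 100th of a second after the previous one.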

2.2 Digital Word Length

In a digital system, each voltage reading is expressed as a number that the computer can interpret. Just as 2:00 p.m. can be expressed as 1400 hrs, so any value can be expressed using binary digits -- 1s and 0s. Also, 1/3 can be expressed as 0.3, or more accurately 0.33, or better yet as 0.333 and so on: the greater the number of decimal places, the more precise the expression of the translation. Hence, the greater the number of digital bits used to express a voltage reading, the more accurate is the translation (not the louder). For the compact disc, the number of digital bits used to translate or "digitize" a voltage reading is 16 which gives 65 536 possible voltage values.

[Figure: Measuring the Sound Wave as an Electrical Voltage, i.e. Digitizing]

The voltage measurements in this diagram are expressed in Arabic numbers rounded to the third decimal place. In a digital audio system, these measurements would be expressed in numbers (words) composed of a series of bits. In a digital system, the first reading, 0.483, using words rounded to 16 bits, would be 0111101110100101; the second, 0.815, would be 1101000010100011; the third, 0.888, would be 1110001101010011, and so on.
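The conversion of a reading into a 16-bit word can be sketched as follows. The sketch assumes that 0 volts maps to the all-zeros word and the 1 volt maximum to the all-ones word, with rounding to the nearest value; under that assumption it reproduces the three words quoted above. The function name to_word is made up for the example.

```python
def to_word(voltage, bits=16, full_scale=1.0):
    """Express a voltage (0 to full_scale) as a binary word of `bits` bits."""
    levels = 2 ** bits                      # 65 536 possible values for 16 bits
    code = round(voltage / full_scale * (levels - 1))
    code = max(0, min(levels - 1, code))    # keep the code inside the word
    return format(code, f"0{bits}b")


for reading in (0.483, 0.815, 0.888):
    print(reading, to_word(reading))
# 0.483 0111101110100101
# 0.815 1101000010100011
# 0.888 1110001101010011
```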

2.3 Analog to Digital (A to D) Converters

The Recorded Sound Studio at the National Library of Canada is equipped to digitize analog signals using 20-bit (1 048 576 possible values) analog to digital converters. The advantage of 20-bit over 16-bit conversion is immediately audible in low level signals (room reverberation, quiet passages, etc.) and in the overall "naturalness" of the sound.
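The effect of word length on resolution can be estimated with a few lines of arithmetic. The sketch below lists the number of possible values for several word lengths, together with the theoretical ratio, in decibels, between full scale and a single step (about 6 dB per bit). This is a common rule of thumb, not a measurement of the Library's converters.

```python
import math


def possible_values(bits):
    """Number of distinct values a word of `bits` bits can express."""
    return 2 ** bits


def dynamic_range_db(bits):
    """Theoretical ratio of full scale to one step, in decibels."""
    return 20 * math.log10(2 ** bits)


for bits in (8, 16, 20, 24):
    print(bits, possible_values(bits), round(dynamic_range_db(bits), 1))
```

The four extra bits of a 20-bit word give 16 times as many values as a 16-bit word, which is where the improvement in low-level signals comes from.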

An important consideration when digitizing audio is that the converter can "measure" up to a certain maximum voltage only, regardless of the number of bits used. Any voltage exceeding that limit cannot be represented, so the converter generates digital clipping in the audio. A high-quality digital loudness meter, which measures both peak and average levels, is therefore used to monitor recording levels closely.
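Clipping can be illustrated with a simplified model of a converter. The sketch assumes a signal that swings equally above and below zero, a maximum of 1 volt and signed 20-bit codes; the function name and return format are chosen for the example only.

```python
def convert(voltage, bits=20, max_voltage=1.0):
    """Return (code, clipped) for an input between -max_voltage and +max_voltage."""
    top = 2 ** (bits - 1) - 1        # largest positive code
    bottom = -(2 ** (bits - 1))      # most negative code
    code = round(voltage / max_voltage * top)
    clipped = code > top or code < bottom
    # A voltage beyond the limit is forced to the nearest representable code.
    return max(bottom, min(top, code)), clipped


print(convert(0.5))    # within range, not clipped
print(convert(1.2))    # exceeds the maximum, so the code is clipped
```

Every input above the maximum yields the same code, which is why the peaks of an overloaded signal are flattened no matter how many bits are used.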

2.4 Signal Routing

Once the audio signal has been digitized, it must be routed to another piece of equipment, such as a recorder or noise-reduction system. There are several digital audio communication standards. The most popular professional one, and the one the National Library uses, is AES (Audio Engineering Society). AES can carry two channels of up to 24-bit audio on a single cable, and is designed to minimize the induction of outside interference into the longer cable runs found in studios. The National Library studio also houses an AES-standard digital patchbay, to which all digital equipment inputs and outputs are connected, so that the output of one piece of equipment can be conveniently routed to another.

3. Digital Audio Workstation

The National Library recently acquired a PC-based digital audio workstation (DAW), "Pyramix" by Merging Technologies, which combines real-time hard-disc recording, digital audio mixing, editing, audio-effects processing, and CD-R mastering. The system is based on a card containing four 32-bit floating-point AT&T DSP 3210 microprocessors, which provides an aggregate peak power of 133 Mflops (Million Floating Point Operations per Second). All operations are executed in 32 bits (4 294 967 296 possible values). The recording function records from 16 to 32 bits. The system will be capable of handling the proposed DVD standard of a 96 kHz sampling rate with 24 bits.

3.1 Digital Recording and Editing

One of the advantages of recording directly onto hard disc is that fragile and/or damaged material need be played only once, thereby minimizing wear and tear on the artifact, should further work be required. Another advantage is that it allows random access: it is possible to jump from the beginning to the end (or anywhere in the file) of a one-hour selection instantaneously. Gone is the need to wait while 2 500 feet of tape rewinds!

Once the recording is on the hard disc, the DAW can be instructed to draw the waveform so that the sound waves can be seen and manipulated visually. Probably the greatest advantage of the DAW is that editing is non-destructive. Editing reel-to-reel tape involves physically cutting the medium with a razor blade, then re-attaching the sections (splicing). This process makes it very difficult, if not impossible, to reverse errors or editing. Digital editing involves placing markers at the beginning and end of sections to be edited. The edit points can be moved, removed or altered at any time. While in place, they instruct the DAW on the next action required, without altering the original file in any way.
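The idea of non-destructive editing can be sketched with a list of markers that is kept separately from the recording. The Region class and render function below are hypothetical names for this illustration and do not describe how Pyramix stores its edits.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int  # index of the first sample to play
    end: int    # index one past the last sample to play


def render(source, regions):
    """Play back only the marked regions; the source is never modified."""
    output = []
    for region in regions:
        output.extend(source[region.start:region.end])
    return output


source = list(range(10))              # stand-in for ten recorded samples
edits = [Region(0, 3), Region(6, 10)]  # skip samples 3, 4 and 5

print(render(source, edits))   # [0, 1, 2, 6, 7, 8, 9]
edits[0].end = 4               # move an edit point after the fact
print(render(source, edits))   # [0, 1, 2, 3, 6, 7, 8, 9]
print(source)                  # the original recording is unchanged
```

Because only the markers change, any edit can be revised or undone at a later date.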

Editing on a DAW is far more sophisticated than on tape. As stated, tape editing involves a blunt cut across the tape; in digital editing, while the equivalent of the blunt tape cut is possible, a function called cross fading also exists. This allows the simultaneous fading-out of one signal while another fades in at the edit point. This operation can take place within milliseconds (or as long as each clip) and produces a smooth, seamless edit. Fade-ins and fade-outs can also be programmed using various curves and can be moved or changed at any time.
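A cross fade can be sketched as a weighted blend of the two clips over the overlapping samples. The sketch below uses straight-line (linear) fade curves, which is only one of the possible curves; the function name and parameters are assumptions made for the example.

```python
def crossfade(outgoing, incoming, overlap):
    """Join two clips, blending the last `overlap` samples of the first
    with the first `overlap` samples of the second using linear curves."""
    if overlap <= 0 or overlap > min(len(outgoing), len(incoming)):
        raise ValueError("overlap must fit inside both clips")
    head = list(outgoing[:-overlap])   # untouched part of the first clip
    tail = list(incoming[overlap:])    # untouched part of the second clip
    blend = []
    for i in range(overlap):
        fade_in = (i + 1) / (overlap + 1)  # weight of the incoming clip
        fade_out = 1.0 - fade_in           # weight of the outgoing clip
        old = outgoing[len(outgoing) - overlap + i]
        blend.append(old * fade_out + incoming[i] * fade_in)
    return head + blend + tail


print(crossfade([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], 2))
```

At 44 100 samples per second, an overlap of 441 samples corresponds to a cross fade of 10 milliseconds.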

3.2 Digital Processing

Also available on the DAW is digital processing such as equalization (EQ) and dynamic processing. EQing is the boosting and cutting of certain frequencies. On a DAW, this function is automated so that, for instance, the most used playback EQ curves for 78-rpm discs can be programmed and recalled at any time. Since the DAW operates in 32 bits and digital EQing is a mathematical process, phase shifting, ringing and distortion, which can be problematic in analog processing, are absent.

Dynamic processing is a function which automatically alters the volume level of the signal (the parameters are set by the user). In the compression function, the lower level passages are made louder so that the difference between the louder and softer passages is smaller. This is very useful when dealing with media with limited word lengths, such as RealAudio which, to minimize bandwidth, uses only 8 bits (2⁸ = 256 values) to encode sound. To maximize the use of those 8 bits, levels must be as high and even as possible.
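The effect of compression on levels can be sketched with a simple gain curve. In this illustration, levels above a threshold are scaled down by a ratio and the whole signal is then raised so that full scale stays at full scale; the threshold, ratio and function name are assumptions. A real compressor tracks a smoothed level over time rather than acting on each sample separately, so this is a simplification.

```python
def compress(sample, threshold=0.5, ratio=4.0):
    """Reduce the gap between loud and soft levels (samples in -1.0 to 1.0)."""
    level = abs(sample)
    if level > threshold:
        # Above the threshold, the level grows `ratio` times more slowly.
        level = threshold + (level - threshold) / ratio
    # Make-up gain: bring a full-scale input back up to full scale.
    makeup = 1.0 / (threshold + (1.0 - threshold) / ratio)
    result = level * makeup
    return result if sample >= 0 else -result


for value in (0.1, 0.5, 1.0):
    print(value, compress(value))
```

With these settings a quiet passage at 0.1 is raised while a peak at 1.0 stays where it is, so more of the available values of a short word are put to use.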

3.3 CD-R Mastering

The DAW can also be used to create CD-R masters that conform to the Red Book (the audio CD standard). The time between selections and the CD Start, Index and Stop marks are stored in a special file, or image. The program can then use that image to burn a CD-R at twice normal speed, in the background (while the computer is used for other work), through the SCSI ports of the computer and the CD recorder.

4. Digital Noise Reduction

Noise reduction/removal is a function which is simply not possible with any fidelity in the analog domain. It is most needed for older, noisy media, such as 78-rpm discs, although the result is no less dramatic with quieter media. Three classes of noise are found on sound recordings: transient noises (such as clicks), crackle, and hiss.

The National Library's Recorded Sound Studio houses a CEDAR (Computer Enhanced Digital Audio Restoration) system which can remove or reduce these imperfections: the DC-1 De-Clicker, the CR-1 De-Crackler and the DH-2 De-Hisser. These units are based on twin 40-bit (1 099 511 627 776 possible values!) floating point processors which process sound in real-time (there is no waiting while the units are calculating the results). CEDAR, which was jointly developed by the British Library National Sound Archive and Cambridge University's Engineering Department, is the single most important development in the field of audio conservation/restoration.

4.1 De-Clicking

The result of removing high frequency, high-energy transient noises, such as clicks and pops, is immediately apparent. The DC-1 removes each click together with the music underlying it. It then re-creates the missing sound wave by analyzing pre- and post-click samples and interpolating the results using high order algorithms. The number of samples that the DC-1 examines depends on the length of the click. A short click requires fewer samples (10) than a longer pop (60 to 200) to rebuild the sound wave. The De-Clicker can remove up to 2 500 clicks per second per channel in real-time.
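The principle of rebuilding a sound wave from the samples on either side of a click can be sketched with a straight-line interpolation. This is a deliberately simple stand-in: the DC-1 uses high order algorithms whose details are not given here, and the function name and the assumption that the click position is already known are made up for the example.

```python
def repair_click(samples, start, end):
    """Replace samples[start:end] with a straight line drawn between the
    last good sample before the click and the first good sample after it."""
    if start < 1 or end >= len(samples) or start >= end:
        raise ValueError("the click must lie strictly inside the signal")
    before = samples[start - 1]
    after = samples[end]
    gap = end - start + 1          # number of steps from `before` to `after`
    repaired = list(samples)       # leave the input untouched
    for i in range(start, end):
        fraction = (i - start + 1) / gap
        repaired[i] = before + (after - before) * fraction
    return repaired


damaged = [0.0, 0.1, 0.9, -0.8, 0.4, 0.5]   # samples 2 and 3 form a click
print(repair_click(damaged, 2, 4))
```

A higher order method would use several samples on each side to follow the curvature of the wave instead of a straight line.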

4.2 De-Crackling

Crackle is a burst of short, small spikes which is added to the original sound by poor record surface quality, buzzing caused by improperly wired or grounded equipment, or distortion caused by overloading amplifier mixer outputs or digital clipping. These all add harshness to the sound. Crackle is a more subtle and difficult form of noise to remove than a click. The De-Crackler addresses this problem by dividing the input signal into "genuine" signal and "crackle/distortion" signal (the ratio is determined by the operator) and working solely on the signal with crackle. First, the operator must set the crackle mode to either Crackle 1 (sharp and well defined) or Crackle 2 ("grungy" and not so well defined). Then the operator sets the amount of crackle that the CR-1 is required to remove. Finally, the two signals are recombined. The CR-1 can have a detrimental effect on the sound if not adjusted properly.

4.3 De-Hissing

Hiss is quite obvious to a human listener but is far more difficult for a machine to detect. Therefore, it is harder to remove than a sharp click or crackle. The DH-2 removes hiss by analyzing the tonal, transient and ambiance content of the signal at hundreds of frequency bands and removing the frequency bands in which it does not detect any musical signal. The operator must first adjust the noise level parameter, giving the DH-2 a rough idea of the amount of noise present in any given signal. Next, the operator must adjust the Attenuation, which sets a maximum limit on the amount of noise that the DH-2 will remove at any given frequency. Finally, the operator must adjust the Brightness algorithm to preserve the appropriate amount of presence by controlling the speed at which the DH-2 will remove noise. The DH-2 is the most difficult CEDAR box to adjust properly and can have a very detrimental effect on audio quality if adjusted improperly.

The net aural effect of removing noises from sound recordings can be spectacular. Digital noise reduction can effectively free music from the shortcomings of its recording medium, uncovering details which were once masked by noise.

5. Storage

At present, the CD is by far the most popular digital storage medium, with a projected lifespan of between 40 and 100 years. Since there is no physical contact when playing the disc (unlike DAT -- Digital Audio Tape -- which must travel over a rotating playback drum), there is virtually no chance of damaging the disc during playback. While equipment obsolescence is a serious problem in machine-readable archives, DVD, the newest digital audio format, is backwards compatible with the CD. This should assure that playback hardware is available for both formats for a long time.

The drawback of the CD is that it is a 16-bit medium only. Therefore, anything longer than 16 bits must be removed. Straight truncation introduces unwanted artifacts such as distortion and noise modulation. Ironically, special DSP algorithms are used to add an extremely low-level digital noise known as dither (created by a random number generator) to the digital signal so that the last few bits are turned on and off at random. This effectively smooths out the sound. Unfortunately, broad-band noise is introduced into the system and effectively degrades its performance.
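The difference between straight truncation and dithering can be sketched on a 20-bit value that falls halfway between two 16-bit values. The sketch assumes signed integer samples and a triangular dither of plus or minus one 16-bit step, a common textbook choice; the function names are made up for the example and no specific product's algorithm is implied.

```python
import random

STEP = 16  # one 16-bit step equals 2**4 = 16 steps of a 20-bit word


def truncate(sample20):
    """Drop the last four bits of a 20-bit sample."""
    return sample20 >> 4


def dither_and_round(sample20, rng):
    """Add low-level random noise, then round to the nearest 16-bit value."""
    # The sum of two uniform random numbers gives triangular noise
    # spanning plus or minus one 16-bit step.
    noise = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    value = round(sample20 / STEP + noise)
    return max(-32768, min(32767, value))  # stay within 16 bits


rng = random.Random(1997)
quiet = [8] * 20_000   # a level halfway between 16-bit values 0 and 1

truncated = [truncate(s) for s in quiet]
dithered = [dither_and_round(s, rng) for s in quiet]

print(sum(truncated) / len(truncated))  # the half step is lost entirely
print(sum(dithered) / len(dithered))    # the average stays close to 0.5
```

Truncation discards the low-level information outright, whereas with dither it survives in the average of the output, at the price of added noise.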

Over the past few years, "noise shaping" dither has been established as a sonically superior process for shortening word lengths. Noise shaping relies on the fact that the ear is more sensitive to midrange frequencies (around 4 kHz) than it is to either low or high frequencies. The National Library of Canada uses the Apogee AD-1000 with UV22 Super CD-Encoding system which, from a 20-bit signal, removes the last four bits and feeds them back into the input signal via a filter which adds an algorithmically-generated "clump" of energy around 22 kHz (beyond the theoretical threshold of hearing and at the upper frequency limit of the CD). The net result is a lower noise floor and a much more natural sounding CD which captures the resolution and detail of 20-bit signals in a 16-bit word length.
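The general idea of feeding the removed bits back into the signal can be sketched with the simplest form of noise shaping, a first-order error feedback loop. This is not the UV22 process, whose filter is proprietary; it is a generic textbook arrangement, shown without dither, in which the rounding error of each sample is subtracted from the next one so that the error is pushed toward high frequencies. The function name is an assumption.

```python
def noise_shape(samples20):
    """Shorten 20-bit samples to 16 bits with first-order error feedback."""
    step = 16            # one 16-bit step in 20-bit units
    error = 0            # rounding error carried over from the last sample
    output = []
    for sample in samples20:
        wanted = sample - error            # compensate for the previous error
        value = round(wanted / step)
        value = max(-32768, min(32767, value))
        error = value * step - wanted      # error made on this sample
        output.append(value)
    return output


quiet = [8] * 1000       # halfway between 16-bit values 0 and 1
shaped = noise_shape(quiet)
print(shaped[:8])                   # alternates between the two nearest values
print(sum(shaped) / len(shaped))    # the average preserves the half step
```

The rapid alternation between neighbouring values places the error at the top of the frequency range, where the ear is least sensitive.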

6. Conclusion

The National Library's Recorded Sound Studio is a world-class facility. It accommodates the full range of musical styles, recording media and technologies from the entire history of recorded sound. For example, the studio was recently used for several projects, including:

The restoration of the three earliest recordings of the pop group, the Guess Who. The only master tape that had survived was found in the Randy Bachman fonds. The two other discs had to be taken from pristine LPs in the National Library's collection, de-clicked and sent to a studio in Vancouver for further processing.
The restoration of several 1950s recordings taken from the Glenn Gould fonds for releases on the CBC Records label.
The restoration and mastering of a CD for the MusicWorks label of works by electroacoustic music pioneer, instrument inventor, and composer Hugh Le Caine. Taken from the Le Caine fonds, the tapes dating from the 1950s and 1960s, after restoration, produced a sonic quality higher than that which Mr. Le Caine would have heard.
The processing of the oldest sound recordings in the Recorded Sound Collection, 11 five-inch 78s dating from the 1880s, which were edited to remove skips and put onto a CD for the National Library's service collection.

Digital audio is assisting in the preservation and promotion of Canadian sound recordings by revitalizing older materials with newer techniques and technologies. The studio constitutes a fruitful convergence of old and new.

¹ Transducer: a device for transferring power generated in one system to another