How do computers process audio data?

How Waveforms are represented

There is a more detailed explanation of how audio is represented in the Audacity manual:

Waveform

...the height of each vertical line is represented as a signed number.


More about Digital Audio

  • The Audacity wiki has some information about how algorithms in Audacity work. If there is a specific audio effect in Audacity that you want to know more about and that isn't already covered, you can leave a question there.
  • If you are looking at source code, the echo effect is a good place to start.
  • For much, much more about digital audio, follow the Wikipedia links that interest you on this page. The ones at the foot of that page are particularly useful for digging deeper into the different audio file formats that are out there.

You may notice that all these links come from the Audacity project. That's not a coincidence.


Digital audio is stored as a sequence of numbers, called samples. Example:

5, 18, 6, -4, -12, -3, 7, 14, 4

If you plot these numbers as points on a Cartesian graph, the sample value determines the position along the Y axis, and the sample's sequence number (0, 1, 2, 3, etc.) determines the position along the X axis. The X axis is just a monotonically increasing number line.

Now trace a line through the points you've just plotted.

Congratulations, you have just rendered the waveform of your digital audio. :-)

The Y axis is amplitude and the X axis is time.
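
To make that concrete, here is a minimal sketch in Python that plots the nine samples from the example above (it assumes matplotlib is installed; any plotting tool would do):

    import matplotlib.pyplot as plt  # assumes matplotlib is available

    samples = [5, 18, 6, -4, -12, -3, 7, 14, 4]

    # X axis: the sample's sequence number; Y axis: the sample value.
    plt.plot(range(len(samples)), samples, marker="o")  # trace a line through the points
    plt.xlabel("sample index (time)")
    plt.ylabel("sample value (amplitude)")
    plt.show()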

"Sample rate" determines how quickly the playback device (e.g. soundcard) advances through the samples. This is the "time value" of a sample. For example CD quality digital audio traverses 44,100 samples every second, reading the amplitude (Y axis value) at every sample point.

† The discussion above ignores compression. Compression changes little about the essential nature of digital audio, much as zipping up a bitmap doesn't change the core nature of the image. (The topic of audio compression is a rich one -- I don't mean to oversimplify it. It's just that all compressed audio is eventually uncompressed before it is rendered -- that is, played as audible sound or drawn as a waveform -- at which point its compressed origins are of little consequence.)


You can read this lecture by Lothar Reichel, where he explains a little about the topic "Digital Audio Compression" and posts some Matlab code:

Sound is a complicated phenomenon. It is normally caused by a moving object in air (or other medium), for example a loudspeaker cone moving back and forth. The motion in turn causes air pressure variations that travel through the air like waves in a pond. Our eardrums convert the pressure variations into the phenomenon that our brain processes as sound.

Computers “hear” sounds using a microphone instead of an eardrum. The microphone converts pressure variations into an electric potential with amplitude corresponding to the intensity of the pressure. The computer then processes the electrical signal using a technique called sampling. Computers sample the signal by measuring its amplitude at regular intervals, often 44,100 times per second. Each measurement is stored as a number with fixed precision, often 16 bits.
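
As a rough sketch of that sampling step, the following Python measures an idealized sine wave at regular intervals and rounds each measurement to 16-bit precision (the 440 Hz tone and 0.01 s duration are arbitrary choices for illustration):

    import math

    SAMPLE_RATE = 44_100                  # measurements per second
    MAX_AMP = 2 ** 15 - 1                 # largest 16-bit signed value (32767)

    def sample_sine(freq_hz, duration_s):
        """Measure a sine wave at regular intervals, rounding each
        measurement to a 16-bit signed integer."""
        n = int(SAMPLE_RATE * duration_s)
        return [round(MAX_AMP * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
                for i in range(n)]

    samples = sample_sine(440, 0.01)      # 441 samples of a 440 Hz tone
    print(len(samples), samples[:5])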

Computers emit sound by more or less reversing the above process. Samples are fed to a device that generates an electric potential proportional to the sample values. A speaker or other similar device may then convert the electric signal into air pressure variations. The rate at which the measurements are made is called the sampling rate. A common sampling rate is 44,100 times per second (used by compact disc, or CD, audio).

The bit rate of a set of digital audio data is the storage in bits required for each second of sound. If the data has fixed sampling rate and precision (as does CD audio), the bit rate is simply their product. For example, the bit rate of one channel of CD audio is 44,100 samples/second × 16 bits/sample = 705,600 bits/second. The bit rate is a general measure of storage, and is not always simply the product of sampling rate and precision. For example, we will discuss a way of encoding data with variable precision.
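
To check the quoted arithmetic in a couple of lines (the two-channel stereo figure is my addition, not from the lecture):

    SAMPLE_RATE = 44_100                      # samples/second
    PRECISION = 16                            # bits/sample

    mono_bit_rate = SAMPLE_RATE * PRECISION   # 705,600 bits/second, as in the lecture
    stereo_bit_rate = mono_bit_rate * 2       # CD audio carries two channels
    print(mono_bit_rate, stereo_bit_rate)     # 705600 1411200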

Hope it helps.


Taking your WAV file example:

A WAV file has a header that tells a player or audio processor key information: the number of channels, sample rate, bit depth, length of the data, and so on. After the header comes the raw bit pattern, which stores the audio samples (I'm assuming you know what sampling is -- if not, see Wikipedia). Each sample is made up of a number of bytes (specified in the header) and gives the amplitude of the waveform at a given point in time. Each sample may be stored in signed or unsigned form (also specified in the header).
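
If you want to poke at those fields yourself, Python's standard-library wave module reads the header and the raw samples. Here is a minimal sketch, assuming a 16-bit PCM file named example.wav (the filename is hypothetical):

    import struct
    import wave

    with wave.open("example.wav", "rb") as wf:
        print("channels:   ", wf.getnchannels())
        print("sample rate:", wf.getframerate())       # samples per second
        print("bit depth:  ", wf.getsampwidth() * 8)   # bytes per sample -> bits
        print("frames:     ", wf.getnframes())

        # For 16-bit PCM, each sample is a signed little-endian short.
        raw = wf.readframes(wf.getnframes())
        samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
        print("first samples:", samples[:9])

(In WAV files, 16-bit samples are signed while 8-bit samples are unsigned, which is why reading the header first matters.)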
