An Introduction to Data Compression
Data compression is a process that makes files smaller, which has the benefit that they take up less space on disk or take less time to transmit over a network. There are many different compression standards such as the zip file format, often used for text files, and MP3 which is used for audio files. Each type of compression is best suited to a particular kind of data. Zip works poorly with audio files and MP3 is unsuited to text files. The correct compression format needs to be applied to the correct type of data.
Some applications would not be possible without data compression. In the early days of the internet it was not feasible to transmit high fidelity audio over a telephone line in real time. The amount of data exceeded the maximum capacity of the line. The MP3 format was originally conceived to deliver acceptable quality audio at the maximum data rate that was achievable via a telephone line at the time.
Transmitting video data over the internet is particularly problematic. An HD video file at 30 frames per second requires about 178 megabytes per second, which is quite a high transmission rate. It is only in recent times that home broadband connections have been offered at all that can achieve this data rate and a mobile phone data contract with a 100 gigabyte allowance would be exhausted in less than 10 minutes. But even where appropriate data capacity is available to the consumer, streaming video service providers would have found it extremely difficult to deliver this data rate to all of their customers simultaneously. Customer expectations are always rising and users now expect resolutions higher than HD. Video compression has enabled the possibility of streaming video via today’s internet technology.
Data compression works by finding a more efficient way to represent the original information. In the following diagram we have a black and white image of a square on a white background. The image is 8 pixels across and 8 pixels down and is therefore 64 pixels in total. Each pixel is represented by one bit. White pixels are represented by a 0 and black pixels by a 1. The image therefore requires 64 bits in total to store.
Instead of representing the picture as a 2 dimensional grid of pixels, we can put all the pixels in a line like this. It’s the same picture but with the pixels rearranged into a row instead of an 8x8 grid:
Instead of storing one bit for each pixel, we could instead store a series of counts of how many pixels of each colour there are. Counting from the top left there are 18 white pixels (two rows of 8 and then 2 more = 18 pixels) before we reach a change in colour. Next there are 4x black pixels, then 4x white pixels and so on.
To represent the number 18 in binary we would need 5 bits as 18 = 10010
. Then we need to know which colour we are counting, so we could prefix the count with a bit representing the colour. So now we have 010010
to mean count 18x white pixels, that is zero to select white colour followed by 10010 for 18. Next we need to count 4 black pixels so we have 100100
which starts with a 1 to mean select black colour and then the the number 4 in binary meaning we should draw 4 pixels in blacks.
If we use this scheme we end up with this series of bits to represent the entire image:
Now we have only used 54 bits to represent the same picture instead of the original 64 bits. We have compressed the image. Storing the lengths, or runs, of the same colour is called run-length compression.
Run-length compression is one of several forms of lossless compression. This means that we can recover the exact same information that we put it. When we expand out the counts of pixels, the exact identical image results. Another form of compression is called lossy compression which does not recover exactly the same information as was put in.
In order to be reproduced, music needs to be represented as a waveform. The x-axis of the waveform represents time and the y-axis is the position of the speaker cone at that time. When the waveform is near the top of the chart, the speaker cone is fully extended outwards and when nearer the bottom the speaker cone is fully extended in the opposite direction. When the waveform is at zero, the speaker cone is in the centre position.
Run-length compression doesn’t work very well on a waveform as the speaker cone does not stick in one position for very long. The waveform continuously oscillates up and down, therefore there are few runs of the same number to be found in the data and no compression to be had.
Lossy compression works by transforming the data into a form that is more amenable to standard data compression tricks. You may have seen music represented on a spectrum analyser, such as a bar graph that moves in time to the music. These devices are often simulated in music visualisations for streaming audio. The bar chart represents the music as all of the audio frequencies that comprise it at an instant in time. Each bar on the chart is a range of frequencies and height of the bar is the loudness of those frequencies.
If the bar chart is at a fine enough level of detail (if there are enough bars on the chart), it is mathematically possible to convert it backwards from this form into the original waveform the chart was constructed from and therefore actually play the sound that the chart represents.
When in this frequency form at a sufficient level of detail, the amount of data needed to store the bar chart is not particularly smaller than the waveform representation that is needed to play the music, however, it has the advantage that we can do some compression tricks.
Sometimes some of the bars are close to zero. This happens when that specific frequency is playing at a very soft volume. If a sound is very soft then the chances are the listener can’t hear it, in which case we can round it down to zero. If we end up with several frequencies close together that we have rounded down to zero, then we have a run of zeros. In which case run-length compression can be applied and we can compress the audio in bar chart form.
Sometimes some of the bars are pretty close together (look at the first five bars on the above spectrum analyser). In which case, we can round them to make them exactly the same. This again means we have a run of identical numbers and can apply run-length compression. Rounding the bars that are similar to be the exact same value is called quantisation.
The problem with rounding out values is that when we reverse the bar chart back into the waveform to play the music, it’s not quite the same as what we originally put in. There are losses, hence it is lossy. People with particularly good hearing might be able to tell the difference, but most people probably can’t. The fidelity of the music depends on the number of bars on the chart and the extent to which we round up and down the values. The more rounding we do, the better the compression because we get longer runs of bars with the same value, but the greater the losses and the greater the chance someone will hear the difference.
Lossy compression isn’t any use for storing data where the output needs to be identical to the input. For example you could not use lossy compression for storing software as every byte needs to be correct otherwise the software won’t work. Lossy compression is useful where a person perceiving the data won’t notice the difference, hence it is also sometimes called perceptual compression.
Another compression trick is called temporal compression. This works by looking for regions that are the same between one data item and the one following it. This concept is widely used in video compression. Video is composed of a series of frames. Many video frames do not change a great deal from one frame to the next. Video compression breaks down the image into squares and then checks if the square has changed on the next frame compared with the current one. If it hasn’t, then the square is stored only once and shared between both frames.
In the below example, showing two frames of video, we mark squares in the frame that are the same on both frames. Between the frames the rabbit and the butterfly move, causing the squares to be slightly different, however much of the background remains identical and only needs to be stored once for both frames.
Image Credit: Spectrum Analyser - Touho_T, Big Buck Bunny - Blender