Data compression

When we compress data, we transform it from one coding scheme to another that uses fewer bits. Consider, for example, a coding scheme for a FAX machine in which images are made up of black and white dots (pixels). If we use a one-bit-per-pixel coding scheme in which 0 = white and 1 = black, a row of pixels might be represented as:

000000000000000000000000001111111111111111111111110000 etc.

An alternative representation would be:

0,26,24,4 etc.

This says: begin with a 0 (white dots); after 26 of those, switch to black dots; after 24 of those, switch to white dots; after 4 of those, switch to black dots; and so on. (This is an example of run-length encoding.)
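The run-length scheme described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a row arrives as a string of '0' and '1' characters and using the same layout as the example: the first number is the color of the first pixel, and every number after it is the length of a run. The function name `rle_encode` is hypothetical.

```python
from itertools import groupby

def rle_encode(bits: str) -> list[int]:
    """Encode a row of '0'/'1' pixels as [starting bit, run, run, ...].

    Hypothetical layout matching the text: the first number is the color of
    the first pixel (0 = white, 1 = black); each following number is a run
    length, and the color switches after every run.
    """
    if set(bits) - {"0", "1"}:
        raise ValueError("a row may contain only '0' and '1'")
    if not bits:
        return []
    # groupby collects consecutive identical characters into one group per run.
    runs = [len(list(group)) for _, group in groupby(bits)]
    return [int(bits[0])] + runs

row = "0" * 26 + "1" * 24 + "0" * 4
print(rle_encode(row))  # [0, 26, 24, 4]
```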

For most documents, where rows contain long runs of the same color, the latter scheme would produce a smaller file than the former. (Can you give an example in which the first scheme would produce a smaller file?)
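To compare the two schemes, we need to decide how many bits each run length takes. The sketch below assumes a hypothetical fixed layout: 1 bit for the starting color and a fixed-width field (8 bits by default) for every run. Real FAX standards use variable-length codes instead, so the numbers are only illustrative. The function names are hypothetical; you can use them to test candidate answers to the question above.

```python
from itertools import groupby

def raw_size_bits(bits: str) -> int:
    """One-bit-per-pixel scheme: the size is simply the number of pixels."""
    return len(bits)

def rle_size_bits(bits: str, count_width: int = 8) -> int:
    """Run-length scheme under a hypothetical layout: 1 bit for the starting
    color, then one field of count_width bits for each run."""
    runs = [len(list(group)) for _, group in groupby(bits)]
    largest = (1 << count_width) - 1
    if any(run > largest for run in runs):
        raise ValueError(f"a run longer than {largest} does not fit in {count_width} bits")
    return 1 + count_width * len(runs)

row = "0" * 26 + "1" * 24 + "0" * 4
print(raw_size_bits(row), rle_size_bits(row))  # 54 25
```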

There is a basic engineering tradeoff between file size and compression and decompression time. One could write a simple program that takes raw data from a FAX scanner and converts it to our alternative, compressed code, and a companion program that converts the compressed code back to a string of 0s and 1s for printing. (Such programs are often referred to as codecs, short for coder/decoders.) These would be simple programs that run quickly, but compressing and decompressing data always takes some time, and that time can be an important consideration when dealing with large audio or video files.
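The "companion program" half of such a codec can be sketched as a decoder for the run-length layout used in the example (first number = starting color, then run lengths). This is a minimal sketch; the name `rle_decode` is hypothetical.

```python
def rle_decode(code: list[int]) -> str:
    """Expand [starting bit, run, run, ...] back into a string of '0'/'1'."""
    if not code:
        return ""
    color, *runs = code
    if color not in (0, 1):
        raise ValueError("first number must be 0 (white) or 1 (black)")
    pieces = []
    for run in runs:
        pieces.append(str(color) * run)
        color = 1 - color  # switch color after each run
    return "".join(pieces)

print(rle_decode([0, 26, 24, 4]))
```

Because the layout records every run exactly, decoding reproduces the original row bit for bit; this is a lossless scheme.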

There are two broad types of compression: lossy and lossless. With lossless compression, we retain all of the information, so when a file or message is decompressed we get an exact duplicate of the original. With lossy compression, we discard some information, so it is not possible to recreate the original exactly.

If, for example, you were going to compress an email message or a spreadsheet before transmitting it, you would want to use lossless compression. When the message was received, it could be decompressed, producing an exact copy of the original.
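Lossless compression of text can be demonstrated with Python's standard `zlib` module. Note that zlib uses the DEFLATE algorithm, a different technique from the run-length encoding above; the message text here is made up for illustration.

```python
import zlib

# A made-up, repetitive message; repetition is what lets it compress well.
message = ("Quarterly totals are attached. " * 20).encode("utf-8")

packed = zlib.compress(message, 9)   # 9 = highest compression level
restored = zlib.decompress(packed)

assert restored == message           # lossless: an exact copy comes back
print(len(message), len(packed))     # the compressed form is much smaller
```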

If you compress a photograph before sending it, you would often be willing to use a lossy compression method. In that case, the recipient would see a lower-quality image. There is a basic engineering tradeoff between file size and the quality of the compressed image: the more information we discard, the smaller the file and the lower the quality. In deciding how much to compress an image, you would consider the application, people's ability to perceive quality differences, and file size.
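The size/quality tradeoff can be illustrated with a toy lossy scheme on 8-bit grayscale values (0 to 255). This is a simplified sketch, not how real photo codecs work (JPEG, for example, quantizes frequency-domain coefficients rather than raw pixels): each value is reduced to the index of a bucket of width `step`, and decompression rebuilds the middle of the bucket. The function names and the `step` parameter are hypothetical.

```python
import math

def quantize(pixels: list[int], step: int) -> list[int]:
    """Lossy step: keep only which bucket of width `step` each value falls in."""
    if step < 1 or 256 % step:
        raise ValueError("step must be a power of two between 1 and 256")
    return [p // step for p in pixels]

def dequantize(indices: list[int], step: int) -> list[int]:
    """Rebuild each value as the middle of its bucket; detail inside it is gone."""
    return [i * step + step // 2 for i in indices]

def bits_per_pixel(step: int) -> int:
    """Bits needed to store one bucket index (256 / step possible buckets)."""
    return math.ceil(math.log2(256 / step))

pixels = [0, 7, 8, 100, 255]
for step in (1, 16, 64):
    restored = dequantize(quantize(pixels, step), step)
    worst = max(abs(a - b) for a, b in zip(pixels, restored))
    print(step, bits_per_pixel(step), worst)
```

This prints `1 8 0`, `16 4 8`, and `64 2 32`: with `step` 1 nothing is lost, while larger steps use fewer bits per pixel at the cost of a larger worst-case error.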

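When working with lossy compression, we edit and manipulate files in uncompressed form. When we have the final file ready to store or transmit, we compress it by the desired amount. If we instead compressed, edited, and compressed again, information would be discarded at every round and the losses would accumulate. Compression is the last step.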
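Why compression should come last can be seen with a toy example. The sketch assumes a hypothetical lossy round trip (each 0 to 255 value snapped to the middle of a bucket of width 16) and a hypothetical edit (brightening by 5). It compares editing the original and compressing once with compressing, editing the decompressed copy, and compressing again.

```python
def lossy(pixels, step=16):
    """Hypothetical lossy codec, compress + decompress in one call:
    each 0-255 value becomes the middle of its bucket of width `step`."""
    return [p // step * step + step // 2 for p in pixels]

def brighten(pixels, amount=5):
    """A hypothetical edit: raise every value, clamping at 255."""
    return [min(255, p + amount) for p in pixels]

def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

original = [15, 40, 100, 200]
target = brighten(original)                            # the edited image we want

compress_last = lossy(target)                          # edit first, compress once
compress_each_step = lossy(brighten(lossy(original)))  # compress, edit, compress again

print(max_error(compress_last, target))       # 5
print(max_error(compress_each_step, target))  # 12
```

Compressing once keeps the error within half a bucket (at most 8 here), while the extra round lets the losses pile up.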


Disclaimer: The views and opinions expressed on unofficial pages of California State University, Dominguez Hills faculty, staff or students are strictly those of the page authors. The content of these pages has not been reviewed or approved by California State University, Dominguez Hills.