Data compression

Smaller files are easier to transmit across a network because less data has to be sent. Their reduced size also means more files can be stored in a given amount of storage.

Modern computers often generate very large files. For example, audio files often run to several megabytes, while high definition video can be gigabytes in size. Such files require a lot of storage space and, because of their size, are slow to transmit. These problems can be overcome by using compression.

There are two types of compression that can be applied to files:

lossy compression
lossless compression

Lossy compression

With lossy compression, some data is removed and discarded, thereby reducing the overall amount of data and the size of the file.

An image can be compressed by reducing its colour depth. This reduces the range of colours that the image contains. In practice, this results in an averaging of shades of colours. For example, a very light shade of green could be averaged with a slightly darker shade: the very light shade might be discarded and the pixels that used it re-coloured with the darker shade.

A high resolution image next to a compressed version of the same image

Similarly, an audio file can be compressed by reducing the bit depth of the samples. MP3 is a lossy audio file format.
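The idea of reducing bit depth can be sketched in a few lines of Python. This is only an illustration, assuming 8-bit samples cut down to 4 bits; the function names and the sample values are made up for the example, and real formats such as MP3 use far more sophisticated techniques.

```python
def reduce_bit_depth(samples, from_bits=8, to_bits=4):
    """Keep only the most significant bits of each sample (hypothetical helper)."""
    shift = from_bits - to_bits
    return [s >> shift for s in samples]


def restore_bit_depth(reduced, from_bits=8, to_bits=4):
    """Scale the reduced samples back up; the discarded bits cannot be recovered."""
    shift = from_bits - to_bits
    return [r << shift for r in reduced]


original = [200, 203, 207, 90, 95]      # made-up 8-bit sample values
reduced = reduce_bit_depth(original)     # each value now fits in 4 bits
restored = restore_bit_depth(reduced)

print(reduced)    # [12, 12, 12, 5, 5]
print(restored)   # [192, 192, 192, 80, 80]
```

The three similar values 200, 203 and 207 all end up as 192, which mirrors the averaging of shades described above. The restored list is not the same as the original, which is what makes the method lossy.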

Various lossy formats exist:

the JPEG file format works on this principle, which is why JPEG files tend to be smaller in size
the MPEG file format compresses audio and video, making it more suitable for streaming media
MP3 is a lossy format for audio including music

Disadvantages of lossy compression are that there is some loss of quality, and the full data can never be retrieved.

Lossless compression

There are some files that we would not want to lose data from. For example:

text files
spreadsheets
financial records
emails

With lossless compression, files are reduced in size without the loss of data. However, lossless compression does not usually achieve the same file size reduction as lossy compression.

Various lossless standards exist:

ZIP allows lossless compression of text documents
PNG is a lossless image file format
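The defining property of lossless compression, that the original data comes back exactly, can be checked with a short sketch. It uses Python's standard `zlib` module as one example of a lossless compressor; the sample text is made up for the illustration.

```python
import zlib

# Made-up, highly repetitive sample text, encoded to bytes for compression.
text = ("Data compression reduces file size. " * 50).encode("utf-8")

compressed = zlib.compress(text)        # lossless compression
restored = zlib.decompress(compressed)  # reverse the compression

print(len(text), "bytes before,", len(compressed), "bytes after")
print("Identical after decompression:", restored == text)
```

Repetitive data like this compresses very well. Files with little repetition shrink much less, which is one reason lossless compression usually saves less space than lossy compression.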

Calculating compression ratios

One method of lossless compression is run length encoding (RLE). RLE looks at the data in a file for consecutive runs of the same data. These runs are stored as one item of data instead of many.
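A minimal sketch of the idea in Python is shown here. It assumes the data is a string of single-character values and stores each run as a (value, count) pair; the function names and the sample row are invented for the illustration.

```python
from itertools import groupby


def rle_encode(data):
    """Replace each run of identical values with one (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(data)]


def rle_decode(pairs):
    """Rebuild the original data by repeating each value count times."""
    return "".join(value * count for value, count in pairs)


row = "WWWWWBBBWW"            # made-up row: W = white pixel, B = black pixel
pairs = rle_encode(row)

print(pairs)                  # [('W', 5), ('B', 3), ('W', 2)]
print(rle_decode(pairs) == row)   # True
```

Because decoding gives back exactly the original row, no data is lost. RLE only saves space when the data contains long runs; data that changes at every position would produce more pairs than there were values to begin with.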

Consider this row in a bitmap image:

Bitmap image showing ten data values with two characters each

Each pixel in the image uses two bits to specify the colour. In this example 00 is white and 11 is red. The data for this row is 00 00 00 11 11 11 11 00 00 00, which is ten data values of two characters each, giving 20 characters in total. RLE looks for runs of the same value and records what the value is and how many times in succession it occurs. These values are stored instead of the original data.

So

00000011111111000000 (20 characters)

becomes

608160 (6 characters)

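Read in pairs, 60 81 60 means six 0s, then eight 1s, then six 0s. Reducing 20 characters to 6 is a compression ratio of 20:6, which simplifies to 10:3. Likewise, a 10 MB file compressed to a 2 MB file has a compression ratio of 5:1. Compression ratios are useful when calculating the space needed to store data.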
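The worked example and both ratios can be reproduced with a short Python sketch. It assumes the count-then-value layout used in the example and that no run is longer than nine characters, so that each count fits in a single digit; the function names are invented for the illustration.

```python
from itertools import groupby
from math import gcd


def encode_count_value(data):
    """Write each run as its length followed by its value, e.g. '000000' -> '60'."""
    parts = []
    for value, run in groupby(data):
        count = len(list(run))
        if count > 9:
            # Assumption of this sketch: one digit per count.
            raise ValueError("runs longer than 9 are not supported here")
        parts.append(f"{count}{value}")
    return "".join(parts)


def compression_ratio(original_size, compressed_size):
    """Return the ratio original:compressed in its simplest whole-number form."""
    divisor = gcd(original_size, compressed_size)
    return f"{original_size // divisor}:{compressed_size // divisor}"


row = "00000011111111000000"
encoded = encode_count_value(row)

print(encoded)                                     # 608160
print(compression_ratio(len(row), len(encoded)))   # 10:3
print(compression_ratio(10, 2))                    # 5:1 (10 MB down to 2 MB)
```

Dividing both sizes by their greatest common divisor is what turns 20:6 into 10:3.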
