Data compression
Smaller files are easier to transmit across a network as they require fewer data packets to be sent. Their reduced size also means more files can be stored in any given area of storage.
Modern computers often generate very large files. For example, audio files often run to megabytes (MB), while high definition video can be gigabytes (GB) in size. Such files require lots of storage space and, because of their size, are difficult to transmit. These problems can be overcome by using compression.
There are two types of compression that can be applied to files:
- lossy compression
- lossless compression
Lossy compression
With lossy compression, some data is removed and discarded, thereby reducing the overall amount of data and the size of the file.
An image can be compressed by reducing its colour depth, the number of bits available for each colour. This reduces the range of colours that the image can contain. In practice, similar shades of a colour are merged. For example, a very light shade of green could be merged with a slightly darker shade: the very light shade is discarded and every pixel that used it is re-coloured with the darker shade.
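As a rough illustration of reducing colour depth, the Python sketch below drops the low-order bits of a single colour channel. The function name `reduce_colour_depth`, the 8-bit to 2-bit reduction and the two green values are assumptions chosen for the example, not part of any particular image format.

```python
def reduce_colour_depth(value: int, from_bits: int = 8, to_bits: int = 2) -> int:
    """Keep only the top `to_bits` bits of a channel value (hypothetical helper)."""
    shift = from_bits - to_bits
    # Shifting right discards the fine detail; shifting left puts the
    # surviving bits back on the original 0-255 scale.
    return (value >> shift) << shift


very_light_green = 250   # green channel of a very light shade (assumed value)
less_light_green = 230   # green channel of a slightly darker shade (assumed value)

print(reduce_colour_depth(very_light_green))  # 192
print(reduce_colour_depth(less_light_green))  # 192
```

Both shades end up with the same value, so the image needs fewer bits per pixel but can no longer tell the two shades apart.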
Similarly, an audio file can be compressed by reducing the bit depth of the samples. MP3 is a lossy audio file format.
Various lossy standards exist:
- the JPEG file format works on this principle, which is why JPEG files tend to be smaller in size
- the MPEG file format compresses audio and video, making it more suitable for streaming media
- MP3 is a lossy format for audio including music
Disadvantages of lossy compression are that there is some loss of quality, and the full data can never be retrieved.
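The point that the full data can never be retrieved can be sketched with audio samples. The example below assumes 16-bit samples reduced to 8 bits. The helper names `quantise` and `restore` and the sample values are made up for illustration.

```python
def quantise(samples, from_bits=16, to_bits=8):
    """Reduce the bit depth of each sample by discarding low-order bits."""
    shift = from_bits - to_bits
    return [s >> shift for s in samples]


def restore(samples, from_bits=8, to_bits=16):
    """Scale samples back up; the discarded bits come back as zeros."""
    shift = to_bits - from_bits
    return [s << shift for s in samples]


original = [1000, 1001, 12345]        # assumed 16-bit sample values
compressed = quantise(original)       # [3, 3, 48]
restored = restore(compressed)        # [768, 768, 12288]

print(restored == original)           # False
```

The samples 1000 and 1001 become identical after quantising, so no decoder can tell which one was originally stored.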
Lossless compression
There are some files that we would not want to lose data from. For example:
- text files
- spreadsheets
- financial records
- emails
With lossless compression, files are reduced in size without the loss of data. However, lossless compression does not usually achieve the same file size reduction as lossy compression.
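A minimal sketch of a lossless round trip, using Python's standard `zlib` module as a stand-in (this guide does not name a specific lossless algorithm, and the sample record is invented for the example):

```python
import zlib

# Repetitive text, similar in spirit to rows of a financial record (assumed data).
original = b"2024-01-05,invoice,100.00\n" * 200

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # compressed size is much smaller
print(restored == original)            # True: every byte is recovered
```

Unlike the lossy examples, decompressing gives back exactly the bytes that went in.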
Various lossless standards exist:
- PDF allows lossless compression of text documents
- GIF is a lossless image file format
Calculating compression ratios
One method of lossless compression is run length encoding (RLE). RLE looks at the data in a file for consecutive runs of the same data. These runs are stored as one item of data instead of many.
Consider a row of ten pixels in a bitmap image: three white, four red, then three white.
Each pixel in the image uses binary to specify the colour. In this example 00 is white and 11 is red. The data for this is 00 00 00 11 11 11 11 00 00 00, which is ten data values of two characters each, giving 20 characters in total. RLE looks for runs of the same value and records what the value is and how many times in succession it occurs. These values are stored instead of the original data.
So
00000011111111000000 (20 characters)
becomes
608160 (6 characters)
Read in pairs, the result means six 0s, eight 1s and six 0s. This is a compression ratio of 20:6, which simplifies to 10:3. Similarly, a 10 MB file compressed to a 2 MB file will have a compression ratio of 5:1. This is useful when calculating the space needed to store data.
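The run length encoding example and the ratio calculation can be sketched in Python as follows. The function names are hypothetical, and the decoder assumes every run is shorter than ten characters, as in the example above. Longer runs would need a different way of storing the count.

```python
from itertools import groupby
from math import gcd


def rle_encode(data: str) -> str:
    """Write each run as its length followed by the repeated character."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(data))


def rle_decode(encoded: str) -> str:
    """Rebuild the data from (count, character) pairs; counts must be one digit."""
    counts, chars = encoded[0::2], encoded[1::2]
    return "".join(char * int(count) for count, char in zip(counts, chars))


def compression_ratio(original_size: int, compressed_size: int) -> str:
    """Express original:compressed in its simplest whole-number form."""
    divisor = gcd(original_size, compressed_size)
    return f"{original_size // divisor}:{compressed_size // divisor}"


row = "00000011111111000000"
encoded = rle_encode(row)

print(encoded)                                     # 608160
print(rle_decode(encoded) == row)                  # True
print(compression_ratio(len(row), len(encoded)))   # 10:3
print(compression_ratio(10, 2))                    # 5:1
```

Because decoding returns the original row exactly, this is a lossless method.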