Data representation: Characters

Data goes through the central processing unit which utilises main and cache memory to improve system performance. Peripherals use interfaces to communicate between the system and a connected device.

Characters

Characters can also be represented in binary. Characters are usually grouped together in a character set. A character set includes:

  • alphanumeric data (letters and numbers)
  • symbols (*, &, : etc.)
  • control characters (Backspace, Horizontal tab, Escape etc.)
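
As a rough illustration of these three groups, the sketch below sorts a few ASCII characters into categories by their code number. The function name `classify` and the category labels are illustrative only and are not part of any standard.

```python
def classify(ch: str) -> str:
    """Place a single ASCII character into one of three broad groups."""
    code = ord(ch)  # numeric code of the character
    if code < 32 or code == 127:
        return "control"       # e.g. Backspace (8), Horizontal tab (9), Escape (27)
    if ch.isalnum():
        return "alphanumeric"  # letters and digits
    return "symbol"            # everything else, such as *, & and :

for ch in ["A", "7", "*", "\t", "\x1b"]:
    print(repr(ch), classify(ch))
```

In this simplified sketch the space character (code 32) falls into the "symbol" group.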

ASCII

ASCII was originally developed for basic computers and printers. It uses a 7-bit code to represent characters.

As more computers began to work with 8-bit groups of data, ASCII was written as 8 bits. The most significant bit was sometimes used as a parity bit to perform a parity check (a form of error checking). Other computers set the most significant bit to 0.
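
A minimal sketch of how a parity bit might be added to a 7-bit code, assuming even parity (the eighth bit is chosen so that the whole byte contains an even number of 1s). The text above does not say whether even or odd parity was used, and the helper names `add_even_parity` and `parity_ok` are hypothetical.

```python
def add_even_parity(code: int) -> int:
    """Return an 8-bit value: the 7-bit code plus an even-parity bit as the MSB."""
    if not 0 <= code < 128:
        raise ValueError("ASCII codes fit in 7 bits (0 to 127)")
    ones = bin(code).count("1")  # number of 1 bits in the 7-bit code
    parity = ones % 2            # 1 if that count is odd, making the total even
    return (parity << 7) | code

def parity_ok(byte: int) -> bool:
    """Check that an 8-bit value contains an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0

byte = add_even_parity(ord("z"))     # 'z' is 122 = 1111010, which has five 1 bits
print(format(byte, "08b"))           # parity bit followed by the 7-bit code
print(parity_ok(byte))               # the untouched byte passes the check
print(parity_ok(byte ^ 0b00000100))  # one flipped bit makes the check fail
```

A check like this detects a single flipped bit, but two flipped bits would cancel out and go unnoticed.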

So although ASCII is stored in 8 bits, it still represents only 128 characters (the range of 7 bits) rather than the 256 that 8 bits could encode.

For example, the ASCII code for lower case z is 122 and is shown below:

Place value:  Parity bit/eighth bit   64   32   16   8   4   2   1
Bit:          0                       1    1    1    1   0   1   0
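
The conversion for lower case z can be checked with a short sketch using Python's built-in `ord`, `chr` and `format` functions. The variable names are illustrative.

```python
code = ord("z")             # character -> code number
bits = format(code, "08b")  # code number -> 8-bit binary string
print(code, bits)

# Add up the place values wherever the bit is 1 to recover the code number.
place_values = [128, 64, 32, 16, 8, 4, 2, 1]
total = sum(value for value, bit in zip(place_values, bits) if bit == "1")
print(total, chr(total))    # code number -> character
```

Here the eighth bit is 0, so the place values that contribute are 64 + 32 + 16 + 8 + 2 = 122.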

Extended ASCII

It is possible to use the most significant bit of an 8-bit byte to allow ASCII to represent 256 characters. This is known as extended ASCII. There are different versions of extended ASCII in use.
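
To see why having different versions matters, the sketch below decodes the same byte using two 8-bit encodings that Python supports. ISO 8859-1 (`latin-1`) and IBM code page 437 (`cp437`) are chosen here only as two familiar examples. The text above does not name specific versions.

```python
raw = bytes([0xE9])  # one byte with the most significant bit set (233)

as_latin1 = raw.decode("latin-1")  # interpreted as ISO 8859-1
as_cp437 = raw.decode("cp437")     # interpreted as IBM code page 437

print(as_latin1, as_cp437)
print(as_latin1 == as_cp437)       # the two versions disagree about this byte
```

Codes 0 to 127 are the same in both encodings. Only the upper 128 codes differ.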

Limitation of ASCII

The 128-character limit of ASCII and the 256-character limit of extended ASCII restrict the number of characters that can be represented. The character sets of several different languages cannot all be held in ASCII at once: there are simply not enough codes available.
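
A small sketch of this limit in practice, using Python's built-in `ascii` codec: text containing a character outside the 128 ASCII codes cannot be encoded. The sample word is arbitrary.

```python
text = "café"  # the last letter has no ASCII code

try:
    text.encode("ascii")
    encoded_ok = True
except UnicodeEncodeError as err:
    encoded_ok = False
    print("Cannot encode:", err.reason)

print(encoded_ok)
print("cafe".encode("ascii"))  # plain letters encode without error
```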

Unicode

Unicode is a universal character set. It aims to include all the characters needed for any writing system or language.

The first 65,536 code point positions in Unicode, which can be addressed with 16 bits, hold the most commonly used characters in a number of languages. This range is known as the Basic Multilingual Plane.

Additional supplementary planes allow around one million other code point positions to be used. As of Version 14.0, released in September 2021, the Unicode Standard contains 144,697 characters.
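
The plane structure can be sketched with simple arithmetic. This assumes the standard layout of 17 planes of 65,536 code points each (the Basic Multilingual Plane plus 16 supplementary planes). The helper name `plane_of` and the sample characters are illustrative.

```python
PLANE_SIZE = 2 ** 16  # 65,536 code points per plane
PLANE_COUNT = 17      # plane 0 (the BMP) plus 16 supplementary planes

def plane_of(ch: str) -> int:
    """Return the number of the plane that holds a character's code point."""
    return ord(ch) // PLANE_SIZE

# A Latin letter, the euro sign and an emoji (written as escapes for portability).
for ch in ["z", "\u20ac", "\U0001F600"]:
    print(hex(ord(ch)), plane_of(ch))

# Code point positions available in the supplementary planes.
print(PLANE_SIZE * (PLANE_COUNT - 1))
```

The final figure, 1,048,576, is the "around one million" positions that lie outside the Basic Multilingual Plane.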