Units and data representation - OCR - Characters

All data is represented as binary digits, whether it is numbers, text, images or sound. Calculations are also done in binary.


Characters

The use of binary codes to represent characters

Computers work in binary. As a result, all characters, whether they are letters, punctuation or digits, are stored as binary numbers. All of the characters that a computer can use are called a character set.

Two standard character sets in common use are ASCII and Unicode.

ASCII code

ASCII uses seven bits, giving a character set of 128 characters. The characters are represented in a table, called the ASCII table. The 128 characters include:

  • 33 control codes (mainly to do with printing)
  • 32 punctuation marks and symbols, plus the space
  • 26 upper case letters
  • 26 lower case letters
  • 10 numeric digits (0-9)

We tend to say that the letter ‘A’ is the first letter of the alphabet, ‘B’ is the second and so on, all the way up to ‘Z’, which is the 26th letter. In ASCII, each character has its own assigned number. For example:

Character   Denary   Binary    Hexadecimal
A           65       1000001   41
Z           90       1011010   5A
a           97       1100001   61
z           122      1111010   7A
0           48       0110000   30
9           57       0111001   39
Space       32       0100000   20
!           33       0100001   21

‘A’ is represented by the denary number 65 (binary 1000001, hex 41), ‘B’ by 66 (binary 1000010, hex 42) and so on up to ‘Z’, which is represented by the denary number 90 (binary 1011010, hex 5A).

Similarly, lowercase letters start at denary 97 (binary 1100001, hex 61) and end at denary 122 (binary 1111010, hex 7A).
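
The table rows above can be reproduced with a short program. This is a minimal sketch in Python using the built-in `ord` and `format` functions. The helper name `describe` is made up for illustration.

```python
def describe(character):
    """Return the denary, 7-bit binary and hexadecimal codes for one character."""
    code = ord(character)               # denary code number, e.g. 65 for 'A'
    binary = format(code, "07b")        # pad with leading zeros to seven bits
    hexadecimal = format(code, "02X")   # two upper-case hexadecimal digits
    return code, binary, hexadecimal


# Print one row per character, in the same order as the table.
for ch in ["A", "Z", "a", "z", "0", "9", " ", "!"]:
    print(repr(ch), *describe(ch))
```

The built-in `chr` goes the other way, so `chr(66)` gives 'B'.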

When a character is stored or transmitted, it is its ASCII or Unicode number that is used, not the character itself.

For example, in binary, the word "Computer" would be represented as:

1000011 1101111 1101101 1110000 1110101 1110100 1100101 1110010
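
Turning a word into its binary codes can be sketched as follows. The function name `to_ascii_binary` is an illustrative choice. The sketch assumes every character is in the 7-bit ASCII range and raises an error otherwise.

```python
def to_ascii_binary(text):
    """Encode text as space-separated 7-bit ASCII codes."""
    codes = []
    for character in text:
        code = ord(character)                 # look up the character's number
        if code > 127:
            raise ValueError(f"{character!r} is not a 7-bit ASCII character")
        codes.append(format(code, "07b"))     # write the number in seven bits
    return " ".join(codes)


print(to_ascii_binary("Computer"))
```

Each group of seven bits in the output corresponds to one letter of the word, in order.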

Question

What would this message say?

1001000 1100101 1101100 1101100 1101111 0100001
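
One way to check an answer is to decode the message in code. The sketch below reverses the process: each group of bits is converted to a denary number, and the number is then looked up as a character. The name `from_ascii_binary` is hypothetical.

```python
def from_ascii_binary(bits):
    """Decode space-separated binary codes back into text."""
    characters = []
    for group in bits.split():
        code = int(group, 2)            # binary string -> denary number
        characters.append(chr(code))    # denary number -> character
    return "".join(characters)


message = "1001000 1100101 1101100 1101100 1101111 0100001"
print(from_ascii_binary(message))
```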

Extended ASCII

Extended ASCII uses eight bits, giving a character set of 256 characters. This allows for special characters, for example the accented letters used in languages such as French and Spanish.
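
As an illustration, the sketch below looks up accented letters in Latin-1 (ISO 8859-1), which is one widely used 8-bit character set. Choosing Latin-1 is an assumption made for this example, since several different 8-bit extensions of ASCII exist. The function name `extended_code` is also made up.

```python
def extended_code(character):
    """Return the 8-bit Latin-1 code of a character as denary and binary."""
    encoded = character.encode("latin-1")   # fails if the character is not in Latin-1
    code = encoded[0]                        # Latin-1 stores each character in one byte
    return code, format(code, "08b")         # show all eight bits


# "\u00e9" is e with an acute accent, "\u00f1" is n with a tilde.
for ch in ["A", "\u00e9", "\u00f1"]:
    print(ch, *extended_code(ch))
```

The accented letters have codes above 127, so they need the eighth bit, while 'A' keeps its ASCII code with a leading zero added.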

Unicode

While suitable for representing English characters, 256 characters is far too few to cover every character in other writing systems, such as Chinese or Arabic. Unicode was originally designed around 16-bit codes, giving a range of 65,536 characters, and has since been extended well beyond that. This makes it suitable for text in almost any language.
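
The link between the number of bits and the size of a character set can be sketched in code: n bits give 2 to the power n different codes. The helper names `set_size` and `smallest_fit` are hypothetical. The example also assumes that the 8-bit set matches the first 256 Unicode numbers, as Latin-1 does.

```python
def set_size(bits):
    """Number of distinct codes available with the given number of bits."""
    return 2 ** bits


def smallest_fit(character):
    """Name the smallest of the three code sizes that can hold this character."""
    code = ord(character)   # the character's Unicode number
    sizes = [("7-bit ASCII", 7), ("8-bit extended ASCII", 8), ("16-bit Unicode", 16)]
    for name, bits in sizes:
        if code < set_size(bits):
            return name
    return "more than 16 bits"


for bits in (7, 8, 16):
    print(bits, "bits ->", set_size(bits), "characters")

# A, e with acute accent, a Chinese character, an Arabic letter
for ch in ["A", "\u00e9", "\u4e2d", "\u0639"]:
    print(ch, ord(ch), smallest_fit(ch))
```

The Chinese and Arabic characters have numbers far above 255, which is why neither 7-bit nor 8-bit codes can represent them.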