A character encoding standard
The Unicode Standard is an international standard that defines how characters are encoded, that is, how each character is represented as a number and ultimately as bytes. The Unicode Consortium, a non-profit organization that develops, maintains, and promotes the Unicode Standard, works closely with major technology companies so that text can be exchanged across platforms without loss of information.
Why was Unicode developed?
Unicode was developed to address the need for a single standard that could be used across all computers and operating systems.
It assigns a unique number, called a code point, to every character it covers, whatever the language, so that different applications can exchange text without first agreeing on a language-specific encoding. Unicode also defines combining characters: code points, such as accents, that attach to the preceding base character so that complex characters can be built from simpler ones. Normalization rules specify when such a sequence is equivalent to a single precomposed code point.
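A minimal sketch in Python of code points and combining characters, using only the standard `unicodedata` module. The sample characters and variable names are chosen here for illustration and do not come from the Unicode Standard itself.

```python
import unicodedata

# Each character maps to one code point, conventionally written as U+XXXX.
# The escapes are "A", "e with acute accent", and the Devanagari letter HA.
for ch in "A\u00e9\u0939":
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The same visible character can be stored in two ways:
precomposed = "\u00e9"   # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # two code points: "e" + COMBINING ACUTE ACCENT

# A non-zero combining class marks U+0301 as a combining character.
is_combining = unicodedata.combining("\u0301") != 0

# Normalization converts between the two representations.
composed = unicodedata.normalize("NFC", decomposed)   # compose
split = unicodedata.normalize("NFD", precomposed)     # decompose

print(len(precomposed), len(decomposed), composed == precomposed)
```

Comparing strings only after normalizing both to the same form avoids treating the two spellings of the same character as different text.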
The Unicode Consortium maintains that there are three main principles guiding their work:
Unicode should be a universal character set that encompasses all of the scripts used worldwide, including symbols and the characters of languages not written with a Latin-based alphabet.
Unicode should be backward compatible with ASCII, the older 7-bit encoding scheme, so that existing text can be converted into Unicode without losing any information.
Unicode was originally meant to be a fixed-width, 16-bit format so that every character would occupy the same amount of storage. This concerns how characters are stored, not how much space they take up on the screen or when printed, and the goal was later relaxed once the repertoire outgrew 65,536 characters.
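The ASCII-compatibility principle can be checked directly. The following is a small Python sketch, with variable names chosen here for illustration, showing that each ASCII value keeps the same number in Unicode and that the Unicode code space extends well beyond it.

```python
import sys

# All 128 ASCII byte values, decoded as ASCII text.
ascii_bytes = bytes(range(128))
text = ascii_bytes.decode("ascii")

# Each ASCII character has the same number as a Unicode code point.
same_numbers = all(ord(ch) == i for i, ch in enumerate(text))

# Encoding the text as UTF-8 gives back exactly the original bytes,
# so converting ASCII text to Unicode loses nothing.
round_trip = text.encode("utf-8") == ascii_bytes

# The Unicode code space runs from U+0000 to U+10FFFF.
highest_code_point = sys.maxunicode

print(same_numbers, round_trip, hex(highest_code_point))
```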
Unicode character sets
Unicode includes all of the characters of earlier standards, such as ASCII and ISO-8859-1, and adds many more from other languages, including less widely used scripts such as Georgian. The standard defines the properties of each character separately from the way code points are serialized as bytes. For serialization it defines several encoding forms, of which UTF-8 and UTF-16 are the most widely used; a third, UTF-32, stores every code point in four bytes.
UTF-8 represents any Unicode character as a sequence of one to four bytes. A significant advantage of UTF-8 is its backward compatibility with ASCII: ASCII characters (unaccented Latin letters, digits, basic punctuation, and control codes) are encoded as the same single bytes, so a file that contains only ASCII is already valid UTF-8. Because it is a byte stream that can carry any Unicode character, you can use UTF-8 to store English, Hindi, Japanese, Hebrew, and other text in the same file. The main downside is that non-ASCII characters take two to four bytes each, so text in many scripts occupies more space than it would in a single-byte legacy encoding.
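To make the byte counts concrete, here is a short Python sketch that encodes one character from several scripts as UTF-8. The sample characters and the names `samples` and `utf8_lengths` are illustrative choices, not part of any standard.

```python
# One sample character per script, written as escapes so the source
# file itself stays pure ASCII.
samples = {
    "English": "A",            # U+0041
    "Hebrew": "\u05d0",        # U+05D0 HEBREW LETTER ALEF
    "Hindi": "\u0939",         # U+0939 DEVANAGARI LETTER HA
    "Japanese": "\u65e5",      # U+65E5 (the "day/sun" ideograph)
    "Emoji": "\U0001F600",     # U+1F600, outside the 16-bit range
}

# Number of bytes each character needs in UTF-8.
utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

# Text from all of these scripts can share one byte stream.
mixed = "".join(samples.values())
encoded = mixed.encode("utf-8")
decoded = encoded.decode("utf-8")

# ASCII-only text is byte-for-byte identical in ASCII and UTF-8.
ascii_identical = "Hello".encode("ascii") == "Hello".encode("utf-8")

for name, length in utf8_lengths.items():
    print(f"{name}: {length} byte(s)")
```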
UTF-16 is another Unicode encoding form. It supports the same repertoire of characters as UTF-8 but has different properties. UTF-16 is built on 16-bit code units: characters in the Basic Multilingual Plane (U+0000 to U+FFFF) take one unit, or 2 bytes, and all other characters take a pair of units called a surrogate pair, or 4 bytes. Because each unit spans two bytes, UTF-16 is sensitive to endianness (the order in which bytes are arranged), so data is either labeled as UTF-16BE or UTF-16LE or begins with a byte order mark. UTF-8 has no such byte-order issue.
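A brief Python sketch of how UTF-16 lays out its bytes, covering code-unit counts, surrogate pairs, and byte order. The helper `surrogate_pair` is a hypothetical name written for this illustration; the codec names are the ones shipped with Python.

```python
def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its two UTF-16 code units."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

bmp_char = "\u10d0"          # GEORGIAN LETTER AN, fits in one code unit
astral_char = "\U0001F600"   # an emoji, needs a surrogate pair

# Explicit byte orders: the same code unit, bytes swapped.
big_endian = bmp_char.encode("utf-16-be")
little_endian = bmp_char.encode("utf-16-le")

# A character outside the Basic Multilingual Plane takes 4 bytes.
astral_bytes = astral_char.encode("utf-16-be")
high, low = surrogate_pair(ord(astral_char))

# The plain "utf-16" codec prepends a byte order mark (BOM) so that a
# reader can detect which byte order was used.
with_bom = bmp_char.encode("utf-16")

print(big_endian.hex(), little_endian.hex(), astral_bytes.hex())
print(hex(high), hex(low), with_bom[:2].hex())
```

Declaring the byte order explicitly, or always writing a BOM, avoids ambiguity when UTF-16 data moves between machines.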
Unicode is a universal character encoding standard that covers the writing systems in modern use and many historic ones. It lets you represent text in any supported language with a single character set, which can be displayed by any system or application that supports Unicode and has suitable fonts.