🏠 Learn new characters – Unicode

Français ici !

Did you know you can type way more characters than what is on your keyboard? Learn when to use them.

Each character in the computer world has a number, for example A is 65, a is 97, é is 233, the greek letter α is 945. This number is called the Unicode code point. You may have heard of ASCII, US-ASCII code is old and created by Americans, it represents only the first 128 characters, so 65 for A is an ASCII code and Unicode code, but 945 for α did not exist at the time of ASCII.

When one talks about a unicode code point like 945, everyone likes to write the standard U+xxxx notation where xxxx is the code in hexadecimal. The letter Ă© is then called U+00E9 because E9 in hexadecimal is 233 in decimal. Tables exist with the list of all characters. Unicode also gives a unique name using only A-Z characters and the dash like LATIN SMALL LETTER E WITH ACUTE for Ă©.

To encode one or more characters in sequence of bytes), one must use an encoding like the very popular and used broadly utf-8. For a longer story, come here. A reference on the Internet is also this blog post

Unicode is much more complex than just choosing number for each character, it features bidirectional characters, combining characters, CJK (Chinese, Japanese, Korean) characters, emoji, normalization, and much more.

On this page. Do not hesitate to share a specific paragraph by right clicking on one of the numerous #hashtag and give the link to your friends or collegues!

Stuff in math

Lots of languages

The aim of Unicode is to easily manage every language of the word, even the old/anciant ones.

Attention should be taken that the symbol A can then be different characters :

Diacritics are also part of Unicode:

Different currency symbols are useful: $ € „ Âą € ₿ â‚œ àžż â‚č. List on wikipedia.

Typography

How to type accents?

How to type any Unicode character, including accents?

If you know the Unicode codepoint like U+2205

Please notice that memorising a list of numbers is stupid so prefer another method on this page...

If you know how to write it by handwriting

Notes on French keyboards

Virtual keyboards on the web

Unicode operations

Python has built-in unicode support since version 3, in python 2 there were two types of strings, normal strings and unicode strings. Python also have import unicodedata module for more advanced Unicode-related operation.

In interactive mode in my python calculator, I can for example do uniline("αÎČÎł") to have a nice list:

["U+03B1 α GREEK SMALL LETTER ALPHA",
 "U+03B2 ÎČ GREEK SMALL LETTER BETA",
 "U+03B3 Îł GREEK SMALL LETTER GAMMA"]

Here is the definition of my function uniline, for more fun, see my file called uniutils.py:

import unicodedata

def uniname(s):
    """ Ă© → LATIN SMALL LETTER E WITH ACUTE """
    if len(s) == 1:
        return unicodedata.name(s, '?')
    else:
        return [uniname(x) for x in s]

def uord(s):
    """ Ă© → U+00E9 """
    if len(s) == 1:
        return 'U+' + hex(ord(s))[2:].zfill(4).upper()
    else:
        return [uord(x) for x in s]

def uniline(s):
    """ Ă© → U+00E9 Ă© LATIN SMALL LETTER E WITH ACUTE """
    if len(s) == 1:
        return uord(s) + ' ' + s + ' ' + uniname(s)
    else:
        return [uniline(x) for x in s]
More readings

https://docs.python.org/fr/3.7/howto/unicode.html