# 🏠 Learn new characters – Unicode

Did you know you can type way more characters than what is on your keyboard? Learn when to use them.

Each character in the computer world has a number: for example, A is 65, a is 97, é is 233, and the Greek letter α is 945. This number is called the Unicode code point. You may have heard of ASCII: US-ASCII is an older American standard that covers only the first 128 characters. So 65 for A is both an ASCII code and a Unicode code point, but 945 for α did not exist in ASCII.

A Unicode code point like 945 is usually written in the standard U+xxxx notation, where xxxx is the code in hexadecimal. The letter é is then called U+00E9 because E9 in hexadecimal is 233 in decimal. Tables exist with the list of all characters. Unicode also gives each character a unique name written in capital letters (with digits, spaces and hyphens where needed), like LATIN SMALL LETTER E WITH ACUTE for é.

To encode one or more characters as a sequence of bytes, one must use an encoding such as the very popular and widely used UTF-8. For a longer story, come here. A reference on the Internet is also this blog post.
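The difference between a code point and its encoded bytes can be seen in a few lines of Python. This is only a small sketch using the built-in `str.encode` and `bytes.decode`; the sample string and variable names are chosen for illustration, and `bytes.hex(' ')` assumes Python 3.8 or later.

```python
text = "Aéα"

# Each character has exactly one code point...
code_points = [ord(ch) for ch in text]
print(code_points)

# ...but UTF-8 uses a variable number of bytes per character.
utf8_bytes = text.encode("utf-8")
for ch in text:
    print(f"U+{ord(ch):04X} {ch} -> {ch.encode('utf-8').hex(' ')}")

# Decoding with the same encoding gives the original text back.
assert utf8_bytes.decode("utf-8") == text
```

A takes one byte, while é and α take two bytes each, so the three characters occupy five bytes.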

Unicode is much more complex than just choosing a number for each character: it features bidirectional characters, combining characters, CJK (Chinese, Japanese, Korean) characters, emoji, normalization, and much more.

## Stuff in math

• Use LaTeX! Fractions like \frac{1}{1+x}, integrals like \int_{1}^{x} x^2 dx, symbols like \iff... Everything is there. Available for PDF, web, SVG, PowerPoint.

• 3 · 2, 3 ⋅ 2 or 3 × 2 NOT 3 . 2, 3.2, 3•2, 3x2 — The multiplication cross does not have the same shape as the letter x; in handwriting, the x should be rounded. In French literature, the multiplication may be written 2.3, but I prefer the English convention with the dot in the middle, 2·3, so that the height is the same as for the other operators: 2+3, 2−3, 2×3 or 2÷3. U+00B7 · MIDDLE DOT \cdot &middot;, U+22C5 ⋅ DOT OPERATOR, U+00D7 × MULTIPLICATION SIGN \times, U+2022 • BULLET.

• 1−2 = −1 NOT 1-2 = -1 — The hyphen - used in compound words in French and other languages is not the same character as the minus sign used in subtraction; the minus sign has the same width as the + or the ×. LaTeX in math mode replaces the simple hyphen - with the correct sign. U+2212 − MINUS SIGN &minus;.

• ≥ = ≠ NOT >= == != /= — Of course, when writing code, one must use the characters the language requires. However, when you cannot input a Unicode character in ordinary text (which should almost never happen once you have read this article), prefer != over /= because /= has a different meaning in many programming languages.

• 1/2 1÷2 NOT 1:2 1⁄2 1∕2 — See Wikipedia on division, where ISO 80000-2-9.6 states that ÷ should not be used. The latter two are the fraction slash and the division slash, but they do not render as they should in all browsers; use the simple / or the fraction characters below. The 1:2 notation is not international, and Americans, for instance, could misinterpret it.

• ½ ¼ ⅓ when you find them clearer than writing 1/2, 1/4 or 1/3. List here.

• Vector: LaTeX \vec{x}, U+20D7 x⃗ COMBINING RIGHT ARROW ABOVE. It is a combining character: it must be placed after the character it decorates.

• Arrows: ←↓→↑↙↘↖↗↔⇔↦ — list here.

• x² NOT x^2 — LaTeX x^2, HTML x<sup>2</sup>. There are also ⁰¹²³⁴⁵⁶⁷⁸⁹⁺₊⁻₋ and others.

• ± LaTeX \pm, HTML &plusmn;. U+00B1 ± PLUS-MINUS SIGN.

• Micro like in µm (U+00B5 µ MICRO SIGN LaTeX).

• ∅ for the empty set. Beware: in LaTeX, use \varnothing, NOT \emptyset, \phi or other variations. Wikipedia Empty Set.

• ∨∧ ¬ ā NOT v \/ /\ ! — Wikipedia List.

• Greek? Change your keyboard layout to Greek! And know the shortcut to switch quickly: on Windows, Windows+Space; on Mac, Command+Space or Alt+Space, or go to the control panel to change the shortcut; on Linux, go to the control panel to choose one.

• 1 ⩽ 2, 2 ⩾ 1 (U+2A7D \leqslant, U+2A7E \geqslant). To be international, use ≤ and ≥. The slanted forms are found in French and Russian literature.

• When coding in a programming language, one will of course use the ASCII characters required by the language, and so write >= == != * - and not ≥ = ≠ × · − (only true for code!).
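Several of the characters in this list can be inspected or produced with Python's standard library. The sketch below is only an illustration: `to_superscript` and `SUPERSCRIPTS` are hypothetical names, not part of any library, and only digits and the two signs are handled.

```python
import unicodedata

# Translation table from ASCII digits and signs to their superscript forms.
SUPERSCRIPTS = str.maketrans("0123456789+-", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻")

def to_superscript(exponent):
    """Hypothetical helper: turn an integer exponent into superscript characters."""
    return str(exponent).translate(SUPERSCRIPTS)

print("x" + to_superscript(2))

# The vulgar fractions carry their numeric value in the Unicode database.
for ch in "½¼⅓":
    print(ch, unicodedata.numeric(ch))

# A vector is the base letter followed by the combining arrow (two code points).
vec_x = "x" + "\u20d7"
print(vec_x, len(vec_x))

# The look-alike operators are distinct characters with distinct names.
for ch in "-−·⋅×":
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")
```

Printing the names is a quick way to check which dot or dash a text really contains when the glyphs look identical in your font.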

## Lots of languages

The aim of Unicode is to easily manage every language of the world, even the old/ancient ones.

• Russian

• U+0430 а CYRILLIC SMALL LETTER A
• U+0431 б CYRILLIC SMALL LETTER BE
• U+0432 в CYRILLIC SMALL LETTER VE
• U+0433 г CYRILLIC SMALL LETTER GHE
• U+0434 д CYRILLIC SMALL LETTER DE
• etc.
• Greek

• U+03B1 α GREEK SMALL LETTER ALPHA
• U+03B2 β GREEK SMALL LETTER BETA
• U+03B3 γ GREEK SMALL LETTER GAMMA
• U+03B4 δ GREEK SMALL LETTER DELTA
• etc.

Note that what looks like the same symbol A can then be different characters:

• U+0041 A LATIN CAPITAL LETTER A
• U+0391 Α GREEK CAPITAL LETTER ALPHA
• U+0410 А CYRILLIC CAPITAL LETTER A
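The three look-alike capitals can be told apart programmatically. The following is a rough sketch: `scripts_used` is a hypothetical helper that uses the first word of each Unicode name as a crude hint about the script, which is an assumption that works for these letters but is not a real script detection.

```python
import unicodedata

latin_a, greek_alpha, cyrillic_a = "\u0041", "\u0391", "\u0410"

for ch in (latin_a, greek_alpha, cyrillic_a):
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")

# They render alike but never compare equal.
print(latin_a == greek_alpha)
print(latin_a == cyrillic_a)

def scripts_used(word):
    """Hypothetical helper: first word of each character name, as a rough script hint."""
    return {unicodedata.name(ch).split()[0] for ch in word}

# The second letter below is U+0430 CYRILLIC SMALL LETTER A, not a Latin a.
print(scripts_used("p\u0430ypal"))
```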

Diacritics are also part of Unicode:

• Diacritics: accents, cedilla, strokes… é exists in both composed and decomposed form, the decomposed form uses two code points for this character:

• U+00E9 LATIN SMALL LETTER E WITH ACUTE
• U+0065 LATIN SMALL LETTER E, U+0301 COMBINING ACUTE ACCENT
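The practical consequence of the two forms is that strings that look identical may not compare equal. A minimal sketch with the standard `unicodedata.normalize` function, using variable names chosen for illustration:

```python
import unicodedata

composed = "\u00e9"      # é as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# Same appearance, different content.
print(composed == decomposed)
print(len(composed), len(decomposed))

# Normalizing both sides to the same form makes them comparable.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(nfc == composed, nfd == decomposed)
```

Normalizing text to one form (NFC is a common choice) before comparing or storing it avoids this kind of surprise.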

Local keyboards produce the composed form because it is more useful. Of course, all accented letters have a capital version; see the note here if you think avoiding accents on capital letters is a good idea.

Different currency symbols are useful: \$ € ¥ ¢ ¤ ₿ ₽ ฿ ₹. List on wikipedia.

## Typography

• Always put accents on capital letters, no excuses. The Académie française strongly recommends it; if you don't trust the Académie française, I don't know whom you will follow on the correctness of French. Accepting capital letters without accents dates from the typewriter era, when it was mechanically impossible to type them, and the typewriter era was 30 years ago, wasn't it? However, for first and last names, everyone can choose how their name is written: asking a French Elise whether she wants an accent on her E is a totally acceptable question.

• To emphasize a word, do not write it in CAPS, put it in bold. However, caps are sometimes easier to read. On the contrary, omitting accents on capital letters makes text harder to read. Another alternative is simply to capitalize the first letter or to use small caps, which also work with a Capital letter. Small caps are available in CSS via font-variant: small-caps or in Microsoft Word.

• The non-breaking space (nbsp) (U+00A0) (HTML &nbsp;). No line break can occur at this special space: putting a non-breaking space before a ? keeps the space while making it impossible for a line to begin with ?. It must be written before : ; ? ! and inside the French guillemets « » (that is a French-from-France convention; in English and in Canadian French one does not put a space). LaTeX and Word know this, nothing to do there. In CSS, for presentation purposes, one can also use white-space: nowrap.

• French guillemets vs English quotes vs straight quotes: « word », “word”, "word". All variations will be understood. Wikipedia Quotation Mark.

• The hyphen -, the en dash and the em dash are dashes of different lengths. I like using the en dash for spans (lived 1850–1920). Em dashes are used in books for dialogue. Wikipedia about Dash, French Wikipedia about Tiret. Do not confuse them with the minus sign, even though I clearly prefer to see a subtraction like "5–2" written with an en dash rather than one like "5-2" written with a hyphen, the width of the en dash often being the same as that of the minus sign (en dash 5–2 vs 5−2 minus sign).

• Writing text in capital letters to attract attention, for example on a YouTube video, is a practice I deplore.

• A space must be put outside parentheses but never inside: write Hello (bonjour) Alice and not Hello ( bonjour ) Alice. LaTeX and Word know this, nothing to do there. However, in math, for function application, one does not put a space before the opening parenthesis: write sin(x) and not sin (x), as the latter could be mistaken for a multiplication.
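When text is produced by a program rather than by LaTeX or Word, the non-breaking space rule from this list has to be applied by hand. Here is a minimal sketch, assuming the text already has ordinary spaces in the right places; `french_nbsp` is a hypothetical helper, and it deliberately ignores edge cases such as URLs or times like 12:30.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE

def french_nbsp(text):
    """Hypothetical helper: replace the ordinary space before : ; ? ! »
    and after « with a no-break space (French-from-France convention)."""
    text = re.sub(r" ([:;?!»])", NBSP + r"\1", text)
    text = re.sub(r"« ", "«" + NBSP, text)
    return text

fixed = french_nbsp("Il a dit : « Ça va ? »")

# Make the invisible character visible for inspection.
print(fixed.replace(NBSP, "[NBSP]"))
```

The ordinary space between the colon and the opening guillemet is left alone, so a line break is still allowed there.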

## How to type accents?

• On Linux and (some) Macs, if you have a keyboard layout like AZERTY that can type accented letters but not accented capitals: Caps Lock, then é, then Caps Lock gives you É. The key is a true Caps Lock (it puts every letter in capitals) and not a Shift Lock (which acts as if Shift were pressed). The latter behaviour comes from typewriters, where the key caused a physical move. On Windows, the Shift Lock behaviour is used. Here is the French-from-France keyboard layout with the Caps Lock behaviour. The Linux behaviour can of course be changed.
• On Windows, to mimic the Linux behaviour, if you have a keyboard layout like AZERTY that can type accented letters but not accented capitals: download WinCompose and activate the option that makes Caps Lock produce accented uppercase letters; then you can press the Caps Lock key then ç and you will obtain an uppercase Ç.
• On Windows (Microsoft Word only), accents are supported via Ctrl+' then E; for the cedilla use , and for the Diaeresis/Tréma/Ümlaut use ". If you know the Alt code, read the note below.
• On the internet, there are Firefox and Chrome extensions that let you type accents quickly, for example Ctrl+' then E giving É.
• I also recommend reading this short Wikipedia article.

## How to type any Unicode character, including accents?

• On Linux & Mac, your keyboard probably has more symbols than those printed on it. AltGr and AltGr+Shift are filled with a lot of characters. For example on my keyboard AltGr+I = . You can also use an extended layout with the symbols you want or create your own keyboard layout by modifying some text files.
• On Linux use the Compose key! It does not depend on the keyboard layout at all; I've mapped it to Pause because no app uses that. Go into the system settings to choose where to map it. Then, for example, Compose then ' then e gives é, Compose < < gives "«", Compose > = gives ≥, and there are a lot more on Ubuntu Help. Here is also a list I found.
• On Windows download WinCompose and use the Compose key! It does not depend on the keyboard layout at all; I've mapped it to Pause because no app uses that. Then, for example, Compose then ' then e gives é, Compose < < gives "«", Compose > = gives ≥, and there are a lot more on Ubuntu Help. Here is also a list I found.
• On Windows there is this official app to create your own keyboard layout; however, the installation of the new layout is a bit unusual. You can for example choose that AltGr+w produces the "«" character.
• On Android/iOS: long press on a key to get more symbols.
• On Android/iOS: download a new keyboard with the symbols you like.
• On Android/iOS: use Personal Dictionary with shortcuts.
• In PowerPoint or LibreOffice Impress: use Personal Dictionary with shortcuts (called Autocorrection).
• On Mac (Cocoa App only) long press on a key will pop up other characters.
• In LaTeX, there is probably a sequence like \times for ×.
• In HTML, there is probably a sequence like &times; for ×. Wikipedia List.
• In Vim: digraphs! That's the same idea as the Compose key.
• On any platform: Google the name and copy-paste... or go to the OS Character Map app, but use this as a last resort because it is so slow.
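If you happen to have a Python prompt open, the official Unicode name is enough to get the character, in the same spirit as looking the name up and copy-pasting. A small sketch using the standard `unicodedata.lookup` function and the `\N{...}` string escape; the variable names are only for illustration.

```python
import unicodedata

# Look a character up by its official Unicode name.
empty_set = unicodedata.lookup("EMPTY SET")
print(empty_set)

# The same names work directly inside string literals.
times = "\N{MULTIPLICATION SIGN}"
guillemet = "\N{LEFT-POINTING DOUBLE ANGLE QUOTATION MARK}"
print(times, guillemet)
```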

### If you know the Unicode codepoint like U+2205

Please notice that memorising a list of numbers is stupid so prefer another method on this page...

• On Linux GTK+ App: Ctrl+Shift+U 2 2 0 5 Enter.
• On Windows: hold Alt, press + on the keypad, then 2 2 0 5 (on the keypad). This requires a Windows registry setting. Method 1 Method 2. It is possible to enter some characters without modifying a registry key, but the method is not universal, more info here. But prefer another input method, it's stupid to memorise the numbers...
• On Windows in Word: 00C5 then Alt+X
• On macOS: Alt + Shift + F or Alt + 00C5
• In HTML: &#x2205;
• In code: "\u2205" (Java/Javascript/Python), chr(0x2205) (Python), String.fromCharCode(0x2205) (Javascript), Character.toString((char)0x2205) (Java).
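Going from the written notation to the character is a short conversion in Python. This is a sketch under the assumption that the input follows the U+xxxx form; `from_notation` is a hypothetical helper name, while `html.unescape` is the standard library function for decoding HTML references.

```python
import html

def from_notation(notation):
    """Hypothetical helper: convert a string like 'U+2205' to the character."""
    if not notation.upper().startswith("U+"):
        raise ValueError(f"expected U+xxxx, got {notation!r}")
    return chr(int(notation[2:], 16))

empty_set = from_notation("U+2205")
print(empty_set, empty_set == "\u2205")

# HTML numeric and named references decode to the same characters.
print(html.unescape("&#x2205;") == empty_set)
print(html.unescape("&times;") == "\u00d7")

# Above U+FFFF, use chr() or the 8-digit \U escape.
print(from_notation("U+1F3E0") == "\U0001F3E0")
```

All four comparisons print True. The four-digit `\u` escape only reaches U+FFFF, which is why the last line uses the eight-digit `\U` form.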

## Notes on French keyboards

• On the French-from-France Windows keyboard, there is no way to input acute accents directly from the keyboard (on capital letters or on non-French letters like í); change to Belgian or use other methods in this list.
• On the Belgian keyboard there are dead keys for the acute and grave accents on AltGr+ù and AltGr+µ.
• Neither of those keyboards gives you Ç, the guillemets "«»" or the non-breaking space.
• The BÉPO keyboard is a good ergonomic keyboard for writing French but is very different from QWERTY and AZERTY. However, it contains a lot of characters used in French typography like « » É Ç and the non-breaking space.

## Virtual keyboards on the web

• Quickly type characters in lots of languages — Lexilogos or Type It.

## Unicode operations

Python has had built-in Unicode support since version 3; in Python 2 there were two types of strings, normal strings and unicode strings. Python also has the unicodedata module (import unicodedata) for more advanced Unicode-related operations.

• Numeric ↔ Character Conversion: chr(233) == chr(0xe9) == "é", ord("é") == 233
• Character via code point in a string: print("The \u03B3-rays are dangerous!")
• Uppercase, lowercase: "été".upper() == "ÉTÉ", "ÉTÉ".lower() == "été"
• Name query: unicodedata.name("é") == 'LATIN SMALL LETTER E WITH ACUTE'
• Composition, decomposition: unicodedata.normalize('NFC', "é") # U+00E9, unicodedata.normalize('NFD', "é") # U+0065 U+0301

In interactive mode in my python calculator, I can for example do uniline("αβγ") to have a nice list:

["U+03B1 α GREEK SMALL LETTER ALPHA",
"U+03B2 β GREEK SMALL LETTER BETA",
"U+03B3 γ GREEK SMALL LETTER GAMMA"]


Here is the definition of my function uniline, for more fun, see my file called uniutils.py:

    import unicodedata

    def uniname(s):
        """ é → LATIN SMALL LETTER E WITH ACUTE """
        if len(s) == 1:
            # '?' is returned for characters without a name (e.g. control characters).
            return unicodedata.name(s, '?')
        else:
            # Longer (or empty) strings give a list, one entry per character.
            return [uniname(x) for x in s]

    def uord(s):
        """ é → U+00E9 """
        if len(s) == 1:
            # Hexadecimal code point, upper case, padded to at least 4 digits.
            return 'U+' + hex(ord(s))[2:].zfill(4).upper()
        else:
            return [uord(x) for x in s]

    def uniline(s):
        """ é → U+00E9 é LATIN SMALL LETTER E WITH ACUTE """
        if len(s) == 1:
            return uord(s) + ' ' + s + ' ' + uniname(s)
        else:
            return [uniline(x) for x in s]
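A function like this is handy for seeing what a string really contains, for example a decomposed é. The sketch below is a self-contained variant, not the author's uniutils.py: `describe` is a hypothetical name, and unlike uniline it always returns a list, even for a single character.

```python
import unicodedata

def describe(text):
    """Hypothetical variant of uniline: one 'U+xxxx char NAME' entry per code point."""
    return [f"U+{ord(ch):04X} {ch} {unicodedata.name(ch, '?')}" for ch in text]

# A decomposed é shows up as two separate code points.
for line in describe(unicodedata.normalize("NFD", "\u00e9")):
    print(line)
```

Always returning a list keeps the return type predictable, at the cost of the convenience of getting a plain string for a single character.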