Did you know you can type way more characters than what is on your keyboard? Learn when to use them.
Each character in the computer world has a number: for example, A is 65, a is 97, é is 233, and the Greek letter α is 945.
This number is called the Unicode code point.
You may have heard of ASCII. US-ASCII is an older American code that represents only the first 128 characters, so 65 for A is both an ASCII code and a Unicode code point, but 945 for α did not exist at the time of ASCII.
When one talks about a Unicode code point like 945, the standard way to write it is the U+xxxx notation, where xxxx is the code point in hexadecimal.
945 is thus written U+03B1, and the letter é is called U+00E9 because E9 in hexadecimal is 233 in decimal.
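To make the notation concrete, here is a minimal Python sketch that converts between a character and its U+xxxx form. The helper names to_uplus and from_uplus are made up for this illustration.

```python
def to_uplus(ch):
    """Return the U+XXXX notation of one character (at least 4 hex digits)."""
    return f"U+{ord(ch):04X}"


def from_uplus(notation):
    """Return the character designated by a string such as 'U+00E9'."""
    return chr(int(notation[2:], 16))


print(to_uplus("\u00e9"))    # the letter e with acute accent
print(from_uplus("U+03B1"))  # the Greek letter alpha
```

The hexadecimal part is simply the decimal number written in base 16, so int(..., 16) is enough to go back.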
Tables exist with the list of all characters.
Unicode also gives each character a unique name, written using only uppercase letters A-Z, digits, spaces and hyphens, like LATIN SMALL LETTER E WITH ACUTE for é.
To encode one or more characters as a sequence of bytes, one must use an encoding such as the very popular and widely used UTF-8. For a longer story, come here. A reference on the Internet is also this blog post.
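As a rough illustration of what an encoding does, the Python sketch below prints the UTF-8 bytes of a few characters. The helper utf8_hex is a made-up name; the last lines show what happens when the bytes are decoded with a different encoding than the one used to write them (Latin-1 is chosen here only as an example).

```python
def utf8_hex(text):
    """Encode text as UTF-8 and return the bytes as space-separated hex."""
    return text.encode("utf-8").hex(" ")


for ch in ["A", "\u00e9", "\u03b1"]:  # A, e with acute accent, Greek alpha
    print(f"U+{ord(ch):04X}", ch, utf8_hex(ch))

# Decoding the same bytes with the wrong encoding gives garbage, not an error.
garbled = "\u00e9".encode("utf-8").decode("latin-1")
print(len(garbled), garbled)  # two characters instead of one
```

Code points below 128 take one byte in UTF-8, which is why plain ASCII text is already valid UTF-8.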
Unicode is much more complex than just choosing a number for each character: it features bidirectional characters, combining characters, CJK (Chinese, Japanese, Korean) characters, emoji, normalization, and much more.
Use LaTeX! Fractions like \frac{1}{1+x}, integrals like \int_{1}^{x} x^2 dx, symbols like \iff... Everything is there. Available for PDF, web, SVG, PowerPoint.
3 · 2, 3 ⋅ 2 or 3 × 2 NOT 3 . 2, 3.2, 3•2, 3x2 —
The multiplication cross does not have the same shape as an x; in handwriting, the x should be rounded.
In French literature, the multiplication may be noted 2.3, but I prefer the English convention, which puts the dot in the middle, 2·3, so that it sits at the same height as the other operators: 2+3, 2−3, 2×3 or 2÷3.
U+00B7 · MIDDLE DOT (LaTeX \cdot),
U+22C5 ⋅ DOT OPERATOR,
U+00D7 × MULTIPLICATION SIGN (LaTeX \times),
U+2022 • BULLET.
2−1 = −1 NOT 2-1 = -1 —
The hyphen - used in compound words in French and other languages is not the same character as the minus sign of subtraction: the minus sign has the same width as the + or the ×.
LaTeX in math mode replaces the simple hyphen - with the correct sign −.
U+2212 − MINUS SIGN.
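In Python, the two characters are simply different strings, and as far as I can tell int() does not accept the typographic minus. A minimal sketch, with made-up helper names parse_int and typeset, that converts between the two at the boundary between display and code:

```python
MINUS_SIGN = "\u2212"    # the typographic MINUS SIGN
HYPHEN_MINUS = "\u002d"  # the ASCII character used by programming languages


def parse_int(text):
    """Parse an integer that may be typeset with the typographic minus sign."""
    return int(text.replace(MINUS_SIGN, HYPHEN_MINUS))


def typeset(number):
    """Format a number for display with the typographic minus sign."""
    return str(number).replace(HYPHEN_MINUS, MINUS_SIGN)


print(MINUS_SIGN == HYPHEN_MINUS)  # False: two distinct characters
print(typeset(2 - 3))
print(parse_int(typeset(-1)) + 1)
```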
≥ = ≠ NOT >= == != /= —
Of course, when writing code, one must use the characters the language requires.
However, when you can't input Unicode characters (which should almost never happen, since you are reading this article), prefer != over /=, because /= has a different meaning in programming.
1/2 1÷2 NOT 1:2 1⁄2 1∕2 — See Wikipedia on division, where ISO 80000-2-9.6 states that ÷ should not be used. The latter two are the fraction slash (U+2044) and the division slash (U+2215), but they do not render as they should in all browsers; use the simple / or the fraction characters below. The 1:2 notation is not international, and Americans, for instance, could misinterpret it.
½ ¼ ⅓ when you find them clearer than writing 1/2, 1/4 or 1/3. List here.
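These fraction characters carry their numeric value in the Unicode database. A small Python sketch, using only the standard unicodedata module, to look them up:

```python
import unicodedata

fractions = ["\u00bd", "\u00bc", "\u2153"]  # one half, one quarter, one third

for ch in fractions:
    # numeric() returns the value of the character as a float
    print(f"U+{ord(ch):04X}", ch, unicodedata.name(ch), unicodedata.numeric(ch))
```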
Vector: LaTeX \vec{x}, U+20D7 x⃗ COMBINING RIGHT ARROW ABOVE.
It's a combining character: it must be placed after the character it decorates.
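A short Python sketch of that ordering rule: the base letter comes first, the combining mark second. The function name vec is made up for this example.

```python
import unicodedata

ARROW_ABOVE = "\u20d7"  # COMBINING RIGHT ARROW ABOVE


def vec(letter):
    """Decorate a letter with a vector arrow: base character first, mark second."""
    return letter + ARROW_ABOVE


v = vec("x")
print(v, len(v))  # one visible symbol, but two code points
# Combining marks have a non-zero combining class in the Unicode database.
print(unicodedata.combining(ARROW_ABOVE) > 0)
```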
Arrows: ←↑→↓↔… — list here.
x² NOT x^2 — LaTeX x^2, HTML x<sup>2</sup>.
There are also ⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻ and others.
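If you need these often, a translation table does the job. This is only a sketch covering digits, plus and minus; the function name superscript is my own.

```python
# Maps 0123456789+- to the superscript digits, SUPERSCRIPT PLUS SIGN and SUPERSCRIPT MINUS
SUPERSCRIPTS = str.maketrans(
    "0123456789+-",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079\u207a\u207b",
)


def superscript(value):
    """Return the superscript form of a number or of a string of digits and signs."""
    return str(value).translate(SUPERSCRIPTS)


print("x" + superscript(2))
print("10" + superscript(-3))
```

Note that 1, 2 and 3 live in the Latin-1 range (U+00B9, U+00B2, U+00B3) while the others are in the U+2070 block, which is why the table is spelled out.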
± LaTeX \pm, HTML &plusmn;.
U+00B1 ± PLUS-MINUS SIGN.
Micro as in µm (U+00B5 µ MICRO SIGN LaTeX).
∅ for the empty set.
Beware: in LaTeX use \varnothing, NOT \emptyset, NOT \phi or other variations. Wikipedia Empty Set.
∨ ∧ ¬ ā NOT v \/ /\ ! — Wikipedia List.
Greek? Change your keyboard layout to Greek! And know the shortcut to switch layouts quickly: on Windows, Windows+Space; on Mac, Command+Space or Alt+Space, or go to the control panel to change the shortcut; on Linux, go to the control panel to choose one.
1 ⩽ 2, 2 ⩾ 1 (U+2A7D \leqslant, U+2A7E \geqslant).
To be international, use ≤ ≥.
The ⩽ ⩾ forms are found in French and Russian literature.
When coding in a programming language, one will of course use the ASCII characters imposed by the language and write >= == != * - and not ≥ = ≠ × · − (only true for code!).
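Going the other way, from code to something readable by humans, can be automated. Here is a naive Python sketch that rewrites the ASCII operators of an expression for display; the table DISPLAY and the function prettify are made up for this example, and it assumes every - in the input is a subtraction or a negative sign.

```python
import re

DISPLAY = {
    ">=": "\u2265",  # greater-than or equal to
    "<=": "\u2264",  # less-than or equal to
    "!=": "\u2260",  # not equal to
    "==": "=",
    "*": "\u00d7",   # multiplication sign
    "-": "\u2212",   # minus sign
}

# Longest tokens first, so that ">=" is not read as ">" followed by "=".
PATTERN = re.compile(
    "|".join(re.escape(tok) for tok in sorted(DISPLAY, key=len, reverse=True))
)


def prettify(expression):
    """Replace ASCII operators with their typographic equivalents."""
    return PATTERN.sub(lambda m: DISPLAY[m.group(0)], expression)


print(prettify("2 * 3 - 1 >= 5"))
print(prettify("a != b"))
```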
The aim of Unicode is to easily handle every language of the world, even the old/ancient ones.
Russian
U+0430 а CYRILLIC SMALL LETTER A
U+0431 б CYRILLIC SMALL LETTER BE
U+0432 в CYRILLIC SMALL LETTER VE
U+0433 г CYRILLIC SMALL LETTER GHE
U+0434 д CYRILLIC SMALL LETTER DE
Greek
U+03B1 α GREEK SMALL LETTER ALPHA
U+03B2 β GREEK SMALL LETTER BETA
U+03B3 γ GREEK SMALL LETTER GAMMA
U+03B4 δ GREEK SMALL LETTER DELTA
Note that the symbol A can therefore be different characters:
U+0041 A LATIN CAPITAL LETTER A
U+0391 Α GREEK CAPITAL LETTER ALPHA
U+0410 А CYRILLIC CAPITAL LETTER A
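The three look-alikes can be told apart in Python by their code point or name; a quick sketch:

```python
import unicodedata

lookalikes = ["\u0041", "\u0391", "\u0410"]  # Latin A, Greek Alpha, Cyrillic A

for ch in lookalikes:
    print(f"U+{ord(ch):04X}", ch, unicodedata.name(ch))

# Same shape on screen, but the strings do not compare equal.
print(lookalikes[0] == lookalikes[1])
```

This is worth remembering when a search for a word typed on one keyboard layout fails to find text typed on another.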
Diacritics are also part of Unicode: accents, cedilla, strokes… é exists in both composed and decomposed form; the decomposed form uses two code points for this character:
Composed: U+00E9 LATIN SMALL LETTER E WITH ACUTE
Decomposed: U+0065 LATIN SMALL LETTER E, U+0301 COMBINING ACUTE ACCENT
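A practical consequence is that two strings can look identical and still compare unequal. The Python sketch below shows one way to compare them, assuming that normalizing both sides to the composed form (NFC) is acceptable for your use case; same_text is a made-up helper name.

```python
import unicodedata

composed = "\u00e9"     # e with acute accent as one code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT


def same_text(a, b):
    """Compare two strings after bringing both to the composed (NFC) form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)           # False: different code points
print(len(composed), len(decomposed))   # 1 2
print(same_text(composed, decomposed))  # True
```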
Local keyboards produce the composed form because it's more useful. Of course, accented letters all have a capital version; see the note here if you think avoiding accents on capital letters is a good idea.
$ € ¥ ¢ ¤ ₿ ₽ ฿ ₹. List on Wikipedia.
Always put accents on capital letters, no excuses. The Académie française strongly recommends it; if you don't trust the Académie française, I don't know whom you will follow on correct French. Accepting capital letters without accents dates from the typewriter era, when it was mechanically impossible to type them, and the typewriter era was 30 years ago, wasn't it? However, for first and last names, everyone can choose how their name is written; asking a French Elise whether she wants an accent on her E is a totally acceptable question.
To emphasize a word, do not write it in CAPS; put it in bold. Caps are sometimes easier to read, but omitting accents on capital letters makes text harder to read. Another alternative is simply to capitalise the first letter or to use small caps, which also works together with an initial capital. Small caps are available in CSS via font-variant: small-caps or in Microsoft Word (here or there).
The non-breaking space (nbsp) (U+00A0) (HTML &nbsp;).
This special space cannot be split by a line break: putting a non-breaking space before a ? gives a visible space while making it impossible for a line to begin with ?.
It must be written before : ; ? ! and inside the French guillemets « » (that's a convention of French from France; in English and in Canadian French, one does not put a space).
LaTeX and Word know this, nothing to do there.
In CSS, for presentation purposes, one can also use white-space: nowrap.
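For text that does not go through LaTeX or Word, the replacement can be scripted. The following Python sketch is deliberately naive (it would also touch things like times or URLs written with a space before the colon), and the function name french_spacing is my own.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE


def french_spacing(text):
    """Replace the ordinary space before : ; ? ! » and after « with a non-breaking space."""
    text = re.sub(r" ([:;?!»])", NBSP + r"\1", text)
    text = re.sub(r"« ", "«" + NBSP, text)
    return text


fixed = french_spacing("Elle a dit : « oui » !")
print(fixed.count(NBSP))  # number of spaces that were made non-breaking
```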
French guillemets vs English quotes vs straight quotes: « word » “word” "word". All variations will be understood. Wikipedia Quotation Mark.
The hyphen -, the en dash – and the em dash — are dashes of different lengths. I like using the en dash for spans (lived 1850–1920). Em dashes are used in books for dialogue. Wikipedia about Dash, Wikipedia about the Tiret (in French). Do not mix them up with the minus sign, even though I clearly prefer to see a subtraction like "5–2" written with an en dash rather than "5-2" written with a hyphen, since the en dash is often the same width as the minus sign (en dash 5–2 vs minus sign 5−2).
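Since these characters are hard to tell apart by eye, here is a small Python sketch that prints their code points and names, plus a made-up helper year_span that builds a span with the en dash.

```python
import unicodedata

dashes = ["-", "\u2013", "\u2014", "\u2212"]  # hyphen, en dash, em dash, minus sign

for ch in dashes:
    print(f"U+{ord(ch):04X}", ch, unicodedata.name(ch))


def year_span(start, end):
    """Join two years with an en dash, as in a lifespan."""
    return f"{start}\u2013{end}"


print(year_span(1850, 1920))
```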
Putting text in capitals to attract attention, for example on a YouTube video, is a practice I deplore.
A space goes outside parentheses but never inside, so write Hello (bonjour) Alice and not Hello ( bonjour ) Alice. LaTeX and Word know this, nothing to do there. In math, however, in the context of a function, there is no space before the opening parenthesis: write sin(x) and not sin (x), as the latter could be mistaken for a multiplication.
\times for ×.
&times; for ×. Wikipedia List.
digraph! That's the same idea as the Compose Key.
Please notice that memorising a list of numbers is stupid, so prefer another method on this page...
∅
"\u2205" (Java/JavaScript/Python), chr(0x2205) (Python), String.fromCharCode(0x2205) (JavaScript), Character.toString((char)0x2205) (Java).
Quickly write characters in lots of languages — Lexilogos or Type It.
Very handy: I recommend pinning the tab, or even keeping it in a small separate window, and knowing Alt+Tab!
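Among the escape forms listed above for ∅, Python has one more that avoids memorising numbers: a character can be written by its Unicode name, either in a string literal with \N{...} or at run time with unicodedata.lookup. A short sketch:

```python
import unicodedata

by_name = "\N{EMPTY SET}"                    # name escape in a string literal
by_lookup = unicodedata.lookup("EMPTY SET")  # same thing, resolved at run time
by_number = "\u2205"                         # hexadecimal escape

print(by_name == by_lookup == by_number == chr(0x2205))
print(unicodedata.name(by_name))
```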
Python has had built-in Unicode support since version 3; in Python 2 there were two types of strings, normal strings and unicode strings. Python also has the unicodedata module (import unicodedata) for more advanced Unicode-related operations.
chr(233) == chr(0xe9) == "é", ord("é") == 233
print("The \u03B3-rays are dangerous!")
"été".upper() == "ÉTÉ", "été".lower() == "été"
unicodedata.name("é") == 'LATIN SMALL LETTER E WITH ACUTE'
unicodedata.normalize('NFC', "é") # U+00E9
unicodedata.normalize('NFD', "é") # U+0065 U+0301
In interactive mode in my Python calculator, I can for example do uniline("αβγ") to get a nice list:
["U+03B1 α GREEK SMALL LETTER ALPHA",
"U+03B2 β GREEK SMALL LETTER BETA",
"U+03B3 γ GREEK SMALL LETTER GAMMA"]
Here is the definition of my function uniline; for more fun, see my file called uniutils.py:
import unicodedata

def uniname(s):
    """ é → LATIN SMALL LETTER E WITH ACUTE """
    if len(s) == 1:
        # '?' is returned for characters that have no name
        return unicodedata.name(s, '?')
    else:
        return [uniname(x) for x in s]

def uord(s):
    """ é → U+00E9 """
    if len(s) == 1:
        # hexadecimal code point, padded to at least 4 digits
        return 'U+' + hex(ord(s))[2:].zfill(4).upper()
    else:
        return [uord(x) for x in s]

def uniline(s):
    """ é → U+00E9 é LATIN SMALL LETTER E WITH ACUTE """
    if len(s) == 1:
        return uord(s) + ' ' + s + ' ' + uniname(s)
    else:
        # longer strings give one line per character
        return [uniline(x) for x in s]