A character

This should be an easy one, right? A string of text consists of characters. Everyone knows that!

I think that from a user's perspective, a character is something you can iterate over. Press the left arrow key and the cursor moves by one character. Sounds easy. Let's begin!

– Is A a character?
– Sure!
– Is Á a character?
– Sure is!
– Is fi a character?
– Nope! It's two characters.
– Wrong! It's a ligature and therefore a single character (depending on the font).
– Is => a character?
– No?
– It can be, depending on the font!
– Is 😃 a character?
– Sure.
– Is क्षि a character?
– Hm...
– It is!
– Is द्ध्र्य a character?
– Again?!
– Yes! And it is.
– Is ȧ̶̻̫̍̽̔ a character?
– Wut?!
– It is! And it's called Zalgo.

Modern text layout is a ridiculously complicated topic. It involves Unicode, TrueType fonts, shaping and much more. But for now, all we care about is the definition of a character.

In modern typography, all of the characters above are called grapheme clusters. This is what a character is from a computer's perspective, and it is what SVG uses as well.

A grapheme cluster is not a single Unicode code point (what one UTF-32 unit stores), but rather a sequence of one or more code points. To get grapheme clusters from a string, we have to pass it, along with a font, to a shaper.
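To make "a sequence of code points" concrete, here is a rough Python sketch that groups a string's code points into clusters. It is an approximation under loud assumptions, not the UAX #29 algorithm: the hypothetical helper `rough_clusters` only glues combining marks (general categories Mn, Mc, Me) and letters following a virama onto the previous cluster. It also uses no font, so font-level merges such as the fi ligature never show up here; that is the shaper's job at the glyph level.

```python
import unicodedata


def rough_clusters(text: str) -> list[str]:
    """Very rough approximation of grapheme clusters (NOT UAX #29).

    A code point joins the previous cluster if it is a combining mark
    (Mn, Mc, Me), or if it is a letter that follows a virama (canonical
    combining class 9), which roughly covers Indic conjuncts.
    Emoji ZWJ sequences, Hangul, regional indicators etc. are ignored.
    """
    clusters: list[str] = []
    for ch in text:
        attach = bool(clusters) and (
            unicodedata.category(ch) in ("Mn", "Mc", "Me")
            or (
                unicodedata.combining(clusters[-1][-1]) == 9
                and unicodedata.category(ch).startswith("L")
            )
        )
        if attach:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


for sample in ["A", "A\u0301", "fi", "=>", "😃", "क्षि", "द्ध्र्य"]:
    points = " ".join(f"U+{ord(c):04X}" for c in sample)
    print(f"{sample!r}: {len(rough_clusters(sample))} cluster(s), {points}")
```

With this sketch, A followed by a combining acute accent (U+0301), क्षि and द्ध्र्य each come out as a single cluster built from several code points, while fi stays two clusters because no font is involved.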


OK, so an SVG character is a grapheme cluster. It takes some time to wrap your head around it and to study modern typography, but it looks simple enough. Wrong!

First, SVG 1.1 doesn't actually define what a character is. Seriously. Open the spec and you won't find a definition anywhere.
Luckily, SVG 2 fixed this and provided us with a definition of... two kinds of characters?! What?!
Welcome to the world of SVG!

In SVG 2 we have an addressable character, aka a UTF-16 code unit, and a typographic character, aka an extended grapheme cluster (but not really).
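Because addressable characters are counted in UTF-16 code units, their number can differ both from the number of code points (which is what Python's `len` returns) and from the number of grapheme clusters. Here is a small sketch; `utf16_units` is a hypothetical helper name, not part of any library:

```python
def utf16_units(text: str) -> int:
    # Every UTF-16 code unit is 2 bytes; "utf-16-le" emits no BOM.
    return len(text.encode("utf-16-le")) // 2


for sample in ["A", "\u00C1", "A\u0301", "😃", "क्षि"]:
    print(
        f"{sample!r}: {len(sample)} code point(s), "
        f"{utf16_units(sample)} UTF-16 code unit(s)"
    )
```

So 😃 is one code point but two addressable characters (a surrogate pair). The precomposed Á (U+00C1) is one of each, and its decomposed form, A + U+0301, is two of each, even though both render as the same single typographic character.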
To quote the spec:

Text Segmentation defines a unit called the grapheme cluster which approximates the typographic character. A UA must use the extended grapheme cluster (not legacy grapheme cluster), as defined in UAX29, as the basis for its typographic character unit. However, the UA should tailor the definitions as required by typographic tradition since the default rules are not always appropriate or ideal — and is expected to tailor them differently depending on the operation as needed.

If someone knows what this means in human language, please let me know.

In short, all of the above simply means that the x, y, dx, dy and rotate attributes operate on addressable characters, and everything else works with typographic characters.
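Here is a sketch of how an x list could be mapped onto a string under that rule, as we read the SVG 2 text layout algorithm: values are consumed one per addressable character (UTF-16 code unit), a typographic character takes the value of its first code unit, and values landing on its remaining code units are skipped. Two assumptions to flag: typographic characters are approximated by a code point plus trailing combining marks, and `resolve_x` is a hypothetical name, not the API of any SVG library.

```python
import unicodedata
from typing import List, Optional, Tuple


def rough_typographic_chars(text: str) -> List[str]:
    # Assumed stand-in for typographic characters: a code point plus any
    # following combining marks. Real UAs use (tailored) UAX #29 clusters.
    out: List[str] = []
    for ch in text:
        if out and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def utf16_len(s: str) -> int:
    # Number of UTF-16 code units, i.e. addressable characters.
    return len(s.encode("utf-16-le")) // 2


def resolve_x(text: str, xs: List[float]) -> List[Tuple[str, Optional[float]]]:
    """Pair each typographic character with the x value it would use.

    None means no explicit x was given, so the character would simply
    follow the previous one in the normal text flow.
    """
    result: List[Tuple[str, Optional[float]]] = []
    index = 0  # index of the current addressable character
    for cluster in rough_typographic_chars(text):
        x = xs[index] if index < len(xs) else None
        result.append((cluster, x))
        # Skip the values that belong to this cluster's other code units.
        index += utf16_len(cluster)
    return result


print(resolve_x("😃A", [10, 20, 30]))
print(resolve_x("A\u0301B", [10, 20, 30]))
```

So for "😃A" with x="10 20 30", the emoji takes 10, the value 20 falls on its second code unit and is skipped, and A gets 30. The decomposed Á in the second call behaves the same way, because it also spans two addressable characters.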

Weirdly enough, character placement along a path (in the case of textPath) is done using typographic characters, not addressable ones, even though technically this is still part of the positioning phase.