014 - putting words on the screen

03 May, 2024

Every day we look at words on screens. This is the principal way that computers communicate information to us. This was true with the early computers that could only display text on a terminal, and it's true today with ChatGPT, which displays the results of the most cutting-edge AI models as words on your screen.

Over the decades, programmers have made it very easy to display words on the screen. Most developers today do not spend much time thinking about this problem. In my last ten years as a programmer I spent zero time thinking about how to get words to display on a screen. I simply put the words I wanted to display in HTML and then a web browser like Chrome or Safari made them appear on the screen. Now, I am writing a piece of software "from scratch" so I don't want to have Chrome put the words on the screen. I want to put them there myself.

At the lowest level, all a computer can do is store zeroes and ones. These are called bits. You can put several bits together to store larger numbers. For instance, with two bits, you can store four discrete values: 00, 01, 10, 11. We could use these values to count from zero to three: 0, 1, 2, 3. When we put 8 bits together we can store 256 discrete values. This is called a byte and it can store the numbers from 0 to 255. If we want to store numbers greater than 255 we can put several bytes together. The number of values that can be stored in a sequence of bits is two raised to the power of the length of the sequence. This is why many computer specs are powers of two (like a 256 GB hard drive or 512 MB of RAM).
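To make the arithmetic concrete, here is a small Python sketch. The function name `distinct_values` is made up for this illustration; the only thing it relies on is that n bits can hold 2 to the power of n different values.

```python
def distinct_values(bit_count: int) -> int:
    """Number of distinct values a sequence of `bit_count` bits can hold."""
    return 2 ** bit_count


for bits in (1, 2, 8, 16):
    count = distinct_values(bits)
    # The largest number is one less than the count because we start at 0.
    print(f"{bits} bits: {count} values, from 0 to {count - 1}")
```

Two bits give 4 values and eight bits give 256, matching the byte described above.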

Computers cannot store letters. Only numbers. To get them to store letters, we need to agree on a way to translate numbers into letters. For instance, we could say that A is 1 and Z is 26. The computer would also need to know about lower case letters: we could number them 27 to 52. We would also want some other symbols like ! and $. We could make these numbers 53 and 54. This is called an encoding. An encoding maps each available number to exactly one character.

An encoding only works if everyone we share data with also knows the encoding. If I call 1 "A" and someone else calls 1 "a", then when we share data with one another, all the capitalization will be reversed. To deal with this, standardized encodings were invented that everyone in the world could agree on.
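The toy encoding described above can be sketched in a few lines of Python. Everything here is hypothetical and only for illustration: the names `toy_encode` and `toy_decode`, the exact numbering, and the second table that puts lowercase letters first to stand in for "someone else's" encoding.

```python
import string

# Toy encoding from the text: A-Z are 1-26, a-z are 27-52, "!" is 53, "$" is 54.
MY_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + "!$"
# Someone else's (hypothetical) encoding: lowercase letters come first.
THEIR_ALPHABET = string.ascii_lowercase + string.ascii_uppercase + "!$"


def toy_encode(text: str, alphabet: str = MY_ALPHABET) -> list[int]:
    """Translate each character into its number (numbering starts at 1)."""
    return [alphabet.index(ch) + 1 for ch in text]


def toy_decode(numbers: list[int], alphabet: str = MY_ALPHABET) -> str:
    """Translate each number back into its character."""
    return "".join(alphabet[n - 1] for n in numbers)


numbers = toy_encode("Hi!")
print(numbers)                              # [8, 35, 53]
print(toy_decode(numbers))                  # Hi!
print(toy_decode(numbers, THEIR_ALPHABET))  # hI!
```

The last line shows the problem: the same numbers read with a different table come out with the capitalization flipped.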

The simplest widely used encoding is the "American Standard Code for Information Interchange", ASCII for short. ASCII uses only 7 bits, so it maps each of the 128 values from 0 to 127 to exactly one character. For instance, A is 65. Funnily enough, the Arabic numerals we are familiar with also have to be mapped to numbers. So 1 is 49 in ASCII. To store the word hello in ASCII encoding, you store five numbers in a row: 104 101 108 108 111. It can be a little confusing working with decimal numbers like this, because a single byte could be a one, two or three digit number. To solve this, programmers often represent numbers as hexadecimal values. Hexadecimal allows us to count from 0 to 15 with a single digit by using 6 extra letters. So you count 7, 8, 9, A, B, C, D, E, F. Any byte then fits in exactly two digits. We can represent hello in hexadecimal as 68 65 6C 6C 6F.
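Python's built-in `ord`, `str.encode` and `bytes.hex` make it easy to check these numbers. This is just a quick sketch for verifying the values, nothing more.

```python
word = "hello"

# Encode the word as ASCII: one byte per character.
ascii_bytes = word.encode("ascii")

decimal_codes = list(ascii_bytes)
hex_codes = ascii_bytes.hex(" ").upper()  # bytes.hex(sep) needs Python 3.8+

print(ord("A"), ord("1"))  # 65 49
print(decimal_codes)       # [104, 101, 108, 108, 111]
print(hex_codes)           # 68 65 6C 6C 6F
```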

Some languages use more characters in their writing than English. For example, in French, vowels commonly have accents on them. To address this, other encodings emerged. For example, the Latin-1 encoding keeps all of ASCII and uses the other 128 values of the byte for characters like á and é. Things get really crazy when you try to make an encoding for a language like Japanese or Chinese where there are thousands of characters - way more than you can ever fit in a single byte. To solve this problem, a standard called Unicode was invented. Unicode gives every character a number called a code point, and encodings like UTF-8 store each code point as a sequence of 1 to 4 bytes. This allows it to represent up to 1,112,064 distinct characters. For instance, the emoji "💩" is code point 1F4A9 in hexadecimal, which UTF-8 stores as four bytes: F0 9F 92 A9.
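Here is a short Python sketch that looks at a few characters in Latin-1 and in UTF-8, one common way of storing Unicode. It only uses the standard `str.encode` codecs.

```python
e_acute = "\u00e9"      # é
poo = "\U0001F4A9"      # 💩

# Latin-1 stores é in a single byte; UTF-8 needs two bytes for it.
latin1_bytes = e_acute.encode("latin-1")
utf8_e_bytes = e_acute.encode("utf-8")

# The emoji has no Latin-1 representation; in UTF-8 it takes four bytes.
code_point = ord(poo)
utf8_poo_bytes = poo.encode("utf-8")

print(latin1_bytes.hex(" ").upper())    # E9
print(utf8_e_bytes.hex(" ").upper())    # C3 A9
print(f"U+{code_point:X}")              # U+1F4A9
print(utf8_poo_bytes.hex(" ").upper())  # F0 9F 92 A9
```

Plain ASCII text comes out byte-for-byte identical in UTF-8, which is a big part of why UTF-8 became so widespread.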

Now we know how you store a word in a computer. Getting the computer to display the word on a screen is a whole other thing. Computer displays are made up of pixels. To display a character, you need to know which pixels to turn on and which ones to turn off to create the visual representation of the character. This information is called a glyph. Of course there is more than one way to visualize a character, so we group visually coherent sets of glyphs into a font. The font Times New Roman provides glyphs for all the upper and lowercase letters (and many other characters) with a particular visual style.

In the early days of computing, glyphs were stored as bitmaps like this:

[Image: a glyph stored as a bitmap]

A bitmap is a list of bytes, with each byte describing what value a specific pixel should have. In a grayscale bitmap, a value of 0 means a pixel should be black and a value of 255 means it should be white. For a color bitmap, we need three bytes for each pixel to represent each of the Red, Green and Blue values.
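As a rough sketch of that layout, here is a tiny hand-drawn glyph stored as a grayscale bitmap in Python. The 5 by 7 letter shape and the names `ROWS`, `pixel_at` and `to_rgb` are invented for this example; they are not taken from any real font.

```python
WIDTH, HEIGHT = 5, 7

# A made-up 5x7 drawing of the letter "A"; "#" marks an ink pixel.
ROWS = [
    ".###.",
    "#...#",
    "#...#",
    "#####",
    "#...#",
    "#...#",
    "#...#",
]

# Grayscale bitmap: one byte per pixel, 0 is black ink, 255 is white paper.
pixels = bytes(0 if cell == "#" else 255 for row in ROWS for cell in row)


def pixel_at(x: int, y: int) -> int:
    """Rows are stored one after another, so pixel (x, y) lives at y * WIDTH + x."""
    return pixels[y * WIDTH + x]


def to_rgb(gray: bytes) -> bytes:
    """Color bitmap: repeat each gray value for the red, green and blue bytes."""
    return bytes(channel for value in gray for channel in (value, value, value))


print(len(pixels), "bytes in grayscale,", len(to_rgb(pixels)), "bytes in color")
```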

Once computer monitor resolutions improved, it was possible to get glyphs to look more like the glyphs of traditional printing presses stamped on paper. To store these high resolution glyphs in bitmaps would be very storage intensive and every single font size would need a separate bitmap. To get around this, modern fonts store glyphs as mathematical vector equations describing shapes. When it's known what font size is wanted, the vector shapes are "rasterized" into pixel values at that particular font size and pixel density.
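Here is a very simplified Python sketch of rasterizing. It assumes the "glyph" is a plain triangle described by three corner points in a 0 to 1 coordinate space, and it decides each pixel by testing whether the pixel's center is inside the shape. Real fonts describe outlines with curves and real rasterizers also smooth the edges, so treat this as an illustration of the idea only. The names `inside` and `rasterize` are made up.

```python
def inside(polygon, x, y):
    """Even-odd test: count how many edges a ray going right from (x, y) crosses."""
    hit = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # this edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit


def rasterize(polygon, size):
    """Turn the vector shape into size x size pixels, sampling each pixel center."""
    rows = []
    for py in range(size):
        row = ""
        for px in range(size):
            x, y = (px + 0.5) / size, (py + 0.5) / size
            row += "#" if inside(polygon, x, y) else "."
        rows.append(row)
    return rows


# The same vector shape rasterized at two different "font sizes".
TRIANGLE = [(0.5, 0.0), (1.0, 1.0), (0.0, 1.0)]
for size in (4, 12):
    print("\n".join(rasterize(TRIANGLE, size)), "\n")
```

The shape is stored once as three points, yet it can be turned into pixels at any size, which is the advantage over keeping one bitmap per font size.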

"Rasterizing" glyphs is an expensive process. So, instead of rasterizing every letter on the screen, each letter is rasterized only once and stored at some coordinates on a large bitmap. The coordinates are stored separately if you need that glyph again, you can simply look up the coordinates, find that location on the bitmap, and copy the pixels. This is called a glyph atlas.

To put the letters on the screen, you need to know where to put them. In early bitmap fonts this was easy because each character was usually the same width. In modern fonts, the glyphs are all different widths. Also, there are special rules that when certain letters appear together, they should be closer. This is called kerning. For example, in many fonts short letters can be tucked in a little closer under the overhang of an "f".
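Positioning can be sketched as a running x coordinate that moves forward by each glyph's width, with a small adjustment for special letter pairs. The widths and the pair adjustment below are invented numbers, not measurements from any real font, and the name `layout` is hypothetical.

```python
# Hypothetical glyph widths in pixels (how far to move right after each glyph).
ADVANCE = {"f": 6, "a": 9, "i": 4, "W": 15}

# Hypothetical pair adjustments: tuck "a" one pixel closer when it follows "f".
PAIR_ADJUST = {("f", "a"): -1}


def layout(text, start_x=0):
    """Return (character, x position) for each character in `text`."""
    positions = []
    x = start_x
    previous = None
    for ch in text:
        x += PAIR_ADJUST.get((previous, ch), 0)  # apply the pair rule, if any
        positions.append((ch, x))
        x += ADVANCE[ch]
        previous = ch
    return positions


print(layout("fai"))  # [('f', 0), ('a', 5), ('i', 14)]
print(layout("Wi"))   # [('W', 0), ('i', 15)]
```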

In early fonts, each character mapped to exactly one glyph. Modern fonts often have something called "ligatures". A ligature is when two characters are merged together into a single glyph. It is very common for "f" and "l" and "f" and "i" to be ligatures: "fl" "fi". Here's the fi ligature in Times New Roman:

[Image: the fi ligature in Times New Roman]

You might notice that picture looks a bit blurry. That's because I resized it from 500 by 500 pixels to 250 by 250 pixels. Remember what I said earlier about having to rasterize glyphs at every font size you want to display?
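The ligature substitution described above can be sketched as a search-and-replace over the character sequence before glyphs are chosen. In this Python sketch the Unicode ligature characters U+FB01, U+FB02 and U+FB03 stand in for the merged glyphs, and the table `LIGATURES` and function `apply_ligatures` are made up. Real fonts carry their own substitution tables and work with glyph IDs rather than characters.

```python
# Character sequences and the single "glyph" that replaces them.
LIGATURES = {
    "ffi": "\uFB03",  # ﬃ
    "fi": "\uFB01",   # ﬁ
    "fl": "\uFB02",   # ﬂ
}

# Try longer sequences first so "ffi" wins over "fi".
SEQUENCES = sorted(LIGATURES, key=len, reverse=True)


def apply_ligatures(text):
    """Return the list of glyphs for `text`, merging known sequences."""
    glyphs = []
    i = 0
    while i < len(text):
        for seq in SEQUENCES:
            if text.startswith(seq, i):
                glyphs.append(LIGATURES[seq])
                i += len(seq)
                break
        else:  # no ligature starts here: one character, one glyph
            glyphs.append(text[i])
            i += 1
    return glyphs


print(apply_ligatures("flag"))    # 3 glyphs for 4 characters
print(apply_ligatures("office"))  # 4 glyphs for 6 characters
```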

Putting all these pixels on the screen is quite a bit of work, and with 4K monitors and Retina displays it gets even more intensive. Today's computers usually have a processing unit called a GPU which is specially designed for graphics work. We hear a lot about GPUs lately because they are also very useful for AI. The strength of GPUs is that they are massively parallel, meaning they can take a tiny piece of code and run it many times at the exact same time with different values. This is very useful for drawing thousands of pixels to the screen at a high frame rate. If you want to have a 60fps application, then every frame needs to be calculated and drawn in about 16.7 milliseconds or less (1000 milliseconds divided by 60).
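The frame budget is simple arithmetic, sketched here in Python. The function names are made up, and 3840 by 2160 is used as the usual 4K resolution.

```python
def frame_budget_ms(fps: float) -> float:
    """Milliseconds available to compute and draw one frame."""
    return 1000.0 / fps


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """How many pixel values get produced each second at full-screen redraw."""
    return width * height * fps


print(round(frame_budget_ms(60), 2))       # 16.67
print(pixels_per_second(3840, 2160, 60))   # 497664000
```

Roughly half a billion pixel values per second is the kind of workload where running the same tiny piece of code in parallel pays off.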

Drawing letters on a screen at 60fps is no different than drawing any other graphics on the screen, so, in a state of the art modern word processor, you definitely want to use the GPU for rendering your glyphs. To do this, you need to rasterize all the vector glyphs of the font to a glyph atlas and then upload the glyph atlas to the GPU. Then, every frame, you pass the GPU a list of coordinates that describe where every letter on the screen should go and where to find the appropriate glyph in the glyph atlas. The GPU then does the work of filling in all the necessary pixels. And voila! You have words on the screen.
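The per-frame list can be sketched in Python like this. No real graphics API is used here: the atlas table, the fixed 8 by 16 cell size, the `GlyphQuad` record and the byte layout in `pack_for_upload` are all hypothetical choices for illustration. In a real program the packed bytes would be handed to whatever GPU library you use.

```python
import struct
from dataclasses import dataclass

CELL_W, CELL_H = 8, 16  # hypothetical fixed glyph size in pixels

# Hypothetical atlas lookup: character -> top-left corner in the atlas bitmap.
ATLAS = {ch: (i * CELL_W, 0) for i, ch in enumerate("helo wrd")}


@dataclass
class GlyphQuad:
    screen_x: int  # where the glyph goes on the screen
    screen_y: int
    atlas_x: int   # where to find its pixels in the atlas
    atlas_y: int


def build_frame(text, origin_x=0, origin_y=0):
    """One quad per letter; a fixed advance keeps the sketch short."""
    quads = []
    x = origin_x
    for ch in text:
        atlas_x, atlas_y = ATLAS[ch]
        quads.append(GlyphQuad(x, origin_y, atlas_x, atlas_y))
        x += CELL_W
    return quads


def pack_for_upload(quads):
    """Flatten the quads into bytes: four unsigned 16-bit integers per glyph."""
    return b"".join(
        struct.pack("<4H", q.screen_x, q.screen_y, q.atlas_x, q.atlas_y) for q in quads
    )


quads = build_frame("hello world")
buffer = pack_for_upload(quads)
print(len(quads), "glyphs,", len(buffer), "bytes to send this frame")
```

Only 8 bytes per letter travel to the GPU each frame in this layout; the pixel data itself was uploaded once with the atlas.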

As you can see, it's pretty straightforward to put words on the screen.
