Unicode at ChiPy

I gave a talk at the last ChiPy on the relevance of Unicode to the typical programmer. The point I was attempting to make was that Unicode is a tool for writing culture‐independent software the same way high‐level languages were tools for writing machine‐independent software.

There’s a lot of interest in the industry, and particularly among socially‐minded nerds, in the role computers have in improving the lot of humanity. In the talk, I start out with a short historical review of electronic communication. I spend a lot of time on the evolution of the telegraph network into a system that allowed instantaneous global communication, but one that was very expensive and low fidelity.

Computers changed the equation, at least in the West, making communication cheap and universally available. When the network was computerized in the 1960s, its designers developed a new character set that was freed from the electromechanical limitations of the old network: ASCII. Unlike the old telegraph character sets, ASCII allows us to write high‐fidelity English. This is what allowed the computerized network to shift from being just a cheaper form of the telegraph network into an online, always‐available compendium of knowledge.

After ASCII, there was an explosion of character sets that all tried to provide the ability to write high‐fidelity text in every language on Earth. The problem is that the sets were incompatible and many were very poor from a technical perspective. What’s more, programmers dealing in text tended to use idioms that made sense in their own language, but perhaps not in others. The result is that there is now a lot of software that works great in English, adequately for other Latin‐based languages, and very poorly for everyone else.

Unicode tries to solve the problem by

  1. superseding all of the (often horrible) legacy character sets, and
  2. providing programmers with a set of tools to refer to language structures (such as words) and common tasks (such as sorting) in a language-independent way.

After explaining some of the features Unicode has for writing culture‐independent code, I explore the Unicode features of Python and demonstrate that many of the most important features aren’t yet available.
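As a rough illustration of the kind of thing involved (this sketch is not taken from the talk, and it assumes a current Python 3 standard library rather than the Python of the time), the snippet below shows two operations the standard `unicodedata` module and `str.casefold` do support, canonical normalization and caseless comparison, alongside one that plain `sorted` does not: language-aware ordering. The helper names `canonical_equal` and `caseless_key` are made up for this example.

```python
import unicodedata

# Two ways to spell "café": a precomposed é (U+00E9), or a plain "e"
# followed by a combining acute accent (U+0301).
composed = "caf\u00e9"
decomposed = "cafe\u0301"


def canonical_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def caseless_key(s: str) -> str:
    """Hypothetical helper: a key for simple caseless comparison."""
    return unicodedata.normalize("NFC", s).casefold()


# The raw strings differ code point by code point, but are canonically equal.
print(composed == decomposed)                # False
print(canonical_equal(composed, decomposed)) # True

# casefold() handles mappings that lower() misses, such as German ß.
print(caseless_key("Straße") == caseless_key("STRASSE"))  # True

# Plain sorted() orders by code point, so "Äpfel" lands after "zebra",
# which is not what a German or English reader would expect.
words = ["zebra", "Äpfel", "apple"]
codepoint_sorted = sorted(words)
print(codepoint_sorted)  # ['apple', 'zebra', 'Äpfel']
```

Getting a culturally correct order means applying something like the Unicode Collation Algorithm with language-specific tailoring. As far as I know, the standard library does not implement it directly; `locale.strxfrm` depends on whatever locales the platform provides, so a third-party library is the usual route.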

At any rate, here's the talk:

I really enjoyed giving the talk, and I would like to thank ChiPy—and in particular Brian Ray and Carl Karsten—for allowing me to give it.

In the future, I need to remember to repeat the questions from the audience into the microphone, as it’s impossible to hear anything the audience is saying. The last third is probably not worth watching, as it’s me answering questions you can’t hear.