Technology’s language bias

One of the many underlying tensions in Sri Lanka is that between people fluent in English and those who aren’t. After independence, when Sinhala was made the country’s official language, education was converted to “Swabasha”.

But the elite kept English. It is the medium in most private and international schools. University courses are mostly taught in English, and big corporations run their offices in it. An English education carries a huge wage premium.

All of which has resulted in an inevitable pushback.

The debate over English education has been a backdrop to my last couple of months, as I’ve been learning to code and scraping the web.

Just the other day I was scraping the Sri Lankan parliament website, and commenting to my grandpa how the HTML is all in English. But how can this be?
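To make that observation concrete, here is a minimal sketch using Python's built-in html.parser. The page snippet is made up for illustration (it is not the parliament site's actual markup), and the class name is my own. It separates the tag names from the text a reader actually sees.

```python
from html.parser import HTMLParser


class TagAndTextCollector(HTMLParser):
    """Collects tag names and visible text separately."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Tag names come from the HTML vocabulary, which is English-derived.
        self.tags.append(tag)

    def handle_data(self, data):
        # Visible text can be in any language the page author likes.
        if data.strip():
            self.text.append(data.strip())


# Hypothetical snippet: Sinhala content wrapped in standard HTML tags.
page = '<html lang="si"><body><h1>ශ්‍රී ලංකා</h1><p>ලංකා</p></body></html>'

collector = TagAndTextCollector()
collector.feed(page)
collector.close()

print(collector.tags)  # the English-looking scaffolding
print(collector.text)  # the Sinhala content
```

The content is Sinhala, but every structural word the scraper has to know (html, body, h1, p) is English.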

That question is the central point of a great Wired article:

In theory, you can make a programming language out of any symbols. The computer doesn’t care. The computer is already running an invisible program (a compiler) to translate your IF or <body> into the 1s and 0s that it functions in, and it would function just as effectively if we used a potato emoji 🥔 to stand for IF and the obscure 15th century Cyrillic symbol multiocular O ꙮ to stand for <body>. The fact that programming languages often resemble English words like body or if is a convenient accommodation for our puny human meatbrains, which are much better at remembering commands that look like words we already know.

But only some of us already know the words of these commands: those of us who speak English. The “initial promise of the web” was only ever a promise to its English-speaking users, whether native English-speaking or with access to the kind of elite education that produces fluent second-language English speakers in non-English-dominant areas.
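The "any symbols" point is easy to try out. Below is a rough sketch of a toy interpreter, not a real language: the two keyword tables, the run function and the tiny IF/PRINT grammar are all invented for illustration. I've reused the article's potato for IF, and borrowed the multiocular O for PRINT, since my toy has no body tag.

```python
# Hypothetical keyword tables: two "dialects" of the same toy language.
ENGLISH = {"if": "IF", "print": "PRINT"}
SYMBOLS = {"🥔": "IF", "ꙮ": "PRINT"}


def run(source, keywords):
    """Run a toy program and return the list of lines it prints."""
    output = []
    for line in source.strip().splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # Optional guard: IF <integer> runs the rest of the line
        # only when the integer is non-zero.
        if keywords.get(tokens[0]) == "IF":
            if len(tokens) < 2:
                raise SyntaxError(f"missing condition: {line!r}")
            if int(tokens[1]) == 0:
                continue
            tokens = tokens[2:]
        # Whatever remains must be a PRINT statement.
        if not tokens or keywords.get(tokens[0]) != "PRINT":
            raise SyntaxError(f"cannot parse line: {line!r}")
        output.append(" ".join(tokens[1:]))
    return output


english_program = """
print hello
if 1 print shown
if 0 print hidden
"""

symbol_program = """
ꙮ hello
🥔 1 ꙮ shown
🥔 0 ꙮ hidden
"""

print(run(english_program, ENGLISH))  # ['hello', 'shown']
print(run(symbol_program, SYMBOLS))   # ['hello', 'shown']
```

The interpreter only ever sees the internal names IF and PRINT. Swapping the table is all it takes to change the surface language, which is why the choice of English is a convention rather than a technical requirement.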

There are a couple of multilingual programming languages, and programming languages based on other natural languages, but they’re nowhere near as well supported or as widely extended.

Without a large community of fellow travellers, there aren’t big archives of questions and answers to query when you yourself have a problem, or huge repositories of packages and modules to extend your code.

I’ve recently been studying natural language processing, which is another domain with a huge bias towards English. The scripts of some languages still aren’t fully supported by Unicode, let alone by the tools built on top of it.

The Wired article ends with a hopeful message that we might eventually have “Swahili HTML” and “Russian HTML” in addition to “English HTML” (rather than HTML being synonymous with English). The author notes that people learn coding better in their native tongues, and that European writing was once synonymous with Latin before branching into many vernaculars.

But the answer probably has to be a blend. A computer language built on a small natural language is unlikely to compete with one built on a large language for usability and flexibility. But that doesn’t mean localised computer languages can’t be powerful, useful or a great pathway into coding.

Until then, a lot of the power of modern technologies will remain the privilege of those with English language skills.
