Technology’s language bias

One of the many underlying tensions in Sri Lanka is that between people fluent in English and those who aren’t. After independence, when Sinhala was made the country’s official language, education was converted to “Swabasha”.

But the elite kept English. It is the medium of instruction in most private and international schools. University courses are mostly in English, as are the offices of big corporations. There is a huge wage premium on an English education.

All of which has resulted in an inevitable pushback.

The debate over English education has been a backdrop to my last couple of months, as I’ve been learning to code and scraping the web.

Just the other day I was scraping the Sri Lankan parliament website and remarked to my grandpa that the HTML is all in English. But why should that be?

This is the central point of a great Wired article:

In theory, you can make a programming language out of any symbols. The computer doesn’t care. The computer is already running an invisible program (a compiler) to translate your IF or &lt;span&gt; into the 1s and 0s that it functions in, and it would function just as effectively if we used a potato emoji 🥔 to stand for IF and the obscure 15th century Cyrillic symbol multiocular O ꙮ to stand for &lt;span&gt;. The fact that programming languages often resemble English words like body or if is a convenient accommodation for our puny human meatbrains, which are much better at remembering commands that look like words we already know.

But only some of us already know the words of these commands: those of us who speak English. The “initial promise of the web” was only ever a promise to its English-speaking users, whether native English-speaking or with access to the kind of elite education that produces fluent second-language English speakers in non-English-dominant areas.

There are a couple of multilingual programming languages, and programming languages based on other natural languages, but they’re nowhere near as well supported or widely extended.
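
Even the mainstream languages only meet you halfway. Python 3, for instance, allows identifiers in almost any script (PEP 3131), but its keywords remain English-only. A quick illustration of my own:

```python
# Python 3 identifiers can use non-ASCII scripts (PEP 3131),
# but keywords like def, if and print remain English.
නම = "ශ්‍රී ලංකාව"   # a Sinhala variable name ("nama", meaning "name")
print(නම)            # prints: ශ්‍රී ලංකාව
```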

Without a large community of fellow travellers there aren’t big archives of questions and answers to query when you yourself have a problem, or huge repositories of packages and modules to extend your code.

I’ve recently been studying natural language processing, which is another domain with a huge bias towards English. The scripts of some languages still aren’t fully supported by Unicode.

The Wired article ends with a hopeful message that we might eventually have “Swahili HTML” and “Russian HTML” in addition to “English HTML” (rather than HTML being synonymous with English). The author notes that people learn coding better in their native tongues, and that European writing was once synonymous with Latin before branching into many vernaculars.

The answer probably lies in a blend. There’s unlikely to be any single small natural-language-based programming language that can compete with a large one for usability and flexibility. But that doesn’t mean localised programming languages can’t be powerful, useful or a great pathway.

Until then a lot of the power of modern technologies will remain the privilege of those with English language skills.

Computers are alien to me

I’ve been reading a series of technical and computing books recently, as I pick up my Python studies. I naively stumbled into coding assuming all I needed were the terms and grammar of a new “language”. But there’s so much more to learn for someone who has mostly had a liberal arts education.

It’s a completely different mode of thinking. You often see coding/development described as “problem solving”, but I don’t think that is quite accurate. I have been bumbling my way through a kind of alien logic.

Actually, the best example comes from a recent problem I encountered on Brilliant:


Define a comparison as an operation which takes in one number and tells you whether it is larger than, smaller than, or equal to, another. Suppose you are given a sorted array with 1000 elements, and you can use at most n comparisons to determine whether a certain number is in this array. What is the smallest value of n such that you will always be capable of making this determination, regardless of the values of the elements in the array?

Essentially what they are asking is this: given a number, determine whether it appears in a sorted list of 1000 numbers. What’s the smallest number of comparisons I would need to guarantee an answer? I sat stumped for quite a while with this problem.

Eventually I gave up. I couldn’t fathom how to even approach it. Having now seen the solution, I can’t imagine I would ever have arrived there through deduction alone. It wasn’t so much a lack of knowledge of terms; my regular heuristics simply didn’t apply.
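
For the curious, the standard approach is binary search: use each comparison to halve the remaining candidates. A quick Python sketch of the idea (my own illustration, not Brilliant’s solution):

```python
def contains(sorted_values, target):
    """Return (found, comparisons): whether target is in sorted_values,
    and how many three-way comparisons it took to decide."""
    lo, hi = 0, len(sorted_values) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1                  # one three-way comparison: <, > or ==
        if sorted_values[mid] == target:
            return True, comparisons
        elif sorted_values[mid] < target:
            lo = mid + 1                  # target can only be in the upper half
        else:
            hi = mid - 1                  # target can only be in the lower half
    return False, comparisons

# Each comparison halves the remaining candidates, so a 1000-element
# array needs at most 10 comparisons in the worst case (2**10 = 1024 > 1000).
print(contains(list(range(1000)), 765))   # (True, n) with n <= 10
```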

Another example comes from a book called How Software Works, which I am currently working my way through. This is from a chapter on protecting passwords:


Authentication systems need a way to strengthen hashing without a performance-crushing number of hash iterations; that is, they need a method of storing passwords that requires an impractical time investment from attackers without creating an equally unrealistic time burden on legitimate access. That method is called salt… The salt is a string of characters, like a short, random password, that is combined with the user’s password before hashing. For example, user mrgutman chooses falcon as his password, and the system generates h38T2 as the salt….

The salt and password can be combined in various ways, but the simplest is appending the salt to the end of the password, resulting in falconh38T2 in this example. This combination is then hashed, and the hash code stored in the authentication table along with the username and the salt…

If there are, say, 100,000 users in a stolen authentication table, and the salts are numerous enough that no salt is duplicated in the table, the attacker will need to create 100,000 tables. At this point, we can’t even call them precomputed tables because the attacker is creating them for each attack.


Buried in a section about scrambling (not the right word, I know) passwords so that they cannot be reverse engineered and easily matched, we have this interesting solution: simply append a random string to the password before scrambling. Put this way it’s quite simple and I can follow the logic. But getting this far requires a kind of thinking to which I am unaccustomed.
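
To make the idea concrete, here’s a toy Python sketch of salted hashing. The mrgutman/falcon example comes from the quote above; everything else (SHA-256 via hashlib, the helper names) is my own illustration. As the book hints, real systems would also use a deliberately slow, iterated password-hashing function rather than a single plain hash.

```python
import hashlib
import secrets

def hash_password(password):
    """Generate a random salt, append it to the password, then hash."""
    salt = secrets.token_hex(4)      # a short random string, like 'h38T2'
    digest = hashlib.sha256((password + salt).encode()).hexdigest()
    return salt, digest              # both are stored in the auth table

def verify(password, salt, stored_digest):
    """Re-append the stored salt and compare the hashes."""
    digest = hashlib.sha256((password + salt).encode()).hexdigest()
    return secrets.compare_digest(digest, stored_digest)

salt, digest = hash_password("falcon")   # mrgutman's password from the quote
assert verify("falcon", salt, digest)
```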

On the surface, the observation that different fields have different approaches isn’t particularly profound. But it’s not often that we get to feel out of our depth on an “I don’t even understand how to logic this out” level. I am definitely feeling that right now.


(As always, my emphasis)

Another adjacent possible

Working my way through one of the more fascinating technology books I’ve ever come across, Code by Charles Petzold, I stumbled across this passage:

…nobody in the nineteenth century made the connection between the ANDs and ORs of Boolean algebra and the wiring of simple switches in series and in parallel. No mathematician, no electrician, no telegraph operator, nobody. Not even that icon of the computer revolution Charles Babbage (1792–1871), who had corresponded with Boole and knew his work, and who struggled for much of his life designing first a Difference Engine and then an Analytical Engine that a century later would be regarded as the precursors to modern computers…

This is from a chapter on Boolean logic (aka Boolean algebra), which you might have come across if you have ever studied programming, statistics or electrical engineering.

I’ve never before had it explained to me in such a cogent fashion. But what this section highlights in particular (and the book as a whole rams home) is the power of bringing together seemingly disconnected ideas, theories and fields.

…What might have helped Babbage, we know now, was the realization that perhaps instead of gears and levers to perform calculations, a computer might better be built out of telegraph relays…
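
The connection itself is simple enough to state in code. A toy Python sketch of my own (not from the book): switches in series conduct only when both are closed, which is exactly Boolean AND; switches in parallel conduct when either is closed, which is OR.

```python
def series(a, b):
    """Two switches in series: current flows only if both are closed (AND)."""
    return a and b

def parallel(a, b):
    """Two switches in parallel: current flows if either is closed (OR)."""
    return a or b

# The truth tables line up with Boolean algebra exactly:
for a in (False, True):
    for b in (False, True):
        print(a, b, "series:", series(a, b), "parallel:", parallel(a, b))
```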

This is a great book if you want to understand how computers work: it combines engineering and information theory to construct a virtual computer step by step, starting with a simple light bulb circuit and moving up through logic gates, operating systems and graphical interfaces.

But it is arguably more valuable in demonstrating how something as complex as a computer draws from many fields.