Programming Languages are [Syntax Error]

Have you ever wondered if there are programming languages which are preferable for left- or right-handers? Or what the general distribution of keystrokes is while you are programming? No? Well, tough luck because this is what we are going to have a look at today.

Code, like any other electronic document, is a string of characters typed on a keyboard, so every character can be mapped to a keystroke made by the writer. As you read this, you are basically following my typing. This got me wondering whether each programming language has a general keystroke pattern. The question applies both to the standard identifiers (taken from the documentation) and to a complete script or program. The former give a more general overview of the language; the latter allows a comparison with natural languages.

Raiders of the Lost Documentation

Let’s focus on two languages here: Java 10 and Python 3.6. Both provide documentation indices… in HTML. Oh boy, you know what that means. Crawling! To be fair though, this was relatively easy because of the list-like structure of both indices. With a complete text of all identifiers, we count the occurrences of each character and plot them onto a keyboard. The major problem here is that some characters are the secondary (shifted) symbol on a key. Let’s have a look at the US keyboard layout.

[Image: US keyboard layout]

Note: The image with the US keyboard layout can be found here.

You can see, for example, that _ and - are on the same key. For such cases I built a mapping that assigns both symbols to the same key. A second hard-coded map holds the keys’ positions on the keyboard. These positions are points in a 2D coordinate system; they later tell the plotter where each key is located and where the matching color should go.
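The two maps and the counting step could be sketched roughly as follows. This is only an illustration, not the code behind the plots: the names `SHIFTED_TO_BASE`, `to_key`, `count_keys` and `KEY_POSITIONS` are hypothetical, the row offsets are made-up values, and the sample identifiers stand in for the crawled index.

```python
from collections import Counter

# Shifted symbol -> unshifted symbol on the same physical key (US layout).
SHIFTED_TO_BASE = {
    "_": "-", "+": "=", "{": "[", "}": "]", "|": "\\",
    ":": ";", '"': "'", "<": ",", ">": ".", "?": "/", "~": "`",
    "!": "1", "@": "2", "#": "3", "$": "4", "%": "5",
    "^": "6", "&": "7", "*": "8", "(": "9", ")": "0",
}


def to_key(ch):
    """Return the physical key, named by its unshifted symbol, that produces ch."""
    if ch.isalpha():
        return ch.lower()
    return SHIFTED_TO_BASE.get(ch, ch)


def count_keys(identifiers):
    """Count keystrokes per physical key; whitespace is skipped."""
    counts = Counter()
    for name in identifiers:
        counts.update(to_key(ch) for ch in name if not ch.isspace())
    return counts


# Key positions as (x, y) points. The offsets only approximate the
# staggering of the rows and are assumptions, not measured values.
ROWS = ["`1234567890-=", "qwertyuiop[]\\", "asdfghjkl;'", "zxcvbnm,./"]
ROW_OFFSETS = [0.0, 1.5, 1.75, 2.25]
KEY_POSITIONS = {
    key: (offset + col, -row)
    for row, (keys, offset) in enumerate(zip(ROWS, ROW_OFFSETS))
    for col, key in enumerate(keys)
}

# Hypothetical sample; the real input would be the crawled identifiers.
key_counts = count_keys(["str.join", "dict.get", "__init__"])
```

Naming each key by its unshifted symbol keeps both maps keyed the same way, so a plotter can look up a position directly from a count entry.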

Counting Words and Peas just like Cinderella

After going over all standard identifiers and counting all the keystrokes, we get the following images for Java and Python. The colors represent the number of strokes. The redder the key, the more often it gets pressed by the user.

[Images: keystroke heatmaps for Java and Python]

The colors are normalized to 7 categories in each picture, so the two pictures are comparable. Special keys such as space and shift are omitted because their high stroke counts would distort the color scale. The first notable thing is the different number of characters in the documentation: Java’s index has over 10 times more than Python’s. Either Java has more functionality or more complex identifiers; it is probably both. Beyond that, both languages make heavy use of the letters e and t. This is to be expected because the identifiers use English vocabulary, and English has very high frequencies for these letters. Java also has a lot of occurrences of r, i, o, a, s and n, which are also high-frequency letters in English. For a better comparison I generated the difference of the two images. Green/blue means that a key is used more often in Python; red/yellow means more use in Java.

[Image: difference between the Java and Python heatmaps]

One point that becomes clearer now is the usage of the special characters on the number keys. Python uses them more often than Java, whereas Java makes more use of the symbols < and > where Python uses [ and ]. The key shared by . and > has a similar count in both languages because both use . to call functions on objects. Python favors _ where Java has -, but because they are on the same key the difference is not visible.
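One way to compute such a difference is sketched below. It assumes the comparison is made on relative frequencies, which compensates for Java’s much larger total; the post does not say how the difference image was actually produced, and the function names and sample counts are hypothetical.

```python
def relative_frequencies(key_counts):
    """Turn absolute keystroke counts into shares of the total."""
    total = sum(key_counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in key_counts.items()}


def frequency_difference(counts_a, counts_b):
    """Per-key difference of shares; positive means relatively more use in counts_a."""
    freq_a = relative_frequencies(counts_a)
    freq_b = relative_frequencies(counts_b)
    keys = sorted(set(freq_a) | set(freq_b))
    return {key: freq_a.get(key, 0.0) - freq_b.get(key, 0.0) for key in keys}


# Hypothetical counts, keyed by physical key.
python_counts = {"[": 3, ".": 1}
java_counts = {",": 1, ".": 1}
diff = frequency_difference(python_counts, java_counts)
```

The sign of each entry could then pick the color family (green/blue or red/yellow) and its magnitude the intensity.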

A word about scaling: the keystroke counts in the images are scaled logarithmically. So a change for a key with few strokes has a higher impact on the color than for a key with many. Imagine a key which gets only 10 strokes compared to one which has 5000. For the first key an additional keystroke is 10% of its total occurrences, whereas for the second key the change is only marginal. It is not really interesting whether a key gets hit 5000 or 5001 times; both are in the category ‘hit a lot’. The logarithmic scale captures this.
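A minimal sketch of logarithmic binning into 7 categories follows. The function name `log_category` and the exact formula are assumptions for illustration; the original plots may use different bin edges.

```python
import math


def log_category(count, max_count, n_categories=7):
    """Map a keystroke count to a category 0..n_categories-1 on a log scale.

    Assumes 0 <= count <= max_count and max_count > 0.
    """
    if count <= 0:
        return 0
    # log1p(x) = log(1 + x); it compresses large counts.
    fraction = math.log1p(count) / math.log1p(max_count)
    # The busiest key would land exactly on n_categories, so clamp it.
    return min(int(fraction * n_categories), n_categories - 1)


# 5000 and 5001 strokes end up in the same top category.
same_bucket = log_category(5000, 5001) == log_category(5001, 5001)
```

With this scheme the low categories cover narrow count ranges and the high ones cover wide ranges, which is the behavior described above.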

Neither of the programming languages seems to favor right- or left-handers. It looks more like Java works on lower rows of keys than Python, so it is more an up-and-down issue than a left-and-right one. One might say that Java is more centered around the core of the keyboard whereas Python spreads over the whole keyboard with a slight tendency to the right.
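To put numbers on such an impression, the counts could be summed per hand and per row. The sketch below assumes the common touch-typing split of the US layout (left hand up to 5, t, g and b) and counts keyed by the unshifted symbol of each key; both the split and the names are assumptions, not part of the original analysis.

```python
# Keys typed with the left hand under the usual touch-typing convention.
LEFT_HAND = set("`12345qwertasdfgzxcvb")

KEY_ROWS = {
    "number": "`1234567890-=",
    "top": "qwertyuiop[]\\",
    "home": "asdfghjkl;'",
    "bottom": "zxcvbnm,./",
}


def hand_share(key_counts):
    """Return (left, right) shares of all counted keystrokes."""
    total = sum(key_counts.values())
    if total == 0:
        return 0.0, 0.0
    left = sum(n for key, n in key_counts.items() if key in LEFT_HAND)
    return left / total, (total - left) / total


def row_share(key_counts):
    """Return the share of keystrokes that falls on each keyboard row."""
    total = sum(key_counts.values())
    if total == 0:
        return {name: 0.0 for name in KEY_ROWS}
    return {
        name: sum(key_counts.get(key, 0) for key in keys) / total
        for name, keys in KEY_ROWS.items()
    }


# Hypothetical counts, keyed by physical key.
example_counts = {"e": 3, "t": 2, "i": 4, ".": 1}
left, right = hand_share(example_counts)
rows = row_share(example_counts)
```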

Tap, Tap, Tap, Ding!

Another interesting thing to show is the typing behavior whilst writing code. The following animation shows one of my random Python scripts and the distribution of keystrokes every 100 strokes as they appear in order of reading.
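The frames of such an animation could be produced along these lines. The sketch assumes that each frame shows the cumulative counts up to that point and that whitespace is not counted as a stroke; `cumulative_snapshots` is a hypothetical helper, not the code behind the animation.

```python
from collections import Counter


def cumulative_snapshots(text, step=100):
    """Yield (strokes_so_far, counts) after every `step` counted characters."""
    counts = Counter()
    strokes = 0
    for ch in text:
        if ch.isspace():
            continue
        counts[ch] += 1
        strokes += 1
        if strokes % step == 0:
            # Copy so that later updates do not change earlier frames.
            yield strokes, counts.copy()
    if strokes % step:
        # Final, partial frame for the remaining strokes.
        yield strokes, counts.copy()


# Hypothetical input: 250 strokes give frames at 100, 200 and 250.
frames = list(cumulative_snapshots("ab" * 125, step=100))
```

Each frame can then be colored and drawn like a single still image.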


Apparently, I am not a fan of the letter j. But what if we took a text in English and compared it to the code? Let’s take the Wikipedia entry for The Hitchhiker’s Guide to the Galaxy. Note that because it is much longer than my code, the scaling gets another step for more than 8000 keystrokes and the resolution is now 1000 keystrokes.


End of line

Let me summarize what we learned.

  • Programming languages behave like English because the identifiers are in English most of the time,
  • Special characters make programming languages special (hence the name) and unique,
  • There is no general right- or left-hand language, just as the keyboard layout is not oriented toward left or right,
  • Typing code like normal text leads to the same stroke patterns as for natural languages,
  • Other programming languages can be included in this analysis.

Of course, nobody writes code like normal text. But to get a feeling for the vexed typing you do when you are looking for a bug, I would need a key logger, and nobody wants that. And therefore back to Leroy Anderson, Master of Typewriters.

Built with Jupyter Notebooks and matplotlib.