For those not familiar with Stack Overflow, it's "a question and answer site for professional and enthusiast programmers." Since 2011 they have conducted an annual developer survey and made the anonymized raw data available online. The survey covers a number of topics, some serious and others more lighthearted. For example, in previous years: more than 75% of developers drank at least one caffeinated beverage per day, and 1% drank ten or more per day; 49% preferred dogs while 35% preferred cats; and 65% thought Star Wars is better than Star Trek.
One of the more controversial findings, reported on numerous blogs and media outlets, concerns spaces vs. tabs. If you have never worked as a developer or spent time around developers, an argument over whether to indent code with spaces or tabs might seem frivolous in the extreme. However, it is something a lot of developers take very seriously.
A Google search on 'spaces vs. tabs' returns more than 24 million results. David Robinson, a data scientist at Stack Overflow, analyzed the survey data and found an interesting relationship between the use of spaces vs. tabs and salary: developers who use spaces earn more than those who use tabs. The R script he used is available on GitHub.
Rather than just report what he found, I decided to see if I could reproduce the results. I prefer Python and Pandas (there is also a vigorous debate in the data world about which is better, Python or R), and my script is also available on GitHub. The script returns a dictionary of key-value pairs: the key is the developer's amount of experience and the value is the average salary for that experience level. The amount of data is small, so I chose to plot it using Excel. The relationship for the 2017 data is clear to see:
Note: the data also includes a 'both' category for those who switch between tabs and spaces. I did not include this data in the plot because it is very similar to the graph for tabs above.
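For readers who want to try something similar, here is a minimal sketch of the kind of calculation described above. It is not the script from GitHub. The column names (TabsSpaces, YearsProgram, Salary) are assumptions about how the survey file is laid out, the helper name mean_salary_by_experience is hypothetical, and the rows are made-up toy values used only to show the shape of the result, not survey data.

```python
import pandas as pd


def mean_salary_by_experience(df, indent_style,
                              style_col="TabsSpaces",
                              exp_col="YearsProgram",
                              salary_col="Salary"):
    """Return {experience level: mean salary} for one indentation style.

    The column names are assumptions; adjust them to match the survey file.
    """
    # Keep only respondents with the requested style and a reported salary.
    subset = df[(df[style_col] == indent_style) & df[salary_col].notna()]
    # Average the salary within each experience bucket, then convert to a dict.
    return subset.groupby(exp_col)[salary_col].mean().to_dict()


# Toy rows for illustration only (NOT real survey responses).
toy = pd.DataFrame({
    "TabsSpaces": ["Spaces", "Spaces", "Tabs", "Tabs", "Both", "Spaces"],
    "YearsProgram": ["1 to 2 years", "1 to 2 years", "1 to 2 years",
                     "2 to 3 years", "1 to 2 years", "2 to 3 years"],
    "Salary": [50000.0, 60000.0, 45000.0, None, 52000.0, 70000.0],
})

spaces = mean_salary_by_experience(toy, "Spaces")
tabs = mean_salary_by_experience(toy, "Tabs")
print(spaces)
print(tabs)
```

With the real survey file, the toy DataFrame would be replaced by something like pd.read_csv on the downloaded CSV, and the two dictionaries could then be pasted into Excel or plotted directly.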
It is difficult to explain this relationship, but the difference is significant. I decided to test whether it holds for previous years as well. Unfortunately, 2015 is the only earlier year in which the question was asked, and its experience and salary data were recorded differently from 2017, but we can still plot the data:
The same relationship seems to exist in the 2015 data. Of course, correlation does not imply causation. The website Spurious Correlations has numerous examples of very close correlations that have no relationship in the real world: for example, US spending on science has a 99.8% correlation with suicides by hanging, and accidental pool drownings correlate with Nicolas Cage movies. Even if you are not a Nicolas Cage fan, it would be very unfair to claim any kind of causation between his film roles and accidental drownings. The world is full of strange coincidental correlations, and this spaces vs. tabs correlation could just be one of those. It is interesting, however, that the correlation also seems to exist in the 2015 dataset. Do you think there's a real relationship, or is there something else at play?