WARNING: This document was not written while sober. Apparently I think I’m hilarious when I’m drunk, so don’t take the tone of this document too seriously. This is not an academic document, and it makes extensive use of adult language.

1 About This Document

1.1 tl;dr

  • Version: 2.0
  • Text License: CC BY-SA.
  • Code License: public domain
  • Naughty Language: yes

1.2 Long Version

This document is licensed under a Creative Commons license.

In addition, all source code presented in this document, which appears in boxes set off from the text, is released into the public domain. Or if you live in some kind of fake country that doesn’t recognize the public domain, you may treat it as MIT licensed. I really don’t give a fuck is what I’m trying to say.

This article uses adult language. I assume that only an adult would be interested in this topic, but if you’re a giant weepy baby, feel free to read someone else’s inferior explanation of how this stuff works.

2 An Introduction to Parallelism

People talk a lot about parallelism these days. The basic idea is really simple. Parallelism is all about independence: literally the ability to do multiple things at the same time and still get the same answer every time. Mostly, parallelism isn’t the hard part. Really the only reason anyone thinks this stuff is hard is that software engineering is an inherently dysfunctional discipline practiced exclusively by sociopathic masochists.
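To make the "independence" bit concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: simulate is a hypothetical stand-in for one independent chunk of work, and the seeds are arbitrary. It uses a thread pool only because that keeps the sketch easy to run anywhere; for CPU-bound work in CPython you would probably swap in ProcessPoolExecutor, which has the same map interface.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed):
    """Hypothetical stand-in for one independent unit of work.

    It only touches its own input, so no task cares what the others do.
    """
    x = seed
    for _ in range(1000):
        x = (x * 1103515245 + 12345) % 2**31  # toy linear congruential step
    return x


def run_serial(seeds):
    # One task after another, in order.
    return [simulate(s) for s in seeds]


def run_parallel(seeds, workers=4):
    # Executor.map hands tasks to the pool but returns results in input
    # order, so the answer does not depend on which task finishes first.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, seeds))


seeds = list(range(16))  # arbitrary inputs, assumed for the example
serial = run_serial(seeds)
parallel = run_parallel(seeds)
print(serial == parallel)
```

Because no task reads or writes anything another task touches, the serial and pooled runs should agree no matter how the work gets scheduled. That is the whole trick.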

Programmers like to act like parallelism is this super complicated thing that the hoi polloi are too impossibly dumb to ever grasp. In actual fact, parallelism is really easy most of the time, especially for scientific workflows. Making good use of 15 cores might be a challenge for an Android app that interprets the stomach rumbling sounds of those near you as “hunger” or “you know damn well that guy just ate” (dear venture capitalists, call me 😉). But scientific workflows tend to be very (eheheheh) regular and predictable.

There’s a great quote about this by trollmaster Linus Torvalds that assholes often like to try to take wildly out of context:

The whole “let’s parallelize” thing is a huge waste of everybody’s time. There’s this huge body of “knowledge” that parallel is somehow more efficient, and that whole huge body is pure and utter garbage. Big caches are efficient. Parallel stupid small cores without caches are horrible unless you have a very specific load that is hugely regular (ie graphics).

The only place where parallelism matters is in graphics or on the server side, where we already largely have it. Pushing it anywhere else is just pointless.