A Dispatch on: Message Passing

An article I reread often is Memory Safety Simplifies Microprocessor Design. It describes a processor that is memory-safe by design, achieving safety through features like instruction level concurrency and message passing. Joe Armstrong has a talk called How we program multicores in which he recommends a similar message passing instruction set primitive. What is it about message passing specifically that makes it safer? More importantly, why use it instead of the shared memory approach that nearly every modern processor uses?

This isn't the first time that I've been a little confused about message passing. When I started reading Fault-Tolerant Message-Passing Distributed Systems I found myself asking why the term "message-passing" was included in the title. At the time I assumed that all distributed systems were message passing systems. I eventually learned that there are shared memory distributed systems. But, that didn't help me define message passing. When I did define it, something that was clearly message passing but didn't fit the definition would force me to find a new definition.

There’s the obvious self-descriptive explanation that message passing is about communicating by passing messages around. But that obvious explanation is what led to my confusion. Message passing instructions on a processor are possible, because you can just shove messages in a queue and deliver them. But what makes that more useful than just putting values into memory and having another process read that memory? Similarly, programming languages like Go have channels that provide message passing, but those are implemented with shared memory.

So I went back to the basics. In distributed systems, we talk about two different methods of communication: shared memory and message passing. They are equivalent, as in they solve equivalent problem and they can be implemented in terms of each other. Another example of equivalence is atomic broadcast and consensus.

The more I thought about this the more clear it was that my definitions of message passing have been too narrow.1 In reality, message passing is not about passing messages.

Well, there's a missing qualifier there. Message passing is not always about passing messages. Obviously, if you have a system in which you pass messages, then you have a message passing system, that is tautological. But there are cases where one might not actually notice they are doing the semantic equivalent of passing messages.

Many definitions of message passing assume it is a means to an end. Search engines surface the Wikipedia article about Message Passing, which makes this assumption. The article makes a distinction between message passing and directly invoking a function. This view sees message passing as an implementation detail, not as a high level concept. The article doesn't even mention shared memory, but it does imply that message passing requires copying memory whereas function or procedure invocation does not. This is similar to problems understanding distributed systems caused by people believing that they require multiple machines connected by a network. Sure, that is an accurate way to describe some distributed systems, but it doesn’t explain the concept as a whole. Google did provide a link to the ScienceDirect page on Message Passing, which contains this definition:

Message passing is a distributed memory approach in which each processor has its own memory address space and does not share variables among the processors.

This is closer to my understanding of message passing. One reason why this is a better example is because it places PCIe squarely outside of being a message passing system. While PCIe is a transaction based packet switched networking protocol, it implements a shared memory system and is therefore not a message passing system. It implements direct memory access, which provides a shared memory space for all of the components involved. There are similar systems for inter-machine direct memory access that usually fall under the umbrella term RDMA, or remote direct memory access. Despite the fact that packet switched networks are used to implement both of these technologies, they are shared memory systems.2

The key difference between shared memory and message passing is whether the shared address space is mutable or immutable. Let's add some nuance. First, we define communication in terms of a shared address space: sending a message is writing to an address, and receiving a message is reading from an address. Next, we define whether, once a shared address has been written to, if it can be written to again. If it can, then it's mutable, so it implements shared memory. If it cannot, then it's immutable, and it implements message passing. This is the reason why a memory-safe processor design would use message passing instead of shared memory, by definition immutable memory is safe. Message passing, regardless of how it's implemented, is conceptually write-once memory. Shared memory, regardless of how it's implemented, is conceptually write-many memory.

This is a much wider definition of both shared memory and message passing, but it helps to clear up some of the potential confusion around the usage of these terms. It would be helpful to redefine these two terms to mutable shared addressing and immutable shared addressing, but like many things in computing, it is unlikely that changing the terms we use would be effective or useful.


  1. I'd experienced this same narrow understanding when I began to study distributed systems formally. My new understanding of message passing was helped by already having gone through something similar. It made me much more comfortable believing that most of the definitions and literature are wrong in about this concept. ↩︎

  2. It's important to note here that while two things might be equivalent, that does not mean they are the same. While atomic broadcast and consensus are equivalent they are conceptually different because they are solving two different problems. It's that the two problems are equivalent. In the case of atomic broadcast and consensus, the problems are ensuring all non-faulty processes deliver the same message and all non-faulty processes decide the same value, respectively. The problems are equivalent because if you can deliver the same message across all non-faulty processes, then you can choose a single value across all non-faulty processes. Similarly, if you choose a single value across all non-faulty processes, then you can deliver the same message to all non-faulty processes. Shared memory provides access to a shared address space while message passing provides access to a communication substrate. If you have a shared address space, then you can implement a communication substrate. If you have a communication substrate, you can implement a shared address space. ↩︎