Hello! 👋🏽 Welcome to the first proper issue of PL Papers You Might Love. Today, I'll be taking a look at *A Flexible Type System for Fearless Concurrency* (PLDI 2022) by Mae Milano, Joshua Turcotti, and Andrew C. Myers (paper PDF).
Over the past decade or so, Rust has popularized the notion of "fearless concurrency." The Rust compiler prevents data races[^1] at compile time by maintaining an ownership-and-borrowing system that is integrated with its type system. In doing so, Rust has carved out a niche of its own amongst mainstream languages.[^2]
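To make that concrete, here's a tiny hand-written illustration (mine, not from the paper) of the kind of thing the Rust compiler catches. Moving ownership into a thread is fine; sharing a mutable `Vec` across threads is rejected at compile time:

```rust
use std::thread;

fn main() {
    let mut data = vec![1, 2, 3];

    // OK: ownership of `data` moves into the spawned thread, so only one
    // thread can ever touch it.
    let handle = thread::spawn(move || data.push(4));
    handle.join().unwrap();

    // Rejected at compile time: the closure would mutably borrow `shared`
    // while the main thread also mutates it.
    // let mut shared = vec![1, 2, 3];
    // thread::spawn(|| shared.push(4)); // error[E0373]: closure may outlive
    //                                   // the current function, but it
    //                                   // borrows `shared`
    // shared.push(5);
}
```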
However, safety isn't free in terms of cognitive cost. Certain programs that are very easy to write in a language like Java become very hard or impossible to write in safe Rust. If you've tried to implement any tricky pointer-based data structures in Rust, or you've read Learn Rust With Entirely Too Many Linked Lists, you're probably nodding along in agreement. There are workarounds for this, such as using integer indices and/or unsafe Rust, but neither of those is very satisfying.
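For concreteness, here's a minimal sketch (my own, not from the paper or the linked book) of the integer-index workaround. The borrow checker is happy because nodes link by `usize` index into a `Vec` rather than by reference:

```rust
// Nodes live in a Vec and point at each other by index, so there are no
// actual reference cycles for the borrow checker to object to.
struct Node<T> {
    value: T,
    prev: Option<usize>,
    next: Option<usize>,
}

struct Dll<T> {
    nodes: Vec<Node<T>>,
    head: Option<usize>,
    tail: Option<usize>,
}

impl<T> Dll<T> {
    fn push_back(&mut self, value: T) {
        let idx = self.nodes.len();
        self.nodes.push(Node { value, prev: self.tail, next: None });
        match self.tail {
            Some(old_tail) => self.nodes[old_tail].next = Some(idx),
            None => self.head = Some(idx),
        }
        self.tail = Some(idx);
    }
}

fn main() {
    let mut list = Dll { nodes: Vec::new(), head: None, tail: None };
    list.push_back("a");
    list.push_back("b");
    assert_eq!(list.nodes[list.head.unwrap()].next, list.tail);
}
```

The unsatisfying part shows up on removal: you need a free-list or generation counters to reuse slots, and a stale index silently aliases whatever node reuses that slot later. You've essentially rebuilt raw pointers without the compiler's help.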
In academia, there has been a bunch of work related to separation logic and linear types which also achieves data-race freedom. However, such type systems are often hard to use, requiring many annotations. Or they can require manually converting your code to CPS, which is often undesirable for various reasons.
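(If "converting your code to CPS" is unfamiliar: in continuation-passing style, functions never return; instead, the caller passes in a function that says what to do with the result. A trivial illustration of my own, in Rust for consistency with the rest of this post:)

```rust
// Direct style: the result is returned to the caller.
fn add(a: i32, b: i32) -> i32 {
    a + b
}

// Continuation-passing style: the caller supplies a continuation `k` that
// receives the result. Whole-program CPS conversion threads continuations
// through every call, which quickly gets unwieldy to do by hand.
fn add_cps<R>(a: i32, b: i32, k: impl FnOnce(i32) -> R) -> R {
    k(a + b)
}

fn main() {
    assert_eq!(add(1, 2), add_cps(1, 2, |r| r));
}
```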
In *A Flexible Type System for Fearless Concurrency*, the authors design a new type system that:

- statically ensures data-race freedom by grouping objects into regions,[^3] and
- supports an `if disconnected` primitive that dynamically determines if a region can be safely split, by proving that the object graphs reachable from two references are mutually disjoint.

They also provide a proof of correctness and a type checker implementation.[^4]

One piece of terminology worth knowing up front: `iso` is a qualifier for object fields intended to contain transitively dominating references.

Instead of going through the details of the paper, I think it is useful to walk through one of the fun code examples they showcase. I've added explicit `return` keywords for clarity, as well as a bunch of inline comments.
```
// A circular doubly-linked list with 1 or more elements.
struct dll_node {
    iso payload : data;
    next : dll_node; // note: structs are reference types
    prev : dll_node;
}

// A circular doubly-linked list with 0 or more elements.
struct dll {
    iso hd : dll_node? // using the built-in optional type '?'
}

// Get the payload from the tail if the list is non-empty.
def remove_tail(l : dll) : data? {
    let some(hd) = l.hd in { // non-empty case
        let tail = hd.prev;
        // rewire tail's predecessor and successor
        tail.prev.next = hd; hd.prev = tail.prev;
        // disconnect tail's object graph from rest of linked list
        // to ensure disjointness for `if disconnected`
        tail.next = tail; tail.prev = tail;
        // `if disconnected` is a built-in operation that checks if
        // two object graphs pointed to by the arguments do not overlap
        if disconnected(tail, hd) {
            l.hd = some(hd); // l.hd invalid at branch start
            return some(tail.payload)
        } else { // single-element case
            l.hd = none;
            return some(hd.payload)
        }
    } else { return none } // list was empty
}
```
My first thought on reading this was: it's kinda amazing that circular doubly-linked lists can be made to work with code that largely looks like a typical imperative language.
That said, I also had a couple of immediate questions:

1. **Cost of `if disconnected`**: It's not clear how expensive `if disconnected` is. Naively, it seems like it would be awful in the general case.
2. **Invalidation around `if disconnected`**: It's not super clear what is going on with the "`l.hd` invalid at branch start" aspect, and if the type system can be tweaked to avoid having to do such seemingly-redundant assignments.

Let's discuss each of those.
## Cost of `if disconnected`

This point is discussed in Section 5.2, *Efficiently Checking Mutual Disconnection*. It's a little complicated, so I've tried to explain it in my own words below.[^5]
- An `iso` reference is transitively dominating if and only if it is not explicitly tracked by the type system.
- Consider a region `R` for which all `iso` references contain transitively dominating references, and suppose `R` contains two `iso` references `A` and `B` which transitively dominate object graphs `O_A` and `O_B` respectively. If `O_A` and `O_B` intersect, it must be the case that `O_A` contains `B` and `O_B` contains `A` (by the definition of transitively dominating references). Conversely, if `O_A` does not contain `B`, then it cannot intersect with `O_B`. So it is sufficient to stop checking for the intersection of object graphs at transitively dominating `iso` references.
- `if disconnected` "ensures that its arguments come from the same region, and that nothing within that region is tracked".

These three points taken together imply that `if disconnected` can get away with checking object graph intersection only "up to" `iso` references. As far as I can tell, that's the core idea behind making this check efficient.
There is a second part about reference counts too.
> We implemented a version of the `if disconnected` check that is efficient based on two usage assumptions. The first assumption is that data structure designers prefer to keep regions small when possible, placing the `iso` keyword at abstraction boundaries—for example, collections place their contents in `iso` fields [..]. The second assumption is that `if disconnected` is commonly used to detach a small portion of a region—often as small as a single object.
>
> Following these assumptions, we propose a two-step process for the efficient implementation of `if disconnected`. First, store a reference count which tracks immediate heap references stored in non-`iso` fields of structures. This stored reference count is updated only on field assignment, and does not need to be modified—or checked—on assignment to local variables, function invocation, or at any other time. Thus, it is lighter-weight than conventional reference counts.
>
> Second, the `if disconnected` check itself is implemented via interleaved traversals of the object graphs rooted by its two arguments, ignoring references which point outside the current region, and stopping when the smaller of the two has been fully explored (or a point of intersection has been found). During this traversal, the algorithm counts the number of times it has encountered each object, assembling a traversal reference count. At the end of the traversal, it compares this traversal reference count with the stored reference count, concluding that the object graphs are disconnected if the counts match, and conservatively assuming that they remain connected if the counts do not match.
At a high level, the aspect about comparing counts instead of actual object graphs makes sense from an efficiency perspective. However, I don't fully understand the details. Specifically, based on the "conservatively assuming" point, it seems like the proposed algorithm cannot definitively answer that two object graphs are certainly connected; it can only answer that they might be connected. If I'm understanding that right, it's not clear why that is sufficient for writing code correctly.
## Invalidation around `if disconnected`

Copying the point from before:

> It's not super clear what is going on with the "`l.hd` invalid at branch start" aspect, and if the type system can be tweaked to avoid having to do such seemingly-redundant assignments.
I skimmed through parts of the type system description, but I still don't fully understand this one. I'm describing my intuition below, based on one of their diagrams (the diagram is slightly out-of-sync with the code, using `head` instead of `hd`):
```
let some(hd) = l.hd in {
    let tail = hd.prev;
    // rewire tail's successor and predecessor
    tail.prev.next = hd; hd.prev = tail.prev;
    // disconnect tail's object graph to ensure disjointness
    tail.next = tail; tail.prev = tail;
    if disconnected(tail, hd) {
        l.hd = some(hd); // l.hd invalid at branch start
        return some(tail.payload)
    } else // ...
} else // ...
```
The `l.hd` field points within the same region that includes `hd` and `tail`; the type system understands this fact. In the true branch of `if disconnected(tail, hd)`, the original region is split into two regions, but the type system can't know which of those two regions `l.hd` points into. (In this specific case, it might seem a little silly, since the last use was `let some(hd) = l.hd`, but it doesn't seem resolvable in the general case.) So `l.hd` is "invalid" and needs to be restored to a valid state before the function returns.
Based on this, it seems like reference invalidation and re-assignment are fundamental to the design of `if disconnected`. Perhaps you could add some special-casing for specific situations, but you couldn't get rid of this "problem" in the general case.
OK! That was just one code example. The paper covers a lot more things; you're free to read it if you find it interesting. For example, some fun stuff that I didn't cover here:

- `send` and `recv` primitives for concurrency.
- The `~` notation. (Pro tip: Check the appendix for some amazing function signatures.)

Details of the paper aside, here are some rough thoughts on adoption etc.
The authors mention this as one of their assumptions:

> data structure designers prefer to keep regions small when possible, placing the `iso` keyword at abstraction boundaries—for example, collections place their contents in `iso` fields
At face value, this seems reasonable. However, it's not clear to me how this affects composing collections.
As a somewhat contrived example, say you want to stash the nodes of a doubly-linked list in a tree, where the tree nodes store contents within an `iso` field. Is that even possible?
Based on my understanding so far, that wouldn't be possible, because the `iso` field would no longer be a transitively dominating reference. Sure, you could temporarily break this invariant by having the type system track the `iso` fields for a few tree nodes, but it seems like this wouldn't be possible for an entire tree (unless you had some fancy type shenanigans).
Contrived example aside, it's not clear to me how often this problem would come up in practice. What happens if you have a bunch of domain-specific types which contain references, and you try putting those in containers with `iso`-content nodes: are there common situations where you'd run into issues?
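For intuition, here's the closest safe-Rust analog of that composition problem I can think of (illustrative types, mine, not from the paper). A tree whose nodes uniquely own their payloads is the rough analog of `iso` fields; the moment a payload must also be reachable from a sibling structure, you're pushed from unique ownership to shared ownership:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// `Box<T>` is roughly analogous to an `iso` field: the tree node uniquely
// owns (transitively dominates) its payload.
#[allow(dead_code)]
struct TreeNode<T> {
    payload: Box<T>,
    children: Vec<TreeNode<T>>,
}

// A doubly-linked-list node must be reachable from both of its neighbors,
// so its payload can't also be uniquely dominated by a tree node. In Rust,
// composing the two pushes you toward shared ownership:
struct SharedTreeNode<T> {
    payload: Rc<RefCell<T>>, // shared and mutable; no longer dominating
    children: Vec<SharedTreeNode<T>>,
}

fn main() {
    let item = Rc::new(RefCell::new(42));
    // The same payload now lives in the tree *and* elsewhere...
    let node = SharedTreeNode { payload: Rc::clone(&item), children: Vec::new() };
    // ...which is exactly the aliasing that an `iso` field is meant to rule out.
    *item.borrow_mut() += 1;
    assert_eq!(*node.payload.borrow(), 43);
}
```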
Based on the examples, it does seem like writing code in such a language would impose less mental and annotation overhead compared to something like Rust. The tradeoff seems to be that such a language requires more thorough thinking about object graph connectivity, whereas Rust has what is arguably a simpler mantra: "mutation xor aliasing."
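(For readers who haven't internalized the mantra, it's the rule that you can have many readers or one writer, but never both at once. A tiny illustration of mine:)

```rust
fn main() {
    let mut v = vec![1, 2, 3];
    let a = &mut v; // one writer...
    // let b = &v;  // error[E0502]: cannot borrow `v` as immutable because
    //              // it is also borrowed as mutable
    a.push(4);
}
```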
That said, I concede that I am biased here to some extent: I've written a fair amount of Rust code, whereas I don't really have experience writing code in this paper's language.
One potential thing that can go wrong is that programmers may overuse `if disconnected` to avoid reasoning about object graphs. Or they may use it in ways where the runtime cost is not small when things go wrong. For example, trying to write a GUI library where the graph of widgets is a spaghetti ball of mutability might be possible, but it's probably still a bad idea.
One of the things I didn't discuss is the point about "Virtual Transformations" in the paper. The authors point out:

> An astute reader may note that, unlike the initial typing rules in figure 10, TS1-Virtual-Transformation-Structural is not syntax-directed. We present a decision procedure for type checking with virtual transformations. It runs in common-case polynomial and worst-case exponential time.
>
> [...]
>
> We note however that, due to our choice to limit typeable `iso` field accesses to only fields of currently declared variables, the number of H contexts reachable by virtual transformation is bounded above by the number of variables currently in scope. Thus, even a naive search suffices to obtain completeness, at the cost of run time exponential in the number of variables and the length of the longest function. Heuristics for speeding up search are briefly discussed in section 5.1.
In Section 5.1, they write:

> By employing liveness analysis of variables and isolated fields as a unification oracle, our checker can verify our largest examples in a handful of seconds. When necessary, our tool still falls back to search. Other approaches—such as user annotations or an external constraint solver—may be useful for pathological cases. More details appear in the appendix.
For a production type-checker, "a handful of seconds" is quite a long time. I don't have a good sense for how much these pathological examples will matter in practice. For example, Algorithm W has bad worst-case time complexity, but it works quite well in practice on typical ML code.
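To give a flavor of the Algorithm W worst case (the standard textbook example, transliterated into Rust): each line doubles the size of the type, so inferred types, and hence inference work, can grow exponentially in program size, yet real programs almost never do this.

```rust
fn main() {
    let x0 = 0u8;
    let x1 = (x0, x0); // (u8, u8)
    let x2 = (x1, x1); // ((u8, u8), (u8, u8))
    let x3 = (x2, x2); // (((u8, u8), (u8, u8)), ((u8, u8), (u8, u8)))
    let _ = x3;        // n such lines yield a type with 2^n leaves
}
```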
If you're going to be implementing this (or a similar) type system, it seems like tackling this challenge early on is probably a good idea, to figure out if it is a deal-breaker or not in practice.
I think this paper presents a promising avenue for supporting "fearless concurrency" in application languages, especially implementations with tracing GCs.[^6]
Some follow-ups I'd be interested in seeing:
That's it for now! This issue was sent to 57 subscribers. If you're interested in getting the next issue in your inbox directly, enter your email below. :) In the next issue, I will be covering Stabilizer: Statistically Sound Performance Evaluation (ASPLOS 2013) by Charlie Curtsinger and Emery Berger.
[^1]: A couple of reasons for why you might want to prevent data races altogether:

[^2]: There are also many not-yet-mainstream-but-growing languages in this space, such as Pony, Carp, Vale and Cone.

[^3]: In this, it is similar to Microsoft's Project Verona rather than Rust.

[^4]: I don't know if the code is available in a more easily consumable form, such as on GitHub, but it's available on Zenodo as a compressed archive.

[^5]: The reasoning here seems a bit subtle, so I hope I didn't make any mistakes or misunderstand the paper. Please let me know if that's the case.

[^6]: Recall that the `if disconnected` check requires some cooperation with the memory management subsystem. RC-based language implementations and C/C++/Rust often rely on just `malloc`/`free` from the system directly.