Tuesday, January 26, 2016

Unboxing Packages: collection

I often find myself working with a class of Dart packages I like to think of as core library expansion packs. They aren’t the splashiest, they don’t produce cool demos, but they’re extremely useful. These expansion packages, which are always named after a dart: library, build upon that library’s design and functionality to provide utility classes and fill in missing features.

Actual core libraries—the things you get when you write import "dart:..."—are very useful because they’re always there, but they’re also really painful to change. Exactly because they’re always there, they have much stronger stability guarantees than the rest of the ecosystem. It’s dangerous to even add a new class because it might conflict with a user’s class name and break existing code. And forget about adding an instance method. What if someone was implementing that class as an interface?

For those reasons, the core libraries are kept as slim as possible. So what happens when we want to add a new collection type, or a class for manipulating streams? That’s exactly what expansion packs are for. Because they’re versioned packages, they can safely be added to and modified without breaking the world. But we can still keep them to the high standard of API design and stability to which we hold the core libraries.

The best way to understand the general idea is to look at a particular example. Let’s dive into the collection package, the oldest core expansion and one of the oldest Dart packages period. This package has a bunch of classes and mixins to help implement the various (rather hefty) collection interfaces, as well as advanced equality support, and a few specialized functions and collection types.

Wrappers

A wonderful feature of Dart’s type system is how easy it is to expose something that users see as a normal List or Map, but that has customized behavior or representation internally. The wrappers provided by the collection package are a key component of making this work. They make it as easy as extends to write a class that modifies the behavior of a collection.

The simplest examples are the delegating classes: DelegatingIterable, DelegatingList, DelegatingMap, DelegatingSet, and DelegatingQueue. These classes wrap an object of the corresponding type and, by default, forward all methods to that object. But you can extend them to override specific methods:

/// A delegating list that improves the `toString()` to only include modified
/// flags.
class FlagList extends DelegatingList<VMFlag> {
  FlagList(List<VMFlag> flags) : super(flags);

  String toString() {
    return "[" + this.where((flag) => flag.modified).join(", ") + ", ...]";
  }
}

The unmodifiable wrappers also come up a lot in practice. UnmodifiableListView and UnmodifiableMapView were so popular that they were actually brought into dart:collection itself, but UnmodifiableSetView is only in the package. These classes are like the delegating wrappers, but make all methods that would modify the collection instead throw UnsupportedErrors. They’re great for writing classes that expose collections that shouldn’t be modified externally:

class Engine {
  List<LiveTest> get liveTests => new UnmodifiableListView<LiveTest>(_liveTests);
  final _liveTests = <LiveTest>[];

  // The class internally modifies _liveTests, but external users can't.
}

There are also unmodifiable mixins that complement the views: UnmodifiableListMixin, UnmodifiableMapMixin, and UnmodifiableSetMixin. You can mix these in to your own, non-view collections to easily make an unmodifiable version.

Equality

Most of the time, Dart’s == is perfectly good for telling if two objects are the same. But sometimes, you need more, and the collection package has your back. It defines an Equality<T> interface that defines two main operations: whether two objects of type T are equal, and what the hash code for an object of type T is.

The most useful type of equality after == is DeepCollectionEquality. This compares supported collections (iterables and maps) based on their contents, whereas == only returns true if they’re the exact same object. There’s even a new DeepCollectionEquality.unordered() constructor if you don’t care about ordering.

void assertEqual(actual, expected) {
  if (const DeepCollectionEquality().equals(actual, expected)) return;
  throw "Expected $actual to equal $expected.";
}

There’s also DefaultEquality, which is the same as ==, and IdentityEquality, which always checks if two objects are the same. These aren’t very useful on their own, but they’re important for constructing custom equalities. Which, by the way, is a thing you can do! The equality classes are constructed to allow you to build new equalities out of old ones.

For example, new DeepCollectionEquality() optionally takes an Equality parameter that it uses to compare non-collection objects. MapEquality takes two different Equality paramters, one for keys and one for values. And MultiEquality takes a whole list that it tries in order.

Specialized Collections

My favorite part of the collection package is its original collection classes. Partly this is because I wrote a bunch of them and I have a mother’s pride, but it’s also because a good collection does what you need and does it well, and that’s just cool.

CanonicalizedMap

Most of these classes are implementations of existing interfaces, but with useful specialized behavior. The CanonicalizedMap class, for example, is a map that converts keys to a canonical form. You can use it to do things like make a case-insensitive map for, say, HTTP headers.

class CaseInsensitiveMap<V> extends CanonicalizedMap<String, String, V> {
  CaseInsensitiveMap()
      : super(
          // This callback determines the canonical form of the user's key.
          (key) => key.toLowerCase(),

          // Null keys aren't allowed. You can't call null.toLowerCase()!
          isValidKey: (key) => key != null);
}

void main() {
  var headers = new CaseInsensitiveMap<String>();

  // This canonicalizes the key to "content-type", then stores the value.
  headers["CONTENT-TYPE"] = "application/dart";

  // This reads back the same value, since the canonical key is the same.
  print(headers["Content-Type"]);
}

When carefully thinking about API boundaries, which everyone should do all the time forever, it occasionally becomes clear that the collection a class uses internally doesn’t provide the best interface for users of that class. You can always copy the data into a new form when the user tries to read it, but in some circumstances that’s not as efficient as it could be.

MapKeySet and MapValueSet

One of those circumstances is when you’ve got a map internally, but you want to expose it as a set. Sets and maps both ensure that all elements or keys are unique, after all. That’s what the MapKeySet and MapValueSet classes are for. They each wrap a map and expose its keys and values (respectively) as a set, without actually copying those contents. This means that if the underlying map is updated, that update is automatically reflected in the wrapper as well.

MapKeySet is the simpler of the two: it’s literally just a set of all the keys in the map. It’s unmodifiable, because there’s no way to add a key to the map without a corresponding value, but it’s great for exposing an internal map’s keys.

MapValueSet exposes the values of a map as a set instead of the keys. You can’t efficiently figure out whether a value is in a map without some help, so this takes an extra callback that takes a (potential) value and returns the key that corresponds to it. And of course, that only works if it’s possible to compute the key for a value at all. If you have such a function, though, you get an efficient value set. And it can even be modified!

class DoofyDatabase<T> {
  // Internally represent the database as a map so it can be indexed by ID.
  final _map = <int, Doof>{};

  // Externally expose the database values as a set.
  final Set<Doof> allDoofs;

  DoofyDatabase() : allDoofs = new MapValueSet(_map, (doof) => doof.id);

  Doof doofById(int id) => _map[id];
}

PriorityQueue

A priority queue is one of those classic data structures you learn about in school, and PriorityQueue is the Dart implementation. Well, actually, HeapPriorityQueue is the implementation. PriorityQueue should probably have a factory constructor—let me file an issue about that real quick. If one of you lovely readers wanted to send in a pull request to fix that, I’d personally review and merge it.

A priority queue is a collection that’s efficiently kept constantly ordered even as objects are added and removed. The exact ordering depends on the Comparator you pass to the HeapPriorityQueue constructor, but by default it just assumes the contents are Comparable and calls a.compareTo(b). PriorityQueue implement any of the more common collection interfaces, but you’ll recognize most of the methods from Set and Queue.

QueueList

QueueList has all the powers of both Queue and List! The normal queue interface doesn’t have an [] operation because some implementations can’t support it efficiently. But other implementations actually use a list as the underlying data structure, which makes [] very efficient. QueueList takes advantage of this fact to implement both interfaces at once.

The result is a data structure that’s efficient at everything it does. Unlike a list, elements can be added and removed from the beginning in O(1) time. Unlike a queue, elements can be accessed from the middle in O(1) time. It’s a beautiful thing.

Collecting the Goods

You can find the source code for collection on GitHub, where you should feel free to send pull requests and add your own awesome collection classes. And next time your API needs to expose a set that can’t be modified or needs to remove an element from the beginning of a collection and access elements in the middle, you can download it with pub.

Join me in two weeks when I talk about how to make your parse errors look amazing.