Tuesday, February 9, 2016

Unboxing Packages: source_span

One of the best aspects of a package ecosystem is its ability to foster language-wide conventions without direct involvement from the SDK. Packages work together and build on one another to create something more than any of them could be on their own. The glue that holds this structure together is made of little packages that establish these conventions, that provide a shared language for the ecosystem as a whole to use.

One of these glue packages that I find most useful is source_span. This package began its life as part of the more specialized source_maps package, but it quickly became clear that the idea of referring to chunks of code was much more broadly applicable. I factored it out, tweaked the API to make it more general, and over time added a few additional features. The result is the package I’ll talk about today.

The basic idea underlying source_span is that there should be a straightforward and consistent way for packages to refer to spans of text. This is useful for associating spans within a source map, but it’s also useful for emitting nice errors when parsing text. It’s even useful for modifying text: the den package uses spans to edit YAML without disrupting the existing formatting.

The structure of the SourceSpan class is very straightforward. Here are the most important parts:

class SourceSpan {
  Uri get sourceUrl;
  SourceLocation get start;
  SourceLocation get end;
  String get text;
}

class SourceLocation {
  Uri get sourceUrl;
  int get offset;
  int get line;
  int get column;
}

A span knows where it starts, where it ends, and what text it covers; a location knows its offset in the text, as well as its line and column numbers (all of which are zero-based). Both of them know the URL of the file they refer to. That’s the heart of the package. Now you know. Article over. Good job, Natalie!

Just Kidding

Okay, there’s more to it than that. For starters, there are some handy methods on those classes. SourceSpan.length, for example, tells you how long a span is so you don’t have to do the math (or call .text.length) manually. SourceLocation.distance() tells you the number of characters between two locations. And both classes are Comparable—locations are ordered by their offsets, and spans are ordered by their start locations.
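Here’s a rough sketch of those helpers in action. It assumes a recent, null-safe release of the package (hence no new keyword), and the file name, offsets, and text are made up purely for illustration.

```dart
import 'package:source_span/source_span.dart';

void main() {
  // Hypothetical file name; any URL would do.
  final url = Uri.parse('example.txt');

  // Locations built by hand. Offset, line, and column are all zero-based.
  final start = SourceLocation(6, sourceUrl: url, line: 0, column: 6);
  final end = SourceLocation(11, sourceUrl: url, line: 0, column: 11);

  // The text's length has to match the distance between the locations.
  final span = SourceSpan(start, end, 'world');

  print(span.length); // 5, the same as span.text.length
  print(start.distance(end)); // 5
  print(start.compareTo(end) < 0); // true: locations are ordered by offset
}
```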

But my favorite method is SourceSpan.message(). This formats a message so that it’s clearly associated with the given span. It sounds simple, but it’s a big part of what makes the package so nice to use, since it provides such a nice user experience for so little effort. Here’s an example of what it looks like:

Error on line 24, column 15 of pubspec.yaml:
Invalid version constraint: Expected version number after "^" in "^1.2.".
  pub_semver: "^1.2."
              ^^^^^^^

Dart users might recognize this particular error, because it comes from pub, which was one of the first users of the source_span package. Here’s how that error was produced:

stderr.writeln("Error on ${span.message(error.message, color: true)}");

That’s all you need for beautiful messaging. And the messages don’t have to be about errors: they can describe anything associated with a particular span of a file.

Creating Spans

If you want, you can construct spans and locations by calling new SourceSpan() and new SourceLocation(), but when you’re parsing a bunch of text in a single file, that’s a lot of work (and memory-inefficient to boot). You’re much better off using a SourceFile to create them. You can create one by passing in the file text and URL to new SourceFile().

A SourceFile represents a single source file, and spits out spans and locations for that file. SourceFile.span() gets you a span for a chunk of text, and SourceFile.location() gets you a location at a specific offset. You can also get assorted metadata about parts of the file without creating extra objects by calling SourceFile.getLine(), SourceFile.getColumn(), SourceFile.getOffset(), and SourceFile.getText().
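To make that concrete, here’s a small sketch of a SourceFile handing out spans and locations. The file contents and the example.txt name are invented, and it uses the SourceFile.fromString() constructor from recent releases of the package rather than the plain constructor mentioned above.

```dart
import 'package:source_span/source_span.dart';

void main() {
  // Hypothetical two-line file.
  const text = 'first line\nsecond line\n';
  final file = SourceFile.fromString(text, url: 'example.txt');

  // A span covering "second": offset 11 up to (not including) offset 17.
  final span = file.span(11, 17);
  print(span.text); // second
  print(span.start.line); // 1 (zero-based)
  print(span.start.column); // 0

  // A location at the "l" of the second "line".
  final location = file.location(18);
  print('${location.line}:${location.column}'); // 1:7

  // The same facts without creating span or location objects.
  print(file.getLine(18)); // 1
  print(file.getColumn(18)); // 7
  print(file.getOffset(1, 7)); // 18
  print(file.getText(11, 17)); // second
}
```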

Implementation and Efficiency

A SourceFile does more than just fill in the blanks for the normal span and location constructors. Because it has access to the entire file, it’s able to do some clever tricks to make the spans more efficient, especially in terms of memory. And when you’re parsing a large file and creating a span for every node, memory usage can matter a lot.

First of all, a SourceFile’s spans don’t actually store SourceLocation objects. Instead, they store the start and end offsets as integers and create SourceLocations on the fly when the corresponding getters are called.

The locations’ line and column information is also determined on the fly. SourceFile keeps an internal list of the offsets for each newline character in the file, which is quite compact. But it also means that looking up the line or column for an offset requires a binary search, which is O(log(n))—not slow, but not super fast either. There’s some caching which mitigates this when looking up multiple nearby offsets, but it’s still important to keep the worst case in mind.
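If you’re curious what that kind of lookup looks like, here’s a toy version. To be clear, this is my own simplified sketch and not the package’s actual code: the LineIndex class and its method names are hypothetical, it stores the offset where each line starts, and it leaves out the caching entirely.

```dart
/// Toy index of line starts. A sketch only, not the package's real code.
class LineIndex {
  // Offset of the first character of each line; line 0 starts at offset 0.
  final List<int> _lineStarts = [0];

  LineIndex(String text) {
    for (var i = 0; i < text.length; i++) {
      // 0x0A is "\n"; the next line starts right after it.
      if (text.codeUnitAt(i) == 0x0A) _lineStarts.add(i + 1);
    }
  }

  /// Zero-based line containing [offset], found by binary search.
  int lineOf(int offset) {
    var low = 0;
    var high = _lineStarts.length - 1;
    while (low < high) {
      // Round the midpoint up so the loop always makes progress.
      final mid = low + (high - low + 1) ~/ 2;
      if (_lineStarts[mid] <= offset) {
        low = mid;
      } else {
        high = mid - 1;
      }
    }
    return low;
  }

  /// Zero-based column of [offset] within its line.
  int columnOf(int offset) => offset - _lineStarts[lineOf(offset)];
}

void main() {
  final index = LineIndex('first line\nsecond line\n');
  print(index.lineOf(18)); // 1
  print(index.columnOf(18)); // 7
}
```

One integer per line is all the storage this needs, which is why the approach is so compact, and every lookup pays for that with a search.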

As a user, this all means that you should be careful when looking up spans’ locations, and especially when accessing those locations’ line and column numbers, in code that needs to be high-performance. You don’t need to avoid these calls entirely, but make sure you aren’t calling them dozens of times in a core loop.

SourceSpanWithContext

I admit it: I cheated a little bit when I was talking about SourceSpan.message() earlier. The error message I gave as an example contained extra text from the pubspec—the text “pub_semver:” wasn’t covered by the span, and thus a plain SourceSpan wouldn’t know to print it. It would have printed a less-useful message that just included "^1.2.".

However, often spans do have that extra context, and when they do it makes for a much better message. That’s what SourceSpanWithContext is for. It’s a SourceSpan with one extra field: SourceSpanWithContext.context, which returns the full line containing the span’s text. This is enough to produce the extra-helpful message I included above.
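Here’s a sketch that builds both kinds of span by hand for the pubspec line from earlier, so you can compare what message() prints for each. The starting offset of 300 is invented, and the code assumes a recent, null-safe release of the package; the exact layout of the output varies between versions, so I haven’t reproduced it here.

```dart
import 'package:source_span/source_span.dart';

void main() {
  const line = '  pub_semver: "^1.2."\n';
  const text = '"^1.2."';

  // Zero-based line 23 and column 14 are reported as line 24, column 15.
  // The offset (300) is hypothetical.
  final start =
      SourceLocation(300, sourceUrl: 'pubspec.yaml', line: 23, column: 14);
  final end = SourceLocation(300 + text.length,
      sourceUrl: 'pubspec.yaml', line: 23, column: 14 + text.length);

  // A plain span only knows about the text it covers.
  final plain = SourceSpan(start, end, text);

  // A span with context also carries the full line around that text.
  final withContext = SourceSpanWithContext(start, end, text, line);

  print(plain.message('Invalid version constraint.'));
  print(withContext.message('Invalid version constraint.'));
}
```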

FileSpan

Because a SourceFile always has access to extra context information, all the spans it produces are SourceSpanWithContexts. But more than that, they’re FileSpans. This is a special kind of span that’s only generated by a SourceFile, and in fact has a getter for the original file. It also has a few superpowers of its own.

All spans have a SourceSpan.union() method, which combines two spans into one. However, it requires the spans to be adjacent or overlapping; otherwise, there would be no way to determine the text of the resulting span.

But that’s no problem for FileSpans, because they know the full contents of the file. So they support FileSpan.expand() as well, which takes another FileSpan from the same file and returns a new span that covers both spans and everything between them. This is really useful when constructing spans for use in a source map.
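A quick sketch of the difference, using an invented one-line file and assuming a recent release of the package (where SourceFile.fromString() is available):

```dart
import 'package:source_span/source_span.dart';

void main() {
  // Hypothetical file contents and name.
  final file = SourceFile.fromString('var answer = 42;', url: 'example.dart');

  final keyword = file.span(0, 3); // var
  final name = file.span(4, 10); // answer
  final value = file.span(13, 15); // 42

  // Touching spans can be unioned: "var " ends where "answer" begins.
  print(file.span(0, 4).union(name).text); // var answer

  // Spans with a gap between them can't.
  try {
    keyword.union(value);
  } on ArgumentError catch (error) {
    print('union failed: ${error.message}');
  }

  // expand() fills in the gap from the file's own text.
  print(keyword.expand(value).text); // var answer = 42
}
```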

Span Exceptions

One of the most important uses of spans is to indicate to the end user where an error was discovered, so it makes sense that there would be a standard way to attach them to exceptions. The source_span package provides two different exception types: SourceSpanException represents a general exception that has a span attached, and SourceSpanFormatException also implements FormatException since so much code that uses spans deals with parsing as well.

These exceptions’ APIs are pretty simple. They provide access to the span, of course, and their toString() methods use SourceSpan.message() to format the error message. You can even pass in a color parameter to print it with nice terminal colors on Linux and OS X.

Despite their simplicity, though, the exception classes are powerful because they allow unrelated packages to work together. Any package can throw SourceSpanExceptions as a matter of course, and any package can catch them and format them nicely.
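As a sketch of that cooperation, here’s a made-up parser that throws a SourceSpanFormatException, and a caller that formats it without knowing anything about the parser. The parseDigits() function and the input.txt name are hypothetical, and the code assumes a recent, null-safe release of the package.

```dart
import 'package:source_span/source_span.dart';

/// Hypothetical parser that accepts only decimal digits.
int parseDigits(SourceFile file) {
  final text = file.getText(0);
  for (var i = 0; i < text.length; i++) {
    final unit = text.codeUnitAt(i);
    // 0x30 through 0x39 are the characters "0" through "9".
    if (unit < 0x30 || unit > 0x39) {
      throw SourceSpanFormatException(
          'Expected a digit.', file.span(i, i + 1), text);
    }
  }
  return int.parse(text);
}

void main() {
  final file = SourceFile.fromString('12x4', url: 'input.txt');
  try {
    parseDigits(file);
  } on SourceSpanException catch (error) {
    // The caller only needs to know about the shared exception type.
    print(error.toString(color: false));
    print(error.span?.start.offset); // 2
  }
}
```

Because SourceSpanFormatException is also a FormatException, code that has never heard of source_span can still catch it with a plain on FormatException clause.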

The Spannotation Pattern

“Spannotation” is less of a well-known design pattern than something I named just now as I wrote this, but that doesn’t make it any less useful when dealing with parsed data. The idea is very straightforward: as you parse text into higher-level data structures, keep a span along to remind you of where the data came from.

As an example, let’s look at pub. It initially parses a pubspec into a class that stores the original fields as a YamlMap, which has spans attached to every node it contains:

class Pubspec {
  final YamlMap fields;

  Pubspec.parse(String contents, {sourceUrl})
      : fields = loadYamlNode(contents, sourceUrl: sourceUrl);
}

When a caller wants to know how the transformers are set up, they’re parsed from these fields into TransformerConfig objects:

class Pubspec {
  List<Set<TransformerConfig>> get transformers {
    return fields['transformers'].nodes.map((phase) {
      return phase.nodes.map((transformer) {
        var name = transformer.nodes.keys.single;
        var config = transformer.nodes.values.single;

        // Keep track of the transformer name's span.
        return new TransformerConfig(name.value, config, name.span);
      }).toSet();
    }).toList();
  }
}

TransformerConfig keeps track of the span that was parsed from the transformer’s name:

class TransformerConfig {
  final String name;
  final Map config;
  final SourceSpan span;

  TransformerConfig(this.name, this.config, this.span);
}

That way, when an error is detected relating to this transformer, we can attach the span to that error to make its message much nicer:

Future<Transformer> loadTransformer(TransformerConfig config) async {
  try {
    return new RemoteTransformer(spawnTransformerIsolate(config));
  } catch (error) {
    // Attach the configuration's span to the error so the user knows what
    // transformer caused it and where to configure it.
    throw new SourceSpanException(error.toString(), config.span);
  }
}

Spanning the Gap Between Packages

I love package ecosystems, and the source_span package embodies many of the aspects I love best. It’s pretty small and pretty simple, but it serves an important role. It provides a consistent way for packages to expose and interact with sections of source text, and it does so in a way that makes the end user’s life better as well.

Come back again in two weeks when I go over a package that, among other things, makes it really easy to create spans.