Thursday, March 31, 2016

The new AdWords UI uses Dart — we asked why

Google just announced a re-designed AdWords experience. In case you’re not familiar with AdWords: businesses use it to advertise on google.com and partner websites. Advertising makes up the majority of Google’s revenue, so when Google decides to completely redo the customer-facing front end to it, it’s a big deal.

The Dart team is proud to say that this new front end is built with Dart and Angular 2. Whenever you asked us whether Google is ‘even using Dart for anything,’ this is what we had in mind but couldn’t say aloud. Until now.

[Image: 'This is built with Dart', with an arrow pointing to screenshots of the new AdWords UI.]

We asked Joshy Joseph, the primary technical lead on the project, some questions. Joshy is focusing on things like infrastructure, application latency and development velocity, so he’s the right person to ask about Dart.

 

Q: What exactly did we launch on Monday?

It’s a complete redesign of the AdWords customer experience that is rolling out slowly as a test to a small initial set of advertisers. The most noticeable thing is probably the Material Design look and feel. But there are many underlying improvements to the user experience. Read the Inside AdWords post for more information.

Q: How big is the team building this and how big is the project?

Can’t give specific details but the team has dozens of engineers, and the codebase is in the hundreds of thousands of lines of code.

Q: What's the tech stack?

The new AdWords advertiser UI is built as a collection of large single page applications integrated together in the browser. The browser side stack is based on Dart and Angular2 for Dart. We have some infrastructure built on top of these that we share with many other apps at Google.

Q: How is that different from the stack your team used previously (for the current version of AdWords)?

The present version of AdWords uses a stack based on GWT. The core foundation of the stack is from about eight years ago.

Q: Why even change the tech stack? Didn't it work?

GWT actually worked very well for us. Eight years ago, different browsers worked very differently. We needed to support IE6 for instance, and that was a big challenge because we were building one of the largest single page applications at the time. Chrome hadn’t even launched yet. GWT gave us an abstraction layer across the different browsers. It also gave the language and tools support to scale to very large codebases and teams.

On the other hand, many things have changed over the last several years with respect to browser technology and UI infrastructure. Browsers have become a lot more consistent in implementing standard APIs. Also, new UI development frameworks like Angular have become very popular. There are new emerging standards like web components that we would like to take advantage of. We also have new languages like Dart that are specifically designed to transpile well to JavaScript.

So a couple of years ago, when we were starting to think about building a new version of AdWords with much improved UI and performance, we took a fresh look at the tech stack. We wanted to use the opportunity to upgrade to a modern infrastructure that will serve us well for the next 7+ years.

Q: Can you describe the decision process that led to using Dart?

Since updating the technology stack that is used for the entire AdWords UI is a huge deal, this has been a multi-year process where we first implemented portions of current AdWords using Dart and Angular. Using this stack, we successfully built a very large internal application before deciding to use it for the new AdWords UI.

Q: Why not Closure lib? GWT? TypeScript? Vanilla JS?

Those are all options we considered very carefully since — given the scale of AdWords — we can’t switch tech stacks easily. We wanted to use a stack that will enable building very large mission critical applications such as AdWords with very good user experience, application latency and feature velocity.

We wanted to provide a lot of flexibility to our UX designers to innovate and build a visually appealing and productive UI.

We also wanted to have world class application latency. A lot of people stay logged into AdWords all day working with large amounts of data. So, having a very fast application is critical.

At the same time, the AdWords team as a whole is constantly innovating and launching several new features every week. We wanted to not only maintain that velocity but make it even better.

Meeting all these very high bars for user experience, latency and feature velocity at the same time in a very large mission critical application is very hard. We thought Dart and Angular together were a good foundation on which to build the additional infrastructure we needed to achieve all these goals.

Q: Were your assumptions about switching to Dart proved correct? Was something better or worse than you expected?

Our assumption about being able to scale to large teams and codebases has proved true. Also, having Angular available for Dart has been great. We are also fairly well on track to achieve our UX, latency and feature velocity goals. These are all areas where we continue to work hard to make things even better over the coming months.

One thing we found out was that developers preferred even stronger type checking than what Dart was providing. So, we are very excited about the work on Dart Strong Mode. We are also looking forward to cross-browser, fast edit refresh with the upcoming Dart Dev Compiler. The new js interop is also a big improvement. We also realized that we can’t really use mirrors in production apps and have been avoiding doing that.

Q: How did developers on the team react to the decision to use Dart?

The decision was primarily driven by the engineers on the team. We actually had a very formal evaluation at some point after we had some experience with the stack: a small team of senior engineers considered various options across more than 50 key requirements and unanimously recommended Dart.

Q: How long did they take to learn the new language?

It is typically on the order of a couple of weeks. Since Dart looks similar to many popular languages, developers tend to pick it up pretty fast.

Q: Is it hard to get Dart developers?

In general, staffing up the team hasn’t been a problem. Most engineers join AdWords because they love working on a high impact product, working with large data sets and solving complex problems. Having a state of the art tech stack for building large scale business apps helps as well.

There are some engineers though who have a strong preference for a language and make project decisions based on language. So, we have actually seen some engineers joining the team because they love Dart. There are also probably a few engineers who went to other projects because they loved the language that project was using.

Q: What is your favorite feature of Dart?

I really like how Dart is terse. Angular is also terse. So, with the help of these and other infrastructure we built on top of these, our hope is that the new AdWords UI can be implemented with fewer than 50% of the lines of code of the previous version while having a similar or larger set of features.

Q: What's missing with Dart?

I think the cross-browser story for fast edit refresh is the biggest gap. It is being addressed with the upcoming Dart Dev Compiler though.

Q: Is there a type of project, software or team to which you'd especially recommend Dart?

With Angular2, the application performance and code size are very similar between the JS and Dart versions. So any team considering Angular2 should consider Dart as well.

I think Dart and Angular2/Dart are especially suitable for large scale business web applications. The tooling support and static correctness checking are particularly valuable for those types of applications.

With the ongoing work on Flutter, Dart could become a good option to consider also for teams that need to build native mobile apps across Android and iOS.


Interested in Dart? Awesome. You can learn about the ideas behind Dart, or you can get your feet wet directly.

Wednesday, March 23, 2016

Unboxing Packages: async Part 2

Two weeks ago, I introduced you to some of the marvels available for use in the async package. But that package is so big—there are so many marvels—that I couldn’t fit them all in. In fact, I only ended up writing about APIs that deal with futures and the individual values they represent.

This week I’ll focus on the other major asynchronous type: Stream. I often find it useful to think of asynchronous types as analogous to synchronous ones, and by that metaphor if futures are individual values then streams are Iterables. But in addition to representing asynchronously-computed collections of data, streams can represent things like UI events or communication (for example over a WebSocket).

Dart’s Stream class is very powerful. In addition to dispatching events, it allows the user to pause or cancel their subscription to the stream. What’s more, the creator of the stream can be notified of pauses or cancellations (or the initial subscriptions) and take appropriate action—like closing an HTTP connection when its data stream is canceled. Building this logic into the core types makes it easy to do the right thing with streams without the author having to think about it at all.

For example, Stream.first automatically cancels its subscription once the first event arrives. So if you call WebSocket.first, it’ll close the underlying connection once you have the event you need.
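To make the cancellation visible, here is a minimal sketch that uses only dart:async. The function name `firstThenCheckCancel` and the `canceled` flag are made up for illustration; the sketch is written in current null-safe Dart, so it omits the `new` keyword used in the other snippets in this post.

```dart
import 'dart:async';

/// Takes the first event from a controller-backed stream and reports whether
/// the producer was told about the cancellation. Returns `[first, canceled]`.
Future<List<Object>> firstThenCheckCancel() async {
  var canceled = false;
  final controller = StreamController<int>(
    // Runs when the single subscriber cancels its subscription.
    onCancel: () {
      canceled = true;
    },
  );
  // Events added before anyone listens are buffered by the controller.
  controller
    ..add(1)
    ..add(2);

  // Stream.first subscribes, waits for one event, then cancels.
  final first = await controller.stream.first;
  return [first, canceled];
}
```

The producer never has to ask whether anyone still wants the second event: the `onCancel` callback is where a real producer would release its underlying resource.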

The other side of this power, though, is that you need to take extra care when writing code that manipulates streams to ensure that you handle pausing and canceling correctly. If your stream transformer doesn’t forward cancellations properly, you might end up with dangling WebSocket connections, which is bad news all around.

That’s why async’s stream utilities are so useful. Not only do they help you manipulate your streams, they make sure all your listens, pauses, and cancellations are handled exactly right. I call this being cancel-correct. This gives you, the developer, freedom to hook up components as you need without worrying about any extra complexity.

LazyStream

I mentioned briefly that stream creators can be notified when an initial stream subscription is created. You can pass an onListen argument to new StreamController() to only start emitting events once a listener exists. And usually that’s enough—but sometimes you aren’t the one producing the original events, and you don’t want to have to manually forward them to the controller. That’s what LazyStream is for.
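The onListen approach can be sketched with nothing but dart:async. The `countdown` function and its `log` parameter are hypothetical and exist only to show the ordering: nothing is produced until a listener shows up.

```dart
import 'dart:async';

/// Returns a stream that only starts producing events once it has a listener.
/// [log] records what happened and in what order.
Stream<int> countdown(List<String> log) {
  late StreamController<int> controller;
  controller = StreamController<int>(
    // Called when the stream gets its listener.
    onListen: () {
      log.add('listened');
      controller
        ..add(3)
        ..add(2)
        ..add(1)
        ..close();
    },
  );
  log.add('created');
  return controller.stream;
}
```

This works well when you produce the events yourself. When the events come from another stream, every one of them would have to be forwarded through the controller by hand.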

When you call new LazyStream(), you pass in a callback that returns a Stream (or a Future<Stream>). This callback will only be invoked once LazyStream.listen() is called, and its events will automatically be piped to the listen handlers. Most of the time it won’t even create an intermediate StreamController, so it’s extra efficient, too.

import 'dart:io';

import 'package:async/async.dart';

/// Returns the body of the response from getting [url].
///
/// Only starts the request when the returned stream is listened to.
Stream<List<int>> httpStream(Uri url) {
  return new LazyStream<List<int>>(() async {
    var client = new HttpClient();
    // [url] is already a [Uri], so it's passed straight through.
    var request = await client.getUrl(url);
    return await request.close();
  });
}

StreamCompleter

If you’ve done much asynchronous programming in Dart, you’re doubtless familiar with the Completer class. It lets you return a future immediately but only fill in its result later on when whatever asynchronous work you’re doing is finished. Well, a StreamCompleter is pretty much the same thing for streams.

StreamCompleter.stream returns a stream immediately, just like Completer.future. But this stream doesn’t emit any events until you call setSourceStream(), which—like Completer.complete()—provides the concrete value. All events from the source stream are forwarded to the output stream, and of course pauses and cancellations are passed back to the source.

What if you fail to get the source stream? Call setError() instead! Like Completer.completeError, this indicates that the stream completion was unsuccessful. In a more practical sense, it makes the output stream emit the error and then close immediately.

The whole process of getting a stream’s value asynchronously sounds a lot like a Future<Stream>. There are even a few places, like WebSocket.connect(), which return such futures in practice. That’s why the StreamCompleter.fromFuture() static utility function exists. It just takes a Future<Stream> and returns a Stream:

Stream fromFuture(Future<Stream> streamFuture) {
  var completer = new StreamCompleter();
  streamFuture.then(completer.setSourceStream,
      onError: completer.setError);
  return completer.stream;
}

In some cases—especially when you want to use async/await—it’s way easier to just use fromFuture() than to manually deal with a StreamCompleter yourself.
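For comparison, the same flattening idea can be sketched with an async generator from the core language alone. This is not how the package implements it, and the name `flatten` is made up. One difference worth noting is that the generator body doesn't start until the returned stream is listened to.

```dart
import 'dart:async';

/// Flattens a `Future<Stream>` into a `Stream` using an async generator.
Stream<T> flatten<T>(Future<Stream<T>> streamFuture) async* {
  // If the future fails, the error is emitted on the returned stream, which
  // then closes. Otherwise every event of the inner stream is re-emitted.
  yield* await streamFuture;
}
```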

StreamGroup

I think of StreamGroup kind of like a funnel: it takes a bunch of different input streams and merges them all into a single output stream. Like FutureGroup, which I wrote about last time, it implements Sink. This means that you call add() to add new streams to the group, and close() once you’ve added all the streams that need adding.

The events from the input streams are piped into the output stream, which is accessible by calling stream. This stream will only close once all the input streams close and close() is called, because otherwise more streams may still be added later on. And of course, it’s cancel-correct: it only listens to the source streams once the output stream has a listener, and if the output stream is paused or canceled then all the source streams are as well.

class Phase {
  final _transformers = <Transformer, TransformerRunner>{};

  /// A broadcast stream of all log events from all transformers in this phase.
  Stream<LogEvent> get onLog => _onLogGroup.stream;
  final _onLogGroup = new StreamGroup<LogEvent>.broadcast();

  void addTransformer(Transformer transformer) {
    var runner = new TransformerRunner(transformer);
    _transformers[transformer] = runner;
    _onLogGroup.add(runner.onLog);
  }
}

If you don’t care about adding streams later on, you can also use the StreamGroup.merge() static utility function. This just takes an Iterable<Stream> and merges all the streams immediately. It’s the same as adding them all to a StreamGroup and closing it, but it’s a lot cleaner than doing that manually. That’s why we call it a utility function!
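To get a feel for the bookkeeping that a merge has to do, here is a rough hand-rolled version built only on dart:async. It is a sketch, not the package's implementation, and `mergeSketch` is a made-up name. It only covers the case where all sources are known up front.

```dart
import 'dart:async';

/// Hand-rolled merge of [sources] into one single-subscription stream.
Stream<T> mergeSketch<T>(Iterable<Stream<T>> sources) {
  late StreamController<T> controller;
  final subscriptions = <StreamSubscription<T>>[];

  controller = StreamController<T>(
    // Only subscribe to the sources once the output has a listener.
    onListen: () {
      final inputs = sources.toList();
      var open = inputs.length;
      if (open == 0) controller.close();
      for (final source in inputs) {
        subscriptions.add(source.listen(
          controller.add,
          onError: controller.addError,
          // Close the output once every source has closed.
          onDone: () {
            if (--open == 0) controller.close();
          },
        ));
      }
    },
    // Forward pauses, resumes, and cancellations back to every source.
    onPause: () {
      for (final subscription in subscriptions) {
        subscription.pause();
      }
    },
    onResume: () {
      for (final subscription in subscriptions) {
        subscription.resume();
      }
    },
    onCancel: () => Future.wait(subscriptions.map((s) => s.cancel())),
  );
  return controller.stream;
}
```

Even this simplified version needs four callbacks to stay cancel-correct, which is a fair argument for reaching for the tested utility instead.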

StreamSplitter

If StreamGroup is a funnel, then StreamSplitter is a sprinkler: it takes a single source stream and splits it into a bunch of identical copies. Each copy independently emits the same events as the source stream, with its own buffering and everything. There’s no way at all for actions on one copy to affect any others.

A splitter is created by passing the source stream to new StreamSplitter(), and copies are created by calling split(). Once you don’t need any more copies, call close() to let the splitter know.

Closing the splitter lets it maintain cancel-correctness. Before it’s closed, it can never safely cancel or pause the source stream, since a new copy could be created that might need additional events. But once it’s closed, the source stream can be canceled as soon as all of the copies are canceled.

Like StreamGroup.merge(), there’s a handy utility method for splitting streams: StreamSplitter.splitFrom(). This takes a source stream and splits it into a set number of copies—two by default, but it can be any number you want. I use it pretty frequently when I’m debugging to see what a stream is emitting without affecting its normal usage.

SubscriptionStream

Some APIs are made for end-users to use directly in their applications, and some are made for implementing other APIs. The SubscriptionStream class falls squarely into the latter category. It’s used internally in the async package itself, and it’s good to know about in case you ever need to implement some asynchronous utilities yourself.

When you call Stream.listen() to subscribe to a stream, you get a StreamSubscription back. This is usually just used to pause and cancel events, but it can also be used to replace the event handlers by calling methods like onData() (see note 1). SubscriptionStream takes advantage of this capability to convert a subscription into a brand new stream that can itself be listened to.

When you pass a subscription into new SubscriptionStream(), its old event handlers are removed and it’s paused. This means any additional events are buffered until SubscriptionStream.listen() is called. Once it is, it sets the new handlers on the original subscription and just returns that. Most of the time it doesn’t even create any extra intermediate objects!
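The handler replacement that this relies on is part of dart:async itself, and a small sketch shows it in isolation. The function name `swapHandlers` and the log strings are made up for illustration.

```dart
import 'dart:async';

/// Listens to [source], then swaps the data handler after the first event.
/// Completes with a log of which handler saw which event.
Future<List<String>> swapHandlers(Stream<int> source) {
  final log = <String>[];
  final done = Completer<List<String>>();
  late StreamSubscription<int> subscription;
  subscription = source.listen((event) {
    log.add('first handler: $event');
    // Replace the data handler; later events go to the new callback.
    subscription.onData((event) => log.add('second handler: $event'));
  }, onDone: () => done.complete(log));
  return done.future;
}
```

The subscription object stays the same throughout; only the callback it dispatches to changes.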

/// Calls [onFirst] with the first event emitted by [source].
///
/// Returns a stream that emits all events in [source] after the first.
Stream firstAndRest(Stream source, void onFirst(value)) {
  var completer = new StreamCompleter();
  var subscription;
  subscription = source.listen((event) {
    onFirst(event);
    completer.setSourceStream(new SubscriptionStream(subscription));
  });
  return completer.stream;
}

await last

This about covers the async package’s stream APIs, but there are other APIs still to come. The package has a lot of cool stuff! Join me again in two weeks when I cover everything else, sinks and queues and timers and all.


  1. This method should really be called setOnData(), or else should be a setter with a corresponding getter, but it’s hard to change APIs in the core libraries.

Wednesday, March 9, 2016

Dart 1.15: Updated Dartium and improved live code analysis

Dart 1.15 is now available. This release includes important updates to our tools.

Updated Dartium version


Dartium has been updated from Chrome 39 to Chrome 45. While the underlying browser has been changed, there have been no changes in this release to the corresponding APIs in dart:html, dart:svg, etc. We will roll out API updates in a future release to maximize stability and ease migration.

Improved live code analysis


The Dart analyzer service, used by WebStorm, IntelliJ, and Atom, has been hardened in 1.15. Using error reports from Dart users within Google, we've been able to eliminate many common sources of instability. You should notice a much more reliable experience within your favorite Dart IDE.

Release cadence changes


Historically the Dart SDK has released a new stable release roughly every 2-3 months. The release duration has varied from release to release. Going forward we are going to attempt to ship on a more predictable schedule with a new release of the SDK every six weeks. A release candidate will be made available roughly a week before the final release.

And more...


The SDK changelog has details about all of the updates in Dart 1.15 SDK. Get it now.

Tuesday, March 8, 2016

Unboxing Packages: async Part 1

Writing asynchronous code is hard. Even with Dart’s lovely async/await syntax, asynchrony inherently involves code running in a nondeterministic order, intermingled in a way that’s often difficult to understand and debug. Because there are so many possible ways to execute a program, and which one happens is so contingent on little details of timing, it’s not even possible to test asynchronous code in the same exhaustive way you might test something synchronous.

This is why it’s so important to have abstractions and utilities that are simple and robust and can be used as building blocks for more complex programs. The core libraries provide the most fundamental abstractions: Future for an asynchronous value, and Stream for an asynchronous collection or signal. But it’s up to packages to build the next layer on top of those fundamental primitives.

The async package is where the next-most-basic abstractions live. It’s a core library expansion pack that contains APIs that are just a little more advanced than would fit in dart:async itself. The classes it exposes are well-tested and as straightforward as you can hope from asynchrony, and using them properly can make your code vastly clearer and more reliable.

The async package is just chock-full of cool stuff. So full, in fact, that it won’t all fit in a single blog post—I have to split it up. Today I’ll mostly talk about APIs that deal with individual values. I’ll save Streams for the next post, and I may even need a third to cover the whole package.

AsyncMemoizer

If you’re writing synchronous code and you want to provide access to a value in a class, you just define it as a field. If that value should only be computed when it’s accessed, you can memoize it by writing a getter that sets a field the first time it’s accessed.

String get contents {
  if (_contents == null) _contents = new File(this.path).readAsStringSync();
  return _contents;
}
String _contents;

But what if the value can only be computed asynchronously? You still want to compute it lazily, but you don’t want to have every access recompute it from scratch. Once the computation has started, future accesses should return the same future. You can implement this manually, but it’s a pain; AsyncMemoizer makes it easy:

Future<String> get contents => _contentsMemo.runOnce(() {
  return new File(this.path).readAsString();
});
final _contentsMemo = new AsyncMemoizer<String>();

The first time you call runOnce() on a given memoizer, it invokes the callback and returns its return value as a Future. After that, all calls to runOnce() don’t use the callback at all; they just return the same value. Make sure you always pass the same callback—otherwise you may not be able to tell which one will run!
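To see what that behavior amounts to, here is a stripped-down stand-in written with dart:async only. `ManualMemo` is a hypothetical class, not the package's implementation, and it leaves out the `future` property.

```dart
import 'dart:async';

/// Hand-rolled stand-in for the memoization pattern: the first call starts
/// the computation and every later call gets the same future back.
class ManualMemo<T> {
  Future<T>? _future;

  /// Whether [runOnce] has been called, even if the result isn't ready yet.
  bool get hasRun => _future != null;

  /// Runs [computation] on the first call only; later callbacks are ignored.
  Future<T> runOnce(FutureOr<T> Function() computation) =>
      _future ??= Future.sync(computation);
}
```

Passing two different callbacks makes the caveat concrete: only the first one ever runs, and the second is silently dropped.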

Some readers may be wondering why the callback isn’t passed to the constructor. After all, it’s only invoked once—future calls to runOnce() just throw it away! The answer is that we want code using AsyncMemoizer to be able to look like the snippet above, with the code the getter executes right there in the body of the getter. We also want the memoizer itself to be usable as a final variable, which means that its constructor couldn’t refer to other fields in the class (like this.path).

Another common use of AsyncMemoizer doesn’t even use the return value of the callback. It just uses the memoizer to ensure that a method’s body is only executed once, and that it always returns the same future. Which, it turns out, is exactly how close() methods work for a lot of classes.

bool get isClosed => _closeMemo.hasRun;

Future close() => _closeMemo.runOnce(() async {
  await _engine.close();
  await _loader.close();
});
final _closeMemo = new AsyncMemoizer();

Notice the isClosed getter in that example. AsyncMemoizer exposes a hasRun property specifically to make that sort of getter possible: hasRun returns true if runOnce() has been called, regardless of whether the callback has completed. It also has a future property, which returns the same future as runOnce() without actually running the memoizer.

Result

When working asynchronously, values and errors are often treated as two sides of the same coin. A future completes with either a value or an error, and a stream emits value events and error events. But in the synchronous world, errors are completely different than values, and that can cause friction when moving between synchronous and asynchronous code.

That’s what the Result class is for. Each Result is either a value or an error, and whichever it is is accessible synchronously. It has two subclasses, one for each state: ValueResult has a value getter, and ErrorResult has error and stackTrace getters. If you have a Result, you can use isValue and isError to easily check its type, followed by asValue and asError to easily cast it to the proper type.

You can create Results manually using new Result.value() or new Result.error(), but there are utility functions to convert from asynchronous objects: Result.capture() turns a Future into a Future<Result>, and Result.captureStream() turns a Stream into a Stream<Result>. Errors that would have been emitted using the normal future or stream error channels are turned into ErrorResults instead.

You can also reverse this process using Result.release() and Result.releaseStream(). These take a Future<Result> and a Stream<Result>, respectively, and convert ErrorResults to normal error events.

Result has some instance methods for moving back to the async world too. The asFuture getter returns a future that completes to the Result’s value or error, and complete() completes a Completer so that its future does the same. For streams, you can use addTo() to add the value or the error to an EventSink.

ResultFuture

Sometimes you want limited synchronous access to a future. Maybe you want to do something with its value if it exists, but not wait for it if it doesn’t. The ResultFuture class makes this possible by exposing a result getter. Before the future has completed, this is just null; afterwards, it’s a Result. Otherwise, the ResultFuture is just a normal future that works like futures work.

// A Shelf handler that forwards requests to a `Future<Handler>`.
class AsyncHandler {
  final ResultFuture<shelf.Handler> _future;

  AsyncHandler(Future<shelf.Handler> future) : _future = new ResultFuture(future);

  call(shelf.Request request) {
    if (_future.result == null) {
      return _future.then((handler) => handler(request));
    }

    // Because [_future] is a [Future], we can return it to propagate the error.
    if (_future.result.isError) return _future;

    return _future.result.asValue.value(request);
  }
}

CancelableOperation

One cool feature of streams in Dart is that their subscriptions can be canceled. In addition to stopping any more event callbacks from firing, this indicates to the stream producer that it can stop generating events at all. Unfortunately, there’s no similar facility for futures, which is where CancelableOperation comes in.

A CancelableOperation represents an asynchronous operation that will ultimately produce a single value which is exposed as a future (and which may complete to null). It can also be canceled, which causes the value future never to complete and lets the code that created it know to stop work on the operation.

Normally when a CancelableOperation is canceled, it just doesn’t complete—which is analogous to a stream subscription not emitting any events once it’s canceled. But sometimes, especially when using async/await, this isn’t what you want. In that case, you can call valueOrCancellation(), which returns a future that completes even if the operation was canceled. By default it completes to null, but you can pass in a custom value if you want.

There are two ways to create a CancelableOperation. If you already have a value future and you just want to wrap it, you can call new CancelableOperation.fromFuture(). This also takes an onCancel callback that is called if the operation is canceled.

CancelableOperation runSuite(String path) {
  var suite;
  var canceled = false;
  return new CancelableOperation.fromFuture(() async {
    suite = await loadSuite(path);
    if (canceled) return null;

    return suite.run();
  }(), onCancel: () {
    canceled = true;
    return suite?.close();
  });
}

If the onCancel callback returns a Future, it will be forwarded to the return value for the call to cancel(). This is just like how cancelling a StreamSubscription works, except that for consistency CancelableOperation.cancel() never returns null.

You can also create a CancelableOperation using a CancelableCompleter. This works a lot like a Completer for a future: it has complete() and completeError() methods, and it exposes the operation it controls through the operation getter. But its constructor takes an onCancel callback that’s called if the operation is canceled, and it has an isCanceled getter.

/// Like [Stream.first], but cancelable.
CancelableOperation cancelableFirst(Stream stream) {
  var subscription;
  var completer = new CancelableCompleter(
    onCancel: () => subscription.cancel());

  subscription = stream.listen((value) {
    completer.complete(value);
    subscription.cancel();
  }, onError: (error, stackTrace) {
    completer.completeError(error, stackTrace);
    subscription.cancel();
  });

  return completer.operation;
}

FutureGroup

Fun fact: at least three different versions of the FutureGroup class existed across the Dart world before a canonical implementation finally ended up in the async package. That’s a pretty good indication that it’s a broadly-applicable abstraction!

A FutureGroup collects a bunch of input futures and exposes a single output future that completes when all the inputs have completed. Inputs are added using add(), and the output is called future.

Once you’ve added all the futures you want to the group, call close() to tell it that no more are coming. Some astute students of the core libraries will recognize the pattern of add() followed by close() as the hallmark of a sink—and indeed, FutureGroup implements Sink.

// An engine for running tests.
class Engine {
  final _group = new FutureGroup();

  // Completes when all tests are done.
  Future get onDone => _group.future;

  void addTest(Test test) {
    _group.add(test.run());
  }

  void noMoreTests() {
    _group.close();
  }
}

If all the futures that have been added to a FutureGroup have completed but it hasn’t been closed, we say that the group is idle. You can tell whether a group is idle using the isIdle getter. You can also use the onIdle stream to get an event whenever the group becomes idle—that is, whenever the last running future completes.

!isCompleted

I hope I’ve whetted your appetite for asynchrony, because there’s plenty more to come. Working with individual values is useful, but the bulk of the package—and, in my opinion, some of the coolest stuff it contains—has to do with streams. Check back in two weeks, when I tell you all about it!

Tuesday, March 1, 2016

Unboxing Packages: string_scanner

Parsing is tricky. There’s the high-level question of what parsing system you use—a parser generator? A handwritten recursive descent? Parser combinators?—but there’s also the issue of how you move through text and track the information you need to track. That’s what the string_scanner package is for.

The Core API

The package exposes one main class, StringScanner, as well as a few subclasses we’ll get to later. This scanner moves through a string, consuming text that matches patterns and giving the user information about it. Here are the most important parts of the API:

class StringScanner {
  Match get lastMatch;
  int position;

  StringScanner(String string);

  bool scan(Pattern pattern);
  bool matches(Pattern pattern);
  void expect(Pattern pattern, {String name});
}

Let’s walk through this, starting with position. This returns the scanner’s zero-based character position in the source string. The scanner always matches immediately after this position, and it updates the position as it consumes text. You can also set the position if you need to jump back (or forward) while parsing.

The most important method is scan(); after all, it’s right there in the name of the class. It takes a Pattern, which in practice means a String or a RegExp, and tries to consume text that matches that pattern. If the text immediately after the current position matches, it returns true; otherwise, it returns false.

The scanner is for parsing, though, not just validating that a string is in the right format. You need to be able to get information out of it. That’s what the lastMatch getter is for. If the last call to scan() (or matches() or expect()) was successful, it contains the data for that match. This, along with the current position, is the only state that the scanner maintains.

The matches() method is just like scan(), except that it doesn’t consume any text. It just returns whether the pattern matches without changing the position at all. This is sometimes used to detect boundaries between different constructs.

Finally, expect() works like scan() but requires that the pattern matches. If it doesn’t, it throws a StringScannerException which indicates that parsing failed. By default this exception includes the pattern that was being used, but the name parameter can be used to provide a more human-readable name for the missing token.

Here’s a simple number parser using these methods:

import 'dart:math' as math;

import 'package:string_scanner/string_scanner.dart';

// It's often a good idea to store regular expressions in variables so they only
// need to be compiled once.
final _digits = RegExp(r'[0-9]+');

num parseNum(String text) {
  var scanner = StringScanner(text);

  // Don't require a whole component so that ".123" works.
  var whole = 0;
  if (scanner.scan(_digits)) {
    // lastMatch is non-null here because scan() just succeeded.
    whole = int.parse(scanner.lastMatch![0]!);
  }

  // If there's no dot, exit immediately.
  if (!scanner.scan('.')) {
    // I'll get to this later on.
    scanner.expectDone();
    return whole;
  }

  // If there is a dot, there must be trailing digits.
  scanner.expect(_digits, name: '0 through 9');
  var decimal = scanner.lastMatch![0]!;
  var result = whole + int.parse(decimal) * math.pow(10, -decimal.length);

  scanner.expectDone();
  return result;
}

Character-Based Parsing

Some things you want to parse don’t work all that well with strings or regular expressions. You might need to make decisions based on every character individually. To make this possible, the scanner also supports going character-by-character. Like Dart’s strings themselves, it works in terms of UTF-16 code units.[1]

The readChar() method consumes and returns a single character. It moves the current position forward by one. It returns the characters as integers, which is generally more efficient than single-character strings. I recommend using the constants defined in the charcode package when dealing with these integers.

The peekChar() method is also useful when doing character-based parsing. peekChar() is to readChar() as matches() is to scan(). It returns the same information (the next character), but it doesn’t move the position at all. In addition, peekChar() takes an optional offset parameter that allows you to peek at a character after (or even before) the current position.

These two methods do different things if the scanner is at the end of the text. readChar() will throw an error, complaining that it expected more input. peekChar(), on the other hand, will just return null.
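The end-of-text behavior is easiest to see in a loop. The following is a small sketch in null-safe Dart; the two helper functions are hypothetical, and the code units are written out as constants so that it needs nothing beyond string_scanner.

```dart
import 'package:string_scanner/string_scanner.dart';

// Code units for the characters used below.
const _space = 0x20;
const _slash = 0x2f;

/// Skips leading spaces and returns the first non-space code unit,
/// or null if the text is empty or contains only spaces.
int? firstNonSpace(String text) {
  final scanner = StringScanner(text);

  // peekChar() returns null at the end of the text, so this loop stops
  // there without throwing.
  while (scanner.peekChar() == _space) {
    scanner.readChar();
  }

  // Guard with isDone, because readChar() throws when no input is left.
  return scanner.isDone ? null : scanner.readChar();
}

/// Reports whether the text starts with "//" without consuming anything.
bool startsWithLineComment(String text) {
  final scanner = StringScanner(text);

  // The offset argument looks one code unit past the current position.
  return scanner.peekChar() == _slash && scanner.peekChar(1) == _slash;
}
```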

Here’s a character-based integer parser:

import 'package:charcode/ascii.dart';

import 'package:string_scanner/string_scanner.dart';

int parseInt(String text) {
  var scanner = StringScanner(text);
  var value = 0;

  // The isDone getter indicates whether the scanner is at the end of the text.
  while (!scanner.isDone) {
    var char = scanner.readChar();

    // The $0 and $9 constants are defined in the charcode package.
    if (char < $0 || char > $9) {
      // I'll talk about error() in the next section.
      scanner.error("Invalid character.", position: scanner.position - 1);
    }

    value *= 10;
    value += char - $0;
  }

  return value;
}

Emitting Errors

I’ve already talked about expect(), which is the most common way to emit errors when using a scanner. But it’s not the only way. What if your character-based parser runs into an error? What if you only discover that a token is invalid after you’ve already parsed it? What if the source has extra junk at the end after you’ve finished parsing?

The most flexible error-emitting method is error(). It can emit an error associated with any chunk of the text at all—all you need to do is pass in position and length parameters. Most of the time you just need the match parameter, which takes a Match object that was returned by lastMatch earlier on. Or you can choose not to pass in any location information at all, and it will default to associating the error with lastMatch (or the current position if lastMatch is null).

If you’re wondering what it means for an error to be “associated with” a match, I encourage you to read my article about source_span. StringScannerException inherits from SourceSpanFormatException, so it uses a source span to indicate where the error occurred in the text.

Source spans can have sourceUrls associated with them, which help the user know where the errors were caused. If you want your parse errors to have file information, you can pass in an optional sourceUrl parameter (which can be a String or a Uri) to new StringScanner().

There’s one other error-emitting method, called expectDone(). It’s pretty single-purpose, but it’s useful all the same. If the scanner isn’t at the end of the text, it emits an error; otherwise, it does nothing. This is really useful when you’re parsing a single value and you want to ensure there are no extra characters in the string after it.
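As a rough illustration of how these pieces fit together, here is a sketch of a parser that only discovers a token is invalid after scanning it. The parseByte() function and its 0 to 255 range are invented for this example, and the code assumes null-safe Dart.

```dart
import 'package:string_scanner/string_scanner.dart';

final _digits = RegExp(r'[0-9]+');

/// Parses a hypothetical "byte" value: an integer from 0 through 255.
///
/// [sourceUrl] may be a String or a Uri and is attached to any error.
int parseByte(String text, {Object? sourceUrl}) {
  final scanner = StringScanner(text, sourceUrl: sourceUrl);

  scanner.expect(_digits, name: 'a number');
  final match = scanner.lastMatch!;
  final value = int.parse(match[0]!);

  // The token is well-formed, but its value is out of range. Passing the
  // match makes the error point at the whole number.
  if (value > 255) {
    scanner.error('Must be between 0 and 255.', match: match);
  }

  // Reject trailing junk such as "12abc".
  scanner.expectDone();
  return value;
}
```

Both failure paths throw a StringScannerException, so a caller can catch that type (or FormatException) and print its message to show the offending text.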

Tracking Line Information

Some scanners, like the ones in the examples above, are just used for parsing small chunks of text without any larger context. But others involve parsing an entire file, in which case line and column numbers are often very important. The default StringScanner doesn’t track this information, but it has a LineScanner subclass that does.

LineScanner defines line and column getters that return the (zero-based) line and column at the current position. These fields are kept up-to-date as the scanner consumes additional text, so that accessing them is always fast.

There’s also a special way of changing position with a LineScanner. You can use the position setter like you would with a plain StringScanner, but it’s less efficient since it needs to calculate the new line and column. If you know in advance that you might need to jump back to the current position, you should use the state property instead.

This property returns a LineScannerState object, which stores the line, column, and position information for the scanner. Once you have this object, you can pass it back to state= to restore that exact state in the line scanner much more efficiently than setting the position.
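A typical use is speculative parsing: try one interpretation, and rewind if it doesn't pan out. The sketch below assumes null-safe Dart, and the scanLabel() helper and its "name:" syntax are hypothetical.

```dart
import 'package:string_scanner/string_scanner.dart';

final _identifier = RegExp(r'[a-z]+');

/// Tries to scan a made-up "name:" label.
///
/// If the colon is missing, rewinds so the caller can parse the same text
/// as something else.
bool scanLabel(LineScanner scanner) {
  // Capture the line, column, and position together before speculating.
  final start = scanner.state;

  if (scanner.scan(_identifier) && scanner.scan(':')) return true;

  // Not a label: restore the saved state rather than assigning to position,
  // so the line and column don't have to be recalculated.
  scanner.state = start;
  return false;
}
```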

Scanning With Source Spans

There’s one more subclass I want to talk about, and it’s the one that provides the most value. It’s SpanScanner, which inherits from LineScanner and provides access to source spans (specifically FileSpans) for the text being parsed.

The most commonly used addition in SpanScanner is the lastSpan getter. It returns a span covering the same chunk of text as lastMatch, and it’s updated or set to null under the same circumstances.

The emptySpan getter returns a span that covers no text. An empty span is useful for pointing to a specific location in the source. This one always refers to the scanner’s current position.

Finally, there’s spanFrom, the most powerful SpanScanner addition. It returns a span between two arbitrary points in the text, represented as LineScannerStates (which, if you recall, are returned by the state getter). The second state defaults to the current position, which makes it easy to get a span that covers a given chunk of scanning code.
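Here is a sketch of lastSpan and spanFrom() working together, assuming null-safe Dart. The scanCall() function and its "word(word)" syntax are made up for illustration, and the FileSpan type is imported from source_span directly, so that package would need to be listed as a dependency too.

```dart
import 'package:source_span/source_span.dart';
import 'package:string_scanner/string_scanner.dart';

final _word = RegExp(r'[a-z]+');

/// Scans a hypothetical "word(word)" call.
///
/// Returns the span of the whole call under 'call' and the span of just
/// the argument under 'argument'.
Map<String, FileSpan> scanCall(SpanScanner scanner) {
  // Remember where the construct starts.
  final start = scanner.state;

  scanner.expect(_word, name: 'function name');
  scanner.expect('(');
  scanner.expect(_word, name: 'argument');

  // lastSpan covers only the most recent match, which is the argument.
  final argument = scanner.lastSpan!;
  scanner.expect(')');

  // With a single argument, spanFrom() ends at the current position.
  return {'call': scanner.spanFrom(start), 'argument': argument};
}
```

The returned spans carry their own line and column information, so they can be stored on AST nodes and used for error messages long after the scanner has moved on.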

Lazy or Eager?

For the advanced user, there’s a choice to be made when using a SpanScanner. The choice involves the implementation of the line and column getters: they can either be derived lazily from the current position, or computed eagerly as the scanner consumes text.

Internally, a SpanScanner uses a SourceFile to generate its spans. By default, it also uses this file to get the line and column information for the current position, which means that—as I discussed in my last article—accessing these is not as efficient as possible. The trade-off is that scanning is faster because it doesn’t need to keep this information up-to-date.

If that’s not the trade-off you want to make for your code, though, you can use new SpanScanner.eager() instead of the default constructor. This returns a scanner that works like LineScanner: it updates the line and column as it goes, so accessing them is very fast.
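Because both constructors produce a SpanScanner, switching between them is a one-line change. The following sketch (null-safe Dart, with a hypothetical lineOf() helper and eager flag) shows the same scanning code running against either implementation.

```dart
import 'package:string_scanner/string_scanner.dart';

/// Returns the zero-based line on which [needle] first appears in [text],
/// or null if it never appears.
///
/// The [eager] flag is an assumption of this sketch; it only selects which
/// SpanScanner constructor is used.
int? lineOf(String text, String needle, {bool eager = false}) {
  final scanner = eager ? SpanScanner.eager(text) : SpanScanner(text);

  while (!scanner.isDone) {
    // The lazy scanner computes the line on demand when it is read here;
    // the eager one has been tracking it on every readChar() call.
    if (scanner.matches(needle)) return scanner.line;
    scanner.readChar();
  }
  return null;
}
```

Both variants return the same answers; only where the line-tracking cost is paid differs.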

Which implementation is faster very much depends on the specifics of your use case. Most of the time it won’t matter at all, but if parsing is a bottleneck for you, you should use Dart’s excellent profiler to see whether eager parsing makes a difference.

Parse the World

I love writing parsers. I’ve written parsers for existing formats, and I’ve written parsers for entirely new ones. The ability to write robust parsers with good APIs is an important tool in a programmer’s belt, and the string_scanner package makes it easy to get started. It makes it easy to write small parsers for small formats like the examples I’ve included here, and it makes it easy to expand those to large-scale parsers of complex formats like YAML. Next time you need to parse some text, you’ll know what to use.

Join me next week when I talk about how to bring structure to your asynchronous code.


  1. Note that some Unicode characters appear as two code units, called surrogate pairs. This is an issue inherited from Dart’s string implementation, which in turn inherited it from JavaScript.