Monday, July 9, 2012

Draft spec changes for Metadata in Dart language

Posted by Gilad Bracha


Metadata

This document fleshes out the specification for the metadata proposal given by Peter.

Spec Changes


Metadata gets its own section, which describes what a valid annotation is: either an identifier referring to a constant variable or a valid call to a constant constructor. It also includes some discussion of reflective access. The rest of the changes are modifications to the grammar in the various places metadata can appear, and a change to the syntax of raw strings (which until now relied on the @ sign).

We might also want to add metadata to libraries, parts or imports, but we’ll hold off until the planned syntax revisions for those things are finalized.

Should we allow metadata to appear everywhere, like a comment? I think so. It would apply to the AST of the following expression, statement or declaration (this needs to be well-defined). For example, it has been suggested that attaching annotations to string literals is useful for internationalization. Or should that be a special case?

Metadata


Dart supports metadata, which is used to attach user-defined annotations to program structures.

metadata:      ('@' qualified ('.' identifier)? (arguments)?)*
   ;


Metadata consists of a series of annotations, each of which begins with the character @, followed by either a reference to a compile-time constant variable or a call to a constant constructor.

Metadata is associated with the abstract syntax tree of the program construct p that immediately follows the metadata, assuming p is not itself metadata or a comment. Metadata can be retrieved at runtime via a reflective call, provided the annotated program construct p is accessible via reflection.
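As a rough illustration of reflective retrieval, the sketch below assumes the dart:mirrors library as it later shipped on the Dart VM; this draft does not itself fix a reflective API. The names Author, Proposal and authorOf are hypothetical and exist only for the example.

```dart
import 'dart:mirrors';

// Hypothetical annotation class; not part of the proposal.
class Author {
  final String name;
  const Author(this.name);
}

// The annotation is a call to a constant constructor.
@Author('Peter')
class Proposal {}

// Looks through the metadata attached to a class declaration and returns
// the name from the first Author annotation, or '' if there is none.
// Assumes dart:mirrors is available (it is not on every Dart platform).
String authorOf(Type type) {
  for (final InstanceMirror mirror in reflectClass(type).metadata) {
    final Object? value = mirror.reflectee;
    if (value is Author) return value.name;
  }
  return '';
}

void main() {
  print(authorOf(Proposal));
}
```

Nothing is evaluated for the annotation until the mirror is asked for it, which is consistent with the lazy evaluation the draft allows.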

Obviously, metadata can also be retrieved statically by parsing the program and evaluating the constants via a suitable interpreter. In fact many if not most uses of metadata are entirely static.

It is important that no runtime overhead be incurred by the introduction of metadata that is not actually used. Because metadata only involves constants, the time at which it is computed is irrelevant, so implementations may skip the metadata during ordinary parsing and execution and evaluate it lazily.

It is possible to associate metadata with constructs that may not be accessible via reflection, such as local variables (though it is conceivable that in the future, richer reflective libraries might provide access to these as well).  This is not as useless as it might seem. As noted above, the data can be retrieved statically if source code is available.

Metadata can appear before a class, typedef, type variable, constructor, factory, function, field, parameter, or variable declaration.
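To make the list of positions concrete, here is a small sketch written in the syntax Dart eventually shipped with. The annotation class Todo, the constant experimental, and the class Counter are all hypothetical names chosen for the example.

```dart
// Hypothetical annotation class; its name and field are illustrative only.
class Todo {
  final String note;
  const Todo(this.note);
}

// A compile-time constant variable, usable as an annotation by name.
const experimental = 'experimental';

@Todo('add more operations') // annotation: call to a constant constructor
class Counter<@experimental T> { // metadata on a type variable
  @experimental // annotation: reference to a constant variable
  int count = 0;

  @Todo('validate the start value') // metadata on a constructor
  Counter(this.count);

  void add(@experimental int delta) { // metadata on a parameter
    @Todo('local: visible in source, typically not via reflection')
    final int next = count + delta; // metadata on a local variable
    count = next;
  }
}

void main() {
  final counter = Counter<String>(1)..add(2);
  print(counter.count);
}
```

The annotations have no effect on how the program runs; they only become relevant to a tool or reflective client that chooses to look at them.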

5. Variables

Variables are storage locations in memory.  

variableDeclaration:      declaredIdentifier (',' identifier)*
   ;
initializedVariableDeclaration:      declaredIdentifier ('=' expression)? (',' initializedIdentifier)*
   ;
initializedIdentifierList:      initializedIdentifier (',' initializedIdentifier)*
   ;
initializedIdentifier:      identifier ('=' expression)?
   ;


declaredIdentifier:      metadata finalConstVarOrType identifier
   ;


finalConstVarOrType:      final type?
   | const type?
   | var
   | type
   ;



6. Functions


Functions abstract over executable actions.

functionSignature:      metadata returnType? identifier formalParameterList
   ;
returnType:      void
   | type
   ;

functionBody:      '=>' expression ';'
   | block
   ;

block:      '{' statements '}'
   ;



6.2.1 Positional Formals

A positional formal parameter is a simple variable declaration.

normalFormalParameter:      functionSignature
   | fieldFormalParameter
   | simpleFormalParameter
   ;

simpleFormalParameter:      declaredIdentifier
   | metadata identifier
   ;
fieldFormalParameter:      metadata finalConstVarOrType? this '.' identifier
   ;



7. Classes

A class defines the form and behavior of a set of objects which are its instances.

classDefinition:      metadata abstract? class identifier typeParameters? superclass? interfaces?
      '{' classMemberDefinition* '}'
   ;
classMemberDefinition:      metadata declaration ';'
   | metadata methodSignature functionBody
   ;

methodSignature:      factoryConstructorSignature
   | static? functionSignature
   | getterSignature
   | setterSignature
   | operatorSignature
   | constructorSignature initializers?
   ;
declaration:      constantConstructorSignature (redirection | initializers)?
   | constructorSignature (redirection | initializers)?
   | getterSignature
   | setterSignature
   | operatorSignature
   | functionSignature
   | static (final | const) type? staticFinalDeclarationList
   | const type? staticFinalDeclarationList
   | final type? initializedIdentifierList
   | static? (var | type) initializedIdentifierList
   ;
staticFinalDeclarationList:      staticFinalDeclaration (',' staticFinalDeclaration)*
   ;
staticFinalDeclaration:      identifier '=' expression
   ;

10.5 Strings


A string is a sequence of valid Unicode code points.

stringLiteral:      MULTI_LINE_STRING+
   | SINGLE_LINE_STRING+
   ;


A string can be either a single line string or a multiline string.

SINGLE_LINE_STRING:      '"' STRING_CONTENT_DQ* '"'
   | '\'' STRING_CONTENT_SQ* '\''
   | 'r' '\'' (~( '\'' | NEWLINE ))* '\''
   | 'r' '"' (~( '"' | NEWLINE ))* '"'
   ;

A single line string is delimited by either matching single quotes or matching double quotes.

Hence, 'abc' and "abc" are both legal strings, as are 'He said "To be or not to be" did he not?' and "He said 'To be or not to be' didn't he?". However, "This ' is not a valid string, nor is 'this".

The grammar ensures that a single line string cannot span more than one line of source code, unless it includes an interpolated expression that spans multiple lines.

Adjacent single line strings are implicitly concatenated to form a single string literal.
Here is an example:

print("A string" "and then another"); // prints: A stringand then another


Early versions of Dart used the operator + for string concatenation. However, this was dropped, as it leads to puzzlers such as

print("A simple sum: 2 + 2 = " +
          2 + 2);

which prints 'A simple sum: 2 + 2 = 22' rather than 'A simple sum: 2 + 2 = 4'.
Instead, the recommended Dart idiom is to use string interpolation.

print("A simple sum: 2 + 2 =  ${2+2}");

String interpolation works well for most cases. The main situation where it is not fully satisfactory is for string literals that are too large to fit on a line. Multiline strings can be useful, but in some cases, we want to visually align the code. This can be expressed by writing smaller strings separated by whitespace, as shown here:

'Imagine this is a very long string that does not fit on a line. What shall we do? '
'Oh what shall we do? '
'We shall split it into pieces '
'like so'
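The two idioms above, interpolation and adjacent-literal concatenation, can be combined in ordinary code. The following is a minimal sketch; the function names sumMessage and longMessage are hypothetical.

```dart
// Interpolation evaluates the expression and splices its string form in.
String sumMessage() => "A simple sum: 2 + 2 = ${2 + 2}";

// Adjacent string literals separated only by whitespace form one literal,
// so a long message can be split across lines and aligned.
String longMessage() =>
    'Imagine this is a very long string that does not fit on a line. '
    'We shall split it into pieces '
    'like so';

void main() {
  print(sumMessage());
  print(longMessage());
}
```

Note that no separator is inserted between adjacent literals, so any space between the pieces has to be written inside one of them.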


MULTI_LINE_STRING:      '"""' STRING_CONTENT_TDQ* '"""'
   | '\'\'\'' STRING_CONTENT_TSQ* '\'\'\''
   | 'r' '"""' (~( '"""' ))* '"""'
   | 'r' '\'\'\'' (~( '\'\'\'' ))* '\'\'\''
   ;


ESCAPE_SEQUENCE:      '\n'
   | '\r'
   | '\f'
   | '\b'
   | '\t'
   | '\v'
   | '\x' HEX_DIGIT HEX_DIGIT
   | '\u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
   | '\u{' HEX_DIGIT_SEQUENCE '}'
   ;

HEX_DIGIT_SEQUENCE:      HEX_DIGIT HEX_DIGIT? HEX_DIGIT? HEX_DIGIT? HEX_DIGIT? HEX_DIGIT?
   ;


Multiline strings are delimited by either matching triples of single quotes or matching triples of double quotes. If the first line of a multiline string consists solely of whitespace characters then that line is ignored, including the new line at its end.

Strings support escape sequences for special characters. The escapes are:
  • \n for newline, equivalent to \x0A.
  • \r for carriage return, equivalent to \x0D.
  • \f for form feed, equivalent to \x0C.
  • \b for backspace, equivalent to \x08.
  • \t for tab, equivalent to \x09.
  • \v for vertical tab, equivalent to \x0B.
  • \xHEX_DIGIT1 HEX_DIGIT2, equivalent to \u{ HEX_DIGIT1 HEX_DIGIT2}.
  • \uHEX_DIGIT1 HEX_DIGIT2 HEX_DIGIT3 HEX_DIGIT4, equivalent to \u{ HEX_DIGIT1 HEX_DIGIT2 HEX_DIGIT3 HEX_DIGIT4}.
  • \u{HEX_DIGIT_SEQUENCE} is the Unicode scalar value represented by the HEX_DIGIT_SEQUENCE. It is a compile-time error if the value of the HEX_DIGIT_SEQUENCE is not a valid Unicode scalar value.
  • $ indicating the beginning of an interpolated expression.
  • Otherwise, \k indicates the character k for any k not in {n, r, f, b, t, v, x, u}.
It is a compile-time error if a non-raw string literal contains a character sequence of the form \x that is not followed by a sequence of two hexadecimal digits. It is a compile-time error if a non-raw string literal contains a character sequence of the form \u that is not followed by either a sequence of four hexadecimal digits or a curly-brace-delimited sequence of hexadecimal digits.

Any string may be prefixed with the character r, indicating that it is a raw string, in which case no escapes or interpolations are recognized.
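The equivalences in the list above, and the effect of the r prefix, can be checked directly. This is a small sketch; the function names are hypothetical.

```dart
// Each escape on the left denotes the same string as the form on the right.
bool escapesAgree() =>
    '\n' == '\x0A' &&
    '\r' == '\x0D' &&
    '\t' == '\u0009' &&
    '\x41' == '\u{41}' &&
    '\q' == 'q'; // \k for any other k is just the character k

// In a raw string the backslash is an ordinary character,
// so r'\n' has two characters while '\n' has one.
int rawLength() => r'\n'.length;
int cookedLength() => '\n'.length;

// A raw string does not interpolate: the $ is kept as written,
// which matches an escaped dollar sign in a non-raw string.
bool rawKeepsDollar() => r'$x' == '\$x';

void main() {
  print(escapesAgree());
  print(rawLength());
  print(cookedLength());
  print(rawKeepsDollar());
}
```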



STRING_CONTENT_DQ:      ~( '\' | '"' | '$' | NEWLINE )
   | '\' ~( NEWLINE )
   | STRING_INTERPOLATION
   ;

STRING_CONTENT_SQ:      ~( '\' | '\'' | '$' | NEWLINE )
   | '\' ~( NEWLINE )
   | STRING_INTERPOLATION
   ;

STRING_CONTENT_TDQ:      ~( '\' | '"' | '$' )
   | '\' ~( NEWLINE )
   | STRING_INTERPOLATION
   ;

STRING_CONTENT_TSQ:      ~( '\' | '\'' | '$' )
   | '\' ~( NEWLINE )
   | STRING_INTERPOLATION
   ;
NEWLINE:      \n
   | \r
   ;

All string literals implement the built-in interface String. It is a compile-time error for a class or interface to attempt to extend or implement String. The static type of a string literal is String.

   

String Interpolation


It is possible to embed expressions within non-raw string literals, such that these expressions are evaluated, and the resulting values are converted into strings and concatenated with the enclosing string. This process is known as string interpolation.

STRING_INTERPOLATION:      '$' IDENTIFIER_NO_DOLLAR
   | '$' '{' expression '}'
   ;

The reader will note that the expression inside the interpolation could itself include strings, which could again be interpolated recursively.


An unescaped $ character in a string signifies the beginning of an interpolated expression.  The $ sign may be followed by either:
  • A single identifier id that must not contain the $ character.
  • An expression e delimited by curly braces.

The form $id is equivalent to the form ${id}. An interpolated string 's1${e}s2' is equivalent to the concatenation of the strings 's1', e.toString() and 's2'. Likewise, an interpolated string "s1${e}s2" is equivalent to the concatenation of the strings "s1", e.toString() and "s2". In both cases, it is a runtime error if e.toString() does not return an object of type String.
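The equivalence described above can be sketched as follows. The class Point and the three functions are hypothetical; the explicit concatenation uses List.join simply to avoid relying on any particular concatenation operator.

```dart
// Hypothetical class with a user-defined toString().
class Point {
  final int x, y;
  const Point(this.x, this.y);

  @override
  String toString() => '($x, $y)'; // $x is shorthand for ${x}
}

// 's1${e}s2' form.
String viaInterpolation(Object e) => 'at ${e} now';

// The $id shorthand for the same string.
String viaShorthand(Object e) => 'at $e now';

// Explicit concatenation of 's1', e.toString() and 's2'.
String viaConcatenation(Object e) => ['at ', e.toString(), ' now'].join();

void main() {
  const p = Point(1, 2);
  print(viaInterpolation(p));
  print(viaShorthand(p));
  print(viaConcatenation(p));
}
```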


13.3.1 Typedef


A type alias declares a name for a type expression.

functionTypeAlias:      metadata typedef functionPrefix typeParameters? formalParameterList ';'
   ;
functionPrefix:      returnType? identifier
   ;
The effect of a type alias of the form typedef T id (T1 p1, .., Tn pn, [Tn+1 pn+1, …, Tn+k pn+k]) declared in a library L is to introduce the name id into the scope of L, bound to the function type (T1, .., Tn, [Tn+1 pn+1, …, Tn+k pn+k]) → T. If no return type is specified, it is taken to be Dynamic. Likewise, if a type annotation is omitted on a formal parameter, it is taken to be Dynamic.

Currently, type aliases are restricted to function types. It is a compile-time error if any default values are specified in the signature of a function type alias. It is a compile-time error if a typedef refers to itself via a chain of references that does not include a class or interface type.

Hence

typedef F F(F f);

is illegal, as are

typedef B A();
typedef A B();

but

typedef D C();
class D { C foo(){}}

is legal, because the reference goes through a class declaration.
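For a non-recursive case, the following sketch shows a function type alias carrying metadata and then being used as a parameter type. The annotation class Doc and the names Compare, ascending and smallest are hypothetical.

```dart
// Hypothetical annotation class, used only to show metadata on a typedef.
class Doc {
  final String text;
  const Doc(this.text);
}

// Introduces the name Compare into the library scope,
// bound to the function type (int, int) → int.
@Doc('orders two ints')
typedef int Compare(int a, int b);

int ascending(int a, int b) => a - b;

// Any function with a matching signature can be passed where Compare
// is expected. Assumes values is non-empty.
int smallest(List<int> values, Compare compare) {
  var best = values.first;
  for (final v in values) {
    if (compare(v, best) < 0) best = v;
  }
  return best;
}

void main() {
  print(smallest([3, 1, 2], ascending));
}
```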