Aggressive Unboxing in the Dart VM
Natal-RN
October 2020
Victor Agnez Lima
Advisor
Elizabeth Ferreira Gouvêa Goldbarg, PhD (DIMAp - UFRN)
Co-supervisor
Stefan Martin Kustermann, MSc (Google LLC)
I’d like to start by thanking my advisor Elizabeth Goldbarg for the guidance and
learning experiences she has given to me during my graduation course, and my co-supervisor
Martin Kustermann, who taught me and helped me throughout and after my internship.
They were always ready to provide me with the best advice whenever I needed it. I would
also like to thank professors Monica Pereira and Martin Musicante, for accepting to be part
of the examining board, and my former manager Samir Jindel, for the amazing experience
with my internship project.
In addition, I would like to thank Google LLC for the opportunity of doing such an
internship, and my university UFRN for the academic support and for making all of this
possible, especially the Programa Talento Metrópole, for empowering me to be here and
go further.
Finally, regarding aspects not directly related to this work, I would like to thank my
family and my dear Cecília, for all the support, encouragement and unconditional love.
Also, I want to thank my friends (classmates and professors) from UFRN, IFRN, ICPC
(especially the contestants, coaches and organizers that make ICPC possible at UFRN),
CEI and all the other friends I met along this journey, for all the lessons, laughs, good
talks and pleasant moments we had together, particularly my best friend for more than 20
years, Luiza Egito. I wouldn’t be here without all of you.
“What we know is a drop, what we don’t know is an ocean.”
Isaac Newton
Abstract

Dart is an open-source programming language developed by Google LLC and popular for the development of applications for mobile devices. This work presents a series of improvements made to the optimizing compiler of the Dart Virtual Machine, developed during an internship project with the goal of making applications smaller and faster. To this end, the project focused on keeping more numeric values unboxed, adding support for instances with unboxed fields and for methods to return and receive such values as parameters. As a result, the execution time of several Dart applications was significantly reduced, and important mobile applications became more than 2% smaller.
List of figures
List of tables
List of algorithms
List of abbreviations
1 Introduction p. 15
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . p. 20
1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . p. 22
1.4 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . p. 22
1.5 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . p. 22
3.1.6 Intrinsics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . p. 50
5 Final Remarks p. 67
References p. 68
CL — Changelist
GC — Garbage Collector
IL — Intermediate Language
OS — Operating System
PC — Program Counter
VM — Virtual Machine
1 Introduction
Originally designed with an unsound type system, Dart 1.x (or simply Dart 1) was an
optionally typed language. This means, in particular, that types had no effect on runtime
semantics, being used mainly as annotations to improve readability and for documentation.
However, in August 2018, Dart 2.0 was released, containing several updates to the language,
including a new and sound type system and turning Dart into a statically typed language.
Thanks to its sound type system, users of Dart 2 can trust that their programs will never reach a state where an expression evaluates to a value that is not permitted by the static type of that expression.
The new type system is also expected to improve even more: every type in Dart is
currently nullable, that is, the null value can be assigned to variables of any type. This will
not be true after a new feature, called Non-Nullability By Default (NNBD) and currently
available as a technology preview, is released. With this feature, type annotations that are
not marked as nullable will not accept null values anymore. These changes will ensure null
safety, avoiding bugs and allowing the compiled code to be further optimized.
The syntax of Dart is similar to some popular programming languages such as C++
(STROUSTRUP, 2000) and Java (ARNOLD et al., 2000). Users of Dart do not need to manage
memory: for this, it uses a generational garbage collector. Dart has single (multilevel)
inheritance, but it also supports implementing interfaces, which are implicitly defined for
each class, and using mixins, which are a way of reusing a given class body in multiple
class hierarchies. The following books present overviews of the language: (WALRATH;
LADD, 2012), (BALBAERT, 2014), (KOPEC, 2014), (BRACHA, 2015) and (SINHA, 2020).
Currently, one of the main uses of Dart is through the Flutter framework (official
website: https://flutter.dev/), a cross-platform toolkit for developing mobile, web and
desktop applications that has Dart as the development language. Web applications in Dart
can be compiled to the JavaScript language (GOODMAN; EICH, 2001), while mobile and
desktop applications can run using Dart Native compilation.
Dart Native compiles Dart code to machine code of the target architecture, supporting
ARMv7 (32-bit), ARMv8 (64-bit) and x86-64 (also called x64) architectures¹. It can run
in Just In Time (JIT) mode and Ahead Of Time (AOT) mode.
On the one hand, JIT mode enables a faster development cycle, as it allows the
binaries to change dynamically; performs runtime optimizations; and supports
hot reload, which injects changes in the source code into the running program, preserving
its current state — making it unnecessary to restart the application in order to run
the updated code. On the other hand, Flutter uses AOT for production as it results in
optimized and predictable code with a lower startup latency and smaller code size, which
is critical for mobile applications. This project focuses on the AOT compilation mode.
Since Dart 2 was released, AOT mode has been remarkably improved, aiming to
generate more efficient and smaller code. This was possible thanks to many of the new
changes, including the sound type system. Another change in the language that came
with Dart 2 was the semantics around integer values: in Dart 1, integers had arbitrary
precision, while Dart 2 introduced 64-bit integers with wrap-around semantics, that is, all
arithmetic operations are performed modulo 2⁶⁴ (arbitrary precision integers became a
separate class named BigInt). This change was convenient because, in practice, before
that, every integer operation had to check if the result overflowed and possibly allocate an
integer object with higher precision to store the result. There were three ways to represent
the primitive integers, which will be discussed in detail ahead: smis (small integers),
mints (medium integers) and big-ints. In JIT mode, the code for those allocations could
be generated lazily when overflows were detected; in AOT mode, however, each such
operation would have to contain the code for all integer types, resulting in much bigger
generated code and, in many cases, worse runtime performance. The 64-bit integers avoid
those issues and have a range of values that is suitable for most real world applications,
¹ JIT mode also supports the IA-32 architecture.
justifying the decision — in general, the cases that need big-ints are not only rare but
also easy to detect.
Integer objects (mints), like all other objects in Dart, have a uniform layout, which
starts with a one-word size header, containing information such as an integer representing
its class (the class ID), among other information. After the header, comes the data inside
the object: for most types, including all user-defined types, it contains the contents of
the fields enclosed by that object, which consisted only of tagged pointers when this
project started; the exceptions are some predefined classes, such as numeric primitives
(e.g. a floating-point object), which do not contain fields, but, instead, contain the bits
representing their values. Pointers are used because the type of each field might change at
runtime: although each field has a static type, it can hold values of any subtype of that
type, including null. Also, every Dart object has an alignment of at least one word.
The reason why all pointers in Dart are said to be tagged is related to a particular
way of handling integers: the aforementioned smi. It was previously noted that big-ints
are rarely used; in fact, most integers in real world programs fit in a much smaller range,
which is used by smis: the range of signed integers that can be represented with fewer bits
than one word (i.e., 31 bits on 32-bit architectures, and 63 bits on 64-bit architectures).
For this reason, the following optimization is performed: for each value that fits in the smi
range, instead of allocating an object to hold this integer, the value is shifted one bit to
the left, and the least significant bit is set to 0. Additionally, the least significant bit of
each pointer is set to 1, so that it is possible for the VM to distinguish between tagged
pointers and smis, by checking the value of the least significant bit — called tagging bit.
This is safe for two reasons. For pointers, the least significant bit of the address
of any object must be 0, due to the alignment of Dart objects. For smis, thanks to
the way the range of values was chosen, the bits discarded (by the left shift
and by truncating the value to one-word size) are all equal to the most significant
remaining bit (all of them 0 if the value is non-negative, or 1 if it is negative), so
the operation does not lose information either.
class Point {
var x, y;
}
storeValues(xValue, yValue) {
globalPoint.x = xValue;
globalPoint.y = yValue;
}
main() {
storeValues(50, 5000000000);
}
Program 1: Sample program that stores values 50 and 5000000000 into fields.
Program 1 is a Dart program that stores the values 50 and 5000000000 into fields x and y,
respectively, of globalPoint, a global variable of type Point. Figure 1 illustrates what
happens when those values are stored on a 32-bit architecture.
At the top of figure 1, we have the decimal notation of the value to store into each of
the two fields. Since this program is assumed to run on a 32-bit architecture,
the range of smi values is [−2³⁰, 2³⁰ − 1]. The example on the left shows that, since the value
50 is inside this range, it is represented as a smi, being truncated and shifted to the left.
However, on the right, the value does not fit in the smi range, therefore an allocation
needs to be performed: a mint object (which requires 1 word for the header and 2 words
for the data) is allocated, then the value 5000000000 is stored inside that object, and the
address of that object is used (in this example, the address was 0xFFAABB88), having its
tagging bit set to 1 to indicate that it is not a smi (so it would become 0xFFAABB89). The
output at the bottom of the figure contains the bits that would in fact be stored into the
field. Such an operation, which receives a scalar value and returns a tagged pointer (or smi), is
called a boxing operation (and its inverse is an unboxing operation). Additionally, figure 2
illustrates what would be the layout of objects of program 1 before and after calling the
method storeValues for that execution. For the sake of simplicity, this figure elides any
padding added to the objects due to their alignment.
Smis are useful to avoid pointer dereferencing when retrieving their scalar values and,
mainly, to avoid excessive object allocations. Retrieving the value from (i.e., untagging) a
smi can be done by simply performing a signed right shift on its bits; similarly, retrieving
the pointer address from a tagged pointer only requires the tagging bit to be reset. One
might wonder why the least significant bit is set to 1 for pointers and
0 for smis and not the opposite; in fact, some benefits come from this choice: it allows
arithmetic operations such as addition and subtraction to be performed directly on smis,
without having to untag them, as long as the operation does not overflow. Besides using
object alignment to tag pointers, there are many other approaches for tagging words that
could have been used instead, many of which are explained in (GUDEMAN, 1995). For
instance, the set of 64-bit floating-point values that are interpreted as Not a Number
(NaN) according to the IEEE standard (ZURAS et al., 2008) is large and would allow 51
bits to be encoded.
All information presented in this document about the Dart compiler refers to the latest
version of the Dart SDK repository on GitHub (https://github.com/dart-lang/sdk)
available on March 6, 2020.
1.2 Motivation
A value is said to be boxed (or tagged) if it is a smi or a tagged pointer to
an object, and unboxed when it is a scalar value (e.g. an int value is unboxed if it stores
the integer value itself). As noted previously, boxed values are important in order to allow
variables to hold values of different types. Even if variables are annotated to be of double
type, for instance, boxed values would be needed to distinguish the null value from the
others. However, when compiling in AOT mode, it is possible to prove in certain scenarios
that some variables will never hold the null value. This is done by a global static analysis
called Type Flow Analysis (TFA). In addition, as Dart is expected to enable NNBD in
the near future, users of Dart will be able to state whether their types are nullable or not.
In any case, when numeric values are known not to be nullable, keeping them unboxed
class Wrapper {
double value;
Wrapper(this.value);
increment(double toAdd) {
value += toAdd;
}
}
main() {
final w = Wrapper(3.0);
w.increment(1.5);
print(w.value);
}

Program 2: A class that wraps a floating-point value and has a method to increment it.
would allow better performance and memory management, since it would avoid object
allocations and pointer dereferencing when operating those values, and also result in a
smaller code size, which is currently one of the main goals of the Dart compiler. For this
reason, the project focused on adding features to allow more values to be unboxed. The
main changes performed in the Dart compiler were supporting classes to hold unboxed
fields and functions to receive and return unboxed values. Also, the boxing instructions of
integers on ARMv7 were optimized to reduce the generated code size.
Consider, for example, a class that wraps a floating-point value and has a method
to increment that value. Program 2 gives one possible implementation for it. Before the
changes presented in this project, the field value would always be boxed, even if it always
stores a non-nullable double. Similarly, the increment method would always expect the
parameter toAdd to be boxed. Then, whenever a call is performed to increment, it would
be necessary to unbox the variables value and toAdd, sum those values, perform a boxing
operation — which includes an allocation — and, finally, store the result into value.
Allowing parameters and fields to be unboxed would make this operation simpler: the
parameter and the field would already be unboxed, so it would be possible to sum them
directly, and then the result would be ready to be stored into the field without performing
any allocation. This example illustrates some of the improvements this project provides to
the Dart compiler. Here, we also ignore any optimization the compiler could have done
due to the simplicity of the program, such as inlining the method increment or replacing
its parameter by a constant (since there is only one value passed to it, 1.5).
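The saved work can be sketched in IL-like pseudocode (hypothetical instruction names modeled on the ones described in this document; not actual compiler output):

```
// Boxed (before this project):
v1 <- LoadField(this, value)         // loads a boxed double
v2 <- Unbox(v1)
v3 <- Unbox(toAdd)                   // parameter arrives boxed
v4 <- BinaryDoubleOp(+, v2, v3)
v5 <- Box(v4)                        // heap allocation
StoreInstanceField(this, value, v5)

// Unboxed (after this project):
v1 <- LoadField(this, value)         // already an unboxed double
v2 <- BinaryDoubleOp(+, v1, toAdd)   // parameter arrives unboxed
StoreInstanceField(this, value, v2)
```

The unbox/box pair, and with it the allocation on every call, disappears entirely.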
1.3 Objectives
1.4 Methodology
The impact of the project was assessed by running several benchmarks on the internal
system, which gives statistics about runtime and generated code size, among others. The
results compared a patch that disabled all changes of this project in the code base
against the master branch after the entire project was submitted.
1.5 Contribution
Chapter 2 presents the changes that were submitted while developing support for
unboxed fields; additionally, it includes technical information about Dart AOT compilation
(section 2.1), the internship's starting task of updating the boxing instructions (section
2.2) and changes made to add support for inlining SIMD arithmetic operations (section
2.8). Chapter 3 focuses on allowing parameter and return values to be unboxed; and it is
divided into two sections. Section 3.1 presents the changes for supporting unboxed values
when performing calls to static methods and constructors, while section 3.2 extends the
support to non-static methods as well. Chapter 4 presents and discusses the results this
work achieved on several benchmarks, responsible for assessing both Flutter applications
and Dart AOT compilation. Finally, chapter 5 offers some final considerations and possible
future work.
Firstly, AOT compilation of Dart is divided into three main parts: the Common Front
End (CFE), followed by several transformations, and then the Dart VM (or simply VM).
CFE is responsible for parsing Dart programs, building the type hierarchy, reporting
syntax and static semantic errors, and generating the augmented abstract syntax tree (AST),
which is also called the kernel. Then, a series of transformations is made on the kernel,
including a global static analysis called Type Flow Analysis (TFA), which identifies (and
removes) unused methods and classes, and obtains more precise information on the types
of fields and variables, including nullability. The kernel is annotated with this information
as metadata and, after all transformations are done, it is serialized. Finally, the Dart VM
loads all kernel information, builds the control-flow graph (CFG) of each function, provides
the execution environment for the Dart program and generates the binaries. In AOT mode,
the VM is also divided into two parts called precompiler, which runs before executing the
Dart code, and precompiled runtime, the runtime environment. Both CFE and the kernel
transformations are written in Dart, while the VM is written in C++. Figure 3 presents a
diagram of the AOT compilation.
The vertices of the control-flow graph are blocks of low level instructions of an
intermediate representation, called Intermediate Language (IL). This representation is
in Static Single Assignment (SSA) form, meaning that it always defines variables before
using them, and each variable is assigned only once. The IL is the same for every target,
but each IL instruction might generate different assembly instructions depending on the
target architecture.
The “static” part of SSA means that there is only one definition-site for each variable,
even though it could be inside a loop that is executed multiple times at runtime. SSA
form is commonly used because it brings many advantages to data-flow analysis and
optimization algorithms, as it allows each SSA variable to be associated with its unique
definition and a list of all uses this variable has. Practical examples of optimizations
that might take advantage of SSA form are: dead code elimination, detecting redundant
computations, and detecting equality of variables (ALPERN; WEGMAN; ZADECK, 1988),
(ROSEN; WEGMAN; ZADECK, 1988).
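As a small illustration (IL-like pseudocode with hypothetical instruction names, not actual compiler output), a source fragment that assigns twice to the same variable gets one SSA variable per assignment, so every use points to a unique definition:

```
// Source:            // SSA form:
x = 1;                v1 <- Constant(1)
x = x + 2;            v2 <- BinaryInt64Op(+, v1, Constant(2))
y = x * x;            v3 <- BinaryInt64Op(*, v2, v2)
```

Here v2 has exactly one definition and two uses (both operands of the multiplication), which is the use-def structure the optimizations above rely on.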
The list below provides examples of IL instructions that were relevant to the development
of this project:
• Unbox instruction: The UnboxInstr receives a tagged value and returns the unboxed
representation of it. It also has more specific subtypes such as UnboxInt64Instr.
• Load field instruction: Receives a tagged pointer to an object and returns the value
of a given field of that object. Implemented as LoadFieldInstr.
• Store field instruction: Receives two variables, the first one being a tagged pointer to
an object. It stores into a given field of that object the value of the second variable.
The C++ class for this instruction is StoreInstanceFieldInstr.
• Check null instruction: The CheckNullInstr receives a tagged value and checks if it
is null. It is used, for instance, before accessing a field or a method of a variable of
nullable type. If the value is null, it throws an exception; otherwise, the execution
simply proceeds to the next instruction.
• Call instructions: Used to perform a call to a target function. There are several
instructions for calls, such as: StaticCallInstr, used when the target function is
known at compile time; ClosureCallInstr, used for closures; NativeCallInstr,
used when the target is a native function; and instance call instructions, used for calling
a method of an object when the target needs to be looked up at runtime
(e.g. DispatchTableCallInstr, PolymorphicInstanceCallInstr, etc.). More details
about those instructions are presented in chapter 3.
• Parameter instruction: This instruction is used in entry blocks (e.g. function entry)
to define the SSA variable of an incoming parameter. It does not emit any native
code itself. The C++ class for this instruction is named ParameterInstr.
• Return instruction: The ReturnInstr is responsible for generating the native code to
return from a function. It receives a variable with the value that should be returned.
• Instructions for arithmetic operations: Those are responsible for performing arithmetic
operations on several types. Two examples are the BinaryInt64OpInstr, which
operates over 64-bit integers, and SimdOpInstr, used for operating values of SIMD
data types (e.g. Float32x4, which stores four single precision floating-point values).
• Checked smi operation instruction: The CheckedSmiOpInstr performs an arithmetic
operation whose operands are likely to be smis: the code for the smi operation is
inlined, and a runtime check is added to verify if the objects are indeed smis. If the
tagging bit of any of those objects shows that it is not a smi, then an instance call is
performed, which will look up the target method. When the operands are smis, no
call needs to be performed and the operation is very efficient.
Some IL instructions have a fast path and a slow path. For instance, the boxing
instruction of a 64-bit integer has a fast path for the case when the value is a smi, and a
slow path, which needs to allocate a new object, for when it is not. Similarly, the null check
instruction has a slow path that creates and throws an exception when it receives a null
value, and the slow path of CheckedSmiOpInstr happens when an instance call needs to
be performed.
Regarding memory management, most VM objects, just like Dart objects, are allocated
in the Dart heap. For this reason, they cannot be allocated directly using the C++ new
operator. Such objects are called heap objects (or Raw objects¹), and are always operated
on through pointers or, preferably, through object handles. An example is the RawClass object, which
holds information about a given class, such as the size of instances of that class, its library,
etc., and the handle for this object is Class. Similarly, the RawField object is handled by
Field, and contains the offset of this field relative to the enclosing object's address, among
other pieces of information. These objects are available at both compile time and runtime.
The Dart VM manages memory by using a tracing Garbage Collector (GC). Briefly,
tracing collection determines which objects should be collected based on a set of roots: it
traces the heap from those roots and considers every object referenced by them (either
directly or by a chain of references) as reachable; then, it collects only the unreachable
objects. This strategy has advantages when compared to others, such as reference counting:
for example, it allows cycles of objects that reference each other to be collected without
requiring any more sophisticated algorithm to be used for this. More details about garbage
collection strategies and implementations are presented in (JONES, 1996).
There are several implementation strategies for tracing collectors. One such example
is the generational garbage collection (LIEBERMAN; HEWITT, 1983), which is based on
the hypothesis that a large portion of the objects that are reclaimed by GCs have a short
lifetime. Generational GCs split the data into generations according to their creation
time: the most recently created objects are in the youngest generation, the oldest objects
¹ The heap objects were renamed after a big refactoring in April of 2020 and are no longer called Raw objects. This refactoring was done to avoid using C pointers to represent tagged pointers: RawClass, for instance, became ClassLayout, and tagged pointers to it became ClassPtr instead of RawClass*.
are in the oldest generation, and so on. Then, the GC runs collection cycles on younger
generations more frequently than older generations. If an object survives after many
collection cycles, it is promoted to an older generation. By doing this, when the hypothesis
holds (which is the case for most real world applications), the collection cycles become
much more efficient. For this reason, the Dart VM uses a generational GC.
The GC used by the Dart VM has two generations, called new space and old space.
The former is used to store objects that were created more recently, which are expected
to become unreachable sooner, while the latter contains objects that are not expected to
become “garbage”. The root set includes objects in the new space that are referenced by
the stack, by variables or by objects in the old space.
When the GC is triggered, it runs an algorithm based on (CHENEY, 1970): the younger
generation (the new space) is henceforth called from-space and a new block of memory,
named to-space, is allocated; then, for each reachable object in the from-space, this object
is either promoted to the old space or copied to the to-space; finally, the from-space is
deallocated, and the to-space becomes the younger generation. While visiting the objects,
this algorithm also updates the references to the ones that are moved.
There are three configuration modes for the compiler, with only the last two being
intended for users of Dart: debug, release, and product. The debug mode is used for
Dart VM development, running several assertions in order to identify bugs more easily. The
release mode is useful for end-user development, supporting hot-reloading of application
code (in JIT mode) and having an API to examine the running application (e.g. used for
examining CPU profile, memory profile and for debugging). Finally, product is the most
optimized mode.
The first issue to be solved in the internship, before starting to work on support for unboxed
fields, was to change the boxing instructions on ARMv7. The reason was that each boxing
operation of a 64-bit integer was generating an unnecessarily large 88 bytes of ARM
machine code.
As mentioned in section 2.1, boxing instructions have a fast path, for when the integer is
a smi, and a slow path, for when it is not and it is therefore necessary to allocate a box for
that integer (see figures 1 and 2). The issue was that most of the code belonged to the
slow path and did not need to be inlined, as the slow path is already expected to be slow.
For this reason, a shared stub for the slow path was implemented, and each boxing
instruction would jump to this stub only if the integer value to be boxed was not a smi.
If there is not enough free space in the new space to allocate the box, a runtime call is
performed for that allocation, possibly triggering the GC, so the state of the live registers
needs to be saved (stored) onto the stack, as they might be clobbered, and restored after
the call. Since shared stubs can be called from multiple places, each of them having
different live registers, all registers that are not reserved are saved. The exception is the
FPU registers: if they are not being used, it is much cheaper not to save them. Therefore,
two similar stubs were created: one that saves all non-reserved registers, including FPU,
and another one that only saves regular registers.
Another concern was that part of the stub was generated by a shared stub generator,
but it did not have support for the return value, so it had to be updated. In order to make
the saved registers be seen as part of the caller frame, the registers are saved before entering
the stub frame. However, the return value of runtime calls is always stored at the top of
the stack, and this position cannot be one where a register was saved, since the registers
are pushed before entering the stub frame; so the change was to reserve extra space on
the stack just before performing the runtime call. This space must not store a value that
could be seen as a pointer, otherwise the GC would try to use it as an address, so an even
value is pushed (it is seen as a smi due to the tagging bit). Finally, after the
call, the result has to be moved from the stack to a register; the R0 register was chosen for
this. Since the registers are saved on the stack, the return value is simply stored into the
position where R0 was pushed. Figure 5 illustrates this procedure. The boxing instruction
was updated to always use R0 as the output register, so no data is lost when the return
value is stored into it.
Figure 5: Values on the stack when the shared stub performs a runtime call.
Link 1 of the appendix provides the domain where changelists (CLs) on the Dart SDK
codebase, either submitted or undergoing code review, are available. Link 2 contains the
shared stub implementation. The green lines of code (on the right) are the lines added
after this change, while the red ones, on the left, were deleted.
After updating the boxing instruction to use the shared stub, it started to emit only
32 bytes of native code: four instructions for the fast path, one instruction loading the
stub address, one jump to this address, and two instructions storing the integer value into
the allocated box. Link 3 of the appendix contains the changes on the boxing instruction,
and the entire CL can be accessed through link 4.
Later, another improvement was made: since the stub is at a fixed address, it is
possible to make a PC-relative call to that address, instead of loading the stub address into
a register and then jumping to it. This change saves four more bytes, reducing
the initial size of 88 bytes to 28 bytes for each boxing instruction. Link 5 of the appendix
provides the changes for this last improvement.
Figures 6 and 7 show the generated code before and after making those changes,
respectively.
The first line of each figure shows the IL instruction. The values starting with v (v28,
v3, v4, ...) are the SSA variables. The following lines contain the assembly code: the first
column shows the address; the second one, the instruction in hexadecimal notation; and
The first main goal of the internship was to add support for unboxed fields. Fields in
Dart were always stored tagged, but numeric (either integer or floating-point) values that
are known not to be nullable do not need a box. Avoiding the box could reduce the overhead
caused by boxing and unboxing instructions, resulting in better performance and smaller
code size.
The first decision to be made for supporting unboxed fields is the way those fields will
be identified. The RawField class already had a bit field called is_unboxing_candidate,
because JIT compilation supports unboxed fields², so this bit was used. It is possible to
know that the VM is running in AOT mode by checking the flag FLAG_precompiled_mode,
so this flag guards most of the changes in this project.
The required conditions for a field to be an unboxing candidate are: (1) it cannot
ever store the null value; (2) it needs to be of a numeric type; and (3) the compiler
configuration and target architecture need to support values of its type to be unboxed.
Condition (1) invalidates nullable-type fields. Moreover, it also invalidates late and static
fields, because they store the null value before being initialized. Due to condition (2),
the only accepted types are int, double, and SIMD types, but it does not need to be
the declared type: it can be the type inferred by TFA, which is serialized in the kernel
as metadata. Condition (3) can be checked easily in the VM by calling methods such as
SupportsUnboxedSimd128(). Therefore, this checking is done and the unboxing candidate
bit is set in the kernel loader, and the changes for this are available through link 6 of the
appendix.
² Actually, the "unboxed" fields in JIT are not exactly unboxed: the enclosing object still stores a reference to the field; however, instead of holding a reference to the Dart object, which has a header, it stores a reference to the memory address where the scalar value is stored.
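The three conditions above can be summarized in a short predicate. The sketch below is illustrative C++ (the VM's compiler is written in C++), but the type names, struct layout and helpers are assumptions for this document, not the VM's actual API:

```cpp
#include <cassert>

// Hypothetical mirror of the unboxing-candidate checks; illustrative only.
enum class InferredType { kInt, kDouble, kSimd128, kObject };

struct FieldInfo {
  InferredType type;  // type inferred by TFA, read from kernel metadata
  bool is_nullable;   // may the field ever hold null?
  bool is_static;     // static fields start out null
  bool is_late;       // late fields start out null as well
};

// Condition (3): does this configuration / target support unboxing the type?
bool SupportsUnboxed(InferredType type, bool supports_simd128) {
  switch (type) {
    case InferredType::kInt:
    case InferredType::kDouble:
      return true;
    case InferredType::kSimd128:
      return supports_simd128;  // cf. SupportsUnboxedSimd128()
    default:
      return false;  // condition (2): only numeric types qualify
  }
}

bool IsUnboxingCandidate(const FieldInfo& f, bool supports_simd128) {
  // Condition (1): the field can never hold null.
  if (f.is_nullable || f.is_static || f.is_late) return false;
  // Conditions (2) and (3).
  return SupportsUnboxed(f.type, supports_simd128);
}
```
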
34
There are two IL instructions responsible for operating directly with non-static fields:
LoadFieldInstr and StoreInstanceFieldInstr, in charge of loading from and storing
to instance fields, respectively. Those instructions had to be updated to generate a different
machine code for the case when the field is unboxed. The main difference is that, in this
case, the values would be loaded or stored inside the enclosing object, instead of a box
referenced by that object. Links 7 to 9 of the appendix provide the changes made for each architecture.
The AOT compiler supports cross-compiling code for 32-bit architectures even when running on a 64-bit device. To make this possible, the compiler needs information, such as
offsets, about the layout of objects on both host and target platforms.
There are many kinds of offsets that are used by the VM, such as the instance size
of each class, the offset of each field and the type arguments offset. Type arguments are
an array of types used by generic classes to store the type used on each instance. Those
offsets were computed using the assumption that each field requires exactly one-word size,
as they were expected to store a memory address. However, after enabling unboxed fields,
this might not be true anymore: the size of those fields might be 128 bits, in the case of SIMD fields, or 64 bits, in the case of int and double, which corresponds to two words on 32-bit platforms. Additionally, most offsets were stored in words since this value was the
same for every architecture. Now, they might be different on different platforms and, due
to cross-compilation, there are places where the correct offset to be used is the host offset,
while in other scenarios it is the target offset. For this reason, it was necessary to update
the compiler to keep both values.
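The size computation that breaks the one-field-one-word assumption can be illustrated as follows; the representation names and helpers are hypothetical, but the byte sizes follow the text above:

```cpp
#include <cassert>

// Illustrative only: how many bytes/words an instance field occupies on a
// target whose word size is `target_word_size` bytes (4 on 32-bit, 8 on
// 64-bit targets).
enum class Rep { kTagged, kUnboxedInt64, kUnboxedDouble, kUnboxedSimd128 };

int FieldSizeInBytes(Rep rep, int target_word_size) {
  switch (rep) {
    case Rep::kTagged:
      return target_word_size;  // one pointer-sized slot
    case Rep::kUnboxedInt64:
    case Rep::kUnboxedDouble:
      return 8;   // always 64 bits, regardless of the word size
    case Rep::kUnboxedSimd128:
      return 16;  // always 128 bits
  }
  return target_word_size;
}

int FieldSizeInWords(Rep rep, int target_word_size) {
  return FieldSizeInBytes(rep, target_word_size) / target_word_size;
}
```

Note that the same field occupies a different number of words on host and target when their word sizes differ, which is why both offsets must be kept.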
Several offsets are precomputed for each architecture by a module called the offsets extractor; the AOT compiler then uses the offsets for the architecture it is targeting. Although this module is usually used to retrieve information regarding the target, many host offsets were being used in places where target offsets were expected, because they were known to be the same (in words) on both host and target. In some cases, the offsets extractor did not even compute those offsets; this was the case for many instance sizes, so the module had to be updated to include them. It was also noticed that many classes have fields that do not exist in product mode or in the precompiled runtime (in order to obtain performance and memory improvements), while the offsets extractor only computed the values for the release mode of JIT compilation. Therefore, the module had to be changed to compute the offsets for both release and product configuration modes, and for both the JIT and precompiled runtimes.
Links 10 and 11 of the appendix contain the CLs responsible for the updates on the
offsets extractor.
Those CLs fixed the offsets that are precomputed by the offsets extractor, but many
other offsets — such as field offsets for Dart fields — are calculated in the VM. Consequently,
those offsets had to be updated to take into account the new size of the fields. Furthermore,
some offsets in the RawField and RawClass were duplicated to store both target and host
values, both of them being calculated together. The Class::CalculateFieldOffsets
method was updated to do so, and link 12 of the appendix provides such updates. This
method also computes a bitmap of unboxed fields, which is explained in section 2.6. Most
of the other changes in that file were made to update the offsets to be used, from host to
target.
Most updates in the two files available through the following links were made to ensure
that the appropriate offset is used: (1) link 13 of the appendix; and (2) link 14.
The previously mentioned updates to duplicate offsets in the RawClass and RawField
can be found on lines 857 and 1238, respectively, of the file available through link 15 of
the appendix. An optimization was to keep those two offsets (one for the target and another for the host) only when not running the precompiled runtime, given that host information is not available in this scenario.
The GC visits the pointers of live objects to make sure that every object referenced
by them will survive, possibly being moved, and that the pointers will be updated to the
new addresses in those cases. When visiting the pointers of an object, the GC assumed that every word inside that object (except for the header, which has a fixed length at the beginning of each object) would be a tagged value, i.e., they would be either
pointers or smis. So, each value stored from the first word in the object (after the header)
to the last was visited as a separate tagged value. The location of the last word in the
object is calculated by the object address and its size, which is retrieved from a shared
class table using the class ID (that is stored in the header of the object). Now, as it is
not true anymore that every value in the object is tagged, the unboxed values need to be
skipped.
Since the GC is performance sensitive, the checking for unboxed fields has to be as
fast as possible. The solution was to store, for each class, a bitmap of unboxed fields. For
each word inside the instances of that class, the bitmap stores a bit to indicate whether
that word is an unboxed value (if the bit is set) or a tagged value (bit not set). Notice
that, since unboxed values might use more than one-word size, they might require more
than one corresponding bit in the bitmap.
As an optimization, given that the instance size is, in general, not larger than 64 words
(not including the header), the bitmap was set to have a fixed length of 64 bits, and for
every position after that, the bit is considered not set.
The bitmap is populated when calculating the field offsets (see section 2.5). If a field is an unboxing candidate whose offset is beyond what the bitmap supports, the field is no longer unboxed, and the unboxing candidate bit (described in section 2.3) is reset to false (zero).
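A minimal sketch of such a fixed-length bitmap, in illustrative C++ (the real implementation lives in the VM sources and differs in naming and detail):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a 64-bit bitmap of unboxed words: bit i set means word i of the
// instance (after the header) holds raw unboxed data, not a tagged value.
class UnboxedFieldBitmap {
 public:
  static constexpr int kLength = 64;  // one bit per word after the header

  bool Get(int word_index) const {
    if (word_index >= kLength) return false;  // past the end: treated as tagged
    return (bits_ >> word_index) & 1;
  }

  // Returns false if the field does not fit in the bitmap, so the caller can
  // clear the unboxing-candidate bit and keep the field boxed instead.
  bool TrySetRange(int first_word, int size_in_words) {
    if (first_word + size_in_words > kLength) return false;
    for (int i = 0; i < size_in_words; i++) {
      bits_ |= uint64_t{1} << (first_word + i);
    }
    return true;
  }

 private:
  uint64_t bits_ = 0;
};
```
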
As an example, program 3 contains two classes: Foo, which has an unboxed field, and Bar, which contains two unboxed fields and three boxed fields. The field value in class Foo can be unboxed because it is an integer (as it has the int type annotation) and because it is never set to null: it is assigned only in the constructor of the class, and the program only initializes it with the values 3 and 8 (in function main). The final modifier enforces that the field is never reassigned after it is initialized; it is good practice to use it when a variable is expected to be set only once.
Regarding class Bar, notice that it has a constructor with an optional parameter, enclosed in square brackets. When optional parameters are not passed, they receive a default value: if none is specified, it is null. The field unboxedInt is (as the name suggests) unboxed,
for the same reason as the field value in Foo; the difference is that unboxedInt was not
annotated as an integer: instead, its type is inferred by TFA, as is its nullability. In contrast, boxedNum is boxed, even though it is never null and only stores numeric values, because
when variable bar1 is created (inside main) it receives an integer value, while when bar2
is created, it receives a floating-point value; the inferred type from TFA is num, which is a
class Foo {
  final int value;
  Foo(this.value);
  // ...
}

class Bar {
  final unboxedInt;
  final boxedNum;
  final foo;
  final unboxedDouble;
  int nullableInt;
  Bar(this.unboxedInt, this.boxedNum, this.foo, this.unboxedDouble,
      [this.nullableInt]);
  setNullableInt(int newValue) {
    nullableInt = newValue;
  }
}

main() {
  final bar1 = Bar(/*unboxedInt=*/ 1,
      /*boxedNum=*/ 2,
      /*foo=*/ Foo(3),
      /*unboxedDouble=*/ 4.0,
      /*nullableInt=*/ 5);
  final bar2 = Bar(/*unboxedInt=*/ 6,
      /*boxedNum=*/ 7.0,
      /*foo=*/ Foo(8),
      /*unboxedDouble=*/ 9.0);
  bar2.setNullableInt(10);
}
supertype of both int and double, and the field is not unboxed in order to distinguish between those types. The field foo is boxed because it stores an instance of the Foo class, which is
not a numeric primitive. The field unboxedDouble is unboxed because it always stores a
double value. Finally, nullableInt is an integer field that is initialized with value 5 when
bar1 is created, but it is initialized with the default value, null, when bar2 is created. It
was not marked as final because the setNullableInt method can reassign a value to it:
just after creating bar2, the value 10 is stored into it. Although this field stores the values 5 in bar1 and 10 in bar2, it also holds the null value between the initialization of bar2 and the call to setNullableInt; therefore, it is nullable and cannot be unboxed.
Figure 8 shows the layout of instances of the classes Foo and Bar on a 32-bit platform.
It also includes the bitmap for each of them. To illustrate how the bitmap would be
retrieved, it assumes that the class IDs of Foo and Bar are 300 and 301, respectively. This
information is available in the header of the instances, so, when the GC visits them, it asks
the shared class table for the bitmap, passing the class ID as the index. It is important to
note that, since it is a 32-bit platform, each word contains only 32 bits, so each unboxed
field requires two words to store its 64-bit values. Similarly, if there were 128-bit fields
(e.g. of Float32x4 SIMD type), each of them would require four words and consequently
use four bits in the bitmap.
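To make the example concrete, the sketch below computes a possible bitmap for Bar on a 32-bit target, assuming the fields are laid out in declaration order and that bit i describes the i-th word after the header (both assumptions are for illustration; the VM chooses the actual layout):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative per-field layout description: size in target words and whether
// the words hold unboxed data.
struct Field {
  int size_in_words;
  bool unboxed;
};

uint64_t BuildBitmap(const Field* fields, int n) {
  uint64_t bitmap = 0;
  int word = 0;  // word index after the header
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < fields[i].size_in_words; j++, word++) {
      if (fields[i].unboxed) bitmap |= uint64_t{1} << word;
    }
  }
  return bitmap;
}

// Bar on a 32-bit target: unboxedInt (2 words), boxedNum (1), foo (1),
// unboxedDouble (2), nullableInt (1).
const Field kBarFields[] = {
    {2, true}, {1, false}, {1, false}, {2, true}, {1, false}};
```

Under these assumptions the bitmap for Bar would be 0b0110011: bits 0-1 for unboxedInt, bits 4-5 for unboxedDouble, and clear bits for the three boxed words.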
Link 16 of the appendix contains the bitmap implementation, as well as the interface
to access it through the shared class table. The bits of the bitmap are set while calculating
the offsets of the fields, inside the Class::CalculateFieldOffsets method, whose im-
plementation is available through link 12. Finally, methods used by the GC to visit object
pointers had to be updated to skip the words marked as unboxed values by the bitmap;
link 17 provides an example of such updates.
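The GC-side use of the bitmap can be modeled as follows: only words whose bit is not set may hold pointers, so only those are visited. This is an illustrative model, not the VM's actual visitor code:

```cpp
#include <cstdint>
#include <vector>

// Returns the indices of the words (after the header) that the GC would
// visit as tagged values; words marked in the bitmap are skipped.
std::vector<int> TaggedWordIndices(int num_words, uint64_t unboxed_bitmap) {
  std::vector<int> visited;
  for (int index = 0; index < num_words; index++) {
    // Past the 64-bit bitmap, every word is considered tagged.
    const bool is_unboxed = index < 64 && ((unboxed_bitmap >> index) & 1);
    if (!is_unboxed) visited.push_back(index);  // may hold a pointer or smi
  }
  return visited;
}
```

With the Bar bitmap from the example above (0b0110011), the visitor would touch only the words of boxedNum, foo and nullableInt.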
An important thing to note is that the bitmap for unboxed fields presented in section
2.6 is computed in the precompiler, but the GC might use it in the precompiled runtime.
This is possible because information such as the size and the bitmap of each class is
serialized in the precompiler to be deserialized and used at runtime.
The changes for serializing the bitmap can be seen in the ClassSerializationCluster
class. They are available through link 18 of the appendix (and, similarly, the updates in
the same file on the ClassDeserializationCluster are responsible for deserializing the
bitmap).
The previous file also contains important changes in the class responsible for serializing
instances that are computed statically — the class InstanceSerializationCluster. This
case is similar to the changes in the GC: instances being serialized need to ensure that all of their fields are also serialized, so the pointers to them need to be visited, but unboxed
fields should not be seen as pointers — instead, those bits should be preserved in the
instance —, so the bitmap was also used here to identify them. Since the bitmap does
not store information about the type of the unboxed fields, it was not possible to know
the size of each field — for instance, 16 bytes of unboxed data could correspond to one
SIMD field, or two double fields, or one int and one double field, and so forth — so
the data of the field is not serialized entirely at once. Instead, every 32 bits are serialized separately, because 32 is a divisor of all word sizes and all unboxed field sizes, which avoids serializing unnecessary extra bits.
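The chunking idea can be sketched like this (illustrative names; the real serializer streams into its own writer rather than into a vector):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Writes a run of unboxed bytes as 32-bit chunks. Since 32 divides every
// word size (32, 64) and every unboxed field size (64, 128), no padding bits
// are ever needed.
std::vector<uint32_t> SerializeUnboxedRun(const uint8_t* data,
                                          size_t size_in_bytes) {
  std::vector<uint32_t> out;
  for (size_t offset = 0; offset < size_in_bytes; offset += 4) {
    uint32_t chunk;
    std::memcpy(&chunk, data + offset, 4);  // avoids unaligned reads / UB
    out.push_back(chunk);
  }
  return out;
}
```

Deserialization is the mirror image: the same number of 32-bit chunks is read back and copied into the instance, preserving the raw bits.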
Another serializer that had to be updated in a similar way is responsible for runtime serialization of instances, e.g. when sending objects asynchronously through a port. Link 19 of the appendix contains those changes.
The last step finished the CL for unboxed double and SIMD fields. Link 20 of the
appendix contains the complete CL. Just after that, another similar CL was submitted to
allow integer fields to be unboxed as well (link 21).
However, before submitting those changes, some unexpected results were noticed when
running benchmarks to assess their impact on the VM. More specifically, benchmarks that
use SIMD values, such as NBodySIMD, were expected to have a better performance after
the SIMD fields became unboxed, but they took almost twice as long to execute. More
details about the algorithm this benchmark runs are available in chapter 4.
After investigating, we realized that the arithmetic operations for SIMD values were
not inlined. Instead, the SIMD values, initially unboxed, were being boxed to be passed as
arguments in a static call to the method that operates them, and, after that, the result was
being unboxed to be stored in the field. This overhead of boxing and unboxing operations
was responsible for those regressions. Before the changes, this overhead did not exist, since all values were boxed. Chapter 3 presents changes to the calling conventions for supporting
unboxed arguments. However, no calls should be performed in this case, as they are much
more expensive than simply operating those values inline, and that would avoid boxing
the operands. The reason why those calls were being performed was that the VM had not yet added support for inlining arithmetic SIMD operations. Therefore, this section presents
changes that were made to avoid those benchmark regressions.
The inliner had to be changed to include those operations. Also, to make them work
properly, null checks need to be performed when inlining them. The reason is that, when
an optimization is performed, IL instructions might be inserted as speculative, so that, if the optimization should not have been done (for instance, because the types mismatch), the compiler can bail out and run the unoptimized code again. This is how runtime optimizations are
performed in JIT mode. In AOT, if it bails out, the function is recompiled with some
optimizations disabled; since this is not intended to happen in this case, null checks are
performed so that any boxing or unboxing instruction later inserted is guaranteed to be
non-speculative (unnecessary checks usually are removed automatically).
Additionally, when the first operand (the receiver) of a call is null, the exception thrown is NoSuchMethod, since that operation is not a method of null; but, if the second operand is null, an ArgumentError exception is thrown instead.
In order to generate smaller code, shared stubs are used by the slow path of null
checks, which is executed when the exception needs to be created. Thus, another stub
was created, responsible for throwing ArgumentError exceptions, so that each null check
instruction would use the appropriate stub according to its enumeration value. Link 23 of
the appendix contains the implementation for this stub on ARMv8. The code is similar
for the other architectures.
The mentioned changes in the inliner are available through link 24. Finally, link 25 of
the appendix provides the complete CL. This change, along with the unboxed fields, not
only avoided regressions on those SIMD benchmarks, but also achieved runtime speedups
by at least a factor of 3 when compared to before any changes. Those results are discussed
in detail in chapter 4.
After unboxed instance fields were enabled, the next step of the internship was to
extend the calling conventions to support unboxed parameters across static and instance
calls.
The language supports required and optional parameters. The required parameters are
the first ones and they are positional, i.e., the position of each argument determines its
corresponding parameter. Optional parameters can be either positional or named (the latter require the name of the parameter to be specified along with the value being passed). Dart
also supports generic methods; in these cases, an array called type arguments provides
type information for each invocation of the method.
Before the changes, the calling conventions of Dart functions were as follows: if the function expects a type arguments array, it is pushed onto the stack before any arguments. Then, all
arguments of the call, starting with the receiver (if any), are tagged and pushed onto the
stack in the same order as they are passed in the source code. If the function needs more
information, e.g. it has optional parameters, expects type arguments, or is a closure, metadata about the arguments is passed in a register. This register holds a pointer to
an array-like object, named ArgumentsDescriptor. The arguments descriptor contains
information such as the length of the type arguments array, the names of the named
parameters that were passed and their position, and the total number of arguments. Finally,
when the function returns, the value is boxed and passed in a fixed register. To illustrate
this, consider the sample program 4. The bar method calls the updateVal method of
class Foo, to update the value of receiverFoo. The arguments of the call are another
instance of Foo, named multiplierFoo, and the result of the sum of integers toAdd and
dummyValue. The updateVal method updates the value inside the receiver, multiplying it by the value of the first parameter and then adding the value of the second parameter.
After the call, the value of dummyValue is subtracted from the field of receiverFoo, and
the result is returned from bar.
class Foo {
  int val;
  Foo(this.val);

  @pragma('vm:never-inline')
  void updateVal(Foo multiplier, int toAdd) {
    val = val * multiplier.val + toAdd;
  }
}

@pragma('vm:never-inline')
int bar(Foo receiverFoo, int toAdd,
    int dummyValue, Foo multiplierFoo) {
  receiverFoo.updateVal(multiplierFoo, toAdd + dummyValue);
  receiverFoo.val -= dummyValue;
  return receiverFoo.val;
}

main() {
  var result = bar(Foo(5), 3, 4, Foo(2));
  print(result); // prints 5 * 2 + 3 == 13
  result = bar(Foo(3), 4, 5, Foo(6));
  print(result); // prints 3 * 6 + 4 == 22
}
Figure 9 contains the CFG for the function bar before the changes presented in this
chapter were enabled. It was simplified to elide parallel moves of registers and other
extra pieces of information not important for this example. Notice that, before adding
the values of parameters toAdd and dummyValue, it needs to unbox them (lines 8 and
9). Then, to pass the result as an argument, it uses a boxing instruction (line 11). Lines
12, 13 and 14 push onto the stack the receiverFoo parameter (whose SSA variable is
v2), the multiplierFoo parameter (v5), and the result of the sum (variable v28, which
represents the result of the boxing instruction), respectively. After the call, the field val
of receiverFoo is loaded (line 16), operated (line 17), updated (line 18) and then the
result needs to be boxed (line 19) to be returned. Since this flow graph was built with the changes of chapter 2 enabled, there is no unboxing instruction after loading the field and no boxing instruction before storing into it, as the field is already unboxed.
In general, in scenarios where functions are just abstractions for operating on numeric values, it was necessary to box the arguments, make the call, unbox the arguments in the target function, operate on them, box the return value, return it and, possibly, unbox the result after the call to continue operating on it. The overhead of these boxing and unboxing operations would be avoided if those values were passed unboxed from the call site to the target function. That is the focus of this chapter.
Section 3.1 explains the initial change for updating static methods and constructors to
support unboxed values. Section 3.2 extends those updates to interface calls.
The reason for treating static methods and constructors separately from the other
calls in the initial change is that calls for those methods are easier to update, because they
have only one possible target, meaning that those methods can be treated individually. For
interface calls, since Dart is object-oriented, the target of that call might be any method
that is an implementation of that interface, but the arguments passed in the call site must
be compatible with what is expected by the callee, for every possible callee. In this section,
this scenario is avoided by the fact that the call site always specifies the exact target to be
called.
Similarly to the unboxed fields, it is necessary to identify the parameters (and return
values) that can be unboxed. Since there is no class in the Dart VM for parameters, this is
done in the RawFunction class. Again, the checking occurs in the kernel loader, retrieving
information from TFA to know the inferred type of the parameters and their nullability.
Although the language supports required and optional parameters, the latter are used less frequently than the required ones, and unboxing them would require more effort. Also, optional parameters are often nullable, with null frequently being their default value. In addition, a feature was being developed in parallel by the VM team to treat optional parameters as required if they are passed in every call to the function, reducing the number of optional parameters in the VM. For all those reasons, it was decided to unbox only the required parameters, including the required parameters of functions that
also have optional parameters. The verification of the type and nullability of parameters
is therefore performed only for the required ones. Link 26 of the appendix provides the
changes for it.
Another bitmap was used to store information about the unboxed parameters and
the return value. However, the RawFunction class does not store the inferred type of
each parameter, and this is important to determine the type of the unboxed values, for
instance, when allocating the appropriate register for those values. As creating an array of types and storing it for each instance of RawFunction would be expensive in terms of memory usage, it was decided to unbox only int and double values, as they are much
more commonly used than SIMD types. The bitmap was implemented to use two bits
for each value: the first one says whether the value should be boxed or unboxed, and the
second one says whether the value is integer or floating-point, in case of being unboxed.
The bitmap implementation can be found through link 28 of the appendix.
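An illustrative model of this two-bits-per-value encoding (the exact bit order, position numbering and class name in the VM may differ):

```cpp
#include <cstdint>

// Two bits per position: the low bit says "unboxed?", the high bit says
// "integer?" when unboxed (clear meaning floating-point). Illustrative only.
class UnboxedParameterBitmap {
 public:
  void SetUnboxedInteger(int i) { bits_ |= uint64_t{0b11} << (2 * i); }
  void SetUnboxedDouble(int i) { bits_ |= uint64_t{0b01} << (2 * i); }

  bool IsUnboxed(int i) const { return (bits_ >> (2 * i)) & 1; }
  bool IsUnboxedInteger(int i) const {
    return IsUnboxed(i) && ((bits_ >> (2 * i + 1)) & 1);
  }

 private:
  uint64_t bits_ = 0;  // default: everything boxed (tagged)
};
```

With 64 bits available, this encoding covers the return value plus the first few dozen parameters, which is more than enough in practice.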
The IL instructions related to calls had methods that always returned the kTagged representation, so those methods had to be updated to retrieve the appropriate representation using the bitmap of
the function. Those IL instructions are: ParameterInstr, used in the function entry to
represent the parameters; StaticCallInstr, used in the call site to perform the call;
ReturnInstr, for returning from the target to the call site; and PushArgumentInstr, used
for pushing a given argument before performing the call. Link 29 of the appendix shows
the changes on the methods of those instructions. It is also possible to see changes that
add the arguments size to the ArgumentsInfo: this class is used to create the arguments
descriptor, and section 3.1.5 explains these changes.
The representation and the offset of parameter instructions are used when processing
the entry of functions, during the register allocation step, which determines the location for
each SSA variable. It uses a linear-scan algorithm (POLETTO; SARKAR, 1999), estimating
the live ranges of the variables by live intervals and performing the allocation with linear
time complexity on the total number of variables. It was updated to create two live ranges
for the arguments that require a pair of registers to represent them (which is the case of
unboxed integers on 32-bit platforms), and to use the offset in the parameter instruction
and its representation (since it is not always kTagged anymore) to compute the location
of the parameter. Those changes are available through link 31 of the appendix.
The calling conventions had to be updated for supporting unboxed values: as they were
always boxed, regular registers were expected to hold the arguments before pushing them
onto the stack, and the return value was always stored in a specific register (R0 on ARM
architectures and RAX on x64), but unboxed values might require different registers. For the
case of floating-point values, FPU registers are needed, and the following ones were chosen
for the return value on each architecture: V0 on ARMv8, Q0 on ARMv7 and XMM0 on x64.
Also, since Dart uses 64-bit integers, on ARMv7 the result was decided to be stored in the
pair (R0, R1) of registers if the value is an unboxed integer. Those changes were reflected
in the generated code of the following IL instructions: LoadIndexedUnsafeInstr, used
for loading the arguments from the stack; PushArgumentInstr for pushing the arguments
from registers to the stack; ReturnInstr for storing the result in the appropriate register;
and call instructions, such as StaticCallInstr, which had to have the output register
updated. The following links provide those changes:
Stack maps are bitmaps used to identify pointers in the stack. They are used by the
GC, when visiting and updating stack pointers to objects that are moved.
The values stored on stack frames start (ignoring the return address, which is not represented in the stack maps) with spill slots, which hold values that had to be pushed onto the stack because there were not enough registers to store all live values. Then, there
might be outgoing arguments, i.e., arguments that were pushed by PushArgumentInstr IL
instructions, preceding a call. This is the default case when visiting stack frames; however, some IL instructions that have a fast path and a slow path might need to push more values
onto the stack (and pop them after that instruction) when running the slow path. This
is the case for the boxing instructions, which could trigger the garbage collector if there
is not enough space to allocate a box for the given value — as discussed in section 2.2
—, so the live registers need to be pushed onto the stack. Moreover, when running the
slow case of the CheckedSmiOpInstr IL instruction (which happens when at least one of
the operands is not a smi), not only do the live registers need to be saved, but a call also
needs to be performed, even though there were no PushArgumentInstr preceding it; in
this case, the slow path code pushes the arguments for this operation.
Since the outgoing arguments were always boxed, the stack maps did not have bits for
those values. Therefore, after supporting unboxed arguments, new bits had to be added to
that bitmap to describe them. The stack maps are serialized as read-only data in binaries,
so it is important to avoid serializing unnecessary bits in order to reduce the metadata for
generated code — this is the reason why the outgoing arguments were not represented
there. The algorithm for visiting a stack frame consists of using the stack map for the first spill_slot_count slots (the number of spill slots) of the frame, then using the remaining bits of the bitmap to visit the end of the frame, and finally visiting each slot of the range in between, assuming that it is tagged.
The algorithm described above did not have to change: the bitmap only had to
add new bits for the last outgoing arguments, starting from the first unboxed argu-
ment, as the arguments pushed before this one will continue to be considered tagged.
Link 35 of the appendix contains the implementation for this algorithm (inside the
StackFrame::VisitObjectPointers method).
Figure 10 illustrates the stack map of a frame that has five spill slots and five slots
for outgoing arguments, after the changes. Notice that, besides the spill slots, the bitmap
contains bits only for the last three slots of outgoing arguments, because this is the smallest
suffix of those slots containing the values of all unboxed arguments. The other two slots
for outgoing arguments are then considered tagged and visited, following the described
algorithm.
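The visiting order can be modeled as below; the function returns which slots are treated as tagged, with a set bit in the map meaning "contains a tagged value". This mirrors the description and figure 10, but the names and conventions are assumptions:

```cpp
#include <vector>

// Model of the frame-visiting order: the first spill_slot_count bits of the
// stack map cover the spill slots, the remaining bits cover a *suffix* of the
// outgoing-argument slots, and every slot in between is assumed tagged.
std::vector<int> TaggedSlotIndices(int frame_slots, int spill_slot_count,
                                   const std::vector<bool>& stack_map) {
  const int suffix_bits =
      static_cast<int>(stack_map.size()) - spill_slot_count;
  std::vector<int> tagged;
  for (int slot = 0; slot < frame_slots; slot++) {
    bool is_tagged;
    if (slot < spill_slot_count) {
      is_tagged = stack_map[slot];  // spill slots: covered by the map
    } else if (slot >= frame_slots - suffix_bits) {
      // Suffix of outgoing arguments: covered by the remaining map bits.
      is_tagged =
          stack_map[spill_slot_count + (slot - (frame_slots - suffix_bits))];
    } else {
      is_tagged = true;  // untracked middle range: assumed tagged
    }
    if (is_tagged) tagged.push_back(slot);
  }
  return tagged;
}
```

For the frame of figure 10 (five spill slots, five outgoing-argument slots, map bits for the last three arguments only), the two uncovered argument slots come out as tagged, exactly as described.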
For optional parameters, the prologue processes the arguments in the order they were
pushed and, in order to obtain the position of the first argument on the stack, the number
of arguments was used as an offset (in words) to jump from the last pushed argument to
the first one. Once again, this could be done because all arguments were tagged, so each
of them used to require exactly one-word size. The process for handling the type of the
arguments in generic functions is similar: for those cases, an array of types is passed as an extra argument before all the others, so the number of arguments was used to calculate the position of this array on the stack.
After supporting unboxed parameters, knowing the number of arguments was not
enough for calculating their size anymore. For this reason, the arguments descriptor was
updated to include this information. Link 37 of the appendix contains the changes for
including the arguments size in the ArgumentsDescriptor. The changes in the prologue
builder for calculating the offset of the arguments using information about the unboxed
parameters are available through link 38.
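The difference can be illustrated with a one-line computation: under the all-tagged assumption the total size of the arguments (in words) equals their count, which no longer holds once each argument contributes its own size. This is a hypothetical sketch, not the prologue builder's code:

```cpp
#include <numeric>
#include <vector>

// Total stack space occupied by the arguments, in words. Before the change,
// this was simply the argument count, because every argument was tagged and
// took exactly one word; now each argument contributes its own size.
int TotalArgumentsSizeInWords(const std::vector<int>& arg_sizes_in_words) {
  return std::accumulate(arg_sizes_in_words.begin(),
                         arg_sizes_in_words.end(), 0);
}
```

This total is what the updated ArgumentsDescriptor carries, so the prologue can still jump from the last pushed argument to the first one.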
3.1.6 Intrinsics
Some functions that belong to a list of recognized methods do not have part of their
code (or their entire code) generated automatically from Dart source code. This is done in order to make such functions more efficient, and examples of them are: the sqrt and
cos functions on numeric values; and the _allocate method of _OneByteString, used to
allocate strings. These optimized pieces of code are called intrinsics, of which there are two
kinds in the Dart VM: graph intrinsics, which are custom-built IL graphs; and assembly
intrinsics, hand-written snippets of assembly.
Most assembly intrinsics are used for native functions, which do not have unboxed
parameters and return values (as mentioned in section 3.1.1). Nevertheless, to avoid
validating all other assembly intrinsics, we have decided to disable parameter and return
value unboxing for functions that have intrinsics of this kind. Therefore, those functions
had their bitmaps of unboxed parameters and return value reset. This is done when
creating the initial state of recognized methods, just after the kernel is loaded. Link 39 of
the appendix provides the changes for this.
Graph intrinsics, however, were updated to add or remove IL instructions for boxing
and unboxing parameters and the return value according to the bitmap, to make them
compatible with the changes in this chapter. The changes for checking if those instructions
are needed and adding them are available through link 40.
This, along with some small tweaks that mainly reflect changes already discussed, completes the CL for unboxed parameters and return values when performing calls to static methods and constructors. Link 41 of the appendix provides the entire CL.
Aside from static calls, instance calls are also frequently used. Instance calls can be
either dynamic calls or interface calls. In Dart, type annotations are optional, so, when the
type of the variable that is the receiver of a call cannot be known statically, a runtime
check is performed to discover it and then the target function is looked up
— in this case, a dynamic call is performed. Conversely, if the variable is known to be of
a given type that has a method with that name, we know the interface of the
target, although the target itself might be any method of a derived class that overrides or
implements that interface. One important thing to note is that a method of a class can
be identified just by its name, because Dart does not have overloaded methods (if that were
the case, the entire signature of the method would be required).
This section focuses on interface calls. The reason is that, when performing dynamic
calls, we have no information at compile time about the interfaces or the possible targets
of the call, so the only way of unboxing parameters would be to unbox the same
parameters in all functions with that name, ensuring that the code at the
call site is compatible with every possible target; however, most of the parameters would
probably remain boxed, since it is unlikely that every method with a given name receives
a non-nullable parameter of the same type at a given position. For interface calls, it
is possible to avoid aggregating the methods merely by name and instead use the class
hierarchy to partition the methods into disjoint sets in a way that minimizes the size
of each set while ensuring that the targets of an interface call are always in the same set:
by doing so, the unboxed parameters of a given function only need to be compatible with
the other functions in its set.
The partitioning algorithm was implemented to run with the AOT transformations,
just after the types inferred by TFA are available. The approach used was the following:
suppose that each (possibly abstract) method in the Dart program is a vertex, and add
an undirected edge between two vertices if and only if one of their respective methods
overrides the other. Then, the connected components form a partition of the methods (i.e.,
the vertices) which ensures that targets of any interface call are always grouped together.
This was done using a Disjoint-Set Union (DSU) data structure (GALLER; FISHER, 1964),
using path compression and union by rank.
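The DSU structure mentioned above can be sketched as follows. This is a minimal illustration in Dart with path compression and union by rank; the class and method names are illustrative, not the ones used in the actual implementation.

```dart
// A minimal Disjoint-Set Union (union-find) sketch with path compression
// and union by rank. Set ids are plain integers.
class DisjointSetUnion {
  final List<int> _parent = [];
  final List<int> _rank = [];

  /// Creates a new singleton set and returns its id.
  int makeSet() {
    _parent.add(_parent.length);
    _rank.add(0);
    return _parent.length - 1;
  }

  /// Finds the representative of the set containing [x],
  /// compressing the path along the way.
  int find(int x) {
    if (_parent[x] != x) {
      _parent[x] = find(_parent[x]); // path compression
    }
    return _parent[x];
  }

  /// Merges the sets of [a] and [b], attaching the shallower
  /// tree under the deeper one (union by rank).
  void union(int a, int b) {
    final ra = find(a), rb = find(b);
    if (ra == rb) return;
    if (_rank[ra] < _rank[rb]) {
      _parent[ra] = rb;
    } else {
      _parent[rb] = ra;
      if (_rank[ra] == _rank[rb]) _rank[ra]++;
    }
  }
}
```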
Firstly, each method is assigned to a singleton set. Then, the methods are processed,
one by one. When processing a method s, the direct parent classes of its enclosing class
(including, in addition to the base class, any interface class that it implements) are visited
and, if any of them has a method with the same name as s, its set is merged with the set
of s. After that, the parents of those superclasses are visited recursively in the same way.
One optimization is performed: if a class c already has a method t overridden by s,
then the parents of c are not visited recursively, because s is already connected to t, and t
will be connected to the methods it overrides when it is processed itself, hence s will
already be connected to those methods through t.
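The merging walk described above can be sketched as follows, using a toy class model and a naive union-find; the real implementation operates on kernel Members and uses the DSU with path compression and union by rank, and every name here is illustrative.

```dart
// Naive union-find over integer set ids.
final Map<int, int> parentOf = {};

int makeSet() {
  final id = parentOf.length;
  parentOf[id] = id;
  return id;
}

int find(int x) {
  var r = x;
  while (parentOf[r] != r) {
    r = parentOf[r]!;
  }
  return r;
}

void union(int a, int b) => parentOf[find(a)] = find(b);

// Toy class model: supertypes are the base class plus implemented
// interfaces; methods maps a method name to its DSU set id.
class Cls {
  final List<Cls> supertypes;
  final Map<String, int> methods;
  Cls(this.supertypes, this.methods);
}

/// Links the method [name] (with set id [setId]) declared in [cls]
/// with every method it overrides in the class hierarchy.
void linkWithSuperClasses(Cls cls, String name, int setId) {
  for (final parent in cls.supertypes) {
    final overridden = parent.methods[name];
    if (overridden != null) {
      // The parent declares a matching method: merge the sets and stop
      // climbing here -- the overridden method will link to its own
      // ancestors when it is processed itself.
      union(setId, overridden);
    } else {
      // No matching method in this parent: keep climbing the hierarchy.
      linkWithSuperClasses(parent, name, setId);
    }
  }
}
```

Note how the walk only recurses past a parent class when that parent does not declare a matching method, which is exactly the optimization described above.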
After creating the partition, the statuses in the unboxing metadata of the parameters
and the return value of each method are updated using the types inferred by TFA. The
status is updated to indicate that the value should be boxed if any of the following
conditions is true: (1) the inferred type is neither double nor int; (2) the inferred type is
nullable; (3) the inferred type is double and the status indicates an unboxing candidate of
int type; or, analogously, (4) the inferred type is int and the status indicates an unboxing
candidate of double type. Otherwise, the status is updated in the following way: if the
status was boxed, it remains boxed; else, if the inferred type is int, it becomes an unboxing
candidate of int type; otherwise, it becomes an unboxing candidate of double type.
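These transition rules can be condensed into a small function. The enum and names below are illustrative sketches; the actual metadata representation in the VM differs.

```dart
// Possible states of a parameter or return value in the unboxing metadata.
enum UnboxingStatus { candidate, candidateInt, candidateDouble, boxed }

/// Applies one TFA-inferred type to the current [status], following
/// rules (1)-(4) above and the refinement rules otherwise.
UnboxingStatus applyType(
    UnboxingStatus status, String inferredType, bool nullable) {
  final isInt = inferredType == 'int';
  final isDouble = inferredType == 'double';
  // Rules (1)-(4): any mismatch forces the value to stay boxed.
  if ((!isInt && !isDouble) ||
      nullable ||
      (isDouble && status == UnboxingStatus.candidateInt) ||
      (isInt && status == UnboxingStatus.candidateDouble)) {
    return UnboxingStatus.boxed;
  }
  // Otherwise the status is refined, unless it was already boxed.
  if (status == UnboxingStatus.boxed) return UnboxingStatus.boxed;
  return isInt ? UnboxingStatus.candidateInt : UnboxingStatus.candidateDouble;
}
```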
To illustrate the partitioning algorithm, consider the class diagram in figure 11. Also,
consider that the types of the method signatures correspond to the respective inferred
types and that those types are not nullable. There would be four disjoint sets of methods
named bar:
• The first set would contain the methods of classes Base1, Foo1 and Foo2, since
both Foo1::bar and Foo2::bar override Base1::bar. Since all of them receive a
• The second disjoint set would contain the methods Base2::bar and Foo3::bar,
because the former is overridden by the latter, even though Base2 is not a direct
parent class of Foo3. Since Base2::bar expects a value of type Object, it needs to
be boxed.
• The third set contains only the Foo4::bar method, as it does not override any other
method and there is no derived class of Foo4. Since it expects a non-nullable double
value as parameter, it can be unboxed.
• Finally, the last disjoint set of this example includes the methods Base5::bar,
Foo5::bar and Foo6::bar, given that the last two override the first. In this case,
Base5 is not the base class of Foo5, but, instead, a class whose interface is imple-
mented by Foo5. Since Foo5::bar expects a double value and Foo6::bar expects
an int value, their parameters need to be boxed.
Some methods are marked to always have boxed values: native methods, as discussed
in section 3.1.1; methods whose enclosing classes are subtypes of num, e.g. int, because those
methods might be replaced by an optimized IL instruction called CheckedSmiOpInstr,
which always expects the arguments to be boxed; and methods that do not need dynamic
invocation forwarders (the reason for this is explained in section 3.2.2).
The implementation of the DSU data structure and its auxiliary functions can be found
through link 43 of the appendix. Notice that the algorithm processes each method
as a Member. This is done because the Member class includes Field, Constructor and
Procedure; fields, however, are ignored, since the changes in chapter 2 already handle
them. Getters and setters are also ignored.
The registerMember method in the file mentioned above creates a new singleton set for
each valid Member; linkWithSuperClasses links a method with all methods it overrides;
finishGraph clears auxiliary information, such as the ranks, since no more sets will be
merged, and makes the unboxing metadata of each set require boxed values if any
method in that set is marked to do so; applyToArg and applyToReturn update the status
in the unboxing metadata of a given parameter and of the return value, respectively,
according to the type inferred by TFA; getUnboxingInfoOfMember is used to access the
unboxing metadata of a given member, possibly returning null if the member was ignored,
e.g. if it is a field. The partitioning algorithm runs after TFA, and the metadata is
accessed when annotating the kernel — link 44 of the appendix contains the changes for
this.
In the file mentioned above, each method is annotated with its respective unboxing
metadata, which is serialized in the kernel. Then, in the VM, the kernel loader deserializes
this information to fill the bitmap (described in section 3.1.1). Link 45 of the appendix
provides the changes in the kernel loader. A C++ version of the unboxing metadata,
UnboxingInfoMetadata, is used, whose implementation is available through link 46.
When dynamic calls are performed, the target of the call needs to be found at runtime.
Consequently, static checks for the parameters are not possible. For this reason, there are
special automatically generated methods, called dynamic invocation forwarders, which are
called instead of the target itself. Those functions perform type checks on the arguments
before forwarding the call to the actual target, so that type errors are still caught.
class Foo {
  void bar(int arg) { /* .... */ }
}

main() {
  final foo = Foo();
  bar(foo);
}

bar(dynamic obj) {
  obj.bar(1);   // calls obj.dyn:bar(1), which
                // forwards the call to obj.bar(1)
  obj.bar("a"); // obj.dyn:bar("a") detects the type
                // mismatch and throws an exception
}
Algorithm 5 illustrates how dynamic invocation forwarders are used. The bar method
receives an object of dynamic type as argument. The dynamic type is a top type, just like
Object, i.e., it accepts values of any type; the difference between these two type annotations
is that the static analysis always accepts accesses to members of dynamic values (as long
as there are no errors in any subterm of the expression, e.g. in the arguments), and the
runtime checks ensure soundness in such cases.
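A minimal example of this difference between dynamic and Object, reusing a Foo class like the one from Algorithm 5:

```dart
class Foo {
  void bar(int arg) {}
}

void example(dynamic d, Object o) {
  d.bar(1);    // accepted statically; checked at runtime via dyn:bar
  // o.bar(1); // compile-time error: Object has no member `bar`
  if (o is Foo) {
    o.bar(1);  // fine after `o` is promoted to Foo
  }
}
```

If `d` turns out not to have a `bar` member at runtime, the call throws a NoSuchMethodError, which is how soundness is preserved for dynamic accesses.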
The code for the Foo::bar method expects one parameter of integer type, so a dynamic
invocation forwarder, named dyn:bar, is created to check if dynamic calls to this target
match its signature. When obj.bar(1) is executed, obj.dyn:bar is called, and it verifies
that the argument is indeed an integer; then, it forwards the call to Foo::bar. In contrast,
when obj.bar("a") is executed, the code for obj.dyn:bar detects that the call does not
match the interface of the function, therefore it throws an exception.
Dynamic calls always receive and return boxed values, so they were updated to unbox
the parameters using the bitmap before forwarding the call, and possibly box the return
value of the call, if it was unboxed, before returning it. By doing so, we ensure that dynamic
calls work for targets that expect unboxed values. Link 47 of the appendix contains the
changes in the CFG of dynamic invocation forwarders.
However, there is a special case for dynamic calls: some functions do not have dynamic
invocation forwarders, and they are called directly from the call site. This happens with
functions that have no parameters and functions whose parameters do not need to have
their types checked. This means that the dynamic invocation forwarder of a function is
created if at least one parameter needs to be checked before the call. A parameter does
not need to be checked if it is covariant (in those cases, the checks are performed in the
body of the target function) or if it is of a top type (because in this case values of any
type would be accepted). For this reason, methods that do not need dynamic invocation
forwarders always expect boxed values, as mentioned in section 3.2.1.
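The cases above can be illustrated with a small sketch; the class and method names are hypothetical:

```dart
class A {}

class B extends A {}

class C {
  // Needs a forwarder: on a dynamic call, `x` must be checked to be int.
  void f(int x) {}

  // No check needed: any value is a valid `Object?` (a top type),
  // so no forwarder is created for this method.
  void g(Object? x) {}

  // No parameters at all, so no forwarder is created either.
  void noArgs() {}
}

class Base {
  void h(covariant A x) {}
}

class Derived extends Base {
  // Covariant parameters are checked in the body of the target itself,
  // not by a forwarder, so `h` needs no forwarder-side check.
  @override
  void h(B x) {}
}
```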
Link 48 of the appendix provides the complete CL. It includes changes in many
expectation files, containing the expected kernel of sample Dart programs, since the kernel
has changed to include the unboxing information metadata. It also contains updates in
IL instructions analogous to the ones presented in section 3.1.2 for static calls, but now
regarding other types of calls: InstanceCallBaseInstr and DispatchTableCallInstr,
that can be used to perform interface calls. This ends the last CL submitted during the
internship.
After all CLs had been submitted, a new patch was created disabling all changes in
this project, to assess its total impact on the benchmarks. An internal system, managed
by a different team, is responsible for configuring and distributing the load among several
identical servers dedicated to running those benchmarks.
The runtime value is calculated after running the benchmarks 10 times and taking
the best run, in order to avoid noise (e.g. caused by swapping, by input and output
operations on disk, or by other running processes). The changes are calculated based on
the actual runtime of the applications (less is better). The internal system also computes
a significance value, so that noisy benchmarks, whose runtime results are unstable, have a
significance closer to 0.0 to avoid highlighting unreal improvements or regressions. The
results are sorted by significance, from the worst to the best result. The first column of
each table contains the benchmark name; the second one provides the percentage of change
in runtime; and the last column contains the significance value.
The benchmarks with the worst results had their significance near 0 in those cases,
even though some of them seemed to have a big change in the final score, such as
rrect_contains_bench (Moto G4) on ARMv7, which had a regression of 36.16%. Besides
the significance, other factors indicate that this was not a real regression, such as the fact
that this benchmark only regressed 15.21% on another device (Pixel 2), and even improved
4.07% on ARMv8 — both with 0.0 significance. In contrast, the improvements at the
bottom rows were much more significant and consistent on both ARMv7 and ARMv8.
Still regarding the Flutter benchmarks, the code size also improved considerably, the
most important results being presented in tables 3 and 4. The total size of all
applications decreased, with most of the regressions happening in the read-only data size,
mainly because of the extra bits that were added on the stack maps (described in section
3.1.4). The Flutter Gallery application is a collection of many of the features available
on Flutter, and it is the main application used for evaluating the impact on code size
(the other benchmarks are mainly used to assess runtime performance). The total size
of the gallery decreased 2.05% on ARMv7 and 1.06% on ARMv8. This difference was
caused mainly because of the updates to reduce the size of boxing instructions, described
in section 2.2, that only happened on ARMv7.
One important thing to highlight is that the changes for unboxed parameter and
return values did not have much impact on code size: the changes for unboxed values when
performing interface calls (section 3.2) only decreased the Gallery total size by 0.15% and,
before that, the changes for static calls had even increased the total size by 0.13%. The
reason for the small impact is probably that there are still many places that use
boxed values, so even though more values were unboxed, it is necessary to pay for
extra boxing and unboxing instructions. This being the case, it is expected that future
changes that enable more values to be unboxed, including NNBD, will make the positive
impact of unboxed parameters more significant. In any case, those changes had a great
impact, discussed next, on the runtime performance of many Dart programs.
Links 49 and 50 of the appendix provide the entire figures with all changes in the
Flutter benchmarks on ARMv7 and ARMv8, respectively.
Link 51 of the appendix contains a figure with the results of all Dart AOT benchmarks
on x64.
More than 75% of the results on table 5 were regressions with absolute value less
than 4% in runtime. Only three of those benchmarks had significance with absolute
value greater than 1.0. Most of the regressions happened on noisy benchmarks and micro
benchmarks, which are used to assess very specific language features or scenarios. Among
those results, one that is important to highlight is the Splay benchmark, which uses splay
trees (SLEATOR; TARJAN, 1985). This data manipulation benchmark, focused on object
creation and destruction, is responsible for stressing the GC: the regression of -3.8% was
mainly a consequence of the extra overhead caused by using the bitmap of unboxed fields
(section 2.6) when the GC is visiting object pointers.
In contrast, table 6 shows more significant results — here, none of the 40 improvements
had a significance less than 1.0, and only three improved by less than 4%. Whereas
the most significant regression was -5.54%, the most significant improvement was 1701%.
All the SIMD benchmarks improved at least 200%, due to the changes described in section
2.8; non-SIMD benchmarks also had great results, such as NBody, which improved
131.1% on Intel Xeon and 105.1% on Intel Core i5. Some of the main benchmarks and
their runtime results on x64 (Intel Xeon) are presented below:
Table 5: The 40 most significant regressions on the Dart AOT benchmarks (on x64
architecture).
Table 6: The 40 most significant improvements on the Dart AOT benchmarks (on x64
architecture).
implementation, there are also two other versions, using closures and iterators.
The DeltaBlue implementation improved 10.92%; DeltaBlueClosures, 7.58%; and
DeltaBlueIterators improved 9.74% (however with a lower significance of 0.6).
This is one of the benchmarks improved after submitting the CLs for unboxed
parameters, improving at least 4% after each of the two CLs (with the classical
implementation).
• Meteor: Also part of The Computer Language Benchmarks Game, this benchmark
computes all solutions for the Meteor Puzzle using a branch and bound algorithm.
The puzzle consists of completely covering a 10 × 5 grid of hexagonal cells with
ten given pieces, each of them made of five hexagonal cells. The benchmark had an
improvement of 7.36%, and it is another example of a benchmark with great results
after supporting unboxed parameters: it improved 4.64% after submitting the CL
for supporting unboxed parameters when performing interface calls (section 3.2).
• SHA256 and MD5: These benchmarks use, respectively, the Secure Hash Algorithm 2
with 256 bit digests (N.I.S.T, 2002) and the MD5 message-digest algorithm (RIVEST;
DUSSE, 1992), two widely used cryptographic hash functions. The former improved
2.41% and, the latter, 5%, both with significance of 0.4.
The former uses the Bench2D benchmark suite (WEBBER, 2014) and the latter is based
on Box2dWeb (HECHT, 2012), a JavaScript port of Box2D. Bench2D improved
11.96% and Box2DOctane, 9.74%.
5 Final Remarks
This work has achieved important results on several benchmarks, demonstrating the
positive impact it has on Dart applications, regarding both code size and runtime
performance. Most of the features developed in this project use nullability information,
currently inferred by TFA, to unbox numeric values. Therefore, the impact of those
changes is expected to become even more remarkable after NNBD is enabled.
The project added support for storing unboxed values into instance fields and for using
such values as arguments when performing calls to both static and non-static methods. It
has also updated the boxing operations of integers to reduce code size on ARMv7 by using
a shared stub for the box allocation and it has added support to inline SIMD arithmetic
operations.
Future work might involve passing more unboxed values as arguments when performing
calls. This could be carried out by supporting new kinds of calls with unboxed values, such
as calls to closures, native functions or tear-offs. Another way to achieve that goal could be
to improve the changes presented in this document, for instance: by implementing a smarter
partitioning algorithm that takes the call sites and possible targets into account when
creating the partition, in order to reduce the size of each disjoint set; by accepting new
types of parameters to be unboxed, such as SIMD types; or by ensuring that dynamic
invocation forwarders are created whenever a function expects unboxed values, so that the
check for the existence of such forwarders no longer has to be performed. Finally, there
is also much room for making currently boxed values become unboxed, for example,
by updating IL instructions that only accept boxed values to also support unboxed
ones.
References
ARNOLD, K. et al. The Java programming language. [S.l.]: Addison-Wesley Reading, 2000.
BRACHA, G.; BAK, L. Dart, a new programming language for structured web
programming. In: Presentation at GOTO conference. [S.l.: s.n.], 2011.
GOLDBERG, A.; ROBSON, D. Smalltalk-80: the language and its implementation. [S.l.]:
Addison-Wesley Longman Publishing Co., Inc., 1983.
GOODMAN, D.; EICH, B. JavaScript Bible. Hungry Minds, Inc., New York (USA), 2001.
GOUSIOS, G. et al. Lean GHTorrent: GitHub data on demand. In: Proceedings of the 11th
working conference on mining software repositories. [S.l.: s.n.], 2014. p. 384–387.
N.I.S.T. Secure hash standard (SHS). In: U.S. DOC. Federal Information Processing
Standards (FIPS) Publication 180-1. [S.l.], 2002.
RICHARDS, M. BCPL: A tool for compiler writing and system programming. In:
Proceedings of the May 14-16, 1969, spring joint computer conference. [S.l.: s.n.], 1969. p.
557–566.
RIVEST, R.; DUSSE, S. The MD5 message-digest algorithm. [S.l.]: MIT Laboratory for
Computer Science, 1992.
ROSEN, B. K.; WEGMAN, M. N.; ZADECK, F. K. Global value numbers and redundant
computations. In: Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on
Principles of programming languages. [S.l.: s.n.], 1988. p. 12–27.
WALRATH, K.; LADD, S. Dart: Up and Running: A New, Tool-Friendly Language for
Structured Web Apps. [S.l.]: O'Reilly Media, Inc., 2012.
ZURAS, D. et al. IEEE standard for floating-point arithmetic. IEEE Std, v. 754, n. 2008, p.
1–70, 2008.
Link 1: https://dart-review.googlesource.com
Link 2: https://dart-review.googlesource.com/c/sdk/+/127146/12/runtime/
vm/compiler/stub_code_compiler_arm.cc#1146
Link 3: https://dart-review.googlesource.com/c/sdk/+/127146/12/runtime/
vm/compiler/backend/il_arm.cc#4652
Link 4: https://dart-review.googlesource.com/c/sdk/+/127146/
Link 5: https://dart-review.googlesource.com/c/sdk/+/128072
Link 6: https://dart-review.googlesource.com/c/sdk/+/133983/10/runtime/
vm/kernel_loader.cc
Link 7: https://dart-review.googlesource.com/c/sdk/+/133983/10/runtime/
vm/compiler/backend/il_arm.cc
Link 8: https://dart-review.googlesource.com/c/sdk/+/133983/10/runtime/
vm/compiler/backend/il_arm64.cc
Link 9: https://dart-review.googlesource.com/c/sdk/+/133983/10/runtime/
vm/compiler/backend/il_x64.cc