[go: up one dir, main page]

0% found this document useful (0 votes)
27 views12 pages

French Tax Code

The document describes a project to overhaul the outdated infrastructure used to calculate French income tax amounts. The current system relies on a 30-year old custom compiler and language that has become difficult to maintain and extend. The authors developed a new open-source compiler called Mlang, which is based on a formalization of the French tax code. Mlang compiles the tax rules to modern languages like Python and provides insights into the tax computation. The French tax authority is transitioning to use Mlang for processing tax returns.

Uploaded by

Antonio
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
27 views12 pages

French Tax Code

The document describes a project to overhaul the outdated infrastructure used to calculate French income tax amounts. The current system relies on a 30-year old custom compiler and language that has become difficult to maintain and extend. The authors developed a new open-source compiler called Mlang, which is based on a formalization of the French tax code. Mlang compiles the tax rules to modern languages like Python and provides insights into the tax computation. The French tax authority is transitioning to use Mlang for processing tax returns.

Uploaded by

Antonio
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

A Modern Compiler for the French Tax Code

Denis Merigoux∗ Raphaël Monat∗ Jonathan Protzenko


Inria LIP6 Microsoft Research
Paris, France Sorbonne Université, CNRS USA
denis.merigoux@inria.fr Paris, France protz@microsoft.com
raphael.monat@lip6.fr

Abstract how to compute the final amount of income tax (IR, for impôt
In France, income tax is computed from taxpayers’ individ- sur le revenu) owed by each household.
ual returns, using an algorithm that is authored, designed As in many other tax systems around the world, this com-
arXiv:2011.07966v2 [cs.PL] 26 Nov 2020

and maintained by the French Public Finances Directorate putation is quite complex. France uses a bracket system (as
(DGFiP). This algorithm relies on a legacy custom language in, say, the US federal income tax), along with a myriad of tax
and compiler originally designed in 1990, which unlike French credits, deductions, optional rules, state-sponsored direct aid,
wine, did not age well with time. Owing to the shortcom- all of which are parameterized over the composition of the
ings of the input language and the technical limitations of household, that is, the number of children, their respective
the compiler, the algorithm is proving harder and harder ages, potential disabilities, and so on.
to maintain, relying on ad-hoc behaviors and workarounds Unlike, say, the United States, the French system relies
to implement the most recent changes in tax law. Compe- heavily on automation. During tax season, French taxpayers
tence loss and aging code also mean that the system does not log in to the online tax portal, which is managed by the state.
benefit from any modern compiler techniques that would There, taxpayers are presented with online forms, generally
increase confidence in the implementation. pre-filled. If applicable, taxpayers can adjust the forms, e.g.
We overhaul this infrastructure and present Mlang, an by entering extra deductions or credits. Once the taxpayer is
open-source compiler toolchain whose goal is to replace satisfied with the contents of the online form, they send in
the existing infrastructure. Mlang is based on a reverse- their return. Behind the scenes, the IR algorithm is run, and
engineered formalization of the DGFiP’s system, and has taking as input the contents of the forms, returns the final
been thoroughly validated against the private DGFiP test amount of tax owed. The taxpayer is then presented with
suite. As such, Mlang has a formal semantics; eliminates the result at tax-collection time.
previous hand-written workarounds in C; compiles to mod- Naturally, the ability to independently reproduce and thus
ern languages (Python); and enables a variety of instrumen- trust the IR computation performed by the DGFiP is crucial.
tations, providing deep insights about the essence of French First, taxpayers need to understand the result, as their own
income tax computation. The DGFiP is now officially transi- estimate may differ (explainability). Second, citizens may
tioning to Mlang for their production system. want to audit the algorithm, to ensure it faithfully imple-
ments the law (correctness). Third, a standalone, reusable
CCS Concepts: • Software and its engineering → Gen- implementation allows for a complete and precise simulation
eral programming languages; • Social and professional of the impacts of a tax reform, greatly improving existing
topics → History of programming languages. efforts [8, 17] (forecasting).
Keywords: legal expert system, compiler, tax code Unfortunately, we are currently far from a transparent,
open-source, reproducible computation. Following numer-
ACM Reference Format: ous requests (using a disposition similar to the United State’s
Denis Merigoux, Raphaël Monat, and Jonathan Protzenko. 2020. Freedom of Information Act), parts of the existing source
A Modern Compiler for the French Tax Code. In Proceedings of code were published. In doing so, the public learned that
Preprint. ACM, New York, NY, USA, 12 pages.
i) the existing infrastructure is made up of various parts
pieced together and that ii) key data required to accurately
1 Introduction reproduce IR computations was not shared with the public.
The French Tax Code is a body of legislation amounting to The current, legacy architecture of the IR tax system is
roughly 3,500 pages of text, defining the modalities of tax presented in Fig. 1. The bulk of the tax code is described as a
collection by the state. In particular, each new fiscal year, a set of “rules” authored in M, a custom, non Turing-complete
new edition of the Tax Code describes in natural language language. A total of 90,000 lines of M rules compile to 535,000
lines of C (including whitespace and comments) via a custom
∗ Equal contribution compiler. Rules are now mostly public [10]. Over time, the
expressive power of rules turned out to be too limited to
Preprint, Nov. 2020, Online
express a particular feature, known as liquidations multiples,
2020.
Preprint, Nov. 2020, Online Denis Merigoux, Raphaël Monat, and Jonathan Protzenko

“rules” “rules” the DGFiP, and following numerous discussions, iterations,


M files C files and visits to their offices, we have been formally approved to
“calculette” start replacing the legacy infrastructure with our new imple-
Shared state Shared library mentation, meaning that within a few years’ time, all French
tax returns will be processed using the compiler described
“inter” in the present paper.
C files
DGFiP’s internal GCC
2 Giving Semantics to the M Language
compiler The 2018 version of the income tax computation [10] is split
across 48 files, for a total of 92,000 lines of code. The code is
written in M, the input language originally designed by the
Figure 1. Legacy architecture
DGFiP. In order to understand this body of tax code, we set
out to give a semantics to M.
which involves tax returns across different years. Lacking
the expertise to extend the M language, the DGFiP added in 2.1 Overview of M
1995 some high-level glue code in C, known as “inter”. The M programs are made up of two parts: declarations and rules.
glue code is closer to a full-fledged language, and has a non- Declarations introduce: input variables, intermediary vari-
recursive call-graph which may call the “rules” computation ables, output variables and exceptions. Variables are either
multiple times with various parameters. The “inter” driver scalars or fixed-length arrays. Both variables and exceptions
amounts to 35,000 lines of C code and has not been released. are annotated with a human-readable description. Variables
Both “inter” and “rules” are updated every year to fol- that morally belong to the same category may be annotated
low updates in the law, and as such, have been extensively with a kind. This is used later in M++ (§3.3) for partitioning
modified over their 30-year operational lifespan. variables, and quickly checking whether any variable of a
Our goal is to address these shortcomings by bringing the given kind has a non-undef value.
French tax code infrastructure into the 21st century. Specifi- Rules capture the computational part of an M program;
cally, we wish to: i) reverse-engineer the unpublished parts they are either variable assignments or raise-if statements.
of the DGFiP computation, so as to ii) provide an explain- As a first simplified example, the French tax code declares
able, open-source, correct implementation that can be inde- an input variable V_0AC for whether an individual is single
pendently audited; furthermore, we wish to iii) modernize (value 1) or not (value 0).
the compiler infrastructure, eliminating in the process any Lacking any notion of data type or enumeration, there
hand-written C that could not be released because of secu- is no way to enforce statically that an individual cannot be
rity concerns, thus enabling a host of modern applications, married (V_0AM) and single (V_0AC) at the same time.
simulations and use-cases. Instead, an exception A031 is declared, along with a human-
● We start with a reverse-engineered formal semantics readable description. Then, a rule raises an exception if the
for the M DSL, along with a proof of type safety per- sum of the two variables is greater than 1. (The seemingly
formed using the Coq [29] proof assistant (§2). superfluous + 0 is explained in §2.5.) For the sake of example,
● To eliminate C code from the ecosystem, we extend we drop irrelevant extra syntactic features, and for the sake
the M language with enough capabilities to encode the of readability, we translate keywords and descriptions into
logic of the high-level “inter” driver (Fig. 1) – we dub English.
the new design M++ (§3). V_0AC : input family ... : "Checkbox : Single" type BOOLEAN ;
● To execute M/M++ programs, we introduce Mlang, V_0AM : input family ... : "Checkbox : Married" type BOOLEAN ;
A031:exception :"A":"031":"00":"both married and single":"N";
a complete re-implementation which combines a ref-
erence interpreter along with an optimizing compiler if V_0AC + V_0AM + 0 > 0 then error A031 ;
that generates C and Python code (§4).
● We evaluate our implementation: we show how we
As a second simplified example, the following M rule com-
putes the value of a hypothetical variable TAXBREAK. Its value
attained 100% conformance on the legacy system’s
is computed from variables CHILDRENCOUNT (for the number
testsuite, then proceed to enable a variety of analyses
of children in the household) and TAXOWED (for the tax owed
and instrumentations to fuzz, measure and stress-test
before the break) – the assigned expression relies on a condi-
our new system (§5).
● We conclude with a tour d’horizon of related attempts
tional and the built-in max function. This expression gives a
better tax break to households having three or more children.
at increasing trust in algorithmic parts of the law (§6).
TAXBREAK= if (CHILDRENCOUNT+0 > 2)
Our code is open-source and available on GitHub (link then max(MINTAXBREAK,TAXOWED * 20 / 100)
disabled for anonymous submission). We have engaged with else MINTAXBREAK endif;
A Modern Compiler for the French Tax Code Preprint, Nov. 2020, Online

For the rest of this paper, we abandon concrete syntax and all- ∐︀program̃︀ ::= ∐︀command̃︀ | ∐︀command̃︀ ; ∐︀program̃︀
caps variable names, in favor of a core calculus that faithfully ∐︀command̃︀ ::= if ∐︀expr̃︀ then ∐︀error̃︀
models M: 𝜇M. | ∐︀var̃︀ := ∐︀expr̃︀ | ∐︀var̃︀ [ X ; ∐︀float̃︀ ] := ∐︀expr̃︀

∐︀expr̃︀ ::= ∐︀var̃︀ | X | ∐︀valuẽ︀ | ∐︀expr̃︀ ∐︀binop̃︀ ∐︀expr̃︀


2.2 𝜇M: a core model of M | ∐︀unop̃︀ ∐︀expr̃︀ |if ∐︀expr̃︀ then ∐︀expr̃︀ else ∐︀expr̃︀
The 𝜇M core language omits variable declarations, whose | ∐︀func̃︀ ( ∐︀expr̃︀, . . . , ∐︀expr̃︀ ) | ∐︀var̃︀ [ ∐︀expr̃︀ ]
main purpose is to provide a human-readable description ∐︀valuẽ︀ ::= undef | ∐︀float̃︀
string that relates them to the original tax form. The 𝜇M core
language also eliminates syntactic sugar, such as statically ∐︀binop̃︀ ::= ∐︀arithop̃︀ | ∐︀boolop̃︀
bounded loops, or type aliases (e.g. BOOLEAN). ∐︀arithop̃︀ ::= + | - | * | /
Finally, a particular feature of M is that rules may be pro-
∐︀boolop̃︀ ::= <= | < | > | >= | == | != | && | ||
vided into any order: the M language has a built-in depen-
dency resolution feature that automatically re-orders compu- ∐︀unop̃︀ ::= - | ~
tations (rules) and asserts that there are no loops in variable ∐︀func̃︀ ::= round | truncate | max | min | abs
assignments. In our own implementation (Mlang, §4), we | pos | pos_or_null | null | present
perform a topological sort; in our 𝜇M formalization, we as-
sume that computations are already in a suitable order. Figure 2. Syntax of the 𝜇M language

2.3 Syntax of 𝜇M
We describe the syntax of 𝜇M in Fig. 2. A program is a se- has been left blank) but may still be referred in other rules.
quence of commands (“rules”). Commands are either raise- Rather than introduce spurious variable assignments with
error-if, or assignments. We define two forms of assignment: undef, we remain faithful to the very loose nature of the M
one for scalars and the other for fixed-size arrays. The latter language and account for references to undefined variables.
is of the form a[X, n] := e, where X is bound in e (the in- Then, Γ ⊢ ∐︀program̃︀ ⇛ Γ ′ enforces well-formedness for
dex is always named X). Using Haskell’s list comprehension a whole program while returning an extended environment
syntax, this is to be understood as 𝑎 ∶= (︀𝑒⋃︀𝑋 ← (︀0..𝑛 − 1⌋︀⌋︀. Γ ′ . We take advantage of the fact that scalar and array as-
Expressions are a combination of variables (including the signments have different syntactic forms. M disallows as-
special index expression X), values, comparisons, logic and signing different types to the same variable; we rule this
arithmetic expressions, conditionals, calls to builtin func- out in T-Assign-*. A complete 𝜇M program is well-formed if
tions, or index accesses. Most functions exhibit standard ∅ ⊢ 𝑃 ⇛ _.
behavior on floating-point values, but M assumes the de-
fault IEEE-754 rounding mode, that is, rounding to nearest 2.5 Operational semantics of 𝜇M
and ties to even. The detailed behavior of each function is At this stage, seeing that there are neither unbounded loops
described in Fig. 6. nor user-defined (recursive) functions in the language, M is
Values can be undef, which arises in two situations: refer- obviously not Turing-complete. The language semantics are
ences to variables that have not been defined (i.e. for which nonetheless quite devious, owing to the undef value, which
the entry in the tax form has been left blank) and out of can be explicitly converted to a float via a + 0, as seen in
bounds array accesses. All other values are IEEE-754 double- earlier examples. We proceed to formalize them in Coq [29],
precision numbers, i.e. 64-bit floats. The earlier BOOLEAN type using the Flocq library [4]. This ensures we correctly ac-
(§2.1) is simply an alias for a float whose value is implicitly count for all cases related to the undef value, and guides the
0 or 1. There is no other kind of value, as a reference to an
implementation of Mlang (§3).
array variable is invalid. Function present discriminates the Expressions. The semantics of expressions is defined in
undef value from floats. Fig. 4. The memory environment, written Ω is a function
from variables to either scalar values (usually denoted 𝑣),
2.4 Typing 𝜇M or arrays (written (𝑣 0, . . . , 𝑣𝑛−1 )). A value absent from the
Types in 𝜇M are either scalar or array types. M does not offer environment evaluates to undef.
nested arrays. Therefore, typing is mostly about making sure The special array index variable X is evaluated as a normal
scalars and arrays are not mixed up. variable. Conditionals reduce normally, except when the
In Fig. 3, a first judgment Γ ⊢ 𝑒 defines expression well- guard is undef: in that case, the whole conditional evaluates
formedness. It rules out references to arrays, hence enforcing into undef. If an index evaluates to undef, the whole array
that expressions have type scalar and that no values of type access is undef. In the case of a negative out-of-bounds index
array can be produced. Furthermore, variables may have no access the result is 0; in the case of a positive out-of-bounds
assignment at all (if the underlying entry in the tax form index access the result is undef. Otherwise, the index is
Preprint, Nov. 2020, Online Denis Merigoux, Raphaël Monat, and Jonathan Protzenko

Global function environment Δ: Judgment : Ω ⊢ 𝑒 ⇓ 𝑣 (“Under Ω, 𝑒 evaluates to 𝑣”)


Δ(round) = Δ(truncate) = Δ(abs) = Δ(pos) D-value D-var-undef D-var
= Δ(pos_or_null) = Δ(null) = Δ(present) = 1 𝑣 ∈ ∐︀valuẽ︀ 𝑥 ∈⇑ dom Ω Ω(𝑥) = 𝑣
Δ(min) = Δ(max) = Δ(∐︀arithop̃︀) = Δ(∐︀boolop̃︀) = 2 Ω⊢𝑣⇓𝑣 Ω ⊢ 𝑥 ⇓ undef Ω⊢𝑥 ⇓𝑣

Judgment : Γ ⊢ 𝑒 (“Under Γ, 𝑒 is well-formed”) D-cond-true D-X


Ω ⊢ 𝑒1 ⇓ 𝑓 𝑓 ∉ {0, undef} Ω ⊢ 𝑒 2 ⇓ 𝑣2 Ω(X) = 𝑣
T-float T-undef T-var-undef T-var Ω ⊢ if 𝑒 1 then 𝑒 2 else 𝑒 3 ⇓ 𝑣2 Ω⊢X⇓𝑣
𝑥 ∈⇑ dom Γ Γ(𝑥) = scalar
Γ ⊢ ∐︀float̃︀ Γ ⊢ undef Γ ⊢𝑥 Γ ⊢𝑥 D-cond-false D-index-neg
Ω ⊢ 𝑒1 ⇓ 0 Ω ⊢ 𝑒 3 ⇓ 𝑣3 Ω ⊢𝑒 ⇓𝑟 𝑟 <0
T-index-undef T-conditional Ω ⊢ if 𝑒 1 then 𝑒 2 else 𝑒 3 ⇓ 𝑣3 Ω ⊢ 𝑥(︀𝑒⌋︀ ⇓ 0
𝑥 ∈⇑ dom Γ Γ ⊢𝑒 Γ ⊢ 𝑒1 Γ ⊢ 𝑒2 Γ ⊢ 𝑒3
Γ ⊢ 𝑥(︀𝑒⌋︀ Γ ⊢ if 𝑒 1 then 𝑒 2 else 𝑒 3 D-cond-undef D-index-undef
Ω ⊢ 𝑒 1 ⇓ undef Ω ⊢ 𝑒 ⇓ undef
T-index T-func Ω ⊢ if 𝑒 1 then 𝑒 2 else 𝑒 3 ⇓ undef Ω ⊢ 𝑥(︀𝑒⌋︀ ⇓ undef
Γ(𝑥) = array Γ ⊢𝑒 Δ(𝑓 ) = 𝑛 Γ ⊢ 𝑒1 ⋯ Γ ⊢ 𝑒𝑛
Γ ⊢ 𝑥(︀𝑒⌋︀ Γ ⊢ 𝑓 (𝑒 1 , . . . , 𝑒𝑛 ) D-index-outside D-tab-undef
Ω ⊢𝑒 ⇓𝑟 𝑟 ⩾𝑛 ⋃︀Ω(𝑥)⋃︀ = 𝑛 𝑥 ∈⇑ dom Ω
Judgment : Γ ⊢ ∐︀command̃︀ ⇛ Γ ′ and Ω ⊢ 𝑥(︀𝑒⌋︀ ⇓ undef Ω ⊢ 𝑥(︀𝑒⌋︀ ⇓ undef

Γ ⊢ ∐︀program̃︀ ⇛ Γ ′ (“𝑃 transforms Γ to Γ ′ ”) D-index


Ω(𝑥) = (𝑣0 , . . . , 𝑣𝑛−1 )
T-cond T-seq Ω ⊢𝑒 ⇓𝑟 𝑟 ∈ (︀0, 𝑛) 𝑟 ′ = truncateF (𝑟 )
Γ ⊢𝑒 Γ0 ⊢ 𝑐 ⇛ Γ1 Γ1 ⊢ 𝑃 ⇛ Γ2
Ω ⊢ 𝑥(︀𝑒⌋︀ ⇓ 𝑣𝑟 ′
Γ ⊢ if 𝑒 then ∐︀error̃︀ ⇛ Γ Γ0 ⊢ 𝑐 ; 𝑃 ⇛ Γ2
D-func
T-assign-scalar Ω ⊢ 𝑒 1 ⇓ 𝑣1 ⋯ Ω ⊢ 𝑒𝑛 ⇓ 𝑣𝑛
𝑥 ∈ Γ ⇒ Γ(𝑥) = scalar Γ ⊢𝑒
Ω ⊢ 𝑓 (𝑒 1 , . . . , 𝑒𝑛 ) ⇓ 𝑓 (𝑣1 , . . . , 𝑣𝑛 )
Γ ⊢ 𝑥 := 𝑒 ⇛ Γ(︀𝑥 ↦ scalar⌋︀

T-assign-array Figure 4. Operational semantics: expressions


𝑥 ∈ Γ ⇒ Γ(𝑥) = array Γ(︀X ↦ scalar⌋︀ ⊢ 𝑒
Γ ⊢ 𝑥(︀X, 𝑛⌋︀ := 𝑒 ⇛ Γ(︀𝑥 ↦ array⌋︀
Judgment : Ω𝑐 ⊢ 𝑐 ⇛ Ω𝑐′ and

Figure 3. Typing of expressions and programs Ω𝑐 ⊢ 𝑃 ⇛ Ω𝑐′ (“Under Ω𝑐 , 𝑃 produces Ω𝑐′ ”)


D-assign
Ω𝑐 ≠ error Ω𝑐 ⊢ 𝑒 ⇓ 𝑣
Ω𝑐 ⊢ 𝑥 := 𝑒 ⇛ Ω𝑐 (︀𝑥 ↦ 𝑣⌋︀
truncated into an integer, used to access Ω. The behavior of
functions, unary and binary operators is described in Fig. 6. D-assert-other
Figuring out these (unusual) semantics took over a year. Ω𝑐 ≠ error Ω𝑐 ⊢ 𝑒 ⇓ 𝑣 𝑣 ∈ {0, undef}
We initially worked in a black-box setting, relying on the on- Ω𝑐 ⊢ if 𝑒 then ∐︀error̃︀ ⇛ Ω𝑐
line tax simulator as an oracle for the M semantics. We later D-assert-true
on gained access to the test suite and the implementation, Ω𝑐 ≠ error Ω𝑐 ⊢ 𝑒 ⇓ 𝑓 𝑓 ∉ {0, undef}
which allowed us to fully replicate the correct semantics. Ω𝑐 ⊢ if 𝑒 then ∐︀error̃︀ ⇛ error
Statements. The memory environment Ω is extended into
Ω𝑐 , to propagate the error case that may be raised by rasied D-error D-seq
Ω𝑐,0 ⊢ 𝑐 ⇛ Ω𝑐,1 Ω𝑐,1 ⊢ 𝑃 ⇛ Ω𝑐,2
exceptions. An assignment updates a valid memory envi-
error ⊢ 𝑐 ⇛ error Ω𝑐,0 ⊢ 𝑐 ; 𝑃 ⇛ Ω𝑐,2
ronment with the computed value. If an assertion’s guard
evaluates to a non-zero float, an error is raised; otherwise, D-assign-table
program execution continues. Rule D-error propagates a Ω𝑐 ≠ error
raised error across a program. The whole-array assignment Ω𝑐 (︀X ↦ 0⌋︀ ⊢ 𝑒 ⇓ 𝑣0 ⋯ Ω𝑐 (︀X ↦ 𝑛 − 1⌋︀ ⊢ 𝑒 ⇓ 𝑣𝑛−1
works by evaluating the expression in different memory en- Ω𝑐 ⊢ 𝑥(︀X, 𝑛⌋︀ := 𝑒 ⇛ Ω𝑐 (︀𝑥 ↦ (𝑣0 , . . . , 𝑣𝑛−1 )⌋︀
vironments, one for each index.
Figure 5. Operational semantics: statements
2.6 Type safety
We now prove type safety in Coq. Owing to the unusual
semantics of the undef value, and to the lax treatment of
A Modern Compiler for the French Tax Code Preprint, Nov. 2020, Online

𝑒 1 ⊙ 𝑒 2 , ⊙ ∈ {+, −} undef 𝑓2 ∈ F 𝑒 1 ⊙ 𝑒 2 , ⊙ ∈ {×, ÷} undef 𝑓2 ∈ F, 𝑓2 ≠ 0 0 round(undef) = undef


undef undef 0 ⊙ 𝑓2 undef undef undef undef round(𝑓 ∈ F) = floorF (𝑓 + sign(𝑓 ) ∗ 0.50005)
𝑓1 ∈ F 𝑓1 ⊙ 0 𝑓1 ⊙F 𝑓2 𝑓1 undef 𝑓1 ⊙ F 𝑓2 0 truncate(undef) = undef
truncate(𝑓 ∈ F) = floorF (𝑓 + 10−6 )
𝑏 1 ∐︀boolop̃︀ 𝑏 2 undef 𝑓2 ∈ F 𝑚(𝑒 1 , 𝑒 2 ), 𝑚 ∈ {min, max} undef 𝑓2 ∈ F abs(x) ≡ if x >= 0 then x else -x
undef undef undef undef 0 𝑚 F (0, 𝑓2 ) pos_or_null (x) ≡ x >= 0
𝑓1 ∈ F undef 𝑓1 ∐︀boolop̃︀F 𝑓2 𝑓1 ∈ F 𝑚 F (𝑓1 , 0) 𝑚 F (𝑓1 , 𝑓2 ) pos(x) ≡ x > 0
null(x) ≡ x = 0
present(undef) = 0
present(𝑓 ∈ F) = 1

Figure 6. Function semantics. For context on round and truncate definitions, see §4.3

∐︀program̃︀ ::= ∐︀fundecl̃︀* 3 The Design of a New DSL: M++


∐︀fundecl̃︀ ::= ∐︀funnamẽ︀ ( ∐︀var̃︀* ): ∐︀command̃︀* As described in Fig. 1, the internal compiler of the DGFiP
∐︀command̃︀ ::= if ∐︀expr̃︀ then ∐︀command̃︀* else ∐︀command̃︀*
compiles M files (§2) to C code. Insofar as we understand,
| partition with ∐︀var_kind̃︀ : ∐︀command̃︀* the M codebase originally expressed the whole income tax
| ∐︀var̃︀ = ∐︀expr̃︀ | ∐︀var̃︀* <- ∐︀func̃︀() | del ∐︀var̃︀ computation. However, in the 1990s (§1), the DGFiP started
executing the M code twice, with slightly different parame-
∐︀expr̃︀ ::= ∐︀var̃︀ | ∐︀float̃︀ | undef | ∐︀expr̃︀ ∐︀binop̃︀ ∐︀expr̃︀ | ∐︀unop̃︀ ∐︀expr̃︀
| ∐︀builtiñ︀ ( ∐︀expr̃︀, . . . , ∐︀expr̃︀ ) | exists( ∐︀var_kind̃︀ ) ters, in order for the taxpayer to witness the impact of a tax
reform.
∐︀binop̃︀ ::= ∐︀arithop̃︀ | ∐︀boolop̃︀ Rather than extending M with support for user-defined
∐︀arithop̃︀ ::= + | - | * | / functions, the DGFiP wrote the new logic in C, in a folder
called “inter”, for multi-year computations.
∐︀boolop̃︀ ::= <= | < | > | >= | == | != | && | ||
This piece of code can read and write variables used in the
∐︀unop̃︀ ::= - | ~ M codebase using shared global state. To assemble the final
∐︀var_kind̃︀ ::= taxbenefit | deposit | ...
executable, M-produced C files and hand-written “inter” C
files are compiled by GCC and distributed as a shared library.
∐︀func̃︀ ::= ∐︀funnamẽ︀ | call_m Over time, the “inter” folder grew to handle a variety of
∐︀builtiñ︀ ::= present | cast special cases, multiplying calls into the M codebase. At the
time of writing, the “inter” folder amounts to 35,000 lines of
Figure 7. Syntax of the M++ language C code.
This poses numerous problems. First, the mere fact that
“inter” is written in C prevents it from being released to the
public, the DGFiP fearing security issues that might some-
undefined variables, this provides an additional level of guar- how be triggered by malicious inputs provided by the tax-
antee, by ensuring that reduction always produces a value payer. Therefore, the taxpayer cannot reproduce the tax com-
or an error (i.e. we haven’t forgotten any corner cases in putation since key parts of the logic are missing. Second, by
our semantics). Furthermore, we show in the process that virtue of being written in C, “inter” does not compose with
the store is consistent with the typing environment, writ-
ten Γ ⊳ Ω. This entails store typing (i.e. values of the right
M, hindering maintainability, readability and auditability.
Third, C limits the ability to modernize the codebase; right
type are to be found in the store) and proper handling of
undefined variables (i.e. dom Ω ⊆ dom Γ).
now, the online tax simulator is entirely written in C using
Apache’s CGI feature (including HTML code generation), a
Theorem (Expressions). If Γ ⊳ Ω and Γ ⊢ 𝑒 , then there
very legacy infrastructure for Web-based development.
exists 𝑣 such that Γ ⊢ 𝑒 ⇓ 𝑣.
Fourth, C is notoriously hard to analyze, preventing both
the DGFiP and the taxpayer from doing fine-grained analy-
We extend ⊳ to commands, so as to account for exceptions: ses.
To address all of these limitations, we design M++, a com-
Γ ⊳𝑐 Ω𝑐 ⇐⇒ Ω𝑐 = error ∨ Γ ⊳ Ω𝑐 panion domain-specific language (DSL) that is powerful
enough to completely eliminate the hand-written C code.
Theorem (Commands). If Γ ⊢ 𝑐 ⇛ Γ ′ et Γ ⊳𝑐 Ω𝑐 , then there
exists Ω𝑐′ such that Ω𝑐 ⊢ 𝑐 ⇛ Ω𝑐′ and Γ ′ ⊳𝑐 Ω𝑐′ .
3.1 Concrete syntax and new constructions
We provide full proofs and definitions in Coq, along with a The chief purpose of the M++ DSL is to repeatedly call the
guided tour of our development, in the supplement. M rules, with different M variable assignments for each call.
Preprint, Nov. 2020, Online Denis Merigoux, Raphaël Monat, and Jonathan Protzenko

To assist with this task, M++ provides basic computational 1 compute_benefits():


2 if exists(taxbenefit) or exists(deposit):
facilities, such as functions and local variables. In essence,
3 V_INDTEO = 1
M++ allows implementing a “driver” for the M code. 4 V_CALCUL_NAPS = 1
Fig. 8 shows concrete syntax for M++. We chose syntax 5 partition with taxbenefit:
resembling Python, where block scope is defined by indenta- 6 NAPSANSPENA, IAD11, INE, IRE, PREM8_11 <- call_m()
tion. As the French administration moves towards a modern 7 iad11 = cast(IAD11)
8 ire = cast(IRE)
digital infrastructure, Python seems to be reasonably under-
9 ine = cast(INE)
stood across various administrative services. 10 prem = cast(PREM8_11)
Fig. 7 formally lists all of the language constructs that 11 V_CALCUL_NAPS = 0
M++ provides. A program is a sequence of function declara- 12 V_IAD11TEO = iad11
tions. M++ features two flavors of variables. Local variables 13 V_IRETEO = ire
14 V_INETEO = ine
follow scoping rules similar to Python: there is one local
15 PREM8_11 = prem
variable scope per function body; however, unlike Python,
we disallow shadowing and have no block scope or nonlocal
Figure 8. Example function defined in M++
keyword. Local variables exist only in M++. Variables in all-
caps live in the M variable scope, which is shared between
M and M++, and obey particular semantics. and were able to design a much more principled semantics
that we believe will lower the risk of future errors.
The partition operation operates over a variable kind 𝑘
3.2 Semantics of M++ (§2.1). The sub-block 𝑐 of partition executes in a restricted
scope, where variables having kind 𝑘 are temporarily set
Two constructs support the interaction between M and M++:
to undef. Upon completion of 𝑐, the variables at kind 𝑘 are
the <- and partition operators. They have slightly unusual
restored to their original value, while other variables are
semantics, in the way that they deal with the M variable
propagated from the sub-computation into the parent scope.
scope. These semantics are heavily influenced by the needs
This allows running computations while “disabling” groups
of the DGFiP, as we strived to provide something that would
of variables, e.g. ignoring an entire category of tax credits.
feel intuitive to technicians in the French administration.
To precisely define the expected behavior, Fig. 9 presents
reduction semantics of the form Δ, Ω1 ⊢ 𝑐 ↝ Ω2 , mean-
3.3 Example
ing command 𝑐 updates the store from Ω1 to Ω2 , given the Fig. 8 provides a complete M++ example, namely the function
compute_benefits.
functions declared in Δ.
We distinguish built-ins, which may only appear in ex- The conditional at line 2 uses a variable kind-check (§2.1)
pressions and do not modify the global store, from functions, to see if any variables of kind “tax benefit” have a non-undef
which are declared at the top-level and may modify the store. value. Then, lines 3-4 set some flags before calling M. Line
The call_m operation is a special function. The <- operator 5 tells us that the call to M at line 6 is to be executed in a
takes a function call, and executes it in a copy of the memory. restricted context where variables of kind “tax benefit” are
Then, only those variables that appear on the left-hand side set to undef. Line 6 runs the M computation, over the current
see their value propagated to the parent execution environ- state of the M variables; five M output variables are retained
ment. Thus, call_m only affects variables 𝑋⃗ .
from this M execution, while the rest are discarded. Lines
7-11 represent local variable assignment, where cast has the
To execute the function call, the <- operator either looks up
same effect as + 0 in M, namely, forcing the conversion of
definitions in Δ, the environment of user-defined functions,
undef to 0. Then, lines 11-15 set M some variables as input
or executes the M rules in the call_m case, relying on the
earlier definition of ⇛ (Fig. 5).
for later function calls.
Worded differently, our semantics introduce a notion of
call stack and treat the M computation as a function call 4 Mlang: an M/M++ Compiler
returning multiple values. It is to be noted that the original C After clarifying the semantics of M (§2), and designing a new
code had no such notion, and that the 𝑋⃗ were nothing more DSL to address its shortcomings (M++, §3), we now present
than mere comments. As such, there was no way to statically Mlang, a modern compiler for both M and M++.
rule out potential hidden state persisting from one call_m to
another since the global scope was modified in place. 4.1 Architecture of Mlang
With this formalization and its companion implementa- Mlang takes as input an M codebase, an M++ file, and a file
tion (§4), we were able to confirm that there is currently no specifying assumptions (described in the next paragraph).
reliance on hidden state (something which we suspect took Mlang currently generates Python or C; it also offers a built-
considerable effort to enforce in the hand-written C code), in interpreter for computations. Mlang is implemented in
A Modern Compiler for the French Tax Code Preprint, Nov. 2020, Online

Judgments: Δ, Ω ⊢ 𝑒 £ 𝑣 (“Under Δ, Ω, 𝑒 evaluates into v”) Δ, Ω1 ⊢ 𝑐 ↝ Ω2 (“Under Δ, 𝑐 transforms Ω1 into Ω2 ”)


Cast-float Cast-undef Exists-true Exists-false
Δ, Ω ⊢ e £ 𝑓 𝑓 ≠ undef Δ, Ω ⊢ e £ undef ∃𝑋 ∈ Ω, 𝑘𝑖𝑛𝑑(𝑋 ) = 𝑘 ∧ Ω(𝑋 ) ≠ undef ∀𝑋 ∈ Ω, 𝑘𝑖𝑛𝑑(𝑋 ) ≠ 𝑘 ∨ Ω(𝑋 ) = undef
Δ, Ω ⊢ cast(e) £ 𝑓 Δ, Ω ⊢ cast(e) £ 0 Δ, Ω ⊢ exists(k) £ 1 Δ, Ω ⊢ exists(k) £ 0

Call
Ω1 ⊢ M rules ⇛ Ω2 if f = call_m Ω3 (𝑌 ) = Ω1 (𝑌 ) if 𝑌 ∈⇑ 𝑋⃗
Δ, Ω1 ⊢ Δ(𝑓 ) ↝ Ω2 otherwise Ω3 (𝑌 ) = Ω2 (𝑌 ) if 𝑌 ∈ 𝑋⃗
Δ, Ω1 ⊢ 𝑋⃗ ← f() ↝ Ω3

Partition
Ω2 (𝑌 ) = undef if kind(𝑌 ) = 𝑘 Ω4 (𝑌 ) = Ω1 (𝑌 ) if kind(𝑌 ) = 𝑘 Delete
Δ, Ω2 ⊢ 𝑐 ↝ Ω3
Ω2 (𝑌 ) = Ω1 (𝑌 ) otherwise Ω4 (𝑌 ) = Ω3 (𝑌 ) otherwise Δ, Ω1 ⊢ v = undef ↝ Ω2
Δ, Ω1 ⊢ partition with k ∶ 𝑐 ↝ Ω4 Δ, Ω1 ⊢ del v ↝ Ω2

Figure 9. Reduction rules of M++

OCaml, with around 9,000 lines of code. The general archi- income tax computation is released. There are 214 inputs,
tecture is shown in Fig. 10. The M files and the M++ program and we chose 11 output variables. Basic accepts as inputs
are first parsed and transformed into intermediate represen- only the marital status and the salaries of each individual of
tations. These intermediate representations are inlined into the couple. The output is the income tax.
a single backend intermediate representation (BIR), consist-
ing of assignments and conditionals. Inlining is aware of
4.2 Optimizations
the semantic subtleties described in Fig. 9 and uses tempo-
rary variable assignments to save/restore the shared M/M++ In the 2018 tax code, the initial number of BIR instructions
scope. after inlining M
BIR code is then translated to the optimization intermedi- and M++ files together is 656,020. This essentially cor-
ate representation (OIR) in order to perform optimizations. responds to what the legacy compiler normally generates,
OIR is the control-flow-graph equivalent of BIR. since it performs no optimizations.
Additional assumptions. In M, a variable not defined in Thanks to its modern compiler architecture, Mlang can
the current memory environment evaluates to undef (rule easily perform numerous textbook optimizations, namely
D-Var-Undef, Fig. 4). This permissive behavior is fine for an dead code elimination, inlining and partial evaluation. This
interpreter which has a dynamic execution environment; allows greatly improving the quality of the generated code.
however, our goal is to generate efficient C and Python code The number of instructions after optimization is shown
that can be integrated into existing software. As such, declar- in Fig. 11. Without any assumption (All), the optimizations
ing every single one of the 27,113 shrink the generated C code to 15% of the unoptimized size
possible variables in C would be quite unsavory. (a factor of 6.5). With the most restrictive assumption file
We therefore devise a mechanism that allows stating ahead (Simplified), only 0.47% of the original instructions remain
of time which variables can be truly treated as inputs, and after optimization.
which are the outputs that we are interested in. Since these Definedness analysis. Due to the presence of undef, some
vary depending on the use-case, we choose to list these as- usual optimizations are not available. For example, optimiz-
sumptions in a separate file that can be provided alongside ing e * 0 into 0 is incorrect when e is undef, as undef * 0 =
with the M/M++ source code, rather than making this an in- undef. Similarly, e + 0 cannot be rewritten as e. Our partial
trinsic, immutable property set at variable-declaration time. evaluation is thus combined with a simple definedness anal-
Doing so increases the quality of the generated C or Python. ysis. The lattice of the analysis is shown in Fig. 12; we use
We call these assumption files; we have hand-written 5 of the standard sharp symbol of abstract interpretation [7] to
those. denote abstract elements. The transfer function absorb#
All is the empty file, i.e. no additional assumptions. This defined in Fig. 13 is used to compute the definedness in the
leaves 2459 input variables, and 10,411 output variables for case of the multiplication, the division and all operators in
the 2018 codebase. Selected outs enables all input variables, ∐︀boolop̃︀. The cast# transfer function is used for the addi-
but retains only 11 output variables. Tests corresponds to tion and the subtraction.
the inputs and outputs used in the test files used by the This definedness analysis enables finer-grained partial
DGFiP. Simplified corresponds to the simplified simulator evaluation rules, such as those presented in Fig. 14.
released each year by the DGFiP a few months before the full The optimizations for + 0 and * 0 are invalid in the pres-
ence of IEEE-754 special values (NaN, minus zero, infinities)
Preprint, Nov. 2020, Online Denis Merigoux, Raphaël Monat, and Jonathan Protzenko

assumptions.m_spec Python

sources.m M AST M IR BIR Interpreter

source.mpp M++ AST M++ IR OIR C

Parsing Desugaring Inlining Optimization Transpiling

Figure 10. Mlang compilation passes

// my_var1 is a local variable always defined


Spec. name # inputs # outputs # instructions
#define my_truncate(a) ( my_var1=(a)+0.000001,floor(my_var1) )
All 2,459 10,411 100,396 #define my_round(a) (floor(
Selected outs 2,459 11 72,186 (a<0) ? (double)(long long)(a-.50005)
: (double)(long long)(a+.50005)))
Tests 1,680 651 80,481
Simplified 214 11 3,142
Figure 15. Custom rounding and truncation rules
Basic 3 1 345

Figure 11. Number of instructions generated after optimiza- 4.3 Backends


tion. Instructions with optimizations disabled: 656,020. DGFiP (legacy).
The DGFiP’s legacy system has a single backend that pro-
⊺ duces pre-ANSI (K&R) C. For each M rule, two C computa-
tions are emitted. The first one aims to determine whether
the resulting value is defined. It operates on C’s char type,
undef# ∐︀float̃︀#
where 0 is undefined or 1 is defined. The second computa-
tion is syntactically identical, except it operates on double
– and thus computes the actual arithmetic expression. This
two-step process explains some of the operational semantics:
Figure 12. Definedness lattice with 0 being undefined, the special value undef is absorbing
for e.g. the multiplication.
Careful study of the generated code also allowed us to
𝑑1 𝑑2 absorb# (𝑑 1, 𝑑 2 ) cast# (𝑑 1, 𝑑 2 )
nail down some non-standard rounding and truncation rules
undef# undef# undef# undef# which had until then eluded us. We list them in Fig. 15; these
undef# ∐︀float̃︀# undef# ∐︀float̃︀# are used to implement the built-in operators from Fig. 2 in
∐︀float̃︀# undef# undef# ∐︀float̃︀# both our interpreter and backends.
∐︀float̃︀# ∐︀float̃︀# ∐︀float̃︀# ∐︀float̃︀# Mlang.
Our backend generates C and Python from BIR. Since BIR
Figure 13. Transfer functions over the definedness lattice, only features assignments, arithmetic and conditionals, we
implicitly lifted to the full lattice. plan to extend it with backends for JavaScript, R/MatLab and
even SQL for in-database native tax computation. Depend-
ing on the DGFiP’s appetite for formal verification, we may
𝑒 + undef ↝ 𝑒 𝑒 ∶ ∐︀float̃︀# + 0 ↝ 𝑒 verify the whole compiler since the semantics are relatively
𝑒 ∗1↝𝑒 𝑒 ∶ ∐︀float̃︀# ∗ 0 ↝ 0 small.
max(0, min(0, 𝑥)) ↝ 0 present(undef) ↝ 0 Implementing a new backend is not very onerous: it took
max(0, −max(0, 𝑥)) ↝ 0 present(𝑒 ∶ ∐︀float̃︀# ) ↝ 1 us 500 lines for the C backend and 375 lines for the Python
backend. Both backends are validated by running them over
Figure 14. Examples of optimizations the entire test suite and comparing the result with our refer-
ence interpreter.
Our generated code only relies on a small library of helpers
[3, 21]. We have instrumented the M code to confirm that which implement operations over M values. These helpers
these are valid on the values used. But for safety, these unsafe are aware of all the semantic subtleties of M and are manually
optimizations are only enabled if the --fast_math flag is set. audited against the paper semantics.
A Modern Compiler for the French Tax Code Preprint, Nov. 2020, Online

5 Analyzing and Evaluating the Tax Code Scheme M compiler C compiler Bin. size Time
Due to the sheer size of the code and number of variables, Original DGFiP GCC -O0 7 Mo ∼ 1 ms
generating efficient code is somewhat delicate – we had the Original DGFiP GCC -O1 7 Mo ∼ 1 ms
pleasure of breaking both the Clang and Python parsers Local vars Mlang Clang -O0 33 Mo ∼ 2 ms
because of an exceedingly naïve translation. Thankfully, ow- Local vars Mlang Clang -O1 19 Mo ∼ 1 ms
ing to our flexible architecture for Mlang, we were able to Array Mlang Clang -O0 19 Mo ∼ 1 ms
quickly iterate and evaluate several design choices. Array Mlang Clang -O1 10 Mo ∼ 0.5 ms
We now show the benefits of a modern compiler infras-
tructure, and proceed to describe a variety of instrumen- Figure 16. Performance of the C code generated by various
tations, techniques and tweaking knobs that allowed us to compilation schemes for the M code. The time measured is the
gain insights on the the tax computation. By bringing the M time spent inside the main tax computation function for one fiscal
language into the 21st century, we not only greatly enhance household picked in the set of test cases. Size of the compiled binary
the quality of the generated code, but also unlock a host of is indicated. “Original” corresponds to the DGFiP’s legacy system.
techniques that significantly increase our confidence in the “Local vars” corresponds to Mlang’s C backend mapping each M
French tax computation. variable to a C local variable. “Array” corresponds to Mlang’s C
backend mapping M variables to an array of variables.
5.1 Performance of the generated code
We initially generated C code that would emit one local
variable per M variable. But with tens of thousands of local negatively or positively. We plan in the future to devise
variables, running the code required ulimit -s. Furthermore, a static analysis that could formally detect errors, such as
this naïve compilation technique resulted in code twice as comparisons that are always false, or numbers that may be
slow as with the legacy compiler (2ms instead of 1ms, with suspiciously close to zero (denormals).
-O0), which we attribute to constantly saving and restoring Fixed precision. Nevertheless, floating-point computa-
registers (spills). tions are notoriously hard to analyze and reason about, so
We analyzed the legacy code and found out that the DGFiP we set out to investigate replacing floats with integer val-
stored all of the M variables in a global array. We imple- ues. In our first experiment, we adopted big decimals, i.e.
mented the same technique and found out that with -O0, we a bignum for the integer part and a fixed amount of digits
were as fast as the legacy code. We attribute this improve- for the fractional part. Our test suite indicates that the inte-
ment to the fact that the array, which is a few dozen kB, ger part never exceeds 9999999999 (encodable in 37 bits); it
which fits in the L2 cache of most modern processors. This also indicates that with 40 bits of precision for the fractional
is a surprisingly fortuitous choice by the DGFiP. Compil- part, we get correct results. This means that a 128-bit integer
ing with -O1, we further halve the execution time and reach would be a viable alternative to a double, with the added
0.5ms (a 50% performance improvement over the legacy im- advantage that formal analysis tools would be able to deal
plementation). We attribute this gain to our ahead-of-time with it much better.
optimizations which significantly reduce the quantity of gen- Using rationals. Finally, we wondered if it was possible
erated C code. See Fig. 16 for full results. to completely work without floating-point and eliminate im-
precision altogether, taking low-level details such as round-
5.2 The cost of IEEE-754 ing mode and signed zeroes completely out of the picture.
Relying on IEEE-754 and its limited precision for something To that end, we encoded values as fractions where both
as crucial as the income tax of an entire nation naturally numerator and denominator are big integers. We observed
raises questions. Thanks to our new infrastructure, we were that both never exceed 2128 , meaning we could conceivably
able to instrument the generated code and gain numerous implement values as a struct with two 128-bit integers and a
insights. sign bit. We have yet to investigate the performance impact
Does precision matter? We tweaked our backend to use of this change.
the MPFR multiprecision library [13]. With 1024-bit floats,
all tests still pass, meaning that there is no loss of precision 5.3 Test-case generation
with the double-precision 64-bit format. The DGFiP test suite is painstakingly constructed by hand
Does rounding matter? We then instrumented the code by lawyers year after year. From this test suite, we picked
to measure the effect of the IEEE-754 rounding mode on the fi- the 476 cases that don’t raise any exceptions (see §2.1). The
nal result. Anything other than the default (rounding to near- DGFiP has no infrastructure to automatically generate cases
est, ties to even) generates incorrect results. The control-flow that would exercise new situations. As such, the test suite
remains roughly the same, but some comparisons against remains relatively limited in the variety of households it
0 do give out different results as the computation skews covers.
Preprint, Nov. 2020, Online Denis Merigoux, Raphaël Monat, and Jonathan Protzenko

Generating test cases is actually non-trivial: the search DGFiP Private (476 tests) Randomized (1267 tests)
Fuzzer-generated (358 tests)

Percentage of assignments
space is incredibly large, owing to the amount of variables,
but also deeply constrained, owing to the fact that most vari-
ables only admit a few possible values (§1), and are further 60%
constrained in relationship to other variables. 40%
We now set out to automatically generate fresh (valid) test 20%
cases for the tax computation, with two objectives: assert on 0%
a very large number of test cases that our code and the legacy 0 (uncovered) 1 2 or more
implementation compute the same result; and exhibit corner Number of distinct values assigned

cases that were previously not exercised, so as to generate


fresh novel tax situations for lawmakers to consider. Figure 17. Value coverage of assignments for each test suite
Randomized testing. We start by randomly mutating the
legacy test suite, in order to generate new distinct, valid test
cases. If a test case raises an exception, we discard it. We seeing that there is currently none in the DGFiP infrastruc-
obtain 1267 tests, but these are, unsurprisingly, very close to ture. However, traditional code coverage makes little sense:
the legacy test suite and do not exercise very many new situa- conditionals are very rare in the generated code.
tions. They did, however, help us when reverse-engineering Rather, we focus on value coverage:
the semantics of M. We now have 100% conformance on for each assignment in the code, we count the number of
those tests. distinct values assigned during the execution of an entire test
Coverage-guided fuzzing. In order to better explore the case. This is a good proxy for test quality: the more different
search space, we turn to AFL [31]. The tool admits several values flow through an assignment, the more interesting the
usage modes – finding genuine crashes (e.g. segfaults), or tax situation is.
generating test cases for further seeding into the rest of Fig. 17 shows our measurements. The first take-away is
the testing pipeline. We focus on the latter mode, meaning that our randomized tests did not result in meaningful tests:
that we generate an artificial “crash” when a synthesized the number of assignments that are uncovered actually in-
testcase raises no M errors, that is, when we have found a creased. The tests we obtained with AFL, however, signifi-
valid testcase. cantly increase the quality of test coverage. We managed to
We first devise an injection from opaque binary inputs, synthesize many tests that exercise statements previously
which AFL controls, to the DGFiP input variables. Once unvisited by the DGFiP’s test suite, and exhibit much more
“crashes” have been collected, we simply emit a set of test complex assignments (2 or more different values assigned).
inputs that has the same format as the DGFiP. Our knowledge of the existing DGFiP test suite is incom-
Thanks to this very flexible architecture, we were able to plete, as we only have access to a partial set of tests. In
perform fully general fuzzing exercising all input variables, particular, a special set of rules apply when the tax needs
as well as targeted fuzzing that focuses on a subset of the to be adjusted later on following an audit, and the tests for
variables. The former takes a few hours on a high-end ma- these have not been communicated to us. We hope to obtain
chine; the latter mere minutes. We synthesized around 30,000 visibility onto those in the future.
tests cases, which we reduced down to 358 using afl-cmin.
So far, the fuzzer-generated test case have pointed out
6 Related Work and Conclusion
of a few bugs in Mlang’s optimizations and backends. We
plan to further use AFL to find find test cases that satisfy 6.1 Implementing the law
extra properties not originally present in the tax code, e.g. Formalizing part of the law using logic programming or
an excessively high marginal tax rate that might raise some a custom domain specific language has been extensively
legality questions. tried in the past, as early as 1914 [1, 2, 9, 12, 15, 23, 26].
Symbolic execution fuzzing. We attempted to use dy- Most of these works follow the same structure: they take a
namic symbolic execution tool KLEE [5], but found out that it subset of the law, analyze its logical structure, and encode
only had extremely limited support for floating-point compu- it using a novel or existing formalism. All of them stress
tations. As detailed earlier (§5.2), we have found that integer the complexity of this formal endeavor, coming from i) the
based computations are a valid replacement for floats, and underlying reality that the law models and ii) the logical
plan to use this alternate compilation scheme to investigate structure of the legislative text itself. After more a century
whether KLEE would provide interesting test cases. of research, no silver bullet has emerged that would allow
to systematically translate the text of a law into a formal
5.4 Coverage measurements model.
Finally, we wish to evaluate how “good” our new test cases However, domain-specific attempts have been more suc-
are. Code coverage seems like a natural notion, especially cessful. Recently, blockchain has demonstrated increased
A Modern Compiler for the French Tax Code Preprint, Nov. 2020, Online

interest for domain-specific languages encoding smart con- or Approximation of Fixpoints. In POPL. ACM, 238–252.
tracts [14, 16, 25, 30]. Regular private commercial contracts [8] Equipe Leximpact de L’Assemblée nationale. 2019. LexImpact. https:
have also been targeted for formalization [6, 28], as well //leximpact.an.fr/
[9] John Dewey. 1914. Logical method and law. Cornell LQ 10 (1914), 17.
as financial contracts [11, 22]. Concerning the public sec- [10] Direction Générale des Finances Publiques (DGFiP). 2019. Les règles
tor, the “rules as code” movement has been the object of an du moteur de calcul de l’impôt sur le revenu et de l’impôt sur la fortune
exhaustive OECD report [20]. immobilière. https://gitlab.adullact.net/dgfip/ir-calcul
Closer to the topic of this paper, the logical structure of the [11] Jean-Marc Eber. 2009. The Financial Crisis, a Lack of Contract Specifi-
US tax law has been extensively studied by Lawsky [18, 19], cation Tools: What Can Finance Learn from Programming Language
Design?.. In ESOP. 205–206.
pointing out the legal ambiguities in the text of the law that [12] D Fernández Duque, M González Bedmar, D Sousa, Joosten, J.J, and
need to be resolved using legal reasoning. She also claims G. Errezil Alberdi. 2019. To drive or not to drive: A formal analysis
that the tax law drafting style follows default logic [24], a of requirements (51) and (52) from Regulation (EU) 2016/799. In Per-
non-monotonic logic that is hard to encode in languages sonalidades jurídicas difusas y artificiales. TransJus Working Papers
with first-order logic (FOL). This could explain, as M is also Publication - Edición Especial, 159–171. http://diposit.ub.edu/dspace/
handle/2445/137759
based on FOL, the complexity of the DGFiP codebase. [13] Laurent Fousse, Guillaume Hanrot, Vincent Lefèvre, Patrick Pélissier,
As this complexity generates opacity around the way taxes and Paul Zimmermann. 2007. MPFR: A multiple-precision binary
are computed, another government agency set out to re- floating-point library with correct rounding. ACM Transactions on
implement the entire French socio-fiscal system in Python Mathematical Software (TOMS) 33, 2 (2007), 13–es.
[14] Xiao He, Bohan Qin, Yan Zhu, Xing Chen, and Yi Liu. 2018. Spesc: A
[27]. Even if this initiative was helpful and used as a compu-
specification language for smart contracts. In 2018 IEEE 42nd Annual
tation backend for various online simulators, the results it Computer Software and Applications Conference (COMPSAC), Vol. 1.
returns are not legally binding, unlike the results returned by IEEE, 132–137.
the DGFiP. Furthermore, this Python implementation does [15] Nils Holzenberger, Andrew Blair-Stanek, and Benjamin Van Durme.
not deal with all the corner cases of the law. To the extent of 2020. A Dataset for Statutory Reasoning in Tax Law Entailment and
our knowledge, our work is unprecedented in terms of size Question Answering. arXiv preprint arXiv:2005.05257 (2020).
[16] Tom Hvitved. 2011. Contract formalisation and modular implementation
and exhaustiveness of the portion of the law turned into a of domain-specific languages. Ph.D. Dissertation. Citeseer.
reusable and formalized software artifact. [17] Camille Landais, Thomas Piketty, and Emmanuel Saez. 2011. Pour
une révolution fiscale: Un impôt sur le revenu pour le 21 ème siècle.
6.2 Conclusion https://www.revolution-fiscale.fr/simuler/irpp/
[18] Sarah B. Lawsky. 2017. Formalizing the Code. Tax Law Review 70, 377
Thanks to modern compiler construction techniques, we
(2017).
have been able to lift up a legacy, secret codebase into a [19] Sarah B. Lawsky. 2018. A Logic for Statutes. Florida Tax Review (2018).
reusable, public artifact that can be distributed into virtually [20] James Mohun and Alex Roberts. 2020. Cracking the code: Rulemaking
any programming environment. The natural next step for for humans and machines. (2020).
the DGFiP is to consider taking more insight from program- [21] Jean-Michel Muller, Nicolas Brunie, Florent de Dinechin, Claude-Pierre
Jeannerod, Mioara Joldes, Vincent Lefèvre, Guillaume Melquiond,
ming languages research, and design a successor to M/M++
Nathalie Revol, and Serge Torres. 2018. Handbook of Floating-Point
that provides good tooling for translating the tax law into a Arithmetic (2nd Ed.). Springer. https://doi.org/10.1007/978-3-319-
correct and distributable implementation. 76526-6
[22] Grant Olney Passmore and Denis Ignatovich. 2017. Formal Verification
References of Financial Algorithms. In CADE. https://doi.org/10.1007/978-3-319-
63046-5_3
[1] Layman E Allen. 1956. Symbolic logic: A razor-edged tool for drafting [23] Marcos A Pertierra, Sarah Lawsky, Erik Hemberg, and Una-May
and interpreting legal documents. Yale LJ 66 (1956), 833. O’Reilly. 2017. Towards Formalizing Statute Law as Default Logic
[2] Layman E. Allen and C. Rudy Engholm. 1978. Normalized legal drafting through Automatic Semantic Parsing.. In ASAIL@ ICAIL.
and the query method. Journal of Legal Education 29, 4 (1978), 380–412. [24] R. Reiter. 1987. Readings in Nonmonotonic Reasoning. Morgan Kauf-
[3] Sylvie Boldo, Jacques-Henri Jourdan, Xavier Leroy, and Guillaume mann Publishers Inc., San Francisco, CA, USA, Chapter A Logic for De-
Melquiond. 2015. Verified Compilation of Floating-Point Computa- fault Reasoning, 68–93. http://dl.acm.org/citation.cfm?id=42641.42646
tions. J. Autom. Reason. 54, 2 (2015), 135–163. https://doi.org/10.1007/ [25] Vincenzo Scoca, Rafael Brundo Uriarte, and Rocco De Nicola. 2017.
s10817-014-9317-x Smart contract negotiation in cloud computing. In 2017 IEEE 10th
[4] Sylvie Boldo and Guillaume Melquiond. 2011. Flocq: A Unified Library International Conference on Cloud Computing (CLOUD). IEEE, 592–
for Proving Floating-Point Algorithms in Coq. In 20th IEEE Symposium 599.
on Computer Arithmetic, ARITH 2011, Tübingen, Germany, 25-27 July [26] M. J. Sergot, F. Sadri, R. A. Kowalski, F. Kriwaczek, P. Hammond, and
2011, Elisardo Antelo, David Hough, and Paolo Ienne (Eds.). IEEE H. T. Cory. 1986. The British Nationality Act As a Logic Program.
Computer Society, 243–252. https://doi.org/10.1109/ARITH.2011.40 Commun. ACM 29, 5 (May 1986), 370–386.
[5] Cristian Cadar, Daniel Dunbar, Dawson R Engler, et al. 2008. Klee: unas- [27] Sébastien Shulz. 2019. Free software to tackle the lack of transparency
sisted and automatic generation of high-coverage tests for complex in the social tax system. Sociology of a heterogeneous movement at
systems programs.. In OSDI, Vol. 8. 209–224. the margins of the state. Revue francaise de science politique 69, 5
[6] John J Camilleri. 2017. Contracts and Computation—Formal modelling (2019), 845–868.
and analysis for normative natural language. [28] SMU Centre for Computational Law. 2020. The L4 domain-specific
[7] Patrick Cousot and Radhia Cousot. 1977. Abstract Interpretation: A language. https://github.com/smucclaw/dsl
Unified Lattice Model for Static Analysis of Programs by Construction
Preprint, Nov. 2020, Online Denis Merigoux, Raphaël Monat, and Jonathan Protzenko

[29] The Coq Development Team. 2017. The Coq Proof Assistant Reference Verified Software: Theories, Tools, and Experiments. Springer, 229–247.
Manual, version 8.7. http://coq.inria.fr [31] Michal Zalewski. 2014. American fuzzy lop.
[30] Jakub Zakrzewski. 2018. Towards verification of Ethereum smart
contracts: a formalization of core of Solidity. In Working Conference on

You might also like