[go: up one dir, main page]

SQLite's Use Of Tcl

D. Richard Hipp
24th Annual Tcl/Tk Conference
Houston, TX
2017-10-19

1.0 Introduction

SQLite is a TCL extension that has escaped into the wild.

The design of SQLite was inspired by the design of TCL, both in the way it handles datatypes and in the formatting of its source code. The index use case for SQLite was in a Tcl/Tk application for an industrial company. From its inception, SQLite has always depended heavily on TCL. These days, SQLite no longer uses TCL internally and can be run separately from any TCL interpreter, and yet the SQLite development process still depends heavily on TCL.

1.1 About SQLite

SQLite is an SQL database engine, and the most widely used database engine in the world. SQLite is built into all cellphones as a core component and is the primary means of data persistence on phones. SQLite is also an integral part of most web browsers. SQLite is built into MacOS and is used by most of the default applications on that platform. Windows10 requires the C:\Windows\System32\winsqlite3.dll file in order to boot. Countless other popular applications like Skype and WhatsApp and iTunes depend on SQLite.

Because SQLite is open source and can be freely downloaded and duplicated, exact usage numbers are unavailable. But reasonable estimates are that there are more SQLite instances in operation today than there are people on earth. Most devices that use the SQLite database engine contain hundreds of separate databases, and there are billions of such devices. Hence, there are likely around one trillion SQLite databases in active use.

There are more copies of SQLite in use than there are copies of Linux. We know this because SQLite is used on almost all Linux systems, but SQLite is also used on many other non-linux systems such as Macs, iPhones, and Windows computers. By similar arguments, there are more copies of SQLite in use than there are Macs, or Windows PCs. There are probably more copies of SQLite in use than all other database engines combined. It seems likely that SQLite is the most widely used and deployed software component of any kind, with the possible exception of the zLib compression library.

1.2 SQLite Is A TCL Extension, Not A TCL Application

SQLite is not written in TCL. Rather, SQLite is intended to be used by TCL. Like the TCL interpreter, SQLite is written in ANSI C.

The fact that SQLite was primarily intended to be used from TCL is evident in an number of ways.

All programming languages, other than C/C++, require some kind of adaptor in order to interface with the SQLite C implementation. Language adaptors for SQLite are widely available from third party programmers. The adaptors for PHP and Python are built into those languages, for example. A Java adaptor is baked into Android. And so forth. Only the TCL adaptor is included as part of the SQLite core. The source code file that implements the TCL adaptor for SQLite, "tclsqlite.c", was part of the very first check-in on the SQLite source repository on 2000-05-29. (See https://sqlite.org/src/ci/6f3655f79f9b6fc9.)

All modern SQL implementations provide a means to do late binding of parameter values to the SQL statements. Usually a naked "?" character, or a "?" followed by an integer is used. For example:

SELECT passwd, photo FROM user WHERE uid=?1

The "?1" token in the SQL above would be assigned a value at run-time in order to look up the password and photograph of a particular user.

SQLite supports this syntax. But because of its TCL heritage, SQLite also allows the parameter to take the form of a TCL variable. Hence:

SELECT passwd, photo FROM user WHERE uid=$uid

When a statement such as the above is run, the TCL language adaptor automatically binds the value of the $uid TCL variable to the SQL statement before it is evaluated, providing an intuitive and seamless interface between TCL and SQL. SQLite is the only database engine that behaves this way.

The TCL heritage of SQLite is visible in the type system of SQLite. Early versions of SQLite (prior to 2004) operated on the classic TCL principal that "everything is a string". Beginning with SQLite3 (2004-06-18), SQLite also supports binary data. However, types are still very flexible in SQLite, just as they are in TCL. SQLite treats the datatypes on column names in a CREATE TABLE statement as suggestions rather than hard requirements. SQLite is happy to store a 100KB string value in a column that is declared "SHORT INT", just as TCL is happy to store either a large string or a small integer in the same variable. There are some differences in how SQLite deals with datatypes, in comparison to TCL, due to the different nature of the SQL language. SQLite has the concept of "type affinity". If a column is declared "INT" and one inserts a string into that column that looks like an integer and can be safely converted into an integer without loss of information, then that conversion occurs automatically. This provides a measure of compatibility with the rigid type systems of other SQL database engines.

The flexible type system of SQLite seems natural and intuitive to programmers with prior experience programming in TCL. Curiously, though, it is a source of frustration and frequent complaining from programmers accustomed to the rigid and unforgiving type systems of languages like Java.

The similarities in the type systems of TCL and SQLite extends to more than just the interface. An important part of the C/C++ interface for SQLite is the "sqlite3_value" object (https://sqlite.org/c3ref/value.html) which is analogous to the Tcl_Obj object in TCL. Both TCL and SQLite use a dual-representation approach, where each value can be represented simultaneously as both a string and some other type.

2.0 How SQLite Depends On Tcl

SQLite began as a TCL extension, though these days most uses of SQLite are in applications written in languages other than TCL. Many programmers who use SQLite in their applications have no knowledge or awareness of TCL. The SQLite source code used by most developers is a single file of pure C code named "sqlite3.c" that contains no TCL code. This is what we mean when we say that SQLite as "escaped" into the wild. Deployed instances of SQLite no longer depends on TCL.

Nevertheless, SQLite is still heavily dependent upon TCL and the ongoing support, maintenance, and enhancement of SQLite would not be possible without TCL, and would be seriously inconvenienced without Tk.

2.1 SQLite Source Code Is Mostly TCL

The deliverable source code for SQLite is a single file named "sqlite3.c" and its companion header "sqlite3.h". Both files are 100% ANSI-C code. But developers do not edit these files directly. The sqlite3.c and sqlite3.h source files are build products, and the source tree used to build those files is over 50% TCL code. Figure 1 nearby shows the exact ratios.


Figure 1: Breakdown Of SQLite Source Code By Language

Figure 1 is for the main SQLite source repository. Many of the test cases and much of the documentation is held in separate repositories, not included in Figure 1. The separate repositories also contain a great deal of TCL code.

Much of the TCL code in the main SQLite repository consists of test scripts. At this writing, the core repository contains 1153 separate test scripts totally about 389 KB of space. But this is not the only use of TCL in SQLite.

2.2 Machine Generated C Code

A non-trivial amount of the deliverable C code for SQLite is machine generated. Some of the machine generated code is created by C programs, such as LEMON which translates the SQL language grammar into C code to implement a push-down automaton to parse the SQL language. But much of the automatically generated code is created using TCL scripts. TCL is well suited for scanning source files to extract information to be merged with other files and for making mechanized edits. For example, the byte-code engine used to evaluate SQL statements inside of SQLite is implemented as a large "switch" statement inside a "for" loop, with a separate "case" for each opcode, all in the "vdbe.c" source file. At build-time, TCL scripts scan the vdbe.c source file looking for the appropriate "case" statements and then build header files that assign consecutive integers to each symbolic opcode name. (The opcodes used by the SQLite byte-code engine are not an API as they are in TCL and thus can change from one build to the next.) This mapping of symbolic opcode names into integers is not a simple as one might suppose. For reasons of optimization, there are many constraints on the specific values that are assigned to opcodes. For example, many opcodes such as OP_Add must have the same numeric value as the corresponding "+" token in the SQL language parser. Sometimes a group of related opcodes, such as the comparison operators OP_Eq, OP_Ne, OP_Lt, OP_Le, OP_Ge, and OP_Gt, need to be assigned consecutive integers in a specific order. These constraints are all handled effortlessly in TCL. Accomplishing the same with AWK would be rather more difficult.

Perhaps the most important task for TCL during the SQLite build process is constructing the SQLite amalgamation source code file. Recall that most developers use SQLite in the form of a single big file of C code named "sqlite3.c" and referred to as "the amalgamation". A TCL script named "mksqlite3c.tcl" runs in order to construct the amalgamation from over one hundred separate input files. Each of these inputs files must be added to the amalgamation in just the right order. Furthermore, the source files are edited as part of the amalgamation building process. When mksqlite3c.tcl encounters a "#include" for an SQLite header, it replaces the "#include" with a copy of that header file, taking care to make sure each header file is only included once. The mksqlite3.tcl script automatically adds the "static" keyword to internal SQLite APIs to give them file linkage, and makes other similar edits.

2.3 The sqlite3_analyzer.exe Utility

In addition to the core SQLite library, the SQLite source tree also contains code for several analysis and control programs. One of these programs is called "sqlite3_analyzer" (or "sqlite3_analyzer.exe" on Windows). The sqlite3_analyzer program examines an SQLite database and generates a detailed report on the disk usage by the various tables and indexes within that database. The sqlite3_analyzer program is very useful in understanding how an application is using disk space.

It turns out that sqlite3_analyzer, though disguised as an ordinary executable, is really a TCL application. The main source code file for this application is tool/spaceanal.tcl. During the build process, this script is converted into a C-language string constant (using another TCL script) and added to a very simple C-language wrapper than starts a TCL interpreter and then passes the application script to that interpreter.

The sqlite3_analyzer program could be rewritten in pure C. But that would be a lot of code. The TCL script that implements sqlite3_analyzer is less than 1000 lines long. The equivalent C program would surely be at least ten times larger.

Note that the sqlite3_analyzer utility program statically links a TCL interpreter and so does not require a TCL installation on the target computer to use. The sqlite3_analyzer utility program is used by tens of thousands of developers, most of whom do not realize that they are really running a TCL application.

2.4 TCL Used For Testing SQLite

One of the key features of SQLite is that it uses aviation-grade testing. The tests of SQLite, at a minimum, provide 100% modified condition/decision coverage (MC/DC) of the SQLite code, with independence. 100% MC/DC roughly means that every branch instruction at the machine code level is exercised at least once in each direction. The precise definition of MC/DC is slightly stricter than this, for example when comparing boolean vectors, but the 100% branch tests coverage definition is very close approximation. The "with independence" term means that SQLite is tested in multiple ways with test code being written and maintained by different individuals.

The amount of testing done on SQLite is fanatical. On the other hand, that level of testing is necessary for a fundamental low-level component, such as a database engine, that is used by billions of devices. If an ordinary application encounters an obscure bug, it can normally be rebooted to clear the problem. But the job of a database engine is to remember things, and so databases tend to remember their mistakes across reboots. For these reasons, it is important that SQLite have a very low bug density.

TCL is used in every aspect of SQLite testing. The test cases that are part of the primary SQLite source code repository are written in TCL. Other test cases such as TH3 and SQLLogicTest are written in C but still depend on TCL for operation.

The TH3 test suite is a set of proprietary tests for SQLite that form the primary means of achieving 100% MC/DC. TH3 is designed to run on embedded hardware without the support of a desktop operating system. TH3 consists of over 1350 test modules together with over 100 control files. The test modules are written in either C or SQL or a combination of both. The control files are text formatted in a way that easily parsed by TCL. To build a TH3 test, a TCL script is run that combines some subset of the test modules and control files into a single large C program that will automatically run the required tests. This C program is then linked against the "sqlite3.c" amalgamation and the resulting binary is moved to the target computer and executed. TCL scripts automate this entire process on all major host platforms.

To verify that the TH3 tests really do provide 100% MC/DC, special options are added to the TCL scripts that run the tests, causing the GCC coverage analysis tools to be invoked. The output of gcov is then postprocessed to reveal and report any branch instructions that were missed by the tests. The TH3 tests themselves are all implemented in C and SQL, but the operation and management of those tests is all done with TCL.

2.5 TCL Generated Documentation

The extensive documentation for SQLite available on the SQLite website (https://sqlite.org/) is all generated by TCL. Many of the documents, such as the API reference documentation and the descriptions of the byte-code engine opcodes, are created by TCL scripts that scan C source code and extract the necessary information from the code and comments. Thus, the API documentation is largely derived from comments in the source code. Keeping the official documentation (in comments) and the source code close together helps ensure that they are in agreement.

Other whitepaper documents are generated from source files that look mostly like HTML but which contain additional TCL code embedded inside of <tcl>...</tcl> markup. The added TCL code is used for advanced formatting techniques, for automatically creating cross-references and cross-links, and for constructing complex displays such as the popular "rail-road" syntax diagrams for SQL.

2.6 Tcl/Tk Used During Development Of SQLite

The text editor used by the primary author of SQLite is a custom editor with emacs-style key bindings that is built on top of the Tk Text widget. The "e" editor, as it is called, is cross-platform, which helps in the development of a cross-platform software library like SQLite. The "e" editor has been used for the entire 17-year history of SQLite. It has been enhanced over the years with various customizations created especially to help manage the SQLite source code.

The Fossil version control system used for the SQLite source code (and written specifically for that purpose) uses Tcl/Tk to show graphical side-by-side diffs in a separate window. When the "fossil diff --tk" command is run, Fossil generates a script to show the diff graphically and then kicks off a separate "wish" process to run that script and display the diff in a separate window. This graphical diff window has a "Save" button which will cause the Tcl/Tk code needed to reproduce itself to be written to a file. This file can be, in turn, sent to a collaborator for display. Passing around graphical diffs as ordinary text files is much simpler and easier than passing around JPEG images or text "context" diffs.

No two SQLite developers work in the same office. The team is geographically distributed. To help the team stay in touch, a custom chatroom has been created using a Tcl/Tk script. The same script works as both client and server. The chatroom is private and uses a proprietary protocol, so that developers are free to discuss sensitive matters without fear of eavesdropping. The chatroom is implemented as just over 1000 lines of Tk code, and is thus accessible and easy to customize. Among the customizations is the ability to send saved "fossil diff --tk" graphical diffs to collaborators and have the diff pop automatically on the collaborators screen. Small features like this seem trivial by themselves, but together than help the developers to work much more efficiently. These kinds of productivity-enhancing features are unavailable to users of commercial business collaboration packages such as HipChat.

3.0 Summary

SQLite is an indispensable element of most modern computer systems, and TCL is an indispensable tool used in the production and maintenance of SQLite. Thus, the computing infrastructure we enjoy today would not exist except for TCL.

As deployed, SQLite contains no TCL code. However, the design of SQLite is inspired by TCL. And TCL is used extensively in the code generation, testing, analysis, documentation, and development of SQLite. Without TCL, SQLite would not exist.

Every developer and every team has a finite number of "brain cycles" available to do their job. The fewer cycles spent messing with tools, the more cycles are available to devote towards solving the problem. So for maximum productivity, it is important to use tools that get the job done with a minimum of fuss and bother. Our 17-year experience using TCL in the SQLite project has convinced us that TCL is just such a tool. Tcl provides the most help per brain cycle of any similar technology.