CN118377463A

CN118377463A - A software design method, device and equipment for mobile database

Info

Publication number: CN118377463A
Application number: CN202410456897.2A
Authority: CN
Inventors: 王胜; 王海月; 王征权; 袁来; 彭志勇
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2024-04-16
Filing date: 2024-04-16
Publication date: 2024-07-23

Abstract

The application discloses a software design method, a device and equipment of a mobile database, and relates to the field of databases supporting object agent models developed by small mobile terminals; execution of TOTEM-SQL commands is completed through a TOTEM query system; the redox operation is performed by comparing the operation record in the log file with the operation actually performed based on the WAL method to record the redox information to the log file. The application stores and organizes the data in a structuring way, so that the data can be efficiently read and written in by calculation, and the system crash can be timely recovered by the log management part, thereby ensuring the data security.

Description

A software design method, device and equipment for mobile database

技术领域Technical Field

本申请涉及小型移动端开发的支持对象代理模型的数据库领域，具体涉及一种移动数据库的软件设计方法、装置及设备。The present application relates to the field of databases supporting object proxy models developed for small mobile terminals, and specifically to a software design method, device and equipment for a mobile database.

背景技术Background technique

移动数据库(Totem Mobile Database，简称TMDB)软件是一种小型移动端数据库，主要设计用于移动设备和嵌入式系统上轻量级数据库系统。通过手持终端、小型便携设备等方式携带移动数据库进行野外作业、地理信息采集。当前的移动数据库存在以下问题：资源消耗太大，无法保证在有限的移动环境中高效运行；对实时性要求较高的一些应用存在应用局限性；需要远程调用导致使用速度较慢。Totem Mobile Database (TMDB) software is a small mobile database designed for lightweight database systems on mobile devices and embedded systems. Mobile databases are carried by handheld terminals, small portable devices, etc. for field operations and geographic information collection. Current mobile databases have the following problems: too much resource consumption, unable to guarantee efficient operation in a limited mobile environment; there are application limitations for some applications with high real-time requirements; and the need for remote calls leads to slow usage.

发明内容Summary of the invention

本申请提供一种移动数据库的软件设计方法、装置及设备，通过一种结构化的方式来保存和组织数据，使得计算能够高效地读取和写入数据，并通过日志管理部分，保障系统崩溃能及时恢复，保证数据安全性。The present application provides a software design method, device and equipment for a mobile database, which saves and organizes data in a structured manner so that the calculation can efficiently read and write data, and through the log management part, ensures that the system crash can be recovered in time to ensure data security.

第一方面，本申请实施例提供一种移动数据库的软件设计方法，所述移动数据库的软件设计方法包括：In a first aspect, an embodiment of the present application provides a software design method for a mobile database, the software design method for a mobile database comprising:

基于存储层次结构设计，将数据表通过转化为键值对的形式存放于LSM-Tree中，并在LSM-Tree中引入compaction机制，周期性的合并和优化操作，采用B-Tree作为索引，应用Bloom Filter快速查找算法，实现存储管理；Based on the storage hierarchy design, the data table is converted into a key-value pair and stored in the LSM-Tree. The compaction mechanism is introduced into the LSM-Tree to perform periodic merging and optimization operations. B-Tree is used as the index and the Bloom Filter fast search algorithm is applied to achieve storage management.

通过TOTEM的查询系统完成TOTEM-SQL命令的执行，且查询系统接收查询服务命令并在分析器中编译形成查询树，经重写和优化后进入执行器，由存储系统完成磁盘数据的获取，返回查询结果，实现编译查询；The TOTEM-SQL command is executed through TOTEM's query system, and the query system receives the query service command and compiles it in the analyzer to form a query tree, which enters the executor after rewriting and optimization. The storage system completes the acquisition of disk data and returns the query results to realize the compiled query.

基于WAL方法以记录redo信息至日志文件，通过比较日志文件中的操作记录与实际执行的操作来执行redo操作，以在移动数据库崩溃时恢复移动数据库，实现日志管理。Based on the WAL method, redo information is recorded in the log file, and the redo operation is performed by comparing the operation record in the log file with the actual operation, so as to restore the mobile database when the mobile database crashes and realize log management.

结合第一方面，在一种实施方式中，In combination with the first aspect, in one implementation,

对于存储管理，在内存中，数据存储为元组列表，且元数据则分布在多个表中，所述表包括类表、副表、对象表、切换表和双向指针表；For storage management, in memory, data is stored as a list of tuples, and metadata is distributed in multiple tables, including class table, sub-table, object table, switch table and bidirectional pointer table;

所述表由内存管理器进行管理，所述内存管理器用于提供数据插入功能以及监控MemTable的大小，且当MemTable的大小超过阈值时，将MemTable转换为不可变的MemTable，并启动compaction过程，将不可变的MemTable转化为SSTable并存储于最底层；The table is managed by a memory manager, which is used to provide data insertion functions and monitor the size of MemTable. When the size of MemTable exceeds a threshold, the MemTable is converted into an immutable MemTable, and a compaction process is started to convert the immutable MemTable into an SSTable and store it at the bottom layer.

外存数据通过LSM-Tree结构进行组织，层级管理器负责追踪每个SSTable的层级和关键信息，层级管理器还用于执行compaction，包括选择适当的层级和SSTable进行合并，并实施多个SSTable的合并操作。The external storage data is organized through the LSM-Tree structure. The level manager is responsible for tracking the level and key information of each SSTable. The level manager is also used to perform compaction, including selecting appropriate levels and SSTables for merging, and implementing the merge operation of multiple SSTables.

所述LSM-Tree将数据存储在MemTable中，数据分为可变的mutable MemTable和不可变的immutable MemTable，数据首先写入可变MemTable，填满后转变为不可变MemTable，而新的数据写入需等待空闲的可变MemTable；The LSM-Tree stores data in MemTable, which is divided into mutable MemTable and immutable MemTable. Data is first written into the mutable MemTable, and then converted into the immutable MemTable after it is filled. New data needs to wait for the free mutable MemTable to be written.

不可变MemTable通过compaction操作合并到外存，并转换回为可变MemTable以便再次写入；The immutable MemTable is merged into external memory through compaction operations and converted back to a mutable MemTable for writing again;

外存中的数据以SSTable的形式进行存储，数据按key排序并分层，从最底层到最高层，且层级越高，SSTable数量越多，单个大小越大；The data in the external memory is stored in the form of SSTable. The data is sorted and layered by key, from the bottom layer to the top layer. The higher the layer, the more SSTables there are and the larger the size of each SSTable.

数据从内存的不可变MemTable开始compaction，首先转换为最底层的SSTable，随后当某层SSTable达到阈值，则进一步compaction至更高层级，直至达到最高层级。Data starts to be compacted from the immutable MemTable in memory, first converted to the bottom-level SSTable, and then when a certain level of SSTable reaches the threshold, it is further compacted to a higher level until the highest level is reached.

结合第一方面，在一种实施方式中，对于SSTable中指定key的操作步骤，具体包括：In conjunction with the first aspect, in one implementation, the operation steps for specifying a key in the SSTable specifically include:

读取Footer以获取组件信息，检查SSTable的最小和最大key以确定key是否在范围内，若是，则通过Bloom Filter验证key是否存在于SSTable中，之后根据index block中的信息定位到包含目标key的数据块，并在该数据块中搜索key。Read Footer to obtain component information, check the minimum and maximum keys of SSTable to determine whether the key is in the range. If so, verify whether the key exists in SSTable through Bloom Filter, then locate the data block containing the target key according to the information in the index block, and search for the key in the data block.

结合第一方面，在一种实施方式中，所述compaction用于清除内存数据刷盘后产生的数据层交叠和重复key累积，且compaction还用于通过定期合并多层数据来清理旧版本数据。In combination with the first aspect, in one implementation, the compaction is used to clear data layer overlap and duplicate key accumulation generated after the memory data is flushed to disk, and the compaction is also used to clean up old version data by periodically merging multiple layers of data.

结合第一方面，在一种实施方式中，对于编译查询，分析器用于检查输入的查询字符串，通过词法、语法和语义分析确保查询字符串的正确性，若查询字符串有效，则分析器创建一查询树并传递给计划器，若无效，则返回错误提示。In combination with the first aspect, in one embodiment, for compiled queries, the analyzer is used to check the input query string and ensure the correctness of the query string through lexical, grammatical and semantic analysis. If the query string is valid, the analyzer creates a query tree and passes it to the planner. If it is invalid, an error prompt is returned.

结合第一方面，在一种实施方式中，对于编译查询，通过JavaCC扩展JSqlParser，生成Java词法和语法分析器，所述JSqlParser为轻量级Java库，用于解析SQL为树结构，且支持广泛SQL命令。In combination with the first aspect, in one implementation, for compiled queries, JSqlParser is extended by JavaCC to generate a Java lexical and syntax analyzer, wherein the JSqlParser is a lightweight Java library for parsing SQL into a tree structure and supporting a wide range of SQL commands.

对于日志管理，在数据写入Memtable前，通过WAL将记录持久化到磁盘，且通过LogManager.java文件，实现管理日志系统的功能函数；For log management, before data is written to Memtable, records are persisted to disk through WAL, and the function of managing the log system is implemented through the LogManager.java file;

当移动数据库崩溃时通过已持久化的日志文件记录恢复数据至最后正常状态，所述日志文件中记录所有对象操作，且移动数据库重启时，移动数据库从最近的检查点后开始应用日志文件记录，恢复到一致状态。When the mobile database crashes, data is restored to the last normal state through the persisted log file records, in which all object operations are recorded. When the mobile database is restarted, the mobile database applies the log file records starting from the most recent checkpoint to restore to a consistent state.

第二方面，本申请实施例提供一种移动数据库的软件设计装置，所述移动数据库的软件设计装置包括：In a second aspect, an embodiment of the present application provides a software design device for a mobile database, the software design device for a mobile database comprising:

存储管理模块，其用于基于存储层次结构设计，将数据表通过转化为键值对的形式存放于LSM-Tree中，并在LSM-Tree中引入compaction机制，周期性的合并和优化操作，采用B-Tree作为索引，应用Bloom Filter快速查找算法，实现存储管理；The storage management module is used to convert data tables into key-value pairs based on the storage hierarchy design and store them in LSM-Tree. The compaction mechanism is introduced into LSM-Tree to perform periodic merging and optimization operations. B-Tree is used as the index and Bloom Filter fast search algorithm is applied to achieve storage management.

编译查询模块，其用于通过TOTEM的查询系统完成TOTEM-SQL命令的执行，且查询系统接收查询服务命令并在分析器中编译形成查询树，经重写和优化后进入执行器，由存储系统完成磁盘数据的获取，返回查询结果，实现编译查询；Compile query module, which is used to complete the execution of TOTEM-SQL commands through TOTEM's query system. The query system receives query service commands and compiles them in the analyzer to form a query tree. After rewriting and optimization, it enters the executor, and the storage system completes the acquisition of disk data and returns the query results to realize compiled query;

日志管理模块，其用于基于WAL方法以记录redo信息至日志文件，通过比较日志文件中的操作记录与实际执行的操作来执行redo操作，以在移动数据库崩溃时恢复移动数据库，实现日志管理。The log management module is used to record redo information to the log file based on the WAL method, and to perform redo operations by comparing the operation records in the log file with the actual operations performed, so as to restore the mobile database when the mobile database crashes and realize log management.

第三方面，本申请实施例提供一种移动数据库的软件设计设备，所述移动数据库的软件设计设备包括处理器、存储器、以及存储在所述存储器上并可被所述处理器执行的移动数据库的软件设计程序，其中所述移动数据库的软件设计程序被所述处理器执行时，实现上述所述的移动数据库的软件设计方法的步骤。In a third aspect, an embodiment of the present application provides a software design device for a mobile database, wherein the software design device for a mobile database comprises a processor, a memory, and a software design program for a mobile database stored in the memory and executable by the processor, wherein when the software design program for a mobile database is executed by the processor, the steps of the software design method for a mobile database described above are implemented.

本申请实施例提供的技术方案带来的有益效果包括：The beneficial effects brought by the technical solution provided by the embodiments of the present application include:

(1)体积小巧，小型移动数据库占用的空间极小，非常适合存储空间有限的移动设备，比如手机、平板电脑以及物联网设备等，可以在几百KB甚至几MB的存储空间内运行；(1) Small size: small mobile databases take up very little space, making them very suitable for mobile devices with limited storage space, such as mobile phones, tablets, and IoT devices. They can run in a storage space of a few hundred KB or even a few MB.

(2)资源高效，轻量化设计，小型移动数据库在运行时对内存、CPU和电池消耗相对较低，能够确保在资源有限的移动环境中稳定高效地运行；嵌入式集成，可以直接嵌入到应用程序中，不需要单独的数据库服务器，简化了部署和管理，适合独立运行的应用场景；易于移植，由于体积小、依赖少，小型移动数据库很容易跨平台移植，能够在各种移动操作系统上运行；(2) Resource efficiency and lightweight design. Small mobile databases consume relatively little memory, CPU, and battery during operation, ensuring stable and efficient operation in resource-limited mobile environments. Embedded integration means that they can be directly embedded into applications without the need for a separate database server, simplifying deployment and management and making them suitable for standalone application scenarios. Easy to port. Due to their small size and low dependencies, small mobile databases are easily ported across platforms and can run on a variety of mobile operating systems.

(3)本地数据存储，即使在没有网络连接的情况下，移动设备也可以访问和操作存储在本地的小型数据库，保证数据可用性并减少对网络的依赖；(3) Local data storage: Even without a network connection, mobile devices can access and operate small databases stored locally, ensuring data availability and reducing dependence on the network;

(4)快速响应，由于数据直接存储在本地，查询和更新操作通常比远程调用云数据库更快，尤其适用于对实时性要求较高的应用；安全性高，在移动端直接进行处理和分析可节省网络传输开销，从而实现快速决策以避免延迟带来的经济损失或安全风险。(4) Fast response: Since data is directly stored locally, query and update operations are usually faster than remote calls to cloud databases, which is especially suitable for applications with high real-time requirements. High security: Direct processing and analysis on the mobile terminal can save network transmission overhead, thereby enabling quick decision-making to avoid economic losses or security risks caused by delays.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1为本申请移动数据库的软件设计方法的流程示意图；FIG1 is a flow chart of a software design method for a mobile database of the present application;

图2为移动数据库软件LSM-Tree层级图；Figure 2 is a LSM-Tree hierarchy diagram of the mobile database software;

图3为移动数据库软件记录偏移的B-Tree例图；FIG3 is a B-Tree example diagram of mobile database software record offset;

图4为移动数据库软件为TMDB存储时插入数据的时序图；FIG4 is a timing diagram of inserting data when the mobile database software is storing TMDB;

图5为TMDB存储时查询数据的时序图；FIG5 is a timing diagram of querying data when storing in TMDB;

图6为本申请移动数据库的软件设计装置的功能模块示意图；FIG6 is a schematic diagram of the functional modules of the software design device for the mobile database of the present application;

图7为移动数据库的软件设计设备的硬件结构示意图。FIG. 7 is a schematic diagram of the hardware structure of the software design device for the mobile database.

具体实施方式Detailed ways

为了使本技术领域的人员更好地理解本申请方案，下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本申请一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。In order to enable those skilled in the art to better understand the solution of the present application, the technical solution in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are only part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by ordinary technicians in this field without creative work are within the scope of protection of this application.

为使本申请的目的、技术方案和优点更加清楚，下面将结合附图对本申请实施方式作进一步地详细描述。In order to make the objectives, technical solutions and advantages of the present application more clear, the implementation methods of the present application will be further described in detail below with reference to the accompanying drawings.

第一方面，本申请实施例提供一种移动数据库的软件设计方法，通过一种结构化的方式来保存和组织数据，使得计算能够高效地读取和写入数据，并通过日志管理部分，保障系统崩溃能及时恢复，保证数据安全性。On the first aspect, the embodiments of the present application provide a software design method for a mobile database, which saves and organizes data in a structured manner so that the calculation can efficiently read and write data, and through the log management part, ensures that the system crash can be recovered in time to ensure data security.

一实施例中，参照图1，图1为本申请移动数据库的软件设计方法的流程示意图。如图1所示，移动数据库的软件设计方法包括：In one embodiment, referring to FIG1 , FIG1 is a flow chart of the software design method of the mobile database of the present application. As shown in FIG1 , the software design method of the mobile database includes:

S1：基于存储层次结构设计，将数据表通过转化为键值对的形式存放于LSM-Tree中，并在LSM-Tree中引入compaction(数据合并)机制，周期性的合并和优化操作，采用B-Tree(一种自平衡的树，能够保持数据有序)作为索引，应用Bloom Filter快速查找算法，实现存储管理；Tree-LSTM是一种深度学习模型，它是LSTM(Long Short-Term Memory，长短期记忆网络)的扩展，旨在更好地处理具有树状层级结构的数据。S1: Based on the storage hierarchy design, the data table is converted into key-value pairs and stored in the LSM-Tree. The compaction (data merging) mechanism is introduced in the LSM-Tree, and periodic merging and optimization operations are performed. B-Tree (a self-balancing tree that can keep data in order) is used as the index, and the Bloom Filter fast search algorithm is applied to achieve storage management. Tree-LSTM is a deep learning model, which is an extension of LSTM (Long Short-Term Memory) and is designed to better process data with a tree-like hierarchical structure.

对于移动数据库的设计包括存储管理、编译查询和日志管理。对于存储管理，存储软件的设计是计算机系统中负责数据存储和检索的关键组成部分，通过存储层次结构的设计，有效地处理和管理数据，数据表通过转化为键值对的形式存放于LSM-Tree中，在LSM-Tree中，引进compaction机制，定期的合并和优化操作，确保了数据的高效存储和访问，同时使用B-Tree作为索引，应用了Bloom Filter快速查找算法。The design of mobile database includes storage management, compiled query and log management. For storage management, the design of storage software is a key component responsible for data storage and retrieval in computer systems. Through the design of storage hierarchy, data is effectively processed and managed. Data tables are stored in LSM-Tree by converting them into key-value pairs. In LSM-Tree, compaction mechanism is introduced, and regular merging and optimization operations are performed to ensure efficient storage and access of data. At the same time, B-Tree is used as index and Bloom Filter fast search algorithm is applied.

进一步的，对于存储管理，在内存中，数据存储为元组列表，且元数据则分布在多个表中，所述表包括类表、副表、对象表、切换表和双向指针表；Further, for storage management, in memory, data is stored as a list of tuples, and metadata is distributed in multiple tables, including a class table, a sub-table, an object table, a switch table, and a bidirectional pointer table;

所述表由内存管理器(Mem Manager)进行管理，所述内存管理器用于提供数据插入功能以及监控MemTable(内存表)的大小，且当MemTable的大小超过阈值时，将MemTable转换为不可变的MemTable，并启动compaction过程，将不可变的MemTable转化为SSTable(一种数据存储结构)并存储于最底层(level-0)；The table is managed by a memory manager (Mem Manager), which is used to provide data insertion functions and monitor the size of MemTable (memory table). When the size of MemTable exceeds a threshold, the MemTable is converted to an immutable MemTable, and a compaction process is started to convert the immutable MemTable into an SSTable (a data storage structure) and store it at the bottom layer (level-0);

外存数据通过LSM-Tree结构进行组织，层级管理器负责追踪每个SSTable的层级和关键信息，如最大/最小键值和大小，层级管理器还用于执行compaction，包括选择适当的层级和SSTable进行合并，并实施多个SSTable的合并操作。The external storage data is organized through the LSM-Tree structure. The level manager is responsible for tracking the level and key information of each SSTable, such as the maximum/minimum key value and size. The level manager is also used to perform compaction, including selecting the appropriate level and SSTable for merging, and implementing the merge operation of multiple SSTables.

进一步的，LSM-Tree将数据存储在MemTable中，数据分为可变的mutable(可变的)MemTable和不可变的immutable(不可变的)MemTable，数据首先写入可变MemTable，填满后转变为不可变MemTable，而新的数据写入需等待空闲的可变MemTable；Furthermore, LSM-Tree stores data in MemTable, which is divided into mutable MemTable and immutable MemTable. Data is first written into mutable MemTable, and then converted into immutable MemTable after it is full. New data needs to wait for the free mutable MemTable to be written.

外存中的数据以SSTable的形式进行存储，数据按key(键)排序并分层，从最底层(level-0)到最高层(level-n)，且层级越高，SSTable数量越多，单个大小越大；The data in the external memory is stored in the form of SSTable. The data is sorted and layered by key, from the lowest level (level-0) to the highest level (level-n). The higher the level, the more SSTables there are and the larger the size of each one.

SSTable是一种高效的key-value型文件存储格式，进一步的，对于SSTable中指定key的操作步骤，具体包括：SSTable is an efficient key-value file storage format. Further, the operation steps for specifying a key in SSTable include:

读取Footer(定位文件的起始索引)以获取组件信息，检查SSTable的最小和最大key以确定key是否在范围内，若是，则通过Bloom Filter(布隆过滤器)验证key是否存在于SSTable中，之后根据index block(索引区块)中的信息定位到包含目标key的数据块，并在该数据块中搜索key。Read Footer (locate the starting index of the file) to obtain component information, check the minimum and maximum keys of SSTable to determine whether the key is in the range. If so, verify whether the key exists in SSTable through Bloom Filter, and then locate the data block containing the target key according to the information in the index block, and search for the key in the data block.

在基于LSM-Tree架构的计算机系统中，compaction是一种核心机制，compaction用于清除内存数据刷盘后产生的数据层交叠和重复key累积，且compaction还用于通过定期合并多层数据来清理旧版本数据。即compaction用于解决内存数据刷盘后产生的数据层交叠和重复key累积问题，这些问题会导致读取性能降低和存储空间浪费。Compaction通过定期合并多层数据并清理旧版本来提升读取效率和节省空间。In computer systems based on the LSM-Tree architecture, compaction is a core mechanism that is used to clear data layer overlaps and duplicate key accumulations generated after memory data is flushed to disk. Compaction is also used to clean up old versions of data by periodically merging multiple layers of data. That is, compaction is used to solve the problems of data layer overlaps and duplicate key accumulations generated after memory data is flushed to disk, which can lead to reduced reading performance and waste of storage space. Compaction improves reading efficiency and saves space by periodically merging multiple layers of data and cleaning up old versions.

S2：通过TOTEM(图腾专利系统，即一种专利查询系统)的查询系统完成TOTEM-SQL(Structured Query Language，结构化查询语言)命令的执行，且查询系统接收查询服务命令并在分析器中编译形成查询树，经重写和优化后进入执行器，由存储系统完成磁盘数据的获取，返回查询结果，实现编译查询；S2: The TOTEM-SQL (Structured Query Language) command is executed through the query system of TOTEM (Totem Patent System, a patent query system), and the query system receives the query service command and compiles it in the analyzer to form a query tree, which enters the executor after rewriting and optimization, and the storage system completes the acquisition of disk data and returns the query results to realize the compiled query;

进一步的，对于编译查询，分析器用于检查输入的查询字符串，通过词法、语法和语义分析确保查询字符串的正确性，若查询字符串有效，则分析器创建一查询树并传递给计划器，若无效，则返回错误提示。Furthermore, for compiled queries, the analyzer is used to check the input query string and ensure the correctness of the query string through lexical, grammatical and semantic analysis. If the query string is valid, the analyzer creates a query tree and passes it to the planner. If it is invalid, an error prompt is returned.

对于编译查询，通过JavaCC(一种能生成语法和词法分析器的生成程序)扩展JSqlParser(一种SQL解析器)，生成Java(门面向对象的编程语言)词法和语法分析器，简化创建并支持高级特性和错误报告；所述JSqlParser为轻量级Java库，用于解析SQL为树结构，且支持广泛SQL命令，无需依赖，适用于多种数据库工具开发。For compiled queries, JSqlParser (a SQL parser) is extended through JavaCC (a generator that can generate syntax and lexical analyzers) to generate Java (an object-oriented programming language) lexical and syntax analyzers, simplifying creation and supporting advanced features and error reporting; the JSqlParser is a lightweight Java library used to parse SQL into a tree structure, and supports a wide range of SQL commands without dependencies, and is suitable for development of multiple database tools.

S3：基于WAL方法以记录redo信息至日志文件，通过比较日志文件中的操作记录与实际执行的操作来执行redo操作，以在移动数据库崩溃时恢复移动数据库，实现日志管理。S3: Based on the WAL method, redo information is recorded in the log file, and redo operations are performed by comparing the operation records in the log file with the actual operations performed, so as to restore the mobile database when the mobile database crashes and realize log management.

即使用WAL(预写日志)方法通过记录redo(重写)信息到日志文件来确保数据的一致性和恢复能力，WAL允许程序通过比较日志中的操作记录与实际执行的操作来执行redo操作，从而在系统崩溃时快速恢复数据库。由于系统采用了非重写技术，即使没有undo(撤销)操作，也能保证数据库的一致性。That is, the WAL (write-ahead log) method is used to ensure data consistency and recovery capabilities by recording redo (rewrite) information in the log file. WAL allows the program to perform redo operations by comparing the operation records in the log with the actual operations performed, thereby quickly recovering the database when the system crashes. Because the system uses non-rewrite technology, the consistency of the database can be guaranteed even without undo (undo) operations.

进一步的，对于日志管理，在数据写入Memtable前，通过WAL将记录持久化到磁盘，且通过LogManager.java文件(java的操作日志管理文件)，实现管理日志系统的功能函数，即实现了管理日志系统一系列的功能函数，初始化新日志文件、初始化新索引B树文件、删除旧日志文件、持久化日志到磁盘、加载并显示日志记录；Furthermore, for log management, before data is written to Memtable, the records are persisted to disk through WAL, and the functional functions of managing the log system are implemented through the LogManager.java file (java operation log management file), that is, a series of functional functions of the log management system are implemented, including initializing new log files, initializing new index B-tree files, deleting old log files, persisting logs to disk, and loading and displaying log records;

当移动数据库崩溃时通过已持久化的日志文件记录恢复数据至最后正常状态，所述日志文件中记录所有对象操作，确保了即使未持久化的更改(脏页)丢失，也能通过日志安全恢复，且移动数据库重启时，移动数据库从最近的检查点后开始应用日志文件记录，恢复到一致状态。When the mobile database crashes, the data is restored to the last normal state through the persisted log file records. The log file records all object operations, ensuring that even if non-persistent changes (dirty pages) are lost, they can be safely restored through the log. When the mobile database is restarted, the mobile database applies the log file records from the most recent checkpoint to restore to a consistent state.

需要说明的是，TMDB(Totem Mobile Database)是一款为移动端设计的支持对象代理模型的数据库。TMDB的体系结构，包含了存储管理、编译查询、日志管理等关键模块。另外Transaction模块负责处理用户的读写请求，并将这些请求分派至相应的模块执行，同时负责协调各模块间的工作并处理业务逻辑。It should be noted that TMDB (Totem Mobile Database) is a database designed for mobile terminals that supports the object proxy model. The architecture of TMDB includes key modules such as storage management, compiled query, and log management. In addition, the Transaction module is responsible for processing user read and write requests, dispatching these requests to the corresponding modules for execution, and coordinating the work between modules and processing business logic.

存储管理是移动数据库(TMDB)中负责数据存储和检索的关键组成部分。它通过提供一种结构化的方式来保存和组织数据，使得计算机能够高效地读取和写入数据。Storage management is a key component responsible for data storage and retrieval in a mobile database (TMDB). It saves and organizes data in a structured way so that computers can read and write data efficiently.

在存储管理中，存储层次结构包括CPU、主存、缓存、辅存，存储层次结构的设计目的是通过层次化的存储器来平衡访问速度、成本和容量。快速但昂贵的存储器(如缓存)用于存储最常用的数据，而较慢但容量较大且成本较低的存储器(如辅存)用于长期存储和扩展。这种层次化结构使得计算机系统能够更有效地处理和管理数据，提高性能和响应速度。In storage management, the storage hierarchy includes CPU, main memory, cache, and auxiliary memory. The storage hierarchy is designed to balance access speed, cost, and capacity through hierarchical memory. Fast but expensive memory (such as cache) is used to store the most frequently used data, while slower but larger and less expensive memory (such as auxiliary memory) is used for long-term storage and expansion. This hierarchical structure enables computer systems to process and manage data more efficiently, improving performance and response speed.

采用日志先行(WAL)的策略，确保所有操作都记录在日志中，便于数据恢复与异常处理。在LSM-Tree中，数据初始写入内存，然后通过一系列合并(compaction)操作，逐步从较小的磁盘文件C1合并至较大的文件CL。与传统数据库的原地更新(in-place update)不同，LSM-Tree实行追加更新(out-place update)，即所有变更先写入内存，再顺序刷新至磁盘，以此优化数据存储和访问效率。The log-first (WAL) strategy is adopted to ensure that all operations are recorded in the log, which is convenient for data recovery and exception handling. In LSM-Tree, data is initially written to the memory, and then gradually merged from the smaller disk file C1 to the larger file CL through a series of merge (compaction) operations. Unlike the in-place update of traditional databases, LSM-Tree implements out-place update, that is, all changes are first written to the memory and then refreshed to the disk in sequence, so as to optimize data storage and access efficiency.

SSTable是TMDB中使用的高效key-value存储格式，SSTable的Footer记录了内部组件的位置信息，用于快速访问。数据按key排序，存储在4KB的data blocks中，并通过index block索引以快速定位，包含最key和偏移。为加速查找，SSTable还保存最小key和Bloom Filter。查询指定key的步骤如下：SSTable is an efficient key-value storage format used in TMDB. The Footer of SSTable records the location information of internal components for fast access. Data is sorted by key, stored in 4KB data blocks, and indexed by index block for fast location, including the minimum key and offset. To speed up the search, SSTable also saves the minimum key and Bloom Filter. The steps to query a specified key are as follows:

本访问Footer获取组件信息；This accesses Footer to obtain component information;

检查最小和最大key确定查询范围；Check the minimum and maximum keys to determine the query range;

使用Bloom Filter快速判断key是否存在；Use Bloom Filter to quickly determine whether a key exists;

通过index block定位可能的数据块；Locate possible data blocks through index blocks;

遍历该数据块以查找目标key。Traverse the data block to find the target key.

LSM-Tree为架构的系统中引入了compaction机制，内存中的数据到达上限后不断刷盘，数据范围互相交叠的层越来越多，相同key的数据不断积累，引起读性能下降和空间膨胀。因此，compaction机制被引入，通过周期性的后台任务不断的回收旧版本数据和将多层合并为一层的方式来优化读性能和空间占用问题。The compaction mechanism is introduced in the system with LSM-Tree architecture. When the data in the memory reaches the upper limit, it is constantly flushed to the disk. The number of layers with overlapping data ranges increases, and the data with the same key continues to accumulate, causing the read performance to degrade and the space to expand. Therefore, the compaction mechanism is introduced to optimize the read performance and space usage issues by periodically reclaiming old version data and merging multiple layers into one layer through periodic background tasks.

在TMDB中，compaction过程选择level和SSTable的步骤如下：In TMDB, the steps for selecting levels and SSTables in the compaction process are as follows:

计算每个level的评分，选择评分最高的level-i进行compaction；Calculate the score of each level and select the level-i with the highest score for compaction;

如果i＝0，将level-0的所有SSTable compact成新的SSTable并移至level-1，同时删除旧的SSTable；否则，初始化files集合，选择level-i中文件名后缀最小的SSTablex，将其加入files；If i=0, all SSTables in level-0 are compacted into new SSTables and moved to level-1, and the old SSTables are deleted; otherwise, the files set is initialized, and the SSTablex with the smallest file name suffix in level-i is selected and added to the files;

找出level-i中与x有重叠的所有SSTable(例如y与z)，加入files；Find all SSTables in level-i that overlap with x (e.g. y and z) and add them to files;

确定level-(i+1)中与files有重叠的SSTable(例如a、b)，加入files_2；Determine the SSTables (such as a and b) in level-(i+1) that overlap with files and add them to files_2;

遍历level-i的SSTable，寻找可以加入files且不会导致files_2新增SSTable的SSTable t，将其加入files。Traverse the SSTables of level-i, find SSTable t that can be added to files and will not cause a new SSTable to be added to files_2, and add it to files.

最终，files和files_2中的所有SSTable构成此次compaction的合并集合。Finally, all SSTables in files and files_2 constitute the merged set of this compaction.

在移动数据库(TMDB)中，使用B-Tree作为索引。In the mobile database (TMDB), B-Tree is used as the index.

B-Tree是一种自平衡的树，能够保持数据有序。这种数据结构能够让查找数据、顺序访问、插入数据及删除的动作，都在对数时间内完成，为系统大块数据的读写操作做了优化。B-Tree通过减少定位记录时所经历的中间过程，从而加快存取速度。B-Tree is a self-balancing tree that keeps data in order. This data structure allows data search, sequential access, data insertion and deletion to be completed in logarithmic time, which optimizes the system's large-scale data read and write operations. B-Tree speeds up access by reducing the intermediate process of locating records.

在移动数据库(TMDB)中，使用Bloom Filter算法，Bloom Filter是一种空间效率高的概率型数据结构，用于快速判断一个元素是否属于某个集合。能快速判断数据是否存在，从而避免昂贵的磁盘I/O操作或减少不必要的计算。Bloom Filter的基本原理是使用一个位数组(bit array)和多个哈希函数。位数组的每个位初始化为0，而哈希函数用于将元素映射到位数组的特定位置。当一个元素被加入集合时，通过哈希函数计算出k个哈希值，并将位数组中对应的k个位设置为1。查询元素时，同样使用哈希函数计算k个位，如果所有位都是1，则认为元素可能存在于集合中；如果任一位为0，则元素一定不在集合中。In the mobile database (TMDB), the Bloom Filter algorithm is used. Bloom Filter is a probabilistic data structure with high space efficiency, which is used to quickly determine whether an element belongs to a set. It can quickly determine whether the data exists, thereby avoiding expensive disk I/O operations or reducing unnecessary calculations. The basic principle of Bloom Filter is to use a bit array and multiple hash functions. Each bit of the bit array is initialized to 0, and the hash function is used to map the element to a specific position in the bit array. When an element is added to the set, k hash values are calculated by the hash function, and the corresponding k bits in the bit array are set to 1. When querying an element, the hash function is also used to calculate k bits. If all bits are 1, it is considered that the element may exist in the set; if any bit is 0, the element is definitely not in the set.

在移动数据库(TMDB)中，设计了两种与LSM-Tree结合的缓存模式：In the mobile database (TMDB), two cache modes combined with LSM-Tree are designed:

MetaCache：缓存SSTable的元数据块，包含键范围和偏移量等信息，这种设计允许快速定位所需SSTable，减少磁盘IO，提升查询效率和命中率；MetaCache: caches the metadata blocks of SSTable, including key range and offset information. This design allows quick location of the required SSTable, reduces disk IO, and improves query efficiency and hit rate.

DataCache：基于FIFO策略的键值对缓存，使用OrderedMap记录数据并保持插入顺序和映射关系，新插入的数据项被置顶，保证最近访问的数据优先保留，而被访问的数据项通过重新插入保持在缓存顶部。这些缓存策略显著提升了TMDB的查询性能，达到了100倍的性能增益。DataCache: A key-value cache based on the FIFO strategy. It uses OrderedMap to record data and maintain the insertion order and mapping relationship. Newly inserted data items are placed at the top to ensure that recently accessed data is retained first, while accessed data items are kept at the top of the cache by reinserting. These cache strategies significantly improve the query performance of TMDB, achieving a 100-fold performance gain.

需要说明的是，编译查询在移动数据库(TMDB)中将查询语句转换为数据库内核能够高效执行的指令序列，同时确保查询语句的安全性和有效性，并通过优化提升查询性能。It should be noted that compiled queries convert query statements in the mobile database (TMDB) into instruction sequences that can be efficiently executed by the database kernel, while ensuring the security and effectiveness of the query statements and improving query performance through optimization.

在编译查询中，首先词法和语法分析过程将查询字符串转化为内部语法结构，并进行语法检查以确保其符合TOTEM-SQL规范，最终生成一个反映查询内容的分析树。词法分析器将会识别信息，并转换为相应的标号传递给语法分析器，语法分析器接收到了查询符号串后，会将其识别为一条SELECT查询命令并执行相应的动作，生成分析树结构。In compiling a query, the lexical and grammatical analysis process first converts the query string into an internal grammatical structure, and performs a grammatical check to ensure that it complies with the TOTEM-SQL specification, and finally generates a parse tree that reflects the query content. The lexical analyzer will recognize the information and convert it into corresponding labels and pass it to the grammatical analyzer. After receiving the query symbol string, the grammatical analyzer will recognize it as a SELECT query command and perform corresponding actions to generate a parse tree structure.

其次是语义分析部分，或者称为转换阶段，它接受语法分析器传来的分析树，并检查该命令是否有不符合系统规定的地方，如果没有则最终转换为一个查询树返回。The second is the semantic analysis part, or the conversion stage, which accepts the analysis tree from the syntax analyzer and checks whether the command conforms to the system regulations. If not, it is finally converted into a query tree and returned.

语义分析(转换器)以分析器传递过来的分析树作为输入，然后递归地处理它。一般来说各种命令最后都会转换为一个查询(Query)节点，这个节点将是新数据结构的最顶端节点。形成查询节点后，转换器会检查FROM子句里面的类名是否存在。在类名全部被识别后，转换器检查查询所用的属性名是否包含在查询给出的类里。The semantic analysis (converter) takes the parse tree passed by the analyzer as input and processes it recursively. Generally speaking, various commands will eventually be converted into a query node, which will be the top node of the new data structure. After forming the query node, the converter will check whether the class name in the FROM clause exists. After all the class names are recognized, the converter checks whether the attribute names used in the query are contained in the class given by the query.

所有的查询命令都会经过以上过程。有一些复杂的查询命令(要产生计划的命令)会调用转换器部分，主要包括SELECT、INSERT、DELETE和UPDATE命令。它们的查询树会经过计划器转换为查询计划，然后由执行器根据计划来进行扫描工作。All query commands will go through the above process. Some complex query commands (commands that need to generate plans) will call the converter part, mainly including SELECT, INSERT, DELETE and UPDATE commands. Their query trees will be converted into query plans by the planner, and then the executor will perform scanning work according to the plan.

其它许多较简单的非计划命令(如创建或删除基本类的命令)不需要进行类的扫描工作，只要在系统的模式数据上进行一些操作，因此在转换器部分只是进行了简单的逻辑错误的检查，而没有做任何实际的转换工作。它们的处理直接调用相应的模块就可以完成。Many other simpler non-planned commands (such as commands to create or delete basic classes) do not require class scanning, but only require some operations on the system's schema data. Therefore, the converter only performs a simple logical error check without any actual conversion. Their processing can be completed by directly calling the corresponding module.

创建类操作包括创建严格代理类和非严格代理类两部分。比较特殊的是创建严格代理类的命令。该命令首先要求创建代理类的模式信息，这部分相当于一个非计划命令；在此之后还要根据声明的代理规则对源类进行一次查询创建代理对象，并建立对象与代理对象的联系，这部分又要和计划命令类似地调用计划器和执行器。为了程序接口的统一，在语义分析器中所有命令最后都是按统一的查询树形式输出的。The class creation operation includes creating a strict proxy class and a non-strict proxy class. The command to create a strict proxy class is quite special. The command first requires the creation of the proxy class schema information, which is equivalent to a non-plan command; after that, the source class must be queried to create a proxy object according to the declared proxy rules, and the connection between the object and the proxy object must be established. This part also calls the planner and executor similar to the plan command. In order to unify the program interface, all commands in the semantic analyzer are finally output in the form of a unified query tree.

在如上基础上创建了各种查询逻辑设计，创建源类逻辑设计、删除对象逻辑设计类、查找逻辑设计、插入逻辑设计、删除类逻辑设计、创建代理类逻辑设计、修改属性逻辑设计、跨类查找逻辑设计。Based on the above, various query logic designs were created, including source class logic design, object deletion logic design class, search logic design, insertion logic design, deletion class logic design, proxy class logic design, attribute modification logic design, and cross-class search logic design.

具体实现中编译部分拓展了基于javacc的JSqlParser[5]实现。Java CompilerCompiler(JavaCC)是一种用于生成词法分析器和语法分析器的工具。它使用一种类似于BNF的语法，输入语言定义，并生成Java代码，用于执行词法分析和语法分析。JavaCC使得生成语言分析器变得简单，支持各种语言特性，如自动生成状态机，语法制导翻译，词法优先级和语法优先级等。另外，JavaCC还提供了诸如词法规则高亮显示，语法错误报告等高级功能。由于它生成的是Java代码，所以JavaCC生成的语言分析器具有跨平台性，高效性和可维护性。In the specific implementation, the compilation part extends the JSqlParser[5] implementation based on javacc. Java CompilerCompiler (JavaCC) is a tool for generating lexical analyzers and syntax analyzers. It uses a grammar similar to BNF, inputs the language definition, and generates Java code for performing lexical analysis and syntax analysis. JavaCC makes it easy to generate language analyzers and supports various language features such as automatic generation of state machines, syntax-directed translation, lexical priority and syntax priority. In addition, JavaCC also provides advanced features such as lexical rule highlighting and syntax error reporting. Since it generates Java code, the language analyzer generated by JavaCC is cross-platform, efficient and maintainable.

JSqlParser是一个Java库，可以解析SQL语句并将其转换为程序易于处理的表示形式。它可以从字符串或文件中解析SQL语句，并支持各种SQL命令和子句，包括SELECT、INSERT、UPDATE、DELETE和CREATE语句。解析后的SQL语句使用树状结构表示，程序可以轻松遍历和修改它。JSqlParser旨在轻量级且易于使用，不需要任何外部依赖。它可用于构建各种应用程序，包括SQL查询生成器、数据库迁移工具和SQL查询分析和优化工具，这样我们通过具体代码实现各类流程。JSqlParser is a Java library that can parse SQL statements and convert them into a representation that is easy for the program to handle. It can parse SQL statements from strings or files and supports a variety of SQL commands and clauses, including SELECT, INSERT, UPDATE, DELETE, and CREATE statements. The parsed SQL statements are represented using a tree structure that can be easily traversed and modified by the program. JSqlParser is designed to be lightweight and easy to use, without any external dependencies. It can be used to build a variety of applications, including SQL query builders, database migration tools, and SQL query analysis and optimization tools, so that we implement various processes through specific code.

需要说明的是，日志管理在移动数据库(TMDB)中WAL(Write-Ahead Logging)机制是为了确保在发生崩溃时能够恢复到一致的状态。崩溃恢复过程利用WAL日志来重新执行(redo)在崩溃前未完成的事务。It should be noted that the WAL (Write-Ahead Logging) mechanism of log management in the mobile database (TMDB) is to ensure that a consistent state can be restored in the event of a crash. The crash recovery process uses the WAL log to re-execute (redo) transactions that were not completed before the crash.

首先日志要完成对新日志文件与新索引b树文件的初始化、删除日志目录下的旧文件、给定参数key、op、value将日志持久化到磁盘、加载并显示目前日志文件中的所有日志记录，这些功能函数均在文件LogManager.java中实现。First, the log must complete the initialization of the new log file and the new index b-tree file, delete the old files in the log directory, persist the log to the disk given the parameters key, op, and value, and load and display all log records in the current log file. These functions are all implemented in the file LogManager.java.

然后开始写日志，在数据写入Memtable之前，根据WAL，需要将记录写入日志持久化到磁盘。Then start writing logs. Before data is written to Memtable, records need to be written to logs and persisted to disk according to WAL.

创建LogTableItem对象，使用LogTableItem(int logid,Byte op,String key,String value)构造函数，其中logid设置为当前currentId，其他参数来自WriteLog方法的输入。设置LogTableItem的offset为currentOffset，并使currentId自增。Create a LogTableItem object using the LogTableItem(int logid, Byte op, String key, String value) constructor, where logid is set to the current currentId and the other parameters come from the input of the WriteLog method. Set the offset of the LogTableItem to currentOffset and increment the currentId.

调用writeLogItemToSSTable(LogTableItem log)方法，将日志记录写入文件。Call the writeLogItemToSSTable(LogTableItem log) method to write the log records to the file.

根据LogTableItem的value属性，检查是否已存在相应的日志记录。若存在，将新记录的logid添加到对应的列表中；若不存在，创建新列表并添加logid。According to the value attribute of LogTableItem, check whether the corresponding log record already exists. If it does, add the logid of the new record to the corresponding list; if it does not exist, create a new list and add the logid.

使用BTree_indexer类的insert()方法，将包含LogTableItem偏移量的节点插入B树。Insert the node containing the offset of the LogTableItem into the B-tree using the insert() method of the BTree_indexer class.

当日志文件达到预设的大小阈值时，调用BTree_indexer类的write()方法，将B树索引持久化到磁盘，以防数据丢失。When the log file reaches the preset size threshold, the write() method of the BTree_indexer class is called to persist the B-tree index to disk to prevent data loss.

日志管理最重要的一个机制就是崩溃恢复机制，检查点在WAL日志恢复中用于减少不必要的日志读取，并允许删除过时日志。所有在检查点之前的更改已写入磁盘，因此在系统恢复时，只需从检查点后开始恢复操作，之前的日志可以安全删除。日志记录了数据库中哪个对象个做了什么操作，当系统崩溃时，虽然脏页数据没有持久化，但是日志记录已经持久化，接着数据库重启后，可以根据日志中的内容进行redo，将所有数据恢复到最新的状态。结合之前对检查点的描述，可以知道需要从检查点之后(即checkpoint+1)位置的日志块开始读取，并根据读取内容对数据库进行恢复。One of the most important mechanisms of log management is the crash recovery mechanism. Checkpoints are used in WAL log recovery to reduce unnecessary log reading and allow the deletion of obsolete logs. All changes before the checkpoint have been written to disk, so when the system is restored, it is only necessary to start the recovery operation from after the checkpoint, and the previous logs can be safely deleted. The log records which object in the database has done what operation. When the system crashes, although the dirty page data is not persisted, the log record has been persisted. Then, after the database is restarted, redo can be performed based on the content in the log to restore all data to the latest state. Combined with the previous description of checkpoints, it can be known that it is necessary to start reading from the log block after the checkpoint (i.e. checkpoint+1) and recover the database based on the read content.

如图2所示，是本移动数据库(TMDB)软件LSM-Tree采用一种追加更新(out-placeupdate)的方式，具体过程：在内存中，LSM-Tree的数据保存在MemTable中，MemTable又分为immutable MemTable和mutable MemTable，前者不可修改，后者可修改。数据写入时，会写入mutable MemTable中，当mutable MemTable写满时会转化为immutable MemTable，再写入数据需要等待空闲的mutable MemTable。每个immutable MemTable都会等待合适的时机，进行compaction操作合并到外存中，随后转化为mutable MemTable再次变得可写入。As shown in Figure 2, the mobile database (TMDB) software LSM-Tree adopts an out-place update method. The specific process is as follows: In memory, LSM-Tree data is stored in MemTable, which is divided into immutable MemTable and mutable MemTable. The former cannot be modified, while the latter can be modified. When data is written, it is written to the mutable MemTable. When the mutable MemTable is full, it is converted to the immutable MemTable. To write data again, you need to wait for an idle mutable MemTable. Each immutable MemTable will wait for the right time to perform a compaction operation and merge it into the external memory, and then it will be converted to a mutable MemTable and become writable again.

在外存中，LSM-Tree的数据保存在SSTable(Sorted-String Table)中，每个SSTable中的数据按照key排序。这些SSTable分层存储，从level-0到level-n，对应SSTable的数量越来越多，大小越来越大。从内存中的immutable MemTable进行compaction时首先转换为level-0中的SSTable，当level-i的大小达到上限时，会进行compaction到level-(i+1)，经过一系列compaction最终到达level-n。In external memory, LSM-Tree data is stored in SSTable (Sorted-String Table), and the data in each SSTable is sorted by key. These SSTables are stored in layers, from level-0 to level-n, and the number and size of corresponding SSTables are increasing. When compacting from the immutable MemTable in memory, it is first converted to the SSTable in level-0. When the size of level-i reaches the upper limit, it will be compacted to level-(i+1), and finally reach level-n after a series of compactions.

如图3所示，是,移动数据库(TMDB)软件记录偏移的B-Tree例图：在将B-Tree持久化保存到SSTable中的index block时，需要解决孩子结点如何保存的问题。在内存中的B-Tree，可以很方便地通过指针记录其孩子节点，但是在持久化到磁盘时，指针记录的是在内存中的地址，而需要的是孩子结点的磁盘上的物理地址，因此，先根遍历B-Tree，即先保存其孩子结点，得到每个孩子结点在SSTable中的物理偏移之后，再保存父节点。如图3所示的B-Tree结构，先保存A、B、C、D、E结点，同时得到这些节点的偏移量，再记录F结点，同时将第三行的指针替换为实际的物理偏移量。As shown in Figure 3, this is an example of a B-Tree for recording offsets in the mobile database (TMDB) software: When persisting the B-Tree to the index block in the SSTable, it is necessary to solve the problem of how to save the child nodes. The B-Tree in memory can easily record its child nodes through pointers, but when persisting to disk, the pointer records the address in memory, and what is needed is the physical address of the child node on the disk. Therefore, the B-Tree is traversed from the root first, that is, its child nodes are saved first, and the physical offset of each child node in the SSTable is obtained before saving the parent node. The B-Tree structure shown in Figure 3 first saves the A, B, C, D, and E nodes, and obtains the offsets of these nodes, then records the F node, and replaces the pointer in the third row with the actual physical offset.

如图4所示，是移动数据库(TMDB)软件存储时插入数据的时序图：首先Transaction接受请求，然后先写日志。写日志完成后，调用Mem Manager提供的add()接口，往MemTable中添加数据。添加数据完成后需要判断MemTable是否达到大小上限，如果达到上限则需要调用Mem Manager提供的writeInternal()接口进行compaction将MemTable转化成SSTable，并通知Level Manager将新SSTable加入level-0。此后，需要验证是否触发级联的compaction，Level Manager需要调用compact()接口计算各level的得分，检查是否满足compaction的条件。如果满足，计算需要进行compaction的level和SSTable，然后调用Level Manager提供的writeToDisk()接口进行compaction，同时记录日志。As shown in Figure 4, it is a timing diagram of inserting data when the mobile database (TMDB) software is stored: first, the Transaction accepts the request, and then writes the log. After the log is written, the add() interface provided by the Mem Manager is called to add data to the MemTable. After the data is added, it is necessary to determine whether the MemTable has reached the upper limit. If it has reached the upper limit, it is necessary to call the writeInternal() interface provided by the Mem Manager to perform compaction to convert the MemTable into an SSTable, and notify the Level Manager to add the new SSTable to level-0. After that, it is necessary to verify whether the cascade compaction is triggered. The Level Manager needs to call the compact() interface to calculate the score of each level and check whether the conditions for compaction are met. If so, calculate the level and SSTable that need to be compacted, and then call the writeToDisk() interface provided by the Level Manager to perform compaction and record the log at the same time.

如图5所示，是,移动数据库(TMDB)软件存储时查询数据的时序图；相比于插入数据可能诱发的递归compaction，查询流程则简单很多。首先Transaction接受请求，到MemManager管理的MemTable中遍历搜索目标key。如果没有找到，则交由Cache Manager，在缓存中查找，并更新缓存优先级。如果没有找到，则交由Level Manager，在磁盘上中从level-0到level-6依次遍历每一层的所有SSTable搜索目标key，具体的搜索单个SSTable的流程在2.4节中已经介绍。在磁盘上的查找结果需要返回给Cache Manager用于更新缓存，淘汰缓存中长期未使用的数据。As shown in Figure 5, this is a timing diagram of querying data when the mobile database (TMDB) software is stored; compared to the recursive compaction that may be induced by inserting data, the query process is much simpler. First, the Transaction accepts the request and traverses the MemTable managed by the MemManager to search for the target key. If it is not found, it is handed over to the Cache Manager to search in the cache and update the cache priority. If it is not found, it is handed over to the Level Manager to traverse all SSTables of each layer from level-0 to level-6 on the disk to search for the target key. The specific process of searching a single SSTable has been introduced in Section 2.4. The search results on the disk need to be returned to the Cache Manager to update the cache and eliminate data that has not been used for a long time in the cache.

如下表1和表2所示，是移动数据库(TMDB)软件处理一个查询时TOTEM里使用的数据结构，采用一个简单的例子演示在每个阶段数据结构所做的改变。该查询最开始是由用户输入的如上所示的一个字符串。词法分析器将会识别出表1所示的信息，并转换为相应的标号传递给语法分析器。语法分析器接收到了例中的查询符号串后，会将其识别为一条SELECT查询命令并执行相应的动作，生成表2的分析树结构。As shown in Tables 1 and 2 below, the data structures used in TOTEM when the mobile database (TMDB) software processes a query are shown. A simple example is used to demonstrate the changes made to the data structure at each stage. The query is initially entered by the user as a string as shown above. The lexical analyzer will recognize the information shown in Table 1 and convert it into corresponding labels and pass it to the syntax analyzer. After receiving the query symbol string in the example, the syntax analyzer will recognize it as a SELECT query command and perform corresponding actions to generate the analysis tree structure in Table 2.

表1Table 1

表2Table 2

对于移动数据库(TMDB)软件的检查点原理，查点的作用主要是在基于WAL日志恢复时避免读取所有的日志记录，同时允许删除一些不必要的日志内容。在检查点位置之前的所有日志中记录过的数据变化都已经全部从缓存区中刷入磁盘，反映到了磁盘文件中，从而在宕机恢复时只需从检查点标记之后读取日志并进行恢复，检查点标记之前的日志已经没有意义，可以删除这些日志磁盘文。For the checkpoint principle of the mobile database (TMDB) software, the main function of the checkpoint is to avoid reading all log records when recovering based on the WAL log, while allowing some unnecessary log content to be deleted. All data changes recorded in all logs before the checkpoint position have been flushed from the cache to the disk and reflected in the disk file. Therefore, when recovering from a downtime, you only need to read the log after the checkpoint mark and recover. The logs before the checkpoint mark are meaningless and can be deleted.

本申请实施例的移动数据库的软件设计方法，是针对移动端开发的支持对象代理模型的数据库，包括存储管理、编译查询、日志管理，存储管理TMDB是键值数据库，同时支持对象代理模型，在系统中负责数据存储和检索的关键组成部分。它通过提供一种结构化的方式来保存和组织数据，使得计算机能够高效地读取和写入数据。编译查询部分是数据库通过查询分析器对输入的查询字符串进行词法，语法和语义的检查，并且为了程序接口的统一，在语义分析器中所有命令最后都是按统一的查询树形式输出的，完成数据检索、计算和返回结果。日志管理部分采用WAL(Write Ahead Log)预写日志方法，在描述这些变化的日志记录刷写到存储器之后，保证数据操作的原子性和持久性，并通过最近的检查点开始回放日志，实现数据恢复。本申请具备轻量化、资源占用小、适合移动设备存储和处理数据的特点，专门为小型移动端应用程序设计的数据库解决方案。The software design method of the mobile database of the embodiment of the present application is a database supporting the object proxy model developed for the mobile terminal, including storage management, compilation query, and log management. The storage management TMDB is a key-value database that supports the object proxy model and is responsible for the key component of data storage and retrieval in the system. It saves and organizes data in a structured way so that the computer can read and write data efficiently. The compilation query part is that the database checks the lexical, grammatical and semantics of the input query string through the query analyzer, and for the unification of the program interface, all commands in the semantic analyzer are finally output in the form of a unified query tree to complete data retrieval, calculation and return results. The log management part adopts the WAL (Write Ahead Log) pre-write log method. After the log records describing these changes are flushed to the memory, the atomicity and persistence of data operations are guaranteed, and the log is replayed from the latest checkpoint to achieve data recovery. The present application has the characteristics of lightweight, low resource occupation, and suitable for mobile device storage and data processing. It is a database solution designed specifically for small mobile terminal applications.

第二方面，本申请实施例还提供一种移动数据库的软件设计装置。In a second aspect, an embodiment of the present application also provides a software design device for a mobile database.

一实施例中，参照图6，图6为本申请移动数据库的软件设计装置的功能模块示意图。如图6所示，移动数据库的软件设计装置包括存储管理模块、编译查询模块和日志管理模块。In one embodiment, referring to Fig. 6, Fig. 6 is a functional module diagram of the software design device for mobile database of the present application. As shown in Fig. 6, the software design device for mobile database includes a storage management module, a compilation query module and a log management module.

存储管理模块用于基于存储层次结构设计，将数据表通过转化为键值对的形式存放于LSM-Tree中，并在LSM-Tree中引入compaction机制，周期性的合并和优化操作，采用B-Tree作为索引，应用Bloom Filter快速查找算法，实现存储管理；编译查询模块用于通过TOTEM的查询系统完成TOTEM-SQL命令的执行，且查询系统接收查询服务命令并在分析器中编译形成查询树，经重写和优化后进入执行器，由存储系统完成磁盘数据的获取，返回查询结果，实现编译查询；日志管理模块用于基于WAL方法以记录redo信息至日志文件，通过比较日志文件中的操作记录与实际执行的操作来执行redo操作，以在移动数据库崩溃时恢复移动数据库，实现日志管理。The storage management module is used to convert the data table into key-value pairs based on the storage hierarchy design and store it in the LSM-Tree. The compaction mechanism is introduced in the LSM-Tree, and the operations are periodically merged and optimized. B-Tree is used as the index and the Bloom Filter fast search algorithm is applied to realize storage management. The compilation query module is used to complete the execution of TOTEM-SQL commands through TOTEM's query system. The query system receives the query service command and compiles it into a query tree in the analyzer. After rewriting and optimization, it enters the executor, and the storage system completes the acquisition of disk data and returns the query results to realize the compilation query. The log management module is used to record redo information to the log file based on the WAL method, and executes the redo operation by comparing the operation record in the log file with the actual operation, so as to restore the mobile database when the mobile database crashes and realize log management.

其中，上述移动数据库的软件设计装置中各个模块的功能实现与上述移动数据库的软件设计方法实施例中各步骤相对应，其功能和实现过程在此处不再一一赘述。Among them, the functional implementation of each module in the above-mentioned mobile database software design device corresponds to the various steps in the above-mentioned mobile database software design method embodiment, and its functions and implementation processes are no longer repeated here.

第三方面，本申请实施例提供一种移动数据库的软件设计设备，移动数据库的软件设计设备可以是个人计算机(personal computer，PC)、笔记本电脑、服务器等具有数据处理功能的设备。In a third aspect, an embodiment of the present application provides a software design device for a mobile database. The software design device for a mobile database may be a personal computer (PC), a laptop computer, a server, or other device with data processing capabilities.

参照图7，图7为本申请实施例方案中涉及的移动数据库的软件设计设备的硬件结构示意图。本申请实施例中，移动数据库的软件设计设备可以包括处理器、存储器、通信接口以及通信总线。Referring to Fig. 7, Fig. 7 is a schematic diagram of the hardware structure of the mobile database software design device involved in the embodiment of the present application. In the embodiment of the present application, the mobile database software design device may include a processor, a memory, a communication interface and a communication bus.

其中，通信总线可以是任何类型的，用于实现处理器、存储器以及通信接口互连。The communication bus may be of any type and is used to interconnect the processor, the memory, and the communication interface.

通信接口包括输入/输出(input/output，I/O)接口、物理接口和逻辑接口等用于实现移动数据库的软件设计设备内部的器件互连的接口，以及用于实现移动数据库的软件设计设备与其他设备(例如其他计算设备或用户设备)互连的接口。物理接口可以是以太网接口、光纤接口、ATM接口等；用户设备可以是显示屏(Display)、键盘(Keyboard)等。The communication interface includes input/output (I/O) interface, physical interface and logical interface, etc., which are used to realize the interconnection of devices inside the mobile database software design device, and the interface used to realize the interconnection between the mobile database software design device and other devices (such as other computing devices or user devices). The physical interface can be an Ethernet interface, a fiber optic interface, an ATM interface, etc.; the user device can be a display, a keyboard, etc.

存储器可以是各种类型的存储介质，例如随机存取存储器(randomaccessmemory，RAM)、只读存储器(read-only memory，ROM)、非易失性RAM(non-volatileRAM，NVRAM)、闪存、光存储器、硬盘、可编程ROM(programmable ROM，PROM)、可擦除PROM(erasable PROM，EPROM)、电可擦除PROM(electrically erasable PROM，EEPROM)等。The memory can be various types of storage media, such as random access memory (RAM), read-only memory (ROM), non-volatile RAM (NVRAM), flash memory, optical storage, hard disk, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), etc.

处理器可以是通用处理器，通用处理器可以调用存储器中存储的移动数据库的软件设计程序，并执行本申请实施例提供的移动数据库的软件设计方法。例如，通用处理器可以是中央处理器(central processing unit，CPU)。其中，移动数据库的软件设计程序被调用时所执行的方法可参照本申请移动数据库的软件设计方法的各个实施例，此处不再赘述。The processor may be a general-purpose processor, and the general-purpose processor may call the software design program of the mobile database stored in the memory and execute the software design method of the mobile database provided in the embodiment of the present application. For example, the general-purpose processor may be a central processing unit (CPU). The method executed when the software design program of the mobile database is called may refer to the various embodiments of the software design method of the mobile database of the present application, and will not be repeated here.

本领域技术人员可以理解，图7中示出的硬件结构并不构成对本申请的限定，可以包括比图示更多或更少的部件，或者组合某些部件，或者不同的部件布置。Those skilled in the art will appreciate that the hardware structure shown in FIG. 7 does not constitute a limitation on the present application, and may include more or fewer components than shown in the figure, or a combination of certain components, or a different arrangement of components.

本申请的说明书和权利要求书及上述附图中的术语“包括”和“具有”以及它们任何变形，意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元，而是可选地还包括没有列出的步骤或单元，或可选地还包括对于这些过程、方法、产品或设备固有的其他步骤或单元。术语“第一”、“第二”和“第三”等描述，是用于区分不同的对象等，其不代表先后顺序，也不限定“第一”、“第二”和“第三”是不同的类型。The terms "including" and "having" and any variations thereof in the specification and claims of this application and the above-mentioned drawings are intended to cover non-exclusive inclusions. For example, a process, method, system, product or device that includes a series of steps or units is not limited to the listed steps or units, but optionally includes steps or units that are not listed, or optionally includes other steps or units inherent to these processes, methods, products or devices. The terms "first", "second" and "third" are used to distinguish different objects, etc., and do not represent a sequence, nor do they limit the "first", "second" and "third" to different types.

在本申请实施例的描述中，“示例性的”、“例如”或者“举例来说”等用于表示作例子、例证或说明。本申请实施例中被描述为“示例性的”、“例如”或者“举例来说”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言，使用“示例性的”、“例如”或者“举例来说”等词旨在以具体方式呈现相关概念。In the description of the embodiments of the present application, "exemplary", "for example" or "for example" are used to indicate examples, illustrations or descriptions. Any embodiment or design described as "exemplary", "for example" or "for example" in the embodiments of the present application should not be interpreted as being more preferred or more advantageous than other embodiments or designs. Specifically, the use of words such as "exemplary", "for example" or "for example" is intended to present related concepts in a specific way.

在本申请实施例的描述中，除非另有说明，“/”表示或的意思，例如，A/B可以表示A或B；文本中的“和/或”仅仅是一种描述关联对象的关联关系，表示可以存在三种关系，例如，A和/或B，可以表示：单独存在A，同时存在A和B，单独存在B这三种情况，另外，在本申请实施例的描述中，“多个”是指两个或多于两个。In the description of the embodiments of the present application, unless otherwise specified, “/” means or. For example, A/B can mean A or B. The “and/or” in the text is merely a description of the association relationship of associated objects, indicating that three relationships may exist. For example, A and/or B can mean: A exists alone, A and B exist at the same time, and B exists alone. In addition, in the description of the embodiments of the present application, “multiple” refers to two or more than two.

在本申请实施例描述的一些流程中，包含了按照特定顺序出现的多个操作或步骤，但是应该理解，这些操作或步骤可以不按照其在本申请实施例中出现的顺序来执行或并行执行，操作的序号仅用于区分开各个不同的操作，序号本身不代表任何的执行顺序。另外，这些流程可以包括更多或更少的操作，并且这些操作或步骤可以按顺序执行或并行执行，并且这些操作或步骤可以进行组合。In some processes described in the embodiments of the present application, multiple operations or steps that appear in a specific order are included, but it should be understood that these operations or steps may not be executed in the order in which they appear in the embodiments of the present application or in parallel, and the sequence number of the operation is only used to distinguish the different operations, and the sequence number itself does not represent any execution order. In addition, these processes may include more or fewer operations, and these operations or steps may be executed in sequence or in parallel, and these operations or steps may be combined.

通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在如上所述的一个存储介质(如ROM/RAM、磁碟、光盘)中，包括若干指令用以使得一台终端设备执行本申请各个实施例所述的方法。Through the description of the above implementation methods, those skilled in the art can clearly understand that the above-mentioned embodiment methods can be implemented by means of software plus a necessary general hardware platform, and of course by hardware, but in many cases the former is a better implementation method. Based on such an understanding, the technical solution of the present application, or the part that contributes to the prior art, can be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above, and includes a number of instructions for a terminal device to execute the methods described in each embodiment of the present application.

以上仅为本申请的优选实施例，并非因此限制本申请的专利范围，凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换，或直接或间接运用在其他相关的技术领域，均同理包括在本申请的专利保护范围内。The above are only preferred embodiments of the present application, and are not intended to limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the present application specification and drawings, or directly or indirectly applied in other related technical fields, are also included in the patent protection scope of the present application.

Claims

1. A software design method for a mobile database, the software design method for the mobile database comprising:

Based on the storage hierarchy design, storing the data table in an LSM-Tree in a form of converting the data table into key value pairs, introducing a comparison mechanism into the LSM-Tree, periodically merging and optimizing the operation, adopting B-Tree as an index, and applying a Bloom Filter quick search algorithm to realize storage management;

The execution of TOTEM-SQL command is completed through TOTEM's query system, the query system receives the query service command and compiles the query tree in the analyzer, and the query tree is rewritten and optimized, then enters the executor, the storage system completes the acquisition of disk data, returns the query result, and realizes the compiling query;

and (3) based on the WAL method, the redox information is recorded into the log file, and the redox operation is executed by comparing the operation record in the log file with the operation actually executed so as to recover the mobile database when the mobile database crashes, thereby realizing log management.

2. A method of software design for a mobile database as in claim 1, wherein:

For storage management, in a memory, data is stored as a tuple list, and metadata is distributed in a plurality of tables, wherein the tables comprise a class table, a secondary table, an object table, a switching table and a bidirectional pointer table;

The table is managed by a memory manager, which is used for providing a data insertion function and monitoring MemTable the size, and when MemTable exceeds a threshold value, converting MemTable into invariable MemTable, starting a compatibility process, converting invariable MemTable into SSTable and storing in the bottom layer;

The external data is organized through an LSM-Tree structure, the hierarchical manager is responsible for tracking the hierarchy and key information of each SSTable, and the hierarchical manager is also used for executing the comparison, including selecting the proper hierarchy and SSTable for merging, and executing the merging operation of a plurality of SSTable.

3. A method of software design of a mobile database as claimed in claim 2, wherein:

The LSM-Tree stores data in MemTable, the data is divided into variable mutable MemTable and invariable immutable MemTable, the data is firstly written into variable MemTable, the data is changed into invariable MemTable after being filled, and new data is written into variable MemTable needing to wait for idle;

immutable MemTable is consolidated to external memory by a compare operation and converted back to mutable MemTable for re-writing;

The data in the external memory are stored in the form of SSTable, the data are ordered and layered according to keys, the higher the hierarchy is, the more SSTable number is, and the larger the single size is;

the data starts to be compatible from the immutable MemTable of the memory, is firstly converted into the SSTable at the bottom layer, and then when the SSTable at a certain layer reaches the threshold value, the data is further compatible to a higher level until the highest level is reached.

4. The method for designing software of mobile database according to claim 2, wherein the operation step of designating key in SSTable comprises:

And reading Footes to acquire component information, checking the minimum and maximum keys of the SSTable to determine whether the keys are in range, if so, verifying whether the keys exist in the SSTable through a Bloom Filter, positioning to a data block containing a target key according to information in an index block, and searching for the keys in the data block.

5. A method of software design for a mobile database as in claim 1, wherein: the compare is used to clear the overlap of data layers and repeated key accumulation generated after the memory data is flushed, and the compare is also used to clear old version data by periodically merging multiple layers of data.

6. A method of software design for a mobile database as in claim 1, wherein: for compiling query, the analyzer is used for checking the input query character string, ensuring the correctness of the query character string through lexical, grammatical and semantic analysis, creating a query tree by the analyzer if the query character string is valid, transmitting the query tree to the planner, and returning an error prompt if the query character string is invalid.

7. A method of software design for a mobile database as in claim 1, wherein: for compiled queries, JSqlParser is extended by JavaCC to generate Java lexical and grammatical parsers, JSqlParser is a lightweight Java library for parsing SQL into a tree structure and supporting extensive SQL commands.

8. A method of software design for a mobile database as in claim 1, wherein:

for log management, before data is written Memtable, the records are persisted to a disk through WAL, and the function of a log system is managed through a LogManager.java file;

And when the mobile database crashes, restoring the data to the final normal state through the durable log file record, recording all object operations in the log file, and when the mobile database is restarted, starting to apply the log file record after the latest check point, and restoring to the consistent state.

9. A software design device for a mobile database, the software design device for a mobile database comprising:

The storage management module is used for storing the data table in an LSM-Tree in a key value pair mode based on storage hierarchical structure design, introducing a compatibility mechanism into the LSM-Tree, periodically combining and optimizing operations, adopting B-Tree as an index, and applying a Bloom Filter quick search algorithm to realize storage management;

The compiling and inquiring module is used for completing the execution of TOTEM-SQL commands through an inquiring system of TOTEM, receiving an inquiring service command by the inquiring system, compiling and forming an inquiring tree in the analyzer, entering an executor after rewriting and optimizing, completing the acquisition of disk data by the storage system, returning an inquiring result and realizing compiling and inquiring;

And the log management module is used for recording the redox information to the log file based on the WAL method, and executing the redox operation by comparing the operation record in the log file with the operation actually executed so as to recover the mobile database when the mobile database crashes, thereby realizing log management.

10. A software design device of a mobile database, characterized in that the software design device of a mobile database comprises a processor, a memory, and a software design program of a mobile database stored on the memory and executable by the processor, wherein the software design program of a mobile database, when executed by the processor, implements the steps of the software design method of a mobile database according to any one of claims 1 to 8.