
CN101794242B - Fault-tolerant computer system data comparing method serving operating system core layer - Google Patents

Info

Publication number
CN101794242B
Authority
CN
China
Prior art keywords
data
list
syner
redundant
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201010103349XA
Other languages
Chinese (zh)
Other versions
CN101794242A (en
Inventor
张兴军
董小社
雷济凯
胡冰
王恩东
胡雷钧
孙江斌
张东
田佳
赵晓昳
伍卫国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong High-End Server & Storage Research Institute
Xian Jiaotong University
Original Assignee
Shandong High-End Server & Storage Research Institute
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong High-End Server & Storage Research Institute, Xian Jiaotong University filed Critical Shandong High-End Server & Storage Research Institute
Priority to CN201010103349XA priority Critical patent/CN101794242B/en
Publication of CN101794242A publication Critical patent/CN101794242A/en
Application granted granted Critical
Publication of CN101794242B publication Critical patent/CN101794242B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Hardware Redundancy (AREA)

Abstract

A data comparison method for fault-tolerant computer systems, serving the operating system core layer, provides data comparison services to the dual modular redundant processes of a fault-tolerant computer system by starting a kernel daemon in the Linux operating system that executes the data comparator logic. An event linked list is added to the kernel as a message channel, and the redundant processes and the data comparator work in a producer-consumer manner: the redundant processes encapsulate the data to be written into a message packet and insert it into the message list; the comparator removes the packet from the list, parses it according to the defined format, compares the data to be written by the redundant processes, and finally returns the result to them. The invention is implemented in the core layer of the operating system, requires no custom hardware, is simple to implement, and is suitable for process-level dual modular redundancy fault-tolerant systems built on commodity hardware, which gives it good generality. All comparison logic is completed automatically in the core layer of the operating system without the involvement of application programs, so the method is highly transparent to applications.

Figure 201010103349

Description

Data comparison method for a fault-tolerant computer system serving the operating system core layer

Technical Field

The invention belongs to the field of computers and relates to computer fault-tolerance technology and data comparison technology, and in particular to a data comparison method for a fault-tolerant computer system serving the core layer of an operating system.

Background Art

With the rapid development of computer and Internet technology, informatization has reached every aspect of society. Computer technology has greatly changed the way people live by improving work efficiency and promoting the exchange of information, but it has also made people increasingly dependent on it, and a single computer system failure can cause immeasurable losses. For organizations that must safeguard information security and provide uninterrupted information services, such as those in securities, manufacturing, communications, banking, and transportation, the reliability and continuity of business systems are especially important. How to improve the reliability and availability of computer systems, so that critical applications keep running and operations remain sustainable, has become an important problem in the information field. Fault-tolerant computers and related technologies arose in response to this need; using fault-tolerant computers can avoid the heavy economic losses caused by server failures.

A fault-tolerant computer is a highly reliable, highly available computer built on redundant resources (hardware redundancy, time redundancy, information redundancy, and software redundancy) through a well-designed architecture and under the effective management of system software. Fault detection is one of the key technologies for realizing a fault-tolerant computer system, and comparing and voting on task data is the main means of detecting errors.

Data comparison and voting are performed mainly in two ways: in hardware or in software. The hardware-based approach adds a comparison chip to the system; the chip contains comparison or voting logic and compares and votes on all data about to be written out. This approach detects errors promptly, but it is complex to design and costly to implement. The software-based approach places comparison and voting points in library functions or in the application and checks the intermediate results and final output of a task for consistency. This approach keeps the system design simple, but it is poorly transparent to applications and places an extra burden on programmers and users.

Summary of the Invention

The object of the present invention is to address the shortcomings and deficiencies of the prior art described above by providing a data comparison method for a fault-tolerant computer system serving the operating system core layer. The invention can compare the states and data results of redundant tasks in a fault-tolerant system for consistency while recording the synchronization and comparison information of the redundant tasks.

To accomplish this, the invention adopts the following technical solution: a kernel-mode daemon, ft_syner, is created in the Linux operating system kernel to execute the comparator logic and provide data comparison services for the redundant processes. When the redundant processes perform a write operation, each prepares its data to be written; the master process then encapsulates the data to be written into a message packet, adds the packet to the communication channel between the redundant processes and the data comparator, and actively wakes the data comparator ft_syner to perform the comparison. After ft_syner completes the comparison, the redundant processes obtain the result by checking the comparison result field in the message packet.

The communication channel is implemented as follows: an event linked list, ft_syner_event_list, is created in the Linux operating system kernel, and the redundant processes and the data comparator communicate through it.

The data comparator and the redundant processes work in a producer-consumer manner: the redundant processes arrange the data to be compared into a message packet according to the protocol format and attach it to the event list ft_syner_event_list; the comparator removes the packet from ft_syner_event_list and extracts the data from it for comparison.

The format of the message packet is:

   typedef struct {
       struct list_head list;
       short ft_msg_type;
       struct task_struct *p1;
       struct task_struct *p2;
       void *master_data;
       long master_data_len;
       void *slave_data;
       long slave_data_len;
       short error;
   } ft_syner_event_msg;

Here, list is the list node used to attach the message packet to the event list. It uses list_head, the general-purpose linked-list structure of the Linux kernel, and message packets are inserted and deleted with the kernel's list_add() and list_del();
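The embedded-node idea can be tried outside the kernel. The following is a minimal user-space sketch, not the patent's code: struct list_head, list_add() and list_del() are simplified stand-ins for the definitions in the kernel's <linux/list.h>, and demo_msg, init_list_head and msg_from_node are hypothetical names introduced only for illustration.

```c
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's struct list_head
 * (a circular doubly linked list). Assumption: behavior mirrors the
 * kernel helpers closely enough for illustration only. */
struct list_head { struct list_head *next, *prev; };

/* Make an empty list: the head points at itself in both directions. */
static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

/* Insert `entry` directly after `head`, as the kernel's list_add() does. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink `entry` from its list. The kernel version poisons the pointers;
 * this sketch simply clears them. */
static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry->prev = NULL;
}

static int list_empty(const struct list_head *h) { return h->next == h; }

/* Hypothetical, reduced message packet: only the fields needed here. */
typedef struct {
    struct list_head list;  /* node that links the packet into the list */
    short ft_msg_type;
    short error;
} demo_msg;

/* Recover the enclosing packet from its embedded node (container_of idiom). */
static demo_msg *msg_from_node(struct list_head *n)
{
    return (demo_msg *)((char *)n - offsetof(demo_msg, list));
}
```

Because list_add() inserts right after the head, the newest packet sits at head.next and the oldest at head.prev; a consumer that wants first-in, first-out order would therefore take packets from the tail.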

ft_msg_type is the message type, indicating which kind of system call produced the message packet. Its values are defined as follows:

#define FT_WRITE    1  // write() system call
#define FT_WRITEV   2  // writev() system call
#define FT_SEND     3  // send() system call
#define FT_SENDTO   4  // sendto() system call
#define FT_SENDMSG  5  // sendmsg() system call

The comparator uses the value of ft_msg_type to determine which kind of system call produced the message packet; the definitions above can be extended or trimmed as needed;
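As a small illustration of how a comparator might dispatch on the type field, the sketch below maps each ft_msg_type value to the name of its system call. The FT_* values are taken from the definitions above; the helper ft_msg_type_name is a hypothetical name, and returning NULL for unknown values is an assumption rather than behavior described in the patent.

```c
#include <stddef.h>

#define FT_WRITE    1  /* write() system call */
#define FT_WRITEV   2  /* writev() system call */
#define FT_SEND     3  /* send() system call */
#define FT_SENDTO   4  /* sendto() system call */
#define FT_SENDMSG  5  /* sendmsg() system call */

/* Hypothetical helper: name of the system call behind a message type,
 * or NULL when the type is not one of the defined values. */
static const char *ft_msg_type_name(short type)
{
    switch (type) {
    case FT_WRITE:   return "write";
    case FT_WRITEV:  return "writev";
    case FT_SEND:    return "send";
    case FT_SENDTO:  return "sendto";
    case FT_SENDMSG: return "sendmsg";
    default:         return NULL;  /* unknown or trimmed-out type */
    }
}
```

Extending the set of covered system calls then amounts to adding one #define and one case.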

p1 and p2 are pointers to the process control blocks of the redundant process pair that generated the message packet; p1 is the master process and p2 is the slave process, and the comparator reaches the process control blocks of the pair through these two pointers;

master_data is a pointer to the data buffer of the master process in the redundant pair, and master_data_len is the length of that buffer;

slave_data is a pointer to the data buffer of the slave process in the redundant pair, and slave_data_len is the length of that buffer;

error records the comparison result: a value of 1 means the data are consistent, and a value of 0 means they are inconsistent.
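The text does not spell out how the two buffers are compared. One plausible reading is a length check followed by a byte-wise comparison, sketched below. The function name ft_compare is hypothetical, and the sketch assumes both buffers are already directly readable by the comparator. It follows the convention stated above: 1 for consistent data, 0 for inconsistent data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical comparison helper (an assumption, not the patent's code):
 * returns 1 when both buffers have the same length and identical bytes,
 * and 0 otherwise, matching the meaning of the error field. */
static short ft_compare(const void *master_data, long master_data_len,
                        const void *slave_data, long slave_data_len)
{
    if (master_data_len < 0 || slave_data_len < 0)
        return 0;                      /* invalid lengths never match */
    if (master_data_len != slave_data_len)
        return 0;                      /* different sizes: inconsistent */
    if (master_data_len == 0)
        return 1;                      /* two empty writes are consistent */
    return memcmp(master_data, slave_data, (size_t)master_data_len) == 0 ? 1 : 0;
}
```

Note that despite its name, the error field uses 1 for the success case, so callers should test for equality with 1 rather than treating a nonzero value as a failure.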

The data comparator is a kernel-mode daemon, ft_syner, created in the Linux operating system kernel. It stays resident in the kernel, executes the comparator logic, and provides data comparison services for the redundant processes. When idle, ft_syner is in a waiting state; it can be woken by a redundant process when there is work to do, or it wakes itself periodically. Each time it is woken, ft_syner traverses the event list ft_syner_event_list, removes every message packet from the list, parses it, and compares the data. After parsing all message packets currently in the event list, ft_syner calls sleep_on_timeout() to enter the waiting state with a waiting period of 5 seconds; when the waiting time expires, ft_syner is woken and begins the next round of traversal.

The redundant processes need an added master-slave attribute. When the redundant processes perform a write operation, each prepares its data to be written; the master process then encapsulates the data to be written into a message packet, adds it to the event list ft_syner_event_list with list_add(&(msg->list), &ft_syner_event_list), and actively wakes the kernel-mode daemon ft_syner. After ft_syner completes the comparison, the redundant processes can obtain the result by checking msg->error.

The data comparator of the present invention provides its data comparison service as a kernel daemon. When the redundant processes call write-related operations such as write(), writev(), send(), sendto(), and sendmsg(), they use the comparator's service to compare the data to be written. That is, the kernel daemon ft_syner is started in the Linux operating system and executes the data comparator logic, providing data comparison services for the dual modular redundant processes of the fault-tolerant computer system. The event list ft_syner_event_list added to the kernel serves as the message channel, and the redundant processes and the data comparator work in a producer-consumer manner: the redundant processes encapsulate the data to be written into a message packet and insert it into the message list; the comparator removes the packet from the list, parses it, completes the comparison, and finally returns the result to the redundant processes. The invention completes the data comparison of redundant processes in a dual modular fault-tolerant system simply and reliably, in software, in the core layer of the operating system. The method requires no custom hardware and is suitable for process-level dual modular redundancy fault-tolerant systems built on general-purpose hardware architectures. All logic is completed automatically in the core layer of the operating system without the involvement of application programs, so the method is highly transparent to applications.

Brief Description of the Drawings

Fig. 1 is the workflow chart of the data comparator of the present invention;

Fig. 2 is a schematic diagram of the interaction between the redundant processes and the data comparator of the present invention.

Detailed Description

The present invention is described in further detail below with reference to the accompanying drawings.

The method of the present invention is as follows:

A kernel-mode daemon, ft_syner, is created in the Linux operating system kernel to execute the comparator logic and provide data comparison services for the redundant processes. When the redundant processes perform a write operation, each prepares its data to be written; the master process then encapsulates the data to be written into a message packet, adds the packet to the communication channel between the redundant processes and the data comparator, and actively wakes the data comparator ft_syner to perform the comparison. After ft_syner completes the comparison, the redundant processes obtain the result by checking the comparison result field in the message packet.

The communication channel is implemented as follows: an event linked list, ft_syner_event_list, is created in the Linux operating system kernel, and the redundant processes and the data comparator communicate through it.

The data comparator and the redundant processes work in a producer-consumer manner: the redundant processes arrange the data to be compared into a message packet according to the protocol format and attach it to the event list ft_syner_event_list; the comparator removes the packet from ft_syner_event_list and extracts the data from it for comparison.

The format of the message packet is:

   typedef struct {
       struct list_head list;
       short ft_msg_type;
       struct task_struct *p1;
       struct task_struct *p2;
       void *master_data;
       long master_data_len;
       void *slave_data;
       long slave_data_len;
       short error;
   } ft_syner_event_msg;

Here, list is the list node used to attach the message packet to the event list. It uses list_head, the general-purpose linked-list structure of the Linux kernel, and message packets are inserted and deleted with the kernel's list_add() and list_del();

ft_msg_type is the message type, indicating which kind of system call produced the message packet. Its values are defined as follows:

#define FT_WRITE    1  // write() system call
#define FT_WRITEV   2  // writev() system call
#define FT_SEND     3  // send() system call
#define FT_SENDTO   4  // sendto() system call
#define FT_SENDMSG  5  // sendmsg() system call

The comparator uses the value of ft_msg_type to determine which kind of system call produced the message packet; the definitions above can be extended or trimmed as needed;

p1 and p2 are pointers to the process control blocks of the redundant process pair that generated the message packet; p1 is the master process and p2 is the slave process, and the comparator reaches the process control blocks of the pair through these two pointers;

master_data is a pointer to the data buffer of the master process in the redundant pair, and master_data_len is the length of that buffer;

slave_data is a pointer to the data buffer of the slave process in the redundant pair, and slave_data_len is the length of that buffer;

error records the comparison result: a value of 1 means the data are consistent, and a value of 0 means they are inconsistent.

The data comparator is a kernel-mode daemon, ft_syner, created in the Linux operating system kernel. It stays resident in the kernel, executes the comparator logic, and provides data comparison services for the redundant processes. When idle, ft_syner is in a waiting state; it can be woken by a redundant process when there is work to do, or it wakes itself periodically. Each time it is woken, ft_syner traverses the event list ft_syner_event_list, removes every message packet from the list, parses it, and compares the data. After parsing all message packets currently in the event list, ft_syner calls sleep_on_timeout() to enter the waiting state with a waiting period of 5 seconds; when the waiting time expires, ft_syner is woken and begins the next round of traversal.

The redundant processes need an added master-slave attribute. When the redundant processes perform a write operation, each prepares its data to be written; the master process then encapsulates the data to be written into a message packet, adds it to the event list ft_syner_event_list with list_add(&(msg->list), &ft_syner_event_list), and actively wakes the data comparator ft_syner. After ft_syner completes the comparison, the redundant processes can obtain the result by checking msg->error.

The workflow of the data comparator shown in Fig. 1 is:

(1) The comparator process ft_syner calls spin_lock() to acquire the spin lock of the message list;

(2) Check whether the message list ft_syner_event_list is empty; if it is empty, go to (6); otherwise go to (3);

(3) Take one message packet msg from the list, parse it, and complete the data comparison;

(4) Use the kernel's list_del() to delete the processed message packet from the message list;

(5) If unprocessed message packets remain in the message list, go to (3); otherwise go to (6);

(6) Call spin_unlock() to release the spin lock of the message list;

(7) The comparator process ft_syner calls sleep_on_timeout() to sleep and wait;

(8) The process ft_syner is woken passively when the waiting period expires, or actively by a redundant process;

(9) Check the flag finish: if finish is 1, a request to stop the comparator service has been received and ft_syner ends; if finish is 0, go to (1) and begin the next round of service.
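Steps (1) to (9) can be sketched as a single-threaded user-space simulation. This is an illustration under stated assumptions, not kernel code: a C11 atomic_flag stands in for the kernel spin lock, the timed sleep is replaced by a caller-supplied callback, and every demo_* name as well as ft_syner_round and ft_syner_run is hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

struct list_head { struct list_head *next, *prev; };  /* user-space stand-in */

typedef struct {                 /* reduced, hypothetical message packet */
    struct list_head list;
    const void *master_data;
    long master_data_len;
    const void *slave_data;
    long slave_data_len;
    short error;                 /* 1 = consistent, 0 = inconsistent */
} demo_msg;

static struct list_head event_list = { &event_list, &event_list };
static atomic_flag event_lock = ATOMIC_FLAG_INIT;     /* stands in for the spin lock */

static void demo_lock(void)   { while (atomic_flag_test_and_set(&event_lock)) { /* spin */ } }
static void demo_unlock(void) { atomic_flag_clear(&event_lock); }

static void demo_list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

static void demo_list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry->prev = NULL;
}

/* Steps (1)-(6): drain the list under the lock; returns packets handled. */
static int ft_syner_round(void)
{
    int handled = 0;
    demo_lock();                                        /* (1) */
    while (event_list.next != &event_list) {            /* (2) and (5) */
        struct list_head *node = event_list.prev;       /* oldest packet */
        demo_msg *msg = (demo_msg *)((char *)node - offsetof(demo_msg, list));
        int same = msg->master_data_len == msg->slave_data_len &&
                   (msg->master_data_len <= 0 ||
                    memcmp(msg->master_data, msg->slave_data,
                           (size_t)msg->master_data_len) == 0);
        msg->error = (short)(same ? 1 : 0);             /* (3) */
        demo_list_del(node);                            /* (4) */
        handled++;
    }
    demo_unlock();                                      /* (6) */
    return handled;
}

/* Hypothetical wait callback: pretends a stop request arrived while asleep. */
static void demo_wait_then_stop(int *finish) { *finish = 1; }

/* Steps (7)-(9): wait_fn replaces the timed sleep and wake-up. */
static int ft_syner_run(int *finish, void (*wait_fn)(int *))
{
    int total = 0;
    for (;;) {
        total += ft_syner_round();
        wait_fn(finish);                                /* (7) and (8) */
        if (*finish)                                    /* (9) */
            break;
    }
    return total;
}
```

In this sketch the lock is held for the whole drain, as in the listed steps, so producers would be blocked from inserting while a round is in progress.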

Fig. 2 shows the interaction between the redundant processes and the data comparator, using the write() operation as an example. When the redundant processes P1 and P2 execute write(), their data must be compared: the redundant processes encapsulate the data to be compared in a message packet msg, insert it into the message list, and wake the data comparator. The comparator removes the packet from the message list, parses it according to the format definition, compares the two copies of data in the packet for consistency, and stores the result in msg->error. Finally, the redundant processes obtain the comparison result by reading the value of msg->error.
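The write() interaction can be condensed into a short user-space sketch. It is a simplification under several assumptions: the event list and the wake-up are collapsed into a direct function call, the output is a plain memory buffer instead of a file, and the names demo_msg, ft_pack_write, ft_compare_msg and ft_checked_write are hypothetical. The patent does not say what a redundant process does on a mismatch; returning -1 here is purely illustrative.

```c
#include <stddef.h>
#include <string.h>

#define FT_WRITE 1  /* message type for write(), as defined in the text */

typedef struct {                 /* reduced, hypothetical message packet */
    short ft_msg_type;
    const void *master_data;
    long master_data_len;
    const void *slave_data;
    long slave_data_len;
    short error;                 /* 1 = consistent, 0 = inconsistent */
} demo_msg;

/* Master side: wrap the pending write buffers of P1 and P2 in one packet. */
static demo_msg ft_pack_write(const void *p1_buf, long p1_len,
                              const void *p2_buf, long p2_len)
{
    demo_msg msg = { FT_WRITE, p1_buf, p1_len, p2_buf, p2_len, 0 };
    return msg;
}

/* Comparator side: compare both copies and record the result in error. */
static void ft_compare_msg(demo_msg *msg)
{
    int same = msg->master_data_len == msg->slave_data_len &&
               (msg->master_data_len <= 0 ||
                memcmp(msg->master_data, msg->slave_data,
                       (size_t)msg->master_data_len) == 0);
    msg->error = (short)(same ? 1 : 0);
}

/* Redundant-process side: copy the data out only if both copies agree.
 * Returns the number of bytes copied, or -1 on a mismatch (assumption). */
static long ft_checked_write(char *out, const void *p1_buf, long p1_len,
                             const void *p2_buf, long p2_len)
{
    demo_msg msg = ft_pack_write(p1_buf, p1_len, p2_buf, p2_len);
    ft_compare_msg(&msg);        /* stands in for enqueue, wake-up, and wait */
    if (msg.error != 1)
        return -1;
    if (p1_len > 0)
        memcpy(out, p1_buf, (size_t)p1_len);
    return p1_len;
}
```

The caller is responsible for making out large enough for p1_len bytes.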

Claims (1)

1. An operating-system core-layer data comparison method serving a fault-tolerant computer, characterized in that: first, a kernel-mode daemon ft_syner is created in the Linux operating system kernel, whose role is to execute the comparator logic and provide data comparison services for the redundant processes; second, once the redundant processes have each prepared their data to be written in the course of a write operation, the master process encapsulates the data to be written into a message packet and adds the message packet to the communication channel between the redundant processes and the data comparator, while actively waking the data comparator to perform the data comparison; finally, the data comparator completes the data comparison, and the redundant processes obtain the comparison result by checking the comparison result field in the message packet; the communication channel is implemented as follows: an event linked list ft_syner_event_list is created in the Linux operating system kernel, and the redundant processes and the data comparator communicate through the event list ft_syner_event_list; the data comparator and the redundant processes work in a producer-consumer manner, the redundant processes arrange the data to be compared into a message packet according to the protocol format and attach it to the event list ft_syner_event_list, and the comparator removes the message packet from the event list ft_syner_event_list and extracts the data from it for comparison; the data comparator is a kernel-mode daemon ft_syner created in the Linux operating system kernel, which stays resident in the operating system kernel, executes the comparator logic, and provides data comparison services for the redundant processes; the kernel-mode daemon ft_syner is in a waiting state when idle and is woken by a redundant process when there is work to do, or wakes itself periodically; each time it is woken, the kernel-mode daemon ft_syner traverses the event list ft_syner_event_list, removes every message packet from the list, parses it, and compares the data; after parsing all message packets currently in the event list, the kernel-mode daemon ft_syner calls the function sleep_on_timeout to enter the waiting state, the waiting period set in this function being 5 seconds, and when the waiting time expires the kernel-mode daemon ft_syner is woken and begins the next round of traversal; the redundant processes need an added master-slave attribute; when the redundant processes perform a write operation, each prepares its data to be written, the master process then encapsulates the data to be written into a message packet, adds the message packet to the event list ft_syner_event_list through the function list_add, and actively wakes the kernel-mode daemon ft_syner; after the kernel-mode daemon ft_syner completes the data comparison, the redundant processes obtain the comparison result by checking the value of msg.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010103349XA CN101794242B (en) 2010-01-29 2010-01-29 Fault-tolerant computer system data comparing method serving operating system core layer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010103349XA CN101794242B (en) 2010-01-29 2010-01-29 Fault-tolerant computer system data comparing method serving operating system core layer

Publications (2)

Publication Number Publication Date
CN101794242A CN101794242A (en) 2010-08-04
CN101794242B true CN101794242B (en) 2012-07-18

Family

ID=42586952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010103349XA Expired - Fee Related CN101794242B (en) 2010-01-29 2010-01-29 Fault-tolerant computer system data comparing method serving operating system core layer

Country Status (1)

Country Link
CN (1) CN101794242B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102323900B (en) * 2011-08-31 2014-03-26 国家计算机网络与信息安全管理中心 System fault tolerance mechanism based on dynamic sensing for many-core environment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6088817A (en) * 1993-11-26 2000-07-11 Telefonaktiebolaget Lm Ericsson Fault tolerant queue system
CN101000561A (en) * 2006-12-20 2007-07-18 中国电子科技集团公司第十四研究所 Method for implementing kernel of multi-machine fault-tolerant system
CN101369241A (en) * 2007-09-21 2009-02-18 中国科学院计算技术研究所 A cluster fault-tolerant system, device and method
CN101383690A (en) * 2008-10-27 2009-03-11 西安交通大学 A network synchronization method of fault-tolerant computer system based on socket

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Figure 1.

Also Published As

Publication number Publication date
CN101794242A (en) 2010-08-04

Similar Documents

Publication Publication Date Title
CN101383690B (en) A network synchronization method of fault-tolerant computer system based on socket
CN103559217B (en) A kind of massive multicast data towards isomeric data storehouse warehouse-in implementation method
CN102591964A (en) Implementation method and device for data reading-writing splitting system
CN111949633A (en) ICT system operation log analysis method based on parallel stream processing
US10853157B2 (en) Compact binary event log generation
CN106850260A (en) A kind of dispositions method and device of virtual resources management platform
CN104050261A (en) Stormed-based variable logic general data processing system and method
CN101719852B (en) Method and device for monitoring performance of middleware
CN102999384B (en) Managing processes within suspend states and execution states
CN104092575A (en) A resource monitoring method and system
CN103198007A (en) Multi-process log output method and system
US20220171652A1 (en) Distributed container image construction scheduling system and method
CN102012850A (en) Hardware monitoring and micro-packet protocol-based key data restoration method
CN103064770A (en) Dual-process redundancy transient fault tolerating method
CN102063369A (en) Embedded software testing method based on AADL (Architecture Analysis and Design Language) mode time automata model
Bouteiller et al. Correlated set coordination in fault tolerant message logging protocols
CN110515918A (en) A distributed storage platform and construction method based on HDFS
CN103440200B (en) A kind of height based on dual operating systems real-time big data quantity test back method
CN107491372A (en) A kind of method and system for linux system RPM bags statistics CPU usage
CN101794242B (en) Fault-tolerant computer system data comparing method serving operating system core layer
CN114510531A (en) Database synchronization method, apparatus, electronic device and storage medium
CN101499971B (en) Service network performance optimization system
CN102420849A (en) Mobile main body platform model and mobile main body migration method
CN111143475B (en) State management method and device for Storm data analysis
CN115982234A (en) A Link Tracking Method Based on Microservice Distributed Architecture

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120718

Termination date: 20150129

EXPY Termination of patent right or utility model