
CN100472471C - A system and method for acquiring computer operating system fault site information - Google Patents

Publication number
CN100472471C
Authority
CN
China
Prior art keywords
operating system
memory
module
site information
information
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2006100576026A
Other languages
Chinese (zh)
Other versions
CN101025709A (en
Inventor
周涛
周建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Application filed by Lenovo Beijing Ltd
Priority to CNB2006100576026A
Publication of CN101025709A
Application granted
Publication of CN100472471C
Anticipated expiration
Legal status: Active

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

A system and method for acquiring fault-site information of a computer operating system, operating on the basis of an EFI BIOS and an operating system running on the EFI BIOS. The system includes an operating system monitoring module (2), a counter (4), and a hardware watchdog module (5); the EFI BIOS includes a memory allocation module (1) and a runtime service module (3). After the computer is powered on, when the EFI BIOS initializes memory, the memory allocation module (1) divides physical memory into operating system memory and fault analysis system memory. While the operating system is running, the operating system monitoring module (2) is started and stays resident, and it collects operating system site information. When the operating system crashes, the runtime service module (3) of the EFI BIOS is notified of the crash event. The runtime service module (3) establishes a fault analysis system environment for acquiring operating system fault-site information and acquires that information. The invention allows the system to be analyzed and diagnosed at the site of an operating system fault and allows the fault-site information to be acquired.

Figure 200610057602

Description

A system and method for acquiring fault-site information of a computer operating system

Technical field

The invention relates to the field of computers, and in particular to a system and method for acquiring fault-site information of a computer operating system.

Background

Computer operating systems (Operating System, OS), and multitasking operating systems such as Windows in particular, are complex. While a complete operating system is running, running multiple applications or new programs may introduce new faults. Existing operating systems generally provide diagnostic and maintenance facilities that monitor the working state of the operating system and warn the user in advance when a problem is likely. However, when a fatal error occurs (for example a memory error or an out-of-bounds access by an application), the operating system crashes: it either enters an infinite loop (a hang) or raises an unrecognizable error (a blue screen). The usual response is to restart the computer. Restarting loses all fault-site information, so no further fault analysis is possible and the root cause cannot be found. Unless the underlying problem is eliminated, the hidden danger remains, the stability and performance of the system cannot be guaranteed, the fault may recur whenever its triggering conditions are met, and the user's trust in the system declines. How to acquire fault-site information when the operating system hangs or crashes has therefore become a problem the industry urgently needs to solve.

Existing methods for handling fatal operating system errors are as follows:

1. After the operating system (for example Windows) crashes, the operating system's dump process performs a dump. Three dump modes exist: small memory dump (64K), kernel memory dump, and complete memory dump. An analysis tool is then used to analyze the dump file.

All three dump modes share the following defects:

The dump mode must be configured under the operating system. A complete memory dump occupies a large amount of core storage space, whereas too small a dump (small memory dump) loses a great deal of information. Users may choose different dump modes for different applications and different kinds of crash, but when the operating system crashes it can only dump in the mode configured beforehand; the mode cannot be changed at that point. Moreover, this method still depends on the dump process running under the operating system. In a severe failure, for example when the dump process itself crashes or the operating system's local storage fails, the fault-site information cannot be saved.

2. When a fatal error occurs in the operating system, a system administrator or operating system developer restarts the machine on site, acquires fault-site information, and diagnoses and maintains the computer.

The drawbacks of this approach are equally obvious. Its main drawback is that a system administrator or operating system developer must work on site, which consumes a great deal of their time and effort. Because the state at the moment of the crash is no longer available, they cannot locate the fault precisely; they can only rely on experience and on running many analysis tools for a long time to find the problem and acquire fault-site information. This is very inefficient, and the probability of actually recovering the fault-site information is low. The approach therefore cannot be applied widely in practice.

Summary of the invention

The purpose of the present invention is to overcome the above defects by providing a system and method for acquiring computer fault-site information that allow the system to be analyzed and diagnosed at the site of an operating system fault and allow fault-site information, including memory contents, to be acquired.

To achieve this purpose, the invention provides a system for acquiring fault-site information of a computer operating system, which operates on the basis of an EFI BIOS and an operating system running on the EFI BIOS.

The system includes an operating system monitoring module, and the EFI BIOS includes a memory allocation module and a runtime service module.

The memory allocation module divides physical memory into operating system memory and fault analysis system memory when the EFI BIOS initializes memory in the pre-boot stage after the computer system is powered on.

The operating system monitoring module runs and resides in the operating system. While the operating system runs normally, it collects and saves operating system site information; when the operating system crashes, it notifies the runtime service module of the EFI BIOS of the crash event.

The runtime service module runs in the fault analysis system memory space. It initializes the fault analysis system files used to acquire operating system fault information, establishes the fault analysis system environment and its supporting environment, acquires the operating system fault-site information, selects which fault-site information to save and where, and saves the information to that location.

The system of the present invention may further include a counter and a hardware watchdog module.

The counter times the running of the computer. The operating system monitoring module rewrites the counter periodically to prevent it from overflowing; when the counter overflows, an interrupt is generated, which triggers the EFI BIOS interrupt handler and starts the hardware watchdog module.

The hardware watchdog module redirects the system program pointer to the runtime service module in the fault analysis system memory space, thereby transferring control of the computer system to the runtime service module of the EFI BIOS.

The operating system is the Windows operating system.

The present invention also provides a method for acquiring fault-site information of a computer operating system, comprising the following steps:

Step A) After the computer is powered on, when the EFI BIOS initializes memory, the memory allocation module divides physical memory into operating system memory and fault analysis system memory;

Step B) While the operating system is running, the operating system monitoring module is started and stays resident, and it collects operating system site information; when the operating system crashes, the runtime service module of the EFI BIOS is notified of the crash event;

Step C) The runtime service module runs in the fault analysis system memory space, establishes the fault analysis system environment for acquiring operating system fault-site information, and acquires that information.

Step B) further includes the following steps:

Step B1) While the operating system is running, the operating system monitoring module is started and stays resident, and it writes the counter periodically;

Step B2) When the operating system crashes and the EFI BIOS system management mode is entered, the hardware watchdog module points the system program pointer at the runtime service module and starts it.

Step C) includes the following steps:

Step C1) The runtime service module loads the EFI-based device drivers;

Step C2) The analysis tool selects the memory contents of the operating system fault site and the storage location, then records and saves the site information.

Step C1) further includes the following step:

The runtime service module loads the EFI web service and establishes a network connection; once the network card driver is loaded, it sends a system alert over the network to the control terminal, informing it of the current state of this operating system.

The device drivers include a network card driver, an IDE/SCSI device driver, a USB device driver, and a PCI device driver.

The site information includes one or more of the following: CPU utilization information, memory usage information, register content information, and process information.

The beneficial effects of the invention are as follows. The invention uses the EFI BIOS to analyze operating system crashes. When the operating system fails, a memory space and an analysis environment that are independent of the operating system are used to analyze the cause of the crash and to acquire fault-site information. The invention saves the current state of the operating system and, when the operating system fails, enters the independent memory space without disturbing the operating system's memory regions or memory state. Within the analysis environment it analyzes those memory regions and that memory state and, combined with low-level diagnosis of the hardware, acquires the fault-site information and determines why the operating system failed.

Description of drawings

Fig. 1 is a schematic structural diagram of the system of the invention for acquiring fault-site information of a computer operating system;

Fig. 2 is a flowchart of the working process of the monitoring module in Fig. 1;

Fig. 3 is a flowchart of the counter interrupt module of the invention;

Fig. 4 is a flowchart of the EFI BIOS runtime service module of the invention.

Detailed description

The system and method of the invention for acquiring fault-site information of a computer operating system are described in further detail below with reference to Figs. 1 to 4.

The invention overcomes the weaknesses of the existing Basic Input/Output System (BIOS) by exploiting the improvements that the EFI BIOS makes to runtime and pre-boot functionality, and thereby solves the problem of acquiring computer fault-site information when the operating system crashes.

The invention relies on Extensible Firmware Interface (EFI) technology, which is introduced first:

The Extensible Firmware Interface (EFI) is a new generation of interface that appeared in 1999 to replace the Basic Input/Output System (BIOS), which had been in use for many years. For an introduction to EFI, see the UEFI Forum's material at http://www.UEFI.org. EFI BIOS sits between the hardware devices and the operating system (such as Windows or Linux). Unlike a traditional BIOS, EFI BIOS is written in C, a widely used high-level language, which means that more engineers can take part in EFI BIOS development and add valuable functions. It provides the functions of a traditional BIOS together with extended functions that a traditional BIOS lacks, differs from a traditional BIOS in design mechanism and architecture, and is the next-generation BIOS interface specification.

The basic functions of EFI BIOS are:

Hardware platform initialization;

Support for booting the operating system;

Platform management tools that are independent of the operating system.

The working mode of EFI BIOS can be summarized as follows: the system starts and the standard firmware platform is initialized; the EFI driver library is loaded and the related programs are executed; the system to be entered is selected from the EFI BIOS boot menu and its boot code is submitted to the EFI BIOS. If the boot succeeds the system is entered; otherwise the boot services are terminated and control returns to the EFI BIOS boot menu.

The method of the invention for acquiring fault-site information is described with particular reference to the Windows operating system, but the invention applies equally to other operating systems.

As shown in Fig. 1, the system of the invention for acquiring fault-site information of a computer operating system includes:

(1) Memory allocation module 1. In a hardware architecture that supports EFI BIOS, after the computer system is powered on and while the EFI BIOS initializes memory in the pre-boot stage, this module reserves part of the memory. The memory size that the EFI BIOS reports to the Windows operating system is then the physical memory size minus the size of the reserved memory.

Memory allocation module 1 also places the fault analysis system files used to acquire operating system fault information into the reserved memory region, so that they can be entered when the operating system crashes.

After the system is powered on, the EFI BIOS initializes memory in the pre-boot stage and starts memory allocation module 1, which divides memory into two parts:

One part is operating system memory. After Windows enters the OS load stage, the operating system controls this part of memory and allocates it to the operating system itself and to the processes running on it.

The other part is reserved as fault analysis system memory. It is initialized when the EFI BIOS starts and sets aside memory space for the fault analysis system environment. This reserved space is allocated only to the fault analysis system files; after it starts, the Windows operating system can neither discover nor use it. Runtime service module 3 of the EFI BIOS runs in this reserved space. Its main function is to establish the operating environment needed to acquire fault information, for example by loading the EFI network card (NIC) driver and the analysis tool (diagnostic tool), and to save the Windows operating system fault information.

(2) Operating system monitoring module 2 collects operating system site information while the operating system is running. When it finds that the operating system can no longer respond to operation requests from applications, that is, the operating system has crashed, it notifies runtime service module 3 of the EFI BIOS of the crash event.

(3) Counter (timer) 4 times the running of the computer. Operating system monitoring module 2 rewrites counter 4 periodically to prevent it from overflowing; when counter 4 overflows, an interrupt is generated, which triggers the EFI BIOS interrupt handler and starts hardware watchdog module 5.

While the operating system runs normally, operating system monitoring module 2 periodically rewrites the count register of counter 4 in the southbridge ICH (I/O Controller Hub) chip of the computer hardware, so that counter 4 never overflows and never raises an overflow interrupt. When the operating system crashes, monitoring module 2, which resides and runs in the operating system, can no longer run normally and therefore cannot write the count register of southbridge counter 4 on schedule. Counter 4 is not reset in time, overflows, and generates an interrupt, which triggers the EFI BIOS interrupt handler and starts hardware watchdog module 5.

(4) Hardware watchdog module 5 redirects the system program pointer to runtime service module 3 in the fault analysis system memory space, thereby transferring control of the computer system to runtime service module 3 of the EFI BIOS.

(5) Runtime service module 3 of the EFI BIOS initializes the components of the analysis environment used to acquire operating system fault information. On entry to runtime service module 3 the hardware system must be initialized so that the fault analysis system environment and its supporting environment are established, for example by loading the EFI NIC driver and the analysis tool (diagnostic tool). The module then selects which operating system fault-site information to save and where, and saves the information to that location.

When the Windows operating system crashes, analysis tool 6 reads the memory contents, register contents, and other information as they were at the time of the crash and, combined with low-level diagnosis of the system hardware, acquires the system fault information and diagnoses the specific cause of the operating system failure. It then selects which fault-site information to save and where, and saves the information to that location.

After the Windows operating system has started and while it runs normally, operating system monitoring module 2 runs and resides in the operating system. It collects Windows site information, including CPU utilization, memory usage, register contents, and process information, and writes it to a fixed memory region managed by the operating system.

At the same time, operating system monitoring module 2 periodically rewrites the count register of southbridge counter 4, so that counter 4 never overflows and never raises an overflow interrupt.
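The collection behaviour described above can be pictured with a small model. This is only a sketch under stated assumptions: the patent does not specify how the fixed memory region is organized, so the ring-buffer layout and the names `Snapshot` and `MonitorBuffer` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One sample of site information (field names are illustrative)."""
    cpu_utilization: float   # percent
    memory_usage: float      # percent
    registers: dict          # register name -> value
    processes: tuple         # names of running processes


class MonitorBuffer:
    """Fixed-capacity store standing in for the fixed memory region.

    Assumption: the oldest sample is overwritten once the region is full.
    """

    def __init__(self, capacity):
        self._slots = deque(maxlen=capacity)

    def record(self, snapshot):
        self._slots.append(snapshot)

    def latest(self):
        return self._slots[-1] if self._slots else None

    def __len__(self):
        return len(self._slots)
```

A bounded buffer keeps the most recent samples available to the analysis environment without the region growing while the operating system runs.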

If the operating system crashes, monitoring module 2, which resides and runs in the operating system, can no longer run normally and therefore cannot rewrite the count register of southbridge counter 4 on schedule. Counter 4 is not reset in time, overflows, and generates an interrupt. The interrupt triggers the EFI BIOS interrupt handler, the EFI BIOS enters System Management Mode (SMM), and hardware watchdog module 5 is started. Hardware watchdog module 5 points the system program pointer at runtime service module 3 in the fault analysis system memory space, transferring control of the system to runtime service module 3 of the EFI BIOS.
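The counter behaviour in this paragraph follows the usual watchdog pattern, which the following simulation illustrates. It is a sketch, not the ICH register interface: the class `WatchdogCounter`, its `limit` parameter, and the callback are hypothetical stand-ins for counter 4 and the overflow interrupt.

```python
class WatchdogCounter:
    """Software model of counter 4: overflows unless rewritten in time."""

    def __init__(self, limit, on_overflow):
        self.limit = limit              # ticks allowed between rewrites
        self.value = 0
        self.on_overflow = on_overflow  # stands in for the overflow interrupt
        self.fired = False

    def kick(self):
        """The monitoring module rewrites the count register."""
        self.value = 0

    def tick(self):
        """One timer period elapses."""
        if self.fired:
            return                      # the interrupt is raised only once
        self.value += 1
        if self.value >= self.limit:
            self.fired = True
            self.on_overflow()


events = []
counter = WatchdogCounter(limit=3, on_overflow=lambda: events.append("overflow"))

# Healthy operating system: the monitoring module rewrites the counter every period.
for _ in range(5):
    counter.tick()
    counter.kick()

# Crashed operating system: nothing rewrites the counter any more.
for _ in range(3):
    counter.tick()
```

The key property is that detection needs no cooperation from the crashed operating system: the absence of rewrites is itself the signal.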

Runtime service module 3 first loads the EFI-based device drivers, including the network card (NIC), IDE/SCSI, USB, and PCI device drivers. Loading these drivers gives the EFI BIOS control of the corresponding devices so that I/O operations can be performed. It then loads the EFI web service and establishes a network connection, preferably over HTTP; once the NIC driver is loaded, it sends a system alert over the network to a control terminal, informing the remote administrator of the current state of this operating system. Finally, analysis tool (diagnostic tool) 6 follows instructions from the remote console, such as which memory contents to dump and which storage location to use (USB storage, network storage, or the local hard disk), and thereby records the fault-site information.
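The ordering in this paragraph (drivers first, then the web service and alert, then the dump chosen by the remote console) can be sketched as a plain function. Everything here is illustrative: `run_fault_analysis`, the shape of the `instruction` dictionary, and the target names are assumptions, and no real EFI driver or network API is called.

```python
DRIVERS = ("NIC", "IDE/SCSI", "USB", "PCI")
STORAGE_TARGETS = {"usb", "network", "local_disk"}


def run_fault_analysis(instruction, os_memory):
    """Return the ordered action log and the bytes selected for the dump.

    instruction: {"target": one of STORAGE_TARGETS, "range": (start, end)},
                 a hypothetical encoding of the remote console's request.
    os_memory:   bytes-like image of the crashed operating system's memory.
    """
    # 1. Load device drivers so that I/O becomes possible.
    log = [f"load driver: {name}" for name in DRIVERS]

    # 2. Bring up the web service and alert the control terminal.
    log.append("start EFI web service (HTTP)")
    log.append("send alert to control terminal")

    # 3. Dump the requested memory range to the requested location.
    target = instruction["target"]
    if target not in STORAGE_TARGETS:
        raise ValueError(f"unknown storage target: {target}")
    start, end = instruction["range"]
    dump = bytes(os_memory[start:end])
    log.append(f"save {len(dump)} bytes to {target}")
    return log, dump
```

Because the console chooses the range and target after the crash, the dump scope is not fixed in advance, which is the limitation the background section attributes to the operating system's own dump modes.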

The method of the invention for acquiring computer fault-site information is described in further detail below with reference to the system above:

Step A: After the computer is powered on, in the pre-boot stage, when the EFI BIOS initializes memory, memory allocation module 1 divides physical memory into operating system memory and fault analysis system memory.

After the computer is powered on, the EFI BIOS initializes memory in the pre-boot stage and starts memory allocation module 1, which divides memory into two parts:

One part is operating system memory. After Windows enters the OS load stage, the operating system controls this part of memory and allocates it to the operating system itself and to the processes running on it.

The other part is reserved as fault analysis system memory. It is initialized when the EFI BIOS starts and sets aside memory space for the fault analysis system environment. This reserved space is allocated only to the fault analysis system environment; after it starts, the Windows operating system can neither discover nor use it. Runtime service module 3 of the EFI BIOS runs in this reserved space and establishes the operating environment needed to acquire fault information, for example by loading the EFI NIC driver and analysis tool (diagnostic tool) 6, and saves the Windows operating system fault information.
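The split in Step A amounts to simple address arithmetic, sketched below. The patent does not say where the reserved region sits, so placing it at the top of physical memory is an assumption, and `Region` and `partition_memory` are hypothetical names.

```python
from typing import NamedTuple

MIB = 1024 * 1024


class Region(NamedTuple):
    base: int   # first byte address of the region
    size: int   # length in bytes

    @property
    def end(self):
        return self.base + self.size


def partition_memory(total_bytes, reserved_bytes):
    """Split physical memory into (operating system memory, fault analysis memory).

    Assumption: the reserved region is carved from the top of memory.
    """
    if not 0 < reserved_bytes < total_bytes:
        raise ValueError("reserved size must be positive and below total size")
    os_region = Region(base=0, size=total_bytes - reserved_bytes)
    analysis_region = Region(base=os_region.end, size=reserved_bytes)
    return os_region, analysis_region


# The operating system is told only about os_region, so it sees
# the physical size minus the reserved size.
os_region, analysis_region = partition_memory(512 * MIB, 16 * MIB)
```

With 512 MiB installed and 16 MiB reserved, the operating system would be offered 496 MiB, and the fault analysis environment would occupy the remaining 16 MiB.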

步骤B:操作系统运行时,启动并驻留操作系统监视模块2,收集操作系统现场信息,并定时改写计数器4;当操作系统崩溃,将操作系统崩溃的事件通知EFI BIOS的运行时间服务模块3。Step B: when the operating system is running, start and reside in the operating system monitoring module 2, collect the on-site information of the operating system, and regularly rewrite the counter 4; when the operating system crashes, notify the runtime service module 3 of the EFI BIOS of the event of the operating system crash .

步骤B1:操作系统运行时,启动并驻留操作系统监视模块2,收集操作系统现场信息并保存,并定时改写计数器4。Step B1: When the operating system is running, start and reside the operating system monitoring module 2, collect and save the on-site information of the operating system, and rewrite the counter 4 at regular intervals.

As shown in Figure 2, after the Windows operating system has started and is running normally, the operating system monitoring module 2, which runs resident in the operating system, collects the site information of the Windows operating system, including CPU utilization, memory usage, register contents, and process information, and writes it to a fixed region of the memory managed by the operating system.
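Writing site information into a fixed memory region can be sketched as follows. This is an illustrative Python model only: the record layout (`RECORD`), the number of slots, and the class name `SiteInfoArea` are assumptions, since the description does not specify how the region is organized.

```python
import struct

# Hypothetical record layout: CPU %, memory %, process count, instruction pointer.
RECORD = struct.Struct("<BBHQ")   # 1 + 1 + 2 + 8 = 12 bytes, no padding
SLOTS = 4                          # number of records the fixed area can hold


class SiteInfoArea:
    """Fixed-size buffer that always holds the most recent SLOTS records."""

    def __init__(self):
        self.buf = bytearray(RECORD.size * SLOTS)
        self.count = 0  # total records written so far

    def write(self, cpu_pct, mem_pct, n_procs, ip):
        slot = self.count % SLOTS  # overwrite the oldest slot once full
        RECORD.pack_into(self.buf, slot * RECORD.size,
                         cpu_pct, mem_pct, n_procs, ip)
        self.count += 1

    def latest(self):
        """Return the most recent record as a tuple, or None if empty."""
        if self.count == 0:
            return None
        slot = (self.count - 1) % SLOTS
        return RECORD.unpack_from(self.buf, slot * RECORD.size)
```

A fixed location and a fixed record size mean that a reader outside the operating system can decode the region after a crash without relying on any operating system data structures.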

At the same time, the operating system monitoring module 2 periodically rewrites the count register of the south bridge counter 4, so that the counter 4 does not overflow and raise an overflow interrupt.

Step B2: When the operating system crashes and the EFI BIOS system management mode is entered, the hardware watchdog module 5 points the system program pointer to the runtime service module 3 and starts the runtime service module 3.

As shown in Figure 3, if the operating system crashes (for example, a system hang or blue screen), the monitoring module 2 that runs resident in the operating system can no longer run normally and therefore cannot periodically write the count register of the south bridge counter 4. Because the counter 4 is no longer reset, it overflows and generates an interrupt. The interrupt triggers the EFI BIOS interrupt handler, the system enters the EFI BIOS system management mode (System Management, SM), and the system program pointer is set to the runtime service module 3 in the fault analysis system memory space, so that control of the system passes to the runtime service module 3 of the EFI BIOS.
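The watchdog behaviour described above can be modelled with a short Python simulation. The class `WatchdogCounter`, the function `run`, and the tick, limit and refresh values are hypothetical; real south bridge counters are programmed through hardware registers that this sketch does not attempt to represent.

```python
class WatchdogCounter:
    """Counter that increments on every tick and overflows at `limit`."""

    def __init__(self, limit):
        self.limit = limit
        self.value = 0
        self.overflowed = False

    def kick(self):
        """Called by the monitoring module to reset the count."""
        self.value = 0

    def tick(self):
        """One timer tick; returns True when the overflow interrupt fires."""
        if self.overflowed:
            return False
        self.value += 1
        if self.value >= self.limit:
            self.overflowed = True
            return True
        return False


def run(ticks, crash_at, kick_every=3, limit=5):
    """Return the tick at which control passes to the runtime service, or None."""
    wd = WatchdogCounter(limit)
    for t in range(1, ticks + 1):
        os_alive = crash_at is None or t < crash_at
        if os_alive and t % kick_every == 0:
            wd.kick()  # monitoring module rewrites the counter
        if wd.tick():
            return t   # overflow interrupt -> SM mode -> runtime service module
    return None
```

With these example values a healthy system never overflows, while a crash at tick 7 leads to a handover at tick 10, once the last refresh at tick 6 has aged out.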

Step C: The runtime service module 3 establishes the fault analysis system environment for obtaining operating system fault site information, and obtains the operating system fault site information.

Once started, the runtime service module 3 loads the device drivers, establishes a network connection, starts the diagnostic tool, and selects and stores the fault site information.

When the operating system crashes, the operating system monitoring module 2 can no longer periodically write the count register of the counter 4. The counter 4 overflows and triggers the EFI BIOS interrupt handler, the system enters the EFI BIOS system management mode (SM), and the hardware watchdog module 5 is started. The hardware watchdog module 5 points the system program pointer to the runtime service module 3.

The runtime service module 3 first loads the EFI-based device drivers, including the network card (NIC) driver, IDE/SCSI device drivers, USB device drivers, and PCI device drivers. Loading these drivers gives the EFI BIOS control of the corresponding devices, so that it can perform I/O operations on them.

Next, the runtime service module 3 loads the EFI web service and establishes a network connection, preferably an HTTP-based one. After the NIC driver has been loaded, it sends a system warning over the network to a control terminal, informing the remote administrator of the current state of the operating system.

Finally, the diagnostic tool 6 records and saves the site information according to instructions from the remote console, such as which memory contents to dump and where to store them (USB storage, network storage, or the local hard disk).
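The order of operations in Step C can be summarized in a small Python sketch. Everything here is a stand-in: `run_fault_analysis`, the shape of `request`, and the use of plain lists for the network link and storage targets are assumptions made for illustration, not the interfaces of an actual EFI environment.

```python
DRIVERS = ["nic", "ide/scsi", "usb", "pci"]  # driver classes named in the text


def run_fault_analysis(memory, request, stores, alerts):
    """Follow the order described in the text: drivers, alert, then dump.

    memory  -- bytes-like image of the OS memory region (left unmodified)
    request -- dict from the remote console: {"start", "length", "target"}
    stores  -- dict mapping a target name ("usb", "network", "disk") to a list
    alerts  -- list standing in for the network link to the console
    """
    loaded = list(DRIVERS)                        # 1. load the EFI device drivers
    alerts.append("os-crash")                     # 2. warn the remote console
    start, length = request["start"], request["length"]
    dump = bytes(memory[start:start + length])    # 3. copy the selected range
    stores[request["target"]].append(dump)        # 4. save to the chosen store
    return loaded, dump
```

The dump is taken as a copy, which reflects the requirement that the memory region of the crashed operating system is read but not altered.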

The present invention uses the runtime and pre-boot environments of the EFI BIOS to obtain fault site information after an operating system crash without depending on the operating system. The monitoring module 2, which runs resident in the operating system, collects operating system site information and tracks the current state of the operating system. When the operating system crashes, the system enters the analysis environment without destroying the Windows memory region. In the analysis environment, the operating system memory region is analyzed and combined with low-level diagnosis of the hardware system to obtain information about the machine fault. This information can be used together with existing operating system fault analysis tools to determine the cause of the fault, which helps ensure the stability of the operating system at run time.

This embodiment is described in detail to aid understanding of the present invention and does not limit its scope of protection. Accordingly, changes made to the present invention by a person of ordinary skill in the art without creative effort and without departing from its spirit fall within the scope of protection of the present invention.

Claims (10)

1. A system for acquiring fault site information of a computer operating system, operating on an EFI BIOS and an operating system running on the EFI BIOS, characterized in that:
the system comprises an operating system monitoring module (2), and the EFI BIOS comprises a memory allocation module (1) and a runtime service module (3);
the memory allocation module (1) is configured to divide physical memory into operating system memory and fault analysis system memory when the EFI BIOS initializes memory during the pre-boot stage after the computer system is powered on;
the operating system monitoring module (2) runs resident in the operating system and is configured to collect and save operating system site information while the operating system is running normally, and, when the operating system crashes, to notify the runtime service module (3) of the EFI BIOS of the crash event;
the runtime service module (3) runs in the fault analysis system memory space and is configured to initialize the fault analysis system files used to obtain operating system fault information, establish the fault analysis system environment for obtaining operating system fault site information, provide the supporting environment for the analysis system, obtain the operating system fault site information, select the operating system fault site information and a storage location, and save the information to that location.

2. The system for acquiring fault site information of a computer operating system according to claim 1, characterized in that it further comprises a counter (4) and a hardware watchdog module (5);
the counter (4) is used to time the running of the computer; the operating system monitoring module (2) periodically rewrites the counter (4) to prevent the counter (4) from overflowing; when the counter (4) overflows, an interrupt is generated, which triggers the EFI BIOS interrupt handler and starts the hardware watchdog module (5);
the hardware watchdog module (5) is configured to set the system program pointer to the runtime service module (3) in the fault analysis system memory space, so that control of the computer system passes to the runtime service module (3) of the EFI BIOS.

3. The system for acquiring fault site information of a computer operating system according to claim 1 or 2, characterized in that the operating system is a Windows operating system.

4. The system for acquiring fault site information of a computer operating system according to claim 1 or 2, characterized in that the site information comprises one or a combination of CPU utilization information, memory usage information, register content information, and process information.

5. A method for acquiring fault site information of a computer operating system, characterized in that it comprises the following steps:
Step A) after the computer is powered on, when the EFI BIOS initializes memory, the memory allocation module (1) divides physical memory into operating system memory and fault analysis system memory;
Step B) while the operating system is running, the operating system monitoring module (2) is started and remains resident, and collects operating system site information; when the operating system crashes, the runtime service module (3) of the EFI BIOS is notified of the crash event;
Step C) the runtime service module (3) runs in the fault analysis system memory space, establishes the fault analysis system environment for obtaining operating system fault site information, and obtains the operating system fault site information.

6. The method for acquiring fault site information of a computer operating system according to claim 5, characterized in that step B) further comprises the following steps:
Step B1) while the operating system is running, the operating system monitoring module (2) is started and remains resident, and periodically writes the counter (4);
Step B2) when the operating system crashes and the EFI BIOS system management mode is entered, the hardware watchdog module (5) points the system program pointer to the runtime service module (3) and starts the runtime service module (3).

7. The method for acquiring fault site information of a computer operating system according to claim 5 or 6, characterized in that the site information comprises one or a combination of CPU utilization information, memory usage information, register content information, and process information.

8. The method for acquiring fault site information of a computer operating system according to claim 5 or 6, characterized in that step C) comprises the following steps:
Step C1) the runtime service module (3) loads the EFI-based device drivers;
Step C2) the diagnostic tool (6) selects the memory contents of the operating system fault site and the storage location, and records and saves the site information.

9. The method for acquiring fault site information of a computer operating system according to claim 8, characterized in that step C1) further comprises the following step:
the runtime service module (3) loads the EFI web service and establishes a network connection; after the network card driver has been loaded, it sends a system warning over the network to the control terminal, informing the control terminal of the current state of the operating system.

10. The method for acquiring fault site information of a computer operating system according to claim 8, characterized in that the device drivers comprise a network card driver, IDE/SCSI device drivers, USB device drivers, and PCI device drivers.
CNB2006100576026A 2006-02-22 2006-02-22 A system and method for acquiring computer operating system fault site information Active CN100472471C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006100576026A CN100472471C (en) 2006-02-22 2006-02-22 A system and method for acquiring computer operating system fault site information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2006100576026A CN100472471C (en) 2006-02-22 2006-02-22 A system and method for acquiring computer operating system fault site information

Publications (2)

Publication Number Publication Date
CN101025709A CN101025709A (en) 2007-08-29
CN100472471C true CN100472471C (en) 2009-03-25

Family

ID=38744028

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100576026A Active CN100472471C (en) 2006-02-22 2006-02-22 A system and method for acquiring computer operating system fault site information

Country Status (1)

Country Link
CN (1) CN100472471C (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant