CN100472471C - A system and method for acquiring computer operating system fault site information - Google Patents
- Publication number
- CN100472471C
- Authority
- CN
- China
- Prior art keywords
- operating system
- memory
- module
- site information
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Debugging And Monitoring (AREA)
Abstract
A system and method for acquiring fault-site information of a computer operating system. The system operates on the basis of an EFI BIOS and an operating system running on the EFI BIOS, and includes an operating system monitoring module (2), a counter (4) and a hardware dog module (5); the EFI BIOS includes a memory allocation module (1) and a runtime service module (3). After the computer is powered on, when the EFI BIOS initializes memory, the memory allocation module (1) divides the physical memory into operating system memory and fault analysis system memory. While the operating system is running, the operating system monitoring module (2) is started, stays resident, and collects operating system site information. When the operating system crashes, the crash event is reported to the runtime service module (3) of the EFI BIOS, which establishes a fault analysis system environment and acquires the operating system fault-site information. The invention makes it possible to analyze and diagnose the system at the scene of an operating system fault and to obtain the information present at that scene.
Description
Technical Field
The invention relates to the field of computers, and in particular to a system and method for acquiring fault-site information of a computer operating system.
Background
Current computer operating systems (OS), especially multitasking operating systems such as Windows, are relatively complex. While a complete operating system is running, new faults may appear as multiple applications or new programs run. Existing operating systems generally provide diagnosis and maintenance methods that monitor the operating status of the system and warn the user in advance when a problem is likely. However, when certain fatal errors occur (such as memory errors or out-of-bounds accesses by an application), the operating system crashes, either hanging in an infinite loop (a freeze) or producing an unrecognizable error (a blue screen). The usual response is to restart the computer. At that point, however, all fault-site information is lost, so no further fault analysis is possible and the root cause of the problem cannot be found. If the underlying problem is not eliminated, the hidden danger remains: the stability and performance of the system cannot be guaranteed, the fault may recur whenever its triggering conditions are met again, and the user's trust in the system declines. How to acquire fault-site information when the operating system freezes or crashes has therefore become an urgent problem for the industry.
Existing methods for handling fatal operating system errors include the following:
1. After the operating system (for example, Windows) crashes, the operating system's dump process writes a memory dump. There are three existing dump modes: small memory dump (64 KB), kernel memory dump and complete memory dump. The dump file is then analyzed with an analysis tool.
However, these three dump modes all have the following defects:
Switching between the three dump modes must be configured within the operating system. A complete memory dump occupies a large amount of core storage space, whereas a dump that is too small (the small memory dump) loses a great deal of information. Users can choose different dump modes for different applications and kinds of crashes, but when the operating system crashes it can only dump according to the mode configured beforehand and cannot switch to another mode. Moreover, this method still depends on the dump being performed by the operating system itself: in the event of a serious operating system fault, for example when the dump process itself crashes or when the memory used locally by the operating system fails, the site information cannot be saved.
2. When a fatal error occurs in the operating system, a system administrator or operating system developer restarts the computer on site, obtains fault-site information, and performs fault diagnosis and maintenance.
The drawbacks of this approach are also obvious. Its main disadvantage is that a system administrator or OS developer must go on site, which consumes a great deal of their time and effort. Moreover, because the state at the moment of the crash is no longer available, they cannot locate the fault accurately; they must rely on experience and on running many analysis tools for a long time to find the problem and obtain fault information. This is very inefficient, and the probability of actually recovering the fault-site information is low. The method therefore cannot be widely used in practice.
Summary of the Invention
The object of the present invention is to overcome the above defects by providing a system and method for acquiring computer fault-site information that allow the system to be analyzed and diagnosed at the scene of an operating system fault and that obtain the fault-site information, including memory contents.
To achieve this object, the invention provides a system for acquiring fault-site information of a computer operating system, which operates on the basis of an EFI BIOS and an operating system running on the EFI BIOS.
The system includes an operating system monitoring module, and the EFI BIOS includes a memory allocation module and a runtime service module.
The memory allocation module divides the physical memory into operating system memory and fault analysis system memory when, after the computer system is powered on, the EFI BIOS initializes memory in the pre-boot phase.
The operating system monitoring module runs and stays resident in the operating system. It collects and saves operating system site information while the operating system is running normally and, when the operating system crashes, notifies the runtime service module of the EFI BIOS of the crash event.
The runtime service module runs in the fault analysis system memory space. It initializes the fault analysis system files used to acquire operating system fault information, establishes the fault analysis system environment and its supporting environment, acquires the fault-site information, selects which fault-site information to save and where to save it, and saves the information to that location.
The system of the invention may further include a counter and a hardware dog module.
The counter times the running time of the computer. The operating system monitoring module periodically rewrites the counter to prevent it from overflowing. When the counter overflows, an interrupt is generated, which triggers the EFI BIOS interrupt handler and starts the hardware dog module.
The hardware dog module redirects the system program pointer to the runtime service module in the fault analysis system memory space, thereby transferring control of the computer system to the runtime service module of the EFI BIOS.
The operating system is the Windows operating system.
The invention also provides a method for acquiring fault-site information of a computer operating system, comprising the following steps:
Step A) After the computer is powered on, when the EFI BIOS initializes memory, the memory allocation module divides the physical memory into operating system memory and fault analysis system memory;
Step B) While the operating system is running, the operating system monitoring module is started and stays resident, collecting operating system site information; when the operating system crashes, the crash event is reported to the runtime service module of the EFI BIOS;
Step C) The runtime service module runs in the fault analysis system memory space, establishes the fault analysis system environment, and acquires the operating system fault-site information.
Step B) further includes the following steps:
Step B1) While the operating system is running, the operating system monitoring module is started, stays resident, and periodically writes the counter;
Step B2) When the operating system crashes and the EFI BIOS system management mode is entered, the hardware dog module points the system program pointer to the runtime service module and starts it.
Step C) includes the following steps:
Step C1) The runtime service module loads EFI-based device drivers;
Step C2) The analysis tool selects the memory contents of the operating system fault site and the storage location, and records and saves the site information.
Step C1) further includes the following step:
The runtime service module loads the EFI web service and establishes a network connection; after the network card driver is loaded, it sends a system alert over the network to the control terminal, notifying it of the current state of the operating system.
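The alert in step C1 is specified only as a system warning sent over the network; the detailed description names an HTTP-based connection as the preferred transport. As a rough sketch, assuming a plain HTTP/1.1 POST with a JSON body, the message could be assembled as below. The endpoint path `/alert`, the JSON field names and the function `build_alert_request` are hypothetical; real firmware would hand such bytes to the EFI network stack rather than build them in Python.

```python
import json


def build_alert_request(host: str, os_state: str, path: str = "/alert") -> bytes:
    """Build a raw HTTP/1.1 POST carrying a crash alert (hypothetical format)."""
    # JSON payload describing the event; field names are illustrative only.
    body = json.dumps({"event": "os_crash", "os_state": os_state}).encode("utf-8")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"  # blank line separates headers from body
    ).encode("ascii")
    return head + body
```

`Connection: close` keeps the exchange to a single request, which suits a one-shot alert sent from a minimal pre-OS environment.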
The device drivers include network card drivers, IDE/SCSI device drivers, USB device drivers and PCI device drivers.
The site information includes one or a combination of: CPU utilization, memory utilization, register contents and process information.
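Because the site information may be any one category or a combination of categories, a bit-flag set is a natural way to express the selection. The following sketch is illustrative only; the `SiteInfo` flag names and the `select_site_info` helper are hypothetical and not defined by the patent.

```python
from enum import Flag, auto


class SiteInfo(Flag):
    """Site-information categories (hypothetical names)."""
    CPU = auto()        # CPU utilization
    MEMORY = auto()     # memory utilization
    REGISTERS = auto()  # register contents
    PROCESSES = auto()  # process information


def select_site_info(snapshot: dict, wanted: SiteInfo) -> dict:
    """Keep only the categories in `wanted`; at least one must be chosen."""
    if not wanted:
        raise ValueError("select at least one site-information category")
    return {
        cat.name: snapshot[cat.name]
        for cat in SiteInfo
        if cat in wanted and cat.name in snapshot
    }
```

For example, `SiteInfo.CPU | SiteInfo.PROCESSES` selects a combination of two categories.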
Beneficial effects: the invention uses the EFI BIOS to analyze operating system crashes. When the operating system fails, a memory space and an analysis environment independent of the operating system are used to analyze the cause of the crash and obtain the fault-site information. The invention records the current state of the operating system and, when the OS fails, switches to an independent memory space without disturbing the memory areas or memory state of the operating system. In the analysis environment, it analyzes the operating system's memory areas and memory state and combines this with low-level diagnosis of the hardware to obtain the fault-site information and determine the cause of the failure.
Description of Drawings
Fig. 1 is a schematic diagram of the structure of the system for acquiring operating system fault-site information according to the invention;
Fig. 2 is a flowchart of the operation of the monitoring module in Fig. 1;
Fig. 3 is a flowchart of the counter interrupt module of the invention;
Fig. 4 is a flowchart of the EFI BIOS runtime service module of the invention.
Detailed Description
The system and method of the invention for acquiring operating system fault-site information are described in further detail below with reference to Figs. 1 to 4.
The invention overcomes the weaknesses of the conventional Basic Input/Output System (BIOS) by exploiting the improvements of the EFI BIOS in both its runtime and pre-boot functions, and thereby solves the problem of acquiring fault-site information when the operating system crashes.
The invention involves Extensible Firmware Interface (EFI) technology, which is introduced first:
The Extensible Firmware Interface (EFI), which appeared in 1999, is a new-generation interface intended to replace the Basic Input/Output System (BIOS) that had been used for many years. For an introduction to EFI, see the UEFI Forum's material at http://www.UEFI.org. The EFI BIOS sits between the hardware devices and the operating system (such as Windows or Linux). Unlike a traditional BIOS, the EFI BIOS is written in C, a widely used high-level language. It provides the functions of a traditional BIOS together with extended functions beyond them, and its design mechanism and architecture also differ from traditional BIOS implementations. As the next-generation BIOS interface specification, it allows more engineers to take part in EFI BIOS development and add many more valuable functions.
The basic functions of the EFI BIOS are:
Hardware platform initialization;
Support for booting the operating system;
Platform management tools independent of the operating system.
The operation of the EFI BIOS can be summarized as follows: the system starts, the standard firmware platform is initialized, the EFI driver library is loaded and the related programs are executed; the system to be entered is selected from the EFI BIOS boot menu and its boot code is submitted to the EFI BIOS. If booting succeeds, the system is entered; otherwise the boot service is aborted and control returns to the EFI BIOS boot menu.
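The boot sequence just summarized can be modeled as a small loop: initialize the platform, load drivers, then keep offering the boot menu until a selected entry boots successfully. This is a hypothetical Python model, not EFI code; `efi_boot_flow`, the menu dictionary and the log strings are illustrative names only.

```python
from typing import Callable, Dict, List, Optional, Tuple


def efi_boot_flow(
    drivers: List[str],
    boot_menu: Dict[str, Callable[[], bool]],
    selections: List[str],
) -> Tuple[Optional[str], List[str]]:
    """Model of the EFI BIOS boot sequence (illustrative only)."""
    log = ["firmware platform initialized"]
    for name in drivers:                    # load the EFI driver library
        log.append(f"driver loaded: {name}")
    for choice in selections:               # each entry is one pick from the boot menu
        boot = boot_menu.get(choice)
        if boot is not None and boot():     # boot code accepted: enter the system
            log.append(f"booted: {choice}")
            return choice, log
        log.append("boot aborted, back to boot menu")
    return None, log                        # no selection booted successfully
```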
The method is described with particular reference to the Windows operating system, but the invention applies equally to operating systems other than Windows.
As shown in Fig. 1, the system for acquiring operating system fault-site information according to the invention includes:
(1) Memory allocation module 1. On hardware that supports the EFI BIOS, after the computer system is powered on, the memory allocation module 1 reserves part of the memory when the EFI BIOS initializes memory in the pre-boot phase. The memory size that the EFI BIOS then reports to the Windows operating system is the physical memory size minus the reserved memory size.
At the same time, the memory allocation module 1 places the fault analysis system files used to acquire operating system fault information in the reserved memory area, so that they can be entered when the operating system crashes.
After the system is powered on, the EFI BIOS initializes memory in the pre-boot phase and starts the memory allocation module 1, which divides memory into two parts:
One part is operating system memory. Once Windows enters the OS load phase, the operating system controls this memory and allocates it to itself and to the processes running on it.
The other part is reserved as fault analysis system memory. It is set aside for the fault analysis system environment when the EFI BIOS starts, is allocated only to the fault analysis system files, and cannot be discovered or used by Windows after it boots. The runtime service module 3 of the EFI BIOS runs in this reserved space. Its main function is to set up the environment needed to acquire fault information, for example by loading the EFI network interface card (NIC) driver and the analysis tool (diagnostic tool), and to save the contents of the Windows fault information.
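The split performed by memory allocation module 1 reduces to simple arithmetic: the OS is told about physical memory minus the reserved size. A minimal sketch, assuming the reserved fault analysis region sits at the top of physical memory and is rounded up to whole 4 KiB pages (the page granularity UEFI uses for memory allocation); the placement, the `Region` type and `partition_memory` are assumptions for illustration. In real firmware the reserved range would be reported in the UEFI memory map with a type the OS does not reclaim, such as `EfiReservedMemoryType`.

```python
from dataclasses import dataclass
from typing import Tuple

PAGE = 4096  # UEFI allocates memory in 4 KiB pages


@dataclass(frozen=True)
class Region:
    name: str
    base: int
    size: int


def partition_memory(physical: int, reserved: int) -> Tuple[Region, Region]:
    """Split physical memory into OS memory and a top-of-memory reserved region."""
    reserved = -(-reserved // PAGE) * PAGE  # round up to whole pages
    if reserved <= 0 or reserved >= physical:
        raise ValueError("reserved size must be positive and smaller than physical memory")
    os_size = physical - reserved           # size reported to the operating system
    return Region("os", 0, os_size), Region("fault_analysis", os_size, reserved)
```

Placing the reserved region at the top keeps the OS region contiguous from address 0, which is a design choice of this sketch, not something the patent specifies.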
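(2) Operating system monitoring module 2, which collects operating system site information while the operating system is running and, when it detects that the operating system cannot respond to application requests (that is, the operating system has crashed), notifies the runtime service module 3 of the EFI BIOS of the crash event.
(3) Counter (timer) 4, which times the running time of the computer. The operating system monitoring module 2 periodically rewrites the counter 4 to keep it from overflowing; when the counter 4 overflows, an interrupt is generated, which triggers the EFI BIOS interrupt handler and starts the hardware dog module 5.
While the operating system runs normally, the operating system monitoring module 2 periodically rewrites the count register of counter 4 in the southbridge I/O Controller Hub (ICH) chip, ensuring that counter 4 never overflows and raises an overflow interrupt. When the operating system crashes, the monitoring module 2 resident in it can no longer run either, so it stops writing the count register of the southbridge counter 4. Counter 4 is therefore no longer reset periodically and overflows, generating an interrupt that triggers the EFI BIOS interrupt handler and starts the hardware dog module 5.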
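The counter-and-monitor interplay is a classic watchdog pattern: a healthy OS keeps resetting the counter, and only a stalled OS lets it run past its limit. The software model below is a sketch under stated assumptions: the counter counts up once per tick and fires a single overflow callback. The real ICH counter register layout, tick rate and interrupt routing are not modeled, and `WatchdogCounter` is a hypothetical name.

```python
from typing import Callable


class WatchdogCounter:
    """Software model of counter 4: counts ticks and fires once on overflow."""

    def __init__(self, limit: int, on_overflow: Callable[[], None]) -> None:
        self.limit = limit
        self.count = 0
        self.fired = False
        self.on_overflow = on_overflow  # stands in for the EFI BIOS interrupt handler

    def kick(self) -> None:
        # What monitoring module 2 does periodically: rewrite the count register.
        self.count = 0

    def tick(self) -> None:
        # One hardware timer period elapses.
        if self.fired:
            return
        self.count += 1
        if self.count > self.limit:  # overflow: the OS stopped kicking
            self.fired = True
            self.on_overflow()
```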
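(4) Hardware dog module 5, which redirects the system program pointer to the runtime service module 3 in the fault analysis system memory space, thereby transferring control of the computer system to the runtime service module 3 of the EFI BIOS.
(5) Runtime service module 3 of the EFI BIOS, which initializes the components of the analysis environment used to acquire operating system fault information. On entry to the runtime service module 3, the hardware must be initialized to build the fault analysis system environment and its supporting environment, for example by loading the EFI NIC driver and the analysis tool (diagnostic tool); the module then selects the fault-site information and the storage location and saves the information there.
When Windows crashes, the analysis tool 6 reads the memory contents, register contents and other information present at the time of the crash and combines them with low-level diagnosis of the system hardware to obtain information about the system fault and diagnose its specific cause. It then selects the fault-site information and the storage location and saves the information there.
After Windows starts and while it is running normally, the operating system monitoring module 2 runs resident in the operating system and collects Windows site information, including CPU utilization, memory utilization, register contents and process information, writing it to a fixed memory area managed by the operating system.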
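Writing the site information to a fixed memory area only helps if the post-crash environment can find and decode it without the operating system's help, which argues for a fixed address and a fixed binary layout. The following sketch uses Python's `struct` module to pack such a record into a simulated memory area; the field set (magic value, version, utilization in hundredths of a percent, process count and two sample registers) and the names `RECORD`, `write_site_record` and `read_site_record` are hypothetical.

```python
import struct

# Hypothetical little-endian layout: magic, version, CPU and memory utilization
# in 0.01 % units, process count, instruction pointer, stack pointer.
RECORD = struct.Struct("<4sHHHHQQ")
MAGIC = b"OSMN"


def write_site_record(area: bytearray, cpu_pct: float, mem_pct: float,
                      nproc: int, rip: int, rsp: int) -> None:
    """Pack one snapshot at offset 0 of the fixed area (monitoring module side)."""
    RECORD.pack_into(area, 0, MAGIC, 1, round(cpu_pct * 100),
                     round(mem_pct * 100), nproc, rip, rsp)


def read_site_record(area) -> dict:
    """Decode the snapshot after a crash (fault analysis side)."""
    magic, version, cpu, mem, nproc, rip, rsp = RECORD.unpack_from(area, 0)
    if magic != MAGIC:
        raise ValueError("no valid site record in this area")
    return {"version": version, "cpu_pct": cpu / 100, "mem_pct": mem / 100,
            "processes": nproc, "rip": rip, "rsp": rsp}
```

The magic value lets the reader reject an area that was never written, for instance after a crash early in boot.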
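At the same time, the operating system monitoring module 2 periodically rewrites the count register of the southbridge counter 4, ensuring that counter 4 does not overflow and raise an overflow interrupt.
If the operating system crashes, the monitoring module 2 resident in it stops running as well and can no longer rewrite the count register of the southbridge counter 4. Counter 4 is then not reset in time and overflows, generating an interrupt that triggers the EFI BIOS interrupt handler. The system enters the EFI BIOS system management (SM) mode and starts the hardware dog module 5, which points the system program pointer to the runtime service module 3 in the fault analysis system memory space, transferring control of the system to the runtime service module 3 of the EFI BIOS.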
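The hand-off performed by hardware dog module 5 amounts to replacing the interrupted program pointer with the entry point of runtime service module 3. In the sketch below, saving a copy of the interrupted context before redirecting is an assumption (the patent does not say how the context is preserved), motivated by the later analysis of register contents at the time of the crash. `CpuContext` and `hardware_dog_handoff` are hypothetical names, and the range check simply enforces that the entry point lies in the reserved fault analysis region.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CpuContext:
    ip: int                                        # program (instruction) pointer
    registers: Dict[str, int] = field(default_factory=dict)


def hardware_dog_handoff(ctx: CpuContext, runtime_entry: int,
                         fa_base: int, fa_size: int) -> CpuContext:
    """Redirect ctx to the runtime service entry; return the saved crash context."""
    if not (fa_base <= runtime_entry < fa_base + fa_size):
        raise ValueError("runtime service entry must lie in the fault analysis region")
    saved = CpuContext(ctx.ip, dict(ctx.registers))  # keep crash-time state for analysis
    ctx.ip = runtime_entry                           # control now passes to module 3
    return saved
```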
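The runtime service module 3 first loads the EFI-based device drivers, including the network card (NIC), IDE/SCSI, USB and PCI device drivers. Loading these drivers gives the EFI BIOS control of the corresponding devices so that it can perform I/O operations. It then loads the EFI web service and establishes a network connection, preferably an HTTP-based one; after the NIC driver has been loaded, it sends a system alert over the network to a control terminal, informing the remote administrator of the current state of the operating system. Finally, following instructions from the remote console, the analysis tool (diagnostic tool) 6 selects the memory contents to dump and the storage location, such as USB storage, network storage or the local hard disk, and records the site information.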
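The ordering in this paragraph matters: the alert can only be sent once the NIC driver is loaded, while a dump to local or USB storage does not need the network. A hypothetical orchestration of the three phases, with the driver loader, alert sender and dump routine passed in as callables (all names are illustrative, not EFI APIs):

```python
from typing import Callable, List

DRIVERS = ["nic", "ide_scsi", "usb", "pci"]  # driver classes named in the text


def run_runtime_service(load_driver: Callable[[str], bool],
                        send_alert: Callable[[str], None],
                        dump: Callable[[], None]) -> List[str]:
    """Run the runtime service phases in order and return a step log."""
    loaded = [d for d in DRIVERS if load_driver(d)]  # phase 1: take over devices
    steps = [f"loaded:{d}" for d in loaded]
    if "nic" in loaded:                              # phase 2 needs the NIC driver
        send_alert("OS crashed; fault analysis environment active")
        steps.append("alert_sent")
    dump()                                           # phase 3: save site information
    steps.append("dump_done")
    return steps
```

Skipping the alert rather than aborting when the NIC driver fails is a choice of this sketch, so that site information can still be saved locally.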
The method of the invention for acquiring computer fault-site information is described in further detail below with reference to the above system:
Step A: After the computer is powered on, in the pre-boot phase, when the EFI BIOS initializes memory, the memory allocation module 1 divides the physical memory into operating system memory and fault analysis system memory.
After the computer is powered on, the EFI BIOS initializes memory in the pre-boot phase and starts the memory allocation module 1, which divides memory into two parts:
One part is operating system memory. Once Windows enters the OS load phase, the operating system controls this memory and allocates it to itself and to the processes running on it.
The other part is reserved as fault analysis system memory, set aside for the fault analysis system environment when the EFI BIOS starts. This reserved space is allocated only to the fault analysis system environment and cannot be discovered or used by Windows after it boots. The runtime service module 3 of the EFI BIOS runs in this reserved space and sets up the environment needed to acquire fault information, for example by loading the EFI NIC driver and the analysis tool (diagnostic tool) 6, and saves the contents of the Windows fault information.
Step B: While the operating system is running, the operating system monitoring module 2 is started and stays resident, collecting operating system site information and periodically rewriting counter 4; when the operating system crashes, the crash event is reported to the runtime service module 3 of the EFI BIOS.
Step B1: While the operating system is running, the operating system monitoring module 2 is started and stays resident, collects and saves operating system site information, and periodically rewrites counter 4.
As shown in Fig. 2, after Windows starts and while it runs normally, the operating system monitoring module 2 runs resident in the operating system and collects Windows site information, including CPU utilization, memory utilization, register contents and process information, writing it to a fixed memory area managed by the operating system.
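At the same time, the operating system monitoring module 2 periodically rewrites the count register of the southbridge counter 4, ensuring that counter 4 does not overflow and raise an overflow interrupt.
Step B2: When the operating system crashes and the EFI BIOS system management mode is entered, the hardware dog module 5 points the system program pointer to the runtime service module 3 and starts it.
As shown in Fig. 3, if the operating system crashes (for example, the system freezes or shows a blue screen), the monitoring module 2 resident in it stops running as well and can no longer write the count register of the southbridge counter 4. Counter 4 is then not reset in time and overflows, generating an interrupt that triggers the EFI BIOS interrupt handler. The system enters the EFI BIOS system management (SM) mode, and the system program pointer is pointed to the runtime service module 3 in the fault analysis system memory space, transferring control of the system to the runtime service module 3 of the EFI BIOS.
Step C: The runtime service module 3 establishes the fault analysis system environment and acquires the operating system fault-site information.
Once started, the runtime service module 3 loads the device drivers, establishes a network connection, starts the analysis tool, and selects and stores the fault-site information.
When the operating system crashes, the operating system monitoring module 2 can no longer write the count register of counter 4, so counter 4 overflows and triggers the EFI BIOS interrupt handler. The system enters the EFI BIOS system management (SM) mode and starts the hardware dog module 5, which points the system program pointer to the runtime service module 3.
The runtime service module 3 first loads the EFI-based device drivers, including the network card (NIC), IDE/SCSI, USB and PCI device drivers. Loading these drivers gives the EFI BIOS control of the corresponding devices so that it can perform I/O operations;
Next, the runtime service module 3 loads the EFI web service and establishes a network connection, preferably an HTTP-based one; after the NIC driver has been loaded, it sends a system alert over the network to a control terminal, informing the remote administrator of the current state of the operating system;
Finally, following instructions from the remote console, the analysis tool (diagnostic tool) 6 selects the memory contents to dump and the storage location, such as USB storage, network storage or the local hard disk, and records and saves the site information.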
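Because the analysis environment runs outside the operating system's memory, the dump step can treat OS memory as read-only and simply copy the regions the remote console selects to the chosen target. A minimal sketch, assuming the console command has already been parsed into a list of `(base, size)` regions and a target name; `dump_selected` and the writer dictionary are hypothetical, and OS memory is modeled as an immutable `bytes` object so the sketch cannot modify it.

```python
from typing import Callable, Dict, List, Tuple


def dump_selected(memory: bytes,
                  regions: List[Tuple[int, int]],
                  target: str,
                  writers: Dict[str, Callable[[bytes], None]]) -> int:
    """Copy each selected (base, size) region to the chosen storage target."""
    if target not in writers:  # e.g. "usb", "network", "local_disk"
        raise ValueError(f"unknown storage target: {target}")
    total = 0
    for base, size in regions:
        if base < 0 or size <= 0 or base + size > len(memory):
            raise ValueError(f"region out of range: {base:#x}+{size:#x}")
        chunk = memory[base:base + size]  # read-only copy of OS memory
        writers[target](chunk)
        total += len(chunk)
    return total  # number of bytes dumped
```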
Using the runtime and pre-boot environments of the EFI BIOS, the invention acquires fault-site information when the operating system crashes without depending on the operating system. The monitoring module 2, running resident in the operating system, acquires operating system site information and tracks the current state of the operating system. When the operating system crashes, the system enters the analysis environment without disturbing the Windows memory areas, analyzes the operating system's memory areas there, and combines this with low-level diagnosis of the hardware to obtain information about the machine fault. Together with existing operating system fault analysis tools, this allows the cause of the failure to be identified, helping to keep the operating system stable at runtime.
This embodiment is described in detail to aid understanding of the invention and does not limit its scope of protection. Changes made by a person of ordinary skill in the art without departing from the spirit of the invention and without inventive effort therefore fall within the scope of protection of the invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB2006100576026A CN100472471C (en) | 2006-02-22 | 2006-02-22 | A system and method for acquiring computer operating system fault site information |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB2006100576026A CN100472471C (en) | 2006-02-22 | 2006-02-22 | A system and method for acquiring computer operating system fault site information |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN101025709A CN101025709A (en) | 2007-08-29 |
| CN100472471C true CN100472471C (en) | 2009-03-25 |
Family
ID=38744028
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB2006100576026A Active CN100472471C (en) | 2006-02-22 | 2006-02-22 | A system and method for acquiring computer operating system fault site information |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN100472471C (en) |
Families Citing this family (17)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102567550A (en) * | 2011-12-31 | 2012-07-11 | 曙光信息产业股份有限公司 | Method and device for collecting data of emergency event in operating system (OS) |
| CN102622322B (en) * | 2012-02-24 | 2015-09-09 | 华为技术有限公司 | A kind of method, black box and server utilizing black box to obtain crash info |
| CN102637144B (en) * | 2012-03-31 | 2015-05-06 | 北京奇虎科技有限公司 | System fault processing method and device |
| CN104699615B (en) * | 2012-03-31 | 2017-09-22 | 北京奇虎科技有限公司 | A kind for the treatment of method and apparatus of the system failure |
| CN104035871B (en) * | 2014-06-27 | 2016-04-13 | 腾讯科技(深圳)有限公司 | Based on fault handling method and the device of the application program in geographic position |
| CN105204977A (en) * | 2014-06-30 | 2015-12-30 | 中兴通讯股份有限公司 | System exception capturing method, main system, shadow system and intelligent equipment |
| CN105512000B (en) * | 2014-09-24 | 2020-04-24 | 中兴通讯股份有限公司 | Operating system abnormal information collection method and device and computer |
| US20170196029A1 (en) * | 2016-01-05 | 2017-07-06 | Gentex Corporation | Communication system for vehicle |
| CN106997315B (en) * | 2016-01-25 | 2021-01-26 | 阿里巴巴集团控股有限公司 | Method and device for memory dump of virtual machine |
| CN107025146B (en) * | 2016-01-30 | 2019-10-18 | 华为技术有限公司 | A file generation method, device and system |
| CN106681771B (en) * | 2016-12-30 | 2020-12-29 | 阿里巴巴(中国)有限公司 | System reinstallation method and device |
| CN108319530A (en) * | 2018-02-06 | 2018-07-24 | 合肥联宝信息技术有限公司 | Diagnostic method, device, terminal and the medium of computer hardware |
| CN111158982B (en) * | 2019-12-26 | 2022-06-28 | 联想(北京)有限公司 | Electronic device, first operating system, data processing method, and storage medium |
| CN111341434B (en) * | 2020-03-02 | 2024-05-28 | 北京医维星科技有限公司 | Remote fault diagnosis and maintenance system for medical equipment and construction method thereof |
| CN113064747B (en) | 2021-03-26 | 2022-10-28 | 山东英信计算机技术有限公司 | Fault positioning method, system and device in server starting process |
| CN114064132B (en) * | 2021-09-30 | 2023-07-21 | 中科创达软件股份有限公司 | Method, device, equipment and system for recovering system downtime |
| CN116302646B (en) * | 2023-02-24 | 2024-03-29 | 荣耀终端有限公司 | Fault positioning method, system, electronic equipment and storage medium |
- 2006-02-22: CNB2006100576026A (CN), patent CN100472471C, Active
Also Published As
| Publication number | Publication date |
|---|---|
| CN101025709A (en) | 2007-08-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN100472471C (en) | A system and method for acquiring computer operating system fault site information | |
| JP6530774B2 (en) | Hardware failure recovery system | |
| CN104254840B (en) | Memory dump and analysis in computer systems | |
| US8260841B1 (en) | Executing an out-of-band agent in an in-band process of a host system | |
| CN108874624B (en) | Server, method for monitoring Java process and storage medium | |
| US8595552B2 (en) | Reset method and monitoring apparatus | |
| US8495430B2 (en) | Generate diagnostic data for overdue thread in a data processing system | |
| US20140208166A1 (en) | Health monitoring of applications in a guest partition | |
| US7809985B2 (en) | Offline hardware diagnostic environment | |
| CN102521105B (en) | Output method of power on self test information, virtual machine manager and processor | |
| US8909989B2 (en) | Method for outputting power-on self test information, virtual machine manager, and processor | |
| TWI808362B (en) | Computer system and method capable of self-monitoring and restoring an operation of operating system | |
| JP4677214B2 (en) | Program, method and mechanism for collecting panic dump | |
| JP2010086364A (en) | Information processing device, operation state monitoring device and method | |
| JP6237230B2 (en) | Memory management program, memory management method, and memory management device | |
| CN101446915B (en) | Method and device for recording BIOS level logs | |
| CN115951949A (en) | Method, device and computing device for recovering configuration parameters of BIOS | |
| US10474517B2 (en) | Techniques of storing operational states of processes at particular memory locations of an embedded-system device | |
| CN119003189A (en) | System management memory allocation method, program running method, system and product | |
| CN119046038A (en) | Automatic processing method and device for downtime of server, data processing unit and medium | |
| US8312433B2 (en) | Operating system aided code coverage | |
| JP2007133544A (en) | Failure information analysis method and apparatus for implementing the same | |
| JP5348120B2 (en) | Program, method and mechanism for collecting panic dump | |
| WO2008048581A1 (en) | A processing device operation initialization system | |
| CN100458708C (en) | interrupt control system and method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
