
CN101430674A - Intraconnection communication method of distributed virtual machine monitoring apparatus - Google Patents


Info

Publication number: CN101430674A
Authority: CN (China)
Prior art keywords: vmm, data, ring, communication, node
Legal status: Granted
Application number: CNA2008102398997A
Other languages: Chinese (zh)
Other versions: CN101430674B (en)
Inventors: 宋忠雷, 肖利民, 陈思名, 彭近兵, 祝明发, 马博
Current assignee: Huawei Technologies Co Ltd
Original assignee: Beihang University
Application filed by Beihang University
Priority to CN2008102398997A (CN101430674B)
Publication of CN101430674A
Application granted
Publication of CN101430674B
Status: Expired - Fee Related

Landscapes

  • Computer And Data Communications (AREA)

Abstract

The invention provides a method for implementing communication in a distributed virtual machine monitor. The method uses an existing reliable transport protocol to achieve reliable and efficient communication among virtual machine monitors (VMMs). This provides the basis needed for resource integration: the upper-layer guest operating system sees a single physical node, and through it manages and uses the resources of multiple nodes. The method works mainly by extending the system's device-emulation component and combining it with a descriptor-ring mechanism. It comprises four steps: step one, a preparation phase; step two, a connection-establishment phase; step three, a data-transmission phase; and step four, a connection-closing phase. Because it builds on existing mature technology, the method is simple to implement and has good prospects for use and further development.

Description

An intraconnection communication method for a distributed virtual machine monitor
(1) Technical field
The present invention relates to an intraconnection communication method for a distributed virtual machine monitor. It belongs to computer-system virtualization technology. In particular, it relates to an internal-interconnect communication technique for a distributed virtual machine monitor system that virtualizes a server cluster. It falls within the field of computer technology.
(2) Background art
Virtualization technology
Single-machine virtualization is gradually maturing
Virtualization technology has attracted wide attention since it first appeared, and more and more vendors are now getting involved. Processor makers AMD and Intel, operating-system vendor Microsoft, server-system manufacturers, and numerous third-party software vendors have all responded enthusiastically. Virtualization technology is clearly developing rapidly.
Xen is an open-source virtual machine monitor (VMM) developed by the University of Cambridge Computer Laboratory. Multiple virtual machines can be created on it at the same time, each running its own operating system. The virtualization technique Xen uses is called para-virtualization. The VMM lets the guest operating system's ordinary instructions execute without intervention, but the operating system's sensitive instructions must be replaced by hypercalls. This greatly improves the system's isolation and performance. Its one drawback is that the user must modify the operating system. Since AMD and Intel introduced hardware-assisted virtualization support, Xen has also supported full virtualization and hardware-assisted virtualization.
VMware is the market leader and uses full virtualization. Under full virtualization the guest operating system runs directly on the virtual machine without any modification. Users can therefore run conventional, unmodified operating systems as guests.
Multi-machine virtualization takes the stage
As demand for computing resources grows, a single machine's resources can no longer meet users' needs. Breaking through the limits of a single host has become another important direction for virtualization technology.
The Virtual Multiprocessor project at the University of Tokyo, Japan, is a distributed VMM for IA-32 clusters. In this project a simplified operating system runs between the VMM and the hardware layer, and the host OS provides support through this layer. Virtual Multiprocessor uses para-virtualization: the guest operating system is modified so that it cooperates with Virtual Multiprocessor to do its work. Virtual Multiprocessor runs in user mode and its main functions rely on system calls to the host operating system, so it is inefficient. Its distributed VMMs communicate by calling the TCP protocol of the underlying host operating system, and the operating system handles inter-node communication.
vNUMA is a distributed VMM for IA-64 clusters developed by the University of New South Wales. It runs at the lowest layer, and the guest operating system (Linux) cooperates with vNUMA through para-virtualization. vNUMA's main goal is to provide transparent distributed shared memory for scientific computing. The vNUMA system uses a method called pre-virtualization, proposed jointly by the University of Karlsruhe in Germany, the University of New South Wales, and IBM. Pre-virtualization is a semi-automatic, tool-supported way of building the guest. With assembler support, the guest system's code is scanned and some of its privileged instructions are replaced statically. Instructions that cannot be handled statically are located dynamically by profiling and replaced by hand. The guiding idea is to use compilation tools to maximize the number of instructions that can run directly and minimize the number that must be emulated.
Cross-node communication systems
Communication between VMMs
Virtual Multiprocessor communication
The University of Tokyo's Virtual Multiprocessor project also implements a distributed virtual machine. In this project a simplified operating system, called the Host OS, runs between the VMM and the hardware layer. The VMMs communicate using their own application protocol, which sends and receives data through the Ethernet-based TCP/IP protocol stack in the Host OS. The advantage is that reusing an existing TCP/IP stack greatly simplifies implementation. The drawbacks are equally obvious: the approach needs the host OS's support, and communication happens at the application layer, so it is inefficient.
vNUMA communication
vNUMA is a distributed VMM for IA-64 clusters developed by the University of New South Wales. It is a VMM based on the Itanium architecture that runs directly on the hardware, and its low-level communication is implemented directly in hardware. It is not open source, however. Its advantage is the high efficiency of a hardware implementation. Its drawback is that it needs specialized hardware support, which makes it hard to implement.
Other efficient inter-node communication mechanisms
The VMMC communication mechanism
Virtual Memory-Mapped Communication (VMMC) is a communication mechanism based on virtual-memory mapping. It transfers data directly from the sender's virtual memory to the receiver's virtual memory and supports zero-copy transfer, but it requires dedicated hardware.
Active Messages
Active Messages was the first widely used ULN (User-Level Networking) mechanism. It was originally developed for parallel computers and used on a variety of network-interface hardware, and it later evolved into a kind of assembly language for communication. It needs no protocol support and sits close to the hardware, so it is efficient, but its deadlock handling and synchronization are relatively weak.
Fast Sockets
Fast Sockets uses the GAM (Globally Addressable Memory) interface and does not expose the underlying queue structure. It eliminates implicit execution of handlers. Instead, the consuming process provides buffer management for incoming messages and manages flow control to avoid deadlock. Fast Sockets is widely used for cluster interconnects and has driven the development of related protocols, for example for the node-contention problems caused by bulk transfers. It needs only a small amount of data copying, but it requires a dedicated platform interface.
In summary, there are currently two main kinds of inter-VMM communication. In VMMs based on a host OS, the protocol stack in the host OS carries out communication. In hardware-based VMMs, communication is built on proprietary communication hardware. The communication mechanism described in the present invention is based on Gigabit Ethernet and provides communication for VMMs that run directly on the hardware.
(3) Summary of the invention
The object of the present invention is to provide an intraconnection communication method for a distributed virtual machine monitor. The method mainly combines a ring mechanism with a reliable transport protocol, assisted by an efficient notification mechanism. Over a high-speed interconnected local area network, it provides reliable and efficient communication for the distributed virtual machine monitor and so integrates the cluster's resources.
The method of the present invention is based on a cluster system, that is, a multicomputer system connected by an external interconnection network. In such a system the physical resources are distributed across multiple nodes. Server resources are integrated by virtualizing each node's resources and having the modules cooperate, and the computers in the cluster cooperate by passing messages over the network. The goal of this patent is to provide reliable and efficient communication for a virtual machine monitor that uses virtualization to present symmetric multiprocessor (SMP) characteristics on a cluster system. A distributed virtual machine monitor is characterized by running directly on the physical hardware, without the help of an operating system.
On the physical structure of the cluster system, this patent provides a virtual machine monitor with SMP characteristics by deploying a VMM on each cluster node. A modified Linux operating system, called the device domain (Dom0), runs on the VMM and provides the reliable transport protocol stack. The VMM implements an efficient interaction mechanism with the device domain, through which it invokes the device domain's protocol stack. This yields reliable and efficient data transfer between VMMs, so that commercial operating systems that support the SMP architecture can run in this virtual machine without modification.
The VMM communication module provides reliable and efficient data transfer among the VMMs deployed on the physical nodes. This lets the VMMs integrate multiprocessor resources and present a single system image to the guest operating system.
The intraconnection communication method for a distributed virtual machine monitor of the present invention is implemented in the following steps:
Step 1: preparation phase:
When creating a virtual machine and allocating its memory, each node's VMM reserves space for the two rings, informs the virtual machine that those pages are unavailable, and initializes the request-ring and service-ring structures;
When starting the virtual machine, each node's VMM starts the device emulation module dm and maps the pages holding the two rings into the domain0 address space, for use by domain0 and by the dm module application running on domain0;
Each node's VMM initializes its event channels, which are managed by the VMM and exposed to domain0 for use;
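The preparation phase can be modelled in a few lines. The sketch below is a Python stand-in for what would be VMM C code. It assumes the two ring frames are the two highest guest frames and that each ring holds 64 slots; neither detail is given in the text, and `NodePrep`, `Ring`, and the field names are hypothetical. It shows the three effects of Step 1: frames hidden from the guest, ring structures initialized, and the frames mapped for domain0.

```python
from dataclasses import dataclass, field


@dataclass
class Ring:
    """A descriptor ring living in one reserved guest frame (toy model)."""
    slots: list
    read: int = 0   # consumer pointer
    write: int = 0  # producer pointer


@dataclass
class NodePrep:
    total_pages: int                                # guest memory size in frames
    ring_slots: int = 64                            # assumed ring capacity
    reserved: dict = field(default_factory=dict)    # ring name -> guest frame number
    rings: dict = field(default_factory=dict)       # ring name -> Ring
    dom0_map: dict = field(default_factory=dict)    # ring name -> frame mapped into domain0
    event_pending: int = 0                          # one pending bit per event channel

    def prepare(self):
        # 1. While allocating guest memory, reserve the two highest frames for the rings
        for i, name in enumerate(("request", "service")):
            self.reserved[name] = self.total_pages - 2 + i
            self.rings[name] = Ring(slots=[None] * self.ring_slots)  # empty ring
        # 2. When dm starts, map the ring frames into domain0's address space
        self.dom0_map = dict(self.reserved)
        # 3. Initialize the VMM-owned event channels with nothing pending
        self.event_pending = 0
        return self

    def guest_visible_pages(self):
        # The guest is told that the ring frames are unavailable
        hidden = set(self.reserved.values())
        return [pfn for pfn in range(self.total_pages) if pfn not in hidden]
```

For example, `NodePrep(total_pages=16).prepare()` leaves frames 14 and 15 out of the guest's view and maps them for domain0.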
Step 2: connection-establishment phase:
When the guest operating system starts:
1. The management module starts the guest operating system. This first traps into the VMM, which allocates core resources to the guest operating system and then starts the qemu-dm module that provides device emulation for it;
The communication module we implemented inside the qemu-dm module is started:
To raise the data transfer rate, three connections are used: one for communication between vcpus on different physical nodes, one for remote I/O device access, and one for remote memory access;
The qemu-dm module runs in the application layer of domain0. Each node reads parameters, such as node IP addresses, from the guest operating system's configuration file and passes them on. The start node creates a socket and begins listening for connections; each non-start node creates a socket and then initiates a connection request. This process is blocking: execution continues only after the connections have been established;
Through system calls, execution enters domain0's kernel mode, where the TCP/IP protocol stack establishes the connections. The descriptors of the established connections are saved in a connection array for use when sending data;
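The connection set-up in Step 2 can be sketched as a planning function. The text does not fix the multi-node topology, so the sketch assumes that each non-start node opens one TCP connection per channel to the start node, which is the only listener. `Channel`, `plan_connections`, and `fill_connection_array` are hypothetical names. Real code would call `socket.listen`/`accept` or `connect` where this sketch only lists the slots to fill.

```python
from enum import Enum


class Channel(Enum):
    VCPU = "vcpu"  # vcpu-to-vcpu traffic between physical nodes
    IO = "io"      # remote I/O device access
    MEM = "mem"    # remote memory access


def plan_connections(node_id, start_node, node_ids):
    """Return this node's role and the (peer, channel) slots it must fill.

    Assumption: non-start nodes connect only to the start node, one
    connection per channel; the start node accepts from every peer.
    """
    if node_id == start_node:
        peers = [n for n in node_ids if n != start_node]
        return "listen", [(peer, ch) for peer in peers for ch in Channel]
    return "connect", [(start_node, ch) for ch in Channel]


def fill_connection_array(slots, open_fd):
    # Blocking set-up: every slot must hold a descriptor before the node proceeds
    return {slot: open_fd(slot) for slot in slots}
```

In a three-node system the start node fills six slots (three channels for each of two peers), while each other node fills three.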
Step 3: data-transmission phase:
When the guest operating system takes a page fault or accesses an I/O device, it first traps into the VMM;
The VMM analyzes the cause of the trap and creates the corresponding type of request. It maps the address of the data to be sent into domain0, adds the request to the request ring via the write pointer, and finally sets the event channel;
The VMM wakes the communication thread in the qemu-dm module through the event channel, and the thread begins polling the request ring. Each request whose data must be sent is taken out via the read pointer, a packet is built from it, and an application-layer header is added to the packet;
The data is passed into the TCP/IP protocol stack in domain0 and sent out through the network interface card;
When the destination node's network card receives the data, it raises an interrupt and hands the data to the TCP/IP protocol stack. The stack strips the headers of the transport layer and below and saves the data. It stores the data's address in a ring descriptor, puts the descriptor into the service ring via the service ring's write pointer, and then notifies the VMM through the event channel;
The VMM reads the service ring, obtains the address from the descriptor, and maps the data into the guest operating system, which completes the reception.
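The data path of Step 3 can be traced end to end in a single-threaded model. Deques stand in for the two rings, a dictionary of flags stands in for the event channels, and a list stands in for the TCP/IP stack and the NIC. The 7-byte header layout (type, destination node, length) and the trap-reason codes are assumptions made for illustration, and every function name is hypothetical. The sketch also assumes that the receiving communication thread strips the application header.

```python
import struct
from collections import deque

APP_HEADER = struct.Struct("!BHI")                  # assumed layout: type, dest node, length
TRAP_TO_TYPE = {"page_fault": 1, "io_access": 2}    # hypothetical request types


def vmm_trap(request_ring, events, reason, dest_node, payload):
    # VMM: classify the trap, queue a request via the write pointer, set the event channel
    request_ring.append((TRAP_TO_TYPE[reason], dest_node, payload))
    events["request"] = True


def comm_thread_send(request_ring, events, wire):
    # qemu-dm communication thread: woken by the event, drains the ring via the read pointer
    if not events.pop("request", False):
        return 0
    sent = 0
    while request_ring:
        mtype, dest_node, payload = request_ring.popleft()
        frame = APP_HEADER.pack(mtype, dest_node, len(payload)) + payload
        wire.append(frame)  # stands in for the domain0 TCP/IP stack and the NIC
        sent += 1
    return sent


def receive(wire, service_ring, events):
    # Destination node: strip the application header, queue a descriptor, notify the VMM
    for frame in wire:
        mtype, _dest, length = APP_HEADER.unpack_from(frame)
        service_ring.append((mtype, frame[APP_HEADER.size:APP_HEADER.size + length]))
    wire.clear()
    events["service"] = True


def vmm_service(service_ring, events):
    # VMM: read the service ring and hand each payload to the guest
    events.pop("service", None)
    delivered = []
    while service_ring:
        delivered.append(service_ring.popleft())
    return delivered
```

Calling `vmm_trap`, `comm_thread_send`, `receive`, and `vmm_service` in that order replays the trap, queue, notify, send, receive, and deliver sequence described above.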
Step 4: connection-closing phase:
The guest operating system shuts down;
The start node's resources are released, and remote resources are released through the communication module;
Check whether the data in the rings has been fully processed;
Close the connections;
Release the rings and the communication-related resources;
The communication thread service exits.
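The value of Step 4 lies in the ordering of the teardown. The sketch below encodes that order and refuses to close the connections while either ring still holds descriptors. Raising an error at that point is an assumption; a real implementation might wait for the rings to drain instead. `CommModule`, `FakeConn`, and the log strings are hypothetical.

```python
class FakeConn:
    """Stand-in for a TCP connection descriptor."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class CommModule:
    def __init__(self, connections):
        self.connections = dict(connections)  # (peer, channel) -> connection
        self.request_ring, self.service_ring = [], []
        self.thread_running = True
        self.log = []

    def shutdown(self, release_remote):
        # Called after the guest operating system has shut down
        release_remote()                      # free local and remote resources
        self.log.append("resources released")
        if self.request_ring or self.service_ring:
            raise RuntimeError("rings still hold unprocessed descriptors")
        self.log.append("rings drained")
        for conn in self.connections.values():
            conn.close()
        self.log.append("connections closed")
        self.connections.clear()
        self.request_ring = self.service_ring = None  # release the rings
        self.log.append("rings released")
        self.thread_running = False                   # communication thread exits
        self.log.append("thread exited")
```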
The advantages and effects of the intraconnection communication method of the present invention are as follows. It reuses an existing reliable transport protocol stack and combines the dual rings with mechanisms such as event channels to carry out communication among distributed VMMs. In this way it improves the efficiency and scalability of communication in a distributed VMM system. Because it innovates on a base of existing mature technology, it is easier to implement and has good prospects for use and development.
(4) Description of drawings
Fig. 1: Overall structure of the DVMM system
Fig. 2: Module diagram of a two-node system
Fig. 3: Overall communication architecture
Fig. 4: Two-node communication model
Fig. 5: Descriptor ring structure
Fig. 6: Detailed communication process
(5) Embodiments
Referring to Figs. 1 to 5, the intraconnection communication method for a distributed virtual machine monitor of the present invention is described as follows.
1. Overview of the method
The present invention is based on a cluster system, that is, a multicomputer system connected by an external interconnection network. In such a system the physical resources are distributed across multiple nodes. Server resources are integrated by virtualizing each node's resources and having the modules cooperate, and the computers in the cluster cooperate by passing messages over the network. The goal of this patent is to provide reliable and efficient communication for a virtual machine monitor that uses virtualization to present symmetric multiprocessor (SMP) characteristics on a cluster system. A distributed virtual machine monitor is characterized by running directly on the physical hardware, without the help of an operating system.
On the physical structure of the cluster system, this patent provides a virtual machine monitor with SMP characteristics by deploying a VMM on each cluster node. A modified Linux operating system, called the device domain (Dom0), runs on the VMM and provides the reliable transport protocol stack. The VMM implements an efficient interaction mechanism with the device domain, through which it invokes the device domain's protocol stack. This yields reliable and efficient data transfer between VMMs, so that commercial operating systems that support the SMP architecture can run in this virtual machine without modification.
The VMM communication module provides reliable and efficient data transfer among the VMMs deployed on the physical nodes. This lets the VMMs integrate multiprocessor resources and present a single system image to the guest operating system.
2. Characteristics of distributed VMM communication
The distributed VMM system runs directly on the physical hardware. It manages and integrates hardware resources and serves the operating system running above it. Communication is part of the distributed VMM system and serves the system's other functions.
The VMM itself has no reliable transport-layer protocol and does not manage external devices such as the Ethernet NIC; both belong to the device domain. To achieve reliable communication between VMMs over Ethernet, the VMM must therefore implement an efficient interaction mechanism with the device domain so that it can use the device domain's protocol stack. This makes full use of the device domain, avoids implementing a complex transport-layer protocol and NIC drivers in the VMM, and makes the VMM more extensible.
3. System architecture
In functional order, VMM communication is divided into the following modules:
Module one: VMM communication preprocessing.
When the guest OS issues a remote request, it traps into the VMM. The VMM analyzes the cause of the trap, distinguishes the different kinds of communication, and adds a different header for each kind, forming the application-layer header. In the DVMM system, the main kinds of communication are:
  • IPI (inter-processor interrupt): each message carries the contents of one register.
  • IOREQ (remote device access): each message carries an IOREQ and control information, at most 100 bytes.
  • DSM (distributed shared memory): each message carries one page of data.
  • Remote I/O operation: each message carries one instruction.
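The four kinds of communication can be distinguished by a small application-layer header. The text gives only the 100-byte IOREQ bound and the one-page DSM size. The rest of this sketch is assumed: the header layout (type, padding, source node, payload length), a 4 KiB page, an 8-byte IPI register, and a 15-byte limit for one remote I/O instruction (the maximum x86 instruction length). `MsgType`, `pack`, and `unpack` are hypothetical names.

```python
import struct
from enum import IntEnum

PAGE_SIZE = 4096                   # assumed page size
HEADER = struct.Struct("!BxHI")    # assumed: type, pad byte, source node, payload length


class MsgType(IntEnum):
    IPI = 1        # contents of one register
    IOREQ = 2      # ioreq plus control info
    DSM = 3        # one page of data
    REMOTE_IO = 4  # one instruction


MAX_PAYLOAD = {
    MsgType.IPI: 8,            # assumption: one 64-bit register
    MsgType.IOREQ: 100,        # from the text: at most 100 bytes
    MsgType.DSM: PAGE_SIZE,    # from the text: exactly one page
    MsgType.REMOTE_IO: 15,     # assumption: longest x86 instruction
}


def pack(msg_type, src_node, payload):
    limit = MAX_PAYLOAD[msg_type]
    if msg_type == MsgType.DSM and len(payload) != limit:
        raise ValueError("DSM messages carry exactly one page")
    if len(payload) > limit:
        raise ValueError(f"{msg_type.name} payload exceeds {limit} bytes")
    return HEADER.pack(msg_type, src_node, len(payload)) + payload


def unpack(frame):
    msg_type, src_node, length = HEADER.unpack_from(frame)
    return MsgType(msg_type), src_node, frame[HEADER.size:HEADER.size + length]
```

With this layout the header adds 8 bytes to every message, so a full IOREQ occupies 108 bytes on the wire.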
Module two: interaction between the VMM and the communication thread in the DM module on dom0.
At the two ends of the communication, the VMM runs in kernel mode, directly on the hardware, while the qemu-dm module runs on domain0 in user mode. The two must exchange data, and a dual-ring mechanism consisting of a request ring and a service ring is introduced for this purpose. Because the data sent or received is usually large, the VMM and Dom0 do not exchange the data itself but pass it by reference. The rings hold control structures that describe the data and are therefore called descriptor rings. The VMM stores the address of data to be sent in a ring, or Dom0 stores the address of data it has received. The page holding the data is granted to Dom0 through the grant table in the VMM, and Dom0 completes the transfer by mapping it. This reduces data copying, improves efficiency, and also increases the effective capacity of the rings.
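Passing data by reference through a grant table can be illustrated with a toy table. The sketch is modelled loosely on Xen's grant tables but is not Xen's interface. `GrantTable`, `grant`, `map`, `end_access`, and `dom0_view` are hypothetical, and Dom0 is assumed to be domain 0. A ring descriptor would carry the returned integer reference instead of the data. A Python `memoryview` stands in for mapping the page, so Dom0 reads the guest frame in place without copying it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrantEntry:
    domid: int       # domain allowed to map the frame (Dom0 here)
    frame: int       # guest frame number holding the data
    readonly: bool


class GrantTable:
    """Toy grant table kept by the VMM."""

    def __init__(self):
        self._entries = {}
        self._next_ref = 0

    def grant(self, domid, frame, readonly=True):
        ref = self._next_ref
        self._next_ref += 1
        self._entries[ref] = GrantEntry(domid, frame, readonly)
        return ref   # this reference, not the data, goes into the ring descriptor

    def map(self, ref, domid):
        entry = self._entries.get(ref)
        if entry is None or entry.domid != domid:
            raise PermissionError(f"domain {domid} has no grant for ref {ref}")
        return entry.frame

    def end_access(self, ref):
        del self._entries[ref]


def dom0_view(grants, ref, guest_frames, dom0_id=0):
    # Dom0 resolves the reference from a descriptor and views the frame in place
    return memoryview(guest_frames[grants.map(ref, dom0_id)])
```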
The grant table implements data sharing between different domains. In DVMM, the guest operating system and Dom0 can be regarded as two peer domains, and the VMM must guarantee the independence and security of both. Communication, however, requires Dom0 to send data from the GOS, or to receive data and provide it to the GOS. To reduce data copying, the two must share data.
Clearly, both the VMM and Dom0 must be able to access the request ring and the service ring. More importantly, the rings must reside in a secure location where other modules cannot modify them. For these reasons, the send and receive rings are reserved from the guest OS's kernel pages, and the guest OS is prevented from using those pages. They are then made available to Dom0 by mapping. The VMM manages the guest OS's pages, so it can use the rings as well.
As shown in Fig. 5, the request ring and the service ring have the same structure. Each ring has two pointers, a read pointer and a write pointer, and is shared by the VMM and dom0: one end reads and the other end writes. When the VMM needs to communicate, it puts a request into the request ring via the write pointer. Dom0 at the other end reads the request via the read pointer and sends the data. For the service ring, after Dom0 receives data it creates a new ring entry and puts it into the service ring via the write pointer. The VMM then reads the entry via the read pointer and handles it.
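A ring with one read pointer and one write pointer, each advanced by only one side, can be sketched as follows. The sketch assumes free-running pointers, a power-of-two size, and slot selection by masking; this is a common design, also used in Xen's shared rings, but the text does not specify it. The number of pending entries is simply `write - read`. Python integers never overflow, whereas a C version would rely on unsigned wraparound. `DescriptorRing` is a hypothetical name.

```python
class DescriptorRing:
    """Single-producer/single-consumer ring with free-running read/write pointers."""

    def __init__(self, size=8):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.slots = [None] * size
        self.read = 0    # advanced only by the consumer
        self.write = 0   # advanced only by the producer

    def __len__(self):
        return self.write - self.read   # pointer difference = pending entries

    def put(self, desc):
        if len(self) == self.size:
            return False                # full; the producer must retry later
        self.slots[self.write & (self.size - 1)] = desc
        self.write += 1                 # publish only after the slot is filled
        return True

    def get(self):
        if self.read == self.write:
            return None                 # empty
        desc = self.slots[self.read & (self.size - 1)]
        self.read += 1
        return desc
```

The VMM would be the producer on the request ring and the consumer on the service ring, with Dom0 in the opposite roles.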
The event channel mechanism coordinates the VMM and Dom0. In the DVMM system, communication is asynchronous to improve efficiency: the VMM notifies the server in dom0 to handle requests through the event channel mechanism. In this way the VMM keeps communication efficient without hurting its own performance.
In essence, an event channel is a set of mask bits, each bit representing one channel, shared by the VMM and Dom0. An event is raised for Dom0 when, for example, the VMM initiates a communication request. Once the data is ready, the VMM sets the corresponding event-channel bit, and Dom0 learns by querying that a data request needs handling. Dom0 can also modify the bits, but because the VMM maintains them, Dom0 must do so through a hypercall.
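An event channel as a bit array can be modelled in a few lines. This is a toy model, not Xen's event-channel ABI. The 32-channel width and all method names are assumptions. `hypercall_clear` marks the one operation through which Dom0 may change the VMM-owned bits.

```python
class EventChannels:
    """Pending-bit array owned by the VMM (toy model)."""

    def __init__(self, n=32):
        self.n = n
        self.pending = 0

    def notify(self, port):
        # VMM side: set the channel's bit once the data is ready
        self._check(port)
        self.pending |= 1 << port

    def poll(self):
        # Dom0 side: query which channels have pending events
        return [p for p in range(self.n) if (self.pending >> p) & 1]

    def hypercall_clear(self, port):
        # Dom0 may modify the bits only by asking the VMM through a hypercall
        self._check(port)
        self.pending &= ~(1 << port)

    def _check(self, port):
        if not 0 <= port < self.n:
            raise ValueError("no such event channel")
```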
Module three: the communication thread server implemented in the qemu-dm module
In the DVMM system, the qemu-dm module itself is responsible for I/O emulation. Our extension of qemu-dm makes it also responsible for reliable message transfer between nodes. The communication thread mainly handles connection establishment, polling of the request and service rings, and sending and receiving data.
When the guest OS is created, qemu-dm establishes TCP connections to the corresponding qemu-dm instances on the other nodes. The communication thread polls both the TCP sockets and the request ring. When it finds that a message is destined for qemu-dm, it passes the message to qemu-dm. When it finds that a message is destined for the VMM, it puts the message directly into the service ring and uses the event channel to notify the VMM of its arrival.
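The routing decision for incoming messages reduces to a two-way dispatch. The sketch assumes that each message carries a destination tag, `"qemu-dm"` or `"vmm"`, decoded from the application header. The tag values, `Message`, and `dispatch` are hypothetical. The `notify_vmm` callback stands in for setting the event channel.

```python
from dataclasses import dataclass


@dataclass
class Message:
    dest: str    # hypothetical tag from the app header: "qemu-dm" or "vmm"
    body: bytes


def dispatch(msg, qemu_inbox, service_ring, notify_vmm):
    if msg.dest == "qemu-dm":
        qemu_inbox.append(msg.body)      # handled inside qemu-dm itself
    elif msg.dest == "vmm":
        service_ring.append(msg.body)    # queue for the VMM ...
        notify_vmm()                     # ... and signal it via the event channel
    else:
        raise ValueError(f"unknown destination {msg.dest!r}")
```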
When the VMM needs to transmit information, it puts the destination and the message into the request ring and notifies the communication thread through the event channel. The communication thread takes the message out of the request ring, selects the TCP connection for the destination node, and sends the message to that node through the socket.
The communication thread performs a simple interpretation of each message. When it finds that the VMM needs to send a page, it first maps that page into its own address space, then takes the data directly from the page and writes it to the corresponding socket. Receiving works the same way in reverse. When the communication thread finds that remote page data has arrived, it first maps the destination page into its own linear address space, then reads the page data from the socket directly into the destination guest page.
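The "map the page, then read or write the socket directly" pattern has a rough Python analogue. A `bytearray` stands in for the mapped guest page, and `memoryview` with `socket.recv_into` fills the destination page in place, with no intermediate buffer in user code. The kernel still copies between the socket buffer and the page, so this is an analogy to the patent's mapping, not true zero copy. A 4 KiB page is assumed, and `send_page` and `recv_page_into` are hypothetical names.

```python
import socket

PAGE_SIZE = 4096  # assumed page size


def send_page(sock, page):
    # Send straight from the (mapped) page; memoryview avoids an extra bytes copy
    sock.sendall(memoryview(page))


def recv_page_into(sock, dest_page):
    # Read straight into the destination page, looping until the page is full
    view = memoryview(dest_page)
    got = 0
    while got < PAGE_SIZE:
        n = sock.recv_into(view[got:], PAGE_SIZE - got)
        if n == 0:
            raise ConnectionError("peer closed mid-page")
        got += n
```

A connected `socket.socketpair()` is enough to try it locally: send a 4096-byte `bytearray` on one end and receive it into another on the other end.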
In a multi-node system each communication thread may maintain several TCP connections, so it is also responsible for forwarding, or routing, packets between the request ring and the TCP sockets. The VMM communicates only by node number, and the IP address that corresponds to a node number is transparent to the VMM. The communication thread must therefore send each packet the VMM wants to transmit over the TCP connection that corresponds to the IP address of its destination node.
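The node-number indirection can be sketched as a small routing table kept by the communication thread. Keying connections by (IP, channel) assumes the three-connections-per-peer scheme from Step 2. `Router`, `attach`, and `route` are hypothetical names. The VMM passes only a node number and never sees the IP address.

```python
class Router:
    """Resolves VMM node numbers to IP addresses and open connections."""

    def __init__(self, node_ips):
        self.node_ips = dict(node_ips)  # node number -> IP (e.g. from the guest config)
        self.conns = {}                 # (IP, channel) -> connection-like object

    def attach(self, ip, channel, conn):
        self.conns[(ip, channel)] = conn

    def route(self, dest_node, channel, packet):
        # The VMM names only dest_node; the IP lookup is private to this thread
        try:
            ip = self.node_ips[dest_node]
            conn = self.conns[(ip, channel)]
        except KeyError:
            raise LookupError(f"no connection for node {dest_node}/{channel}") from None
        conn.send(packet)
        return ip
```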
Module four: data transmission by the protocol stack in domain0
The protocol stack and network device driver provided in Dom0: in this VMM, dom0 manages the external devices, so we can only reuse the network device driver and the reliable transport protocol (TCP) that dom0 provides. TCP is connection-oriented and provides retransmission and related mechanisms itself, so it supports reliable communication between VMMs.
By combining these mechanisms, the communication system provides simple, reliable, and efficient data transfer for the DVMM system.
As shown in Fig. 6, when the guest OS needs a remote page or remote I/O access, (1) it traps into the VMM. (2) According to the cause of the trap, the VMM puts the request for the data to be transmitted into the send ring. (3) The qemu-dm module, polling at the application layer, picks up the request. (4) Finding data to send, it performs application-layer processing and calls the protocol stack in dom0, and (5) the NIC sends the data onto the Ethernet. On the receiving side, (6) the NIC raises an interrupt on receiving the data, and the protocol stack receives it. The data is then (7) handed to application-layer processing and (8) put into the receive ring, (9) the VMM is notified, and (10) the VMM serves the guest OS. This completes the whole send and receive process.
4. System workflow
Initialization phase:
Initialization of the communication module
The communication protocol follows a connection-oriented client/server (C/S) model, so the nodes are not symmetric during initialization and fall into two classes. The system chooses one node as the start node, and the remaining nodes are non-start nodes. The start node creates a socket, begins listening on a port, and accepts connections as the server. Each non-start node creates a socket and then calls connect. The connections are saved in an array.
Initialization of the rings:
The two descriptor rings are implemented in reserved pages of the guest operating system. When the guest operating system starts, these pages are zeroed and the read and write pointers are set, both pointing to the start of the ring. Whether the ring holds data is determined from the difference between the two pointers.
Setting up the service:
The DVMM system extends the qemu-dm module with two communication threads. One sends data; it polls the request ring and processes the requests it finds. The other receives data; the protocol stack in device domain0 drives it, and after the stack receives data the thread puts the data into the service ring for the VMM to handle. Both threads are created while the qemu-dm module starts up.
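The two-thread design can be sketched with Python's `threading` module. `threading.Event` objects stand in for the event channels, deques stand in for the rings, a list stands in for TCP sends, and a `queue.Queue` stands in for the protocol stack's receive path. All class and attribute names are hypothetical. One design point deserves mention: the sender reads the stop flag before draining the ring, so requests submitted before `stop()` are never lost.

```python
import queue
import threading
from collections import deque


class CommThreads:
    """Sender/receiver thread pair added to qemu-dm (threaded toy model)."""

    def __init__(self):
        self.request_ring = deque()          # VMM -> sender thread
        self.service_ring = deque()          # receiver thread -> VMM
        self.wire = []                       # stands in for TCP sends
        self.inbound = queue.Queue()         # stands in for the stack's receive path
        self.req_event = threading.Event()   # event channel: VMM -> sender
        self.svc_event = threading.Event()   # event channel: receiver -> VMM
        self._stopping = threading.Event()
        self._threads = [threading.Thread(target=self._send_loop),
                         threading.Thread(target=self._recv_loop)]

    def start(self):
        for t in self._threads:
            t.start()

    def vmm_submit(self, request):
        self.request_ring.append(request)    # write pointer advances
        self.req_event.set()                 # then notify the sender

    def _send_loop(self):
        while True:
            self.req_event.wait()
            self.req_event.clear()
            stopping = self._stopping.is_set()  # read before draining (see lead-in)
            while self.request_ring:
                self.wire.append(self.request_ring.popleft())
            if stopping:
                return

    def _recv_loop(self):
        while (frame := self.inbound.get()) is not None:
            self.service_ring.append(frame)
            self.svc_event.set()             # notify the VMM

    def stop(self):
        self._stopping.set()
        self.req_event.set()
        self.inbound.put(None)               # sentinel ends the receive loop
        for t in self._threads:
            t.join()
```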
Normal operation phase:
The distributed VMM integrates the resources of multiple nodes, and the communication system provides module-to-module communication between VMMs on different nodes.
Connection establishment: the nodes are connected with sockets. The start node is the server and the other nodes are clients.
Data sending: when the VMM has data to send, it places the request in the request ring and notifies the qemu-dm module, which sends the data through the TCP/IP protocol stack.
Data receiving: after the TCP/IP protocol stack receives data, the data is put into the service ring and handled by the VMM.
The concrete implementation steps are described in detail below with reference to the accompanying drawings:
Step 1: preparation phase:
When creating a virtual machine and allocating its memory, each node's VMM reserves space for the two rings, informs the virtual machine that those pages are unavailable, and initializes the request-ring and service-ring structures;
When starting the virtual machine, each node's VMM starts the device emulation module dm and maps the pages holding the two rings into the domain0 address space, for use by domain0 and by the dm module application running on domain0;
Each node's VMM initializes its event channels, which are managed by the VMM and exposed to domain0 for use.
Step 2, connection establishment phase:
When the guest operating system starts:
1. The management module starts the guest operating system, which first traps into the VMM; after allocating core resources to the guest operating system, the VMM starts the qemu-dm module, which provides device emulation for the guest operating system;
Start the communication module that we implemented in the qemu-dm module:
To improve the data transmission rate, we use three connections for communication: one between vCPUs on different physical nodes, one for remote I/O device access, and one for remote memory access;
The qemu-dm module runs in the application layer of domain0. Each node reads parameters, such as the node IP address, from the configuration file of the guest operating system and passes them in. The starting node creates a socket and begins listening for connections, while each non-starting node creates a socket and then initiates a connection request. This process is blocking, that is, execution continues only after the connection has been established;
Through system calls, execution enters the kernel mode of domain0, where the TCP/IP protocol stack completes the connection establishment; the descriptor of each established connection is saved in the connection array for use when sending data.
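A hedged sketch of the connection set-up in Step 2, assuming ordinary blocking BSD sockets over IPv4. The starting node listens and accepts. Each non-starting node opens the three connections with blocking `connect()`. The descriptors go into a connection array indexed by connection kind. The enum `conn_kind` and the functions `node_listen`, `node_accept_all` and `node_connect_all` are hypothetical names. The `ip` argument corresponds to the node IP address read from the guest configuration file.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* The three connections named in the text, used as connection-array indices. */
enum conn_kind { CONN_VCPU, CONN_REMOTE_IO, CONN_REMOTE_MEM, NCONN };

static int make_addr(const char *ip, uint16_t port, struct sockaddr_in *a) {
    memset(a, 0, sizeof *a);
    a->sin_family = AF_INET;
    a->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &a->sin_addr) == 1 ? 0 : -1;
}

/* Starting node: create a listening socket (port 0 lets the kernel choose). */
int node_listen(const char *ip, uint16_t port, uint16_t *bound_port) {
    struct sockaddr_in a;
    socklen_t alen = sizeof a;
    if (make_addr(ip, port, &a) < 0)
        return -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
    if (bind(fd, (struct sockaddr *)&a, sizeof a) < 0 || listen(fd, 8) < 0 ||
        getsockname(fd, (struct sockaddr *)&a, &alen) < 0) {
        close(fd);
        return -1;
    }
    *bound_port = ntohs(a.sin_port);
    return fd;
}

/* Starting node: block in accept() until all NCONN connections arrive. */
int node_accept_all(int lfd, int conn[NCONN]) {
    for (int i = 0; i < NCONN; i++) {
        conn[i] = accept(lfd, NULL, NULL);
        if (conn[i] < 0) {
            while (i-- > 0)
                close(conn[i]);
            return -1;
        }
    }
    return 0;
}

/* Non-starting node: one blocking connect() per connection kind. */
int node_connect_all(const char *ip, uint16_t port, int conn[NCONN]) {
    struct sockaddr_in a;
    if (make_addr(ip, port, &a) < 0)
        return -1;
    for (int i = 0; i < NCONN; i++) {
        conn[i] = socket(AF_INET, SOCK_STREAM, 0);
        if (conn[i] < 0 || connect(conn[i], (struct sockaddr *)&a, sizeof a) < 0) {
            if (conn[i] >= 0)
                close(conn[i]);
            while (i-- > 0)
                close(conn[i]);
            return -1;
        }
    }
    return 0;
}
```

This sketch pairs accepted descriptors with connection kinds purely by arrival order, which is fragile. A more robust variant would have each client send a short hello message naming its `conn_kind` right after `connect()` returns.

Keeping vCPU traffic, remote I/O and remote memory on separate TCP streams plausibly stops large page transfers from delaying small vCPU messages. The source itself only says that the split improves the data transmission rate.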
Step 3, data transmission process:
When a page fault or an I/O device access occurs in the guest operating system, it first traps into the VMM;
The VMM analyzes the cause of the trap and creates a request of the corresponding type, maps the address of the data to be sent into domain0, adds the request to the request ring through the write pointer, and finally sets the event channel;
The communication thread in the qemu-dm module is woken up by the VMM through the event channel and begins polling the request ring. When there is data to send, it takes the request out through the read pointer, builds a data packet according to the request, and adds an application-layer header to the packet;
The data is passed to the TCP/IP protocol stack in domain0 and sent out through the network card;
After the destination node receives the data, the network card interrupt hands it to the TCP/IP protocol stack. The protocol stack saves the data after removing the headers below the transport layer, stores its address in a ring descriptor, places the descriptor in the service ring through the service-ring write pointer, and then notifies the VMM through the event channel to process it;
The VMM reads the service ring, obtains the address from the descriptor, and maps the data into the guest operating system, which completes the data reception.
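The Step 3 data path can be followed end to end in the sketch below. It makes several simplifying assumptions, none of which come from the patent:

- **Transport.** A single file descriptor stands in for the TCP connection.
- **Descriptors.** Each descriptor carries a copy of the data instead of a mapped address.
- **Event channel.** A plain `evt` flag stands in for it.
- **Header layout.** The 8-byte application-layer header (type, then length, both big-endian) is invented for illustration. The source does not define the header format.
- **Hypothetical names.** `tx_drain` and `rx_one` are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define RING_SIZE 4   /* hypothetical; power of two */
#define HDR_LEN 8     /* hypothetical app header: type(4) + len(4), big-endian */
#define MAX_DATA 64

enum req_type { REQ_MEM_FETCH = 1, REQ_IO = 2 };  /* chosen from the trap cause */

struct desc { uint32_t type; uint32_t len; uint8_t data[MAX_DATA]; };
struct ring { uint32_t wr, rd; struct desc slot[RING_SIZE]; bool evt; };

/* Add an entry through the write pointer, then "set the event channel". */
static bool ring_put(struct ring *r, uint32_t type, const void *p, uint32_t len) {
    if (r->wr - r->rd == RING_SIZE || len > MAX_DATA)
        return false;
    struct desc *d = &r->slot[r->wr % RING_SIZE];
    d->type = type;
    d->len = len;
    memcpy(d->data, p, len);
    r->wr++;
    r->evt = true;
    return true;
}

/* Take the oldest entry through the read pointer. */
static bool ring_get(struct ring *r, struct desc *out) {
    if (r->rd == r->wr)
        return false;
    *out = r->slot[r->rd % RING_SIZE];
    r->rd++;
    return true;
}

static void put32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
}
static uint32_t get32(const uint8_t *p) {
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}

/* Loop over short writes/reads; a production version would also retry on EINTR. */
static int write_all(int fd, const uint8_t *p, size_t n) {
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w <= 0) return -1;
        p += w; n -= (size_t)w;
    }
    return 0;
}
static int read_all(int fd, uint8_t *p, size_t n) {
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r <= 0) return -1;
        p += r; n -= (size_t)r;
    }
    return 0;
}

/* Sender thread: consume the notification, poll the request ring until empty,
 * prefix each entry with the app header, and write it out. Returns count sent. */
int tx_drain(struct ring *req, int fd) {
    struct desc d;
    int sent = 0;
    req->evt = false;
    while (ring_get(req, &d)) {
        uint8_t pkt[HDR_LEN + MAX_DATA];
        put32(pkt, d.type);
        put32(pkt + 4, d.len);
        memcpy(pkt + HDR_LEN, d.data, d.len);
        if (write_all(fd, pkt, HDR_LEN + d.len) < 0)
            return -1;
        sent++;
    }
    return sent;
}

/* Receiver: read one packet, strip the app header, and place the payload in the
 * service ring (ring_put also sets svc->evt to notify the VMM). */
int rx_one(int fd, struct ring *svc) {
    uint8_t hdr[HDR_LEN], data[MAX_DATA];
    if (read_all(fd, hdr, HDR_LEN) < 0)
        return -1;
    uint32_t type = get32(hdr), len = get32(hdr + 4);
    if (len > MAX_DATA || read_all(fd, data, len) < 0)
        return -1;
    return ring_put(svc, type, data, len) ? 0 : -1;
}
```

In this sketch, the VMM side of the last step is simply `ring_get` on the service ring. The real system would map the data into the guest rather than copy it.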
Step 4, connection closing phase:
The guest operating system shuts down;
Release the resources of the starting node and release remote resources through the communication module;
Check whether the data in the rings has been fully processed;
Close the connections;
Release the rings and the communication-related resources;
The communication thread service exits.
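The ordering of Step 4 can be expressed as a single teardown routine: drain the rings, close the connections, free the rings, then stop the thread. The sketch below rests on several assumptions:

- **Hypothetical names.** `comm_state`, `comm_shutdown` and `drain_one_entry` are illustrative only.
- **Omitted step.** The remote-resource release message has been left out.
- **Draining.** Each call to the drain callback is assumed to handle at least one pending entry.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define NCONN 3   /* vCPU, remote I/O, remote memory */

/* Hypothetical per-guest communication state on one node. */
struct comm_state {
    uint32_t req_wr, req_rd;   /* request ring pointers */
    uint32_t svc_wr, svc_rd;   /* service ring pointers */
    int conn[NCONN];           /* saved connection descriptors, -1 if unused */
    void *rings;               /* memory backing both rings */
    bool thread_running;
};

typedef void (*drain_fn)(struct comm_state *s);

/* Example drain step: handles one pending entry by advancing its read pointer.
 * A real handler would also send or deliver the entry's data. */
void drain_one_entry(struct comm_state *s) {
    if (s->req_rd != s->req_wr)
        s->req_rd++;
    else if (s->svc_rd != s->svc_wr)
        s->svc_rd++;
}

/* Teardown after the guest OS has shut down and resources have been released. */
void comm_shutdown(struct comm_state *s, drain_fn drain) {
    /* Check that the data in both rings has been processed. */
    while (s->req_wr != s->req_rd || s->svc_wr != s->svc_rd)
        drain(s);
    /* Close the connections. */
    for (int i = 0; i < NCONN; i++) {
        if (s->conn[i] >= 0) {
            close(s->conn[i]);
            s->conn[i] = -1;
        }
    }
    /* Release the rings and related resources. */
    free(s->rings);
    s->rings = NULL;
    /* Let the communication thread service exit. */
    s->thread_running = false;
}
```

Draining before closing keeps requests that were already queued from being dropped on a closed socket.

The loop trusts the drain callback to make progress. A production version would bound the wait or report entries that could not be delivered.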

Claims (2)

1. An implementation method of distributed virtual machine monitor communication technology, characterized in that the implementation method comprises the following steps:
Step 1, preparation phase:
when creating a virtual machine and allocating memory for it, the VMM on each node reserves space for the double ring, informs the virtual machine that these pages are unavailable, and initializes the request ring and service ring structures;
when starting the virtual machine, the VMM on each node starts the device emulation module dm and maps the pages holding the double ring into the domain0 address space, for use by domain0 and by the application in the dm module running on domain0;
the VMM on each node initializes the event channel, which is managed by the VMM and presented to domain0 for use;
Step 2, connection establishment phase:
when the guest operating system starts:
1. the management module starts the guest operating system, which first traps into the VMM; after allocating core resources to the guest operating system, the VMM starts the qemu-dm module, which provides device emulation for the guest operating system;
start the communication module that we implemented in the qemu-dm module:
to improve the data transmission rate, we use three connections for communication: between vCPUs on different physical nodes, for remote I/O device access, and for remote memory access;
the qemu-dm module runs in the application layer of domain0; each node reads parameters, such as the node IP address, from the configuration file of the guest operating system and passes them in; the starting node creates a socket and begins listening for connections, while each non-starting node creates a socket and then initiates a connection request; this process is blocking, that is, execution proceeds only after the connection has been established;
through system calls, execution enters the kernel mode of domain0, where the TCP/IP protocol stack completes the connection establishment; the descriptors of the established connections are saved in the connection array for use when sending data;
Step 3, data transmission process:
when a page fault or an I/O device access occurs in the guest operating system, it first traps into the VMM;
the VMM analyzes the cause of the trap, creates a request of the corresponding type, maps the address of the data to be sent into domain0, adds the request to the request ring through the write pointer, and finally sets the event channel;
the communication thread in the qemu-dm module is woken up by the VMM through the event channel and starts polling the request ring; when there is data to send, it takes the request out through the read pointer, builds a data packet according to the request, and adds an application-layer header to the packet;
the data is passed to the TCP/IP protocol stack in domain0 and sent out through the network card;
after the destination node receives the data, the network card interrupt hands it to the TCP/IP protocol stack; the protocol stack saves the data after removing the headers below the transport layer, stores its address in a ring descriptor, places the descriptor in the service ring through the service-ring write pointer, and then notifies the VMM through the event channel to process it;
the VMM reads the service ring, obtains the address from the descriptor, and maps the data into the guest operating system, which completes the data reception;
Step 4, connection closing phase:
the guest operating system shuts down;
release the resources of the starting node and release remote resources through the communication module;
check whether the data in the rings has been fully processed;
close the connections;
release the rings and the communication-related resources;
the communication thread service exits.
CN2008102398997A 2008-12-23 2008-12-23 Intraconnection communication method of distributed virtual machine monitoring apparatus Expired - Fee Related CN101430674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008102398997A CN101430674B (en) 2008-12-23 2008-12-23 Intraconnection communication method of distributed virtual machine monitoring apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008102398997A CN101430674B (en) 2008-12-23 2008-12-23 Intraconnection communication method of distributed virtual machine monitoring apparatus

Publications (2)

Publication Number Publication Date
CN101430674A true CN101430674A (en) 2009-05-13
CN101430674B CN101430674B (en) 2010-10-20

Family

ID=40646080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008102398997A Expired - Fee Related CN101430674B (en) 2008-12-23 2008-12-23 Intraconnection communication method of distributed virtual machine monitoring apparatus

Country Status (1)

Country Link
CN (1) CN101430674B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101859263A (en) * 2010-06-12 2010-10-13 中国人民解放军国防科学技术大学 A Fast Communication Method Between Virtual Machines Supporting Online Migration
CN101957900A (en) * 2010-10-26 2011-01-26 中国航天科工集团第二研究院七○六所 Credible virtual machine platform
CN102045378A (en) * 2009-10-13 2011-05-04 杭州华三通信技术有限公司 Method for realizing full distribution of protocol stack process and distributed system
CN102708330A (en) * 2012-05-10 2012-10-03 深信服网络科技(深圳)有限公司 Method for preventing system from being invaded, invasion defense system and computer
CN102799465A (en) * 2012-06-30 2012-11-28 华为技术有限公司 Virtual interrupt management method and device of distributed virtual system
CN101667144B (en) * 2009-09-29 2013-02-13 北京航空航天大学 Virtual machine communication method based on shared memory
CN101751284B (en) * 2009-12-25 2013-04-24 华为技术有限公司 I/O resource scheduling method for distributed virtual machine monitor
CN103154891A (en) * 2010-10-01 2013-06-12 国际商业机器公司 Virtual machine stage detection
CN103701791A (en) * 2013-12-20 2014-04-02 中电长城网际系统应用有限公司 Server, terminal equipment, visual desktop system and operation method thereof
CN104956355A (en) * 2012-11-05 2015-09-30 Afl电信公司 Distributed test system architecture
CN105630576A (en) * 2015-12-23 2016-06-01 华为技术有限公司 Data processing method and apparatus in virtualization platform
CN106201349A (en) * 2015-12-31 2016-12-07 华为技术有限公司 A kind of method and apparatus processing read/write requests in physical host
CN106445642A (en) * 2016-10-27 2017-02-22 广东铂亚信息技术有限公司 Safety communication method based on virtual machine monitor and system
CN111241201A (en) * 2020-01-14 2020-06-05 厦门网宿有限公司 Distributed data processing method and system

Cited By (25)

Publication number Priority date Publication date Assignee Title
CN101667144B (en) * 2009-09-29 2013-02-13 北京航空航天大学 Virtual machine communication method based on shared memory
CN102045378A (en) * 2009-10-13 2011-05-04 杭州华三通信技术有限公司 Method for realizing full distribution of protocol stack process and distributed system
CN102045378B (en) * 2009-10-13 2013-02-13 杭州华三通信技术有限公司 Method for realizing full distribution of protocol stack process and distributed system
CN101751284B (en) * 2009-12-25 2013-04-24 华为技术有限公司 I/O resource scheduling method for distributed virtual machine monitor
CN101859263A (en) * 2010-06-12 2010-10-13 中国人民解放军国防科学技术大学 A Fast Communication Method Between Virtual Machines Supporting Online Migration
CN101859263B (en) * 2010-06-12 2012-07-25 中国人民解放军国防科学技术大学 Quick communication method between virtual machines supporting online migration
CN103154891A (en) * 2010-10-01 2013-06-12 国际商业机器公司 Virtual machine stage detection
CN103154891B (en) * 2010-10-01 2016-03-23 国际商业机器公司 Virtual machine stage detects
CN101957900A (en) * 2010-10-26 2011-01-26 中国航天科工集团第二研究院七○六所 Credible virtual machine platform
CN102708330A (en) * 2012-05-10 2012-10-03 深信服网络科技(深圳)有限公司 Method for preventing system from being invaded, invasion defense system and computer
CN102799465A (en) * 2012-06-30 2012-11-28 华为技术有限公司 Virtual interrupt management method and device of distributed virtual system
CN102799465B (en) * 2012-06-30 2015-05-27 华为技术有限公司 Virtual interrupt management method and device of distributed virtual system
CN104956355A (en) * 2012-11-05 2015-09-30 Afl电信公司 Distributed test system architecture
US9882963B2 (en) 2012-11-05 2018-01-30 Afl Telecommunications Llc Distributed test system architecture
CN104956355B (en) * 2012-11-05 2018-10-09 Afl电信公司 Distributed test system framework
CN103701791A (en) * 2013-12-20 2014-04-02 中电长城网际系统应用有限公司 Server, terminal equipment, visual desktop system and operation method thereof
CN103701791B (en) * 2013-12-20 2017-09-01 中电长城网际系统应用有限公司 A kind of operating method of the virtual desktop based on virtual desktop system
CN105630576A (en) * 2015-12-23 2016-06-01 华为技术有限公司 Data processing method and apparatus in virtualization platform
CN105630576B (en) * 2015-12-23 2019-08-20 华为技术有限公司 Data processing method and device in a virtualization platform
CN106201349A (en) * 2015-12-31 2016-12-07 华为技术有限公司 A kind of method and apparatus processing read/write requests in physical host
CN106201349B (en) * 2015-12-31 2019-06-28 华为技术有限公司 A kind of method and apparatus handling read/write requests in physical host
US10579305B2 (en) 2015-12-31 2020-03-03 Huawei Technologies Co., Ltd. Method and apparatus for processing read/write request in physical machine
CN106445642A (en) * 2016-10-27 2017-02-22 广东铂亚信息技术有限公司 Safety communication method based on virtual machine monitor and system
CN111241201A (en) * 2020-01-14 2020-06-05 厦门网宿有限公司 Distributed data processing method and system
CN111241201B (en) * 2020-01-14 2023-02-07 厦门网宿有限公司 Distributed data processing method and system

Also Published As

Publication number Publication date
CN101430674B (en) 2010-10-20

Similar Documents

Publication Publication Date Title
CN101430674B (en) Intraconnection communication method of distributed virtual machine monitoring apparatus
US11934341B2 (en) Virtual RDMA switching for containerized
US8832688B2 (en) Kernel bus system with a hyberbus and method therefor
Huang et al. A case for high performance computing with virtual machines
US8776050B2 (en) Distributed virtual machine monitor for managing multiple virtual resources across multiple physical nodes
Lagar-Cavilla et al. Snowflock: rapid virtual machine cloning for cloud computing
CN104871493B (en) For the method and apparatus of the communication channel failure switching in high-performance calculation network
US20050044301A1 (en) Method and apparatus for providing virtual computing services
US8856801B2 (en) Techniques for executing normally interruptible threads in a non-preemptive manner
CN101271401B (en) A server farm system with a single system image
US8601496B2 (en) Method and system for protocol offload in paravirtualized systems
Ren et al. Shared-memory optimizations for inter-virtual-machine communication
US7761578B2 (en) Communicating in a virtual environment
US20050080982A1 (en) Virtual host bus adapter and method
JP2002342280A (en) Partitioned processing system, method for setting security in the same system and computer program thereof
Barham et al. Xen 2002
Zhang et al. Workload adaptive shared memory management for high performance network i/o in virtualized cloud
Dai et al. A lightweight VMM on many core for high performance computing
US11340932B2 (en) Packet handling based on multiprocessor architecture configuration
Sreenivasamurthy et al. Sivshm: Secure inter-vm shared memory
Gerangelos et al. vphi: Enabling xeon phi capabilities in virtual machines
Lu et al. Building efficient hpc cloud with sr-iov-enabled infiniband: The mvapich2 approach
CN114978589B (en) Lightweight cloud operating system and construction method thereof
CN110134491A (en) Information Processing Transmission Device
Zhang Designing and Building Efficient HPC Cloud with Modern Networking Technologies on Heterogeneous HPC Clusters

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: HUAWEI TECHNOLOGY CO LTD

Free format text: FORMER OWNER: BEIJING AERONAUTICS AND ASTRONAUTICS UNIV.

Effective date: 20110926

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 100191 HAIDIAN, BEIJING TO: 518129 SHENZHEN, GUANGDONG PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20110926

Address after: 518129 headquarter office building of Bantian HUAWEI base, Longgang District, Shenzhen, Guangdong, China

Patentee after: Huawei Technologies Co., Ltd.

Address before: 100191 School of computer science and engineering, Beihang University, Xueyuan Road 37, Beijing, Haidian District

Patentee before: Beihang University

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20101020

Termination date: 20181223