
CN103226692A - A system and method for identifying video stream image frames - Google Patents

A system and method for identifying video stream image frames

Info

Publication number
CN103226692A
CN103226692A (application CN201210480917.7A)
Authority
CN
China
Prior art keywords
frame
level
buffer
image frame
stored
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012104809177A
Other languages
Chinese (zh)
Other versions
CN103226692B (en)
Inventor
王磊
郑伟龙
张文山
姚以鹏
陈曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GUANGDONG SCIENCE CENTER
Original Assignee
GUANGDONG SCIENCE CENTER
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GUANGDONG SCIENCE CENTER filed Critical GUANGDONG SCIENCE CENTER
Priority to CN201210480917.7A priority Critical patent/CN103226692B/en
Publication of CN103226692A publication Critical patent/CN103226692A/en
Application granted granted Critical
Publication of CN103226692B publication Critical patent/CN103226692B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a system and a method for identifying video stream image frames. A primary buffer receives a data packet formed by an associative gesture and splits it into continuous image frames for storage; a matching unit takes the image frames out of the primary buffer in sequence, stores them in a secondary buffer, and matches the image frames stored in the secondary buffer against a conventional frame matching library; once the match succeeds, the image frames stored in the secondary buffer are stored in a sentence buffer and the associative matching unit is started to perform double-line matching recognition. The invention provides a new visual gesture recognition scheme that matches frames against a fixed frame library in frame-by-frame increments: no feature values need to be extracted; instead, frames are matched one by one against the frame library, and once a word is recognized the associative recognition function directly intercepts the required number of frames for matching. This greatly reduces the computer's workload and markedly improves operation speed and matching accuracy.

Description

A system and method for identifying video stream image frames

Technical Field

The invention relates to the technical field of gesture video stream recognition, and in particular to a system and method for recognizing video stream image frames.

Background Art

Gestures, as one of the most natural forms of human expression, are widely used in daily life. Sign language is a language that expresses meaning through gestures, but it is very difficult for ordinary people unfamiliar with sign language to understand it, so a technology capable of translating sign language would greatly facilitate communication between deaf people and hearing people. In the recognition of gestures and sign language, a key link is gesture tracking.

In the prior art, gesture recognition can be divided, according to how peripheral devices capture gesture images, into data-glove-based gesture recognition and computer-vision-based gesture recognition. Data-glove-based gesture recognition measures the trajectory and timing information of gesture movements through data gloves and position tracking. Its advantage is a high recognition rate; its disadvantages are that the input device is expensive and the person gesturing must wear cumbersome data gloves.

Computer-vision-based gesture recognition generally relies on target tracking under a single ordinary camera. A problem that is hard to solve in this process is occlusion: when a target is partially or completely occluded by another object, the tracked features become incomplete or disappear, interrupting the tracking process; the target must then be re-detected and tracking re-initialized, which is very inconvenient. Multiple cameras can be used to mitigate this problem, but the tracking algorithm then becomes more complex, adding technical difficulty and instability. As a result, computer-vision-based gesture recognition greatly limits the types of gestures that can be recognized.

To address these problems, Chinese patent No. 200810068423.1, "A method and device for segmenting and recognizing data stream image frames", accumulates image data over a certain period, performs pattern recognition by judging whether a regional image satisfies the boundary conditions of the recognition region, and then performs pattern comparison through feature value extraction to obtain the desired result. Although that patent provides a technical solution, in practical application it still has many defects, such as few recognizable gesture types, slow recognition speed and insufficient matching accuracy.

Summary of the Invention

The purpose of the present invention is to overcome the deficiencies of the prior art and provide a system and method for recognizing video stream image frames. The image stream is first split into frames, and the frames are then matched against a conventional frame matching library in frame-by-frame increments, avoiding the step of extracting key frames from the incoming image data; association is then performed through an associative function, effectively shortening the recognition and matching time.

The present invention is realized through the following technical solutions:

A system for recognizing video stream image frames comprises a matching unit, an associative matching unit, a conventional frame matching library, a sentence buffer, a primary buffer and a secondary buffer. The primary buffer is connected to the secondary buffer; the matching unit is connected to the associative matching unit, the conventional frame matching library, the sentence buffer, the primary buffer and the secondary buffer; and the associative matching unit is connected to the conventional frame matching library, the sentence buffer, the primary buffer and the secondary buffer.

Specifically, the primary buffer receives the data packet formed by an associative gesture and splits it into continuous image frames for storage. The matching unit takes image frames out of the primary buffer in sequence, stores them in the secondary buffer, and matches the image frames stored in the secondary buffer against the conventional frame matching library. Once the match succeeds, the image frames stored in the secondary buffer are stored in the sentence buffer, and the associative matching unit is started to perform double-line matching recognition.
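As a rough illustration only, the following Python sketch mirrors this data flow; the fixed frame size, the byte-string packet format and the deque-based buffers are assumptions made for the example and are not specified by the patent.

```python
# Minimal sketch of the primary-buffer stage, assuming an incoming data
# packet is a byte string holding a whole number of fixed-size frames.
from collections import deque

FRAME_SIZE = 64 * 64           # assumed bytes per image frame (illustrative)
primary_buffer = deque()       # split frames not yet matched
secondary_buffer = []          # frames of the word currently being matched

def receive_packet(packet: bytes) -> None:
    """Split a gesture data packet into continuous frames and store them."""
    for offset in range(0, len(packet), FRAME_SIZE):
        primary_buffer.append(packet[offset:offset + FRAME_SIZE])
```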

The conventional frame matching library classifies and stores all sign-language words composed of the same number of frames, forming in sequence a one-frame matching library, a two-frame matching library, ..., an (N-1)-frame matching library and an N-frame matching library.

For example, if "I", "you" and "he" can each be composed of 2 frames, the gestures composed of 2 frames are all grouped into the "two-frame matching library", and so on for the others.
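This frame-count classification can be pictured with the following hypothetical sketch, under the assumption that frames can be compared for equality by some signature; the placeholder frame signatures and the lookup() helper are illustrative and are not part of the patent.

```python
# Sketch of the conventional frame matching library, keyed by frames per sign.
frame_library = {
    1: {("frame_hello",): "hello"},
    2: {("frame_i_1", "frame_i_2"): "I",
        ("frame_you_1", "frame_you_2"): "you",
        ("frame_he_1", "frame_he_2"): "he"},
    # ... up to the N-frame matching library
}

def lookup(frames) -> str | None:
    """Return the sign-language word matching a sequence of frames, if any."""
    sub_library = frame_library.get(len(frames), {})
    return sub_library.get(tuple(frames))
```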

The matching unit stores the image frames in the primary buffer into the secondary buffer one by one, starting from the first image frame. Each time a new image frame is stored in the secondary buffer, the matching unit matches the image frames stored in the secondary buffer against the conventional frame matching library; once the match succeeds, the image frames stored in the secondary buffer are stored in the sentence buffer.
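A minimal sketch of this incremental matching loop, reusing the hypothetical buffers and lookup() helper from the sketches above, might look as follows; how a recognized word is handed to the sentence buffer is simplified.

```python
sentence_buffer = []           # successfully matched words awaiting output

def match_one_word() -> str | None:
    """Grow the secondary buffer frame by frame until a word matches."""
    secondary_buffer.clear()
    while primary_buffer:
        secondary_buffer.append(primary_buffer.popleft())  # one more frame
        word = lookup(secondary_buffer)                     # try the k-frame library
        if word is not None:
            sentence_buffer.append(word)                    # send to the sentence buffer
            secondary_buffer.clear()
            return word
    return None                                             # nothing more can be matched
```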

After the matching unit matches successfully, three word-frame values (denoted M1, M2 and M3 in sequence) are derived by association from the number of frames required by the associative gesture and the image frames stored in the secondary buffer, and the secondary buffer is cleared. The matching process then begins: the unmatched image frames in the primary buffer are read in sequence according to the three word-frame values, the frames read are stored in the secondary buffer and matched against the conventional frame matching library, and if the match succeeds the image frames stored in the secondary buffer are sent to the sentence buffer to await processing. The above matching process is repeated until none of the three word-frame values can be matched, whereupon the associative matching unit ends the double-line matching recognition.
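Under the assumption that the three word-frame values are simply candidate frame counts, the double-line associative matching could be sketched as below; guess_frame_counts() is a placeholder, since the patent does not specify how M1, M2 and M3 are derived from the recognized word.

```python
def guess_frame_counts(last_word: str) -> tuple[int, int, int]:
    return (2, 3, 4)            # purely illustrative values for M1, M2, M3

def associative_match(last_word: str) -> None:
    while True:
        for m in guess_frame_counts(last_word):             # try M1, M2, M3 in turn
            if len(primary_buffer) < m:
                continue
            candidate = [primary_buffer[i] for i in range(m)]
            word = lookup(candidate)
            if word is not None:
                for _ in range(m):                          # consume the matched frames
                    primary_buffer.popleft()
                sentence_buffer.append(word)                # send to the sentence buffer
                last_word = word
                break                                       # restart from M1
        else:
            return                                          # none matched: end double-line matching
```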

The matching unit checks the image frames stored in the sentence buffer in real time; when the image frames stored in the sentence buffer can form a complete sentence, the matching unit optimizes them and outputs the result.

A method for recognizing video stream image frames implemented by the above system comprises the following steps (a condensed driver combining the earlier sketches is shown after the list):

1) The matching unit runs;

2) The currently received data packet is split into continuous image frames, and the split continuous image frames are stored in the primary buffer;

3) The continuous image frames in the primary buffer are stored into the secondary buffer one by one, starting from the first frame that has not yet been matched, and matched against the conventional frame matching library; if the match succeeds, go to step 5; if the match fails, go to step 4;

4) The unsuccessfully matched image frames, together with the following image frame, are matched against the conventional frame matching library again; if the match succeeds, go to step 5; if the match fails, perform step 4 again;

5) The successfully matched image frames are sent to the sentence buffer to await processing; when the image frames stored in the sentence buffer can form a complete sentence, go to step 8;

6) The associative matching unit performs double-line matching recognition;

7) Three word-frame values (denoted M1, M2 and M3 in sequence) are derived by association from the number of frames required by the associative gesture and the image frames stored in the secondary buffer, and the secondary buffer is cleared; the matching process then begins: the unmatched image frames in the primary buffer are read in sequence according to the three word-frame values, the frames read are stored in the secondary buffer and matched against the conventional frame matching library; if the match succeeds, the image frames stored in the secondary buffer are sent to the sentence buffer to await processing, and when the image frames stored in the sentence buffer can form a complete sentence, go to step 8; the above matching process is repeated until none of the three word-frame values can be matched, whereupon the associative matching unit ends the double-line matching recognition, clears the secondary buffer and returns to step 3;

8) The image frames stored in the sentence buffer are optimized and arranged;

9) The result is output.
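As a condensed, illustrative driver, the steps above could be wired together from the earlier hypothetical sketches roughly as follows; this is a simplification of steps 1 to 9, not a faithful implementation of the patent.

```python
def run(packet: bytes) -> str:
    receive_packet(packet)                       # step 2: split the packet into frames
    while primary_buffer:
        word = match_one_word()                  # steps 3-5: per-word incremental matching
        if word is None:
            break                                # no further word can be matched
        associative_match(word)                  # steps 6-7: double-line matching
    return optimize(sentence_buffer)             # steps 8-9: reorder and output
```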

In step 8, because sign-language grammar differs from the grammar of normal speech, optimization is required. For example, "a cup of cola" is signed as "cola one (cup)", so optimization is needed to translate "cola one" into "a cup of cola".
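As a toy example of this optimization, assuming the recognized words arrive as English glosses and that the only reordering rule needed is "noun number measure-word" becoming "number measure-word of noun" (both assumptions are illustrative only):

```python
MEASURE_WORDS = {"cup", "cups", "pack", "packs"}   # assumed measure-word list

def optimize(words: list[str]) -> str:
    """Rearrange sign-language word order into spoken-language order."""
    out, i = [], 0
    while i < len(words):
        # pattern: <noun> <number> <measure word> -> "<number> <measure word> of <noun>"
        if i + 2 < len(words) + 0 or (i + 2 < len(words)):
            pass
        if i + 2 < len(words) and words[i + 2] in MEASURE_WORDS:
            out.append(f"{words[i + 1]} {words[i + 2]} of {words[i]}")
            i += 3
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)

# optimize(["cola", "one", "cup"]) -> "one cup of cola"
```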

Step 7 includes the following matching process:

7.1) The word-frame value M1 is sent to the primary buffer, the unmatched image frames in the primary buffer are read, and the frames read are stored in the secondary buffer and matched against the conventional frame matching library; if the match fails, go to step 7.2; if the match succeeds, the image frames stored in the secondary buffer are sent to the sentence buffer to await processing; when the image frames stored in the sentence buffer can form a complete sentence, perform step 8, otherwise continue with step 7.1;

7.2) If the match cannot succeed, the word-frame value M2 is sent to the primary buffer, the unmatched image frames in the primary buffer are read, and the frames read are stored in the secondary buffer and matched against the conventional frame matching library; if the match fails, go to step 7.3; if the match succeeds, the image frames stored in the secondary buffer are sent to the sentence buffer to await processing; when the image frames stored in the sentence buffer can form a complete sentence, perform step 8, otherwise go to step 7.1;

7.3) If the match cannot succeed, the word-frame value M3 is sent to the primary buffer, the unmatched image frames in the primary buffer are read, and the frames read are stored in the secondary buffer and matched against the conventional frame matching library; if the match fails, go to step 7.4; if the match succeeds, the image frames stored in the secondary buffer are sent to the sentence buffer to await processing; when the image frames stored in the sentence buffer can form a complete sentence, perform step 8, otherwise go to step 7.1;

7.4) The associative matching unit ends the double-line matching recognition, clears the secondary buffer, and returns to step 3.

Compared with the prior art, the present invention has the following beneficial effects:

The present invention provides a new visual gesture recognition scheme. Matching is performed in frame-by-frame increments against a fixed frame library: no feature values need to be extracted; instead, frames are matched one by one against the frame library, and once a word is recognized the associative recognition function directly intercepts the required number of frames for matching. This greatly reduces the computer's workload, markedly improves operation speed and matching accuracy, and effectively solves the problems of few recognizable gesture types and slow recognition speed in computer-vision-based gesture recognition. It can be applied in many fields, has broad application prospects and offers outstanding efficiency.

Brief Description of the Drawings

The present invention is described in further detail below in conjunction with embodiments and the accompanying drawings:

Fig. 1 is a schematic diagram of the system structure of a specific embodiment of the present invention;

Fig. 2 is a schematic flow chart of the method of a specific embodiment of the present invention;

Fig. 3 is a detailed flow chart of the method of a specific embodiment of the present invention;

Fig. 4 is a schematic flow chart of the associative matching unit of a specific embodiment of the present invention;

Fig. 5 is a schematic structural diagram of the conventional frame matching library of a specific embodiment of the present invention.

Detailed Description of Embodiments

The present invention is further described in detail below in conjunction with the embodiments and the accompanying drawings, but it should be understood that the embodiments of the present invention are not limited thereto.

As shown in Fig. 1, a system for recognizing video stream image frames according to the present invention comprises a matching unit, an associative matching unit, a conventional frame matching library, a sentence buffer, a primary buffer and a secondary buffer. The primary buffer is connected to the secondary buffer; the matching unit is connected to the associative matching unit, the conventional frame matching library, the sentence buffer, the primary buffer and the secondary buffer; and the associative matching unit is connected to the conventional frame matching library, the sentence buffer, the primary buffer and the secondary buffer.

Specifically, as shown in Fig. 2, the primary buffer receives the data packet formed by an associative gesture and splits it into continuous image frames for storage. The matching unit takes image frames out of the primary buffer in sequence, stores them in the secondary buffer, and matches the image frames stored in the secondary buffer against the conventional frame matching library. Once the match succeeds, the image frames stored in the secondary buffer are stored in the sentence buffer, and the associative matching unit is started to perform double-line matching recognition.

As shown in Fig. 5, the conventional frame matching library classifies and stores all sign-language words composed of the same number of frames, forming in sequence a one-frame matching library, a two-frame matching library, ..., an (N-1)-frame matching library and an N-frame matching library.

For example, if "I", "you" and "he" can each be composed of 2 frames, the gestures composed of 2 frames are all grouped into the "two-frame matching library"; other frame counts are handled by analogy according to the prior art and industry knowledge.

The matching unit stores the image frames in the primary buffer into the secondary buffer one by one, starting from the first image frame. Each time a new image frame is stored in the secondary buffer, the matching unit matches the image frames stored in the secondary buffer against the conventional frame matching library; once the match succeeds, the image frames stored in the secondary buffer are stored in the sentence buffer.

Specifically, as shown in Fig. 4, after the matching unit matches successfully, three word-frame values (which may be denoted M1, M2 and M3 in sequence) are derived by association from the number of frames required by the associative gesture and the image frames stored in the secondary buffer, and the secondary buffer is cleared. The matching process then begins: the unmatched image frames in the primary buffer are read in sequence according to the three word-frame values, the frames read are stored in the secondary buffer and matched against the conventional frame matching library, and if the match succeeds the image frames stored in the secondary buffer are sent to the sentence buffer to await processing. The above matching process is repeated until none of the three word-frame values can be matched, whereupon the associative matching unit ends the double-line matching recognition.

The matching unit checks the image frames stored in the sentence buffer in real time; when the image frames stored in the sentence buffer can form a complete sentence, the matching unit optimizes them and outputs the result.

As shown in Figs. 2 and 3, a method for recognizing video stream image frames implemented by the above system comprises the following steps:

1) Run the matching unit;

2) Split the currently received data packet into continuous image frames and store the split continuous image frames in the primary buffer;

3) Store the continuous image frames in the primary buffer into the secondary buffer one by one, starting from the first frame that has not yet been matched, and match them against the conventional frame matching library; if the match succeeds, go to step 5; if the match fails, go to step 4;

4) Match the unsuccessfully matched image frames, together with the following image frame, against the conventional frame matching library again; if the match succeeds, go to step 5; if the match fails, perform step 4 again;

5) Send the successfully matched image frames to the sentence buffer to await processing; when the image frames stored in the sentence buffer can form a complete sentence, go to step 8;

6) The associative matching unit performs double-line matching recognition;

7) Derive three word-frame values (denoted M1, M2 and M3 in sequence) by association from the number of frames required by the associative gesture and the image frames stored in the secondary buffer, and clear the secondary buffer; then begin the matching process: read the unmatched image frames in the primary buffer in sequence according to the three word-frame values, store the frames read in the secondary buffer and match them against the conventional frame matching library; if the match succeeds, send the image frames stored in the secondary buffer to the sentence buffer to await processing, and when the image frames stored in the sentence buffer can form a complete sentence, go to step 8; repeat the above matching process until none of the three word-frame values can be matched, whereupon the associative matching unit ends the double-line matching recognition, clears the secondary buffer and returns to step 3;

8) Optimize and arrange the image frames stored in the sentence buffer;

9) Output the result.

In step 8, because sign-language grammar differs from the grammar of normal speech, optimization is required. For example, "a cup of cola" is signed as "cola one (cup)" and "two packs of food" is signed as "food two (packs)"; optimization is then needed to translate "cola one" into "a cup of cola".

Step 7 includes the following matching process:

7.1) Send the word-frame value M1 to the primary buffer, read the unmatched image frames in the primary buffer, store the frames read in the secondary buffer and match them against the conventional frame matching library; if the match fails, go to step 7.2; if the match succeeds, send the image frames stored in the secondary buffer to the sentence buffer to await processing; when the image frames stored in the sentence buffer can form a complete sentence, perform step 8, otherwise continue with step 7.1;

7.2) If the match cannot succeed, send the word-frame value M2 to the primary buffer, read the unmatched image frames in the primary buffer, store the frames read in the secondary buffer and match them against the conventional frame matching library; if the match fails, go to step 7.3; if the match succeeds, send the image frames stored in the secondary buffer to the sentence buffer to await processing; when the image frames stored in the sentence buffer can form a complete sentence, perform step 8, otherwise go to step 7.1;

7.3) If the match cannot succeed, send the word-frame value M3 to the primary buffer, read the unmatched image frames in the primary buffer, store the frames read in the secondary buffer and match them against the conventional frame matching library; if the match fails, go to step 7.4; if the match succeeds, send the image frames stored in the secondary buffer to the sentence buffer to await processing; when the image frames stored in the sentence buffer can form a complete sentence, perform step 8, otherwise go to step 7.1;

7.4) The associative matching unit ends the double-line matching recognition, clears the secondary buffer, and returns to step 3.

Claims (7)

1. A system for identifying video stream image frames, characterized in that it comprises a matching unit, an associative matching unit, a conventional frame matching library, a sentence buffer, a primary buffer and a secondary buffer; wherein the primary buffer is connected to the secondary buffer; the matching unit is connected to the associative matching unit, the conventional frame matching library, the sentence buffer, the primary buffer and the secondary buffer; and the associative matching unit is connected to the conventional frame matching library, the sentence buffer, the primary buffer and the secondary buffer; specifically, the primary buffer receives the data packet formed by an associative gesture and splits it into continuous image frames for storage, the matching unit takes image frames out of the primary buffer in sequence and stores them in the secondary buffer, matches the image frames stored in the secondary buffer against the conventional frame matching library, stores the image frames from the secondary buffer into the sentence buffer after the match succeeds, and starts the associative matching unit to perform double-line matching recognition.
2. The system for identifying video stream image frames according to claim 1, characterized in that the conventional frame matching library classifies and stores the sign-language words composed of each same number of frames, forming in sequence a one-frame matching library, a two-frame matching library, ..., an (N-1)-frame matching library and an N-frame matching library.
3. The system for identifying video stream image frames according to claim 1, characterized in that the matching unit stores the image frames in the primary buffer into the secondary buffer one by one, starting from the first image frame; each time a new image frame is stored in the secondary buffer, the matching unit matches the image frames stored in the secondary buffer against the conventional frame matching library, and after the match succeeds stores the image frames from the secondary buffer into the sentence buffer.
4. The system for identifying video stream image frames according to claim 1, characterized in that after the matching unit matches successfully, three word-frame values (denoted M1, M2 and M3 in sequence) are derived by association from the number of frames required by the associative gesture and the image frames stored in the secondary buffer, and the secondary buffer is cleared; the matching process then begins: the unmatched image frames in the primary buffer are read in sequence according to the three word-frame values, the frames read are stored in the secondary buffer and matched against the conventional frame matching library, and if the match succeeds the image frames stored in the secondary buffer are sent to the sentence buffer to await processing; the above matching process is repeated until none of the three word-frame values can be matched, whereupon the associative matching unit ends the double-line matching recognition.
5. The system for identifying video stream image frames according to claim 1, characterized in that the matching unit checks the image frames stored in the sentence buffer in real time, and when the image frames stored in the sentence buffer can form a complete sentence, the matching unit optimizes them and outputs the result.
6. A method for identifying video stream image frames implemented by the system according to claim 1, characterized in that it comprises the following steps:
1) running the matching unit;
2) splitting the currently received data packet into continuous image frames and storing the split continuous image frames in the primary buffer;
3) storing the continuous image frames in the primary buffer into the secondary buffer one by one, starting from the first frame not yet matched, and matching them against the conventional frame matching library; if the match succeeds, proceeding to step 5; if the match fails, proceeding to step 4;
4) matching the unsuccessfully matched image frames together with the following image frame against the conventional frame matching library again; if the match succeeds, proceeding to step 5; if the match fails, performing step 4 again;
5) sending the successfully matched image frames to the sentence buffer to await processing, and proceeding to step 8 when the image frames stored in the sentence buffer can form a complete sentence;
6) the associative matching unit performing double-line matching recognition;
7) deriving three word-frame values (denoted M1, M2 and M3 in sequence) by association from the number of frames required by the associative gesture and the image frames stored in the secondary buffer, and clearing the secondary buffer; then beginning the matching process: reading the unmatched image frames in the primary buffer in sequence according to the three word-frame values, storing the frames read in the secondary buffer and matching them against the conventional frame matching library; if the match succeeds, sending the image frames stored in the secondary buffer to the sentence buffer to await processing, and proceeding to step 8 when the image frames stored in the sentence buffer can form a complete sentence; repeating the above matching process until none of the three word-frame values can be matched, whereupon the associative matching unit ends the double-line matching recognition, clears the secondary buffer and returns to step 3;
8) optimizing and arranging the image frames stored in the sentence buffer;
9) outputting the result.
7. The method for identifying video stream image frames according to claim 6, characterized in that step 7 comprises the following matching process:
7.1) sending the word-frame value M1 to the primary buffer, reading the unmatched image frames in the primary buffer, storing the frames read in the secondary buffer and matching them against the conventional frame matching library; if the match fails, proceeding to step 7.2; if the match succeeds, sending the image frames stored in the secondary buffer to the sentence buffer to await processing, performing step 8 when the image frames stored in the sentence buffer can form a complete sentence, and otherwise continuing with step 7.1;
7.2) if the match cannot succeed, sending the word-frame value M2 to the primary buffer, reading the unmatched image frames in the primary buffer, storing the frames read in the secondary buffer and matching them against the conventional frame matching library; if the match fails, proceeding to step 7.3; if the match succeeds, sending the image frames stored in the secondary buffer to the sentence buffer to await processing, performing step 8 when the image frames stored in the sentence buffer can form a complete sentence, and otherwise proceeding to step 7.1;
7.3) if the match cannot succeed, sending the word-frame value M3 to the primary buffer, reading the unmatched image frames in the primary buffer, storing the frames read in the secondary buffer and matching them against the conventional frame matching library; if the match fails, proceeding to step 7.4; if the match succeeds, sending the image frames stored in the secondary buffer to the sentence buffer to await processing, performing step 8 when the image frames stored in the sentence buffer can form a complete sentence, and otherwise proceeding to step 7.1;
7.4) the associative matching unit ending the double-line matching recognition.
CN201210480917.7A 2012-11-22 2012-11-22 System and method for identifying video stream image frame Expired - Fee Related CN103226692B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210480917.7A CN103226692B (en) 2012-11-22 2012-11-22 System and method for identifying video stream image frame

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210480917.7A CN103226692B (en) 2012-11-22 2012-11-22 System and method for identifying video stream image frame

Publications (2)

Publication Number Publication Date
CN103226692A true CN103226692A (en) 2013-07-31
CN103226692B CN103226692B (en) 2016-01-20

Family

ID=48837133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210480917.7A Expired - Fee Related CN103226692B (en) 2012-11-22 2012-11-22 System and method for identifying video stream image frame

Country Status (1)

Country Link
CN (1) CN103226692B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875538A (en) * 2017-01-20 2017-06-20 深圳怡化电脑股份有限公司 The method and apparatus for obtaining crown word number information
CN111679272A (en) * 2020-06-01 2020-09-18 南京欧曼智能科技有限公司 Target track tracking processing method based on buffering technology

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0905644A2 (en) * 1997-09-26 1999-03-31 Matsushita Electric Industrial Co., Ltd. Hand gesture recognizing device
US5982853A (en) * 1995-03-01 1999-11-09 Liebermann; Raanan Telephone for the deaf and method of using same
CN101452705A (en) * 2007-12-07 2009-06-10 希姆通信息技术(上海)有限公司 Voice character conversion and cued speech character conversion method and device
CN102592112A (en) * 2011-12-20 2012-07-18 四川长虹电器股份有限公司 Method for determining gesture moving direction based on hidden Markov model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5982853A (en) * 1995-03-01 1999-11-09 Liebermann; Raanan Telephone for the deaf and method of using same
EP0905644A2 (en) * 1997-09-26 1999-03-31 Matsushita Electric Industrial Co., Ltd. Hand gesture recognizing device
CN101452705A (en) * 2007-12-07 2009-06-10 希姆通信息技术(上海)有限公司 Voice character conversion and cued speech character conversion method and device
CN102592112A (en) * 2011-12-20 2012-07-18 四川长虹电器股份有限公司 Method for determining gesture moving direction based on hidden Markov model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Song Guixia: "Sign language data analysis and generation technology", Excellent Dissertations Database, Information Science and Technology Series (2009) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875538A (en) * 2017-01-20 2017-06-20 深圳怡化电脑股份有限公司 The method and apparatus for obtaining crown word number information
CN111679272A (en) * 2020-06-01 2020-09-18 南京欧曼智能科技有限公司 Target track tracking processing method based on buffering technology

Also Published As

Publication number Publication date
CN103226692B (en) 2016-01-20

Similar Documents

Publication Publication Date Title
CN107122751B (en) A face tracking and face image capture method based on face alignment
CN103455794B (en) A Dynamic Gesture Recognition Method Based on Frame Fusion Technology
CN114882351B (en) Multi-target detection and tracking method based on improved YOLO-V5s
CN114973408B (en) Dynamic gesture recognition method and device
CN112001347A (en) Motion recognition method based on human skeleton shape and detection target
WO2004066191A3 (en) A method and or system to perform automated facial recognition and comparison using multiple 2d facial images parsed from a captured 3d facial image
CN102982099B (en) A kind of personalized Parallel Word Segmentation disposal system and disposal route thereof
CN107909042B (en) A Continuous Gesture Segmentation Recognition Method
CN116758451A (en) Audio-visual emotion recognition method and system based on multi-scale and global cross-attention
CN103226692B (en) System and method for identifying video stream image frame
CN109299650B (en) Video-based nonlinear online expression pre-detection method and device
CN107330387A (en) Pedestrian detection method based on view data
CN107220634B (en) Gesture recognition method based on improved D-P algorithm and multi-template matching
WO2023019927A1 (en) Facial recognition method and apparatus, storage medium, and electronic device
CN107346207B (en) Dynamic gesture segmentation recognition method based on hidden Markov model
CN111914724A (en) Continuous Chinese sign language identification method and system based on sliding window segmentation
CN108537109A (en) Monocular camera sign Language Recognition Method based on OpenPose
CN110633663A (en) A method for automatically cropping multimodal data from sign language videos
CN110674291A (en) A classification method of Chinese patent text effect categories based on multi-neural network fusion
WO2025140159A1 (en) Text processing method and apparatus, electronic device, and storage medium
CN110288732A (en) An integrated device of a dual-chip smart lock fingerprint recognition functional unit
CN104392237B (en) Fuzzy sign language identification method for data gloves
CN114973397B (en) Real-time process detection system, method and storage medium
CN115565252A (en) A dynamic gesture recognition method and device
Gao et al. A discriminative multi-modal adaptation neural network model for video action recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Wang Lei

Inventor after: Zheng Weilong

Inventor after: Zhang Wenshan

Inventor after: Yao Yipeng

Inventor after: Chen Xi

Inventor after: Wu Huanbin

Inventor after: Wen Yingying

Inventor after: Lian Junwei

Inventor before: Wang Lei

Inventor before: Zheng Weilong

Inventor before: Zhang Wenshan

Inventor before: Yao Yipeng

Inventor before: Chen Xi

COR Change of bibliographic data
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160120

Termination date: 20171122

CF01 Termination of patent right due to non-payment of annual fee