
CN111580649B - Deep learning-based aerial handwriting interaction method and system - Google Patents

Publication number: CN111580649B
Authority: CN (China)
Prior art keywords: data, action, neural network, data set, action data
Legal status: Active (assumption; not a legal conclusion)
Application number: CN202010334825.2A
Other languages: Chinese (zh)
Other versions: CN111580649A
Inventors: 曹明亮, 张浩洋, 李鸣棠, 曾瑜晴
Current Assignee: Foshan University
Original Assignee: Foshan University
Application filed by Foshan University
Priority to CN202010334825.2A
Publication of CN111580649A
Application granted
Publication of CN111580649B

Classifications

    • G06F 3/014: Hand-worn input/output arrangements, e.g. data gloves (under G06F 3/01, interaction between user and computer)
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a deep-learning-based in-air handwriting interaction method and system. Action data are acquired through a sensor; the action data are preprocessed to obtain an action data set; a neural network is constructed and trained on the manually labeled action data set to serve as an action recognition model; and action data acquired by the sensor are then recognized by the action recognition model. The main purpose of the invention is to facilitate text input for devices such as virtual reality and augmented reality equipment. Compared with the most common text-input devices today, such as keyboards and tablets, it offers better naturalness and efficiency; compared with other interaction modes, such as voice interaction, it is more robust; and because a neural network model performs the pose-estimation analysis, it achieves higher accuracy.

Description

A Deep-Learning-Based In-Air Handwriting Interaction Method and System

Technical Field

The present disclosure relates to technology combining deep learning and data communication, and in particular to a deep-learning-based in-air handwriting interaction method and system.

Background

With the popularity and real-world deployment of wearable devices, smart homes, the Internet of Things, and related fields, comprehensively intelligent living has become the next focus, and multi-channel, multimedia intelligent human-computer interaction will gradually become a key part of realizing it. The new human-computer interaction environments represented by virtual reality, and the mobile interaction platforms represented by handheld computers and smartphones, are two important development trends in computing today. The human-computer interaction technology represented by the mouse and keyboard is the main bottleneck restricting their development. Using multiple human perceptual channels (such as voice, handwriting, gesture, gaze, and facial expression) to interact with the (visible or invisible) computing environment in a parallel, non-precise manner can improve the naturalness and efficiency of human-computer interaction. Multi-channel, multimedia intelligent human-computer interaction is both a challenge and an excellent opportunity.

In-air handwriting recognition is an important branch of gesture recognition; its main purpose is to facilitate text input for virtual reality, augmented reality, and similar devices. Because these devices emphasize immersive experience, human-computer interaction with a mouse and keyboard, as on a desktop computer, is impractical, which makes the research and development of in-air handwriting highly significant.

Summary of the Invention

The purpose of the present invention is to propose a deep-learning-based in-air handwriting interaction method and system, so as to solve one or more technical problems in the prior art, or at least to provide a beneficial alternative or favorable conditions.

To solve the above problems, the present disclosure provides a technical solution for a deep-learning-based in-air handwriting interaction method and system: action data are collected through sensors; the action data are preprocessed to obtain an action data set; a neural network is constructed and trained on the manually labeled action data set to serve as an action recognition model; and the action data collected by the sensors are recognized by the action recognition model.

To achieve the above object, according to one aspect of the present disclosure, a deep-learning-based in-air handwriting interaction method is provided, the method comprising the following steps:

A motion-capture glove is worn on the hand; the thumb touches a touch sensor, and the index finger is waved to complete the in-air writing action. The motion-capture glove is equipped with multiple sensors, which are gyroscopes.

S100: collect action data through the sensors;

S200: preprocess the action data to obtain an action data set;

S300: construct a neural network and train it on the manually labeled action data set to serve as an action recognition model;

S400: recognize the action data collected by the sensors using the action recognition model.

Further, in S100, the sensor is a gyroscope.

Further, in S200, the methods for preprocessing the action data include, but are not limited to, any one or more of Kalman filtering, mean centering, and principal component analysis.
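The patent does not give an implementation of these preprocessing steps. As a minimal sketch, the following numpy code applies two of the named options (mean centering and principal component analysis via SVD), assuming each action sample is a fixed-length vector of gyroscope readings; all names and sizes are illustrative.

```python
import numpy as np

def preprocess(samples, n_components=3):
    """Mean-center gyroscope samples and project onto principal components.

    samples: (n_samples, n_features) array of raw action data.
    Returns the mean-centered data projected onto the top components.
    """
    X = np.asarray(samples, dtype=float)
    centered = X - X.mean(axis=0)          # mean centering
    # PCA via SVD of the centered data matrix
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T  # scores on the top components

raw = np.random.default_rng(0).normal(size=(60, 6))  # 60 samples, 6 gyro axes
reduced = preprocess(raw, n_components=3)
```

A Kalman filter, the third listed option, would operate on the raw per-axis time series before this stage and is omitted here for brevity.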

Further, in S300, the neural network is either an LSTM or a BP (back-propagation) neural network.

Further, in S300, the method of constructing the neural network and training it on the manually labeled action data set to obtain the action recognition model is as follows:

S301: construct either an LSTM or a BP neural network; set the network input to the action data collected by the sensors and the output to the recognized character.
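The patent names the LSTM and BP options without architectural detail. As a sketch of the LSTM option only (not the patented implementation), the following numpy code steps a single LSTM cell over a sequence of sensor vectors and applies a softmax readout over character classes; all dimensions, initializations, and names are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step; gates stacked in z as [input, forget, cell, output]."""
    H = h.size
    z = W @ x + U @ h + b
    i, f = sigmoid(z[:H]), sigmoid(z[H:2*H])
    g, o = np.tanh(z[2*H:3*H]), sigmoid(z[3*H:])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def classify_sequence(seq, W, U, b, W_out):
    """Run the LSTM over a (T, D) sequence, then softmax over character classes."""
    H = W_out.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x in seq:
        h, c = lstm_step(x, h, c, W, U, b)
    logits = W_out @ h
    p = np.exp(logits - logits.max())
    return p / p.sum()

rng = np.random.default_rng(1)
D, H, C = 6, 8, 26                      # gyro axes, hidden units, character classes
W = rng.normal(scale=0.1, size=(4*H, D))
U = rng.normal(scale=0.1, size=(4*H, H))
b = np.zeros(4*H)
W_out = rng.normal(scale=0.1, size=(C, H))
probs = classify_sequence(rng.normal(size=(60, D)), W, U, b, W_out)  # one 60-sample group
```

The final softmax vector plays the role of the output vector s referred to in the loss function below S302.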

S302: the action data set is manually labeled as follows. Through sampling, every N action data points form one data group (for example N = 60, i.e., 60 action data points per data group). For each data group, the letters, characters, or symbols corresponding to the three most similar pre-stored data groups in a database are read (i.e., after sorting by similarity, the data groups with the top three similarity values); that is, the three largest similarity values are obtained, the letters, characters, or symbols of the corresponding pre-stored data groups are output together, and an annotator marks the correct letter, character, or symbol among them. The database stores multiple pre-stored data groups, each correspondingly representing one letter, character, or symbol. N defaults to 60 action data points and can be adjusted manually. The similarity between a data group and the pre-stored data groups is calculated as follows: let a data group be Act and the pre-stored data groups be Act_j, j = 1…m, where m is the number of pre-stored data groups in the database; the similarity between Act and Act_j is the number of identical action data points shared by the data group Act and the pre-stored data group Act_j (for example, if Act and Act_j share 10 identical action data points, the similarity is 10). If pre-stored data groups with the same similarity exist, the letters, characters, or symbols corresponding to those pre-stored data groups are output together for manual selection and labeling, and the fully labeled action data set is taken as the labeled data set.
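The similarity count described in S302 can be sketched as follows, under the assumption that "identical action data" means element-wise equality at the same positions within the two groups; the toy database, its labels, and the shortened group length of 6 are purely illustrative.

```python
def similarity(act, act_j):
    """Number of identical action data points shared at the same positions."""
    return sum(a == b for a, b in zip(act, act_j))

def top3_labels(act, database):
    """Return the labels of the three pre-stored groups most similar to act.

    database: dict mapping a label (letter/character/symbol) to its
    pre-stored data group.
    """
    ranked = sorted(database.items(),
                    key=lambda kv: similarity(act, kv[1]),
                    reverse=True)
    return [label for label, _ in ranked[:3]]

# Toy database of pre-stored groups (length 6 here for brevity; the patent
# defaults to N = 60 action data points per group).
db = {"a": [1, 2, 3, 4, 5, 6],
      "b": [1, 2, 3, 9, 9, 9],
      "c": [9, 9, 9, 9, 9, 9],
      "d": [1, 2, 9, 9, 9, 9]}
candidates = top3_labels([1, 2, 3, 4, 9, 9], db)  # annotator picks the correct one
```

The annotator then chooses the correct character from `candidates`, which is what the patent describes as manual labeling over the top-three output.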

S303: train the neural network on the labeled data set to obtain the action recognition model;

The neural network is trained with the back-propagation algorithm; the network parameter gradients are computed by stochastic gradient descent; the classification error is computed with cross entropy; and this process is iterated until the average error no longer decreases. The loss function L is:

$$ L = -\sum_{j} y_{j} \log s_{j} $$

where L is the loss function, y_j is the ground-truth label, and s_j is the j-th value of the output vector s of the softmax logistic regression model, representing the probability that the sample belongs to the j-th class.
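As a numerical check of this loss, a tiny sketch with a one-hot label and illustrative softmax probabilities:

```python
import numpy as np

def cross_entropy(y, s):
    """L = -sum_j y_j * log(s_j) for one-hot label y and softmax output s."""
    y, s = np.asarray(y, float), np.asarray(s, float)
    return -np.sum(y * np.log(s))

y = np.array([0.0, 1.0, 0.0])       # true class is j = 1
s = np.array([0.1, 0.7, 0.2])       # softmax probabilities
loss = cross_entropy(y, s)          # = -log(0.7)
```

With a one-hot label, only the predicted probability of the true class contributes, so the loss shrinks toward zero as that probability approaches one.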

Further, in S400, recognizing the action data collected by the sensors through the action recognition model means identifying, via the model, the letter, character, or symbol in the database that corresponds to the action data collected by the sensors.

The present invention also provides a deep-learning-based in-air handwriting interaction system, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program in the units of the following system:

an action acquisition unit, for collecting action data through the sensors;

a preprocessing unit, for preprocessing the action data to obtain an action data set;

a recognition model construction unit, for constructing a neural network and training it on the manually labeled action data set to serve as an action recognition model;

an action recognition model unit, for recognizing the action data collected by the sensors using the action recognition model.
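As an illustrative skeleton only (not the patented implementation), the four units above can be wired together as plain objects, with stubs standing in for the sensor, the training step, and the model:

```python
class ActionCollector:
    """Action acquisition unit: collects action data through a sensor callable."""
    def __init__(self, sensor):
        self.sensor = sensor
    def collect(self, n):
        return [self.sensor() for _ in range(n)]

class Preprocessor:
    """Preprocessing unit: turns raw action data into an action data set."""
    def process(self, data):
        mean = sum(data) / len(data)
        return [x - mean for x in data]   # stub: mean centering only

class ModelBuilder:
    """Recognition model construction unit (training elided in this stub)."""
    def build(self, labeled_data_set):
        return lambda ds: "a" if sum(ds) == 0 else "?"

class Recognizer:
    """Action recognition model unit: maps an action data set to a character."""
    def __init__(self, model):
        self.model = model
    def recognize(self, data_set):
        return self.model(data_set)

# Wire the units together with stub components.
collector = ActionCollector(sensor=lambda: 1.0)
prep = Preprocessor()
recog = Recognizer(ModelBuilder().build([]))
result = recog.recognize(prep.process(collector.collect(60)))
```

In the patented system these units run as parts of one computer program on the processor; the class boundaries here simply mirror the unit boundaries listed in the text.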

The beneficial effects of the present disclosure are as follows: the present invention provides a deep-learning-based in-air handwriting interaction method and system whose main purpose is to facilitate text input for devices such as virtual reality and augmented reality equipment. Compared with the most common text-input devices today, such as keyboards and writing tablets, the invention offers better naturalness and efficiency; compared with other interaction modes, such as voice interaction, it is more robust; and because a neural network model performs the pose-estimation analysis, it achieves higher accuracy.

Brief Description of the Drawings

The above and other features of the present disclosure will become more apparent from the detailed description of the embodiments shown in the accompanying drawings, in which the same reference numerals denote the same or similar elements. Obviously, the drawings described below are only some embodiments of the present disclosure, and those of ordinary skill in the art can derive other drawings from them without creative effort. In the drawings:

Fig. 1 is a flowchart of a deep-learning-based in-air handwriting interaction method;

Fig. 2 is a diagram of a deep-learning-based in-air handwriting interaction system.

Detailed Description

The concept, specific structure, and technical effects of the present disclosure are described below clearly and completely with reference to the embodiments and drawings, so that its purpose, solutions, and effects can be fully understood. It should be noted that, provided they do not conflict, the embodiments of the present application and the features in the embodiments may be combined with one another.

Fig. 1 is a flowchart of a deep-learning-based in-air handwriting interaction method according to the present disclosure; an embodiment of the method is described below with reference to Fig. 1.

The present disclosure proposes a deep-learning-based in-air handwriting interaction method, specifically comprising the following steps:

S100: collect action data through the sensors;

S200: preprocess the action data to obtain an action data set;

S300: construct a neural network and train it on the manually labeled action data set to serve as an action recognition model;

S400: recognize the action data collected by the sensors using the action recognition model.

Further, in S100, the sensor is a gyroscope.

Further, in S200, the methods for preprocessing the action data include, but are not limited to, any one or more of Kalman filtering, mean centering, and principal component analysis.

Further, in S300, the neural network is either an LSTM or a BP (back-propagation) neural network.

Further, in S300, the method of constructing the neural network and training it on the manually labeled action data set to obtain the action recognition model is as follows:

S301: construct either an LSTM or a BP neural network; set the network input to the action data collected by the sensors and the output to the recognized character.

S302: the action data set is manually labeled as follows. Through sampling, every N action data points form one data group (for example N = 60, i.e., 60 action data points per data group). For each data group, the letters, characters, or symbols corresponding to the three most similar pre-stored data groups in a database are read (i.e., after sorting by similarity, the data groups with the top three similarity values). The database stores multiple pre-stored data groups, each correspondingly representing one letter, character, or symbol; the correct letter, character, or symbol is then manually marked among the letters, characters, or symbols output together for those three pre-stored data groups. N defaults to 60 action data points and can be adjusted manually. The similarity between a data group and the pre-stored data groups is calculated as follows: let a data group be Act and the pre-stored data groups be Act_j, j = 1…m, where m is the number of pre-stored data groups in the database; the similarity between Act and Act_j is the number of identical action data points shared by Act and Act_j (for example, if Act and Act_j share 10 identical action data points, the similarity is 10). If pre-stored data groups with the same similarity exist, the letters, characters, or symbols corresponding to them are output together for manual selection and labeling, and the fully labeled action data set is taken as the labeled data set.

S303: train the neural network on the labeled data set to obtain the action recognition model;

The neural network is trained with the back-propagation algorithm; the network parameter gradients are computed by stochastic gradient descent; the classification error is computed with cross entropy; and this process is iterated until the average error no longer decreases. The loss function L is:

$$ L = -\sum_{j} y_{j} \log s_{j} $$

where L is the loss function, y_j is the ground-truth label, and s_j is the j-th value of the output vector s of the softmax logistic regression model, representing the probability that the sample belongs to the j-th class.
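The training procedure described above (back-propagation with stochastic gradient descent on the cross-entropy loss, iterated until the error stops falling) can be sketched for the simplest softmax-regression case, where the gradient of L with respect to the logits is s - y. This is an illustrative sketch, not the patented network.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sgd_step(W, x, y, lr=0.1):
    """One stochastic-gradient step on L = -sum_j y_j log s_j.

    For softmax regression, dL/dz = s - y, so dL/dW = (s - y) x^T.
    """
    s = softmax(W @ x)
    W -= lr * np.outer(s - y, x)
    return W, -np.sum(y * np.log(s))

rng = np.random.default_rng(2)
W = rng.normal(scale=0.1, size=(3, 4))   # 3 classes, 4 features
x = rng.normal(size=4)
y = np.array([0.0, 0.0, 1.0])            # one-hot label
losses = []
for _ in range(50):                      # iterate until the loss stops falling
    W, loss = sgd_step(W, x, y)
    losses.append(loss)
```

For an LSTM, the same gradient of the loss with respect to the logits would be propagated backward through the recurrent cell rather than a single weight matrix.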

Further, in S400, recognizing the action data collected by the sensors through the action recognition model means identifying, via the model, the letter, character, or symbol in the database that corresponds to the action data collected by the sensors.

An embodiment of the present disclosure provides a deep-learning-based in-air handwriting interaction system; Fig. 2 shows a diagram of this system. The system of this embodiment includes a processor, a memory, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the embodiment described above are implemented.

The system includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program in the units of the following system:

an action acquisition unit, for collecting action data through the sensors;

a preprocessing unit, for preprocessing the action data to obtain an action data set;

a recognition model construction unit, for constructing a neural network and training it on the manually labeled action data set to serve as an action recognition model;

an action recognition model unit, for recognizing the action data collected by the sensors using the action recognition model.

The deep-learning-based in-air handwriting interaction system can run on computing devices such as desktop computers, notebooks, palmtop computers, and cloud servers. The runnable system may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that this example merely illustrates the system and does not limit it; the system may include more or fewer components than listed, combine certain components, or use different components. For example, it may also include input/output devices, network access devices, a bus, and so on.

The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the system, connecting all parts of the runnable system through various interfaces and lines.

The memory can be used to store the computer program and/or modules; the processor implements the various functions of the system by running or executing the computer program and/or modules stored in the memory and by calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created according to the use of the device (such as audio data or a phone book). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or other solid-state storage device.

Although the description of the present disclosure has been quite detailed and several embodiments have been described in particular, it is not intended to be limited to any of these details or embodiments or to any particular embodiment; rather, it should be interpreted broadly in view of the prior art with reference to the appended claims, so as to effectively cover the intended scope of the present disclosure. Furthermore, the disclosure has been described above in terms of embodiments foreseeable by the inventors for the purpose of providing a useful description, and insubstantial modifications of the disclosure not presently foreseeable may still represent equivalent modifications thereof.

Claims (5)

1. An air handwriting interaction method based on deep learning, which is characterized by comprising the following steps:
s100, collecting action data through a sensor;
s200, preprocessing action data to obtain an action data set;
s300, constructing a neural network and training the neural network through the manually marked action data set to serve as an action recognition model;
s400, identifying action data acquired by a sensor through an action identification model;
in S300, the method for constructing the neural network and training it through the manually labeled action data set to serve as the action recognition model is as follows:
S301: construct any one of an LSTM network and a BP network, set the input of the neural network to the action data collected by the sensor, and set the output to the judged letter, character or symbol;
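As an illustration of the kind of network S301 names, the following is a minimal sketch of a BP-style (feedforward) classifier in NumPy; it is not the patented implementation, and the layer sizes, the 6-axis input frame and the 26-letter output are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def bp_forward(x, W1, b1, W2, b2):
    """Forward pass of a small BP (multilayer perceptron) network:
    a frame of sensor readings in, raw class scores for the candidate
    characters out (the scores would then be fed to a softmax)."""
    h = np.tanh(x @ W1 + b1)   # hidden layer
    return h @ W2 + b2         # raw class scores

# Assumed sizes: a 6-axis sensor frame mapped to 26 letters.
n_in, n_hidden, n_classes = 6, 32, 26
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_classes)); b2 = np.zeros(n_classes)

scores = bp_forward(rng.normal(size=(1, n_in)), W1, b1, W2, b2)
print(scores.shape)  # (1, 26)
```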
S302: the method for manually labeling the data set of the action data is as follows: taking the length of N pieces of action data as one data group by sampling, read from a database the letters, characters or symbols corresponding to the three pre-stored data groups with the highest similarity to each data group, wherein a plurality of pre-stored data groups are stored in the database and each pre-stored data group represents one letter, character or symbol; the correct letter, character or symbol is then marked manually from among the letters, characters or symbols output for the three pre-stored data groups with the highest similarity. The similarity between a data group and each pre-stored data group of the database is calculated as follows: let each data group be Act, and let the pre-stored data groups of the database be Act_i, i = 1, …, m, where m is the number of pre-stored data groups in the database; the similarity between Act and Act_i is the number of identical action data between the data group Act and each pre-stored data group Act_i of the database. If pre-stored data groups with the same similarity exist, the letters, characters or symbols corresponding to the pre-stored data groups with the same similarity are output simultaneously for manual selection and marking, and the labeled data set of action data is used as the labeling data set;
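The top-three lookup described in S302 can be sketched as follows; reading the similarity as a position-wise count of identical action data is an assumed interpretation of "the number of identical action data", and the symbols and values are made up for illustration:

```python
from typing import Dict, List, Tuple

def top3_similar(act: List[float],
                 prestored: Dict[str, List[float]]) -> List[Tuple[str, int]]:
    """Rank the pre-stored data groups by similarity to the sampled group
    `act` and return the three best candidates for manual labeling.

    Similarity is taken here as the number of entries that match
    position-wise between the two groups (an assumption based on the
    claim's wording); ties keep database order."""
    scores = []
    for symbol, ref in prestored.items():
        sim = sum(1 for a, b in zip(act, ref) if a == b)  # identical entries
        scores.append((symbol, sim))
    scores.sort(key=lambda x: x[1], reverse=True)  # highest similarity first
    return scores[:3]

# Toy database: each pre-stored group represents one letter.
db = {"A": [1, 2, 3, 4], "B": [1, 2, 0, 0], "C": [0, 0, 0, 4], "D": [9, 9, 9, 9]}
print(top3_similar([1, 2, 3, 0], db))  # A and B tie with 3 matches each
```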
S303: train the neural network through the labeling data set to obtain the action recognition model;
the neural network is trained with the back-propagation algorithm; the network parameter gradients are computed by stochastic gradient descent, and the classification error is computed with cross entropy; this process is iterated until the average error no longer decreases. The loss function L is:

L = -Σ_j y_j log(s_j)

where L is the loss function, y_j is the true-value label, and s_j is the j-th value of the output vector s of the softmax logistic regression model, representing the probability that the sample belongs to the j-th class;
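A minimal numeric sketch of the softmax cross-entropy loss described above, with illustrative logits and a one-hot true-value label:

```python
import math

def softmax(logits):
    """Softmax with the usual max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, y_true):
    """L = -sum_j y_j * log(s_j), where s is the softmax of the
    network's raw output and y_true is a one-hot label vector."""
    s = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(y_true, s))

# One-hot target: the sample belongs to class 1.
print(cross_entropy([2.0, 1.0, 0.1], [0, 1, 0]))
```

The loss shrinks toward zero as the logit of the correct class dominates, which is what driving the average error down by stochastic gradient descent achieves.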
in S400, recognizing, through the action recognition model, the action data collected by the sensor means outputting the letter, character or symbol in the database that corresponds to the action data collected by the sensor.
2. The deep-learning-based air handwriting interaction method according to claim 1, wherein in S100, the sensor is a gyroscope.
3. The deep-learning-based air handwriting interaction method according to claim 1, wherein in S200, the method for preprocessing the action data includes any one or more of Kalman filtering, mean centering and principal component analysis.
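Two of the preprocessing options named in claim 3, mean centering and principal component analysis, can be sketched as follows (Kalman filtering is omitted, and the data shape, 100 frames of 6-axis sensor data, is an illustrative assumption):

```python
import numpy as np

def preprocess(X: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Mean-centre the action data and project it onto its principal
    components. A minimal sketch, not the patented pipeline."""
    Xc = X - X.mean(axis=0)                      # mean centering per feature
    # PCA via SVD of the centred data matrix; rows of Vt are components.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # scores on top components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))                    # 100 frames of 6-axis data
Z = preprocess(X, n_components=3)
print(Z.shape)  # (100, 3)
```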
4. The deep-learning-based air handwriting interaction method according to claim 1, wherein in S300, the neural network is any one of an LSTM network and a BP network.
5. An air handwriting interaction system based on deep learning, implementing the deep-learning-based air handwriting interaction method according to claim 1, characterized in that the system comprises: a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor executing the computer program to operate as the units of the following system:
the action acquisition unit is used for acquiring action data through the sensor;
the preprocessing unit is used for preprocessing the action data to obtain an action data set;
the recognition model construction unit is used for constructing a neural network and training the neural network through the action data set after manual labeling to serve as an action recognition model;
and the action recognition unit, used for recognizing, through the action recognition model, the action data collected by the sensor.
CN202010334825.2A 2020-04-24 2020-04-24 Deep learning-based aerial handwriting interaction method and system Active CN111580649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010334825.2A CN111580649B (en) 2020-04-24 2020-04-24 Deep learning-based aerial handwriting interaction method and system

Publications (2)

Publication Number Publication Date
CN111580649A CN111580649A (en) 2020-08-25
CN111580649B true CN111580649B (en) 2023-04-25

Family

ID=72115511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010334825.2A Active CN111580649B (en) 2020-04-24 2020-04-24 Deep learning-based aerial handwriting interaction method and system

Country Status (1)

Country Link
CN (1) CN111580649B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102650905A (en) * 2011-02-23 2012-08-29 西安龙飞软件有限公司 Method utilizing gesture operation in three-dimensional space to realize word input of mobile phone
CN103577843A (en) * 2013-11-22 2014-02-12 中国科学院自动化研究所 Identification method for handwritten character strings in air
WO2019232857A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Handwritten character model training method, handwritten character recognition method, apparatus, device, and medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10891540B2 (en) * 2015-12-18 2021-01-12 National Technology & Engineering Solutions Of Sandia, Llc Adaptive neural network management system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Dan. Multi-sample handwritten character recognition based on a BP neural network. Software. 2016, (07), full text. *


Similar Documents

Publication Publication Date Title
WO2021135469A1 (en) Machine learning-based information extraction method, apparatus, computer device, and medium
CN113673432B (en) Handwriting recognition method, touch display device, computer device and storage medium
CN108416003A (en) A kind of picture classification method and device, terminal, storage medium
WO2021208727A1 (en) Text error detection method and apparatus based on artificial intelligence, and computer device
CN107301248B (en) Word vector construction method and device of text, computer equipment and storage medium
Guan et al. On-device mobile landmark recognition using binarized descriptor with multifeature fusion
CN113158656B (en) Ironic content recognition method, ironic content recognition device, electronic device, and storage medium
WO2022247403A1 (en) Keypoint detection method, electronic device, program, and storage medium
CN112861934A (en) Image classification method and device of embedded terminal and embedded terminal
CN116402166B (en) Training method and device of prediction model, electronic equipment and storage medium
CN113626576A (en) Method and device for extracting relational characteristics in remote supervision, terminal and storage medium
CN112214595A (en) Category determination method, device, equipment and medium
CN114723652B (en) Cell density determination method, device, electronic device and storage medium
Roy et al. CNN based recognition of handwritten multilingual city names
CN110020638B (en) Facial expression recognition method, device, equipment and medium
CN113723077B (en) Sentence vector generation method and device based on bidirectional characterization model and computer equipment
CN110059180B (en) Article author identity recognition and evaluation model training method and device and storage medium
CN114036280A (en) Intelligent question and answer method and device based on emotion recognition, electronic equipment and medium
CN115700828A (en) Table element identification method and device, computer equipment and storage medium
CN111027533B (en) Click-to-read coordinate transformation method, system, terminal equipment and storage medium
CN111580649B (en) Deep learning-based aerial handwriting interaction method and system
WO2021218126A1 (en) Gesture identification method, terminal device, and computer readable storage medium
Zhou et al. Training convolutional neural network for sketch recognition on large-scale dataset.
CN105308535A (en) Hands-free assistance
CN110765942A (en) Image data labeling method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 528000 No. 18, Jiangwan Road, Chancheng District, Guangdong, Foshan

Patentee after: Foshan University

Country or region after: China

Address before: 528000 No. 18, Jiangwan Road, Chancheng District, Guangdong, Foshan

Patentee before: FOSHAN University

Country or region before: China
