
CN118587757A - AR-based emotion data processing method, apparatus, and electronic device


Info

Publication number: CN118587757A
Application number: CN202410900132.3A
Authority: CN (China)
Prior art keywords: user, data, emotion, recognition, features
Legal status: Pending (the status listed is an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Inventor: 崔海涛 (Cui Haitao)
Current and original assignee: Goolton Technology Co ltd (listed assignees may be inaccurate)
Application filed by Goolton Technology Co ltd; priority to CN202410900132.3A; publication of CN118587757A


Classifications

    • G06V40/174 - Facial expression recognition (image or video recognition of human faces)
    • G06F3/013 - Eye tracking input arrangements (interaction between user and computer, e.g. user immersion in virtual reality)
    • G06N3/0464 - Convolutional networks [CNN, ConvNet] (computing arrangements based on neural networks)
    • G06V10/765 - Image or video recognition using machine-learning classification, using rules for classification or partitioning the feature space
    • G06V10/82 - Image or video recognition using neural networks
    • G10L25/30 - Speech or voice analysis characterised by the use of neural networks
    • G10L25/57 - Speech or voice analysis for comparison or discrimination, for processing of video signals
    • G10L25/63 - Speech or voice analysis for comparison or discrimination, for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Acoustics & Sound (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Child & Adolescent Psychology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Hospice & Palliative Care (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Psychiatry (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present application provides an AR-based emotion data processing method, apparatus, and electronic device, relating to the field of data processing. In the method, emotion data for a user is obtained from AR glasses worn by the user; the emotion data includes facial expression data and user voice data. Emotion recognition is performed on the facial expression data to obtain a first recognition result, and voice recognition is performed on the user voice data to obtain a second recognition result. If it is determined that both recognition results indicate that the user's emotional state is negative, current-moment data is obtained; a corresponding processing strategy is generated from the current-moment data, and the AR glasses are controlled to execute the strategy so as to alleviate the user's negative emotion. Implementing this technical solution helps improve the accuracy of emotion recognition.

Description

AR-based emotion data processing method, apparatus, and electronic device

Technical Field

The present application relates to the technical field of data processing, and in particular to an AR-based emotion data processing method, apparatus, and electronic device.

Background Art

Against the existing technical background, recognizing and responding to users' emotional states remains challenging, especially in augmented reality (AR) technology. With the rapid development of AR, wearable devices such as AR glasses have gradually become important tools in daily life and work.

At present, AR glasses in the related art rely mainly on facial recognition for emotion recognition. This approach is often limited by factors such as lighting conditions, occlusion of the user's face, and the naturalness of facial expressions, resulting in low emotion-recognition accuracy.

There is therefore an urgent need for an AR-based emotion data processing method, apparatus, and electronic device.

Summary of the Invention

The present application provides an AR-based emotion data processing method, apparatus, and electronic device that help improve the accuracy of emotion recognition.

A first aspect of the present application provides an AR-based emotion data processing method, the method comprising: obtaining emotion data for a user sent by AR glasses worn by the user, the emotion data including facial expression data and user voice data; performing emotion recognition on the facial expression data to obtain a first recognition result; performing voice recognition on the user voice data to obtain a second recognition result; if it is determined that both the first recognition result and the second recognition result indicate that the user's emotional state is negative, obtaining current-moment data; and generating a corresponding processing strategy from the current-moment data and controlling the AR glasses to execute the processing strategy so as to alleviate the user's negative emotion.

By adopting the above technical solution and combining facial expression data with user voice data, the system can analyze the user's emotional state more comprehensively. Because emotion is usually expressed through several channels at once, including facial expressions and voice, integrating this information significantly improves recognition accuracy. When the system recognizes that the user is in a negative emotional state, it immediately obtains current-moment data and generates a corresponding processing strategy from it; this real-time response capability is crucial for alleviating negative emotions, because it allows timely intervention before the emotion worsens. From the current-moment data, the system can generate a personalized processing strategy, such as providing comforting information or recommending relaxing activities, to meet the user's needs in different situations. By promptly identifying and alleviating negative emotions, the system markedly improves the experience of using the AR glasses: the device is felt not merely as a tool that provides information, but as an intelligent companion that understands and responds to the user's emotional needs. Negative emotions may also be related to mental-health problems; by monitoring the user's emotional state over time, the system can provide more comprehensive health care and, in some cases, act as an early-warning system that alerts the user or medical professionals to possible psychological problems. In sum, more accurate facial and voice recognition improves the accuracy of emotion recognition.

Optionally, obtaining the emotion data for the user sent by the AR glasses specifically includes: receiving raw facial expression data and raw user voice data sent by the AR glasses; and preprocessing the raw facial expression data and the raw user voice data to obtain the emotion data, the preprocessing including denoising, filtering, and normalization.

By adopting the above technical solution, denoising removes noise components that may stem from environmental interference, equipment faults, or data-transmission errors, improving the clarity and accuracy of the data and laying a solid foundation for subsequent emotion recognition. Filtering smooths high-frequency fluctuations in the data, which reduces misjudgments caused by momentary expression changes or voice fluctuations and improves the stability of emotion recognition. Normalization converts data of different sources and scales to a common scale so that they become comparable; in emotion recognition it ensures that facial expression data and voice data receive equal weight, improving fairness and accuracy. Preprocessed data also has better structure and feature representation, which helps subsequent recognition algorithms find the key information faster, so preprocessing improves computational efficiency and lets the system respond to the user's emotional state more quickly. Because preprocessing removes noise and fluctuations from the raw data, the system becomes less sensitive to input variation, which strengthens its robustness and keeps performance stable across environments. Finally, the higher quality and better comparability of preprocessed emotion data provide a better basis for subsequent emotion analysis, model training, and similar steps, and deeper analysis and mining of this data can further improve the accuracy and reliability of emotion recognition.

Optionally, performing emotion recognition on the facial expression data to obtain the first recognition result specifically includes: identifying at least one landmark point included in the facial expression data in a preset landmark-numbering order; determining, among the landmark points, a target landmark point whose value exceeds a landmark threshold, and adding the target landmark point to a set of target facial-feature landmark points; looking up the symmetric counterpart of the target landmark point and, if the symmetric point's value also exceeds the threshold, adding the symmetric point to the set; determining the next landmark point whose value exceeds the threshold and repeating the determining step, until the number of target facial-feature landmark points reaches a preset count threshold; and determining the user's target facial expression from the correspondence between different sets of target facial-feature landmark points and different target facial expressions, the first recognition result including the user's target facial expression, which indicates the user's emotional state.

By adopting the above technical solution, identifying facial features in a preset landmark order ensures the system attends to the key regions of the face, which usually play the decisive role in emotional expression, so the method improves recognition precision. The procedure considers not only individual feature points but also their symmetric counterparts: when both points of a symmetric pair exceed the threshold they are added to the target set, which helps detect subtle expressions such as smiles or frowns, reflects the user's emotional state more accurately, and lets the system adapt to different expressions. Once a sufficient number of target landmark points has been identified (the preset count threshold), the user's facial expression can be determined immediately; this fast response lets the system react to the user's emotional state in time and provide prompt feedback or handling. Because recognition is based on explicit facial feature points, the results are interpretable and credible: when the system recognizes an emotion, it can state exactly which feature points the judgment was based on, which increases the user's trust in the system. The procedure can be adapted to different application scenarios and user needs by adding or removing landmark points, adjusting thresholds, or updating the mapping between feature points and expressions, making the system more adaptable and extensible. Finally, by jointly considering multiple feature points and their symmetric counterparts, the system reduces the false-alarm rate caused by misjudging a single point: an emotion judgment is made only when several key feature points satisfy the conditions, which improves accuracy.

Optionally, performing voice recognition on the user voice data to obtain the second recognition result specifically includes: performing feature recognition on the user voice data to obtain user semantic features, user speaking-rate features, and user pitch features; and inputting the user semantic features, speaking-rate features, and pitch features into a preset recognition model to obtain the user's target acoustic feature, the preset recognition model being a model pre-built and trained with deep learning, the second recognition result including the target acoustic feature, which indicates the user's emotional state.

By adopting the above technical solution, voice recognition is not limited to simple text conversion but analyzes several dimensions of the user's speech, including semantics, speaking rate, and pitch. This multi-dimensional feature analysis captures the user's emotional state more comprehensively and improves recognition accuracy. Using a preset recognition model based on deep learning means the system can handle complex speech data and pattern-recognition tasks: the model automatically learns and optimizes feature extraction and classification, improving both efficiency and accuracy. Deep-learning models also tend to adapt and generalize well, so the system can handle speech from different users, with different accents, speaking rates, and intonations, without training a separate model per user, greatly reducing development cost and time. With model optimization and growing computing power, the system can process and analyze voice data in real time, ensuring an immediate response and timely feedback when the user expresses emotion. Through this deeper analysis the system identifies the user's emotional state more accurately, providing a more personalized and attentive experience while also supplying valuable data for research in sentiment analysis and affective computing. Because recognition rests on explicit voice features and a deep-learning model, the results are interpretable and credible: when the system recognizes an emotional state, it can state which voice features and model outputs the judgment was based on, increasing user trust. The procedure can be extended and customized by updating or adjusting the deep-learning model, adding new voice features, or re-weighting features, so the system adapts to different application scenarios and user needs with more flexible and diverse services.

Optionally, before the user semantic features, speaking-rate features, and pitch features are input into the preset recognition model, the preset recognition model is trained. Training the preset recognition model specifically includes: using a CNN to extract global spatio-temporal features from the user's historical voice data, the global spatio-temporal features including historical semantic features, historical speaking-rate features, and historical pitch features; using an RNN to perform sequence modeling on the global spatio-temporal features to obtain temporal-dependency features; and using a Softmax function to classify the temporal-dependency features and output emotion labels, one emotion label corresponding to one target acoustic feature, the emotion labels including a positive-emotion label, a neutral-emotion label, and a negative-emotion label.

By adopting the above technical solution, using a CNN to extract global spatio-temporal features from the user's historical voice data captures the complex structures and patterns in speech. These features include not only semantic information but also key emotion-bearing elements such as speaking rate and pitch, providing a solid foundation for subsequent emotion recognition. An RNN is used for sequence modeling because voice data is temporal: it handles dependencies in sequential data and captures relationships between different points in time, which is essential for understanding the user's emotional state. The Softmax function classifies the temporal-dependency features output by the RNN and produces emotion labels, converting the model's outputs into a probability distribution so that each label has a corresponding probability; selecting the label with the highest probability as the prediction ensures accurate and efficient classification. The labels cover positive, neutral, and negative emotions and therefore indicate the user's state unambiguously, helping the system understand the user's emotional needs and provide corresponding services and feedback. Training on the user's historical voice data lets the model learn the voice characteristics and emotional expression styles of different users, giving it strong generalization ability so that it can process different users' speech and recognize their emotional states accurately. Once trained, the preset recognition model can perform emotion recognition as soon as voice data is input, enabling real-time response so the system captures emotional changes promptly and reacts with appropriate feedback and services.
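
The patent text contains no code; purely as an illustration, the following is a minimal PyTorch sketch of the CNN, then RNN, then Softmax pipeline described above. The layer sizes, the choice of a GRU as the RNN, and the input feature layout are assumptions, not the patent's specification.

```python
import torch
import torch.nn as nn

class EmotionNet(nn.Module):
    """Sketch of the CNN -> RNN -> Softmax training pipeline.

    Input: a batch of acoustic feature sequences shaped (batch, time,
    n_features), where n_features stands in for the concatenated
    semantic / speaking-rate / pitch features. All sizes are assumed.
    """

    def __init__(self, n_features: int = 40, hidden: int = 64, n_classes: int = 3):
        super().__init__()
        # CNN stage: extracts local spatio-temporal patterns along the time axis.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # RNN stage (a GRU here): models temporal dependencies in the sequence.
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        # Classification head: 3 emotion labels (positive / neutral / negative).
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.cnn(x.transpose(1, 2)).transpose(1, 2)  # (B, T, H)
        _, h = self.rnn(x)                               # final hidden state, (1, B, H)
        logits = self.head(h.squeeze(0))                 # (B, 3)
        # Softmax turns the logits into one probability per emotion label.
        # (During training, the raw logits would go to CrossEntropyLoss.)
        return torch.softmax(logits, dim=-1)

model = EmotionNet()
probs = model(torch.randn(2, 100, 40))  # two dummy feature sequences
print(probs.shape)                      # torch.Size([2, 3])
```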

Optionally, generating the corresponding processing strategy from the current-moment data specifically includes: if the current-moment data indicates that the current moment is daytime, generating a first processing strategy that includes an outdoor-activity planning strategy for the user, displayed on the AR glasses; and if the current-moment data indicates that the current moment is night, generating a second processing strategy that includes displaying preset text and playing preset audio.

By adopting the above technical solution, the system uses the current-moment data to recognize intelligently whether it is daytime or night and generates a processing strategy suited to each period; this adaptivity brings the service closer to the user's actual needs. During the day, the generated outdoor-activity planning strategy can recommend suitable venues, routes, and so on based on the user's historical behavior and preferences, enhancing the outing experience; such personalization improves user satisfaction and the practical value of the system. When the system determines that it is night, the generated strategy focuses on displaying preset text and playing preset audio. This night mode reduces the disturbance caused by screen brightness and avoids excessive visual interaction at night, improving the user's comfort. Presenting the processing strategy on the AR glasses lets the user obtain the needed information conveniently, without extra devices or operations, which improves ease of use and strengthens the practicality of AR glasses as a wearable. Generating different strategies for different periods gives the user a coherent, consistent service experience whenever the glasses are used, which builds trust in and reliance on the system. Lowering screen brightness and reducing visual interaction at night also helps save energy and reduce the device's consumption, in line with green, environmentally friendly principles.
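
As a minimal sketch of this branch, the snippet below maps the current-moment data to one of the two strategies. The 6:00-18:00 daytime window and the action names are assumptions; the patent only distinguishes daytime from night.

```python
from datetime import datetime

def build_strategy(now: datetime) -> dict:
    """Map the 'current moment data' to a processing strategy."""
    if 6 <= now.hour < 18:
        # First processing strategy: plan an outing, shown on the AR glasses.
        return {"strategy": "first", "actions": ["display_outdoor_activity_plan"]}
    # Second processing strategy: calm the user with preset text and audio.
    return {"strategy": "second",
            "actions": ["display_preset_text", "play_preset_audio"]}

print(build_strategy(datetime(2024, 7, 1, 21, 30)))
# {'strategy': 'second', 'actions': ['display_preset_text', 'play_preset_audio']}
```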

Optionally, the method further includes: if it is determined that the first recognition result indicates that the user's emotional state is positive while the second recognition result indicates that it is negative, or that the first recognition result indicates a negative state while the second recognition result indicates a positive one, generating a consultation strategy and controlling the AR glasses to execute the consultation strategy, the consultation strategy being used to determine the user's emotional state.

By adopting the above technical solution: facial-expression analysis and voice recognition are two different routes to emotion recognition, and they may produce inconsistent results for various reasons. When that happens, the generated consultation strategy serves as a verification step that confirms the user's true emotional state through direct interaction, improving recognition accuracy. Generating and executing the consultation strategy also ensures the system does not provide inappropriate services or feedback because of a misjudged emotional state; for example, if the system mistook a negative emotion for a positive one and offered unsuitable suggestions or information, the user could feel troubled or dissatisfied, and the consultation strategy prevents this and so optimizes the user experience. Actively generating a consultation strategy and interacting with the user when the recognition results conflict reflects a high degree of intelligence and interactivity; this interaction not only helps resolve the current recognition problem but also strengthens the user's trust in and reliance on the system. The consultation strategy can provide personalized support based on the user's actual state: for example, if the system determines the user is currently in a negative state, the strategy can guide the user to vent, or offer emotion-regulation advice or resources, helping the user manage emotions better. Executing the strategy and interacting with the user also yields more data about the user's emotional state, which can be used to further train and optimize the recognition model, improving its future accuracy and efficiency. Verifying recognition results in this way lowers the risk of misjudgment and makes the services and feedback the system provides more reliable and safe.
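
Putting the agreement and conflict branches together, a small sketch of the combined decision logic follows. The label strings and action names are illustrative; the patent defines the branches, not their encoding.

```python
def combine_results(facial: str, voice: str) -> str:
    """Combine the two recognition results into a next action."""
    if facial == voice == "negative":
        # Both modalities agree on negative emotion: generate a processing strategy.
        return "processing_strategy"
    if {facial, voice} == {"negative", "positive"}:
        # The modalities conflict: confirm the state via a consultation strategy.
        return "consultation_strategy"
    return "no_action"

assert combine_results("negative", "negative") == "processing_strategy"
assert combine_results("positive", "negative") == "consultation_strategy"
```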

The present application further provides an AR-based emotion data processing apparatus, the apparatus including an obtaining module and a processing module, wherein the obtaining module is configured to obtain emotion data for a user sent by AR glasses worn by the user, the emotion data including facial expression data and user voice data; the processing module is configured to perform emotion recognition on the facial expression data to obtain a first recognition result; the processing module is further configured to perform voice recognition on the user voice data to obtain a second recognition result; the processing module is further configured to obtain current-moment data if it is determined that both the first recognition result and the second recognition result indicate that the user's emotional state is negative; and the processing module is further configured to generate a corresponding processing strategy from the current-moment data and control the AR glasses to execute the processing strategy so as to alleviate the user's negative emotion.

A third aspect of the present application provides an electronic device including a processor, a memory, a user interface, and a network interface, the memory being configured to store instructions, the user interface and the network interface both being configured to communicate with other devices, and the processor being configured to execute the instructions stored in the memory so that the electronic device performs the method described above.

A fourth aspect of the present application provides a computer-readable storage medium storing instructions that, when executed, perform the method described above.

In summary, the one or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:

1. By combining facial expression data and user voice data, the system can analyze the user's emotional state more comprehensively; because emotion is usually expressed through several channels at once, including facial expressions and voice, integrating this information significantly improves recognition accuracy. When the system recognizes a negative emotional state, it immediately obtains current-moment data and generates a corresponding processing strategy, and this real-time response allows timely intervention before the emotion worsens. From the current-moment data the system generates a personalized strategy, such as comforting information or recommended relaxing activities, to meet the user's needs in different situations. Promptly identifying and alleviating negative emotions markedly improves the experience of using the AR glasses: the device is felt not merely as an information tool but as an intelligent companion that understands and responds to the user's emotional needs. Negative emotions may also relate to mental-health problems; long-term monitoring of the user's emotional state enables more comprehensive health care and, in some cases, an early warning to the user or medical professionals about possible psychological problems. Thus, more accurate facial and voice recognition improves emotion-recognition accuracy;

2. Once the system has identified a sufficient number of target facial-feature landmark points (reaching the preset count threshold), it can determine the user's facial expression immediately; this fast response lets the system react to the user's emotional state in time and provide prompt feedback or handling. Recognition is based on explicit facial feature points, so the results are interpretable and credible: when the system recognizes an emotion, it can state which feature points the judgment rests on, increasing the user's trust in the system. The procedure adapts to different application scenarios and user needs by adding or removing landmark points, adjusting thresholds, or updating the mapping between feature points and expressions, making the system more adaptable and extensible. By jointly considering multiple feature points and their symmetric counterparts, the system reduces the false-alarm rate caused by a single misjudged point: an emotion judgment is made only when several key feature points satisfy the conditions, which improves recognition accuracy.

Brief Description of the Drawings

FIG. 1 is a schematic flowchart of an AR-based emotion data processing method provided in an embodiment of the present application.

FIG. 2 is another schematic flowchart of an AR-based emotion data processing method provided in an embodiment of the present application.

FIG. 3 is a schematic block diagram of an AR-based emotion data processing apparatus provided in an embodiment of the present application.

FIG. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.

Reference numerals: 31, obtaining module; 32, processing module; 41, processor; 42, communication bus; 43, user interface; 44, network interface; 45, memory.

Detailed Description

To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present application.

In the description of the embodiments of the present application, words such as "for example" are used to give an example, illustration, or explanation. Any embodiment or design described as "for example" should not be construed as preferred over, or more advantageous than, other embodiments or designs; rather, such words are intended to present the relevant concepts in a concrete way.

In the description of the embodiments of the present application, the term "multiple" means two or more; for example, multiple systems means two or more systems, and multiple screen terminals means two or more screen terminals. The terms "first" and "second" are used for description only and must not be understood as indicating or implying relative importance or implicitly specifying the indicated technical features; a feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more such features. The terms "include", "comprise", "have", and their variants mean "including but not limited to" unless specifically emphasized otherwise.

In the current technical context, although augmented reality (AR) technology is advancing rapidly, accurately recognizing and promptly responding to users' emotional states, particularly in AR applications, remains an unsolved problem. With the spread of wearable devices such as AR glasses, they have gradually become capable assistants in our daily life and work.

However, the emotion-recognition capability of current AR glasses relies mainly on facial recognition. In practice this technology is subject to many limitations, such as changing lighting conditions, occlusion of the user's face, and the naturalness of facial expressions, all of which can greatly reduce the accuracy of emotion recognition.

To solve the above technical problems, the present application provides a schematic flow of an AR-based emotion data processing method. The method is applied to a server and includes steps S110 to S150, as follows:

S110. Obtain emotion data for the user sent by the AR glasses, the user wearing the AR glasses, the emotion data including facial expression data and user voice data.

Specifically, a server is a computer or device in a network that stores, processes, or forwards data; in this scenario it receives the emotion data sent by the AR glasses. AR glasses are an augmented reality (AR) device that lets the user see virtual elements combined with the real world through the glasses' display. Besides display functions, AR glasses may contain various sensors and cameras that capture the user's movements, position, expressions, and other information. By wearing AR glasses the user experiences the interactivity and immersion of augmented reality, while the glasses continuously collect data about the user, including facial expressions and voice. Emotion data is data that reflects the user's emotional state and, in the embodiments of the present application, has two parts. Facial expression data is the expression information captured by the camera on the AR glasses, which tracks changes in the user's expression in real time and converts them into data; for example, when the user smiles, the camera captures the curvature of the mouth and the change in eye shape. User voice data is the sound data produced through the glasses' microphone or other audio input device; it contains the user's intonation, speaking rate, volume, and similar information that reflects the emotional state. For example, when the user is angry the speaking rate may increase and the volume may rise, and these changes can be captured through the voice data.

In one possible implementation, obtaining the emotion data for the user sent by the AR glasses specifically includes: receiving raw facial expression data and raw user voice data sent by the AR glasses; and preprocessing the raw facial expression data and raw user voice data to obtain the emotion data, the preprocessing including denoising, filtering, and normalization.

Specifically, the server first receives the raw facial expression data and raw user voice data sent by the AR glasses; these data are unprocessed, captured directly by the glasses' camera and microphone. Preprocessing is a series of operations on the raw data that removes noise, improves data quality, and makes the data more suitable for subsequent analysis and processing; it includes denoising, filtering, and normalization. Because the environment may contain various interference factors (lighting changes, background noise, and so on), raw data often contains noise, and denoising aims to reduce or eliminate it, making the data cleaner. Filtering further smooths the data by removing high-frequency noise or unwanted frequency components, making the data more stable and easier to analyze. Normalization scales the data into a fixed range (for example, between 0 and 1) so that data of different sources and scales can be compared and analyzed on a common scale; in emotion-data analysis it helps ensure that different features receive equal weight, preventing features with large numeric ranges from unduly dominating the result.

For example, suppose a user is wearing AR glasses whose application captures the user's facial expressions and voice to analyze the user's emotional state and adjust its behavior accordingly. The glasses' camera continuously captures facial expression data, such as the movements and position changes of the eyes, eyebrows, and mouth, while the microphone records voice data such as intonation and speaking rate in real time; these raw data are streamed to the server. On receiving the raw facial data, the server finds shadows and reflections caused by lighting changes and removes them with image-processing algorithms such as Gaussian blur or median filtering; for the voice data, it removes background noise with audio-processing techniques such as spectral analysis or wavelet transforms. For the facial data, the server may use edge and corner detection to extract facial feature points and then filter them to suppress high-frequency noise and jitter; for the voice data, it may apply band-pass or low-pass filters to smooth the signal further. Finally, for the subsequent emotion analysis the server normalizes data from different sources: facial features such as eye width and mouth height can be scaled into the range 0 to 1, and voice features such as fundamental frequency and volume can be normalized similarly.
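
As a minimal sketch of this preprocessing step, the snippet below denoises and normalizes a face frame with OpenCV and band-pass-filters a voice signal with SciPy. The kernel sizes, the 80-4000 Hz speech band, and the sample rate are assumptions, not values from the patent.

```python
import numpy as np
import cv2
from scipy.signal import butter, lfilter

def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Denoise a face image and normalize pixel values to [0, 1]."""
    denoised = cv2.GaussianBlur(frame, (5, 5), 0)  # suppress lighting noise
    smoothed = cv2.medianBlur(denoised, 5)         # remove salt-and-pepper noise
    return smoothed.astype(np.float32) / 255.0     # normalization step

def preprocess_audio(signal: np.ndarray, sr: int = 16_000) -> np.ndarray:
    """Band-pass the voice signal to a speech band, then peak-normalize."""
    nyquist = sr / 2
    b, a = butter(4, [80 / nyquist, 4000 / nyquist], btype="band")
    filtered = lfilter(b, a, signal)
    peak = np.max(np.abs(filtered))
    return filtered / peak if peak > 0 else filtered

frame = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # dummy face image
audio = np.random.randn(16_000)                              # one second of noise
print(preprocess_frame(frame).max() <= 1.0, preprocess_audio(audio).shape)
```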

S120. Perform emotion recognition on the facial expression data to obtain the first recognition result.

Specifically, the camera of the AR glasses first captures images or video of the user's face and sends the data to the server or a local processor for analysis. Once the facial expression data is obtained, the next step is to extract emotion-related features, such as the shape of the eyebrows, the degree to which the eyes are open, and the degree to which the mouth is open; feature extraction typically involves image-processing and computer-vision techniques such as edge detection, shape recognition, and texture analysis. With the emotion-related features extracted, emotion recognition can proceed. It is a classification task that matches the input facial expression data against predefined emotion categories, such as happiness, sadness, anger, and surprise. Various machine-learning algorithms can be used for this classification, such as support vector machines (SVMs), neural networks (NNs), and deep-learning models such as convolutional neural networks (CNNs). The result of emotion recognition is one or more emotion categories representing the system's estimate of the user's current emotional state.

In one possible implementation, performing emotion recognition on the facial expression data to obtain the first recognition result specifically includes: identifying at least one landmark point included in the facial expression data in the preset landmark-numbering order; determining a target landmark point whose value exceeds the landmark threshold and adding it to the set of target facial-feature landmark points; looking up the symmetric counterpart of the target landmark point and adding it to the set if its value also exceeds the threshold; determining the next landmark point whose value exceeds the threshold and repeating the determining step, until the number of target facial-feature landmark points reaches the preset count threshold; and determining the user's target facial expression from the correspondence between different sets of target facial-feature landmark points and different target facial expressions, the first recognition result including the user's target facial expression, which indicates the user's emotional state.

Specifically, the server first identifies at least one landmark point in the facial expression data in the preset numbering order; these landmarks are usually key points on the face, such as the corners of the eyes and mouth or particular positions on the eyebrows. The server then checks each landmark's value, which may be a pixel intensity; if a landmark's value exceeds the set threshold, it is considered a target landmark and added to the set of target facial-feature landmark points. To capture more complete facial features, the server also looks up the symmetric counterpart of each target landmark and adds it to the set if its value likewise exceeds the threshold. After identifying a landmark or a symmetric pair, the server continues checking the remaining landmarks, repeating these steps until the number of target landmarks reaches the preset count threshold. Having collected enough target landmarks, the server determines the user's target facial expression from the preset correspondence between landmark sets and facial expressions, which may be a trained model or a lookup table. Finally, the server outputs the first recognition result, the user's target facial expression, which indicates the user's emotional state.

For example, suppose a user is wearing AR glasses that judge the user's emotions by analyzing facial expressions. The server scans the facial key points in the preset order (say, left to right and top to bottom): left eye corner, right eye corner, left mouth corner, right mouth corner, and so on. While checking these points, it finds that the user's left mouth corner is clearly raised and its value exceeds the threshold, so it takes the left mouth corner as a target positioning point and adds it to the set of target facial feature positioning points. It then looks up the symmetric point, the right mouth corner; if that corner is also raised and its value exceeds the threshold, it is added too. After the two mouth corners, the server may still need to check other key points, such as the eyebrows and eyes, to find further target positioning points. Once enough points have been collected, the server matches them against the expression mapping (e.g., "raised mouth corners" corresponds to "smile") and determines the user's target facial expression to be "smile". Finally, the server outputs the first recognition result: the user's facial expression is "smile", indicating a positive or happy emotional state.
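The selection loop above can be made concrete with a short, hedged Python sketch. The `SYMMETRIC` pairing table, the `EXPRESSIONS` mapping, the thresholds, and the point encoding are illustrative assumptions only; the patent does not fix how points are stored, and the expression correspondence may equally be a trained model rather than a lookup table.

```python
# Hedged sketch of the positioning-point selection loop; names and data
# layout are illustrative, not taken from the patent.
SYMMETRIC = {"mouth_left": "mouth_right", "mouth_right": "mouth_left",
             "eye_left": "eye_right", "eye_right": "eye_left"}
EXPRESSIONS = {frozenset({"mouth_left", "mouth_right"}): "smile"}

def recognize_expression(points, threshold=0.5, count_threshold=2):
    """points: list of (point_id, value) in the preset numbering order."""
    values = dict(points)
    selected = set()
    for point_id, value in points:               # preset numbering order
        if point_id in selected or value <= threshold:
            continue
        selected.add(point_id)                   # target positioning point
        twin = SYMMETRIC.get(point_id)           # symmetric counterpart
        if twin is not None and values.get(twin, 0.0) > threshold:
            selected.add(twin)
        if len(selected) >= count_threshold:     # preset count reached
            break
    return EXPRESSIONS.get(frozenset(selected))  # None when nothing matches

print(recognize_expression([("eye_left", 0.2), ("mouth_left", 0.9),
                            ("mouth_right", 0.8)]))  # -> smile
```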

S130: Perform speech recognition on the user voice data to obtain a second recognition result.

Specifically, speech recognition is a technology that converts human speech into machine-readable text or commands. In this process, the server receives the user's voice data (typically a sound-wave signal), analyzes and processes it with a speech recognition algorithm or model, and finally converts it into the corresponding text or command. The "second recognition result" here is named relative to the "first recognition result" above (the result of emotion recognition on the facial expression data); it refers to the output the server obtains after performing speech recognition on the user voice data, usually a piece of text or a command.

In a possible implementation, performing speech recognition on the user voice data to obtain the second recognition result specifically includes: performing feature recognition on the user voice data to obtain user semantic features, user speech-rate features, and user pitch features; and inputting these features into a preset recognition model to obtain the user's target acoustic features. The preset recognition model is built and trained in advance based on deep learning. The second recognition result includes the target acoustic features, which indicate the user's emotional state.

Specifically, user semantic features usually refer to the meaning expressed by the words, phrases, or sentences recognized from the user's voice data, obtained with natural language processing (NLP) techniques such as word segmentation, part-of-speech tagging, and named entity recognition. Speech rate is the speed at which the user speaks, measurable as syllables or words per second, and can reflect the user's psychological state, such as tension, excitement, or relaxation. Pitch is the rise and fall of the voice and is usually closely tied to emotional state: a high pitch may indicate excitement or surprise, while a low pitch may indicate frustration or calm. The preset recognition model is a deep learning model built and trained in advance to recognize acoustic features in the user's voice; deep learning models such as recurrent neural networks (RNNs), long short-term memory networks (LSTMs), or Transformers are commonly used for sequence data such as speech or text. During training, the model learns to extract, from the semantic, speech-rate, and pitch features, the acoustic features associated with each target emotional state. These target acoustic features are the output obtained after the preset recognition model processes the user's voice data, and they indicate the user's emotional state, such as happy, angry, surprised, or frustrated. The second recognition result includes these target acoustic features, which carry not only the textual content of the speech but also important cues about the user's emotional state. For example, if the user sighs "alas" repeatedly, the server can determine that the user's emotional state is negative.
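To make the speech-rate and pitch features more tangible, here is a hedged NumPy sketch. Splitting the transcript on whitespace and estimating pitch by autocorrelation are simplifying assumptions for illustration, not the patent's prescribed methods.

```python
import numpy as np

def speech_rate(transcript: str, duration_s: float) -> float:
    """Words per second, a simple proxy for the speech-rate feature."""
    return len(transcript.split()) / duration_s

def estimate_pitch(frame: np.ndarray, sample_rate: int,
                   fmin: float = 80.0, fmax: float = 400.0) -> float:
    """Crude fundamental-frequency estimate (Hz) via autocorrelation."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sample_rate / fmax), int(sample_rate / fmin)
    lag = lo + int(np.argmax(corr[lo:hi]))        # strongest periodic lag
    return sample_rate / lag

sr = 16000
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 220.0 * t)             # synthetic 220 Hz "voice"
print(round(estimate_pitch(frame, sr)))           # close to 220
print(speech_rate("alas alas alas", 1.5))         # 2.0 words per second
```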

S140: If it is determined that both the first recognition result and the second recognition result indicate that the user's emotional state is a negative emotion, acquire the current-time data.

Specifically, the first recognition result is obtained by running an emotion recognition algorithm on the user's facial expression data and usually expresses the user's emotional state as positive, negative, neutral, and so on. The second recognition result is obtained from the user's voice data through speech recognition and speech emotion analysis, and likewise expresses the user's emotional state. The server compares the two results to check whether they agree. If both indicate a negative emotion (such as frustration, anger, or anxiety), the server concludes that the user's current emotional state is negative; likewise, if both indicate a positive emotion (such as happiness or joy), the server concludes that it is positive. Here, the current-time data records the moment at which the user's emotional state occurred.
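A minimal sketch of this agreement check, assuming a string label per recognition result and using the wall clock as the source of the current-time data (the patent specifies neither):

```python
from datetime import datetime
from typing import Optional

NEGATIVE = {"frustrated", "angry", "anxious"}   # illustrative label set

def fuse(first: str, second: str) -> Optional[datetime]:
    """Return the current-time data only when both results are negative."""
    if first in NEGATIVE and second in NEGATIVE:
        return datetime.now()    # the moment the emotional state occurred
    return None

print(fuse("angry", "frustrated"))   # a timestamp -> strategy generation
print(fuse("happy", "frustrated"))   # None -> no strategy is triggered
```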

S150: Generate a corresponding processing strategy based on the current-time data, and control the AR glasses to execute the processing strategy so as to relieve the user's negative emotion.

Specifically, the server combines the current-time data with preset algorithms and models to generate one or more processing strategies. These strategies are designed to relieve the user's negative emotion and may include offering personalized suggestions, displaying relevant content or information, playing relaxing music, or adjusting environment settings. Once a strategy has been generated, the server sends the instruction to the AR glasses over a wireless link (such as Wi-Fi or Bluetooth). On receiving the instruction, the AR glasses carry out the corresponding operations, such as displaying specific content, playing music, or issuing a vibration reminder. By executing these strategies, the AR glasses try to create a more positive, comfortable environment for the user, or to provide information and suggestions that help improve the user's mood. This helps ease the user's negative emotion and may improve their satisfaction and overall experience.

For example, the server analyzes the data and finds that the user's emotional state is negative. It therefore generates a processing strategy that includes playing relaxing music through the AR glasses and displaying prompts that help the user search for comforting, feel-good films, and sends the corresponding instructions to the glasses. On receiving them, the AR glasses perform the operations: the user hears soothing music and gets help with the search, which improves the user's emotional state and leaves them more satisfied and comfortable.

In a possible implementation, generating the corresponding processing strategy based on the current-time data specifically includes: if the current-time data indicates that it is daytime, generating a first processing strategy that includes an outdoor-activity planning strategy for the user, displayed on the AR glasses; and if the current-time data indicates that it is nighttime, generating a second processing strategy that includes displaying preset text and playing preset audio.

Specifically, the server can tell from the current-time data whether it is currently day or night, based on preset time windows such as sunrise-to-sunset for daytime and sunset-to-next-sunrise for nighttime. If the data indicates daytime, the server generates the first processing strategy, which includes an outdoor-activity plan for the user: a series of activities such as hiking, picnicking, or sightseeing, planned around factors like the user's interests, location, and the weather, and shown to the user through the AR glasses. If the data indicates nighttime, the server generates the second processing strategy, which leans toward indoor activities and relaxation: displaying preset text (such as comforting or inspirational quotes) and playing preset audio (such as light music or nature sounds), both delivered through the AR glasses.

For example, suppose the user feels down for some reason while the server is generating a processing strategy from the current-time data. If the data indicates that it is 10 a.m. (daytime), the server generates the first processing strategy, perhaps an outing plan such as: "The weather is clear today and good for outdoor activities. We recommend a nearby forest park where you can hike and picnic." The plan is rendered in AR on the user's glasses, so the user can directly see the park's map, routes, nearby facilities, and other information. If the data indicates that it is 8 p.m. (nighttime), the server generates the second processing strategy instead, which may include showing preset text on the glasses, such as: "Every night is a new beginning. Let go of the day's fatigue; tomorrow will be better." At the same time, the server directs the AR glasses to play preset light music or nature sounds, such as ocean waves or forest insects, to help the user relax and fall asleep.
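A hedged sketch of this day/night branch, assuming fixed 06:00/18:00 boundaries for the preset daytime window and simple dictionaries for the two strategies (both are illustrative choices):

```python
from datetime import datetime, time

# Illustrative day window; the patent only says the period is preset.
SUNRISE, SUNSET = time(6, 0), time(18, 0)

def build_strategy(now: datetime) -> dict:
    """Pick the first (daytime) or second (nighttime) processing strategy."""
    if SUNRISE <= now.time() < SUNSET:
        return {"type": "first", "display": "outdoor activity plan"}
    return {"type": "second",
            "display": "Every night is a new beginning.",
            "audio": "light_music.ogg"}

print(build_strategy(datetime(2024, 7, 5, 10, 0))["type"])  # first
print(build_strategy(datetime(2024, 7, 5, 20, 0))["type"])  # second
```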

In a possible implementation, with reference to FIG. 2, which is another flow chart of the AR-based emotion data processing method provided in an embodiment of the present application, the preset recognition model is trained before the user semantic features, user speech-rate features, and user pitch features are input into it. Training the preset recognition model includes steps S210 to S230, as follows: S210, use a CNN to extract global spatio-temporal features from the user's historical voice data, the global spatio-temporal features including the user's historical semantic features, historical speech-rate features, and historical pitch features; S220, use an RNN to perform sequence modeling on the global spatio-temporal features to obtain time-dependency features; S230, use a Softmax function to classify the time-dependency features and output emotion labels, where one emotion label corresponds to one target acoustic feature and the emotion labels include a positive-emotion label, a neutral-emotion label, and a negative-emotion label.

Specifically, for the preset recognition model to recognize emotions accurately, it must be trained on data with known labels; during training, the model learns to extract features from the input data and map them to the corresponding emotion labels. The CNN (convolutional neural network) here processes one-dimensional time-series data such as speech signals and is used to extract global spatio-temporal features from the user's historical voice data: historical semantic features (semantic information extracted from the speech content), historical speech-rate features (the rate or rhythm of the speech), and historical pitch features (the pitch of the voice). The RNN (recurrent neural network) is particularly suited to sequence data because it captures temporal dependencies; here it performs sequence modeling on the features the CNN extracted, yielding features that encode time-dependency information. The Softmax function is a multi-class classification function that converts the model's output into a probability distribution, one probability per candidate emotion label; in this example it maps the RNN's time-dependency features onto three labels: a positive-emotion label, a neutral-emotion label, and a negative-emotion label.
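The training pipeline can be sketched in PyTorch as follows. The layer sizes, the choice of a GRU as the RNN, and the per-frame feature layout are assumptions made for illustration; the patent fixes only the CNN-then-RNN-then-Softmax structure.

```python
import torch
import torch.nn as nn

class EmotionNet(nn.Module):
    """CNN extracts local features, RNN models the sequence, Softmax classifies."""
    def __init__(self, n_features=3, hidden=64, n_labels=3):
        super().__init__()
        self.cnn = nn.Sequential(                 # 1-D conv over the time axis
            nn.Conv1d(n_features, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.rnn = nn.GRU(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_labels)   # logits for 3 emotion labels

    def forward(self, x):                         # x: (batch, time, n_features)
        z = self.cnn(x.transpose(1, 2)).transpose(1, 2)
        _, h = self.rnn(z)                        # final hidden state
        return torch.softmax(self.head(h[-1]), dim=-1)

# Toy batch: 4 utterances, 100 frames, 3 per-frame features
# (semantic score, speech rate, pitch).
probs = EmotionNet()(torch.randn(4, 100, 3))
print(probs.shape)   # torch.Size([4, 3]) -> positive / neutral / negative
```

In actual training one would typically apply a cross-entropy loss to the pre-Softmax logits rather than to the probabilities shown here.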

In a possible implementation, the method further includes: if it is determined that the first recognition result indicates a positive emotional state while the second recognition result indicates a negative one, or that the first recognition result indicates a negative emotional state while the second indicates a positive one, generating a consultation strategy and controlling the AR glasses to execute it; the consultation strategy is used to determine the user's emotional state.

Specifically, if the two recognition results disagree, say one indicates that the user is in a positive emotional state while the other indicates a negative one, the server needs to establish the user's true emotional state. In that case it generates a consultation strategy: showing the user questions or prompts through the AR glasses so the user can confirm their own emotional state, or interacting with the user in other ways (such as voice prompts) to obtain more accurate emotional information. Once the strategy has been generated, the server controls the AR glasses to execute it, which can mean displaying the questions or prompts on the glasses' display or playing voice prompts through the glasses' audio output. By running the consultation strategy, the server obtains the user's direct feedback about their own emotional state and can determine it more accurately.

For example, suppose a user is wearing AR glasses and the server assesses the user's emotional state through the two channels. The first recognition result (from the facial expression data) might show a frustrated, dejected face and thus indicate a negative emotional state, while the second recognition result (from the voice data) might find the voice full of enthusiasm and energy and thus indicate a positive one. Detecting this inconsistency, one judgment positive and the other negative, the server generates a consultation strategy that asks about the user's current mood by displaying a message on the glasses, for example: "Are you feeling happy or unhappy?", possibly accompanied by a matching voice prompt through the glasses' audio output. The AR glasses execute the strategy as instructed, showing the message on the display and playing the prompt. After seeing and hearing the prompt, the user confirms their actual emotional state by voice, gesture, or another interaction. Based on this feedback, the server finally determines the user's emotional state and adopts the corresponding processing strategy.
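One way to express this fallback, sketched here with a callback standing in for the AR-glasses prompt (the real interaction, an on-screen message plus a voice prompt, is device-specific and not fixed by the patent):

```python
def resolve_emotion(first: str, second: str, ask_user) -> str:
    """Trust agreeing results; otherwise fall back to a consultation strategy.

    ask_user is an illustrative callback standing in for the AR-glasses
    prompt and the user's reply.
    """
    if first == second:
        return first
    return ask_user("Are you feeling happy or unhappy?")

# Simulated consultation: the user answers through the glasses.
print(resolve_emotion("negative", "positive",
                      ask_user=lambda q: "negative"))  # negative
```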

The present application further provides an AR-based emotion data processing device; see FIG. 3, a module diagram of an AR-based emotion data processing device provided in an embodiment of the present application. The emotion data processing device is a server that includes an acquisition module 31 and a processing module 32. The acquisition module 31 acquires the emotion data for the user sent by the AR glasses, where the user wears the AR glasses and the emotion data includes facial expression data and user voice data. The processing module 32 performs emotion recognition on the facial expression data to obtain a first recognition result; performs speech recognition on the user voice data to obtain a second recognition result; acquires the current-time data if it determines that both recognition results indicate that the user's emotional state is negative; and generates the corresponding processing strategy from the current-time data, controlling the AR glasses to execute it so as to relieve the user's negative emotion.

In a possible implementation, the acquisition module 31 acquiring the emotion data for the user sent by the AR glasses specifically includes: the acquisition module 31 receiving the raw facial expression data and raw user voice data sent by the AR glasses, and the processing module 32 preprocessing the raw facial expression data and raw user voice data to obtain the emotion data, the preprocessing including denoising, filtering, and normalization.
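A hedged sketch of such a preprocessing step for a one-dimensional signal; the moving-average filter and min-max normalization below are illustrative choices, since the patent names the steps (denoising, filtering, normalization) without fixing the algorithms:

```python
import numpy as np

def preprocess(raw: np.ndarray, kernel: int = 5) -> np.ndarray:
    """Denoise/filter with a moving average, then normalize to [0, 1]."""
    window = np.ones(kernel) / kernel
    smoothed = np.convolve(raw, window, mode="same")   # simple low-pass
    lo, hi = smoothed.min(), smoothed.max()
    return (smoothed - lo) / (hi - lo + 1e-9)          # min-max normalize

noisy = np.sin(np.linspace(0, 6.28, 200)) + 0.1 * np.random.randn(200)
clean = preprocess(noisy)
print(clean.min(), clean.max())   # roughly 0.0 and 1.0
```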

In a possible implementation, the processing module 32 performing emotion recognition on the facial expression data to obtain the first recognition result specifically includes: the processing module 32 identifying at least one positioning point included in the facial expression data in the preset positioning-point numbering order; determining, among the at least one positioning point, a target positioning point whose value is greater than the positioning-point threshold and adding it to the target facial feature positioning points; looking up the target point's symmetric positioning point and, if its value is also greater than the threshold, adding it as well; moving on to the next positioning point whose value exceeds the threshold and repeating the determination step, until the number of target facial feature positioning points reaches the preset count threshold; and determining the user's target facial expression from the correspondence between different target facial feature positioning points and different target facial expressions, the first recognition result including the user's target facial expression, which indicates the user's emotional state.

In a possible implementation, the processing module 32 performing speech recognition on the user voice data to obtain the second recognition result specifically includes: the processing module 32 performing feature recognition on the user voice data to obtain the user semantic features, user speech-rate features, and user pitch features, and inputting them into the preset recognition model to obtain the user's target acoustic features, the preset recognition model being built and trained in advance based on deep learning, and the second recognition result including the target acoustic features, which indicate the user's emotional state.

In a possible implementation, the processing module 32 trains the preset recognition model before inputting the user semantic features, user speech-rate features, and user pitch features into it. Training the preset recognition model specifically includes: the processing module 32 using a CNN to extract global spatio-temporal features from the user's historical voice data, the global spatio-temporal features including the user's historical semantic features, historical speech-rate features, and historical pitch features; using an RNN to perform sequence modeling on the global spatio-temporal features to obtain time-dependency features; and using a Softmax function to classify the time-dependency features and output emotion labels, one emotion label corresponding to one target acoustic feature, the emotion labels including a positive-emotion label, a neutral-emotion label, and a negative-emotion label.

In a possible implementation, the processing module 32 generating the corresponding processing strategy based on the current-time data specifically includes: if the current-time data indicates daytime, the processing module 32 generating the first processing strategy, which includes the user outdoor-activity planning strategy displayed on the AR glasses; and if the current-time data indicates nighttime, generating the second processing strategy, which includes displaying preset text and playing preset audio.

In a possible implementation, if the processing module 32 determines that the first recognition result indicates a positive emotional state while the second indicates a negative one, or that the first indicates a negative emotional state while the second indicates a positive one, it generates the consultation strategy and controls the AR glasses to execute it, the consultation strategy being used to determine the user's emotional state.

It should be noted that when the device provided in the above embodiments implements its functions, the division into the functional modules described above is only an example; in practice, the functions may be assigned to different functional modules as needed, i.e., the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device embodiments and the method embodiments above belong to the same concept; their specific implementation is detailed in the method embodiments and is not repeated here.

The present application further provides an electronic device; see FIG. 4, a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device may include: at least one processor 41, at least one network interface 44, a user interface 43, a memory 45, and at least one communication bus 42.

The communication bus 42 is used to implement the connections and communication between these components.

The user interface 43 may include a display and a camera; optionally, the user interface 43 may also include standard wired and wireless interfaces.

The network interface 44 may optionally include a standard wired interface or a wireless interface (such as a Wi-Fi interface).

The processor 41 may include one or more processing cores. Using various interfaces and lines, the processor 41 connects the parts of the entire server, and performs the server's functions and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 45 and by calling the data stored in the memory 45. Optionally, the processor 41 may be implemented in at least one of the hardware forms of digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 41 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like: the CPU mainly handles the operating system, user interface, and applications; the GPU renders and draws the content to be shown on the display; and the modem handles wireless communication. It should be understood that the modem need not be integrated into the processor 41 and may instead be implemented on a separate chip.

The memory 45 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 45 includes a non-transitory computer-readable storage medium. The memory 45 may be used to store instructions, programs, code, code sets, or instruction sets, and may include a program storage area and a data storage area: the program storage area may store instructions for implementing the operating system, instructions for at least one function (such as touch, audio playback, or image playback), instructions for implementing the method embodiments above, and so on; the data storage area may store the data involved in those method embodiments. Optionally, the memory 45 may also be at least one storage device located away from the processor 41. As shown in FIG. 4, as a computer storage medium, the memory 45 may include an operating system, a network communication module, a user interface module, and an application program for an AR-based emotion data processing method.

In the electronic device shown in FIG. 4, the user interface 43 mainly provides an input interface for the user and obtains the data the user enters, while the processor 41 may be used to call the application program for the AR-based emotion data processing method stored in the memory 45, which, when executed by one or more processors, causes the electronic device to perform one or more of the methods in the embodiments above.

It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of action combinations; however, those skilled in the art should appreciate that the present application is not limited by the described order of actions, since according to the present application some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present application.

The present application further provides a computer-readable storage medium storing instructions which, when executed by one or more processors, cause an electronic device to perform one or more of the methods described in the embodiments above.

In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the relevant descriptions of the other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Moreover, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through service interfaces, devices, or units, and may be electrical or take other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, may exist physically on their own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.

If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present application. The aforementioned memory includes media that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.

The foregoing describes merely exemplary embodiments of the present disclosure and does not limit its scope: any equivalent changes and modifications made according to the teachings of the present disclosure remain within the scope of the present disclosure. Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure. The present application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not described herein. The specification and examples are to be regarded as exemplary only, with the scope and spirit of the present disclosure being defined by the claims.

Claims (10)

1. An AR-based emotion data processing method, the method comprising:
the method comprises the steps of obtaining emotion data, sent by AR glasses, for a user, wherein the user wears the AR glasses, and the emotion data comprises facial expression data and user voice data;
carrying out emotion recognition on the facial expression data to obtain a first recognition result;
performing voice recognition on the user voice data to obtain a second recognition result;
If the first recognition result and the second recognition result are determined to indicate that the emotion state of the user is negative emotion, acquiring current moment data;
and generating a corresponding processing strategy according to the current time data, and controlling the AR glasses to execute the processing strategy so as to relieve the negative emotion of the user.
2. The AR-based emotion data processing method of claim 1, wherein the acquiring emotion data for a user sent by AR glasses specifically includes:
Receiving original facial expression data and original user voice data sent by the AR glasses;
Preprocessing the original facial expression data and the original user voice data to obtain the emotion data, wherein the preprocessing comprises denoising, filtering and normalization.
3. The AR-based emotion data processing method of claim 1, wherein said emotion recognition of said facial expression data to obtain a first recognition result specifically includes:
identifying at least one positioning point included in the facial expression data according to a preset positioning point numbering sequence;
determining a target positioning point, among the at least one positioning point, whose positioning point value is larger than a positioning point threshold value, and adding the target positioning point to the target facial feature positioning points;
searching for a symmetrical positioning point of the target positioning point, and if the positioning point value of the symmetrical positioning point is larger than the positioning point threshold value, adding the symmetrical positioning point to the target facial feature positioning points;
determining a next target positioning point, among the at least one positioning point, whose positioning point value is larger than the positioning point threshold value, and executing the step of determining a target positioning point whose positioning point value is larger than the positioning point threshold value;
until the number of the target facial feature positioning points reaches a preset number threshold;
And determining the target facial expression of the user according to the corresponding relation between different target facial feature positioning points and different target facial expressions, wherein the first recognition result comprises the target facial expression of the user, and the target facial expression is used for indicating the emotional state of the user.
4. The AR-based emotion data processing method of claim 1, wherein said performing speech recognition on said user speech data to obtain a second recognition result specifically includes:
performing feature recognition on the user voice data to obtain user semantic features, user speech speed features and user tone features;
inputting the user semantic features, the user speech speed features and the user tone features into a preset recognition model to obtain target acoustic features of the user, wherein the preset recognition model is a model which is built and trained in advance based on deep learning, and the second recognition result comprises the target acoustic features which are used for indicating the emotional state of the user.
5. The AR-based emotion data processing method of claim 4, wherein said preset recognition model is trained prior to said inputting said user semantic features, said user speech rate features, and said user pitch features into said preset recognition model; the training of the preset recognition model specifically comprises the following steps:
extracting global space-time features from the historical voice data of the user by adopting CNN, wherein the global space-time features comprise user historical semantic features, user historical speech speed features and user historical tone features;
performing sequence modeling on the global space-time features by adopting RNNs to obtain time sequence dependent features;
and classifying the time sequence dependent features by adopting a Softmax function, outputting emotion labels, wherein one emotion label corresponds to one target acoustic feature, and the emotion labels comprise positive emotion labels, neutral emotion labels and negative emotion labels.
6. The AR-based emotion data processing method of claim 1, wherein the generating a corresponding processing policy according to the current time data specifically includes:
If the current time data indicates that the current time is daytime, a first processing strategy is generated, wherein the first processing strategy comprises a user outdoor activity planning strategy, and the user outdoor activity planning strategy is displayed on the AR glasses;
And if the current time data indicates that the current time is night, generating a second processing strategy, wherein the second processing strategy comprises displaying a preset text and playing a preset audio strategy.
7. The AR-based emotion data processing method of claim 1, further comprising:
if it is determined that the first recognition result indicates that the emotional state of the user is a positive emotion, the second recognition result indicates that the emotional state of the user is a negative emotion, or,
If the first recognition result indicates that the emotion state of the user is a negative emotion, the second recognition result indicates that the emotion state of the user is a positive emotion, a consultation strategy is generated, the AR glasses are controlled to execute the consultation strategy, and the consultation strategy is used for determining the emotion state of the user.
8. An AR-based emotion data processing device, characterized in that the AR-based emotion data processing device comprises an acquisition module (31) and a processing module (32), wherein,
The acquiring module (31) is configured to acquire emotion data sent by the AR glasses and specific to a user, where the user wears the AR glasses, and the emotion data includes facial expression data and user voice data;
The processing module (32) is used for carrying out emotion recognition on the facial expression data to obtain a first recognition result;
the processing module (32) is further used for performing voice recognition on the user voice data to obtain a second recognition result;
the processing module (32) is further configured to acquire current time data if it is determined that the first recognition result and the second recognition result both indicate that the emotional state of the user is a negative emotion;
The processing module (32) is further configured to generate a corresponding processing policy according to the current time data, and control the AR glasses to execute the processing policy so as to alleviate the negative emotion of the user.
9. An electronic device, characterized in that the electronic device comprises a processor (41), a memory (45), a user interface (43) and a network interface (44), the memory (45) being arranged to store instructions, the user interface (43) and the network interface (44) being arranged to communicate with other devices, and the processor (41) being arranged to execute the instructions stored in the memory (45) to cause the electronic device to perform the method according to any one of claims 1 to 7.
10. A computer readable storage medium storing instructions which, when executed, perform the method of any one of claims 1 to 7.
CN202410900132.3A 2024-07-05 2024-07-05 AR-based emotional data processing method, device and electronic device Pending CN118587757A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410900132.3A CN118587757A (en) 2024-07-05 2024-07-05 AR-based emotional data processing method, device and electronic device

Publications (1)

Publication Number Publication Date
CN118587757A true CN118587757A (en) 2024-09-03

Family

ID=92524807

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119937799A (en) * 2025-04-07 2025-05-06 杭州秋果计划科技有限公司 Immersive conversation method, smart glasses and storage medium
CN120335208A (en) * 2025-04-18 2025-07-18 谷东科技有限公司 A large-model-based AR glasses automatic color-changing system and method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination