CN116011542A - Intelligent questionnaire interview model training method, intelligent questionnaire interview method and device

Info

Publication number: CN116011542A
Application number: CN202211732153.6A
Authority: CN
Inventors: 狄东林; 崔晟嘉; 张钋
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-12-30
Filing date: 2022-12-30
Publication date: 2023-04-25

Abstract

The application discloses an intelligent questionnaire interview model training method, an intelligent questionnaire interview method based on human-machine dialogue, an apparatus, and a storage medium, and relates to the field of computer technology, in particular to the fields of artificial intelligence and natural language processing. The intelligent questionnaire interview model training method is implemented as follows: acquiring a voice sample and label information of the voice sample; inputting the voice sample into an intelligent questionnaire interview model; obtaining, based on the voice sample, a semantic feature vector of the voice sample text and a speech feature vector; using a classification prediction unit to predict, according to the semantic feature vector and the speech feature vector, the question-and-answer option content matching the voice sample, so as to obtain prediction information output by the classification prediction unit; and performing model training on the intelligent questionnaire interview model according to the prediction information and the label information. This can improve the semantic understanding and generalization capabilities of the intelligent questionnaire interview model.

Description

Intelligent questionnaire interview model training method, intelligent questionnaire interview method and device

Technical Field

The present application relates to the field of computer technology, in particular to the fields of artificial intelligence and natural language processing, and more specifically to an intelligent questionnaire interview model training method, an intelligent questionnaire interview method, and corresponding apparatuses.

Background

A questionnaire interview system is one application scenario of a task-oriented dialogue system. After receiving a user's voice input, the natural language understanding module in the system needs to process the voice in order to recognize the user's intent and complete slot filling.

In the related art, natural language understanding is performed on text obtained by speech recognition of the user's voice input. In a voice-dialogue questionnaire interview scenario, however, factors such as ambient noise, dialects, and colloquial speech make the speech recognition results inaccurate, so reliable semantic information cannot be extracted during natural language understanding. In addition, rule-based discrimination methods require rule templates to be designed; if the user's answer falls outside the scope covered by the interview templates, the questionnaire interview model cannot discriminate correctly, which results in poor generalization.

Summary

The present application provides an intelligent questionnaire interview model training method, an intelligent questionnaire interview method based on human-machine dialogue, an apparatus, a device, and a storage medium.

According to a first aspect of the present application, an intelligent questionnaire interview model training method is provided, including: acquiring a voice sample and label information of the voice sample; inputting the voice sample into an intelligent questionnaire interview model, wherein the intelligent questionnaire interview model includes a speech recognition unit, a text semantic feature extraction unit, a speech feature extraction unit, and a classification prediction unit; recognizing the voice sample based on the speech recognition unit to obtain a corresponding voice sample text, and encoding the voice sample text based on the text semantic feature extraction unit to obtain a semantic feature vector of the voice sample text; performing feature extraction on the voice sample based on the speech feature extraction unit to obtain a corresponding speech feature vector; and, according to the semantic feature vector and the speech feature vector, using the classification prediction unit to predict the question-and-answer option content matching the voice sample, obtaining prediction information output by the classification prediction unit, and performing model training on the intelligent questionnaire interview model according to the prediction information and the label information.

In one implementation, the intelligent questionnaire interview model further includes a feature fusion unit, and using the classification prediction unit to predict, according to the semantic feature vector and the speech feature vector, the question-and-answer option content matching the voice sample to obtain the prediction information output by the classification prediction unit includes: performing feature fusion on the semantic feature vector and the speech feature vector based on the feature fusion unit to obtain a fused feature vector; and inputting the fused feature vector into the classification prediction unit to predict the question-and-answer option content matching the voice sample, obtaining the prediction information output by the classification prediction unit.

In an optional implementation, performing feature fusion on the semantic feature vector and the speech feature vector based on the feature fusion unit to obtain the fused feature vector includes: concatenating the semantic feature vector and the speech feature vector based on the feature fusion unit, and performing dimensionality reduction on the concatenated vector to obtain the fused feature vector.
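The concatenate-then-reduce fusion described above can be sketched as follows. This is a minimal illustration only: the source does not say how dimensionality reduction is performed, so the linear projection, the function name `fuse_features`, and the toy vector sizes are all assumptions.

```python
import random


def fuse_features(semantic_vec, speech_vec, projection):
    """Concatenate the two feature vectors, then reduce dimensionality.

    `projection` is a hypothetical weight matrix with one row per output
    dimension; a linear projection is only one possible reduction method.
    """
    concatenated = list(semantic_vec) + list(speech_vec)
    if any(len(row) != len(concatenated) for row in projection):
        raise ValueError("each projection row must match the concatenated length")
    return [sum(w * x for w, x in zip(row, concatenated)) for row in projection]


# Toy sizes (assumed): 4-dim semantic vector, 3-dim speech vector, fused to 2 dims.
semantic_vec = [0.5, -1.0, 0.25, 2.0]
speech_vec = [1.0, 0.0, -0.5]
rng = random.Random(0)
projection = [[rng.uniform(-1, 1) for _ in range(7)] for _ in range(2)]
fused = fuse_features(semantic_vec, speech_vec, projection)
```

In a trained model the projection weights would be learned together with the other parameters rather than drawn at random.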

In one implementation, performing model training on the intelligent questionnaire interview model according to the prediction information and the label information includes: calculating a model loss according to the prediction information and the label information; and calculating gradients from the model loss and performing backpropagation, so as to update the model parameters of the intelligent questionnaire interview model by gradient descent.
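As a rough sketch of the loss, gradient, and update cycle described above, the following trains a single linear softmax layer with cross-entropy loss. The source names neither the loss function nor the classifier architecture, so both are assumptions here, as are the names `train_step` and `lr`; a full model would backpropagate through every unit rather than just one layer.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(weights, bias, features, label, lr=0.1):
    """One gradient-descent update of a linear softmax classifier.

    weights: one row per option class; label: index of the true option.
    Returns the cross-entropy loss computed before the update.
    """
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    loss = -math.log(probs[label])
    # The gradient of cross-entropy w.r.t. each logit is (prob - one_hot).
    for k, row in enumerate(weights):
        grad_logit = probs[k] - (1.0 if k == label else 0.0)
        for j, x in enumerate(features):
            row[j] -= lr * grad_logit * x
        bias[k] -= lr * grad_logit
    return loss


# Three options (A, B, C), a 2-dim toy feature vector, true label = option A.
weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
bias = [0.0, 0.0, 0.0]
features = [1.0, -0.5]
losses = [train_step(weights, bias, features, label=0) for _ in range(20)]
```

With all parameters starting at zero, the first loss equals ln 3 (a uniform guess over three options), and repeated updates on the same sample drive it down.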

According to a second aspect of the present application, an intelligent questionnaire interview method based on human-machine dialogue is provided, including: acquiring the voice input by the user in the current dialogue interview round; converting the voice into a corresponding voice text, and encoding the voice text to obtain a semantic feature vector of the voice text; performing feature extraction on the voice to obtain a corresponding speech feature vector; generating a fused feature vector according to the semantic feature vector and the speech feature vector; and acquiring, according to the fused feature vector, the question-and-answer option content matching the voice.

In one implementation, converting the voice into the corresponding voice text and encoding the voice text to obtain the semantic feature vector of the voice text includes: inputting the voice into an intelligent questionnaire interview model, wherein the intelligent questionnaire interview model is a model trained using the training method of any implementation of the first aspect; recognizing the voice based on the speech recognition unit to obtain the corresponding voice text; and encoding the voice text based on the text semantic feature extraction unit to obtain the semantic feature vector of the voice text.

In an optional implementation, performing feature extraction on the voice to obtain the corresponding speech feature vector includes: performing feature extraction on the voice based on the speech feature extraction unit to obtain the corresponding speech feature vector.

In an optional implementation, acquiring, according to the fused feature vector, the question-and-answer option content matching the voice includes: inputting the fused feature vector into the classification prediction unit for prediction to obtain the prediction information output by the classification prediction unit; and determining, according to the prediction information, the question-and-answer option content matching the voice.
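One plausible way to turn prediction information into option content is to take the highest-scoring class. The sketch below assumes the prediction information is a list of per-option probabilities, which the source does not state; `select_option` and the sample options are hypothetical.

```python
def select_option(probabilities, option_contents):
    """Return the option content with the highest predicted probability."""
    if len(probabilities) != len(option_contents):
        raise ValueError("one probability is required per option")
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return option_contents[best]


# Hypothetical options for one questionnaire item and a toy prediction.
options = ["Very satisfied", "Neutral", "Dissatisfied"]
prediction = [0.1, 0.7, 0.2]
chosen = select_option(prediction, options)
```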

In one implementation, the method further includes: triggering, based on preset questionnaire rules and the question-and-answer option content, the corresponding action for the next dialogue interview round, and obtaining the question text of the next dialogue interview round; performing text-to-speech (TTS) synthesis on the question text to obtain a question voice corresponding to the question text; and playing the question voice to the user through a voice playback module.
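The round-to-round flow above can be sketched as a rule lookup followed by synthesis and playback. The rule table format, the function `advance_round`, and the stand-in `fake_tts` are illustrative assumptions; the source does not describe how the questionnaire rules are represented or which TTS engine is used.

```python
def advance_round(questions, rules, current_id, option, synthesize, play):
    """Apply the questionnaire rules to the matched option, then speak the next question.

    Returns the next question id, or None when no rule applies (interview ends).
    `synthesize` and `play` are injected stand-ins for the TTS and playback modules.
    """
    next_id = rules.get((current_id, option))
    if next_id is None:
        return None
    question_audio = synthesize(questions[next_id])
    play(question_audio)
    return next_id


def fake_tts(text):
    # Placeholder: a real system would return synthesized audio here.
    return f"<audio:{text}>"


questions = {
    "q1": "Do you own a car?",
    "q2": "How often do you drive?",
    "q3": "How do you commute?",
}
rules = {("q1", "yes"): "q2", ("q1", "no"): "q3"}  # (question, option) -> next question
played = []
next_id = advance_round(questions, rules, "q1", "no", fake_tts, played.append)
```

Passing the TTS and playback functions in as arguments keeps the rule logic testable without any audio hardware.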

According to a third aspect of the present application, an intelligent questionnaire interview model training apparatus is provided, including: an acquisition module, configured to acquire a voice sample and label information of the voice sample; an input module, configured to input the voice sample into an intelligent questionnaire interview model, wherein the intelligent questionnaire interview model includes a speech recognition unit, a text semantic feature extraction unit, a speech feature extraction unit, and a classification prediction unit; a first processing module, configured to recognize the voice sample based on the speech recognition unit to obtain a corresponding voice sample text, and to encode the voice sample text based on the text semantic feature extraction unit to obtain a semantic feature vector of the voice sample text; a second processing module, configured to perform feature extraction on the voice sample based on the speech feature extraction unit to obtain a corresponding speech feature vector; and a third processing module, configured to use the classification prediction unit to predict, according to the semantic feature vector and the speech feature vector, the question-and-answer option content matching the voice sample, obtain prediction information output by the classification prediction unit, and perform model training on the intelligent questionnaire interview model according to the prediction information and the label information.

In one implementation, the intelligent questionnaire interview model further includes a feature fusion unit, and the third processing module is specifically configured to: perform feature fusion on the semantic feature vector and the speech feature vector based on the feature fusion unit to obtain a fused feature vector; and input the fused feature vector into the classification prediction unit to predict the question-and-answer option content matching the voice sample, obtaining the prediction information output by the classification prediction unit.

In an optional implementation, the third processing module is specifically configured to: concatenate the semantic feature vector and the speech feature vector based on the feature fusion unit, and perform dimensionality reduction on the concatenated vector to obtain the fused feature vector.

In an optional implementation, the third processing module is specifically configured to: calculate a model loss according to the prediction information and the label information; and calculate gradients from the model loss and perform backpropagation, so as to update the model parameters of the intelligent questionnaire interview model by gradient descent.

According to a fourth aspect of the present application, an intelligent questionnaire interview apparatus based on human-machine dialogue is provided, including: an acquisition module, configured to acquire the voice input by the user in the current dialogue interview round; an encoding module, configured to convert the voice into a corresponding voice text and to encode the voice text to obtain a semantic feature vector of the voice text; a feature extraction module, configured to perform feature extraction on the voice to obtain a corresponding speech feature vector; a first processing module, configured to generate a fused feature vector according to the semantic feature vector and the speech feature vector; and a second processing module, configured to acquire, according to the fused feature vector, the question-and-answer option content matching the voice.

In one implementation, the encoding module is specifically configured to: input the voice into an intelligent questionnaire interview model, wherein the intelligent questionnaire interview model is a model trained using the training method of the first aspect; recognize the voice based on the speech recognition unit to obtain the corresponding voice text; and encode the voice text based on the text semantic feature extraction unit to obtain the semantic feature vector of the voice text.

In an optional implementation, the encoding module is specifically configured to: perform feature extraction on the voice based on the speech feature extraction unit to obtain the corresponding speech feature vector.

In an optional implementation, the encoding module is specifically configured to: input the fused feature vector into the classification prediction unit for prediction to obtain the prediction information output by the classification prediction unit; and determine, according to the prediction information, the question-and-answer option content matching the voice.

In one implementation, the apparatus further includes: a third processing module, configured to trigger, based on preset questionnaire rules and the question-and-answer option content, the corresponding action for the next dialogue interview round, and to obtain the question text of the next dialogue interview round; and a speech synthesis module, configured to perform text-to-speech (TTS) synthesis on the question text to obtain a question voice corresponding to the question text; the question voice is played to the user through a voice playback module.

According to a fifth aspect of the present application, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method of the first aspect or the method of the second aspect.

According to a sixth aspect of the present application, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to perform the method of the first aspect or the method of the second aspect.

According to a seventh aspect of the present application, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps of the method of the first aspect or the steps of the method of the second aspect.

According to the technology of the present application, a semantic feature vector and a speech feature vector can be obtained from the acquired voice sample, and the classification prediction unit can predict prediction information representing the question-and-answer option content matching the voice sample, so that the intelligent questionnaire interview model is trained according to the prediction information and the label information. This can improve the semantic understanding and generalization capabilities of the intelligent questionnaire interview model, reduce the number of interview rounds in a questionnaire dialogue, improve user satisfaction during the questionnaire interview, and raise the questionnaire response rate.

It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present application, nor is it intended to limit the scope of the present application. Other features of the present application will become readily understood from the following description.

Brief Description of the Drawings

The accompanying drawings are provided for a better understanding of the solution and do not limit the present application. In the drawings:

FIG. 1 is a schematic diagram according to a first embodiment of the present application;

FIG. 2 is a schematic diagram according to a second embodiment of the present application;

FIG. 3 is a schematic diagram according to a third embodiment of the present application;

FIG. 4 is a schematic diagram according to a fourth embodiment of the present application;

FIG. 5 is a schematic diagram according to a fifth embodiment of the present application;

FIG. 6 is a schematic diagram of an intelligent questionnaire interview scheme based on multimodal deep learning provided by an embodiment of the present application;

FIG. 7 is an application flow chart of an intelligent questionnaire interview system provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of an intelligent questionnaire interview model training apparatus provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an intelligent questionnaire interview apparatus based on human-machine dialogue provided by an embodiment of the present application;

FIG. 10 is a schematic structural diagram of another intelligent questionnaire interview apparatus based on human-machine dialogue provided by an embodiment of the present application;

FIG. 11 is a block diagram of an electronic device provided by an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.

In the description of the present application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. Ordinal terms such as "first" and "second" in the present application are used only to distinguish items for ease of description; they do not limit the scope of the embodiments of the present application and do not indicate any order.

It should be noted that, in the technical solution of the present application, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information all comply with relevant laws and regulations and do not violate public order and good customs.

Please refer to FIG. 1, which is a schematic diagram according to the first embodiment of the present application. As shown in FIG. 1, the intelligent questionnaire interview model training method of this embodiment may include, but is not limited to, the following steps:

Step S101: Acquire a voice sample and label information of the voice sample.

For example, the voice input by a user in a questionnaire interview dialogue scenario is acquired as the voice sample, and label information indicating the content category to which the content of the voice sample belongs is acquired. For instance, the user may be offered options such as A, B, and C during the interview; if the voice sample states option A, other content similar to option A, or part of the content of option A, the label of the voice sample may be a category label representing option A.

Step S102: Input the voice sample into the intelligent questionnaire interview model.

In the embodiments of the present application, the intelligent questionnaire interview model includes a speech recognition unit, a text semantic feature extraction unit, a speech feature extraction unit, and a classification prediction unit.
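To make the four-unit structure concrete, the sketch below wires the units together as injected callables. The class name `InterviewModel`, the method `forward`, and the trivial stub units are hypothetical; they show only how data would flow between the units, not how any unit is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class InterviewModel:
    recognize: Callable[[Sequence[float]], str]                   # speech recognition unit
    encode_text: Callable[[str], List[float]]                     # text semantic feature extraction unit
    extract_speech: Callable[[Sequence[float]], List[float]]      # speech feature extraction unit
    classify: Callable[[List[float], List[float]], List[float]]   # classification prediction unit

    def forward(self, voice_sample: Sequence[float]) -> List[float]:
        text = self.recognize(voice_sample)               # voice sample -> voice sample text
        semantic_vec = self.encode_text(text)             # text -> semantic feature vector
        speech_vec = self.extract_speech(voice_sample)    # voice sample -> speech feature vector
        return self.classify(semantic_vec, speech_vec)    # both vectors -> prediction information


# Stub units that only demonstrate the data flow.
model = InterviewModel(
    recognize=lambda audio: "yes",
    encode_text=lambda text: [float(len(text))],
    extract_speech=lambda audio: [sum(abs(x) for x in audio) / len(audio)],
    classify=lambda semantic, speech: semantic + speech,
)
prediction = model.forward([0.5, -0.5])
```

Note that the voice sample feeds two branches: one through recognition and text encoding, and one directly through speech feature extraction.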

Step S103: Recognize the voice sample based on the speech recognition unit to obtain a corresponding voice sample text, and encode the voice sample text based on the text semantic feature extraction unit to obtain a semantic feature vector of the voice sample text.

For example, speech recognition is performed on the voice sample by the speech recognition unit to obtain the voice sample text corresponding to the voice sample, and the voice sample text is feature-encoded by the text semantic feature extraction unit to obtain the semantic feature vector of the voice sample text.

In the embodiments of the present application, the text semantic feature extraction unit may be a BERT (Bidirectional Encoder Representations from Transformers) model.

In some embodiments of the present application, a pre-trained BERT model may be initialized, the parameters of the BERT model may then be fine-tuned on interview content data carrying label information, and the fine-tuned BERT model may be used to obtain the semantic feature vector of the voice sample text.

In some embodiments of the present application, ASR (Automatic Speech Recognition) enhancement technology may be used, combining speech features with scenario features to enhance speech-to-text recognition for the specific questionnaire interview scenario and obtain an enhanced voice sample text; the enhanced voice sample text is then encoded to obtain the semantic feature vector of the voice sample text.

Step S104: Perform feature extraction on the voice sample based on the speech feature extraction unit to obtain a corresponding speech feature vector.

For example, the acquired voice sample is input into the speech feature extraction unit, and the MFCC (Mel-Frequency Cepstral Coefficients) method is applied: pre-emphasis, framing, windowing, and FFT (Fast Fourier Transform) are performed on the voice sample in sequence; the processed data is passed through a Mel filter bank; a logarithm is taken of the filter bank output; and a DCT (Discrete Cosine Transform) is then applied to complete feature extraction, yielding a speech feature vector that represents the physical information of the voice sample.
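The MFCC steps listed above can be sketched in plain Python as follows. This is an illustrative outline under assumed parameters: the pre-emphasis coefficient 0.97, the Hamming window, the frame and hop sizes, the filter count, and the common `2595 * log10(1 + f / 700)` Mel formula are conventional choices, not values taken from the source. A naive DFT stands in for the FFT; it computes the same spectrum, only more slowly.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the Mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = [[0.0] * n_bins for _ in range(n_filters)]
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):       # rising edge
            bank[i][k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge
            bank[i][k] = (right - k) / (right - center)
    return bank


def mfcc(signal, sample_rate=8000, frame_len=64, hop=32,
         n_filters=10, n_coeffs=5, alpha=0.97):
    """Return one list of n_coeffs cepstral coefficients per frame."""
    if len(signal) < frame_len:
        return []
    # 1. Pre-emphasis boosts high frequencies.
    emphasized = [signal[0]] + [signal[n] - alpha * signal[n - 1]
                                for n in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    # 2. Framing: overlapping frames of frame_len samples, hop samples apart.
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        # 3. Windowing with a Hamming window.
        frame = [emphasized[start + n] * window[n] for n in range(frame_len)]
        # 4. Power spectrum via a naive DFT (an FFT yields the same values).
        power = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            power.append(abs(coeff) ** 2 / frame_len)
        # 5. Mel filter bank energies, then 6. logarithm (floored to avoid log(0)).
        log_energies = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
                        for row in bank]
        # 7. DCT-II of the log energies; keep the first n_coeffs coefficients.
        features.append([
            sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_energies))
            for c in range(n_coeffs)
        ])
    return features


# Toy input: 256 samples of a 440 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(256)]
feats = mfcc(tone)
```

The function returns per-frame coefficients; producing a single speech feature vector per sample would need a further pooling or encoding step that the source does not specify.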

Step S105: According to the semantic feature vector and the speech feature vector, use the classification prediction unit to predict the question-and-answer option content matching the voice sample, obtain the prediction information output by the classification prediction unit, and perform model training on the intelligent questionnaire interview model according to the prediction information and the label information.

For example, the semantic feature vector and the speech feature vector are input into the classification prediction unit as input data to obtain the prediction information output by the classification prediction unit. The prediction information represents the question-and-answer option content matching the voice sample from which the semantic feature vector and the speech feature vector were derived. Model training is then performed on the intelligent questionnaire interview model according to the prediction information and the label information corresponding to the voice sample.

By implementing this embodiment of the present application, a semantic feature vector and a speech feature vector can be obtained from the acquired voice sample, and the classification prediction unit can predict prediction information representing the question-and-answer option content matching the voice sample, so that the intelligent questionnaire interview model is trained according to the prediction information and the label information, improving its semantic understanding and generalization capabilities.

In one implementation, feature fusion may be performed on the semantic feature vector and the speech feature vector, so that the prediction information is obtained from the resulting fused feature vector and the classification prediction unit. As an example, please refer to FIG. 2, which is a schematic diagram according to the second embodiment of the present application. As shown in FIG. 2, the intelligent questionnaire interview model training method of this embodiment may include, but is not limited to, the following steps.

Step S201: Acquire a voice sample and label information of the voice sample.

In the embodiments of the present application, step S201 may be implemented in any of the manners described in the embodiments of the present application; this is not limited here, and details are not repeated.

Step S202: Input the voice sample into the intelligent questionnaire interview model.

In the embodiments of the present application, step S202 may be implemented in any of the manners described in the embodiments of the present application; this is not limited here, and details are not repeated.

The intelligent questionnaire interview model includes a speech recognition unit, a text semantic feature extraction unit, a speech feature extraction unit, and a classification prediction unit.

Step S203: Recognize the voice sample based on the speech recognition unit to obtain a corresponding voice sample text, and encode the voice sample text based on the text semantic feature extraction unit to obtain a semantic feature vector of the voice sample text.

In the embodiments of the present application, step S203 may be implemented in any of the manners described in the embodiments of the present application; this is not limited here, and details are not repeated.

Step S204: Perform feature extraction on the voice sample based on the speech feature extraction unit to obtain a corresponding speech feature vector.

In the embodiments of the present application, step S204 may be implemented in any of the manners described in the embodiments of the present application; this is not limited here, and details are not repeated.

Step S205: Perform feature fusion on the semantic feature vector and the speech feature vector with the feature fusion unit to obtain a fused feature vector.

In an optional implementation, performing feature fusion on the semantic feature vector and the speech feature vector with the feature fusion unit to obtain the fused feature vector includes: splicing (concatenating) the semantic feature vector and the speech feature vector with the feature fusion unit, and performing dimensionality reduction on the spliced vector to obtain the fused feature vector.

For example, the semantic feature vector and the speech feature vector are input into the feature fusion unit, which concatenates the two vectors directly and then applies PCA (Principal Component Analysis) dimensionality reduction to the concatenated vector.

The splicing of the semantic feature vector and the speech feature vector can be expressed as:

$$v = [v_1, v_2] \in \mathbb{R}^{n+m}$$

where $v$ is the vector obtained after splicing, $v_1 \in \mathbb{R}^{n}$ is the semantic feature vector, $v_2 \in \mathbb{R}^{m}$ is the speech feature vector, and $m$ and $n$ are positive integers.

In some embodiments of the present application, a linear mapping may be used to convert the spliced vector into the form $v' = Wv \in \mathbb{R}^{n}$, where $W$ is the mapping matrix (of shape $n \times (n+m)$ for the dimensions to agree) and $v'$ is the mapped vector.
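The splicing and linear mapping described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `concatenate` and `linear_map`, the toy dimensions (n = 2, m = 3), and the hand-written matrix `W` are hypothetical and do not come from the application, where the mapping would presumably be learned or derived (for example by PCA).

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def concatenate(v1: Vector, v2: Vector) -> Vector:
    """Splice the semantic vector v1 (length n) and the speech vector v2 (length m)."""
    return list(v1) + list(v2)


def linear_map(w: Matrix, v: Vector) -> Vector:
    """Apply the matrix W (one row per output dimension) to the spliced vector v."""
    if any(len(row) != len(v) for row in w):
        raise ValueError("each row of W must have exactly len(v) entries")
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


# Toy values (hypothetical): n = 2 semantic dimensions, m = 3 speech dimensions.
v1 = [1.0, 2.0]
v2 = [3.0, 4.0, 5.0]
v = concatenate(v1, v2)  # length n + m = 5

# Hypothetical 2 x 5 mapping matrix, written by hand for the example.
W = [
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 1.0],
]
fused = linear_map(W, v)  # length n = 2
```

The only point of the sketch is the shape bookkeeping: the spliced vector has n + m entries, and a matrix with n rows of n + m columns brings it back to n entries.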

Step S206: Input the fused feature vector into the classification prediction unit to predict the question-answer option content matching the speech sample, and obtain the prediction information output by the classification prediction unit.

For example, the fused feature vector is fed as input data to the classification prediction unit, which outputs prediction information. The prediction information is the question-answer option content that the classification prediction unit predicts as matching the speech sample.

Step S207: Train the intelligent questionnaire interview model according to the prediction information and the label information.

For example, the current loss value of the model is obtained from the prediction information and the label information, and the parameters of the intelligent questionnaire interview model are updated according to the current loss value so as to reduce the loss, thereby training the model.

By implementing this embodiment of the present application, a semantic feature vector and a speech feature vector can be obtained from the acquired speech sample and fused, and prediction information representing the question-answer option content matching the speech sample can be obtained from the fused feature vector and the classification prediction unit. The intelligent questionnaire interview model is then trained according to the prediction information and the label information, which improves its semantic understanding and generalization capabilities.

In one implementation, the model loss may be calculated from the prediction information and the label information, and the model parameters updated according to the model loss. As an example, refer to FIG. 3, which is a schematic diagram according to a third embodiment of the present application. As shown in FIG. 3, the intelligent questionnaire interview model training method of this embodiment may include, but is not limited to, the following steps.

Step S301: Acquire a speech sample and label information of the speech sample.

In this embodiment, step S301 may be implemented in any of the ways described in the embodiments of the present application; this is not limited here and is not repeated.

Step S302: Input the speech sample into the intelligent questionnaire interview model.

In this embodiment, step S302 may be implemented in any of the ways described in the embodiments of the present application; this is not limited here and is not repeated.

Step S303: Recognize the speech sample with the speech recognition unit to obtain the corresponding speech sample text, and encode the speech sample text with the text semantic feature extraction unit to obtain the semantic feature vector of the speech sample text.

In this embodiment, step S303 may be implemented in any of the ways described in the embodiments of the present application; this is not limited here and is not repeated.

Step S304: Perform feature extraction on the speech sample with the speech feature extraction unit to obtain the corresponding speech feature vector.

In this embodiment, step S304 may be implemented in any of the ways described in the embodiments of the present application; this is not limited here and is not repeated.

Step S305: According to the semantic feature vector and the speech feature vector, use the classification prediction unit to predict the question-answer option content matching the speech sample, and obtain the prediction information output by the classification prediction unit.

In this embodiment, step S305 may be implemented in any of the ways described in the embodiments of the present application; this is not limited here and is not repeated.

Step S306: Calculate the model loss according to the prediction information and the label information.

For example, a preset loss function is used to calculate the model loss from the prediction information and the label information.
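The application does not say which preset loss function is used. As a hedged illustration only, the sketch below assumes the common choice for option classification: softmax cross-entropy over one score per question-answer option. The names `softmax` and `cross_entropy`, and the idea that the prediction information is a list of raw scores, are assumptions for this example.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw option scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], label: int) -> float:
    """Negative log-probability assigned to the labelled option (assumed loss)."""
    return -math.log(softmax(logits)[label])


# Hypothetical example: four options, the label says option index 2 is correct.
uninformed_loss = cross_entropy([0.0, 0.0, 0.0, 0.0], label=2)
confident_loss = cross_entropy([0.0, 0.0, 5.0, 0.0], label=2)
```

With equal scores for four options the loss is ln 4; it shrinks as the score of the labelled option grows relative to the others.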

Step S307: Calculate the gradient from the model loss and perform backpropagation, so as to update the model parameters of the intelligent questionnaire interview model by gradient descent.

For example, the gradient is calculated from the model loss, and during backpropagation the gradient descent method iteratively searches for the minimum of the model loss function, updating the model parameters of the intelligent questionnaire interview model at each iteration.
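A minimal sketch of one gradient-descent update follows. It is not the application's training procedure: it assumes a single linear softmax layer acting on the fused feature vector and a cross-entropy loss, whereas the application speaks of backpropagating through the intelligent questionnaire interview model. The names `sgd_step`, `weights`, `fused`, and the learning rate 0.1 are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(weights: List[List[float]], fused: List[float],
             label: int, lr: float) -> float:
    """Update `weights` in place by one gradient-descent step.

    Returns the cross-entropy loss measured before the update.
    """
    logits = [sum(w * x for w, x in zip(row, fused)) for row in weights]
    probs = softmax(logits)
    loss = -math.log(probs[label])
    for k, row in enumerate(weights):
        # For softmax cross-entropy: d(loss)/d(logit_k) = p_k - 1[k == label].
        grad_logit = probs[k] - (1.0 if k == label else 0.0)
        for j in range(len(row)):
            # Chain rule: d(loss)/d(W[k][j]) = grad_logit * fused[j].
            row[j] -= lr * grad_logit * fused[j]
    return loss


# Hypothetical toy problem: 3 options, 2-dimensional fused vector, label = option 1.
weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
fused = [1.0, -2.0]
losses = [sgd_step(weights, fused, label=1, lr=0.1) for _ in range(20)]
```

Repeating the step on the same sample makes the recorded loss fall from ln 3 toward zero, which is the "iteratively search for the minimum" behaviour the paragraph above describes, in miniature.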

By implementing this embodiment of the present application, a semantic feature vector and a speech feature vector can be obtained from the acquired speech sample, and the classification prediction unit can predict prediction information representing the question-answer option content matching the speech sample. The model loss is then calculated from the prediction information and the label information, and the model parameters are updated according to the model loss. This trains the intelligent questionnaire interview model and improves its semantic understanding and generalization capabilities.

Refer to FIG. 4, which is a schematic diagram according to a fourth embodiment of the present application. As shown in FIG. 4, the intelligent questionnaire interview method based on man-machine dialogue of this embodiment may include, but is not limited to, the following steps.

Step S401: Acquire the speech input by the user in the current dialogue interview round.

For example, the speech information input by the user in the dialogue interview round currently in progress is acquired.

Step S402: Convert the speech into the corresponding speech text, and encode the speech text to obtain the semantic feature vector of the speech text.

For example, speech recognition is performed on the speech to convert it into the corresponding speech text, and the speech text is input into a BERT model for encoding to obtain the semantic feature vector of the speech text.

Step S403: Perform feature extraction on the speech to obtain the corresponding speech feature vector.

For example, the MFCC (Mel-frequency cepstral coefficient) method is used to extract features from the speech and obtain the corresponding speech feature vector.

Step S404: Generate a fused feature vector from the semantic feature vector and the speech feature vector.

For example, the semantic feature vector and the speech feature vector are concatenated to generate the fused feature vector.

Step S405: Obtain the question-answer option content matching the speech according to the fused feature vector.

For example, semantic understanding processing is performed on the fused feature vector to understand the meaning of the dialogue, and the corresponding question-answer option content is thereby identified.

By implementing this embodiment of the present application, the semantic feature vector and the speech feature vector obtained from the user's input speech can be fused into a fused feature vector, and the question-answer option content matching the speech can be obtained from the fused feature vector. This improves semantic understanding of the user's speech in man-machine dialogue scenarios, which in turn improves user satisfaction during the questionnaire interview and raises the questionnaire response rate.

In one implementation, the intelligent questionnaire interview model provided by the embodiments of the present application may be used to convert the speech into the corresponding speech text and to encode the speech text into its semantic feature vector. As an example, refer to FIG. 5, which is a schematic diagram according to a fifth embodiment of the present application. As shown in FIG. 5, the intelligent questionnaire interview method based on man-machine dialogue of this embodiment may include, but is not limited to, the following steps.

Step S501: Acquire the speech input by the user in the current dialogue interview round.

In this embodiment, step S501 may be implemented in any of the ways described in the embodiments of the present application; this is not limited here and is not repeated.

Step S502: Input the speech into the intelligent questionnaire interview model.

In this embodiment, the intelligent questionnaire interview model is a model trained by the training method provided in any embodiment of the present application.

Step S503: Recognize the speech with the speech recognition unit to obtain the corresponding speech text.

For example, the speech recognition unit in the intelligent questionnaire interview model recognizes the speech input by the user and produces the corresponding speech text.

Step S504: Encode the speech text with the text semantic feature extraction unit to obtain the semantic feature vector of the speech text.

For example, the text semantic feature extraction unit in the intelligent questionnaire interview model performs feature encoding on the speech text to obtain the corresponding semantic feature vector.

Step S505: Perform feature extraction on the speech with the speech feature extraction unit to obtain the corresponding speech feature vector.

For example, the speech feature extraction unit in the intelligent questionnaire interview model performs feature extraction on the speech input by the user to obtain the corresponding speech feature vector.

Step S506: Input the fused feature vector into the classification prediction unit for prediction, and obtain the prediction information output by the classification prediction unit.

For example, the fused feature vector is fed as input data to the classification prediction unit, and the prediction information output by the classification prediction unit is obtained.

Step S507: Determine the question-answer option content matching the speech according to the prediction information.

For example, different preset question-answer option contents are set in advance for different preset prediction information. The target preset prediction information corresponding to the prediction information output by the classification prediction unit is determined, and the preset question-answer option content associated with that target is taken as the question-answer option content matching the speech.
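The lookup from prediction information to preset option content can be sketched as below. The sketch assumes the prediction information is a list of per-class scores and that the highest-scoring class selects the option; the table `PRESET_OPTIONS`, its entries, and the function `match_option` are hypothetical names introduced for illustration.

```python
from typing import Dict, List

# Hypothetical preset table: class index -> preset question-answer option content.
PRESET_OPTIONS: Dict[int, str] = {
    0: "A. Male",
    1: "B. Female",
    2: "C. Prefer not to say",
}


def match_option(scores: List[float], presets: Dict[int, str]) -> str:
    """Pick the highest-scoring class and return its preset option content."""
    best_class = max(range(len(scores)), key=lambda k: scores[k])
    return presets[best_class]


# Example: the classifier scores class 1 highest.
matched = match_option([0.1, 0.7, 0.2], PRESET_OPTIONS)
```

A production system would likely also handle low-confidence predictions (for instance by asking the user to repeat the answer), which the application does not describe and this sketch omits.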

By implementing this embodiment of the present application, the speech input by the user can be fed into the pre-trained intelligent questionnaire interview model for processing to obtain prediction information, and the question-answer option content matching the speech can be determined from the prediction information. This improves semantic understanding of the user's speech in man-machine dialogue scenarios, which in turn improves user satisfaction during the questionnaire interview and raises the questionnaire response rate.

In some embodiments of the present application, the above intelligent questionnaire interview method based on man-machine dialogue may further include: triggering the corresponding action of the next dialogue interview round based on preset questionnaire rules and the question-answer option content, and obtaining the question text of the next dialogue interview round; performing TTS (Text To Speech) synthesis on the question text to obtain the question speech corresponding to the question text; and playing the question speech to the user through a speech playback module.

As an example, suppose the user's spoken answer indicates that the user's gender is female. Based on the logic of the preset questionnaire rules, the next dialogue interview round will not ask this user the questions intended for men. Instead, the question text intended for women is obtained directly, TTS synthesis is performed on it to obtain the corresponding question speech, and the question speech is played to the user through the speech playback module. The number of dialogue interview rounds can thereby be reduced.
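The skip logic in the gender example can be sketched as a small rule table. Everything here is a hypothetical illustration: the `Question` class, the `only_if` condition format, the question identifiers, and the question texts are assumptions, and the TTS synthesis and playback steps are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Question:
    qid: str
    text: str
    # Ask this question only if every listed earlier answer matches (assumed rule format).
    only_if: Optional[Dict[str, str]] = None


def next_question(questions: List[Question],
                  answers: Dict[str, str]) -> Optional[Question]:
    """Return the first unanswered question whose rule is satisfied, or None when done."""
    for question in questions:
        if question.qid in answers:
            continue  # already answered in an earlier round
        rule = question.only_if or {}
        if any(answers.get(key) != value for key, value in rule.items()):
            continue  # rule not met: skip this question entirely
        return question
    return None


QUESTIONNAIRE = [
    Question("gender", "What is your gender?"),
    Question("q_male", "Question intended for men.", only_if={"gender": "male"}),
    Question("q_female", "Question intended for women.", only_if={"gender": "female"}),
    Question("age", "What is your age?"),
]

# The user's recognized answer in the current round is "female".
answers = {"gender": "female"}
upcoming = next_question(QUESTIONNAIRE, answers)
```

In this sketch the text of `upcoming` is what would be handed to speech synthesis; the question for men is never reached, so one round is saved.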

Refer to FIG. 6, which is a schematic diagram of an intelligent questionnaire interview scheme based on multimodal deep learning provided by an embodiment of the present application. As shown in FIG. 6, the technical solution of the present application first acquires the user's speech and performs feature encoding on it with the MFCC method to obtain a speech feature vector. The speech is also converted into text, and a semantic feature vector is obtained from the text with a BERT model. The speech feature vector and the semantic feature vector are then fused, and classification prediction is performed on the fused vector to obtain prediction information representing the question-answer option content that matches the speech.
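The data flow of FIG. 6 can be sketched as a function that wires the five stages together. This is a structural sketch only: the stage implementations are passed in as callables, and the toy stand-ins used in the example are placeholders that do no real speech recognition, BERT encoding, or MFCC extraction. All names (`predict_option_scores`, `toy_recognize`, and so on) are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def predict_option_scores(
    audio: Sequence[float],
    recognize: Callable[[Sequence[float]], str],
    encode_text: Callable[[str], Vector],
    extract_speech_features: Callable[[Sequence[float]], Vector],
    fuse: Callable[[Vector, Vector], Vector],
    classify: Callable[[Vector], Vector],
) -> Vector:
    """Run speech through the two branches, fuse them, and classify the result."""
    text = recognize(audio)                         # speech -> text branch
    semantic = encode_text(text)                    # text -> semantic feature vector
    acoustic = extract_speech_features(audio)       # speech -> speech feature vector
    fused = fuse(semantic, acoustic)                # feature fusion
    return classify(fused)                          # classification prediction


# Toy stand-ins so the sketch runs; they are NOT real ASR, BERT, or MFCC components.
def toy_recognize(audio: Sequence[float]) -> str:
    return "female"


def toy_encode_text(text: str) -> Vector:
    return [float(len(text))]


def toy_speech_features(audio: Sequence[float]) -> Vector:
    return [sum(audio) / len(audio)]


def toy_fuse(semantic: Vector, acoustic: Vector) -> Vector:
    return semantic + acoustic  # plain concatenation


def toy_classify(fused: Vector) -> Vector:
    return list(fused)  # identity "classifier", for wiring checks only


scores = predict_option_scores(
    [0.0, 1.0], toy_recognize, toy_encode_text,
    toy_speech_features, toy_fuse, toy_classify,
)
```

Passing the stages as parameters keeps the sketch independent of any particular speech recognition or encoder library; real components with matching signatures could be substituted one at a time.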

Refer to FIG. 7, which is an application flowchart of an intelligent questionnaire interview system provided by an embodiment of the present application. As shown in FIG. 7, the intelligent questionnaire interview system first acquires the speech input by the user. A data processing module then performs MFCC feature extraction on the acquired speech data to obtain the speech feature vector, and uses ASR (automatic speech recognition) and BERT to build features from the text converted from the speech, yielding the text features (that is, the semantic feature vector). A feature fusion module then fuses the speech features and the text features, and the fused feature vector is input into a classification prediction module to obtain a prediction result. The prediction result is sent to the questionnaire system, which stores it and can use it to decide on and process dialogue actions.

Refer to FIG. 8, which is a schematic diagram of an intelligent questionnaire interview model training apparatus provided by an embodiment of the present application. As shown in FIG. 8, the apparatus 800 includes: an acquisition module 801, configured to acquire a speech sample and label information of the speech sample; an input module 802, configured to input the speech sample into the intelligent questionnaire interview model, where the intelligent questionnaire interview model includes a speech recognition unit, a text semantic feature extraction unit, a speech feature extraction unit, and a classification prediction unit; a first processing module 803, configured to recognize the speech sample with the speech recognition unit to obtain the corresponding speech sample text, and to encode the speech sample text with the text semantic feature extraction unit to obtain the semantic feature vector of the speech sample text; a second processing module 804, configured to perform feature extraction on the speech sample with the speech feature extraction unit to obtain the corresponding speech feature vector; and a third processing module 805, configured to use the classification prediction unit to predict, according to the semantic feature vector and the speech feature vector, the question-answer option content matching the speech sample, obtain the prediction information output by the classification prediction unit, and train the intelligent questionnaire interview model according to the prediction information and the label information.

In one implementation, the intelligent questionnaire interview model further includes a feature fusion unit, and the third processing module 805 is specifically configured to: perform feature fusion on the semantic feature vector and the speech feature vector with the feature fusion unit to obtain a fused feature vector; and input the fused feature vector into the classification prediction unit to predict the question-answer option content matching the speech sample, obtaining the prediction information output by the classification prediction unit.

In an optional implementation, the third processing module 805 is specifically configured to: splice the semantic feature vector and the speech feature vector with the feature fusion unit, and perform dimensionality reduction on the spliced vector to obtain the fused feature vector.

In an optional implementation, the third processing module 805 is specifically configured to: calculate the model loss according to the prediction information and the label information; and calculate the gradient from the model loss and perform backpropagation, so as to update the model parameters of the intelligent questionnaire interview model by gradient descent.

The apparatus of this embodiment of the present application can obtain a semantic feature vector and a speech feature vector from the acquired speech sample, and use the classification prediction unit to predict prediction information representing the question-answer option content matching the speech sample. The intelligent questionnaire interview model is then trained according to the prediction information and the label information, which improves its semantic understanding and generalization capabilities.

Refer to FIG. 9, which is a schematic structural diagram of an intelligent questionnaire interview apparatus based on man-machine dialogue provided by an embodiment of the present application. As shown in FIG. 9, the apparatus 900 includes: an acquisition module 901, configured to acquire the speech input by the user in the current dialogue interview round; an encoding module 902, configured to convert the speech into the corresponding speech text and to encode the speech text to obtain the semantic feature vector of the speech text; a feature extraction module 903, configured to perform feature extraction on the speech to obtain the corresponding speech feature vector; a first processing module 904, configured to generate a fused feature vector from the semantic feature vector and the speech feature vector; and a second processing module 905, configured to obtain the question-answer option content matching the speech according to the fused feature vector.

In one implementation, the encoding module 902 is specifically configured to: input the speech into the intelligent questionnaire interview model, where the intelligent questionnaire interview model is a model trained with the training method of the first aspect; recognize the speech with the speech recognition unit to obtain the corresponding speech text; and encode the speech text with the text semantic feature extraction unit to obtain the semantic feature vector of the speech text.

In an optional implementation, the encoding module 902 is specifically configured to: perform feature extraction on the speech with the speech feature extraction unit to obtain the corresponding speech feature vector.

In an optional implementation, the encoding module 902 is specifically configured to: input the fused feature vector into the classification prediction unit for prediction to obtain the prediction information output by the classification prediction unit; and determine the question-answer option content matching the speech according to the prediction information.

With the apparatus of this embodiment of the present application, the semantic feature vector and the speech feature vector obtained from the user's input speech can be fused into a fused feature vector, and the question-answer option content matching the speech can be obtained from the fused feature vector. This improves semantic understanding of the user's speech in man-machine dialogue scenarios, which in turn improves user satisfaction during the questionnaire interview and raises the questionnaire response rate.

In one implementation, the above apparatus further includes a third processing module. As an example, refer to FIG. 10, which is a schematic structural diagram of another intelligent questionnaire interview apparatus based on man-machine dialogue provided by an embodiment of the present application. As shown in FIG. 10, the apparatus 1000 further includes: a third processing module 1006, configured to trigger the corresponding action of the next dialogue interview round based on preset questionnaire rules and the question-answer option content, and to obtain the question text of the next dialogue interview round; and a speech synthesis module, configured to perform TTS (Text To Speech) synthesis on the question text to obtain the question speech corresponding to the question text, the question speech being played to the user through a speech playback module. Modules 1001 to 1005 in FIG. 10 have the same functions and structures as modules 901 to 905 in FIG. 9.

Regarding the apparatus in the foregoing embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.

According to the embodiments of the present application, the present application further provides an electronic device and a readable storage medium.

如图11所示，是本申请实施例提供的一种电子设备的框图。该电子设备能够执行根据本申请实施例的智能问卷访谈模型训练方法，或者基于人机对话的智能问卷访谈方法的电子设备的框图。电子设备旨在表示各种形式的数字计算机，诸如，膝上型计算机、台式计算机、工作台、个人数字助理、服务器、刀片式服务器、大型计算机、和其它适合的计算机。电子设备还可以表示各种形式的移动装置，诸如，个人数字处理、蜂窝电话、智能电话、可穿戴设备和其它类似的计算装置。本文所示的部件、它们的连接和关系、以及它们的功能仅仅作为示例，并且不意在限制本文中描述的和/或者要求的本申请的实现。As shown in FIG. 11 , it is a block diagram of an electronic device provided by an embodiment of the present application. The electronic device is a block diagram of an electronic device capable of executing the intelligent questionnaire interview model training method according to the embodiment of the present application, or the intelligent questionnaire interview method based on man-machine dialogue. Electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the applications described and/or claimed herein.

As shown in FIG. 11, the electronic device includes one or more processors 1101, a memory 1102, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory for displaying graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other implementations, multiple processors and/or multiple buses may be used together with multiple memories, if required. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system). One processor 1101 is taken as an example in FIG. 11.

The memory 1102 is the non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor, so that the at least one processor executes the intelligent questionnaire interview model training method provided by the present application, or the intelligent questionnaire interview method based on man-machine dialogue. The non-transitory computer-readable storage medium of the present application stores computer instructions, and the computer instructions are used to cause a computer to execute the intelligent questionnaire interview model training method provided by the present application, or the intelligent questionnaire interview method based on man-machine dialogue.

As a non-transitory computer-readable storage medium, the memory 1102 can be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the intelligent questionnaire interview model training method in the embodiments of the present application (for example, the acquisition module 801, the input module 802, the first processing module 803, the second processing module 804 and the third processing module 805 shown in FIG. 8), or the program instructions/modules corresponding to the intelligent questionnaire interview method based on man-machine dialogue (for example, the acquisition module 901, the encoding module 902, the feature extraction module 903, the first processing module 904 and the second processing module 905 shown in FIG. 9, and the third processing module 1006 shown in FIG. 10). By running the non-transitory software programs, instructions and modules stored in the memory 1102, the processor 1101 executes the various functional applications and data processing of the server, that is, implements the intelligent questionnaire interview model training method, or the intelligent questionnaire interview method based on man-machine dialogue, of the above method embodiments.

The memory 1102 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the electronic device for the intelligent questionnaire interview model training method or for the intelligent questionnaire interview method based on man-machine dialogue. In addition, the memory 1102 may include a high-speed random access memory and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 1102 may optionally include memories located remotely from the processor 1101, and these remote memories may be connected through a network to the electronic device for the intelligent questionnaire interview model training method or for the intelligent questionnaire interview method based on man-machine dialogue. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device for the intelligent questionnaire interview model training method, or for the intelligent questionnaire interview method based on man-machine dialogue, may further include an input device 1103 and an output device 1104. The processor 1101, the memory 1102, the input device 1103 and the output device 1104 may be connected by a bus or in other manners; connection by a bus is taken as an example in FIG. 11.

The input device 1103 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the intelligent questionnaire interview model training method or for the intelligent questionnaire interview method based on man-machine dialogue. Examples of the input device include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, and a joystick. The output device 1104 may include a display device, an auxiliary lighting device (for example, an LED), a tactile feedback device (for example, a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described herein may be realized in digital electronic circuitry, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

These computer programs (also called programs, software, software applications, or code) include machine instructions for a programmable processor and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (for example, a magnetic disk, an optical disk, a memory, or a programmable logic device (PLD)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, speech input, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes a back-end component (for example, a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer having a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), the Internet, and blockchain networks.

A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the difficult management and weak business scalability of traditional physical hosts and virtual private server (VPS) services. The server may also be a server of a distributed system, or a server combined with a blockchain.

According to the technical solution of the embodiments of the present application, a semantic feature vector and a speech feature vector can be obtained from the acquired speech sample, and the classification prediction unit predicts, from these vectors, prediction information representing the question-answer option content that matches the speech sample. The intelligent questionnaire interview model is then trained according to the prediction information and the label information, which improves its semantic understanding and generalization capabilities, and in turn improves user satisfaction during the questionnaire interview and raises the questionnaire return rate.
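The training flow summarized above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the implementation of the application: the feature vectors are made-up numbers standing in for the outputs of the text semantic and speech feature extraction units, the dimensionality reduction is assumed to be a fixed linear projection `proj`, the classification prediction unit is assumed to be a single softmax layer, and only that layer is updated, whereas the application trains the model as a whole.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def fuse(semantic: Vector, speech: Vector, proj: Matrix) -> Vector:
    """Splice the two vectors, then reduce dimension with a linear projection."""
    return matvec(proj, semantic + speech)


def softmax(logits: Vector) -> Vector:
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(weights: Matrix, fused: Vector, label: int, lr: float) -> float:
    """One gradient-descent update of the classifier weights (in place).
    Returns the cross-entropy loss computed before the update."""
    probs = softmax(matvec(weights, fused))
    loss = -math.log(probs[label])
    for k, row in enumerate(weights):
        # For softmax with cross-entropy: d(loss)/d(logit_k) = p_k - 1[k == label]
        grad_logit = probs[k] - (1.0 if k == label else 0.0)
        for j in range(len(row)):
            row[j] -= lr * grad_logit * fused[j]
    return loss


# Hypothetical example: 3-dim semantic vector, 2-dim speech vector, 3 options.
semantic = [0.2, -0.1, 0.4]
speech = [0.5, 0.3]
proj = [[1.0, 0.0, 0.0, 0.0, 1.0],
        [0.0, 1.0, 0.0, 1.0, 0.0]]           # assumed 5 -> 2 projection
weights = [[0.0, 0.0] for _ in range(3)]     # one row per answer option
fused = fuse(semantic, speech, proj)
losses = [train_step(weights, fused, label=1, lr=0.5) for _ in range(20)]
predicted = max(range(3), key=lambda k: matvec(weights, fused)[k])
```

Repeating `train_step` on the same sample lowers the loss and moves the prediction toward the labelled option, which is the loss, gradient and parameter-update cycle the method describes, reduced to its smallest form.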

It should be understood that steps may be reordered, added or deleted using the various forms of flow shown above. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, as long as the result expected of the technical solution disclosed in the present application can be achieved; no limitation is imposed herein.
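As one illustration of this point, the semantic branch and the speech feature branch both depend only on the input speech, so they can run sequentially or in parallel with the same result. The sketch below assumes two stand-in functions, `semantic_branch` and `speech_branch`, which are hypothetical placeholders for the real recognition, encoding and feature extraction units.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Vector = List[float]


def semantic_branch(audio: Vector) -> Vector:
    """Stand-in for speech recognition followed by text encoding."""
    return [sum(audio) / len(audio)]


def speech_branch(audio: Vector) -> Vector:
    """Stand-in for acoustic feature extraction."""
    return [max(audio), min(audio)]


def extract_sequential(audio: Vector) -> Tuple[Vector, Vector]:
    """Run the two branches one after the other."""
    return semantic_branch(audio), speech_branch(audio)


def extract_parallel(audio: Vector) -> Tuple[Vector, Vector]:
    """Run the two branches concurrently; the result order is fixed
    by which future is read first, not by completion order."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        semantic_future = pool.submit(semantic_branch, audio)
        speech_future = pool.submit(speech_branch, audio)
        return semantic_future.result(), speech_future.result()
```

Both functions return the same pair of vectors for the same input, so the choice between them affects latency only.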

The above specific implementations do not limit the protection scope of the present application. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement or improvement made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims

1. A method for training an intelligent questionnaire interview model, comprising:

obtaining a speech sample and label information of the speech sample;

inputting the speech sample into the intelligent questionnaire interview model, wherein the intelligent questionnaire interview model includes a speech recognition unit, a text semantic feature extraction unit, a speech feature extraction unit and a classification prediction unit;

recognizing the speech sample based on the speech recognition unit to obtain a corresponding speech sample text, and encoding the speech sample text based on the text semantic feature extraction unit to obtain a semantic feature vector of the speech sample text;

performing feature extraction on the speech sample based on the speech feature extraction unit to obtain a corresponding speech feature vector; and

predicting, by the classification prediction unit according to the semantic feature vector and the speech feature vector, the question-answer option content matching the speech sample to obtain prediction information output by the classification prediction unit, and performing model training on the intelligent questionnaire interview model according to the prediction information and the label information.

2. The method according to claim 1, wherein the intelligent questionnaire interview model further includes a feature fusion unit, and the predicting, by the classification prediction unit according to the semantic feature vector and the speech feature vector, the question-answer option content matching the speech sample to obtain prediction information output by the classification prediction unit comprises:

performing feature fusion processing on the semantic feature vector and the speech feature vector based on the feature fusion unit to obtain a fusion feature vector; and

inputting the fusion feature vector into the classification prediction unit to predict the question-answer option content matching the speech sample, and obtaining the prediction information output by the classification prediction unit.

3. The method according to claim 2, wherein the performing feature fusion processing on the semantic feature vector and the speech feature vector based on the feature fusion unit to obtain a fusion feature vector comprises:

splicing the semantic feature vector and the speech feature vector based on the feature fusion unit, and performing dimensionality reduction on the spliced vector to obtain the fusion feature vector.

4. The method according to claim 1, wherein the performing model training on the intelligent questionnaire interview model according to the prediction information and the label information comprises:

calculating a model loss according to the prediction information and the label information; and

calculating a gradient according to the model loss, and performing backpropagation to update the model parameters of the intelligent questionnaire interview model by gradient descent.

5. An intelligent questionnaire interview method based on man-machine dialogue, comprising:

obtaining the speech input by the user in the current dialogue interview round;

converting the speech into a corresponding speech text, and encoding the speech text to obtain a semantic feature vector of the speech text;

performing feature extraction on the speech to obtain a corresponding speech feature vector;

generating a fusion feature vector according to the semantic feature vector and the speech feature vector; and

obtaining the question-answer option content matching the speech according to the fusion feature vector.

6. The method according to claim 5, wherein the converting the speech into a corresponding speech text, and encoding the speech text to obtain a semantic feature vector of the speech text comprises:

inputting the speech into an intelligent questionnaire interview model, wherein the intelligent questionnaire interview model is a model trained by the training method according to any one of claims 1 to 4;

recognizing the speech based on the speech recognition unit to obtain the corresponding speech text; and

encoding the speech text based on the text semantic feature extraction unit to obtain the semantic feature vector of the speech text.

7. The method according to claim 6, wherein the performing feature extraction on the speech to obtain a corresponding speech feature vector comprises:

performing feature extraction on the speech based on the speech feature extraction unit to obtain the corresponding speech feature vector.

8. The method according to claim 6, wherein the obtaining the question-answer option content matching the speech according to the fusion feature vector comprises:

inputting the fusion feature vector into the classification prediction unit for prediction, and obtaining prediction information output by the classification prediction unit; and

determining the question-answer option content matching the speech according to the prediction information.

9. The method of any one of claims 5 to 8, further comprising:

triggering the corresponding action of the next dialogue interview round based on preset questionnaire rules and the question-answer option content, and obtaining the question text of the next dialogue interview round;

performing text-to-speech (TTS) synthesis on the question text to obtain question speech corresponding to the question text; and

playing the question speech to the user through a speech playback module.

10. An intelligent questionnaire interview model training device, comprising:

an acquisition module, configured to acquire a speech sample and label information of the speech sample;

an input module, configured to input the speech sample into the intelligent questionnaire interview model, wherein the intelligent questionnaire interview model includes a speech recognition unit, a text semantic feature extraction unit, a speech feature extraction unit and a classification prediction unit;

a first processing module, configured to recognize the speech sample based on the speech recognition unit to obtain a corresponding speech sample text, and to encode the speech sample text based on the text semantic feature extraction unit to obtain a semantic feature vector of the speech sample text;

a second processing module, configured to perform feature extraction on the speech sample based on the speech feature extraction unit to obtain a corresponding speech feature vector; and

a third processing module, configured to predict, by the classification prediction unit according to the semantic feature vector and the speech feature vector, the question-answer option content matching the speech sample to obtain prediction information output by the classification prediction unit, and to perform model training on the intelligent questionnaire interview model according to the prediction information and the label information.

11. The device according to claim 10, wherein the intelligent questionnaire interview model also includes a feature fusion unit; the third processing module is specifically used for:

12. The device according to claim 11, wherein the third processing module is specifically configured to:

13. The device according to claim 10, wherein the third processing module is specifically configured to:

14. An intelligent questionnaire interview device based on man-machine dialogue, comprising:

an acquisition module, configured to acquire the speech input by the user in the current dialogue interview round;

an encoding module, configured to convert the speech into a corresponding speech text, and to encode the speech text to obtain a semantic feature vector of the speech text;

a feature extraction module, configured to perform feature extraction on the speech to obtain a corresponding speech feature vector;

a first processing module, configured to generate a fusion feature vector according to the semantic feature vector and the speech feature vector; and

a second processing module, configured to obtain the question-answer option content matching the speech according to the fusion feature vector.

15. The device according to claim 14, wherein the encoding module is specifically used for:

16. The device according to claim 15, wherein the encoding module is specifically used for:

17. The device according to claim 15, wherein the encoding module is specifically used for:

18. The device according to any one of claims 14 to 17, further comprising:

a third processing module, configured to trigger the corresponding action of the next dialogue interview round based on preset questionnaire rules and the question-answer option content, and to obtain the question text of the next dialogue interview round;

a speech synthesis module, configured to perform text-to-speech (TTS) synthesis on the question text to obtain question speech corresponding to the question text;

19. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 4, or to perform the method according to any one of claims 5 to 9.

20. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the method according to any one of claims 1 to 4, or to execute the method according to any one of claims 5 to 9.

21. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 4, or implements the steps of the method according to any one of claims 5 to 9.