CN115312061A - Voice question-answer method and device in driving scene and vehicle-mounted terminal - Google Patents
Voice question-answer method and device in driving scene and vehicle-mounted terminal
- Publication number
- CN115312061A (application CN202210952625.2A)
- Authority
- CN
- China
- Prior art keywords
- external environment
- question
- answer
- environment information
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/80—Services using short range communication, e.g. near-field communication [NFC], radio-frequency identification [RFID] or low energy communication
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Traffic Control Systems (AREA)
Abstract
The embodiments of the present application disclose a voice question-answering method and device in a driving scene, and a vehicle-mounted terminal, belonging to the technical field of human-computer interaction. The method includes: when a voice question-answer instruction is received, acquiring external environment information, where the external environment information is collected by an environment information collection component while the carrier is traveling and characterizes the external environment in which the carrier is located; obtaining, based on the external environment information and the voice question-answer instruction, a question-answer result corresponding to the voice question-answer instruction; and performing a voice broadcast based on the question-answer result. With the solution provided by this embodiment, the user can ask questions about the environment outside the cab and the vehicle-mounted terminal can answer according to that environment, which improves the degree of intelligence of the human-vehicle interactive question-answering system.
Description
Technical Field
The present application relates to the technical field of human-computer interaction, and in particular to a voice question-answering method and device in a driving scene, and a vehicle-mounted terminal.
Background
With the rapid development of Internet of Vehicles systems, human-vehicle interaction functions in driving scenes have become increasingly common. Voice, as a convenient means of interaction, is widely used in human-vehicle interaction, and question-answering systems for driving scenes have been steadily improved as a result.
In the related art, an intelligent head unit is equipped with a voice interaction system that captures the user's voice in a driving scene to implement device control or voice question answering. For voice question answering, the speech is first converted into text by speech recognition, an answer matching the text is then found by data query, and the result is finally returned to the user by on-screen display or voice broadcast, thereby achieving intelligent question answering.
However, the question-answering system in the above method only handles questions about the state of the intelligent head unit or questions whose answers can be found directly by an Internet search. It can implement only simple carrier control and navigation functions, and its degree of intelligence is low.
Summary of the Invention
Embodiments of the present application provide a voice question-answering method and device in a driving scene, and a vehicle-mounted terminal. The technical solution is as follows:
In one aspect, an embodiment of the present application provides a voice question-answering method in a driving scene, the method comprising:
when a voice question-answer instruction is received, acquiring external environment information, where the external environment information is collected by an environment information collection component while the carrier is traveling, and the external environment information characterizes the external environment in which the carrier is located;
obtaining, based on the external environment information and the voice question-answer instruction, a question-answer result corresponding to the voice question-answer instruction; and
performing a voice broadcast based on the question-answer result.
In another aspect, an embodiment of the present application provides a voice question-answering device in a driving scene, the device comprising:
an information acquisition module, configured to acquire external environment information when a voice question-answer instruction is received, where the external environment information is collected by an environment information collection component while the carrier is traveling, and the external environment information characterizes the external environment in which the carrier is located;
a result acquisition module, configured to obtain, based on the external environment information and the voice question-answer instruction, a question-answer result corresponding to the voice question-answer instruction; and
a voice broadcast module, configured to perform a voice broadcast based on the question-answer result.
In another aspect, an embodiment of the present application provides a terminal including a processor and a memory; the memory stores at least one instruction, and the at least one instruction is executed by the processor to implement the voice question-answering method in a driving scene described in the above aspect.
In another aspect, an embodiment of the present application provides a computer-readable storage medium storing at least one piece of program code, the program code being loaded and executed by a processor to implement the voice question-answering method in a driving scene described in the above aspect.
In another aspect, an embodiment of the present application provides a computer program product including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the voice question-answering method in a driving scene provided in the various optional implementations of the above aspects.
In the embodiments of the present application, when a voice question-answer instruction is received, the vehicle-mounted terminal can determine the question-answer result of the instruction based on external environment information characterizing the external environment in which it is located, and broadcast the result by voice. This enables intelligent question answering about the real-time external environment during driving, and improves both the success rate and the degree of intelligence of human-computer interaction during driving.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application;
FIG. 2 is a block diagram of the main components of a voice question-answering system provided by an exemplary embodiment of the present application;
FIG. 3 is a flowchart of a voice question-answering method in a driving scene provided by an exemplary embodiment of the present application;
FIG. 4 is a flowchart of a voice question-answering method in a driving scene provided by another exemplary embodiment of the present application;
FIG. 5 is a flowchart of a voice question-answering method in a driving scene provided by another exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of the process of acquiring external environment images provided by an exemplary embodiment of the present application;
FIG. 7 is a schematic diagram of the process of obtaining the question-answer text corresponding to a voice question-answer instruction provided by an exemplary embodiment of the present application;
FIG. 8 is a flowchart of a voice question-answering method in a driving scene provided by yet another exemplary embodiment of the present application;
FIG. 9 is a schematic diagram of the process of determining a first collection period provided by an exemplary embodiment of the present application;
FIG. 10 is a flowchart of obtaining the question-answer result corresponding to a voice question-answer instruction provided by an exemplary embodiment of the present application;
FIG. 11 is a schematic diagram of a question-answer analysis algorithm provided by an embodiment of the present application;
FIG. 12 is a flowchart of a manner of performing feature extraction on external environment information to obtain external environment features provided by an exemplary embodiment of the present application;
FIG. 13 is a schematic diagram of the difference between the observer's viewing angle and the shooting angle shown in an exemplary embodiment of the present application;
FIG. 14 is a schematic diagram of a voice question-answering application scenario provided by an exemplary embodiment of the present application;
FIG. 15 is a schematic diagram of the state transitions of the vehicle-mounted terminal proposed by an exemplary embodiment of the present application;
FIG. 16 is a structural block diagram of a voice question-answering device in a driving scene provided by an exemplary embodiment of the present application;
FIG. 17 is a structural block diagram of a vehicle-mounted terminal provided by an exemplary embodiment of the present application.
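Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, the implementations of the present application are described in further detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application. The implementation environment may include a carrier 110, a vehicle-mounted terminal 120, and a server 130.
The carrier 110 may be a vehicle, a ship, an aircraft, or the like. The following embodiments are described using a vehicle as an example, which does not constitute a limitation.
An environment information collection component is provided on the exterior of the carrier 110, and may include an image collection component 140 and an audio collection component 150. The image collection component collects external environment images, that is, scenery visible to the human eye, such as buildings and vehicles in the external environment. The audio collection component collects external environment audio, that is, sounds audible to the human ear, such as horns and birdsong in the external environment.
In the embodiments of the present application, the vehicle-mounted terminal 120 is arranged in the carrier 110. The vehicle-mounted terminal 120 may be a head unit 121, or a mobile terminal 122 that has established a communication connection with the head unit, for example an electronic device such as a smartphone, a notebook computer, or a wearable device. FIG. 1 takes the case where the mobile terminal 122 is a smartphone as an example. The communication connection between the head unit 121 and the mobile terminal 122 may be established in a wired or wireless manner, for example a Bluetooth connection, a Universal Serial Bus (USB) connection, a Wireless Fidelity (WiFi) connection, or a mobile data network connection, which is not limited in this embodiment.
The vehicle-mounted terminal 120 processes the external environment information and the user's voice instruction. Specifically, it extracts target external environment information from the external environment information, converts the voice question-answer instruction into the corresponding question-answer text, and generates the question-answer result based on the target external environment information and that question-answer text.
In the embodiments of the present application, the vehicle-mounted terminal 120 is capable of data communication with the server 130: it establishes a connection by wireless communication and exchanges data over that connection. The connection may be a Wireless Fidelity connection, a mobile data network connection, or the like, which is not limited in the embodiments of the present application.
In the embodiments of the present application, when the vehicle-mounted terminal generates a question-answer result based on the voice question-answer instruction and the external environment information, the processing may be performed locally by the head unit or the mobile terminal, or the question-answer result may be generated with the help of the server 130.
It should be noted that the subsequent steps can be performed only after the voice question-answering program of the vehicle-mounted terminal has been woken up, and that the wake-up instruction is preset. The steps in the embodiments of the present application are performed after the program has been woken up; the embodiments do not limit the manner of waking it up.
Illustratively, as shown in FIG. 1, environment information collection components are provided on the front, rear, left, and right of the carrier 110. Each environment information collection component includes an image collection component 140 and an audio collection component 150. The image collection component 140 may be an external vehicle-mounted perception camera, a dashcam, or the like, and the audio collection component 150 may be a microphone or the like.
The image collection component 140 and the audio collection component 150 work together. Each image collection component 140 and audio collection component 150 is connected to the vehicle-mounted terminal 120 and is used to acquire external environment information.
Optionally, an auxiliary imaging component is provided in the image collection component to assist imaging; the auxiliary imaging component may be a millimeter-wave radar, an infrared imager, or the like.
Optionally, the image collection component and the audio collection component are connected to the vehicle-mounted terminal 120 via Low-Voltage Differential Signaling (LVDS), Composite Video Broadcast Signal (CVBS), Controller Area Network (CAN), Local Interconnect Network (LIN), or the like, and the vehicle-mounted terminal 120 can read the external environment information acquired by the environment information collection component over this connection.
In addition, the vehicle-mounted terminal 120 is provided with a voice broadcast component and an image display component for broadcasting or displaying the question-answer result corresponding to the voice question-answer instruction.
Illustratively, as shown in FIG. 1, during driving the image collection component 140 and the audio collection component 150 collect external environment information and cache it in the built-in memory of the vehicle-mounted terminal. A passenger in the vehicle sees scenery outside the window and issues a voice question-answer instruction about something seen. After receiving the instruction, the vehicle-mounted terminal extracts the cached external environment information from its built-in memory, processes the data on the vehicle-mounted terminal 120 or the server 130 to obtain the question-answer result, and broadcasts that result.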
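In an illustrative example, the main components of the voice question-answering system are shown in FIG. 2: a vehicle-mounted terminal 210, an environment information collection component 220, a voice collection component 250, a voice broadcast component 260, an image display component 270, a near-field information analysis component 240, and a remote information analysis component 230. The remote information analysis component 230 is mainly a remote server that has established a communication connection with the vehicle-mounted terminal, and the near-field information analysis component 240 is mainly a near-field mobile terminal connected to the vehicle-mounted terminal.
The arrows in FIG. 2 indicate the direction of information flow. The environment information collection component 220 collects the vehicle's external environment in real time and sends the external environment information to the vehicle-mounted terminal 210 for caching. After the voice collection component 250 captures a voice question-answer instruction from the user, it sends the instruction to the vehicle-mounted terminal 210, which then extracts external environment information of a certain duration from the cache. The vehicle-mounted terminal 210 then either analyzes the external environment information and the instruction itself, or sends the extracted information and the instruction to the near-field information analysis component 240 or the remote information analysis component 230 for analysis, after which the generated question-answer result is returned to the vehicle-mounted terminal 210. The near-field information analysis component 240 and the remote information analysis component 230 may also perform the computation jointly in a distributed manner. After receiving the question-answer result, the vehicle-mounted terminal 210 processes it and sends it to the image display component 270 and the voice broadcast component 260, which present it to the user.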
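The caching and extraction behaviour described for the vehicle-mounted terminal 210 can be illustrated with a small time-bounded buffer. This is only a sketch under stated assumptions: the names `EnvSample` and `EnvCache`, the 60-second retention horizon, and the use of plain timestamps are hypothetical and do not come from the application, which does not specify how the cache is implemented.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class EnvSample:
    """One cached piece of external environment information (hypothetical type)."""
    timestamp: float  # seconds on any monotonic clock
    dimension: str    # e.g. "image" or "audio"
    payload: bytes    # raw frame or audio chunk


class EnvCache:
    """Keeps only the most recent `horizon_s` seconds of samples (assumed policy)."""

    def __init__(self, horizon_s: float = 60.0):
        self.horizon_s = horizon_s
        self._samples = deque()

    def push(self, sample: EnvSample) -> None:
        """Store a new sample and evict samples older than the horizon."""
        self._samples.append(sample)
        while self._samples and sample.timestamp - self._samples[0].timestamp > self.horizon_s:
            self._samples.popleft()

    def extract(self, now: float, duration_s: float) -> list:
        """Return the samples collected in the window [now - duration_s, now]."""
        return [s for s in self._samples if now - duration_s <= s.timestamp <= now]
```

When a voice question-answer instruction arrives, a terminal built this way would call `extract` with the arrival time and the desired duration, then pass the returned samples to local, near-field, or remote analysis.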
FIG. 3 is a flowchart of a voice question-answering method in a driving scene provided by an exemplary embodiment of the present application. This embodiment is described using the example in which the method is applied to the vehicle-mounted terminal shown in FIG. 1. The method includes the following steps:
Step 301: when a voice question-answer instruction is received, acquire external environment information.
The external environment information is collected by the environment information collection component while the vehicle is traveling, and characterizes the external environment in which the vehicle is located.
In one possible implementation, the external environment information characterizes the vehicle's external environment in at least two dimensions, which may include an image dimension and a sound dimension.
Step 302: based on the external environment information and the voice question-answer instruction, obtain the question-answer result corresponding to the voice question-answer instruction.
Here, "based on the external environment information and the voice question-answer instruction" means that the vehicle-mounted terminal, according to the voice question-answer instruction issued by the user, selects the external environment information corresponding to that instruction from the cached external environment information, and then processes the selected information together with the instruction to obtain the corresponding question-answer result.
Optionally, the question-answer result corresponding to the voice question-answer instruction may be obtained locally by the vehicle-mounted terminal, that is, the head unit or the mobile terminal, or may be obtained after data processing by the server.
Step 303: perform a voice broadcast based on the question-answer result.
After obtaining the question-answer result, the vehicle-mounted terminal automatically fills the result into a preset language broadcast template. The template contains polite phrasing closer to natural human interaction as well as some safe-driving reminders.
In some embodiments, while obtaining the question-answer result corresponding to the voice question-answer instruction, the vehicle-mounted terminal also extracts the vehicle's current location and queries a preset navigation-map knowledge graph for additional information that may exist at that location, such as nearby landmarks, restaurants, and gas stations. This additional information can be filled into the broadcast template together with the question-answer result and broadcast by voice.
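Filling a broadcast template can be sketched with Python's standard `string.Template`. The template wording, the function name `build_broadcast`, and the `nearby` parameter are illustrative assumptions; the application only states that a preset template with polite phrasing and safe-driving reminders is filled with the result and, optionally, with location-related information.

```python
from string import Template
from typing import List, Optional

# Hypothetical preset template: answer, optional extra information, safety reminder.
BROADCAST_TEMPLATE = Template(
    "The answer to your question is: $answer.$extra Please keep your attention on the road."
)


def build_broadcast(answer: str, nearby: Optional[List[str]] = None) -> str:
    """Fill the question-answer result and optional nearby places into the template."""
    extra = ""
    if nearby:
        extra = " Nearby you can also find: " + ", ".join(nearby) + "."
    return BROADCAST_TEMPLATE.substitute(answer=answer, extra=extra)
```

The returned string would then be handed to whatever text-to-speech component the terminal uses; that component is outside the scope of this sketch.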
In summary, in the voice question-answering method in a driving scene provided by the embodiments of the present application, the vehicle-mounted terminal acquires external environment information upon receiving a voice question-answer instruction, generates the corresponding question-answer result from the acquired external environment information and the instruction, and broadcasts the result. This solves the problem that a question-answering system cannot interact according to the current driving environment during driving, so that in a driving scene the user can ask about, and receive answers about, whatever is visible.
After the voice question-answering program of the vehicle-mounted terminal is woken up, the voice instructions issued by the user need to be classified, and the vehicle-mounted terminal performs different functions according to the user's needs. When it determines that the user's voice instruction is a voice question-answer instruction, the vehicle-mounted terminal performs the subsequent steps of the embodiments of the present application.
FIG. 4 is a flowchart of a voice question-answering method in a driving scene provided by another exemplary embodiment of the present application. The method includes the following steps:
Step 401: when a voice instruction is received, identify the instruction type of the voice instruction.
Voice instructions fall into two functional categories: voice question-answer instructions and non-question-answer instructions. Non-question-answer instructions include device control instructions, navigation instructions, and the like. Device control instructions use voice interaction to adjust the working state of equipment installed in the vehicle, such as the air conditioner, in-vehicle television, and in-vehicle audio system. Navigation instructions use human-computer interaction to start the in-vehicle navigation system by voice.
A voice question-answer instruction is an instruction phrased as a question; for example, "Will it rain today?", "What is the temperature outside?", and "What kind of car is that white car?" are all voice question-answer instructions. The subsequent steps in the embodiments of the present application are all performed when the voice question-answer instruction concerns the vehicle's external environment. For conventional voice question-answer instructions, the vehicle-mounted terminal can obtain the answer by, for example, searching an online database and then broadcast it by voice, which is not described further here.
In one possible manner, instruction type identification is implemented by an instruction classification model. The model is trained in advance and computes the instruction type probabilities of an input voice instruction. The vehicle-mounted terminal feeds the text corresponding to the voice instruction into the instruction classification model and obtains an output that represents the probability of each instruction type; the instruction type whose probability exceeds a threshold is taken as the final instruction type. The instruction types include device control instructions, navigation instructions, voice question-answer instructions, and the like. The instruction classification model may be trained on a large number of sample instructions and their corresponding instruction labels (which indicate the instruction type).
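The thresholding step on the classifier output can be sketched as follows. The function name `pick_instruction_type`, the label strings, and the default threshold of 0.5 are assumptions made for illustration; the application does not give a threshold value or say what happens when no type exceeds it (this sketch returns `None` in that case).

```python
from typing import Dict, Optional


def pick_instruction_type(probabilities: Dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Return the most probable instruction type if its probability exceeds the threshold."""
    if not probabilities:
        return None
    best_type, best_p = max(probabilities.items(), key=lambda kv: kv[1])
    return best_type if best_p > threshold else None
```

For example, an output of `{"device_control": 0.1, "navigation": 0.1, "question_answer": 0.8}` would be resolved to `"question_answer"`, while a near-uniform output would be treated as undecided.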
In another possible manner, the vehicle-mounted terminal determines the instruction type of the voice instruction by keyword recognition. For example, if the voice instruction contains keywords such as "air conditioner temperature" or "speaker volume", it is more likely to be a device control instruction; if it contains an interrogative keyword such as "what is", it is more likely to be a voice question-answer instruction.
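A keyword-based classifier of this kind might look like the sketch below. The keyword lists, the label strings, and the rule that a trailing question mark counts as interrogative are hypothetical choices; the application names only "air conditioner temperature", "speaker volume", and "what is" as example keywords.

```python
# Hypothetical keyword lists; a real system would use a much larger vocabulary.
DEVICE_KEYWORDS = ("air conditioner temperature", "speaker volume")
QUESTION_KEYWORDS = ("what is", "what's", "which", "how many")


def classify_by_keywords(text: str) -> str:
    """Guess the instruction type of recognized text from the keywords it contains."""
    lowered = text.lower()
    if any(k in lowered for k in DEVICE_KEYWORDS):
        return "device_control"
    if any(k in lowered for k in QUESTION_KEYWORDS) or lowered.rstrip().endswith("?"):
        return "question_answer"
    return "unknown"
```

Because device keywords are checked first, a sentence containing both kinds of keyword is treated as device control here; that ordering is a design choice of the sketch, not something the application prescribes.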
The embodiments of the present application do not limit the specific manner of classifying voice instructions.
Step 402: when the instruction type of the voice instruction is a question-answer instruction, determine that a voice question-answer instruction has been received, and acquire external environment information.
After obtaining a voice question-answer instruction, the vehicle-mounted terminal extracts the external environment information and performs the subsequent steps of this embodiment for analyzing the instruction. If the voice instruction is a non-question-answer instruction, the vehicle-mounted terminal does not perform the subsequent steps of the embodiments of the present application and instead runs the program corresponding to that instruction. A non-question-answer instruction may be a parameter instruction for adjusting in-vehicle equipment or a navigation instruction, in which case the vehicle-mounted terminal runs the corresponding device adjustment program or navigation program. For example, if the user says "Raise the air conditioner temperature", which is a non-question-answer instruction, the subsequent steps are not performed, and the vehicle-mounted terminal controls the in-vehicle air conditioner to raise the temperature.
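The branching in step 402 can be summarized in a few lines. All names here (`handle_instruction` and its callback parameters) are hypothetical; the point of the sketch is only that external environment information is fetched exclusively on the question-answer path.

```python
from typing import Any, Callable


def handle_instruction(
    instruction_type: str,
    text: str,
    get_env_info: Callable[[], Any],
    answer_question: Callable[[str, Any], str],
    run_command: Callable[[str, str], str],
) -> str:
    """Route a classified voice instruction to question answering or to its own program."""
    if instruction_type == "question_answer":
        # Only question-answer instructions trigger extraction of environment information.
        env = get_env_info()
        return answer_question(text, env)
    # Device control, navigation, and other instructions bypass the environment pipeline.
    return run_command(instruction_type, text)
```

Passing the three behaviours in as callables keeps the routing logic independent of how environment extraction, answering, and device control are actually implemented.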
Step 403: based on the external environment information and the voice question-answer instruction, obtain the question-answer result corresponding to the voice question-answer instruction.
For the implementation of this step, refer to step 302 above; details are not repeated here.
Step 404: perform a voice broadcast based on the question-answer result.
For the implementation of this step, refer to step 303 above; details are not repeated here.
In summary, in a real-world scenario the vehicle-mounted terminal first determines the type of the instruction issued by the user and performs the subsequent steps only after confirming that it is a voice question-answer instruction. This prevents the vehicle-mounted terminal from processing non-question-answer instructions together with external environment information, which would waste processing resources.
In the embodiments of the present application, the vehicle-mounted terminal obtains the question-answer result corresponding to the voice question-answer instruction based on the external environment information and the voice instruction. The external environment information includes all information across multiple dimensions or over a period of time, whereas the voice question-answer instruction may concern only one of those dimensions or one specific time period. Obtaining the question-answer result from all of the information would not only cause the vehicle-mounted terminal to perform unnecessary data processing but would also reduce the accuracy of the result. The external environment information therefore needs to be filtered first.
In one possible implementation, the vehicle-mounted terminal extracts target external environment information from the external environment information based on the voice question-answer instruction; the target external environment information is more relevant to the voice question-answer instruction than the other external environment information is.
The relevance may include at least one of dimensional relevance and temporal relevance. Correspondingly, the target external environment information may be information of a specific dimension, or information collected in a specific time period.
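Filtering by dimension and by collection period can be sketched as a simple selection over cached samples. The `Sample` type, the function name `select_target`, and the representation of a period as a `(start, end)` tuple are assumptions for illustration; how the dimension and period are derived from the instruction is described separately in the application and is not modeled here.

```python
from typing import List, NamedTuple, Optional, Tuple


class Sample(NamedTuple):
    """A cached sample reduced to the two fields used for filtering (hypothetical)."""
    timestamp: float
    dimension: str  # e.g. "image" or "audio"


def select_target(
    samples: List[Sample],
    dimension: Optional[str] = None,
    period: Optional[Tuple[float, float]] = None,
) -> List[Sample]:
    """Keep only samples matching the requested dimension and/or collection period."""
    selected = []
    for s in samples:
        if dimension is not None and s.dimension != dimension:
            continue
        if period is not None and not (period[0] <= s.timestamp <= period[1]):
            continue
        selected.append(s)
    return selected
```

Leaving either filter as `None` disables it, which matches the statement that relevance may be dimensional, temporal, or both.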
因此,可以分别从识别问题维度以及判断特定时段两个方面从外部环境信息中提取目标外部环境信息。下面将通过两个示例性实施例分别对这两种提取目标外部环境信息的方式进行说明。Therefore, the target external environment information can be extracted from the external environment information from the two aspects of identifying the problem dimension and judging the specific time period. The two manners of extracting target external environment information will be described respectively below through two exemplary embodiments.
FIG. 5 is a flowchart of a voice question-and-answer method in a driving scene provided by another exemplary embodiment of the present application. The method includes the following steps:
Step 501: upon receiving a voice question-and-answer instruction, acquire external environment information.
The external environment information contains information of multiple dimensions, which include at least an image dimension and a sound dimension. The image dimension corresponds to the external environment image in the external environment information, and the sound dimension corresponds to the external environment audio in the external environment information.
The process of acquiring external environment information is described below. As shown in FIG. 6, in a possible implementation, an image acquisition component mounted outside the vehicle first captures the external environment, and the captured content is then converted into video images through imaging processing. The image acquisition component includes camera equipment such as the vehicle's external perception cameras or a dashcam.
Optionally, an auxiliary imaging device such as a millimeter-wave radar or an infrared imager is used when acquiring external environment information. The image frames can be rectified according to the non-visible-band images collected by the auxiliary imaging device, which makes the position information in the image more accurate and, in turn, the final question-and-answer result more accurate. For example, when there are several vehicles ahead and the user asks about one of them, it is difficult to locate the intended vehicle from camera or dashcam footage alone. With an auxiliary imaging device, the target vehicle can be further confirmed from factors such as distance and bearing. This supports targeted questions such as "What is the second car ahead?"
After the image acquisition component captures the external environment image, the vehicle-mounted terminal caches it for later use. Because image data occupies a large amount of storage, the cache duration cannot be too long. On the other hand, users ask questions about the vehicle's real-time surroundings while driving, so a cache duration of no more than two minutes is sufficient. Once the cached footage reaches the preset duration, the oldest image frame is deleted before the newest frame is written.
The vehicle-mounted terminal reads image frames from the camera equipment at a fixed frame rate. After reading a frame, it applies an image filtering algorithm to quickly remove noise from that frame. If an auxiliary imaging device is used, the image frames are also rectified according to the non-visible-band images collected by that device.
Optionally, the frame rate is typically set to 20 fps, that is, 20 frames are read per second. It can also be adjusted for different capture devices and application scenarios, which is not limited in the embodiments of the present application.
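The caching behavior described above (a bounded cache that drops the oldest frame when a new one arrives) can be illustrated with a minimal sketch. The class name `FrameCache`, its methods, and the choice of exactly 120 seconds are assumptions made for illustration and do not come from the application, which only states an upper bound of two minutes and a typical rate of 20 fps.

```python
from collections import deque

FRAME_RATE_FPS = 20    # typical read rate mentioned in the text
CACHE_SECONDS = 120    # assumed: the two-minute upper bound


class FrameCache:
    """Fixed-duration cache: once full, the oldest frame is dropped."""

    def __init__(self, fps=FRAME_RATE_FPS, seconds=CACHE_SECONDS):
        # Capacity in frames = frames per second * seconds kept.
        self._frames = deque(maxlen=fps * seconds)

    def push(self, timestamp, frame):
        # deque(maxlen=...) discards the oldest entry automatically.
        self._frames.append((timestamp, frame))

    def between(self, start, end):
        """Return cached frames whose timestamp lies in [start, end]."""
        return [f for t, f in self._frames if start <= t <= end]

    def __len__(self):
        return len(self._frames)


# Tiny capacity (2 fps * 3 s = 6 frames) so the eviction is visible.
cache = FrameCache(fps=2, seconds=3)
for i in range(10):
    cache.push(i * 0.5, f"frame-{i}")
```

With the default values the cache would hold 20 × 120 = 2400 frames. A real system would more likely store encoded video segments than individual frame objects.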
The external environment audio is stored in a similar way to the image cache: the collected audio is first denoised, or processed by other algorithms, and then cached. Details are not repeated here.
Step 502: perform question dimension recognition on the voice question-and-answer text corresponding to the voice question-and-answer instruction to obtain the question dimension of that text, where the question dimension includes at least one of the image dimension and the sound dimension.
The voice question-and-answer text is obtained by the vehicle-mounted terminal by applying, in sequence, a beamforming algorithm, front-end signal processing, and an ASR (Automatic Speech Recognition) algorithm to the voice question-and-answer instruction, as shown in FIG. 7.
Optionally, the front-end signal processing uses an ANC (Active Noise Cancellation) algorithm to remove environmental noise, an AEC (Acoustic Echo Cancellation) algorithm to remove the echo of speech played back by the vehicle-mounted terminal, and an AGC (Automatic Gain Control) algorithm to adjust the amplitude range of the speech signal so that the output amplitude is stable.
In some usage scenarios the ASR recognition result may be empty. In that case the vehicle-mounted terminal does not perform the subsequent steps. It waits for a period of time and then returns to the standby state.
After obtaining the voice question-and-answer text corresponding to the voice question-and-answer instruction, the vehicle-mounted terminal performs question dimension recognition on it. Once the question dimension is identified, the terminal extracts the corresponding target external environment information from the external environment information based on the question dimension and the type of the external environment information.
In one possible approach, a question classification model is trained in advance to compute the probability of each question dimension. The vehicle-mounted terminal inputs the voice question-and-answer text into the question classification model. The output gives the probability of each question dimension, and each dimension whose probability is above a threshold is determined as a final question dimension. The model can be trained on a large number of sample questions and their corresponding question labels, which indicate the question dimension.
In another possible approach, the vehicle-mounted terminal determines the question dimension by keyword matching against image-related and sound-related keywords. Words that describe the appearance of a scene, such as colors and shapes, can serve as image-related keywords, for example "red", "green", "spherical", or "the largest". Words that denote sounds, including onomatopoeia, can serve as sound-related keywords, for example "birdsong" or "beep".
The embodiments of the present application do not limit the specific manner of question dimension recognition.
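The keyword-matching approach can be sketched as follows. This is only an illustration under stated assumptions: the two keyword sets are small hypothetical English examples, and the function name `question_dimensions` is invented here. A deployed system would use a much larger vocabulary in the user's language, or the trained classifier mentioned above.

```python
import re

# Hypothetical keyword lists, chosen only for this example.
IMAGE_KEYWORDS = {"red", "green", "blue", "spherical", "largest", "shaped"}
SOUND_KEYWORDS = {"sound", "singing", "birdsong", "beep", "honking"}


def question_dimensions(text):
    """Return the set of question dimensions matched by keywords."""
    # Split into lowercase alphabetic tokens ("H-shaped" -> "h", "shaped").
    words = set(re.findall(r"[a-z]+", text.lower()))
    dims = set()
    if words & IMAGE_KEYWORDS:
        dims.add("image")
    if words & SOUND_KEYWORDS:
        dims.add("sound")
    return dims
```

A question that matches neither list yields an empty set. How such a question is handled is left open by the text, so the sketch does not choose a default.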
Step 503: when the question dimension is the image dimension, extract the external environment image from the external environment information as the target external environment information.
The image dimension means that the user asks from a visual perspective. Descriptions of information observable by the human eye, such as shape, color, and size, all belong to the image dimension. An example is "What is that H-shaped building?". This instruction describes the appearance of the target, so the question belongs to the image dimension, and the vehicle-mounted terminal extracts the external environment image from the external environment information as the target external environment information.
Step 504: when the question dimension is the sound dimension, extract the external environment audio from the external environment information as the target external environment information.
The sound dimension means that the user asks from the perspective of sound. Descriptions that can be captured by the human ear, such as loudness, sound characteristics, and the presence or absence of sound, all belong to the sound dimension. An example is "What kind of bird is singing?". This instruction targets a sound in the external environment, so the vehicle-mounted terminal extracts the audio from the external environment information as the target external environment information.
In a possible implementation, the question dimension includes both the image dimension and the sound dimension. In that case the external environment image and the external environment audio are both extracted as the target external environment information. For example, the instruction "Which car is honking right now?" targets both a sound and an image in the external environment, so the vehicle-mounted terminal needs to extract the external environment image and the external environment audio together as the target external environment information.
Step 505: based on the target external environment information and the voice question-and-answer instruction, obtain the question-and-answer result corresponding to the voice question-and-answer instruction.
After extracting the target external environment information, the vehicle-mounted terminal analyzes it together with the voice question-and-answer instruction to obtain the corresponding question-and-answer result.
When the target external environment information is the external environment image, the vehicle-mounted terminal processes the external environment image and the voice question-and-answer instruction to obtain the result. When it is the external environment audio, the terminal processes the external environment audio and the voice question-and-answer instruction to obtain the result. When it contains both, the terminal processes the external environment image, the external environment audio, and the voice question-and-answer instruction to obtain the result corresponding to the voice question-and-answer instruction.
Because the target external environment information has already been extracted by step 504, the range of external environment information that the vehicle-mounted terminal has to analyze in step 505 is smaller, which reduces the computation needed to analyze the external environment information and the voice instruction.
Step 506: when the external environment information includes an external environment image, determine the associated image frame corresponding to the question-and-answer result in the external environment image, and display the associated image frame.
In some application scenarios, a voice broadcast of the result alone makes it hard for the user to grasp the result intuitively. For example, for an instruction such as "Where does the road sign we just passed lead?", a spoken answer alone conveys limited information. Displaying the associated image frame lets the user obtain the result more intuitively and with more information.
Optionally, based on the question-and-answer result corresponding to the voice question-and-answer instruction, the terminal determines the image frame in which the target object indicated by the result appears, and then selects, from several frames before and after that frame, the frame with the best image quality as the associated image frame.
For example, the user issues the instruction "Where does the road sign we just passed point to?". After performing the steps of this embodiment, the vehicle-mounted terminal obtains the result "The road sign just now points to the department store, the food street, and Central Park" and shows the image of the road sign captured by the image acquisition component on the in-vehicle display. The user can then see each destination on the sign and its direction directly, which conveys more information than the voice broadcast alone.
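The frame selection in step 506 can be sketched as a search over a small window around the frame where the target object was found. The function name `pick_associated_frame`, the `radius` parameter, and the idea of a precomputed per-frame quality score are assumptions for illustration. The application does not say how image quality is measured or how many neighboring frames are considered.

```python
def pick_associated_frame(scores, hit_index, radius=5):
    """Return the index of the best-scoring frame near `hit_index`.

    scores[i] is an assumed quality score for frame i (higher is better),
    for example a sharpness measure computed elsewhere.
    """
    lo = max(0, hit_index - radius)
    hi = min(len(scores), hit_index + radius + 1)
    # Search only the window [lo, hi) around the frame with the target.
    return max(range(lo, hi), key=lambda i: scores[i])


scores = [0.2, 0.9, 0.4, 0.5, 0.8, 0.3, 0.95]
best = pick_associated_frame(scores, hit_index=3, radius=1)
```

Here the window covers frames 2 to 4, so the frame at index 4 is chosen even though sharper frames exist further away.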
Step 507: perform a voice broadcast based on the question-and-answer result.
For the implementation of this step, refer to step 303 above. Details are not repeated in this embodiment.
In summary, the question-and-answer method for driving scenes provided by this embodiment identifies the question dimension and then extracts target external environment information based on that dimension. The vehicle-mounted terminal can thus select only the relevant part of the external environment information as the target external environment information, which reduces its data processing load and improves the efficiency with which it handles questions.
In addition, displaying the associated image frame as provided in this embodiment lets the user learn the result not only by listening to the voice broadcast but also visually, which further helps ensure the reliability and accuracy of the question-and-answer result.
FIG. 8 is a flowchart of a voice question-and-answer method in a driving scene provided by yet another exemplary embodiment of the present application. The method includes the following steps:
Step 801: upon receiving a voice question-and-answer instruction, perform time keyword recognition on the voice instruction.
Step 802: when the voice question-and-answer text corresponding to the voice question-and-answer instruction is found to contain a time keyword, determine a first collection period based on the time keyword and the receiving time.
After receiving the voice question-and-answer instruction, the vehicle-mounted terminal needs to analyze the external environment information and the instruction. Because image and audio data are large, processing all external environment information within the entire preset cache duration is expensive. The terminal can therefore first perform time keyword recognition on the voice question-and-answer text, determine a specific time period from the recognized keyword, and then analyze only the external environment information within that period, which substantially reduces computation and overhead.
Upon receiving the voice question-and-answer instruction, the vehicle-mounted terminal performs time keyword recognition on the corresponding text. Expressions such as "just now", "five seconds ago", and "within one minute" are all time keywords. After the time keyword is identified, the terminal subtracts the duration expressed by the keyword from the moment at which the instruction was received and takes the result as the start of the first collection period. The first collection period runs from that start time to the moment the instruction was received.
As shown in FIG. 9, let t2 be the moment the instruction is received and let the interval from t1 to t2 be the duration expressed by the time keyword. The interval from t1 to t2 is then set as the first collection period. For example, if the vehicle-mounted terminal receives at 17:33 the instruction "How many convenience stores have we passed in the past minute?", it takes 17:32-17:33 as the first collection period.
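The computation of the first collection period (t1 = t2 minus the duration expressed by the time keyword) can be sketched as below. The keyword table `TIME_KEYWORDS`, the function name, and in particular the 10-second duration assigned to "just now" are assumptions for illustration. The application does not specify how vague expressions map to durations, and the calendar date in the example is arbitrary.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from time keywords to durations.
TIME_KEYWORDS = {
    "just now": timedelta(seconds=10),        # assumed value
    "five seconds ago": timedelta(seconds=5),
    "past minute": timedelta(minutes=1),
}


def first_collection_period(text, received_at):
    """Return (t1, t2) if the text contains a time keyword, else None."""
    lowered = text.lower()
    for keyword, span in TIME_KEYWORDS.items():
        if keyword in lowered:
            # t1 = receiving time minus the duration the keyword expresses.
            return received_at - span, received_at
    return None


t2 = datetime(2022, 8, 9, 17, 33)   # arbitrary date, time from the example
period = first_collection_period(
    "How many convenience stores have we passed in the past minute?", t2)
```

For the example question this yields the interval from 17:32 to 17:33. A question without a time keyword returns `None`.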
Step 803: determine the external environment information whose collection time falls within the first collection period as the target external environment information.
After determining the first collection period, the vehicle-mounted terminal extracts the external environment information within that period from the external environment information as the target external environment information.
For example, after determining that the first collection period is 17:32-17:33, the vehicle-mounted terminal extracts the data for 17:32-17:33 from the cached external environment information as the target external environment information.
Step 804: when the voice question-and-answer text corresponding to the voice question-and-answer instruction is found not to contain a time keyword, determine a second collection period based on the receiving time.
The embodiments of the present application are applied in driving scenes, where the user raises a voice question-and-answer instruction based on the vehicle's real-time environment, usually about something that happened within a short time. To reduce the data processing load, the vehicle-mounted terminal can therefore determine a relatively short period as the second collection period according to the moment the instruction was received. The second collection period is a short interval immediately before that moment. For example, the terminal receives at 17:50:30 the instruction "What is the blue building on the left?". The vehicle is moving and the question concerns its real-time surroundings, so the 10 seconds before the instruction was received, that is, 17:50:20-17:50:30, can be used as the second collection period.
Step 805: determine the external environment information whose collection time falls within the second collection period as the target external environment information.
After determining the second collection period, the vehicle-mounted terminal extracts the external environment information within that period from the external environment information as the target external environment information.
For example, after determining that the second collection period is 17:50:20-17:50:30, the vehicle-mounted terminal extracts the data for 17:50:20-17:50:30 from the cached external environment information as the target external environment information.
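The second collection period and the extraction of cached data inside it can be sketched as follows. The names `second_collection_period` and `extract_target`, and the representation of the cache as a list of `(timestamp, item)` pairs, are assumptions for illustration. The 10-second look-back is the value used in the example above, not a fixed requirement.

```python
from datetime import datetime, timedelta

DEFAULT_LOOKBACK = timedelta(seconds=10)   # value from the example above


def second_collection_period(received_at, lookback=DEFAULT_LOOKBACK):
    """Short window that ends at the moment the instruction was received."""
    return received_at - lookback, received_at


def extract_target(cache, period):
    """Keep cached items whose collection time lies inside the period."""
    start, end = period
    return [item for ts, item in cache if start <= ts <= end]


received = datetime(2022, 8, 9, 17, 50, 30)   # arbitrary date
# One cached item every 5 seconds over the previous 30 seconds.
cache = [(received - timedelta(seconds=s), f"frame@-{s}s")
         for s in range(30, -1, -5)]
target = extract_target(cache, second_collection_period(received))
```

Only the three items collected between 17:50:20 and 17:50:30 are kept. The same `extract_target` helper would apply unchanged to a first collection period.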
Step 806: based on the target external environment information and the voice question-and-answer instruction, obtain the question-and-answer result corresponding to the voice question-and-answer instruction.
In a possible implementation scenario, if the vehicle-mounted terminal fails to obtain a question-and-answer result from the target external environment information within the first or second collection period, it still needs to process all external environment information within the preset cache period to obtain the result.
Step 807: perform a voice broadcast based on the question-and-answer result.
For the implementation of this step, refer to step 303 above. Details are not repeated in this embodiment.
In summary, in this embodiment, when the vehicle-mounted terminal recognizes that the voice question-and-answer instruction contains a time keyword, it determines the first collection period from the receiving time of the instruction and the time keyword. When the instruction contains no time keyword, it determines the second collection period. It then extracts the external environment information of the corresponding collection period for analysis. This makes the extraction of target external environment information more targeted, reduces the terminal's data processing load, and improves the efficiency of obtaining the question-and-answer result.
The embodiments of the present application obtain the question-and-answer result corresponding to the voice question-and-answer instruction based on the external environment information and the instruction, so the vehicle-mounted terminal needs to analyze the target external environment information together with the instruction. A large amount of external environment information is collected while driving. Although extracting target external environment information according to the receiving time reduces the data processing time to some extent, in most cases processing image data on the local processor alone still imposes a considerable load. The present application therefore provides the following three ways of generating the question-and-answer result, chosen according to the driving scene and other factors.
1. The vehicle-mounted terminal generates the question-and-answer result corresponding to the voice question-and-answer instruction based on the external environment information and the instruction.
2. When the network state satisfies the transmission condition, the vehicle-mounted terminal reports the external environment information and the voice question-and-answer instruction to a server, so that the server generates the corresponding question-and-answer result based on them. The terminal then receives the result delivered by the server.
3. When the network state does not satisfy the transmission condition, the vehicle-mounted terminal determines a target near-field device from the near-field devices based on device computing power, sends the external environment information and the voice question-and-answer instruction to the target near-field device so that it generates the corresponding question-and-answer result, and receives the result sent by the target near-field device. The near-field device may be, for example, a smartphone or a tablet computer inside the vehicle.
A near-field device is a mobile terminal inside the vehicle that has established a communication connection with the vehicle-mounted terminal, for example over Bluetooth or WiFi. The vehicle-mounted terminal can discover near-field devices by means such as Bluetooth scanning.
After the near-field devices are determined, the vehicle-mounted terminal selects the target near-field device among them according to device computing power. Device computing power is the ability of a device to produce a specific output by processing data. It can be measured with objective data: the computing power of different devices is tested in advance with a dedicated test program to obtain their performance.
Optionally, the pre-tested computing power of different devices is ranked and assigned priorities, where priority 1 is the highest and denotes the strongest computing power, and the ranking is stored in the built-in memory of the vehicle-mounted terminal. When a target near-field device needs to be determined, the terminal selects, from the near-field devices, the device with the highest computing power priority as the target near-field device. For example, a laptop usually has more computing power than a smartphone, which in turn has more than a smartwatch, so the laptop is assigned priority 1, the smartphone priority 2, and the smartwatch priority 3, and this ranking is stored in the terminal's built-in memory. When the terminal needs to determine the target near-field device and a laptop is among the near-field devices, the laptop is chosen. Otherwise, the device with the highest priority among the remaining near-field devices is chosen.
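The priority-based selection described above can be sketched as a lookup in a stored ranking. The table `COMPUTE_PRIORITY` mirrors the laptop/smartphone/smartwatch example from the text. The function name and the decision to ignore devices that are missing from the table are assumptions for illustration.

```python
# Stored ranking from the example: 1 is the strongest computing power.
COMPUTE_PRIORITY = {"laptop": 1, "smartphone": 2, "smartwatch": 3}


def pick_target_device(nearby):
    """Return the connected device type with the best (lowest) priority.

    Device types missing from the table are ignored (an assumption);
    None is returned when no ranked device is connected.
    """
    ranked = [d for d in nearby if d in COMPUTE_PRIORITY]
    if not ranked:
        return None
    return min(ranked, key=COMPUTE_PRIORITY.get)


target = pick_target_device(["smartwatch", "smartphone"])
```

With no laptop connected, the smartphone is selected because it has the best priority among the remaining devices.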
Optionally, thresholds are set for network latency and transmission speed. When the current network latency is greater than its threshold, or the current network transmission speed is less than its threshold, the vehicle-mounted terminal determines that the current network state does not satisfy the transmission condition. When the current network latency is less than its threshold and the current network transmission speed is greater than its threshold, the terminal determines that the current network state satisfies the transmission condition.
Other parameters may also be used to determine whether the current network state satisfies the transmission condition, which is not limited in the embodiments of the present application.
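The threshold check and the resulting choice among the three processing ways can be sketched as follows. The threshold values, the function names, and the fallback to local processing when neither the network nor a near-field device is available are assumptions for illustration. The text also leaves the case of a value exactly equal to its threshold open, and the sketch treats it as not satisfying the condition.

```python
MAX_LATENCY_MS = 200.0       # assumed threshold
MIN_THROUGHPUT_MBPS = 5.0    # assumed threshold


def network_ok(latency_ms, throughput_mbps):
    """True only if latency is below and throughput above the thresholds."""
    return (latency_ms < MAX_LATENCY_MS
            and throughput_mbps > MIN_THROUGHPUT_MBPS)


def choose_processor(latency_ms, throughput_mbps, target_device=None):
    """Pick where the question-and-answer result is generated."""
    if network_ok(latency_ms, throughput_mbps):
        return "server"                        # way 2: upload to the server
    if target_device is not None:
        return f"near-field:{target_device}"   # way 3: offload to a device
    return "local"                             # way 1: assumed fallback


choice = choose_processor(500.0, 20.0, target_device="smartphone")
```

In this call the latency exceeds the threshold, so the request goes to the near-field smartphone instead of the server.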
In a possible implementation, when the vehicle-mounted terminal needs a near-field device or a remote server to perform the data processing, it may send the received voice question-and-answer instruction directly to that device or server, or it may first convert the instruction into the corresponding voice question-and-answer text and then send the text. This is not limited in this embodiment.
FIG. 10 is a flowchart of a process of obtaining the question-and-answer result corresponding to a voice question-and-answer instruction provided by an exemplary embodiment of the present application. Based on the first way above, this embodiment describes the steps for obtaining the result. As shown in FIG. 10, the method includes the following steps:
Step 1001: perform feature extraction on the external environment information to obtain external environment features.
The vehicle-mounted terminal extracts features from the external environment information. External image features include color, texture, and shape features, and external audio features include loudness, pitch, and timbre. To extract external image features, the terminal can use algorithm models such as the Non-local model or the Slow-fast model.
Step 1002: concatenate the external environment features with the text features of the voice question-and-answer text corresponding to the voice question-and-answer instruction to obtain a fused feature.
After extracting features from the external environment information, and when that information contains image information, the vehicle-mounted terminal reduces the three-dimensional tensor representing the environment image information to a one-dimensional vector and then concatenates this vector with the text vector of the voice question-and-answer text to obtain the fused feature.
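The feature concatenation in step 1002 can be illustrated with plain Python lists. In this sketch simple flattening stands in for whatever reduction the system actually applies to turn the three-dimensional tensor into a one-dimensional vector, so it shows only the shape of the data flow. The function names are hypothetical.

```python
def flatten(tensor):
    """Reduce a 3-D nested list to a 1-D list (stand-in for the reduction)."""
    return [value for plane in tensor for row in plane for value in row]


def fuse(image_tensor, text_vector):
    """Concatenate the 1-D image vector with the 1-D text vector."""
    return flatten(image_tensor) + list(text_vector)


image_tensor = [[[1.0, 2.0], [3.0, 4.0]],
                [[5.0, 6.0], [7.0, 8.0]]]   # toy 2 x 2 x 2 tensor
text_vector = [0.5, 0.25]
fused = fuse(image_tensor, text_vector)
```

The fused vector has 8 + 2 = 10 elements. The image part comes first and the text part second, so a downstream model always sees a fixed layout.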
步骤1003,将融合特征输入问答模型,得到问答模型输出的问答结果。
其中,问答模型的输入是外部环境信息与语音问答指令的融合特征向量,输出是语音问答指令对应的问答结果,问答模型可以采用卷积神经网络、循环神经网络或Transformer模型等算法模型。Among them, the input of the question answering model is the fusion feature vector of the external environment information and the voice question answering command, and the output is the question answering result corresponding to the voice question answering command. The question answering model can use algorithm models such as convolutional neural network, recurrent neural network or Transformer model.
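As an illustration of step 1002 only, the following minimal sketch flattens a toy three-dimensional feature tensor into a one-dimensional vector and concatenates it with a text vector. Plain row-major flattening is an assumption that stands in for the dimensionality reduction the source describes without detailing; the function names `flatten_3d` and `fuse_features` and all values are hypothetical.

```python
from typing import List

Tensor3D = List[List[List[float]]]


def flatten_3d(tensor: Tensor3D) -> List[float]:
    """Reduce a 3-D tensor (nested lists) to a 1-D vector in row-major order.

    Assumption: the source does not specify how the reduction is done;
    flattening is used here only as the simplest stand-in.
    """
    return [value for plane in tensor for row in plane for value in row]


def fuse_features(image_tensor: Tensor3D, text_vector: List[float]) -> List[float]:
    """Concatenate the flattened image features with the text vector."""
    return flatten_3d(image_tensor) + list(text_vector)


# Toy 2 x 2 x 2 image feature tensor and a 3-element text vector.
image_features = [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8]]]
text_features = [1.0, 2.0, 3.0]
fused = fuse_features(image_features, text_features)
print(len(fused))  # 8 image values + 3 text values = 11
```

The fused vector would then be the input to whichever question-and-answer model is chosen in step 1003.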
The above steps are described here taking external environment information in the image dimension as an example. In this embodiment the vehicle-mounted terminal implements the above steps with a question-and-answer analysis algorithm, as shown in FIG. 11. FIG. 11 is a schematic diagram of a question-and-answer analysis algorithm provided by an embodiment of the present application.
The target external environment information and the voice question-and-answer text corresponding to the voice question-and-answer instruction are the inputs of the question-and-answer analysis algorithm, and the question-and-answer result is its output.
First, the Slow-fast model 1102 extracts feature information from the target external environment information 1101 to obtain the external environment features. The fast branch network 11021 has a low computational cost and analyzes dynamic change information in the video sequence; the slow branch network 11022 has a higher computational cost and a somewhat larger number of parameters, and analyzes information such as color, texture, and illumination changes in the video sequence.
After the fast and slow branch networks have each extracted feature information, the feature fusion network 11023 fuses it into a three-dimensional tensor representing the image information, which the dimensionality reduction network 1104 then turns into a one-dimensional vector 1106 representing the image information.
Meanwhile, the voice question-and-answer text 1103 corresponding to the voice question-and-answer instruction passes through text vector generation 1105: the text is segmented into words, the word vector of each word is looked up, and these word vectors are combined by weighted averaging, yielding the one-dimensional text vector 1107 of the voice question-and-answer text.
Finally, feature concatenation of the one-dimensional vector representing the image information and the one-dimensional text vector of the voice question-and-answer text yields the fusion feature vector 1108. The fusion feature vector is input into the Transformer model 1109, which generates the question-and-answer result.
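The weighted averaging of word vectors described for text vector generation 1105 can be sketched as follows. This is a minimal illustration under stated assumptions: word segmentation is taken as already done, and the embedding table, the per-word weights, and the function name `text_vector` are hypothetical, since the source does not say how the word vectors or weights are obtained.

```python
from typing import Dict, List


def text_vector(tokens: List[str],
                embeddings: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of the word vectors of the tokens that have one.

    Tokens without a weight default to 1.0 (an assumption).
    """
    known = [t for t in tokens if t in embeddings]
    if not known:
        raise ValueError("no token has a word vector")
    total = sum(weights.get(t, 1.0) for t in known)
    if total == 0:
        raise ValueError("weights sum to zero")
    dim = len(embeddings[known[0]])
    return [
        sum(weights.get(t, 1.0) * embeddings[t][i] for t in known) / total
        for i in range(dim)
    ]


# Toy 2-dimensional word vectors and weights, for illustration only.
toy_embeddings = {"what": [1.0, 0.0], "building": [0.0, 1.0]}
toy_weights = {"what": 1.0, "building": 3.0}
vec = text_vector(["what", "building"], toy_embeddings, toy_weights)
print(vec)  # [0.25, 0.75]
```

Giving content words a larger weight than function words, as in the toy weights above, is one plausible choice; the source leaves the weighting scheme open.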
When the question-and-answer result is obtained with the sound dimension as the external environment information, feature extraction likewise needs to be performed on the external environment audio; the extracted features are concatenated with the text features of the voice question-and-answer text corresponding to the voice question-and-answer instruction to obtain a fusion vector, which is then input into the question-and-answer model. Details are not repeated here.
In this embodiment, the vehicle-mounted terminal performs feature extraction on the external environment information and then feature fusion to obtain the question-and-answer result. This makes the obtained question-and-answer result match the features of the voice question-and-answer instruction issued by the user, so that in a driving scene the user can ask questions based on the vehicle's perception of its environment and obtain an accurate question-and-answer result, giving a higher degree of intelligence.
In addition, in practical applications the user who triggers the voice question-and-answer instruction is not necessarily the driver and may be a user in another seat. In that case the observer's angle of view differs from the shooting angle of view of the external environment image, so the embodiments of the present application provide another manner of obtaining the external environment features.
FIG. 12 is a flowchart of a process of performing feature extraction on external environment information to obtain external environment features, provided by an exemplary embodiment of the present application. The method may include the following steps:
Step 1201: determine an observation angle of view, the observation angle of view being the angle of view of the observer who triggered the voice question-and-answer instruction.
The observer's angle of view is the angle of view from which the observer who issued the voice question-and-answer instruction observes the external environment.
Optionally, the observation angle of view depends to some extent on the observer's height, age, seating position, and so on. Therefore, when determining the observer's angle of view, the vehicle-mounted terminal uses sound source localization to roughly estimate the spatial position of the user who issued the voice question-and-answer instruction, where the spatial position includes but is not limited to the seating position and the height of the voice source, and from this reasonably infers the observer's angle of view.
In a possible implementation, a sound source localization device is provided inside the vehicle-mounted terminal. Based on this device the vehicle-mounted terminal locates the position in the vehicle of the user who triggered the voice question-and-answer instruction, and then processes the external environment image according to the observer's angle of view to generate a more accurate question-and-answer result.
Step 1202: based on the observation angle of view and the shooting angle of view of the external environment image, perform an image affine transformation on the external environment image to obtain a transformed external environment image.
The shooting angle of view and the observation angle of view largely do not coincide. FIG. 13 is a schematic diagram of the difference between the observer's angle of view and the shooting angle of view at a certain moment in one application scenario.
In FIG. 13, 1301 is a moving vehicle and 1302 is a building that the vehicle passes while driving. The building can be captured simultaneously by the vehicle's external camera 1303 and by the user 1304 riding in the vehicle. As the figure shows, at this moment the shooting angle of view of the external camera differs from the observer's angle of view. Because the image the observer sees of the same object differs from the image captured by the shooting device, when the user asks a question about the building, the question-and-answer result obtained may not correspond to the voice question-and-answer instruction.
An affine transformation is the process of applying one linear transformation followed by one translation in a vector space, mapping it to another vector space. Affine transformations include scaling, translation, rotation, reflection, and shear. Straight lines in the original image remain straight lines after an affine transformation, and parallel lines in the original image remain parallel; this is what characterizes an affine transformation.
The vehicle-mounted terminal performs the affine transformation on the external environment image in order to transform the image captured by the external shooting device into an image that better matches the observer's angle of view. This lets the image features correspond to the features described in the voice question-and-answer instruction, and thus yields a more accurate question-and-answer result.
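To make the definition of an affine transformation concrete, the sketch below applies p' = A p + t to the endpoints of two parallel segments and checks that the mapped segments are still parallel. The matrix `A`, the translation `T`, and the helper names are arbitrary illustrative choices; how the terminal derives the actual transformation from the two angles of view is not specified in the source and is not modeled here.

```python
from typing import Tuple

Point = Tuple[float, float]
Matrix2x2 = Tuple[Tuple[float, float], Tuple[float, float]]


def affine(point: Point, a: Matrix2x2, t: Point) -> Point:
    """Apply p' = A p + t (linear transformation, then translation)."""
    x, y = point
    return (a[0][0] * x + a[0][1] * y + t[0],
            a[1][0] * x + a[1][1] * y + t[1])


def direction(p: Point, q: Point) -> Point:
    """Direction vector from p to q."""
    return (q[0] - p[0], q[1] - p[1])


def cross(u: Point, v: Point) -> float:
    """2-D cross product; zero means the two directions are parallel."""
    return u[0] * v[1] - u[1] * v[0]


# A scale-plus-shear matrix and a translation, chosen arbitrarily.
A = ((2.0, 1.0), (0.0, 1.0))
T = (3.0, -1.0)

# Two parallel segments, both with direction (1, 1).
seg1 = ((0.0, 0.0), (1.0, 1.0))
seg2 = ((0.0, 2.0), (1.0, 3.0))

mapped1 = tuple(affine(p, A, T) for p in seg1)
mapped2 = tuple(affine(p, A, T) for p in seg2)
print(cross(direction(*mapped1), direction(*mapped2)))  # 0.0: still parallel
```

Both mapped segments end up with direction (3, 1), which is the parallelism-preserving property the paragraph above describes.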
Step 1203: perform feature extraction on the transformed external environment image to obtain the external environment features.
It should be noted that the manner provided in this embodiment of performing feature extraction on external environment information to obtain external environment features can also be used in the embodiment shown in FIG. 10, as step 1001 of that embodiment, and gives a better result in the case where the observer's angle of view differs from the shooting angle of view of the external environment image.
In summary, in a real driving environment the scenery outside the vehicle window may appear differently depending on the observer's angle of view, so that the description in the user's question-and-answer instruction deviates from the features of the image captured by the camera. In this embodiment, an affine transformation is performed on the external environment image according to the angles of view of observers at different positions in the vehicle, so that the transformed image features better match the features in the user's question-and-answer instruction and a more accurate answer can be obtained.
FIG. 14 is a schematic diagram of a voice question-and-answer application scenario provided by an exemplary embodiment of the present application.
In FIG. 14, the vehicle-mounted terminal and the vehicle's external image acquisition device are both switched on. The user starts the voice question-and-answer program and asks about the scenery in the vehicle's current environment. Based on the perceived external environment information, the vehicle-mounted terminal analyzes the voice question-and-answer instruction issued by the user, generates the corresponding question-and-answer result, and performs voice broadcast and image display. The scenery-related question-and-answer scenario used in this embodiment does not limit this embodiment.
In an illustrative example, the state transitions of the vehicle-mounted terminal during voice question answering in a vehicle driving scene are shown in FIG. 15, where the arrows indicate the direction of state transition.
The standby state 1501 is the state the vehicle-mounted terminal is in before the voice question-and-answer program starts running. In the standby state 1501, the external information acquisition component runs continuously and collects information about the vehicle's external environment in real time; the user has not yet issued a voice question-and-answer instruction.
In the voice receiving state 1502, the user issues a voice instruction, and the voice acquisition component starts working and sends the collected voice instruction to the vehicle-mounted terminal, which determines the type of the voice instruction.
The information extraction state 1503 is the state in which the vehicle-mounted terminal extracts the target external environment information from the cached external environment information.
In the analysis and calculation state 1504, the vehicle-mounted terminal selects the device with the best computing power and issues the corresponding data calculation instruction. That device performs the calculation with the question-and-answer analysis model and, if a question-and-answer result corresponding to the voice question-and-answer instruction is generated, returns the result to the vehicle-mounted terminal.
The result broadcast state 1505 is the state in which the vehicle-mounted terminal, after further processing of the question-and-answer result, causes the image display component and the voice broadcast component to produce the corresponding output.
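In the standby state 1501, the vehicle-mounted terminal remains in the standby state 1501 as long as no voice question-and-answer instruction is received, and moves to the voice receiving state 1502 when a voice question-and-answer instruction is received.
In the voice receiving state 1502, the vehicle-mounted terminal stays in the voice receiving state 1502 while the user's voice input is within the cutoff waiting time. When ASR recognition of the voice information returns an empty result, or the voice instruction is determined not to be a voice question-and-answer instruction, the vehicle-mounted terminal returns to the standby state 1501. When the voice instruction is determined to be a voice question-and-answer instruction, the vehicle-mounted terminal moves to the information extraction state 1503.
In the information extraction state 1503, when the vehicle-mounted terminal fails to extract the target external environment information from the external environment information, it returns to the standby state 1501; when it completes the extraction of the target external environment information, it moves to the analysis and calculation state 1504.
In the analysis and calculation state 1504, when the question-and-answer result text corresponding to the generated question-and-answer result is empty, the vehicle-mounted terminal returns to the standby state 1501; when the generated question-and-answer result is not empty, it moves to the result broadcast state 1505.
In the result broadcast state 1505, when the user issues a new voice instruction, the vehicle-mounted terminal jumps directly to the voice receiving state 1502 and proceeds with the next round of voice question answering.
In any of the above states, when the user manually interrupts the voice question-and-answer program, the vehicle-mounted terminal returns directly to the standby state 1501.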
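The transitions listed above can be collected into a small table-driven state machine. The sketch below is one possible encoding, not the patented implementation: the state values reuse the reference numerals 1501 to 1505, while the event names are hypothetical labels for the conditions in the text. The source does not say what happens once a broadcast finishes without a new instruction, so events with no listed transition simply leave the state unchanged.

```python
from enum import Enum


class State(Enum):
    STANDBY = 1501
    VOICE_RECEIVING = 1502
    INFO_EXTRACTION = 1503
    ANALYSIS = 1504
    BROADCAST = 1505


# (current state, event) -> next state; event names are illustrative only.
TRANSITIONS = {
    (State.STANDBY, "instruction_received"): State.VOICE_RECEIVING,
    (State.VOICE_RECEIVING, "still_speaking"): State.VOICE_RECEIVING,
    (State.VOICE_RECEIVING, "asr_empty"): State.STANDBY,
    (State.VOICE_RECEIVING, "not_qa_instruction"): State.STANDBY,
    (State.VOICE_RECEIVING, "qa_instruction"): State.INFO_EXTRACTION,
    (State.INFO_EXTRACTION, "extraction_failed"): State.STANDBY,
    (State.INFO_EXTRACTION, "extraction_done"): State.ANALYSIS,
    (State.ANALYSIS, "result_empty"): State.STANDBY,
    (State.ANALYSIS, "result_ready"): State.BROADCAST,
    (State.BROADCAST, "new_voice_instruction"): State.VOICE_RECEIVING,
}


def next_state(state: State, event: str) -> State:
    """Return the next state; a manual interrupt always leads to standby."""
    if event == "user_interrupt":
        return State.STANDBY
    # Events with no listed transition leave the state unchanged (assumption).
    return TRANSITIONS.get((state, event), state)


state = State.STANDBY
for event in ["instruction_received", "qa_instruction",
              "extraction_done", "result_ready"]:
    state = next_state(state, event)
print(state.name)  # BROADCAST
```

Keeping the transitions in one dictionary makes it easy to compare the code against FIG. 15 arrow by arrow.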
The following are apparatus embodiments of the present application, which can be used to carry out the method embodiments of the present application. For details not disclosed in the apparatus embodiments, refer to the method embodiments of the present application.
Refer to FIG. 16, which shows a structural block diagram of a voice question-and-answer apparatus in a driving scene provided by an exemplary embodiment of the present application. The apparatus may include:
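an information acquisition module 1601, configured to acquire external environment information in the case where a voice question-and-answer instruction is received, the external environment information being collected by an environment information acquisition component while the vehicle is traveling, and characterizing the external environment in which the vehicle is located;
a result acquisition module 1602, configured to acquire, based on the external environment information and the voice question-and-answer instruction, the question-and-answer result corresponding to the voice question-and-answer instruction;
a voice broadcast module 1603, configured to perform voice broadcast based on the question-and-answer result.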
Optionally, the result acquisition module 1602 is configured to:
extract target external environment information from the external environment information based on the voice question-and-answer instruction, the correlation between the target external environment information and the voice question-and-answer instruction being higher than the correlation between other external environment information and the voice question-and-answer instruction; and acquire, based on the target external environment information and the voice question-and-answer instruction, the question-and-answer result corresponding to the voice question-and-answer instruction.
Optionally, the result acquisition module 1602 is configured to:
in the case where the question dimension is the image dimension, extract an external environment image from the external environment information as the target external environment information; in the case where the question dimension is the sound dimension, extract external environment audio from the external environment information as the target external environment information.
Optionally, the result acquisition module 1602 is configured to:
extract the target external environment information from the external environment information based on the receiving time of the voice question-and-answer instruction and the collection time of the external environment information.
Optionally, the result acquisition module 1602 is configured to:
in the case where the voice question-and-answer text corresponding to the voice question-and-answer instruction is recognized as containing a time keyword, determine a first collection period based on the time keyword and the receiving time, and determine the external environment information whose collection time falls within the first collection period as the target external environment information; in the case where the voice question-and-answer text is recognized as not containing a time keyword, determine a second collection period based on the receiving time, and determine the external environment information whose collection time falls within the second collection period as the target external environment information.
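The time-based selection just described might look like the following sketch. Everything concrete in it is an assumption made for illustration: the English keywords, the window lengths in seconds, and the function names `collection_period` and `select_target` do not come from the source, which only states that a first or second collection period is derived from the time keyword and the receiving time.

```python
from typing import Dict, List, Tuple

# Hypothetical keyword -> (start offset, end offset) in seconds,
# relative to the receiving time of the instruction.
KEYWORD_OFFSETS: Dict[str, Tuple[float, float]] = {
    "just now": (-10.0, 0.0),
}
# Hypothetical default window used when no time keyword is present.
DEFAULT_OFFSETS: Tuple[float, float] = (-3.0, 0.0)


def collection_period(text: str, received_at: float) -> Tuple[float, float]:
    """Return the first period if a time keyword is found, else the second."""
    for keyword, (start, end) in KEYWORD_OFFSETS.items():
        if keyword in text:
            return (received_at + start, received_at + end)
    start, end = DEFAULT_OFFSETS
    return (received_at + start, received_at + end)


def select_target(samples: List[Tuple[float, str]],
                  text: str,
                  received_at: float) -> List[str]:
    """Keep the cached samples whose collection time lies in the period."""
    start, end = collection_period(text, received_at)
    return [data for collected_at, data in samples if start <= collected_at <= end]


# Cached (collection time, data) pairs; the instruction arrives at t = 100 s.
cache = [(90.0, "frame_90"), (95.0, "frame_95"), (99.0, "frame_99")]
print(select_target(cache, "what was that building just now", 100.0))
print(select_target(cache, "what is that building", 100.0))
```

With these toy values the first query selects all three cached frames and the second selects only the most recent one.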
Optionally, the result acquisition module 1602 is configured to:
generate, based on the external environment information and the voice question-and-answer instruction, the question-and-answer result corresponding to the voice question-and-answer instruction;
or,
in the case where the network state satisfies a transmission condition, report the external environment information and the voice question-and-answer instruction to a server, so that the server generates the question-and-answer result corresponding to the voice question-and-answer instruction based on the external environment information and the voice question-and-answer instruction; and receive the question-and-answer result sent by the server;
or,
in the case where the network state does not satisfy the transmission condition, determine a target near-field device from among the near-field devices based on device computing power; send the external environment information and the voice question-and-answer instruction to the target near-field device, so that the target near-field device generates the question-and-answer result corresponding to the voice question-and-answer instruction based on the external environment information and the voice question-and-answer instruction; and receive the question-and-answer result sent by the target near-field device.
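The three alternatives above amount to a choice of where the question-and-answer result is computed. The sketch below shows one way to express that choice. The `local_capable` flag, the precedence of local computation over offloading, the numeric `compute_power` score, and the final fallback are all assumptions; the source only states the server and near-field conditions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NearFieldDevice:
    name: str
    compute_power: float  # hypothetical score; higher means more capable


def choose_executor(local_capable: bool,
                    network_ok: bool,
                    devices: List[NearFieldDevice]) -> str:
    """Return "local", "server", or the name of the target near-field device."""
    if local_capable:
        # Assumption: compute on the terminal whenever it is able to.
        return "local"
    if network_ok:
        # Network state satisfies the transmission condition: use the server.
        return "server"
    if devices:
        # Otherwise pick the near-field device with the most computing power.
        return max(devices, key=lambda d: d.compute_power).name
    # Assumption: nothing else is available, so fall back to the terminal.
    return "local"


nearby = [NearFieldDevice("phone", 4.0), NearFieldDevice("tablet", 7.5)]
print(choose_executor(False, True, nearby))   # server
print(choose_executor(False, False, nearby))  # tablet
```

A real selection would probably also weigh transfer cost and latency, which this sketch leaves out.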
Optionally, the result acquisition module 1602 is configured to:
perform feature extraction on the external environment information to obtain external environment features; perform feature concatenation on the external environment features and the text features of the voice question-and-answer text corresponding to the voice question-and-answer instruction to obtain a fusion feature; and input the fusion feature into a question-and-answer model to obtain the question-and-answer result output by the question-and-answer model.
Optionally, the result acquisition module 1602 is configured to:
determine an observation angle of view, the observation angle of view being the angle of view of the observer who triggered the voice question-and-answer instruction; based on the observation angle of view and the shooting angle of view of the external environment image, perform an image affine transformation on the external environment image to obtain the transformed external environment image; and perform feature extraction on the transformed external environment image to obtain the external environment features.
Optionally, the information acquisition module 1601 is configured to:
in the case where a voice instruction is received, perform instruction type recognition on the voice instruction; and in the case where the instruction type of the voice instruction is a question-and-answer instruction, determine that the voice question-and-answer instruction has been received and acquire the external environment information.
Optionally, the apparatus further includes:
an image display module, configured to: in the case where a voice instruction is received, perform instruction type recognition on the voice instruction; and in the case where the instruction type of the voice instruction is a question-and-answer instruction, determine that the voice question-and-answer instruction has been received and acquire the external environment information.
In summary, the voice question-and-answer apparatus in a driving scene provided in this embodiment can acquire external environment information upon receiving a voice instruction, acquire the question-and-answer result corresponding to the voice question-and-answer instruction based on the external environment information and the voice instruction, and broadcast it. This solves the problem that a question-and-answer system cannot interact according to the current driving environment during driving, and achieves the effect that, in a driving scene, the user can ask about whatever is visible and receive an answer, giving a higher degree of intelligence.
Refer to FIG. 17, which shows a structural block diagram of a vehicle-mounted terminal provided by an exemplary embodiment of the present application. The terminal 1700 may be implemented as the vehicle-mounted terminal in each of the above embodiments. The terminal 1700 may include one or more of the following components: a processor 1710 and a memory 1720.
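The processor 1710 may include one or more processing cores. The processor 1710 connects the various parts of the terminal 1700 through various interfaces and lines, and performs the functions of the terminal 1700 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 1720 and by calling the data stored in the memory 1720. Optionally, the processor 1710 may be implemented in at least one hardware form among digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 1710 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a neural-network processing unit (Neural-network Processing Unit, NPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and so on; the GPU is responsible for rendering and drawing the content to be shown on the touch display screen; the NPU implements artificial intelligence (Artificial Intelligence, AI) functions; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 1710 and may instead be implemented as a separate chip.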
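The memory 1720 may include random access memory (Random Access Memory, RAM) and may also include read-only memory (Read-Only Memory, ROM). Optionally, the memory 1720 includes a non-transitory computer-readable storage medium. The memory 1720 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1720 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function, instructions for implementing the above method embodiments, and so on, and the data storage area may store data created through use of the terminal 1700, and so on.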
In addition, those skilled in the art will understand that the structure of the terminal 1700 shown in the above drawing does not limit the terminal, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. For example, the terminal 1700 may further include components such as a display screen, a camera component, a microphone, a speaker, a radio frequency circuit, sensors, an audio circuit, a WiFi module, a power supply, and a Bluetooth module, which are not described further here.
The embodiments of the present application further provide a computer-readable storage medium storing at least one piece of program code, the program code being loaded and executed by a processor to implement the question-and-answer method in a driving scene described in the above embodiments.
The embodiments of the present application provide a computer program product that includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the question-and-answer method in a driving scene provided in the various optional implementations of the above aspects.
It should be understood that "a plurality of" as used herein means two or more.
In addition, the step numbers used herein only show, by way of example, one possible order in which the steps may be executed. In some other embodiments the steps need not be executed in numerical order; for example, two differently numbered steps may be executed at the same time, or in the reverse of the order shown. The embodiments of the present application do not limit this.
The above are merely optional embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of protection of the present application.
Claims (25)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210952625.2A CN115312061A (en) | 2022-08-09 | 2022-08-09 | Voice question-answer method and device in driving scene and vehicle-mounted terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115312061A true CN115312061A (en) | 2022-11-08 |
Family
ID=83859821
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210952625.2A Pending CN115312061A (en) | 2022-08-09 | 2022-08-09 | Voice question-answer method and device in driving scene and vehicle-mounted terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115312061A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110967169A (en) * | 2019-12-16 | 2020-04-07 | 塔普翊海(上海)智能科技有限公司 | Detection table and detection method for optical module of perspective AR glasses |
CN114467012A (en) * | 2019-09-26 | 2022-05-10 | 株式会社石田 | metering device |
CN118011375A (en) * | 2024-01-29 | 2024-05-10 | 桑蕾 | Three-dimensional material measurement system with interference object shielding function |
WO2024193419A1 (en) * | 2023-03-20 | 2024-09-26 | 合众新能源汽车股份有限公司 | Sensing interaction method based on smart cabin and sensing interaction device based on smart cabin |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109255064A (en) * | 2018-08-30 | 2019-01-22 | Oppo广东移动通信有限公司 | Information searching method and device, intelligent glasses and storage medium |
CN109543019A (en) * | 2018-11-27 | 2019-03-29 | 苏州思必驰信息科技有限公司 | Dialogue service method and device for vehicle |
CN109710796A (en) * | 2019-01-14 | 2019-05-03 | Oppo广东移动通信有限公司 | Voice-based image search method, device, storage medium and terminal |
CN109992248A (en) * | 2019-02-25 | 2019-07-09 | 百度在线网络技术(北京)有限公司 | Implementation method, device, equipment and the computer readable storage medium of voice application |
CN110288989A (en) * | 2019-06-03 | 2019-09-27 | 安徽兴博远实信息科技有限公司 | Voice interactive method and system |
CN110459217A (en) * | 2019-08-21 | 2019-11-15 | 中国第一汽车股份有限公司 | A kind of vehicle-mounted answering method, system, vehicle and storage medium |
CN111694433A (en) * | 2020-06-11 | 2020-09-22 | 北京百度网讯科技有限公司 | Voice interaction method and device, electronic equipment and storage medium |
-
2022
- 2022-08-09 CN CN202210952625.2A patent/CN115312061A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115312061A (en) | Voice question-answer method and device in driving scene and vehicle-mounted terminal | |
EP4006901B1 (en) | Audio signal processing method and apparatus, electronic device, and storage medium | |
US11854550B2 (en) | Determining input for speech processing engine | |
CN110519636B (en) | Voice information playing method and device, computer equipment and storage medium | |
CN113516990B (en) | Voice enhancement method, neural network training method and related equipment | |
CN111368101B (en) | Multimedia resource information display method, device, equipment and storage medium | |
CN112598780B (en) | Instance object model construction method and device, readable medium and electronic equipment | |
CN108903521B (en) | Man-machine interaction method applied to intelligent picture frame and intelligent picture frame | |
US20240169687A1 (en) | Model training method, scene recognition method, and related device | |
CN111589138B (en) | Action prediction method, device, equipment and storage medium | |
CN107290975A (en) | A kind of house intelligent robot | |
CN112742024A (en) | Virtual object control method, device, equipment and storage medium | |
CN111341307A (en) | Voice recognition method and device, electronic equipment and storage medium | |
CN116433810A (en) | Server, display device and virtual digital human interaction method | |
CN110491384B (en) | Voice data processing method and device | |
CN114333774B (en) | Speech recognition method, device, computer equipment and storage medium | |
WO2023231211A1 (en) | Voice recognition method and apparatus, electronic device, storage medium, and product | |
CN113763925B (en) | Speech recognition method, device, computer equipment and storage medium | |
CN117998166B (en) | Training method, training device, training equipment, training storage medium and training product for video generation model | |
CN114220034A (en) | Image processing method, device, terminal and storage medium | |
CN116610212A (en) | Multi-mode entertainment interaction method, device, equipment and medium | |
CN107908385B (en) | Holographic-based multi-mode interaction system and method | |
CN116977884A (en) | Training method of video segmentation model, video segmentation method and device | |
CN114462580A (en) | Text recognition model training method, text recognition method, device and device | |
CN114944152A (en) | Vehicle whistling sound identification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||