
CN110111789B - Voice interaction method and device, computing equipment and computer readable medium - Google Patents


Info

Publication number
CN110111789B
CN110111789B (application CN201910376854.2A)
Authority
CN
China
Prior art keywords
voice
module
voice information
session identifier
semantics
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910376854.2A
Other languages
Chinese (zh)
Other versions
CN110111789A (en)
Inventor
殷切
欧阳能钧
张丙林
贺学焱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Original Assignee
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Apollo Intelligent Connectivity Beijing Technology Co Ltd filed Critical Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority to CN201910376854.2A priority Critical patent/CN110111789B/en
Publication of CN110111789A publication Critical patent/CN110111789A/en
Application granted granted Critical
Publication of CN110111789B publication Critical patent/CN110111789B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command
    • G10L2015/225 - Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure provides a voice interaction method and an intelligent device. Voice recognition is started after the intelligent device is awakened. If no voice information is received, the current number of times that no voice information has been received is determined, and whether to keep voice recognition on is decided according to that count and a preset first threshold. As long as the number of consecutive detections without speech does not meet the preset condition, voice recognition is not terminated; if the user wants to continue controlling the intelligent device, the user does not need to wake it up again but can directly send another voice control instruction, which is recognized immediately because the voice recognition function is still on. This improves the intelligence and ease of use of the intelligent device. By setting the first threshold appropriately, device intelligence can further be balanced against CPU occupancy and traffic consumption. The present disclosure also provides a computing device and a computer-readable medium.

Description

Voice interaction method and device, computing equipment and computer readable medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a voice interaction method, apparatus, computing device, and computer readable medium.
Background
The voice man-machine interaction refers to interaction with intelligent equipment by taking voice as an information carrier. In recent years, with the development of voice recognition technology and the popularization of intelligent devices, voice man-machine interaction has become an important man-machine interaction mode.
In the traditional voice-recognition human-machine interaction scheme, a keyword must first be spoken to wake up the device, confirming that the user has a definite intention, and only then is the human-machine dialogue system for voice recognition opened. By placing offline keyword recognition first, this scheme effectively mitigates the high CPU (Central Processing Unit) occupancy and user traffic consumption of general-purpose speech recognition. However, it also has a drawback: every recognition requires a separate wake-up, which is inflexible and insufficiently intelligent for artificial intelligence devices.
Disclosure of Invention
In view of the above-mentioned deficiencies in the prior art, the present disclosure provides a voice interaction method, apparatus, computing device and computer readable medium.
In a first aspect, an embodiment of the present disclosure provides a voice interaction method, where the method includes:
when the intelligent equipment is awakened, starting voice recognition;
judging whether voice information is received or not;
if the voice information is not received, determining the current times of not receiving the voice information, and determining whether to keep voice recognition according to the current times of not receiving the voice information and a preset first threshold value.
Preferably, the determining whether to keep voice recognition according to the number of times of currently not receiving the voice information and a preset first threshold specifically includes:
and if the current times of not receiving the voice information is less than a preset first threshold value, keeping voice recognition.
Further, the voice interaction method further comprises:
if the voice information is received, analyzing the semantics of the voice information and judging whether the semantics are effective semantics;
and if the semantic meaning is invalid, determining the number of times of currently judging invalid semantic meaning, and determining whether to keep voice recognition according to the number of times of currently judging invalid semantic meaning and a preset second threshold value.
Further, the voice interaction method further comprises:
if the semantic meaning is valid, executing corresponding operation according to the semantic meaning of the voice information;
and clearing the times of not receiving the voice information at present and the times of judging invalid semantics at present, and keeping voice recognition.
Preferably, the determining whether to keep voice recognition according to the number of currently judged invalid semantics and a preset second threshold specifically includes:
and if the number of times of currently judged invalid semantics is smaller than a preset second threshold value, keeping voice recognition.
Preferably, if the number of times of currently not receiving the voice message is greater than or equal to the first threshold, or if the number of times of currently judging that the semantic meaning is invalid is greater than or equal to the second threshold, the intelligent device is controlled to enter a sleep state.
Further, the voice interaction method further comprises: when the intelligent equipment is awakened, generating a session identifier;
after parsing the semantics of the voice information and before judging whether the semantics are valid semantics, the method further comprises:
judging whether the locally stored session identification is consistent with the session identification generated this time, if so, determining the semantic slot position information, and inquiring the semantic slot position information corresponding to the voice information received last time;
replacing the slot position information of the semantics corresponding to the voice information received last time by using the determined slot position information, and determining the semantics corresponding to the voice information received this time according to the replaced slot position information;
and if the locally stored session identifier is not consistent with the session identifier generated this time, updating the locally stored session identifier into the session identifier generated this time.
Further, before the controlling the smart device to enter the sleep state, the method further includes: and clearing the session identification stored locally.
In another aspect, an embodiment of the present disclosure further provides an intelligent device, including: a wake-up module, a voice recognition module, a first judging module, and a first processing module;
the awakening module is used for starting the voice recognition module when the intelligent equipment is awakened;
the voice recognition module is used for carrying out voice recognition;
the first judging module is used for judging whether voice information is received or not;
the first processing module is used for determining the current times of not receiving the voice information when the first judging module judges that the voice information is not received, and determining whether to keep voice recognition according to the current times of not receiving the voice information and a preset first threshold value.
Preferably, the first processing module is specifically configured to keep the voice recognition module turned on when the number of times that the voice information is not currently received is smaller than a preset first threshold.
Furthermore, the intelligent device also comprises a semantic analysis module, a second judgment module and a second processing module;
the semantic analysis module is used for analyzing the semantics of the voice information when the voice information is received;
the second judging module is used for judging whether the semantics are effective semantics;
and the second processing module is used for determining the number of times of currently judging invalid semantics when the second judging module judges the invalid semantics, and determining whether to keep voice recognition according to the number of times of currently judging the invalid semantics and a preset second threshold value.
Further, the intelligent device further includes a third processing module, where the third processing module is configured to, when the second determining module determines that the semantic meaning is valid, execute a corresponding operation according to the semantic meaning of the voice information; and clearing the times of not receiving the voice information at present and the times of judging invalid semantics at present, and keeping the voice recognition module open.
Preferably, the second processing module is specifically configured to keep the voice recognition module turned on when the number of times of currently judging the invalid semantics is smaller than a preset second threshold.
Further, the smart device further includes a dormancy module, where the dormancy module is configured to control the smart device to enter a dormant state when the first determination module determines that the number of times of currently not receiving the voice message is greater than or equal to the first threshold, or when the second determination module determines that the number of times of currently determining the invalid semantics is greater than or equal to the second threshold.
Furthermore, the intelligent device further comprises a multi-round session management module, wherein the multi-round session management module comprises a session identifier generation unit, a storage unit, a judgment unit, a slot position information processing unit and a session identifier updating unit;
the session identifier generating unit is used for generating a session identifier when the intelligent device is awakened by the awakening module;
the judging unit is configured to judge whether the session identifier stored in the storage unit is consistent with the session identifier generated by the session identifier generating unit this time after the semantic parsing module parses the semantic meaning of the voice message and before the second judging module judges whether the semantic meaning is an effective semantic meaning;
the slot position information processing unit is used for determining the semantic slot position information and inquiring the semantic slot position information corresponding to the voice information received last time when the judging unit judges that the session identifier stored in the storage unit is consistent with the session identifier generated by the session identifier generating unit this time; replacing the slot position information of the semantic corresponding to the voice information received last time by the determined slot position information, and determining the semantic corresponding to the voice information received this time according to the replaced slot position information;
the session identifier updating unit is configured to update the locally stored session identifier to the session identifier generated this time when the judging unit judges that the session identifier stored in the storage unit is inconsistent with the session identifier generated this time by the session identifier generating unit.
Further, the session identifier updating unit is further configured to clear the locally stored session identifier before the hibernation module controls the smart device to enter the hibernation state.
In yet another aspect, an embodiment of the present disclosure further provides a computing device, including:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the voice interaction method as previously described.
In still another aspect, the embodiments of the present disclosure further provide a computer-readable medium, on which a computer program is stored, where the program, when executed, implements the voice interaction method as described above.
According to the embodiments of the present disclosure, voice recognition is started after the intelligent device is awakened. If no voice information is received, which indicates that no one has been detected speaking, the current number of times that no voice information has been received is determined, and whether to keep voice recognition on is decided according to that count and a preset first threshold. Thus, as long as the number of consecutive detections without speech does not meet the preset condition, voice recognition is not terminated; if the user wants to continue controlling the intelligent device, the user does not need to wake it up again but can directly send another voice control instruction, which is recognized immediately because the voice recognition function is still on. This improves the intelligence and ease of use of the intelligent device. By setting the first threshold appropriately, device intelligence can further be balanced against CPU occupancy and traffic consumption.
Drawings
FIG. 1 is a flowchart of a voice interaction method provided by an embodiment of the present disclosure;
FIG. 2 is one of the flow charts provided by the embodiments of the present disclosure for determining whether to maintain speech recognition;
FIG. 3 is a second flowchart of determining whether to maintain speech recognition according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of a multi-round session management provided by an embodiment of the present disclosure;
fig. 5 is one of schematic structural diagrams of an intelligent device provided in an embodiment of the present disclosure;
fig. 6 is a second schematic structural diagram of an intelligent device according to an embodiment of the present disclosure;
fig. 7 is a third schematic structural diagram of an intelligent device according to an embodiment of the present disclosure;
fig. 8 is a fourth schematic structural diagram of an intelligent device provided in the embodiment of the present disclosure;
fig. 9 is a fifth schematic structural diagram of an intelligent device provided in the embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of a multi-round session management module according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those skilled in the art, the voice interaction scheme provided by the present disclosure is described in detail below with reference to the accompanying drawings.
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings; however, they may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Embodiments described herein may be described with reference to plan and/or cross-sectional views in light of idealized schematic illustrations of the disclosure. Accordingly, the example illustrations can be modified in accordance with manufacturing techniques and/or tolerances. Accordingly, the embodiments are not limited to the embodiments shown in the drawings, but include modifications of configurations formed based on a manufacturing process. Thus, the regions illustrated in the figures have schematic properties, and the shapes of the regions shown in the figures illustrate specific shapes of regions of elements, but are not intended to be limiting.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
One embodiment of the present disclosure provides a voice interaction method, which is described in detail below with reference to fig. 1, and as shown in fig. 1, the method includes the following steps:
step 1-step 2, when the intelligent device is awakened, starting voice recognition.
In particular, to conserve traffic and CPU usage, the speech recognition function is typically turned off when the smart device is in a sleep state. When the intelligent equipment is in a dormant state, a user awakens the intelligent equipment by sending voice information containing keywords, and once the intelligent equipment is awakened, the voice recognition function is started.
Step 3, judging whether the voice information is received, if the voice information is not received, executing step 4; otherwise, step 6 is executed.
After the intelligent device starts voice recognition, it judges whether voice information is received. If no voice information is received, this indicates that no one has been detected speaking, and the current number of times that no voice information has been received is determined (namely, step 4 is executed).
Specifically, after dividing the voice information into a plurality of voice packets, the intelligent device decodes each voice packet separately to obtain the text information corresponding to each packet, thereby realizing voice recognition. If no text information can be obtained by decoding any of the voice packets, the voice information is regarded as not received.
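The packet-level check described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: `decode` stands in for whatever recognizer produces text from a voice packet, and the voice information counts as received only if at least one packet yields non-empty text.

```python
def voice_received(voice_packets, decode):
    """Step 3 check (a sketch): True if decoding any packet yields text.

    `decode` is a hypothetical recognizer callback (packet -> str);
    an empty string models a packet from which no text could be decoded.
    """
    return any(decode(packet) for packet in voice_packets)
```

With a recognizer that decodes nothing, `voice_received` is False and the counting path of step 4 would run.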
And 4, determining the current times of not receiving the voice information.
The number of times no_speech_times that voice information has not currently been received refers to the cumulative number of times no voice information was received. Specifically, if the smart device determines in step 3 that no voice information is received, then in this step the currently recorded count is increased by 1 (i.e., no_speech_times++).
And step 5, determining whether to keep voice recognition according to the current times of not receiving the voice information and a preset first threshold value.
Specifically, as shown in fig. 2, the process of determining whether to keep voice recognition according to the number of times that the voice information is not currently received and a preset first threshold specifically includes the following steps:
step 21, determining whether the number no _ speed _ times of the current voice message not received is less than a first threshold N1, if yes, executing step 22.
Preferably, the first threshold may be set to 30.
Step 22, keeping the voice recognition.
As can be seen from steps 1 to 5, in the embodiment of the present disclosure, voice recognition is started after the intelligent device is awakened. If no voice information is received, it is determined that no one has been detected speaking, the current number of times that no voice information has been received is determined, and whether to keep voice recognition on is decided according to that count and the preset first threshold. Thus, as long as the number of consecutive detections without speech does not meet the preset condition, voice recognition is not terminated; if the user wants to continue controlling the intelligent device, the user does not need to wake it up again but can directly send another voice control instruction, which is recognized immediately because the voice recognition function is still on. This improves the intelligence and ease of use of the intelligent device, and setting the first threshold appropriately further balances device intelligence against CPU occupancy and traffic consumption.
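The threshold check of steps 4 and 5 reduces to a single comparison. The sketch below uses the "preferably 30" value from the description as a default; the function name is illustrative.

```python
FIRST_THRESHOLD_N1 = 30  # "the first threshold may be set to 30" per the description

def keep_recognition_after_silence(no_speech_times, threshold=FIRST_THRESHOLD_N1):
    """Steps 21/22: recognition stays on only while the cumulative count of
    windows with no received voice information is below the first threshold."""
    return no_speech_times < threshold
```

After 29 consecutive silent windows recognition stays on; the 30th triggers the dormant path described later.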
And 6, analyzing the semantics of the voice information.
Specifically, the intelligent device determines semantics by analyzing text information corresponding to the voice information, and a specific implementation manner belongs to the prior art and is not described herein again.
Step 7, judging whether the semantics are valid semantics, and if the semantics are invalid semantics, executing step 8; otherwise, step 10 is performed.
Specifically, the intelligent device judges whether the semantics are effective semantics by determining whether the intention can be determined, and if the intention can be determined, the semantics are effective semantics; if the intention cannot be determined, the semantic meaning is an invalid semantic meaning. For example, the semantics corresponding to the voice information such as "hello", "haha", etc. have no definite intention, and thus are invalid semantics.
If the intelligent device judges that the semantics are invalid, the voice information is likely the user chatting with other people rather than a voice control instruction directed at the device. The voice information is therefore not acted upon (i.e., the chat content is ignored), and the number of times invalid semantics have been judged is determined (i.e., step 8 is executed).
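The validity test of step 7 ("can an intention be determined?") can be illustrated with a toy parser. The chit-chat set and the single navigation pattern below are assumptions for illustration only; the patent leaves the actual semantic parser to the prior art.

```python
CHITCHAT = {"hello", "haha", "hi"}  # utterances with no definite intention

def parse_intent(text):
    """Return (category, remainder) if a definite intention is found, else None.

    None models "invalid semantics": the device cannot determine an intention.
    """
    normalized = text.lower().strip()
    if normalized in CHITCHAT:
        return None
    if normalized.startswith("i want to go"):
        # navigation-class intention, mirroring the "i want to go xxx" example
        return ("navigation", normalized[len("i want to go"):].strip())
    return None  # anything else is treated as invalid semantics in this sketch
```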
And 8, determining the number of times of currently judged invalid semantics.
The number of times no_useful_times that invalid semantics have currently been judged refers to the cumulative number of such judgments. Specifically, if the intelligent device determines in step 7 that the semantics are invalid, then in this step the currently recorded count of invalid-semantics judgments is increased by 1 (i.e., no_useful_times++).
And 9, determining whether to keep voice recognition according to the number of times of currently judged invalid semantics and a preset second threshold value.
Specifically, as shown in fig. 3, the process of determining whether to keep speech recognition according to the number of times of currently judging invalid semantics and a preset second threshold includes the following steps:
in step 31, it is determined whether the currently determined number of times of invalid semantics is less than a preset second threshold N2, if yes, step 32 is executed.
Step 32, keeping the speech recognition.
Preferably, the second threshold may be set to 3.
As can be seen from steps 6 to 9, by recording the number of consecutive times invalid semantics are recognized, voice recognition is not terminated while that count remains small, so the user can continue sending voice control instructions to the intelligent device without waking it up again. By setting the second threshold appropriately, device intelligence can further be balanced against CPU occupancy and traffic consumption.
And step 10, executing corresponding operation according to the semantic meaning of the voice information.
Specifically, the intelligent device determines the category of the intention according to the semantic meaning obtained by the analysis, and executes corresponding operation according to the category of the intention. The categories of intent may include: navigation, music, phone, encyclopedia, stock, weather, etc., may be set in advance in the smart device. For example, a user sends out voice information of "i want to go xxx", and the intelligent device analyzes the voice information to determine that the corresponding semantics are: i want to go xxx to determine the intent of the navigation class.
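Dispatch by intention category, as described above, can be sketched as a lookup table. The handler bodies are placeholders; the real operations (launching navigation, playing music, and so on) depend on the device.

```python
# Placeholder handlers for some of the preset intention categories named in the text.
HANDLERS = {
    "navigation": lambda slots: f"start navigation to {slots}",
    "music":      lambda slots: f"play {slots}",
    "weather":    lambda slots: f"report weather for {slots}",
}

def execute_intent(category, slots):
    """Step 10 sketch: run the operation registered for the recognized category."""
    handler = HANDLERS.get(category)
    return handler(slots) if handler else None
```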
And 11, clearing the times of not receiving the voice information currently and the times of judging the invalid semantics currently, and keeping voice recognition.
Once the intelligent device recognizes the intention of the voice information, the user may well continue to send voice control instructions, so the counts of not receiving voice information and of judging invalid semantics must be restarted; this avoids the inconvenience of voice recognition terminating automatically because a count reached its threshold.
By clearing the current count of not receiving voice information and the current count of judged invalid semantics, each count is guaranteed to measure consecutive occurrences: the former reflects whether no speech was detected over a continuous period, and the latter whether the voice information received in that period was merely idle chat. This accurately reflects the recent interaction and ensures that voice recognition is kept on appropriately.
In order to reduce the CPU occupancy and traffic consumption of the intelligent device, as shown in fig. 2 and 3, the voice interaction method further includes the following steps:
if the number of times of currently not receiving the voice information is larger than or equal to the first threshold value in the step 5, or if the number of times of currently judging the invalid semantics is larger than or equal to the second threshold value in the step 9, controlling the intelligent device to enter a dormant state, so that the intelligent device does not perform voice recognition any more.
In order to improve the intelligent degree of the intelligent device and improve the user experience, further, the voice interaction method provided by the embodiment of the disclosure can also realize a multi-round session management function, that is, can recognize whether the voice information received this time and the voice information received last time belong to the same round of session, so as to understand the user intention more intelligently. The multi-round session management flow of the embodiment of the present disclosure is described in detail below with reference to fig. 4.
It should be noted that when the smart device is awakened (i.e., after step 1), the voice interaction method further includes a step of generating a session identifier. When a user actively wakes up the intelligent device by speaking the wake-up word, a new round of conversation starts and a new session identifier is generated; one round of conversation corresponds to one session identifier. A round of conversation can comprise one piece of voice information or several pieces, and those pieces may be correlated with one another or independent of one another.
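One wake-up thus starts one round of conversation with one session identifier. The patent does not specify the identifier format, so the UUID below is an assumption made for illustration.

```python
import uuid

def new_session_id():
    """Generate a fresh identifier for the round of conversation that starts
    when the wake-up word is spoken (identifier format is an assumption)."""
    return uuid.uuid4().hex
```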
The multi-round session management process is described in detail below with reference to fig. 4. As shown in fig. 4, after parsing the semantic meaning of the voice information (i.e. step 6) and before determining whether the semantic meaning is a valid semantic meaning (i.e. step 7), the voice interaction method further includes the following steps:
Step 41: judge whether the locally stored session identifier is consistent with the session identifier generated this time; if so, execute step 42; otherwise, execute step 45.
Specifically, if the smart device determines that the locally stored session identifier is consistent with the session identifier generated this time, the currently received voice information and the previously received voice information belong to the same round of conversation. Therefore, when the current voice information is associated with the previous voice information, the semantics of the current voice information can be determined with the help of the semantics corresponding to the previous voice information.
Step 42: determine the slot information of the semantics, and query the slot information of the semantics corresponding to the previous voice information.
In this step, when it is determined that the currently received voice information and the previously received voice information belong to the same round of conversation, the smart device obtains the slot information corresponding to the current voice information and the slot information corresponding to the previous voice information, respectively.
Step 43: judge whether the current voice information is associated with the previous voice information; if so, execute step 44; otherwise, execute step 7.
Specifically, whether the current voice information is associated with the previous voice information is determined according to the slot information determined in step 42 (i.e., the slot information corresponding to the current voice information) and the slot information corresponding to the previous voice information.
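The description does not fix how association is judged from the two sets of slot information, so the following sketch assumes one plausible rule: the current utterance is associated with the previous one when it carries no intent of its own and every slot it fills also appears in the previous utterance's slot frame. The function name and slot keys are illustrative assumptions.

```python
def is_associated(current_slots, previous_slots):
    """Step 43 sketch. The association rule is not specified by the description;
    this assumes the current utterance is a follow-up when it lacks its own
    intent but every slot it fills exists in the previous slot frame."""
    if current_slots.get("intent"):
        return False  # a self-contained request starts on its own
    filled = {k for k, v in current_slots.items() if v is not None and k != "intent"}
    return bool(filled) and filled <= set(previous_slots)
```

Under this rule, "What about tomorrow?" (a bare time slot) is associated with a preceding weather query, while a fresh request with its own intent is not.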
Step 44: replace the corresponding slot information of the semantics of the previous voice information with the slot information determined this time, and determine the semantics corresponding to the currently received voice information according to the replaced slot information.
Specifically, if the currently received voice information is associated with the previously received voice information, the slot information of the previous voice information can be obtained even when the intent of the current voice information is uncertain. Those slots of the current voice information that differ from the previous ones then overwrite the corresponding slots of the previous voice information, yielding complete slot information for the current voice information (i.e., inheriting the relevant slots of the previous voice information), from which the semantics corresponding to the current voice information are obtained. After this step, the method continues by judging whether the semantics of the currently received voice information are valid (i.e., executing step 7).
Step 45: update the locally stored session identifier to the session identifier generated this time.
Specifically, if the smart device determines that the locally stored session identifier is not consistent with the session identifier generated this time, the currently received voice information and the previously received voice information do not belong to the same round of conversation; that is, the current voice information starts a new round. Therefore, the locally stored session identifier is updated to the session identifier generated this time. After this step, the method continues by judging whether the semantics of the currently received voice information are valid (i.e., executing step 7).
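Steps 41 and 45, together with identifier generation on wake-up and clearing before sleep, can be sketched as follows. The use of `uuid4` and all names are assumptions; the description does not specify how the identifier is produced, only that one round of conversation corresponds to one identifier.

```python
import uuid


class SessionManager:
    """Sketch of session-identifier handling (steps 41 and 45); uuid choice is an assumption."""

    def __init__(self):
        self.stored_id = None  # the locally stored session identifier

    def on_wake(self):
        """A new round of conversation starts: generate a new session identifier."""
        return uuid.uuid4().hex

    def same_round(self, generated_id):
        """Step 41: compare the stored identifier with the one generated this time."""
        return self.stored_id == generated_id

    def update(self, generated_id):
        """Step 45: the current voice information starts a new round."""
        self.stored_id = generated_id

    def clear(self):
        """Cleared before the device enters the dormant state."""
        self.stored_id = None
```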
As can be seen from steps 41 to 45, in the multi-round session management process the session identifier is used to manage the dialog context: it can be determined whether the currently received voice information and the previously received voice information belong to the same round of conversation, and whether two adjacent pieces of voice information within that round are associated with each other. For associated voice information in the same round, the complete semantics of the current voice information can be determined by inheriting the slot information of the previous voice information, which further improves the intelligence of the smart device and the user experience.
For example, after waking up the smart device, the user first says "How is the weather today?" The smart device parses this voice information to obtain the corresponding semantics and returns the corresponding response result (namely, today's weather conditions). The slot information corresponding to this voice information is: intent = query weather, time = today, location = current location. The user then says "What about tomorrow?", and the smart device determines through the session identifier that this voice information belongs to the same round of conversation as the previous one; the slot information of the semantics corresponding to "What about tomorrow?" is therefore determined as: intent = query weather, time = tomorrow, location = current location. Accordingly, the semantics corresponding to "What about tomorrow?" are "How is the weather tomorrow?"
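The slot inheritance of step 44, applied to the weather example above, can be sketched as follows; the slot names and the dictionary representation are illustrative assumptions.

```python
def inherit_slots(current_slots, previous_slots):
    """Step 44 sketch: start from the previous utterance's slots and overwrite
    them with whatever the current utterance supplies (slot names illustrative)."""
    merged = dict(previous_slots)
    for name, value in current_slots.items():
        if value is not None:
            merged[name] = value
    return merged


# The weather example from the description:
previous = {"intent": "query_weather", "time": "today", "location": "current location"}
current = {"time": "tomorrow"}           # the follow-up fills only the time slot
print(inherit_slots(current, previous))  # intent and location inherited, time overwritten
```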
Based on the same technical concept, an embodiment of the present disclosure further provides an intelligent device, as shown in fig. 5, the intelligent device includes: a wake-up module 51, a voice recognition module 52, a first judgment module 53 and a first processing module 54.
The wake-up module 51 is configured to activate the voice recognition module 52 when the smart device is woken up.
The speech recognition module 52 is used for performing speech recognition.
The first determining module 53 is configured to determine whether a voice message is received.
The first processing module 54 is configured to, when the first judging module 53 determines that no voice information is received, determine the number of times of currently not receiving voice information, and determine whether to keep voice recognition according to that number and a preset first threshold.
Preferably, the first processing module 54 is specifically configured to keep the voice recognition module 52 turned on when the number of times of currently not receiving voice information is smaller than the preset first threshold.
Further, as shown in fig. 6, the intelligent device further includes a semantic parsing module 55, a second determining module 56, and a second processing module 57.
The semantic parsing module 55 is configured to parse the semantic meaning of the voice message when the voice message is received.
The second judging module 56 is configured to judge whether the semantic meaning is a valid semantic meaning.
The second processing module 57 is configured to, when the second judging module 56 judges that the semantics are invalid, determine the number of times of currently judged invalid semantics, and determine whether to keep voice recognition according to that number and a preset second threshold.
Further, as shown in fig. 7, the smart device further includes a third processing module 58. The third processing module 58 is configured to, when the second judging module 56 judges that the semantics are valid, execute a corresponding operation according to the semantics of the voice information, clear the number of times of currently not receiving voice information and the number of times of currently judged invalid semantics, and keep the voice recognition module 52 open.
Preferably, the second processing module 57 is specifically configured to keep the voice recognition module 52 turned on when the number of times of currently judged invalid semantics is smaller than the preset second threshold.
Further, as shown in fig. 8, the smart device further includes a sleep module 59. The sleep module 59 is configured to control the smart device to enter a sleep state when the first judging module 53 determines that the number of times of currently not receiving voice information is greater than or equal to the first threshold, or when the second judging module 56 determines that the number of times of currently judged invalid semantics is greater than or equal to the second threshold.
Further, as shown in fig. 9, the intelligent device further includes a multi-round session management module 50, and as shown in fig. 10, the multi-round session management module 50 includes a session identifier generation unit 501, a judgment unit 502, a storage unit 503, a slot information processing unit 504, and a session identifier update unit 505.
The session identifier generating unit 501 is configured to generate a session identifier when the wake-up module 51 wakes up the smart device.
The judging unit 502 is configured to judge, after the semantic parsing module 55 parses the semantics of the voice information and before the second judging module 56 judges whether the semantics are valid semantics, whether the session identifier stored in the storage unit 503 is consistent with the session identifier generated this time by the session identifier generation unit 501.
The slot information processing unit 504 is configured to, when the judging unit 502 judges that the session identifier stored in the storage unit 503 is consistent with the session identifier generated this time by the session identifier generation unit 501, determine the slot information of the semantics and query the slot information of the semantics corresponding to the previously received voice information; and to replace the slot information of the semantics corresponding to the previously received voice information with the slot information determined this time, and determine the semantics corresponding to the currently received voice information according to the replaced slot information.
The session identifier updating unit 505 is configured to update the locally stored session identifier to the session identifier generated this time when the determining unit 502 determines that the session identifier stored in the storage unit 503 is not consistent with the session identifier generated this time by the session identifier generating unit 501.
Further, the session identifier updating unit 505 is further configured to clear the locally stored session identifier before the hibernation module 59 controls the smart device to enter the hibernation state.
An embodiment of the present disclosure further provides a computing device, including: one or more processors and a storage device; the storage device stores one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors implement the voice interaction method provided in the foregoing embodiments.
The computing device may be a server, a client device, or the like.
The disclosed embodiments also provide a computer readable medium, on which a computer program is stored, wherein the computer program, when executed, implements the voice interaction method provided by the foregoing embodiments.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods disclosed above, and the functional modules/units in the apparatus, may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise, as would be apparent to one skilled in the art. It will, therefore, be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

Claims (16)

1. A voice interaction method, wherein the method comprises:
when the intelligent equipment is awakened, starting voice recognition;
judging whether voice information is received or not;
if the voice information is not received, determining the current times of not receiving the voice information, and determining whether to keep voice recognition according to the current times of not receiving the voice information and a preset first threshold value;
if the voice information is received, parsing the semantics of the voice information, and judging whether the semantics are valid semantics by determining whether an intent is determined; and if the semantics are invalid semantics, determining the number of times of currently judged invalid semantics, and determining whether to keep voice recognition according to the number of times of currently judged invalid semantics and a preset second threshold.
2. The method according to claim 1, wherein the determining whether to maintain the voice recognition according to the number of times of currently not receiving the voice information and a preset first threshold specifically includes:
and if the current times of not receiving the voice information is less than a preset first threshold value, keeping voice recognition.
3. The method of claim 1, further comprising:
if the semantic meaning is valid, executing corresponding operation according to the semantic meaning of the voice information;
and clearing the times of not receiving the voice information at present and the times of judging invalid semantics at present, and keeping voice recognition.
4. The method according to claim 1, wherein the determining whether to maintain the speech recognition according to the number of times of currently judged invalid semantics and a preset second threshold specifically includes:
and if the number of times of currently judged invalid semantics is smaller than a preset second threshold value, keeping voice recognition.
5. The method of claim 1, wherein if the number of times that no voice message is currently received is greater than or equal to the first threshold, or if the number of times that invalid semantics are currently determined is greater than or equal to the second threshold, the smart device is controlled to enter a sleep state.
6. The method of claim 5, further comprising: when the intelligent equipment is awakened, generating a session identifier;
after parsing the semantics of the voice information and before judging whether the semantics are valid semantics, the method further comprises:
judging whether the locally stored session identification is consistent with the session identification generated this time, if so, determining the semantic slot position information, and inquiring the semantic slot position information corresponding to the voice information received last time;
replacing the slot position information of the semantics corresponding to the voice information received last time by using the determined slot position information, and determining the semantics corresponding to the voice information received this time according to the replaced slot position information;
and if the locally stored session identifier is not consistent with the session identifier generated this time, updating the locally stored session identifier into the session identifier generated this time.
7. The method of claim 6, wherein the method further comprises: and before controlling the intelligent equipment to enter a dormant state, clearing the locally stored session identifier.
8. A smart device, comprising:
the awakening module is used for starting the voice recognition module when the intelligent equipment is awakened;
the voice recognition module is used for carrying out voice recognition;
the first judging module is used for judging whether voice information is received or not;
the first processing module is used for determining the number of times of not receiving the voice information currently when the first judging module judges that the voice information is not received, and determining whether to keep voice recognition according to the number of times of not receiving the voice information currently and a preset first threshold value;
the semantic analysis module is used for analyzing the semantics of the voice information when the voice information is received;
the second judging module is used for judging whether the semantics are effective semantics;
and the second processing module is used for determining the number of times of currently judging invalid semantics when the second judging module judges the invalid semantics, and determining whether to keep voice recognition according to the number of times of currently judging the invalid semantics and a preset second threshold value.
9. The smart device according to claim 8, wherein the first processing module is specifically configured to keep the voice recognition module turned on when the number of times that the voice information is not currently received is less than a preset first threshold.
10. The intelligent device according to claim 8, further comprising a third processing module, wherein the third processing module is configured to, when the second determining module determines that the semantic meaning is valid, perform a corresponding operation according to the semantic meaning of the voice message; and clearing the times of not receiving the voice information at present and the times of judging invalid semantics at present, and keeping the voice recognition module open.
11. The smart device according to claim 8, wherein the second processing module is specifically configured to keep the voice recognition module turned on when the number of times of currently judged invalid semantics is smaller than a preset second threshold.
12. The smart device according to claim 8, further comprising a sleep module, wherein the sleep module is configured to control the smart device to enter a sleep state when the first determining module determines that the number of times of currently not receiving the voice message is greater than or equal to the first threshold, or when the second determining module determines that the number of times of currently determining the invalid semantics is greater than or equal to the second threshold.
13. The intelligent device according to claim 12, further comprising a multi-round session management module including a session identifier generation unit, a storage unit, a judgment unit, a slot information processing unit, a session identifier update unit;
the session identifier generating unit is used for generating a session identifier when the intelligent device is awakened by the awakening module;
the judging unit is configured to judge whether the session identifier stored in the storage unit is consistent with the session identifier generated by the session identifier generating unit this time after the semantic parsing module parses the semantic meaning of the voice message and before the second judging module judges whether the semantic meaning is an effective semantic meaning;
the slot position information processing unit is used for determining the semantic slot position information and inquiring the semantic slot position information corresponding to the voice information received last time when the judging unit judges that the session identifier stored in the storage unit is consistent with the session identifier generated by the session identifier generating unit this time; replacing the slot position information of the semantic corresponding to the voice information received last time by the determined slot position information, and determining the semantic corresponding to the voice information received this time according to the replaced slot position information;
the session identifier updating unit is configured to update the locally stored session identifier to the session identifier generated this time when the judging unit judges that the session identifier stored in the storage unit is inconsistent with the session identifier generated this time by the session identifier generating unit.
14. The smart device of claim 13, wherein the session identifier update unit is further configured to clear the locally stored session identifier before the hibernation module controls the smart device to enter the hibernation state.
15. A computing device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the voice interaction method of any of claims 1-7.
16. A computer-readable medium, on which a computer program is stored, wherein the program, when executed, implements the voice interaction method of any one of claims 1-7.
CN201910376854.2A 2019-05-07 2019-05-07 Voice interaction method and device, computing equipment and computer readable medium Active CN110111789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910376854.2A CN110111789B (en) 2019-05-07 2019-05-07 Voice interaction method and device, computing equipment and computer readable medium

Publications (2)

Publication Number Publication Date
CN110111789A CN110111789A (en) 2019-08-09
CN110111789B true CN110111789B (en) 2022-02-08

Family

ID=67488559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910376854.2A Active CN110111789B (en) 2019-05-07 2019-05-07 Voice interaction method and device, computing equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN110111789B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619873A (en) 2019-08-16 2019-12-27 北京小米移动软件有限公司 Audio processing method, device and storage medium
CN110706703A (en) * 2019-10-16 2020-01-17 珠海格力电器股份有限公司 Voice wake-up method, device, medium and equipment
CN112750429A (en) * 2019-10-31 2021-05-04 合肥海尔洗衣机有限公司 Voice interaction method and device, electronic equipment and storage medium
WO2021212388A1 (en) * 2020-04-22 2021-10-28 南京阿凡达机器人科技有限公司 Interactive communication implementation method and device, and storage medium
CN111583923B (en) * 2020-04-28 2023-11-14 北京小米松果电子有限公司 Information control method and device and storage medium
CN112201242A (en) * 2020-09-29 2021-01-08 北京小米移动软件有限公司 Method and device for waking up equipment, electronic equipment and storage medium
CN112185379A (en) * 2020-09-29 2021-01-05 珠海格力电器股份有限公司 Voice interaction method and device, electronic equipment and storage medium
CN112652304B (en) * 2020-12-02 2022-02-01 北京百度网讯科技有限公司 Voice interaction method and device of intelligent equipment and electronic equipment
CN113707143A (en) * 2021-08-20 2021-11-26 珠海格力电器股份有限公司 Voice processing method, device, electronic equipment and storage medium

Citations (7)

Publication number Priority date Publication date Assignee Title
CN105741838A (en) * 2016-01-20 2016-07-06 百度在线网络技术(北京)有限公司 Voice wakeup method and voice wakeup device
CN106250747A (en) * 2016-08-01 2016-12-21 联想(北京)有限公司 A kind of information processing method and electronic equipment
CN106297777A (en) * 2016-08-11 2017-01-04 广州视源电子科技股份有限公司 Method and device for awakening voice service
CN106328132A (en) * 2016-08-15 2017-01-11 歌尔股份有限公司 Voice interaction control method and device for intelligent equipment
CN107102713A (en) * 2016-02-19 2017-08-29 北京君正集成电路股份有限公司 It is a kind of to reduce the method and device of power consumption
CN109003605A (en) * 2018-07-02 2018-12-14 北京百度网讯科技有限公司 Intelligent sound interaction processing method, device, equipment and storage medium
CN109584874A (en) * 2018-12-15 2019-04-05 深圳壹账通智能科技有限公司 Electrical equipment control method, device, electrical equipment and storage medium

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US7853453B2 (en) * 2005-06-30 2010-12-14 Microsoft Corporation Analyzing dialog between a user and an interactive application
US8380511B2 (en) * 2007-02-20 2013-02-19 Intervoice Limited Partnership System and method for semantic categorization
CN106205612B (en) * 2016-07-08 2019-12-24 北京光年无限科技有限公司 Information processing method and system for intelligent robot
US10643601B2 (en) * 2017-02-09 2020-05-05 Semantic Machines, Inc. Detection mechanism for automated dialog systems
CN107845381A (en) * 2017-10-27 2018-03-27 安徽硕威智能科技有限公司 A kind of method and system of robot semantic processes
CN109119078A (en) * 2018-10-26 2019-01-01 北京石头世纪科技有限公司 Automatic robot control method, device, automatic robot and medium

Also Published As

Publication number Publication date
CN110111789A (en) 2019-08-09

Similar Documents

Publication Publication Date Title
CN110111789B (en) Voice interaction method and device, computing equipment and computer readable medium
CN107704275B (en) Intelligent device awakening method and device, server and intelligent device
CN108182943B (en) Intelligent device control method and device and intelligent device
CN108962262A (en) Voice data processing method and device
WO2019007245A1 (en) Processing method, control method and recognition method, and apparatus and electronic device therefor
CN110223691A (en) Voice wakes up the method for handover control and device of identification
CN105210146A (en) Method and apparatus for controlling voice activation
CN111261151B (en) Voice processing method and device, electronic equipment and storage medium
US11810593B2 (en) Low power mode for speech capture devices
CN112489648A (en) Wake-up processing threshold adjustment method, voice home appliance, and storage medium
CN109686368B (en) Voice wake-up response processing method and device, electronic equipment and storage medium
CN113611316A (en) Man-machine interaction method, device, equipment and storage medium
KR20240090400A (en) Continuous conversation based on digital signal processor
CN112133302B (en) Method, device and storage medium for pre-waking up terminal
CN114724564A (en) Voice processing method, device and system
CN111833874B (en) Man-machine interaction method, system, equipment and storage medium based on identifier
CN114141233A (en) Voice awakening method and related equipment thereof
CN111739515B (en) Speech recognition method, equipment, electronic equipment, server and related system
CN113362830A (en) Starting method, control method, system and storage medium of voice assistant
CN114582347B (en) Method, device, equipment and medium for determining speech semantics based on wake-up word speed
CN112885341A (en) Voice wake-up method and device, electronic equipment and storage medium
CN110853632A (en) Voice recognition method based on voiceprint information and intelligent interaction equipment
CN109524010A (en) A kind of sound control method, device, equipment and storage medium
WO2023246036A1 (en) Control method and apparatus for speech recognition device, and electronic device and storage medium
CN112151028A (en) Voice recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211013

Address after: 100176 101, floor 1, building 1, yard 7, Ruihe West 2nd Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd.

Address before: Unit D, Unit 3, 301, Productivity Building No. 5, High-tech Secondary Road, Nanshan District, Shenzhen City, Guangdong Province

Applicant before: BAIDU INTERNATIONAL TECHNOLOGY (SHENZHEN) Co.,Ltd.

GR01 Patent grant