CN113409778A - Voice interaction method, system and terminal - Google Patents

Info

Publication number
CN113409778A
CN113409778A (application number CN202010183403.XA)
Authority
CN
China
Prior art keywords
voice
user
information
information flow
voice input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010183403.XA
Other languages
Chinese (zh)
Inventor
徐贤仲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010183403.XA priority Critical patent/CN113409778A/en
Publication of CN113409778A publication Critical patent/CN113409778A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command
    • G10L 2015/225: Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A voice interaction method, system, and terminal are disclosed. The voice interaction method includes: presenting a current information stream; acquiring a voice input from a user; and determining the presentation content of a subsequent information stream based on the current information stream and the voice input. The information stream may be one that includes plot branches, or one that includes a controllable avatar. The present invention thus provides a scheme in which the user can actively influence the direction of the content through voice interaction. Through voice input, the user can determine the subsequent direction of the current information stream, and in particular the plot branches of a story-driven game, thereby enhancing the user's sense of immersion and participation and improving the game's playability.

Description

Voice interaction method, system and terminal
Technical Field
The present disclosure relates to a voice processing technology, and in particular, to a voice interaction method, system, and terminal.
Background
With the development of voice interaction technology, smart speakers that support various controls and content retrieval via voice commands have become popular, with content functions such as listening to songs and stories among their most widely used features. On speakers equipped with a screen, content can be presented through a combination of media such as video, images, text, and audio. For story-based content programs, playback typically runs to completion once triggered by a user instruction. For example, a user may say to a smart speaker, "XXX, I want to hear a story," and the speaker will then play stories until the album finishes. Although the user can perform operations such as play, pause, and selection, these manipulations cannot actively affect the direction of the content, so the user lacks a sense of immersion and engagement.
Therefore, an interaction scheme is needed in which the user can actively influence the direction of the content.
Disclosure of Invention
One technical problem addressed by the present disclosure is to provide a scheme in which the user can actively influence the direction of the content through voice interaction. Through voice input, the user can determine the subsequent direction of the current information stream, and in particular the plot branches of story-driven games, thereby enhancing the user's sense of immersion and participation and improving the playability of the games.
According to a first aspect of the present disclosure, there is provided a voice interaction method, including: presenting the current information stream; acquiring a voice input from a user; based on the current information stream and the speech input, the presentation content of a subsequent information stream is determined. The information stream may be an information stream comprising storyline branches or an information stream comprising a steerable avatar.
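The three steps of the first aspect (present, acquire, determine) can be sketched in code. The following is a minimal, hypothetical illustration and not the disclosure's implementation; the function and field names (`next_segment`, `branches`, `default`) are assumptions introduced for clarity.

```python
# Hypothetical sketch of the first-aspect method: the current information
# stream lists which utterances trigger which branches, and the recognized
# voice text selects the next content to present. Names are illustrative.

def next_segment(current_stream: dict, voice_text: str) -> str:
    """Map (current stream, recognized text) to the next content to present."""
    # The current segment lists the utterances that trigger each branch.
    for keyword, branch in current_stream.get("branches", {}).items():
        if keyword in voice_text:
            return branch
    # Unrecognized input falls back to a default continuation.
    return current_stream.get("default", "continue")

# Example: an interaction point with two spoken options.
stream = {
    "prompt": "You reach a fork: left to the forest, right to the river.",
    "branches": {"forest": "forest_path", "river": "river_path"},
    "default": "repeat_prompt",
}
print(next_segment(stream, "let's go to the forest"))  # forest_path
```

A real terminal would of course feed `voice_text` from a speech recognizer rather than a literal string; the point is only that the pair (current stream, voice input) jointly determines the subsequent stream.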
According to a second aspect of the present disclosure, there is provided a voice interaction method, including: broadcasting a storyline by voice; voice-broadcasting a plurality of options for triggering different plot branches; acquiring the user's voice selection of one of the plurality of options; and triggering, based on the voice selection, the plot branch corresponding to the selected option.
According to a third aspect of the present disclosure, there is provided a voice interaction system comprising a server and a plurality of terminals, wherein each terminal is configured to: present the information stream obtained from the server; collect voice input from a user; upload the voice input to the server; acquire voice input feedback issued by the server; and present a subsequent information stream based on the voice input feedback; and the server is configured to: issue a current information stream for presentation; acquire the voice input uploaded by the terminal; and generate and issue the voice input feedback based on the voice input.
According to a fourth aspect of the present disclosure, there is provided a voice interaction terminal, comprising: presentation means for presenting a current information stream; input means for acquiring a voice input from a user; processing means for determining the presentation content of a subsequent information stream based on the current information stream and the speech input.
According to a fifth aspect of the present disclosure, there is provided a voice interaction method, comprising: presenting the current information stream; obtaining a plurality of speech inputs from a plurality of users; based on the current information stream and the plurality of speech inputs, determining presentation content for a subsequent information stream.
According to a sixth aspect of the present disclosure, there is provided a voice interaction method, comprising: presenting the current information stream; acquiring multiple rounds of voice input from a user; based on the current information stream and the multiple rounds of voice input, determining presentation content of a subsequent information stream.
According to a seventh aspect of the present disclosure, there is provided a computing device comprising: a processor; and a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as described in the first and second aspects and the fifth and sixth aspects above.
According to an eighth aspect of the present disclosure, there is provided a non-transitory machine-readable storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform the method as described in the first and second aspects and the fifth and sixth aspects above.
Therefore, the invention achieves the technical effect of influencing the direction of the plot and broadcasting different audio and video content through voice interaction. Specifically, voice instructions can trigger subsequent content broadcasts, and multidimensional information recognized from the voice, such as the text content, the execution time, whether the speaker's broadcast was interrupted, the time the instruction was issued, and the emotion of the utterance, can serve as the basis for deciding and generating the subsequent broadcast content.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
FIG. 1 shows a schematic flow diagram of a voice interaction method according to one embodiment of the present invention.
Fig. 2 shows an example of a scenario branching structure.
Fig. 3 shows an example of triggering a scenario branch by selection.
Fig. 4 shows an example of interaction of a voice broadcast storyline.
FIG. 5 illustrates a schematic diagram of the components of a voice interaction system in which the present invention may be implemented.
Fig. 6 is a schematic diagram illustrating the composition of a voice interactive terminal according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
With the development of voice interaction technology, smart speakers that support various controls and content retrieval via voice commands have become popular, with content functions such as listening to songs and stories among their most widely used features. On speakers equipped with a screen, content can be presented through a combination of media such as video, images, text, and audio. For story-based content programs, playback typically runs to completion once triggered by a user instruction. For example, a user may say to a smart speaker, "XXX, I want to hear a story," and the speaker will then play stories until the album finishes. Although the user can perform operations such as play, pause, and selection, these manipulations cannot actively affect the direction of the content, so the user lacks a sense of immersion and engagement.
Therefore, the invention provides a scheme in which the user can actively influence the direction of the content through voice interaction. Through voice input, the user can determine the subsequent direction of the current information stream, and in particular the plot branches of story-driven games, thereby enhancing the user's sense of immersion and participation and improving the playability of the games.
FIG. 1 shows a schematic flow diagram of a voice interaction method according to one embodiment of the present invention. In some embodiments, the method may be implemented by a voice interaction terminal through interaction with its user. In other embodiments, the voice interaction terminal may rely on the processing and/or storage capabilities of the cloud to implement the scheme.
In step S110, the current information stream is presented. Here, "presenting" means making the content perceptible to the user through the terminal device. In one embodiment, the presented information stream may be a page of information shown on a display screen, for example within an app installed on a mobile phone. Alternatively or additionally, the presentation may include sound: a speaker or earphone can play corresponding scene sounds, such as music, voice prompts or narration, or sounds simulating a real scene (e.g., rain or wind). In other embodiments, the information stream may also be presented by other means, such as vibration.
Here, "information stream" refers to information whose presented content can be updated. For example, a smart speaker may read a story aloud via an audio stream; the story being read can then be viewed as an "information stream." In a game scenario, content that changes in response to user input may likewise be regarded as an information stream.
The user can give voice feedback on the presented information stream. To this end, in step S120, a voice input from the user may be acquired. After the user's voice input is obtained, the presentation content of the subsequent information stream may be determined in step S130 based on the current information stream and the voice input. The user's voice input thus actively influences the subsequent presentation of the information stream, enriching the interaction between the user and the terminal device and increasing immersion through voice participation.
As described above, in the present invention, "information stream" refers to information whose presented content can be updated. In a preferred implementation, the information stream may in particular be one that includes plot branches, for example a game, an episode (e.g., a television show, movie, or animation), or a novel containing plot branches. A "plot branch" is a narrative design that leads to different plots depending on the user's choices. It is one of the most classic and important interactive elements in adventure games and interactive fiction, and it gives the user a strong sense of accomplishment through choice and change.
Fig. 2 shows an example of a plot branching structure. The plot may begin with a common opening portion 20 that sets the scenes of the storyline, introduces characters to the player, and so on. At each of the branch points A to G, a decision is needed about which path the storyline takes, so that the user reaches one of four possible endpoints W to Z along a feasible storyline path. In contrast to the branch points A to G that require a decision (some of which may be implemented as the interaction points mentioned below), some paths may also merge at nodes H, J, and K; that is, different plot branches may return to the same storyline and proceed naturally forward to one of the endpoints W to Z.
Here, the endpoints W to Z may connect subsequent scenarios, i.e. the branching structure shown in fig. 2 may be part of the overall branching structure of a certain game or episode. In other embodiments, endpoints W through Z may correspond to the four endings of a scenario, for example, in a simpler game. The path 22 in the figure may refer to a conventional scenario path, while the double line 24 may refer to a scenario path requiring a particular condition trigger.
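A branching structure like that of Fig. 2, with decision points, merge nodes, and condition-gated double-line paths, can be modeled as a small directed graph. The sketch below is illustrative only: the node letters echo the figure, but the edge layout and the condition name `adult_verified` are assumptions, not the patent's actual data model.

```python
# Each node maps to a list of (next_node, condition) edges. A condition of
# None is an ordinary path (cf. path 22); a named condition models a path
# requiring a particular trigger (cf. the double-line path 24).
PLOT = {
    "A": [("B", None), ("C", None)],
    "B": [("H", None)],
    "C": [("H", None), ("D", "adult_verified")],  # gated edge
    "H": [("W", None)],
    "D": [("X", None)],
}

def reachable_endpoints(start, flags, plot=PLOT):
    """Endpoints a player can reach given the set of satisfied conditions."""
    seen, stack, ends = set(), [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        edges = plot.get(node, [])
        if not edges:
            ends.add(node)  # a terminal node is an ending
        for nxt, cond in edges:
            if cond is None or cond in flags:
                stack.append(nxt)
    return ends
```

With no conditions satisfied only the ordinary endings are reachable; satisfying `adult_verified` also opens the gated route, mirroring how path 24 stays closed until its trigger fires.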
Different plot branches may be triggered after the user, or an avatar controlled by the user, meets a certain condition (e.g., reaches a certain level) or completes a certain task, or they may be triggered at a plot interaction point by the user's selection among multiple options. Fig. 3 shows an example of triggering a plot branch by selection. As shown, in a seafaring game, the fleet the user commands runs short of food, and at the game interaction point the user may be shown different options for dealing with the situation. Selecting different options leads to different branches, e.g., a branch in which the voyage mission fails or one in which it succeeds.
The voice interaction scheme is particularly suitable for information streams with plot branches: the user can influence the direction of the plot through voice interaction, achieving an immersive interactive effect. For example, when the current information stream is presented as a voice broadcast, the user may be presented with a plurality of options for triggering different plot branches, e.g., by voice-broadcasting the options. Subsequently, the user's voice selection of one of the options may be obtained, and the plot branch corresponding to the selected option may be triggered based on that selection. For example, in a detective-style voice game, a detective story may be voice-broadcast and the user encouraged to look for clues. The smart voice device might announce, "You have come to a fork: the left path leads to the forest, the right to the riverside. Which way will you go?" The user can reply directly by voice, "Take the left path to the forest," to make a selection, and the corresponding plot branch proceeds based on that selection. Thus, when the information stream is voice-broadcast, introducing the user's voice replies increases the user's participation, keeps the user immersed in the atmosphere created by the broadcast, and improves playability.
In other embodiments, before presenting the options, the method may further include presenting the user with an interaction point prompt for triggering different plot branches. A voice instruction responding to the prompt can then be obtained, and based on that voice selection, the plot branch or branch options corresponding to the interaction point are triggered. Here, an interaction point is a place where interaction occurs that leads to different plot branches, such as the on-screen options of Fig. 3 or voice-broadcast options. An interaction point that the plot does not necessarily pass through can be prompted to the user. In the voice-broadcast case, for example, the broadcast may ask whether the user wants to enter a gate to investigate. Entering the gate may not itself trigger a plot branch directly; rather, options that do trigger plot branches become available once the user chooses by voice to enter. The user may then set off subsequent plot branches through further selections inside the gate, or by discovering clues there.
For the acquired user voice input, the text information of the voice input can first be obtained and converted into an instruction the machine can understand, for example recognizing "left" and/or "forest" in the example above. This reproduces the effect of click-based selection in existing game interactions, e.g., to determine the presentation content of subsequent information streams. Beyond replacing existing interaction means (mouse clicks, finger taps on a touch screen), voice input also carries information of its own that can aid the determination or generation of subsequent information streams. To this end, acquiring the voice input from the user may include: acquiring text information of the voice input and acquiring voice attribute information of the voice input; and determining the presentation content of the subsequent information stream may include: determining that content based on both the text information and the voice attribute information. Beyond the content itself, the presentation manner of the subsequent information stream can also be determined from the text and voice attribute information. For example, in a voice detective game, both the plot branch to broadcast next and the manner of presenting it, such as narrating it in a more mysterious or tense tone, may be determined from the various types of information contained in the user's voice input.
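The idea of combining recognized text with voice attribute information can be sketched as follows. This is a hypothetical illustration: the attribute field names (`emotion`, `interrupted_broadcast`) and the mapping rules are assumptions, not the disclosure's actual decision logic.

```python
# Decide both WHAT to play next and HOW to narrate it, from the recognized
# text plus voice-attribute information. Field names are illustrative.

def decide(text: str, attrs: dict) -> dict:
    # Text content selects the branch, as a click would.
    branch = "forest_path" if "forest" in text else "river_path"
    # Presentation manner is derived from voice attributes, not the text.
    tone = "tense" if attrs.get("emotion") == "excited" else "calm"
    pace = "fast" if attrs.get("interrupted_broadcast") else "normal"
    return {"branch": branch, "tone": tone, "pace": pace}

print(decide("go to the forest",
             {"emotion": "excited", "interrupted_broadcast": True}))
```

The same utterance can thus yield the same branch but a different narration style, which is exactly the text-versus-attribute split the paragraph above describes.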
In particular, voice attribute information may refer to information associated with the input voice itself, in addition to the semantic text content of the voice input.
In one embodiment, the voice attribute information may be the starting time of the voice input relative to the current information stream. As mentioned above, in step S130 the direction of the subsequent information stream is determined based on the progress of the current information stream and the instruction carried by the voice input. In this embodiment, not only what the user said but also when it was said becomes a basis for generating the subsequent information stream. For example, in the branch determination based on interaction points and selections described above, the user's state of mind can be inferred from whether the user interrupts the speaker's broadcast, and subsequent content, or a presentation manner better matching the user's current mood, can be chosen accordingly. In addition, in some embodiments, the user may be allowed to make voice inputs at places other than interaction points, such as on paths 22 or 24 between the interaction points A to G in Fig. 2; the content and start time of such inputs may also serve as selection criteria for subsequent plot branches.
Alternatively or additionally, the voice attribute information may be a duration of the voice input. The speech rate or the mood of the user can be judged from the duration, and the information can also be used as the selection standard of the subsequent plot branches.
Also alternatively or additionally, the voice attribute information may be emotion and/or intonation information of the voice input. After acquiring the user's voice input, the speech itself may be analyzed: the user's emotion can be judged from speaking intensity, word choice, and the like, and a plot branch matching that emotion (for example, a more difficult play mode) can then be offered. Moreover, when expressing the same textual meaning, users may phrase things differently or use different intonations, and these can be used to generate or select subsequent plot branches, or to determine their presentation manner. For example, if the user replies in Sichuan-accented Mandarin, the virtual characters in subsequent game interactions may converse in the same dialect. In languages such as Japanese, where speakers of different social identities use different sentence patterns, the corresponding sentence pattern can be chosen for subsequent information stream presentation based on the current user's phrasing.
In addition, the voice attribute information may further include the user identity corresponding to the voice input. For example, the captured speech may be compared against voiceprints to determine the user's identity and the associated profile, such as previously entered age, gender, preferences, and credit points, and the subsequent information stream presentation can then be decided based on the identified user. For example, a fighting game may have different versions such as R-13 and R-18; after the user's voiceprint is verified, the system may decide, depending on the user's age, whether to open a plot path 24 requiring a particular condition trigger as shown in Fig. 2 (e.g., an adult-only path).
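Identity-gated branching of this kind might be sketched as below. The voiceprint lookup is stubbed out with a static table; the profile store, path names, and age threshold are all illustrative assumptions, not the patent's mechanism.

```python
# Hypothetical profile store keyed by a voiceprint match result.
PROFILES = {"vp_001": {"age": 25}, "vp_002": {"age": 10}}

def allowed_paths(voiceprint_id: str, min_age: int = 18):
    """Return the plot paths open to the identified user."""
    profile = PROFILES.get(voiceprint_id, {})
    paths = ["standard_route"]
    # The condition-gated route (cf. the double-line path 24 of Fig. 2)
    # opens only when the stored age clears the threshold.
    if profile.get("age", 0) >= min_age:
        paths.append("restricted_route")
    return paths
```

An unknown voiceprint falls back to the standard route, which is the conservative default one would want for an age gate.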
Further, the voice interaction method of the present invention may also include obtaining environmental information at the time the voice input is produced, and determining the presentation content of the subsequent information stream based on it. The environmental information may describe the immediate environment in which the user speaks, such as the room temperature or whether other people are present, or a larger context, such as the time period (holiday, rush hour, late night), weather conditions, or geographic location. This information, too, can inform the generation or selection of subsequent information streams and the manner of their presentation.
In addition to receiving voice input, the voice interaction method of the present invention may also obtain non-voice input from the user and determine the presentation content of the subsequent information stream based on it. For example, the terminal may obtain motion-sensing input (e.g., from a motion sensor) or video input (e.g., from a 3D camera) and judge it jointly with the voice input. At some interaction points, for example, interaction may occur via voice input, while at others it may be based on a mouse or screen tap.
As an alternative or in addition to information streams with plot branches, the voice interaction method of the invention may also be applied to information streams that include an avatar.
The avatar may be a user avatar. For example, in a classic RPG (role-playing game), the user controls the game's protagonist. The user may control this avatar by voice, e.g., "go left" or "leave town," replacing cumbersome mouse clicks or keyboard controls. To this end, acquiring the voice input from the user may include obtaining the user's speech control over the user avatar, and determining the presentation content of the subsequent information stream includes controlling the presentation of the user avatar based on the speech input.
The avatars may also include avatars other than the user's own: a virtual character or other creature in a single-player game, or, in an online game, the avatar of another real user or a character or creature controlled by the game itself. In that case, acquiring the voice input from the user may include obtaining the user's voice interactions with these other avatars, and determining the presentation content of the subsequent information stream includes controlling the presentation of the other avatars based on those interactions. Controlling the presentation of the other avatars may include obtaining clues that trigger plot branches and/or obtaining interaction points that trigger plot branches.
Specifically, the user can have a voice conversation with other avatars directly or through their virtual avatars, thereby obtaining a stronger sense of immersion than existing clicking operations. The content, duration, etc. of the user's dialog with the avatar may trigger, for example, clue characters in the game to provide clues or interaction points that directly trigger storyline branches, thereby facilitating game play.
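Voice control of an avatar, as described above, amounts to mapping recognized command words onto avatar actions. A minimal sketch, assuming a toy four-word vocabulary on a 2-D grid (the vocabulary and coordinate model are assumptions, not the disclosure's):

```python
# Assumed command vocabulary mapped to movement deltas on a grid.
MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}

def apply_command(position, text):
    """Move the avatar according to the first recognized command word."""
    x, y = position
    for word, (dx, dy) in MOVES.items():
        if word in text:
            return (x + dx, y + dy)
    return position  # unrecognized commands leave the avatar in place

print(apply_command((0, 0), "go left"))  # (-1, 0)
```

The same dispatch structure extends naturally to non-movement actions ("talk to the merchant", "open the door") by mapping words to handlers instead of deltas.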
The voice interaction scheme described above in connection with Fig. 1 can be applied to various information streams whose direction the user actively influences through voice interaction. Such streams may be presented in one or more ways, e.g., through images and/or sounds, and may accept various inputs, including voice, leading to a better interaction experience.
In particular, the present invention is well suited to implementation as an interaction method for voice-broadcasting a storyline. The terminal can broadcast the storyline by voice; voice-broadcast a plurality of options for triggering different plot branches; acquire the user's voice selection of one of the options; and trigger, based on the voice selection, the plot branch corresponding to the selected option. Furthermore, voice input inserted by the user at times other than the option broadcasts can be acquired; a plot interaction point can be presented based on that input; the user's voice interaction with the interaction point can be acquired; and a subsequent plot branch can be generated or triggered based on that interaction.
Fig. 4 shows an example of interaction for a voice-broadcast storyline. The storyline may be, for example, a detective story, and the system determines the subsequent broadcast content with reference to the user's voice commands. The main flow is as follows:
the game/program begins and the content begins to be broadcast. Subsequently, receiving of the user instruction may be started. The user command may be received at a specific time point, or may start receiving in any time period, in other words, the user command may start receiving in the current broadcast, or may start receiving in the broadcast process.
The received instruction may be a voice instruction from the user. Instruction recognition converts it into structured data the program can understand; the resulting information may include the recognized text, the voice duration, whether the instruction was issued after the broadcast finished, the recognized user emotion, and so on. For example, when the options of an interaction point are voice-broadcast (e.g., "Do you go to the forest on the left or the riverside on the right?"), the text content of the user's reply (e.g., "go to the forest") explicitly selects the subsequent plot branch. As another example, whether a hidden "horror" mode (e.g., the double-line plot path 24 shown in Fig. 2) should be unlocked may be decided based on the user's age as determined by voiceprint (e.g., 18 or older) and the emotional state conveyed by the user's voice input, deepening the user's experience of the game.
The received instruction may also be triggered by non-voice events, such as the user's mouse or key operations or a change in geographic location, or by a timeout event when the user gives no input at all. For example, on a speaker with a screen, the voice-broadcast content options may simultaneously be displayed on screen, and the user can complete the selection by tapping the touch screen. As another example, in a more deeply interactive game, the subsequent direction of the game can be decided jointly from the user's body movements and facial expressions captured by a 3D camera, together with the user's voice interaction.
The system then decides on the broadcast content to generate according to the instruction information and various known context information (such as credit points, the user's city, and the like). The broadcast content may be prepared in advance as candidates to choose from, or dynamically generated according to the decision result. The latest content then continues to be broadcast until an end condition is met.
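The broadcast-decide loop just described might be sketched as follows. The `script` structure, node names, decision callbacks, and end condition are illustrative assumptions, not the patented implementation:

```python
def run_story(start, script, get_instruction, context):
    """Broadcast nodes until an ending is reached.

    `script` maps node id -> (narration, decide), where `decide` picks the
    next node from the instruction text and known context (credits, city...).
    A node with decide=None is an ending: the end condition is met.
    """
    node, log = start, []
    while node is not None:
        narration, decide = script[node]
        log.append(narration)  # stand-in for audio/video broadcasting
        node = decide(get_instruction(), context) if decide else None
    return log

script = {
    "fork": ("Go left to the forest, or right to the river?",
             lambda instr, ctx: "forest" if "forest" in instr else "river"),
    "forest": ("You enter the dark forest. The end.", None),
    "river": ("You walk along the river. The end.", None),
}
log = run_story("fork", script, lambda: "go forest", {"city": "Hangzhou"})
```

Prepared-in-advance candidates correspond to a static `script`; dynamic generation would replace the table lookup with content produced at decision time.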
Therefore, the invention achieves the technical effect of influencing the plot direction, and thereby broadcasting different audio and video content, through voice interaction. Specifically, a voice instruction can trigger subsequent content broadcasting, and multidimensional information recognized from the voice, such as the text content, execution time, whether the speaker broadcast was interrupted, the time the instruction was issued, and the corpus emotion, can serve as the basis for deciding and generating the subsequent broadcast content.
In a particular application scenario, the voice interaction scheme of the present invention may also be implemented as a more complex voice interaction and information flow presentation scheme involving multi-user interactions, and/or multiple rounds of interactions.
In one embodiment, the invention may be implemented as a voice interaction method comprising: presenting the current information stream; obtaining a plurality of speech inputs from a plurality of users; based on the current information stream and the plurality of speech inputs, determining presentation content for a subsequent information stream.
Here, information stream presentation and voice input acquisition may proceed in a staggered manner, acquiring one user's input at a time, advancing the plot, and then acquiring the next user's input, or may acquire a plurality of user inputs at once. Thus, obtaining a plurality of speech inputs from a plurality of users comprises at least one of: acquiring voice inputs from different users for different current information streams presented in succession; and acquiring a plurality of voice inputs from different users for a single current information stream.
Further, the method may comprise: determining that the plurality of speech inputs are from different users. Then, based on the current information stream and the plurality of speech inputs, determining the presentation content of the subsequent information stream comprises at least one of: generating, for different users, sub information streams and determining their presentation content; and comprehensively judging the user identities and input contents of the voice inputs to determine the presentation content of the subsequent information stream.
In a role-playing game involving multiple players, for example three players A, B and C, the method may query players A, B and C separately at different query points, or may query the three players simultaneously. In a simultaneous query, if the three players have voice input devices that do not interfere with each other, e.g., each wears a microphone, their answers can be given simultaneously for system acquisition (e.g., in an online game). If the three players face a single voice interaction device, e.g., a smart speaker, it is preferable that they do not speak at the same time so that their voice inputs can be acquired, and the content of the subsequent information stream is then determined according to the content of the voice inputs, the order in which they arrive, and so on. In an online game, players A, B and C may each be presented (e.g., audibly announced) with their own sub information stream; in a local game, presentation can be in the same stream.
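One possible way to combine several attributed answers into a single branch decision, as in the shared-speaker case above, is a vote that also respects the order in which the answers arrived. The function name, the keyword matching, and the tie-breaking rule are all hypothetical:

```python
def decide_branch(inputs, current_node):
    """Combine several attributed voice inputs into one branch decision.

    `inputs` is a list of (player_id, text) in capture order; a simple
    majority vote decides, with ties broken toward the earliest speaker.
    `current_node` stands in for the current position in the plot graph.
    """
    votes, first_seen = {}, {}
    for i, (player, text) in enumerate(inputs):
        choice = "forest" if "forest" in text else "river"
        votes[choice] = votes.get(choice, 0) + 1
        first_seen.setdefault(choice, i)
    # highest vote count wins; on a tie, the choice voiced first wins
    return max(votes, key=lambda c: (votes[c], -first_seen[c]))

branch = decide_branch(
    [("A", "let's try the forest"), ("B", "the river"), ("C", "forest, definitely")],
    current_node="fork",
)
```

In practice the per-player choice would come from speaker identification plus speech recognition rather than keyword matching, and the combination rule could weight identities differently, as the comprehensive judgment described above allows.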
Alternatively or additionally, in another embodiment, the invention may be implemented as a voice interaction method comprising: presenting the current information stream; acquiring multiple rounds of voice input from a user; and determining presentation content of a subsequent information stream based on the current information stream and the multiple rounds of voice input.
The user can perform multiple rounds of voice input under the guidance of the system. To this end, obtaining multiple rounds of speech input from a user may include: presenting the current round of interactive content according to a predetermined framework; acquiring the current round of voice input generated by the user for that interactive content; and presenting the next round of interactive content based on the predetermined framework and/or the current round of voice input. For example, the information stream may include a storyline story, and a plot framework of the storyline story may be constructed based on the multiple rounds of speech input.
For example, in a storyline broadcasting scenario, the system can let the user select or specify the story's setting, such as 19th-century London or a virtual world of the 22nd century, then let the user choose the story type, such as spy/mystery or comedy, and even let the user define the character traits of the protagonist, so that the user participates deeply in the story's creation, co-creating it with the system, which further increases engagement and interest.
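The guided co-creation flow above can be sketched as a fixed sequence of question rounds that fills in a story framework. The round keys, the questions, and the `get_answer` callback (a stand-in for one round of voice interaction) are illustrative assumptions:

```python
# Predetermined framework: one (slot, question) pair per interaction round.
ROUNDS = [
    ("setting", "Where does the story take place?"),
    ("genre", "What kind of story should it be?"),
    ("protagonist", "Describe the main character."),
]

def build_story_frame(get_answer):
    """Walk the predetermined framework, one question (round) at a time."""
    frame = {}
    for key, question in ROUNDS:
        frame[key] = get_answer(question)  # stand-in for a round of voice input
    return frame

answers = iter(["19th-century London", "detective mystery", "a retired spy"])
frame = build_story_frame(lambda question: next(answers))
```

The completed `frame` is what the description calls the plot framework: later rounds (and the generated story itself) can be conditioned on it.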
As described above, the voice interaction method of the present invention can be implemented by a voice interaction terminal through interaction with the terminal user. In further embodiments, the voice interaction terminal may implement the above scheme with the help of the processing and/or storage capabilities of the cloud.
To this end, the invention may also be implemented as a voice interaction system. FIG. 5 illustrates a schematic diagram of the components of a voice interaction system in which the present invention may be implemented. As shown, the voice interaction system may include a server 510 and a plurality of terminals 520. The server 510 may include a plurality of platforms to provide a variety of services for the many terminals 520 involved in the voice interaction of the present invention. As shown, a terminal 520 may be one of various smart speakers, such as a cylindrical smart speaker or a smart speaker with a screen, or a mobile smart terminal, such as a mobile phone.
Here, the terminal may be configured to: present the information flow obtained from the server; collect voice input from a user; upload the voice input to the server; acquire voice input feedback issued by the server; and present a subsequent information stream based on the voice input feedback. In one embodiment, the terminal may be a single physical terminal, such as the smart speaker and mobile terminal shown in the figures, which can independently implement the functions of information stream presentation (e.g., broadcast and display), voice collection, network transmission, and subsequent presentation, and may include processing capabilities for the portions that can be executed locally. In other embodiments, the terminal may comprise a plurality of physical devices; for example, a smart speaker may communicate over a short distance with a locally installed smart voice sticker and complete voice collection and reporting through it, which the present invention does not limit.
Accordingly, the server 510 may be configured to: issuing a current information flow for presentation; acquiring the voice input uploaded by the terminal; and generating and issuing the voice input feedback based on the voice input.
In some embodiments, the terminal 520 may determine the content of the subsequent information stream from the information already in the terminal or directly generate the subsequent content according to the obtained voice input feedback. In other embodiments, the server 510 may be configured to determine and issue the presentation content for the subsequent information flow, that is, the terminal directly obtains the subsequent content issued by the server.
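The terminal-server exchange described above can be sketched with stub classes. The class names, the shape of the feedback dictionary, and the branch logic are hypothetical stand-ins for the real network protocol, and the network round-trip is simulated by a direct method call:

```python
class StubServer:
    """Hypothetical server: turns an uploaded voice input into feedback."""
    def feedback(self, voice_text):
        branch = "forest" if "forest" in voice_text else "river"
        return {"next_stream": f"scene_{branch}"}

class Terminal:
    def __init__(self, server):
        self.server = server
        self.presented = []  # record of everything presented to the user

    def present(self, stream):
        self.presented.append(stream)  # stand-in for broadcast/display

    def interact(self, stream, voice_text):
        self.present(stream)                    # 1. present current flow
        fb = self.server.feedback(voice_text)   # 2-4. upload input, get feedback
        self.present(fb["next_stream"])         # 5. present subsequent flow
        return fb

t = Terminal(StubServer())
t.interact("scene_fork", "go to the forest")
```

In the first variant described above, the feedback would carry only a decision and the terminal would resolve it against locally stored content; in the second, `next_stream` would itself be the content issued by the server.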
Further, the server 510 may be configured to: acquiring text information, voice attribute information and environment information of the voice input; and determining and issuing presentation content for subsequent information streams based on the text information, the voice attribute information and the environment information.
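A server-side decision that combines the three information sources named above (text, voice attributes, environment) might look like the following sketch; the specific fields and decision rules are illustrative assumptions:

```python
def decide_content(text, attrs, env):
    """Combine text, voice attributes and environment into one decision."""
    # the text content picks the plot branch itself
    branch = "forest" if "forest" in text else "river"
    # voice attributes can change *how* the branch is presented
    pace = "fast" if attrs.get("emotion") == "excited" else "normal"
    # environment info (e.g. local time of day) can pick an ambience variant
    ambience = "night" if env.get("hour", 12) >= 20 else "day"
    return {"branch": branch, "pace": pace, "ambience": ambience}

content = decide_content(
    "go forest", {"emotion": "excited", "duration_s": 1.0}, {"hour": 21}
)
```

The point of the sketch is the separation of concerns: text selects content, voice attributes and environment shape its presentation, matching the determination described above.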
The terminal 520 may be configured to: present to the user a plurality of options for triggering different storyline branches; collect the user's voice selection of one of the options; and present a subsequent information stream based on the voice selection.
Further, the present invention can also be implemented as a voice interaction terminal for implementing the voice interaction method described above in conjunction with fig. 1 and 4. Fig. 6 is a schematic diagram illustrating the composition of a voice interaction terminal according to an embodiment of the present invention. The terminal may perform the voice interaction method as described above in conjunction with fig. 1 and 4, or may at least perform it via the terminal with the participation of the cloud. The terminal may also correspond to a terminal 520 in the system shown in fig. 5.
In particular, the terminal 600 may comprise a presentation means 610, an input means 620 and a processing means 630.
The presentation means 610 may be used to present the current information stream. The input device 620 may be used to obtain voice input from a user. The processing means 630 may then be used to perform processing, such as determining the presentation content of a subsequent information stream based on the current information stream and the speech input.
Further, when the terminal 600 needs to interact with the server to perform voice processing and subsequent information flow determination (including selection and generation) with the help of the processing capabilities of the cloud platform, the terminal 600 may further include a networking device 640 for: acquiring information to be presented; uploading the acquired voice input; and obtaining voice input feedback for determining the subsequent information stream. If necessary, the networking device 640 may also obtain the information streams for presentation, e.g., by downloading game data in advance or downloading in real time as the information stream plays.
In different embodiments, the presentation device 610 may have different modalities. For example, in some embodiments, the presentation means may comprise display means for visually outputting the content to be presented, for example displaying the current information stream and/or a subsequent information stream. Alternatively or additionally, the presentation means may further comprise: and the voice output device is used for voice broadcasting the current information flow and the subsequent information flow.
In addition, the input device 620 may further include: and the operation control device is used for acquiring the operation control input of the user. For example, the operation control means may include a keyboard, a mouse, a touch screen, a joystick, and the like. Subsequently, the processing device 630 may be configured to: based on the operational control input, the presentation content of the subsequent information stream is determined.
Further, the terminal 600 may be implemented as a computing device with conventional computing and processing capabilities. The processing means 630 may be implemented as a processor of the computing device, and the computing device may further comprise a memory for storing the data and instructions required for computation.

The processor may be a multi-core processor or may include a plurality of processors. In some embodiments, the processor may include a general-purpose host processor and one or more special-purpose coprocessors, such as a graphics processing unit (GPU), a digital signal processor (DSP), or the like. In some embodiments, the processor may be implemented using custom circuits, such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).

The memory may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage. The ROM may store static data or instructions required by the processor or other modules of the computer. The permanent storage device may be a read-write storage device, and may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, a mass storage device (e.g., a magnetic or optical disk, or flash memory) is employed as the permanent storage device. In other embodiments, the permanent storage may be a removable storage device (e.g., a floppy disk or optical drive). The system memory may be a read-write memory device or a volatile read-write memory device, such as dynamic random access memory. The system memory may store the instructions and data that some or all of the processors require at runtime. Further, the memory may comprise any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory) and magnetic and/or optical disks. In some embodiments, the memory may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-dense optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card, etc.), a magnetic floppy disk, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted by wireless or wired means.

The memory may also have executable code stored thereon that, when executed by the processor, causes the processor to perform the voice interaction methods described above.
The voice interaction method, system and terminal according to the present invention have been described in detail above with reference to the accompanying drawings. The invention enables the user to actively influence the presentation of the information flow through voice interaction, and is particularly suitable for improving the user's immersive experience of plot-branching information flows. Furthermore, the user can hold voice conversations with in-game characters through a virtual avatar, further improving the game's sense of immersion and playability.
Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for carrying out the above-mentioned steps defined in the above-mentioned method of the invention.
Alternatively, the present invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the present invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (36)

1. A voice interaction method, comprising: presenting a current information flow; obtaining voice input from a user; and determining, based on the current information flow and the voice input, presentation content of a subsequent information flow.

2. The method of claim 1, wherein the information flow comprises an information flow with plot branches.

3. The method of claim 2, wherein determining the presentation content of the subsequent information flow based on the current information flow and the voice input comprises: determining, based on the current branch of the information flow and the voice input, a branch direction of the subsequent information flow.

4. The method of claim 2, wherein the information flow comprises at least one of: a game with plot branches; a drama series with plot branches; and a novel with plot branches.

5. The method of claim 2, wherein presenting the current information flow comprises: presenting to the user a plurality of options for triggering different plot branches; obtaining voice input from the user comprises: obtaining the user's voice selection of one of the plurality of options; and determining the presentation content of the subsequent information flow based on the current information flow and the voice input comprises: triggering, based on the voice selection, the plot branch corresponding to the selected option.

6. The method of claim 5, wherein presenting the current information flow comprises: presenting to the user an interaction point prompt for triggering different plot branches; obtaining voice input from the user comprises: obtaining the user's voice instruction in response to the interaction point prompt; and determining the presentation content of the subsequent information flow based on the current information flow and the voice instruction comprises: triggering, based on the voice selection, the plot branch or branch option corresponding to the interaction point.

7. The method of claim 2, wherein obtaining voice input from the user comprises: obtaining text information of the voice input; and obtaining voice attribute information of the voice input; and determining the presentation content of the subsequent information flow comprises: determining the presentation content of the subsequent information flow based on the text information and the voice attribute information.

8. The method of claim 7, further comprising: determining, based on the text information and the voice attribute information, a presentation manner of the subsequent information flow.

9. The method of claim 7, wherein the voice attribute information comprises at least one of: the start time of the voice input relative to the current information flow; the duration of the voice input; emotion information of the voice input; intonation information of the voice input; and the user identity corresponding to the voice input.

10. The method of claim 2, further comprising: obtaining environment information from when the voice input was generated, wherein determining the presentation content of the subsequent information flow comprises: determining the presentation content of the subsequent information flow based on the environment information.

11. The method of claim 2, further comprising: obtaining non-voice input from the user, wherein determining the presentation content of the subsequent information flow comprises: determining the presentation content of the subsequent information flow based on the non-voice input.

12. The method of claim 2, wherein determining the presentation content of the subsequent information flow based on the current information flow and the voice input comprises: changing the structure of the plot branches based on the current branch of the information flow and the voice input.

13. The method of claim 1, wherein presenting the current information flow comprises: presenting a virtual avatar.

14. The method of claim 13, wherein the virtual avatar comprises a user avatar; obtaining voice input from the user comprises: obtaining the user's voice control of the user avatar; and determining the presentation content of the subsequent information flow comprises: controlling, based on the voice input, the presentation of the user avatar.

15. The method of claim 14, wherein the virtual avatar comprises another avatar besides the user avatar; obtaining voice input from the user comprises: obtaining the user's voice interaction with the other avatar; and determining the presentation content of the subsequent information flow comprises: controlling, based on the voice interaction, the presentation of the other avatar.

16. The method of claim 15, wherein controlling the presentation of the other avatar comprises at least one of: obtaining a clue that triggers a plot branch; and obtaining an interaction point that triggers a plot branch.

17. The method of claim 1, wherein presenting the current information flow comprises: voice-broadcasting the current information flow.

18. A voice interaction method, comprising: voice-broadcasting a storyline story; voice-broadcasting a plurality of options for triggering different plot branches; obtaining a user's voice selection of one of the plurality of options; and triggering, based on the voice selection, the plot branch corresponding to the selected option.

19. The method of claim 18, further comprising: obtaining voice input inserted by the user during periods other than the voice broadcast of the plurality of options; presenting a plot interaction point based on the voice input; obtaining the user's voice interaction with the plot interaction point; and generating or triggering a subsequent plot branch based on the voice interaction.

20. A voice interaction system, comprising a server and a plurality of terminals, wherein the terminal is configured to: present an information flow obtained from the server; collect voice input from a user; upload the voice input to the server; obtain voice input feedback issued by the server; and present a subsequent information flow based on the voice input feedback; and the server is configured to: issue a current information flow for presentation; obtain the voice input uploaded by the terminal; and generate and issue the voice input feedback based on the voice input.

21. The system of claim 20, wherein the server is configured to: determine and issue presentation content for the subsequent information flow.

22. The system of claim 21, wherein the server is configured to: obtain text information, voice attribute information and environment information of the voice input; and determine and issue the presentation content for the subsequent information flow based on the text information, the voice attribute information and the environment information.

23. The system of claim 20, wherein the terminal is configured to: present to the user a plurality of options for triggering different plot branches; collect the user's voice selection of one of the plurality of options; and present a subsequent information flow based on the voice selection.

24. A voice interaction terminal, comprising: a presentation device configured to present a current information flow; an input device configured to obtain voice input from a user; and a processing device configured to determine, based on the current information flow and the voice input, presentation content of a subsequent information flow.

25. The terminal of claim 24, further comprising a networking device configured to: obtain information to be presented; upload the obtained voice input; and obtain voice input feedback for determining the subsequent information flow.

26. The terminal of claim 24, wherein the presentation device comprises: a voice output device configured to voice-broadcast the current information flow and the subsequent information flow.

27. The terminal of claim 26, wherein the presentation device comprises: a display device configured to display the current information flow and/or the subsequent information flow.

28. The terminal of claim 26, wherein the input device comprises: an operation control device configured to obtain the user's operation control input, and the processing device is configured to: determine, based on the operation control input, the presentation content of the subsequent information flow.

29. A voice interaction method, comprising: presenting a current information flow; obtaining a plurality of voice inputs from a plurality of users; and determining, based on the current information flow and the plurality of voice inputs, presentation content of a subsequent information flow.

30. The method of claim 29, wherein obtaining a plurality of voice inputs from a plurality of users comprises at least one of: obtaining voice inputs from different users for different current information flows presented in succession; and obtaining a plurality of voice inputs from different users for one current information flow.

31. The method of claim 29, further comprising: determining that the plurality of voice inputs are from different users, wherein determining the presentation content of the subsequent information flow based on the current information flow and the plurality of voice inputs comprises at least one of: generating sub information flows for different users and determining the presentation content of the sub information flows; and comprehensively judging the user identities and input contents of the plurality of voice inputs to determine the presentation content of the subsequent information flow.

32. A voice interaction method, comprising: presenting a current information flow; obtaining multiple rounds of voice input from a user; and determining, based on the current information flow and the multiple rounds of voice input, presentation content of a subsequent information flow.

33. The method of claim 32, wherein obtaining multiple rounds of voice input from the user comprises: presenting the current round of interactive content according to a predetermined framework; obtaining the current round of voice input generated by the user in response to the current round of interactive content; and presenting the next round of interactive content based on the predetermined framework and/or the current round of voice input.

34. The method of claim 32, wherein the information flow comprises a storyline story, and determining the presentation content of the subsequent information flow based on the current information flow and the multiple rounds of voice input comprises: constructing a plot framework of the storyline story based on the multiple rounds of voice input.

35. A computing device, comprising: a processor; and a memory having executable code stored thereon which, when executed by the processor, causes the processor to perform the method of any one of claims 1-17 and 29-34.

36. A non-transitory machine-readable storage medium having executable code stored thereon which, when executed by a processor of an electronic device, causes the processor to perform the method of any one of claims 1-17 and 29-34.
CN202010183403.XA 2020-03-16 2020-03-16 Voice interaction method, system and terminal Pending CN113409778A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010183403.XA CN113409778A (en) 2020-03-16 2020-03-16 Voice interaction method, system and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010183403.XA CN113409778A (en) 2020-03-16 2020-03-16 Voice interaction method, system and terminal

Publications (1)

Publication Number Publication Date
CN113409778A true CN113409778A (en) 2021-09-17

Family

ID=77676638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010183403.XA Pending CN113409778A (en) 2020-03-16 2020-03-16 Voice interaction method, system and terminal

Country Status (1)

Country Link
CN (1) CN113409778A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007052043A (en) * 2005-08-15 2007-03-01 Nippon Telegr & Teleph Corp <Ntt> Voice dialogue scenario creation method, apparatus, voice dialogue scenario creation program, recording medium
CN102947774A (en) * 2010-06-21 2013-02-27 微软公司 Natural user input for driving interactive stories
US9583106B1 (en) * 2013-09-13 2017-02-28 PBJ Synthetics Corporation Methods, systems, and media for presenting interactive audio content
CN107308657A (en) * 2017-07-31 2017-11-03 广州网嘉玩具科技开发有限公司 A kind of novel interactive intelligent toy system
CN109240564A (en) * 2018-10-12 2019-01-18 武汉辽疆科技有限公司 Artificial intelligence realizes the device and method of interactive more plot animations branch
CN110085221A (en) * 2018-01-26 2019-08-02 上海智臻智能网络科技股份有限公司 Speech emotional exchange method, computer equipment and computer readable storage medium
CN110265021A (en) * 2019-07-22 2019-09-20 深圳前海微众银行股份有限公司 Personalized voice interaction method, robot terminal, apparatus and readable storage medium
CN116828238A (en) * 2023-07-12 2023-09-29 暨南大学 Network marketing platform based on advertisement push


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114102628A (en) * 2021-12-04 2022-03-01 广州美术学院 A picture book interaction method, device and robot
CN114130042A (en) * 2021-12-04 2022-03-04 广州美术学院 An intelligent picture book toy system
CN114177621A (en) * 2021-12-15 2022-03-15 乐元素科技(北京)股份有限公司 Data processing method and device
CN114177621B (en) * 2021-12-15 2024-03-22 乐元素科技(北京)股份有限公司 Data processing method and device
CN115103237A (en) * 2022-06-13 2022-09-23 咪咕视讯科技有限公司 Video processing method, apparatus, device, and computer-readable storage medium
CN115103237B (en) * 2022-06-13 2023-12-08 咪咕视讯科技有限公司 Video processing method, device, equipment and computer readable storage medium
CN115220608A (en) * 2022-09-20 2022-10-21 深圳市人马互动科技有限公司 Method and device for processing multimedia data in interactive novel
CN115212580A (en) * 2022-09-21 2022-10-21 深圳市人马互动科技有限公司 Method and related device for updating game data based on telephone interaction
CN115212580B (en) * 2022-09-21 2022-11-25 深圳市人马互动科技有限公司 Method and related device for updating game data based on telephone interaction
CN115408510A (en) * 2022-11-02 2022-11-29 深圳市人马互动科技有限公司 Plot interaction node-based skipping method and assembly and dialogue development system
CN115408510B (en) * 2022-11-02 2023-01-17 深圳市人马互动科技有限公司 Plot interaction node-based skipping method and assembly and dialogue development system
CN115963963A (en) * 2022-12-29 2023-04-14 抖音视界有限公司 Generation method, presentation method, device, equipment and medium of interactive novel

Similar Documents

Publication Publication Date Title
CN113409778A (en) Voice interaction method, system and terminal
Collins Playing with sound: a theory of interacting with sound and music in video games
US10987596B2 (en) Spectator audio analysis in online gaming environments
US10293260B1 (en) Player audio analysis in online gaming environments
JP6719747B2 (en) Interactive method, interactive system, interactive device, and program
Domsch Dialogue in video games
JP6699010B2 (en) Dialogue method, dialogue system, dialogue device, and program
US20140194201A1 (en) Communication methods and apparatus for online games
JP7602070B2 (en) Simulating crowd noise at live events with sentiment analysis of distributed inputs
JP7445938B1 (en) Servers, methods and computer programs
CN115462087A (en) Live broadcast interaction method, device, equipment, storage medium and program product
CN112752159A (en) Interaction method and related device
Harvey Virtual worlds: an ethnomusicological perspective
HK40059913A (en) Voice interaction method, system and terminal
CN116974507A (en) Virtual object interaction method, virtual object interaction device, computer equipment, storage medium and program product
Okkema Harvester of desires: Gaming amazon echo through john cayley’s the listeners
US20250050225A1 (en) Tailoring in-game dialogue to player attributes
CN112562430A (en) Auxiliary reading method, video playing method, device, equipment and storage medium
Huang et al. A voice-assisted intelligent software architecture based on deep game network
CN118304650A (en) Game development method, system, computer equipment and storage medium
US20250041733A1 (en) Modifying gameplay experiences
CN114444768B (en) Gamified learning system, gamified learning method and computing device
Fish Interactive and adaptive audio for home video game consoles
US20250050224A1 (en) Dynamic moderation based on speech patterns
CN118233665A (en) Live broadcast method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40059913

Country of ref document: HK