CN112185369A

CN112185369A - Volume adjusting method, device, equipment and medium based on voice control

Info

Publication number: CN112185369A
Application number: CN201910599655.8A
Authority: CN
Inventors: 陈宪涛; 贾孟华; 王任振; 宓佳琦; 周茉莉; 关岱松
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2019-07-04
Filing date: 2019-07-04
Publication date: 2021-01-05
Anticipated expiration: 2039-07-04
Also published as: CN112185369B

Abstract

The embodiment of the invention discloses a volume adjusting method, a volume adjusting device, volume adjusting equipment and a volume adjusting medium based on voice control. The method comprises the following steps: acquiring a voice instruction sent to intelligent equipment by a user; performing semantic recognition on the voice instruction to determine a semantic instruction; determining volume adjustment strategies according to the semantic instruction, wherein the number of the volume adjustment strategies with the same adjustment trend is at least two; and adjusting the volume of the intelligent equipment according to the volume adjustment strategy. The method strengthens the intelligent degree of volume adjustment based on the voice instruction, and meets the requirements of personalized and intelligent interactive experience of users.

Description

Volume adjusting method, device, equipment and medium based on voice control

Technical Field

The embodiment of the invention relates to an artificial intelligence control technology, in particular to a volume adjusting method, a device, equipment and a medium based on voice control.

Background

With the continuous development and maturity of voice technology, voice interaction gradually becomes a main communication mode between people and intelligent equipment, and people can control the state of the equipment by speaking, for example, the volume of the equipment is adjusted by voice expression, and the equipment is controlled within a volume range comfortable for people.

However, current approaches to adjusting the volume by voice command mainly borrow the interaction mode of touch commands or key commands on traditional devices, so both command input and the resulting volume change are mechanical and inefficient. Lacking a deep understanding of the user's intent, they fail to provide a personalized, intelligent interactive experience that meets the user's needs.

Disclosure of Invention

The embodiment of the invention provides a volume adjusting method, a volume adjusting device, volume adjusting equipment and a volume adjusting medium based on voice control, so that the intelligent degree of volume adjustment based on a voice instruction is strengthened, and the requirements of personalized and intelligent interaction experience of a user are met.

In a first aspect, an embodiment of the present invention provides a volume adjustment method based on voice control, including:

acquiring a voice instruction sent to intelligent equipment by a user;

performing semantic recognition on the voice instruction to determine a semantic instruction;

determining volume adjustment strategies according to the semantic instruction, wherein the number of the volume adjustment strategies with the same adjustment trend is at least two;

and adjusting the volume of the intelligent equipment according to the volume adjustment strategy.

In a second aspect, an embodiment of the present invention further provides a volume adjusting apparatus based on voice control, where the apparatus includes:

the acquisition module is used for acquiring a voice instruction sent to the intelligent equipment by a user;

the recognition module is used for performing semantic recognition on the voice command to determine a semantic command;

the determining module is used for determining volume adjustment strategies according to the semantic instruction, wherein the number of the volume adjustment strategies with the same adjustment trend is at least two;

and the control module is used for adjusting the volume of the intelligent equipment according to the volume adjustment strategy.

In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the volume adjustment method based on voice control according to any embodiment of the present invention.

In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a volume adjustment method based on voice control according to any embodiment of the present invention.

The embodiment of the invention carries out semantic recognition on a voice instruction sent to intelligent equipment by a user and determines a corresponding semantic instruction; then, determining volume adjustment strategies according to the semantic instruction, wherein the number of the volume adjustment strategies with the same adjustment trend is at least two; and finally, adjusting the volume of the intelligent equipment according to the volume adjustment strategy. The volume of the intelligent device can be flexibly controlled in multiple gears, the intelligent degree of volume adjustment based on voice instructions is enhanced, and the personalized and intelligent interactive experience requirements of users are met.
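The four steps summarized above can be sketched end to end as follows. This is only a minimal illustration under stated assumptions: the `recognize` callable, the `STRATEGIES` table, the instruction phrases and the step sizes are all hypothetical and are not specified by this disclosure.

```python
def adjust_volume_by_voice(voice_instruction, current_volume, recognize, strategies):
    """Run the four steps once and return the new volume.

    recognize:  hypothetical callable mapping a voice instruction to a
                semantic instruction (text).
    strategies: hypothetical dict mapping a semantic instruction to a
                (trend, amplitude) pair.
    """
    semantic = recognize(voice_instruction)      # semantic recognition
    strategy = strategies.get(semantic)          # strategy determination
    if strategy is None:
        return current_volume                    # unknown instruction: no change
    trend, amplitude = strategy
    delta = amplitude if trend == "up" else -amplitude
    return current_volume + delta                # volume adjustment


# Hypothetical table: two strategies share the "up" trend and two share the
# "down" trend, so one instruction can move the volume further than another.
STRATEGIES = {
    "a bit louder": ("up", 1),
    "much louder": ("up", 5),
    "a bit quieter": ("down", 1),
    "much quieter": ("down", 5),
}
```

With this table, a single "much louder" instruction changes the volume by five steps, where a fixed-step scheme would need five separate instructions.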

Drawings

Fig. 1 is a flowchart of a volume adjustment method based on voice control according to a first embodiment of the present invention;

Fig. 2 is a flowchart of a volume adjustment method based on voice control according to a second embodiment of the present invention;

Fig. 3 is a flowchart of a volume adjustment method based on voice control according to a third embodiment of the present invention;

Fig. 4a is a schematic diagram of comfortable volume curves output by the smart device at different distances according to an embodiment of the present invention;

Fig. 4b is a schematic diagram of clustering the user's voice instructions for adjusting the volume according to a fourth embodiment of the present invention;

Fig. 4c is a schematic view of an interaction scenario provided in the fourth embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a volume adjustment device based on voice control according to a fifth embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a computer device in a sixth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 1 is a flowchart of a volume adjustment method based on voice control according to a first embodiment of the present invention, which is applicable to volume adjustment based on voice control, and in particular, is applicable to volume adjustment of an intelligent device based on a voice command. The method may be performed by a voice control based volume adjustment apparatus implemented by software and/or hardware and embodied in a computer device. The computer device includes, but is not limited to, a cloud server or a smart device. If the computer equipment is intelligent equipment, the intelligent equipment can have stronger hardware processing capacity. The smart device is not limited herein, and the smart device may be a device capable of playing voice, such as a smart speaker.

The method shown in fig. 1 specifically includes the following steps:

S110, acquiring a voice instruction sent to the intelligent equipment by a user.

The voice instruction may be an instruction for controlling the smart device to adjust its volume, and may contain speech that requests a volume adjustment. After the voice instruction is acquired in this step, it can be analyzed so that the volume can be adjusted.

For example, voice instructions include, but are not limited to: "please turn it up a bit", "please turn it down a bit", "a bit louder" and "the sound is too quiet".

The means for acquiring the voice command is not limited, and different methods may be executed by different execution bodies. When the execution main body of the method is the intelligent equipment, the step can acquire a voice instruction sent to the intelligent equipment by a user through a voice acquisition device of the intelligent equipment, such as a sound sensor; when the execution main body of the method is the cloud server, the cloud server can obtain the voice instruction reported by the intelligent equipment.

S120, performing semantic recognition on the voice command to determine a semantic command.

After the voice instruction is acquired, semantic recognition can be performed on it in this step. Semantic recognition may be understood as parsing the voice instruction to determine the natural-language meaning it expresses.

The specific means for semantic recognition is not limited here, as long as it can recognize the audio of the voice instruction and determine the corresponding natural-language meaning. The semantic instruction is the instruction converted into text or character form, so that deeper semantic recognition and intelligent control can be performed on it.

S130, determining volume adjustment strategies according to the semantic instruction, wherein the number of the volume adjustment strategies with the same adjustment trend is at least two.

In the prior art, an intelligent device is mechanical and inefficient in both command input and the resulting volume change. Usually, the adjustment trend (increase or decrease) is determined, and the volume is then adjusted by a fixed amplitude, much like operating a touch key. For example, "please turn it up a bit" turns the volume up by one step, and "please turn it down a bit" turns the volume down by one step. The volume adjustment amplitude corresponding to each instruction is essentially fixed, so if the volume needs to change by a large amount, the user has to issue the instruction many times.

In this embodiment, different semantic instructions may have different volume adjustment strategies, and the same semantic instruction may also have the same or different volume adjustment strategies. The volume multi-gear flexible control is realized by setting different volume adjustment strategies, and the defects that the existing intelligent equipment is mechanical and low in efficiency in instruction input and volume output change are effectively overcome.

The volume adjustment strategy can be understood as a means for adjusting the volume of the intelligent device. The volume adjustment strategy may include a volume adjustment amplitude and/or a target volume absolute value. The volume adjustment can be made to the smart device based on the volume adjustment magnitude and/or the target volume absolute value.

The same volume adjustment strategy may have the same volume adjustment amplitude and/or target volume absolute value. Different volume adjustment strategies may have different volume adjustment amplitudes or different target volume absolute values.

It should be noted that, in order to realize flexible adjustment of the volume in multiple steps, the number of volume adjustment strategies with the same adjustment trend is at least two. The same adjustment trend can be understood as the same direction of volume change for the smart device. For example, all strategies that turn the volume up share one adjustment trend, and all strategies that turn the volume down share another. More specifically, the number of volume adjustment strategies for turning the volume up may be at least two, and each may have a different volume adjustment amplitude and/or target volume absolute value.

On the basis of the technical scheme, the volume adjustment strategy comprises at least two volume adjustment amplitudes with the same adjustment trend; and/or the volume adjustment strategy comprises at least two target volume absolute values with the same adjustment trend.

The volume adjustment amplitude can be understood as the step size of the volume adjustment. For example, if the volume adjustment amplitude is 1 scale unit, the current volume is adjusted by 1 scale unit when the volume adjustment is performed. The scale unit may be a set volume change step size or a set percentage of the full volume range.

The target volume absolute value may be understood as the absolute volume value to which the smart device is to be adjusted. For example, if the target volume absolute value is 41 dB, the current volume is adjusted to 41 dB when the volume adjustment is performed.

In order to implement multi-level flexible adjustment of the volume, the volume adjustment strategy in this embodiment may include at least two volume adjustment amplitudes and/or at least two target volume absolute values with the same adjustment trend, so that for a given adjustment trend the smart device has at least two volume adjustment amplitudes and/or at least two target volume absolute values available for flexible volume control.
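One possible way to represent such strategies, and to check the "at least two strategies per adjustment trend" condition, is sketched below. The class name, field names and example values are assumptions made for illustration; the disclosure does not prescribe a data structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VolumeStrategy:
    """Hypothetical container for one volume adjustment strategy."""
    trend: str                          # "up" or "down"
    amplitude: Optional[float] = None   # relative step, in scale units
    target_abs: Optional[float] = None  # absolute target volume


# Hypothetical strategy table with several "gears" per trend.
STRATEGIES = [
    VolumeStrategy("up", amplitude=1),
    VolumeStrategy("up", amplitude=3),
    VolumeStrategy("up", target_abs=41),
    VolumeStrategy("down", amplitude=1),
    VolumeStrategy("down", amplitude=3),
]


def gears_per_trend(strategies):
    """Count how many strategies exist for each adjustment trend."""
    counts = {}
    for s in strategies:
        counts[s.trend] = counts.get(s.trend, 0) + 1
    return counts


def has_multiple_gears(strategies):
    """True when every trend that appears has at least two strategies."""
    counts = gears_per_trend(strategies)
    return bool(counts) and all(n >= 2 for n in counts.values())
```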

In this embodiment, the specific means for determining the volume adjustment strategy based on the semantic instruction is not limited. For example, the volume adjustment strategy corresponding to the semantic instruction may be looked up in a pre-stored mapping relationship. The mapping relationship may be a static set of relationships determined from a plurality of semantic instructions and volume adjustment strategies. The mapping relationship may be general or personalized, that is, determined for different users based on their respective semantic instructions and historical volume adjustment strategies. As another example, the volume adjustment strategy corresponding to the semantic instruction may be determined from a semantic instruction set. The semantic instruction set may be a dynamic set of relationships determined from a plurality of semantic instructions and volume adjustment strategies, and may be updated in real time based on the user's habits during volume control so as to better meet the user's volume adjustment needs. As a further example, when the volume adjustment strategy is determined according to the semantic instruction, it may be corrected in combination with the user's historical volume adjustment strategy, so as to gradually establish the correspondence between the user's semantic instructions and volume adjustment strategies. As yet another example, when the volume adjustment strategy is determined according to the semantic instruction, a more accurate strategy may be determined from multiple dimensions in combination with the state of the smart device and/or the state of the user.
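The mapping-based lookup, with a personalized mapping taking precedence over a general one, might look like the following sketch. The table contents, the user name and the rule that a personalized entry overrides the general entry are assumptions for illustration only.

```python
# Hypothetical general mapping: semantic instruction -> (trend, amplitude).
GENERAL_MAPPING = {
    "a bit louder": ("up", 1),
    "much louder": ("up", 3),
    "a bit quieter": ("down", 1),
    "much quieter": ("down", 3),
}

# Hypothetical per-user overrides built from each user's history.
PERSONAL_MAPPING = {
    "alice": {"a bit louder": ("up", 2)},
}


def lookup_strategy(semantic_instruction, user=None):
    """Return the strategy for an instruction, or None if it is unknown.

    Assumption: a personalized entry, when present, overrides the general one.
    """
    personal = PERSONAL_MAPPING.get(user, {})
    if semantic_instruction in personal:
        return personal[semantic_instruction]
    return GENERAL_MAPPING.get(semantic_instruction)
```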

It can be understood that, when the volume adjustment strategy is determined according to the semantic instruction, the volume control of the intelligent terminal can be performed based on the control information such as the state of the intelligent device, the state of the user and/or the user history volume adjustment strategy besides the semantic instruction. The control priority between the control information and the semantic instruction is not limited herein, and those skilled in the art can limit the control priority according to actual needs.

Optionally, when the control information conflicts with the semantic instruction, the semantic instruction may take priority.

S140, adjusting the volume of the intelligent device according to the volume adjusting strategy.

After the volume adjustment strategy is determined, the volume of the intelligent device can be adjusted based on the volume adjustment strategy. The specific adjustment means may be determined according to the specific content of the volume adjustment strategy. For example, this step may adjust the volume of the intelligent device according to the volume adjustment amplitude included in the volume adjustment strategy, or according to the target volume absolute value included in the volume adjustment strategy.
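As a rough sketch of this step, the function below applies either a relative amplitude or an absolute target to the current volume. The function name, the 0 to 100 volume range, the clamping, and the choice to let an absolute target override an amplitude when both are given are assumptions, not details from this disclosure.

```python
def apply_strategy(current, trend, amplitude=None, target_abs=None,
                   vol_min=0.0, vol_max=100.0):
    """Return the new volume after applying one volume adjustment strategy."""
    if target_abs is not None:
        # Assumption: an absolute target wins when both fields are present.
        new_volume = target_abs
    elif amplitude is not None:
        step = amplitude if trend == "up" else -amplitude
        new_volume = current + step
    else:
        raise ValueError("strategy needs an amplitude or a target value")
    # Keep the result inside the (assumed) supported range of the device.
    return max(vol_min, min(vol_max, new_volume))
```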

According to the technical scheme of the embodiment, semantic recognition is carried out on a voice instruction sent to the intelligent equipment by a user, and a corresponding semantic instruction is determined; then, determining volume adjustment strategies according to the semantic instruction, wherein the number of the volume adjustment strategies with the same adjustment trend is at least two; and finally, adjusting the volume of the intelligent equipment according to the volume adjustment strategy. The volume of the intelligent device can be flexibly controlled in multiple gears, the intelligent degree of volume adjustment based on voice instructions is enhanced, and the personalized and intelligent interactive experience requirements of users are met.

On the basis of the technical scheme, the execution main body of the method is a cloud server or intelligent equipment;

if the execution subject is intelligent equipment, performing semantic recognition on the voice instruction to determine a semantic instruction, and determining a volume adjustment strategy according to the semantic instruction comprises the following steps:

performing semantic recognition according to the voice instruction to determine a semantic instruction, and determining a volume adjustment strategy corresponding to the semantic instruction based on a locally stored mapping relation; or

sending the voice command to a cloud server for semantic recognition and determination of a volume adjustment strategy, and receiving the volume adjustment strategy fed back by the cloud server.

The volume adjustment method based on voice control in this embodiment may be executed by the cloud server, or may be executed by the smart device. When the execution subject of the method is the intelligent device, the operations of determining the semantic instruction and the volume adjustment strategy can be completed locally. For example, semantic recognition is performed on the voice instruction to determine the corresponding semantic instruction, and the volume adjustment strategy corresponding to the semantic instruction is then determined based on a locally stored mapping relation. Alternatively, the intelligent device can hand over the determination of the semantic instruction and of the volume adjustment strategy to the cloud server, so that the processing speed is increased. For example, the voice instruction is sent to the cloud server for semantic recognition and determination of the volume adjustment strategy, and the volume adjustment strategy fed back by the cloud server is then received so that the volume of the intelligent device can be adjusted based on it.
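The choice between local processing and offloading to the cloud server can be sketched as a simple dispatch. All names here are hypothetical, and the `cloud` callable merely stands in for a network round trip whose protocol this disclosure does not specify.

```python
def resolve_strategy(voice_instruction, recognize, local_mapping, cloud=None):
    """Return a volume adjustment strategy for a voice instruction.

    recognize:     hypothetical local recognizer (voice -> semantic text).
    local_mapping: hypothetical locally stored dict (semantic -> strategy).
    cloud:         hypothetical callable standing in for the cloud server
                   round trip (voice -> strategy), or None to stay local.
    """
    if cloud is not None:
        # Offload both semantic recognition and strategy determination.
        return cloud(voice_instruction)
    semantic = recognize(voice_instruction)
    return local_mapping.get(semantic)
```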

Example two

Fig. 2 is a flowchart of a volume adjustment method based on voice control according to a second embodiment of the present invention. In this embodiment, determining the volume adjustment strategy according to the semantic instruction specifically includes: matching the semantic instruction against the semantic instructions in semantic instruction sets, where there are at least two semantic instruction sets and each semantic instruction set corresponds to one volume adjustment strategy; and, if the matching is successful, determining the volume adjustment strategy corresponding to the matched semantic instruction set.

Further, after determining the corresponding volume adjustment policy from the semantic instruction set, the method further includes:

and correcting the determined volume adjustment strategy according to the historical volume adjustment strategy of the user. For those parts of this embodiment that are not explained in detail, please refer to embodiment one, and further description is omitted here.

The method shown in fig. 2 specifically includes the following steps:

S210, acquiring a voice instruction sent to the intelligent device by a user.

S220, performing semantic recognition on the voice command to determine a semantic command.

S230, matching with semantic instructions in a semantic instruction set according to the semantic instructions; the semantic instruction set comprises at least two types, and each type of semantic instruction set corresponds to one volume adjustment strategy.

When the volume adjustment strategy is determined according to the semantic instruction, the volume adjustment strategy can be determined through a semantic instruction set. A semantic instruction set may be understood as a set of semantic instructions corresponding to a volume adjustment strategy. The semantic instruction set may include a semantic instruction corresponding to one volume adjustment policy and a corresponding volume adjustment policy.

Specifically, after the semantic instruction is determined, the semantic instruction may be matched with the semantic instruction included in the semantic instruction set, and the semantic instruction set matched with the semantic instruction may be searched. And then determining a corresponding volume adjustment strategy according to the corresponding relation between the semantic instruction set and the volume adjustment strategy. The matching means includes but is not limited to: similarity matching and/or keyword matching.

It should be noted that, in this embodiment, at least two semantic instruction sets are included, and each semantic instruction set corresponds to one volume adjustment policy, so that multi-level flexible control of the volume of the smart device is implemented. The semantic instruction set may be determined based on a plurality of semantic instructions and a volume adjustment policy. Specifically, a semantic instruction corresponding to a volume adjustment policy among the plurality of semantic instructions may be added to one semantic instruction set.

Further, according to the semantic instruction, matching with a semantic instruction in a semantic instruction set includes:

according to the semantic instruction, similarity matching or keyword matching is carried out in the semantic instruction included in the semantic instruction set;

and if the matching result reaches the set condition, determining that the semantic instruction is successfully matched with the semantic instruction set.

When semantic instruction matching is performed, similarity matching or keyword matching may be performed on the semantic instruction and the semantic instruction included in the semantic instruction set.

When similarity matching is carried out, the similarity between the semantic instruction and a preset number of target semantic instructions in the semantic instruction set can be calculated. The preset number may be at least one. For similarity matching, the set condition may be that the similarity is greater than a similarity threshold. If the similarities calculated for the preset number of target semantic instructions are greater than the similarity threshold, the semantic instruction is determined to be successfully matched with the semantic instruction set.

When comparing the keywords, the semantic instruction and the keywords of the preset number of target semantic instructions in the semantic instruction set can be extracted. When performing keyword matching, the setting condition may be that the keywords are the same. If the keywords of the semantic instruction are the same as the keywords of any one of the preset number of target semantic instructions, it can be determined that the semantic instruction is successfully matched with the semantic instruction set.
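The two matching modes described above can be sketched as follows. The similarity measure is not specified in this disclosure, so the standard-library `difflib.SequenceMatcher` ratio is used purely as a stand-in; the threshold of 0.8, the whitespace-based keyword extraction, the vocabulary argument and a preset number of one target are likewise assumptions.

```python
from difflib import SequenceMatcher


def similarity_match(instruction, instruction_set, threshold=0.8):
    """True if the instruction is similar enough to at least one target.

    Assumes a preset number of one: a single target above the threshold
    is enough for the set to match.
    """
    return any(
        SequenceMatcher(None, instruction, target).ratio() > threshold
        for target in instruction_set
    )


def extract_keywords(instruction, vocabulary):
    """Keep only the words that belong to a (hypothetical) keyword vocabulary."""
    return {word for word in instruction.lower().split() if word in vocabulary}


def keyword_match(instruction, instruction_set, vocabulary):
    """True if the instruction's keywords equal those of any target."""
    own = extract_keywords(instruction, vocabulary)
    return bool(own) and any(
        own == extract_keywords(target, vocabulary) for target in instruction_set
    )
```

Keyword matching tolerates rewording around the keyword, while similarity matching tolerates small variations of the whole phrase; an implementation could apply either or both.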

Further, the semantic instruction set is determined according to semantic instructions and volume adjustment strategies of a plurality of users; or the semantic instruction set is determined according to the semantic instruction and the historical volume adjustment strategy of the single user.

The semantic instruction set in this embodiment may be determined according to a plurality of semantic instructions and the corresponding volume adjustment strategies. The plurality of semantic instructions may be semantic instructions of a single user or of a plurality of users. The single user may be the user who is currently making volume adjustments.

Specifically, semantic instructions of multiple users or a single user corresponding to the same volume adjustment strategy can be added to one semantic instruction set, so that clustering of the multiple semantic instructions is realized.

S240, if the matching is successful, determining a volume adjustment strategy corresponding to the semantic instruction set.

If the semantic instruction matches a semantic instruction included in a semantic instruction set, the matching may be considered successful. After the matching is successful, this step may determine the volume adjustment strategy corresponding to that semantic instruction set based on the correspondence between semantic instruction sets and volume adjustment strategies. Here, the semantic instruction set is the one whose semantic instructions match the semantic instruction determined from the voice instruction.

If the matching is unsuccessful, a key instruction or a touch instruction of the user can be received, the volume adjustment strategy corresponding to the key instruction or touch instruction can be determined, and that volume adjustment strategy and the semantic instruction can be added to a semantic instruction set. Alternatively, the semantic instruction set with the highest matching degree with the semantic instruction can be selected from the semantic instruction sets, and the volume adjustment strategy corresponding to that set can be determined as the volume adjustment strategy corresponding to the semantic instruction.
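Handling of an unsuccessful match could be sketched as below. The set layout (a list of dictionaries), the use of `difflib.SequenceMatcher` as the matching degree, and the `manual_strategy` argument representing a strategy derived from a key or touch instruction are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def match_degree(instruction, members):
    """Best string similarity between the instruction and any set member."""
    return max(SequenceMatcher(None, instruction, m).ratio() for m in members)


def fallback_strategy(instruction, instruction_sets, manual_strategy=None):
    """Pick a strategy for an instruction that matched no set.

    instruction_sets: list of {"instructions": [...], "strategy": ...}.
    manual_strategy:  strategy derived from a key or touch instruction, if any.
    """
    if manual_strategy is not None:
        # Learn the new phrase: attach it to the set with the same strategy,
        # or open a new set if no such set exists yet.
        for entry in instruction_sets:
            if entry["strategy"] == manual_strategy:
                entry["instructions"].append(instruction)
                break
        else:
            instruction_sets.append(
                {"instructions": [instruction], "strategy": manual_strategy})
        return manual_strategy
    # Otherwise fall back to the set with the highest matching degree.
    best = max(instruction_sets,
               key=lambda e: match_degree(instruction, e["instructions"]))
    return best["strategy"]
```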

S250, correcting the determined volume adjustment strategy according to the historical volume adjustment strategy of the user.

After the volume adjustment strategy is determined, a historical volume adjustment strategy of the user can be obtained in the step, and the historical volume adjustment strategy can be a historically determined volume adjustment strategy corresponding to the semantic instruction.

When the determined volume adjustment strategy is modified, the volume adjustment strategy determined in the semantic instruction set can be directly replaced by a historical volume adjustment strategy, so that personalized updating of the semantic instruction set is realized.

S260, adjusting the volume of the intelligent equipment according to the volume adjusting strategy.

In this embodiment, when the volume adjustment policy is determined based on the semantic instruction, the semantic instruction may be matched with the semantic instruction included in the semantic instruction set. And if the matching is successful, determining the volume adjustment strategy corresponding to the successfully matched semantic instruction set. And then, according to the historical volume adjustment strategy of the user, correcting the determined volume adjustment strategy, and adjusting the intelligent equipment according to the corrected volume adjustment strategy. On the basis of realizing multi-gear adjustment of the volume, the volume control can better meet the requirement of a user on volume adjustment, and the requirement of more efficient and personalized volume control experience is met.

On the basis of the technical scheme, the method further comprises the following steps:

collecting at least two semantic instructions input by a user within a set control duration, together with the starting volume and the ending volume of the voice control executed within the set control duration;

determining a corresponding volume adjustment strategy according to the starting volume and the ending volume;

and adding the corresponding volume adjustment strategy and the semantic instruction into a semantic instruction set.

It can be understood that, in the process of controlling the volume of the intelligent device by the user, the semantic instruction set can be updated, so that the updated semantic instruction set better meets the requirements of the user.

The set control duration may be a preset time period for performing volume control. If the user inputs at least two semantic instructions within the set control duration, it can be considered that the volume adjustment strategies determined for those semantic instructions from the current semantic instruction set cannot meet the user's volume control needs. This embodiment may then update the semantic instruction set based on the current voice control.

Specifically, at least two semantic instructions input by the user within the set control duration are collected, together with the starting volume and the ending volume of the voice control executed within that duration. The volumes at the start time point and the end time point of the set control duration are taken as the starting volume and the ending volume, respectively. That is, the starting volume may be the volume before any voice control is performed within the set control duration, and the ending volume may be the volume after voice control has been performed within the set control duration. The at least two volume controls performed within the set control duration may be regarded as one voice control.

After the starting volume and the ending volume are determined, a corresponding volume adjustment strategy may be determined based on the difference between them. For example, the difference may be used directly as the volume adjustment amplitude corresponding to the current voice control, or the ending volume may be used directly as the target volume absolute value.

After the corresponding volume adjustment strategy is determined, the corresponding volume adjustment strategy and the at least two semantic instructions may be added to the semantic instruction set. Different semantic instructions may be added in different ways. If the at least two semantic instructions are the same, the semantic instruction and the volume adjustment strategy are added directly to a semantic instruction set. If the at least two semantic instructions are different, semantic analysis is performed on them to determine a corrected semantic instruction. For example, if the first semantic instruction is "turn the volume up" and the second is "turn it down a bit", the corrected semantic instruction is "turn the volume up a bit". After the corrected semantic instruction is determined, the corrected semantic instruction and the corresponding volume adjustment strategy are added to a semantic instruction set.
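Deriving a strategy from one control window might be sketched as follows. The event layout, the function name and the decision to skip windows with no net volume change are assumptions; deriving a corrected instruction from differing instructions requires semantic analysis and is left to the caller here.

```python
def learn_from_window(events, instruction_sets):
    """Derive one strategy from the instructions issued in one control window.

    events: list of (semantic_instruction, volume_before, volume_after)
            tuples, in time order, all within the set control duration.
    instruction_sets: list that receives {"instructions", "strategy"} entries.
    """
    if len(events) < 2:
        return None                      # a single instruction was sufficient
    start_volume = events[0][1]          # volume before the first control
    end_volume = events[-1][2]           # volume after the last control
    diff = end_volume - start_volume
    if diff == 0:
        return None                      # assumption: nothing to learn
    strategy = {
        "trend": "up" if diff > 0 else "down",
        "amplitude": abs(diff),          # difference used as the amplitude
        "target_abs": end_volume,        # or ending volume used as the target
    }
    distinct = sorted({instruction for instruction, _, _ in events})
    if len(distinct) == 1:
        # Same instruction repeated: add it with the combined strategy.
        instruction_sets.append({"instructions": distinct, "strategy": strategy})
    return strategy
```

For instance, two consecutive "louder" instructions that together move the volume from 20 to 30 would be recorded as a single "louder" strategy with an amplitude of 10.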

The semantic instruction set may be general or personalized. The general semantic instruction set is established by collecting the semantic instructions of a large number of users together with their volume adjustment strategies, and clustering and associating them. The general semantic instruction set may be used preferentially when a user uses the smart device for the first time. Each user also has his or her own habits of semantic expression, so if the collected historical data of a user proves inconsistent with the general semantic instruction set, a personalized semantic instruction set can be built up for that user step by step from the historical data, providing intelligent control dedicated to that user. The process of establishing the semantic instruction set is described in detail in later embodiments.

EXAMPLE III

Fig. 3 is a flowchart of a volume adjustment method based on voice control according to a third embodiment of the present invention, which is refined on the basis of the above embodiments. In this embodiment, the method further includes: determining the state of the intelligent device and/or the state of the user;

correspondingly, the determining the volume adjustment strategy according to the semantic instruction specifically includes:

and determining a volume adjustment strategy according to the semantic instruction, the state of the intelligent device and/or the state of the user. For content not detailed in this embodiment, refer to the previous embodiments; it is not repeated here.

The method as shown in fig. 3 specifically includes the following steps:

S310, acquiring a voice instruction sent to the intelligent device by a user.

S320, performing semantic recognition on the voice command to determine a semantic command.

S330, determining the state of the intelligent device and/or the state of the user.

When the volume is controlled, the volume adjustment strategy can be determined according to the state of the intelligent device and/or the state of the user.

The state of the intelligent device can be understood as the current playing scene of the intelligent device. The state of the user can be understood as personal information of the user, such as the user's age, mood, and the like. Different states of the intelligent device or of the user may be determined by different means. The state of the intelligent device can be obtained through dynamic identification. The state of the user can be collected in advance through registration or obtained through dynamic identification.

Further, determining the state of the intelligent device includes at least one of:

determining a distance between the smart device and a user;

determining the noise state of the environment where the intelligent device is located;

determining an application scene to which the current playing content of the intelligent equipment belongs;

and determining the time-period scene to which the current playback of the intelligent device belongs.

The distance between the intelligent device and the user can be determined by a sensor, sound source localization, or the like. The noise state of the environment where the intelligent device is located can be determined by collecting and analyzing the ambient sound. The noise state may include a noise type and/or a noise level. The application scene to which the currently played content belongs can be determined from that content, since different content corresponds to different application scenes; application scenes include playing rock music, playing children's stories, playing news, and the like. The time-period scene can be understood as the time period in which the current playback takes place, such as late night, daytime, working time, or entertainment time, and can be determined from the time of the current playback.
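As a small illustration of determining the time-period scene from the time of the current playback, consider the sketch below. The function name period_scene and all hour boundaries are assumptions for the example only; the embodiment names the scenes but does not define where one ends and the next begins.

```python
from datetime import time


def period_scene(now: time) -> str:
    """Classify the playback time into a time-period scene.

    The boundaries below are illustrative assumptions, not values taken
    from the embodiment.
    """
    if now >= time(23, 0) or now < time(6, 0):
        return "late_night"
    if time(9, 0) <= now < time(18, 0):
        return "working_time"
    if time(18, 0) <= now < time(23, 0):
        return "entertainment_time"
    return "day"  # remaining daytime hours, here 06:00 to 09:00
```

The resulting label can then be used, together with distance, noise state, and application scene, as one component of the state of the intelligent device.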

Further, determining the state of the user comprises: determining the age of the user.

Users of different ages may differ in how they receive audio. This embodiment can take the user's age into account when determining the volume adjustment strategy, so that the volume control of the intelligent device better matches the user's needs.

S340, determining a volume adjustment strategy according to the semantic instruction, the state of the intelligent device and/or the state of the user.

The step can determine the volume adjustment strategy by combining the state of the intelligent equipment and/or the state of the user on the basis of the semantic instruction.

When the volume adjustment strategy is determined, a volume comfort range can first be determined according to the state of the intelligent device and/or the state of the user; the volume comfort range can be understood as the range of volume that is comfortable for that state. After the volume comfort range is determined, a volume adjustment strategy that keeps the adjusted volume within the volume comfort range can be determined based on the semantic instruction. Alternatively, the volume adjustment strategy can be determined directly from a predetermined correspondence among the semantic instruction, the state of the intelligent device and/or the state of the user, and the volume adjustment strategy.

Further, determining a volume adjustment policy according to the semantic instruction, the state of the smart device and/or the state of the user comprises:

determining a volume comfort range based on a preset corresponding relation according to the state of the intelligent equipment and/or the state of the user;

and determining a volume adjustment strategy according to the semantic instruction, wherein the volume adjustment strategy enables the adjusted volume to be within the volume comfort range.

The preset correspondence can be understood as the correspondence used to determine the volume comfort range. When the volume adjustment strategy is determined, the volume comfort range corresponding to the state of the intelligent device and/or the state of the user is first determined based on the preset correspondence; the volume comfort range bounds the volume adjustment. Then, based on the semantic instruction, a volume adjustment strategy is determined that corresponds to both the semantic instruction and the volume comfort range, so that the adjusted volume falls within the volume comfort range.
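A minimal sketch of this two-step determination is given below, assuming the preset correspondence is a lookup table keyed by a coarse device state and that the strategy is a signed amplitude. The table contents, the key structure, and the names COMFORT_RANGES, comfort_range, and adjusted_volume are placeholders for illustration and are not values from the embodiment.

```python
from typing import Dict, Tuple

# Hypothetical preset correspondence: (distance band, time-period scene)
# mapped to a (min, max) comfortable volume. All numbers are placeholders.
COMFORT_RANGES: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("near", "late_night"): (10, 30),
    ("near", "day"): (20, 50),
    ("far", "day"): (40, 80),
}
DEFAULT_RANGE: Tuple[int, int] = (0, 100)  # used when no entry matches


def comfort_range(distance_band: str, period: str) -> Tuple[int, int]:
    """Look up the volume comfort range for the current state."""
    return COMFORT_RANGES.get((distance_band, period), DEFAULT_RANGE)


def adjusted_volume(current: int, amplitude: int,
                    allowed: Tuple[int, int]) -> int:
    """Apply the amplitude, keeping the result inside the comfort range."""
    low, high = allowed
    return max(low, min(high, current + amplitude))
```

Under these placeholder values, a request to raise the volume by 20 from 25 while the user is near the device late at night would stop at 30, the upper bound of the comfort range for that state.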

Further, the preset corresponding relation is determined according to the corresponding relation between the states of the plurality of intelligent devices and/or the states of the users and the expected volume range; or the preset corresponding relation is determined according to the corresponding relation between the state of the intelligent equipment owned by the single user and/or the state of the single user and the historical expected volume range.

The expected volume range can be understood as the volume range expected for the state of the intelligent device and/or the state of the user. The historical expected volume range is the volume range that a single user has historically expected for the state of the intelligent device and/or the state of that user.

The preset correspondence determined based on the correspondence of the states of the plurality of smart devices and/or the state of the user with the expected volume range may have generality. The preset corresponding relation is determined based on the corresponding relation between the state of the intelligent device owned by the single user and/or the state of the single user and the historical expected volume range, so that the requirements of the user can be better met, and personalized volume adjustment is realized.

S350, adjusting the volume of the intelligent device according to the volume adjustment strategy.

In this embodiment, when the volume adjustment strategy is determined based on the semantic instruction, the state of the intelligent device and/or the state of the user may first be determined, and the volume adjustment strategy is then determined according to the semantic instruction, the state of the intelligent device, and/or the state of the user. By combining these states when controlling the volume of the intelligent device, the volume can be controlled more effectively on top of multi-gear adjustment, and the demands that the state of the intelligent device and/or the state of the user place on the volume are met, providing a more efficient and personalized volume control experience.

EXAMPLE four

The embodiments of the present invention provide several specific implementation manners based on the technical solutions of the above embodiments.

At present, the volume adjustment lacks understanding of user intention, interaction environment and user habits, and cannot provide efficient, personalized and intelligent interaction experience meeting the user requirements.

In this embodiment, personalized volume adjustment can be performed on the intelligent device. Personalized adjustment and output of the volume is realized based on the state of the intelligent device, the semantic instruction corresponding to the voice instruction, historical volume adjustment strategies, and other information. That is, the intelligent device comprehensively judges the user's volume adjustment requirement by acquiring, analyzing, and integrating the interaction distance, a deep understanding of the instruction input by the user, and the volume adjustment history data, and provides a personalized volume adjustment experience. The intelligent device may be a device running the DuerOS system.

Specifically, the most comfortable volume range, i.e., the volume comfort range, is first determined from the interaction distance: the interaction distance between the user and the intelligent device is obtained through a sensor of the smart speaker, such as a camera or an infrared device, or through sound source localization; the most comfortable volume range for the user at the current interaction distance, and the corresponding sound output power of the intelligent device, are then determined from that distance.

Fig. 4a is a schematic diagram of comfortable volume curves output by the intelligent device at different distances, applicable to the embodiment of the present invention. Taking a smart speaker as an example, fig. 4a shows the more common interaction distances between the user and the device in a smart home scene and the corresponding optimal output volume of the device. The data in fig. 4a may be experimental data from a volume adjustment experience study of intelligent devices conducted with 30 users. The abscissa of fig. 4a is the output volume level in decibels. The ordinate indicates the users who consider the current volume comfortable.

Next, user requirements are mined through deep analysis of the user's adjustment instructions. The voice instructions with which users adjust the volume are obtained and subjected to deep semantic analysis. Taking turn-up/turn-down instructions as an example (accounting for 80% of all voice adjustment instructions), the semantic differences among different instructions and the corresponding specific demands of the user are extracted. The user's voice adjustment instructions can be divided into three categories, each with a suggested volume change amplitude. Fig. 4b is a schematic diagram of the clustering of users' voice volume adjustment instructions according to the fourth embodiment of the present invention. Referring to fig. 4b, the clustering result may be obtained by clustering test data from a volume adjustment experience study of intelligent devices conducted with 30 users, and may include category 1, category 2, and category 3. Table 1 gives the suggested volume change amplitudes for the different instruction categories. Fig. 4b collects the instruction sets of a large number of users, and corresponding adjustment amplitudes are set according to the values users expect, as shown in table 1.

TABLE 1 volume change amplitude suggestion table under different instruction categories

Referring to table 1, the stride changes for the different categories are listed in table 1.

Finally, a suitable volume preference is matched based on user habits, through analysis of the user's previous volume interaction data. The user habit data includes: 1) the habitual volume of the device together with time information, such as the volume settings of the device in different time periods; 2) the start volume and end volume set when the user adjusts the volume by voice; 3) the set of instructions the user uses to adjust the volume by voice; 4) volume settings in different scenes, such as the volume settings when the user consumes different content, such as music, encyclopedia entries, news, etc. The user's historical information is then used to match a suitable volume level.

This embodiment provides a personalized volume adjustment strategy and method based on the interaction distance between the user and the intelligent device, deep analysis of volume adjustment instructions, and the user's volume adjustment history. Fig. 4c is a schematic diagram of an interaction scenario provided by the fourth embodiment of the present invention. Referring to fig. 4c, a user inputs a voice adjustment instruction (i.e., a voice instruction) to the intelligent device, and the intelligent device may send the volume adjustment instruction to the cloud (i.e., a cloud server), which performs semantic difference analysis on it. The cloud feeds back a corresponding volume change value (i.e., a volume adjustment strategy), and the intelligent device produces differentiated volume output based on that value.

For voice-based volume adjustment on intelligent devices, this embodiment performs further semantic analysis on turn-up/turn-down instructions, which currently make up the largest share of instructions (nearly 80%), and combines information such as the interaction distance between the user and the intelligent device and the user's volume adjustment history, providing the user with a more personalized and more efficient volume adjustment experience. The method can effectively address the experience problems of adjusting the volume of existing intelligent devices by voice.

More specifically, the method of this embodiment includes:

receiving a voice instruction sent to the intelligent equipment by a user;

determining a correspondence between the voice instruction and a volume adjustment amplitude according to the interaction distance between the intelligent device and the person, and a deep understanding of the instruction input by the user and/or volume adjustment history data; wherein the volume adjustment amplitude comprises a plurality of amplitudes;

and adjusting the volume of the intelligent equipment according to the volume adjustment amplitude.

In the above scheme, the interaction distance between the intelligent device and the person belongs to the playing scene, and the playing scene may further include: the current time scene, such as late night, daytime, working hours, or entertainment time; and the current application scene, such as playing rock music or playing children's stories.

The above can be determined by investigation or by statistical analysis of a large number of user history data.

The interaction distance between the user and the intelligent device can be obtained through a sensor of the smart speaker, such as a camera or an infrared device, or through sound source localization; the most comfortable volume range for the user at the current interaction distance, and the corresponding sound output power of the intelligent device, are then determined from that distance.

In the above scheme, the user's reception of audio may also be considered, including:

the age of the user, which mainly refers to personal, static information about the user and can be collected in advance through registration or the like, or identified dynamically.

In the above scheme, the semantic analysis of the volume adjustment instruction may be understood as establishing general, initial semantics, including:

acquiring, through big data collection, the voice instructions that most users habitually input and the corresponding expected volume adjustment amplitudes, and establishing a mapping between them. The mapping may be determined by manual investigation or by statistics on the historical habits of a large number of users.

If the voice instruction input by the user is not within the instruction set, keyword matching and similarity recognition are performed. For example, an utterance such as "the sound is too low" may be matched in this way.
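The fallback to keyword matching and similarity recognition might look like the sketch below. It uses the standard-library difflib.SequenceMatcher as a stand-in similarity measure; the embodiment does not say which measure is used. The instruction phrases, the keyword table, the amplitudes, and the 0.6 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional

# Hypothetical general instruction set: phrase -> signed amplitude.
INSTRUCTION_SET: Dict[str, int] = {
    "turn up the volume": 10,
    "turn up the volume a little": 5,
    "turn down the volume": -10,
}
# Hypothetical keywords pointing at an instruction in the set.
KEYWORDS: Dict[str, str] = {
    "too quiet": "turn up the volume",
    "too loud": "turn down the volume",
}


def match_instruction(utterance: str,
                      instruction_set: Dict[str, int] = INSTRUCTION_SET,
                      keywords: Dict[str, str] = KEYWORDS,
                      threshold: float = 0.6) -> Optional[str]:
    """Return the matching instruction phrase, or None if nothing fits."""
    text = utterance.strip().lower()
    if text in instruction_set:            # exact hit in the instruction set
        return text
    for keyword, target in keywords.items():
        if keyword in text:                # keyword matching
            return target
    best, best_score = None, 0.0
    for known in instruction_set:          # similarity recognition
        score = SequenceMatcher(None, text, known).ratio()
        if score > best_score:
            best, best_score = known, score
    return best if best_score >= threshold else None
```

An utterance that matches nothing returns None, which leaves the caller free to ignore it or to record it as a candidate new instruction.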

In the foregoing scheme, the method may further include:

if the difference between collected instructions and the instructions in the instruction set reaches a preset value, and the number of such collected instructions reaches a certain value, the collected instructions can be added to the instruction set as new instructions. They can serve as personalized instructions of the user or be added to the general instruction set. English instructions are one example.
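A rough sketch of this promotion rule follows. The class name InstructionCollector, the use of difflib similarity to measure "difference", and the default thresholds are assumptions; associating the new instruction with a volume adjustment strategy is left out for brevity.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Set


class InstructionCollector:
    """Counts unfamiliar utterances and promotes frequent ones."""

    def __init__(self, instruction_set: Set[str],
                 min_difference: float = 0.5, min_count: int = 3) -> None:
        self.instruction_set = instruction_set
        self.min_difference = min_difference  # assumed "preset value"
        self.min_count = min_count            # assumed "certain value"
        self.counts: Counter = Counter()

    def observe(self, utterance: str) -> bool:
        """Record one utterance; return True when it is promoted."""
        text = utterance.strip().lower()
        if text in self.instruction_set:
            return False
        similarity = max(
            (SequenceMatcher(None, text, known).ratio()
             for known in self.instruction_set),
            default=0.0,
        )
        if 1.0 - similarity < self.min_difference:
            return False  # close to a known instruction, not a new one
        self.counts[text] += 1
        if self.counts[text] >= self.min_count:
            self.instruction_set.add(text)   # add as a new instruction
            del self.counts[text]
            return True
        return False
```

For example, if the set only contains Chinese phrases such as "调高音量", an English phrase like "volume up" differs completely from every entry and would be added once it has been observed three times under these default thresholds.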

In the above scheme, the volume adjustment history data refers to the user's personal habits in historical voice instructions and is used to correct the general instruction semantics. It specifically includes:

1) the device volume habitually used by the user together with time information, such as the volume settings of the device in different time periods;

2) the start volume and end volume set when the user adjusts the volume by voice; for example, if three instructions to increase the volume are given in succession within a set short time, this is regarded as one volume adjustment process, and the volumes at the start and end of that short time are the start volume and end volume;

3) the set of instructions the user uses to adjust the volume by voice, including new user instructions;

4) volume settings in different scenes, such as the volume settings when the user consumes different content, such as music, encyclopedia entries, news, etc.

The user's history information is used to match the appropriate volume level.
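Item 2) of the list above, treating instructions given in quick succession as one volume adjustment process, can be sketched as follows. The record type AdjustEvent, the grouping by gap between consecutive instructions, and the 10-second window are assumptions; the embodiment only speaks of "a set short time".

```python
from typing import List, NamedTuple, Tuple


class AdjustEvent(NamedTuple):
    """Hypothetical log record of one voice volume instruction."""
    timestamp: float      # seconds
    volume_before: int
    volume_after: int


def adjustment_sessions(events: List[AdjustEvent],
                        window: float = 10.0) -> List[Tuple[int, int]]:
    """Group events into sessions and return (start, end) volume pairs."""
    sessions: List[Tuple[int, int]] = []
    current: List[AdjustEvent] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        # A gap longer than the window closes the current session.
        if current and event.timestamp - current[-1].timestamp > window:
            sessions.append((current[0].volume_before,
                             current[-1].volume_after))
            current = []
        current.append(event)
    if current:
        sessions.append((current[0].volume_before, current[-1].volume_after))
    return sessions
```

Three turn-up instructions issued a few seconds apart (20 to 30, 30 to 40, 40 to 50) would therefore be recorded as a single process with start volume 20 and end volume 50.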

The whole process of a specific preferred embodiment of the invention can be as follows:

when the intelligent device has just been adopted by a new user and no historical data for that user exists, the corresponding adjustment amplitude is determined, upon receiving a voice instruction, based on the distance between the device and the person and/or the general instruction set;

the user's historical data is gradually collected, and a correspondence between the user's personal instruction set and volume adjustment amplitudes, or between the personal instruction set and absolute volumes, is established until it converges stably. This recognition and correction process can be computed by a cloud server; once determined, the result is stored on the intelligent device and can be used by it directly, and the cloud server is triggered to recalculate the rule only when a mismatch occurs.
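The transition from the general instruction set to a converged personal mapping could be sketched as below. The convergence test (a minimum number of samples and a bounded population standard deviation) and all names and thresholds are assumptions for illustration; the embodiment only states that the correspondence is built up "until stable convergence".

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional


def resolve_amplitude(instruction: str,
                      general_set: Dict[str, int],
                      history: Dict[str, List[int]],
                      min_samples: int = 5,
                      max_spread: float = 2.0) -> Optional[int]:
    """Prefer the user's own converged amplitude, else the general one.

    history maps an instruction to the amplitudes the user actually ended
    up with in past adjustment processes.
    """
    samples = history.get(instruction, [])
    if len(samples) >= min_samples and pstdev(samples) <= max_spread:
        # Enough consistent personal data: use the personal amplitude.
        return round(mean(samples))
    # New user or unstable history: fall back to the general instruction set.
    return general_set.get(instruction)
```

A new user therefore receives the general amplitude, while a user whose past adjustments for the same instruction cluster tightly around 15 would receive 15.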

The technical scheme of the embodiment of the invention can flexibly recognize volume instructions and determine the corresponding target of the adjustment, which may be one of several target volume absolute values or volume adjustment amplitudes. After the user inputs a voice instruction, different volume adjustment amplitudes or target volume absolute values can be determined based on different factors, so that the user's adjustment requirement is met in one step as far as possible.

EXAMPLE five

Fig. 5 is a schematic structural diagram of a volume adjustment device based on voice control according to a fifth embodiment of the present invention, where the volume adjustment device is implemented by software and/or hardware, and is specifically configured in a computer device, and is used to implement volume adjustment based on voice control.

As shown in fig. 5, the apparatus provided in this embodiment is applicable to adjusting the volume based on voice control, and specifically to the case where a user flexibly controls the volume of the intelligent device in multiple gears through voice instructions. The apparatus specifically includes: an obtaining module 510, a recognition module 520, a determining module 530, and a control module 540.

The obtaining module 510 is configured to obtain a voice instruction sent by a user to the smart device;

a recognition module 520, configured to perform semantic recognition on the voice command to determine a semantic command;

a determining module 530, configured to determine volume adjustment strategies according to the semantic instruction, where the number of volume adjustment strategies with the same adjustment trend is at least two;

and the control module 540 is configured to adjust the volume of the smart device according to the volume adjustment policy.

In the embodiment, semantic recognition is carried out on a voice instruction sent to the intelligent equipment by a user, and a corresponding semantic instruction is determined; then, determining volume adjustment strategies according to the semantic instruction, wherein the number of the volume adjustment strategies with the same adjustment trend is at least two; and finally, adjusting the volume of the intelligent equipment according to the volume adjustment strategy. The volume of the intelligent device can be flexibly controlled in multiple gears, the intelligent degree of volume adjustment based on voice instructions is enhanced, and the personalized and intelligent interactive experience requirements of users are met.
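The cooperation of the four modules can be pictured with the minimal sketch below. The class VolumeAdjuster, the text-based stand-in for speech recognition, the 0 to 100 volume scale, and the amplitudes are assumptions for illustration, not the apparatus as claimed.

```python
from typing import Callable, Dict


class VolumeAdjuster:
    """Toy pipeline mirroring modules 510 to 540."""

    def __init__(self, recognize: Callable[[str], str],
                 strategies: Dict[str, int], volume: int = 50) -> None:
        self.recognize = recognize    # stands in for recognition module 520
        self.strategies = strategies  # lookup used by determining module 530
        self.volume = volume

    def handle(self, voice_instruction: str) -> int:
        # Obtaining module 510: the instruction arrives as the argument.
        semantic = self.recognize(voice_instruction)
        # Determining module 530: pick the strategy for this instruction.
        amplitude = self.strategies.get(semantic)
        if amplitude is not None:
            # Control module 540: apply the strategy within 0..100.
            self.volume = max(0, min(100, self.volume + amplitude))
        return self.volume


# Two strategies share the same upward trend, as the embodiment requires.
adjuster = VolumeAdjuster(
    recognize=lambda text: text.strip().lower(),
    strategies={
        "turn up the volume": 10,
        "turn up the volume a little": 5,
        "turn down the volume": -10,
    },
)
```

Starting from a volume of 50, "Turn up the volume" raises it to 60 and a following "turn up the volume a little" raises it to 65, showing two gears for the same adjustment trend.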

Further, in the determining module 530, the volume adjustment strategy includes at least two volume adjustment magnitudes with the same adjustment trend; and/or the volume adjustment strategy comprises at least two target volume absolute values with the same adjustment trend.

Further, the determining module 530 includes:

the matching unit is used for matching with the semantic instruction in the semantic instruction set according to the semantic instruction; the semantic instruction sets comprise at least two types, and each type of semantic instruction set corresponds to a volume adjustment strategy;

and the determining unit is used for determining the volume adjustment strategy corresponding to the semantic instruction set when the matching is successful.

Further, the matching unit is specifically configured to:

Further, the apparatus further comprises: and the correction module is used for correcting the determined volume adjustment strategy according to the historical volume adjustment strategy of the user after determining the corresponding volume adjustment strategy from the semantic instruction set according to the semantic instruction.

Further, the semantic instruction set in the matching unit is determined according to the semantic instructions and the volume adjustment strategies of a plurality of users; or the semantic instruction set is determined according to the semantic instruction and the historical volume adjustment strategy of the single user.

Further, the apparatus further comprises: an add module to:

Further, the apparatus further comprises: a state determination module, configured to:

determine the state of the intelligent device and/or the state of the user;

correspondingly, the determining module 530 is specifically configured to:

and determining a volume adjustment strategy according to the semantic instruction, the state of the intelligent equipment and/or the state of the user.

Further, determining the state of the intelligent device by the state determination module includes at least one of:

determining a distance between the smart device and a user;

Further, determining the state of the user by the state determination module includes: determining the age of the user.

Further, the determining module 530 is specifically configured to:

Further, the preset corresponding relationship in the determining module 530 is determined according to the corresponding relationship between the states of the plurality of intelligent devices and/or the state of the user and the expected volume range; or the preset corresponding relation is determined according to the corresponding relation between the state of the intelligent equipment owned by the single user and/or the state of the single user and the historical expected volume range.

Further, the device is integrated in a cloud server or a smart device;

if integrated in a smart device, the recognition module 520 and the determination module 530 are specifically configured to:

The volume adjusting device based on voice control provided by the embodiment of the invention can execute the volume adjusting method based on voice control provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the executing method.

EXAMPLE six

Fig. 6 is a schematic structural diagram of a computer device in the sixth embodiment of the present invention. The computer device 12 shown in FIG. 6 is only an example and should not bring any limitations to the functionality or scope of use of embodiments of the present invention.

As shown in FIG. 6, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.

Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, and commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.

A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.

Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown in FIG. 6, the network adapter 20 communicates with the other modules of the computer device 12 via the bus 18. It should be appreciated that although not shown in FIG. 6, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

The processing unit 16 executes various functional applications and data processing, such as implementing a volume adjustment method based on voice control provided by an embodiment of the present invention, by running a program stored in the system memory 28. That is, the processing unit implements, when executing the program:

acquiring a voice instruction sent to intelligent equipment by a user;

EXAMPLE seven

The present embodiment provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements a volume adjustment method based on voice control as provided in any embodiment of the present invention. That is, the program when executed by the processor implements:

acquiring a voice instruction sent to intelligent equipment by a user;

Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing describes only the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made by those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in detail through the above embodiments, it is not limited to those embodiments and may include other equivalent embodiments without departing from the spirit of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A volume adjustment method based on voice control, characterized by comprising the following steps:

acquiring a voice instruction sent to a smart device by a user;

2. The method of claim 1, wherein:

the volume adjustment strategy comprises at least two volume adjustment amplitudes with the same adjustment trend; and/or

the volume adjustment strategy comprises at least two target volume absolute values with the same adjustment trend.

3. The method of claim 1 or 2, wherein determining a volume adjustment strategy according to the semantic instruction comprises:

matching the semantic instruction against the semantic instructions in semantic instruction sets, wherein there are at least two types of semantic instruction sets and each type of semantic instruction set corresponds to one volume adjustment strategy;

and if the matching is successful, determining the volume adjustment strategy corresponding to the matched semantic instruction set.
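
As an illustration only, the matching described in claims 2 and 3 might be sketched as follows. Everything concrete here is an assumption made for the example and is not taken from the claims: the phrases, the 0-100 volume scale, the amplitudes 10 and 30, and the names `VolumeStrategy`, `INSTRUCTION_SETS`, `determine_strategy` and `apply_strategy`. The sketch uses exact phrase lookup in place of whatever matching the claimed method actually performs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VolumeStrategy:
    trend: str      # "up" or "down"
    amplitude: int  # hypothetical step size on a 0-100 scale


# Hypothetical instruction sets: both share the "up" trend but map to
# different adjustment amplitudes, so there are at least two strategies
# with the same adjustment trend.
INSTRUCTION_SETS = [
    ({"a little louder", "slightly louder"}, VolumeStrategy("up", 10)),
    ({"much louder", "a lot louder"}, VolumeStrategy("up", 30)),
]


def determine_strategy(semantic_instruction: str) -> Optional[VolumeStrategy]:
    """Return the strategy of the first set containing the instruction."""
    for phrases, strategy in INSTRUCTION_SETS:
        if semantic_instruction in phrases:
            return strategy
    return None  # matching failed; no strategy is determined


def apply_strategy(current_volume: int, strategy: VolumeStrategy) -> int:
    """Apply the amplitude in the strategy's direction, clamped to 0-100."""
    delta = strategy.amplitude if strategy.trend == "up" else -strategy.amplitude
    return max(0, min(100, current_volume + delta))
```

With these assumed sets, "a little louder" and "much louder" both raise the volume but by different amounts, and an instruction found in no set yields no strategy.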

4. The method of claim 3, wherein matching semantic instructions from the set of semantic instructions comprises:

5. The method of claim 3, further comprising, after determining the corresponding volume adjustment strategy from the semantic instruction set according to the semantic instruction:

and correcting the determined volume adjustment strategy according to the historical volume adjustment strategy of the user.
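
One possible reading of the correction in claim 5, given purely as a hedged sketch: the amplitude of the determined strategy is blended with the mean of the amplitudes the user chose in the past. The blending rule, the `weight` parameter and the function name `correct_amplitude` are assumptions introduced for this example; the claim does not specify how the correction is computed.

```python
from statistics import mean
from typing import Sequence


def correct_amplitude(
    determined_amplitude: float,
    historical_amplitudes: Sequence[float],
    weight: float = 0.5,
) -> float:
    """Blend the determined amplitude with the user's historical mean.

    weight is a hypothetical tuning parameter: 0 ignores the history,
    1 uses only the history.
    """
    if not historical_amplitudes:
        # No history for this user: keep the determined strategy unchanged.
        return determined_amplitude
    return (1 - weight) * determined_amplitude + weight * mean(historical_amplitudes)
```

For instance, a determined amplitude of 10 combined with a history of two adjustments of 20 gives 15 under the default weight.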

6. The method of claim 3, wherein the semantic instruction set is determined according to the semantic instructions and volume adjustment strategies of a plurality of users; or the semantic instruction set is determined according to the semantic instructions and historical volume adjustment strategies of a single user.

7. The method of claim 3, further comprising:

8. The method of claim 1 or 2, further comprising:

correspondingly, the determining the volume adjustment strategy according to the semantic instruction comprises:

9. The method of claim 8, wherein determining the state of the smart device comprises at least one of:

determining a distance between the smart device and a user;

10. The method of claim 8, wherein determining the state of the user comprises: determining the age of the user.

11. The method of claim 8, wherein determining a volume adjustment strategy according to the semantic instruction and the state of the smart device and/or the state of the user comprises:

12. The method according to claim 11, wherein the preset correspondence is determined according to a correspondence between the states of a plurality of smart devices and/or the states of users and an expected volume range; or the preset correspondence is determined according to a correspondence between the state of the smart device owned by a single user and/or the state of the single user and a historical expected volume range.

13. The method according to claim 1, wherein the execution subject of the method is a cloud server or a smart device;

14. A volume adjustment device based on voice control, comprising:

15. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the volume adjustment method based on voice control according to any one of claims 1 to 13.

16. A computer readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the volume adjustment method based on voice control according to any one of claims 1 to 13.