Disclosure of Invention
In view of this, the present application provides an audio scene recognition method, a motor driving system, and an electronic device, so as to address the problem that existing racing game products provide only limited user perception signals.
A first aspect of the present application provides a method for identifying an audio scene, including:
acquiring audio data to be processed;
dividing the audio data to be processed into multiple frames of audio units that are consecutive in time sequence;
filtering each frame of audio unit according to the wave band characteristics corresponding to the audio scene to obtain target audio;
acquiring frame count and energy average value of each frame audio unit in target audio; the frame number count is used for representing the characteristics of a specific scene;
and comparing the frame number count or the energy mean value with the characteristic threshold values corresponding to different audio scenes, and judging the audio scene corresponding to each frame of audio unit.
In one embodiment, the target audio comprises a first target audio and a second target audio; the frame number count of each frame audio unit in the first target audio is a first frame number count, and the energy mean value is a first mean value; the frame number count of each frame audio unit in the second target audio is a second frame number count; the characteristic threshold comprises a first trigger threshold and a minimum frame number count value;
comparing the frame number count and the average value with the feature thresholds corresponding to different audio scenes, and judging the audio scene corresponding to each frame of audio unit comprises:
setting a first trigger threshold of each frame according to the value characteristics of the first frame number count of each frame in the first target audio; determining that audio units having a first average value greater than a first trigger threshold are generated in a first audio scene; the first audio scene is a scene of the control target changing in speed in a first direction;
determining that an audio unit having the second frame number count greater than or equal to the minimum frame number count value is generated in a second audio scene; the second audio scene is a scene in which the speed of the control target changes in a second direction.
In one embodiment, the first trigger threshold comprises a primary trigger threshold, an intermediate trigger threshold and an advanced trigger threshold that increase in sequence;
the setting of the first trigger threshold of each frame according to the value-taking feature of the first frame number count of each frame in the first target audio includes:
setting the first trigger threshold to the primary trigger threshold if GAIN_CNT(n) < a × GAIN_CNT_STEP; wherein GAIN_CNT(n) represents the first frame number count of the current frame, GAIN_CNT_STEP represents the interval threshold, a is a positive number, and the symbol × represents multiplication; the interval threshold is used for describing the interval between the levels of the first trigger threshold;
setting the first trigger threshold to the intermediate trigger threshold if a × GAIN_CNT_STEP ≤ GAIN_CNT(n) < 2a × GAIN_CNT_STEP;
setting the first trigger threshold to the advanced trigger threshold if 2a × GAIN_CNT_STEP ≤ GAIN_CNT(n) < 3a × GAIN_CNT_STEP.
In one embodiment, before comparing the frame number count or the energy average with feature thresholds corresponding to different audio scenes and determining an audio scene corresponding to each frame of audio unit, the method further includes:
determining the first frame number count of the current frame according to the first frame number count of the previous frame, the first average value of the current frame and an interval threshold; wherein, the previous frame is a frame before the current frame;
and/or,
determining the second frame number count of the current frame according to the second frame number count of the previous frame, the second average value of the current frame and a second trigger threshold; wherein the second average is an energy average of a corresponding audio unit in the second target audio.
Specifically, the determining the first frame number count of the current frame according to the first frame number count of the previous frame, the first average value of the current frame, and the interval threshold includes:
if GAIN_CNT(n-1) < a × GAIN_CNT_STEP, GAIN_CNT(n) is updated by a first update formula when AVE_L(n) > b2, and is updated by a second update formula when AVE_L(n) < b1 and GAIN_CNT(n-1) > 0; wherein GAIN_CNT(n-1) represents the first frame number count of the previous frame, GAIN_CNT(n) represents the first frame number count of the current frame, GAIN_CNT_STEP represents the interval threshold, a is a positive number, the symbol × represents multiplication, AVE_L(n) represents the first average value of the current frame, b1 represents a first mean evaluation parameter, b2 represents a second mean evaluation parameter, b3 represents a third mean evaluation parameter, and b4 represents a fourth mean evaluation parameter; the first update formula is used for increasing the first frame number count; the second update formula is used for decreasing the first frame number count;
if a × GAIN_CNT_STEP ≤ GAIN_CNT(n-1) < 2a × GAIN_CNT_STEP, GAIN_CNT(n) is updated by the first update formula when AVE_L(n) > b3, and is updated by the second update formula when AVE_L(n) < b2;
if 2a × GAIN_CNT_STEP ≤ GAIN_CNT(n-1) < 3a × GAIN_CNT_STEP, GAIN_CNT(n) is updated by the first update formula when AVE_L(n) > b4, and is updated by the second update formula when AVE_L(n) < b3; if GAIN_CNT(n) is equal to 3a × GAIN_CNT_STEP, GAIN_CNT(n) is set to GAIN_CNT(n) - c1; wherein c1 represents a first step value.
Specifically, the first update formula is: GAIN_CNT(n) = GAIN_CNT(n-1) + c2; the second update formula is: GAIN_CNT(n) = GAIN_CNT(n-1) - c3; wherein c2 represents a second step value and c3 represents a third step value.
Specifically, the determining the second frame count of the current frame according to the second frame count of the previous frame, the second average value of the current frame, and the second trigger threshold includes:
if AVE_R(n) is greater than BP_ATT, BP_CNT(n) is updated by a third update formula, and BP_CNT(n) is set to the maximum frame number count value when BP_CNT(n) is greater than the maximum frame number count value; wherein AVE_R(n) represents the second average value of the current frame, BP_ATT represents a second trigger threshold, and BP_CNT(n) represents the second frame number count of the current frame; the third update formula is used for increasing the second frame number count;
if AVE_R(n) is less than or equal to BP_ATT and BP_CNT(n-1) is positive, BP_CNT(n) is updated by a fourth update formula; wherein BP_CNT(n-1) represents the second frame number count of the previous frame; the fourth update formula is used for decreasing the second frame number count.
Specifically, the third update formula is: BP_CNT(n) = BP_CNT(n-1) + c4; the fourth update formula is: BP_CNT(n) = BP_CNT(n-1) - c5; wherein c4 represents a fourth step value and c5 represents a fifth step value.
In one embodiment, the filtering, according to the band feature corresponding to the audio scene, each frame of audio unit to obtain the target audio includes:
acquiring first channel data and second channel data of each frame of audio unit;
performing low-pass filtering on the first channel data to obtain a first target audio; and performing band-pass filtering on the second channel data to obtain a second target audio.
A second aspect of the present application provides a motor driving method including:
according to any one of the audio scene identification methods, identifying the audio scene corresponding to the currently played audio unit;
acquiring a corresponding vibration rule according to the audio scene;
and driving the motor to vibrate according to the vibration rule, so as to realize the vibration effect corresponding to the currently played audio unit.
In one embodiment, the vibration rules include a first vibration rule and a second vibration rule;
the obtaining of the corresponding vibration rule according to the audio scene includes: if the audio unit of the current frame is generated in the first audio scene, determining a first vibration rule according to a first average value of the current frame and a first trigger threshold; and if the current frame audio unit is generated in the second audio scene, determining a second vibration rule according to the second frame number count of the current frame.
Specifically, the determining the first vibration rule according to the first average value of the current frame and the first trigger threshold includes: if AVE_L(n) > MAX_THR, setting the amplitude of the current frame to the second amplitude value, and setting the maximum vibration sense flag bit to the first flag; if MOVING_THR < AVE_L(n) < MAX_THR, controlling the amplitude of the current frame according to the vibration sense climbing rule when the maximum vibration sense flag bit is the second flag and GAIN(n-1) < GAIN_MAX, and setting the amplitude of the current frame to the maximum amplitude of the motor when the maximum vibration sense flag bit is the first flag and GAIN(n-1) > GAIN_MAX; wherein AVE_L(n) represents the first average value of the current frame, and MAX_THR represents the maximum trigger threshold; the maximum vibration sense flag bit is used for marking the vibration sense degree, MOVING_THR represents the first trigger threshold, GAIN(n-1) represents the amplitude of the previous frame, and GAIN_MAX represents the maximum amplitude of the motor; the vibration sense climbing rule is a rule that sets the successive vibration amplitudes in order according to the amplitudes recorded in the climbing control matrix; the climbing control matrix records a plurality of amplitude values;
and/or,
the determining a second vibration rule according to the second frame number count of the current frame includes: if BP_CNT(n) is greater than or equal to m, setting the amplitude of the current frame to the first amplitude value; where BP_CNT(n) represents the second frame number count of the current frame, and m represents the count threshold of the second audio scene.
Specifically, the first vibration rule further includes:
if AVE_L(n) > MOVING_THR, the maximum vibration sense flag bit is the first flag, and GAIN(n-1) is greater than the third amplitude value, then the amplitude of the current frame is set to the first amplitude value.
Specifically, the controlling the amplitude of the current frame according to the vibration sensing climbing rule includes:
and acquiring the climbing frame number count, looking up in the climbing control matrix the amplitude value ranked at the climbing frame number count, and determining the amplitude of the current frame according to the sum of the found amplitude value and GAIN(n-1).
Specifically, after controlling the amplitude of the current frame to increase according to the vibration sense slope climbing rule, the method further comprises the following steps:
when GAIN(n) is greater than GAIN_MAX, setting the amplitude of the current frame to the maximum amplitude of the motor, setting the maximum vibration sense flag bit to the first flag, and incrementing the climbing frame number count; wherein GAIN(n) represents the amplitude of the current frame.
Specifically, the motor driving method further includes:
and if the number of climbing frames is greater than the threshold value of the climbing times, setting the number of the climbing frames as the threshold value of the climbing times.
A third aspect of the present application provides an audio scene recognition system, including:
the first acquisition module is used for acquiring audio data to be processed;
the segmentation module is used for segmenting the audio data to be processed into multiple frames of audio units that are consecutive in time sequence;
the filtering module is used for filtering each frame of audio unit according to the wave band characteristics corresponding to the audio scene to obtain target audio;
the second acquisition module is used for acquiring the frame count and the energy mean value of each frame of audio unit in the target audio; the frame number count is used for representing the characteristics of a specific scene;
and the judging module is used for comparing the frame number count or the energy mean value with the characteristic threshold values corresponding to different audio scenes and judging the audio scene corresponding to each frame of audio unit.
A fourth aspect of the present application provides a motor drive system comprising:
the identification module is used for identifying the audio scene corresponding to the currently played audio unit according to any one of the audio scene identification systems;
the third acquisition module is used for acquiring a corresponding vibration rule according to the audio scene;
and the driving module is used for driving the motor to vibrate according to the vibration rule so as to realize the vibration effect corresponding to the currently played audio unit.
A fifth aspect of the present application provides an electronic device comprising a processor and a storage medium; the storage medium having program code stored thereon; the processor is used for calling the program codes stored in the storage medium to execute any one of the audio scene recognition methods.
In one embodiment, the electronic device further includes a motor; the processor is also configured to perform any of the motor drive methods described above.
In the audio scene identification and motor driving method and system and the electronic device, the audio data to be processed is obtained and is divided into the continuous multi-frame audio units in time sequence, so that the audio scene identification is carried out in a frame unit, and the scene corresponding to each frame of audio unit can be accurately identified; filtering each frame of audio unit according to the wave band characteristics corresponding to the audio scene to obtain a target audio, so that the target audio comprises data effectively representing the state characteristics of the control target, and the accuracy of audio scene recognition according to the target audio can be improved; the method comprises the steps of obtaining the frame count and the energy mean value of each frame of audio unit in target audio, comparing the frame count or the energy mean value with the characteristic threshold values corresponding to different audio scenes, and judging the audio scene corresponding to each frame of audio unit so as to set other user perception signals such as corresponding vibration signals and the like aiming at a specific audio scene, so that corresponding game products can provide more comprehensive user perception signals, the participation of users in the game process can be improved, and the purpose of improving the user experience is achieved.
Detailed Description
The technical solutions in the embodiments of the present application are clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. The following embodiments and their technical features may be combined with each other without conflict.
In a first aspect, the present application provides a method for identifying an audio scene, which is shown in fig. 1, and the method for identifying an audio scene includes:
and S100, acquiring audio data to be processed.
The audio data to be processed may be derived from an audio signal of a client of a racing game running on an intelligent terminal such as a mobile phone, or may be derived from an audio signal of a racing game provided by a game terminal such as a racing game machine. The audio signal can be an audio data stream which is output after decoding audio files in various formats.
S200, dividing the audio data to be processed into multiple frames of audio units that are consecutive in time sequence.
Specifically, this step may sample the audio signal of the racing game at a set sampling frequency and a set sampling bit depth to obtain the audio data to be processed, frame the resulting audio data stream using N audio sampling points as the step length, and zero-pad any audio unit containing fewer than N audio sampling points, so as to obtain multiple frames of audio units that are consecutive in time sequence; the audio data to be processed thus consists of multiple frames of audio units, and each frame of audio unit contains N audio sampling points.
Optionally, the set sampling frequency and the set sampling bit depth may each be chosen according to the recognition accuracy required for the subsequent game scenes; for example, the sampling frequency may be set to 48 kHz and the sampling bit depth to 16 bits. In that case, for the QQ Speed game client running on a mobile phone, the audio signal of QQ Speed may be obtained from the corresponding mobile phone system, and the original audio signal is sampled at a 48 kHz sampling rate and a 16-bit sampling depth to obtain the audio data to be processed. N may take a power-of-two value such as 1024.
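As an illustrative sketch only (not part of the original disclosure), the framing of step S200 could be implemented as follows, assuming 16-bit PCM input, a frame length of N = 1024 sampling points, and zero-padding of the final partial frame; the function and variable names are hypothetical.

```python
import numpy as np

def split_into_frames(samples: np.ndarray, frame_len: int = 1024) -> np.ndarray:
    """Split a 1-D array of audio samples into consecutive frames of frame_len
    points; the last frame is zero-padded when fewer than frame_len samples remain."""
    n_frames = int(np.ceil(len(samples) / frame_len))
    padded = np.zeros(n_frames * frame_len, dtype=samples.dtype)
    padded[:len(samples)] = samples
    return padded.reshape(n_frames, frame_len)

# Example: one second of 16-bit PCM sampled at 48 kHz, framed into 1024-point units.
pcm = np.random.randint(-32768, 32767, size=48000, dtype=np.int16)
frames = split_into_frames(pcm)        # shape: (47, 1024)
```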
S300, filtering each frame of audio unit according to the wave band characteristics corresponding to the audio scene to obtain the target audio.
In the steps, the filtering mode and the filtering parameters can be set according to the band characteristics of the audio in the audio scene to be identified so as to reduce the noise data in the target audio, so that the obtained target audio can include the state change characteristics of the control target in the corresponding scene as effectively as possible, and the identification precision of the subsequent audio scene can be improved. For example, if the audio frequency corresponding to a certain audio scene is usually below a certain frequency value, low-pass filtering may be performed on each frame of audio unit, so that the obtained target audio can completely include the audio feature corresponding to the audio scene; if the noise characteristics of a certain audio scene are mainly below a certain frequency value, and the effective audio characteristics of the audio scene are hardly included in the frequency band, high-pass filtering can be performed on each frame of audio unit, so that the obtained target audio includes the noise data as little as possible.
S400, acquiring frame count and energy average value of each frame of audio unit in the target audio; the frame number count is used to characterize the features that a particular scene has.
The specific scene may be a scene that needs to provide other sensing signals besides the existing sensing signals in the auditory sense and/or the visual sense, such as a collision scene that needs to provide a collision sense, an acceleration scene that needs to provide an acceleration sense, and the like. The specific scene may have features including a background music feature of the specific scene and a state change feature of the operation target in the specific scene. The background music characteristics can include characteristics such as music type and/or music intensity; the state change feature may include a change feature of a state that the manipulation object has in the corresponding game scene, such as a speed change feature and/or a position change feature. In the racing game, the state change of the control target in most scenes is often represented as speed change, so that in the scenes, the state change characteristic can be a speed change characteristic; the speed change characteristics may include characteristics such as a speed change direction and a speed change magnitude, which characterize a speed change of the manipulation target in a specific scene. The target audio comprises continuous multi-frame audio units, each frame audio unit comprises N audio sampling points, each sampling point has a corresponding energy value, and the energy value can be determined according to the amplitude of the corresponding sampling point. Specifically, the absolute values of the energy values of the respective sampling points may be summed, and the summed value is divided by the number N of audio sampling points to obtain an average energy value, which is an energy average of the audio unit of the frame.
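A minimal sketch of the energy-mean computation described above, assuming the energy value of each sampling point is taken as the absolute value of its amplitude; the function name is illustrative.

```python
import numpy as np

def energy_mean(frame: np.ndarray) -> float:
    """Sum the absolute energy values of the N sampling points of one frame
    and divide by N, giving the energy mean of that audio unit."""
    return float(np.sum(np.abs(frame.astype(np.float64))) / len(frame))
```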
The frame number count is a parameter representing the background music characteristics and the control target state change characteristics in the target audio. Different frame number counting determination rules can be set for different game scenes; in most game scenarios, the frame count for a frame of audio units may be determined based on the respective state parameters of the steering target at that frame of audio units and/or a number of preceding frames of audio units. Specifically, in some game scenes, an initial value of the frame count may be preset as a previous frame count of the first frame audio unit, and then state parameters such as an energy average of each frame audio unit are identified, and the frame count of each frame audio unit is determined by combining features of the previous frame or the previous frames of audio units. As one example, for each frame of audio units generated for some scenes, the frame number count for the previous frame of audio units may be incremented, decremented, or maintained based on the energy average of each frame of audio units to determine the frame number count for the current frame. For example, when the energy average value of a certain frame is greater than a first energy threshold, the frame number of the frame is counted and one or more counting units are added on the basis of the frame number of the previous frame; when the energy mean value of the frame is smaller than a second energy threshold value, the frame number of the frame is counted and reduced by one or more counting units on the basis of the frame number counting of the previous frame; when the energy mean value of the frame is greater than or equal to a second energy threshold and is less than or equal to a first energy threshold, setting the frame number count of the frame as the frame number count of the previous frame; wherein the first energy threshold is greater than the second energy threshold. As another example, for audio units of frames generated by other scenes, a frame count of a previous frame or previous frames of audio units may be first identified, and the frame count of the current frame may be determined based on the energy average by increasing, decreasing, or maintaining the previous frame count after the frame count of the previous frame or previous frames of audio units satisfies a condition. Optionally, if the frame count of the current frame is determined by decreasing the count unit based on the frame count of the previous frame, a minimum value (e.g., 0 equivalent) of the frame count in the corresponding game scene may also be set, and when the frame count of the previous frame is taken as the minimum value or is smaller than the minimum value after decreasing the corresponding count unit, the frame count of the current frame is determined by maintaining the frame count of the previous frame.
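The increase/decrease/hold logic for the frame number count described in this paragraph might look like the following sketch; the step size of one counting unit and the floor value of 0 reflect the examples mentioned above, while the energy thresholds are left as parameters.

```python
def update_frame_count(prev_count: int, ave: float,
                       first_energy_thr: float, second_energy_thr: float,
                       step: int = 1, floor: int = 0) -> int:
    """Determine the frame number count of the current frame from the previous
    frame's count and the current energy mean (first_energy_thr > second_energy_thr)."""
    if ave > first_energy_thr:
        return prev_count + step                 # energy high: add counting units
    if ave < second_energy_thr:
        return max(prev_count - step, floor)     # energy low: subtract, but not below the minimum
    return prev_count                            # otherwise: keep the previous count
```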
And S500, comparing the frame number count or the energy average value with characteristic thresholds corresponding to different audio scenes, and judging the audio scene corresponding to each frame of audio unit.
The characteristic threshold may be set according to audio variation characteristics in corresponding audio scenes, and in some scenes, it may be represented as one or more fixed values, and in other scenes, it may also be adjusted in real time according to state parameters such as frame count and/or energy mean of the relevant audio units, so as to accurately determine the audio scene or game scene in which the corresponding audio unit is located.
According to the method for identifying the audio scene, the audio data to be processed is divided into the continuous multi-frame audio units in the time sequence by acquiring the audio data to be processed, and the audio data to be processed is carried out by taking a frame as a unit in the identification process of the audio scene, so that the scene corresponding to each frame of audio unit can be accurately identified; filtering each frame of audio unit according to the wave band characteristics corresponding to the audio scene to obtain a target audio, so that the target audio comprises data effectively representing the state characteristics of the control target, and the accuracy of audio scene recognition according to the target audio can be improved; the method comprises the steps of obtaining the frame count and the energy mean value of each frame of audio unit in target audio, comparing the frame count or the energy mean value with the characteristic threshold values corresponding to different audio scenes, and judging the audio scene corresponding to each frame of audio unit so as to set other user perception signals such as corresponding vibration signals and/or shaking signals and the like aiming at a specific audio scene, so that corresponding game products can provide more comprehensive user perception signals, the participation sense of a user in a game process can be improved, and the purpose of improving the user experience is achieved.
In one embodiment, the target audio comprises a first target audio and a second target audio; the frame number count of each frame audio unit in the first target audio is a first frame number count, and the energy mean value is a first mean value; the frame number count of each frame audio unit in the second target audio is a second frame number count; the characteristic threshold comprises a first trigger threshold and a minimum frame number count value;
comparing the frame number count and the average value with the feature thresholds corresponding to different audio scenes, and judging the audio scene corresponding to each frame of audio unit comprises:
setting a first trigger threshold of each frame according to the value characteristics of the first frame number count of each frame in the first target audio; determining that audio units having a first average value greater than a first trigger threshold are generated in a first audio scene; the first audio scene is a scene of the control target changing in speed in a first direction;
determining that an audio unit having the second frame number count greater than or equal to the minimum frame number count value is generated in a second audio scene; the second audio scene is a scene in which the speed of the control target changes in a second direction.
In the racing game, the specific scenes that need to provide other sensing signals besides vision and hearing are the scenes that the speed of the control target changes, such as small nitrogen acceleration, large nitrogen acceleration, collision, acceleration belt passing and drifting, and the like. The speed change of the specific scenes is mainly represented in two directions, one is a direction parallel to the driving direction of the control target, such as the speed change direction in scenes of small nitrogen acceleration, large nitrogen acceleration, collision or acceleration belt passing and the like, the direction is set as a first direction, the scenes are called as first audio scenes, and the target audio corresponding to the first audio scenes is called as first target audio; the other is a direction (for example, a direction perpendicular to the driving direction of the manipulation target) which forms a certain angle with the driving direction of the manipulation target, for example, a speed change direction in a scene such as drifting or scratching, the direction is set as a second direction in the present application, the scene is referred to as a second audio scene, and a target audio corresponding to the second audio scene is referred to as a second target audio. The first target audio and the second target audio can be obtained by respectively performing filtering processing on audio data to be processed according to the band characteristics of corresponding audio scenes; and consistent with the audio data to be processed, the first target audio and the second target audio respectively comprise a plurality of frames of audio units which are continuous in time sequence, and each frame of audio unit comprises N audio sampling points. In a certain frame audio unit of the first target audio, summing the absolute values of the energy values of all the sampling points, and dividing the summed value by the number N of the audio sampling points to obtain an average energy value, wherein the average energy value is the energy average value of the frame audio unit, namely a first average value; and identifying the first average value of each frame audio unit, the first frame number count of the previous frame or the previous frames, and determining the first frame number count of each frame audio unit by increasing, decreasing or keeping the previous frame number count according to the identification result. In a certain frame of audio unit of the second target audio, summing the absolute values of the energy values of the sampling points, and dividing the summed value by the number N of the audio sampling points to obtain an average energy value, wherein the average energy value is the energy average value of the frame of audio unit, namely a second average value; and identifying the second frame number count of the previous frame or previous frames according to the second average value of each frame of audio unit, and determining the second frame number count of each frame of audio unit by increasing, decreasing or keeping the previous frame number count according to the identification result.
If a certain frame of audio unit is generated in the first audio scene, indicating that the corresponding game scene comprises a first game scene with a change in speed in the first direction, such as small nitrogen acceleration, large nitrogen acceleration, collision, acceleration zone passing and the like, at this time, a specific user sensing signal can be set according to parameters such as the first frame number count in the frame of audio unit, the first average value, the first frame number count of adjacent frame audio units and the like, and the characteristics of corresponding terminal equipment; for example, for an intelligent mobile device, a corresponding vibration signal may be set according to parameters such as a first frame count in a certain frame audio unit, a first average value, and a first frame count of an adjacent frame audio unit, so that the intelligent mobile device generates corresponding vibration when playing the frame audio unit, and for a game machine with a racing car model, a model shaking signal representing actions such as acceleration and/or collision of the racing car model may be set according to parameters such as the first frame count in the certain frame audio unit, the first average value, and the first frame count of the adjacent frame audio unit, so that the racing car model generates corresponding shaking when playing the frame audio unit; this can provide the user with more comprehensive game perception, and the user can feel the state change of the manipulation target in the first direction from multiple aspects during the game. If a frame of audio unit is generated in a second audio scene, indicating that the corresponding game scene includes a second game scene with a second direction speed change such as drift, other user perception signals can be set according to the second frame number count in the frame of audio unit and the characteristics of corresponding terminal equipment; for example, for an intelligent mobile device, a corresponding vibration signal may be set according to parameters such as a second frame count in a certain frame audio unit, so that the intelligent mobile device generates corresponding vibration when playing the frame audio unit data of a game, and for example, for a game machine with a racing car model, a model shaking signal representing actions such as a drift of the racing car model may be set according to the second frame count in a certain frame audio unit, so that the racing car model generates corresponding shaking when playing the frame audio unit; this can further provide the user with a more comprehensive game perception, so that the user can feel the state change of the manipulation target in the second direction from more aspects during the game.
In this embodiment, the first trigger threshold of each frame is set according to the first frame number count and the value characteristics thereof, so that the value of the first trigger threshold can be determined according to the background music characteristics and the speed change characteristics of the control target represented by the corresponding audio unit, and the accuracy of identifying the first audio scene according to the method can be improved. In a second audio frequency scene, the minimum frame number counting value can be set according to the speed change characteristics of the specific action of the control target in the racing game in a second direction; for speed change characteristics in the second direction, such as for drift, the minimum frame number count value may be set to 6. If the second frame count of a frame of audio is greater than or equal to the minimum frame count, it indicates that the frame of audio is generated in the second audio scene.
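As a minimal illustration of this judging step, assuming the per-frame first trigger threshold is already known and using the example minimum frame number count value of 6 for drift, the scene decision could be sketched as follows; the function names are not from the original disclosure.

```python
MIN_FRAME_CNT = 6   # example minimum frame number count value for the drift action

def in_first_audio_scene(ave_l: float, moving_thr: float) -> bool:
    """First audio scene: the first average value exceeds the first trigger threshold."""
    return ave_l > moving_thr

def in_second_audio_scene(bp_cnt: int, min_cnt: int = MIN_FRAME_CNT) -> bool:
    """Second audio scene: the second frame number count reaches the minimum count value."""
    return bp_cnt >= min_cnt
```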
In particular, the first trigger threshold comprises a primary trigger threshold, an intermediate trigger threshold and an advanced trigger threshold that increase in sequence;
the setting of the first trigger threshold of each frame according to the value-taking feature of the first frame number count of each frame in the first target audio includes:
setting the first trigger threshold to the primary trigger threshold if GAIN_CNT(n) < a × GAIN_CNT_STEP; wherein GAIN_CNT(n) represents the first frame number count of the current frame, GAIN_CNT_STEP represents the interval threshold, a is a positive number, and the symbol × represents multiplication; the interval threshold is used for describing the interval between the levels of the first trigger threshold;
setting the first trigger threshold to the intermediate trigger threshold if a × GAIN_CNT_STEP ≤ GAIN_CNT(n) < 2a × GAIN_CNT_STEP;
setting the first trigger threshold to the advanced trigger threshold if 2a × GAIN_CNT_STEP ≤ GAIN_CNT(n) < 3a × GAIN_CNT_STEP.
The primary trigger threshold, the intermediate trigger threshold and the advanced trigger threshold may each be set according to the background music features adopted in the specific game scene and the features of the various actions by which the speed of the control target changes in the first direction. Specifically, the primary, intermediate and advanced trigger thresholds increase in sequence, and the advanced trigger threshold is less than the maximum trigger threshold; for example, the primary trigger threshold is 3500, the intermediate trigger threshold is 4000, the advanced trigger threshold is 4500, and the maximum trigger threshold is 7000. The interval threshold GAIN_CNT_STEP sets the interval of the frame number count between the levels of the first trigger threshold (i.e., the primary, intermediate and advanced trigger thresholds); it may be determined according to the update rule of the first frame number count and may be set to a positive value such as 20. The value of a may be determined according to the value characteristics of the interval threshold GAIN_CNT_STEP; for example, when the interval threshold GAIN_CNT_STEP takes 20, a may take 1. If the primary trigger threshold is 3500, the intermediate trigger threshold is 4000, the advanced trigger threshold is 4500, and the maximum trigger threshold is 7000, then (a brief code sketch follows the list below):
if GAIN_CNT(n) < 20, then MOVING_THR = 3500; wherein MOVING_THR represents the first trigger threshold;
if 20 ≤ GAIN_CNT(n) < 40, then MOVING_THR = 4000;
if 40 ≤ GAIN_CNT(n) < 60, then MOVING_THR = 4500.
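As referenced above, a brief sketch of this threshold selection with the example values (GAIN_CNT_STEP = 20, a = 1, thresholds 3500/4000/4500); it only illustrates the rule and is not presented as the authoritative implementation.

```python
GAIN_CNT_STEP = 20      # interval threshold (example value)
A = 1                   # the positive number a (example value)

def first_trigger_threshold(gain_cnt_n: int,
                            primary: int = 3500,
                            intermediate: int = 4000,
                            advanced: int = 4500) -> int:
    """Select MOVING_THR from the first frame number count GAIN_CNT(n) of the current frame."""
    if gain_cnt_n < A * GAIN_CNT_STEP:
        return primary
    if gain_cnt_n < 2 * A * GAIN_CNT_STEP:
        return intermediate
    return advanced     # covers 2a*GAIN_CNT_STEP <= GAIN_CNT(n) < 3a*GAIN_CNT_STEP
```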
In an embodiment, before comparing the frame number count or the energy average with feature thresholds corresponding to different audio scenes and determining an audio scene corresponding to each frame of audio unit, the method further includes:
determining the first frame number count of the current frame according to the first frame number count of the previous frame, the first average value of the current frame and an interval threshold; wherein, the previous frame is a frame before the current frame;
and/or,
determining the second frame number count of the current frame according to the second frame number count of the previous frame, the second average value of the current frame and a second trigger threshold; wherein the second average is an energy average of a corresponding audio unit in the second target audio.
The first frame number count of the current frame is determined according to the first frame number count of the previous frame, the first average value of the current frame and the interval threshold, so that the determined first frame number count can more accurately represent the background music characteristics and the speed change characteristics of the control target in the first direction in the corresponding audio unit; and determining the second frame number count of the current frame according to the second frame number count of the previous frame, the second average value of the current frame and the second trigger threshold value, so that the determined second frame number count can more accurately represent the speed change characteristic of the control target in the corresponding audio unit in the second direction.
In one example, the determining the first frame number count of the current frame based on the first frame number count of the previous frame, the first average of the current frame, and the interval threshold comprises:
if GAIN_CNT(n-1) < a × GAIN_CNT_STEP, GAIN_CNT(n) is updated by the first update formula when AVE_L(n) > b2, and is updated by the second update formula when AVE_L(n) < b1 and GAIN_CNT(n-1) > 0; wherein GAIN_CNT(n-1) represents the first frame number count of the previous frame, GAIN_CNT(n) represents the first frame number count of the current frame, GAIN_CNT_STEP represents the interval threshold, a is a positive number, the symbol × represents multiplication, AVE_L(n) represents the first average value of the current frame, b1 represents a first mean evaluation parameter, b2 represents a second mean evaluation parameter, b3 represents a third mean evaluation parameter, and b4 represents a fourth mean evaluation parameter; the first update formula is used for increasing the first frame number count; the second update formula is used for decreasing the first frame number count;
if a × GAIN_CNT_STEP ≤ GAIN_CNT(n-1) < 2a × GAIN_CNT_STEP, GAIN_CNT(n) is updated by the first update formula when AVE_L(n) > b3, and is updated by the second update formula when AVE_L(n) < b2;
if 2a × GAIN_CNT_STEP ≤ GAIN_CNT(n-1) < 3a × GAIN_CNT_STEP, GAIN_CNT(n) is updated by the first update formula when AVE_L(n) > b4, and is updated by the second update formula when AVE_L(n) < b3; if GAIN_CNT(n) is equal to 3a × GAIN_CNT_STEP, GAIN_CNT(n) is set to GAIN_CNT(n) - c1; wherein c1 represents a first step value.
Specifically, the first update formula is: GAIN_CNT(n) = GAIN_CNT(n-1) + c2; the second update formula is: GAIN_CNT(n) = GAIN_CNT(n-1) - c3; wherein c2 represents a second step value and c3 represents a third step value.
The first step value c1, the second step value c2 and the third step value c3 may each take a conventional counting unit value (e.g., 1) or an integer multiple of that unit value (e.g., 2). The first mean evaluation parameter b1, the second mean evaluation parameter b2, the third mean evaluation parameter b3 and the fourth mean evaluation parameter b4 may be set according to the type of background music adopted by the corresponding game and the audio energy characteristics of the various actions by which the speed of the control target changes in the first direction. Specifically, the first mean evaluation parameter b1, the second mean evaluation parameter b2, the third mean evaluation parameter b3 and the fourth mean evaluation parameter b4 increase in sequence; for example, b1 takes 2800, b2 takes 3200, b3 takes 4000 and b4 takes 5000. Then, if c1 = c2 = c3 = 1 and a = 1 (a code sketch follows the list below):
if GAIN_CNT(n-1) < GAIN_CNT_STEP, then GAIN_CNT(n) = GAIN_CNT(n-1) + 1 when AVE_L(n) > 3200, and GAIN_CNT(n) = GAIN_CNT(n-1) - 1 when AVE_L(n) < 2800 and GAIN_CNT(n-1) > 0;
if GAIN_CNT_STEP ≤ GAIN_CNT(n-1) < 2 × GAIN_CNT_STEP, then GAIN_CNT(n) = GAIN_CNT(n-1) + 1 when AVE_L(n) > 4000, and GAIN_CNT(n) = GAIN_CNT(n-1) - 1 when AVE_L(n) < 3200;
if 2 × GAIN_CNT_STEP ≤ GAIN_CNT(n-1) < 3 × GAIN_CNT_STEP, then GAIN_CNT(n) = GAIN_CNT(n-1) + 1 when AVE_L(n) > 5000, and GAIN_CNT(n) = GAIN_CNT(n-1) - 1 when AVE_L(n) < 4000;
if GAIN_CNT(n) = 3 × GAIN_CNT_STEP, GAIN_CNT(n) is set to GAIN_CNT(n) - 1, that is, GAIN_CNT(n) = 3 × GAIN_CNT_STEP - 1.
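The sketch referenced above, using the example parameters b1-b4 = 2800/3200/4000/5000, GAIN_CNT_STEP = 20 and unit step values; the guard GAIN_CNT(n-1) > 0 in the first tier follows the general rule stated earlier.

```python
GAIN_CNT_STEP = 20
B1, B2, B3, B4 = 2800, 3200, 4000, 5000    # mean evaluation parameters (example values)

def update_gain_cnt(prev_cnt: int, ave_l: float) -> int:
    """Update GAIN_CNT(n) from GAIN_CNT(n-1) and the first average value AVE_L(n)."""
    cnt = prev_cnt
    if prev_cnt < GAIN_CNT_STEP:
        if ave_l > B2:
            cnt = prev_cnt + 1
        elif ave_l < B1 and prev_cnt > 0:
            cnt = prev_cnt - 1
    elif prev_cnt < 2 * GAIN_CNT_STEP:
        if ave_l > B3:
            cnt = prev_cnt + 1
        elif ave_l < B2:
            cnt = prev_cnt - 1
    elif prev_cnt < 3 * GAIN_CNT_STEP:
        if ave_l > B4:
            cnt = prev_cnt + 1
        elif ave_l < B3:
            cnt = prev_cnt - 1
    if cnt == 3 * GAIN_CNT_STEP:           # pull the count back just below the upper bound
        cnt -= 1
    return cnt
```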
In one example, the determining the second frame number count of the current frame based on the second frame number count of the previous frame, the second average of the current frame, and the second trigger threshold includes:
if AVE_R(n) is greater than BP_ATT, BP_CNT(n) is updated by the third update formula, and BP_CNT(n) is set to the maximum frame number count value when BP_CNT(n) is greater than the maximum frame number count value; wherein AVE_R(n) represents the second average value of the current frame, BP_ATT represents the second trigger threshold, and BP_CNT(n) represents the second frame number count of the current frame; the third update formula is used for increasing the second frame number count;
if AVE_R(n) is less than or equal to BP_ATT and BP_CNT(n-1) is positive, BP_CNT(n) is updated by the fourth update formula; wherein BP_CNT(n-1) represents the second frame number count of the previous frame; the fourth update formula is used for decreasing the second frame number count.
Specifically, the third update formula is: BP_CNT(n) = BP_CNT(n-1) + c4; the fourth update formula is: BP_CNT(n) = BP_CNT(n-1) - c5; wherein c4 represents a fourth step value and c5 represents a fifth step value.
The fourth step value c4 and the fifth step value c5 may each take a conventional counting unit value (e.g., 1) or an integer multiple of that unit value (e.g., 2). The second trigger threshold BP_ATT can be set according to the action characteristics of the control target in the racing game when its speed changes in the second direction; for example, for the drift action, the second trigger threshold may be set to 1400. The maximum frame number count value is the maximum value of the frame number count in the second audio scene and may be set according to the relevant characteristics of the specific action in the second game scene; for example, for the drift action, the maximum frame number count value is set to 10.
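A corresponding sketch of the second-frame-count update with the example values above (BP_ATT = 1400 for drift, maximum frame number count value 10, unit step values); the names are illustrative.

```python
BP_ATT = 1400        # second trigger threshold (example value for the drift action)
BP_CNT_MAX = 10      # maximum frame number count value (example value)

def update_bp_cnt(prev_cnt: int, ave_r: float) -> int:
    """Update BP_CNT(n) from BP_CNT(n-1) and the second average value AVE_R(n)."""
    if ave_r > BP_ATT:
        return min(prev_cnt + 1, BP_CNT_MAX)   # third update formula, capped at the maximum
    if prev_cnt > 0:
        return prev_cnt - 1                    # fourth update formula
    return prev_cnt
```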
In an embodiment, the filtering, according to band features corresponding to an audio scene, each frame of audio unit to obtain a target audio includes:
acquiring first channel data and second channel data of each frame of audio unit;
performing low-pass filtering on the first channel data to obtain the first target audio; and performing band-pass filtering on the second channel data to obtain the second target audio.
The first channel data and the second channel data may generally be the audio data corresponding to the two channels of the audio playing module of the corresponding terminal device (e.g., a smart terminal or a game console); for example, the first channel data is the left channel data and the second channel data is the right channel data. In some examples, the first channel data may carry a first part of the characteristics of the audio data to be processed, e.g., the left channel may carry the compressed low-frequency (bass) signal, while the second channel data carries a second part of the characteristics, e.g., the right channel may carry the compressed mid-to-high-frequency signal; the first part of the characteristics does not exactly coincide with the second part. In other examples, the first channel data and the second channel data may both be the audio data to be processed itself; for example, when the audio data to be processed is monaural, both the first channel data and the second channel data are obtained by copying it.
Specifically, the audio frequency point of the first audio scene is mainly in a frequency range of 100Hz to 2000Hz, and low-pass filtering is performed on the first channel data, so that interference audio such as human voice in a game can be filtered, noise data in the first target audio can be reduced, and characteristics such as background music and/or control target speed of a specific scene can be effectively represented. When the control target moves in the second direction such as drifting in the game process, the generated audio is mainly concentrated in the frequency band range of 1100Hz-1300Hz, band-pass filtering is carried out on the second channel data, the audio representing the movement of the control target in the second direction in the second channel data can be extracted, and the second target audio can represent the state change characteristics such as the speed of the control target in the second direction.
Specifically, in this embodiment, the cut-off frequency of the low-pass filtering may be set according to frequency bands where the effective audio and the interference audio in the first audio scene are located, and is usually set as a parameter that can pass through the effective audio and filter the interference audio as much as possible; for example, if the disturbing audio, such as the human voice of the first audio scene, is mainly in the high frequency part of the range of 100Hz-2000Hz, the cut-off frequency of the low-pass filtering can be set to 225Hz to filter out the human voice of the first audio scene as much as possible. The pass frequency band of the band-pass filtering may be set according to a frequency band where an audio representing the action feature in the second direction of the manipulation target in a specific game product is located, and may often be set to a frequency band where an audio representing the action feature in the second direction of the manipulation target is located, for example, set to a frequency band of 1100Hz to 1300 Hz.
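A sketch of this filtering step using SciPy Butterworth filters with the example cut-off of 225 Hz and pass band of 1100 Hz-1300 Hz at 48 kHz; the filter family and order are assumptions, since the disclosure only fixes the frequencies. For streaming use, filter state would additionally have to be carried across frames.

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS = 48_000  # sampling rate from the earlier example

# Illustrative 4th-order Butterworth designs; only the cut-off / pass-band
# frequencies come from the description above.
_LOWPASS_SOS = butter(4, 225, btype="lowpass", fs=FS, output="sos")
_BANDPASS_SOS = butter(4, [1100, 1300], btype="bandpass", fs=FS, output="sos")

def make_target_audio(left_frame: np.ndarray, right_frame: np.ndarray):
    """Return (first target audio, second target audio) for one audio unit:
    low-pass filtered first channel and band-pass filtered second channel."""
    first_target = sosfilt(_LOWPASS_SOS, left_frame.astype(np.float64))
    second_target = sosfilt(_BANDPASS_SOS, right_frame.astype(np.float64))
    return first_target, second_target
```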
A second aspect of the present application provides a motor driving method, shown with reference to fig. 2, including:
s700, according to the method for identifying an audio scene provided in any of the above embodiments, identifying an audio scene corresponding to a currently played audio unit.
And S800, acquiring a corresponding vibration rule according to the audio scene.
The vibration rule records the vibration characteristics with which the motor is driven to vibrate. The vibration characteristics may include characteristics such as the amplitude of the vibration and/or the tendency of the vibration to change. When the vibration rule corresponding to each frame of audio unit is set, a maximum vibration sense flag bit TRIG_FLAG may be set to mark the vibration sense degree of the corresponding vibration signal; the maximum vibration sense flag bit TRIG_FLAG may take two flags, namely a first flag indicating that the corresponding vibration sense degree is high and a second flag indicating that the corresponding vibration sense degree is low, and the initial value of the maximum vibration sense flag bit TRIG_FLAG may be set to the second flag, indicating a low vibration sense degree. In one example, the first flag may be noted as 1 and the second flag as 0.
And S900, driving the motor to vibrate according to the vibration rule, and realizing the vibration effect corresponding to the currently played audio unit.
According to the motor driving method, the corresponding vibration rule is obtained by identifying the audio scene corresponding to the currently played audio unit, the motor is driven to vibrate according to the vibration rule, the vibration effect corresponding to the currently played audio unit is achieved, the perception signals of racing car game products are effectively enriched, and therefore a user can more comprehensively perceive various state changes of the control target in the scenes in the game process, and the motor driving method has higher participation sense.
In one embodiment, the vibration rules include a first vibration rule and a second vibration rule; the obtaining of the corresponding vibration rule according to the audio scene includes: if the audio unit of the current frame is generated in the first audio scene, determining a first vibration rule according to a first average value of the current frame and a first trigger threshold; and if the current frame audio unit is generated in the second audio scene, determining a second vibration rule according to the second frame number count of the current frame.
The embodiment can respectively determine the first vibration rule of the first audio scene and the second vibration rule of the second audio scene, and when each frame of audio unit is played, the motor is driven to vibrate according to the corresponding vibration rule, so that the corresponding racing car game product can provide vibration signals for the specific game scenes of the first audio scene and/or the second audio scene, and the vibration effect of the corresponding game product is further promoted.
Specifically, the determining the first vibration rule according to the first average value of the current frame and the first trigger threshold includes: if AVE_L(n) > MAX_THR, setting the amplitude of the current frame to the second amplitude value, and setting the maximum vibration sense flag bit to the first flag; if MOVING_THR < AVE_L(n) < MAX_THR, controlling the amplitude of the current frame according to the vibration sense climbing rule when the maximum vibration sense flag bit is the second flag and GAIN(n-1) < GAIN_MAX, and setting the amplitude of the current frame to the maximum amplitude of the motor when the maximum vibration sense flag bit is the first flag and GAIN(n-1) > GAIN_MAX; wherein AVE_L(n) represents the first average value of the current frame, and MAX_THR represents the maximum trigger threshold; the maximum vibration sense flag bit is used for marking the vibration sense degree, MOVING_THR represents the first trigger threshold, GAIN(n-1) represents the amplitude of the previous frame, and GAIN_MAX represents the maximum amplitude of the motor; the vibration sense climbing rule is a rule that sets the successive vibration amplitudes in order according to the amplitudes recorded in the climbing control matrix; the climbing control matrix records a plurality of amplitude values, and specifically may record a plurality of non-decreasing amplitude values so as to keep or increase the vibration sense.
And/or,
the determining a second vibration rule according to the second frame number count of the current frame includes: if BP_CNT(n) is greater than or equal to m, setting the amplitude of the current frame to the first amplitude value; wherein BP_CNT(n) represents the second frame number count of the current frame, and m represents the count threshold of the second audio scene, which may be set according to the corresponding action characteristics of the control target in the second game scene; for example, for the drift action it may be set to 6.
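A sketch of how the two vibration rules described above could be evaluated per frame; the flag encoding (1 for the first flag, 0 for the second flag) follows the earlier example, while the amplitude values are left as parameters because the disclosure does not reproduce them. The climbing branch only signals that the vibration sense climbing rule applies; a sketch of that rule is given further below.

```python
FLAG_FIRST, FLAG_SECOND = 1, 0   # marks of the maximum vibration sense flag bit TRIG_FLAG

def first_vibration_rule(ave_l, moving_thr, max_thr, trig_flag, prev_gain,
                         gain_max, second_amp, third_amp, first_amp):
    """Return (amplitude, trig_flag, climb) for a frame generated in the first audio scene;
    climb=True means the amplitude should come from the vibration sense climbing rule."""
    if ave_l > max_thr:
        return second_amp, FLAG_FIRST, False
    if moving_thr < ave_l < max_thr:
        if trig_flag == FLAG_SECOND and prev_gain < gain_max:
            return None, trig_flag, True                 # climb according to the climbing rule
        if trig_flag == FLAG_FIRST and prev_gain > gain_max:
            return gain_max, trig_flag, False            # hold the motor's maximum amplitude
    if ave_l > moving_thr and trig_flag == FLAG_FIRST and prev_gain > third_amp:
        return first_amp, trig_flag, False               # the additional case of the first rule
    return None, trig_flag, False                        # no rule matched in this sketch

def second_vibration_rule(bp_cnt, m, first_amp):
    """Second audio scene: trigger the first amplitude value once BP_CNT(n) >= m."""
    return first_amp if bp_cnt >= m else None
```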
The number of amplitude values recorded in the climbing control matrix can be set according to the desired vibration effect. In one example, the climbing control matrix records 8 amplitude values, which may be denoted as GAIN_STEP[8], with the specific values arranged in sequence.
specifically, the first vibration rule further includes:
if AVE_L(n) > MOVING_THR, the maximum vibration sense flag bit is the first flag, and GAIN(n-1) is greater than the third amplitude value, then the amplitude of the current frame is set to the first amplitude value.
The first amplitude value, the second amplitude value and the third amplitude value may be set according to the vibration sense required for the corresponding action characteristics, so as to improve the vibration effect.
Further, the controlling the amplitude of the current frame according to the vibration gradient rule includes:
and acquiring the climbing frame number count, looking up in the climbing control matrix the amplitude value ranked at the climbing frame number count, and determining the amplitude of the current frame according to the sum of the found amplitude value and GAIN(n-1).
In this embodiment, a climbing frame number count is maintained so that the amplitude value ranked at that count can be looked up in the climbing control matrix and the amplitude of the current frame determined from the value found, thereby ensuring that the vibration sense climbing rule is applied in an orderly manner.
Optionally, after controlling the amplitude of the current frame to increase according to the vibration sense slope climbing rule, the method further includes:
when GAIN(n) is greater than GAIN_MAX, setting the amplitude of the current frame to the maximum amplitude of the motor, setting the maximum vibration sense flag bit to the first flag, and incrementing the climbing frame number count; wherein GAIN(n) represents the amplitude of the current frame.
Optionally, the motor driving method further includes:
and if the number of climbing frames is greater than the threshold value of the climbing times, setting the number of the climbing frames as the threshold value of the climbing times.
The climbing times threshold can be set according to the number of amplitude values in the climbing control matrix, and is usually set to a value slightly smaller than that number; for example, when the climbing control matrix includes 8 amplitude values, the climbing times threshold can be set to 5, so that every value of the climbing frame number count can be matched to a corresponding amplitude value in the climbing control matrix for vibration control.
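A sketch of the vibration sense climbing step described above, assuming a placeholder GAIN_STEP[8] matrix (the original amplitude values are not reproduced in this text) and a climbing times threshold of 5; it follows the description literally, including incrementing the climbing frame number count when GAIN(n) exceeds GAIN_MAX.

```python
GAIN_STEP = [0.05, 0.05, 0.10, 0.10, 0.15, 0.15, 0.20, 0.20]  # placeholder non-decreasing values
CLIMB_LIMIT = 5                                               # climbing times threshold (example)

def climb(prev_gain: float, climb_cnt: int, gain_max: float):
    """One application of the vibration sense climbing rule.

    Returns (gain, flag_is_first, climb_cnt): the new amplitude, whether the
    maximum vibration sense flag should be set to the first flag, and the
    updated climbing frame number count."""
    gain = prev_gain + GAIN_STEP[climb_cnt]   # look up the value ranked at the climbing count
    flag_is_first = False
    if gain > gain_max:                       # cap at the motor's maximum amplitude
        gain = gain_max
        flag_is_first = True
        climb_cnt += 1                        # count this climbing frame
        if climb_cnt > CLIMB_LIMIT:
            climb_cnt = CLIMB_LIMIT
    return gain, flag_is_first, climb_cnt
```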
In a third aspect, the present application provides an audio scene recognition system, as shown in fig. 3, the audio scene recognition system includes:
a first obtaining module 100, configured to obtain audio data to be processed;
a dividing module 200, configured to divide the audio data to be processed into multiple frames of audio units that are consecutive in time sequence;
the filtering module 300 is configured to perform filtering processing on each frame of audio unit according to the band features corresponding to the audio scene to obtain a target audio;
a second obtaining module 400, configured to obtain a frame count and an energy average of each frame of audio unit in the target audio; the frame number count is used for representing the characteristics of a specific scene;
the determining module 500 is configured to compare the frame count or the energy average with feature thresholds corresponding to different audio scenes, and determine an audio scene corresponding to each frame of audio unit.
For specific limitations of the audio scene recognition system, reference may be made to the above limitations of the audio scene recognition method, which are not described herein again. The various modules in the above-described audio scene recognition system may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
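Purely as a software illustration of how these modules could be composed (the disclosure leaves each module open to software, hardware or a combination), the following skeleton mirrors the five modules; all injected callables are placeholders for the corresponding module logic.

```python
class AudioSceneRecognitionSystem:
    """Skeleton of the five modules; each injected callable stands in for the
    corresponding module's processing from steps S100-S500."""

    def __init__(self, acquire, segment, filter_bands, extract_features, judge):
        self.first_acquisition_module = acquire            # audio data to be processed
        self.segmentation_module = segment                 # consecutive frames of audio units
        self.filtering_module = filter_bands               # band filtering -> target audio
        self.second_acquisition_module = extract_features  # frame number count + energy mean
        self.judging_module = judge                        # comparison with feature thresholds

    def recognize(self, source):
        """Return the judged audio scene for every frame of the input audio."""
        frames = self.segmentation_module(self.first_acquisition_module(source))
        results = []
        for frame in frames:
            target_audio = self.filtering_module(frame)
            frame_count, mean_energy = self.second_acquisition_module(target_audio)
            results.append(self.judging_module(frame_count, mean_energy))
        return results
```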
The present application provides, in a fourth aspect, a motor drive system, as shown in fig. 4, comprising:
an identifying module 700, configured to identify an audio scene corresponding to a currently played audio unit according to the audio scene identifying system provided in any of the embodiments above;
a third obtaining module 800, configured to obtain a corresponding vibration rule according to the audio scene;
and the driving module 900 is configured to drive the motor to vibrate according to the vibration rule, so as to achieve a vibration effect corresponding to the currently played audio unit.
For specific limitations of the motor driving system, reference may be made to the above limitations of the motor driving method, which are not described herein again. The various modules in the motor drive system described above may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
The present application provides, in a fifth aspect, an electronic device comprising a processor and a storage medium; the storage medium having program code stored thereon; the processor is configured to call the program code stored in the storage medium to execute the method for identifying an audio scene according to any of the embodiments.
In one embodiment, the electronic device further comprises a motor; the processor is further used for executing the motor driving method provided by any one of the above embodiments.
Specifically, referring to fig. 5, the electronic device further includes a motor driving chip, and when the electronic device plays each frame of audio unit, the processor may control the motor driving chip to drive the motor to vibrate according to a vibration rule corresponding to each frame of audio unit.
Furthermore, the electronic device may further include an audio power amplifier, a speaker, and other components to play each frame of audio unit. At this time, as shown in fig. 6, in the working process of the electronic device, firstly, audio data of the car racing game is obtained, after it is determined that audio data to be processed is generated in the first audio scene and/or the second audio scene, a vibration rule corresponding to each frame of audio unit is obtained, each frame of audio unit is played through an audio power amplifier and a speaker, and simultaneously, a motor driving chip is controlled to drive a motor to vibrate according to the vibration rule corresponding to each frame of audio unit.
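Tying the pieces together, a hedged per-frame sketch of the working flow described for fig. 6, reusing the illustrative helpers defined in the earlier sketches; the motor_driver object and its vibrate() call stand in for the motor driving chip interface and are not an actual API, and the climbing branch of the first vibration rule is omitted for brevity.

```python
# Reuses the illustrative helpers defined in the earlier sketches of this description.
def process_frame(left_frame, right_frame, state, motor_driver):
    """Per-frame flow: filter, extract features, judge the scene, and drive the motor."""
    first_target, second_target = make_target_audio(left_frame, right_frame)
    ave_l, ave_r = energy_mean(first_target), energy_mean(second_target)

    state["gain_cnt"] = update_gain_cnt(state["gain_cnt"], ave_l)
    state["bp_cnt"] = update_bp_cnt(state["bp_cnt"], ave_r)
    moving_thr = first_trigger_threshold(state["gain_cnt"])

    amplitude = None
    if in_second_audio_scene(state["bp_cnt"]):
        amplitude = second_vibration_rule(state["bp_cnt"], m=6, first_amp=0.6)
    elif in_first_audio_scene(ave_l, moving_thr):
        amplitude, state["flag"], _ = first_vibration_rule(
            ave_l, moving_thr, max_thr=7000,           # 7000: example maximum trigger threshold
            trig_flag=state["flag"], prev_gain=state["gain"],
            gain_max=1.0, second_amp=0.9, third_amp=0.5, first_amp=0.6)  # placeholder amplitudes

    if amplitude is not None:
        state["gain"] = amplitude
        motor_driver.vibrate(amplitude)                # drive the motor while the frame plays

# Example state, initialised once before streaming frames:
# state = {"gain_cnt": 0, "bp_cnt": 0, "flag": 0, "gain": 0.0}
```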
The electronic equipment can also be called as terminal equipment, can provide vibration signals aiming at the first audio scene and/or the second audio scene of the racing car game, and effectively enriches the user perception signals of the game products, so that the user can more comprehensively perceive various state changes of the control target in the scenes in the game process, and the user experience is effectively improved.
Although the application has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art based upon a reading and understanding of this specification and the annexed drawings. This application is intended to embrace all such modifications and variations and is limited only by the scope of the appended claims. In particular regard to the various functions performed by the above described components, the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., that is functionally equivalent), even though not structurally equivalent to the disclosed structure which performs the function in the herein illustrated exemplary implementations of the specification.
That is, the above description is only an embodiment of the present application, and not intended to limit the scope of the present application, and all equivalent structures or equivalent flow transformations made by using the contents of the specification and the drawings, such as mutual combination of technical features between various embodiments, or direct or indirect application to other related technical fields, are included in the scope of the present application.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
The previous description is provided to enable any person skilled in the art to make and use the present application. In the foregoing description, various details have been set forth for the purpose of explanation. It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known processes have not been described in detail so as not to obscure the description of the present application with unnecessary detail. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.