
CN114727212B - Audio processing method and electronic equipment - Google Patents

Audio processing method and electronic equipment

Info

Publication number
CN114727212B
Authority
CN
China
Prior art keywords
signal
sound
earphone
audio
audio signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210231526.5A
Other languages
Chinese (zh)
Other versions
CN114727212A (en)
Inventor
杨昭
韩欣宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Honor Device Co Ltd
Original Assignee
Beijing Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Honor Device Co Ltd filed Critical Beijing Honor Device Co Ltd
Priority to CN202210231526.5A priority Critical patent/CN114727212B/en
Publication of CN114727212A publication Critical patent/CN114727212A/en
Application granted granted Critical
Publication of CN114727212B publication Critical patent/CN114727212B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation
    • H04S7/304 For headphones
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0324 Details of processing therefor
    • G10L21/034 Automatic adjustment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/15 Aspects of sound capture and related signal processing for recording or reproduction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)
  • Headphones And Earphones (AREA)

Abstract

An audio processing method and an electronic device are provided. In transparent transmission (pass-through) mode, the method captures ambient sound through the left and right earphones, derives azimuth information of the ambient sound from the two captured audio signals, retrieves the spatial cues corresponding to that azimuth information after the left and right earphones have each passed the captured ambient sound through, and superimposes those spatial cues on the pass-through audio signals of the left and right earphones. The spatial orientation of the audio is therefore preserved after pass-through, and a user wearing the earphones can still identify the specific direction of a sound in the environment.

Description

Audio processing method and electronic equipment
Technical Field
The present application relates to the field of terminal technologies, and in particular, to an audio processing method and an electronic device.
Background
Nowadays, wireless Bluetooth headsets have become common electronic devices in daily life. When a Bluetooth headset is worn, its isolation makes the wearer less sensitive to external sound. Sometimes, however, we still want to hear external sound clearly while wearing the headset: pedestrians need to notice car horns on the road, and bus passengers need to hear the stop announcements, so the isolation of a Bluetooth headset from external sound is undoubtedly inconvenient. This is where the transparent transmission (pass-through) function comes in. To let the wearer of an earphone hear the ambient sound of real life while wearing it, some earphones now provide a pass-through function. After the function is turned on, the user can hear external sound even with a Bluetooth headset on.
However, after an existing Bluetooth headset passes ambient audio signals through, the sound played to the wearer can confuse the wearer's perception of the surrounding space, which threatens the user's safety and degrades the user experience.
Therefore, preserving the spatial information of ambient sound during pass-through is an important problem for those skilled in the art.
Disclosure of Invention
The present application aims to provide an audio processing method, a Graphical User Interface (GUI), and an electronic device. The electronic device comprises a left earphone and a right earphone; it can collect ambient sound through both earphones, derive azimuth information of the ambient sound from the two collected audio signals, and superimpose that azimuth information on the audio signal obtained after the ambient sound is passed through. The spatial orientation of the ambient sound is therefore preserved after pass-through, so a user wearing the earphones can still identify the specific direction of a sound in the environment, for a better user experience.
The above and other objects are achieved by the features of the independent claims. Further implementations are presented in the dependent claims, the description and the drawings.
In a first aspect, a method for processing audio is provided, where the method includes: acquiring a first signal and a second signal, wherein the first signal is an audio signal acquired by acquiring sound of an external environment through a first earphone, and the second signal is an audio signal acquired by acquiring the sound of the external environment through a second earphone; respectively carrying out transparent transmission processing on the first signal and the second signal to obtain a first audio signal and a second audio signal; determining orientation information of a sound of the external environment from the first signal and the second signal; and respectively adjusting the first audio signal and the second audio signal according to the azimuth information to obtain a first target signal and a second target signal.
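As a rough illustration of how these four steps relate, the sketch below wires them together for one block of samples; the helper names (`passthrough`, `estimate_orientation`, `apply_spatial_cues`) are placeholders for the processing described in this application, not functions it defines.

```python
import numpy as np

def process_frame(first_signal: np.ndarray, second_signal: np.ndarray,
                  passthrough, estimate_orientation, apply_spatial_cues):
    """Illustrative flow of the method for one block of microphone samples.

    first_signal / second_signal: ambient sound captured by the first and
    second earphone microphones.  The three callables stand in for the
    pass-through processing, orientation estimation and cue superposition
    steps described in the text.
    """
    # Step 1: pass-through processing of each captured signal.
    first_audio = passthrough(first_signal)
    second_audio = passthrough(second_signal)

    # Step 2: orientation information is derived from the raw captured
    # signals, i.e. before pass-through removes the spatial cues.
    orientation = estimate_orientation(first_signal, second_signal)

    # Step 3: re-impose the spatial cues on the pass-through output.
    first_target, second_target = apply_spatial_cues(first_audio,
                                                     second_audio,
                                                     orientation)
    return first_target, second_target
```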
In the method, the sense of direction of the first audio signal and the second audio signal is eliminated once the audio has been passed through. The method therefore determines the spatial orientation information of the audio signals before pass-through and then restores the sense of direction to the audio that lost it during pass-through, so that the resulting first target signal and second target signal retain the spatial orientation of the original audio (i.e. the first signal and the second signal).
By implementing the method of the first aspect, the user can identify the specific direction of a sound in the environment even while wearing the headset after the audio has been passed through, providing a better user experience.
With reference to the first aspect, in a possible implementation manner, the first target signal is played through the first earphone, the second target signal is played through the second earphone, a playing time of the first target signal is different from a playing time of the second target signal, and loudness of the first target signal is different from loudness of the second target signal.
Humans can perceive the three-dimensional nature of sound and localize sound sources because the two ears sense differences in the arrival time and volume of the sound. In this embodiment, by restoring the differences in time and loudness that the ambient sound would produce at the two ears in the open-ear state, the first target signal and the second target signal give the user the original sense of direction of the ambient sound.
With reference to the first aspect, in a possible implementation manner, a distance between the first headphone and an environment sound source is smaller than a distance between the second headphone and the environment sound source, the environment sound source is a sound source of sound of the external environment, a playing time of the first target signal is before a playing time of the second target signal, and loudness of the first target signal is greater than loudness of the second target signal.
Because the distance from the sound source to the left ear differs from its distance to the right ear, there is a slight difference in the time at which the sound signal reaches the two ears, and this time difference helps a person determine the horizontal position of the sound source. In addition, in practice a sound wave is often blocked and cannot propagate and diffuse freely, so an acoustic shadow is created, for example when the wave meets a building or a person's head; as a result, the left ear and the right ear hear different loudness and receive different frequency distributions. Signals originating from different locations therefore arrive at each ear at different times and with different loudness. Accordingly, in this implementation, when the first earphone is closer to the sound source, the ambient sound is collected by the first earphone earlier than by the second earphone, and the loudness of the ambient sound collected by the first earphone is greater than the loudness collected by the second earphone.
With reference to the first aspect, in a possible implementation manner, the first earphone is a left earphone, the second earphone is a right earphone, the azimuth information includes binaural azimuth information and monaural azimuth information, the monaural azimuth information includes left ear information and right ear information, and the adjusting the first audio signal and the second audio signal according to the azimuth information to obtain a first target signal and a second target signal includes: adjusting loudness and relative time delay of the first audio signal and the second audio signal according to the binaural azimuth information to obtain a third audio signal and a fourth audio signal; filtering the third audio signal through the left ear information to obtain the first target signal; and filtering the fourth audio signal through the right ear information to obtain the second target signal.
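A minimal sketch of this adjustment, assuming the binaural azimuth information is expressed as a relative delay in samples plus a level difference in dB, and the monaural (left-ear/right-ear) information as FIR filters; these representations and the symmetric gain split are assumptions, not details given in the application.

```python
import numpy as np

def apply_spatial_cues(left_audio, right_audio,
                       delay_samples, level_diff_db,
                       left_ear_fir, right_ear_fir):
    """Re-impose binaural (delay + loudness) and monaural (filter) cues.

    delay_samples > 0 delays the right channel relative to the left;
    level_diff_db > 0 makes the left channel louder than the right.
    left_ear_fir / right_ear_fir stand in for the per-ear (pinna) filters.
    """
    # Binaural cue 1: relative time delay between the two channels.
    if delay_samples >= 0:
        right_audio = np.concatenate(
            [np.zeros(delay_samples), right_audio])[:len(right_audio)]
    else:
        left_audio = np.concatenate(
            [np.zeros(-delay_samples), left_audio])[:len(left_audio)]

    # Binaural cue 2: level difference, split symmetrically between channels.
    gain = 10.0 ** (level_diff_db / 40.0)
    third_audio = left_audio * gain     # the louder channel
    fourth_audio = right_audio / gain   # the quieter channel

    # Monaural cues: per-ear filtering with the left/right ear information.
    first_target = np.convolve(third_audio, left_ear_fir, mode="same")
    second_target = np.convolve(fourth_audio, right_ear_fir, mode="same")
    return first_target, second_target
```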
It should be understood that a person can localize sound both in the vertical direction and in the horizontal direction, thanks to monaural and binaural information. Because sound reaches a person's two ears with corresponding amplitude and time differences, this difference in how the two ears perceive the same sound source is generally called "binaural information" (also referred to as "binaural cues" in some embodiments of the present application), and it is the key to a human's ability to judge the orientation of a sound in the horizontal direction. In addition, the human ear can recognize the specific position of a sound in the vertical direction because the shape of the auricle changes the intensity of different frequency regions of the sound wave: sounds arriving from above and from below are reflected by the auricle into the ear canal differently, so their spectra are differentiated. In fact, the reflection and diffraction of an external sound wave at the auricle can be regarded as a filtering system, and the sound signals in front of the two eardrums are obtained by passing the external sound wave through this filter. This filter may be referred to as "monaural information" (also referred to as "monaural cues" in some embodiments of the present application).
Therefore, by implementing the method provided in this embodiment, the original horizontal and vertical sense of direction of the ambient sound can be restored after pass-through, so that the user can identify the specific direction of a sound in the environment both horizontally and vertically, for a better user experience.
With reference to the first aspect, in a possible implementation manner, the determining, according to the first signal and the second signal, azimuth information of the sound of the external environment includes: respectively performing spectrum analysis on the first signal and the second signal to obtain N sub-bands of the first signal and the second signal; determining binaural cross-correlation coefficients of subbands in the same frequency band of the N subbands of the first signal and the second signal; determining orientation information of a sound of the external environment from the first signal and the second signal in case the number of target subband pairs is larger than a first threshold, the target subband pairs being two subbands of the N subbands of the first signal and the N subbands of the second signal for which binaural cross-correlation coefficients are smaller than a second threshold.
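A sketch of this gating step, assuming the sub-bands are obtained by slicing an FFT spectrum into equal-width bands and the correlation is evaluated at zero lag; the band count and both thresholds are illustrative values, not ones specified in the application.

```python
import numpy as np

def band_signals(sig, n_subbands):
    """Split a signal into n_subbands band-limited time-domain components."""
    spec = np.fft.rfft(sig)
    edges = np.linspace(0, len(spec), n_subbands + 1, dtype=int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spec)
        masked[lo:hi] = spec[lo:hi]
        bands.append(np.fft.irfft(masked, n=len(sig)))
    return bands

def orientation_worth_estimating(first_sig, second_sig, n_subbands=24,
                                 iacc_threshold=0.95, count_threshold=4):
    """Count the sub-band pairs whose cross-correlation coefficient is low.

    Orientation information is only determined when the number of such
    "target sub-band pairs" exceeds count_threshold (the first threshold in
    the claim); iacc_threshold plays the role of the second threshold.
    """
    target_pairs = 0
    for b1, b2 in zip(band_signals(first_sig, n_subbands),
                      band_signals(second_sig, n_subbands)):
        denom = np.sqrt(np.sum(b1 ** 2) * np.sum(b2 ** 2))
        if denom == 0:
            continue
        # Zero-lag normalised correlation used as a simple stand-in for the
        # binaural cross-correlation coefficient of this sub-band pair.
        coeff = abs(np.sum(b1 * b2)) / denom
        if coeff < iacc_threshold:
            target_pairs += 1
    return target_pairs > count_threshold
```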
It is understood that when the difference between a sound source's distances (or relative positions) to a person's left ear and right ear is small, the differences in time and volume that the sound produces at the two ears are also small; when the differences are small enough, especially when the sound source is directly in front of or directly behind the person, the person cannot perceive them, and the sound then carries no "sense of direction" for that person. In addition, in a real environment the ambient sound is usually produced not by a single sound source but jointly by several sound sources in different directions. The earphone, however, cannot tell when collecting the ambient audio that there are multiple sound sources in the environment, nor that the collected audio signal is in fact produced by several sources together. In acoustics and biology, when the frequencies of two tones fall within one sub-band, a person hears them as a single tone. More generally, if the frequency content of a complex signal lies within one sub-band, the human ear perceives it as equivalent to a simple signal whose frequency is the sub-band's centre frequency; this is the core meaning of a sub-band. In brief, a sub-band is a frequency range within which a signal with a spread spectrum may be replaced by a single-frequency component. Therefore, when the earphone collects the audio signal produced by multiple sources in the environment, that signal appears to the earphone as if it could have been produced by a single equivalent source. The IACC that the mixed audio from the multiple sources produces at the two ears can only be obtained by jointly analyzing the audio signals collected by the left and right earphones.
In this embodiment, the first signal and the second signal are analyzed to determine whether the listener could perceive the directionality of the sound produced by the sources in the environment. Spatial cues are superimposed on the passed-through sound only when that directionality is perceptible; when it is not, no spatial cue needs to be superimposed, which further saves the device's energy.
With reference to the first aspect, in a possible implementation manner, the determining, according to the first signal and the second signal, azimuth information of the sound of the external environment includes: determining a binaural time difference of arrival between the first signal and the second signal from the first signal and the second signal; determining orientation information of the sound of the external environment based on the time difference of arrival at both ears.
With reference to the first aspect, in a possible implementation manner, the determining, according to the first signal and the second signal, azimuth information of the sound of the external environment includes: determining a binaural strength difference of arrival between the first signal and the second signal from the first signal and the second signal; determining orientation information of the sound of the external environment based on the difference in intensity of arriving at both ears.
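For the two implementations above, the time difference and intensity difference could be estimated along the following lines; cross-correlation over a ±1 ms lag window and an RMS level ratio are common choices and are used here only as assumptions.

```python
import numpy as np

def estimate_itd_ild(first_sig, second_sig, sample_rate, max_itd_s=0.001):
    """Estimate the arrival time and intensity differences between two ears.

    A positive ITD means the sound reached the first earphone earlier;
    a positive ILD (in dB) means it was louder at the first earphone.
    """
    # Time difference: lag of the cross-correlation peak within +/- 1 ms.
    max_lag = int(max_itd_s * sample_rate)
    lags = np.arange(-max_lag, max_lag + 1)
    corr = [np.sum(first_sig * np.roll(second_sig, -k)) for k in lags]
    itd_s = lags[int(np.argmax(corr))] / sample_rate

    # Intensity difference: ratio of the RMS levels of the two signals.
    rms1 = np.sqrt(np.mean(first_sig ** 2))
    rms2 = np.sqrt(np.mean(second_sig ** 2))
    ild_db = 20.0 * np.log10(rms1 / rms2) if rms1 > 0 and rms2 > 0 else 0.0
    return itd_s, ild_db
```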
With reference to the first aspect, in a possible implementation manner, the performing pass-through processing on the first signal and the second signal respectively to obtain a first audio signal and a second audio signal includes: the first earphone respectively conducts transparent transmission processing on the first signal and the second signal to obtain a first audio signal and a second audio signal; the determining the orientation information of the sound of the external environment according to the first signal and the second signal comprises: the first earphone determines the direction information of the sound of the external environment according to the first signal and the second signal; the adjusting the first audio signal and the second audio signal according to the orientation information to obtain the first target signal and the second target signal includes: and the first earphone adjusts the first audio signal and the second audio signal respectively according to the azimuth information to obtain the first target signal and the second target signal.
It should be understood that the first earphone and the second earphone may differ in performance, and their performance determines whether the steps of audio pass-through, spatial cue superposition and so on are performed by the first earphone or by the second earphone.
In one embodiment, only the first earphone of the pair deploys the pass-through model provided by the present application for retaining spatial orientation information, while the second earphone does not; the steps of audio pass-through, spatial cue superposition and so on are therefore performed by the first earphone. As a result, only one of the two earphones needs a high-performance processor and memory, which saves cost.
With reference to the first aspect, in a possible implementation manner, the performing pass-through processing on the first signal and the second signal respectively to obtain a first audio signal and a second audio signal includes: the first earphone conducts transparent transmission processing on the first signal to obtain a first audio signal, and the second earphone conducts transparent transmission processing on the second signal to obtain a second audio signal; the determining the orientation information of the sound of the external environment according to the first signal and the second signal comprises: the first earphone and the second earphone determine orientation information of the sound of the external environment according to the first signal and the second signal; the adjusting the first audio signal and the second audio signal according to the orientation information to obtain the first target signal and the second target signal includes: the first earphone adjusts the first audio signal according to the azimuth information to obtain the first target signal; and the second earphone adjusts the second audio signal according to the azimuth information to obtain the second target signal.
In another embodiment, the pass-through model that retains spatial orientation information is deployed in both the first earphone and the second earphone, so each earphone can independently complete the steps of audio pass-through, spatial cue superposition and so on. This reduces the communication overhead between the two earphones, makes the audio processing more timely, and lets the user hear the ambient sound from the earphones sooner, for a better experience.
In a second aspect, an embodiment of the present application provides an electronic device, where the electronic device includes: one or more processors, memory; the memory coupled with the one or more processors is configured to store computer program code comprising computer instructions that are invoked by the one or more processors to cause the electronic device to perform the first aspect or the method of any possible implementation of the first aspect.
In a third aspect, a chip system is provided, which is applied to an electronic device, and includes one or more processors, where the processors are configured to invoke computer instructions to cause the electronic device to execute any one of the implementation manners described in the first aspect, or any one of the implementation manners described in the second aspect.
In a fourth aspect, a computer program product comprising instructions that, when run on an electronic device, cause the electronic device to perform any one of the possible implementations of the first aspect, or any one of the possible implementations of the second aspect.
In a fifth aspect, a computer-readable storage medium is provided, which includes instructions that, when executed on an electronic device, cause the electronic device to perform any one of the possible implementations of the first aspect or any one of the possible implementations of the second aspect.
The advantageous effects of the technical solutions provided in the second to fifth aspects of the present application can refer to the advantageous effects of the technical solutions provided in the first aspect, and are not described herein again.
Drawings
Fig. 1A is a schematic diagram of a time length difference between human ears and an audio signal received according to an embodiment of the present application;
fig. 1B is a schematic diagram of sound pressure perception of human ears on an audio signal according to an embodiment of the present application;
fig. 2 is a schematic diagram of an audio playing system according to an embodiment of the present application;
fig. 3 is a schematic diagram of a transparent transmission model provided in an embodiment of the present application;
fig. 4 is a schematic diagram of a reference point distribution location provided in an embodiment of the present application;
fig. 5 is a schematic view of a usage scenario of an earphone according to an embodiment of the present application;
fig. 6 is a schematic diagram of an audio change process provided by an embodiment of the present application;
fig. 7 is a schematic diagram of a transparent transmission model provided in an embodiment of the present application;
fig. 8 is a schematic diagram illustrating integration of a spatial cue library according to an embodiment of the present application;
FIG. 9 is a functional diagram of a spatial cue library according to an embodiment of the present application;
FIG. 10 is a schematic illustration of some of the user interfaces provided by embodiments of the present application;
fig. 11 is a flowchart of an audio processing method according to an embodiment of the present application;
fig. 12 is a schematic view of a sound wave transmitted in a space according to an embodiment of the present application;
FIG. 13 is a schematic diagram of a variation process of a first signal and a second signal provided by an embodiment of the present application;
fig. 14 is a schematic diagram of a workflow of a first headset and a second headset according to an embodiment of the present application;
fig. 15 is a schematic diagram of a workflow of a first earphone and a second earphone according to an embodiment of the present application;
fig. 16 is a schematic diagram of a workflow of a first earphone, a second earphone and a terminal device according to an embodiment of the present application;
fig. 17 is a schematic diagram of an attitude division according to an embodiment of the present application;
fig. 18 is a flowchart of an audio processing method according to an embodiment of the present application;
FIG. 19 is a graph of frequency spectra provided by an embodiment of the present application;
fig. 20 is a schematic structural diagram of a terminal device 100 according to an embodiment of the present application;
fig. 21 is a schematic structural diagram of an earphone 200 according to an embodiment of the present application.
Detailed Description
In the description of the embodiments of the present application, when referring to the ordinal numbers "first", "second", etc., it should be understood that they are used for distinguishing only, unless they are actually intended to express an order according to the context. The words "exemplary" or "such as" are used herein to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "such as" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
Unless otherwise indicated, "/" herein generally indicates that the former and latter associated objects are in an "or" relationship, e.g., a/B may represent a or B. The term "and/or" is merely an associative relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, in the description of the present application, "a plurality" means two or more. The terminology used in the following embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application.
As used in the specification of the present application and the appended claims, the singular forms "a", "an", "the" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the listed items.
For ease of understanding, the following description will first refer to related terms related to embodiments of the present application.
(1) Transparent transmission function
In daily life, a Bluetooth headset makes the wearer less sensitive to external sounds. However, even when wearing a Bluetooth headset, the user may not want external sounds to be blocked out. For example, when shopping in a supermarket with wireless earphones on, the user has to take one earphone off to hear clearly when a sales promoter makes an announcement or the cashier speaks to them. For another example, pedestrians need to notice car horns on the road, and bus passengers need to hear the stop announcements; in these cases, the isolation of the Bluetooth headset from external sound is undoubtedly inconvenient.
This is where the transparent transmission (pass-through) function comes in. With the rapid development of True Wireless Stereo (TWS) earphones and of audio technologies related to Augmented Reality (AR) and Virtual Reality (VR), pass-through algorithms have emerged that let the wearer of an electronic device such as an earphone hear the ambient sounds of real life at the same time. The pass-through function is implemented on the basis of such an algorithm. After pass-through mode is turned on, the user can hear external sound even while wearing a Bluetooth headset, so that when external sound needs attention, it can be handled naturally without repeatedly taking the earphones off and putting them back on.
It should be understood that "transparent transmission function" is merely a name used in the embodiments of the present application; its meaning has been described in these embodiments, and the name itself does not limit the embodiments in any way.
(2) Time difference of arrival at the two ears (ITD) and intensity difference of arrival at the two ears (ILD)
Just as binocular focusing creates stereoscopic visual perception and lets us locate the position of an object, humans can use the differences in time and volume of the sound received by the two ears, together with pinna effects, to perceive the three-dimensional nature of sound and locate its source. Time and volume act as the coordinates for localizing sound.
Fig. 1A is a schematic diagram of the difference in the time at which the two human ears receive an audio signal according to an embodiment of the present application. As shown in fig. 1A, a tester 101 and a sound source 102 are in the same environment. Viewed from above the head looking down, the sound source 102 is located to the left of the tester 101. In fig. 1A, the lines and curves with arrows roughly indicate the propagation paths of the audio signal generated by the sound source 102 to the left and right ears of the tester 101. As can be seen, because the sound source 102 is closer to the tester's left ear, the path travelled by its audio signal to the left ear is shorter than the path to the right ear. Since the propagation speed of sound in air is constant, the sound generated by the sound source 102 therefore reaches the tester's left ear first and then the right ear, and the time difference with which the sound reaches the two ears is called the time difference of arrival at the two ears (ITD), one of the cues used to judge whether a sound comes from the left or the right.
Since the distance from the sound source to the left ear generally differs from the distance to the right ear (though the two may be equal), there is a slight difference in the time at which the sound signal reaches the two ears. This time difference helps us determine the horizontal position of the sound source, and for low-frequency sources in particular the brain can localize more accurately. The time difference depends on the horizontal position of the source relative to the listener: the greater the difference between the source's distances to the left and right ears, the greater the time difference. If a sound comes directly from the left, it arrives at the left ear approximately 0.6 to 0.8 milliseconds (0.0006 to 0.0008 seconds) earlier than at the right ear. Although unimaginably short, this is enough for the human brain to discern the horizontal direction of the sound.
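As a quick sanity check on the 0.6-0.8 ms figure, dividing an assumed extra path length around the head by the speed of sound lands in the same range (the 0.22 m path difference is an illustrative value, not one from the text):

```python
SPEED_OF_SOUND_M_S = 343.0   # speed of sound in air at roughly 20 °C

# Assumed extra distance the wavefront travels around the head to reach the
# far ear when the sound arrives directly from one side (illustrative only).
path_difference_m = 0.22

itd_ms = path_difference_m / SPEED_OF_SOUND_M_S * 1000.0
print(f"ITD ≈ {itd_ms:.2f} ms")   # ≈ 0.64 ms, inside the 0.6-0.8 ms range
```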
In addition, in practice a sound wave is often blocked and cannot propagate and diffuse freely, so an acoustic shadow is created, for example when the wave meets a building or a person's head; as a result, the left ear and the right ear hear different loudness and receive different frequency distributions. Signals originating from different locations arrive at each ear at different times and differ in amplitude because of the attenuation caused by the head obstruction, commonly referred to as the "head shadow". Fig. 1B is a schematic diagram of the sound pressure perceived by the human ears for an audio signal according to an embodiment of the present application. As shown in fig. 1B, the tester 101 and a sound source 103 are in the same environment, and viewed from above the head looking down, the sound source 103 is located to the left of the tester 101. Because the left and right ears are separated by the head, the left ear, which is closer to the sound source 103, receives a greater sound intensity (volume) than the right ear. This phenomenon of the head blocking sound is called the head shadow effect. The difference in sound pressure (or sound level) that the same sound source produces at a person's two ears is called the intensity difference of arrival at the two ears (ILD).
Studies have shown that higher-frequency sounds are more susceptible to the head shadow effect: the higher the frequency, the more vibrations per second in the air and the shorter the wavelength, so the head shadow effect becomes more significant. Conversely, because the wavelength of low-frequency sound is long, even exceeding the width of the head, the binaural intensity difference is typically small. Research shows that the azimuth of sound above 2000 Hz is usually judged by means of the binaural intensity difference, and the difference in high-frequency sound received by the two ears can reach 8-10 dB at most.
In the 1.5-4.0 kHz range, level difference and time difference jointly contribute to sound source localization, while for f > 5.0 kHz the binaural level difference is the main localization factor, complemented by the time difference. Together, the binaural time difference and level difference cover the entire audible frequency range, so a human can recognize the direction of a sound from the time difference and the sound level difference.
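These ranges suggest a simple rule of thumb for which cue dominates at a given frequency; the sketch below merely encodes the bands quoted above, and how the gaps between them are handled is an assumption.

```python
def dominant_horizontal_cue(frequency_hz: float) -> str:
    """Name the cue that dominates horizontal localisation at a frequency."""
    if frequency_hz < 1500.0:
        return "binaural time difference (ITD)"
    if frequency_hz <= 4000.0:
        return "time difference and level difference jointly"
    return "binaural level difference (ILD), complemented by ITD"
```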
(3) Binaural Cross-correlation coefficient (IACC)
The IACC is a measure of the similarity of the sound pressures reaching the two ears at a given moment. Studies have shown that in the free-field case a single sound source produces a high IACC (close to 1) at the two ears, whereas in a diffuse field the IACC at the two ears is low. The lower the IACC, the stronger the listener's subjective sense of "spaciousness" and "envelopment".
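The IACC is commonly computed as the maximum normalized cross-correlation of the two ear signals over lags of about ±1 ms; the sketch below follows that conventional definition (the application itself does not give a formula).

```python
import numpy as np

def iacc(left, right, sample_rate, max_lag_s=0.001):
    """Maximum normalised cross-correlation of two ear signals over ±1 ms."""
    denom = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    if denom == 0:
        return 0.0
    max_lag = int(max_lag_s * sample_rate)
    return max(abs(np.sum(left * np.roll(right, -k)))
               for k in range(-max_lag, max_lag + 1)) / denom
```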
(4) Head and Torso Simulator (HATS), artificial head measurement system (HMS), and Knowles Electronics Manikin for Acoustic Research (KEMAR)
HATS is a mannequin supplied by Brüel & Kjær with a built-in mouth simulator and calibrated ear simulators that provides a realistic representation of the acoustic properties of an average adult head and torso. It is well suited for in-situ electro-acoustic testing of telephones, headsets, audio conferencing equipment, microphones, earpieces, hearing aids, and hearing protectors alike.
HMS is an artificial head measurement system provided by HEAD Acoustics, with an ear simulator and a full-band artificial mouth that meet the IEC 60318-4 standard. It is well suited for measuring transducers worn at the ear in mobile phones, earphones, hearing protectors and hearing aids. By realistically replicating all acoustically relevant structures of the human anatomy, HMS also allows the measurement of transducers located away from the ear, such as hands-free devices. HMS II.3 is equipped with a right-ear impedance simulator and an artificial mouth, both meeting the requirements of the ITU-T P.57 and P.58 recommendations. The mouth reproduces the complete spectrum of the human voice, allowing ultra-wideband and full-band measurements in the transmit direction.
The KEMAR head and torso simulator was introduced by Knowles in 1972 and rapidly became the industry standard for hearing aid manufacturers and research audiologists. The KEMAR produced by GRAS has the same size and acoustic properties as the original 1972 KEMAR and is 100% backward compatible. When equipped with a pinna simulator, an ear canal extension and an IEC 60318-4 ear simulator, KEMAR closely mimics the acoustic properties of the human ear. KEMAR is built on a large statistical study of the human body, meaning that it has the same acoustic properties as an average human, including typical facial features. It therefore provides acoustic diffraction in both the near and far fields similar to that encountered around an average human head and torso, and its anthropometric shape makes it more realistic than other measurement manikins.
The artificial-head standards mentioned above were formed by statistical optimization of data from a large number of real people, so averaging the data measured on the three artificial heads covers the actual head characteristics of most people.
(5) Fast Fourier Transform (FFT) algorithm, inverse Fast Fourier Transform (IFFT) algorithm
The FFT is a fast algorithm for the discrete Fourier transform and transforms a signal from the time domain to the frequency domain. The IFFT is the inverse fast Fourier transform corresponding to the FFT and transforms a signal from the frequency domain back to the time domain.
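A short round trip with NumPy's FFT routines illustrates the two transforms (NumPy is used here purely as an example implementation):

```python
import numpy as np

t = np.arange(1024) / 48000.0            # 1024 samples at 48 kHz
x = np.sin(2 * np.pi * 440.0 * t)        # a time-domain test signal
X = np.fft.fft(x)                        # FFT: time domain -> frequency domain
x_back = np.fft.ifft(X).real             # IFFT: frequency domain -> time domain
assert np.allclose(x, x_back)            # the round trip recovers the signal
```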
With the vigorous development of TWS earphones and AR/VR-related audio technologies, most of these electronic devices now implement a pass-through function, so that the wearer can hear the audio played by a device connected to the earphones and also hear the ambient sounds produced by other sources in the environment. As shown in fig. 2, the headset 201 is a Bluetooth headset supporting the pass-through function and wireless communication, and is connected to the terminal 202 through Bluetooth. There are also multiple sound sources in the user's environment, including sources 203-205; the sounds they emit can be picked up as audio signals by a microphone on the headset 201, passed through, and played by a speaker on the headset. Assuming that the sound sources 203-205, among others, are continuously emitting sound and the user is playing music through the terminal 202, then when the user wears the headset 201, the user hears both the music transmitted from the terminal 202 over Bluetooth and the ambient sound reproduced from the ambient audio signals collected by the headset.
Fig. 3 is a schematic diagram of a pass-through model provided in an embodiment of the present application; current TWS-like earphones implement the pass-through function on the basis of this kind of pass-through system. As shown in fig. 3, the pass-through model 303 may be determined by an open-ear transfer function model 301 and a passive transfer function model 302. In addition, because users have different wearing habits, how tightly the earphone fits affects how the user hears the sound played by the earphone. Thus, in some embodiments, the pass-through model may also be determined in conjunction with a leakage model 304 that reflects how tightly the user wears the earphone.
In order to further describe the multiple models involved in the transparent transmission model 303, an exemplary embodiment of the present application further provides a schematic diagram of the distribution positions of the reference points. As shown in fig. 4, the dark part of fig. 4 is a schematic cross-sectional view of a human ear, and the tympanic membrane 403 in the human ear can divide the ear canal into an external auditory canal 404 and an internal (middle) ear canal. In the region of the external ear canal 404 near the pinna of the ear, we can determine an ear reference point 4041; in addition, in the vicinity of the external ear canal 404 near the tympanic membrane 403, a tympanic membrane reference point 4031 may be determined.
In the art, a microphone may be placed at the ear reference point 4041 to capture an audio signal (hereinafter denoted u_open(t)); the audio signal u_open(t) reflects the ambient sound initially collected by a real human ear when no earphone is worn. It will be appreciated that, after being collected at the ear reference point 4041, the sound may also change while it travels to the tympanic membrane reference point 4031. Thus, a microphone can also be placed at the tympanic membrane reference point 4031 to capture an audio signal (hereinafter denoted y_open(t)), which reflects the state of the originally collected ambient sound after it has travelled along the external ear canal 404 to the tympanic membrane reference point 4031 without an earphone. Based on these two audio signals, a function model can be established that describes how the ambient sound initially collected by the real human ear changes while travelling from the ear reference point 4041 to the tympanic membrane reference point 4031 when no earphone is worn; this function model is the open-ear transfer function model 301 mentioned above. Specifically, the open-ear transfer function model 301 can be expressed as:

    G_open(s) = Y_open(s) / U_open(s)

where Y_open(s) is the frequency-domain signal obtained by applying the FFT to y_open(t), and U_open(s) is the frequency-domain signal obtained by applying the FFT to u_open(t).
When a person wears the earphone, the feed-forward microphone on the earphone close to the ear reference point 4041 can collect the ambient sound, and the earphone plays the resulting audio signal after a series of processing such as noise reduction and pass-through; after playback, an audio signal can again be acquired at the tympanic membrane reference point 4031. Denote the ambient sound collected by the feed-forward microphone as u_passive(t), and denote the audio signal played by the earphone and collected at the tympanic membrane reference point 4031 as y_passive(t). Based on these two audio signals, a function model can be established that describes how the ambient sound initially collected by the earphone changes while travelling from the ear reference point 4041 to the tympanic membrane reference point 4031 when the earphone is worn; this function model is the passive transfer function model 302. Specifically, the passive transfer function model 302 can be expressed as:

    G_passive(s) = Y_passive(s) / U_passive(s)

where Y_passive(s) is the frequency-domain signal obtained by applying the FFT to y_passive(t), and U_passive(s) is the frequency-domain signal obtained by applying the FFT to u_passive(t).
It will be appreciated that in some embodiments some earphones also have a feedback microphone, which is positioned near the tympanic membrane reference point 4031 when the earphone is worn. Thus, in some embodiments, Y_passive(s) may be replaced by the signal picked up by the feedback microphone. Similarly, because the audio signal U_open(s) is acquired by the reference microphone at the ear reference point 4041, in some embodiments U_passive(s) may also be approximated by U_open(s).
The difference between the open-ear transfer function model 301 and the passive transfer function model 302 represents the change in what the human ear hears that is caused by wearing the earphone. The pass-through model 303 is therefore designed, mainly on the basis of the open-ear transfer function model 301 and the passive transfer function model 302, as a compensation model G_optimize(s) (not shown in fig. 3) such that:

    G_optimize(s) ∘ G_passive(s) ∘ L(s) = G_open(s)

where G_optimize(s) represents the amount of compensation for the auditory effect caused by the earphone, L(s) is the leakage model 304, which compensates for the variations caused by different wearing states of the earphone, G_passive(s) characterizes the signal path before compensation, G_open(s) characterizes the signal path after compensation, and "∘" denotes a series (cascade) connection.
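Putting the three relations together, the compensation response could be estimated from measured signals roughly as follows; dividing raw FFT spectra is only one simple way to realize the cascade inversion, and the omission of regularization, smoothing and windowing is a deliberate simplification.

```python
import numpy as np

def compensation_response(y_open, u_open, y_passive, u_passive, leakage=None):
    """Estimate G_optimize(s) = G_open(s) / (G_passive(s) * L(s)).

    All inputs are time-domain recordings at the reference points described
    above; leakage is an optional frequency response L(s) (defaults to 1).
    """
    n = len(y_open)
    g_open = np.fft.rfft(y_open, n) / np.fft.rfft(u_open, n)           # model 301
    g_passive = np.fft.rfft(y_passive, n) / np.fft.rfft(u_passive, n)  # model 302
    leak = np.ones_like(g_open) if leakage is None else leakage
    return g_open / (g_passive * leak)
```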
It should be understood that, with the conventional pass-through algorithm, the audio signals acquired at the reference points when modeling the two transfer function models above all come from a single sound source or only a few sound sources. That is, when the modeling data are acquired, sound sources are usually placed in only one direction or a few directions. Therefore, with the pass-through model 303 and a conventional earphone having the pass-through function, once the collected ambient audio has been passed through, its spatial information is eliminated or collapsed to some specific position, which can confuse the wearer's perception of the actual environment, threaten the user's safety, and degrade the user experience.
For example, in the scenario shown in fig. 5, a pedestrian 501 is walking along a narrow road wearing earphones with a pass-through function (a right earphone 502 and a left earphone 503), which are currently playing music from a phone. A car 504 is approaching the pedestrian 501 from the right rear. To warn the pedestrian 501, the driver of the car 504 sounds the horn, which produces a whistle 505 (the curved portion in fig. 5) that acts as a sound source in the scene. It can be understood that if the pedestrian 501 were not wearing the earphones, then once the whistle 505 reached both ears, the difference in sound pressure at the left and right ears and the difference in arrival time would let the pedestrian 501 recognize in time that the whistle came from the right rear, and the pedestrian 501 could move aside in time (for example, step to the left).
However, as explained above, the spatial information of the ambient audio after pass-through is eliminated or collapsed to one specific direction. Fig. 6 illustrates how the audio signals collected by the left earphone 503 and the right earphone 502 of fig. 5 change. As shown in fig. 6, the horizontal axis represents time, and points further to the right correspond to later times. Each cluster of curves in the coordinate system represents an audio signal: the abscissa under the cluster indicates when that signal was collected or generated, and the number of curves in the cluster reflects the signal's amplitude (or sound pressure). It should be understood that in this and the following embodiments the shapes formed by the curve clusters only distinguish the audio signals from one another and do not represent the actual waveforms of the signals as they propagate through space. The left earpiece shown in fig. 6 is the left earphone 503 of fig. 5 and the right earpiece is the right earphone 502 of fig. 5. As can be seen from fig. 6, the left and right earphones collect, pass through, and play the whistle 505 of fig. 5 in sequence, and in this process the amplitudes and relative delays of the audio collected by the two earphones change as follows:
(1) Acquisition stage
As explained above, since the right earphone is closer to the car than the left earphone, the right earphone acquires the whistle 505 earlier than the left earphone in the acquisition stage, and the amplitude (sound pressure) of the audio acquired by the right earphone is greater than that acquired by the left earphone. As shown in fig. 6, the right earphone collects the whistle 505 at time t1 to obtain an audio signal 601R, and the left earphone collects the whistle 505 at time t2 to obtain an audio signal 601L; time t2 is after time t1, and the amplitude (sound pressure) of the audio signal 601R is larger than that of the audio signal 601L.
(2) Transparent transmission stage
For a traditional earphone with the pass-through function, once the collected ambient audio has been passed through, its spatial information is eliminated or collapsed to one specific direction. In the pass-through stage, therefore, the earphone passes the audio signal 601R through to obtain the audio signal 602R and passes the audio signal 601L through to obtain the audio signal 602L, and the two audio signals (602L and 602R) no longer differ in amplitude (sound pressure) or time. That is, when played through the left and right earphones they are played at the same time and with the same amplitude (sound pressure), so the pedestrian 501 cannot recognize the direction of the whistling sound.
(3) Playing stage
In the playing stage, the audio signal 603L played by the left earphone may be the audio signal 602L obtained in the pass-through stage, and the audio signal 603R played by the right earphone may be the audio signal 602R obtained in the pass-through stage. Of course, the audio signal 603L may also be a new signal obtained by further processing (for example, noise reduction and time-frequency conversion) of the audio signal 602L, and the audio signal 603R may likewise be a new signal obtained by further processing of the audio signal 602R. In any case, after the left and right earphones pass the ambient sound (e.g., the whistle 505) through, the sound finally played to the user shows no difference in either time or amplitude (sound pressure).
It will be appreciated that when a user is listening to music with earphones, the ambient sound may contain warning sounds as short and as important as the whistle 505, and the user's safety can be threatened if the direction of the sound source cannot be identified immediately. In addition, if the user can see a target sound source moving within their field of view while the sound they hear from it carries no sense of direction, the experience is uncomfortable, and in severe cases the user's sense of the target's distance is affected.
In view of the foregoing problems, embodiments of the present application provide an audio processing method and an electronic device with which, in pass-through mode, an earphone can retain the orientation information of those ambient sounds that carry spatial information. Thus, even while wearing an electronic terminal using this pass-through scheme, the user can perceive the original spatial orientation of the passed-through sound, which ensures safety and a good user experience when the earphone is used.
It is understood that a person's ability to localize sound in three dimensions comes from the human ear's particular way of analyzing sound signals. Because a sound signal may be reflected, superimposed and so on while travelling from the source to the ear (to the front of the eardrum), the transmission from any point in space to the ear can be regarded as a filtering system, and the sound signals in front of the two eardrums are the result of passing the source through that filter. This transmission system is a black box: we do not need to know how the sound is delivered to the two ears, only the difference between the source signal and the binaural signals. Thus, although the conventional pass-through model eliminates the orientation information of the audio signal once it has been passed through, if we can determine the spatial orientation information of the audio signals obtained at the left and right ears before pass-through, and can obtain the pair of filters that describes that orientation, the sound signal from that direction in space can be restored. Likewise, if a filter bank from every spatial direction to the two ears is available, then whatever the direction of the sound, we can restore the sense of spatial direction to audio that has lost it during pass-through; this is the main work of the embodiments of the present application.
With reference to the foregoing description, a schematic diagram of a pass-through model provided in an embodiment of the present application is described next. The pass-through method provided by the embodiments of the present application, which retains the spatial direction information of sound, can be implemented on the basis of this pass-through model. It should be noted that the earphone to which this pass-through system is applied may be any type of TWS earphone, neck-worn earphone, or wire-controlled earphone. As shown in fig. 7, the pass-through model 706 that retains spatial orientation information may be determined by an open-ear transfer function model 701, a passive transfer function model 702, a pass-through model 703 and a spatial cue library 705. In addition, because users have different wearing habits, how tightly the earphone fits affects how the user hears the sound played by the earphone; thus, in some embodiments, the pass-through model 706 that retains spatial orientation information may also be determined in conjunction with a leakage model 704 related to how tightly the user wears the earphone.
For the specific meanings and functions of the open-ear transfer function model 701, the passive transfer function model 702, the transparent transmission model 703 and the leakage model 704, reference may be made to the foregoing description of fig. 3 and fig. 4, which is not repeated here. The spatial cue library 705 and the transparent transmission model 706 that retains spatial orientation information are described in detail below with reference to fig. 8 and fig. 9.
Fig. 8 is a schematic diagram illustrating integration of a spatial cue library according to an embodiment of the present application. It can be understood that, in order to faithfully restore how the human ear perceives sound sources in every direction in three-dimensional space, the spatial orientation cues to be superimposed need to be obtained from real artificial-head data. As shown in fig. 8, the spatial cue library can be obtained by testing the HATS from Brüel & Kjær, the HMS from HEAD Acoustics, and the KEMAR artificial head from GRAS. The artificial-head standards behind these data are derived by statistical optimization over a large amount of real-person data. Taking the Brüel & Kjær HATS as an example, the HATS is an objective measuring instrument with a built-in simulated ear and mouth simulator that can faithfully reproduce the acoustic characteristics of an average adult. A newer version of the HATS makes use of nuclear magnetic resonance scanning technology and collects a large amount of information about the geometry of the human ear, including the complete geometry of the ear canal and the bony part of the canal connected to the eardrum; the artificial ear canal of the simulated ear has a correct anatomical structure and is fitted with an eardrum simulator at a certain angle, which is placed precisely at the eardrum position so as to be closer to the true structure of the human ear. In the embodiment of the present application, the data measured with these three artificial heads are averaged, and the average can cover the actual situation of most people.
Considering that the above three artificial-head standards may not fully reflect the differences in head and ear features between different races, in some embodiments the averaged data may additionally be compensated according to the statistical biometric differences of people in different regions before the spatial cue library 705 is integrated.
Based on the test data of the above three artificial heads, the monaural cue model and the binaural cue model shown in fig. 8 can be obtained. The specific functions and meanings of the two models are described below and are not repeated here. The spatial cue library 705 in the foregoing description can be obtained by integrating the monaural cue model and the binaural cue model, that is, by building a database of the relationship between the measured orientations and the cues.
As for the spatial cue library 705, as long as accurate relative orientation information is obtained (i.e., the spatial orientation information in the foregoing description, or the environmental sound source orientation parameter shown in fig. 7; this relative orientation information may be a certain angle value) and provided as the input of the spatial cue library 705, the library can be queried for the real-time binaural cues and monaural cues corresponding to that relative orientation information.
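A minimal sketch of such a lookup, with fabricated placeholder values rather than measured artificial-head data, might look as follows; the class name, table values and interpolation choice are assumptions made for illustration only.

```python
import numpy as np

class SpatialCueLibrary:
    """Toy stand-in for the spatial cue library 705: it maps a relative
    orientation (an azimuth in degrees) to a binaural cue (ITD, ILD) and a
    pair of per-ear monaural filters. All table values are fabricated
    placeholders, not measured artificial-head data."""

    def __init__(self, angles_deg, itd_us, ild_db, left_firs, right_firs):
        self.angles = np.asarray(angles_deg, dtype=float)
        self.itd_us = np.asarray(itd_us, dtype=float)
        self.ild_db = np.asarray(ild_db, dtype=float)
        self.left_firs = left_firs      # one FIR filter per measured angle
        self.right_firs = right_firs

    def query(self, azimuth_deg):
        # Binaural cue: interpolate ITD/ILD between the measured angles.
        itd = np.interp(azimuth_deg, self.angles, self.itd_us)
        ild = np.interp(azimuth_deg, self.angles, self.ild_db)
        # Monaural cue: pick the filters of the nearest measured angle.
        idx = int(np.argmin(np.abs(self.angles - azimuth_deg)))
        return (itd, ild), (self.left_firs[idx], self.right_firs[idx])

angles = [0, 30, 60, 90]
flat_fir = np.ones(8) / 8
library = SpatialCueLibrary(angles, [0, 260, 480, 650], [0, 4, 8, 10],
                            [flat_fir] * 4, [flat_fir] * 4)
binaural_cue, monaural_cues = library.query(45.0)
```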
In the present application, the transparent transmission model 706 that retains spatial orientation information differs from the conventional transparent transmission models (for example, the transparent transmission model 703 and the transparent transmission model 303 in the foregoing description) in that, when the transparently transmitted audio is played to the user, it recreates the sense of orientation of the original audio (the audio before transparent transmission). This is because the transparent transmission model provided in the embodiment of the present application can calculate the environmental sound source orientation parameter of the original audio, determine the spatial cue corresponding to that orientation parameter from the spatial cue library 705, and superimpose the spatial cue onto the audio whose sense of orientation was lost through transparent transmission, thereby restoring the original sense of orientation of the audio.
On the one hand, the above spatial cues can be used to obtain the amplitude difference and time difference with which the sound emitted by a sound source at a given azimuth reaches the two ears. The difference in how the two ears perceive the same sound source is generally called a "binaural cue", and it is the key by which a person judges the azimuth of a sound in the horizontal direction. On the other hand, the human ear can also identify the position of a sound in the vertical direction, because the shape of the auricle changes the intensity of different frequency regions of the sound wave: sounds arriving from above and from below are reflected by the auricle into the ear canal differently, so their spectra are shaped differently. For example, when a sound comes from above, the sound wave undergoes a series of reflections and diffractions by the auricular cartilage before reaching the ear canal; when the sound source is raised or lowered, the path of the sound wave across the auricle changes and the combination of sounds reaching the ear canal changes accordingly. In fact, the reflection and diffraction of an external sound wave by the auricle can also be regarded as a filtering system, and the sound signal in front of the eardrum is the external sound wave processed by this filter. This filter may be referred to as a "monaural cue".
With the binaural cue model and the monaural cue model shown in fig. 8, the corresponding binaural cue and monaural cue can be obtained once the spatial orientation information is determined, and the original orientation of the audio can be restored by superimposing these two cues onto the audio whose orientation was lost through transparent transmission. Accordingly, the present application provides a schematic diagram of the process of superimposing spatial cues onto audio. As shown in fig. 9, the spatial cue library 901 may be the spatial cue library 705 in the foregoing description, and its input parameter is the orientation information. In some embodiments of the present application, the orientation information may also be referred to as spatial orientation information or as the environmental sound source orientation parameter, and it may be obtained by analyzing the two audio signals that the user's left and right earphones collect from the same environmental sound source, as described in the following embodiments and not repeated here.
It should be understood that the left ear signal 904L without spatial orientation shown in fig. 9 is obtained by the user's left earphone collecting and transparently transmitting the ambient sound, and the right ear signal 904R without spatial orientation is obtained by the user's right earphone collecting and transparently transmitting the ambient sound. When these two audio signals are played by the left and right earphones respectively, their sense of spatial orientation has been erased. However, after the orientation information is input into the spatial cue library 901, the spatial cue library 901 can output the binaural cue 902 and the monaural cues (i.e., the left ear cue 903L and the right ear cue 903R shown in fig. 9) corresponding to that orientation information, where:
the binaural cue 902 may be calculated from the above orientation information by the binaural cue model (not shown in fig. 9) in the spatial cue library 901, and this binaural cue model may be the binaural cue model in fig. 8. As can be seen from the foregoing description, the binaural cue is mainly used to adjust the amplitude and delay of the audio obtained after the left and right earphones transparently transmit the ambient sound, so that the audio played by the left and right earphones differs in loudness and play time, thereby creating a sense of orientation for the user and allowing the user to identify in time the specific horizontal orientation of the sound source of the ambient sound.
The left ear cue 903L and the right ear cue 903R among the monaural cues may be calculated from the above orientation information by the monaural cue model (not shown in fig. 9) in the spatial cue library 901, where the monaural cue model may be the monaural cue model in fig. 8. The left ear cue 903L can be used to obtain a filter that describes the space through which the ambient sound corresponding to the orientation information travels to the user's left ear; filtering the audio transparently transmitted by the left earphone with this filter restores the actual listening impression that the ambient sound produces at the user's left ear. Similarly, the right ear cue 903R can be used to obtain a filter, and filtering the audio transparently transmitted by the right earphone with it restores the actual listening impression that the ambient sound produces at the right ear. In this way, the audio played by the left and right earphones can create a sense of the sound's orientation in the vertical direction, and the user can distinguish the specific vertical orientation of the sound source of the ambient sound.
It is to be understood that the left ear signal 905L with spatial orientation and the right ear signal 905R with spatial orientation in fig. 9 may be the audio signals finally used for playback. In some embodiments, the user's headphones may also perform other processing (e.g., noise reduction, time-frequency conversion) on the left ear signal 905L and the right ear signal 905R, and use the resulting new audio signals as the audio signals finally used for playback. In addition, when the user's left and right earphones superimpose spatial cues onto the transparently transmitted audio, the binaural cue may be superimposed first and then the monaural cue, as shown in fig. 9; or the monaural cue may be superimposed first and then the binaural cue, which is not limited in this application.
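A minimal sketch of this superposition is given below; the gains, delays and filters are toy values, the binaural cue is applied to the far-ear channel only for simplicity, and the superposition order is made selectable to reflect that either order is allowed.

```python
import numpy as np

def apply_binaural_cue(sig, delay_samples, gain):
    """Delay and scale one channel so the two played channels differ in
    onset time and loudness (the horizontal-direction cue)."""
    delayed = np.concatenate([np.zeros(delay_samples), sig])[: len(sig)]
    return gain * delayed

def apply_monaural_cue(sig, fir):
    """Filter one channel with the per-ear (pinna-like) filter
    (the vertical-direction cue)."""
    return np.convolve(sig, fir)[: len(sig)]

def superimpose(near_sig, far_sig, itd_samples, far_gain, fir_near, fir_far,
                binaural_first=True):
    """Superimpose both cues; in this toy example the extra delay and
    attenuation are applied to the far-ear channel only."""
    if binaural_first:
        far_sig = apply_binaural_cue(far_sig, itd_samples, far_gain)
        return (apply_monaural_cue(near_sig, fir_near),
                apply_monaural_cue(far_sig, fir_far))
    near_f = apply_monaural_cue(near_sig, fir_near)
    far_f = apply_monaural_cue(far_sig, fir_far)
    return near_f, apply_binaural_cue(far_f, itd_samples, far_gain)
```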
It should be noted that the earphones to which the transparent transmission model that retains spatial orientation information provided in the embodiment of the present application is applied may be earphones of any type, such as TWS earphones, neck-worn earphones, or wire-controlled earphones; the present application is not limited in this respect. It should also be understood that when a user uses an earphone, the earphone needs to be connected to a terminal, and, transmission delay aside, the earphone plays the audio played on the terminal synchronously. The main purpose of the transparent transmission function is that, while wearing the earphone, the user can hear the audio that the earphone receives from the terminal and can also clearly hear the sounds in the environment, such as other people's speech or a whistle.
The audio processing method provided by the embodiment of the present application is described below with reference to the user interface and the foregoing description.
First, the user interface involved in turning on the headset pass-through mode is introduced. Fig. 10 (a) exemplarily shows a user interface 10A for an application menu on the terminal device 100. After the headset 200 establishes a connection with the terminal device 100, an icon 10A1 indicating that the headset 200 has been successfully connected with the terminal device 100 may be displayed in the status bar of the user interface 10A, and the icon 10A1 may further reflect the current remaining power of the headset 200. The application icons 10A2 may include, for example, a WeChat icon, a Twitter icon, a Facebook icon, a Sina Weibo icon, a QQ (Tencent QQ) icon, a YouTube icon, a Gallery icon, a Camera icon, an earphone icon 10A3, and the like, and may further include icons of other applications, which is not limited in the embodiments of the present application. The icon of any application may be used to cause the terminal device 100 to launch the corresponding application in response to a user operation, for example, a touch operation.
In one possible embodiment, the user may start the headset APP by clicking the headset icon 10A3, and the terminal device 100 may display a headset control interface, such as the user interface 10B shown in fig. 10 (B), in response to the user operation.
In one possible embodiment, after the headset 200 establishes a connection with the terminal device 100, the terminal device 100 may automatically jump to the headset control interface from the current page user interface. For example, the interface displayed by the terminal device 100 at this time is the user interface 10A, and when the terminal device 100 recognizes that the headset 200 is connected to the terminal device, the user interface 10A may jump to the user interface 10B.
In another possible implementation, after the headset 200 is connected to the terminal device 100, when the terminal device 100 triggers the headset to play audio, the headset APP may be triggered to be started, that is, the user interface 10B including the selection control is displayed. For example, when the terminal device 100 triggers an earphone to play audio, a song may be played after the terminal device establishes a connection with the earphone, and the user interface 10B including the selection control may be displayed. For another example, after the terminal device establishes a connection with an earphone, a video is played, and the user interface 10B including the selection control may be displayed.
As shown in fig. 10 (B), fig. 10 (B) illustrates a user interface 10B for a "headset" application on an electronic device such as a smartphone. The user interface 10B may include an information bar 1001, a status bar 10B1, a mode selection bar 10B2, a shortcut operation bar 10B3, an ear-bud detection control 10B4, a wear detection control 10B5, a search earphone control 10B6, and a high-definition recording control 10B7. Wherein:
an information field 1001 for displaying the name of the headset 200, such as "FlyPods3" in the figure.
And a status bar 10B1 for the user to check the connection status of the headset 200 and the terminal device 100, the power of the left and right headsets of the headset 200, and the power of the charging chamber of the headset 200.
The mode selection bar 10B2 includes a noise reduction mode control, an off mode control, and a pass-through mode control 1002. Each of these controls can respond to a user operation, causing the terminal device 100 to send control signaling to the headset 200. The control signaling is used to control the processing mode employed by the headset 200.
The shortcut operation bar 10B3 includes shortcut controls, which may display corresponding shortcut interfaces in response to user operations and are used to quickly switch songs, wake up the voice assistant, and the like while the earphone 200 plays audio.
An ear bud detection control 10B4 for detecting how well the headset 200 fits the user's ears.
And the wearing detection control 10B5 is used for controlling the playing state of the audio in the mobile phone according to whether the user wears the headset 200.
And the search earphone control 10B6 is used for establishing a communication connection with the headset 200 through the cloud server when the user has lost the headset 200, so that the user can locate it.
And the high-definition recording control 10B7 is used for recording high-definition call audio when the user calls.
As shown in fig. 10 (B), in response to a user operation, that is, a touch operation acting on the transparent transmission mode control 1002, the terminal device 100 sends control signaling to the headset 200 to control the headset 200 to start the transparent transmission mode. Specifically, when the headset 200 operates in the pass-through mode, the headset 200 may preserve the spatial orientation information of the transparently transmitted audio through the procedure shown in fig. 11.
Fig. 11 is a flowchart of an audio processing method according to an embodiment of the present application. The method is applicable to earphones equipped with the transparent transmission model that retains spatial information provided by the embodiment of the present application, such as the earphone 200. In the method, the left and right earphones collect the ambient sound; the orientation information of the ambient sound is obtained from the two audio signals collected by the two earphones; after the left and right earphones transparently transmit the collected ambient sound, the spatial cue corresponding to the orientation information is obtained from that orientation information; and the spatial cue is superimposed onto the audio signals obtained after the left and right earphones transparently transmit the ambient sound. In this way, when these audio signals are played, the user can distinguish the specific orientation of the sound source of the ambient sound in three-dimensional space. As shown in fig. 11, a method provided in an embodiment of the present application may include:
S1101, collecting a first signal and a second signal.
A first earphone of the earphones collects a first signal, and a second earphone of the earphones collects a second signal. In some embodiments of the present application, the first earphone may also be referred to as a left earphone and the second earphone may also be referred to as a right earphone. It should be noted that the first signal and the second signal are obtained by collecting the same environmental sound source from the left and right earphones, where the environmental sound source may be a single sound source or a mixed sound source of multiple sound sources in the environment; for example, there may be a whistling sound of an automobile in front of the user, and there may be a broadcast sound of a platform behind the user, and the environment sound source may be composed of the whistling sound and the broadcast sound.
It should be understood that the earphone in the embodiments of the present application may be a headphone, an ear-hook earphone, a neck-hook earphone, or an earplug earphone. Earbud headphones also include in-ear headphones (otherwise known as in-the-canal headphones) or semi-in-ear headphones. In addition, the earphone has a transparent transmission function, and a transparent transmission model for reserving the spatial orientation information provided by the embodiment of the application has been deployed in the earphone.
And S1102, determining the direction information of the sound of the external environment according to the first signal and the second signal.
As can be seen from the foregoing description, when the same sound is transmitted in three-dimensional space, the time and amplitude of the sound transmitted to the two ears of a person are different. Similarly, in the process that a first earphone in the earphones acquires a first signal and a second earphone acquires a second signal, the time for the first earphone to acquire the first signal is different from the time for the second earphone to acquire the second signal, and the corresponding amplitude of the first signal is different from the corresponding amplitude of the second signal.
In the embodiment of the present application, the orientation information is the input of the spatial cue library in the transparent transmission model that retains spatial orientation information. Although this orientation information cannot be measured directly by the headset 200, it can be inferred from the ITD and ILD between the first signal and the second signal.
Next, the relationship between the orientation information and the ITD and ILD between the first signal and the second signal is described with reference to fig. 12. Fig. 12 is a schematic view of sound waves propagating in space according to an embodiment of the present application. As shown in fig. 12, viewed from above the top of the head, the human head can be regarded as a circle of radius a centered at a point O on a two-dimensional plane. Assuming that there is a sound source (not shown in fig. 12) to the front left of the person in fig. 12, lines L1, L2 and L3 represent the audio signal generated by that sound source as it travels through space to the person's left and right ears.
A planar rectangular coordinate system is established with point O as the origin, the horizontal axis being the X axis and the vertical axis being the Y axis. Ignoring the shape and size of the ears, the left ear is regarded as point L and the right ear as point R; both points fall on the X axis. Line L4 is the straight line passing through point L and perpendicular to lines L1, L2 and L3; line L5 passes through point O and is perpendicular to them; and line L6 passes through point R and is perpendicular to them. It can be seen from fig. 12 that, as the audio signal generated by the sound source travels to the person's left and right ears, it is closer to the left ear, so the distance from the sound source to the right ear is longer than the distance to the left ear, and the difference between the two distances is the sum of d1 and d2 shown in fig. 12.
Let the angle between lines L4, L5, L6 and the X axis be θ as shown in fig. 12, where θ is the orientation information in the foregoing description. From geometry, the following can be deduced:
d1 =a*sinθ;
d2 =a*θ;
wherein the symbol "*" represents a multiplication operation.
Then the ITD of the audio signal as it is transmitted to the left and right ears of the person can be expressed as:
ITD(θ) = (d1 + d2)/c = a*(sinθ + θ)/c; (formula 1)
where a is generally constant at 0.0875m, c is the speed of sound, and when θ is 0, the ITD is also 0.
It will be appreciated from the above equation that when the delay information for the left and right ear signals is known, the value of θ can be derived.
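Since ITD(θ) in formula 1 grows monotonically with θ on [0, π/2], the angle can be recovered from a measured delay by a simple numerical inversion. The sketch below is illustrative only and assumes a = 0.0875 m and c = 343 m/s.

```python
import math

def itd_from_theta(theta, a=0.0875, c=343.0):
    """Formula 1: ITD(θ) = a*(sinθ + θ)/c, with θ in radians."""
    return a * (math.sin(theta) + theta) / c

def theta_from_itd(tau, a=0.0875, c=343.0):
    """Invert formula 1 by bisection on [0, π/2], where ITD(θ) is monotonic."""
    lo, hi = 0.0, math.pi / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if itd_from_theta(mid, a, c) < tau:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

tau = itd_from_theta(math.radians(30))      # about 0.000261 s for a 30 degree source
print(math.degrees(theta_from_itd(tau)))    # prints approximately 30.0
```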
If ITD (θ) = τ is calculated, the left and right channel signals of the audio signal are:
pL(t) = [1 + m*sin(2π*f_m*t)] * sin(2π*f_c*t);
pR(t) = [1 + m*sin(2π*f_m*(t − τ))] * sin(2π*f_c*(t − τ));
where f_m is the modulation frequency, f_c is the signal frequency, m is the modulation index, and pL(t) and pR(t) are the sound pressures produced by the audio signal at the person's left and right ears, respectively.
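For illustration only (the modulated-tone form above is a reconstruction), the following sketch synthesizes such a left/right channel pair, with the right channel being the left channel shifted by τ; the sample rate, frequencies and τ value are assumptions made for the example.

```python
import numpy as np

fs = 48_000                      # sample rate (assumed)
t = np.arange(0, 0.05, 1 / fs)   # 50 ms of signal
f_c, f_m, m_idx = 1000.0, 40.0, 0.8
tau = 0.000261                   # example ITD in seconds (roughly a 30 degree source)

p_left = (1 + m_idx * np.sin(2 * np.pi * f_m * t)) * np.sin(2 * np.pi * f_c * t)
p_right = (1 + m_idx * np.sin(2 * np.pi * f_m * (t - tau))) * np.sin(
    2 * np.pi * f_c * (t - tau))
```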
In acoustics, the binaural amplitude difference ILD also has a definite functional relationship with the angle θ in fig. 12, which may be written as ILD = ILD(θ) (formula 2). Therefore, once the amplitude difference between the left and right channels is calculated and ILD(θ) is thereby estimated, the value of θ can be deduced in reverse from this relationship.
It can be understood that, just as with the human ears receiving an audio signal, the time at which the first earphone collects the first signal differs from the time at which the second earphone collects the second signal, and the amplitudes of the first signal and the second signal also differ. Therefore, in the embodiment of the present application, although the headset cannot measure the orientation information directly, it can obtain the ITD or ILD between the first signal and the second signal by analyzing them (for example, by spectrum analysis), and then deduce the value of the orientation information (i.e., the value of θ) in reverse by combining formula 1 or formula 2. Note that, in an alternative embodiment, the earphone may calculate the orientation information using only the ITD between the first signal and the second signal together with formula 1; in another alternative embodiment, the earphone may calculate the orientation information using only the ILD together with formula 2. In an actual scenario, whether the ILD or the ITD is used to calculate the orientation information may be chosen according to the device performance and computing power of the headset, which is not limited in the present application.
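As an illustrative sketch only (not the analysis actually used by the headset), the ITD between the two captured signals can be estimated as the lag of maximum cross-correlation, and the ILD as an RMS level ratio; the estimated ITD can then be converted back to θ with the inversion sketched after formula 1.

```python
import numpy as np

def estimate_itd(first_sig, second_sig, fs):
    """Estimate the inter-aural time difference as the lag (in seconds) that
    maximizes the cross-correlation of the two captured signals."""
    corr = np.correlate(first_sig, second_sig, mode="full")
    lag = int(np.argmax(corr)) - (len(second_sig) - 1)
    return lag / fs

def estimate_ild_db(first_sig, second_sig):
    """Estimate the inter-aural level difference from the RMS ratio, in dB."""
    rms = lambda x: np.sqrt(np.mean(np.square(x)) + 1e-12)
    return 20 * np.log10(rms(first_sig) / rms(second_sig))
```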
And S1103, performing transparent transmission processing on the first signal and the second signal respectively to obtain a first audio signal and a second audio signal.
As can be seen from the foregoing description, after the headphone performs transparent transmission processing on the first signal and the second signal, the loudness of the first audio signal and the loudness of the second audio signal are the same, and the time delay that originally existed between the first signal and the second signal is no longer reflected between the first audio signal and the second audio signal; even when a speaker plays the sounds corresponding to the two audio signals, the playing times are the same.
And S1104, determining a spatial orientation clue corresponding to the orientation information.
It can be understood that the above earphone is an earphone equipped with the transparent transmission model that retains spatial information provided in the embodiment of the present application. As can be seen from the foregoing description, after the above orientation information is obtained, it is used as the input of the spatial cue library in that model, and the spatial orientation cue corresponding to the orientation information can be obtained.
S1105, adjusting the first audio signal and the second audio signal according to the spatial direction clue to obtain a first target signal and a second target signal.
In conjunction with the foregoing description, the spatial orientation cues include binaural cues and monaural cues. The binaural cue is mainly used to adjust the amplitude and time delay of the first audio signal and the second audio signal, so that the audio finally played by the left and right earphones differs in loudness and playing time, creating a sense of the sound's orientation for the user and allowing the user to distinguish in time the specific horizontal orientation of the sound source of the ambient sound. The monaural cues can be used to obtain the respective filters of the left and right ears; filtering the first audio signal and the second audio signal with these filters restores the actual listening impression that the ambient sound produces at the user's left and right ears. As a result, the audio played by the left and right earphones can also create a sense of the sound's orientation in the vertical direction, and the user can distinguish the specific vertical orientation of the sound source of the ambient sound.
In the process of adjusting the first audio signal and the second audio signal to obtain the first target signal and the second target signal, the headphone may superimpose the binaural cue first and then the monaural cues on the first audio signal and the second audio signal, or superimpose the monaural cues first and then the binaural cue, which is not limited in this application.
Furthermore, it is to be understood that the first target signal and the second target signal may be audio signals that the headphone is finally used for playing. In some embodiments, the headphone may further perform other processing (e.g., noise reduction, time-frequency conversion) on the first target signal and the second target signal, and use the obtained new audio signal as the audio signal finally used for playing.
Fig. 13 exemplarily shows how the first signal and the second signal change during the method flow. As shown in fig. 13, the horizontal axis of each coordinate system represents time, and points further to the right correspond to later times. Each curve cluster in a coordinate system represents an audio signal; the abscissa below a curve cluster indicates the time at which that signal was collected or generated, and the number of curves in each cluster reflects the corresponding amplitude (or sound pressure) of the audio signal. The first earphone shown in fig. 13 is the first earphone in fig. 11, and the second earphone is the second earphone in fig. 11. Assume here that the environmental sound source is located to the right of the user and that the first earphone is the right earphone (refer in particular to the scenario shown in fig. 5). As can be seen from fig. 13, the first earphone and the second earphone sequentially perform collection, orientation-information acquisition, transparent transmission, orientation-cue superposition and playing (the playing stage is not shown in fig. 13), and during this process the amplitudes and the relative time delays of the audio collected by the two earphones change:
(1) acquisition phase
As can be seen from the foregoing description, since the first earphone is closer to the ambient sound source than the second earphone, the first earphone will acquire the ambient sound earlier than the second earphone in the acquisition stage, and the amplitude (sound pressure) of the audio acquired by the first earphone is greater than the amplitude of the audio acquired by the second earphone. As can be seen from fig. 13, the first headphone collects the environmental sound at time t4 to obtain an audio signal 1301R, and the second headphone collects the environmental sound at time t5 to obtain an audio signal 1301L; where time t5 is after time t4, and the amplitude (sound pressure) of the audio of audio signal 1301R is greater than the amplitude (sound pressure) of audio signal 1301L.
(2) Stage of obtaining azimuth information
The first earphone or the second earphone analyzes the ITD or ILD of two audio signals collected by the same environment sound source to obtain the azimuth information of the environment sound, namely theta.
(3) Transparent transmission stage
After the collected environmental audio is transparently transmitted, its spatial information is eliminated or concentrated in a particular direction. Therefore, in the transparent transmission stage, the earphone transparently transmits the audio signal 1301R to obtain the audio signal 1302R and transparently transmits the audio signal 1301L to obtain the audio signal 1302L, and the two audio signals (1302L and 1302R) no longer differ in amplitude (sound pressure) or time. That is, if the two audio signals were played through the first earphone and the second earphone, the corresponding sounds would be played at the same time and with the same amplitude (sound pressure), and the user could not recognize the direction of the environmental sound.
(4) Orientation clue superposition stage
In the orientation-cue superposition stage, the user's earphones process the audio signal 1302R and the audio signal 1302L with the orientation cues to obtain the audio signal 1303R and the audio signal 1303L. Specifically, the earphones may adjust the amplitudes and delays of the audio signal 1302R and the audio signal 1302L through the binaural cue, so that the audio finally played by the left and right earphones differs in loudness and playing time (the difference in playing time corresponds to the time difference between t7 and t8 shown in fig. 13), creating a sense of the sound's horizontal direction for the user; meanwhile, the earphones can filter the audio signal 1302R and the audio signal 1302L with the monaural cues, restoring the actual listening impression that the environmental sound produces at the left and right ears and creating a sense of the sound's vertical direction for the user.
(5) Play stage (not shown in FIG. 13)
In the playing stage, the audio signal 1304L (not shown in fig. 13) played by the second headphone may be the audio signal 1303L, and the audio signal 1304R (not shown in fig. 13) played by the first headphone may be the audio signal 1303R. Of course, the audio signal 1304L played by the second headphone may also be a new audio signal obtained by performing other processing (for example, noise reduction and time-frequency conversion) on the audio signal 1303L by the headphones, and the audio signal 1304R played by the right headphone may also be a new audio signal obtained by performing other processing on the audio signal 1303R. However, after the left and right earphones transparently transmit the sound in the environment, the sound that is finally played to the user is different in time and amplitude (sound pressure), and the user can recognize the direction of the environment sound through the difference.
In the audio processing method provided in the embodiment of the present application, the performance of the first earphone and the performance of the second earphone may differ, and their performance determines whether steps such as audio transparent transmission and spatial cue superposition are performed by the first earphone, by the second earphone, or by both. According to the various possibilities for the performance of the processors in the first and second earphones, the steps performed by the first earphone and the second earphone can be divided into the following two cases:
(1) In an alternative embodiment, only one of the first earphone and the second earphone deploys the transparent transmission model provided herein that retains spatial orientation information.
Assume that the first earphone deploys the transparent transmission model provided herein that retains spatial orientation information and the second earphone does not. In this embodiment the first earphone may be referred to as the master earphone and the second earphone as the slave earphone. The specific operation of the first earphone and the second earphone may be described with reference to fig. 14. As shown in fig. 14, the audio processing method shown in fig. 11 may be implemented by the first earphone and the second earphone as follows:
S101: the first headset collects a first signal.
S102: the second headset collects a second signal.
S103: the second headset transmits the second signal to the first headset.
After the first earphone and the second earphone collect the environmental sound, they obtain the first signal and the second signal respectively. The first signal and the second signal are the first signal and the second signal shown in fig. 11. Since in this embodiment only the first earphone deploys the transparent transmission model provided by the present application that retains spatial orientation information, the first earphone undertakes the majority of the audio processing, and it receives the second signal collected by the second earphone.
S104: the first earphone determines the direction information of the external environment sound according to the first signal and the second signal.
S105: the first earphone conducts transparent transmission processing on the first signal and the second signal to obtain a first audio signal and a second audio signal.
S106: and the first earphone determines a spatial direction clue corresponding to the direction information.
S107: the first earphone adjusts the first audio signal and the second audio signal according to the spatial direction clue to obtain a first target signal and a second target signal.
After receiving the second signal, the first headset may analyze the first signal and the second signal to obtain the direction information. Then, the first earphone may perform transparent transmission and spatial cue superposition on the first signal and the second signal through a transparent transmission model which is deployed by the first earphone and retains spatial azimuth information, so as to obtain the first target signal and the second target signal. The first target signal and the second target signal are the first target signal and the second target signal shown in fig. 11.
S108: and the first earphone sends the second target signal to the second earphone.
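The division of work in this case can be summarized with the following illustrative sketch, which reuses the hypothetical helper functions sketched earlier in this description (estimate_itd, theta_from_itd, the SpatialCueLibrary query and superimpose); pass_through below is only a placeholder for the earphone's ordinary transparent transmission processing, and the whole function is a sketch rather than the firmware implementation.

```python
import math

def pass_through(sig):
    # Placeholder for the earphone's ordinary transparent transmission processing.
    return sig

def master_earphone_process(first_signal, second_signal, fs, cue_library):
    """Sketch of the master (first) earphone's role: after receiving the
    slave's capture (S103), it performs S104-S107 and returns both target
    signals; the second target signal is what S108 sends back."""
    # S104: orientation information (the sign of the lag is ignored here).
    theta = theta_from_itd(abs(estimate_itd(first_signal, second_signal, fs)))
    # S105: transparent transmission of both signals.
    first_audio = pass_through(first_signal)
    second_audio = pass_through(second_signal)
    # S106: look up the spatial orientation cue for this angle.
    (itd_us, ild_db), (fir_l, fir_r) = cue_library.query(math.degrees(theta))
    delay = int(round(itd_us * 1e-6 * fs))
    far_gain = 10 ** (-ild_db / 20)   # assumed convention: far ear is quieter
    # S107: superimpose the cues onto the transparently transmitted audio.
    first_target, second_target = superimpose(
        first_audio, second_audio, delay, far_gain, fir_l, fir_r)
    return first_target, second_target
```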
(2) In an alternative embodiment, the memory and processor of each of the first earphone and the second earphone deploy the transparent transmission model provided herein that retains spatial orientation information.
In this embodiment, the first earphone and the second earphone can work in a specific manner with reference to fig. 15. As shown in fig. 15, the audio processing method shown in fig. 11 may be implemented by the first headphone and the second headphone as follows:
S201: the first headset collects a first signal.
S202: the second headset collects a second signal.
S203: the second headset transmits the second signal to the first headset.
S204: the first headset transmits the first signal to the second headset.
After the first earphone and the second earphone collect the environmental sound, they obtain the first signal and the second signal respectively. The first signal and the second signal are the first signal and the second signal shown in fig. 11. In this embodiment, both the first earphone and the second earphone are provided with the transparent transmission model that retains spatial orientation information provided by the present application, so each earphone can complete the audio processing independently once it has both signals. Therefore, in this embodiment, the first earphone receives the second signal collected by the second earphone, and the second earphone likewise receives the first signal collected by the first earphone.
S205: the first earphone determines first orientation information of the external environment sound according to the first signal and the second signal.
S206: the first earphone conducts transparent transmission processing on the first signal to obtain a first audio signal.
S207: the first earphone determines a first spatial orientation cue corresponding to the first orientation information.
S208: the second earphone adjusts the first audio signal according to the spatial direction clue to obtain a first target signal.
S209: the second earpiece determines second directional information of the external ambient sound from the first signal and the second signal.
S210: and the second earphone carries out transparent transmission processing on the second signal to obtain a second audio signal.
S211: and the second earphone determines a second spatial orientation clue corresponding to the second orientation information.
S212: and adjusting the second audio signal according to the spatial orientation clue to obtain a second target signal.
After both earphones have obtained the first signal and the second signal, the first earphone may analyze the first signal and the second signal to obtain the first orientation information and determine the first spatial orientation cue corresponding to it. The second earphone may likewise analyze the first signal and the second signal to obtain the second orientation information and determine the second spatial orientation cue corresponding to it. It should be understood that the first spatial orientation cue and the second spatial orientation cue may both be the spatial orientation cue shown in fig. 11; "first" and "second" merely indicate that the two cues are obtained by different earphones. Similarly, the first orientation information and the second orientation information may both be the orientation information shown in fig. 11.
Then, the first earphone carries out transparent transmission and spatial cue superposition on the first signal through a transparent transmission model which is deployed by the first earphone and retains spatial azimuth information, and the first target signal is obtained; and the second earphone carries out transparent transmission and spatial cue superposition on the second signal through a transparent transmission model which is deployed by the second earphone and retains spatial azimuth information, so as to obtain the second target signal.
It will be appreciated that headsets from different manufacturers may have different capabilities, and that users may use different models of headsets produced by different manufacturers. Not all headsets have sufficient performance to deploy the transparent transmission model provided by the present application that retains spatial orientation information and to complete the audio processing method provided by the present application. Therefore, in a possible implementation, the transparent transmission model that retains spatial orientation information may instead be deployed in the user's terminal device. After the user's earphones are connected to the terminal device, the terminal device can acquire the audio signals of the environmental sounds collected by the left and right earphones, process the two audio signals through the transparent transmission model that retains spatial orientation information, superimpose spatial orientation cues onto the transparently transmitted audio, and send the audio with superimposed cues to the corresponding earphones for playing. In this way, the processing load of the earphones is reduced while the spatial sense of the transparently transmitted environmental sound is retained, and the user can hear environmental sound with a sense of spatial orientation using earphones of different styles.
In this embodiment, the specific operation of the two earphones (referred to herein as the first earphone and the second earphone) and the terminal device can be described with reference to fig. 16. As shown in fig. 16, the first earphone, the second earphone, and the terminal device may process the audio as follows:
S301: the first headset collects a first signal.
S302: the second headset collects a second signal.
S303: the first earphone sends the first signal to the terminal equipment.
S304: the second earphone sends the second signal to the terminal equipment.
And after the first earphone and the second earphone collect the environmental sound, respectively obtaining the first signal and the second signal.
In the present embodiment, the terminal device may be a mobile phone, a vehicle-mounted device (e.g., an on-board unit (OBU)), a tablet computer (pad), a computer with a data transceiving function (e.g., a laptop computer, a palmtop computer, etc.), a mobile internet device (MID), a terminal in industrial control, a wireless terminal in self driving, a terminal in transportation safety, a terminal in a smart city, a terminal in a smart home, a terminal device in a 5G network, a terminal device in a future evolved public land mobile network (PLMN), or the like. It is understood that the present application does not limit the specific form of the terminal device 100.
In this embodiment, the transparent transmission model for retaining the spatial orientation information provided in the embodiment of the present application is deployed in the terminal device, and the terminal device needs to acquire signals respectively acquired by two earphones, that is, the first signal and the second signal. In this embodiment, the first earphone transmits the first signal to the terminal device, and the second earphone similarly transmits the second signal to the terminal device.
S305: the terminal equipment determines the direction information of the external environment sound according to the first signal and the second signal.
S306: and the terminal equipment carries out transparent transmission processing on the first signal and the second signal to obtain a first audio signal and a second audio signal.
S307: and the terminal equipment determines a spatial orientation clue corresponding to the orientation information.
S308: and the terminal equipment adjusts the first audio signal and the second audio signal according to the spatial direction clue to obtain a first target signal and a second target signal.
After the terminal device obtains the first signal and the second signal, the terminal device may analyze the first signal and the second signal to obtain the orientation information, and determine a spatial orientation cue corresponding to the orientation information. And then, the terminal equipment can reserve a transparent transmission model of the spatial orientation information to process the two audio signals, and superimpose a spatial orientation clue on the audio after transparent transmission to obtain the first target signal and the second target signal.
S309: and the terminal equipment sends the first target signal to a first earphone.
S310: and the terminal equipment sends the second target signal to a second earphone.
After the terminal device obtains the first target signal and the second target signal, the terminal device may send the first target signal to the first earphone for playing, and send the second target signal to the second earphone for playing.
Optionally, the first target signal and the second target signal may be the audio signals finally used for playing. Alternatively, when the earphones receive the first target signal and the second target signal, they may perform other processing (e.g., noise reduction and time-frequency conversion) on them and use the resulting new audio signals as the signals finally played. In either case, the sounds finally played to the user differ in time and amplitude (sound pressure), and the user can recognize the direction of the environmental sound from this difference.
Fig. 17 is a schematic diagram of azimuth division according to an embodiment of the present application. As shown in fig. 17, viewed from above the head downward, the position of the person is simplified to a point P. Assuming that the azimuth directly in front of the person is 0°, the entire space can be regarded as a circular area centered on point P and covering 0° to 360° horizontally. Since P0 is located directly in front of the person, it can be understood that at P0 the azimuth 0° and the azimuth 360° coincide. It can be seen that the azimuth corresponding to the position of point P0' is 180°, that of point P1 is 15°, that of point P2 is 165°, that of point P3 is 195°, and that of point P4 is 345°.
As is apparent from the above description, humans can perceive the three-dimensional characteristics of sound and localize sound sources because of the differences in time and volume between the two ears, and these differences arise mainly because the sound source is at different distances from the left and right ears. When the difference between the distances (or relative positions) to the left ear and the right ear is small, the differences in time and volume that the sound produces at the two ears are also small; when this difference becomes small enough, in particular when the sound source is located directly in front of or directly behind the person, the person cannot perceive it, and the sound then has no sense of direction for the person. For example, if there is a sound source in the shaded region formed by point P4, point P and point P1 in fig. 17, then because the sound source is at essentially the same distance from the left and right ears, neither ear hears the sound clearly earlier or louder than the other, so the person cannot recognize the directionality of the sound emitted by that sound source and cannot locate its specific direction. Similarly, when there is a sound source in the shaded region formed by point P2, point P and point P3 in fig. 17, the person cannot distinguish the direction of the sound emitted by that sound source. For two audio signals received by the ears from the same sound source, whether the listener can perceive the directionality of the sound generated by that sound source can be determined by analyzing the IACC produced at the two ears.
In addition, in a real environment the environmental sound is often produced not by a single sound source but jointly by multiple sound sources located in different directions. For the earphone, however, when the audio of the environmental sound is collected it cannot tell that there are multiple sound sources in the environment, nor that the collected audio signal is actually generated jointly by them. In acoustics and biology, when the frequencies of two tones lie within one sub-band, a person hears the two tones as one. More generally, if the frequency distribution of a complex signal lies within a sub-band, the human ear perceives the signal as equivalent to a simple signal whose frequency is at the center frequency of that sub-band; this is the core meaning of a sub-band. In brief, a sub-band is a range of frequencies, and a signal whose spectrum lies within this range may be replaced by a single frequency component.
Therefore, when the earphone collects the audio signal generated by multiple sound sources in the environment, the audio signal appears to the earphone as if it were generated by a single sound source equivalent to those sources. The IACC that the mixed audio signal emitted by the multiple sound sources produces at the two ears can only be obtained by jointly analyzing the audio signals collected by the left and right earphones.
Based on the above description and the foregoing audio processing method, an embodiment of the present application further provides the flow of another audio processing method. The method can determine whether a listener can perceive the directionality of the sound produced by the sound sources in the environment. Only when the listener can perceive that directionality are spatial cues superimposed onto the sound after transparent transmission; when the listener cannot perceive it, there is no need to superimpose spatial cues, which saves device energy. As shown in fig. 18, the method may include:
S1801, collecting a first signal and a second signal.
For details of step S1801, reference may be made to the foregoing description of step S1101 in fig. 11, which is not repeated here.
S1802, performing spectrum analysis on the first signal and the second signal, respectively, to obtain N subbands of the first signal and the second signal.
It will be appreciated that the first signal picked up by the first earphone and the second signal picked up by the second earphone are in fact time-domain signals, whereas a "sub-band" is a frequency-domain concept. Therefore, after the earphones collect the first signal and the second signal, the signals can be converted into frequency-domain signals by means of an FFT or the like before sub-band division and the related analysis are performed.
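For illustration, such a sub-band division might look as follows; the band edges used here are assumptions, not values specified by the embodiment.

```python
import numpy as np

def split_into_subbands(sig, fs, band_edges_hz):
    """FFT the time-domain capture and return the complex spectrum bins of
    each sub-band delimited by band_edges_hz; e.g. [0, 500, 1000, 2000, 4000]
    gives N = 4 sub-bands (the edge values are placeholders)."""
    spectrum = np.fft.rfft(sig)
    freqs = np.fft.rfftfreq(len(sig), d=1 / fs)
    bands = []
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(spectrum[mask])
    return bands
```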
Fig. 19 exemplarily shows spectrograms of the first signal and the second signal. As shown in fig. 19, the upper spectrogram is that of the first signal (hereinafter referred to as the first spectrum) and the lower spectrogram is that of the second signal (hereinafter referred to as the second spectrum). The two spectrograms correspond to the same ambient sound, which may be generated by a single source or by multiple sources. The first spectrum and the second spectrum are each divided by frequency into four sub-band frequency bands, namely frequency bands f1, f2, f3 and f4. In the first spectrum, the sub-bands corresponding to the frequency bands f1, f2, f3 and f4 are the sub-band ab, the sub-band bc, the sub-band cd and the sub-band de, respectively; in the second spectrum, they are the sub-band a'b', the sub-band b'c', the sub-band c'd' and the sub-band d'e', respectively. It should be understood that the first and second spectra shown in fig. 19 are drawn only for the reader's convenience and do not represent the shapes of the actual spectrograms of the first signal and the second signal.
S1803, determining binaural cross-correlation coefficients of subbands in the same frequency band in the N subbands of the first signal and the N subbands of the second signal.
Taking the first spectrum and the second spectrum in fig. 19 as an example, the value of N is 4. At this time, the subband ab and the subband a 'b' are subbands located in the same frequency band, the subband bc and the subband b 'c' are subbands located in the same frequency band, the subband cd and the subband c'd' are subbands located in the same frequency band, and the subband de and the subband d 'e' are subbands located in the same frequency band. Therefore, the four groups of sub-bands in the same frequency band can be analyzed respectively to obtain respective binaural cross-correlation coefficients IACC of the four pairs of sub-bands.
And S1804, judging whether the number of the target sub-band pairs is larger than a first threshold value.
The target subband pair represents a subband pair having a binaural cross-correlation coefficient smaller than a second threshold value among the N subbands of the first signal and the N subbands of the second signal. The present application does not limit the specific values of the first threshold and the second threshold. It should be understood that the subband pair here represents subbands in which the N subbands of the first signal and the N subbands of the second signal are in the same frequency band, such as a subband ab and a subband a 'b' in fig. 19.
When the IACC of a sub-band pair in the same frequency band is smaller than the second threshold, the sounds corresponding to the two sub-bands can be considered to produce a difference at the user's two ears that the user can perceive. When the IACCs of more than a certain number of the N sub-band pairs of the first spectrum and the second spectrum are smaller than the second threshold, the difference that the sounds corresponding to the first signal and the second signal as a whole produce at the user's ears can be perceived by the user.
Again taking the first spectrum and the second spectrum in fig. 19 as an example, assume that the first threshold is 2, the second threshold is 0.5, and the IACCs of the sub-band ab and the sub-band a'b', the sub-band bc and the sub-band b'c', the sub-band cd and the sub-band c'd', and the sub-band de and the sub-band d'e' are 0.3, 0.6, 0.2 and 0.2, respectively. The analysis shows that the IACC of the sub-band bc and the sub-band b'c' is greater than the second threshold, while the IACCs of the sub-band ab and the sub-band a'b', the sub-band cd and the sub-band c'd', and the sub-band de and the sub-band d'e' are all smaller than the second threshold, so the number of target sub-band pairs is 3 (greater than 2).
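The decision in step S1804 then reduces to counting the sub-band pairs whose IACC falls below the second threshold. A minimal sketch using the example values above (first threshold 2, second threshold 0.5) follows; the variable names are illustrative only.

```python
FIRST_THRESHOLD = 2       # minimum count of target sub-band pairs (example value)
SECOND_THRESHOLD = 0.5    # IACC below this marks a target sub-band pair (example value)

# illustrative IACC values for the four sub-band pairs in bands f1..f4
iacc_per_pair = [0.3, 0.6, 0.2, 0.2]

target_pairs = sum(1 for v in iacc_per_pair if v < SECOND_THRESHOLD)   # -> 3
keep_spatial_cues = target_pairs > FIRST_THRESHOLD                     # True: 3 > 2
```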
If the number of the target subband pairs is greater than the first threshold, then step S1805-step S1808 are executed, and the process ends; if the number of the target subband pairs is not greater than the first threshold, step S1809 is executed next, and the process ends.
S1805, determining the direction information of the sound of the external environment according to the first signal and the second signal.
And S1806, respectively performing transparent transmission processing on the first signal and the second signal to obtain a first audio signal and a second audio signal.
S1807, determine a spatial orientation cue corresponding to the orientation information.
S1808, respectively adjusting the first audio signal and the second audio signal according to the spatial direction cue to obtain a first target signal and a second target signal.
S1809, performing transparent transmission processing on the first signal and the second signal, respectively, to obtain a first audio signal and a second audio signal.
For details of steps S1805-S1808, reference may be made to the foregoing description of steps S1102-S1105 in fig. 11; for details of step S1809, reference may likewise be made to the foregoing description of steps S1102-S1105 in fig. 11. They will not be described in detail herein. It should be understood that, in the present embodiment, step S1809 and step S1806 are actually the same operation: the "first audio signal" in step S1809 may be the same audio signal as the "first audio signal" in step S1806, and the "second audio signal" in step S1809 may be the same audio signal as the "second audio signal" in step S1806.
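The overall branching of steps S1804-S1809 can be summarized as follows. This is a control-flow sketch only: pass_through, estimate_direction, spatial_cue and apply_cue are hypothetical placeholder functions standing in for the processing described above, not functions defined by this application.

```python
def pass_through(signal):
    return signal                                   # placeholder pass-through (S1806 / S1809)

def estimate_direction(first_signal, second_signal):
    return 0.0                                      # placeholder: direction of the ambient sound (S1805)

def spatial_cue(direction):
    return {"itd_samples": 0, "ild_db": 0.0}        # placeholder spatial orientation cue (S1807)

def apply_cue(first_audio, second_audio, cue):
    return first_audio, second_audio                # placeholder adjustment (S1808)

def process_frame(first_signal, second_signal, iacc_per_pair,
                  first_threshold=2, second_threshold=0.5):
    first_audio = pass_through(first_signal)
    second_audio = pass_through(second_signal)
    target_pairs = sum(1 for v in iacc_per_pair if v < second_threshold)
    if target_pairs > first_threshold:              # S1804: the binaural difference is audible
        cue = spatial_cue(estimate_direction(first_signal, second_signal))
        return apply_cue(first_audio, second_audio, cue)   # first/second target signals
    return first_audio, second_audio                # S1809: plain pass-through
```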
Furthermore, as can be seen from the above description, the capabilities of the first earphone and the second earphone determine which of them performs most of the audio processing. Similarly, in this embodiment of the application, the entity performing steps S1802-S1809 may be the first earphone or the second earphone, depending on the capabilities of the first earphone and the second earphone. For example, assuming that in this embodiment only the first earphone deploys the transparent transmission model provided in this embodiment for retaining the spatial orientation information, steps S1802-S1809 are all performed by the first earphone.
A system provided by an embodiment of the present application is described below. The system includes a terminal device 100 and an earphone 200. The terminal device 100 is connected to the earphone 200, and the connection may be a wireless connection or a wired connection (refer to the system architecture diagram shown in fig. 2). For a wireless connection, the terminal device may, for example, be connected to the earphone through Bluetooth technology, wireless fidelity (Wi-Fi) technology, infrared (IR) technology, or ultra-wideband technology.
In the embodiment of the present application, the terminal device 100 may be a mobile phone, a vehicle-mounted device (e.g., an on board unit (OBU)), a tablet computer (pad), a computer with a data transceiving function (e.g., a laptop computer, a palmtop computer, etc.), a mobile internet device (MID), a terminal in industrial control, a wireless terminal in self driving, a terminal in transportation safety, a terminal in a smart city, a terminal in a smart home, a terminal device in a 5G network, a terminal device in a future evolved public land mobile network (PLMN), or the like.
It is understood that the present application is not limited to the specific form of the terminal device 100.
Specifically, the terminal device 100 may be the terminal device in the foregoing description.
The headset 200 includes two sound-producing units worn at the left and right ears. The unit worn at the left ear may be referred to as the left earphone, and the unit worn at the right ear may be referred to as the right earphone. In some embodiments of the present application, the left earphone and the right earphone may also be referred to as a first earphone and a second earphone; specifically, when the left earphone is the first earphone, the right earphone is the second earphone, and similarly, when the left earphone is the second earphone, the right earphone is the first earphone. From a wearing perspective, the earphone 200 in the embodiment of the present application may be an over-ear headphone, an earbud earphone, a neckband earphone, or another type of earphone. Earbud earphones further include in-ear earphones (also known as in-canal earphones) and semi-in-ear earphones. The earphone 200 has a transparent transmission (HT) function.
Specifically, the earphone 200 may be the earphone in the foregoing description.
An in-ear earphone is taken as an example below. The left earphone and the right earphone adopt similar structures, and either of them may adopt the earphone structure described below. The earphone structure (left earphone or right earphone) comprises a rubber sleeve that can be inserted into the ear canal, an ear bag close to the ear, and an earphone stem hanging from the ear bag. The rubber sleeve guides sound into the ear canal; the ear bag contains devices such as a battery, a speaker and sensors; and a microphone, physical keys and the like may be arranged on the earphone stem. The earphone stem may be cylindrical, cuboid, ellipsoidal, or the like. The microphone arranged inside the ear may be referred to as a feedback microphone, and the microphone arranged outside the earphone may be referred to as a feedforward microphone. The feedforward microphone is used to collect sounds of the external environment. The feedback microphone may be used to collect sound in the ear canal of the user while the user wears the earphone. Both microphones may be analog microphones or digital microphones. After the user wears the earphone, the placement relationship between the two microphones and the speaker is as follows: the feedback microphone is arranged inside the ear, close to the rubber sleeve of the earphone; the speaker is located between the feedback microphone and the feedforward microphone; and the feedforward microphone may be located in the upper portion of the earphone stem, close to the outer structure of the ear. The conduit of the feedback microphone may face the speaker or may face the inside of the ear canal. An opening is formed in the earphone near the feedforward microphone for transmitting external environment sound into the feedforward (reference) microphone.
In this embodiment, the terminal device 100 is configured to send a downlink audio signal and/or control signaling to the earphone 200. For example, the control signaling is used to control the processing mode adopted by the earphone 200; the processing mode may include a normal mode and a transparent transmission mode, and may further include other modes such as a noise reduction mode and a hearing enhancement mode. For example, when the earphone 200 is connected to the terminal device 100, the user may select the processing mode of the earphone through the touch screen of the terminal device. When the earphone 200 works in the transparent transmission mode, the user's perception of the sound of the current external environment can be strengthened; such sound includes, for example, a station announcement in a railway station, a siren, a call in a restaurant, and the like. In the transparent transmission mode, the earphone 200 is able to capture the sound of the external environment and pass it through to the user. It should be noted that the environmental sounds transmitted to the user through the left earphone and the right earphone are two different sounds, which differ in playing time and amplitude (sound pressure). Specifically, the time delay and amplitude difference between the two sounds may be set to the time delay and amplitude difference perceived by the left and right ears when the user is in an open-ear state (i.e., a state in which the user does not wear the earphone), or may be adjusted in a certain proportion relative to those open-ear values, as long as the user can accurately locate the direction of the ambient sound from the two sounds; this is not limited in the present application.
It should be noted that the transparent transmission function of the earphone 200 may be turned on by default when the earphone 200 is connected to the terminal device 100; alternatively, the terminal device 100 provides a user interface for the user to choose whether to turn on the transparent transmission function of the earphone 200 as required. In the latter case, the terminal device 100 sends control signaling to the earphone 200 upon the user's operation, and the control signaling is used to instruct the earphone 200 to enable the transparent transmission processing function.
Fig. 20 exemplarily shows the structure of the terminal device 100.
As shown in fig. 20, the terminal device 100 may further include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like.
The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation to the terminal device 100. In other embodiments of the present application, terminal device 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processor (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
In some embodiments, the processor 110, such as the controller or the GPU, may send a downlink audio signal and/or a control signaling to the headset 200 when the terminal device 100 is communicatively connected to the headset 200, where the control signaling may be used to control a processing mode adopted by the headset 200, where the processing mode may include a normal mode and a pass-through mode, and may further include other modes, such as a noise reduction mode, an auditory enhancement mode, and the like.
In other embodiments, the processor 110 such as the controller or the GPU may be further configured to receive, when the terminal device 100 is in communication connection with the headset 200 and the working mode of the headset is the transparent transmission mode, audio signals sent by a left headset and a right headset in the headset 200, where the two audio signals are obtained by the left headset and the right headset by collecting one or multiple same sound sources in the environment. After the terminal device 100 receives the two audio signals, the processor 110, such as the controller or the GPU, may determine, based on the two audio signals, the direction information of the environmental sound corresponding to the two audio signals, and send the direction information to the headset 200; or superimpose the direction information on the two audio signals and send the resulting new audio signal to the headphones 200.
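By way of example only, one common way for the processor to estimate such direction information from the two received audio signals is a time-difference-of-arrival estimate based on GCC-PHAT cross-correlation; the application does not mandate this particular algorithm, and the sampling rate below is an assumption.

```python
import numpy as np

def estimate_tdoa(left, right, sample_rate=16000):
    """Estimate the time difference of arrival (in seconds) between the
    left-earphone and right-earphone signals using GCC-PHAT."""
    n = len(left) + len(right)
    L = np.fft.rfft(left, n)
    R = np.fft.rfft(right, n)
    cross = L * np.conj(R)
    cross /= np.abs(cross) + 1e-12               # PHAT weighting (phase transform)
    cc = np.fft.irfft(cross, n)
    cc = np.concatenate([cc[-(n // 2):], cc[:n // 2]])   # move zero lag to the center
    lag = int(np.argmax(np.abs(cc))) - n // 2
    return lag / sample_rate                     # TDOA; its sign indicates which ear leads
```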
The controller may be a neural center and a command center of the terminal device 100, among others. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to use the instruction or data again, it can be called directly from memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
The I2C interface is a bidirectional synchronous serial bus including a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, a camera 193, etc. through different I2C bus interfaces, respectively. For example: the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through an I2C bus interface to implement a touch function of the terminal device 100.
The I2S interface may be used for audio communication. In some embodiments, processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through the I2S interface, so as to implement a function of receiving a call through a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, audio module 170 and wireless communication module 160 may be coupled by a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to implement a function of answering a call through a bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communications. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is generally used to connect the processor 110 and the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through a UART interface, so as to implement the function of playing music through a bluetooth headset.
MIPI interfaces may be used to connect processor 110 with peripheral devices such as display screen 194, camera 193, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the capture function of terminal device 100. The processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the terminal device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, and the like.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 may be used to connect a charger to charge the terminal device 100, and may also be used to transmit data between the terminal device 100 and a peripheral device. It can also be used to connect an earphone and play audio through the earphone. The interface may also be used to connect other terminal devices, such as AR devices and the like.
It should be understood that the interface connection relationship between the modules illustrated in the embodiment of the present application is only an exemplary illustration, and does not constitute a limitation on the structure of the terminal device 100. In other embodiments of the present application, the terminal device 100 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger via the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive a wireless charging input through a wireless charging coil of the terminal device 100. The charging management module 140 may also supply power to the terminal device 100 through the power management module 141 while charging the battery 142.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In some other embodiments, the power management module 141 may also be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same device.
The wireless communication function of the terminal device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in terminal device 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied on the terminal device 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave.
The wireless communication module 160 may provide solutions for wireless communication applied to the terminal device 100, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (bluetooth, BT), global Navigation Satellite System (GNSS), frequency Modulation (FM), near Field Communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves.
In some embodiments, the antenna 1 of the terminal device 100 is coupled to the mobile communication module 150 and the antenna 2 is coupled to the wireless communication module 160, so that the terminal device 100 can communicate with the network and other devices through wireless communication technology. The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc. GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or the satellite based augmentation systems (SBAS).
The terminal device 100 implements a display function by the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and the like. In some embodiments, the terminal device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The terminal device 100 may implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to be converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the terminal device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process digital image signals and other digital signals. For example, when the terminal device 100 selects a frequency point, the digital signal processor is used to perform fourier transform or the like on the frequency point energy.
Video codecs are used to compress or decompress digital video. The terminal device 100 may support one or more video codecs. In this way, the terminal device 100 can play or record video in a plurality of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor that processes input information quickly by using a biological neural network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. The NPU can implement applications such as intelligent recognition of the terminal device 100, for example: image recognition, face recognition, speech recognition, text understanding, and the like. The NPU can also realize the decision model provided by the embodiment of the application.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the terminal device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in the external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the terminal device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (such as audio data, a phonebook, etc.) created during use of the terminal device 100, and the like. In addition, the internal memory 121 may include a high speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a Universal Flash Storage (UFS), and the like.
The terminal device 100 may implement an audio function through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playing, recording, etc. The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
In some embodiments, the audio module 170 may further adjust the playing time and amplitude of the two audio signals under the control of the processor 110 such as the controller or the GPU after the terminal device 100 receives the audio signals sent by the left earphone and the right earphone in the earphone 200 and determines the direction information of the environmental sound corresponding to the two audio signals through the processor 110 such as the controller or the GPU, so as to obtain two new audio signals. When the two new audio signals are played to the user, the auditory sensation given to the user is the same as or very similar to the auditory sensation given to the user by the ambient sound, and the user can determine the general direction of the ambient sound in the environment based on the two audio signals.
The speaker 170A, also called a "horn", is used to convert the audio electrical signal into an acoustic signal. The terminal device 100 can listen to music, listen to sound in video, or listen to a handsfree call through the speaker 170A. In the embodiment of the present application, the number of the speakers 170A may be one, two, or more than two. In the audio processing method provided by the embodiment of the present application, when the number of the speakers 170A of the terminal device 100 exceeds two, it is possible to support the playback of two-channel audio.
The receiver 170B, also called "earpiece", is used to convert the electrical audio signal into an acoustic signal. When the terminal device 100 answers a call or voice information, it is possible to answer a voice by bringing the receiver 170B close to the human ear.
The microphone 170C, also referred to as a "microphone," is used to convert sound signals into electrical signals. When making a call or transmitting voice information, the user can input a voice signal to the microphone 170C by speaking near the microphone 170C through the mouth. The terminal device 100 may be provided with at least one microphone 170C. In other embodiments, the terminal device 100 may be provided with two microphones 170C, which may implement a noise reduction function in addition to collecting sound signals. In other embodiments, the terminal device 100 may further include three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The pressure sensor 180A is used for sensing a pressure signal, and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The pressure sensor 180A can be of a wide variety, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may be a sensor comprising at least two parallel plates having an electrically conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes. The terminal device 100 determines the intensity of the pressure from the change in the capacitance. When a touch operation is applied to the display screen 194, the terminal device 100 detects the intensity of the touch operation from the pressure sensor 180A. The terminal device 100 may also calculate the touched position from the detection signal of the pressure sensor 180A.
The gyro sensor 180B may be used to determine the motion attitude of the terminal device 100. In some embodiments, the angular velocity of terminal device 100 about three axes (i.e., x, y, and z axes) may be determined by gyroscope sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. Illustratively, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the terminal device 100, calculates the distance to be compensated for by the lens module according to the shake angle, and allows the lens to counteract the shake of the terminal device 100 through a reverse movement, thereby achieving anti-shake. The gyroscope sensor 180B may also be used for navigation, somatosensory gaming scenes.
The air pressure sensor 180C is used to measure air pressure. In some embodiments, the terminal device 100 calculates an altitude from the barometric pressure measured by the barometric pressure sensor 180C, and assists in positioning and navigation.
The magnetic sensor 180D includes a hall sensor. The terminal device 100 may detect the opening and closing of a flip holster using the magnetic sensor 180D. In some embodiments, when the terminal device 100 is a flip phone, the terminal device 100 may detect the opening and closing of the flip according to the magnetic sensor 180D. Features such as automatic unlocking upon opening the flip can then be set according to the detected opening or closing state of the holster or of the flip.
The acceleration sensor 180E can detect the magnitude of acceleration of the terminal device 100 in various directions (generally, three axes). The magnitude and direction of gravity can be detected when the terminal device 100 is stationary. The method can also be used for recognizing the posture of the terminal equipment, and is applied to horizontal and vertical screen switching, pedometers and other applications.
A distance sensor 180F for measuring a distance. The terminal device 100 may measure the distance by infrared or laser. In some embodiments, shooting a scene, the terminal device 100 may range using the distance sensor 180F to achieve fast focus.
The proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The terminal device 100 emits infrared light to the outside through the light emitting diode. The terminal device 100 detects infrared light reflected from a nearby object using the photodiode. When sufficient reflected light is detected, it can be determined that there is an object near the terminal device 100. When insufficient reflected light is detected, the terminal device 100 can determine that there is no object near the terminal device 100. The terminal device 100 can use the proximity light sensor 180G to detect that the user holds the terminal device 100 close to the ear for talking, so as to automatically turn off the screen to save power. The proximity light sensor 180G may also be used in holster mode and pocket mode to automatically unlock and lock the screen.
The ambient light sensor 180L is used to sense the ambient light level. The terminal device 100 may adaptively adjust the brightness of the display screen 194 according to the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the terminal device 100 is in a pocket, in order to prevent accidental touches.
The fingerprint sensor 180H is used to collect a fingerprint. The terminal device 100 can utilize the collected fingerprint characteristics to realize fingerprint unlocking, access to an application lock, fingerprint photographing, fingerprint incoming call answering and the like.
The temperature sensor 180J is used to detect temperature. In some embodiments, the terminal device 100 executes a temperature processing policy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds the threshold, the terminal device 100 performs a reduction in performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, the terminal device 100 heats the battery 142 when the temperature is below another threshold to avoid the terminal device 100 being abnormally shut down due to low temperature. In other embodiments, when the temperature is lower than a further threshold, the terminal device 100 performs boosting on the output voltage of the battery 142 to avoid abnormal shutdown due to low temperature.
The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is used to detect a touch operation acting thereon or nearby. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on the surface of the terminal device 100, different from the position of the display screen 194.
The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire a vibration signal of the human voice vibrating a bone mass. The bone conduction sensor 180M may also contact the human pulse to receive the blood pressure pulsation signal. In some embodiments, bone conduction sensor 180M may also be disposed in headset 200, integrated into a bone conduction headset. The audio module 170 may analyze a voice signal based on the vibration signal of the bone block vibrated by the sound part acquired by the bone conduction sensor 180M, so as to implement a voice function. The application processor can analyze heart rate information based on the blood pressure beating signals acquired by the bone conduction sensor 180M, and the heart rate detection function is achieved.
The keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys. Or may be touch keys. The terminal device 100 may receive a key input, and generate a key signal input related to user setting and function control of the terminal device 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration cues, as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also respond to different vibration feedback effects for touch operations applied to different areas of the display screen 194. Different application scenes (such as time reminding, receiving information, alarm clock, game and the like) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.
The SIM card interface 195 is used to connect a SIM card. The SIM card can be brought into and out of contact with the terminal device 100 by being inserted into the SIM card interface 195 or being pulled out of the SIM card interface 195. The terminal device 100 may support one or more SIM card interfaces. The SIM card interface 195 may support a Nano SIM card, a Micro SIM card, a SIM card, etc. The same SIM card interface 195 can be inserted with multiple cards at the same time. The types of the plurality of cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The terminal device 100 interacts with the network through the SIM card to implement functions such as communication and data communication. In some embodiments, the terminal device 100 employs eSIM, namely: an embedded SIM card. The eSIM card may be embedded in the terminal device 100 and cannot be separated from the terminal device 100.
The terminal device 100 may implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. Not limited to integration in the processor 110, the ISP may also be provided in the camera 193.
In the embodiment of the present application, the number of the cameras 193 may be M, where M ≥ 2 and M is a positive integer. The number of cameras turned on by the terminal device 100 during dual-view video recording may be N, where N ≤ M and N is a positive integer.
The camera 193 includes a lens and a photosensitive element (which may also be referred to as an image sensor) for capturing still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to be converted into a digital image signal, such as an image signal in a standard RGB, YUV, or other format.
The hardware configuration and physical location of the cameras 193 may be different, and thus, the size, range, content, or sharpness of the images captured by different cameras may be different.
The image sizes of the cameras 193 may be different or the same. The image size of a camera refers to the length and width of the image acquired by the camera, which can be measured in pixels. The image size of a camera may also be called image dimensions, pixel size, or image resolution. The image scale of a typical camera may include 4:3, 16:9 or 3:2, and so on. The image scale refers to the approximate ratio of the number of pixels in the length and width of the image captured by the camera.
The cameras 193 may correspond to the same focal segment or to different focal segments. The focal segments may include, but are not limited to: a first focal segment with a focal length smaller than a first preset value (e.g., 20 mm); a second focal segment with a focal length greater than or equal to the first preset value and less than or equal to a second preset value (e.g., 50 mm); and a third focal segment with a focal length larger than the second preset value. The camera corresponding to the first focal segment may be referred to as an ultra-wide camera, the camera corresponding to the second focal segment may be referred to as a wide camera, and the camera corresponding to the third focal segment may be referred to as a telephoto camera. The larger the focal segment of a camera, the smaller its field of view (FOV). The field of view refers to the angular range that the optical system can image.
The cameras 193 may be provided on both sides of the terminal device. The camera in the same plane as the display 194 of the terminal device may be referred to as a front camera, and the camera in the plane of the rear cover of the terminal device may be referred to as a rear camera. The front camera may be used to capture the photographer's own image facing the display screen 194, and the rear camera may be used to capture the image of the photographic subject (e.g., person, landscape, etc.) that the photographer is facing.
In some embodiments, a camera 193 may be used to acquire depth data. For example, the camera 193 may have a time-of-flight (TOF) 3D sensing module or a structured light 3D sensing module for acquiring depth information. The camera used for collecting the depth data may be a front-facing camera or a rear-facing camera.
Video codecs are used to compress or decompress digital images. The terminal device 100 may support one or more image codecs. In this way, the terminal device 100 can open or save pictures or videos in a variety of encoding formats.
The terminal device 100 may implement a display function through the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the terminal device 100 may include one or more display screens 194.
In some embodiments, after the terminal device 100 and the headset 200 establish the communication connection, the display screen 194 may display a corresponding control interface, which may display the connection state of the headset 200, the power of the headset 200, and the power of the charging chamber of the headset 200. In addition, the control interface may also display a plurality of mode selection controls that may be used by the user to select a corresponding mode of operation for the headset 200, such as a pass-through mode.
Fig. 21 exemplarily shows the structure of the above-described earphone 200.
As shown in fig. 21, the headset 200 includes: a processor 2601, a memory 2602, a wireless communication module 2603, an input module 2604, a microphone 2605, a speaker 2606, a sensor module 2607, and the like. It will be appreciated that the earphone configuration shown in fig. 21 is not limiting; the headset may include more or fewer components than shown, some components may be combined, or a different arrangement of components may be used.
It is to be appreciated that if the type of headset is a TWS headset, the left and right headsets may be provided with one or more of the processor 2601, memory 2602, wireless communication module 2603, input module 2604, microphone 2605, speaker 2606, and sensor module 2607, respectively, described above.
If the type of headset is a wired in-line control headset or a neckband headset, one or more of the sensor module 2607, the speaker 2606, the microphone 2605, and the input module 2604 may be provided on the left earphone and the right earphone, respectively, and the control module may carry the processor 2601, the memory 2602, the wireless communication module 2603, the input module 2604, the microphone 2605, and the like.
The processor 2601 may include at least one of the following types: a central processing unit (CPU); the processor 2601 may also be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
A memory may also be provided in the processor 2601 for storing instructions and data. In some embodiments, memory in the processor 2601 is cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 2601. If the processor 2601 needs to use the instruction or data again, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 2601, thereby increasing the efficiency of the system.
The memory 2602 may be used to store software programs and modules, and the processor 2601 executes various functional applications of the headset and performs data processing by running the software programs and modules stored in the memory 2602. The memory 2602 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store data created according to the use of the headset, and the like. Further, the memory 2602 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. The wireless communication module 2603 is configured to receive and transmit signals under the control of the processor 2601, for example, to receive a voice signal of the call peer transmitted by the terminal device, or to transmit a voice signal to the terminal device. The wireless communication module 2603 may include a radio frequency (RF) circuit. Typically, the RF circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, an LNA (low noise amplifier), a duplexer, and the like. In addition, the RF circuit may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including, but not limited to, short-range communication technologies such as wireless fidelity (WiFi) communication, Bluetooth communication, near field communication, and the like.
The input module 2604 may be used to receive input key information, switch information, and generate key signal inputs and voice signal inputs related to user settings and function control of the headset. In particular, the input module 2604 may include touch keys and/or physical keys. The touch keys can collect touch operations of a user on or near the touch keys (such as operations of the user on or near the touch keys by using any suitable object or accessory such as a finger, a stylus and the like), and drive the corresponding connecting devices according to a preset program. In one implementation, the touch key may include two parts, a touch detection device and a touch controller. In addition, the touch keys can be realized by various types such as resistance type, capacitance type, infrared ray, surface acoustic wave and the like. The physical keys may include one or more of volume control keys, on-off keys, and the like.
A microphone 2605, also referred to as a "microphone", converts sound signals into electrical signals. When making a call or sending voice information, the user can input a voice signal into the microphone 2605 by speaking with the mouth close to the microphone 2605. A plurality of microphones 2605 may be provided on the headset.
A speaker 2606, also called a "horn", is used to convert electrical audio signals into sound signals.
The sensor module 2607 may include a pressure sensor 2607a, a temperature sensor 2607b, a distance sensor 2607c, a light sensor 2607d, an acceleration sensor 2607e, an ear entrance detection sensor 2607f, a front light distance sensor 2607g, a back light distance sensor 2607h, and the like. The pressure sensor 2607a is used for sensing a pressure signal and converting the pressure signal into an electrical signal. The pressure sensor 2607a can be of various types, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may be a sensor comprising at least two parallel plates having an electrically conductive material. When a force acts on the pressure sensor 2607a, the capacitance between the electrodes changes. The earphone determines the intensity of the pressure according to the change of the capacitance. In some embodiments, the switch operation may be detected by the pressure sensor 2607a, and whether the headphone is in a non-wearing state may also be detected.
The acceleration sensor 2607e can detect the magnitude of acceleration of the headphone in various directions (typically three axes). The magnitude and direction of gravity can be detected when the headset is stationary. The method can also be used for identifying the state of the earphone and is applied to wearing detection.
A distance sensor 2607c for measuring distance. The headset may measure distance by infrared or laser. In some embodiments, the headset may utilize range finding with the distance sensor 2607c to enable wear detection.
A light sensor 2607d for measuring the intensity of light. In some embodiments, the headset may measure light intensity with the light sensor 2607d to enable wear detection.
The temperature sensor 2607b is used to detect temperature. In some embodiments, the headset utilizes the temperature detected by temperature sensor 2607b to enable wear detection.
A front light distance sensor 2607g and a back light distance sensor 2607h for detecting the light distance in the front and back directions on the earphone. In some embodiments, the headset may utilize front and back light distance sensors 2607g and 2607h for ranging to enable wear detection.
In addition, although not shown, the headset may further include a power module and the like, which will not be described herein.
The embodiment of the application also provides a computer readable storage medium. The methods described in the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media may include computer storage media and communication media, and may include any medium that can communicate a computer program from one place to another. A storage media may be any available media that can be accessed by a computer.
As one implementation, a computer-readable medium may include a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that may store information and instructions, an Electrically erasable programmable read-only memory (EEPROM) or other optical disk storage, magnetic disk storage or other magnetic storage device, or any other medium that may be used to carry or store desired program code in the form of instructions or data structures and that may be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
An embodiment of the present application further provides an electronic device, including: one or more processors and a memory;
wherein the memory is coupled to the one or more processors and is configured to store computer program code, the computer program code comprising computer instructions which are invoked by the one or more processors to cause the electronic device to perform the methods shown in the preceding embodiments.
As used in the above embodiments, the term "when …" may be interpreted to mean "if …", "after …", "in response to determining …", or "in response to detecting …", depending on the context. Similarly, the phrase "if it is determined …" or "if (a stated condition or event) is detected" may be interpreted to mean "when it is determined …", "in response to determining …", "upon detecting (the stated condition or event)", or "in response to detecting (the stated condition or event)", depending on the context.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, or digital subscriber line) or wirelessly (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid state disks), among others.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and when executed, may include the processes of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical disks.

Claims (11)

1. A method for processing audio, the method comprising:
acquiring a first signal and a second signal, wherein the first signal is an audio signal obtained by collecting sound of an external environment through a first earphone, and the second signal is an audio signal obtained by collecting the sound of the external environment through a second earphone;
respectively carrying out transparent transmission processing on the first signal and the second signal to obtain a first audio signal and a second audio signal;
respectively performing spectrum analysis on the first signal and the second signal to obtain N subbands of the first signal and N subbands of the second signal;
determining binaural cross-correlation coefficients between subbands in the same frequency band among the N subbands of the first signal and the N subbands of the second signal;
determining azimuth information of the sound of the external environment according to the first signal and the second signal in the case that the number of target subband pairs is greater than a first threshold, wherein a target subband pair is a pair of subbands, one from the N subbands of the first signal and one from the N subbands of the second signal, whose binaural cross-correlation coefficient is less than a second threshold;
and respectively adjusting the first audio signal and the second audio signal according to the azimuth information to obtain a first target signal and a second target signal.
2. The method of claim 1, wherein the first target signal is played through the first earphone and the second target signal is played through the second earphone, wherein a playback time of the first target signal is different from a playback time of the second target signal, and wherein a loudness of the first target signal is different from a loudness of the second target signal.
3. The method of claim 2, wherein the first earphone is closer to an ambient sound source than the second earphone, the ambient sound source being the audio source of the sound of the external environment, the playback time of the first target signal is earlier than the playback time of the second target signal, and the loudness of the first target signal is greater than the loudness of the second target signal.
4. The method of any of claims 1 to 3, wherein the first earphone is a left earphone, the second earphone is a right earphone, the azimuth information comprises binaural azimuth information and monaural azimuth information, the monaural azimuth information comprises left ear information and right ear information, and the respectively adjusting the first audio signal and the second audio signal according to the azimuth information to obtain a first target signal and a second target signal comprises:
adjusting loudness and relative time delay of the first audio signal and the second audio signal according to the binaural azimuth information to obtain a third audio signal and a fourth audio signal;
filtering the third audio signal through the left ear information to obtain the first target signal;
and filtering the fourth audio signal through the right ear information to obtain the second target signal.
5. The method of any of claims 1 to 3, wherein determining the azimuth information of the sound of the external environment according to the first signal and the second signal comprises:
determining a binaural time difference of arrival between the first signal and the second signal according to the first signal and the second signal;
and determining the azimuth information of the sound of the external environment according to the binaural time difference of arrival.
6. The method of any of claims 1 to 3, wherein determining the azimuth information of the sound of the external environment according to the first signal and the second signal comprises:
determining a binaural intensity difference of arrival between the first signal and the second signal according to the first signal and the second signal;
and determining the azimuth information of the sound of the external environment according to the binaural intensity difference of arrival.
7. The method according to any one of claims 1 to 3,
the respectively carrying out transparent transmission processing on the first signal and the second signal to obtain a first audio signal and a second audio signal comprises:
the first earphone respectively performs transparent transmission processing on the first signal and the second signal to obtain the first audio signal and the second audio signal;
the determining the azimuth information of the sound of the external environment according to the first signal and the second signal comprises:
the first earphone determines the azimuth information of the sound of the external environment according to the first signal and the second signal;
the respectively adjusting the first audio signal and the second audio signal according to the azimuth information to obtain a first target signal and a second target signal comprises:
and the first earphone respectively adjusts the first audio signal and the second audio signal according to the azimuth information to obtain the first target signal and the second target signal.
8. The method according to any one of claims 1 to 3,
the respectively carrying out transparent transmission processing on the first signal and the second signal to obtain a first audio signal and a second audio signal comprises:
the first earphone performs transparent transmission processing on the first signal to obtain the first audio signal,
and the second earphone performs transparent transmission processing on the second signal to obtain the second audio signal;
the determining the azimuth information of the sound of the external environment according to the first signal and the second signal comprises:
the first earphone and the second earphone determine the azimuth information of the sound of the external environment according to the first signal and the second signal;
the respectively adjusting the first audio signal and the second audio signal according to the azimuth information to obtain a first target signal and a second target signal comprises:
the first earphone adjusts the first audio signal according to the azimuth information to obtain the first target signal;
and the second earphone adjusts the second audio signal according to the azimuth information to obtain the second target signal.
9. An electronic device, characterized in that the electronic device comprises: one or more processors and a memory;
wherein the memory is coupled to the one or more processors, the memory is configured to store computer program code, and the computer program code comprises computer instructions that the one or more processors invoke to cause the electronic device to perform the method of any of claims 1-8.
10. A chip system, wherein the chip system is applied to an electronic device, and the chip system comprises one or more processors configured to invoke computer instructions to cause the electronic device to perform the method according to any one of claims 1-8.
11. A computer-readable storage medium comprising instructions that, when executed on an electronic device, cause the electronic device to perform the method of any of claims 1-8.
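For readers who want to trace the processing flow of claims 1 and 5 end to end, the sketch below runs the steps on synthetic data: the two earphone signals are split into N subbands, a binaural cross-correlation coefficient is computed per band, and azimuth information is derived from the interaural time difference only when the number of weakly correlated subband pairs exceeds the first threshold. The frame length, the plain-FFT subband split, the correlation definition, and both threshold values are assumptions for illustration, not the patented implementation.

```python
# Illustrative sketch of the method of claims 1 and 5 (assumed parameters throughout).
import numpy as np

FS = 16000              # sampling rate in Hz (assumed)
N_SUBBANDS = 16         # N in the claim (assumed)
CORR_THRESHOLD = 0.9    # "second threshold" on the cross-correlation (assumed)
PAIR_THRESHOLD = 4      # "first threshold" on the number of target pairs (assumed)
EAR_DISTANCE_M = 0.18   # approximate spacing between the two earphones (assumed)
SPEED_OF_SOUND = 343.0

def subband_spectra(x: np.ndarray, n_bands: int) -> list:
    """Split the FFT of one frame into n_bands contiguous subbands."""
    spectrum = np.fft.rfft(x * np.hanning(len(x)))
    return np.array_split(spectrum, n_bands)

def band_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation coefficient of two subband spectra."""
    num = np.abs(np.vdot(a, b))
    den = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
    return float(num / den)

def estimate_azimuth(left: np.ndarray, right: np.ndarray) -> float:
    """Azimuth (degrees) from the interaural time difference of one frame."""
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(left) - 1)
    itd = lag / FS
    s = np.clip(itd * SPEED_OF_SOUND / EAR_DISTANCE_M, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))

def process_frame(first: np.ndarray, second: np.ndarray):
    """Return an azimuth only when enough subband pairs are weakly correlated."""
    bands_a = subband_spectra(first, N_SUBBANDS)
    bands_b = subband_spectra(second, N_SUBBANDS)
    target_pairs = sum(
        band_correlation(a, b) < CORR_THRESHOLD for a, b in zip(bands_a, bands_b)
    )
    if target_pairs > PAIR_THRESHOLD:
        return estimate_azimuth(first, second)
    return None             # sound treated as diffuse; no directional rendering

if __name__ == "__main__":
    t = np.arange(512) / FS
    tone = np.sin(2 * np.pi * 500 * t)
    delayed = np.roll(tone, 3)              # the far ear receives the tone later
    print(process_frame(tone, delayed))
```

In a real product this analysis would run frame by frame alongside the pass-through path, but the structure of the decision is the same.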
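Claim 6 derives the azimuth from the binaural intensity (level) difference instead of the time difference. A minimal sketch follows, assuming a simple linear mapping from level difference to angle; a real mapping would be frequency dependent and typically measured rather than assumed.

```python
# Illustrative ILD-based azimuth estimate for claim 6 (assumed mapping).
import numpy as np

MAX_ILD_DB = 20.0        # assumed level difference for a source fully to one side

def azimuth_from_ild(first: np.ndarray, second: np.ndarray) -> float:
    """Map the interaural level difference (dB) to an azimuth in degrees."""
    rms_first = np.sqrt(np.mean(first ** 2)) + 1e-12
    rms_second = np.sqrt(np.mean(second ** 2)) + 1e-12
    ild_db = 20.0 * np.log10(rms_first / rms_second)
    # Positive ILD -> source on the first-earphone side; clamp and scale to +/-90 deg.
    return float(np.clip(ild_db / MAX_ILD_DB, -1.0, 1.0) * 90.0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    near = rng.standard_normal(1024)
    far = 0.5 * near                       # 6 dB quieter at the far ear
    print(round(azimuth_from_ild(near, far), 1))   # roughly +27 degrees under this mapping
```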
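Claims 2 to 4 describe how the pass-through signals are re-rendered once the azimuth is known: the near-ear signal is played earlier and louder, and each channel is then filtered with the per-ear information (for example, a head-related impulse response). The sketch below illustrates that idea with a fixed attenuation, an integer-sample delay, and caller-supplied impulse responses; the gain value, the delay formula, and the toy filters are assumptions, not the patented coefficients.

```python
# Illustrative rendering of claims 2-4: inter-channel delay, loudness
# difference, and per-ear filtering. All formulas are assumptions.
import numpy as np

FS = 16000
EAR_DISTANCE_M = 0.18
SPEED_OF_SOUND = 343.0

def delay_samples(azimuth_deg: float) -> int:
    """Integer-sample interaural delay for a source at the given azimuth."""
    itd = EAR_DISTANCE_M * np.sin(np.radians(azimuth_deg)) / SPEED_OF_SOUND
    return int(round(abs(itd) * FS))

def render(first_audio: np.ndarray,
           second_audio: np.ndarray,
           azimuth_deg: float,
           left_ear_ir: np.ndarray,
           right_ear_ir: np.ndarray):
    """Return (first_target, second_target) with the far ear delayed and attenuated."""
    d = delay_samples(azimuth_deg)
    far_gain = 0.7                                   # assumed loudness difference
    if azimuth_deg >= 0:                             # source on the first-earphone side (assumed sign convention)
        near, far = first_audio, second_audio * far_gain
        far = np.concatenate([np.zeros(d), far])[: len(far)]
        first_target, second_target = near, far
    else:                                            # source on the second-earphone side
        near, far = second_audio, first_audio * far_gain
        far = np.concatenate([np.zeros(d), far])[: len(far)]
        first_target, second_target = far, near
    # Per-ear filtering with the "left ear information" / "right ear information".
    first_target = np.convolve(first_target, left_ear_ir)[: len(first_target)]
    second_target = np.convolve(second_target, right_ear_ir)[: len(second_target)]
    return first_target, second_target

if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal(1024) * 0.1   # stand-in pass-through audio
    ir = np.array([1.0, 0.3, 0.1])                              # toy ear impulse responses
    left_out, right_out = render(x, x.copy(), 30.0, ir, ir)
    print(left_out.shape, right_out.shape)
```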
CN202210231526.5A 2022-03-10 2022-03-10 Audio processing method and electronic equipment Active CN114727212B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210231526.5A CN114727212B (en) 2022-03-10 2022-03-10 Audio processing method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210231526.5A CN114727212B (en) 2022-03-10 2022-03-10 Audio processing method and electronic equipment

Publications (2)

Publication Number Publication Date
CN114727212A CN114727212A (en) 2022-07-08
CN114727212B true CN114727212B (en) 2022-10-25

Family

ID=82238294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210231526.5A Active CN114727212B (en) 2022-03-10 2022-03-10 Audio processing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN114727212B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116744215B (en) * 2022-09-02 2024-04-19 荣耀终端有限公司 Audio processing method and device
CN115273431B (en) * 2022-09-26 2023-03-07 荣耀终端有限公司 Device recovery method, device, storage medium and electronic device
CN115835079B (en) * 2022-11-21 2023-08-08 荣耀终端有限公司 Transparent transmission mode switching method and switching device
CN116996801B (en) * 2023-09-25 2023-12-12 福州天地众和信息技术有限公司 Intelligent conference debugging speaking system with wired and wireless access AI

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109906616A (en) * 2016-09-29 2019-06-18 杜比实验室特许公司 For determining the method, system and equipment of one or more audio representations of one or more audio-sources
CN110972018A (en) * 2019-12-13 2020-04-07 恒玄科技(上海)股份有限公司 Method and system for carrying out transparent transmission on earphone and earphone
CN111010646A (en) * 2020-03-11 2020-04-14 恒玄科技(北京)有限公司 Method and system for transparent transmission of earphone and earphone
CN111726727A (en) * 2019-03-20 2020-09-29 创新科技有限公司 System and method for processing audio between multiple audio spaces
CN112866864A (en) * 2021-02-26 2021-05-28 北京安声浩朗科技有限公司 Environment sound hearing method and device, computer equipment and earphone

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014022359A2 (en) * 2012-07-30 2014-02-06 Personics Holdings, Inc. Automatic sound pass-through method and system for earphones
EP3847827A1 (en) * 2019-02-15 2021-07-14 Huawei Technologies Co., Ltd. Method and apparatus for processing an audio signal based on equalization filter

Also Published As

Publication number Publication date
CN114727212A (en) 2022-07-08

Similar Documents

Publication Publication Date Title
CN114727212B (en) Audio processing method and electronic equipment
CN113873378B (en) Earphone noise processing method and device and earphone
EP3624463B1 (en) Audio signal processing method and device, terminal and storage medium
US9271077B2 (en) Method and system for directional enhancement of sound using small microphone arrays
CN111294438B (en) Method and terminal for realizing stereo output
US20180041849A1 (en) Binaural hearing system configured to localize a sound source
CN113496708B (en) Pickup method and device and electronic equipment
CN113873379A (en) Mode control method and device and terminal equipment
CN113393856B (en) Pickup method and device and electronic equipment
CN114157945B (en) Data processing method and related device
EP4113961B1 (en) Voice call method and apparatus, system, and computer readable storage medium
US20240276157A1 (en) A hearing aid system comprising a database of acoustic transfer functions
US20230091607A1 (en) Psychoacoustics-based audio encoding method and apparatus
CN117528370A (en) Signal processing method and device, equipment control method and device
CN116347320A (en) Audio playing method and electronic equipment
CN114120950B (en) Human voice shielding method and electronic equipment
US12063477B2 (en) Hearing system comprising a database of acoustic transfer functions
CN114220454B (en) Audio noise reduction method, medium and electronic equipment
CN113542984A (en) Stereo realization system, method, electronic device and storage medium
CN114786117B (en) A kind of audio playing method and related equipment
CN114449393A (en) Sound enhancement method, earphone control method, device and earphone
CN116233696B (en) Airflow noise suppression method, audio module, sound generating device and storage medium
EP4529230A1 (en) Audio playback method and system, and related apparatus
CN116781817A (en) Binaural sound pickup method and device
CN117812523A (en) Recording signal generation method, device and system and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant