CN119155621A

CN119155621A - Audio playing method and device

Info

Publication number: CN119155621A
Application number: CN202310714486.4A
Authority: CN
Inventors: 陈绍天; 陈华明; 胡贝贝
Original assignee: Honor Device Co Ltd
Current assignee: Honor Device Co Ltd
Priority date: 2023-06-15
Filing date: 2023-06-15
Publication date: 2024-12-17

Abstract

The present application provides a method and device for audio playback through a loudspeaker, which help restore the overall timbre of the audio and improve the user's audio-visual experience. The method includes: obtaining auricle information of a user, where the auricle information includes at least one of auricle length, auricle width, auricle thickness, or auricle area; determining a second sound effect localization algorithm based on the auricle information and a first sound effect localization algorithm, where the first sound effect localization algorithm is used to simulate a sound signal from a preset spatial orientation and is obtained based on amplitude spectrum data, and the amplitude spectrum data is obtained by subjecting multiple groups of head-related impulse response (HRIR) data to Fourier transform, averaging, envelope extraction, and inverse Fourier transform; and processing the audio data to be played with the second sound effect localization algorithm to obtain processed audio data, and playing the processed audio data through the loudspeaker.

Description

Audio playing method and device

Technical Field

The present application relates to the field of terminal technologies, and in particular, to an audio playback method and apparatus.

Background

When a terminal device plays audio for a user through its loudspeaker (external playback), the user generally perceives the sound source as located at the terminal device itself. As a result, the sound field lacks a sense of space and height, which degrades the user's audio-visual experience. For example, when a notebook computer plays music through its loudspeaker, the user perceives the sound field of the music as located at the keyboard of the notebook computer. To improve the audio-visual experience, the terminal device generally processes the audio data with a head-related transfer function (HRTF) and then plays the processed audio data, so that the perceived position of the sound field is raised.

However, this playback method may alter the timbre of the audio, which degrades the user experience.

Disclosure of Invention

The application provides an audio playing method and device that restore the overall timbre of the audio while improving its sense of height and space, thereby improving the user's audio-visual experience.

In a first aspect, an audio playing method is provided. The method includes: obtaining auricle information of a user, where the auricle information includes at least one of auricle length, auricle width, auricle thickness, or auricle area; determining a second sound effect positioning algorithm based on the auricle information and a first sound effect positioning algorithm, where the first sound effect positioning algorithm is used to simulate a sound signal from a preset spatial orientation and is obtained based on amplitude spectrum data, and the amplitude spectrum data is obtained by subjecting a plurality of groups of head-related impulse response (HRIR) data to Fourier transform, averaging, envelope extraction, and inverse Fourier transform; and processing the audio data to be played with the second sound effect positioning algorithm to obtain processed audio data, and playing the processed audio data through the loudspeaker.

In this audio playing method, a plurality of groups of measured HRIRs undergo Fourier transform, averaging, envelope extraction, and inverse Fourier transform to obtain the first sound effect positioning algorithm. The terminal device then calibrates the parameters of the first sound effect positioning algorithm based on the obtained auricle information of the user to obtain the second sound effect positioning algorithm, and processes the audio data to be played with the second sound effect positioning algorithm. Calibrating the parameters of the first sound effect positioning algorithm based on the user's auricle information improves the accuracy of the second sound effect positioning algorithm. Because the terminal device processes the audio data to be played based on the user's auricle information, the way sound waves are transmitted is adjusted to the user's personalized biological characteristics, and the terminal device can dynamically adapt to the personalized height perception of different users. This helps restore the overall timbre of the audio and improves the user's audio-visual experience.
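The four processing steps named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `derive_height_filter` is hypothetical, the envelope extraction is approximated by a moving-average smoothing of the averaged magnitude spectrum (the text does not specify the envelope method at this point), and the inverse transform assumes zero phase.

```python
import numpy as np

def derive_height_filter(hrirs, smooth_bins=9):
    """Sketch: HRIR groups -> smoothed magnitude envelope -> FIR coefficients.

    hrirs: array of shape (num_groups, num_taps), one measured HRIR per row.
    smooth_bins: odd window length of the (assumed) moving-average envelope.
    """
    if smooth_bins % 2 == 0:
        raise ValueError("smooth_bins must be odd")
    hrirs = np.asarray(hrirs, dtype=float)

    # Step 1: Fourier transform of every HRIR group.
    spectra = np.fft.rfft(hrirs, axis=1)

    # Step 2: average the magnitude spectra across groups.
    mean_magnitude = np.abs(spectra).mean(axis=0)

    # Step 3: envelope extraction (assumption: moving-average smoothing,
    # with edge padding so the output keeps the same number of bins).
    kernel = np.ones(smooth_bins) / smooth_bins
    padded = np.pad(mean_magnitude, smooth_bins // 2, mode="edge")
    envelope = np.convolve(padded, kernel, mode="valid")

    # Step 4: inverse Fourier transform (zero phase assumed) back to time domain.
    fir = np.fft.irfft(envelope, n=hrirs.shape[1])
    return envelope, fir
```

Averaging magnitudes rather than complex spectra is a design choice of this sketch; it avoids cancellation between groups whose phases differ.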

It should be understood that the user may be any user of the terminal device, such as the owner of the terminal device. The terminal device may obtain the auricle information of the user in any manner, for example, by means of a camera. The auricle information may include two sets of data, one being the left auricle information of the user and the other being the right auricle information of the user. Alternatively, the auricle information may include a single set of data, which may be either the left auricle information or the right auricle information of the user.

In certain implementations of the first aspect, the first sound effect localization algorithm is derived based on the amplitude spectrum data and linear compensation phase data of a speaker of the terminal device.

It will be appreciated that the linear compensation phase data may be used to reduce group delay inconsistencies caused by the acoustic properties of the speaker of the terminal device. Applying linear phase compensation to the initial phase data of the speaker yields linear-phase-compensated data; the linear compensation phase data is the data applied during that compensation, that is, the initial phase data and the linear compensation phase data together make up the linear-phase-compensated data. Because the phase of the speaker affects the user's perception of the height of the audio sound field, calibrating the sound effect positioning algorithm with the linear compensation phase data improves the accuracy of the first sound effect positioning algorithm. The sound field position perceived by the user is then closer to the preset spatial orientation, which improves the user's audio-visual experience.
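The relationship between the initial phase data, the linear compensation phase data, and the compensated result can be illustrated with a short sketch. It assumes the target is a pure linear phase with a chosen constant group delay; the function names, the `delay_samples` parameter, and the FFT size are hypothetical and do not come from the application.

```python
import numpy as np

def linear_compensation_phase(initial_phase, delay_samples, n_fft):
    """Return the phase to add so that the total phase becomes linear.

    initial_phase: measured speaker phase per FFT bin, in radians.
    delay_samples: assumed constant group delay of the target linear phase.
    n_fft: FFT size that the bin indices refer to.
    """
    initial_phase = np.asarray(initial_phase, dtype=float)
    bins = np.arange(initial_phase.size)
    # A linear phase has the same group delay at every frequency.
    target_phase = -2.0 * np.pi * bins * delay_samples / n_fft
    # initial_phase + compensation == target_phase
    return target_phase - initial_phase

def combine(magnitude, phase):
    """Combine a magnitude spectrum with a phase into a complex response."""
    return np.asarray(magnitude) * np.exp(1j * np.asarray(phase))
```

Adding the returned compensation to the initial phase gives the linear-phase-compensated data, which can then be paired with the amplitude spectrum data through `combine`.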

In some implementations of the first aspect, the determining a second sound effect positioning algorithm based on the auricle information and the first sound effect positioning algorithm includes: calibrating the first sound effect positioning algorithm based on the auricle information to obtain a calibrated first sound effect positioning algorithm; and performing interaural time difference (ITD) calibration and/or interaural level difference (ILD) calibration on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm.

It should be appreciated that the localization of the sound source by the user is not only related to the auricle information of the user, but also related to the ITD and ILD, so that the ITD and/or ILD calibration of the calibrated first sound effect localization algorithm can further improve the accuracy of the second sound effect localization algorithm and restore the overall tone of the audio.

In some implementations of the first aspect, performing ILD calibration on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm includes performing ILD calibration on the calibrated first sound effect positioning algorithm based on a horizontal azimuth between the user's head and a terminal device to obtain a third sound effect positioning algorithm, and determining the third sound effect positioning algorithm as the second sound effect positioning algorithm.

It should be appreciated that ILD calibration of the calibrated first sound effect positioning algorithm may be understood as adjusting parameters in the calibrated first sound effect positioning algorithm. With the center of the user's head as the origin, a two-dimensional coordinate system is established in the plane containing the line between the center of the user's head and the sound source, with the y-axis being a straight line in the vertical direction; the horizontal azimuth may then be the angle between that line and the y-axis. Optionally, the horizontal azimuth is greater than 0 when the sound source is located in the first quadrant of the two-dimensional coordinate system, and less than 0 when the sound source is located in the fourth quadrant.

In some implementations of the first aspect, performing ITD calibration and ILD calibration on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm includes performing ILD calibration on the calibrated first sound effect positioning algorithm based on a horizontal azimuth between the user's head and a terminal device to obtain a third sound effect positioning algorithm, and performing ITD calibration on the third sound effect positioning algorithm based on the horizontal azimuth between the user's head and the terminal device, a preset human head radius and a sound velocity to obtain the second sound effect positioning algorithm.

It is understood that, by performing ILD calibration and then ITD calibration on the calibrated first sound effect positioning algorithm, the resulting second sound effect positioning algorithm can process audio data based on the horizontal azimuth and the preset human head radius. The second sound effect positioning algorithm can therefore dynamically adapt to the personalized height perception of different users, and the accuracy of the sound effect positioning algorithm is improved. The terminal device can process the audio data based on the user's personalized characteristic information, which helps restore the overall timbre of the audio and improves the user's audio-visual experience.

In some implementations of the first aspect, the second sound effect positioning algorithm includes a first ILD calibration sound effect positioning algorithm and a second ILD calibration sound effect positioning algorithm, the performing ILD calibration on the calibrated first sound effect positioning algorithm includes determining a target left adjustment gain parameter from a plurality of left adjustment gain parameters based on the horizontal azimuth and a preset correspondence, determining a target right adjustment gain parameter from a plurality of right adjustment gain parameters, the preset correspondence including a correspondence among a plurality of angles, the plurality of left adjustment gain parameters, and the plurality of right adjustment gain parameters, using the target left adjustment gain parameter as a coefficient of the calibrated first sound effect positioning algorithm to obtain the first ILD calibration sound effect positioning algorithm, and using the target right adjustment gain parameter as a coefficient of the calibrated first sound effect positioning algorithm to obtain the second ILD calibration sound effect positioning algorithm.

It should be understood that the plurality of angles may include every horizontal azimuth that can occur. For example, if the horizontal azimuth lies between -90° and 90°, the plurality of angles may include the angles between -90° and 90°. The preset correspondence includes a correspondence between the plurality of angles and the plurality of left adjustment gain parameters and a correspondence between the plurality of angles and the plurality of right adjustment gain parameters; that is, each of the plurality of angles corresponds to one left adjustment gain parameter and one right adjustment gain parameter. The horizontal azimuth is one of the plurality of angles.
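The preset correspondence described above behaves like a lookup table keyed by angle. The sketch below is illustrative only: the table `PRESET_GAINS`, its gain values, and the 45° spacing are placeholders invented for the example, and choosing the nearest tabulated angle is an assumption for azimuths that fall between entries (the text itself states that the horizontal azimuth is one of the tabulated angles).

```python
# Hypothetical preset correspondence: angle in degrees -> (left gain, right gain).
# The values are placeholders, not taken from the application.
PRESET_GAINS = {
    -90: (1.00, 0.50),
    -45: (0.90, 0.60),
    0: (0.75, 0.75),
    45: (0.60, 0.90),
    90: (0.50, 1.00),
}

def select_gains(azimuth_deg, table=PRESET_GAINS):
    """Return the (target left gain, target right gain) for a horizontal azimuth."""
    if not -90 <= azimuth_deg <= 90:
        raise ValueError("horizontal azimuth must lie between -90 and 90 degrees")
    # Assumption: fall back to the nearest tabulated angle.
    nearest = min(table, key=lambda angle: abs(angle - azimuth_deg))
    return table[nearest]
```

For example, `select_gains(40)` picks the entry stored for 45° in this placeholder table.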

In some implementations of the first aspect, the processing the audio data to be played through the second sound effect positioning algorithm to obtain processed audio data, and playing the processed audio data through an external playing mode includes processing the audio data to be played through the first ILD calibration sound effect positioning algorithm to obtain first processed audio data, processing the audio data to be played through the second ILD calibration sound effect positioning algorithm to obtain second processed audio data, and playing the first processed audio data and the second processed audio data through an external playing mode.

It should be appreciated that the first ILD calibration sound effect positioning algorithm may also be referred to as a left channel height filter, and the second ILD calibration sound effect positioning algorithm may also be referred to as a right channel height filter. The two algorithms may differ, so the first processed audio data and the second processed audio data may differ, and the audio received by the user's two ears differs accordingly. Because both algorithms are obtained by performing ILD calibration on the calibrated first sound effect positioning algorithm, the accuracy of the sound effect positioning algorithm is improved, and the terminal device can process the audio to be played based on the horizontal azimuth between the user's head and the terminal device. The terminal device can thus dynamically adapt to the personalized height perception of different users and process audio data based on the user's personalized characteristic information, which helps restore the overall timbre of the audio and improves the user's audio-visual experience.
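A minimal sketch of the left and right channel height filters follows. It assumes the calibrated first sound effect positioning algorithm can be represented as a set of FIR coefficients and that "using the gain as a coefficient" means scaling those coefficients; both points, and the function name `render_channels`, are assumptions made for illustration.

```python
import numpy as np

def render_channels(audio, calibrated_fir, left_gain, right_gain):
    """Filter mono audio with gain-scaled copies of one calibrated filter.

    Returns (first processed audio data, second processed audio data),
    intended for the left and right speaker respectively.
    """
    audio = np.asarray(audio, dtype=float)
    calibrated_fir = np.asarray(calibrated_fir, dtype=float)

    # Left channel height filter: calibrated filter scaled by the left gain.
    left_filter = left_gain * calibrated_fir
    # Right channel height filter: calibrated filter scaled by the right gain.
    right_filter = right_gain * calibrated_fir

    left_out = np.convolve(audio, left_filter)
    right_out = np.convolve(audio, right_filter)
    return left_out, right_out
```

When the two gains differ, the two outputs differ only in level, which is the interaural level cue that the ILD calibration is meant to introduce.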

In some implementations of the first aspect, performing ITD calibration on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm includes performing ITD calibration on the calibrated first sound effect positioning algorithm based on a horizontal azimuth angle between the user's head and a terminal device, a preset human head radius, and a sound velocity to obtain the second sound effect positioning algorithm.

It should be appreciated that the preset human head radius is a preset value greater than 0, for example, the average of the head radii of a plurality of users. With the center of the user's head as the origin, a two-dimensional coordinate system is established in the plane containing the line between the center of the user's head and the sound source, with the y-axis being a straight line in the vertical direction; the horizontal azimuth may then be the angle between that line and the y-axis.

In certain implementations of the first aspect, the second sound effect positioning algorithm includes a first ITD calibration sound effect positioning algorithm and a second ITD calibration sound effect positioning algorithm, the first ITD calibration sound effect positioning algorithm is the calibrated first sound effect positioning algorithm, the performing ITD calibration on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm includes determining a first parameter based on the horizontal azimuth angle, the preset head radius and the sound velocity, and calibrating the calibrated first sound effect positioning algorithm based on the first parameter to obtain the second ITD calibration sound effect positioning algorithm.

It is to be understood that the first parameter may be an ITD value calculated based on the horizontal azimuth, the preset head radius and the speed of sound, the ITD value being indicative of the time difference of transmission of sound waves by the terminal device speakers to both ears. By performing ITD calibration on the calibrated first sound effect positioning algorithm, the terminal equipment can process audio data based on the ITD value, so that the overall tone color of the audio can be restored, and the audio and video experience of a user can be improved.
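One way to compute an ITD value from a horizontal azimuth, a head radius, and the speed of sound is the classical Woodworth spherical-head approximation. The passage above does not state which formula is used, so the sketch below swaps in Woodworth's formula purely as an assumption; the default radius of 0.0875 m, the speed of sound of 343 m/s, and the function names are likewise illustrative.

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """ITD in seconds from the Woodworth spherical-head approximation.

    Assumption: ITD = (a / c) * (theta + sin(theta)), with theta in radians
    and |theta| <= 90 degrees. The sign of the result follows the azimuth.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must lie between -90 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return head_radius_m / speed_of_sound * (theta + math.sin(theta))

def itd_in_samples(itd_seconds, sample_rate_hz=48000):
    """Convert an ITD to the nearest whole number of samples."""
    return round(itd_seconds * sample_rate_hz)
```

The sample count could then serve as the time processing parameter of the filter for the delayed channel, while the other channel keeps the calibrated filter unchanged.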

In certain implementations of the first aspect, the calibrating the first sound effect positioning algorithm based on the first parameter to obtain the second ITD calibrated sound effect positioning algorithm includes using the first parameter as a time processing parameter in the first sound effect positioning algorithm after calibration to obtain the second ITD calibrated sound effect positioning algorithm.

It should be understood that the time processing parameter refers to a parameter for adjusting the time in the calibrated first sound effect localization algorithm.

In certain implementations of the first aspect, the method further includes: obtaining a head radius of the user; and determining a second parameter based on the horizontal azimuth, the head radius of the user, and the speed of sound. The calibrating the calibrated first sound effect positioning algorithm based on the first parameter to obtain the second ITD calibration sound effect positioning algorithm includes: using the difference between the first parameter and the second parameter as a time processing parameter in the calibrated first sound effect positioning algorithm to obtain the second ITD calibration sound effect positioning algorithm.

It should be appreciated that the second parameter may be determined in the same way as the first parameter. Further calibrating the calibrated first sound effect positioning algorithm based on the difference between the first parameter and the second parameter yields a second sound effect positioning algorithm that lets the terminal device process audio data based on the user's head radius. The terminal device can thus dynamically adapt to the personalized height perception of different users, which improves the accuracy of the sound effect positioning algorithm, helps restore the overall timbre of the audio, and improves the user's audio-visual experience.
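The difference between the two parameters can be sketched as below. The example again assumes the Woodworth spherical-head formula for both parameters, which the text does not confirm; `spherical_head_itd` and `time_processing_parameter` are hypothetical names.

```python
import math

def spherical_head_itd(azimuth_deg, head_radius_m, speed_of_sound=343.0):
    """Assumed Woodworth approximation: (a / c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return head_radius_m / speed_of_sound * (theta + math.sin(theta))

def time_processing_parameter(azimuth_deg, preset_radius_m, user_radius_m,
                              speed_of_sound=343.0):
    """Difference between the first parameter and the second parameter."""
    # First parameter: ITD for the preset human head radius.
    first = spherical_head_itd(azimuth_deg, preset_radius_m, speed_of_sound)
    # Second parameter: ITD for the measured head radius of this user.
    second = spherical_head_itd(azimuth_deg, user_radius_m, speed_of_sound)
    return first - second
```

Under this assumed formula the result is zero when the user's head radius equals the preset radius, so the personalization only shifts timing for users whose head size departs from the preset value.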

In certain implementations of the first aspect, the acquiring the head radius of the user includes acquiring the head radius with an external spatial audio switch of a terminal device in an on state.

It should be understood that the external spatial audio switch may be a switch that controls whether the terminal device performs the audio external playback method, and may be provided on a system settings page of the terminal device. With the external spatial audio switch, the user can control whether the terminal device calibrates the sound effect positioning algorithm, which makes the audio external playback method more flexible.

In some implementations of the first aspect, the determining a second sound effect positioning algorithm based on the auricle information and the first sound effect positioning algorithm includes inputting the auricle information to the auricle information encoding algorithm to obtain encoded auricle information, inputting parameters of the first sound effect positioning algorithm to the sound effect positioning algorithm encoding algorithm to obtain encoded parameters, and determining the second sound effect positioning algorithm based on the encoded auricle information and the encoded parameters.

It should be appreciated that the auricle information encoding algorithm may also be referred to as an auricle information encoder, which is capable of encoding auricle information, and the sound localization algorithm encoding algorithm may also be referred to as a sound localization algorithm encoder, which is capable of encoding parameters of the first sound localization algorithm.

In certain implementations of the first aspect, the acquiring auricle information of the user includes acquiring the auricle information with an external spatial audio switch of the terminal device in an on state.

In certain implementations of the first aspect, the method further includes: displaying a first interface including a first selection button for turning the external spatial audio switch on or off; and setting the state of the external spatial audio switch to an on state or an off state based on a selection operation of the user on the first selection button.

It should be appreciated that the selection operation of the user on the first selection button may be any preset input operation, such as clicking or sliding. When the external spatial audio switch is in the on state, the terminal device can calibrate the sound effect positioning algorithm based on the auricle information of the user, which improves the accuracy of the sound effect positioning algorithm, helps restore the overall timbre of the audio, and improves the user's audio-visual experience. The external spatial audio switch may also have any other name; for example, it may be presented as an option asking whether to turn on personalized height perception for external spatial audio.

The method further includes: when the external spatial audio switch is in the on state, displaying a second interface, where the second interface includes a second selection button for turning a personalized sensing switch on or off; setting the state of the personalized sensing switch to the on state or the off state based on a selection operation of the user on the second selection button; when the personalized sensing switch is in the on state, displaying a third interface, where the third interface includes a third selection button for selecting a camera acquisition function, a fourth selection button for selecting a manual user input function, a fifth selection button for selecting acquisition of auricle information, and a sixth selection button for selecting acquisition of head information, the head information including the head radius of the user; and displaying a fourth interface or a fifth interface based on a first selection operation and a second selection operation of the user, where the fourth interface includes an image of the user captured by the terminal device through a camera, the fifth interface includes an input area for filling in the auricle information or the head information, the first selection operation is a selection operation on the third selection button or the fourth selection button, and the second selection operation is a selection operation on the fifth selection button or the sixth selection button.

It should be understood that the selection operation of the user on a selection button may be any preset input operation, such as clicking or sliding. When the external spatial audio switch is in the off state, the terminal device does not process the audio data to be played and plays it directly through the loudspeaker, so the audio heard by the user lacks a sense of height and space. When the external spatial audio switch is in the on state, the terminal device can calibrate the sound effect positioning algorithm, which improves the accuracy of the sound effect positioning algorithm, helps restore the overall timbre of the audio, and improves the user's audio-visual experience.

In a second aspect, an audio playback apparatus is provided for performing the method in any of the possible implementations of the first aspect. In particular, the apparatus comprises means for performing the method in any one of the possible implementations of the first aspect described above.

In a third aspect, there is provided a further audio playback apparatus comprising a processor coupled to a memory, operable to execute instructions in the memory to implement a method as in any one of the possible implementations of the first aspect. Optionally, the apparatus further comprises a memory. Optionally, the apparatus further comprises a communication interface, the processor being coupled to the communication interface.

In one implementation, the apparatus is a terminal device. When the apparatus is a terminal device, the communication interface may be a transceiver, or an input/output interface.

In another implementation, the apparatus is a chip configured in a terminal device. When the apparatus is a chip configured in a terminal device, the communication interface may be an input/output interface.

In a fourth aspect, a processor is provided that includes an input circuit, an output circuit, and a processing circuit. The processing circuit is configured to receive a signal via the input circuit and transmit a signal via the output circuit, such that the processor performs the method of any one of the possible implementations of the first aspect.

In a specific implementation, the processor may be a chip, the input circuit may be an input pin, the output circuit may be an output pin, and the processing circuit may be a transistor, a gate circuit, a flip-flop, various logic circuits, and the like. The input signal received by the input circuit may be received and input by, for example but not limited to, a receiver; the signal output by the output circuit may be output to and transmitted by, for example but not limited to, a transmitter; and the input circuit and the output circuit may be the same circuit, which serves as the input circuit and the output circuit at different times. The embodiments of the application do not limit the specific implementation of the processor or of the various circuits.

In a fifth aspect, a processing device is provided that includes a processor and a memory. The processor is configured to read instructions stored in the memory to perform the method according to any one of the possible implementations of the first aspect.

Optionally, the processor is one or more, and the memory is one or more.

Alternatively, the memory may be integrated with the processor or the memory may be separate from the processor.

In a specific implementation, the memory may be a non-transitory memory, for example, a read-only memory (ROM), which may be integrated on the same chip as the processor or disposed on a separate chip.

It should be appreciated that, in the related data interaction flow, sending indication information may be, for example, a process in which the processor outputs the indication information, and receiving capability information may be a process in which the processor receives input capability information. Specifically, the data output by the processor may be output to a transmitter, and the input data received by the processor may come from a receiver. The transmitter and the receiver may be collectively referred to as a transceiver.

The processing device in the fifth aspect may be a chip, and the processor may be implemented by hardware or by software. When implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like. When implemented by software, the processor may be a general-purpose processor that operates by reading software code stored in the memory, and the memory may be integrated in the processor or located outside the processor and exist independently.

In a sixth aspect, there is provided a computer program product comprising a computer program (which may also be referred to as code, or instructions) which, when executed, causes a computer to perform the method of any one of the possible implementations of the first aspect.

In a seventh aspect, a computer readable storage medium is provided, which stores a computer program (which may also be referred to as code, or instructions) which, when run on a computer, causes the computer to perform the method of any one of the possible implementations of the first aspect.

Drawings

FIG. 1 is a schematic structural diagram of a terminal device according to an embodiment of the present application;

FIG. 2 is a software architecture block diagram of a terminal device according to an embodiment of the present application;

FIG. 3 is a schematic diagram of an application scenario of an audio playback method;

FIG. 4 is a schematic flow chart of an audio playback method according to an embodiment of the present application;

FIG. 5 is a schematic diagram of envelope extraction according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a process for obtaining a first sound effect positioning algorithm according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a horizontal azimuth angle according to an embodiment of the present application;

FIG. 8 is a schematic diagram illustrating a determining process of a second sound effect positioning algorithm according to an embodiment of the present application;

FIG. 9 is a schematic diagram illustrating a determining process of another second sound effect positioning algorithm according to an embodiment of the present application;

FIG. 10 is a schematic diagram illustrating a determining process of yet another second sound effect positioning algorithm according to an embodiment of the present application;

FIG. 11 is an interface diagram of a notebook computer displaying a first interface according to an embodiment of the present application;

FIG. 12 is an interface diagram of a notebook computer displaying a second interface according to an embodiment of the present application;

FIG. 13 is an interface diagram of a notebook computer displaying interface 1 according to an embodiment of the present application;

FIG. 14 is an interface diagram of a notebook computer displaying interface 2 according to an embodiment of the present application;

FIG. 15 is an interface diagram of a notebook computer displaying interface 3 according to an embodiment of the present application;

FIG. 16 is an interface diagram of a notebook computer displaying interface 4 according to an embodiment of the present application;

FIG. 17 is an interface diagram of a notebook computer displaying interface 5 according to an embodiment of the present application;

FIG. 18 is an interface diagram of a notebook computer displaying interface 6 according to an embodiment of the present application;

FIG. 19 is a schematic top view of a user using a notebook computer according to an embodiment of the present application;

FIG. 20 is a schematic block diagram of an audio playback apparatus according to an embodiment of the present application;

FIG. 21 is a schematic block diagram of another audio playback apparatus according to an embodiment of the present application.

Detailed Description

The technical scheme of the application will be described below with reference to the accompanying drawings.

In embodiments of the present application, the words "first," "second," and the like are used to distinguish between identical or similar items that have substantially the same function and effect. For example, the first value and the second value merely distinguish different values and do not imply an order. Those skilled in the art will appreciate that the words "first," "second," and the like do not limit quantity or order of execution, and that items so labeled are not necessarily different.

It should be noted that, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.

In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate three cases: A alone, both A and B, and B alone, where A and B may each be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" or a similar expression means any combination of the listed items, including any combination of single items or plural items. For example, "at least one of a, b, or c" may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or plural.
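As an illustrative sketch only, the meaning of "at least one of a, b, or c" can be checked by enumerating every non-empty combination of the items. The function name `at_least_one_of` is hypothetical and is not part of the application.

```python
from itertools import combinations


def at_least_one_of(items):
    """Return every non-empty combination of items, smallest combinations first."""
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            # Join the members with "-" to match the notation used in the text.
            result.append("-".join(combo))
    return result


print(at_least_one_of(["a", "b", "c"]))
```

For three items this yields the seven cases listed above.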

In order to better understand the terminal device in the embodiment of the present application, the hardware structure of the terminal device in the embodiment of the present application is described in detail below with reference to fig. 1.

Fig. 1 is a schematic structural diagram of a terminal device 100 according to an embodiment of the present application. As shown in fig. 1, the terminal device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identity module (subscriber identification module, SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.

It is to be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the terminal device 100. In other embodiments of the application, the terminal device 100 may include more or fewer components than illustrated, certain components may be combined, certain components may be split, or the components may be arranged differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. The different processing units may be separate devices or may be integrated in one or more processors.

The controller can generate operation control signals according to the instruction operation codes and the timing signals, to control instruction fetching and instruction execution.

A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs to use the instructions or data again, it can fetch them directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.

In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (inter-integrated circuit, I2C) interface, an inter-integrated circuit sound (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver/transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (general-purpose input/output, GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.

The I2C interface is a bi-directional synchronous serial bus comprising a serial data line (serial data line, SDA) and a serial clock line (serial clock line, SCL). In some embodiments, the processor 110 may contain multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, the camera 193, etc., respectively, through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through an I2C bus interface to implement a touch function of the terminal device 100.

The I2S interface may be used for audio communication. In some embodiments, the processor 110 may contain multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 via an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the I2S interface, to implement a function of answering a call through a Bluetooth headset.

The PCM interface may also be used for audio communication, to sample, quantize, and encode analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface to implement a function of answering a call through a Bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
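The sample, quantize, and encode steps of pulse code modulation can be sketched as follows. This is a minimal illustration, assuming a 16-bit signed little-endian code and an input signal normalized to the range [-1, 1]; the function names `pcm_encode` and `to_bytes_le` are hypothetical, and the application does not specify these parameters.

```python
import math


def pcm_encode(signal, sample_rate_hz, num_samples, bits=16):
    """Sample signal(t), then quantize each sample to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1
    codes = []
    for i in range(num_samples):
        t = i / sample_rate_hz                  # sampling instant
        x = max(-1.0, min(1.0, signal(t)))      # clip to the assumed range
        codes.append(round(x * max_code))       # uniform quantization
    return codes


def to_bytes_le(codes, bits=16):
    """Encode the integer codes as little-endian signed bytes."""
    return b"".join(c.to_bytes(bits // 8, "little", signed=True) for c in codes)


def tone(t):
    """A 1 kHz test tone (assumed input, for illustration only)."""
    return math.sin(2 * math.pi * 1000 * t)


codes = pcm_encode(tone, 8000, 8)
frame = to_bytes_le(codes)
print(codes, len(frame))
```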

The UART interface is a universal serial data bus for asynchronous communication. The bus may be a bi-directional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is typically used to connect the processor 110 with the wireless communication module 160. For example, the processor 110 communicates with a Bluetooth module in the wireless communication module 160 through a UART interface to implement Bluetooth functions. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through a UART interface, to implement a function of playing music through a Bluetooth headset.
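The conversion between parallel and serial data can be illustrated with a short sketch. It assumes the common 8N1 frame format (one start bit, eight data bits sent least significant bit first, no parity, one stop bit), which the application does not specify; the function names are hypothetical.

```python
def uart_serialize(byte):
    """Convert one parallel byte into a 10-bit serial frame (assumed 8N1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # least significant bit first
    return [0] + data_bits + [1]                     # start bit, data, stop bit


def uart_deserialize(frame):
    """Convert a 10-bit serial frame back into one parallel byte."""
    if len(frame) != 10 or frame[0] != 0 or frame[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(frame[1:9]))


frame = uart_serialize(0x41)
print(frame, hex(uart_deserialize(frame)))
```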

The MIPI interface may be used to connect the processor 110 to peripheral devices such as a display 194, a camera 193, and the like. The MIPI interfaces include a camera serial interface (camera serial interface, CSI), a display serial interface (display serial interface, DSI), and the like. In some embodiments, the processor 110 and the camera 193 communicate through a CSI interface to implement the photographing function of the terminal device 100. The processor 110 and the display 194 communicate via a DSI interface to implement the display function of the terminal device 100.

The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, an MIPI interface, etc.

The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the terminal device 100, or may be used to transfer data between the terminal device 100 and a peripheral device. It may also be used to connect a headset and play audio through the headset. The interface may also be used to connect other terminal devices, such as AR devices.

It should be understood that the interfacing relationship between the modules illustrated in the embodiment of the present application is only illustrative, and does not constitute a structural limitation of the terminal device 100. In other embodiments of the present application, the terminal device 100 may also use different interfacing manners, or a combination of multiple interfacing manners in the foregoing embodiments.

The charge management module 140 is configured to receive a charging input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 140 may receive a charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charge management module 140 may receive a wireless charging input through a wireless charging coil of the terminal device 100. While charging the battery 142, the charge management module 140 may also supply power to the terminal device through the power management module 141.

The power management module 141 is used to connect the battery 142, the charge management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 to power the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor parameters such as battery capacity, battery cycle count, and battery health (leakage, impedance). In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.

The wireless communication function of the terminal device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.

The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the terminal device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example, the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.

The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied to the terminal device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.

The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.

The wireless communication module 160 may provide solutions for wireless communication applied on the terminal device 100, including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, Wi-Fi) network), Bluetooth (Bluetooth, BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication (near field communication, NFC), infrared (infrared, IR), etc. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency-modulate and amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.

In some embodiments, the antenna 1 and the mobile communication module 150 of the terminal device 100 are coupled, and the antenna 2 and the wireless communication module 160 are coupled, such that the terminal device 100 may communicate with a network and other devices via wireless communication techniques. The wireless communication techniques can include the global system for mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time-division synchronous code division multiple access (time-division synchronous code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include a global positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a BeiDou navigation satellite system (BeiDou navigation satellite system, BDS), a quasi-zenith satellite system (quasi-zenith satellite system, QZSS), and/or a satellite based augmentation system (satellite based augmentation systems, SBAS).

The terminal device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.

The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light-emitting diode, AMOLED), a flexible light-emitting diode (flexible light-emitting diode, FLED), a Mini LED, a Micro LED, a Micro OLED, a quantum dot light-emitting diode (quantum dot light-emitting diode, QLED), or the like. In some embodiments, the terminal device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.

The terminal device 100 may implement a photographing function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.

The ISP is used to process data fed back by the camera 193. For example, when a photograph is taken, the shutter is opened, light is transmitted through the lens to the photosensitive element of the camera, and the optical signal is converted into an electrical signal. The photosensitive element transmits the electrical signal to the ISP, which processes it and converts it into an image visible to the naked eye. The ISP can also optimize the noise, brightness, and skin color of the image. The ISP can also optimize parameters such as the exposure and color temperature of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.

The camera 193 is used to capture still images or video. An object generates an optical image through the lens, and the optical image is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a complementary metal-oxide-semiconductor (complementary metal-oxide-semiconductor, CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the terminal device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.

The digital signal processor is used to process digital signals, and can process other digital signals in addition to digital image signals. For example, when the terminal device 100 selects a frequency bin, the digital signal processor is used to perform a Fourier transform on the frequency bin energy, or the like.
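As a hedged illustration of evaluating the energy at a selected frequency bin, the sketch below computes a single bin of the discrete Fourier transform in plain Python. The function name `bin_energy`, the 32-sample test signal, and the bin index are assumptions made for the example; the application does not describe how the digital signal processor implements this step.

```python
import cmath
import math


def bin_energy(samples, k):
    """Return |X[k]|^2, the energy of DFT bin k for the given samples."""
    n = len(samples)
    acc = 0j
    for i, x in enumerate(samples):
        acc += x * cmath.exp(-2j * cmath.pi * k * i / n)
    return abs(acc) ** 2


# Test signal: a cosine that completes exactly 3 cycles in 32 samples.
samples = [math.cos(2 * math.pi * 3 * i / 32) for i in range(32)]
energies = [bin_energy(samples, k) for k in range(17)]
selected_bin = max(range(17), key=lambda k: energies[k])
print(selected_bin)
```

In this example the energy concentrates in the bin that matches the cosine frequency, and the other bins are close to zero.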

Video codecs are used to compress or decompress digital video. The terminal device 100 may support one or more video codecs. In this way, the terminal device 100 can play or record video in various encoding formats, such as moving picture experts group (moving picture experts group, MPEG)-1, MPEG-2, MPEG-3, MPEG-4, etc.

The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent recognition of the terminal device 100, for example, image recognition, face recognition, voice recognition, text understanding, etc., can be realized through the NPU.

The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to realize expansion of the memory capability of the terminal device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.

The internal memory 121 may be used to store computer executable program code including instructions. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an operating system, an application program required for at least one function (such as a sound playing function or an image playing function), and the like. The storage data area may store data (such as audio data and a phonebook) created during use of the terminal device 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (universal flash storage, UFS), and the like. The processor 110 performs various functional applications and data processing of the terminal device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.

The terminal device 100 may implement audio functions, such as music playback and recording, through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like.

The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.

The speaker 170A, also referred to as a "horn," is used to convert audio electrical signals into sound signals. A user can listen to music or to a hands-free call through the speaker 170A of the terminal device 100.

The receiver 170B, also referred to as an "earpiece", is used to convert the audio electrical signal into a sound signal. When the terminal device 100 receives a call or a voice message, the voice can be heard by bringing the receiver 170B close to the ear.

The microphone 170C, also referred to as a "mic" or "mike", is used to convert sound signals into electrical signals. When making a call or sending voice information, the user can speak with the mouth close to the microphone 170C to input a sound signal to the microphone 170C. The terminal device 100 may be provided with at least one microphone 170C. In other embodiments, the terminal device 100 may be provided with two microphones 170C, which may implement a noise reduction function in addition to collecting sound signals. In other embodiments, the terminal device 100 may be further provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify the source of sound, implement directional recording functions, etc.

The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (open mobile terminal platform, OMTP) standard interface, or a cellular telecommunications industry association of the USA (cellular telecommunications industry association of the USA, CTIA) standard interface.

The pressure sensor 180A is used to sense a pressure signal and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. There are various types of pressure sensor 180A, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors. A capacitive pressure sensor may comprise at least two parallel plates made of conductive material. The capacitance between the electrodes changes when a force is applied to the pressure sensor 180A. The terminal device 100 determines the intensity of the pressure according to the change of the capacitance. When a touch operation is applied to the display 194, the terminal device 100 detects the intensity of the touch operation by means of the pressure sensor 180A. The terminal device 100 may also calculate the position of the touch from the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch location but have different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction to view the short message is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
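The threshold rule in the short message example can be sketched as follows. The threshold value, the normalized intensity scale, and the instruction names are hypothetical placeholders chosen for the illustration; the application does not specify them.

```python
# Hypothetical first pressure threshold on an assumed normalized 0..1 scale.
FIRST_PRESSURE_THRESHOLD = 0.5


def instruction_for_sms_icon_touch(intensity, threshold=FIRST_PRESSURE_THRESHOLD):
    """Map the touch intensity on the short message icon to an instruction."""
    if intensity < threshold:
        return "view_short_message"
    # Intensity greater than or equal to the threshold.
    return "create_short_message"


print(instruction_for_sms_icon_touch(0.2))
print(instruction_for_sms_icon_touch(0.8))
```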

The gyro sensor 180B may be used to determine a motion posture of the terminal device 100. In some embodiments, the angular velocity of the terminal device 100 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180B. The gyro sensor 180B may be used for image stabilization during photographing. Illustratively, when the shutter is pressed, the gyro sensor 180B detects the angle by which the terminal device 100 shakes, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the terminal device 100 through reverse movement, thereby implementing image stabilization. The gyro sensor 180B may also be used in navigation and motion-sensing game scenarios.

The air pressure sensor 180C is used to measure air pressure. In some embodiments, the terminal device 100 calculates altitude from barometric pressure values measured by the barometric pressure sensor 180C, aiding in positioning and navigation.
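The application does not state how altitude is derived from the measured pressure. One widely used approximation is the international barometric formula, sketched below under the assumption of a standard sea-level pressure of 101325 Pa; the function name `altitude_m` is hypothetical.

```python
# Assumed standard sea-level reference pressure in pascals.
SEA_LEVEL_PRESSURE_PA = 101325.0


def altitude_m(pressure_pa, sea_level_pa=SEA_LEVEL_PRESSURE_PA):
    """Estimate altitude in meters with the international barometric formula."""
    return 44330.0 * (1.0 - (pressure_pa / sea_level_pa) ** (1.0 / 5.255))


print(altitude_m(101325.0))
print(altitude_m(89875.0))
```

In practice the reference pressure changes with the weather, so a device would typically need to calibrate it before using the estimate for positioning.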

The magnetic sensor 180D includes a Hall sensor. The terminal device 100 can detect the opening and closing of a flip holster using the magnetic sensor 180D. In some embodiments, when the terminal device 100 is a flip device, the terminal device 100 may detect the opening and closing of the flip cover according to the magnetic sensor 180D. Features such as automatic unlocking upon opening the flip cover are then set according to the detected open or closed state of the holster or of the flip cover.

The acceleration sensor 180E can detect the magnitude of acceleration of the terminal device 100 in various directions (typically along three axes). The magnitude and direction of gravity may be detected when the terminal device 100 is stationary. The acceleration sensor 180E may also be used to identify the posture of the terminal device, and is applied in landscape/portrait switching, pedometers, and similar applications.
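A minimal sketch of landscape/portrait switching based on the gravity components is given below. It assumes that the y axis runs along the long edge of the screen and the x axis along the short edge, and it uses a hypothetical margin of 2.0 m/s² so that the orientation does not flicker when the device is held diagonally; none of these details come from the application.

```python
def screen_orientation(ax, ay, current="portrait", margin=2.0):
    """Choose the screen orientation from gravity components in m/s^2."""
    if abs(ay) > abs(ax) + margin:
        return "portrait"    # gravity mostly along the assumed long edge
    if abs(ax) > abs(ay) + margin:
        return "landscape"   # gravity mostly along the assumed short edge
    return current           # ambiguous reading: keep the current orientation


print(screen_orientation(0.1, 9.8))
print(screen_orientation(9.8, 0.2))
```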

The distance sensor 180F is used to measure distance. The terminal device 100 may measure the distance by infrared or laser. In some embodiments, the terminal device 100 may use the distance sensor 180F to measure distance in order to achieve fast focusing.

The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector, such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. The terminal device 100 emits infrared light outward through the light-emitting diode. The terminal device 100 detects infrared light reflected from a nearby object using the photodiode. When sufficient reflected light is detected, it can be determined that there is an object in the vicinity of the terminal device 100. When insufficient reflected light is detected, the terminal device 100 may determine that there is no object in its vicinity. The terminal device 100 can use the proximity light sensor 180G to detect that the user is holding the terminal device 100 close to the ear during a call, so as to automatically turn off the screen and save power. The proximity light sensor 180G may also be used in holster mode and pocket mode to automatically unlock and lock the screen.

The ambient light sensor 180L is used to sense ambient light level. The terminal device 100 may adaptively adjust the brightness of the display 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust white balance when taking a photograph. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the terminal device 100 is in a pocket to prevent false touches.

The fingerprint sensor 180H is used to collect a fingerprint. The terminal device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint-triggered photographing, fingerprint-based call answering, and the like.

The temperature sensor 180J is used to detect temperature. In some embodiments, the terminal device 100 executes a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the terminal device 100 reduces the performance of a processor located near the temperature sensor 180J, in order to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is below another threshold, the terminal device 100 heats the battery 142 to prevent the low temperature from causing the terminal device 100 to shut down abnormally. In other embodiments, when the temperature is below a further threshold, the terminal device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
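The temperature processing strategy can be sketched as a set of threshold checks. The three threshold values and the action names below are hypothetical, and combining the three embodiments in one function is an assumption made only for illustration.

```python
def thermal_actions(temp_c, high=45.0, low_heat=0.0, low_boost=-10.0):
    """Return the actions to take at the reported temperature in degrees Celsius."""
    actions = []
    if temp_c > high:
        # Above the upper threshold: throttle the nearby processor.
        actions.append("reduce_processor_performance")
    if temp_c < low_heat:
        # Below the second threshold: heat the battery.
        actions.append("heat_battery")
    if temp_c < low_boost:
        # Below the further threshold: boost the battery output voltage.
        actions.append("boost_battery_output_voltage")
    return actions


print(thermal_actions(50.0))
print(thermal_actions(-20.0))
```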

The touch sensor 180K is also referred to as a "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen, which is also called a "touch-controlled screen". The touch sensor 180K is used to detect a touch operation acting on or near it. The touch sensor may pass the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the terminal device 100 at a location different from that of the display 194.

The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire the vibration signal of a bone mass vibrated by the human vocal part. The bone conduction sensor 180M may also contact the pulse of the human body to receive a blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be provided in a headset to form a bone conduction headset. The audio module 170 may parse out a voice signal based on the vibration signal of the bone mass vibrated by the vocal part, as obtained by the bone conduction sensor 180M, so as to implement a voice function. The application processor may parse heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 180M, so as to implement a heart rate detection function.

The keys 190 include a power key, a volume key, etc. The keys 190 may be mechanical keys or touch keys. The terminal device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the terminal device 100.

The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration alerts as well as for touch vibration feedback. For example, touch operations acting on different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. Touch operations acting on different areas of the display screen 194 may also correspond to different vibration feedback effects of the motor 191. Different application scenarios (such as time reminders, receiving information, alarm clocks, and games) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also be customized.

The indicator 192 may be an indicator light and may be used to indicate the charging state and changes in battery level, as well as messages, missed calls, notifications, and the like.

The SIM card interface 195 is used to connect a SIM card. A SIM card may be brought into contact with or separated from the terminal device 100 by being inserted into or withdrawn from the SIM card interface 195. The terminal device 100 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support Nano SIM cards, Micro SIM cards, and the like. Multiple cards may be inserted into the same SIM card interface 195 simultaneously. The types of the multiple cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The terminal device 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the terminal device 100 employs an eSIM, i.e., an embedded SIM card. The eSIM card can be embedded in the terminal device 100 and cannot be separated from the terminal device 100.

The software system of the terminal device 100 may employ a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In the embodiment of the application, a Windows system with a layered architecture is taken as an example to illustrate the software structure of the terminal device 100.

Fig. 2 is a software architecture block diagram of a Windows system of the terminal device 100 according to an embodiment of the present application.

As shown in fig. 2, the layered architecture of the Windows system is mainly divided into a user mode and a kernel mode, wherein the user mode may include user application program processes, system processes, service processes and the like, and the kernel mode may include a kernel, a device driving layer, a hardware abstraction layer and the like.

The user application process may execute a series of applications such as music, gallery, bluetooth, WLAN, games, memos, video, etc.

The system process and the service process can comprise an input manager, a resource manager, a view system, a drawing trigger, a dynamic effect trigger, a time manager, a task manager and the like, and can provide corresponding services for the execution of the user application program process.

The input manager is used for acquiring and transmitting various input information of the user.

The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.

The view system may be responsible for interface rendering and event handling for the application.

The task manager is used to order the drawing tasks of the plurality of controls and the drawing tasks of each control, and to send the ordering result to the time manager.

The kernel layer and the device driver layer may include an audio driver, a graphics processing unit (GPU) driver, a display driver, a central processing unit (CPU) driver, and the like. The hardware abstraction layer may include a GPU module, a hardware configuration module, a CPU module, and the like. This is not limited in this embodiment of the application.

For a better understanding of embodiments of the present application, several terms related to embodiments of the present application are described below.

1. Head-related transfer function (HRTF). The HRTF is an audio localization algorithm: a functional representation of the many factors that can affect how a user localizes a sound source. The HRTF may also be referred to as a filter, by which the sound source position perceived by the user can be adjusted.

2. Head-related impulse response (HRIR). The HRIR is the time-domain representation of the HRTF. The HRIR may be measured with measuring equipment in a specific measurement environment. The measurement environment may be an anechoic laboratory capable of providing a free sound field, in which a speaker whose position can be flexibly changed, a measured object at a fixed position, a sensor, a data acquisition system, and the like are arranged. The measured object may be a human being, a purpose-built dummy, or the like.

When the terminal device plays audio externally for a user, the user generally perceives the sound as coming from the display screen or keyboard of the terminal device, so that the sound field of the externally played audio lacks a sense of height and a stereoscopic effect. At present, in order to improve the user's sound effect experience, an HRTF can be set in the terminal device so that the terminal device plays HRTF-processed audio externally. The sound heard by the user is then raised in height and has a more stereoscopic effect.

The following describes an application scenario of the audio playback method in detail with reference to Fig. 3, taking a notebook computer as an example of the terminal device.

Fig. 3 is a schematic diagram of an application scenario 300 of an audio playback method. As shown in Fig. 3, when the notebook computer plays audio externally, for example music, the sound field of the music heard by the user is located at the keyboard of the notebook computer, such as audio sound field 301 in Fig. 3. When a sound effect positioning algorithm is set in the notebook computer, the notebook computer can process the audio data of the music through the sound effect positioning algorithm to obtain audio data for external playback. When the notebook computer plays this audio data through the speaker, the audio sound field 302 of the music heard by the user is higher than the audio sound field 301, so the music heard by the user has a stronger sense of height and a stereoscopic effect. The sound effect positioning algorithm can change the way the audio data propagates from the notebook computer to the user's ears, thereby changing the position of the sound field of the audio heard by the user. For example, the sound effect positioning algorithm may be an HRTF.

It should be understood that the application scenario 300 shown in Fig. 3 is only an example. The terminal device may also be a device such as a mobile phone, a tablet computer, or a smart band, and the system installed in the terminal device may be a Windows system or an Android system. When the terminal device externally plays audio processed by the sound effect positioning algorithm, the position of the sound field of the audio heard by the user may also change in other ways, for example, the sound field may become lower. The present application does not limit the specific form of the terminal device, the type of system installed in the terminal device, or the manner in which the sound field position perceived by the user changes.

At present, the HRTF set in a terminal device can be determined as follows. In a test environment, N groups of head-related impulse responses (HRIRs) are acquired based on a preset elevation angle and/or horizontal angle to obtain an HRIR data set. Fourier transformation is performed on the HRIR data set to obtain a plurality of frequency-domain signals, and the plurality of frequency-domain signals are averaged to obtain a frequency-domain average signal. Inverse Fourier transformation is performed on the frequency-domain average signal to obtain a time-domain pulse signal, and the absolute value of the time-domain pulse signal can be used as the target HRTF, which may also be called a height filter. An audio signal is processed by the height filter to obtain an audio signal for external playback. When the speaker of the terminal device plays this signal, the sound field position of the sound heard by the user is raised, so that the sound has a sense of height and a stereoscopic effect, which improves the user's audio-visual experience.

However, when the user actually plays audio externally through the terminal device, the user's environment differs from the test environment: sound waves may be reflected, for example, which obstructs their propagation. In addition, the auricle information, head orientation information, and the like of the user may differ from those of the measured object. As a result, the HRTF can hardly adapt dynamically to the personalized height perception of different users, and the timbre of the HRTF-processed audio heard by the user may change, which affects the user's audio-visual experience.

To solve the above problems, the present application provides an audio playback method. Multiple groups of measured HRIRs are subjected to Fourier transformation, averaging, envelope extraction, and inverse Fourier transformation to obtain a first sound effect positioning algorithm. The terminal device then calibrates the parameters of the first sound effect positioning algorithm based on the acquired auricle information of the user, the head radius of the user, the horizontal azimuth between the head of the user and the terminal device, and the like, to obtain a second sound effect positioning algorithm, and processes the audio data to be played by using the second sound effect positioning algorithm. Because the parameters of the first sound effect positioning algorithm are calibrated with this personalized characteristic information of the user, the terminal device can adjust the way sound waves propagate to the user based on that information. The terminal device can therefore dynamically adapt to the personalized height perception of different users, which helps restore the overall timbre of the audio and improves the user's audio-visual experience.

The audio playback method of the present application will be described in detail with reference to fig. 4 to 19. The illustrated embodiments of the present application may be implemented by a terminal device, such as a notebook computer, a mobile phone, a tablet computer, a smart band, etc. The specific form and number of the devices shown therein are only examples and should not be construed as limiting the practice of the method provided by the present application in any way. The audio playback method according to the embodiment of the present application will be described in detail below using a terminal device as an execution body.

It should be understood that the terminal device may be the terminal device itself, or a chip, a chip system or a processor supporting the terminal device to implement the audio play-out method, or may be a logic module or software capable of implementing all or part of the functions of the terminal device, where the hardware structure of the terminal device may be as shown in fig. 1, and the software structure may be as shown in fig. 2, and the application is not limited thereto specifically.

Fig. 4 is a flowchart of an audio playback method 400 according to an embodiment of the present application. The method 400 includes the steps of:

S401, acquiring auricle information of a user, wherein the auricle information comprises at least one of auricle length, auricle width, auricle thickness or auricle area.

S402, determining a second sound effect positioning algorithm based on the auricle information and a first sound effect positioning algorithm, wherein the first sound effect positioning algorithm is used for simulating a sound signal of a preset spatial orientation, the first sound effect positioning algorithm is obtained based on amplitude spectrum data, and the amplitude spectrum data are obtained by performing Fourier transformation, averaging, envelope extraction and inverse Fourier transformation on multiple groups of head-related impulse response (HRIR) data.

It should be appreciated that the preset spatial orientation may be a preset user perceived audio field position, such as 302 in scene 300. The first sound localization algorithm may be a function derived based on a preset spatial orientation, which may also be referred to as a first height filter. The preset spatial orientation may be different from an initial spatial orientation, which is an audio sound field position when the terminal device is playing audio that has not been processed by the sound localization algorithm, such as 301 in the scene 300.

The second sound localization algorithm may also be referred to as a second height filter. The second sound effect positioning algorithm may be a transfer function obtained by calibrating the first sound effect positioning algorithm by the terminal device based on auricle information of the user. The first sound effect positioning algorithm is calibrated through auricle information of the user, so that the second sound effect positioning algorithm can process audio data based on auricle information of the user, and therefore, the sound field position of the audio processed by the second sound effect positioning algorithm felt by the user is closer to a preset space orientation.

Envelope extraction may also be referred to as extracting an envelope, and may be performed by a method such as the Hilbert-Huang transform (HHT), local peak detection, or spline fitting. Envelope extraction makes it possible to extract characteristic signals, such as high-frequency and low-frequency signals, from the frequency-domain average signal obtained after Fourier transformation and averaging. In a specific example, as shown in Fig. 5, (a) in Fig. 5 is a schematic diagram of envelope extraction for the frequency-domain average signal of the right channel, where curve 501 is the frequency-domain average signal before envelope extraction and curve 502 is the right-channel analytic signal obtained after envelope extraction; (b) in Fig. 5 is a schematic diagram of envelope extraction for the frequency-domain average signal of the left channel, where curve 503 is the frequency-domain average signal before envelope extraction and curve 504 is the left-channel analytic signal obtained after envelope extraction.
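As an illustration of the local peak detection option mentioned above, the following is a minimal sketch, not the method of the application. The function name `peak_envelope`, the use of linear interpolation between peaks, and the choice to treat both end points as anchors are assumptions made only for this example.

```python
import numpy as np

def peak_envelope(x):
    """Upper envelope of a 1-D sequence by local peak detection.

    Local maxima are joined by linear interpolation. Both end points are
    also used as anchors (an assumption of this sketch) so that the
    envelope spans the whole sequence.
    """
    x = np.asarray(x, dtype=float)
    # Indices of interior samples that are not smaller than either neighbour.
    interior = np.flatnonzero((x[1:-1] >= x[:-2]) & (x[1:-1] >= x[2:])) + 1
    anchors = np.unique(np.concatenate(([0], interior, [x.size - 1])))
    return np.interp(np.arange(x.size), anchors, x[anchors])

# Hypothetical magnitude values standing in for a frequency-domain average signal.
spectrum = np.array([0.0, 1.0, 0.0, 3.0, 0.0, 1.0, 0.0])
envelope = peak_envelope(spectrum)
```

The envelope touches the signal at its peaks and never falls below it, which is the smoothing role played by curves 502 and 504 relative to curves 501 and 503.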

The determination process of the first sound effect localization algorithm is described in detail below with reference to fig. 6.

Fig. 6 is a schematic diagram of a process for obtaining the first sound effect positioning algorithm according to an embodiment of the present application. As shown in Fig. 6, M pieces of HRIR data are first acquired based on a preset height to obtain an HRIR data set, where M is a positive integer. Fourier transformation is performed on the HRIR data set to obtain the corresponding frequency-domain signal data set, and the frequency-domain signal data set is averaged to obtain a frequency-domain average signal. The frequency-domain average signal can be determined by formula (1):

H̄(t,f)=(1/M)*Σ_{k=1}^{M} H_k(t,f) (1)

Wherein H̄(t,f) is the frequency-domain average signal, H_k(t,f) is the k-th frequency-domain signal in the frequency-domain signal data set, k is a positive integer greater than or equal to 1 and less than or equal to M, t is the time period, and f is the frequency.

Then, envelope extraction is performed on the frequency-domain average signal, and the frequency-domain average signal and the characteristic signals obtained by envelope extraction are synthesized into an analytic signal. Optionally, the envelope extraction is performed by HHT. The analytic signal can be determined by equation (2):

Wherein H(t,f) is the analytic signal, H̄(t,f) is the frequency-domain average signal, t is the time period, and f is the frequency.

Then, the analytic signal is subjected to inverse Fourier transformation to obtain a time-domain pulse signal. Optionally, the inverse Fourier transform may specifically be an inverse discrete Fourier transform (IDFT), and the time-domain pulse signal may be determined by equation (3):

h(t)=IDFT(H(t,f)) (3)

Wherein h(t) is the time-domain pulse signal, H(t,f) is the analytic signal, t is the time period, and f is the frequency.

The absolute value of the time domain pulse signal may be referred to as amplitude spectrum data.

In one possible implementation, the amplitude spectrum data is a first sound effect localization algorithm. The first sound localization algorithm may be determined by equation (4):

f(t)=abs(h(t)) (4)

Where f(t) is the amplitude spectrum data, h(t) is the time-domain pulse signal, abs is the absolute value function, and t is the time period.

The discrete form of the magnitude spectrum data may be as shown in equation (5):

f(n)=abs(h(n)) (5)

Where f(n) is the discrete form of the amplitude spectrum data, h(n) is the discrete form of the time-domain pulse signal, and n is a time point.
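The processing chain described above (Fourier transformation, averaging, envelope extraction, inverse Fourier transformation, absolute value) can be sketched as follows. This is a rough illustration under stated assumptions rather than the implementation of the application: the averaging is taken to be an arithmetic mean over the M spectra, the envelope step is represented by a stand-in that takes the Hilbert envelope of the averaged magnitude spectrum, and all function names and the random test data are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real 1-D sequence, built with the FFT method."""
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def average_spectrum(hrirs):
    """Averaging step: mean of the M spectra; hrirs has shape (M, N)."""
    return np.fft.fft(hrirs, axis=1).mean(axis=0)

def spectral_envelope(mean_spectrum):
    """Assumed stand-in for envelope extraction: Hilbert envelope of the
    magnitude of the frequency-domain average signal."""
    return np.abs(analytic_signal(np.abs(mean_spectrum)))

def first_height_filter(hrirs):
    """Return f(n) = abs(h(n)), following formulas (3) to (5)."""
    envelope = spectral_envelope(average_spectrum(hrirs))
    h = np.fft.ifft(envelope)   # time-domain pulse signal h(n)
    return np.abs(h)            # amplitude spectrum data f(n)

# Hypothetical data: 8 decaying random sequences standing in for measured HRIRs.
rng = np.random.default_rng(0)
hrirs = rng.standard_normal((8, 64)) * np.exp(-np.arange(64) / 8.0)
f = first_height_filter(hrirs)
```

In practice the HRIR set would come from measurements at the preset height, and the envelope step could equally be replaced by local peak detection or spline fitting.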

S403, processing the audio data to be played through a second sound effect positioning algorithm to obtain processed audio data, and playing the processed audio data in an external playing mode.

It should be appreciated that the audio data to be played may be any audio that the terminal device is capable of playing, such as music, audio in video, etc. The audio data to be played is processed through the second sound effect positioning algorithm, so that the transmission mode of sound waves corresponding to the audio to be played to the ears of the user can be changed, and the user can feel that the sound field position of the audio to be played is located in a preset space direction.

As an alternative embodiment, the first sound effect positioning algorithm is obtained based on the amplitude spectrum data and the linear compensation phase data of the speaker of the terminal device.

In one possible implementation, the first sound localization algorithm is determined by equation (6):

S(n)=abs(h(n)).*e^(-j*p(n)) (6)

Wherein S(n) is the first sound effect positioning algorithm, abs(h(n)) is the discrete form of the amplitude spectrum data, p(n) is the linear compensation phase data, and n is a time point.
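Formula (6) attaches a phase term to the amplitude spectrum data element by element. The sketch below shows that operation only. The form p(n) = slope * n and the parameter name `slope` are assumptions for illustration, since the text does not specify how the linear compensation phase of the speaker is obtained.

```python
import numpy as np

def apply_linear_phase(amplitude, slope):
    """Compute S(n) = amplitude(n) * exp(-j * p(n)) with p(n) = slope * n.

    `slope` (radians per sample) is a hypothetical parameter standing in
    for the speaker's linear compensation phase data.
    """
    amplitude = np.asarray(amplitude, dtype=float)
    p = slope * np.arange(amplitude.size)
    return amplitude * np.exp(-1j * p)

# Hypothetical amplitude spectrum data f(n).
amplitude = np.array([1.0, 0.5, 0.25])
S = apply_linear_phase(amplitude, np.pi / 2)
```

The phase term changes only the argument of each sample, so the magnitude of S(n) remains equal to abs(h(n)).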

As an alternative embodiment, S402 may be implemented as follows: the first sound effect positioning algorithm is calibrated based on the auricle information to obtain a calibrated first sound effect positioning algorithm, and interaural time difference (ITD) calibration and/or interaural level difference (ILD) calibration is performed on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm.

It should be understood that the first sound effect positioning algorithm is calibrated, so as to obtain a calibrated first sound effect positioning algorithm, which can be understood as adjusting parameters in the first sound effect positioning algorithm. The localization of the sound source by the user is not only related to auricle information of the user, but also related to ITD and ILD, so that the accuracy of the second sound effect localization algorithm can be further improved by performing ITD and/or ILD calibration on the calibrated first sound effect localization algorithm, and the overall tone color of the audio can be restored.

ITD and/or ILD calibration of the calibrated first sound effect localization algorithm may be implemented in four cases.

Case 1: ILD calibration is performed on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm.

In one possible implementation, performing ILD calibration on the calibrated first sound effect positioning algorithm specifically includes: performing ILD calibration on the calibrated first sound effect positioning algorithm based on the horizontal azimuth between the head of the user and the terminal device to obtain a third sound effect positioning algorithm, and determining the third sound effect positioning algorithm as the second sound effect positioning algorithm.

It should be appreciated that performing ILD calibration on the calibrated first sound effect positioning algorithm may be understood as adjusting parameters in the calibrated first sound effect positioning algorithm. With the center of the user's head as the origin, a two-dimensional coordinate system is established in the plane containing the line between the center of the user's head and the sound source, with the y-axis being a straight line in the vertical direction. The horizontal azimuth may then be the angle between the y-axis and the line connecting the center of the user's head and the sound source. Optionally, the horizontal azimuth is greater than 0 when the sound source is located in the first quadrant of the two-dimensional coordinate system, and less than 0 when the sound source is located in the fourth quadrant. Fig. 7 is a schematic diagram of a horizontal azimuth according to an embodiment of the present application. As shown in Fig. 7, point O is the center of the user's head 701 and point S is the sound source. A coordinate system is established in the plane containing the straight line OS with point O as the origin, the y-axis is a vertical straight line, and the horizontal azimuth is the angle between the y-axis and the straight line OS. The sound source may be the center of a speaker of the terminal device.

In one possible implementation, the second sound effect positioning algorithm includes a first ILD calibration sound effect positioning algorithm and a second ILD calibration sound effect positioning algorithm. Performing ILD calibration on the calibrated first sound effect positioning algorithm includes: determining, based on the horizontal azimuth and a preset correspondence, a target left adjustment gain parameter from a plurality of left adjustment gain parameters and a target right adjustment gain parameter from a plurality of right adjustment gain parameters, where the preset correspondence includes a correspondence among a plurality of angles, the plurality of left adjustment gain parameters, and the plurality of right adjustment gain parameters; using the target left adjustment gain parameter as a coefficient of the calibrated first sound effect positioning algorithm to obtain the first ILD calibration sound effect positioning algorithm; and using the target right adjustment gain parameter as a coefficient of the calibrated first sound effect positioning algorithm to obtain the second ILD calibration sound effect positioning algorithm.

It will be appreciated that the plurality of angles may include the horizontal azimuths that can occur. For example, if the relative angle between the user's head and the terminal device lies between -90° and 90°, the plurality of angles may include angles between -90° and 90°. The preset correspondence includes a correspondence between the plurality of angles and the plurality of left adjustment gain parameters and a correspondence between the plurality of angles and the plurality of right adjustment gain parameters, that is, each of the plurality of angles corresponds to one left adjustment gain parameter and one right adjustment gain parameter. The horizontal azimuth is one of the plurality of angles.

In one possible embodiment, the plurality of angles may include angles between -90° and 90°, and the preset correspondence is shown in Table 1.

Table 1 Preset correspondence

Horizontal azimuth θ' | Left adjustment gain parameter | Right adjustment gain parameter
θ' = -90° | d_max | d_min
-90° < θ' < 0° | θ'/π (d_max-d_min)+(d_max-d_min)/2 | θ'/π (d_min-d_max)+(d_min-d_max)/2
θ' = 0° | (d_max+d_min)/2 | (d_max+d_min)/2
0° < θ' < 90° | -θ'/π (d_min-d_max)+(d_min-d_max)/2 | -θ'/π (d_min-d_max)+(d_min-d_max)/2
θ' = 90° | d_min | d_max

The sum of d_max and d_min is 2, and d_max is greater than d_min. d_max and d_min may vary with the size of the terminal device, the size of the display screen of the terminal device, the position information of the speaker of the terminal device, the radius of the head of the user, and the like. θ' is the horizontal azimuth.
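The lookup of the target left and right adjustment gain parameters can be sketched as a function of the horizontal azimuth. The sketch below is an assumption-laden illustration: it uses a straight-line interpolation chosen only so that it reproduces the three fixed rows of Table 1 (-90°, 0°, and 90°), and it is not a transcription of the intermediate-row expressions. The function name `ild_gains` and the default values 1.2 and 0.8 are hypothetical.

```python
import math

def ild_gains(azimuth_deg, d_max=1.2, d_min=0.8):
    """Return (left_gain, right_gain) for a horizontal azimuth in degrees.

    Assumed linear interpolation between the end-point rows of Table 1;
    d_max and d_min must satisfy d_max > d_min and d_max + d_min == 2.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must lie between -90 and 90 degrees")
    if d_max <= d_min or not math.isclose(d_max + d_min, 2.0):
        raise ValueError("need d_max > d_min and d_max + d_min == 2")
    theta = math.radians(azimuth_deg)
    mid = (d_max + d_min) / 2
    offset = theta / math.pi * (d_max - d_min)
    return mid - offset, mid + offset

left_gain, right_gain = ild_gains(-45.0)
```

With this assumed interpolation the two gains always sum to 2, so raising the gain of one channel lowers the gain of the other by the same amount.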

In one possible implementation, the first ILD calibration sound effect localization algorithm is determined by equation (7):

S_L-ILD(n)=α*R(n) (7)

Wherein S_L-ILD(n) is the first ILD calibration sound effect positioning algorithm, R(n) is the calibrated first sound effect positioning algorithm, and α is the target left adjustment gain parameter. The first sound effect positioning algorithm may be S(n) in formula (6) or f(n) in formula (5), and R(n) is obtained by calibrating S(n) or f(n) based on the auricle information.

In one possible implementation, the second ILD calibration sound effect localization algorithm is determined by equation (8):

S_R-ILD(n)=β*R(n) (8)

Wherein S_R-ILD(n) is the second ILD calibration sound effect positioning algorithm, R(n) is the calibrated first sound effect positioning algorithm, and β is the target right adjustment gain parameter. The first sound effect positioning algorithm may be S(n) in formula (6) or f(n) in formula (5), and R(n) is obtained by calibrating S(n) or f(n) based on the auricle information.

In a possible implementation, step S403 may be specifically implemented as follows: the audio data to be played is processed by the first ILD calibration sound effect positioning algorithm to obtain first processed audio data, the audio data to be played is processed by the second ILD calibration sound effect positioning algorithm to obtain second processed audio data, and the first processed audio data and the second processed audio data are played externally.

It should be appreciated that the first ILD calibration sound effect positioning algorithm may also be referred to as a left channel height filter, and the second ILD calibration sound effect positioning algorithm may also be referred to as a right channel height filter. The two algorithms may be different, so that the first processed audio data and the second processed audio data may be different and the audio received by the user's two ears differs. Because both algorithms are obtained by performing ILD calibration on the calibrated first sound effect positioning algorithm, they can process the audio to be played based on the horizontal azimuth between the head of the user and the terminal device. The second sound effect positioning algorithm can therefore dynamically adapt to the personalized height perception of different users, which improves the accuracy of the sound effect positioning algorithm, helps restore the overall timbre of the audio, and improves the user's audio-visual experience.
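Formulas (7) and (8) and the two-channel processing in S403 can be sketched together. The text does not state how a height filter is applied to the audio data, so linear convolution is used here as an assumption. The function name `render_stereo` and the sample values are hypothetical.

```python
import numpy as np

def render_stereo(audio, calibrated_filter, left_gain, right_gain):
    """Apply the left and right channel height filters to mono audio.

    left filter  = left_gain  * R(n)   (formula (7))
    right filter = right_gain * R(n)   (formula (8))
    Filtering by convolution is an assumption of this sketch.
    """
    left_filter = left_gain * np.asarray(calibrated_filter)
    right_filter = right_gain * np.asarray(calibrated_filter)
    left = np.convolve(audio, left_filter)
    right = np.convolve(audio, right_filter)
    return left, right

# Hypothetical inputs: a unit impulse as audio and a two-tap calibrated filter R(n).
audio = np.array([1.0, 0.0, 0.0])
R = np.array([0.5, 0.25])
left, right = render_stereo(audio, R, left_gain=1.2, right_gain=0.8)
```

Because the two filters differ only by a scalar gain, the two output channels have the same shape and differ only in level, which is the interaural level cue that the ILD calibration is meant to introduce.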

Case 2: ITD calibration is performed on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm.

As an alternative embodiment, performing ITD calibration on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm includes: performing ITD calibration on the calibrated first sound effect positioning algorithm based on the horizontal azimuth between the head of the user and the terminal device, a preset human head radius, and the sound velocity to obtain the second sound effect positioning algorithm.

It should be appreciated that performing ITD calibration on the calibrated first sound localization algorithm may be understood as adjusting parameters of the calibrated first sound localization algorithm. The preset radius of the head is a preset value greater than 0, for example, the preset radius of the head may be an average value of the radii of the heads of a plurality of users.

In one possible implementation, the second sound effect positioning algorithm includes a first ITD calibration sound effect positioning algorithm and a second ITD calibration sound effect positioning algorithm, where the first ITD calibration sound effect positioning algorithm is the calibrated first sound effect positioning algorithm. Performing ITD calibration on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm includes: determining a first parameter based on the horizontal azimuth, the preset human head radius, and the sound velocity; and calibrating the calibrated first sound effect positioning algorithm based on the first parameter to obtain the second ITD calibration sound effect positioning algorithm.

In one possible implementation, calibrating the calibrated first sound effect positioning algorithm based on the first parameter to obtain a second ITD calibrated sound effect positioning algorithm includes using the first parameter as a time processing parameter in the calibrated first sound effect positioning algorithm to obtain the second ITD calibrated sound effect positioning algorithm.

It should be understood that the time processing parameter refers to a parameter for adjusting the time in the calibrated first sound effect localization algorithm. Calibrating the calibrated first sound effect positioning algorithm based on the first parameter may be understood as adjusting the parameter of the calibrated first sound effect positioning algorithm based on the first parameter, where the adjusted parameter is a parameter in the second ITD calibrated sound effect positioning algorithm.

In one possible embodiment, the first parameter is determined by formula (9):

Wherein δ is a first parameter, r is a preset human head radius, θ is a preset horizontal azimuth, and c is a sound velocity, wherein the preset horizontal azimuth may be 0 °.

The second ITD calibration sound localization algorithm is determined by equation (10):

S_R-ITD(n)=R(n-δ) (10)

Wherein S_R-ITD(n) is the second ITD calibration sound effect positioning algorithm, which may also be called a right channel height filter, R(n) is the calibrated first sound effect positioning algorithm, and δ is the time processing parameter. The first sound effect positioning algorithm may be S(n) in formula (6) or f(n) in formula (5), and R(n) is obtained by calibrating S(n) or f(n) based on the auricle information.

In one possible implementation, the second ITD calibration sound localization algorithm is determined by equation (11):

S_L-ITD(n)=R(n+δ) (11)

Where S_L-ITD(n) is the second ITD calibration sound effect positioning algorithm, which may also be called the left channel height filter, R(n) is the calibrated first sound effect positioning algorithm, and δ is the time processing parameter. The first sound effect positioning algorithm may be S(n) in formula (6) or f(n) in formula (5), where R(n) is obtained by calibrating S(n) or f(n) based on the auricle information.
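Formula (9) is not reproduced in this text. As a minimal sketch, and purely as an assumption, the snippet below substitutes Woodworth's spherical-head approximation, ITD = r(θ + sin θ)/c, for the first parameter and applies formulas (10) and (11) as whole-sample shifts of the filter taps. The sampling rate, head radius, azimuth, tap values and the helper names `woodworth_itd` and `shift_filter` are hypothetical and do not come from the application.

```python
import math


def woodworth_itd(radius_m, azimuth_deg, speed_of_sound=343.0):
    """Assumed stand-in for formula (9): Woodworth ITD in seconds."""
    theta = math.radians(azimuth_deg)
    return radius_m * (theta + math.sin(theta)) / speed_of_sound


def shift_filter(taps, shift):
    """Shift filter taps by a whole number of samples, keeping the length.

    shift > 0 delays the filter, R(n - shift);
    shift < 0 advances it, R(n + |shift|).
    Taps pushed outside the buffer are dropped; vacated slots are zero.
    """
    n = len(taps)
    out = [0.0] * n
    for i, value in enumerate(taps):
        j = i + shift
        if 0 <= j < n:
            out[j] = value
    return out


SAMPLE_RATE = 48000                       # hypothetical sampling rate, Hz
delta_seconds = woodworth_itd(0.0875, 30.0)   # hypothetical radius and azimuth
delta = round(delta_seconds * SAMPLE_RATE)    # assumed conversion to samples

# Hypothetical calibrated filter R(n): 32 taps with its main energy at tap 16.
R = [0.0] * 16 + [1.0, 0.5, 0.25] + [0.0] * 13

right_itd = shift_filter(R, delta)    # formula (10): S_R-ITD(n) = R(n - delta)
left_itd = shift_filter(R, -delta)    # formula (11): S_L-ITD(n) = R(n + delta)
```

In either implementation only one channel is shifted; the other channel keeps R(n) as the first ITD calibration sound effect positioning algorithm. Rounding δ to whole samples is a simplification; a fractional-delay filter could be used where finer resolution is needed.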

In one possible implementation, the method 400 further includes: obtaining a head radius of the user; and determining a second parameter based on the horizontal azimuth, the head radius of the user, and the sound velocity. In this case, calibrating the calibrated first sound effect positioning algorithm based on the first parameter to obtain the second ITD calibration sound effect positioning algorithm includes: taking the difference between the first parameter and the second parameter as the time processing parameter in the calibrated first sound effect positioning algorithm to obtain the second ITD calibration sound effect positioning algorithm.

It should be understood that calibrating the calibrated first sound effect positioning algorithm based on the first parameter may be understood as adjusting the parameters of the calibrated first sound effect positioning algorithm based on the first parameter, where the adjusted parameters are the parameters of the second ITD calibration sound effect positioning algorithm. The time processing parameter may be understood as data for adjusting the time parameter in the calibrated first sound effect positioning algorithm. The second parameter may be determined in the same manner as the first parameter. Further calibrating the calibrated first sound effect positioning algorithm based on the difference between the first parameter and the second parameter yields a second sound effect positioning algorithm that lets the terminal device process audio data based on the user's head radius. The terminal device can thus dynamically adapt to the personalized height perception of different users, which improves the accuracy of the sound effect positioning algorithm, helps restore the overall timbre of the audio, and improves the user's audio-visual experience.

In one possible embodiment, the second parameter is determined by equation (12):

Where δ₁ is the second parameter, r' is the user's head radius, θ' is the horizontal azimuth, and c is the sound velocity.

The time processing parameters are determined by equation (13):

Where δ' is the time processing parameter, r' is the head radius of the user, θ is the preset horizontal azimuth, θ' is the horizontal azimuth, and c is the sound velocity. The preset horizontal azimuth may be 0°.

The second ITD calibration sound localization algorithm is determined by equation (14):

S_R-ITD(n)=R(n-δ’) (14)

Where S_R-ITD(n) is the second ITD calibration sound effect positioning algorithm, which may also be called the right channel height filter, R(n) is the calibrated first sound effect positioning algorithm, and δ' is the time processing parameter. The first sound effect positioning algorithm may be S(n) in formula (6) or f(n) in formula (5), where R(n) is obtained by calibrating S(n) or f(n) based on the auricle information.

In one possible implementation, the second ITD calibration sound localization algorithm is determined by equation (15):

S_L-ITD(n)=R(n+δ’) (15)

Where S_L-ITD(n) is the second ITD calibration sound effect positioning algorithm, which may also be called the left channel height filter, R(n) is the calibrated first sound effect positioning algorithm, and δ' is the time processing parameter. The first sound effect positioning algorithm may be S(n) in formula (6) or f(n) in formula (5), where R(n) is obtained by calibrating S(n) or f(n) based on the auricle information.
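Formulas (12) and (13) are likewise not reproduced in this text. The sketch below only illustrates the stated relationship, namely that the time processing parameter δ' is the difference between the first parameter (preset head radius) and the second parameter (user's head radius). It again assumes Woodworth's approximation for both parameters; the function names, the numeric values and the sign convention (first minus second) are assumptions for illustration.

```python
import math


def woodworth_itd(radius_m, azimuth_deg, speed_of_sound=343.0):
    """Assumed stand-in for formulas (9) and (12): Woodworth ITD in seconds."""
    theta = math.radians(azimuth_deg)
    return radius_m * (theta + math.sin(theta)) / speed_of_sound


def time_processing_parameter(preset_radius_m, preset_azimuth_deg,
                              user_radius_m, azimuth_deg,
                              speed_of_sound=343.0):
    """Difference between the first and second parameters, in seconds."""
    first = woodworth_itd(preset_radius_m, preset_azimuth_deg, speed_of_sound)
    second = woodworth_itd(user_radius_m, azimuth_deg, speed_of_sound)
    return first - second


# Hypothetical values: preset radius 8.75 cm, measured radius 8.0 cm,
# both evaluated at a 30 degree horizontal azimuth.
delta_prime = time_processing_parameter(0.0875, 30.0, 0.080, 30.0)
```

With identical azimuths the result reduces to (r − r')(θ + sin θ)/c, so a user whose head is smaller than the preset receives a small positive correction under this assumed convention.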

In one possible implementation, obtaining the head radius of the user includes obtaining the head radius in the case where the external spatial audio switch of the terminal device is in an on state.

In the third case, ILD calibration and then ITD calibration are performed on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm.

As an alternative embodiment, performing ILD calibration and ITD calibration on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm includes: performing ILD calibration on the calibrated first sound effect positioning algorithm based on the horizontal azimuth to obtain a third sound effect positioning algorithm; and performing ITD calibration on the third sound effect positioning algorithm based on the horizontal azimuth between the user's head and the terminal device, the preset human head radius and the sound velocity to obtain the second sound effect positioning algorithm.

It should be understood that performing ILD calibration and ITD calibration on the calibrated first sound effect positioning algorithm may be understood as performing ILD calibration and ITD calibration on parameters in the calibrated first sound effect positioning algorithm, where the parameters obtained after calibration are parameters in the second sound effect positioning algorithm. The ILD calibration and ITD calibration are similar to the first case and the second case, respectively, and reference is made to the above description, and the detailed description is omitted here.

In one possible implementation, the third sound effect localization algorithm includes a first ILD calibration sound effect localization algorithm and a second ILD calibration sound effect localization algorithm, the first ILD calibration sound effect localization algorithm is determined by equation (7), and the second ILD calibration sound effect localization algorithm is determined by equation (8).

The second sound effect positioning algorithm comprises a first ILD-ITD calibration sound effect positioning algorithm and a second ILD-ITD calibration sound effect positioning algorithm, wherein the first ILD-ITD calibration sound effect positioning algorithm is the first ILD calibration sound effect positioning algorithm, and the second ILD-ITD calibration sound effect positioning algorithm is obtained by performing ITD calibration on the second ILD calibration sound effect positioning algorithm based on a horizontal azimuth angle between the head of a user and terminal equipment, a preset human head radius and sound velocity.

Specifically, the second ILD-ITD calibration sound localization algorithm may be determined in two ways.

In one approach, the second ILD-ITD calibration sound localization algorithm is determined by equation (16):

S_R-ILD-ITD(n)=β*R(n-δ’) (16)

Where S_R-ILD-ITD(n) is the second ILD-ITD calibration sound effect positioning algorithm, which may also be called the right channel height filter, and δ' is the time processing parameter. Alternatively, in the case where the head radius of the user is not acquired, the time processing parameter is δ. That is, β*R(n) is calibrated based on the time processing parameter to obtain the second ILD-ITD calibration sound effect positioning algorithm.

The second sound effect positioning algorithm comprises a first ILD-ITD calibration sound effect positioning algorithm and a second ILD-ITD calibration sound effect positioning algorithm, wherein the first ILD-ITD calibration sound effect positioning algorithm is the second ILD calibration sound effect positioning algorithm, and the second ILD-ITD calibration sound effect positioning algorithm is obtained by performing ITD calibration on the first ILD calibration sound effect positioning algorithm based on a horizontal azimuth angle between the head of a user and terminal equipment, a preset human head radius and sound velocity.

In another approach, the second ILD-ITD calibration sound localization algorithm is determined by equation (17):

S_L-ILD-ITD(n)=α*R(n+δ’) (17)

Where S_L-ILD-ITD(n) is the second ILD-ITD calibration sound effect positioning algorithm, which may also be called the left channel height filter, and δ' is the time processing parameter. Alternatively, in the case where the head radius of the user is not acquired, the time processing parameter is δ. That is, α*R(n) is calibrated based on the time processing parameter to obtain the second ILD-ITD calibration sound effect positioning algorithm.
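The two approaches of formulas (16) and (17) can be sketched as a gain followed by a sample shift. This is an illustrative sketch only: the gains α and β come from formulas (7) and (8), which are not shown here, so the values below are hypothetical, as are the tap values, the shift of two samples and the helper name `ild_itd_calibrate`.

```python
def ild_itd_calibrate(taps, gain, shift):
    """Return gain * R(n - shift) on a fixed-length tap list.

    shift > 0 delays the filter, shift < 0 advances it, shift == 0 applies
    the ILD gain only. Taps shifted outside the buffer are dropped.
    """
    n = len(taps)
    out = [0.0] * n
    for i, value in enumerate(taps):
        j = i + shift
        if 0 <= j < n:
            out[j] = gain * value
    return out


R = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0]   # hypothetical calibrated filter R(n)
alpha, beta = 1.2, 0.8               # hypothetical ILD gains
delta_prime = 2                      # hypothetical time parameter, in samples

# Approach one, formula (16): the right channel is delayed,
# the left channel keeps the ILD-only filter alpha * R(n).
left_1 = ild_itd_calibrate(R, alpha, 0)
right_1 = ild_itd_calibrate(R, beta, delta_prime)

# Approach two, formula (17): the left channel is advanced,
# the right channel keeps the ILD-only filter beta * R(n).
left_2 = ild_itd_calibrate(R, alpha, -delta_prime)
right_2 = ild_itd_calibrate(R, beta, 0)
```

Both approaches produce the same relative delay of δ' between the two channels; they differ only in which channel carries the shift.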

In the fourth case, ITD calibration and then ILD calibration are performed on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm.

As an alternative embodiment, performing ITD calibration and ILD calibration on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm includes: performing ITD calibration on the calibrated first sound effect positioning algorithm based on the horizontal azimuth between the user's head and the terminal device, the preset human head radius and the sound velocity to obtain a fourth sound effect positioning algorithm; and performing ILD calibration on the fourth sound effect positioning algorithm based on the horizontal azimuth to obtain the second sound effect positioning algorithm.

It should be understood that performing ITD calibration and ILD calibration on the calibrated first sound effect positioning algorithm may be understood as performing ITD calibration and ILD calibration on the parameters of the calibrated first sound effect positioning algorithm, where the parameters obtained after calibration are the parameters of the second sound effect positioning algorithm. The process of sequentially performing ITD calibration and ILD calibration on the parameters of the calibrated first sound effect positioning algorithm is similar to the third case; reference may be made to the above description, which is not repeated here.

As an alternative embodiment, S402 may be implemented specifically by inputting auricle information and the first sound effect localization algorithm into the neural network model and outputting the second sound effect localization algorithm.

In a specific example, the auricle information includes the auricle length, auricle width, auricle thickness, and auricle area, and the process of determining the second sound effect positioning algorithm may be as shown in fig. 8. Fig. 8 is a schematic diagram illustrating a process of determining a second sound effect positioning algorithm according to an embodiment of the present application. As shown in fig. 8, after the parameters of the first sound effect positioning algorithm and the auricle information are input into the neural network model, the parameters of the second sound effect positioning algorithm are output, and the terminal device can determine the second sound effect positioning algorithm based on these parameters. The second sound effect positioning algorithm may be a finite impulse response (FIR) filter with a preset number of points.
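Since the second sound effect positioning algorithm may be an FIR filter, processing the audio data to be played amounts to convolving the samples with the filter parameters. The following is a minimal direct-form sketch; the tap values, the sample values and the function name `fir_filter` are hypothetical, and a real implementation would typically use a block-based or FFT-based convolution.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * samples[n - k].

    Samples before the start of the signal are treated as zero, and the
    output has the same length as the input.
    """
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, coefficient in enumerate(taps):
            if n - k >= 0:
                acc += coefficient * samples[n - k]
        output.append(acc)
    return output


taps = [0.5, 0.25, 0.25]          # hypothetical filter parameters
audio = [1.0, 0.0, 0.0, 2.0]      # hypothetical audio samples
processed = fir_filter(audio, taps)
```

For stereo playback the same routine would be applied once per channel, using the left channel and right channel height filters respectively.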

As an alternative embodiment, S402 may be implemented by inputting auricle information into an auricle information encoding algorithm to obtain encoded auricle information, inputting parameters of a first sound effect positioning algorithm into a sound effect positioning algorithm encoding algorithm to obtain encoded parameters, and determining a second sound effect positioning algorithm based on the encoded auricle information and the encoded parameters.

In one possible implementation, the neural network model may include an auricle information encoding algorithm, a sound effect positioning algorithm encoding algorithm, and a decoding algorithm. The process of determining the second sound effect positioning algorithm may be as shown in fig. 9. Fig. 9 is a schematic diagram illustrating another process of determining a second sound effect positioning algorithm according to an embodiment of the present application. As shown in fig. 9, the parameters of the first sound effect positioning algorithm are input to the sound effect positioning algorithm encoding algorithm to obtain encoded parameters, the encoded parameters are input to a generation network, and the parameter data output by the generation network is input to the decoding algorithm. The auricle information is input to the auricle information encoding algorithm to obtain encoded auricle information, which is also input to the decoding algorithm. The decoding algorithm decodes the parameter data output by the generation network together with the encoded auricle information to obtain the parameters of the second sound effect positioning algorithm, and the terminal device can determine the second sound effect positioning algorithm based on these parameters.

It should be appreciated that the generation network may be a neural network model, for example, a convolutional neural network model or a deep learning model based on a self-attention mechanism (Transformer model).

Fig. 10 is a schematic diagram illustrating yet another process of determining a second sound effect positioning algorithm according to an embodiment of the present application. As shown in fig. 10, an HRIR data set is obtained based on the virtual azimuth, and amplitude spectrum data is obtained after Fourier transformation, averaging, envelope extraction and inverse Fourier transformation are sequentially performed on the HRIR data set. The first sound effect positioning algorithm is obtained based on the amplitude spectrum data and linear compensation phase data. The parameters of the first sound effect positioning algorithm and the auricle information are input into the neural network model to obtain the parameters of the calibrated first sound effect positioning algorithm. ILD calibration is performed on the parameters of the calibrated first sound effect positioning algorithm based on azimuth information to obtain the parameters of a third sound effect positioning algorithm, and ITD calibration is performed on the parameters of the third sound effect positioning algorithm based on the azimuth information and human head information to obtain the parameters of the second sound effect positioning algorithm. The terminal device can obtain the second sound effect positioning algorithm based on the parameters of the second sound effect positioning algorithm.

The virtual azimuth may be a preset height, or an elevation angle corresponding to the preset height. When ILD calibration is performed on the parameters of the calibrated first sound effect positioning algorithm, the azimuth information is the horizontal azimuth of the user's head relative to the terminal device. When ITD calibration is performed on the parameters of the third sound effect positioning algorithm, the azimuth information is the horizontal azimuth between the user's head and the terminal device, and the human head information is the preset human head radius, or the preset human head radius and the user's head radius.

It should be understood that the sequence numbers of the above processes do not mean the order of execution, and the execution order of the processes should be determined by the functions and internal logic of the processes, and should not be construed as limiting the implementation process of the embodiments of the present application.

As an alternative embodiment, S401 may be implemented specifically by acquiring the auricle information in the case where the external spatial audio switch of the terminal device is in an on state.

As an alternative embodiment, the method 400 further includes displaying a first interface including a first selection button for turning the external spatial audio switch on or off, and setting the state of the external spatial audio switch to an on state or an off state based on the user's selection operation on the first selection button.

It should be appreciated that the user's selection operation on the first selection button may be any preset input operation, such as clicking or sliding. When the external spatial audio switch is in the off state, the terminal device does not process the audio data to be played and plays it externally as is, so the audio heard by the user lacks a sense of height and a stereoscopic sense. When the external spatial audio switch is in the on state, the terminal device can calibrate the sound effect positioning algorithm, which improves the accuracy of the sound effect positioning algorithm, helps restore the overall timbre of the audio, and improves the user's audio-visual experience.

It should be understood that the external spatial audio switch may have any other name, which is not specifically limited in the present application.

In one specific example, the first interface may be as shown in fig. 11. Fig. 11 is an interface schematic diagram of a notebook computer displaying a first interface according to an embodiment of the present application. As shown in fig. 11, the first interface displayed on the notebook computer includes a first selection button 1101, and the user may select the first selection button 1101 by clicking it. Based on the user's selection operation on the first selection button 1101, the state of the external spatial audio switch can be changed.

As an alternative embodiment, in the case that the external spatial audio switch is in an on state, a second interface is displayed, where the second interface includes a second selection button for turning the personalized sensing switch on or off, and the state of the personalized sensing switch is set to the on state or the off state based on the user's selection operation on the second selection button. In the case that the personalized sensing switch is in the on state, a third interface is displayed, where the third interface includes a third selection button for selecting a camera acquisition function, a fourth selection button for selecting a manual input function of the user, a fifth selection button for selecting to collect auricle information, and a sixth selection button for selecting to collect head information, the head information including the head radius of the user. A fourth interface or a fifth interface is displayed based on a first selection operation and a second selection operation of the user, where the fourth interface includes an image of the user acquired by the terminal device through the camera, and the fifth interface includes an input area for filling in the auricle information or the head information. The first selection operation is a selection operation on the third selection button or the fourth selection button, and the second selection operation is a selection operation on the fifth selection button or the sixth selection button.

It should be understood that the user's selection operation on a selection button may be any preset input operation, such as clicking or sliding. In the case that the personalized sensing switch is in the on state, the terminal device can acquire the auricle information and calibrate the first sound effect positioning algorithm based on the auricle information, which improves the accuracy of the sound effect positioning algorithm, helps restore the overall timbre of the audio, and improves the user's audio-visual experience.

It should be understood that the personalized sensing switch may have any other name, which is not specifically limited in the present application.

In one particular example, the second interface may be as shown in fig. 12. Fig. 12 is an interface schematic diagram of a notebook computer displaying a second interface according to an embodiment of the present application. As shown in fig. 12, the second interface displayed on the notebook computer includes a first selection button and a second selection button 1201. When the external spatial audio switch is in an on state, the state of the personalized sensing switch can be changed based on the user's selection operation on the second selection button 1201. The user's selection operation on the second selection button 1201 may be clicking the second selection button 1201.

It should be understood that the third interface may be interface 2, the fourth interface may be interface 3 and/or interface 4, and the fifth interface may be interface 5 and/or interface 6.

In one possible implementation, before the third interface is displayed and in the case that the personalized sensing switch is in an on state, a sixth interface is displayed, where the sixth interface includes a third selection button for selecting a camera acquisition function and a fourth selection button for selecting a manual input function of the user, and the third interface is displayed based on the user's selection operation on the third selection button or the fourth selection button.

It is to be understood that the sixth interface may be interface 1 described below.

In one possible implementation, before the third interface is displayed and in the case that the personalized sensing switch is in an on state, a seventh interface is displayed, where the seventh interface includes a fifth selection button for selecting to collect auricle information and a sixth selection button for selecting to collect head information, and the third interface is displayed based on the user's selection operation on the fifth selection button or the sixth selection button.

It should be understood that the seventh interface is similar to the sixth interface, and only the third selection button and the fourth selection button in the sixth interface are replaced by the fifth selection button and the sixth selection button, which are not described herein.

The following describes in detail the interface displayed by the terminal device based on different selection operations of the user when the personalized sensing switch is in the on state with reference to fig. 13 to 19.

In the case that the personalized sensing switch is in the on state, the interface 1 is displayed, the interface 1 comprises a third selection button for selecting a camera acquisition function and a fourth selection button for selecting a manual input function of a user, and the camera acquisition mode or the manual input mode is selected based on the selection operation of the third selection button or the fourth selection button by the user.

It should be understood that, in the case where the personalized sensing switch is in the on state, one of the third selection button and the fourth selection button may be in an unselected state and the other in a selected state; that is, the user selects either the camera acquisition mode or the manual input mode for acquiring the user's auricle information and/or head information, where the head information includes the head radius of the user. The third selection button and the fourth selection button cannot be selected simultaneously, that is, the user cannot select the camera acquisition mode and the manual input mode at the same time.

In a specific example, interface 1 may be as shown in fig. 13. Fig. 13 is an interface schematic diagram of a display interface 1 of a notebook computer according to an embodiment of the present application. As shown in fig. 13, the interface 1 displayed on the notebook computer includes a third selection button 1301 and a fourth selection button 1302, where, when the personalized sensing switch is in an on state, based on a selection operation of the third selection button 1301 by a user, the terminal device acquires auricle information and/or head information of the user in a manner of capturing by a camera, and based on a selection operation of the fourth selection button 1302 by the user, the terminal device acquires auricle information and/or head information of the user in a manner of manually inputting.

In a possible implementation, if the user clicks or slides the fourth selection button after having selected the camera acquisition mode through the third selection button, the third selection button is set to the off state and the fourth selection button is set to the on state, that is, the user's selection changes to the manual input mode. The reverse also holds: if the user clicks or slides the third selection button after having selected the manual input mode through the fourth selection button, the fourth selection button is set to the off state and the third selection button is set to the on state, that is, the user's selection changes to the camera acquisition mode.
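The mutually exclusive behaviour of the third and fourth selection buttons can be modelled as a small state holder. This is only a sketch of the described behaviour, not the terminal device's implementation; the class name, method names and the choice of camera acquisition as the initial state are assumptions.

```python
from dataclasses import dataclass


@dataclass
class AcquisitionModeButtons:
    """State of the third (camera) and fourth (manual input) buttons."""
    camera_on: bool = True      # assumed initial state
    manual_on: bool = False

    def select_camera(self):
        # Selecting the third button switches the fourth button off.
        self.camera_on, self.manual_on = True, False

    def select_manual(self):
        # Selecting the fourth button switches the third button off.
        self.camera_on, self.manual_on = False, True

    @property
    def mode(self):
        return "camera" if self.camera_on else "manual"


buttons = AcquisitionModeButtons()
buttons.select_manual()          # user clicks or slides the fourth button
mode_after_manual = buttons.mode
buttons.select_camera()          # user clicks or slides the third button
mode_after_camera = buttons.mode
```

Because each setter writes both flags together, the two buttons can never be in the selected state at the same time.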

In the case where the user selects the third selection button, the interface 2 is displayed, and the interface 2 includes a fifth selection button for selecting the collection of auricle information and a sixth selection button for selecting the collection of head information, and selects the collection of auricle information or the collection of head information based on a selection operation of the fifth selection button or the sixth selection button by the user.

It should be appreciated that one of the fifth selection button and the sixth selection button may be in an unselected state, and the other in a selected state, i.e., the user may choose to collect auricle information or collect head information. The fifth selection button and the sixth selection button cannot be simultaneously selected, that is, the user cannot simultaneously select to collect auricle information and collect head information.

In one specific example, interface 2 is shown in fig. 14. Fig. 14 is an interface schematic diagram of a display interface 2 of a notebook computer according to an embodiment of the present application. As shown in fig. 14, the interface 2 displayed on the notebook computer includes a fifth selection button 1401 and a sixth selection button 1402. Based on the user's selection operation of the fifth selection button 1401, the terminal device will acquire auricle information, and based on the user's selection operation of the sixth selection button 1402, the terminal device will acquire head information.

In the case where the user selects the third selection button and selects the fifth selection button, the interface 3 is displayed, the interface 3 including an image of the user acquired by the terminal device through the camera, the image of the user including an ear of the user.

It should be understood that the terminal device may analyze both ears of the user to determine pinna information, or the terminal device may analyze one ear of the user to determine pinna information.

In a specific example, the interface 3 may be as shown in fig. 15. Fig. 15 is an interface schematic diagram of a display interface 3 of a notebook computer according to an embodiment of the present application. As shown in fig. 15, the interface 3 includes an image of the user acquired by the notebook computer through the camera, and the image of the user includes an ear of the user. The notebook computer obtains auricle information by analyzing and processing the ears of the user.

In case the user selects the third selection button and selects the sixth selection button, the interface 4 is displayed, the interface 4 comprising an image of the user, obtained by the terminal device via the camera, the user's head being included in the image of the user.

It should be understood that the terminal device may perform an analysis process on the user's head to determine the head information, where the head information includes the user's head radius.

In one specific example, interface 4 may be as shown in fig. 16. Fig. 16 is an interface schematic diagram of a display interface 4 of a notebook computer according to an embodiment of the present application. As shown in fig. 16, the interface 4 includes an image of the user acquired by the notebook computer through the camera, and the image of the user includes the head of the user. The notebook computer obtains head information by analyzing and processing the head of the user.

In the case where the user selects the fourth selection button and selects the fifth selection button, the interface 5 is displayed, and the interface 5 includes an input area in which auricle information is filled.

It should be understood that the input area for filling in the auricle information may include at least one of an area for filling in the auricle length, an area for filling in the auricle width, an area for filling in the auricle area, or an area for filling in the auricle thickness.

In a specific example, the interface 5 may be as shown in fig. 17. Fig. 17 is an interface schematic diagram of a display interface 5 of a notebook computer according to an embodiment of the present application. In the case where the user selects the manual input mode through the fourth selection button and selects to collect auricle information through the fifth selection button, the notebook computer displays the interface 5. As shown in fig. 17, the interface 5 includes an input area 1701 for filling in auricle information. In the case where the auricle information includes the auricle length, the auricle width, the auricle area, and the auricle thickness, each type of auricle information corresponds to one input area 1701.

In the case where the user selects the fourth selection button and selects the sixth selection button, the interface 6 is displayed, where the interface 6 includes an input area for filling in head information, and the head information further includes a horizontal distance between the user's head and the terminal device and/or a vertical distance between the user's head and the terminal device.

Alternatively, a horizontal azimuth angle between the user's head and the terminal device can be determined based on the horizontal distance and the vertical distance described above.

It should be understood that in the case where the head information includes the radius of the user's head, the horizontal distance described above, and the vertical distance described above, the input area where the head information is filled includes an input area where the radius of the head is filled, an input area where the horizontal distance is filled, and an input area where the vertical distance is filled.

In one specific example, interface 6 may be as shown in fig. 18. Fig. 18 is an interface schematic diagram of a display interface 6 of a notebook computer according to an embodiment of the present application. In the case where the user selects the manual input mode through the fourth selection button and selects to collect head information through the sixth selection button, the notebook computer displays the interface 6. As shown in fig. 18, the interface 6 includes an input area 1801 for filling in head information. In the case where the head information includes the head radius, the horizontal distance between the user's head and the terminal device, and the vertical distance between the user's head and the terminal device, the input area 1801 includes an input area for filling in the head radius, an input area for filling in the horizontal distance, and an input area for filling in the vertical distance.

It should be understood that the interfaces corresponding to the user selecting the third and sixth selection buttons, the fourth and fifth selection buttons, or the fourth and sixth selection buttons are similar to the interface 2 shown in fig. 14 and differ only in which buttons are selected; reference may be made to the interface 2, and details are not repeated here.
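The correspondence between the user's two selections and the interface that is displayed (interfaces 3 to 6) can be summarised as a lookup table. The following sketch is illustrative; the string labels and the function name `interface_for` are hypothetical and are used only to make the mapping explicit.

```python
# (acquisition mode, information type) -> interface that is displayed
INTERFACE_BY_SELECTION = {
    ("camera", "auricle"): "interface 3",  # third + fifth button, fig. 15
    ("camera", "head"): "interface 4",     # third + sixth button, fig. 16
    ("manual", "auricle"): "interface 5",  # fourth + fifth button, fig. 17
    ("manual", "head"): "interface 6",     # fourth + sixth button, fig. 18
}


def interface_for(mode, info_type):
    """Return the interface shown for a mode and information-type selection."""
    try:
        return INTERFACE_BY_SELECTION[(mode, info_type)]
    except KeyError:
        raise ValueError(f"unsupported selection: {mode!r}, {info_type!r}")


shown = interface_for("manual", "head")
```

Interfaces 3 and 4 show the camera image of the user, while interfaces 5 and 6 show input areas for manual entry.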

Fig. 19 is a schematic diagram of a top view of a notebook computer used by a user according to an embodiment of the present application. As shown in fig. 19, when the user uses the notebook computer, the center of the user's head is point a, and the center of the speaker of the notebook computer is point B on the keyboard 1901. The straight line passing through the point B and parallel to the display screen of the notebook computer is a straight line AB. The vertical distance between the user's head and the terminal device is the distance from the point O to the straight line AB, i.e. the distance h, and the horizontal distance between the user's head and the terminal device is the distance l. The horizontal azimuth angle may be determined by the formula θ' = arctan(l/h), where θ' is the horizontal azimuth angle, l is the horizontal distance, and h is the vertical distance.
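The relation θ' = arctan(l/h) can be illustrated with a short sketch. The function name, the choice of metres as the unit, and the conversion to degrees are assumptions made for illustration and are not part of the described embodiment.

```python
import math


def horizontal_azimuth_deg(horizontal_distance: float, vertical_distance: float) -> float:
    """Return the horizontal azimuth angle θ' = arctan(l / h), in degrees.

    horizontal_distance is l and vertical_distance is h; both are assumed
    to be given in the same unit (for example, metres).
    """
    if vertical_distance <= 0:
        raise ValueError("vertical distance h must be positive")
    return math.degrees(math.atan(horizontal_distance / vertical_distance))


# Hypothetical input: head offset 0.20 m sideways, 0.20 m from line AB.
theta = horizontal_azimuth_deg(0.20, 0.20)
```

With equal horizontal and vertical distances the angle is 45 degrees, and with l = 0 (head directly in front of the speaker line) the angle is 0.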

It should be understood that the user may first select the manner of acquiring the auricle information and/or the head information and then select whether to acquire the auricle information or the head information, or may make these two selections in the reverse order.

It should be understood that the interface diagrams shown in fig. 11 to fig. 18 are only examples. The interface displayed by the terminal device may include more or less content, and the terminal device may also be a mobile phone, a tablet computer, or the like. The present application does not limit the content included in the interface or the specific form of the terminal device.

The audio playback method according to the embodiment of the present application is described in detail above with reference to fig. 4 to 19, and the audio playback apparatus according to the embodiment of the present application is described in detail below with reference to fig. 20 and 21.

Fig. 20 is a schematic block diagram of an audio playback apparatus 2000 according to an embodiment of the present application. As shown in fig. 20, the apparatus 2000 includes an acquisition module 2001 and a processing module 2002.

In a possible implementation manner, the apparatus 2000 is configured to implement the steps corresponding to the terminal device in the method 400.

The acquisition module 2001 is configured to acquire auricle information of a user, where the auricle information includes at least one of an auricle length, an auricle width, an auricle thickness, or an auricle area. The processing module 2002 is configured to determine a second sound effect positioning algorithm based on the auricle information and a first sound effect positioning algorithm, where the first sound effect positioning algorithm is used to simulate a sound signal with a preset spatial orientation and is obtained based on amplitude spectrum data, and the amplitude spectrum data is obtained by performing Fourier transform, averaging, envelope extraction, and inverse Fourier transform on multiple groups of head-related impulse response (HRIR) data. The processing module 2002 is further configured to process the audio data to be played through the second sound effect positioning algorithm to obtain processed audio data, and to play the processed audio data externally.
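As a rough illustration of how amplitude spectrum data might be derived from several groups of HRIR data, the sketch below follows the step order stated in claim 1 (Fourier transform, averaging, envelope extraction, inverse Fourier transform). The envelope-extraction method is not specified in this passage, so a circular moving average over frequency bins is swapped in as an assumption; the function names and the naive DFT are likewise illustrative only.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (adequate for short illustrative inputs)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def amplitude_pipeline(hrirs, smooth_bins=3):
    """Apply the four named steps to equal-length groups of HRIR samples."""
    n = len(hrirs[0])
    # 1) Fourier transform of each HRIR group; keep only the magnitude.
    magnitudes = [[abs(v) for v in dft(h)] for h in hrirs]
    # 2) Average the magnitude spectra across all groups, bin by bin.
    mean_mag = [sum(m[k] for m in magnitudes) / len(magnitudes) for k in range(n)]
    # 3) Envelope extraction (assumed here: circular moving average over bins).
    half = smooth_bins // 2
    envelope = [sum(mean_mag[(k + d) % n] for d in range(-half, half + 1)) / (2 * half + 1)
                for k in range(n)]
    # 4) Inverse Fourier transform of the zero-phase envelope; the spectrum is
    #    real and symmetric, so the imaginary parts are numerical noise only.
    return [v.real for v in idft(envelope)]


# Two hypothetical HRIR groups that are both unit impulses.
result = amplitude_pipeline([[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]])
```

Because only magnitudes are averaged, the phase of each HRIR is discarded in this sketch; the passage that follows describes phase data being supplied separately.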

Optionally, the first sound effect positioning algorithm is obtained based on the amplitude spectrum data and linear compensation phase data of a speaker of the apparatus 2000.

Optionally, the processing module 2002 is specifically configured to calibrate the first sound effect positioning algorithm based on the auricle information to obtain a calibrated first sound effect positioning algorithm, and to perform interaural time difference (ITD) calibration and/or interaural level difference (ILD) calibration on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm.

Optionally, the processing module 2002 is specifically configured to perform ILD calibration on the calibrated first sound effect positioning algorithm based on a horizontal azimuth of the head of the user relative to the device 2000, to obtain a third sound effect positioning algorithm, and determine the third sound effect positioning algorithm as the second sound effect positioning algorithm.

Optionally, the processing module 2002 is specifically configured to perform ILD calibration on the calibrated first sound effect positioning algorithm based on a horizontal azimuth angle of the head of the user relative to the device 2000 to obtain a third sound effect positioning algorithm, and perform ITD calibration on the third sound effect positioning algorithm based on the horizontal azimuth angle between the head of the user and the device 2000, a preset human head radius and a sound velocity to obtain a second sound effect positioning algorithm.

Optionally, the second sound effect positioning algorithm includes a first ILD calibration sound effect positioning algorithm and a second ILD calibration sound effect positioning algorithm. The processing module 2002 is specifically configured to: determine a target left adjustment gain parameter from a plurality of left adjustment gain parameters and a target right adjustment gain parameter from a plurality of right adjustment gain parameters based on the horizontal azimuth angle and a preset correspondence, where the preset correspondence is a correspondence among a plurality of angles, the plurality of left adjustment gain parameters, and the plurality of right adjustment gain parameters; take the target left adjustment gain parameter as a coefficient of the calibrated first sound effect positioning algorithm to obtain the first ILD calibration sound effect positioning algorithm; and take the target right adjustment gain parameter as a coefficient of the calibrated first sound effect positioning algorithm to obtain the second ILD calibration sound effect positioning algorithm.
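The gain lookup described above can be sketched as a table search followed by scaling. Everything concrete here is assumed for illustration: the table values, the nearest-angle selection rule, the sign convention of the angle, and the reading of "coefficient" as a multiplier applied to filter taps.

```python
# Hypothetical preset correspondence: (angle in degrees, left gain, right gain).
# The numbers are invented for illustration and do not come from the application.
PRESET_CORRESPONDENCE = [
    (-60, 1.4, 0.6),
    (-30, 1.2, 0.8),
    (0, 1.0, 1.0),
    (30, 0.8, 1.2),
    (60, 0.6, 1.4),
]


def select_gains(azimuth_deg):
    """Pick the target left/right gains for the table angle nearest to the azimuth."""
    row = min(PRESET_CORRESPONDENCE, key=lambda r: abs(r[0] - azimuth_deg))
    return row[1], row[2]


def ild_calibrate(calibrated_coeffs, azimuth_deg):
    """Return (first, second) ILD-calibrated coefficient lists.

    Assumes the calibrated first algorithm is represented by a list of filter
    taps and that the gain parameter scales every tap.
    """
    left_gain, right_gain = select_gains(azimuth_deg)
    first = [left_gain * c for c in calibrated_coeffs]
    second = [right_gain * c for c in calibrated_coeffs]
    return first, second


# Hypothetical usage with a two-tap filter and a 28 degree azimuth.
first_ild, second_ild = ild_calibrate([1.0, 0.5], 28)
```

A real implementation might interpolate between neighbouring table angles instead of snapping to the nearest one; the text does not say which is used.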

Optionally, the processing module 2002 is specifically configured to process the audio data to be played through a first ILD calibration sound effect positioning algorithm to obtain first processed audio data, process the audio data to be played through a second ILD calibration sound effect positioning algorithm to obtain second processed audio data, and play the first processed audio data and the second processed audio data through an external play mode.
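One way to picture the two processing paths is to treat each ILD calibration sound effect positioning algorithm as a finite impulse response (FIR) filter and run the same input through both. That representation is an assumption, as are the function names; the passage above does not state the filter structure or which output feeds which speaker.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filtering: out[n] = sum over k of coeffs[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def render_two_paths(samples, first_ild_coeffs, second_ild_coeffs):
    """Return (first processed audio data, second processed audio data)."""
    return fir_filter(samples, first_ild_coeffs), fir_filter(samples, second_ild_coeffs)


# Hypothetical mono input and two hypothetical coefficient sets.
first_out, second_out = render_two_paths([1.0, 0.0, 0.0], [0.5, 0.25], [1.0])
```

Both outputs have the same length as the input, so they can be handed to a two-speaker playback path sample by sample.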

Optionally, the processing module 2002 is specifically configured to perform ITD calibration on the calibrated first sound effect positioning algorithm based on a horizontal azimuth angle between the head of the user and the device 2000, a preset human head radius and a sound velocity, so as to obtain a second sound effect positioning algorithm.

Optionally, the second sound effect positioning algorithm includes a first ITD calibration sound effect positioning algorithm and a second ITD calibration sound effect positioning algorithm, where the first ITD calibration sound effect positioning algorithm is a calibrated first sound effect positioning algorithm, and the processing module 2002 is specifically configured to determine a first parameter based on a horizontal azimuth, a preset human head radius and a sound velocity, and calibrate the calibrated first sound effect positioning algorithm based on the first parameter to obtain the second ITD calibration sound effect positioning algorithm.

Optionally, the processing module 2002 is specifically configured to use the first parameter as a time processing parameter in the calibrated first sound effect positioning algorithm to obtain a second ITD calibrated sound effect positioning algorithm.

Optionally, the acquiring module 2001 is further configured to acquire a head radius of the user, the processing module 2002 is further configured to determine a second parameter based on the horizontal azimuth, the head radius of the user, and the sound velocity, and the processing module 2002 is specifically configured to use a difference between the first parameter and the second parameter as a time processing parameter in the calibrated first sound effect positioning algorithm to obtain a second ITD calibrated sound effect positioning algorithm.
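The first and second parameters can be illustrated as follows. This passage does not restate the ITD formula, so the Woodworth spherical-head approximation, ITD = (r / c) · (θ + sin θ), is swapped in purely as an assumption; the preset radius of 0.0875 m, the sound speed of 343 m/s, and all names are likewise hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0        # m/s, assumed value
PRESET_HEAD_RADIUS = 0.0875   # m, assumed preset human head radius


def itd_seconds(azimuth_deg, head_radius, speed=SPEED_OF_SOUND):
    """Woodworth approximation of the interaural time difference (an assumption)."""
    theta = math.radians(abs(azimuth_deg))
    return head_radius / speed * (theta + math.sin(theta))


def time_processing_parameter(azimuth_deg, user_head_radius=None):
    """Return the time processing parameter for the second ITD calibration algorithm.

    Without a user head radius, the first parameter (preset radius) is used
    directly. With a user head radius, the difference between the first
    parameter and the second parameter (user radius) is used.
    """
    first = itd_seconds(azimuth_deg, PRESET_HEAD_RADIUS)
    if user_head_radius is None:
        return first
    second = itd_seconds(azimuth_deg, user_head_radius)
    return first - second


# Hypothetical usage: 30 degree azimuth, user head radius of 0.095 m.
param = time_processing_parameter(30, user_head_radius=0.095)
```

When the user's head radius equals the preset radius, the difference is zero, and when it is larger than the preset radius the difference is negative; how a negative value is applied is not specified here.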

Optionally, the acquiring module 2001 is specifically configured to acquire the head radius when the external spatial audio switch of the device 2000 is in an on state.

Optionally, the processing module 2002 is specifically configured to input auricle information into an auricle information encoding algorithm to obtain encoded auricle information, input parameters of a first sound effect positioning algorithm into a sound effect positioning algorithm encoding algorithm to obtain encoded parameters, and determine a second sound effect positioning algorithm based on the encoded auricle information and the encoded parameters.

Optionally, the acquiring module 2001 is specifically configured to acquire auricle information when the external spatial audio switch of the device 2000 is in an on state.

Optionally, the apparatus 2000 further includes a display module 2003 configured to display a first interface, where the first interface includes a first selection button for turning the external spatial audio switch on or off. The processing module 2002 is further configured to set the external spatial audio switch to an on state or an off state based on the user's selection operation on the first selection button.

Optionally, the display module 2003 is further configured to display a second interface when the external spatial audio switch is in the on state, where the second interface includes a second selection button for turning the personalized sensing switch on or off. The processing module 2002 is further configured to set the personalized sensing switch to an on state or an off state based on the user's selection operation on the second selection button. The display module 2003 is further configured to display a third interface when the personalized sensing switch is in the on state, where the third interface includes a third selection button for selecting a camera acquisition function, a fourth selection button for selecting a manual input function, a fifth selection button for selecting acquisition of auricle information, and a sixth selection button for selecting acquisition of head information, and the head information includes the head radius of the user. The display module 2003 is further configured to display a fourth interface or a fifth interface based on a first selection operation and a second selection operation of the user, where the fourth interface includes an image of the user captured by the apparatus 2000 through a camera, the fifth interface includes an input area for filling in the auricle information or the head information, the first selection operation is a selection operation on the third selection button or the fourth selection button, and the second selection operation is a selection operation on the fifth selection button or the sixth selection button.
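The interface sequence described above amounts to a small decision procedure, sketched below. The enum names and the returned labels are hypothetical, and the mapping of the camera function to the fourth interface and the manual input function to the fifth interface is a reading of the text (the fourth interface contains a camera image, the fifth an input area), not a stated rule.

```python
from enum import Enum


class AcquisitionMode(Enum):
    CAMERA = "third selection button"
    MANUAL_INPUT = "fourth selection button"


class AcquisitionTarget(Enum):
    AURICLE_INFO = "fifth selection button"
    HEAD_INFO = "sixth selection button"


def interface_to_display(spatial_audio_on, personalized_on, mode=None, target=None):
    """Return the label of the interface to show for the current switch states and selections."""
    if not spatial_audio_on:
        return "first interface"      # only the external spatial audio switch is offered
    if not personalized_on:
        return "second interface"     # offers the personalized sensing switch
    if mode is None or target is None:
        return "third interface"      # waits for both selection operations
    if mode is AcquisitionMode.CAMERA:
        return "fourth interface"     # shows the image captured by the camera
    return "fifth interface"          # shows the input area for manual entry


# Hypothetical usage: manual input of head information.
shown = interface_to_display(True, True, AcquisitionMode.MANUAL_INPUT, AcquisitionTarget.HEAD_INFO)
```

The selected target does not change which interface is shown in this sketch; it would only determine what the camera measures or which fields the input area contains.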

The apparatus 2000 has the function of implementing the corresponding steps performed by the terminal device in the above method. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function described above.

In an embodiment of the present application, the apparatus 2000 in fig. 20 may also be a chip, such as a system on chip (SoC). Correspondingly, the processing module 2002 may be a transceiver circuit of the chip, which is not limited herein.

Fig. 21 is a schematic block diagram of an audio playback apparatus 2100 according to an embodiment of the present application. The apparatus 2100 includes a processor 2101, a transceiver 2102, and a memory 2103. The processor 2101, the transceiver 2102, and the memory 2103 communicate with each other via an internal connection path. The memory 2103 is configured to store instructions, and the processor 2101 is configured to execute the instructions stored in the memory 2103 to control the transceiver 2102 to transmit and/or receive signals.

It should be understood that the apparatus 2100 may be specifically a terminal device in the foregoing embodiment, and may be configured to perform each step and/or flow corresponding to the terminal device in the foregoing method embodiment. The memory 2103 may optionally include read only memory and random access memory and provide instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type. The processor 2101 may be used to execute instructions stored in memory and, when the processor 2101 executes instructions stored in memory, the processor 2101 is used to perform the steps and/or processes of the method embodiments described above. The transceiver 2102 may include a transmitter that may be used to implement various steps and/or processes for performing transmit actions corresponding to the transceiver described above, and a receiver that may be used to implement various steps and/or processes for performing receive actions corresponding to the transceiver described above.

It should be understood that, in embodiments of the present application, the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in a memory, and the processor executes the instructions in the memory and performs the steps of the above method in combination with its hardware. To avoid repetition, a detailed description is not provided herein.

The present application also provides a computer readable storage medium for storing a computer program for implementing the method shown in the above-described method embodiments.

The present application also provides a computer program product comprising a computer program (which may also be referred to as code, or instructions) which, when run on a computer, performs the method as shown in the method embodiments described above.

Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system, apparatus and module may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.

In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into modules is merely a division by logical function, and other divisions are possible in an actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or modules, and may be in electrical, mechanical, or other forms.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

The foregoing is merely a specific implementation of the present application, but the protection scope of the embodiments of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the embodiments of the present application shall fall within the protection scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims

1. An audio playback method, comprising:

acquiring auricle information of a user, wherein the auricle information comprises at least one of auricle length, auricle width, auricle thickness or auricle area;

determining a second sound effect positioning algorithm based on the auricle information and a first sound effect positioning algorithm, wherein the first sound effect positioning algorithm is used for simulating a sound signal with a preset spatial orientation, the first sound effect positioning algorithm is obtained based on amplitude spectrum data, and the amplitude spectrum data is obtained by performing Fourier transform, averaging, envelope extraction and inverse Fourier transform on multiple groups of head-related impulse response (HRIR) data;

processing the audio data to be played through the second sound effect positioning algorithm to obtain processed audio data, and playing the processed audio data externally.

2. The method of claim 1, wherein the first sound effect positioning algorithm is obtained based on the amplitude spectrum data and linear compensation phase data of a speaker of the terminal device.

3. The method according to claim 1 or 2, wherein said determining a second sound effect localization algorithm based on said auricle information and a first sound effect localization algorithm comprises:

Calibrating the first sound effect positioning algorithm based on the auricle information to obtain a calibrated first sound effect positioning algorithm;

performing interaural time difference (ITD) calibration and/or interaural level difference (ILD) calibration on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm.

4. A method according to claim 3, wherein said performing ILD calibration on said calibrated first sound localization algorithm results in said second sound localization algorithm, comprising:

Performing ILD calibration on the calibrated first sound effect positioning algorithm based on a horizontal azimuth angle between the head of the user and the terminal equipment to obtain a third sound effect positioning algorithm;

And determining the third sound effect positioning algorithm as the second sound effect positioning algorithm.

5. A method according to claim 3, wherein said performing ITD calibration and ILD calibration on said calibrated first sound localization algorithm results in said second sound localization algorithm, comprising:

and performing ITD calibration on the third sound effect positioning algorithm based on the horizontal azimuth, the preset human head radius and the sound velocity to obtain the second sound effect positioning algorithm.

6. The method of claim 4 or 5, wherein the second sound localization algorithm comprises a first ILD calibrated sound localization algorithm and a second ILD calibrated sound localization algorithm;

The ILD calibration of the calibrated first sound effect positioning algorithm comprises the following steps:

Determining a target left adjustment gain parameter from a plurality of left adjustment gain parameters and a target right adjustment gain parameter from a plurality of right adjustment gain parameters based on the horizontal azimuth angle and a preset corresponding relation, wherein the preset corresponding relation comprises a plurality of angles, the plurality of left adjustment gain parameters and a corresponding relation among the plurality of right adjustment gain parameters;

Taking the target left adjustment gain parameter as a coefficient of the calibrated first sound effect positioning algorithm to obtain the first ILD calibration sound effect positioning algorithm;

and taking the target right adjustment gain parameter as a coefficient of the calibrated first sound effect positioning algorithm to obtain the second ILD calibrated sound effect positioning algorithm.

7. The method of claim 6, wherein the processing the audio data to be played through the second sound effect positioning algorithm to obtain processed audio data, and playing the processed audio data through an external playing mode includes:

Processing the audio data to be played through the first ILD calibration sound effect positioning algorithm to obtain first processed audio data;

Processing the audio data to be played through the second ILD calibration sound effect positioning algorithm to obtain second processed audio data;

And playing the first processed audio data and the second processed audio data in an outward playing mode.

8. A method according to claim 3, wherein said performing ITD calibration of said calibrated first sound localization algorithm results in said second sound localization algorithm, comprising:

performing ITD calibration on the calibrated first sound effect positioning algorithm based on the horizontal azimuth angle between the head of the user and the terminal device, the preset human head radius and the sound velocity to obtain the second sound effect positioning algorithm.

9. The method of claim 8, wherein the second sound localization algorithm comprises a first ITD calibration sound localization algorithm and a second ITD calibration sound localization algorithm, the first ITD calibration sound localization algorithm being the calibrated first sound localization algorithm;

performing ITD calibration on the calibrated first sound effect positioning algorithm to obtain the second sound effect positioning algorithm, wherein the method comprises the following steps:

determining a first parameter based on the horizontal azimuth, the preset human head radius, and the sound speed;

and calibrating the calibrated first sound effect positioning algorithm based on the first parameter to obtain the second ITD calibrated sound effect positioning algorithm.

10. The method of claim 9, wherein calibrating the calibrated first sound localization algorithm based on the first parameter results in the second ITD calibrated sound localization algorithm, comprising:

And taking the first parameter as a time processing parameter in the calibrated first sound effect positioning algorithm to obtain the second ITD calibrated sound effect positioning algorithm.

11. The method according to claim 9, wherein the method further comprises:

Acquiring the head radius of the user;

determining a second parameter based on the horizontal azimuth, the head radius of the user, and the speed of sound;

The calibrating the calibrated first sound effect positioning algorithm based on the first parameter to obtain the second ITD calibrated sound effect positioning algorithm comprises the following steps:

And taking the difference value between the first parameter and the second parameter as a time processing parameter in the calibrated first sound effect positioning algorithm to obtain the second ITD calibrated sound effect positioning algorithm.

12. The method of claim 11, wherein the obtaining the head radius of the user comprises:

And acquiring the head radius under the condition that an external space audio switch of the terminal equipment is in an on state.

13. The method according to any one of claims 1 to 12, wherein said determining a second sound effect positioning algorithm based on said auricle information and a first sound effect positioning algorithm comprises:

inputting the auricle information into an auricle information encoding algorithm to obtain encoded auricle information;

inputting parameters of the first sound effect positioning algorithm into a sound effect positioning algorithm encoding algorithm to obtain encoded parameters;

and determining the second sound effect positioning algorithm based on the encoded auricle information and the encoded parameters.

14. The method according to any one of claims 1 to 13, wherein the acquiring auricle information of the user comprises:

and acquiring auricle information under the condition that an external space audio switch of the terminal equipment is in an on state.

15. The method of claim 14, wherein the method further comprises:

displaying a first interface, wherein the first interface comprises a first selection button for turning the external spatial audio switch on or off;

And setting the state of the external space audio switch to be an on state or an off state based on the selection operation of the first selection button by the user.

16. The method of claim 15, wherein the method further comprises:

displaying a second interface under the condition that the external space audio switch is in an on state, wherein the second interface comprises a second selection button for turning on or off the personalized sensing switch;

Setting the state of the personalized sensing switch to be an on state or an off state based on the selection operation of the second selection button by the user;

displaying a third interface under the condition that the personalized sensing switch is in an on state, wherein the third interface comprises a third selection button for selecting a camera acquisition function, a fourth selection button for selecting a manual input function of a user, a fifth selection button for selecting acquisition of auricle information and a sixth selection button for selecting acquisition of head information, and the head information comprises the head radius of the user;

Displaying a fourth interface or a fifth interface based on the first selection operation and the second selection operation of the user, wherein the fourth interface comprises an image of the user acquired by the terminal device through a camera, the fifth interface comprises an input area for filling in auricle information or head information, the first selection operation is a selection operation of the third selection button or the fourth selection button, and the second selection operation is a selection operation of the fifth selection button or the sixth selection button.

17. An audio playback device comprising means for performing the method of any one of claims 1 to 16.

18. An audio playback apparatus comprising a processor coupled to a memory for storing a computer program which, when invoked by the processor, causes the apparatus to perform the method of any one of claims 1 to 16.

19. A computer readable storage medium storing a computer program comprising instructions for implementing the method of any one of claims 1 to 16.

20. A computer program product comprising computer program code embodied therein, which when run on a computer causes the computer to implement the method of any of claims 1 to 16.