
CN119541512A - Sound rendering method and electronic equipment

Info

Publication number: CN119541512A
Application number: CN202311099784.3A
Authority: CN (China)
Prior art keywords: sound, virtual, user, electronic device, rendered
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 丁玉江, 陈家熠, 朱梦尧, 黎椿键, 石超宇, 吴修坤, 罗友, 王春鹏
Current Assignee: Huawei Technologies Co Ltd
Original Assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd; priority to CN202311099784.3A; publication of CN119541512A.


Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The present application provides a sound rendering method applied to a first electronic device. The method includes: displaying a first virtual scene; detecting a first voice input of a user and obtaining a first sound, the first sound including a first direct sound of the user; generating a first rendered sound according to the first direct sound and the first virtual scene, where the first rendered sound includes a first virtual reflected sound formed by the first direct sound reflecting in the first virtual scene; and playing the first rendered sound. In embodiments of the present application, the electronic device can capture the user's voice, render it according to the first virtual scene to generate the first rendered sound, and play that sound. Because the first rendered sound includes the first virtual reflected sound contributed by the first virtual scene, the user, on hearing it, can better feel present in the first virtual scene, which enhances the user's sense of immersion and helps improve the user experience.

Description

Sound rendering method and electronic equipment
Technical Field
Embodiments of the present application relate to the field of electronic devices, and more particularly, to a sound rendering method and an electronic device.
Background
At present, when an electronic device runs services such as VR, AR, and MR, it plays sound using technologies such as spatial audio, rendering the sound in the service to a spatial effect that matches the picture the user is watching or the virtual scene the user has selected. However, in some services the user needs to interact with objects in the virtual scene by speaking. The user then hears both the direct sound and the reflected sound of his or her own speech, and the reflected sound is determined by the real scene in which the user is located, which may differ from the virtual scene. This degrades the user's sense of immersion and reduces the user experience. How to render sound is therefore a technical problem to be solved.
Disclosure of Invention
The present application provides a sound rendering method and an electronic device. The electronic device can acquire the user's voice, render it according to a virtual scene to generate a rendered sound, and play the rendered sound. Because the rendered sound includes the virtual reflected sound contributed by the virtual scene, the user, on hearing it, can better feel present in the virtual scene, which enhances the user's sense of immersion and improves the user experience.
In a first aspect, a sound rendering method is provided and is applied to a first electronic device. The method includes: displaying a first virtual scene; detecting a first voice input of a user and obtaining a first sound, the first sound including a first direct sound of the user; generating a first rendered sound according to the first direct sound and the first virtual scene, where the first rendered sound includes a first virtual reflected sound formed by the first direct sound reflecting in the first virtual scene; and playing the first rendered sound.
In this embodiment of the application, the electronic device can acquire the user's voice, render it according to the first virtual scene to generate the first rendered sound, and play the first rendered sound. Because the first rendered sound is rendered according to the first virtual scene, it includes the first virtual reflected sound contributed by the first virtual scene; after hearing the first rendered sound, the user can better feel present in the first virtual scene, which enhances the user's sense of immersion and improves the user experience.
With reference to the first aspect, in some implementations of the first aspect, the generating a first rendered sound according to the first direct sound and the first virtual scene includes: determining a first acoustic parameter according to the first virtual scene, and rendering the first direct sound using the first acoustic parameter to generate the first rendered sound.
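A minimal sketch of this step, assuming the first acoustic parameter is reduced to a single reverberation time (RT60) from which a synthetic impulse response is built; the sample rate, helper names, and decay model are illustrative assumptions, not taken from the patent:

```python
import numpy as np
from scipy.signal import fftconvolve

FS = 48000  # sample rate in Hz (assumption)

def impulse_response_from_rt60(rt60_s: float, length_s: float = 1.0) -> np.ndarray:
    """Build a toy impulse response as exponentially decaying noise.

    rt60_s stands in for the "first acoustic parameter" of the virtual scene;
    a real system would use measured or simulated scene geometry instead.
    """
    n = int(length_s * FS)
    t = np.arange(n) / FS
    decay = np.exp(-6.91 * t / rt60_s)          # amplitude down 60 dB after rt60_s
    return np.random.randn(n) * decay * 0.05

def render_direct_sound(direct_sound: np.ndarray, rt60_s: float) -> np.ndarray:
    """Generate the first rendered sound: the virtual reflected sound of the scene."""
    ir = impulse_response_from_rt60(rt60_s)
    return fftconvolve(direct_sound, ir)[: len(direct_sound)]

# Usage: a 0.5 s burst rendered as if spoken in a scene with RT60 = 0.8 s.
voice = np.random.randn(FS // 2)
rendered = render_direct_sound(voice, rt60_s=0.8)
```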
With reference to the first aspect, in some implementations of the first aspect, the method further includes: determining that a preset condition is met, and switching the first acoustic parameter to a second acoustic parameter; detecting a second voice input of the user and acquiring a second sound, where the second sound includes a second direct sound of the user; rendering the second direct sound using the second acoustic parameter to generate a second rendered sound; and playing the second rendered sound.
In this embodiment of the application, the virtual scene and the position of the sound source are not fixed. When the virtual scene, the position of the sound source, or the radiation direction of the sound source changes, the electronic device can render the user's voice based on the new acoustic parameter and play the second rendered sound, so that after hearing it the user can better perceive the change in the virtual scene, which enhances the user's sense of immersion and improves the user experience.
With reference to the first aspect, in some implementations of the first aspect, the preset condition is that the first virtual scene is switched to a second virtual scene, or that a position and/or a radiation direction of a sound source in the first virtual scene change, where the sound source is the source of the user's direct sound.
With reference to the first aspect, in some implementations of the first aspect, before the generating the first rendered sound according to the first direct sound and the first virtual scene, the method further includes acquiring a sound intensity level of the first direct sound, and the generating the first rendered sound according to the first direct sound and the first virtual scene includes generating the first rendered sound according to the first direct sound, the sound intensity level of the first direct sound, and the first virtual scene.
It will be appreciated that the sound intensity level of the first rendered sound is associated with that of the first direct sound. For example, when the first direct sound has a sound intensity level of 60 dB, the first rendered sound has a sound intensity level of 60 dB; when the first direct sound has a sound intensity level of 40 dB, the first rendered sound has a sound intensity level of 40 dB.
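A minimal sketch of this association, assuming the sound intensity level is measured as an RMS level in dB and the rendered sound is simply rescaled to track it; the helper names are illustrative:

```python
import numpy as np

def level_db(x: np.ndarray) -> float:
    """RMS level in dB relative to full scale (assumed measure of intensity)."""
    rms = np.sqrt(np.mean(np.square(x)) + 1e-12)
    return 20.0 * np.log10(rms + 1e-12)

def match_level(rendered: np.ndarray, direct: np.ndarray) -> np.ndarray:
    """Scale the rendered sound so its level tracks the user's direct sound."""
    gain_db = level_db(direct) - level_db(rendered)
    return rendered * (10.0 ** (gain_db / 20.0))
```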
In this embodiment of the application, the sound intensity level of the first rendered sound produced by the electronic device is associated with the user's first direct sound: when the sound intensity level of the user's direct sound increases, the first rendered sound becomes louder, and when it decreases, the first rendered sound becomes quieter. This brings the user a better sense of immersion and improves the user experience.
With reference to the first aspect, in some implementations of the first aspect, the first virtual scene includes a first virtual object associated with the user, the method further includes obtaining a timbre of the first virtual object, and the generating a first rendered sound from the first direct sound and the first virtual scene includes generating the first rendered sound from the first direct sound, the timbre of the first virtual object, and the first virtual scene, the first rendered sound further including a first virtual direct sound generated from the first direct sound and the timbre of the first virtual object.
For example, taking a stage scene, the user corresponds to a virtual singer #1 in the stage scene. When the user sings, the electronic device can acquire the user's direct sound, render it into a virtual direct sound with the timbre of virtual singer #1, render the direct sound using the acoustic parameters of the stage scene to obtain a virtual reflected sound, and then play the virtual direct sound and the virtual reflected sound.
In this embodiment of the application, when the user corresponds to a virtual object in the virtual scene, the electronic device can render the user's direct sound according to the timbre of the virtual object and the acoustic parameters of the virtual scene, obtaining a virtual direct sound with the virtual object's timbre and a virtual reflected sound based on the virtual scene. This enhances the sense of immersion for a user playing the virtual object and improves the user experience.
With reference to the first aspect, in some implementations of the first aspect, the method further includes generating a first noise reduction sound, where the first noise reduction sound is used to cancel a first reflected sound and/or noise of the real scene in which the user is located, the first reflected sound being formed by the first direct sound reflecting in the real scene, and playing the first noise reduction sound.
In this embodiment of the application, the electronic device can generate a first noise reduction sound that cancels the reflected sound and noise of the real scene, so that they do not interfere with the user. This brings the user a better sense of immersion and improves the user experience.
With reference to the first aspect, in some implementations of the first aspect, the generating the first noise reduction sound includes obtaining a third acoustic parameter, where the third acoustic parameter includes an acoustic parameter of the real scene, and rendering the first direct sound according to the third acoustic parameter to generate the first noise reduction sound.
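A minimal sketch of this cancellation step, assuming the real scene is characterized by an estimated impulse response and the cancelling signal is the phase-inverted predicted reflection; a real active cancellation system would additionally need delay and secondary-path compensation, which is omitted here:

```python
import numpy as np
from scipy.signal import fftconvolve

def noise_reduction_sound(direct_sound: np.ndarray,
                          real_scene_ir: np.ndarray) -> np.ndarray:
    """Predict the real scene's reflected sound and return its inverse.

    real_scene_ir plays the role of the third acoustic parameter (an
    estimated impulse response of the room the user is actually in).
    """
    predicted_reflection = fftconvolve(direct_sound, real_scene_ir)
    predicted_reflection = predicted_reflection[: len(direct_sound)]
    return -predicted_reflection  # phase-inverted, played to cancel the reflection
```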
With reference to the first aspect, in certain implementations of the first aspect, the acquiring the third acoustic parameter includes acquiring the third acoustic parameter with one or more sensors including one or more of an inertial measurement unit IMU, a speaker, a microphone, an ultrasonic radar, an infrared radar, a millimeter wave radar, and a camera.
With reference to the first aspect, in certain implementations of the first aspect, the first acoustic parameter and the third acoustic parameter further comprise head-related transfer function (HRTF) parameters of the user.
With reference to the first aspect, in some implementations of the first aspect, the first electronic device is connected to an earphone and plays the first rendered sound through the earphone, and the method further includes detecting the type of the earphone and, when the earphone is determined to be an in-ear earphone or a semi-in-ear earphone, adding a first virtual direct sound to the first rendered sound, where the first virtual direct sound is generated according to the first direct sound.
In this embodiment of the application, after the electronic device is connected to an earphone, it can detect the type of the earphone and play the first virtual direct sound according to that type. This compensates for the air-conduction direct sound blocked by wearing the earphone, brings the user a better sense of immersion, and improves the user experience.
With reference to the first aspect, in certain implementations of the first aspect, when the earphone is an in-ear earphone, the ratio of the first virtual direct sound to the first virtual reflected sound is a first ratio, and when the earphone is a semi-in-ear earphone, the ratio of the first virtual direct sound to the first virtual reflected sound is a second ratio, the first ratio being greater than the second ratio.
In this embodiment of the application, after the electronic device is connected to an earphone, it can detect the type of the earphone, play the first virtual direct sound according to that type, and adjust the proportion of the first virtual direct sound to the first virtual reflected sound. This compensates for the air-conduction direct sound blocked by wearing the earphone, brings the user a better sense of immersion, and improves the user experience.
With reference to the first aspect, in some implementations of the first aspect, the first electronic device is connected to an earphone and plays the first rendered sound through the earphone, the first rendered sound further includes a first virtual direct sound generated according to the first direct sound, and the method further includes detecting the type of the earphone and determining the ratio of the first virtual direct sound to the first virtual reflected sound according to the type of the earphone.
For example, when the earphone is an in-ear earphone, the ratio of the first virtual direct sound to the first virtual reflected sound is a first ratio, and when the earphone is a semi-in-ear earphone, the ratio is a second ratio. Because an in-ear earphone isolates the air-conducted direct sound better than a semi-in-ear earphone, the proportion of the first virtual direct sound added by the first electronic device should be larger for an in-ear earphone than for a semi-in-ear earphone, so the first ratio is greater than the second ratio.
In this embodiment of the application, after the electronic device is connected to an earphone, it can detect the type of the earphone, play the first virtual direct sound according to that type, and adjust the proportion of the first virtual direct sound to the first virtual reflected sound. This compensates for the air-conduction direct sound blocked by wearing the earphone, brings the user a better sense of immersion, and improves the user experience.
With reference to the first aspect, in certain implementations of the first aspect, when the earphone is an in-ear earphone, the ratio of the first virtual direct sound to the first virtual reflected sound is a first ratio, when the earphone is a semi-in-ear earphone, the ratio of the first virtual direct sound to the first virtual reflected sound is a second ratio, and when the earphone is an open earphone, the ratio of the first virtual direct sound to the first virtual reflected sound is a third ratio, the first ratio is greater than the second ratio, and the second ratio is greater than the third ratio.
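A minimal sketch of this type-dependent mix, assuming the ratios are realized as gains applied to the first virtual direct sound before mixing with the first virtual reflected sound; the specific gain values are illustrative only, since the patent only requires first ratio > second ratio > third ratio:

```python
import numpy as np

# Illustrative gains for the virtual direct sound relative to the virtual
# reflected sound, ordered as required: in-ear > semi-in-ear > open.
DIRECT_GAIN_BY_EARPHONE = {
    "in_ear": 1.0,       # first ratio (strongest occlusion of air-conducted sound)
    "semi_in_ear": 0.6,  # second ratio
    "open": 0.3,         # third ratio (air-conducted direct sound mostly preserved)
}

def mix_rendered_sound(virtual_direct: np.ndarray,
                       virtual_reflected: np.ndarray,
                       earphone_type: str) -> np.ndarray:
    """Mix virtual direct and reflected sound according to the earphone type."""
    g = DIRECT_GAIN_BY_EARPHONE[earphone_type]
    return g * virtual_direct + virtual_reflected
```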
With reference to the first aspect, in some implementations of the first aspect, the first electronic device is connected to a second electronic device, the second electronic device also displays the first virtual scene, and the method further includes receiving a third rendered sound sent by the second electronic device and playing the third rendered sound.
For example, the first electronic device is connected to the second electronic device, and both display the first virtual scene. User #1 of the first electronic device corresponds to avatar #1, user #2 of the second electronic device corresponds to avatar #2, and avatar #2 is to the front left of avatar #1. After the first electronic device receives the third rendered sound associated with user #2 from the second electronic device, it may play the third rendered sound using spatial audio techniques, so that user #1 perceives user #2 as being to his or her front left.
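A minimal sketch of such spatial playback, approximating the head-related transfer function with a crude interaural time and level difference for a source at the front left; a full implementation would instead convolve the received sound with measured HRTFs. The sample rate and offsets are assumptions:

```python
import numpy as np

FS = 48000  # sample rate in Hz (assumption)

def pan_front_left(mono: np.ndarray, itd_ms: float = 0.4, ild_db: float = 6.0) -> np.ndarray:
    """Crudely place a remote user's rendered sound to the listener's front left.

    The right ear receives the signal slightly later and quieter than the left,
    approximating an interaural time/level difference instead of a full HRTF.
    """
    delay = int(itd_ms * 1e-3 * FS)
    left = mono
    right = np.concatenate([np.zeros(delay), mono])[: len(mono)]
    right = right * (10.0 ** (-ild_db / 20.0))
    return np.stack([left, right], axis=1)  # stereo frames for headphone playback
```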
In this embodiment of the application, the electronic device can also receive and play the third rendered sound of another user sent by another electronic device, so that users can experience multi-user interaction, which brings a better sense of immersion and improves the user experience.
With reference to the first aspect, in certain implementations of the first aspect, the first virtual scene includes a virtual scene sound, and the method further includes playing the virtual scene sound.
For example, taking a jungle scene as the virtual scene: the jungle scene includes the sound of a lion and the sound of a tiger, and the first electronic device may play the sound of the lion and the sound of the tiger.
In the embodiment of the application, when the virtual scene comprises the virtual scene sound, the electronic equipment can play the virtual scene sound, so that the user can be better immersed, and the experience of the user can be improved.
With reference to the first aspect, in certain implementations of the first aspect, the method further includes recording or live-streaming the first virtual scene and the first rendered sound simultaneously.
For example, taking a live-streaming sales scene as the virtual scene, the user is a sales host whose real scene is a small studio, while the virtual scene is a shopping mall. The first electronic device can live-stream the displayed first virtual scene together with the first rendered sound, so that viewers of the live stream feel that the host is in the mall, creating a better live-streaming atmosphere.
In a second aspect, an electronic device is provided, comprising one or more processors and one or more memories, the one or more memories storing one or more computer programs comprising instructions which, when executed by the one or more processors, cause the electronic device to perform the method of the first aspect or any of its possible implementations.
In a third aspect, a computer-readable storage medium is provided, comprising a computer program or instructions which, when run on a computer, cause the method of the first aspect or any of its possible implementations to be performed.
In a fourth aspect, a computer program product is provided, comprising a computer program or instructions which, when run on a computer, cause the method of the first aspect or any of its possible implementations to be performed.
In a fifth aspect, there is provided a computer program which, when run on a computer, causes the method as in the first aspect and any one of its possible implementations to be performed.
In a sixth aspect, an electronic device is provided, including modules/units for performing the method of the above aspect or any of its possible designs; these modules/units may be implemented by hardware, or by hardware executing corresponding software.
For the advantages of the second to sixth aspects, reference may be made to the advantages of the first aspect; details are not repeated here.
Drawings
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 2 is a software structural block diagram of an electronic device according to an embodiment of the present application.
Fig. 3 is a schematic diagram of direct sound and reflected sound provided by an embodiment of the present application.
Fig. 4 is a schematic diagram of the time and loudness of direct and reflected sounds reaching a user's ear provided by an embodiment of the present application.
Fig. 5 is a schematic diagram of a VR service scenario provided by an embodiment of the present application.
Fig. 6 is a schematic flow chart of a method of rendering sound provided by an embodiment of the present application.
Fig. 7 is a schematic diagram of an application scenario provided in an embodiment of the present application.
Fig. 8 is a schematic diagram of another application scenario provided in an embodiment of the present application.
Fig. 9 is a schematic diagram of another application scenario provided in an embodiment of the present application.
Fig. 10 is a schematic flow chart of a method of rendering sound provided by an embodiment of the present application.
Fig. 11 is a schematic flow chart of a method of rendering sound provided by an embodiment of the present application.
Fig. 12 is a schematic flow chart of a method of rendering sound provided by an embodiment of the present application.
FIG. 13 is a set of GUIs provided in an embodiment of the present application.
Fig. 14 is a schematic flow chart of a method of rendering sound provided by an embodiment of the present application.
FIG. 15 is a set of GUIs provided in an embodiment of the present application.
Fig. 16 is a schematic flow chart of a method of rendering sound provided by an embodiment of the present application.
Fig. 17 is a schematic flow chart of a method of rendering sound provided by an embodiment of the present application.
Fig. 18 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
The terminology used in the following embodiments is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification of the application and the appended claims, the singular forms "a," "an," and "the" are intended to include expressions such as "one or more," unless the context clearly indicates otherwise. It should also be understood that in the following embodiments of the present application, "at least one" and "one or more" mean one, two, or more than two. The term "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate that A exists alone, that A and B exist together, or that B exists alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Embodiments of electronic devices and methods for using such electronic devices are described below. In some embodiments, the electronic device may be a portable electronic device that also includes other functionality such as personal digital assistant and/or music player functionality, such as a cell phone, a tablet, or a head-mounted display device (e.g., a virtual reality (VR) head-mounted display device, an augmented reality (AR) head-mounted display device, or a mixed reality (MR) head-mounted display device). Exemplary embodiments of portable electronic devices include, but are not limited to, portable electronic devices running various operating systems. The portable electronic device may also be another portable electronic device such as a laptop computer. It should also be appreciated that in other embodiments, the electronic device may not be a portable electronic device but a desktop computer.
By way of example, fig. 1 shows a schematic diagram of an electronic device 100. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identification module (subscriber identification module, SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. The different processing units may be separate devices or may be integrated in one or more processors.
The controller may be a neural hub and a command center of the electronic device 100, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver/transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (general-purpose input/output, GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example, the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, Wi-Fi) network), Bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication (near field communication, NFC), infrared (IR), etc., applied to the electronic device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, antenna 1 and mobile communication module 150 of electronic device 100 are coupled, and antenna 2 and wireless communication module 160 are coupled, such that electronic device 100 may communicate with a network and other devices through wireless communication techniques. The wireless communication techniques can include a global system for mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time division code division multiple access (time-division code division multiple access, TD-CDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include a global positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a beidou navigation satellite system (beidou navigation satellite system, BDS), a quasi-zenith satellite system (quasi-zenith satellite system, QZSS), and/or a satellite based augmentation system (satellite based augmentation systems, SBAS).
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light-emitting diode, AMOLED), a flexible light-emitting diode (flexible light-emitting diode, FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (quantum dot light-emitting diode, QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The electronic device 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, or the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. Thus, the electronic device 100 may play or record video in a variety of encoding formats, such as moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent recognition of the electronic device 100, for example, image recognition, face recognition, voice recognition, text understanding, etc., can be realized through the NPU.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer executable program code including instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 100 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a "horn," is used to convert audio electrical signals into sound signals. The electronic device 100 may listen to music, or to hands-free conversations, through the speaker 170A.
A receiver 170B, also referred to as a "earpiece", is used to convert the audio electrical signal into a sound signal. When electronic device 100 is answering a telephone call or voice message, voice may be received by placing receiver 170B in close proximity to the human ear.
Microphone 170C, also referred to as a "microphone" or "microphone", is used to convert sound signals into electrical signals. When making a call or transmitting voice information, the user can sound near the microphone 170C through the mouth, inputting a sound signal to the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, and may implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may also be provided with three, four, or more microphones 170C to enable collection of sound signals, noise reduction, identification of sound sources, directional recording functions, etc.
The pressure sensor 180A is used to sense a pressure signal, and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The pressure sensor 180A is of various types, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may be a capacitive pressure sensor comprising at least two parallel plates with conductive material. The capacitance between the electrodes changes when a force is applied to the pressure sensor 180A. The electronic device 100 determines the strength of the pressure from the change in capacitance. When a touch operation is applied to the display screen 194, the electronic apparatus 100 detects the touch operation intensity according to the pressure sensor 180A. The electronic device 100 may also calculate the location of the touch based on the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch location, but at different touch operation strengths, may correspond to different operation instructions. For example, when a touch operation with a touch operation intensity greater than or equal to a first pressure threshold acts on the alarm clock application icon, an instruction to newly create an alarm clock is executed.
The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 100 may utilize the collected fingerprint feature to unlock the fingerprint, access the application lock, photograph the fingerprint, answer the incoming call, etc. For example, when the mobile phone detects a touch operation of a user on the screen locking interface, the mobile phone can collect fingerprint information of the user through the fingerprint sensor 180H and match the collected fingerprint information with fingerprint information preset in the mobile phone. If the matching is successful, the mobile phone can enter the non-screen locking interface from the screen locking interface.
The touch sensor 180K, also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is for detecting a touch operation acting thereon or thereabout. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a different location than the display 194.
Fig. 2 is a software structural block diagram of the electronic device 100 according to an embodiment of the present application. The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, the Android runtime (Android runtime) and system libraries, and a kernel layer. The application layer may include a series of application packages.
As shown in fig. 2, the application layer may include cameras, settings, three-way applications, and the like. The three-party application program can comprise a gallery, calendar, conversation, map, navigation, WLAN, bluetooth, music, video, short message, and the like.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for the application of the application layer. The application framework layer may include some predefined functions.
As shown in FIG. 2, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager is used for managing window programs. The window manager can acquire the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like. The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.
The view system comprises visual controls, such as a control for displaying characters, a control for displaying pictures, and the like, for example, indication information for prompting a virtual shutter key in the embodiment of the application. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The telephony manager is used to provide the communication functions of the electronic device 100. Such as the management of call status (including on, hung-up, etc.).
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The notification manager allows the application to display notification information in a status bar, can be used to communicate notification type messages, can automatically disappear after a short dwell, and does not require user interaction. Such as notification manager is used to inform that the download is complete, message alerts, etc. The notification manager may also be a notification in the form of a chart or scroll bar text that appears on the system top status bar, such as a notification of a background running application, or a notification that appears on the screen in the form of a dialog window. For example, a text message is prompted in a status bar, a prompt tone is emitted, the electronic device vibrates, and an indicator light blinks, etc.
The Android runtime includes a core library and a virtual machine. The Android runtime is responsible for scheduling and management of the Android system.
The core library comprises two parts, wherein one part is a function required to be called by java language, and the other part is an android core library.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes java files of the application program layer and the application program framework layer as binary files. The virtual machine is used for executing the functions of object life cycle management, stack management, thread management, security and exception management, garbage collection and the like.
The system library may include a plurality of functional modules. Such as surface manager (surface manager), media library (media library), three-dimensional graphics processing library (e.g., openGL ES), 2D graphics engine (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
Media libraries support a variety of commonly used audio, video format playback and recording, still image files, and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
In addition, the system library may also include a state monitoring service module, such as a physical state recognition module, for analyzing and recognizing gestures of the user, and a sensor service module for monitoring sensor data uploaded by various sensors in the hardware layer and determining the physical state of the electronic device 100.
The kernel layer is a layer between hardware and software. The inner core layer at least comprises a display driver, a camera driver, an audio driver and a sensor driver.
The hardware layer may include various sensors, such as the various sensors described in fig. 1, acceleration sensors, gyroscopic sensors, touch sensors, etc. involved in embodiments of the present application.
Before describing embodiments of the present application, several concepts related to embodiments of the present application will be described first.
Direct sound (direct sound) refers to sound that propagates directly from a sound source to a receiving side without any reflection.
Reflected sound (reflected sound) refers to sound that propagates to the receiver after being reflected by other objects; it typically arrives 50 ms to 80 ms after the direct sound and may include early reflections (early reflection) and late reverberation (late reverberation).
Reverberation (reverberation) and echo (echo) are both sounds that reach the user after being reflected, but they differ noticeably. Reverberation is a reflection phenomenon in a relatively small space: its multipath effect is obvious, the sound propagation paths are complex, and the delay is small. Echo refers to the reflection of sound in a wider space: the delay is longer, and the echo is clearly separated from the direct sound in energy.
The direct sound and the reflected sound are described in detail below in connection with fig. 3 and 4. Fig. 3 shows a schematic diagram of the direct sound and the reflected sound, and fig. 4 shows a schematic diagram of the time and loudness of the direct sound and the reflected sound reaching the ear of the user.
As shown in fig. 3, it is assumed that the user is in one room, and the user can hear his own voice after speaking. The sound heard by the user to speak by himself includes direct sound and reflected sound. The direct sound may include bone conduction path direct sound, i.e. sound conducted through the head bone to the auditory system of the user, as well as air conduction path direct sound, i.e. sound propagated in air directly from the mouth of the user to the ears. Reflected sound includes sound emitted from the user's mouth that propagates to the user's ears after reflecting off of a wall surface and an obstacle in the room.
As shown in fig. 4, in conjunction with the description of fig. 3, it is not difficult to derive that the reflected sound will reach the user's ears later than the direct sound, and the loudness of the reflected sound will be less than the direct sound, since the reflected sound needs to be reflected off the wall.
The reflected sound is related to the environment in which the user is located, for example, when the user is in a narrow and tight environment, the user can hear more reflected sound, and the time interval between the reflected sound and the direct sound is shorter. For another example, when the user is in a large venue, the reflected sound heard by the user is more and the time interval from the direct sound is greater. As another example, when the user is in a wide outdoor environment, the reflected sound heard by the user may be reduced. Thus, the reflected sound may help the user determine the environment in which they are located.
At present, when an electronic device runs services such as VR, AR, and MR, it plays sound using technologies such as spatial audio, rendering the sound in the service to a spatial effect that matches the picture the user is watching or the virtual scene the user has selected. However, in some services the user needs to interact with objects in the virtual scene by speaking. The user then hears both the direct sound and the reflected sound of his or her own speech; the reflected sound is determined by the real scene in which the user is located, which may differ from the virtual scene, degrading the user's sense of immersion and reducing the user experience. This is described below with reference to fig. 5.
Fig. 5 shows a schematic diagram of a VR service scenario.
As shown in fig. 5, a user uses an electronic device (e.g., a VR head-mounted display device) in a living room. The electronic device displays a virtual scene 501, which is a virtual outdoor scene further including an object 502. The electronic device may render the sound of object 502 in combination with the outdoor scene, so that through the virtual scene 501 displayed on the VR head-mounted display device and the sound of object 502, the user perceives himself to be outdoors. However, when the user speaks to interact with object 502, the user hears both the direct sound of the speech and the reflected sound produced by the living room. The reflected sound produced by the living room makes the user perceive the living room rather than the outdoor scene, creating a sense of dissonance, lowering the immersive experience, and reducing the user experience. In view of this, the sound rendering method provided by the embodiments of the present application can render the user's voice in combination with the virtual scene, so that the sound the user hears matches the virtual scene, bringing a strong sense of immersion and helping to improve the user experience.
Fig. 6 shows a schematic flowchart of a method for rendering sound 600 according to an embodiment of the present application, where, as shown in fig. 6, the method 600 includes:
s601, displaying a first virtual scene.
When running services such as VR, AR, MR and the like, the first electronic device can display a first virtual scene corresponding to the services.
Illustratively, the first virtual scene includes, but is not limited to, a jungle, an indoor scene, a stage, a conference room, and the like.
S602, detecting first voice input of a user, and acquiring first sound, wherein the first sound comprises first direct sound of the user.
For example, the first electronic device may obtain a first sound through the microphone when detecting the first voice input of the user, the first sound including a first direct sound of the user.
For example, when the first electronic device is connected to an external device with a microphone and a first voice input of a user is detected, the first electronic device may acquire a first sound through the external device. For example, the external device may be a headset.
For example, when the first electronic device detects the first voice input of the user, the first sound may be obtained through the bone conduction pickup device, and the first sound is the first direct sound of the user.
For example, when the first electronic device detects the first voice input of the user, a first sound may be acquired by the microphone and the bone conduction pickup device, the first sound including a first direct sound of the user that includes a bone conduction path direct sound (i.e., the direct sound acquired by the bone conduction pickup device) and an air conduction path direct sound (i.e., the direct sound acquired by the microphone).
Optionally, in some embodiments, when the first sound is acquired by the microphone, the first sound may further include a first reflected sound formed by reflection of the first direct sound in the real scene in which the user is located.
S603, generating a first rendered sound according to the first direct sound of the user and the first virtual scene.
The first electronic device may determine a first acoustic parameter according to the first virtual scene, and then render the first direct sound of the user according to the first acoustic parameter to generate a first rendered sound, where the first rendered sound includes a first virtual reflected sound, and the first virtual reflected sound is the sound formed by reflection of the first direct sound in the first virtual scene. The sound intensity level of the first rendered sound may be preset, may be set by the user, or may be determined according to the sound intensity level of the first direct sound. The case in which the sound intensity level of the first rendered sound is determined from the sound intensity level of the first direct sound is described below with reference to fig. 11 and is not repeated here.
Optionally, in some embodiments, the first rendered sound further comprises a first virtual direct sound. The virtual direct sound in the embodiment of the application can be understood as the sound that the electronic device plays back, in the tone of the user or the tone of a virtual object, after acquiring the user's direct sound.
When the first sound includes both the first direct sound and the first reflected sound, the first direct sound and the first reflected sound may be separated by correlation, or the first direct sound may be separated out according to the acoustic parameters of the real scene where the user is located.
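As an illustrative sketch only (not part of the claimed method), the correlation-based separation mentioned above can be approximated when a bone conduction pickup provides a largely reflection-free reference of the direct sound: the reference is aligned to the microphone capture by cross-correlation, the aligned component is projected out as the direct sound, and the residual is treated as reflected sound. The sketch ignores spectral differences between the bone conduction and air conduction paths, and all names are hypothetical.

```python
import numpy as np

def split_with_bone_conduction_reference(mic_signal, bone_signal):
    """Crude split of a microphone capture into direct and reflected parts,
    using a bone-conduction signal as a (mostly) reflection-free reference."""
    # Find the delay of the microphone capture relative to the reference.
    corr = np.correlate(mic_signal, bone_signal, mode="full")
    lag = int(np.argmax(corr)) - (len(bone_signal) - 1)

    # Time-align the reference with the microphone capture.
    aligned = np.zeros_like(mic_signal, dtype=float)
    src = bone_signal[max(0, -lag):]
    start = max(0, lag)
    n = min(len(src), len(mic_signal) - start)
    aligned[start:start + n] = src[:n]

    # Least-squares gain so the aligned reference best explains the capture.
    denom = float(np.dot(aligned, aligned))
    gain = float(np.dot(mic_signal, aligned)) / denom if denom > 0 else 0.0

    direct = gain * aligned
    reflected = mic_signal - direct        # residual: reflections plus noise
    return direct, reflected
```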
The first acoustic parameter may be used to characterize the first virtual scene. The first acoustic parameters include, but are not limited to, reverberation parameters, direct-to-reverberant ratio (DRR), echo parameters, reflection density, surround sound, speech intelligibility (SI), and impulse response (IR).
The first acoustic parameter may be preset or may be calculated in real time, as will be described below with reference to calculating the reverberation parameter in real time.
When calculating the reverberation parameter in real time, the reflection paths may be simulated from the position of the sound source, the radiation direction of the sound source, and the three-dimensional geometric information and material information of the first virtual scene; alternatively, the reverberation parameter may be calculated by estimating geometric parameters, material parameters, and the like of the virtual scene.
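As a minimal illustrative sketch of estimating one such reverberation parameter from the geometry and material information of a virtual scene, the example below applies Sabine's formula to an assumed box-shaped scene; the dimensions and absorption coefficients are made-up values for illustration only.

```python
def estimate_rt60(volume_m3, surfaces):
    """Sabine's formula: RT60 = 0.161 * V / (sum of area * absorption)."""
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    if total_absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / total_absorption

# Assumed box-shaped virtual hall, 12 m x 8 m x 5 m.
volume = 12.0 * 8.0 * 5.0
surfaces = [
    (12.0 * 8.0, 0.30),      # floor (wood)
    (12.0 * 8.0, 0.20),      # ceiling (plaster)
    (2 * 12.0 * 5.0, 0.10),  # long walls
    (2 * 8.0 * 5.0, 0.10),   # short walls
]
print(f"estimated RT60: {estimate_rt60(volume, surfaces):.2f} s")
```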
The above description requires determining the position of the sound source and the direction of radiation of the sound source, for which the following two possible implementations are provided by embodiments of the present application.
In one possible implementation manner, the position of the sound source and the radiation direction of the sound source are preset, that is, after the first electronic device acquires the first sound of the user, the first sound of the user can be simulated to be emitted from the preset position of the first virtual scene and be propagated to the preset radiation direction.
In one possible implementation manner, when a virtual character corresponding to the user is included in the first virtual scene, the position of the sound source and the radiation direction of the sound source are determined according to the position and the mouth direction of the virtual character.
It should be noted that the above method for calculating the reverberation parameter is only an example, and should not be construed as being a specific limitation of the embodiment of the present application, and other methods may be adopted to calculate the reverberation parameter according to the embodiment of the present application.
After determining the first acoustic parameter and the sound of the user, the first electronic device may render the sound of the user based on the first acoustic parameter. The rendering method is not particularly limited in the embodiment of the present application; for example, the electronic device may render angle and distance separately, using a head related transfer function (HRTF) and a room impulse response (RIR), respectively. As another example, the first electronic device may render with a binaural impulse response (BIR).
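A simplified, non-authoritative sketch of such convolution-based rendering is shown below: the captured direct sound is convolved with a room impulse response of the virtual scene and then with a single left/right HRTF pair to produce a binaural rendered sound. A real renderer would typically apply different HRTFs to the direct path and to individual reflections; the arguments are assumed to be mono numpy signals at the same sample rate.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(direct, room_ir, hrtf_left, hrtf_right):
    """Convolve the direct sound with a virtual room impulse response and
    one HRTF pair to obtain a two-channel (left/right) rendered sound."""
    reverberant = fftconvolve(direct, room_ir, mode="full")
    left = fftconvolve(reverberant, hrtf_left, mode="full")
    right = fftconvolve(reverberant, hrtf_right, mode="full")
    n = max(len(left), len(right))
    rendered = np.zeros((n, 2))
    rendered[:len(left), 0] = left
    rendered[:len(right), 1] = right
    return rendered
```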
S604, playing the first rendered sound.
The first electronic device may play the first rendered sound through a speaker, for example.
For example, when the first electronic device is connected to an external device with a speaker, the first electronic device may play the first rendered sound through the speaker.
For example, when the first electronic device has a bone conduction function, the first rendered sound may be played by way of bone conduction.
For example, when the first electronic device is connected to an external device with a bone conduction function, the first electronic device may play the first rendered sound in a bone conduction manner through the external device, and the external device may be a bone conduction earphone.
In the embodiment of the application, the electronic device can acquire the sound of the user, render the sound of the user according to the acoustic parameters of the virtual scene to generate the first rendered sound, and play the first rendered sound. Because the first rendered sound is rendered according to the first virtual scene, it includes the first virtual reflected sound brought by the first virtual scene. After hearing the first rendered sound, the user can better feel that they are in the first virtual scene, which improves the user's sense of immersion and helps to improve the user's experience.
In order to more clearly describe the method for rendering sound provided by the embodiment of the present application, a specific application scenario will be taken as an example in the following in conjunction with fig. 7 to 9.
Fig. 7 shows a schematic diagram of an application scenario provided by an embodiment of the present application.
As shown in fig. 7, the application scene is a game scene, the virtual scene corresponding to the game scene is a jungle, but the real scene of the user is a carriage. When the game requires the user to speak, the first electronic device can acquire the voice of the user, render the voice of the user according to the acoustic parameters of the jungle, and play the rendered voice, so that the user hears their own speech matched to the jungle.
Fig. 8 shows a schematic diagram of another application scenario provided by the embodiment of the present application.
As shown in fig. 8, the application scene is a singing scene, the virtual scene corresponding to the singing scene is a stage, but the real scene of the user is a bedroom. When the user starts singing, the first electronic device can acquire the singing voice of the user, render it according to the acoustic parameters of the stage, and play the rendered singing voice, so that the user feels as if they are on the stage.
It should be noted that the effect of the rendered singing voice is to supplement the reflected sound of the stage, rather than being played back through a speaker to increase the intensity of the direct sound.
Fig. 9 shows a schematic diagram of another application scenario provided by the embodiment of the present application.
As shown in fig. 9, the application scene is a conference scene, the virtual scene corresponding to the conference scene is a conference room, but the real scene of the user is a grassland. When the user needs to speak during the conference, the first electronic device can acquire the sound of the user, render it according to the acoustic parameters of the conference room, and play the rendered sound, so that the user feels as if they are in the conference room when hearing it.
When the electronic device runs services such as VR, AR and MR, the virtual scene may change; likewise, with the sound of the user serving as the sound source within a given virtual scene, the position and the radiation direction of the sound source may also change.
For example, the virtual scene changes from a jungle scene to an indoor scene. For another example, taking a virtual scene as a stage scene, the sound source is located at a stage position #1 at a time #1 and at a stage position #2 at a time #2.
It will be appreciated that when the above change occurs, after the user's sound is again acquired, the user's sound needs to be rendered using the changed acoustic parameters, as will be described below in connection with fig. 10.
As shown in fig. 10, S603, generating a first rendered sound according to the first direct sound of the user and the first virtual scene, includes:
S6031, determining a first acoustic parameter according to the first virtual scene.
S6032, rendering the first direct sound using the first acoustic parameter to generate the first rendered sound.
The method 600 further includes:
S605, determining that a preset condition is met, and switching the first acoustic parameter to a second acoustic parameter.
Illustratively, the preset condition is that the first virtual scene becomes the second virtual scene.
For example, taking a game scene as an example, as the progress of the game advances, a virtual scene in the game is changed from a jungle scene to an indoor scene, and the first electronic device may acquire a second acoustic parameter corresponding to the indoor scene.
The preset condition is, for example, that the position of the sound source and/or the radiation direction of the sound source changes.
For example, as shown in fig. 8, taking a virtual scene that is a stage scene as an example, the stage scene includes a virtual character #1, the virtual character #1 corresponds to the user, and the virtual character #1 is located at stage position #1 at time #1 and at stage position #2 at time #2, that is, the position of the sound source changes.
For another example, a virtual scene is taken as a conference scene, and the conference scene includes a virtual character #1, the virtual character #1 corresponds to a user, the virtual character #1 is talking in a chair at a time #1, and the virtual character #1 stands up from the chair at a time #2 to talk, that is, the position of a sound source is changed.
For another example, a virtual scene is taken as a conference scene, and the conference scene includes a virtual character #1, a virtual character #2, and a virtual character #3, wherein the virtual character #1 corresponds to a user, and the virtual character #1 speaks into the virtual character #2 at time #1, and speaks into the virtual character #3 at time #2, that is, the radiation direction of the sound source changes.
It will be appreciated that the second acoustic parameters may be preset or may be calculated in real time, and the description of the second acoustic parameters may refer to the description of the first acoustic parameters, which will not be repeated herein for brevity.
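As an illustrative sketch only, the preset-condition check of S605 could be organized as below: acoustic parameters are switched when the virtual scene changes or when the position or radiation direction of the sound source changes, looking up preset parameters where available and otherwise recomputing them in real time. All function and dictionary names here are hypothetical.

```python
def select_acoustic_params(state, scene_id, source_pos, source_dir,
                           preset_params, recompute_params):
    """Return the acoustic parameters to use for the current frame."""
    scene_changed = scene_id != state.get("scene_id")
    pose_changed = (source_pos != state.get("source_pos")
                    or source_dir != state.get("source_dir"))

    if scene_changed and scene_id in preset_params:
        state["params"] = preset_params[scene_id]       # preset parameters
    elif scene_changed or pose_changed:
        # Real-time calculation, e.g. from scene geometry and materials.
        state["params"] = recompute_params(scene_id, source_pos, source_dir)

    state.update(scene_id=scene_id, source_pos=source_pos,
                 source_dir=source_dir)
    return state["params"]
```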
S606, a second voice input of the user is detected, and a second sound is acquired, the second sound including a second direct sound of the user.
Optionally, in some embodiments, when the second sound is acquired through the microphone, the second sound may further include a second reflected sound formed by reflection of the second direct sound through a real scene in which the user is located.
S607, generating a second rendered sound according to the second direct sound of the user and the second acoustic parameter.
S608, playing the second rendering sound.
It should be understood that the descriptions for S606-S608 may be referred to the descriptions for S602-S604, and are not repeated here for brevity.
In the embodiment of the application, the virtual scene and the position of the sound source are not fixed. When the virtual scene, the position of the sound source, or the radiation direction of the sound source changes, the electronic device can render the sound of the user based on the new acoustic parameters and play the second rendered sound. After hearing the second rendered sound, the user can better feel the change of the virtual scene, which improves the user's sense of immersion and promotes the user's experience.
In a real scene, the ratio of the direct sound to the reflected sound heard by the user is roughly fixed; for example, the ratio of the direct sound component to the reflected sound component is 5:1. Therefore, in the same real scene, when the sound intensity of the direct sound increases, the sound intensity of the reflected sound also increases. For example, when the user speaks in a room and raises the speaking volume, the direct sound the user hears of their own speech increases, and the reflected sound increases in proportion. Based on this, in order to ensure the user's sense of immersion in the virtual scene and the authenticity of the rendered sound, the electronic device may acquire the sound intensity level of the direct sound, and then, when rendering, match the rendered sound with the sound intensity level of the direct sound, as will be described below in connection with fig. 11.
Optionally, in some embodiments, as shown in fig. 11, the method 600 further comprises:
S609, acquiring the sound intensity level of the first direct sound.
For example, when the first electronic device obtains the first sound through the microphone, the sound intensity level of the first sound may be determined, and the sound intensity level of the first direct sound may then be determined according to the sound intensity level of the first sound.
For example, when the first electronic device obtains the first sound through the bone conduction pickup device, the first sound is the first direct sound, and the sound intensity level of the first sound is the sound intensity level of the first direct sound.
For example, when the first electronic device obtains the first sound through the microphone and the bone conduction pickup device together, the first direct sound includes the bone conduction path direct sound and the air conduction path direct sound. The first electronic device may determine the sound intensity levels of the bone conduction path direct sound and the air conduction path direct sound respectively, and then determine the sound intensity level of the first direct sound from the two.
S603, generating a first rendered sound according to the first direct sound of the user and the first virtual scene, includes:
S6033, generating the first rendered sound according to the first direct sound of the user, the sound intensity level of the first direct sound, and the first virtual scene.
After the first electronic device determines the sound intensity level of the first direct sound, it can render the first direct sound using the first acoustic parameter of the first virtual scene to generate the first rendered sound, and then determine the sound intensity level of the first rendered sound according to the sound intensity level of the first direct sound.
It will be appreciated that the sound intensity level of the first rendered sound is associated with the sound intensity level of the first direct sound. For example, when the first direct sound has a sound intensity level of 60 dB, the first rendered sound has a sound intensity level of 60 dB; when the first direct sound has a sound intensity level of 40 dB, the first rendered sound has a sound intensity level of 40 dB.
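A minimal sketch of this intensity matching is given below (illustrative only): the rendered sound is scaled so that its RMS level tracks that of the captured direct sound, so a louder direct sound yields a proportionally louder rendered sound.

```python
import numpy as np

def match_intensity(rendered, direct, eps=1e-12):
    """Scale the rendered sound so its RMS level tracks the direct sound."""
    direct_rms = np.sqrt(np.mean(np.square(direct)))
    rendered_rms = np.sqrt(np.mean(np.square(rendered)))
    gain = direct_rms / max(rendered_rms, eps)
    return rendered * gain
```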
In the embodiment of the application, the sound intensity level of the first rendered sound is associated with the first direct sound of the user: when the sound intensity level of the user's direct sound increases, the first rendered sound becomes louder, and when it decreases, the first rendered sound becomes quieter. This brings a better sense of immersion to the user and improves the user's experience.
As described above, in some scenarios, a user may correspond to a virtual object in a virtual scene, including, but not limited to, a virtual character or a virtual animal. The tone color of the virtual object may be different from the tone color of the user, and if the tone color of the user is still used during rendering, the immersion of the user may be reduced. Therefore, the electronic device may render with the tone color of the virtual object, thereby ensuring the immersion of the user, as will be described below in connection with fig. 12.
Optionally, in some embodiments, as shown in fig. 12, the method 600 further comprises:
S610, acquiring the tone color of a first virtual object, where the first virtual object is the virtual object corresponding to the user.
S603, generating a first rendered sound according to the first direct sound of the user and the first virtual scene, includes:
S6034, generating the first rendered sound according to the first direct sound of the user, the tone color of the first virtual object, and the first virtual scene.
The first electronic device may render the first direct sound using the first acoustic parameter of the first virtual scene and the tone color of the first virtual object, thereby obtaining the first rendered sound. The first rendered sound also includes a first virtual direct sound generated from the first direct sound and the tone color of the first virtual object. In other words, after determining the tone color of the first virtual object, the tone color of the user in the first direct sound may be replaced with the tone color of the first virtual object to generate the first virtual direct sound.
For example, taking a stage scene as an example, a user corresponds to a virtual singer #1 in the stage scene, when the user sings, the electronic device can acquire the direct sound of the user, render the direct sound of the user into a virtual direct sound with tone color of the virtual singer #1, render the direct sound by utilizing acoustic parameters of the stage scene to obtain a virtual reflected sound, and play the virtual direct sound and the virtual reflected sound by the electronic device.
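As a deliberately crude, illustrative stand-in for such tone color replacement (a real system would more likely use a dedicated voice-conversion model), the sketch below simply shifts the pitch of the direct sound toward the virtual object's register; the step count is an arbitrary assumed value.

```python
import librosa

def toward_virtual_object_tone(direct, sr, n_steps=4.0):
    """Shift the user's direct sound by n_steps semitones as a rough proxy
    for rendering it in the virtual object's tone color."""
    return librosa.effects.pitch_shift(direct, sr=sr, n_steps=n_steps)
```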
It should be noted that there is no fixed execution order between S602 and S610. In other words, S602 may be performed first and then S610, or S610 may be performed first and then S602, or S602 and S610 may be performed simultaneously.
In the embodiment of the application, when the user corresponds to the virtual object in the virtual scene, the electronic equipment can render the direct sound of the user according to the tone of the virtual object and the acoustic parameters of the virtual scene, so that the virtual direct sound with the tone being the tone of the virtual object and the virtual reflected sound based on the virtual scene are obtained, the immersion feeling of the user playing the virtual object can be improved, and the experience of the user is promoted.
In some embodiments, when the user corresponds to a virtual object in the virtual scene, the first electronic device may automatically acquire a timbre of the virtual object, and then render the direct sound of the user with the timbre of the virtual object and acoustic parameters of the virtual scene.
In other embodiments, the first electronic device may also respond to the operation of selecting the tone color of the virtual object by the user, and then render the direct sound of the user with the tone color of the virtual object and the acoustic parameters of the virtual scene.
For example, as shown in the GUI of fig. 13, the electronic device displays a virtual scene 1301, which is a stage scene including a virtual object 1302. When displaying the virtual scene 1301, the electronic device may also display an option box 1303, where the option box 1303 includes a user tone option 1304 and a virtual object tone option 1305. When detecting an operation of the user clicking the virtual object tone option 1305, the electronic device may render the direct sound of the user with the tone color of the virtual object 1302 and the virtual scene 1301.
Optionally, in other embodiments, the timbre of the first virtual direct sound is the timbre of the user.
It should be noted that, when the tone color of the first virtual direct sound is the tone color of the virtual object, the ratio of the first virtual direct sound to the first virtual reflected sound is a first ratio, and when the tone color of the first virtual direct sound is the tone color of the user, the ratio of the first virtual direct sound to the first virtual reflected sound is a second ratio, and the first ratio may be greater than the second ratio.
In the real scene where the user is actually located, the user may hear a first reflected sound and/or noise (e.g., another person speaking in the real scene), which may affect the user's immersion. Therefore, in some embodiments of the present application, the electronic device may perform noise reduction, as will be described in detail below.
As shown in fig. 14, the method 600 further includes:
S611, generating a first noise reduction.
S612, playing the first noise reduction.
The electronic device may play the first noise reduction while playing the first rendered sound.
It should be noted that there is no fixed execution order between S611 and S601-S603.
Generating the first noise reduction by the first electronic device may include several possible implementations:
In one possible implementation, the first sound acquired by the first electronic device includes the first reflected sound, and the first electronic device generates, according to a preset noise reduction algorithm and the first reflected sound, a sound wave opposite in phase to the first reflected sound, namely the first noise reduction, where the first noise reduction is used to cancel the first reflected sound.
In one possible implementation, the first sound acquired by the first electronic device includes the first reflected sound and noise, and the first electronic device generates, according to a preset noise reduction algorithm and the first reflected sound and noise, a sound wave opposite in phase to the first reflected sound and noise, namely the first noise reduction, where the first noise reduction is used to cancel the first reflected sound and noise.
In one possible implementation, the first sound acquired by the first electronic device includes the first reflected sound, the first electronic device acquires the acoustic parameters of the real scene, and then renders the first direct sound according to the acoustic parameters of the real scene to generate the first noise reduction, where the first noise reduction is used to cancel the first reflected sound.
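An illustrative sketch of the third implementation above is given below, under the assumption that an impulse response of the real scene is available: the reflected sound is predicted by convolving the direct sound with the late part of the real room impulse response, and the phase-inverted prediction is emitted as the first noise reduction. Latency, secondary-path effects, and the adaptive filtering a practical active noise cancellation system would need are ignored here.

```python
import numpy as np
from scipy.signal import fftconvolve

def first_noise_reduction(direct, real_scene_ir, fs, direct_window_ms=5.0):
    """Predict the real scene's reflections from the direct sound and return
    the phase-inverted prediction so that it cancels the reflected sound."""
    cut = int(fs * direct_window_ms / 1000)   # treat the first few ms as direct path
    late_ir = np.array(real_scene_ir, dtype=float)
    late_ir[:cut] = 0.0                       # keep only the reflective part
    predicted_reflections = fftconvolve(direct, late_ir, mode="full")
    return -predicted_reflections             # anti-phase cancellation signal
```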
Acoustic parameters of the real scene include, but are not limited to, reverberation parameters, DRR, echo parameters, reflection density, surround sound, SI, and IR.
The manner in which the electronic device acquires the acoustic parameters of the real scene is not limited in the embodiment of the present application, and several possible implementations are given below by way of example.
In one possible implementation manner, the first electronic device is preset with a plurality of acoustic parameters of a real scene, and the electronic device can obtain the corresponding acoustic parameters according to the selection of a user.
For example, as shown in fig. 15, the first electronic device is preset with acoustic parameters of a noisy car, acoustic parameters of a bedroom, acoustic parameters of a living room, acoustic parameters of the open air, and the electronic device may display an option box 1502 when displaying the virtual scene 1501, where the option box 1502 includes a noisy car option 1503, a bedroom option 1504, an open air option 1505, and a living room option 1506. The electronic device may determine the acoustic parameter corresponding to the option according to the operation of selecting the option by the user.
In one possible implementation, the first electronic device obtains acoustic parameters of the real scene via one or more sensors. The one or more sensors include, but are not limited to, an inertial measurement unit (inertial measurement unit, IMU), speakers, microphones, ultrasonic radar, infrared radar, millimeter wave radar, and cameras.
Taking the example that the one or more sensors include a speaker and a microphone, the first electronic device may play audio of a particular frequency through the speaker and receive audio of the particular frequency through the microphone, and then determine the acoustic parameter based on the audio of the particular frequency played by the speaker and the audio of the particular frequency received by the microphone.
Taking the example that the one or more sensors include a camera, the first electronic device may obtain three-dimensional size information of the real scene acquired by the camera, and then the electronic device may determine the acoustic parameter according to the three-dimensional size information.
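As an illustrative sketch of the speaker-and-microphone approach described above, the electronic device could play an exponential sine sweep, record it, and deconvolve the recording with the inverse sweep to estimate the room impulse response of the real scene, from which parameters such as reverberation time can then be derived. The `play_and_record` helper is hypothetical and stands in for the device's audio I/O; the sweep range and duration are assumed values.

```python
import numpy as np
from scipy.signal import chirp, fftconvolve

def measure_room_ir(play_and_record, fs=48000, duration=3.0,
                    f0=20.0, f1=20000.0):
    """Estimate the real scene's impulse response with an exponential sweep."""
    t = np.linspace(0, duration, int(fs * duration), endpoint=False)
    sweep = chirp(t, f0=f0, t1=duration, f1=f1, method="logarithmic")

    recorded = play_and_record(sweep, fs)     # hypothetical speaker/mic I/O

    # Inverse filter for an exponential sweep: time-reversed sweep with an
    # amplitude envelope that compensates the extra low-frequency energy.
    rate = np.log(f1 / f0)
    inverse = sweep[::-1] * np.exp(-t * rate / duration)

    ir = fftconvolve(recorded, inverse, mode="full")
    return ir / np.max(np.abs(ir))            # normalized impulse response
```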
In one possible implementation manner, the first electronic device may obtain acoustic parameters of the real scene according to big data statistics.
For example, when executing VR service #1, the first electronic device may communicate with a server, where the server stores acoustic parameters of a real scene corresponding to VR service #1, and the first electronic device may receive parameters of the real scene corresponding to VR service #1 sent by the server. The acoustic parameters of the real scene corresponding to VR service #1 stored by the server may be determined according to the acoustic parameters uploaded by many users.
In one possible implementation manner, the first electronic device obtains acoustic parameters of the real scene according to the system information.
For example, if the first electronic device is preset with outdoor acoustic parameters, the first electronic device may determine that the user is outdoor through the positioning information, and then the first electronic device may acquire the outdoor acoustic parameters.
Optionally, in some embodiments, the first electronic device may further obtain HRTF parameters of the user, and then generate the first noise reduction according to acoustic parameters of the real scene and HRTF parameters of the user.
In the embodiment of the application, the electronic device can generate the first noise reduction that cancels the reflected sound and noise, so that the reflected sound and noise of the real scene do not interfere with the user. This brings a better sense of immersion to the user and promotes the user's experience.
As indicated above, the first electronic device may be connected to an external device with a speaker and a microphone, which may be a headset, to acquire and play the corresponding sound. Headphones may include in-ear headphones, semi-in-ear headphones, and open headphones. It will be appreciated that when the earphone is an in-ear earphone, the air conduction path direct sound is isolated, and when the earphone is a semi-in-ear earphone, part of the air conduction path direct sound is isolated. As the air conduction path direct sound is isolated or partially isolated, the proportion of virtual reflected sound in the sound heard by the user increases, which reduces the sense of immersion. Therefore, when the first electronic device determines that the connected external device is a headset, and the headset is an in-ear headset or a semi-in-ear headset, it may play the virtual direct sound or increase the proportion of the virtual direct sound in the rendered sound, as will be described in detail below with reference to fig. 16.
Optionally, in some embodiments, as shown in fig. 16, before performing S604, the method 600 further includes:
S613, detecting the type of the earphone.
The first electronic device may detect the type of earphone connected thereto, may perform S614 when it is determined that the type of earphone is an in-ear earphone or a semi-in-ear earphone and the virtual direct sound is not included in the first rendered sound, and may perform S615 when the virtual direct sound is included in the first rendered sound.
S614 adds a first virtual direct sound to the first rendered sound, the first virtual direct sound being generated from the first direct sound.
When the first electronic device determines that the type of the earphone is in-ear earphone or semi-in-ear earphone, a first virtual direct sound can be generated, and the first virtual direct sound is generated according to the first direct sound, namely, after the first direct sound is acquired, the first direct sound can be played. Note that, the tone color of the first virtual direct sound may be the tone color of the user, or may be the tone color of the virtual object.
In some embodiments, when the first electronic device adds the first virtual direct sound to the first rendered sound, the ratio of the first virtual direct sound to the first virtual reflected sound may also be determined according to the type of the headphones.
For example, when the earphone is an in-ear earphone, the ratio of the first virtual direct sound to the first virtual reflected sound is a first ratio, and when the earphone is a semi-in-ear earphone, the ratio is a second ratio. Since an in-ear earphone isolates the air conduction direct sound better than a semi-in-ear earphone, the proportion of the first virtual direct sound added by the first electronic device should be larger for an in-ear earphone than for a semi-in-ear earphone, and thus the first ratio is greater than the second ratio.
In other embodiments, the ratio of the first virtual direct sound to the first virtual reflected sound is preset when the first electronic device adds the first virtual direct sound to the first rendered sound. In other words, in these embodiments, the ratio of the first virtual direct sound and the first virtual reflected sound does not change with the change in the type of headphones.
S615, determining a ratio of the first virtual direct sound to the first virtual reflected sound according to the type of the earphone.
When the first virtual direct sound is included in the first rendered sound, the first electronic device may determine a ratio of the first virtual direct sound and the first virtual reflected sound according to a type of the headphones.
In some embodiments, when the headphones are in-ear headphones, the ratio of the first virtual direct sound to the first virtual reflected sound is a first ratio.
In some embodiments, when the headphones are semi-in-ear headphones, the ratio of the first virtual direct sound to the first virtual reflected sound is the second ratio.
In some embodiments, when the headphones are open headphones, the ratio of the first virtual direct sound to the first virtual reflected sound is a third ratio.
It is understood that the first ratio is greater than the second ratio, which is greater than the third ratio.
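For illustration only, the headset-dependent mixing described above could be organized as in the sketch below; the numeric ratio values are assumptions, not values specified by this application, and the signals are assumed to be numpy arrays of equal length.

```python
# Assumed direct-to-reflected ratios per earphone type (illustrative values).
RATIO_BY_EARPHONE = {
    "in_ear": 4.0,        # first ratio: air conduction direct sound mostly blocked
    "semi_in_ear": 2.0,   # second ratio
    "open": 1.0,          # third ratio: air conduction direct sound mostly intact
}

def mix_rendered_sound(virtual_direct, virtual_reflected, earphone_type):
    """Weight the virtual direct sound against the virtual reflected sound
    according to the detected earphone type."""
    ratio = RATIO_BY_EARPHONE.get(earphone_type, 1.0)
    return ratio * virtual_direct + virtual_reflected
```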
In the embodiment of the application, after the electronic device is connected to the earphone, it can detect the type of the earphone, play the first virtual direct sound according to the type of the earphone, and adjust the proportion of the first virtual direct sound to the first virtual reflected sound. This compensates for the loss of the air conduction path direct sound caused by the earphone, gives the user a better sense of immersion, and improves the user's experience.
Some VR, AR, MR, etc. services may support connections between electronic devices, for example, through wireless local area network connections between electronic devices, or through server connections. When the first electronic device executes the service, the direct sound of other users sent by the electronic device connected with the first electronic device can be received, and then the direct sound is played, which will be described below in connection with fig. 17.
Optionally, in some embodiments, as shown in fig. 17, the method 600 further comprises:
S616, receiving the third rendering sound sent by the second electronic device.
S617, playing the third rendering sound.
The first electronic device may receive and play a third rendering sound sent by the second electronic device, where the second electronic device is connected to the first electronic device, and the second electronic device also displays the first virtual scene, where the third rendering sound includes a virtual direct sound and a virtual reflected sound corresponding to the user of the second electronic device.
It should be appreciated that the method for determining the third rendered sound for the second electronic device may be referred to above, and will not be described herein for brevity.
It may be appreciated that the first electronic device may play the third rendered sound using a spatial audio technique or the like, and may enable the user to perceive a direction of the third rendered sound.
For example, the first electronic device is connected to the second electronic device, and the first electronic device and the second electronic device each display a first virtual scene, the user #1 of the first electronic device corresponds to the avatar #1, the user #2 of the second electronic device corresponds to the avatar #2, and the avatar #2 is in front of the left of the avatar # 1. After the first electronic device receives the third rendering sound associated with the user #2 transmitted by the second electronic device, the third rendering sound may be played through a spatial audio technique, so that the user #1 perceives that the user #2 is in front of the left thereof.
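As a simplified, non-authoritative sketch of such direction-aware playback, the third rendered sound could be convolved with the HRTF pair closest to the relative azimuth of the other user's avatar, as below. The `hrtf_bank` mapping from azimuth to a (left, right) impulse response pair is assumed to be available and is not specified by this application.

```python
import numpy as np
from scipy.signal import fftconvolve

def spatialize(third_rendered, hrtf_bank, azimuth_deg):
    """Place another user's rendered sound at its avatar's direction using
    the nearest HRTF pair from a bank keyed by azimuth (degrees)."""
    nearest = min(hrtf_bank, key=lambda az: abs(az - azimuth_deg))
    h_left, h_right = hrtf_bank[nearest]      # assumed equal-length IR pair
    left = fftconvolve(third_rendered, h_left, mode="full")
    right = fftconvolve(third_rendered, h_right, mode="full")
    return np.stack([left, right], axis=-1)
```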
In the embodiment of the application, the electronic equipment can also receive and play the third rendering sound of other users sent by other electronic equipment, so that the users can feel multi-user interaction, better immersion feeling can be brought to the users, and the experience of the users is improved.
Optionally, in some embodiments, the first virtual scene includes a virtual scene sound, and the first electronic device may further play the virtual scene sound.
For example, taking a jungle scene as an example of the virtual scene, the jungle scene includes the sound of a lion and the sound of a tiger, and the first electronic device may play the sound of the lion and the sound of the tiger.
In the embodiment of the application, when the virtual scene comprises the virtual scene sound, the electronic equipment can play the virtual scene sound, so that the user can be better immersed, and the experience of the user can be improved.
Optionally, in some embodiments, the first electronic device may further record, live, the first virtual scene, and the first rendered sound simultaneously. In these embodiments, the first rendered sound includes a first virtual direct sound and a first virtual reflected sound.
For example, taking a live-streaming sales scene as an example of the virtual scene, the user is the sales host, the real scene of the host is a relatively small live-streaming room, and the virtual live-streaming sales scene is a market. The first electronic device can live-stream the displayed first virtual scene and the first rendered sound, so that other users watching the live stream feel that the host is in the market, achieving a better live-streaming atmosphere.
The method for rendering sound provided by the embodiment of the application is mainly described from the perspective of the electronic equipment. It will be appreciated that the electronic device, in order to achieve the above-described functions, includes corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the various illustrative algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the application can divide the functional modules (or units) of the processor in the electronic device according to the method example, for example, each functional module (or unit) can be divided corresponding to each function, or two or more functions can be integrated in one processing module (or unit). The integrated modules (or units) may be implemented in hardware or in software functional modules (or units). It should be noted that, in the embodiment of the present application, the division of the modules (or units) is schematic, which is merely a logic function division, and other division manners may be implemented in actual practice.
In the case of dividing each functional module (or unit) according to corresponding functions, fig. 18 shows a schematic diagram of a first electronic device 1800 according to an embodiment of the present application. As shown in fig. 18, the first electronic device 1800 includes a display module 1810, an acquisition module 1820, a processing module 1830, and a playing module 1840.
The display module 1810 is configured to display the first virtual scene.
The acquisition module 1820 is configured to detect a first voice input of the user and acquire a first sound, the first sound including a first direct sound of the user.
The processing module 1830 is configured to generate a first rendered sound according to a first direct sound of a user and a first virtual scene, where the first rendered sound includes a first virtual reflected sound, and the first virtual reflected sound is a sound formed by reflecting the first direct sound in the first virtual scene.
The playing module 1840 is configured to play the first rendered sound.
Optionally, in some embodiments, the processing module 1830 is specifically configured to:
Determining a first acoustic parameter according to the first virtual scene;
Rendering the first direct sound using the first acoustic parameter to generate the first rendered sound.
Optionally, in some embodiments, the processing module 1830 is further configured to determine that a preset condition is met, and switch the first acoustic parameter to the second acoustic parameter.
The obtaining module 1820 is further configured to detect a second voice input of the user, and obtain a second sound, where the second sound includes a second direct sound.
The processing module 1830 is further configured to render the second direct sound using the second acoustic parameter to generate a second rendered sound.
The playing module 1840 is further configured to play the second rendered sound.
Optionally, in some embodiments, the preset condition is that the first virtual scene is switched to the second virtual scene, or the position and/or the radiation direction of the sound source in the first virtual scene are changed, wherein the sound source is the direct sound of the user.
Optionally, in some embodiments, the obtaining module 1820 is further configured to obtain a sound intensity level of the first direct sound before the processing module 1830 generates the first rendered sound according to the first direct sound and the first virtual scene.
The processing module 1830 is specifically configured to generate the first rendered sound according to the first direct sound, the sound intensity level of the first direct sound, and the first virtual scene.
Optionally, in some embodiments, the first virtual scene includes a first virtual object, the first virtual object being associated with a user, the obtaining module 1820 further configured to obtain a timbre of the first virtual object.
The processing module 1830 is specifically configured to generate a first rendered sound according to a first direct sound of the user, a tone color of the first virtual object, and a first virtual scene, where the first rendered sound further includes the first virtual direct sound, and the first virtual direct sound is generated according to the first direct sound and the tone color of the first virtual object.
Optionally, in some embodiments, the processing module 1830 is further configured to generate a first noise reduction, where the first noise reduction is used to cancel a first reflected sound and/or noise of a real scene in which the user is located, and the first reflected sound is formed by reflection of the first direct sound in the real scene.
The playing module 1840 is further configured to play the first noise reduction.
Optionally, in some embodiments, the acquiring module 1820 is further configured to acquire a third acoustic parameter, where the third acoustic parameter includes an acoustic parameter of the real scene.
The processing module 1830 is specifically configured to generate the first noise reduction according to the third acoustic parameter.
Optionally, in some embodiments, the acquisition module 1820 is specifically configured to acquire the third acoustic parameter using one or more sensors, including one or more of an inertial measurement unit IMU, a speaker, a microphone, an ultrasonic radar, an infrared radar, a millimeter wave radar, and a camera.
Optionally, in some embodiments, the first acoustic parameter and the third acoustic parameter further comprise head related transfer function HRTF parameters of the user.
Optionally, in some embodiments, the first electronic device is connected to a headset, the first electronic device plays the first rendered sound through the headset, and the processing module 1830 is further configured to:
Detecting the type of the earphone;
When the earphone is determined to be an in-ear earphone or a semi-in-ear earphone, adding a first virtual direct sound in the first rendered sound, wherein the first virtual direct sound is generated according to the first direct sound.
Optionally, in some embodiments, when the earpiece is an in-ear earpiece, the ratio of the first virtual direct sound to the first virtual reflected sound is a first ratio, and when the earpiece is a semi-in-ear earpiece, the ratio of the first virtual direct sound to the first virtual reflected sound is a second ratio, the first ratio being greater than the second ratio.
Optionally, in some embodiments, the first electronic device is connected to the headphones, the first electronic device plays a first rendered sound through the headphones, the first rendered sound further comprising a first virtual direct sound, the first virtual direct sound being generated from the first direct sound, and the processing module 1830 is further configured to:
Detecting the type of the earphone;
The ratio of the first virtual direct sound to the first virtual reflected sound is determined according to the type of earphone.
Optionally, in some embodiments, when the earphone is an in-ear earphone, the ratio of the first virtual direct sound to the first virtual reflected sound is a first ratio, when the earphone is a semi-in-ear earphone, the ratio of the first virtual direct sound to the first virtual reflected sound is a second ratio, and when the earphone is an open earphone, the ratio of the first virtual direct sound to the first virtual reflected sound is a third ratio, the first ratio is greater than the second ratio, and the second ratio is greater than the third ratio.
Optionally, in some embodiments, the first electronic device is connected to a second electronic device, and the second electronic device displays the first virtual scene, and the obtaining module 1820 is further configured to receive a third rendered sound sent by the second electronic device.
The playing module 1840 is further configured to play the third rendering sound.
Optionally, in some embodiments, the first virtual scene comprises a virtual scene sound.
The playing module 1840 is further configured to play the virtual scene sound.
Optionally, in some embodiments, the playing module 1840 is further configured to record or live the first virtual scene and the first rendered sound synchronously.
An embodiment of the present application provides a computer program product, which when executed on an electronic device, causes the electronic device to execute the technical solution in the foregoing embodiment. The implementation principle and technical effects are similar to those of the related embodiments of the method, and are not repeated here.
An embodiment of the present application provides a readable storage medium, where the readable storage medium contains instructions, and when the instructions are executed in an electronic device, the instructions cause the electronic device to execute the technical solution of the foregoing embodiment. The implementation principle and technical effect are similar, and are not repeated here.
The embodiment of the application provides a chip for executing instructions, and when the chip runs, the technical scheme in the embodiment is executed. The implementation principle and technical effect are similar, and are not repeated here.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may be substantially contributing or a part of the technical solution may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present application. The storage medium includes a U disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
The foregoing is merely a specific implementation of the embodiment of the present application, but the protection scope of the embodiment of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the embodiment of the present application, and the changes or substitutions are covered by the protection scope of the embodiment of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (20)

1. A method for rendering sound, characterized in that the method is applied to a first electronic device, and the method comprises:
displaying a first virtual scene;
detecting a first voice input of a user and acquiring a first sound, wherein the first sound includes a first direct sound of the user;
generating a first rendered sound according to the first direct sound and the first virtual scene, wherein the first rendered sound includes a first virtual reflected sound, and the first virtual reflected sound is a sound formed by the first direct sound being reflected in the first virtual scene; and
playing the first rendered sound.

2. The method according to claim 1, characterized in that generating the first rendered sound according to the first direct sound and the first virtual scene comprises:
determining a first acoustic parameter according to the first virtual scene; and
rendering the first direct sound using the first acoustic parameter to generate the first rendered sound.

3. The method according to claim 2, characterized in that the method further comprises:
determining that a preset condition is met, and switching the first acoustic parameter to a second acoustic parameter;
detecting a second voice input of the user and acquiring a second sound, wherein the second sound includes a second direct sound;
rendering the second direct sound using the second acoustic parameter to generate a second rendered sound; and
playing the second rendered sound.

4. The method according to claim 3, characterized in that the preset condition is:
switching from the first virtual scene to a second virtual scene; or
a change in the position and/or radiation direction of a sound source in the first virtual scene, wherein the sound source is the direct sound of the user.

5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
acquiring a sound intensity level of the first direct sound; and
generating the first rendered sound according to the first direct sound and the first virtual scene comprises:
generating the first rendered sound according to the first direct sound, the sound intensity level of the first direct sound, and the first virtual scene.

6. The method according to any one of claims 1 to 5, characterized in that the first virtual scene includes a first virtual object, the first virtual object is associated with the user, and the method further comprises:
acquiring a timbre of the first virtual object; and
generating the first rendered sound according to the first direct sound and the first virtual scene comprises:
generating the first rendered sound according to the first direct sound, the timbre of the first virtual object, and the first virtual scene, wherein the first rendered sound further includes a first virtual direct sound, and the first virtual direct sound is generated according to the first direct sound and the timbre of the first virtual object.

7. The method according to any one of claims 1 to 6, characterized in that the method further comprises:
generating a first noise-reduction sound, wherein the first noise-reduction sound is used to cancel a first reflected sound and/or noise of a real scene where the user is located, and the first reflected sound is formed by the first direct sound being reflected by the real scene; and
playing the first noise-reduction sound.

8. The method according to claim 7, characterized in that generating the first noise-reduction sound comprises:
acquiring a third acoustic parameter, wherein the third acoustic parameter includes an acoustic parameter of the real scene; and
rendering the first direct sound according to the third acoustic parameter to generate the first noise-reduction sound.

9. The method according to claim 8, characterized in that acquiring the third acoustic parameter comprises:
acquiring the third acoustic parameter using one or more sensors, wherein the one or more sensors include one or more of the following:
an inertial measurement unit (IMU), a speaker, a microphone, an ultrasonic radar, an infrared radar, a millimeter-wave radar, and a camera.

10. The method according to claim 8 or 9, characterized in that the first acoustic parameter and the third acoustic parameter further include head-related transfer function (HRTF) parameters of the user.

11. The method according to any one of claims 1 to 10, characterized in that the first electronic device is connected to an earphone, the first electronic device plays the first rendered sound through the earphone, and the method further comprises:
detecting the type of the earphone; and
when it is determined that the earphone is an in-ear earphone or a semi-in-ear earphone, adding a first virtual direct sound to the first rendered sound, wherein the first virtual direct sound is generated according to the first direct sound.

12. The method according to claim 11, characterized in that when the earphone is an in-ear earphone, the ratio of the first virtual direct sound to the first virtual reflected sound is a first ratio; when the earphone is a semi-in-ear earphone, the ratio of the first virtual direct sound to the first virtual reflected sound is a second ratio; and the first ratio is greater than the second ratio.

13. The method according to any one of claims 1 to 10, characterized in that the first electronic device is connected to an earphone, the first electronic device plays the first rendered sound through the earphone, the first rendered sound further includes a first virtual direct sound, the first virtual direct sound is generated according to the first direct sound, and the method further comprises:
detecting the type of the earphone; and
determining the ratio of the first virtual direct sound to the first virtual reflected sound according to the type of the earphone.

14. The method according to claim 13, characterized in that when the earphone is an in-ear earphone, the ratio of the first virtual direct sound to the first virtual reflected sound is a first ratio; when the earphone is a semi-in-ear earphone, the ratio is a second ratio; when the earphone is an open earphone, the ratio is a third ratio; the first ratio is greater than the second ratio, and the second ratio is greater than the third ratio.

15. The method according to any one of claims 1 to 13, characterized in that the first electronic device is connected to a second electronic device, the second electronic device displays the first virtual scene, and the method further comprises:
receiving a third rendered sound sent by the second electronic device; and
playing the third rendered sound.

16. The method according to any one of claims 1 to 15, characterized in that the first virtual scene includes a virtual scene sound, and the method further comprises:
playing the virtual scene sound.

17. The method according to any one of claims 1 to 16, characterized in that the method further comprises:
synchronously recording or live-streaming the first virtual scene and the first rendered sound.

18. An electronic device, characterized by comprising one or more processors and one or more memories, wherein the one or more memories store one or more computer programs, the one or more computer programs include instructions, and when the instructions are executed by the one or more processors, the method according to any one of claims 1 to 17 is executed.

19. A chip, characterized in that the chip comprises a processor and a communication interface, wherein the communication interface is configured to receive a signal and transmit the signal to the processor, and the processor processes the signal so that the method according to any one of claims 1 to 17 is executed.

20. A computer-readable storage medium, characterized in that computer instructions are stored in the computer-readable storage medium, and when the computer instructions are run on a computer, the method according to any one of claims 1 to 17 is executed.
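For readers less familiar with claim language, the flow of claims 1 and 2 can be pictured as a simple rendering pipeline: the virtual scene supplies an acoustic parameter, the captured direct sound is rendered with that parameter, and the result (which now contains the virtual reflected sound) is played back. The sketch below assumes the first acoustic parameter can be summarized as a room impulse response; all names are illustrative and nothing here is the patented implementation.

```python
# Minimal sketch of claims 1-2, assuming the "first acoustic parameter" is a
# room impulse response (RIR) derived from the first virtual scene.
import numpy as np

def render_first_sound(direct_sound: np.ndarray, scene_rir: np.ndarray) -> np.ndarray:
    # Convolving the user's direct sound with the scene RIR produces the
    # first virtual reflected sound (early reflections plus reverberation).
    rendered = np.convolve(direct_sound, scene_rir)
    # Normalize so playback does not clip.
    peak = np.max(np.abs(rendered))
    return rendered / peak if peak > 0.0 else rendered
```

Under the same assumption, the preset condition of claims 3-4 (a scene switch, or the sound source changing position or radiation direction) would simply cause the function to be called with a second impulse response reflecting the new situation.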
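Claims 7-10 describe canceling the reflections that the real room adds to the user's own voice. One way to picture this, again as a hypothetical sketch rather than the patented method, is to render the direct sound through an estimate of the real room's impulse response (the third acoustic parameter, measured with the sensors listed in claim 9) and emit the phase-inverted result:

```python
import numpy as np

def first_noise_reduction(direct_sound: np.ndarray, real_room_rir: np.ndarray) -> np.ndarray:
    # Predict the reflected sound the real room will add to the user's voice,
    # using an estimated real-room impulse response as the third acoustic parameter...
    predicted_reflections = np.convolve(direct_sound, real_room_rir)
    # ...and return it phase-inverted so it cancels acoustically at the ear.
    return -predicted_reflections
```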
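Claims 11-14 tie the mix between the first virtual direct sound and the first virtual reflected sound to the earphone type: the more the earphone blocks the user's natural voice (in-ear more than semi-in-ear, semi-in-ear more than open), the larger the share of virtual direct sound. A hedged sketch of that selection logic follows; the ratio values are invented for illustration, since the claims only fix their ordering.

```python
# Illustrative ratios only; the claims specify the ordering
# (in-ear > semi-in-ear > open), not concrete values.
DIRECT_TO_REFLECTED_RATIO = {
    "in_ear": 2.0,       # first ratio
    "semi_in_ear": 1.0,  # second ratio
    "open": 0.5,         # third ratio
}

def mix_rendered_sound(virtual_direct, virtual_reflected, earphone_type: str):
    # virtual_direct and virtual_reflected are assumed to be equal-length
    # sample arrays; the detected earphone type selects their mix ratio.
    ratio = DIRECT_TO_REFLECTED_RATIO.get(earphone_type, 1.0)
    return ratio * virtual_direct + virtual_reflected
```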
CN202311099784.3A 2023-08-29 2023-08-29 Sound rendering method and electronic equipment Pending CN119541512A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311099784.3A CN119541512A (en) 2023-08-29 2023-08-29 Sound rendering method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311099784.3A CN119541512A (en) 2023-08-29 2023-08-29 Sound rendering method and electronic equipment

Publications (1)

Publication Number Publication Date
CN119541512A true CN119541512A (en) 2025-02-28

Family

ID=94707674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311099784.3A Pending CN119541512A (en) 2023-08-29 2023-08-29 Sound rendering method and electronic equipment

Country Status (1)

Country Link
CN (1) CN119541512A (en)

Similar Documents

Publication Publication Date Title
CN110381197B (en) Method, device and system for processing audio data in many-to-one screen projection
CN113873378B (en) Earphone noise processing method and device and earphone
CN111294438B (en) Method and terminal for realizing stereo output
CN113873379B (en) Mode control method and device and terminal equipment
CN110290262B (en) Call method and terminal equipment
CN114727212B (en) Audio processing method and electronic equipment
CN116887015A (en) Audio processing methods and electronic equipment
CN117133306B (en) Stereo noise reduction method, device and storage medium
CN115914517A (en) A sound signal processing method and electronic device
CN115407962A (en) Audio shunting method and electronic equipment
WO2024027259A1 (en) Signal processing method and apparatus, and device control method and apparatus
CN111065020B (en) Method and device for processing audio data
CN114640747A (en) Call method, related device and system
CN116347320A (en) Audio playing method and electronic equipment
CN114449393B (en) Sound enhancement method, earphone control method, device and earphone
CN119541512A (en) Sound rendering method and electronic equipment
WO2024051638A1 (en) Sound-field calibration method, and electronic device and system
CN118466894B (en) Sound effect control method and device and electronic equipment
CN116233696B (en) Airflow noise suppression method, audio module, sound generating device and storage medium
WO2024159885A1 (en) Spatial audio rendering method and apparatus
EP4422215A1 (en) Audio playing method and related apparatus
CN119136138A (en) Audio processing method and related device
WO2024046182A1 (en) Audio playback method and system, and related apparatus
CN119541517A (en) Method and electronic device for processing sound signal
CN116962919A (en) Sound pickup method, sound pickup system and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination