CN116391226A - Psychoacoustic analysis method, device, equipment and storage medium - Google Patents
- Publication number
- CN116391226A (application number CN202380008348.2A)
- Authority
- CN
- China
- Prior art keywords
- masking
- source
- tone
- masking source
- sources
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/018—Audio watermarking, i.e. embedding inaudible data in the audio signal
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Abstract
The embodiments of the present disclosure disclose a psychoacoustic analysis method, apparatus, device and storage medium, applicable to a communication system. The method comprises: determining a plurality of masking sources of an audio signal; and analyzing a masking threshold of the audio signal based on some of the plurality of masking sources. Only some of all the masking sources of the audio signal are selected to participate in analyzing and calculating the masking threshold. The method therefore effectively reduces the computation of psychoacoustic analysis and thereby its computational complexity.
Description
Technical Field
The present disclosure relates to the field of communication technology, and in particular to a psychoacoustic analysis method, apparatus, device and storage medium.
Background
A psychoacoustic model can be applied in audio encoding and decoding to help remove redundancy below the auditory threshold of an audio signal. This reduces signal content that is irrelevant to auditory perception, improves the subjective quality of audio coding, and reduces the quantization bit rate and quantization noise. A psychoacoustic model can also be applied in audio digital watermarking. There it hides a watermark in an audio signal so that the human ear cannot perceive it, and the hidden information is recovered at the decoding end. A psychoacoustic model can further be applied in sound quality evaluation: sound quality is evaluated objectively with the model, which provides a basis for improving it.
In the related art, psychoacoustic analysis processes an input audio signal to extract all the tonal and non-tonal masking sources it contains. It then analyzes the masking threshold of the audio signal based on all of the extracted masking sources.
Disclosure of Invention
The embodiments of the present disclosure provide a psychoacoustic analysis method, apparatus, device, chip system, storage medium, computer program and computer program product. They are applicable to the field of communication technology and can effectively reduce the computation of psychoacoustic analysis and thereby its computational complexity.
In a first aspect, embodiments of the present disclosure provide a psychoacoustic analysis method, the method comprising: determining a plurality of masking sources of an audio signal; and analyzing a masking threshold of the audio signal based on some of the plurality of masking sources.
In a second aspect, embodiments of the present disclosure provide a communication device having the function of implementing some or all of the method of the first aspect. For example, the communication device may have the functions of some or all of the embodiments of the present disclosure, or the function of independently implementing any one embodiment of the present disclosure. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more units or modules corresponding to the above functions.
Optionally, in an embodiment of the disclosure, the communication device may include a transceiver module and a processing module. The processing module is configured to support the communication device in performing the corresponding functions of the method described above. The transceiver module is configured to support communication between the communication device and other devices. The communication device may further include a storage module, coupled to the transceiver module and the processing module, which stores the computer programs and data necessary for the communication device.
As an example, the processing module may be a processor, the transceiver module may be a transceiver or a communication interface, and the storage module may be a memory.
In a third aspect, embodiments of the present disclosure provide a communication device including a processor which, when invoking a computer program stored in a memory, performs the psychoacoustic analysis method of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a communication device including a processor and a memory storing a computer program. The processor executes the computer program stored in the memory so that the communication device performs the psychoacoustic analysis method of the first aspect.
In a fifth aspect, embodiments of the present disclosure provide a communication device including a processor and an interface circuit. The interface circuit is configured to receive code instructions and transmit them to the processor. The processor is configured to execute the code instructions so that the device performs the psychoacoustic analysis method of the first aspect.
In a sixth aspect, embodiments of the present disclosure provide a communication system including the communication device according to any one of the second to fifth aspects.
In a seventh aspect, embodiments of the present disclosure provide a computer-readable storage medium storing instructions for use by a terminal device. When executed, the instructions cause the terminal device to perform the psychoacoustic analysis method of the first aspect.
In an eighth aspect, embodiments of the present disclosure provide a readable storage medium storing instructions for use by a network device. When executed, the instructions cause the network device to perform the psychoacoustic analysis method of the first aspect.
In a ninth aspect, the present disclosure further provides a computer program product including a computer program which, when run on a computer, causes the computer to perform the psychoacoustic analysis method of the first aspect.
In a tenth aspect, the present disclosure provides a chip system including at least one processor and an interface. The chip system supports a terminal device and/or a network device in implementing the functions of the first aspect, for example, determining or processing at least one of the data and information involved in the above method.
In one possible design, the chip system further includes a memory for storing the computer programs and data necessary for the terminal device. The chip system may consist of chips, or may include chips and other discrete devices.
In an eleventh aspect, the present disclosure provides a computer program which, when run on a computer, causes the computer to perform the psychoacoustic analysis method of the first aspect described above.
In summary, the psychoacoustic analysis method, apparatus, device, chip system, storage medium, computer program and computer program product provided in the embodiments of the present disclosure may achieve the following technical effects:
A plurality of masking sources of the audio signal is determined, and the masking threshold of the audio signal is then analyzed based on only some of them. Because only a subset of all the masking sources participates in analyzing and calculating the masking threshold, the computation of psychoacoustic analysis is effectively reduced, and with it the computational complexity.
Drawings
To describe the technical solutions in the embodiments or the background of the present disclosure more clearly, the drawings used in the embodiments or the background are described below.
FIG. 1 is a schematic architecture diagram of a communication system according to an embodiment of the disclosure;
FIG. 2 is a flow chart of a psychoacoustic analysis method provided by an embodiment of the present disclosure;
FIG. 3 is a flow chart of another psychoacoustic analysis method provided by an embodiment of the present disclosure;
FIG. 4 is a flow chart of another psychoacoustic analysis method provided by an embodiment of the present disclosure;
FIG. 5 is a flow chart of another psychoacoustic analysis method provided by an embodiment of the present disclosure;
FIG. 6 is a flow chart of another psychoacoustic analysis method provided by an embodiment of the present disclosure;
FIG. 7a is a flow chart of yet another psychoacoustic analysis method provided by an embodiment of the present disclosure;
FIG. 7b is a flow chart of yet another psychoacoustic analysis method provided by an embodiment of the present disclosure;
FIG. 7c is a flow chart of yet another psychoacoustic analysis method provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a masking spread function in an embodiment of the present disclosure;
FIG. 9 is a schematic architecture diagram of a psychoacoustic analysis method according to an embodiment of the present disclosure;
FIG. 10a is a graphical representation of experimental statistics when cross masking is turned on in an embodiment of the present disclosure;
FIG. 10b is a graphical representation of experimental statistics when cross masking is turned off in an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of a communication device according to an embodiment of the disclosure;
FIG. 12 is a schematic structural diagram of another communication device provided in an embodiment of the present disclosure;
FIG. 13 is a schematic structural diagram of a chip according to an embodiment of the disclosure.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the embodiments of the present disclosure. Rather, they are merely examples of apparatus and methods consistent with aspects of embodiments of the present disclosure as detailed in the accompanying claims.
The terminology used in the embodiments of the disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the disclosure. As used in this disclosure of embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present disclosure to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the embodiments of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
For ease of understanding, the terms used in this disclosure are introduced first.
1. Psychoacoustic model.
A psychoacoustic model is a mathematical model of the statistical properties of human auditory perception. It explains the physiological principles underlying various auditory sensations.
2. Masking.
Masking in audiology means that the threshold of perception of one sound by the human ear is raised by the presence of another sound.
3. Masking source.
A masking source is a sound component that produces a masking effect. A tonal masking source is a masking source whose masking effect is produced by a tonal component. A non-tonal masking source is one whose masking effect is produced by a non-tonal component (e.g., noise).
To better understand the psychoacoustic analysis method disclosed in the embodiments of the present disclosure, a communication system to which the embodiments apply is described first.
Referring to FIG. 1, FIG. 1 is a schematic architecture diagram of a communication system according to an embodiment of the disclosure. The communication system may include, but is not limited to, one network device and one terminal device. The number and form of the devices shown in FIG. 1 are examples only and do not limit the embodiments of the present disclosure. In practical applications, the system may include two or more network devices and two or more terminal devices. The communication system shown in FIG. 1 includes, by way of example, a network device 101 and a terminal device 102.
It should be noted that the technical solutions of the embodiments of the present disclosure may be applied to various communication systems. Examples include a long term evolution (LTE) system, a fifth generation (5G) mobile communication system, a 5G new radio (NR) system, or other future mobile communication systems.
The network device 101 in the embodiments of the present disclosure is a network-side entity for transmitting or receiving signals. For example, the network device 101 may be an evolved NodeB (eNB), a transmission reception point (TRP), or a next generation NodeB (gNB) in an NR system. It may also be a base station in a private network system or in other future mobile communication systems, or an access node in a wireless fidelity (WiFi) system. The embodiments of the present disclosure do not limit the specific technology or device form adopted by the network device.
The network device provided by the embodiments of the present disclosure may consist of a central unit (CU) and a distributed unit (DU), where the CU may also be called a control unit. The CU-DU structure splits the protocol layers of a network device such as a base station. The functions of some protocol layers are placed in the CU for centralized control, and the functions of some or all of the remaining protocol layers are distributed in the DU, with the CU centrally controlling the DU.
The terminal device 102 in the embodiments of the present disclosure is a user-side entity for receiving or transmitting signals, such as a mobile phone. The terminal device may also be referred to as a terminal, user equipment (UE), mobile station (MS), mobile terminal (MT), etc. Examples of terminal devices include:
- a vehicle with a communication function, a smart car, a mobile phone, a wearable device, a tablet computer (Pad), or a computer with wireless transceiving functions;
- a virtual reality (VR) terminal device or an augmented reality (AR) terminal device;
- a wireless terminal device in industrial control, self-driving, remote medical surgery, a smart grid, transportation safety, a smart city, or a smart home, or the like.
The embodiment of the present disclosure does not limit the specific technology and the specific device configuration adopted by the terminal device.
It can be understood that the communication system described in the embodiments of the present disclosure is intended to describe the technical solutions of the embodiments more clearly and does not limit them. Those skilled in the art will appreciate that, as system architectures evolve and new service scenarios emerge, the technical solutions provided in the embodiments are equally applicable to similar technical problems.
The psychoacoustic analysis method in the embodiments of the present disclosure may be applied to the above network device, to the terminal device, or to both. It may also be applied to any other system capable of psychoacoustic analysis of an audio signal, for example a streaming media transmission system or an over-the-top (OTT) system, i.e., a media transmission system that provides various application services to users over the Internet. This is not limited.
In the related art, psychoacoustic analysis processes an input audio signal to extract all the tonal and non-tonal masking sources it contains. It then analyzes the masking threshold of the audio signal based on all of the extracted masking sources. In this approach, the computation scales with the number of masking sources participating in the calculation: the more masking sources, the greater the computation and the higher the complexity. Therefore, in the embodiments of the present disclosure, a plurality of masking sources of the audio signal is determined, and the masking threshold is then analyzed based on only some of them. Because only a subset of all the masking sources participates in calculating the masking threshold, the computation of psychoacoustic analysis is effectively reduced, and the computational complexity is reduced accordingly.
It should be noted that the psychoacoustic analysis method provided in any embodiment of the present disclosure may be performed alone. It may also be performed in combination with possible implementations in other embodiments, or with any technical solution in the related art.
The psychoacoustic analysis method and apparatus provided by the present disclosure are described in detail below with reference to the accompanying drawings. FIG. 2 is a flow chart of a psychoacoustic analysis method according to an embodiment of the present disclosure. The method is performed by a communication system, although it is not limited thereto.
As shown in fig. 2, the method may include, but is not limited to, the steps of:
S201: a plurality of masking sources of the audio signal are determined.
In some embodiments, the audio signal may be input to a communication device, which performs psychoacoustic analysis on it to determine a plurality of masking sources of the audio signal; this is not limited.
In some embodiments, all the masking sources contained in the audio signal may be determined first, and selection processing may then be applied to them; this is not limited.
In some embodiments, the masking sources include at least one of a tonal masking source and a non-tonal masking source. For example, there may be one or more tonal masking sources and one or more non-tonal masking sources. Non-tonal masking sources may also be referred to as noise masking sources. This is not limited.
In some embodiments, the power spectrum and sound pressure level of the input audio signal may be calculated first. The tonal and non-tonal components contained in the audio signal are then identified from the results. Tonal and/or non-tonal masking sources are extracted from these components, thereby determining the plurality of masking sources contained in the audio signal. This is not limited.
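The extraction step above can be sketched as follows, assuming a frame-based analysis in the style of MPEG-1 psychoacoustic model 1; the disclosure does not fix these details. Several elements are illustrative assumptions:

- the names `Masker`, `power_spectrum_db`, `find_tonal_maskers`, `find_non_tonal_maskers` and `determine_maskers`;
- the 96 dB normalisation;
- the 7 dB prominence test;
- the use of the integer Bark value as the critical-band index.

```python
import cmath
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Masker:
    kind: str        # "tonal" or "non-tonal"
    bin_index: int   # spectral bin at which the masker is placed
    level_db: float  # estimated sound pressure level in dB


def hz_to_bark(f_hz: float) -> float:
    """Zwicker-Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def power_spectrum_db(frame: Sequence[float]) -> List[float]:
    """Hann-windowed power spectrum in dB (naive DFT for clarity; use an FFT in practice)."""
    n = len(frame)
    windowed = [x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / n)) for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append(10.0 * math.log10(abs(acc) ** 2 + 1e-12))
    # Assumed normalisation: map the loudest bin to 96 dB SPL.
    peak = max(spectrum)
    return [p - peak + 96.0 for p in spectrum]


def find_tonal_maskers(spl: List[float], reach: int = 2, prominence_db: float = 7.0) -> List[Masker]:
    """A local maximum exceeding the bins `reach` away by `prominence_db` counts as tonal."""
    tonal = []
    for k in range(reach, len(spl) - reach):
        is_peak = spl[k] > spl[k - 1] and spl[k] >= spl[k + 1]
        if is_peak and all(spl[k] - spl[k + d] >= prominence_db for d in (-reach, reach)):
            tonal.append(Masker("tonal", k, spl[k]))
    return tonal


def find_non_tonal_maskers(spl: List[float], tonal: List[Masker], sample_rate: float,
                           n_fft: int, reach: int = 2) -> List[Masker]:
    """Power-sum the bins not claimed by tonal maskers into one masker per critical band."""
    excluded = {t.bin_index + d for t in tonal for d in range(-reach, reach + 1)}
    bands: Dict[int, List[int]] = {}
    for k in range(1, len(spl)):  # skip the DC bin
        if k not in excluded:
            bands.setdefault(int(hz_to_bark(k * sample_rate / n_fft)), []).append(k)
    maskers = []
    for _, bins in sorted(bands.items()):
        total = sum(10.0 ** (spl[k] / 10.0) for k in bins)
        maskers.append(Masker("non-tonal", bins[len(bins) // 2], 10.0 * math.log10(total)))
    return maskers


def determine_maskers(frame: Sequence[float], sample_rate: float) -> List[Masker]:
    """Step S201 sketch: spectrum -> SPL -> tonal and non-tonal masking sources."""
    spl = power_spectrum_db(frame)
    tonal = find_tonal_maskers(spl)
    return tonal + find_non_tonal_maskers(spl, tonal, sample_rate, len(frame))
```

A real implementation would use an FFT and frequency-dependent neighbourhoods for the tonal test. The naive DFT is kept only so the sketch needs nothing beyond the standard library.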
S202: the masking threshold of the audio signal is analyzed based on a portion of the masking sources of the plurality of masking sources.
The partial masking sources are masking sources selected from the plurality of masking sources, and only the selected masking sources participate in calculating the masking threshold. One or more tonal masking sources and/or one or more non-tonal masking sources may be selected from the plurality of masking sources. The selected partial masking sources may therefore be tonal masking sources, non-tonal masking sources, or both. This is not limited.
In some embodiments, masking sources may be extracted from the audio signal one at a time. As each masking source is extracted, it is determined whether to select it as one of the partial masking sources. Alternatively, all masking sources may be extracted from the audio signal first, and the partial masking sources selected from them after extraction completes. The partial masking sources may of course be determined in any other possible manner; this is not limited.
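The two extraction orders described above can be contrasted in a short sketch. The tuple layout `(kind, bark_position, level_db)`, the function names and the `keep` predicate are hypothetical; the disclosure does not define a data format or a selection rule at this point.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical masker layout: (kind, bark_position, level_db).
Masker = Tuple[str, float, float]


def select_while_extracting(extracted: Iterable[Masker],
                            keep: Callable[[Masker], bool]) -> List[Masker]:
    """Decide for each masker as soon as the extractor yields it."""
    selected: List[Masker] = []
    for masker in extracted:
        if keep(masker):
            selected.append(masker)
    return selected


def select_after_extracting(extracted: Iterable[Masker],
                            keep: Callable[[Masker], bool]) -> List[Masker]:
    """Finish extraction first, then choose the partial masking sources."""
    all_maskers = list(extracted)
    return [m for m in all_maskers if keep(m)]
```

The two functions return the same subset only when `keep` looks at one masker in isolation. Some rules compare maskers with each other, for example a rule based on the frequency distance between a tonal and a non-tonal masking source in the same critical band. Such a rule needs at least all the maskers of a band before deciding, which favours the extract-then-select order.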
After the plurality of masking sources of the audio signal is determined as described above, partial masking sources may be selected from them. The masking threshold of the audio signal is then analyzed based on the selected partial masking sources; this is not limited.
In some embodiments of the present disclosure, the partial masking sources may be determined from the plurality of masking sources so that the masking threshold of the audio signal can be analyzed based on them. For example, they may be selected according to a set selection policy, or based on an artificial intelligence approach. They may of course be determined in any other possible manner; this is not limited.
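As one concrete example of a "set selection policy", the sketch below keeps only maskers whose level reaches the absolute threshold of hearing. This is similar in spirit to the masker decimation in MPEG-1 psychoacoustic model 1. The policy, the threshold-in-quiet approximation it relies on, the tuple layout and the function names are assumptions for illustration. They are not the policy claimed in the disclosure.

```python
import math
from typing import List, Tuple

Masker = Tuple[str, float, float]  # hypothetical layout: (kind, frequency_hz, level_db)


def threshold_in_quiet_db(f_hz: float) -> float:
    """Terhardt's approximation of the absolute threshold of hearing, in dB SPL."""
    khz = f_hz / 1000.0
    return 3.64 * khz ** -0.8 - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2) + 1e-3 * khz ** 4


def select_audible_maskers(maskers: List[Masker]) -> List[Masker]:
    """Example set policy: keep only maskers at or above the threshold in quiet."""
    return [m for m in maskers if m[2] >= threshold_in_quiet_db(m[1])]
```

For instance, a 20 dB masker at 100 Hz falls below the threshold in quiet (about 23 dB there) and is discarded, while the same level at 1 kHz is kept.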
In this embodiment, a plurality of masking sources of the audio signal is determined, and the masking threshold of the audio signal is then analyzed based on only some of them. Because only a subset of all the masking sources participates in analyzing and calculating the masking threshold, the computation of psychoacoustic analysis is effectively reduced, and with it the computational complexity.
FIG. 3 is a flow chart of another psychoacoustic analysis method provided by an embodiment of the present disclosure. The method is performed by a communication system, although it is not limited thereto.
As shown in fig. 3, the method may include, but is not limited to, the steps of:
S301: determining a plurality of masking sources of the audio signal, wherein the masking sources include a tonal masking source and a non-tonal masking source.
That is, in some embodiments of the present disclosure, the masking sources may include one or more tonal masking sources and one or more non-tonal masking sources, and the partial masking sources may be selected from among them; this is not limited.
S302: if a tone masking source is included within the critical band to which the non-tone masking source belongs, a frequency distance between the tone masking source and the non-tone masking source is obtained.
It will be appreciated that the audio signal may span a plurality of critical bands, and different masking sources may fall in the same or different critical bands. In some embodiments, the partial masking sources may therefore be selected according to the critical bands in which the tonal and non-tonal masking sources fall.
In some embodiments, it may be determined for each critical band whether the band contains a tonal masking source. If it does, that tonal masking source may serve as the reference for selecting among the tonal and non-tonal masking sources in the band. This is not limited.
In some embodiments, there may be multiple critical bands. The tonal and non-tonal masking sources in each band may be selected according to whether that band contains a tonal masking source; this is not limited.
In some embodiments, if the critical band to which a non-tonal masking source belongs contains a tonal masking source, the frequency distance between the tonal and non-tonal masking sources is obtained. The tonal and non-tonal masking sources in that critical band are then selected according to the frequency distance; this is not limited.
Here, the frequency distance is the distance along the frequency axis between the tonal masking source and the non-tonal masking source. Its unit may be the Bark, the unit of the Bark scale used in critical-band theory.
That is, it may first be determined whether the critical band to which a non-tonal masking source belongs contains a tonal masking source. If it does, the frequency distance between the tonal and non-tonal masking sources is obtained and used for masking source selection.
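The following is a minimal sketch of the critical-band check. It assumes the Zwicker-Terhardt Bark approximation and treats the integer part of the Bark value as the critical-band index; both choices are assumptions, and the function names are hypothetical.

```python
import math
from typing import List


def hz_to_bark(f_hz: float) -> float:
    """Zwicker-Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def critical_band(f_hz: float) -> int:
    """Assumption: band index = integer part of the Bark value (1-Bark-wide bands)."""
    return int(hz_to_bark(f_hz))


def non_tonal_sharing_band_with_tone(tonal_hz: List[float], non_tonal_hz: List[float]) -> List[float]:
    """Return the non-tonal maskers whose critical band also contains a tonal masker."""
    bands_with_tone = {critical_band(f) for f in tonal_hz}
    return [f for f in non_tonal_hz if critical_band(f) in bands_with_tone]
```

Only the non-tonal maskers returned here would go on to the frequency-distance computation.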
In some embodiments, the frequency distance between the tonal and non-tonal masking sources may be obtained as follows. The frequency-domain position of each masking source within the critical band is determined, and the difference between the two positions is taken as the frequency distance; this is not limited.
In other embodiments, the frequency distance between the tonal and non-tonal masking sources may be measured in Bark, the unit used in critical-band theory; this is not limited.
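The position-difference computation described above might look like this, again assuming the Zwicker-Terhardt Bark formula and 1-Bark-wide critical bands. Taking the absolute difference is an assumption. A signed distance would also record whether the non-tonal masking source lies above or below the tonal one. All names are hypothetical.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Zwicker-Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def position_in_band(f_hz: float) -> float:
    """Offset in Bark from the lower edge of the (assumed 1-Bark-wide) critical band."""
    z = hz_to_bark(f_hz)
    return z - math.floor(z)


def frequency_distance_bark(tonal_hz: float, non_tonal_hz: float) -> float:
    """Difference of the two in-band positions; both maskers must share a band."""
    if math.floor(hz_to_bark(tonal_hz)) != math.floor(hz_to_bark(non_tonal_hz)):
        raise ValueError("maskers lie in different critical bands")
    return abs(position_in_band(tonal_hz) - position_in_band(non_tonal_hz))
```

Because both maskers share a band, the in-band difference equals the difference of their absolute Bark values.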
S303: from the frequency distance, a tonal masking source and/or a non-tonal masking source are determined to be partial masking sources.
After the frequency distance between the tonal masking source and the non-tonal masking source is obtained as described above, the tonal masking source and/or the non-tonal masking source may be determined to be partial masking sources based on the frequency distance.
In some embodiments, whether to select the tonal masking source and/or the non-tonal masking source may be determined from the frequency distance according to a set rule. Alternatively, a masking source selection model may be used to make this determination. Any other possible manner may of course also be used, which is not limited.
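One possible "set rule" is sketched below. If the tonal and non-tonal masking sources are closer than a minimum distance, only the louder one is kept. The assumed rationale is that the weaker one adds little to the threshold already set by the stronger one. The 0.5 Bark threshold, the tie-breaking toward the tonal masker, the tuple layout and the function name are hypothetical; the disclosure leaves the rule open.

```python
from typing import List, Tuple

Masker = Tuple[str, float]  # hypothetical layout: (label, level_db)

MIN_DISTANCE_BARK = 0.5  # hypothetical rule parameter, not from the disclosure


def select_by_distance(tonal: Masker, non_tonal: Masker, distance_bark: float,
                       min_distance: float = MIN_DISTANCE_BARK) -> List[Masker]:
    """Keep both maskers when they are far apart; otherwise keep only the louder one."""
    if distance_bark >= min_distance:
        return [tonal, non_tonal]
    return [tonal] if tonal[1] >= non_tonal[1] else [non_tonal]
```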
In other embodiments, in determining whether to select a tone masking source and/or a non-tone masking source, it may be, for example, determined whether to select a tone masking source as a partial masking source; alternatively, determining whether a non-tonal masking source is selected as a partial masking source; alternatively, it may be determined whether or not a tone masking source and a non-tone masking source are selected as partial masking sources, which is not limited.
In other embodiments, if the audio signal contains multiple critical bands, the above-mentioned processing may be performed on each critical band containing a tone masking source, that is, for each critical band containing a tone masking source, whether to select a tone masking source and/or a non-tone masking source as a partial masking source is determined based on the frequency distance between the tone masking source and the non-tone masking source contained therein, which is not limited.
In other embodiments, a critical band may contain no tone masking source or several tone masking sources, while it generally contains a non-tone masking source. If a critical band contains several tone masking sources, the frequency distance between each tone masking source and the non-tone masking source in that band may be calculated, and whether to select each tone masking source and the non-tone masking source is decided accordingly, which is not limited.
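A minimal sketch of iterating over critical bands that may hold several tone masking sources is shown below. The `Masker` type and the `band_of()` rule (band i spanning [i, i + 1) Bark) are hypothetical illustrations, not the band layout of the disclosure.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Masker:
    bark: float    # position on the Bark scale
    spl_db: float  # sound pressure level in dB
    tonal: bool    # True: tone masking source, False: non-tone masking source


def band_of(m: Masker) -> int:
    # Hypothetical band rule: band i spans [i, i + 1) Bark.
    return int(m.bark)


def tone_noise_pairs(maskers: Iterable[Masker]) -> Iterator[Tuple[int, Masker, Masker]]:
    """Yield (band, tone, non_tone) for every pairing inside bands that hold both kinds."""
    bands = defaultdict(list)
    for m in maskers:
        bands[band_of(m)].append(m)
    for band in sorted(bands):
        tones = [m for m in bands[band] if m.tonal]
        non_tones = [m for m in bands[band] if not m.tonal]
        for non_tone in non_tones:
            for tone in tones:
                yield band, tone, non_tone


maskers = [
    Masker(3.2, 60.0, True), Masker(3.5, 50.0, False),  # band 3: one tone
    Masker(5.1, 40.0, False),                           # band 5: no tone
    Masker(7.0, 55.0, True), Masker(7.4, 45.0, True),   # band 7: two tones
    Masker(7.8, 30.0, False),
]
for band, tone, non_tone in tone_noise_pairs(maskers):
    print(band, tone.bark, non_tone.bark)
```

Band 5 produces no pair because it has no tone masking source, and band 7 produces one pair per tone masking source.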
S304: the masking threshold of the audio signal is analyzed based on a portion of the masking sources of the plurality of masking sources.
After the tone masking source and/or the non-tone masking source are determined to be partial masking sources, the masking threshold of the audio signal may be analyzed based on the partial masking source of the plurality of masking sources.
In this embodiment, since a part of masking sources is selected from all masking sources of the audio signal to participate in analysis and calculation of the masking threshold, the calculation amount of psychoacoustic analysis can be effectively reduced, and the calculation complexity can be further reduced. If the masking source comprises a tone masking source and a non-tone masking source, in the case that the tone masking source is contained in a critical frequency band to which the non-tone masking source belongs, a frequency distance between the tone masking source and the non-tone masking source is acquired, and the tone masking source and/or the non-tone masking source are determined to be partial masking sources according to the frequency distance, so that the partial masking sources can be quickly and accurately selected from a plurality of masking sources, and the accuracy of psychoacoustic analysis can be effectively ensured based on the selected partial masking sources.
Fig. 4 is a flow chart of another psychoacoustic analysis method provided by an embodiment of the present disclosure, which is performed by a communication system. The psychoacoustic analysis method in the present embodiment may be applied to a communication system, and is not limited thereto.
As shown in fig. 4, the method may include, but is not limited to, the steps of:
S401: determining a plurality of masking sources for the audio signal, wherein the masking sources comprise: a non-tonal masking source.
In some embodiments, the plurality of masking sources included in the audio signal may each be a non-tonal masking source, and different non-tonal masking sources may belong to the same or different critical bands.
S402: if the tone masking source is not included in the critical frequency band to which the non-tone masking source belongs, the non-tone masking source is determined to be a partial masking source.
In other embodiments, if the audio signal includes a plurality of non-tone masking sources, it may be analyzed whether a tone masking source is included in a critical frequency band to which each non-tone masking source belongs, if a tone masking source is not included, the non-tone masking source is directly determined to be a partial masking source, and if a tone masking source is included, whether the non-tone masking source is selected as a partial masking source is determined based on the method steps in the embodiment shown in fig. 3 described above, which is not limited.
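The S402 branch can be sketched as a simple partition of the non-tone masking sources. The `Masker` type with an explicit `band` index is a hypothetical illustration; maskers returned in the second list would go on to the distance-based check of the Fig. 3 embodiment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Masker:
    band: int      # index of the critical band the masker belongs to
    spl_db: float  # sound pressure level in dB
    tonal: bool    # True: tone masking source, False: non-tone masking source


def split_non_tone(maskers: List[Masker]) -> Tuple[List[Masker], List[Masker]]:
    """Return (kept_directly, needs_distance_check) for the non-tone maskers."""
    tone_bands = {m.band for m in maskers if m.tonal}
    kept_directly: List[Masker] = []
    needs_check: List[Masker] = []
    for m in maskers:
        if m.tonal:
            continue
        # S402: if no tone masker shares this band, keep the non-tone masker directly.
        (needs_check if m.band in tone_bands else kept_directly).append(m)
    return kept_directly, needs_check


maskers = [Masker(2, 50.0, False), Masker(4, 60.0, True),
           Masker(4, 45.0, False), Masker(6, 40.0, False)]
kept, to_check = split_non_tone(maskers)
print([m.band for m in kept], [m.band for m in to_check])
```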
S403: the masking threshold of the audio signal is analyzed based on a portion of the masking sources of the plurality of masking sources.
In some embodiments, where the tone masking source is not included within the critical frequency band to which the non-tone masking source belongs, after determining that the non-tone masking source is a partial masking source, the masking threshold of the audio signal may be analyzed based on the partial masking source.
In this embodiment, since only some of the masking sources of the audio signal are selected to participate in the analysis and calculation of the masking threshold, the computational load of psychoacoustic analysis is effectively reduced, which further lowers computational complexity. When the critical band to which the non-tone masking source belongs contains no tone masking source, the non-tone masking source is determined to be a partial masking source. This makes the selection of partial masking sources more flexible, accommodates differing distributions of masking sources, and suits psychoacoustic analysis of individual audio signals.
Fig. 5 is a flow chart of another psychoacoustic analysis method provided by an embodiment of the present disclosure, which is performed by a communication system. The psychoacoustic analysis method in the present embodiment may be applied to a communication system, and is not limited thereto.
As shown in fig. 5, the method may include, but is not limited to, the steps of:
S501: determining a plurality of masking sources for the audio signal, wherein the masking sources comprise: a tonal masking source and a non-tonal masking source.
S502: if a tone masking source is included within the critical band to which the non-tone masking source belongs, a frequency distance between the tone masking source and the non-tone masking source is obtained.
S503: a distance threshold is obtained.
The distance threshold is the frequency-distance value used to decide whether the tone masking source and the non-tone masking source are selected as partial masking sources. Its unit may be the Bark.
In some embodiments of the present disclosure, a preset distance threshold may be obtained, which makes obtaining the distance threshold efficient.
For example, in the embodiment of the present disclosure, the distance threshold may be set to a fixed value of 0.5 Bark, which is not limited.
In other embodiments of the present disclosure, a distance threshold corresponding to the critical frequency band may also be obtained, so that the distance threshold can be effectively adapted to the critical frequency band, and the selection accuracy of the tone masking source and the non-tone masking source is improved.
Here, the critical band in question is the one in which partial masking sources are currently being selected, for example, the critical band to which the non-tone masking source in step S502 belongs, which is not limited.
For example, a corresponding distance threshold may be set for each critical band, without limitation.
In other embodiments of the present disclosure, the distance threshold may be further determined according to the frequency at which the critical frequency band is located, so as to improve flexibility of obtaining the distance threshold, and effectively adapt to psychoacoustic analysis of the personalized audio signal.
For example, in the embodiment of the present disclosure, the distance threshold may be set to 0.3 Bark for the first (lowest) 15 critical bands and to 0.6 Bark for the last (highest) 10 critical bands. That is, the distance threshold depends on frequency: the higher the frequency of a critical band, the larger its threshold.
For example, the human ear perceives sounds from 20 hertz (Hz) to 20 kilohertz (kHz) and is most sensitive to sounds between 1 kHz and 3 kHz. A smaller distance threshold in the low-frequency critical bands therefore retains more masking source components, while a larger distance threshold in the high-frequency critical bands removes more of them, which better matches subjective listening perception.
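Both threshold policies above (a fixed 0.5 Bark and a frequency-dependent 0.3/0.6 Bark split) can be sketched in one hypothetical helper. It assumes 25 critical bands indexed 0 to 24, matching the 15 + 10 split of the example; the function name and `mode` parameter are illustrative only.

```python
def distance_threshold(band_index: int, mode: str = "by_band") -> float:
    """Return a Bark distance threshold for a 0-based critical-band index.

    mode="fixed"   -> one value for every band (the 0.5 Bark example)
    mode="by_band" -> 0.3 Bark for bands 0-14, 0.6 Bark for bands 15-24
    """
    if mode == "fixed":
        return 0.5
    if mode != "by_band":
        raise ValueError(f"unknown mode: {mode!r}")
    if not 0 <= band_index < 25:
        raise ValueError("expected a 0-based index into 25 critical bands")
    # Low-frequency bands get the smaller threshold so that more maskers survive there.
    return 0.3 if band_index < 15 else 0.6


print([distance_threshold(i) for i in (0, 14, 15, 24)])  # [0.3, 0.3, 0.6, 0.6]
```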
S504: if the frequency distance is greater than or equal to the distance threshold, the tonal masking source and the non-tonal masking source are determined to be partial masking sources.
After the frequency distance between the tone masking source and the non-tone masking source and the distance threshold are obtained, the tone masking source and the non-tone masking source may be determined to be partial masking sources if the frequency distance is greater than or equal to the distance threshold. That is, in a critical band containing a tone masking source, it is checked whether the frequency distance between the position (on the Bark scale) of the tone masking source and that of the band's noise masking source (the non-tone masking source) is less than the set distance threshold. If the frequency distance is greater than or equal to the distance threshold, both the tone masking source and the non-tone masking source may be selected directly as partial masking sources, i.e., both participate in the calculation of the masking threshold, which is not limited.
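The S504 rule reduces to one comparison. In this hypothetical sketch the distance may be signed, so its magnitude is compared with the threshold (an assumption in line with the ±0.5 Bark wording used later); `None` signals that the sound pressure levels must decide, as in the Fig. 6 and Fig. 7a to 7c embodiments.

```python
from typing import Optional, Set


def select_by_distance(distance_bark: float, threshold_bark: float) -> Optional[Set[str]]:
    """S504 sketch.

    Returns {"tone", "non_tone"} when the two maskers are at least the threshold
    apart (both join the masking-threshold calculation); returns None when they
    are closer, meaning the sound pressure levels must decide.
    """
    if abs(distance_bark) >= threshold_bark:  # signed distances compared by magnitude
        return {"tone", "non_tone"}
    return None


print(select_by_distance(0.62, 0.5))  # far apart: both kept
print(select_by_distance(-0.2, 0.5))  # None: fall through to the level comparison
```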
S505: the masking threshold of the audio signal is analyzed based on a portion of the masking sources of the plurality of masking sources.
In this embodiment, since a part of masking sources is selected from all masking sources of the audio signal to participate in analysis and calculation of the masking threshold, the calculation amount of psychoacoustic analysis can be effectively reduced, and the calculation complexity can be further reduced. Under the condition that the critical frequency band to which the non-tone masking source belongs contains the tone masking source, the frequency distance between the tone masking source and the non-tone masking source is acquired, the distance threshold is acquired, and under the condition that the frequency distance is larger than or equal to the distance threshold, the tone masking source and the non-tone masking source are determined to be part masking sources, so that the part masking sources can be accurately selected from a plurality of masking sources, and the accuracy of psychoacoustic analysis is improved.
Fig. 6 is a flow chart of another psychoacoustic analysis method provided by an embodiment of the present disclosure, which is performed by a communication system. The psychoacoustic analysis method in the present embodiment may be applied to a communication system, and is not limited thereto.
As shown in fig. 6, the method may include, but is not limited to, the steps of:
S601: determining a plurality of masking sources for the audio signal, wherein the masking sources comprise: a tonal masking source and a non-tonal masking source.
S602: if a tone masking source is included within the critical band to which the non-tone masking source belongs, a frequency distance between the tone masking source and the non-tone masking source is obtained.
S603: a distance threshold is obtained.
S604: if the frequency distance is less than the distance threshold, determining that the tonal masking source and/or the non-tonal masking source is a partial masking source based on the frequency distance, the first sound pressure level of the tonal masking source, and the second sound pressure level of the non-tonal masking source.
In some embodiments, in a critical band containing a tone masking source, it is checked whether the frequency distance between the position (on the Bark scale) of the tone masking source and that of the band's noise masking source (the non-tone masking source) is less than the set distance threshold. If it is, the sound pressure levels of the tone masking source and of the non-tone masking source may be further calculated, and the tone masking source and/or the non-tone masking source are then determined to be partial masking sources by combining the frequency distance with the two sound pressure levels, which is not limited.
Wherein the sound pressure level of the tone masking source may be referred to as a first sound pressure level and the sound pressure level of the non-tone masking source may be referred to as a second sound pressure level.
The manner of calculating the sound pressure level of the tone masking source and the sound pressure level of the non-tone masking source can be referred to in the related art, and will not be described herein.
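Since the level calculation is left to the related art, here is one common related-art approach for orientation only (MPEG-1 psychoacoustic model 1): a tonal masker's level is the power sum of its peak spectral bin and the two neighbouring bins, and a non-tonal masker's level is the power sum of the remaining bins in its critical band. The sketch assumes a power spectrum already in dB; the function names are hypothetical and the disclosure may use a different method.

```python
import math
from typing import Iterable, Sequence, Set


def power_sum_db(levels_db: Iterable[float]) -> float:
    """Combine levels in dB by adding their linear powers."""
    total = sum(10.0 ** (0.1 * level) for level in levels_db)
    if total <= 0.0:
        raise ValueError("need at least one level to combine")
    return 10.0 * math.log10(total)


def tonal_masker_spl(psd_db: Sequence[float], k: int) -> float:
    """Level of a tonal masker at peak bin k: power sum of bins k-1, k and k+1."""
    if not 0 < k < len(psd_db) - 1:
        raise ValueError("peak bin needs a neighbour on each side")
    return power_sum_db(psd_db[k - 1:k + 2])


def non_tonal_masker_spl(psd_db: Sequence[float], band_bins: range, excluded: Set[int]) -> float:
    """Level of a band's non-tonal masker: power sum of band bins not used by tonal maskers."""
    return power_sum_db(psd_db[i] for i in band_bins if i not in excluded)


psd = [10.0, 60.0, 60.0, 60.0, 10.0]  # hypothetical spectrum in dB
print(tonal_masker_spl(psd, 2), non_tonal_masker_spl(psd, range(5), {1, 2, 3}))
```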
In some embodiments, whether to select a tone masking source and/or a non-tone masking source may be determined based on the frequency distance, a first sound pressure level of the tone masking source, and a second sound pressure level of the non-tone masking source in conjunction with a set rule; alternatively, a masking source selection model may also be incorporated to determine whether to select a tonal masking source and/or a non-tonal masking source; of course, any other possible manner of determining whether to select a tone masking source and/or a non-tone masking source may be used, without limitation.
S605: the masking threshold of the audio signal is analyzed based on a portion of the masking sources of the plurality of masking sources.
In this embodiment, since a part of masking sources is selected from all masking sources of the audio signal to participate in analysis and calculation of the masking threshold, the calculation amount of psychoacoustic analysis can be effectively reduced, and the calculation complexity can be further reduced. Under the condition that the critical frequency band to which the non-tone masking source belongs contains a tone masking source, acquiring a frequency distance between the tone masking source and the non-tone masking source, acquiring a distance threshold, and under the condition that the frequency distance is smaller than the distance threshold, determining the tone masking source and/or the non-tone masking source as part of the masking source according to the frequency distance, a first sound pressure level of the tone masking source and a second sound pressure level of the non-tone masking source, so as to accurately select part of the masking source from a plurality of masking sources, and improve the accuracy of psychoacoustic analysis.
Fig. 7a is a flow chart of yet another psychoacoustic analysis method provided by an embodiment of the present disclosure, which is performed by a communication system. The psychoacoustic analysis method in the present embodiment may be applied to a communication system, and is not limited thereto.
As shown in fig. 7a, the method may include, but is not limited to, the steps of:
S701a: determining a plurality of masking sources for the audio signal, wherein the masking sources comprise: a tonal masking source and a non-tonal masking source.
S702a: if a tone masking source is included within the critical band to which the non-tone masking source belongs, a frequency distance between the tone masking source and the non-tone masking source is obtained.
S703a: a distance threshold is obtained.
S704a: if the frequency distance is less than the distance threshold, a first sound pressure level of the tone masking source and a second sound pressure level of the non-tone masking source are acquired.
In some embodiments, in a critical band containing a tone masking source, it is checked whether the frequency distance between the position (on the Bark scale) of the tone masking source and that of the band's noise masking source (the non-tone masking source) is less than the set distance threshold. If it is, the sound pressure levels of the tone masking source and of the non-tone masking source may be further calculated, and the tone masking source and/or the non-tone masking source are then determined to be partial masking sources by combining the frequency distance with the two sound pressure levels, which is not limited.
Wherein the sound pressure level of the tone masking source may be referred to as a first sound pressure level and the sound pressure level of the non-tone masking source may be referred to as a second sound pressure level.
The manner of calculating the sound pressure level of the tone masking source and the sound pressure level of the non-tone masking source can be referred to in the related art, and will not be described herein.
S705a: If the first sound pressure level is equal to the second sound pressure level, determine both the tone masking source and the non-tone masking source to be partial masking sources.
That is, in some possible embodiments, if the frequency distance is less than the distance threshold and the first sound pressure level of the tone masking source is the same as the second sound pressure level of the non-tone masking source, then both the tone masking source and the non-tone masking source may be selected to participate in the calculation of the masking threshold.
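Once the frequency distance is below the threshold, the comparison of the two sound pressure levels picks one of the Fig. 7a, 7b or 7c branches. A hypothetical dispatcher is sketched below; the small equality tolerance `tol_db` is an assumption, since floating-point levels rarely compare exactly equal.

```python
def level_branch(spl_tone_db: float, spl_non_tone_db: float, tol_db: float = 1e-6) -> str:
    """Pick the Fig. 7 branch once the frequency distance is below the threshold."""
    if abs(spl_tone_db - spl_non_tone_db) <= tol_db:
        return "S705a"  # equal levels: keep both maskers
    if spl_tone_db > spl_non_tone_db:
        return "S705b"  # tone louder: test whether it masks the non-tone masker
    return "S705c"      # non-tone louder: test whether it masks the tone masker


print(level_branch(60.0, 60.0), level_branch(62.0, 60.0), level_branch(55.0, 60.0))
```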
S706a: the masking threshold of the audio signal is analyzed based on a portion of the masking sources of the plurality of masking sources.
Therefore, in this embodiment, when the frequency distance is smaller than the distance threshold and the first sound pressure level of the tone masking source is the same as the second sound pressure level of the non-tone masking source, accurate selection of a part of masking sources from the plurality of masking sources is achieved, and accuracy of psychoacoustic analysis is improved.
Fig. 7b is a flow chart of yet another psychoacoustic analysis method provided by an embodiment of the present disclosure, which is performed by a communication system. The psychoacoustic analysis method in the present embodiment may be applied to a communication system, and is not limited thereto.
As shown in fig. 7b, the method may include, but is not limited to, the steps of:
S701b: determining a plurality of masking sources for the audio signal, wherein the masking sources comprise: a tonal masking source and a non-tonal masking source.
S702b: if a tone masking source is included within the critical band to which the non-tone masking source belongs, a frequency distance between the tone masking source and the non-tone masking source is obtained.
S703b: a distance threshold is obtained.
S704b: if the frequency distance is less than the distance threshold, a first sound pressure level of the tone masking source and a second sound pressure level of the non-tone masking source are acquired.
S705b: If the first sound pressure level is greater than the second sound pressure level, determine the tone masking source and/or the non-tone masking source to be partial masking sources based on a first attenuation value, the first sound pressure level, and the second sound pressure level, where the first attenuation value is the attenuation of the masking capability of the tone masking source at the position of the non-tone masking source.
That is, in the case where the frequency distance between the tone masking source and the non-tone masking source in the critical frequency band is smaller than the distance threshold and the first sound pressure level of the tone masking source is greater than the second sound pressure level of the non-tone masking source, an attenuation value of the masking capability of the tone masking source at the non-tone masking source location, which may be referred to as a first attenuation value, may be calculated, and then the tone masking source and/or the non-tone masking source are determined to be partial masking sources based on the first attenuation value, the first sound pressure level, and the second sound pressure level.
In some embodiments, the first attenuation value of the masking capability of the tone masking source at the position of the non-tone masking source may be calculated as follows.
The spreading function describing how a stronger masking component masks its neighboring positions is approximately expressed as:
where s(Δb) is in dB, and Δb, in Bark, is the distance from the masking source (for example, a frequency distance), with values in the range (-0.7 Bark, 0.7 Bark).
Within this range, the function is plotted in Fig. 8, which is a schematic diagram of the masking spreading function in the embodiment of the present disclosure. s(Δb) can be approximated by a piecewise function s′(Δb), expressed as:
In some embodiments, Δb is set to the frequency distance between the tone masking source and the non-tone masking source in the critical band and substituted into the above expression to obtain the first attenuation value s′(Δb). The expression indicates how much the masking capability of the stronger masking component changes at the position of the other masking component, and s(Δb) depends only on the distance on the Bark scale.
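The expressions for s(Δb) and s′(Δb) are not reproduced in this text. For orientation only, the sketch below evaluates the widely cited spreading function of Schroeder, Atal and Hall (1979), S(Δb) = 15.81 + 7.5(Δb + 0.474) - 17.5·sqrt(1 + (Δb + 0.474)²) dB, over the stated (-0.7, 0.7) Bark range. It is a swapped-in stand-in and may differ from the expression used in the disclosure.

```python
import math


def schroeder_spread_db(delta_bark: float) -> float:
    """Schroeder et al. (1979) spreading function in dB.

    delta_bark is the maskee position minus the masker position, in Bark.
    Stand-in only: not the disclosure's own s(delta_b) or s'(delta_b).
    """
    x = delta_bark + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)


# Attenuation is close to 0 dB at the masker itself and grows with distance;
# it falls off more slowly towards higher frequencies (positive delta_bark).
for d in (-0.7, -0.35, 0.0, 0.35, 0.7):
    print(f"s({d:+.2f} Bark) = {schroeder_spread_db(d):6.2f} dB")
```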
In some embodiments of the present disclosure, when the step of determining that the tone masking source and/or the non-tone masking source is a partial masking source according to the first attenuation value, the first sound pressure level, and the second sound pressure level is performed, it may be determined that the tone masking source is a partial masking source if the sum value of the first sound pressure level and the first attenuation value is greater than the second sound pressure level, which is not limited.
In other embodiments of the present disclosure, when the step of determining that the tone masking source and/or the non-tone masking source are partial masking sources according to the first attenuation value, the first sound pressure level, and the second sound pressure level is performed, it may be determined that the tone masking source and the non-tone masking source are partial masking sources in a case where a sum value of the first sound pressure level and the first attenuation value is less than or equal to the second sound pressure level, which is not limited.
For example, the first sound pressure level and the first attenuation value may be summed. If the sum is greater than the second sound pressure level, the tone masking source is determined to be a partial masking source (the non-tone masking source is masked and excluded); if the sum is less than or equal to the second sound pressure level, both the tone masking source and the non-tone masking source are determined to be partial masking sources.
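The S705b decision rule can be written as a small function. The name and the string labels are hypothetical; the attenuation is passed in as a dB value (typically zero or negative), so the sketch does not depend on any particular spreading-function expression.

```python
from typing import Set


def select_when_tone_louder(spl_tone_db: float, spl_non_tone_db: float,
                            first_attenuation_db: float) -> Set[str]:
    """S705b sketch: the tone masker is louder than the non-tone masker.

    first_attenuation_db is the attenuation of the tone masker's masking
    capability at the non-tone masker's position, in dB.
    """
    if spl_tone_db <= spl_non_tone_db:
        raise ValueError("S705b applies only when the tone masker is louder")
    if spl_tone_db + first_attenuation_db > spl_non_tone_db:
        return {"tone"}          # the non-tone masker is masked and dropped
    return {"tone", "non_tone"}  # both take part in the masking-threshold calculation


print(select_when_tone_louder(60.0, 50.0, -3.0))  # 57 dB > 50 dB: only the tone masker
```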
S706b: the masking threshold of the audio signal is analyzed based on a portion of the masking sources of the plurality of masking sources.
Therefore, in the embodiment of the disclosure, the tone masking source or the non-tone masking source can be selected as the partial masking source according to the attenuation condition of the masking capability of the tone masking source at the non-tone masking source position, so that the selection accuracy of the partial masking source is effectively improved, and the accuracy of psychoacoustic analysis is effectively improved.
Fig. 7c is a flow chart of yet another psychoacoustic analysis method provided by an embodiment of the present disclosure, which is performed by a communication system. The psychoacoustic analysis method in the present embodiment may be applied to a communication system, and is not limited thereto.
As shown in fig. 7c, the method may include, but is not limited to, the steps of:
S701c: determining a plurality of masking sources for the audio signal, wherein the masking sources comprise: a tonal masking source and a non-tonal masking source.
S702c: if a tone masking source is included within the critical band to which the non-tone masking source belongs, a frequency distance between the tone masking source and the non-tone masking source is obtained.
S703c: a distance threshold is obtained.
S704c: if the frequency distance is less than the distance threshold, a first sound pressure level of the tone masking source and a second sound pressure level of the non-tone masking source are acquired.
S705c: If the first sound pressure level is less than the second sound pressure level, determine a second attenuation value according to the frequency distance, and determine the tone masking source and/or the non-tone masking source to be partial masking sources according to the second attenuation value, the first sound pressure level, and the second sound pressure level, where the second attenuation value is the attenuation of the masking capability of the non-tone masking source at the position of the tone masking source.
That is, in the case where the frequency distance between the tone masking source and the non-tone masking source in the critical frequency band is smaller than the distance threshold and the first sound pressure level of the tone masking source is smaller than the second sound pressure level of the non-tone masking source, an attenuation value of the masking ability of the non-tone masking source at the tone masking source position, which may be referred to as a second attenuation value, may be calculated, and then the tone masking source and/or the non-tone masking source are determined to be partial masking sources based on the second attenuation value, the first sound pressure level, and the second sound pressure level.
In some embodiments, the second attenuation value of the masking capability of the non-tone masking source at the position of the tone masking source may be calculated with the spreading-function expression given above.
In some embodiments, Δb is set to the frequency distance between the tone masking source and the non-tone masking source in the critical band and substituted into the above expression to obtain the second attenuation value s′(Δb). The expression indicates how much the masking capability of the stronger masking component changes at the position of the other masking component, and s(Δb) depends only on the distance on the Bark scale.
In some embodiments, when determining the tone masking source and/or the non-tone masking source to be partial masking sources according to the second attenuation value, the first sound pressure level, and the second sound pressure level, the non-tone masking source may be determined to be a partial masking source if the sum of the second sound pressure level and the second attenuation value is greater than the first sound pressure level, without limitation.
In other embodiments, when determining the tone masking source and/or the non-tone masking source to be partial masking sources according to the second attenuation value, the first sound pressure level, and the second sound pressure level, both the tone masking source and the non-tone masking source may be determined to be partial masking sources if the sum of the second sound pressure level and the second attenuation value is less than or equal to the first sound pressure level, which is not limited.
For example, the second sound pressure level and the second attenuation value may be summed. If the sum is greater than the first sound pressure level, the non-tonal masking source is determined to be a partial masking source (the tonal masking source is masked and excluded); if the sum is less than or equal to the first sound pressure level, both the tonal masking source and the non-tonal masking source are determined to be partial masking sources.
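The S705c rule mirrors S705b with the roles swapped. As before, the function name and labels are hypothetical and the second attenuation value is supplied in dB by the caller.

```python
from typing import Set


def select_when_non_tone_louder(spl_tone_db: float, spl_non_tone_db: float,
                                second_attenuation_db: float) -> Set[str]:
    """S705c sketch: the non-tone masker is louder than the tone masker.

    second_attenuation_db is the attenuation of the non-tone masker's masking
    capability at the tone masker's position, in dB.
    """
    if spl_non_tone_db <= spl_tone_db:
        raise ValueError("S705c applies only when the non-tone masker is louder")
    if spl_non_tone_db + second_attenuation_db > spl_tone_db:
        return {"non_tone"}      # the tone masker is masked and dropped
    return {"tone", "non_tone"}  # both take part in the masking-threshold calculation


print(select_when_non_tone_louder(50.0, 60.0, -3.0))  # 57 dB > 50 dB: only the non-tone masker
```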
S706c: the masking threshold of the audio signal is analyzed based on a portion of the masking sources of the plurality of masking sources.
Therefore, in the embodiment of the disclosure, the tone masking source or the non-tone masking source can be selected as the partial masking source according to the attenuation condition of the masking capability of the non-tone masking source at the tone masking source position, so that the selection accuracy of the partial masking source is effectively improved, and the accuracy of psychoacoustic analysis is effectively improved.
The method of selecting a partial masking source in the above embodiments of the present disclosure may also be referred to as a cross masking algorithm, and is exemplified as follows:
As shown in Fig. 9, which is a schematic diagram of the architecture of a psychoacoustic analysis method according to an embodiment of the present disclosure, after the tonal masking sources and/or non-tonal masking sources of an audio signal are extracted, a cross masking algorithm may be performed to select partial masking sources from the plurality of masking sources, and the masking threshold is calculated based on the selected partial masking sources.
The computation flow of the cross masking is as follows:
(1) In a critical band containing a tone masking source, determine whether the distance between the position (on the Bark scale) of the tone masking source and that of the band's noise masking source is less than the set threshold; if so, proceed to the next step.
(2) Determine which of the tone masking source and the noise masking source has the greater sound pressure level; for the greater one, take into account the attenuation s(Δb) of its masking capability at the position of the lesser one. Compare the greater sound pressure level plus s(Δb) with the lesser sound pressure level; if the former is greater, the lesser masking source is masked, i.e., it is excluded from the calculation of the final masking threshold.
(3) If the critical band contains other tone masking sources, repeat (1) and (2); otherwise, proceed to the next critical band that contains a tone masking source.
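Putting steps (1) to (3) together, a minimal sketch of the cross masking loop could look like the following. The `Masker` type, the band dictionary layout and the `spread_db` callable (returning the attenuation in dB for a signed Bark distance, maskee minus masker) are hypothetical; the flat -3 dB attenuation in the usage example is a placeholder, not a value from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Masker:
    bark: float    # position on the Bark scale
    spl_db: float  # sound pressure level in dB
    tonal: bool    # True: tone masking source, False: noise (non-tone) masking source


def cross_mask(bands: Dict[int, List[Masker]], threshold_bark: float,
               spread_db: Callable[[float], float]) -> List[Masker]:
    """Return the partial masking sources left after cross masking."""
    kept: List[Masker] = []
    for band in sorted(bands):
        tones = [m for m in bands[band] if m.tonal]
        noises = [m for m in bands[band] if not m.tonal]
        removed = set()
        for noise in noises:
            for tone in tones:  # step (3): visit every tone masker in the band
                if noise in removed:
                    break  # this noise masker is already masked
                if tone in removed:
                    continue
                delta = noise.bark - tone.bark
                if abs(delta) >= threshold_bark:
                    continue  # step (1): far apart, both stay
                # Step (2): the louder masker, attenuated at the quieter one's position.
                if tone.spl_db > noise.spl_db:
                    if tone.spl_db + spread_db(delta) > noise.spl_db:
                        removed.add(noise)
                elif noise.spl_db > tone.spl_db:
                    if noise.spl_db + spread_db(-delta) > tone.spl_db:
                        removed.add(tone)
                # Equal levels: both stay.
        # Bands without tone maskers keep their noise masker unchanged.
        kept.extend(m for m in bands[band] if m not in removed)
    return kept


bands = {
    3: [Masker(3.2, 60.0, True), Masker(3.5, 50.0, False)],
    7: [Masker(7.1, 40.0, True), Masker(7.3, 55.0, False)],
    9: [Masker(9.0, 45.0, True), Masker(9.8, 70.0, False)],
    12: [Masker(12.4, 30.0, False)],
}
for m in cross_mask(bands, threshold_bark=0.5, spread_db=lambda d: -3.0):
    print(m)
```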
In this embodiment, the threshold is set to a fixed value of 0.5 Bark, and the corresponding process flow is as follows.
(1) In a critical band where a tone masking source exists, it is judged whether or not the distance between the position of the tone masking source in the critical band (bark scale) and the position of the critical band noise masking source is less than + -0.5 bark, and if so, the next step is performed.
(2) A determination is made as to which sound pressure level of the tone masking source and the noise masking source is greater, the greater taking into account the attenuation of its masking capability at the location of the lesser. Judging the sound pressure level +s of the larger person ′ (Δb) and the smaller sound pressure level, the latter being masked if the former is large, i.e. in this case the latter will be excluded from the calculation of the final masking threshold.
(3) Repeating the processes (1) (2) if there are other tone signals in the critical frequency band, otherwise, entering the critical frequency band with tone signals.
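The following is a worked numerical example of steps (1) and (2) with the 0.5 bark threshold. The values are illustrative only:

- a tone masking source at 10.2 bark and 60 dB;
- a noise masking source at 10.5 bark and 50 dB;
- a hypothetical attenuation s′(Δb) = -17·Δb dB, since the source does not give the form of s′.

```latex
\begin{aligned}
\Delta b &= \lvert 10.5 - 10.2 \rvert = 0.3\ \text{bark} < 0.5\ \text{bark} \\
L_{\text{tone}} + s'(\Delta b) &= 60 - 17 \times 0.3 = 54.9\ \text{dB} > L_{\text{noise}} = 50\ \text{dB}
\end{aligned}
```

Under these assumptions, 54.9 dB exceeds 50 dB. The noise masking source in this example would therefore be excluded from the calculation of the final masking threshold.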
Figs. 10a and 10b are schematic diagrams of experimental statistics in embodiments of the present disclosure, with cross masking turned on (fig. 10a) and turned off (fig. 10b). Here, "time" means the time the encoder needs to run the psychoacoustic model for each frame. The time the program has spent on the psychoacoustic model, output once per frame, means the cumulative time spent on the psychoacoustic model by the time the encoder finishes encoding. Figs. 10a and 10b show the number of clock timing units consumed when running 1476 frames of audio with cross masking on and off, respectively. The number of clock timing units required fluctuates. Even so, the average with cross masking on is about 150 clocks lower than the average with cross masking off.
In another embodiment of the present disclosure, the design takes into account that the human ear perceives sound in the range of 20 Hz to 20 kHz and is most sensitive to sounds between 1 kHz and 3 kHz. A smaller distance threshold is therefore used in the low-frequency critical bands so that more masking source components are retained. A larger distance threshold is used in the high-frequency critical bands so that more masking source components are removed. This benefits the subjective listening experience.
In another embodiment of the present disclosure, a distance threshold of 0.3 bark may be set for the first 15 critical bands and a distance threshold of 0.6 bark for the last 10 critical bands. The distance threshold is thus frequency dependent: the higher the frequency, the larger the selected distance threshold.
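The band-dependent threshold of this embodiment can be expressed as a small lookup function. This sketch rests on two assumptions:

- There are 25 critical bands indexed from 0: 15 low bands followed by 10 high bands, as the embodiment's counts imply.
- The function and constant names are hypothetical.

```python
NUM_CRITICAL_BANDS = 25      # assumed: 15 low bands followed by 10 high bands
LOW_BAND_COUNT = 15          # the "first 15 critical frequency bands"
LOW_THRESHOLD_BARK = 0.3
HIGH_THRESHOLD_BARK = 0.6


def distance_threshold(band_index: int) -> float:
    """Cross-masking distance threshold (bark) for a 0-based critical band index."""
    if not 0 <= band_index < NUM_CRITICAL_BANDS:
        raise ValueError(f"band_index must be in [0, {NUM_CRITICAL_BANDS})")
    return LOW_THRESHOLD_BARK if band_index < LOW_BAND_COUNT else HIGH_THRESHOLD_BARK


print([distance_threshold(b) for b in (0, 14, 15, 24)])  # [0.3, 0.3, 0.6, 0.6]
```

A step function keeps the per-band cost to a single comparison. A smoother frequency-to-threshold mapping would also fit the text, which only requires the threshold to grow with frequency.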
The corresponding process flow is as follows.
(1) In a critical band where a tone masking source exists, determine whether the distance between the position of the tone masking source (on the Bark scale) and the position of the noise masking source of that critical band is smaller than the set distance threshold. The distance threshold depends on the critical band in which the tone masking source lies. If the distance is smaller, proceed to the next step.
(2) Determine which of the tone masking source and the noise masking source has the greater sound pressure level. For the greater one, take into account the attenuation of its masking ability at the position of the lesser one. Compare the greater sound pressure level + s′(Δb) with the lesser sound pressure level. If the former is greater, the lesser source is masked, that is, it is excluded from the calculation of the final masking threshold.
(3) If there are other tone masking sources in the critical band, repeat (1) and (2). Otherwise, move on to the next critical band in which a tone masking source exists.
In the embodiments of the present disclosure, the cross masking algorithm therefore reduces the number of unnecessary masking source components that take part in the calculation of the masking threshold. This lowers the overall computational complexity of the psychoacoustic model while leaving the masking threshold almost unchanged, and it can effectively reduce the power consumption of the device.
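The saving comes from how a global masking threshold is typically assembled. Every retained masking source contributes one individual threshold, and these are combined with the threshold in quiet.

The sketch below assumes the power-sum combination used in the MPEG-1 psychoacoustic model 1. The source does not state its combination rule, and `global_masking_threshold` is a hypothetical name. Each excluded masker removes one term from the sum, and a masker whose individual threshold lies well below the others barely moves the result.

```python
import math
from typing import Iterable


def global_masking_threshold(quiet_db: float, individual_db: Iterable[float]) -> float:
    """Combine the threshold in quiet with individual masking thresholds (all in dB).

    Levels are converted to power, summed, and converted back to dB.
    """
    total_power = 10.0 ** (quiet_db / 10.0)
    for level_db in individual_db:
        total_power += 10.0 ** (level_db / 10.0)
    return 10.0 * math.log10(total_power)


# A 20 dB masker next to a 40 dB masker changes the result by only about 0.04 dB.
with_weak = global_masking_threshold(0.0, [40.0, 20.0])
without_weak = global_masking_threshold(0.0, [40.0])
print(round(with_weak, 2), round(without_weak, 2))  # 40.04 40.0
```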
Fig. 11 is a schematic structural diagram of a communication device according to an embodiment of the disclosure. The communication device 110 shown in fig. 11 may include a transceiver module 1101 and a processing module 1102. The transceiver module 1101 may include a transmitting module for implementing a transmitting function and/or a receiving module for implementing a receiving function. The transceiver module 1101 can thus implement the transmitting function and/or the receiving function.
The communication device 110 may be a terminal device (such as the terminal device in the foregoing method embodiments), an apparatus in the terminal device, or an apparatus that can be used together with the terminal device. Alternatively, the communication device 110 may be a network device (such as the network device in the foregoing method embodiments), an apparatus in the network device, or an apparatus that can be used together with the network device.
A communication device 110, the device comprising:
a processing module 1102, configured to determine a plurality of masking sources of an audio signal and to analyze the masking threshold of the audio signal based on a portion of the plurality of masking sources.
In this embodiment, a plurality of masking sources of the audio signal is determined, and the masking threshold of the audio signal is then analyzed based on only a portion of them. Only this subset, selected from all the masking sources of the audio signal, takes part in the analysis and calculation of the masking threshold. This effectively reduces the amount of computation in psychoacoustic analysis and hence its computational complexity.
Fig. 12 is a schematic structural diagram of another communication device according to an embodiment of the present disclosure. The communication device 120 may be a network device or a terminal device. It may also be a chip, chip system, processor or the like that supports the network device, or the terminal device, in implementing the above method. The device can be used to implement the method described in the above method embodiments; for details, refer to the description of those embodiments.
The communication device 120 may include one or more processors 1201. The processor 1201 may be a general-purpose processor, a special-purpose processor or the like, for example, a baseband processor or a central processing unit. The baseband processor may be used to process communication protocols and communication data. The central processing unit may be used to control the communication device (e.g., a base station, baseband chip, terminal device chip, DU or CU), execute computer programs, and process the data of those programs.
Optionally, the communication device 120 may further include one or more memories 1202, on which a computer program 1204 may be stored. The processor 1201 may also store a computer program 1203. The processor 1201 executes the computer program 1204 and/or the computer program 1203 so that the communication device 120 performs the method described in the above method embodiments. Optionally, data may also be stored in the memory 1202. The communication device 120 and the memory 1202 may be provided separately or may be integrated.
Optionally, the communication device 120 may further include a transceiver 1205 and an antenna 1206. The transceiver 1205 may be referred to as a transceiver unit, a transceiver circuit or the like, and implements the transceiving function. The transceiver 1205 may include a receiver and a transmitter. The receiver, which may also be called a receiving circuit or the like, implements the receiving function. The transmitter, which may also be called a transmitting circuit or the like, implements the transmitting function.
Optionally, one or more interface circuits 1207 may also be included in the communication device 120. The interface circuit 1207 is configured to receive code instructions and transmit the code instructions to the processor 1201. The processor 1201 executes code instructions to cause the communication device 120 to perform the method described in the method embodiments described above.
In one implementation, a transceiver for implementing the receive and transmit functions may be included in the processor 1201. For example, the transceiver may be a transceiver circuit, or an interface circuit. The transceiver circuitry, interface or interface circuitry for implementing the receive and transmit functions may be separate or may be integrated. The transceiver circuit, interface or interface circuit may be used for reading and writing codes/data, or the transceiver circuit, interface or interface circuit may be used for transmitting or transferring signals.
In one implementation, the processor 1201 may store a computer program 1203, where the computer program 1203 runs on the processor 1201, and may cause the communication apparatus 120 to perform the method described in the above method embodiment. The computer program 1203 may be solidified in the processor 1201, in which case the processor 1201 may be implemented in hardware.
In one implementation, the communication device 120 may include circuitry that may implement the functions of transmitting or receiving or communicating in the foregoing method embodiments. The processors and transceivers described in this disclosure may be implemented on integrated circuits (integrated circuit, ICs), analog ICs, radio frequency integrated circuits RFICs, mixed signal ICs, application specific integrated circuits (application specific integrated circuit, ASIC), printed circuit boards (printed circuit board, PCB), electronic devices, and the like. The processor and transceiver may also be fabricated using a variety of IC process technologies such as complementary metal oxide semiconductor (complementary metal oxide semiconductor, CMOS), N-type metal oxide semiconductor (NMOS), P-type metal oxide semiconductor (positive channel metal oxide semiconductor, PMOS), bipolar junction transistor (bipolar junction transistor, BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), gallium arsenide (GaAs), etc.
The communication apparatus described in the above embodiment may be a network device or a terminal device, but the scope of the communication apparatus described in the present disclosure is not limited thereto, and the structure of the communication apparatus may not be limited by fig. 12. The communication means may be a stand-alone device or may be part of a larger device. For example, the communication device may be:
(1) A stand-alone integrated circuit IC, or chip, or a system-on-a-chip or subsystem;
(2) A set of one or more ICs, optionally including storage means for storing data, a computer program;
(3) An ASIC, such as a Modem (Modem);
(4) Modules that may be embedded within other devices;
(5) A receiver, a terminal device, an intelligent terminal device, a cellular phone, a wireless device, a handset, a mobile unit, a vehicle-mounted device, a network device, a cloud device, an artificial intelligent device, and the like;
(6) Others, and so on.
If the communication device is a chip or a chip system, refer to the schematic structural diagram of the chip shown in fig. 13. The chip shown in fig. 13 includes a processor 1301 and an interface 1302. There may be one or more processors 1301 and more than one interface 1302.
For the case where the chip is used to implement the functionality of the communication system (e.g., including the terminal device and/or the network device) in embodiments of the present application:
Optionally, the chip further comprises a memory 1303, the memory 1303 being configured to store necessary computer programs and data.
Those of skill in the art will further appreciate that the various illustrative logical blocks (illustrative logical block) and steps (step) described in connection with the embodiments of the disclosure may be implemented by electronic hardware, computer software, or combinations of both. Whether such functionality is implemented as hardware or software depends upon the particular application and design requirements of the overall system. Those skilled in the art may implement the functionality in a variety of ways for each particular application, but such implementation should not be construed as beyond the scope of the embodiments of the present disclosure.
The disclosed embodiments also provide a communication system comprising the communication device of the foregoing embodiment of fig. 11, or the system comprises the communication device of the foregoing embodiment of fig. 12.
The present disclosure also provides a readable storage medium having instructions stored thereon which, when executed by a computer, perform the functions of any of the method embodiments described above.
The present disclosure also provides a computer program product which, when executed by a computer, performs the functions of any of the method embodiments described above.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer programs. When the computer programs are loaded and executed on a computer, the procedures or functions according to the embodiments of the present disclosure are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer programs may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, server or data center to another over a wired connection (e.g., coaxial cable, optical fiber or digital subscriber line (DSL)). They may also be transmitted over a wireless connection (e.g., infrared, radio or microwave). The computer-readable storage medium may be any usable medium accessible to a computer. It may also be a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., a floppy disk, hard disk or magnetic tape), an optical medium (e.g., a digital video disc (DVD)), a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
Those of ordinary skill in the art will appreciate that ordinal numbers such as "first" and "second" used in the present disclosure are merely for ease of description. They are not intended to limit the scope of the embodiments of the present disclosure or to indicate a sequence.
In the present disclosure, "at least one" may also be described as "one or more", and "a plurality" may be two, three, four or more; the present disclosure is not limited in this respect. In the embodiments of the present disclosure, technical features of the same kind are distinguished by "first", "second", "third", "A", "B", "C" and "D". The technical features so labeled are in no particular sequence or order of magnitude.
The correspondences shown in the tables in the present disclosure may be configured or predefined. The values of the information in each table are merely examples and may be configured as other values; the present disclosure is not limited thereto. When the correspondence between the configuration information and each parameter is configured, not every correspondence shown in each table has to be configured. For example, the correspondences shown in some rows of a table in the present disclosure may not be configured. As another example, the tables may be adjusted as appropriate, for example by splitting or merging them. The parameter names shown in the tables may be other names understood by the communication device. Likewise, the values or representations of the parameters may be other values or representations understood by the communication device. The tables may also be implemented with other data structures, for example an array, a queue, a container, a stack, a linear list, a pointer, a linked list, a tree, a graph, a structure, a class, a heap or a hash table.
In the present disclosure, "predefined" may be understood as defined, predefined, stored, pre-negotiated, pre-configured, solidified (hard-coded), or pre-burned.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
The foregoing is merely specific embodiments of the disclosure, but the protection scope of the disclosure is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the disclosure, and it is intended to cover the scope of the disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (13)
1. A method of psychoacoustic analysis, the method comprising:
determining a plurality of masking sources of the audio signal;
the masking threshold of the audio signal is analyzed based on a portion of the masking sources of the plurality of masking sources.
2. The method of claim 1, wherein the masking source comprises at least one of:
a tone masking source;
a non-tonal masking source.
3. The method of any one of claims 1-2, wherein the method further comprises:
the partial masking source is determined from the plurality of masking sources.
4. A method as claimed in any one of claims 1 to 3, wherein the masking source comprises: a tonal masking source and a non-tonal masking source; wherein determining the partial masking source from the plurality of masking sources comprises:
if the tone masking source is contained in the critical frequency band to which the non-tone masking source belongs, acquiring a frequency distance between the tone masking source and the non-tone masking source;
and determining the tone masking source and/or the non-tone masking source as the partial masking source according to the frequency distance.
5. The method of claim 4, wherein the determining that the tone masking source and/or the non-tone masking source is the partial masking source based on the frequency distance comprises:
acquiring a distance threshold;
determining that the tone masking source and the non-tone masking source are the partial masking sources if the frequency distance is greater than or equal to the distance threshold;
and if the frequency distance is smaller than the distance threshold, determining that the tone masking source and/or the non-tone masking source is the partial masking source according to the frequency distance, the first sound pressure level of the tone masking source and the second sound pressure level of the non-tone masking source.
6. The method of claim 5, wherein acquiring the distance threshold comprises at least one of:
acquiring a set distance threshold;
acquiring a distance threshold corresponding to the critical frequency band;
and determining a distance threshold according to the frequency of the critical frequency band.
7. The method of any of claims 5-6, wherein the determining that the tone masking source and/or the non-tone masking source is the partial masking source based on the frequency distance, a first sound pressure level of the tone masking source, and a second sound pressure level of the non-tone masking source comprises at least one of:
determining the tonal masking source and the non-tonal masking source as the partial masking source, wherein the first sound pressure level is equal to the second sound pressure level;
determining a first attenuation value according to the frequency distance, and determining the tone masking source and/or the non-tone masking source as the partial masking source according to the first attenuation value, the first sound pressure level and the second sound pressure level, wherein the first sound pressure level is greater than the second sound pressure level, and the first attenuation value is an attenuation value of masking capability of the tone masking source at the non-tone masking source position;
determining a second attenuation value according to the frequency distance, and determining the tone masking source and/or the non-tone masking source as the partial masking source according to the second attenuation value, the first sound pressure level and the second sound pressure level, wherein the second attenuation value is an attenuation value of masking capability of the non-tone masking source at the tone masking source position, and the first sound pressure level is smaller than the second sound pressure level.
8. The method of claim 7, wherein the determining that the tone masking source and/or the non-tone masking source is the partial masking source based on the first attenuation value, the first sound pressure level, and the second sound pressure level comprises at least one of:
determining the tone masking source as the partial masking source if the sum of the first sound pressure level and the first attenuation value is greater than the second sound pressure level;
if the sum of the first sound pressure level and the first attenuation value is less than or equal to the second sound pressure level, determining the tonal masking source and the non-tonal masking source as the partial masking source.
9. The method of claim 7, wherein the determining that the tone masking source and/or the non-tone masking source is the partial masking source based on the second attenuation value, the first sound pressure level, and the second sound pressure level comprises at least one of:
determining the non-tone masking source as the partial masking source if the sum of the second sound pressure level and the second attenuation value is greater than the first sound pressure level;
and if the sum of the second sound pressure level and the second attenuation value is less than or equal to the first sound pressure level, determining the tone masking source and the non-tone masking source as the partial masking source.
10. A method as claimed in any one of claims 1 to 3, wherein the masking source comprises: a non-tonal masking source; wherein determining the partial masking source from the plurality of masking sources comprises:
if the tone masking source is not included in the critical frequency band to which the non-tone masking source belongs, the non-tone masking source is determined to be the partial masking source.
11. A communication device, the device comprising:
a processing module, configured to determine a plurality of masking sources of an audio signal and to analyze a masking threshold of the audio signal based on a portion of the plurality of masking sources.
12. A communication system, characterized in that the communication system comprises a network device and a terminal device, the network device performing the method according to any of claims 1-10 and/or the terminal device performing the method according to any of claims 1-10.
13. A computer readable storage medium storing instructions that, when executed, cause the method of any one of claims 1-10 to be implemented.
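The selection rules recited in claims 4 to 10 for one tonal/non-tonal pair can be read as a decision function. The sketch below is illustrative only and makes these assumptions:

- The attenuation function is hypothetical, since the claims do not define it.
- The same symmetric function serves as both the first attenuation value (tonal source at the non-tonal position) and the second (non-tonal source at the tonal position). Real spreading functions are usually asymmetric.
- A fixed threshold is passed in, although claim 6 also allows a per-band or frequency-derived threshold.
- `Kept` and `select_partial` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Kept(Enum):
    """Which masking sources of one critical band remain in the partial set."""
    BOTH = "tonal and non-tonal"
    TONAL_ONLY = "tonal only"
    NON_TONAL_ONLY = "non-tonal only"


def attenuation(distance_bark: float) -> float:
    """Hypothetical attenuation value in dB (<= 0); the claims do not define it."""
    return -17.0 * abs(distance_bark)


def select_partial(
    tonal_spl: Optional[float],
    non_tonal_spl: float,
    distance_bark: float,
    threshold_bark: float,
) -> Kept:
    """Apply the pairwise rules of claims 5 and 7 to 10 to one critical band."""
    if tonal_spl is None:                       # claim 10: no tonal source in the band
        return Kept.NON_TONAL_ONLY
    if distance_bark >= threshold_bark:         # claim 5: far apart, keep both
        return Kept.BOTH
    if tonal_spl > non_tonal_spl:               # claims 7 and 8: first attenuation value
        first = attenuation(distance_bark)
        return Kept.TONAL_ONLY if tonal_spl + first > non_tonal_spl else Kept.BOTH
    if tonal_spl < non_tonal_spl:               # claims 7 and 9: second attenuation value
        second = attenuation(distance_bark)
        return Kept.NON_TONAL_ONLY if non_tonal_spl + second > tonal_spl else Kept.BOTH
    return Kept.BOTH                            # claim 7: equal sound pressure levels


print(select_partial(60.0, 50.0, distance_bark=0.3, threshold_bark=0.5))  # Kept.TONAL_ONLY
```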
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2023/076994 (WO2024168922A1) | 2023-02-17 | 2023-02-17 | Psychoacoustic analysis method, apparatus, device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116391226A | 2023-07-04 |
Family
ID=86979193
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202380008348.2A (Pending; published as CN116391226A) | Psychoacoustic analysis method, device, equipment and storage medium | 2023-02-17 | 2023-02-17 |
Country Status (2)
Country | Link |
---|---|
CN | CN116391226A |
WO | WO2024168922A1 |
2023
- 2023-02-17: WO PCT/CN2023/076994 (WO2024168922A1), status unknown
- 2023-02-17: CN CN202380008348.2A (CN116391226A), active, pending
Also Published As
Publication number | Publication date |
---|---|
WO2024168922A1 | 2024-08-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||