CN112929731B - Multimedia switch system - Google Patents
- Publication number: CN112929731B
- Application number: CN202110508270.3A
- Authority: CN (China)
- Prior art keywords: audio, signal, multimedia switch, multimedia, video
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4398—Processing of audio elementary streams involving reformatting operations of audio signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K15/00—Acoustics not otherwise provided for
- G10K15/08—Arrangements for producing a reverberation or echo sound
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/167—Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4305—Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440218—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/63—Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
- H04N21/643—Communication protocols
- H04N21/6437—Real-time Transport Protocol [RTP]
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Mathematical Physics (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Telephone Function (AREA)
Abstract
The application relates to the field of multimedia switches and discloses a multimedia switch system comprising a multimedia switch and N audio acquisition ends. The multimedia switch is configured to: calculate, for each audio data frame, the energy ratio D of the audio signal before and after speech enhancement, and calculate a signal-to-noise ratio from the audio data frames whose D value is smaller than a preset threshold; and send exchange information including the signal-to-noise ratio to the N audio acquisition ends, the signal-to-noise ratio being estimated from the audio signals that the switch received from the same audio acquisition end in the previous period. Each audio acquisition end is configured to: receive the exchange information sent by the multimedia switch and, if the signal-to-noise ratio in the exchange information is smaller than a preset threshold, increase the amplitude of the signal in a preset frequency band so as to strengthen the voice component; and send the audio signal to the multimedia switch. The multimedia switch system reduces the transmission delay of audio data, improves multi-channel audio acquisition, and saves hardware cost.
Description
Technical Field
The application relates to the field of multimedia switches, in particular to a multimedia switch system.
Background
Supporting teaching scenarios such as routine multimedia teaching, recorded and broadcast lessons, online classrooms, and remote interactive teaching in a traditional classroom requires installing a large amount of equipment. The resulting system structure is complex and does not lend itself to centralized processing and management of the various data in the classroom.
In addition, existing multi-channel audio acquisition and transmission schemes fall into two groups. In the first, analog microphones capture the sound and the signal is carried over audio cables. The transmission delay is low, but this approach suffers from differing acquisition delays across channels and from rapid attenuation of the audio signal over distance, which hinders subsequent processing such as multi-channel mixing, enhancement, and audio/video synthesis. In the second, digital microphones capture the sound and the data is carried over Ethernet. The transmission distance is long and cabling is simple, but Ethernet transmission suffers from network congestion and large delay jitter, so it is unsuitable for applications with strict real-time requirements such as local sound reinforcement.
Disclosure of Invention
The application provides a multimedia switch system. Its first purpose is to improve the mixing of multiple audio channels and to avoid audio clipping.
Its second purpose is to solve the problems of network congestion, high noise, and delay jitter in multi-channel audio transmission, thereby improving the transmission efficiency and quality of the audio data and reducing the transmission delay.
The application provides a multimedia switch system, including:
the system comprises a multimedia switch and N audio acquisition ends, wherein N is an integer greater than or equal to 2;
the audio acquisition end is configured to acquire an audio signal through a microphone;
the multimedia switch is configured to:
acquiring an audio signal from the audio acquisition end in the form of an audio data frame, and performing voice enhancement and coding to obtain an audio stream;
calculating an energy ratio D before and after audio signal speech enhancement in each audio data frame, and calculating a signal-to-noise ratio according to the audio data frames with the D value smaller than a preset threshold;
sending exchange information including the signal-to-noise ratio to the N audio acquisition ends, wherein the signal-to-noise ratio is estimated according to the audio signals from the same audio acquisition end received by the multimedia switch in the previous period;
the audio collection end is configured to:
receiving the exchange information sent by the multimedia switch and, if the signal-to-noise ratio in the exchange information is smaller than a preset threshold, increasing the amplitude of the signal in a preset frequency band so as to strengthen the voice component;
and sending the audio signal to the multimedia switch.
In one embodiment, the system further comprises M video acquisition ends, wherein M is an integer greater than or equal to 1;
the multimedia switch is further configured to:
in K time slices of the same operation period, respectively sending exchange information comprising clock synchronization information and signal-to-noise ratio to the N paths of audio acquisition ends;
receiving and coding the video signals from the M paths of video acquisition ends to obtain video streams;
encapsulating the audio stream and the video stream and time-stamping the audio stream and the video stream to ensure synchronicity;
the video acquisition terminal is configured to acquire the video signal and transmit the video signal into the multimedia switch.
In one embodiment, the multimedia switch is configured to:
deleting audio data frames whose Zn is smaller than a preset first threshold and whose Mn is smaller than a preset second threshold;
where, in the definitions of Zn and Mn, Sin denotes the audio signal, sgn is the sign function, j is the index of the audio acquisition end, i is the sample index within an audio data frame, H is the number of samples in an audio data frame, and η is a preset compensation factor.
In one embodiment, the energy ratio D of the audio signal before and after speech enhancement is calculated from S(i) and So(i), where S(i) denotes the original signal of the i-th frame at the audio acquisition end and So(i) denotes the signal output after the i-th frame has been transmitted to the multimedia switch and speech-enhanced;
if D is larger than a preset threshold, the energy change caused by the speech enhancement processing is large and the i-th frame is a silent segment; if D is smaller than the preset threshold, the energy change caused by the speech enhancement processing is small and the i-th frame is a voice segment;
after the voice segments have been determined, F frames of voice-segment data are taken as the analysis sample and the signal-to-noise ratio (SNR) is calculated from them.
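The formulas for D and for the SNR are not reproduced in this text, so the following Python sketch only illustrates the decision logic under stated assumptions: D is taken to be the frame energy before enhancement divided by the frame energy after enhancement, and the noise in a voice frame is taken to be the difference between the original and the enhanced samples. The function names, the threshold value, and the sample data are hypothetical.

```python
import math

def frame_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)

def energy_ratio(original, enhanced, eps=1e-12):
    """ASSUMED form of D: energy before enhancement / energy after enhancement."""
    return frame_energy(original) / (frame_energy(enhanced) + eps)

def is_voice(original, enhanced, d_threshold):
    """A frame counts as voice when enhancement changes its energy only slightly."""
    return energy_ratio(original, enhanced) < d_threshold

def estimate_snr_db(frames, d_threshold, max_frames, eps=1e-12):
    """Estimate SNR in dB from up to max_frames voice frames.

    frames is a list of (original, enhanced) sample lists. ASSUMPTION: the
    enhanced frame is the signal and (original - enhanced) is the noise.
    Returns None when no voice frame is found.
    """
    signal = noise = 0.0
    used = 0
    for original, enhanced in frames:
        if used == max_frames:
            break
        if not is_voice(original, enhanced, d_threshold):
            continue  # silent segments are excluded from the estimate
        signal += frame_energy(enhanced)
        noise += sum((o - e) ** 2 for o, e in zip(original, enhanced))
        used += 1
    if used == 0:
        return None
    return 10.0 * math.log10((signal + eps) / (noise + eps))

# Illustrative data: one voice-like frame and one silence-like frame.
voice = ([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9])
silence = ([0.1, -0.1, 0.1, -0.1], [0.01, -0.01, 0.01, -0.01])
snr = estimate_snr_db([voice, silence], d_threshold=2.0, max_frames=8)
```

Only the voice-like frame passes the D test here, so the silence-like frame does not affect the estimate.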
In one embodiment, the enhancement processing of the received audio signal by the multimedia switch comprises: noise reduction, echo cancellation, howling suppression, and automatic gain.
In one embodiment, the operation of the N audio acquisition ends further comprises:
storing the acquired audio data in the input storage area of each audio acquisition end;
when the amount of data stored in the input storage area reaches a set first threshold, transferring the audio data from the input storage area to an output storage area;
and sending the audio data in the output storage area to the multimedia switch as the audio signal.
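The input/output storage areas described in this embodiment behave like a two-stage buffer. A minimal Python sketch follows, assuming the threshold counts samples and that sending drains the whole output area; the class and method names are hypothetical.

```python
class DoubleBuffer:
    """Two-stage buffer: samples move to the output area once enough accumulate."""

    def __init__(self, threshold):
        self.threshold = threshold  # the "first threshold", assumed to be in samples
        self.input_area = []
        self.output_area = []

    def capture(self, samples):
        """Store newly acquired samples; transfer them once the threshold is reached."""
        self.input_area.extend(samples)
        if len(self.input_area) >= self.threshold:
            self.output_area.extend(self.input_area)
            self.input_area.clear()

    def send(self):
        """Return everything in the output area (to be sent to the switch) and empty it."""
        payload, self.output_area = self.output_area, []
        return payload

buf = DoubleBuffer(threshold=4)
buf.capture([1, 2])
first = buf.send()    # nothing is ready yet
buf.capture([3, 4])
second = buf.send()   # the threshold was reached, so all four samples are sent
```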
In one embodiment, the N audio acquisition ends calculate the time difference between the clock of the acquisition end and the synchronous clock according to the time interval of receiving the exchange information between adjacent periods;
and when the time difference is larger than a set second threshold value, calibrating the clock of the acquisition end into a synchronous clock.
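One way to read this embodiment is that the acquisition end compares the interval it measures between two consecutive exchange messages with the interval implied by the synchronous clock carried in those messages, and recalibrates when the two differ by more than the second threshold. The Python sketch below follows that reading; the microsecond units, the class name, and the choice to skip calibration on the very first message are assumptions.

```python
class AcquisitionClock:
    """Local clock that is recalibrated when it drifts from the synchronous clock."""

    def __init__(self, threshold_us):
        self.threshold_us = threshold_us  # the "second threshold" (assumed microseconds)
        self.offset_us = 0                # correction added to the raw local counter
        self.last_local_us = None
        self.last_sync_us = None

    def now(self, raw_local_us):
        """Corrected local time for a raw counter value."""
        return raw_local_us + self.offset_us

    def on_exchange_info(self, raw_local_us, sync_us):
        """Process one exchange message; return True if the clock was calibrated."""
        local_us = self.now(raw_local_us)
        calibrated = False
        if self.last_local_us is not None:
            local_interval = local_us - self.last_local_us
            sync_interval = sync_us - self.last_sync_us
            if abs(local_interval - sync_interval) > self.threshold_us:
                # Drift exceeds the threshold: adopt the synchronous clock.
                self.offset_us = sync_us - raw_local_us
                local_us = sync_us
                calibrated = True
        self.last_local_us = local_us
        self.last_sync_us = sync_us
        return calibrated

clock = AcquisitionClock(threshold_us=50)
results = [
    clock.on_exchange_info(0, 1000),       # first message: nothing to compare yet
    clock.on_exchange_info(10000, 11000),  # intervals agree
    clock.on_exchange_info(20100, 21000),  # local interval is 100 us too long
]
```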
In one embodiment, the M-channel video acquisition terminal is connected to the multimedia switch through an HDMI interface, and the multimedia switch acquires pixel data in YUV format and encodes the pixel data into the video stream by using an encoder.
In one embodiment, the multimedia switch encapsulates the audio stream and the video stream, further comprising:
performing FLV format encapsulation and pushing the result to a server via the RTMP protocol.
In one embodiment, the multimedia switch comprises:
a processor module comprising an audio acquisition and output interface, a video acquisition and output interface, a network interface, and an external device interface;
an FPGA audio acquisition module connected to the processor module;
an audio codec module comprising an analog audio acquisition device interface, a digital audio acquisition device interface, and an audio output interface, and connected to the audio acquisition module;
an audio processing module coupled to the audio acquisition module.
Compared with the prior art, the embodiments of the application improve the mixing effect and avoid audio clipping.
In addition, the audio acquisition ends take turns uploading audio data to the switch in an orderly manner, and each audio acquisition end calibrates its local clock according to the exchange information. This reduces the transmission delay of the audio acquisition ends and avoids transmission conflicts among them, which in turn provides the basis for the multimedia switch and the audio acquisition ends to cooperate in speech enhancement at the acquisition end. Multi-channel audio signals reach the multimedia switch with high quality, the damage that noise-reduction filtering does to the original sound during later processing in the switch is reduced, and the multi-channel audio acquisition effect is improved. Compared with performing speech enhancement only at the audio acquisition end, the acquisition end here only needs to apply enhancement according to the calculation result received from the multimedia switch, while most of the computation is performed at the switch; this reduces the transmission delay of audio data, improves multi-channel audio acquisition, and saves hardware cost.
Audio and video are encapsulated in the switch, which ensures that they remain synchronized.
The present specification describes a number of technical features distributed across the various technical solutions; listing every possible combination of those features (i.e. every technical solution) would make the description excessively long. To avoid this, the technical features disclosed in the above summary of the invention, the technical features disclosed in the following embodiments and examples, and the technical features disclosed in the drawings may be freely combined with each other to constitute new technical solutions (which are considered to have been described in the present specification), unless such a combination is technically infeasible. For example, suppose one example discloses the features A + B + C and another discloses the features A + B + D + E, where C and D are equivalent technical means for the same purpose, so that only one of them can be used at a time, and where E can technically be combined with C. The solution A + B + C + D should then not be considered as described, because it is technically infeasible, whereas the solution A + B + C + E should be considered as described.
Drawings
FIG. 1 is a schematic diagram of the basic structure of a multimedia switch system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a multi-channel audio acquisition network architecture (hand-in-hand, i.e. daisy-chain, connection) according to an embodiment of the present application;
FIG. 3 is a schematic diagram of audio/video synthesis and editing according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the multimedia switch modules according to an embodiment of the present application.
Detailed Description
In the following description, numerous technical details are set forth in order to provide a better understanding of the present application. However, it will be understood by those skilled in the art that the technical solutions claimed in the present application may be implemented without these technical details and with various changes and modifications based on the following embodiments.
The present application relates to some of the following terms:
HDMI: High Definition Multimedia Interface
PCM: Pulse Code Modulation
AAC: Advanced Audio Coding
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The embodiment of the present application relates to a multimedia switch system, including:
the system comprises a multimedia switch and N paths of audio acquisition ends, wherein N is an integer greater than or equal to 2;
the audio acquisition end is configured to acquire an audio signal through the microphone.
The multimedia switch is configured to:
the audio signal is acquired from the audio acquisition terminal in the form of audio data frames.
Audio data frames whose Zn is smaller than a preset first threshold and whose Mn is smaller than a preset second threshold are deleted;
where, in the definitions of Zn and Mn, Sin denotes the audio signal, sgn is the sign function, j is the index of the audio acquisition end, i is the sample index within an audio data frame, H is the number of samples in an audio data frame, and η is a preset compensation factor.
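The definitions of Zn and Mn are not reproduced in this text. Given the sign function, the per-frame sample count H, and the compensation factor η, a plausible reading is that Zn is a short-time zero-crossing count and Mn a compensated short-time average magnitude; the Python sketch below is written under that assumption only and is not the patent's formula. All function names and threshold values are hypothetical.

```python
def sgn(x):
    """Sign function: 1 for non-negative samples, -1 otherwise."""
    return 1 if x >= 0 else -1

def zero_crossing_count(frame):
    """ASSUMED Zn: number of sign changes between neighbouring samples."""
    return sum(abs(sgn(frame[i]) - sgn(frame[i - 1])) for i in range(1, len(frame))) / 2

def average_magnitude(frame, eta=1.0):
    """ASSUMED Mn: mean absolute amplitude over the H samples, scaled by eta."""
    return eta * sum(abs(s) for s in frame) / len(frame)

def keep_frame(frame, z_threshold, m_threshold, eta=1.0):
    """A frame is deleted only when BOTH Zn and Mn fall below their thresholds."""
    return not (zero_crossing_count(frame) < z_threshold
                and average_magnitude(frame, eta) < m_threshold)

quiet = [0.01, 0.02, 0.01, 0.02]   # no sign changes, tiny amplitude
loud = [0.5, -0.6, 0.7, -0.4]      # three sign changes, large amplitude
kept = [f for f in (quiet, loud) if keep_frame(f, z_threshold=1, m_threshold=0.1)]
```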
Optionally, in an embodiment, the multimedia switch calculates, for each of a plurality of time slices, the average amplitude of the signal acquired on each audio channel, and takes the channel with the largest average amplitude as the output for that time slice; finally, the audio data of the time slices are concatenated and output as the mix. This step is performed by the FPGA module in the multimedia switch.
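The selection-based mixing described here can be sketched in a few lines of Python. The sketch assumes all channels are already time-aligned and equally long, and that a time slice is a fixed number of samples; in the described system the step runs on an FPGA, so this is only a behavioural model with hypothetical names.

```python
def average_amplitude(samples):
    """Mean absolute amplitude of one slice."""
    return sum(abs(s) for s in samples) / len(samples)

def mix_by_selection(channels, slice_len):
    """For each time slice, output the slice of the loudest channel.

    channels is a list of equally long sample lists, one per acquisition end.
    """
    total = len(channels[0])
    mixed = []
    for start in range(0, total, slice_len):
        slices = [ch[start:start + slice_len] for ch in channels]
        mixed.extend(max(slices, key=average_amplitude))
    return mixed

channels = [
    [0.1, 0.1, 0.9, 0.9],  # channel 0 is loudest in the second slice
    [0.5, 0.5, 0.2, 0.2],  # channel 1 is loudest in the first slice
]
mixed = mix_by_selection(channels, slice_len=2)
```

Because only one channel contributes to each slice, the output never sums several loud channels, which is consistent with the stated aim of avoiding clipping.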
Optionally, in an embodiment, the system further includes M video acquisition ends, where M is an integer greater than or equal to 1;
the video acquisition end is configured to acquire a video signal and transmit the video signal to the multimedia switch. The M video acquisition ends are connected to the multimedia switch through HDMI; the multimedia switch then acquires pixel data in YUV format and encodes the pixel data into a video stream using an encoder.
The audio acquisition end is configured to receive the exchange information sent by the multimedia switch and then perform the following operations:
sending an audio signal to the multimedia switch, calibrating the clock of the acquisition end according to the clock synchronization information in the exchange information, and performing speech enhancement according to the signal-to-noise ratio in the exchange information.
The multimedia switch is further configured to:
Firstly, in K time slices of the same operation period, send exchange information comprising clock synchronization information and a signal-to-noise ratio to the N audio acquisition ends respectively, the signal-to-noise ratio being estimated from the audio signals that the switch received from the same audio acquisition end in the previous period.
Secondly, receive the audio signals from the N audio acquisition ends, and perform speech enhancement and encoding to obtain audio streams. For example, the enhancement processing performed on the audio signal may include one or any combination of the following: noise reduction, echo cancellation, howling suppression, and automatic gain.
Thirdly, receive and encode the video signals from the M video acquisition ends to obtain video streams.
Fourthly, encapsulate the audio stream and the video stream, and timestamp both streams to ensure synchronization.
Optionally, in an embodiment, the specific synthesis and encapsulation steps are as follows, as shown in the audio/video synthesis and editing diagram of FIG. 3:
(1) Multi-channel video acquisition and encoding: first, a PC desktop screen signal is fed into the multimedia switch through an HDMI interface to obtain pixel data in YUV format, and the YUV data is encoded into an H264 video stream using the libx264 video encoder; second, camera video is fed into the multimedia switch through the Ethernet port as an H264-encoded RTSP video stream.
(2) Audio acquisition and encoding: first, an audio codec performs PCM encoding of the audio signal; second, the mixed and speech-enhanced audio is passed to the CPU module for AAC encoding.
(3) Finally, the resulting H264 video stream and AAC audio stream are encapsulated in FLV format and pushed to a server via the RTMP protocol. The audio data and the video data are each timestamped to keep audio and video synchronized in time.
When the CPU receives a video data frame or audio PCM data, it applies a timestamp as follows:
Video timestamp: pts = inc++ * (1000 / fps), where inc is a static counter with an initial value of 0 that is incremented by 1 each time a timestamp is generated, and fps is the frame rate.
Audio timestamp: pts = inc++ * (frame_size * 1000 / sample_rate), where frame_size is the frame length in samples and sample_rate is the sample rate.
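The two timestamp rules share one pattern: a counter that starts at 0 is multiplied by the duration of one frame in milliseconds. The Python sketch below models that pattern; the class and helper names are hypothetical, and the example values (25 fps, 1024-sample frames at 44100 Hz) are illustrative only. It uses floating-point durations, whereas a C implementation with integer operands would truncate.

```python
class TimestampCounter:
    """Generates pts = inc * step_ms, then increments inc (mirrors `inc++`)."""

    def __init__(self, step_ms):
        self.step_ms = step_ms  # duration of one frame in milliseconds
        self.inc = 0            # starts at 0, +1 per generated timestamp

    def next_pts(self):
        pts = self.inc * self.step_ms
        self.inc += 1
        return pts

def video_counter(fps):
    """Video frames are 1000 / fps milliseconds apart."""
    return TimestampCounter(1000 / fps)

def audio_counter(frame_size, sample_rate):
    """Audio frames last frame_size * 1000 / sample_rate milliseconds."""
    return TimestampCounter(frame_size * 1000 / sample_rate)

video = video_counter(fps=25)
video_pts = [video.next_pts() for _ in range(3)]   # 0, 40, 80 ms

audio = audio_counter(frame_size=1024, sample_rate=44100)
audio_pts = [audio.next_pts() for _ in range(2)]   # 0 ms, then about 23.22 ms
```

Since both streams count from the same zero point in milliseconds, a player can align them by comparing pts values directly.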
Optionally, in an embodiment, as shown in FIG. 2, the multi-channel audio acquisition steps are as follows:
Firstly, the multimedia switch sends exchange information at regular intervals: the multimedia switch assigns a port number to each audio acquisition end in the network, periodically generates exchange information carrying its local clock, and sends the exchange information to the corresponding audio acquisition ends in order.
Secondly, the N audio acquisition ends calculate the time difference between the acquisition-end clock and the synchronous clock from the interval between exchange information received in adjacent periods; when the time difference is greater than a set second threshold, the acquisition-end clock is calibrated to the synchronous clock. Optionally, the audio acquisition end calibrates its local clock according to the exchange information: each audio acquisition end extracts the clock message from the received exchange information, calculates the deviation between the synchronous clock and its local clock, and calibrates the local clock.
Thirdly, the audio acquisition end acquires and caches data: each audio acquisition end captures voice data through a high-sensitivity microphone pickup and, after AD conversion, caches the data in a local buffer.
Fourthly, the audio acquisition end sends data to the multimedia switch: when an audio acquisition end receives the exchange information sent by the multimedia switch, it loads the audio data in its buffer into an uplink data packet and sends the packet to the multimedia switch.
Optionally, in an embodiment, the audio acquisition end stores the acquired audio data in the input storage area of each audio acquisition end; when the amount of data stored in the input storage area reaches a set first threshold, the audio data is transferred from the input storage area to an output storage area; and the audio data in the output storage area is sent to the multimedia switch as the audio signal.
Fifthly, the multimedia switch caches the audio data: the multimedia switch provides an independent data buffer for each port and, after receiving the uplink data packets sent by the audio acquisition ends, stores each packet in the corresponding data buffer.
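The five steps above amount to a polling cycle: in each period the switch addresses the ports in order, each addressed end replies with its cached audio, and the switch files the reply in that port's buffer. The Python sketch below simulates one such period in a single process; the message fields, the integer clock, and the class names are hypothetical, and real transport, timing, and drift-threshold logic are omitted.

```python
from collections import deque

class AcquisitionEnd:
    """One audio acquisition end with a local cache and a clock."""

    def __init__(self, port):
        self.port = port
        self.cache = []
        self.clock = 0

    def capture(self, samples):
        """Cache samples after AD conversion (the conversion itself is not modelled)."""
        self.cache.extend(samples)

    def on_exchange_info(self, info):
        """Adopt the synchronous clock and reply with the cached audio."""
        self.clock = info["sync_clock"]
        payload, self.cache = self.cache, []
        return {"port": self.port, "audio": payload}

class MultimediaSwitch:
    """Polls every port once per period and keeps one buffer per port."""

    def __init__(self, ends):
        self.ends = {end.port: end for end in ends}
        self.buffers = {end.port: deque() for end in ends}
        self.snr = {end.port: None for end in ends}  # estimate from the previous period
        self.clock = 0

    def run_period(self):
        for port in sorted(self.ends):   # one time slice per port, in order
            self.clock += 1
            info = {"sync_clock": self.clock, "snr": self.snr[port]}
            packet = self.ends[port].on_exchange_info(info)
            self.buffers[packet["port"]].extend(packet["audio"])

ends = [AcquisitionEnd(1), AcquisitionEnd(2)]
ends[0].capture([1, 2])
ends[1].capture([3])
switch = MultimediaSwitch(ends)
switch.run_period()
```

Because an end transmits only when it is addressed, two ends never send at the same time, which is how the scheme avoids transmission conflicts.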
Optionally, in an embodiment, each audio acquisition end is responsible for acquiring, buffering, and sending the original audio signal; the multimedia switch is responsible for clock synchronization control of every audio acquisition end in the network and for receiving, caching, and processing audio and video data (this part is performed by the FPGA module in the multimedia switch). The multimedia switch and the audio acquisition ends are connected by 100M/1000M synchronous Ethernet.
Optionally, in an embodiment, the system applies ATM technology and establishes a channel for each audio acquisition end by time division multiplexing, so that every audio acquisition end in the network transmits data efficiently and in order. A clock network synchronization technique is also used to keep the clock frequencies of the audio acquisition ends and the receiving end consistent, which greatly reduces delay jitter and keeps data acquisition and transmission accurate and free of bit errors. The basic structure of the multimedia switch system is shown in FIG. 1.
Optionally, in an embodiment, the multi-channel audio acquisition network uses a hand-in-hand (daisy-chain) structure, as shown in FIG. 2. The multimedia switch and the audio acquisition ends are connected and communicate over a bus, the audio acquisition ends are chained to one another hand in hand, and multiple points access the multimedia switch simultaneously through a single end. The communication rate is 10M, clock synchronization is accurate, delay is low, and there are no bit errors.
Optionally, in one embodiment, the multimedia switch supports multiple stereo audio interfaces and multiple digital audio inputs over RJ45 network interfaces. Stereo input sources include line-in, gooseneck microphones, wireless microphones and the like; they are processed by an audio codec (mainly filtering, amplification, AD conversion and so on) and connected to the central processing unit (namely the programmable logic device, FPGA) through an I2S interface. The multi-channel digital microphone inputs pass through a physical layer transceiver (PHY) module and are then connected to the central processing unit through a Media Independent Interface (MII).
Optionally, in an embodiment, each audio acquisition end performs pre-enhancement on the front-end audio signal according to the signal-to-noise ratio of its audio signal fed back by the multimedia switch. Combined with the back-end speech enhancement in the multimedia switch, this better compensates for losses in the sound field and on the line, improves the quality of the original signal, reduces the damage that back-end noise-reduction filtering does to the original sound, and improves the speech enhancement effect. The specific steps are as follows:
Firstly, the multimedia switch estimates the signal-to-noise ratio of the original signal: from the data uploaded by each audio acquisition end, the multimedia switch estimates the signal-to-noise ratio of each original audio signal using a Wiener filtering method, and feeds the value back to the audio acquisition end through a downlink synchronization cell. This step is performed by an FPGA module in the multimedia switch.
Secondly, the audio acquisition end performs speech enhancement: it obtains the signal-to-noise ratio (S/N) from the synchronization cell sent by the multimedia switch and, when the signal-to-noise ratio is smaller than a preset reference value, increases the amplitude of the signal in a preset frequency band so as to raise the human-voice component S and thereby the signal-to-noise ratio.
Calculating the energy ratio D of the audio signal before and after speech enhancement:
where S(i) denotes the original signal of the i-th frame at the audio acquisition end, and So(i) denotes the signal output after the i-th frame has been transmitted to the multimedia switch and speech-enhanced.
If D is larger than the preset threshold, the energy change caused by the speech enhancement processing exceeds the threshold and the i-th frame is a silent segment; if D is smaller than the preset threshold, the energy change is below the threshold and the i-th frame is a speech segment.
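The frame classification described above can be sketched in Python as follows. The formula for D is not reproduced in this text, so the sketch assumes that D is the frame energy before enhancement divided by the frame energy after enhancement; this form, the function names, and the `eps` guard are illustrative assumptions rather than the formula of the application. The behaviour when D equals the threshold is also not stated, and the sketch treats that case as speech.

```python
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Sum of squared samples over one audio data frame."""
    return sum(x * x for x in frame)


def energy_ratio(original: Sequence[float], enhanced: Sequence[float],
                 eps: float = 1e-12) -> float:
    """Assumed form of D: energy before enhancement / energy after.

    eps is a hypothetical guard against division by zero.
    """
    return frame_energy(original) / (frame_energy(enhanced) + eps)


def classify_frames(originals: Sequence[Sequence[float]],
                    enhanceds: Sequence[Sequence[float]],
                    threshold: float) -> List[str]:
    """Label each frame 'silence' when D exceeds the threshold, else 'speech'."""
    labels = []
    for s, so in zip(originals, enhanceds):
        d = energy_ratio(s, so)
        labels.append("silence" if d > threshold else "speech")
    return labels
```

Under this assumption a frame that contains only noise loses most of its energy during enhancement, so its D is large, while a speech frame keeps most of its energy and its D stays close to 1.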
After the speech segments are determined, F frames of speech-segment data are taken as the analysis sample, and the signal-to-noise ratio is calculated based on the following formula:
where SNR represents the signal-to-noise ratio.
When the signal-to-noise ratio in the exchange information is smaller than a set third threshold, the N audio acquisition ends adjust the signal in the preset frequency band to a preset amplitude in subsequent acquisition.
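A minimal Python sketch of the feedback loop described above is given here for illustration. The SNR formula is not reproduced in this text, so `estimate_snr_db` assumes that the enhanced output So is the speech estimate and the residual S - So is the noise estimate, expressed in dB. The band adjustment is illustrated with a standard peaking biquad filter (Audio EQ Cookbook form), which is one possible way to raise a frequency band and is not stated in the application. The sample rate, centre frequency, gain, and Q values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def estimate_snr_db(originals: Sequence[Sequence[float]],
                    enhanceds: Sequence[Sequence[float]],
                    eps: float = 1e-12) -> float:
    """Assumed SNR over F speech frames: energy(So) / energy(S - So), in dB."""
    signal = 0.0
    noise = 0.0
    for s, so in zip(originals, enhanceds):
        for x, y in zip(s, so):
            signal += y * y
            noise += (x - y) ** 2
    return 10.0 * math.log10((signal + eps) / (noise + eps))


def peaking_coefficients(fs: float, f0: float, gain_db: float, q: float = 1.0
                         ) -> Tuple[Tuple[float, float, float], Tuple[float, float]]:
    """Normalized peaking-EQ biquad coefficients (b0, b1, b2), (a1, a2)."""
    a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / a
    b = ((1.0 + alpha * a) / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha * a) / a0)
    a_rest = (-2.0 * math.cos(w0) / a0, (1.0 - alpha / a) / a0)
    return b, a_rest


def boost_band(samples: Sequence[float], fs: float, f0: float,
               gain_db: float, q: float = 1.0) -> List[float]:
    """Apply the peaking filter sample by sample (direct form I)."""
    (b0, b1, b2), (a1, a2) = peaking_coefficients(fs, f0, gain_db, q)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def pre_enhance(samples: Sequence[float], snr_db: float, snr_threshold_db: float,
                fs: float = 16000.0, f0: float = 1000.0,
                gain_db: float = 6.0) -> List[float]:
    """Boost the preset band only when the fed-back SNR is below the threshold."""
    if snr_db < snr_threshold_db:
        return boost_band(samples, fs, f0, gain_db)
    return list(samples)
```

In this sketch the switch side would call `estimate_snr_db` on the F speech frames and send the result in the exchange information, and each acquisition end would call `pre_enhance` on subsequently acquired frames.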
Thirdly, the multimedia switch performs speech enhancement: noise reduction, echo cancellation, howling suppression, and automatic gain processing are applied to the mixed audio data to achieve the speech enhancement effect (general-purpose methods, performed by a DSP module in the multimedia switch). The processed audio output can be used for audio and video synthesis and editing or for local sound amplification.
Optionally, in an embodiment, the multimedia switch includes:
a processor module comprising an audio acquisition and output interface, a video acquisition and output interface, a network interface, and an external device interface;
an FPGA audio acquisition module connected with the processor module;
an audio compiling module comprising an analog audio acquisition device interface, a digital audio acquisition device interface, and an audio output interface, and connected with the audio acquisition module;
and an audio processing module connected with the audio acquisition module.
Optionally, in an embodiment, the multimedia switch includes a CPU processor, a DSP chip, an FPGA chip, an audio compiling module, various data interfaces, and other main modules, so as to implement multi-channel audio and video acquisition, audio mixing, speech enhancement, and audio and video synthesis processing. A multimedia switch block diagram is shown in figure 4.
It is noted that, in the present patent application, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of another identical element in the process, method, article, or apparatus that comprises the element. In the present patent application, a statement that an action is executed according to a certain element means that the action is executed according to at least that element, which covers two cases: performing the action based only on the element, and performing the action based on the element and other elements. Expressions such as "a plurality of," "multiple times," and "multiple kinds" cover 2 and more than 2 items, times, or kinds.
All documents mentioned in this application are considered to be incorporated in their entirety into the disclosure of this application, so that they can serve as a basis for modification where necessary. It should be understood that the above description covers only preferred embodiments of the present disclosure and is not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of one or more embodiments of the present disclosure should be included in the scope of protection of one or more embodiments of the present disclosure.
Claims (9)
1. A multimedia switch system, comprising:
a multimedia switch and N audio acquisition ends, wherein N is an integer greater than or equal to 2;
the audio acquisition end is configured to acquire an audio signal through a microphone;
the multimedia switch is configured to:
acquiring the audio signal from the audio acquisition end in the form of an audio data frame, and performing voice enhancement and coding to obtain an audio stream;
calculating, for each audio data frame, an energy ratio D of the audio signal before and after speech enhancement, and calculating a signal-to-noise ratio from the audio data frames whose D value is smaller than a preset threshold;
the energy ratio D of the audio signal before and after speech enhancement is calculated as follows:
wherein S(i) represents the original signal of the i-th frame at the audio acquisition end, So(i) represents the signal output after the i-th frame is transmitted to the multimedia switch and speech-enhanced, and H is the number of samples in an audio data frame;
if D is larger than the preset threshold, the energy change caused by the speech enhancement processing exceeds the threshold and the i-th frame is a silent segment; if D is smaller than the preset threshold, the energy change is below the threshold and the i-th frame is a speech segment;
after the speech segments are determined, F frames of speech-segment data are taken as the analysis sample and the signal-to-noise ratio is calculated based on the following formula:
wherein SNR represents a signal-to-noise ratio;
sending exchange information including the signal-to-noise ratio to the N audio acquisition ends, wherein the signal-to-noise ratio is estimated according to the audio signals from the same audio acquisition end received by the multimedia switch in the previous period;
the audio collection end is configured to:
receiving the exchange information sent by the multimedia exchange, and if the signal-to-noise ratio in the exchange information is smaller than a preset threshold, performing amplitude increase adjustment on a signal in a preset frequency band to increase a signal of a voice component;
and sending the audio signal to the multimedia switch.
2. The multimedia switch system of claim 1, further comprising M video acquisition ends, wherein M is an integer greater than or equal to 1;
the multimedia switch is further configured to:
sending, in K time slices of the same operation period, exchange information comprising clock synchronization information and the signal-to-noise ratio to the N audio acquisition ends respectively;
receiving and encoding the video signals from the M video acquisition ends to obtain a video stream;
encapsulating the audio stream and the video stream and time-stamping them to ensure synchronization;
the video acquisition end is configured to acquire the video signal and transmit the video signal to the multimedia switch.
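As a non-limiting illustration of the time-stamping recited in claim 2, the following Python sketch stamps audio and video frames on a common millisecond timeline and interleaves them by timestamp before encapsulation. The frame size, sample rate, frame rate, and the `Packet` layout are hypothetical values chosen for the example and are not taken from the application.

```python
import heapq
from typing import Iterator, List, Tuple

# (timestamp in ms, stream kind, frame index); a hypothetical packet layout
Packet = Tuple[int, str, int]


def audio_packets(count: int, samples_per_frame: int = 320,
                  sample_rate: int = 16000) -> Iterator[Packet]:
    """Stamp each audio frame with its start time in milliseconds."""
    for i in range(count):
        yield (i * samples_per_frame * 1000 // sample_rate, "audio", i)


def video_packets(count: int, fps: int = 25) -> Iterator[Packet]:
    """Stamp each video frame with its start time in milliseconds."""
    for i in range(count):
        yield (i * 1000 // fps, "video", i)


def interleave(audio: Iterator[Packet], video: Iterator[Packet]) -> List[Packet]:
    """Merge two timestamp-ordered streams into one timestamp-ordered list."""
    return list(heapq.merge(audio, video, key=lambda p: p[0]))
```

Because both streams are stamped against the same clock, a player can align them on playback even if encoding delays differ between audio and video.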
3. The multimedia switch system of claim 1, wherein the multimedia switch is further configured to:
deleting audio data frames for which Zn is smaller than a preset first threshold and Mn is smaller than a preset second threshold, wherein:
wherein Sin represents the audio signal, sgn is the sign function, j is the audio acquisition end number, i is the sample index within an audio data frame, H is the number of samples in an audio data frame, and η is a preset compensation factor.
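The frame deletion of claim 3 can be illustrated with the Python sketch below. The formulas for Zn and Mn are not reproduced in this text; the sketch assumes the conventional short-time zero-crossing count for Zn and the mean absolute amplitude for Mn. The placement of the compensation factor η cannot be recovered from this text, so η is left out of the sketch. All function names are hypothetical.

```python
from typing import List, Sequence


def sgn(x: float) -> int:
    """Sign function as commonly used for zero-crossing counts."""
    return 1 if x >= 0 else -1


def zero_crossing_count(frame: Sequence[float]) -> int:
    """Assumed form of Zn: number of sign changes between adjacent samples."""
    return sum(abs(sgn(frame[i]) - sgn(frame[i - 1]))
               for i in range(1, len(frame))) // 2


def mean_magnitude(frame: Sequence[float]) -> float:
    """Assumed form of Mn: mean absolute amplitude over the H samples (H >= 1)."""
    return sum(abs(x) for x in frame) / len(frame)


def drop_quiet_frames(frames: Sequence[Sequence[float]],
                      z_threshold: int, m_threshold: float) -> List[Sequence[float]]:
    """Delete frames whose Zn and Mn are both below their thresholds."""
    return [f for f in frames
            if not (zero_crossing_count(f) < z_threshold
                    and mean_magnitude(f) < m_threshold)]
```

A frame is deleted only when both measures are low, so a quiet frame with many zero crossings, such as an unvoiced consonant, is kept.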
4. The multimedia switch system of claim 1, wherein the multimedia switch performs enhancement processing on the received audio signal comprising one or any combination of: noise reduction, echo cancellation, howling suppression, and automatic gain.
5. The multimedia switch system of claim 1, wherein the operations of the N audio acquisition ends further comprise:
storing the acquired audio data in an input storage area of each audio acquisition end;
when the amount of data stored in the input storage area reaches a set first threshold, transferring the audio data in the input storage area to an output storage area;
and sending the audio data in the output storage area to the multimedia switch as the audio signal.
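The two storage areas of claim 5 can be sketched in Python as follows. The class name, the `send` callback, and the use of Python lists for the storage areas are assumptions made for illustration; a real acquisition end would likely use fixed memory buffers.

```python
from typing import Callable, List, Sequence


class AcquisitionBuffer:
    """Input/output storage areas of one audio acquisition end (illustrative)."""

    def __init__(self, first_threshold: int,
                 send: Callable[[List[int]], None]) -> None:
        self.first_threshold = first_threshold  # amount that triggers a transfer
        self.send = send                        # hypothetical uplink to the switch
        self.input_area: List[int] = []
        self.output_area: List[int] = []

    def store(self, samples: Sequence[int]) -> None:
        """Store acquired samples; transfer and send each time the threshold is reached."""
        self.input_area.extend(samples)
        while len(self.input_area) >= self.first_threshold:
            # Move one threshold-sized block to the output storage area.
            self.output_area = self.input_area[:self.first_threshold]
            del self.input_area[:self.first_threshold]
            self.send(self.output_area)
```

Separating the two areas lets acquisition continue into the input storage area while the previous block is being sent from the output storage area.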
6. The multimedia switch system of claim 2, wherein each of the N audio acquisition ends calculates a time difference between its own clock and the synchronous clock according to the time interval between receipts of the exchange information in adjacent periods;
and when the time difference is larger than a set second threshold, calibrates its own clock to the synchronous clock.
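The clock calibration of claim 6 can be illustrated with a simplified Python sketch. It assumes that the acquisition end knows the nominal period at which the multimedia switch sends the exchange information and that network jitter is negligible; neither assumption is stated in the application. Time is counted in microseconds and all names are hypothetical.

```python
from typing import Optional


class AcquisitionClock:
    """Local clock of one acquisition end, calibrated from exchange information."""

    def __init__(self, nominal_period_us: int, second_threshold_us: int) -> None:
        self.nominal_period_us = nominal_period_us      # assumed known period
        self.second_threshold_us = second_threshold_us  # calibration threshold
        self.last_rx_local_us: Optional[int] = None
        self.correction_us = 0

    def on_exchange_info(self, local_now_us: int) -> int:
        """Return the time difference measured over the last period."""
        if self.last_rx_local_us is None:
            self.last_rx_local_us = local_now_us
            return 0
        interval = local_now_us - self.last_rx_local_us
        self.last_rx_local_us = local_now_us
        difference = interval - self.nominal_period_us
        if abs(difference) > self.second_threshold_us:
            # The local clock drifted by `difference` relative to the switch.
            self.correction_us -= difference
        return difference

    def synchronized_time_us(self, local_now_us: int) -> int:
        """Local time after applying the accumulated calibration."""
        return local_now_us + self.correction_us
```

Small differences below the threshold are tolerated, so the clock is only adjusted when the drift over one period is significant.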
7. The multimedia switch system of claim 2, wherein the M video acquisition ends access the multimedia switch via an HDMI interface, and the multimedia switch obtains pixel data in YUV format and encodes the pixel data into the video stream using an encoder.
8. The multimedia switch system of claim 2, wherein the multimedia switch encapsulates the audio stream and the video stream, further comprising:
performing FLV format encapsulation and pushing the result to a server through the RTMP protocol.
9. The multimedia switch system of claim 2, wherein the multimedia switch comprises:
a processor module comprising an audio acquisition and output interface, a video acquisition and output interface, a network interface, and an external device interface;
an FPGA audio acquisition module connected with the processor module;
an audio compiling module comprising an analog audio acquisition device interface, a digital audio acquisition device interface, and an audio output interface, and connected with the audio acquisition module;
and an audio processing module coupled with the audio acquisition module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110508270.3A CN112929731B (en) | 2021-05-11 | 2021-05-11 | Multimedia switch system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112929731A (en) | 2021-06-08 |
CN112929731B (en) | 2021-07-30 |
Family
ID=76174837
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110508270.3A Active CN112929731B (en) | 2021-05-11 | 2021-05-11 | Multimedia switch system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112929731B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103826084A (en) * | 2014-02-17 | 2014-05-28 | 宁波公众信息产业有限公司 | Audio encoding method |
CN105812721A (en) * | 2014-12-30 | 2016-07-27 | 浙江大华技术股份有限公司 | Tracking monitoring method and tracking monitoring device |
WO2016150320A1 (en) * | 2015-03-25 | 2016-09-29 | 中兴通讯股份有限公司 | Method and device for sending audio |
CN107888567A (en) * | 2017-10-23 | 2018-04-06 | 浙江大华技术股份有限公司 | A kind of transmission method and device of compound multi-media signal |
CN110473567A (en) * | 2019-09-06 | 2019-11-19 | 上海又为智能科技有限公司 | Audio-frequency processing method, device and storage medium based on deep neural network |
CN112071132A (en) * | 2020-09-03 | 2020-12-11 | 北京竞业达数码科技股份有限公司 | Audio and video teaching equipment and intelligent teaching system |
Also Published As
Publication number | Publication date |
---|---|
CN112929731A (en) | 2021-06-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108206833B (en) | Audio and video data transmission method and system | |
US7243150B2 (en) | Reducing the access delay for transmitting processed data over transmission data | |
US8665370B2 (en) | Method for synchronized playback of wireless audio and video and playback system using the same | |
JP4184397B2 (en) | VIDEO / AUDIO PROCESSING SYSTEM AND ITS CONTROL METHOD, AUDIO PROCESSING SYSTEM, VIDEO / AUDIO PROCESSING SYSTEM CONTROL PROGRAM, AND RECORDING MEDIUM CONTAINING THE PROGRAM | |
US9055332B2 (en) | Lip synchronization in a video conference | |
US20220038769A1 (en) | Synchronizing bluetooth data capture to data playback | |
CN105306110B (en) | A kind of method and system realized synchronous music and played | |
CN101604987A (en) | The low latency, high quality link that is used for audio transmission | |
JP2011505743A (en) | Playback delay estimation | |
CN113055312B (en) | Multichannel audio pickup method and system based on synchronous Ethernet | |
CN101272200B (en) | Multimedia stream synchronization caching method and system | |
CN113645485A (en) | Method and device for realizing conversion from any streaming media protocol to NDI (network data interface) | |
CN109040818B (en) | Audio and video synchronization method, storage medium, electronic equipment and system during live broadcasting | |
CN112929731B (en) | Multimedia switch system | |
CN103826084A (en) | Audio encoding method | |
CN101453286B (en) | Method for digital audio multiplex transmission in multimedia broadcasting system | |
US20210075533A1 (en) | Timing improvement for cognitive loudspeaker system | |
JP4218456B2 (en) | Call device, call method, and call system | |
WO2021255327A1 (en) | Managing network jitter for multiple audio streams | |
CN103474076A (en) | Method and device for transmitting aligned multichannel audio frequency | |
CN111726669B (en) | Distributed decoding equipment and audio and video synchronization method thereof | |
CN102404546A (en) | Conference audio system | |
Tatlas et al. | An Error–Concealment Technique for Wireless Digital Audio Delivery | |
WO2005020580A1 (en) | Apparatus and method for converting media stream for multimedia service in dab system | |
Tatlas et al. | Wireless digital audio delivery analysis and evaluation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20240816; Address after: Baiyun District of Guangzhou City, Guangdong province 510540 North Road No. 1633 is private science and Technology Park Branch Road, No. 1; Patentee after: Guangzhou Blue Pigeon Software Co.,Ltd.; Country or region after: China; Address before: No. 1968, Nanxi East Road, Nanhu District, Jiaxing City, Zhejiang Province; Patentee before: ZHEJIANG LANCOO TECHNOLOGY Co.,Ltd.; Country or region before: China |