
CN112929731B - Multimedia switch system - Google Patents


Info

Publication number
CN112929731B
Authority
CN
China
Prior art keywords
audio
signal
multimedia switch
multimedia
video
Prior art date
Legal status
Active
Application number
CN202110508270.3A
Other languages
Chinese (zh)
Other versions
CN112929731A (en)
Inventor
张新华
陈华锋
李兵
Current Assignee
Guangzhou Blue Pigeon Software Co ltd
Original Assignee
Zhejiang Lancoo Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Lancoo Technology Co ltd
Priority to CN202110508270.3A
Publication of CN112929731A
Application granted
Publication of CN112929731B
Status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4398 Processing of audio elementary streams involving reformatting operations of audio signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K15/00 Acoustics not otherwise provided for
    • G10K15/08 Arrangements for producing a reverberation or echo sound
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/167 Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4305 Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440218 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/643 Communication protocols
    • H04N21/6437 Real-time Transport Protocol [RTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Telephone Function (AREA)

Abstract

The application relates to the field of multimedia switches and discloses a multimedia switch system comprising a multimedia switch and N audio acquisition ends. The multimedia switch is configured to: calculate, for each audio data frame, the energy ratio D of the audio signal before and after speech enhancement, and calculate a signal-to-noise ratio from the audio data frames whose D value is smaller than a preset threshold; and send exchange information including the signal-to-noise ratio to the N audio acquisition ends, the signal-to-noise ratio being estimated from the audio signals received by the switch from the same audio acquisition end in the previous period. Each audio acquisition end is configured to: receive the exchange information sent by the multimedia switch and, if the signal-to-noise ratio in the exchange information is smaller than a preset threshold, increase the amplitude of the signal in a preset frequency band so as to strengthen the speech component; and send the audio signal to the multimedia switch. The multimedia switch system reduces the transmission delay of audio data, improves multi-channel audio acquisition, and saves hardware cost.

Description

Multimedia switch system
Technical Field
The application relates to the field of multimedia switches, in particular to a multimedia switch system.
Background
Realizing teaching scenarios such as routine multimedia teaching, recorded-broadcast teaching, online classes, and remote interactive teaching in a traditional classroom requires installing a large amount of equipment; the resulting system structure is complex and makes centralized processing and management of the various kinds of classroom data difficult.
In addition, existing multi-channel audio acquisition and transmission schemes have drawbacks. One approach captures audio with analog microphones and transmits it over audio cables. Although its transmission delay is low, it suffers from differing acquisition delays across channels and rapid attenuation of the audio signal over distance, which hinders subsequent processing such as multi-channel mixing, enhancement, and audio-video synthesis. Another approach captures audio with digital microphones and transmits it over Ethernet. Although the transmission distance is long and cabling is simple, Ethernet transmission suffers from network congestion and large delay jitter, making it unsuitable for applications with strict real-time requirements such as local sound reinforcement.
Disclosure of Invention
The application provides a multimedia switch system. Its first purpose is to improve the mixing of multiple channels and avoid clipping ("sound breaking").
Its second purpose is to solve the problems of network congestion, high noise, and delay jitter in multi-channel audio transmission, effectively improving the transmission efficiency and audio quality of audio data and reducing transmission delay.
The application provides a multimedia switch system, including:
a multimedia switch and N audio acquisition ends, where N is an integer greater than or equal to 2;
the audio acquisition end is configured to acquire an audio signal through a microphone;
the multimedia switch is configured to:
acquire the audio signal from the audio acquisition end in the form of audio data frames, and perform speech enhancement and encoding to obtain an audio stream;
calculate, for each audio data frame, the energy ratio D of the audio signal before and after speech enhancement, and calculate a signal-to-noise ratio from the audio data frames whose D value is smaller than a preset threshold;
send exchange information including the signal-to-noise ratio to the N audio acquisition ends, where the signal-to-noise ratio is estimated from the audio signals received by the multimedia switch from the same audio acquisition end in the previous period;
the audio acquisition end is configured to:
receive the exchange information sent by the multimedia switch and, if the signal-to-noise ratio in the exchange information is smaller than a preset threshold, increase the amplitude of the signal in a preset frequency band so as to strengthen the speech component;
and send the audio signal to the multimedia switch.
In one embodiment, the system further comprises M video acquisition ends, where M is an integer greater than or equal to 1;
the multimedia switch is further configured to:
send exchange information comprising clock synchronization information and the signal-to-noise ratio to each of the N audio acquisition ends in K time slices of the same operating period;
receive and encode the video signals from the M video acquisition ends to obtain a video stream;
encapsulate the audio stream and the video stream and timestamp them to ensure synchronization;
the video acquisition end is configured to acquire the video signal and transmit it to the multimedia switch.
In one embodiment, the multimedia switch is configured to:
delete audio data frames whose Zn is smaller than a preset first threshold and whose Mn is smaller than a preset second threshold, where:
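[Equation image not available in this copy: definition of Zn]
[Equation image not available in this copy: definition of Mn]
and output a mixed sound signal:
[Equation image not available in this copy: mixed output signal]
where
[Equation image not available in this copy]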
where Sin denotes the audio signal, sgn is the sign function, j is the index of the audio acquisition end, i is the sample index within an audio data frame, H is the number of samples in an audio data frame, and η is a preset compensation factor.
In one embodiment, the energy ratio D of the audio signal before and after speech enhancement is calculated as:
[Equation image not available in this copy: definition of D]
where s(i) denotes the original signal of the i-th frame at the audio acquisition end, and so(i) denotes the signal of the i-th frame output after it is transmitted to the multimedia switch and speech-enhanced;
if D is greater than a preset threshold, the energy change before and after speech enhancement exceeds the threshold and the i-th frame is a silent segment; if D is smaller than the preset threshold, the energy change is below the threshold and the i-th frame is a speech segment;
after the speech segments are determined, F frames of speech-segment data are taken as an analysis sample and the signal-to-noise ratio is calculated with the following formula:
[Equation image not available in this copy: definition of SNR]
where SNR denotes the signal-to-noise ratio.
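The formulas for D and the SNR survive only as images in this copy, so the sketch below is a hedged Python illustration rather than the patent's own definitions. It assumes (1) D is the energy of the original frame s(i) divided by the energy of the enhanced frame so(i), so a frame that loses most of its energy during enhancement (mostly noise, i.e. silence) gets a large D, and (2) the SNR over the first F speech frames is the energy of the enhanced signal over the energy of the removed component s - so, in decibels. Function names and parameters are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Frame = Sequence[float]


def energy(x: Frame) -> float:
    """Sum of squared samples of one frame."""
    return sum(v * v for v in x)


def energy_ratio(s: Frame, so: Frame, eps: float = 1e-12) -> float:
    """ASSUMED form of D: energy before enhancement / energy after enhancement."""
    return energy(s) / (energy(so) + eps)


def is_speech_frame(s: Frame, so: Frame, d_threshold: float) -> bool:
    """A frame counts as speech when D is below the preset threshold."""
    return energy_ratio(s, so) < d_threshold


def estimate_snr_db(
    frames: List[Tuple[Frame, Frame]],
    d_threshold: float,
    f_frames: int,
    eps: float = 1e-12,
) -> Optional[float]:
    """ASSUMED SNR: enhanced-signal energy over removed-component energy, in dB,
    computed over the first F frames classified as speech."""
    speech = [(s, so) for s, so in frames if is_speech_frame(s, so, d_threshold)]
    speech = speech[:f_frames]
    if not speech:
        return None  # no speech frames available for analysis
    signal = sum(energy(so) for _, so in speech)
    noise = sum(energy([a - b for a, b in zip(s, so)]) for s, so in speech)
    return 10.0 * math.log10((signal + eps) / (noise + eps))
```

Under these assumptions, a frame whose enhanced output keeps almost all of its energy has D close to 1 and is analysed, while a frame reduced to about 1% of its energy has D around 100 and is skipped as silence.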
In one embodiment, the enhancement processing performed by the multimedia switch on the received audio signal comprises noise reduction, echo cancellation, howling suppression, and automatic gain control.
In one embodiment, each of the N audio acquisition ends is further configured to:
store the acquired audio data in its own input storage area;
when the amount of data in the input storage area reaches a set first threshold, transfer the audio data in the input storage area to an output storage area;
and send the audio data in the output storage area to the multimedia switch as the audio signal.
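The input and output storage areas behave like a double (ping-pong) buffer. The Python sketch below models one acquisition end under the assumption that data move to the output area in blocks of exactly the first-threshold size; the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional


class AcquisitionBuffer:
    """Hypothetical model of one acquisition end's input/output storage areas."""

    def __init__(self, first_threshold: int) -> None:
        self.first_threshold = first_threshold
        self.input_area: List[int] = []
        self.output_area: Deque[List[int]] = deque()

    def store(self, samples: Iterable[int]) -> None:
        """Append newly acquired samples; move full blocks to the output area."""
        self.input_area.extend(samples)
        # ASSUMPTION: transfer in fixed blocks of first_threshold samples.
        while len(self.input_area) >= self.first_threshold:
            block = self.input_area[: self.first_threshold]
            del self.input_area[: self.first_threshold]
            self.output_area.append(block)

    def next_block(self) -> Optional[List[int]]:
        """Block to send to the multimedia switch, or None if nothing is ready."""
        return self.output_area.popleft() if self.output_area else None
```

Separating the two areas lets acquisition keep writing into the input area while a completed block waits in the output area for the next transmission opportunity.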
In one embodiment, the N audio acquisition ends calculate the time difference between the acquisition-end clock and the synchronous clock from the time interval between receptions of exchange information in adjacent periods;
and when the time difference is greater than a set second threshold, the acquisition-end clock is calibrated to the synchronous clock.
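One possible reading of this step, sketched in Python as an assumption and not as the patent's exact method: the switch sends exchange information at a known nominal period; the acquisition end measures the interval between consecutive receptions with its own clock, accumulates the deviation from the nominal period, and once that accumulated deviation exceeds the second threshold it snaps its local clock to the synchronous clock value carried in the exchange information. All names and the microsecond units are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClockCalibrator:
    nominal_period_us: int  # ASSUMED known period of the exchange information
    second_threshold_us: int  # the "set second threshold"
    offset_us: int = 0  # correction added to the raw local clock
    last_rx_local_us: Optional[int] = None
    accumulated_error_us: int = 0

    def local_time(self, raw_local_us: int) -> int:
        return raw_local_us + self.offset_us

    def on_exchange_info(self, raw_local_us: int, sync_clock_us: int) -> bool:
        """Process one reception; return True if the local clock was calibrated."""
        now = self.local_time(raw_local_us)
        if self.last_rx_local_us is not None:
            interval = now - self.last_rx_local_us
            self.accumulated_error_us += interval - self.nominal_period_us
        calibrated = abs(self.accumulated_error_us) > self.second_threshold_us
        if calibrated:
            # Snap the local clock to the synchronous clock and restart tracking.
            self.offset_us += sync_clock_us - now
            now = sync_clock_us
            self.accumulated_error_us = 0
        self.last_rx_local_us = now
        return calibrated
```

For example, with a 1000 us period, a 5 us threshold, and a local clock running 2 us fast per period, the fourth reception triggers a -6 us correction.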
In one embodiment, the M video acquisition ends are connected to the multimedia switch through HDMI interfaces, and the multimedia switch acquires pixel data in YUV format and encodes it into the video stream using an encoder.
In one embodiment, encapsulating the audio stream and the video stream by the multimedia switch further comprises:
performing FLV encapsulation and pushing the result to a server via the RTMP protocol.
In one embodiment, the multimedia switch comprises:
a processor module, comprising an audio acquisition and output interface, a video acquisition and output interface, a network interface, and an external device interface;
an FPGA audio acquisition module connected to the processor module;
an audio codec module, comprising an analog audio acquisition device interface, a digital audio acquisition device interface, and an audio output interface, and connected to the audio acquisition module;
and an audio processing module connected to the audio acquisition module.
Compared with the prior art, the embodiments of the application improve the mixing result and avoid clipping.
In addition, the audio acquisition ends take turns uploading audio data to the switch in an orderly manner, and each acquisition end calibrates its local clock according to the exchange information. This reduces the transmission delay of the acquisition ends and avoids transmission conflicts among them, and it provides the basis for the multimedia switch and the acquisition ends to cooperate in performing speech enhancement at the acquisition end. Multi-channel audio signals are thus delivered to the multimedia switch with high quality, the loss of original sound caused by noise-reduction filtering in the switch's later processing is reduced, and multi-channel audio acquisition is improved. Compared with performing speech enhancement solely at the audio acquisition end, letting the acquisition ends and the switch cooperate means that each acquisition end only needs to apply enhancement according to the calculation result received from the multimedia switch, while most of the computation for speech enhancement is carried out at the switch. This reduces the transmission delay of audio data, improves multi-channel audio acquisition, and saves hardware cost.
Furthermore, audio and video are encapsulated in the switch, which ensures audio-video synchronization.
This specification describes many technical features distributed across the various technical solutions, and listing every possible combination of these features would make the description excessively long. To avoid this, the technical features disclosed in the above summary, in the following embodiments and examples, and in the drawings may be freely combined with one another to form new technical solutions (all of which are considered to be described in this specification), unless such a combination is technically infeasible. For example, suppose one example discloses features A + B + C and another discloses A + B + D + E, where C and D are equivalent means for the same purpose so that only one of them can be used at a time, and E can technically be combined with C. Then the solution A + B + C + D should not be considered described, because it is technically infeasible, whereas A + B + C + E should be considered described.
Drawings
Fig. 1 is a schematic diagram of the basic structure of a multimedia switch system according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a multi-channel audio acquisition network architecture (using a hand-in-hand, i.e. daisy-chain, connection) according to an embodiment of the present application;
Fig. 3 is a schematic diagram of audio-video synthesis and editing according to an embodiment of the present application;
Fig. 4 is a schematic diagram of the modules of a multimedia switch according to an embodiment of the present application.
Detailed Description
In the following description, numerous technical details are set forth in order to provide a better understanding of the present application. However, it will be understood by those skilled in the art that the technical solutions claimed in the present application may be implemented without these technical details and with various changes and modifications based on the following embodiments.
The following terms are used in the present application:
HDMI: High-Definition Multimedia Interface
PCM: Pulse Code Modulation
AAC: Advanced Audio Coding
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
An embodiment of the present application relates to a multimedia switch system, including:
a multimedia switch and N audio acquisition ends, where N is an integer greater than or equal to 2;
the audio acquisition end is configured to acquire an audio signal through a microphone.
The multimedia switch is configured to:
acquire the audio signal from the audio acquisition end in the form of audio data frames;
delete audio data frames whose Zn is smaller than a preset first threshold and whose Mn is smaller than a preset second threshold, where:
[Equation image not available in this copy: definition of Zn]
[Equation image not available in this copy: definition of Mn]
and output a mixed sound signal:
[Equation image not available in this copy: mixed output signal]
where
[Equation image not available in this copy]
where Sin denotes the audio signal, sgn is the sign function, j is the index of the audio acquisition end, i is the sample index within an audio data frame, H is the number of samples in an audio data frame, and η is a preset compensation factor.
Optionally, in one embodiment, the multimedia switch calculates, for each of a plurality of time slices, the average amplitude of the signal acquired on each audio channel, takes the channel with the largest average amplitude as the output for that time slice, and finally combines the audio data of all the time slices into the mixed output. This step is performed by the FPGA module in the multimedia switch.
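A minimal Python sketch of this selection-based mixing, assuming the channels are already time-aligned and equally long. It runs in software for illustration only (the patent performs the step on the FPGA), and the function name is hypothetical.

```python
from typing import List, Sequence


def select_mix(channels: Sequence[Sequence[float]], slice_len: int) -> List[float]:
    """Per time slice, output the channel with the largest mean absolute amplitude."""
    if not channels:
        return []
    n = len(channels[0])
    if any(len(ch) != n for ch in channels):
        raise ValueError("all channels must have the same length")
    out: List[float] = []
    for start in range(0, n, slice_len):
        end = min(start + slice_len, n)  # the last slice may be shorter

        def mean_abs(ch: Sequence[float]) -> float:
            return sum(abs(v) for v in ch[start:end]) / (end - start)

        best = max(channels, key=mean_abs)  # ties go to the lower-numbered channel
        out.extend(best[start:end])
    return out
```

Because each slice copies one channel verbatim instead of summing channels, the output can never exceed the amplitude of any single input.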
Optionally, in one embodiment, the system further includes M video acquisition ends, where M is an integer greater than or equal to 1;
the video acquisition end is configured to acquire a video signal and transmit it to the multimedia switch. The M video acquisition ends are connected to the multimedia switch through HDMI, and the multimedia switch then acquires pixel data in YUV format and encodes it into a video stream using an encoder.
The audio acquisition end is configured to receive the exchange information sent by the multimedia switch and then perform the following operations:
send the audio signal to the multimedia switch, calibrate the acquisition-end clock according to the clock synchronization information in the exchange information, and perform speech enhancement according to the signal-to-noise ratio in the exchange information.
The multimedia switch is further configured to:
First, in K time slices of the same operating period, send exchange information comprising clock synchronization information and the signal-to-noise ratio to each of the N audio acquisition ends, the signal-to-noise ratio being estimated from the audio signals received by the switch from the same audio acquisition end in the previous period.
Second, receive the audio signals from the N audio acquisition ends and perform speech enhancement and encoding to obtain an audio stream. For example, the enhancement applied to the audio signal may include any one or any combination of the following: noise reduction, echo cancellation, howling suppression, and automatic gain control.
Third, receive and encode the video signals from the M video acquisition ends to obtain a video stream.
Fourth, encapsulate the audio stream and the video stream, and timestamp them to ensure synchronization.
Optionally, in one embodiment, the synthesis and encapsulation steps are as follows (see the audio-video synthesis and editing schematic in Fig. 3):
(1) Multi-channel video acquisition and encoding: first, the PC desktop screen signal is fed into the multimedia switch through an HDMI interface to obtain pixel data in YUV format, and the YUV data is encoded into an H.264 video stream with the libx264 video encoder; second, camera video is fed into the multimedia switch through an Ethernet port, where it is obtained as an H.264-encoded RTSP video stream.
(2) Audio acquisition and encoding: first, the audio codec PCM-encodes the audio signal; second, the mixed and speech-enhanced audio is sent to the CPU module for AAC encoding.
(3) Finally, the resulting H.264 video stream and AAC audio stream are encapsulated in FLV format and pushed to the server via the RTMP protocol. Timestamps are applied separately to the audio data and the video data to ensure audio-video synchronization.
When the CPU receives a video data frame or PCM-encoded audio, it applies a timestamp as follows:
Video timestamp: pts = inc++ * (1000 / fps), where inc is a static counter with initial value 0 that is incremented by 1 each time a timestamp is applied, and fps is the frame rate.
Audio timestamp: pts = inc++ * (frame_size * 1000 / sample_rate), where frame_size is the frame length in samples and sample_rate is the sampling rate.
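The two timestamp formulas can be sketched in Python with one counter per stream. As assumptions not stated in the source, the product inc * step is computed in floating point and rounded to integer milliseconds (the unit of FLV tag timestamps), so a non-integer step such as 1000 / 30 does not accumulate truncation error; the class and function names are hypothetical.

```python
class PtsCounter:
    """pts = inc++ * step_ms, with inc starting at 0 (one counter per stream)."""

    def __init__(self, step_ms: float) -> None:
        self.inc = 0
        self.step_ms = step_ms

    def next_pts(self) -> int:
        pts = round(self.inc * self.step_ms)  # integer milliseconds
        self.inc += 1  # post-increment, as in inc++
        return pts


def video_pts_counter(fps: float) -> PtsCounter:
    return PtsCounter(1000 / fps)


def audio_pts_counter(frame_size: int, sample_rate: int) -> PtsCounter:
    return PtsCounter(frame_size * 1000 / sample_rate)
```

For example, 25 fps video yields 0, 40, 80 ms, and 1024-sample AAC frames at 48 kHz yield 0, 21, 43, 64 ms.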
Optionally, in one embodiment, as shown in Fig. 2, the multi-channel audio acquisition steps are as follows:
First, the multimedia switch sends exchange information periodically: it assigns a port number to each audio acquisition end in the network, periodically generates exchange information carrying its local clock, and sends it to the corresponding audio acquisition ends in order.
Second, the N audio acquisition ends calculate the time difference between the acquisition-end clock and the synchronous clock from the time interval between receptions of exchange information in adjacent periods; when the time difference is greater than a set second threshold, the acquisition-end clock is calibrated to the synchronous clock. Optionally, each audio acquisition end extracts the clock message from the received exchange information, calculates the time offset between the synchronous clock and its local clock, and calibrates the local clock.
Third, the audio acquisition end acquires and caches data: each audio acquisition end picks up voice data through a high-sensitivity microphone capsule and, after A/D conversion, caches the data in a local buffer.
Fourth, the audio acquisition end sends data to the multimedia switch: when an audio acquisition end receives the exchange information sent by the multimedia switch, it loads the audio data in its buffer into an uplink data packet and sends it to the multimedia switch.
Optionally, in one embodiment, each audio acquisition end stores the acquired audio data in its own input storage area; when the amount of data in the input storage area reaches a set first threshold, the audio data is transferred to an output storage area; and the audio data in the output storage area is sent to the multimedia switch as the audio signal.
Fifth, the multimedia switch caches the audio data: the multimedia switch provides an independent data buffer for each port and, after receiving the uplink data packets sent by the audio acquisition ends, caches each packet in the corresponding buffer.
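The five steps above can be simulated for one operating period in Python. This is a hedged, software-only sketch: it assumes one time slice per acquisition end (K = N), models the exchange information as a simple record, and ignores real networking. `ExchangeInfo`, `AcquisitionEnd`, `Switch`, and their fields are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExchangeInfo:
    """Hypothetical downlink message: target port, synchronous clock, fed-back SNR."""
    port: int
    sync_clock_us: int
    snr_db: float


@dataclass
class AcquisitionEnd:
    port: int = 0  # assigned by the switch (step 1)
    cache: List[int] = field(default_factory=list)  # samples cached after A/D (step 3)
    last_info: Optional[ExchangeInfo] = None

    def on_exchange_info(self, info: ExchangeInfo) -> List[int]:
        """Step 4: on receiving exchange information, upload the cached samples."""
        self.last_info = info  # clock and SNR would be used for steps 2 and pre-enhancement
        packet, self.cache = self.cache, []
        return packet


class Switch:
    def __init__(self, ends: List[AcquisitionEnd], slice_us: int) -> None:
        for port, end in enumerate(ends, start=1):  # step 1: assign port numbers
            end.port = port
        self.ends = ends
        self.slice_us = slice_us
        # Step 5: an independent data buffer per port.
        self.port_buffers: Dict[int, List[int]] = {e.port: [] for e in ends}
        self.snr_db: Dict[int, float] = {e.port: 0.0 for e in ends}

    def run_period(self, period_start_us: int) -> None:
        """Serve each acquisition end in its own time slice (ASSUMPTION: K == N)."""
        for k, end in enumerate(self.ends):
            clock = period_start_us + k * self.slice_us
            info = ExchangeInfo(end.port, clock, self.snr_db[end.port])
            self.port_buffers[end.port].extend(end.on_exchange_info(info))
```

Serving the ends strictly in turn is what lets them share one link without transmission conflicts.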
Optionally, in one embodiment, each audio acquisition end is responsible for acquiring, buffering, and sending the original audio signal; the multimedia switch is responsible for clock synchronization control of every audio acquisition end in the network and for receiving, buffering, and processing audio and video data (this part is performed by the FPGA module in the multimedia switch). The multimedia switch and the audio acquisition ends are connected by 100M/1000M synchronous Ethernet.
Optionally, in one embodiment, the system applies ATM technology and establishes a channel for each audio acquisition end by time-division multiplexing, ensuring that every audio acquisition end in the network transmits data efficiently and in order. Clock network synchronization is also used to keep the clock frequencies of the audio acquisition ends and the receiving end consistent, which greatly reduces delay jitter and ensures accurate, error-free data acquisition and transmission. The basic structure of the multimedia switch system is shown in Fig. 1.
Optionally, in one embodiment, the multi-channel audio acquisition network uses a hand-in-hand (daisy-chain) connection, as shown in Fig. 2. The multimedia switch and the audio acquisition ends communicate over a bus, and the acquisition ends are chained hand in hand, so that multiple points are connected to the multimedia switch over a single link. The communication rate is 10M, clock synchronization is accurate, latency is low, and there are no bit errors.
Optionally, in one embodiment, the multimedia switch supports multiple stereo audio interfaces and multiple RJ45 network-interface digital audio inputs. Stereo input sources include line-in, gooseneck microphones, wireless microphones, and the like; they are processed by an audio codec (mainly filtering, amplification, and A/D conversion) and connected to the central processing unit (the FPGA programmable logic device) through an I2S interface. The multiple digital microphone inputs pass through a physical-layer transceiver (PHY) module and connect to the central processing unit through an MII (Media Independent Interface).
Optionally, in an embodiment, according to the signal-to-noise ratio of each audio signal fed back by the multimedia switch, each audio acquisition end performs pre-enhancement processing on the front-end audio signal, and then combines with the back-end speech enhancement processing of the multimedia switch, so that the loss of sound field and line transmission can be better compensated, the quality of the original signal is improved, the reduction of the back-end noise reduction filtering processing on the original sound is reduced, and the speech enhancement effect is improved. The method comprises the following concrete steps:
First, the multimedia switch estimates the signal-to-noise ratio of the original signal: from the data uploaded by each audio acquisition end, the multimedia switch estimates the signal-to-noise ratio of each original audio signal using a Wiener filtering method, and feeds the value back to the corresponding audio acquisition end in a downlink synchronization cell. This step is performed by the FPGA module in the multimedia switch.
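The patent names Wiener filtering as the SNR estimator but gives no details. The sketch below is a simplified, frame-level stand-in, not the patent's method: it assumes the noise power N can be taken from the quietest frames, and for each frame of power P it uses the Wiener-style clean-power estimate max(P - N, 0), which equals P·G with G = 1 - N/P. `estimate_snr_db` and its `noise_frames` parameter are hypothetical.

```python
import math


def frame_power(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def estimate_snr_db(frames, noise_frames=3):
    """Simplified SNR estimate for one acquisition end, in dB.

    Assumption: the noise power N is the mean power of the `noise_frames` quietest frames.
    For a frame of power P the Wiener gain is G = max(1 - N / P, 0), so the estimated
    clean power is P * G = max(P - N, 0).
    """
    if not frames:
        raise ValueError("need at least one frame")
    powers = [frame_power(f) for f in frames]
    quietest = sorted(powers)[:noise_frames]
    noise = sum(quietest) / len(quietest)
    if noise <= 0.0:
        return math.inf
    clean = sum(max(p - noise, 0.0) for p in powers)
    if clean <= 0.0:
        return -math.inf
    return 10.0 * math.log10(clean / (noise * len(powers)))
```

A real implementation would more likely work per frequency bin and track the noise estimate over time; the frame-level version only shows how a Wiener gain turns into an SNR figure that can be fed back.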
Second, the audio acquisition end performs speech enhancement: it obtains the signal-to-noise ratio (s/n) from the synchronization cell sent by the multimedia switch and, when the signal-to-noise ratio is below a preset reference value, increases the amplitude of the signal in a preset frequency band so as to raise the signal-to-noise ratio of the human-voice component s.
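The band-limited amplitude increase at the acquisition end could be realized with a peaking equalizer. This sketch uses the peaking biquad from the RBJ Audio EQ Cookbook. The centre frequency (1 kHz), Q (0.7), boost (6 dB) and 16 kHz sample rate are illustrative assumptions standing in for the unspecified "preset frequency band", and `pre_enhance` is a hypothetical name.

```python
import math


def peaking_biquad(fs, f0, gain_db, q):
    """RBJ cookbook peaking-filter coefficients, normalised so that a0 == 1."""
    a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    b = [1.0 + alpha * a, -2.0 * cos_w0, 1.0 - alpha * a]
    den = [1.0 + alpha / a, -2.0 * cos_w0, 1.0 - alpha / a]
    return [x / den[0] for x in b], [x / den[0] for x in den]


def biquad(samples, b, a):
    """Direct-form I filtering of a sequence of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def pre_enhance(samples, snr_db, ref_db, fs=16000, band_hz=1000.0, boost_db=6.0):
    """Boost the assumed voice band only when the fed-back SNR is below the reference."""
    if snr_db >= ref_db:
        return list(samples)
    b, a = peaking_biquad(fs, band_hz, boost_db, q=0.7)
    return biquad(samples, b, a)
```

A peaking filter leaves frequencies far from its centre largely unchanged, so the boost is concentrated in the chosen band rather than applied to the whole signal.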
The energy ratio D of the audio signal before and after speech enhancement is calculated as:
[Formula image not reproduced: definition of the energy ratio D]
where s(i) denotes the original signal of the ith frame at the audio acquisition end, and so(i) denotes the signal output after the ith frame has been transmitted to the multimedia switch and speech-enhanced.
If D is greater than the preset threshold, the energy change caused by speech enhancement exceeds the threshold and the ith frame is a silent segment; if D is less than the preset threshold, the energy change is below the threshold and the ith frame is a speech segment.
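The formula for D is an image that is not reproduced here. Assuming D is the energy of the original frame divided by the energy of the enhanced frame (so that noise-only frames, which enhancement attenuates strongly, give a large D), the classification rule can be sketched as follows. The functional form of D, the `eps` guard and the function names are assumptions.

```python
def energy(frame):
    """Sum of squared samples over the H samples of a frame."""
    return sum(x * x for x in frame)


def energy_ratio(original, enhanced, eps=1e-12):
    """Assumed form of D: E(s) / E(so); eps avoids division by zero for fully suppressed frames."""
    return energy(original) / (energy(enhanced) + eps)


def classify_frame(original, enhanced, threshold):
    """Large D (enhancement removed much energy) -> silent segment, otherwise speech segment."""
    return "silence" if energy_ratio(original, enhanced) > threshold else "speech"
```

The source does not say what happens when D equals the threshold; this sketch treats that case as speech.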
After the speech segments are identified, F frames of speech-segment data are taken as the analysis sample and the signal-to-noise ratio is calculated as:
[Formula image not reproduced: signal-to-noise ratio over F speech frames]
where SNR represents the signal-to-noise ratio.
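The SNR formula is likewise an unreproduced image. One plausible reading, assumed here and not confirmed by the source, treats the enhanced output so(i) as signal and the removed part s(i) - so(i) as noise, accumulated over at most F speech frames: SNR = 10·log10(Σ so² / Σ (s - so)²). Frames whose D is not below the threshold are skipped, following the classification rule in the text; the helper name is hypothetical.

```python
import math


def snr_over_speech_frames(pairs, threshold, max_frames):
    """Assumed SNR over up to F speech frames of one acquisition end, in dB.

    pairs: iterable of (s, so) frames, original and enhanced.
    A frame counts as speech when D = E(s) / E(so) is below `threshold`.
    Returns None if no speech frame was found.
    """
    signal = noise = 0.0
    used = 0
    for s, so in pairs:
        e_s = sum(x * x for x in s)
        e_so = sum(y * y for y in so)
        if e_so == 0.0 or e_s / e_so >= threshold:
            continue  # silent segment (or fully suppressed frame): not used
        signal += e_so
        noise += sum((x - y) ** 2 for x, y in zip(s, so))
        used += 1
        if used == max_frames:
            break
    if used == 0:
        return None
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / noise)
```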
When the signal-to-noise ratio in the exchange information is below a set third threshold, the N audio acquisition ends adjust the signal in the preset frequency band to a preset amplitude in subsequent acquisition.
Third, the multimedia switch performs speech enhancement: noise reduction, echo cancellation, howling suppression and automatic gain control are applied to the mixed audio data (using general-purpose methods, performed by the DSP module in the multimedia switch). The processed audio output can be used for audio/video composition and editing or for local sound reinforcement.
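The patent leaves the DSP processing to general methods. As one example, automatic gain control can be sketched as a smoothed per-frame gain that drives each frame's RMS toward a target level. The target, gain cap, smoothing constant and clipping range are illustrative assumptions, and `agc` is a hypothetical name.

```python
import math


def agc(frames, target_rms=0.1, max_gain=8.0, smoothing=0.9):
    """Frame-wise automatic gain control (illustrative sketch).

    The desired gain brings each frame's RMS to target_rms, capped at max_gain; the applied
    gain is smoothed across frames to avoid audible pumping, and output is clipped to [-1, 1].
    """
    gain = 1.0
    out = []
    for frame in frames:
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        desired = min(target_rms / rms, max_gain) if rms > 0 else gain
        gain = smoothing * gain + (1.0 - smoothing) * desired
        out.append([max(-1.0, min(1.0, x * gain)) for x in frame])
    return out
```

With a steady quiet input the gain rises gradually toward the cap rather than jumping, which is the usual trade-off between responsiveness and stability in AGC design.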
Optionally, in an embodiment, the multimedia switch includes:
a processor module, comprising an audio acquisition and output interface, a video acquisition and output interface, a network interface and an external device interface;
an FPGA audio acquisition module, connected to the processor module;
an audio codec module, comprising an analog audio acquisition device interface, a digital audio acquisition device interface and an audio output interface, and connected to the audio acquisition module;
and an audio processing module, connected to the audio acquisition module.
Optionally, in an embodiment, the multimedia switch consists mainly of a CPU, a DSP chip, an FPGA chip, an audio codec module and various data interfaces, which together implement multi-channel audio and video acquisition, audio mixing, speech enhancement and audio/video composition. A block diagram of the multimedia switch is shown in fig. 4.
It is noted that, in the present patent application, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises", "comprising" and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element. In the present patent application, if an action is said to be executed according to a certain element, this means the action is executed according to at least that element, which covers two cases: executing the action based only on that element, and executing the action based on that element and other elements. Expressions such as "a plurality of", "multiple times" and "multiple kinds" cover two as well as more than two.
All documents mentioned in this application are to be considered as incorporated in their entirety into the disclosure of this application, so that they can serve as a basis for amendment where necessary. It should be understood that the above description is only a preferred embodiment of the present disclosure and is not intended to limit its scope. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of one or more embodiments of the present disclosure shall fall within the scope of protection of one or more embodiments of the present disclosure.

Claims (9)

1. A multimedia switch system, comprising:
a multimedia switch and N audio acquisition ends, wherein N is an integer greater than or equal to 2;
the audio acquisition end is configured to acquire an audio signal through a microphone;
the multimedia switch is configured to:
acquiring the audio signal from the audio acquisition end in the form of an audio data frame, and performing voice enhancement and coding to obtain an audio stream;
calculating an energy ratio D of the audio signal in each audio data frame before and after speech enhancement, and calculating a signal-to-noise ratio from the audio data frames whose D value is smaller than a preset threshold;
wherein the energy ratio D of the audio signal before and after speech enhancement is calculated as follows:
[Formula image not reproduced: definition of the energy ratio D]
wherein s(i) denotes the original signal of the ith frame at the audio acquisition end, so(i) denotes the signal output after the ith frame has been transmitted to the multimedia switch and speech-enhanced, and H is the number of samples in an audio data frame;
if D is greater than a preset threshold, the energy change before and after speech enhancement exceeds the threshold and the ith frame is a silent segment; if D is smaller than the preset threshold, the energy change is below the threshold and the ith frame is a speech segment;
after the speech segments are identified, taking F frames of speech-segment data as an analysis sample and calculating the signal-to-noise ratio according to the following formula:
[Formula image not reproduced: signal-to-noise ratio over F speech frames]
wherein SNR represents a signal-to-noise ratio;
sending exchange information including the signal-to-noise ratio to the N audio acquisition ends, wherein the signal-to-noise ratio is estimated from the audio signals that the multimedia switch received from the same audio acquisition end in the previous period;
the audio acquisition end is configured to:
receiving the exchange information sent by the multimedia switch and, if the signal-to-noise ratio in the exchange information is smaller than a preset threshold, increasing the amplitude of the signal in a preset frequency band so as to strengthen the speech component;
and sending the audio signal to the multimedia switch.
2. The multimedia switch system of claim 1, further comprising M video acquisition ends, wherein M is an integer greater than or equal to 1;
the multimedia switch is further configured to:
sending, in K time slices of the same operation period, exchange information comprising clock synchronization information and the signal-to-noise ratio to each of the N audio acquisition ends;
receiving and encoding the video signals from the M video acquisition ends to obtain a video stream;
encapsulating the audio stream and the video stream and time-stamping them to ensure synchronization;
the video acquisition end is configured to acquire the video signal and transmit it to the multimedia switch.
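Claim 2 sends exchange information carrying clock synchronization data and the SNR in each of K time slices, but the cell layout is not disclosed. The sketch below packs a purely hypothetical 8-byte layout (end ID, slice index, 32-bit switch time in microseconds, SNR in 0.01 dB units) with Python's `struct` module to show how such a downlink cell might be serialized; `ExchangeInfo`, `pack` and `unpack` are hypothetical names.

```python
import struct
from dataclasses import dataclass

# Hypothetical big-endian layout of one downlink exchange-information cell:
#   uint8  end_id        acquisition end the cell is addressed to
#   uint8  slice_index   which of the K time slices in the operation period
#   uint32 sync_time_us  switch clock at transmission, in microseconds (wraps)
#   int16  snr_centi_db  estimated SNR in 0.01 dB units
CELL_FORMAT = ">BBIh"


@dataclass
class ExchangeInfo:
    end_id: int
    slice_index: int
    sync_time_us: int
    snr_db: float


def pack(info: ExchangeInfo) -> bytes:
    """Serialize one cell; the SNR is rounded to the nearest 0.01 dB."""
    return struct.pack(CELL_FORMAT, info.end_id, info.slice_index,
                       info.sync_time_us & 0xFFFFFFFF, int(round(info.snr_db * 100)))


def unpack(data: bytes) -> ExchangeInfo:
    """Parse one cell produced by pack()."""
    end_id, slice_index, sync_time_us, snr_centi_db = struct.unpack(CELL_FORMAT, data)
    return ExchangeInfo(end_id, slice_index, sync_time_us, snr_centi_db / 100.0)
```

A fixed-size binary cell keeps parsing trivial on the acquisition-end side, which is usually the more resource-constrained device.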
3. The multimedia switch system of claim 1, wherein the multimedia switch is further configured to:
deleting audio data frames whose Zn is smaller than a preset first threshold and whose Mn is smaller than a preset second threshold, wherein:
[Formula images not reproduced: definitions of Zn and Mn]
outputting a mixed sound signal:
[Formula image not reproduced: mixed sound signal]
wherein:
[Formula image not reproduced]
where Sin denotes an audio signal, sgn is the sign function, j is the index of the audio acquisition end, i is the sample index within an audio data frame, H is the number of samples in an audio data frame, and η is a preset compensation factor.
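The Zn, Mn and mixing formulas in claim 3 are images that are not reproduced here. Given that sgn appears among the symbols, a natural reading (assumed, not confirmed) is that Zn is the zero-crossing count of a frame and Mn its mean absolute amplitude over H samples. The mixing rule below, which sums the retained frames, scales by η divided by the number of retained frames and clips, is also an assumption; all function names are hypothetical.

```python
def sgn(x):
    """Sign function as commonly used for zero-crossing counts (0 counts as positive)."""
    return 1 if x >= 0 else -1


def zero_crossings(frame):
    """Assumed Zn: number of sign changes, i.e. 0.5 * sum |sgn(x[i]) - sgn(x[i-1])|."""
    return sum(abs(sgn(frame[i]) - sgn(frame[i - 1])) for i in range(1, len(frame))) // 2


def mean_magnitude(frame):
    """Assumed Mn: average absolute amplitude over the H samples of the frame."""
    return sum(abs(x) for x in frame) / len(frame)


def mix(frames_by_end, z_thr, m_thr, eta=1.0):
    """Drop frames with Zn < z_thr and Mn < m_thr, then mix the rest sample by sample.

    Assumed mixing rule: sum of retained frames scaled by eta / (number retained),
    clipped to [-1, 1]. All frames must have the same length H.
    """
    if not frames_by_end:
        return []
    kept = [f for f in frames_by_end
            if not (zero_crossings(f) < z_thr and mean_magnitude(f) < m_thr)]
    if not kept:
        return [0.0] * len(frames_by_end[0])
    scale = eta / len(kept)
    return [max(-1.0, min(1.0, scale * sum(f[i] for f in kept))) for i in range(len(kept[0]))]
```

Dropping near-silent channels before mixing keeps their residual noise out of the mix and stops idle microphones from diluting the active ones.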
4. The multimedia switch system of claim 1, wherein the enhancement processing performed by the multimedia switch on the received audio signal comprises one or any combination of: noise reduction, echo cancellation, howling suppression and automatic gain control.
5. The multimedia switch system of claim 1, wherein the operations of the N audio acquisition ends further comprise:
storing the acquired audio data in the input storage area of each audio acquisition end;
when the amount of data stored in the input storage area reaches a set first threshold, transferring the audio data in the input storage area to an output storage area;
and sending the audio data in the output storage area to the multimedia switch as the audio signal.
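The two storage areas in claim 5 behave like a simple double buffer. This is a minimal sketch under the assumption that the first threshold is a sample count and that completed blocks are sent in order; `CaptureBuffer` and its methods are hypothetical names.

```python
from collections import deque


class CaptureBuffer:
    """Input and output storage areas for one audio acquisition end (illustrative)."""

    def __init__(self, threshold: int):
        self.threshold = threshold   # the claim's "first threshold", assumed to be a sample count
        self.input_area = []         # samples still being collected
        self.output_area = deque()   # completed blocks waiting to be sent

    def write(self, samples):
        """Append samples; move a full block to the output area whenever the threshold is reached."""
        for s in samples:
            self.input_area.append(s)
            if len(self.input_area) >= self.threshold:
                self.output_area.append(self.input_area)
                self.input_area = []

    def send(self, transmit):
        """Pass every completed block, oldest first, to `transmit` (e.g. a network send callback)."""
        while self.output_area:
            transmit(self.output_area.popleft())
```

Separating the two areas lets acquisition continue into the input area while a completed block waits for its transmission slot.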
6. The multimedia switch system of claim 2, wherein each of the N audio acquisition ends calculates the time difference between its own acquisition-end clock and the synchronous clock from the interval between receptions of the exchange information in adjacent periods;
and, when the time difference is larger than a set second threshold, calibrates its acquisition-end clock to the synchronous clock.
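Claim 6 compares the locally measured interval between exchange information received in adjacent periods against the synchronous clock. A minimal sketch, assuming each end knows the switch's nominal period and that each piece of exchange information carries the switch's clock value; `AcquisitionClock` is a hypothetical name, and drift is checked per interval rather than accumulated, which is a simplification.

```python
class AcquisitionClock:
    """Drift check against periodic exchange information (illustrative sketch)."""

    def __init__(self, period_us: float, max_error_us: float):
        self.period_us = period_us          # nominal switch period, assumed known to each end
        self.max_error_us = max_error_us    # the claim's "second threshold"
        self.offset_us = 0.0                # correction added to the raw local clock
        self._last_rx_us = None             # raw local time of the previous reception

    def now(self, local_us: float) -> float:
        """Local clock after correction."""
        return local_us + self.offset_us

    def on_exchange_info(self, local_rx_us: float, switch_time_us: float) -> bool:
        """Handle one reception; return True if the local clock was recalibrated."""
        recalibrated = False
        if self._last_rx_us is not None:
            interval = local_rx_us - self._last_rx_us
            drift = abs(interval - self.period_us)
            if drift > self.max_error_us:
                # Snap the corrected local clock onto the switch (synchronous) clock.
                self.offset_us = switch_time_us - local_rx_us
                recalibrated = True
        self._last_rx_us = local_rx_us
        return recalibrated
```

Recalibrating only when the error exceeds the threshold avoids constantly nudging the clock on small, harmless jitter.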
7. The multimedia switch system of claim 2, wherein the M video acquisition ends are connected to the multimedia switch via HDMI interfaces, and the multimedia switch obtains pixel data in YUV format and encodes it into the video stream using an encoder.
8. The multimedia switch system of claim 2, wherein the encapsulation of the audio stream and the video stream by the multimedia switch further comprises:
performing FLV-format encapsulation and pushing the result to a server via the RTMP protocol.
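FLV is a documented container format, so the byte layout here follows the FLV file format specification: a 9-byte file header followed by PreviousTagSize0, then tags that each have an 11-byte header (type, 24-bit data size, 24-bit timestamp plus an 8-bit extension, 24-bit stream ID of 0) and a trailing 32-bit PreviousTagSize. The sketch assumes the payloads are already formatted audio/video tag bodies (for example AAC or AVC data with their tag headers) and omits the RTMP push; `flv_header` and `flv_tag` are hypothetical helper names.

```python
import struct

FLV_AUDIO = 8   # FLV tag type for audio
FLV_VIDEO = 9   # FLV tag type for video


def flv_header(has_audio: bool = True, has_video: bool = True) -> bytes:
    """FLV file header (signature, version 1, type flags, header size 9) plus PreviousTagSize0."""
    flags = (0x04 if has_audio else 0) | (0x01 if has_video else 0)
    return b"FLV" + bytes([1, flags]) + struct.pack(">I", 9) + struct.pack(">I", 0)


def flv_tag(tag_type: int, timestamp_ms: int, payload: bytes) -> bytes:
    """One FLV tag: 11-byte tag header, payload, then the 32-bit PreviousTagSize."""
    size = len(payload)
    if size >= 1 << 24:
        raise ValueError("payload too large for a 24-bit DataSize field")
    ts = timestamp_ms & 0xFFFFFFFF
    header = (
        bytes([tag_type])
        + size.to_bytes(3, "big")               # DataSize
        + (ts & 0xFFFFFF).to_bytes(3, "big")    # Timestamp, lower 24 bits
        + bytes([ts >> 24])                     # TimestampExtended, upper 8 bits
        + b"\x00\x00\x00"                       # StreamID, always 0
    )
    return header + payload + struct.pack(">I", 11 + size)
```

The millisecond timestamp written into each tag is where the time-stamping of claim 2 would land, letting a player align the audio and video streams.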
9. The multimedia switch system of claim 2, wherein the multimedia switch comprises:
a processor module, comprising an audio acquisition and output interface, a video acquisition and output interface, a network interface and an external device interface;
an FPGA audio acquisition module, connected to the processor module;
an audio codec module, comprising an analog audio acquisition device interface, a digital audio acquisition device interface and an audio output interface, and connected to the audio acquisition module;
and an audio processing module, connected to the audio acquisition module.

Priority Applications (1)

CN202110508270.3A (published as CN112929731B); priority date 2021-05-11; filing date 2021-05-11; title: Multimedia switch system

Publications (2)

CN112929731A, published 2021-06-08
CN112929731B, published 2021-07-30

Family

ID=76174837

Country Status (1)

CN: CN112929731B

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240816

Address after: No. 1 Branch Road, Private Science and Technology Park, No. 1633 North Road, Baiyun District, Guangzhou City, Guangdong Province, 510540

Patentee after: Guangzhou Blue Pigeon Software Co.,Ltd.

Country or region after: China

Address before: No. 1968, Nanxi East Road, Nanhu District, Jiaxing City, Zhejiang Province

Patentee before: ZHEJIANG LANCOO TECHNOLOGY Co.,Ltd.

Country or region before: China
