CN115460186B - A method and device for generating recording files of capability platform based on AMR-WB coding - Google Patents

Info

Publication number
CN115460186B
Authority
CN
China
Prior art keywords
voice stream
frame
recording
file
side voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211065275.4A
Other languages
Chinese (zh)
Other versions
CN115460186A (en
Inventor
雷震
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Digital Intelligence Technology Co Ltd
Original Assignee
China Telecom Digital Intelligence Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Digital Intelligence Technology Co Ltd filed Critical China Telecom Digital Intelligence Technology Co Ltd
Priority to CN202211065275.4A priority Critical patent/CN115460186B/en
Publication of CN115460186A publication Critical patent/CN115460186A/en
Application granted granted Critical
Publication of CN115460186B publication Critical patent/CN115460186B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/10 Architectures or entities
    • H04L65/1063 Application servers providing network services

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present application discloses a method and device for generating recording files of a capability platform based on AMR-WB encoding, which belongs to the field of communication technology. The method includes obtaining AMR-WB encoded voice stream data of a voice call, and sending the voice stream data to the capability platform; dividing the voice stream data into a calling side voice stream and a called side voice stream on the capability platform, marking the calling side voice stream and the called side voice stream and placing them on a preset network disk; performing silence compensation on the calling side voice stream and the called side voice stream respectively, and transcoding to generate two first recording files in PCMS16LE format; adding a file header to each of the first recording files according to the sampling rate to generate a second recording file, and the second recording file is a pure audio file; writing two second recording files frame by frame in a time sequence and adding a dual-channel header to synthesize a target dual-channel recording file. The present application can realize efficient and low-cost recording file generation for the AMR-WB voice data stream of the capability platform.

Description

Method and device for generating recording files of a capability platform based on AMR-WB coding
Technical Field
The application belongs to the technical field of communication, and particularly relates to a method and a device for generating a recording file of a capability platform based on AMR-WB coding.
Background
With the development of fifth-generation communication technology, Voice over Internet Protocol (VoIP) technology is becoming increasingly widespread. VoIP carries voice calls and multimedia conferences over the Internet Protocol (IP); the communication must pass through a capability platform, and the approach is efficient and low in cost.
However, because the capability platform was built early, it supports only the basic PCMA coding mode and not the AMR-WB coding mode. As a result, AMR-WB encoded voice calls cannot proceed and cannot be recorded, which seriously affects the development of platform services. In addition, the capability platform has demanding requirements for high concurrency and real-time handling of voice channels, while AMR-WB encoding, decoding, and transcoding are more complex and consume more system resources such as CPU.
Disclosure of Invention
The embodiments of the application aim to provide a method and a device for generating a recording file of a capability platform based on AMR-WB coding, which can solve the problems in the related art that AMR-WB encoded voice calls on the capability platform cannot be recorded and that processing them consumes substantial system resources.
In order to solve the technical problems, the application is realized as follows:
In a first aspect, the present application provides a method for generating a recording file of a capability platform based on AMR-WB encoding, the method comprising:
acquiring AMR-WB encoded voice stream data of a voice call, and sending the voice stream data to a capability platform;
dividing the voice stream data into a calling side voice stream and a called side voice stream at the capability platform, marking the calling side voice stream and the called side voice stream, and placing them on a preset network disk;
performing silence compensation on the calling side voice stream and the called side voice stream respectively, and transcoding them to generate two first recording files in PCMS16LE format;
adding a file header to each first recording file according to the sampling rate to generate a second recording file, wherein the second recording file is a pure audio file;
and writing the two second recording files frame by frame in time sequence and adding a dual-channel header to synthesize a target dual-channel recording file.
Further, dividing the voice stream data into a calling side voice stream and a called side voice stream at the capability platform, marking the calling side voice stream and the called side voice stream, and placing them on a preset network disk comprises:
dividing the voice stream data into a calling side voice stream and a called side voice stream at the capability platform;
dividing the calling side voice stream and the called side voice stream into a plurality of data frames, each frame holding 20 ms of data;
writing the plurality of data frames into two RTP audio files respectively;
extracting user number information, marking the two RTP audio files with client marks, using the client marks as file names, and placing the two RTP audio files on a preset network disk, wherein the capacity of the preset network disk can be elastically expanded.
Further, after dividing the calling side voice stream and the called side voice stream into a plurality of data frames of 20 ms each, the method further includes:
identifying the plurality of data frames;
the step of performing silence compensation on the calling side voice stream and the called side voice stream respectively, and transcoding them to generate the first recording files in PCMS16LE format, specifically comprises:
in the case that a data frame is a comfort noise frame, acquiring the duration of the comfort noise frame, and performing PCMS16LE format coding filling for the comfort noise frame according to the duration to form a first transcoding frame;
in the case that a data frame is a voice data frame, decoding the voice data frame into PCMS16LE format to form a second transcoding frame;
and synthesizing the first transcoding frame and the second transcoding frame into the first recording file.
Further, after dividing the calling side voice stream and the called side voice stream into a plurality of data frames of 20 ms each, the method further includes recording a time stamp of each data frame;
the step of writing the two second recording files frame by frame in time sequence and adding a dual-channel header to synthesize a target dual-channel recording file specifically comprises:
writing the two second recording files frame by frame according to the time stamps and adding a dual-channel header to synthesize the target dual-channel recording file.
Further, the step of adding a file header to each first recording file according to the sampling rate to generate a second recording file, where the second recording file is a pure audio file, specifically includes:
adding a 44-byte file header to the first recording file according to the sampling rate to generate the second recording file, wherein the second recording file is a wav format file.
In a second aspect, the present application provides a device for generating a recording file of a capability platform based on AMR-WB encoding, the device comprising:
a voice stream data acquisition module, configured to acquire AMR-WB encoded voice stream data of a voice call and send the voice stream data to the capability platform;
a division marking module, configured to divide the voice stream data into a calling side voice stream and a called side voice stream at the capability platform, mark the calling side voice stream and the called side voice stream, and place them on a preset network disk;
a transcoding module, configured to perform silence compensation on the calling side voice stream and the called side voice stream respectively, and transcode them to generate two first recording files in PCMS16LE format;
an audio generation module, configured to add a file header to each first recording file according to the sampling rate to generate a second recording file, wherein the second recording file is a pure audio file;
and a dual-channel synthesis module, configured to write the two second recording files frame by frame in time sequence and add a dual-channel header to synthesize a target dual-channel recording file.
Further, the division marking module includes:
a dividing sub-module, configured to divide the voice stream data into a calling side voice stream and a called side voice stream at the capability platform;
a cutting sub-module, configured to divide the calling side voice stream and the called side voice stream into a plurality of data frames of 20 ms each;
an audio writing sub-module, configured to write the plurality of data frames into two RTP audio files respectively;
and a marking sub-module, configured to extract user number information, mark the two RTP audio files with client marks, use the client marks as file names, and place the two RTP audio files on a preset network disk, wherein the capacity of the preset network disk can be elastically expanded.
Further, the division marking module further includes:
an identification sub-module, configured to identify the plurality of data frames;
The transcoding module comprises:
a filling sub-module, configured to acquire the duration of a comfort noise frame in the case that a data frame is a comfort noise frame, and perform PCMS16LE format coding filling for the comfort noise frame according to the duration to form a first transcoding frame;
a decoding sub-module, configured to decode a voice data frame into PCMS16LE format to form a second transcoding frame in the case that a data frame is a voice data frame;
and a synthesis sub-module, configured to synthesize the first transcoding frame and the second transcoding frame into the first recording file.
Further, the division marking module further includes:
a time stamp recording sub-module, configured to record the time stamp of each data frame.
Further, the audio generation module is specifically configured to:
add a 44-byte file header to the first recording file according to the sampling rate to generate a second recording file, wherein the second recording file is a wav format file.
In the method for generating a recording file of a capability platform based on AMR-WB coding provided by the application, after the capability platform obtains the AMR-WB encoded voice stream data of a voice call, the voice stream data is divided into a calling side voice stream and a called side voice stream, which are marked and placed on a preset network disk. Silence compensation is performed on the calling side voice stream and the called side voice stream respectively, and they are transcoded to generate two first recording files in PCMS16LE format. A file header is added to each first recording file according to the sampling rate to generate a second recording file, which is a pure audio file. The two second recording files are then written frame by frame in time sequence and a dual-channel header is added to synthesize a target dual-channel recording file. With this recording file generation scheme, on the one hand, AMR-WB encoded VoIP voice calls can be recorded synchronously by means of transcoding; on the other hand, processing the calling side and called side voice streams separately and filling comfort noise frames directly greatly reduces the occupation of system resources and improves system processing efficiency.
Drawings
FIG. 1 is a flowchart of a method for generating a recording file of a capability platform based on AMR-WB encoding according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a device for generating a recording file of a capability platform based on AMR-WB encoding according to an embodiment of the present application.
The realization of the objects, functional features, and advantages of the present application is further described below with reference to the embodiments and the accompanying drawings.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
First, the technical terms mentioned in the embodiments of the present application are explained.
PCMA coding is an audio coding technique in which PCM audio data is compressed and encoded according to the A-law;
AMR-WB (Adaptive Multi-Rate Wideband) coding is a wideband speech coding standard with a sampling frequency of 16 kHz, adopted by both the ITU-T and 3GPP and also known as the G.722.2 standard. AMR-WB provides a voice bandwidth of 50 to 7000 Hz, which users subjectively perceive as more natural, comfortable, and intelligible than before.
SIP (Session Initiation Protocol) is a multimedia communication protocol formulated by the IETF (Internet Engineering Task Force). It is a text-based application-layer control protocol for creating, modifying, and releasing sessions with one or more participants. SIP is an IP voice session control protocol that originated on the Internet and is flexible, easy to implement, and easy to extend.
RTP (Real-time Transport Protocol) is used to transmit real-time audio data over a network, with the various audio data frames encapsulated in it. An RTP file here means that the capability platform directly records and stores the RTP data streams of both parties of a call, without any extra work of stripping frame headers or decoding, which greatly improves the concurrency capability of the front end and reduces CPU consumption.
Call-id is a globally unique identifier; all SIP request and response messages belonging to one call carry the same Call-id.
Comfort noise is background noise generated for telephone communication when a short silence occurs during a call.
A soft switch is an entity that implements the "call control" function of a traditional stored-program-controlled exchange. In a traditional exchange, call control is integrated with services, and different services require different call control functions; a soft switch is service-independent, which requires the call control function it provides to be the basic call control common to various services.
The method for generating a recording file of a capability platform based on AMR-WB coding provided by the embodiments of the application is described in detail below through specific embodiments and their application scenarios, with reference to the accompanying drawings.
Example 1
Referring to fig. 1, a flow of a method for generating a recording file of a capability platform based on AMR-WB encoding according to an embodiment of the present application is shown.
The application provides a method for generating a recording file of a capability platform based on AMR-WB coding, which comprises the following steps:
S101, acquiring AMR-WB coded voice stream data of a voice call, and sending the voice stream data to a capability platform;
It can be understood that, according to the service characteristics of both parties and the media processing capabilities of their terminals, the calling party and the called party can use SIP signaling to negotiate AMR-WB as the codec for the two-party voice call. The voice stream data is obtained during the call, and the voice stream data of both parties is sent to the soft switch recording processing module of the capability platform. The capability platform is provided by a mobile internet operator.
S102, dividing the voice stream data into a calling side voice stream and a called side voice stream at the capability platform, marking the calling side voice stream and the called side voice stream, and placing the marked calling side voice stream and the marked called side voice stream on a preset network disk;
Based on the media negotiation result of the calling side and the called side, the soft switch recording module of the capability platform checks the 0x80 characteristic value field of the RTP audio frame header and the payload type to determine the frame type.
If the audio frames are AMR-WB encoded, the voice stream data is divided into two files, one for the calling side voice stream and one for the called side voice stream. The calling side and called side voice streams can therefore be processed in parallel over multiple paths, which improves the processing efficiency of the audio data.
User number information is extracted from the From and To fields of the SIP signaling, and client marks formed from the calling number, the called number, and the Call-id are used as file names. The calling side file and the called side file are named accordingly, with amrwb as the extension. For example, the calling side file obtained in the above step is named "caller 13xx.amrwb", and the called side file is named "callee 15xx.amrwb". The two RTP audio files are thus each marked with a user mark, which avoids confusing the files in subsequent processing and helps improve processing efficiency.
After the two RTP audio files are marked with the user information of the calling side and the called side respectively, they need to be placed on a designated network disk path. The network disk path can be a cloud network disk path provided by the capability platform, and the storage capacity of the network disk can be elastically expanded, so that it grows or shrinks with the size of the audio files. This ensures stable storage of the audio files while avoiding unreasonable occupation of storage space, further improving the utilization of system resources.
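The header check described above can be illustrated with a minimal sketch. It assumes the 12-byte fixed RTP header layout of RFC 3550 and treats 0x80 as the first header byte (version 2, no padding, no extension, no CSRC entries). The payload type value 98 and the names `parse_rtp_header` and `is_amr_wb_packet` are hypothetical; in practice the payload type comes from the SIP/SDP negotiation.

```python
import struct
from typing import NamedTuple

# Hypothetical dynamic payload type; the real value is whatever SDP negotiated.
AMR_WB_PAYLOAD_TYPE = 98


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    # The top two bits of byte 0 are the version; the low 7 bits of byte 1
    # are the payload type (the top bit is the marker bit).
    return RtpHeader(b0 >> 6, b1 & 0x7F, seq, ts, ssrc)


def is_amr_wb_packet(packet: bytes, negotiated_pt: int = AMR_WB_PAYLOAD_TYPE) -> bool:
    """True if the packet starts with 0x80 and carries the negotiated payload type."""
    if len(packet) < 12 or packet[0] != 0x80:
        return False
    return parse_rtp_header(packet).payload_type == negotiated_pt
```

A packet that passes this check would be appended to the calling side or called side file depending on which leg of the call it arrived on; that routing is outside the sketch.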
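As an illustration of the naming step, the sketch below pulls the user part out of a SIP From or To header value and builds a file name from the side, the two numbers, and the Call-id. The exact name layout, the underscore separators, and the helper names `extract_user` and `recording_file_name` are assumptions made for the example; the text only states that the calling number, called number, and Call-id form the name and that the extension is amrwb.

```python
import re

_SIP_USER = re.compile(r"sips?:([^@;>\s]+)@")


def extract_user(header_value: str) -> str:
    """Return the user part of the first SIP URI in a From/To header value."""
    match = _SIP_USER.search(header_value)
    if match is None:
        raise ValueError("no SIP URI with a user part found")
    return match.group(1)


def recording_file_name(side: str, caller: str, callee: str, call_id: str) -> str:
    """Build a file name such as caller_<caller>_<callee>_<call-id>.amrwb."""
    # Call-id values may contain characters that are awkward in file names
    # (for example '@'), so anything outside a conservative set is replaced.
    safe_call_id = re.sub(r"[^A-Za-z0-9._-]", "_", call_id)
    return f"{side}_{caller}_{callee}_{safe_call_id}.amrwb"
```

For example, `recording_file_name("caller", "13800000000", "15900000000", "a84b4c76e66710@pc33.example.com")` yields `caller_13800000000_15900000000_a84b4c76e66710_pc33.example.com.amrwb`. Including the Call-id keeps names distinct when the same two numbers call each other more than once.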
S103, respectively carrying out mute compensation on the calling side voice stream and the called side voice stream, and transcoding to generate two first recording files in a PCMS16LE format;
Optionally, after the voice stream data is divided into a calling side voice stream and a called side voice stream, each stream is divided into a plurality of data frames of a certain length, and the data frames are then written into two RTP audio files in time sequence. Optionally, the calling side voice stream and the called side voice stream are written into the two RTP audio files respectively with 20 ms of data per frame. Of course, in practice other intervals may be used to cut the data frames, such as 10 ms, 30 ms, or 50 ms per frame. It should be noted that when the audio data stream is cut, the time stamp of each current frame needs to be recorded in a buffer so that subsequent synthesis can follow the order of the time stamps.
After the data frames are cut, the type of each data frame may be identified. The back-end AMR-WB recording transcoding service module loads the two recorded RTP files of the calling party and the called party, strips the RTP headers according to the RTP specification, and performs AMR-WB audio processing. When processing each audio frame, it first reads the first byte of the frame, or takes 1 to 4 bytes, to determine the type of the current frame.
If the frame type is identified as a comfort noise frame, the cached time stamp of the previous frame and the time stamp of the current frame are taken out and their difference is calculated to obtain the silence duration. The frame is not transcoded; instead, PCMS16LE format coding of the silence duration is filled in directly and written sequentially to an uncompressed raw-format file. It will be appreciated that during short silences in a call, background noise, known as comfort noise, is generated for the telephone communication. After the call voice data stream is divided into a calling side data stream and a called side data stream, the audio data file of the calling side data stream contains comfort noise frames generated while the called side user speaks. After the data frames are cut, when a frame is determined from its frame header field to be a comfort noise frame, its duration is determined and PCMS16LE format coding of the same duration is used for filling. Similarly, the audio data file of the called side data stream contains comfort noise frames generated while the calling side user speaks, and they are handled in the same way. A first transcoding frame is formed after the filling. Because a comfort noise frame, once identified, is handled by direct format-coded filling rather than decoding, the occupation of system resources is greatly reduced and the processing efficiency of audio conversion is improved.
If the frame type is identified as a voice data frame, the frame length corresponding to the current frame is obtained according to the AMR-WB standard. The remaining data, that is, the frame length minus the 1 to 4 bytes already read to determine the frame type, is then read to obtain a complete AMR-WB frame. The AMR-WB frame is input into a decoding module, decoded into PCMS16LE format, and written to the PCMS16LE file to form a second transcoding frame.
After the first transcoding frames and the second transcoding frames are formed, they are synthesized in the order of the time stamps to produce the first recording file of the calling side and the first recording file of the called side respectively.
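The frame arithmetic and the time stamp buffer can be sketched as follows. The sketch assumes time stamps are RTP ticks at the 16 kHz AMR-WB clock and ignores 32-bit wraparound when sorting; `samples_per_frame`, `pcm_bytes_per_frame`, and `SideFrameCache` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

SAMPLE_RATE_HZ = 16_000   # AMR-WB sampling frequency
FRAME_MS = 20             # default frame interval from the description


def samples_per_frame(rate_hz: int = SAMPLE_RATE_HZ, frame_ms: int = FRAME_MS) -> int:
    """Number of audio samples covered by one frame."""
    return rate_hz * frame_ms // 1000


def pcm_bytes_per_frame(rate_hz: int = SAMPLE_RATE_HZ, frame_ms: int = FRAME_MS) -> int:
    """Size of one decoded frame as 16-bit mono PCM."""
    return samples_per_frame(rate_hz, frame_ms) * 2


@dataclass
class SideFrameCache:
    """Buffers (time stamp, payload) pairs for one side of the call."""
    frames: List[Tuple[int, bytes]] = field(default_factory=list)

    def add(self, timestamp: int, payload: bytes) -> None:
        self.frames.append((timestamp, payload))

    def in_time_order(self) -> List[Tuple[int, bytes]]:
        # Later synthesis follows time stamp order, so sort on the time stamp.
        return sorted(self.frames, key=lambda item: item[0])
```

At 16 kHz a 20 ms frame spans 320 samples, so consecutive frames are 320 ticks apart and each decodes to 640 bytes of 16-bit mono PCM.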
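A minimal sketch of the direct filling step follows. It assumes the cached time stamps are 32-bit RTP ticks at a 16 kHz clock and that the fill consists of zero-valued 16-bit samples; the text does not state the fill value, so the zeros are an assumption, and `silence_fill` is a hypothetical name.

```python
def silence_fill(prev_ts: int, cur_ts: int,
                 sample_rate_hz: int = 16_000,
                 clock_rate_hz: int = 16_000) -> bytes:
    """Return S16LE PCM covering the gap between two frame time stamps.

    No decoder is involved: the gap length alone determines the output size.
    """
    # Masking to 32 bits keeps the difference correct across RTP wraparound.
    gap_ticks = (cur_ts - prev_ts) & 0xFFFFFFFF
    samples = gap_ticks * sample_rate_hz // clock_rate_hz
    # Assumption: silence is written as zero samples, 2 bytes each.
    return b"\x00\x00" * samples
```

A gap of 320 ticks (20 ms) produces 640 bytes, and a one-second gap produces 32000 bytes, which shows why filling is much cheaper than running the codec over every comfort noise frame.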
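Frame-type identification from the first byte can be sketched as below. The sketch assumes each frame starts with a one-byte table-of-contents header laid out as in the AMR-WB storage format of RFC 4867 (frame type in bits 3 to 6), without the file magic; how the platform actually lays out its files is not stated in the text. The names `classify_frame` and `read_frames` are hypothetical, and the decoding itself needs an AMR-WB codec library, which is not shown.

```python
from typing import Iterator, Tuple

# Speech payload size in bytes for each AMR-WB frame type, excluding the
# one-byte header: modes 0-8 are 6.60 to 23.85 kbit/s, 9 is SID (comfort noise).
AMR_WB_FRAME_BYTES = {0: 17, 1: 23, 2: 32, 3: 36, 4: 40, 5: 46,
                      6: 50, 7: 58, 8: 60, 9: 5, 14: 0, 15: 0}
SID_FRAME_TYPE = 9


def classify_frame(header_byte: int) -> Tuple[str, int]:
    """Return (kind, payload_size) for a frame given its header byte."""
    frame_type = (header_byte >> 3) & 0x0F
    if frame_type not in AMR_WB_FRAME_BYTES:
        raise ValueError(f"reserved AMR-WB frame type {frame_type}")
    if frame_type == SID_FRAME_TYPE:
        kind = "comfort_noise"
    elif frame_type <= 8:
        kind = "speech"
    else:
        kind = "no_data"
    return kind, AMR_WB_FRAME_BYTES[frame_type]


def read_frames(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Walk a buffer of header-plus-payload frames, yielding (kind, payload)."""
    pos = 0
    while pos < len(data):
        kind, size = classify_frame(data[pos])
        payload = data[pos + 1:pos + 1 + size]
        if len(payload) < size:
            raise ValueError("truncated AMR-WB frame")
        yield kind, payload
        pos += 1 + size
```

Frames reported as "speech" would go to the decoder, while "comfort_noise" frames would be replaced by fill of the matching duration.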
S104, adding a file header to each first recording file according to the sampling rate to generate a second recording file, wherein the second recording file is a pure audio file;
A 44-byte wav file header is added to the first recording file according to the sampling rate to generate the second recording file, wherein the second recording file is a wav format file.
Specifically, after transcoding is finished, the first recording file is a PCMS16LE file in uncompressed format. A 44-byte wav file header can be added to it according to attributes such as the sampling rate, converting it into a second recording file with 16 kHz, 16-bit L16 encoding. The second recording file is a wav file, that is, a pure audio file that can be played and processed.
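The 44-byte header can be made concrete with a short sketch that builds the canonical RIFF/WAVE header for uncompressed PCM. The function name `wav_header` and its default arguments are illustrative; the field layout is the standard PCM wav layout.

```python
import struct


def wav_header(data_len: int, sample_rate_hz: int = 16_000,
               channels: int = 1, bits_per_sample: int = 16) -> bytes:
    """Build the 44-byte RIFF/WAVE header for raw PCM of data_len bytes."""
    block_align = channels * bits_per_sample // 8
    byte_rate = sample_rate_hz * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_len, b"WAVE",      # RIFF chunk: size excludes the first 8 bytes
        b"fmt ", 16, 1, channels,             # fmt chunk: 16 bytes, format tag 1 = PCM
        sample_rate_hz, byte_rate, block_align, bits_per_sample,
        b"data", data_len,                    # data chunk header
    )


def pcm_to_wav(pcm: bytes, sample_rate_hz: int = 16_000, channels: int = 1) -> bytes:
    """Prefix raw S16LE PCM with a matching header."""
    return wav_header(len(pcm), sample_rate_hz, channels) + pcm
```

The sampling rate matters because it is stored in the header together with the derived byte rate; a player uses both to determine playback speed and duration.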
It will be appreciated that since the calling side and the called side each generate a respective first sound recording file, the second sound recording files of the calling side and the called side are correspondingly generated after conversion.
S105, writing the two second recording files frame by frame in time sequence and adding a dual-channel header to synthesize a target dual-channel recording file.
The two second recording files are written frame by frame according to the time stamps, and a dual-channel header is added to synthesize the target dual-channel recording file.
SoX (an audio processing tool) can be used to write the second recording files of the calling side and the called side frame by frame in time sequence according to the calling side and called side voice streams, and a wav dual-channel header is added to form a dual-channel file. The resulting recording file thus keeps the voice of the calling party and the voice of the called party separate without extra workload, and can be used directly for two-way voice detection and value-added service processing.
In the method for generating a recording file of a capability platform based on AMR-WB coding provided by the application, after the capability platform obtains the AMR-WB encoded voice stream data of a voice call, the voice stream data is divided into a calling side voice stream and a called side voice stream, which are marked and placed on a preset network disk. Silence compensation is performed on the calling side voice stream and the called side voice stream respectively, and they are transcoded to generate two first recording files in PCMS16LE format. A file header is added to each first recording file according to the sampling rate to generate a second recording file, which is a pure audio file. The two second recording files are then written frame by frame in time sequence and a dual-channel header is added to synthesize a target dual-channel recording file. With this recording file generation scheme, on the one hand, AMR-WB encoded VoIP voice calls can be recorded synchronously by means of transcoding; on the other hand, processing the calling side and called side voice streams separately and filling comfort noise frames directly greatly reduces the occupation of system resources and improves system processing efficiency.
The recording file generation method provided by the embodiments of the application greatly reduces dependence on platform performance, optimizes the handling of AMR-WB comfort noise, simplifies the transcoding work, and improves processing speed. In addition, the front end of the capability platform records the original RTP voice stream without decoding, so system consumption is low and the concurrency of the capability platform is improved. The RTP files recorded by the front-end soft switch of the capability platform are passed to the back-end transcoding service module via a network disk, which offers high reliability and flexibly expandable capacity.
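The text performs this step with SoX; purely to illustrate what dual-channel synthesis does to the samples, the sketch below interleaves two mono S16LE buffers into one stereo buffer, with the calling side on the left channel and the called side on the right. It assumes both buffers start at the same instant and are already silence-compensated, and it pads the shorter one with zeros. `interleave_stereo` is a hypothetical name, and the sketch is not a description of how SoX works internally.

```python
def interleave_stereo(left_pcm: bytes, right_pcm: bytes) -> bytes:
    """Interleave two mono 16-bit PCM buffers into L/R stereo frames."""
    if len(left_pcm) % 2 or len(right_pcm) % 2:
        raise ValueError("16-bit PCM buffers must have an even byte length")
    n = max(len(left_pcm), len(right_pcm))
    # Pad the shorter side with zero samples so both channels cover the same span.
    left = left_pcm.ljust(n, b"\x00")
    right = right_pcm.ljust(n, b"\x00")
    out = bytearray(2 * n)
    # Each stereo frame is 4 bytes: left low, left high, right low, right high.
    out[0::4] = left[0::2]
    out[1::4] = left[1::2]
    out[2::4] = right[0::2]
    out[3::4] = right[1::2]
    return bytes(out)
```

A wav header declaring two channels would then be prepended to the interleaved data to obtain the dual-channel file. Keeping each party on its own channel is what allows later per-speaker analysis without separating the voices again.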
In a second aspect, the present application provides a device 20 for generating a recording file of a capability platform based on AMR-WB encoding, comprising:
a voice stream data acquisition module 201, configured to acquire AMR-WB encoded voice stream data of a voice call and send the voice stream data to a capability platform;
a division marking module 202, configured to divide the voice stream data into a calling side voice stream and a called side voice stream at the capability platform, mark the calling side voice stream and the called side voice stream, and place them on a preset network disk;
a transcoding module 203, configured to perform silence compensation on the calling side voice stream and the called side voice stream respectively, and transcode them to generate two first recording files in PCMS16LE format;
an audio generation module 204, configured to add a file header to each first recording file according to the sampling rate to generate a second recording file, where the second recording file is a pure audio file;
a dual-channel synthesis module 205, configured to write the two second recording files frame by frame in time sequence and add a dual-channel header to synthesize a target dual-channel recording file.
Wherein the division marking module 202 includes:
a dividing sub-module 2021, configured to divide the voice stream data into a calling side voice stream and a called side voice stream at the capability platform;
a cutting sub-module 2022, configured to divide the calling side voice stream and the called side voice stream into a plurality of data frames of 20 ms each;
an audio writing sub-module 2023, configured to write the plurality of data frames into two RTP audio files respectively;
a marking sub-module 2024, configured to extract user number information, mark the two RTP audio files with client marks, use the client marks as file names, and place the two RTP audio files on a preset network disk, where the capacity of the preset network disk can be elastically expanded;
an identification sub-module 2025, configured to identify the plurality of data frames;
a time stamp recording sub-module 2026, configured to record the time stamp of each data frame.
The transcoding module 203 includes:
a padding sub-module 2031, configured to, when the data frame is a comfort noise frame, obtain the duration of the comfort noise frame and fill that duration with PCMS16LE-encoded padding to form a first transcoded frame;
a decoding sub-module 2032, configured to, when the data frame is a voice data frame, decode the voice data frame into PCMS16LE format to form a second transcoded frame;
a synthesis sub-module 2033, configured to synthesize the first transcoded frame and the second transcoded frame into the first recording file.
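The branch between padding and decoding can be sketched as below. Several points are assumptions rather than details from the application: the padding is taken to be digital silence (zero-valued samples), the output rate is taken to be 16 kHz, and `decode_speech` is a hypothetical callable standing in for a real AMR-WB decoder that returns PCMS16LE bytes for one voice frame.

```python
from typing import Callable, Iterable, Tuple

SAMPLE_RATE = 16000      # assumed output rate; AMR-WB is a 16 kHz codec
BYTES_PER_SAMPLE = 2     # signed 16-bit little-endian samples


def silence(duration_ms: int) -> bytes:
    """Return duration_ms of zero-valued PCMS16LE samples."""
    return b"\x00" * (SAMPLE_RATE * duration_ms // 1000 * BYTES_PER_SAMPLE)


def transcode(frames: Iterable[Tuple[str, int, bytes]],
              decode_speech: Callable[[bytes], bytes]) -> bytes:
    """Build one headerless PCMS16LE recording from (kind, duration_ms, payload) frames."""
    out = bytearray()
    for kind, duration_ms, payload in frames:
        if kind == "speech":
            out += decode_speech(payload)    # second transcoded frame
        else:
            out += silence(duration_ms)      # first transcoded frame: filled, not decoded
    return bytes(out)
```

Filling the comfort noise span directly avoids running the decoder over periods that carry no speech, which is the resource saving the description attributes to this step.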
The device 20 for generating the recording file of the capability platform based on the AMR-WB encoding according to the embodiment of the present application can implement each process implemented in the embodiment of the method for generating the recording file of the capability platform based on the AMR-WB encoding, and in order to avoid repetition, a detailed description is omitted here.
The device for generating a recording file of a capability platform based on AMR-WB encoding provided by the present application comprises a voice stream data acquisition module 201, a division marking module 202, a transcoding module 203, an audio generation module 204 and a dual-channel synthesis module 205. After the capability platform acquires the AMR-WB encoded voice stream data of a voice call, the voice stream data is divided into a calling side voice stream and a called side voice stream, which are marked and placed on a preset network disk. Silence compensation and transcoding are then performed on each of the two voice streams to generate two first recording files in PCMS16LE format. A file header is added to each first recording file according to the sampling rate to generate a second recording file, which is a pure audio file, and the two second recording files are written frame by frame in time sequence, with a dual-channel header added, to synthesize a target dual-channel recording file. On the one hand, the recording file generation scheme provided by the present application achieves synchronous recording of AMR-WB encoded VoIP voice calls by means of transcoding; on the other hand, processing the calling side and called side voice streams separately and filling comfort noise frames directly can greatly reduce the occupation of system resources and improve system processing efficiency.
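The final synthesis step can be illustrated with a short sketch that interleaves two mono PCMS16LE buffers into one two-channel WAV file using Python's standard `wave` module. It rests on assumptions not stated in the application: both buffers are taken to start at the same instant and to be gap-free after silence compensation, the calling side is placed on the left channel and the called side on the right, and the shorter stream is padded with silence. The function name `mix_stereo` is hypothetical.

```python
import io
import wave


def mix_stereo(caller_pcm: bytes, callee_pcm: bytes,
               sample_rate: int = 16000) -> bytes:
    """Interleave two mono PCMS16LE buffers into a stereo WAV byte string."""
    n = max(len(caller_pcm), len(callee_pcm))
    n += n % 2                                  # keep whole 16-bit samples
    left = caller_pcm.ljust(n, b"\x00")         # pad the shorter side with silence
    right = callee_pcm.ljust(n, b"\x00")

    stereo = bytearray(2 * n)
    stereo[0::4] = left[0::2]                   # left sample, low byte
    stereo[1::4] = left[1::2]                   # left sample, high byte
    stereo[2::4] = right[0::2]                  # right sample, low byte
    stereo[3::4] = right[1::2]                  # right sample, high byte

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(stereo))          # wave writes the header itself
    return buf.getvalue()
```

Keeping each party on its own channel means either side can still be listened to in isolation after the two recordings have been merged into one file.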
The device 20 for generating a recording file of a capability platform based on AMR-WB encoding provided by the present application optimizes the handling of AMR-WB comfort noise in the recording chain. The softswitch at the front end of the capability platform records the original RTP voice stream directly, without decoding, which reduces the resource consumption of the platform system. The front-end RTP recording and the back-end transcoding service are connected seamlessly through the network disk medium, giving the scheme high reliability and high elasticity.
The virtual device in the embodiment of the application can be a device, and also can be a component, an integrated circuit or a chip in a terminal.
The present invention may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical encoding device, punch cards or intra-groove protrusion structures such as those having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by using state information of the computer readable program instructions to personalize electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), which can then execute the computer readable program instructions.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Note that all features disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic set of equivalent or similar features. Where the terms "further", "preferably", "still further" or "more preferably" are used, the description that follows builds on the preceding embodiment, and the content following the term combines with the preceding embodiment to form a complete further embodiment. Several such "further", "preferably", "still further" or "more preferably" arrangements following the same embodiment may be combined arbitrarily to form yet another embodiment.
It will be appreciated by persons skilled in the art that the embodiments of the invention described above and shown in the drawings are by way of example only and are not limiting. The objects of the present invention have been fully and effectively achieved. The functional and structural principles of the present invention have been shown and described in the examples, and the embodiments of the invention may be modified or varied without departing from those principles.
Finally, it should be noted that the foregoing embodiments are merely for illustrating the technical solutions of the present disclosure, and not for limiting the same, and although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that the technical solutions described in the foregoing embodiments may be modified or some or all of the technical features may be equivalently replaced, and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the scope of the technical solutions of the embodiments of the present disclosure.
The above description is only an example of the present invention and is not intended to limit the present invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are to be included in the scope of the claims of the present invention.

Claims (10)

1. A method for generating a recording file of a capability platform based on AMR-WB encoding, characterized in that the method comprises:
acquiring AMR-WB encoded voice stream data of a voice call, and sending the voice stream data to a capability platform;
dividing, on the capability platform, the voice stream data into a calling side voice stream and a called side voice stream, marking the calling side voice stream and the called side voice stream, and placing them on a preset network disk;
performing silence compensation on the calling side voice stream and the called side voice stream respectively, and transcoding to generate two first recording files in PCMS16LE format;
adding a file header to each of the first recording files according to a sampling rate to generate a second recording file, wherein the second recording file is a pure audio file;
writing the two second recording files frame by frame in time sequence and adding a dual-channel header to synthesize a target dual-channel recording file.

2. The recording file generation method according to claim 1, characterized in that the dividing, on the capability platform, of the voice stream data into a calling side voice stream and a called side voice stream, the marking of the calling side voice stream and the called side voice stream, and the placing of them on a preset network disk comprises:
dividing the voice stream data into a calling side voice stream and a called side voice stream on the capability platform;
dividing the calling side voice stream and the called side voice stream into multiple data frames, each frame holding 20 ms of data;
writing the multiple data frames into two RTP audio files respectively;
extracting user number information to mark the two RTP audio files with a customer tag, and placing the two RTP audio files on a preset network disk using the customer tag as the file name, wherein the capacity of the preset network disk can be elastically expanded.

3. The recording file generation method according to claim 2, characterized in that after dividing the calling side voice stream and the called side voice stream into multiple data frames, each frame holding 20 ms of data, the method further comprises:
identifying the multiple data frames;
and the performing of silence compensation on the calling side voice stream and the called side voice stream respectively and transcoding to generate two first recording files in PCMS16LE format is specifically:
in the case where the data frame is a comfort noise frame, obtaining a duration of the comfort noise frame, and performing PCMS16LE format encoding and padding on the comfort noise frame according to the duration to form a first transcoded frame;
in the case where the data frame is a voice data frame, decoding the voice data frame into PCMS16LE format to form a second transcoded frame;
synthesizing the first transcoded frame and the second transcoded frame into the first recording file.

4. The recording file generation method according to claim 2, characterized in that after dividing the calling side voice stream and the called side voice stream into multiple data frames, each frame holding 20 ms of data, the method further comprises recording a timestamp of each of the data frames;
and the writing of the two second recording files frame by frame in time sequence and adding of a dual-channel header to synthesize a target dual-channel recording file is specifically:
writing the two second recording files frame by frame according to the timestamps and adding a dual-channel header to synthesize a target dual-channel recording file.

5. The recording file generation method according to claim 1, characterized in that the adding of a file header to each of the first recording files according to a sampling rate to generate a second recording file, wherein the second recording file is a pure audio file, is specifically:
adding a 44-byte wav file header to the first recording file according to the sampling rate to generate a second recording file, wherein the second recording file is a file in wav format.

6. A device for generating a recording file of a capability platform based on AMR-WB encoding, characterized in that the device comprises:
a voice stream data acquisition module, used to acquire AMR-WB encoded voice stream data of a voice call, and send the voice stream data to a capability platform;
a division marking module, used to divide the voice stream data into a calling side voice stream and a called side voice stream on the capability platform, mark the calling side voice stream and the called side voice stream, and place them on a preset network disk;
a transcoding module, used to perform silence compensation on the calling side voice stream and the called side voice stream respectively, and transcode to generate two first recording files in PCMS16LE format;
an audio generation module, used to add a file header to each of the first recording files according to a sampling rate to generate a second recording file, wherein the second recording file is a pure audio file;
a dual-channel synthesis module, used to write the two second recording files frame by frame in time sequence and add a dual-channel header to synthesize a target dual-channel recording file.

7. The recording file generating device according to claim 6, characterized in that the division marking module comprises:
a division submodule, used to divide the voice stream data into a calling side voice stream and a called side voice stream on the capability platform;
a cutting submodule, used to divide the calling side voice stream and the called side voice stream into multiple data frames, each frame holding 20 ms of data;
an audio writing submodule, used to write the multiple data frames into two RTP audio files respectively;
a marking submodule, used to extract user number information to mark the two RTP audio files with a customer tag, and place the two RTP audio files on a preset network disk using the customer tag as the file name, wherein the capacity of the preset network disk can be elastically expanded.

8. The recording file generating device according to claim 7, characterized in that the division marking module further comprises:
an identification submodule, used to identify the multiple data frames;
and the transcoding module comprises:
a padding submodule, used to obtain, when the data frame is a comfort noise frame, the duration of the comfort noise frame, and perform PCMS16LE format encoding and padding on the comfort noise frame according to the duration to form a first transcoded frame;
a decoding submodule, used to decode, when the data frame is a voice data frame, the voice data frame into PCMS16LE format to form a second transcoded frame;
a synthesis submodule, used to synthesize the first transcoded frame and the second transcoded frame into the first recording file.

9. The recording file generating device according to claim 7, characterized in that the division marking module further comprises:
a timestamp recording submodule, used to record the timestamp of each of the data frames.

10. The recording file generating device according to claim 6, characterized in that the audio generation module is specifically used to:
add a 44-byte wav file header to the first recording file according to the sampling rate to generate a second recording file, wherein the second recording file is a file in wav format.

CN202211065275.4A 2022-09-01 2022-09-01 A method and device for generating recording files of capability platform based on AMR-WB coding Active CN115460186B (en)

Publications (2)

Publication Number Publication Date
CN115460186A CN115460186A (en) 2022-12-09
CN115460186B CN115460186B (en) 2025-01-24

Family

ID=84301176

Country Status (1)

Country Link
CN (1) CN115460186B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117082291A (en) * 2023-07-28 2023-11-17 中移互联网有限公司 Call voice synthesis method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110536000A (en) * 2018-05-23 2019-12-03 中国移动通信集团浙江有限公司 A kind of call recording method and system
CN113014728A (en) * 2021-02-09 2021-06-22 广州市申迪计算机系统有限公司 Method, system and computer storage medium for implementing communication assistant service

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9998847B2 (en) * 2016-11-17 2018-06-12 Glen A. Norris Localizing binaural sound to objects
KR102727351B1 (en) * 2019-12-02 2024-11-08 삼성전자주식회사 Method for recoding video call and electronic device thereof
CN114302353B (en) * 2021-12-31 2023-10-20 咪咕音乐有限公司 Media negotiation method, communication device and readable storage medium

Also Published As

Publication number Publication date
CN115460186A (en) 2022-12-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant