CN115460186B - A method and device for generating recording files of capability platform based on AMR-WB coding

Info

Publication number: CN115460186B
Application number: CN202211065275.4A
Authority: CN
Inventors: 雷震
Original assignee: China Telecom Digital Intelligence Technology Co Ltd
Current assignee: China Telecom Digital Intelligence Technology Co Ltd
Priority date: 2022-09-01
Filing date: 2022-09-01
Publication date: 2025-01-24
Anticipated expiration: 2042-09-01
Also published as: CN115460186A

Abstract

The present application discloses a method and device for generating recording files of a capability platform based on AMR-WB encoding, which belongs to the field of communication technology. The method includes obtaining AMR-WB encoded voice stream data of a voice call, and sending the voice stream data to the capability platform; dividing the voice stream data into a calling side voice stream and a called side voice stream on the capability platform, marking the calling side voice stream and the called side voice stream and placing them on a preset network disk; performing silence compensation on the calling side voice stream and the called side voice stream respectively, and transcoding to generate two first recording files in PCMS16LE format; adding a file header to each of the first recording files according to the sampling rate to generate a second recording file, and the second recording file is a pure audio file; writing two second recording files frame by frame in a time sequence and adding a dual-channel header to synthesize a target dual-channel recording file. The present application can realize efficient and low-cost recording file generation for the AMR-WB voice data stream of the capability platform.

Description

Method and device for generating recording files of a capability platform based on AMR-WB coding

Technical Field

The application belongs to the technical field of communication, and particularly relates to a method and a device for generating a recording file of a capability platform based on AMR-WB coding.

Background

With the development of fifth-generation communication technology, Voice over Internet Protocol (VoIP) technology is being applied more and more widely. A VoIP call carries voice calls and multimedia conferences over the Internet Protocol (IP). The communication has to pass through a capability platform, and VoIP is characterized by high efficiency and low cost.

However, because the capability platform was built early, it supports only the basic PCMA coding mode and cannot support the AMR-WB coding mode. As a result, AMR-WB coded voice calls cannot proceed and cannot be recorded, which seriously affects the development of platform services. Meanwhile, the capability platform has high concurrency and real-time requirements for voice channels, while AMR-WB encoding and decoding make transcoding more complex and consume more system resources such as CPU.

Disclosure of Invention

The embodiments of the application aim to provide a method and a device for generating a recording file of a capability platform based on AMR-WB coding, which can solve the problems in the related art that AMR-WB coded voice calls on the capability platform cannot be recorded and that processing them consumes a large amount of system resources.

In order to solve the technical problems, the application is realized as follows:

In a first aspect, the present application provides a method for generating a recording file of a capability platform based on AMR-WB encoding, the method comprising:

Acquiring AMR-WB coded voice stream data of voice call, and transmitting the voice stream data to a capability platform;

dividing the voice stream data into a calling side voice stream and a called side voice stream at the capability platform, marking the calling side voice stream and the called side voice stream, and placing the voice stream data on a preset network disk;

Respectively carrying out mute compensation on the calling side voice stream and the called side voice stream, and transcoding to generate two first recording files in a PCMS16LE format;

Adding a file header to each first recording file according to the sampling rate to generate a second recording file, wherein the second recording file is a pure audio file;

writing the two second recording files frame by frame in time sequence, adding a dual-channel header, and synthesizing a target dual-channel recording file.

Further, dividing the voice stream data into a calling side voice stream and a called side voice stream at the capability platform, marking the calling side voice stream and the called side voice stream, and placing them on a preset network disk comprises:

dividing the voice stream data into a calling side voice stream and a called side voice stream at the capability platform;

dividing the calling side voice stream and the called side voice stream into a plurality of data frames, each frame holding 20 ms of data;

Writing the plurality of data frames into two RTP audio files respectively;

extracting user number information, marking client marks for the two RTP audio files, taking the client marks as file names, and placing the two RTP audio files on a preset network disk, wherein the capacity of the preset network disk can be elastically expanded.

Further, after the calling side voice stream and the called side voice stream are divided into a plurality of data frames of 20 ms each, the method further includes:

identifying the plurality of data frames;

the step of respectively performing mute compensation on the calling side voice stream and the called side voice stream, and transcoding the mute compensated voice streams to generate a first recording file in a PCMS16LE format specifically comprises the following steps:

in the case that the data frame is a comfort noise frame, acquiring the duration of the comfort noise frame, and filling in PCMS16LE-format data of that duration in place of the comfort noise frame to form a first transcoding frame;

decoding the voice data frame into a PCMS16LE format to form a second transcoding frame under the condition that the data frame is a voice data frame;

And synthesizing the first transcoding frame and the second transcoding frame into the first recording file.

Further, after the calling side voice stream and the called side voice stream are divided into a plurality of data frames of 20 ms each, the method further includes recording a time stamp of each data frame;

the step of writing the two second recording files frame by frame in time sequence, adding a dual-channel header, and synthesizing a target dual-channel recording file specifically comprises:

writing the two second recording files frame by frame according to the time stamps, adding a dual-channel header, and synthesizing a target dual-channel recording file.

Further, the step of adding a file header to each first recording file according to the sampling rate to generate a second recording file, where the second recording file is a pure audio file, specifically includes:

adding a 44-byte file header to the first recording file according to the sampling rate to generate a second recording file, wherein the second recording file is a wav format file.

In a second aspect, the present application provides a device for generating a recording file of a capability platform based on AMR-WB encoding, the device comprising:

The voice stream data acquisition module is used for acquiring AMR-WB coded voice stream data of a voice call and sending the voice stream data to the capability platform;

The division marking module is used for dividing the voice stream data into a calling side voice stream and a called side voice stream on the capability platform, marking the calling side voice stream and the called side voice stream, and placing the voice streams on a preset network disk;

The transcoding module is used for respectively carrying out mute compensation on the calling side voice stream and the called side voice stream, and transcoding to generate two first recording files in a PCMS16LE format;

The audio generation module is used for adding a file header to each first recording file according to the sampling rate to generate a second recording file, wherein the second recording file is a pure audio file;

and the dual-channel synthesis module is used for writing the two second recording files frame by frame in time sequence and adding a dual-channel header to synthesize a target dual-channel recording file.

Further, the division marking module includes:

The dividing sub-module is used for dividing the voice stream data into a calling side voice stream and a called side voice stream on the capability platform;

The cutting sub-module is used for dividing the calling side voice stream and the called side voice stream into a plurality of data frames, each frame holding 20 ms of data;

An audio writing sub-module, configured to write the plurality of data frames into two RTP audio files respectively;

And the marking sub-module is used for extracting user number information, marking client marks for the two RTP audio files, taking the client marks as file names, and placing the two RTP audio files on a preset network disk, wherein the capacity of the preset network disk can be elastically expanded.

Further, the division marking module further includes:

the identification sub-module is used for identifying the plurality of data frames;

The transcoding module comprises:

the filling sub-module is used for acquiring the duration of the comfort noise frame in the case that the data frame is a comfort noise frame, and filling in PCMS16LE-format data of that duration in place of the comfort noise frame to form a first transcoding frame;

the decoding sub-module is used for decoding the voice data frame into a PCMS16LE format to form a second transcoding frame when the data frame is the voice data frame;

and the synthesis submodule is used for synthesizing the first transcoding frame and the second transcoding frame into the first recording file.

Further, the division marking module further includes:

and the time stamp recording submodule is used for recording the time stamp of each data frame.

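Further, the audio generation module is specifically configured to:

add a 44-byte file header to the first recording file according to the sampling rate to generate a second recording file, wherein the second recording file is a wav format file.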

According to the AMR-WB coding-based capability platform recording file generation method, after the capability platform obtains the AMR-WB coded voice stream data of a voice call, the voice stream data is divided into a calling side voice stream and a called side voice stream, which are marked and placed on a preset network disk. Mute compensation is performed on the calling side voice stream and the called side voice stream respectively, and they are transcoded to generate two first recording files in PCMS16LE format. A file header is added to each first recording file according to the sampling rate to generate a second recording file, which is a pure audio file. The two second recording files are written frame by frame in time sequence and a dual-channel header is added to synthesize a target dual-channel recording file. With the recording file generation scheme provided by the application, on the one hand, synchronous recording of AMR-WB coded VoIP voice calls can be realized by transcoding; on the other hand, processing the calling side and called side voice streams separately and filling comfort noise frames directly can greatly reduce the occupation of system resources and improve system processing efficiency.

Drawings

FIG. 1 is a flowchart of a method for generating a recording file of a capability platform based on AMR-WB encoding according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a device for generating a recording file of a capability platform based on AMR-WB encoding according to an embodiment of the present application.

The achievement of the object, functional features and advantages of the present invention will be further described with reference to the embodiments, referring to the accompanying drawings.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

First, the proper nouns mentioned in the embodiments of the present application are explained.

PCMA coding: an audio coding technique in which PCM audio data is compressed and encoded according to the A-law;

AMR-WB (Adaptive Multi-Rate Wideband) coding: a wideband speech coding standard with a sampling frequency of 16 kHz, adopted by both ITU-T and 3GPP and also known as the G.722.2 standard. AMR-WB provides a speech bandwidth of 50-7000 Hz, which users subjectively perceive as more natural, comfortable and intelligible than earlier narrowband speech.

SIP (Session Initiation Protocol): a multimedia communication protocol formulated by the IETF (Internet Engineering Task Force). It is a text-based application-layer control protocol for creating, modifying and releasing sessions with one or more participants. SIP is an IP voice session control protocol that originated on the Internet, and it is flexible, easy to implement and convenient to extend.

RTP (Real-time Transport Protocol) file: RTP is used to transmit real-time audio data over a network, with the various audio data frames encapsulated in RTP packets. An RTP file here means that the capability platform directly records and stores the RTP data streams of both parties of a call, without any extra work of stripping frame headers or decoding, which greatly improves the concurrency capability of the front end and reduces CPU consumption.

Call-id: a globally unique identifier that is the same in all SIP request and response messages of a single call and unique to that call.

Comfort noise: background noise created for a telephone call when a short silence occurs during the call.

Soft switch: an entity that implements the "call control" function of a conventional program-controlled exchange. In a conventional exchange the "call control" function is integrated with the services, and different services require different call control functions. A soft switch is service independent, which requires that the call control function it provides be the basic call control common to the various services.

The method for generating the recording file of the capability platform based on AMR-WB coding provided by the embodiments of the application is described in detail below through specific embodiments and application scenarios, with reference to the accompanying drawings.

Example 1

Referring to FIG. 1, a flow of a method for generating a recording file of a capability platform based on AMR-WB encoding according to an embodiment of the present application is shown.

The application provides a method for generating a recording file of a capability platform based on AMR-WB coding, which comprises the following steps:

S101, acquiring AMR-WB coded voice stream data of a voice call, and sending the voice stream data to a capability platform;

It can be understood that, according to the service characteristics of both parties and the media processing capability of their terminals, the calling party and the called party can use SIP signaling to negotiate AMR-WB coding for the two-party voice call. Voice stream data is obtained during the call, and the voice stream data of both parties is sent to the soft switch recording processing module of the capability platform. The capability platform is provided by a mobile internet operator.

S102, dividing the voice stream data into a calling side voice stream and a called side voice stream at the capability platform, marking the calling side voice stream and the called side voice stream, and placing the marked calling side voice stream and the marked called side voice stream on a preset network disk;

The soft switch recording module of the capability platform checks the 0x80 characteristic value field of the RTP audio frame header and the payload type to determine the frame type, according to the media negotiation result of the calling side and the called side.

If the audio frame is AMR-WB encoded, the voice stream data is divided into two files, one for the calling side voice stream and one for the called side voice stream. The calling side and called side voice streams can therefore be processed in parallel, which improves the processing efficiency of the audio data.
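The header check described above can be sketched as follows. This is a minimal illustration, not the platform's actual code: it assumes the fixed 12-byte RTP header layout of RFC 3550, and the function names and the payload type value 98 used in the usage note are hypothetical (the real payload type comes from the SDP negotiation).

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed 12-byte RTP header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    # Version is the top 2 bits of byte 0; payload type is the low 7 bits of byte 1.
    return RtpHeader(first >> 6, second & 0x7F, seq, ts, ssrc)


def is_amrwb_packet(packet: bytes, negotiated_pt: int) -> bool:
    """True when byte 0 is 0x80 and the payload type matches the negotiated one.

    0x80 means RTP version 2 with no padding, no extension and no CSRC entries.
    negotiated_pt is an assumption: the dynamic payload type agreed for AMR-WB.
    """
    header = parse_rtp_header(packet)
    return packet[0] == 0x80 and header.payload_type == negotiated_pt
```

For example, `is_amrwb_packet(packet, 98)` would accept a packet only if AMR-WB had been negotiated on payload type 98 for that call.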

User number information is extracted from the From and To fields of the SIP signaling, and a client mark is formed from the calling number, the called number and the Call-id to serve as the file name. The recording file of the calling side and the recording file of the called side are named respectively, with amrwb as the extension. For example, the file on the caller side obtained in the above step is named "caller 13xx.amrwb", and the file on the callee side is named "callee 15xx.amrwb". The two RTP audio files are thus each marked with a user mark, which avoids confusing the files in subsequent processing and helps improve processing efficiency.
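One possible way to derive such file names is sketched below. The exact naming layout is not specified beyond the "caller 13xx.amrwb" example, so the underscore-separated pattern, the helper names and the sample header values are all assumptions made for illustration.

```python
import re


def user_part(header_value: str) -> str:
    """Extract the user part of the SIP URI in a From or To header value."""
    match = re.search(r"sips?:([^@;>]+)@", header_value)
    if match is None:
        raise ValueError(f"no SIP URI user part in {header_value!r}")
    return match.group(1)


def recording_names(from_hdr: str, to_hdr: str, call_id: str) -> tuple[str, str]:
    """Build (caller_file, callee_file) names from the numbers and the Call-ID.

    The layout "<side>_<number>_<call-id>.amrwb" is hypothetical.
    """
    # Replace characters that are awkward in file names (for example "@").
    safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", call_id)
    caller, callee = user_part(from_hdr), user_part(to_hdr)
    return (f"caller_{caller}_{safe_id}.amrwb", f"callee_{callee}_{safe_id}.amrwb")
```

Including the Call-ID keeps names distinct when the same two numbers call each other more than once.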

After the two RTP audio files are marked with the user information of the calling side and the called side respectively, the audio files need to be placed on a designated network disk path. The network disk path can be a cloud network disk path provided by the capability platform. The storage capacity of the network disk can be elastically expanded, so that it is increased or reduced according to the size of the audio files. This both ensures stable storage of the audio files and avoids unreasonable occupation of storage space, further improving the utilization of system resources.

S103, respectively carrying out mute compensation on the calling side voice stream and the called side voice stream, and transcoding to generate two first recording files in a PCMS16LE format;

Optionally, after the voice stream data is divided into a calling side voice stream and a called side voice stream, each stream is divided into a plurality of data frames of a certain length, and the data frames are then written into two RTP audio files in time order. For example, the calling side voice stream and the called side voice stream are written into the two RTP audio files with 20 ms of data per frame. Of course, in practice other frame intervals may be used, such as 10 ms, 30 ms or 50 ms. It should be noted that, when the audio data stream is cut, the time stamp of each frame needs to be recorded in a buffer so that subsequent synthesis can follow the order of the time stamps.

After the data frames are cut, the type of each data frame may be identified. The back-end AMR-WB recording transcoding service module loads the two recorded RTP files of the calling side and the called side, and strips the RTP headers according to the RTP specification to perform AMR-WB audio processing. When each audio frame is processed, the first byte of the frame (or the first 1 to 4 bytes) is read first to determine the type of the current frame.

If the frame is identified as a comfort noise frame, the cached time stamp of the previous frame and the time stamp of the current frame are retrieved and their difference is calculated to obtain the silence duration. The frame is not transcoded; instead, PCMS16LE-format data of the silence duration is filled in directly and written sequentially into the uncompressed raw-format file. It will be appreciated that comfort noise is the background noise created for a telephone call during short silences. After the call voice data stream is divided into a calling side data stream and a called side data stream, the audio data file of the calling side contains comfort noise frames generated while the called side user is speaking. When, after the data frames are cut, a frame is determined from its frame header field to be a comfort noise frame, its duration is determined and PCMS16LE-format data of the same duration is filled in. Similarly, the audio data file of the called side contains comfort noise frames generated while the calling side user is speaking, and these are handled in the same way. The filled data forms a first transcoding frame. Because an identified comfort noise frame is handled by direct format filling rather than decoding, the occupation of system resources can be greatly reduced and the efficiency of audio conversion is improved.
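The fill step can be sketched as below. Several details are assumptions rather than statements from the text: the time stamps are taken to be RTP timestamps counted on a 16 kHz clock, the fill data is taken to be zero-valued samples (digital silence), and the function name is hypothetical.

```python
SAMPLE_RATE_HZ = 16_000   # AMR-WB sampling frequency
BYTES_PER_SAMPLE = 2      # PCM S16LE: signed 16-bit little-endian


def silence_for_gap(prev_ts: int, curr_ts: int,
                    clock_rate: int = SAMPLE_RATE_HZ) -> bytes:
    """Return PCM S16LE silence covering the gap between two timestamps.

    prev_ts and curr_ts are assumed to be 32-bit RTP timestamps in clock ticks.
    """
    # RTP timestamps wrap at 2**32, so take the difference modulo 2**32.
    gap_ticks = (curr_ts - prev_ts) & 0xFFFFFFFF
    samples = gap_ticks * SAMPLE_RATE_HZ // clock_rate
    # Assumption: the gap is filled with zero samples instead of decoded noise.
    return b"\x00" * (samples * BYTES_PER_SAMPLE)
```

A gap of 320 ticks (20 ms at 16 kHz) yields 640 bytes, and no decoder call is needed for the frame, which is where the resource saving comes from.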

If the frame is identified as a voice data frame, the frame length corresponding to the current frame is obtained according to the AMR-WB standard. The remaining data, that is, the frame length minus the 1 to 4 bytes already read to determine the frame type, is then read so that a complete AMR-WB frame is obtained. The AMR-WB frame is fed into a decoding module, decoded into PCMS16LE format and written to the PCMS16LE file to form a second transcoding frame.
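The frame type and frame length lookup might look like the sketch below. It assumes the frames have already been unpacked into the octet-aligned layout of RFC 4867, in which each frame starts with a one-byte header whose bits 3 to 6 carry the frame type. It reads a single header byte rather than 1 to 4 bytes, and the function names and the handling of reserved frame types are assumptions.

```python
# Speech bits per 20 ms frame for AMR-WB modes 0..8 (6.60 to 23.85 kbit/s),
# plus the SID (comfort noise) frame, which is frame type 9.
AMRWB_FRAME_BITS = {0: 132, 1: 177, 2: 253, 3: 285, 4: 317,
                    5: 365, 6: 397, 7: 461, 8: 477, 9: 40}
SID_FRAME_TYPE = 9


def frame_type(header_byte: int) -> int:
    """Extract the 4-bit frame type from the one-byte frame header."""
    return (header_byte >> 3) & 0x0F


def frame_length(header_byte: int) -> int:
    """Total frame length in bytes, including the header byte itself.

    Types without a payload (e.g. 14 speech lost, 15 no data) give 1.
    Treating reserved types as header-only is an assumption of this sketch.
    """
    bits = AMRWB_FRAME_BITS.get(frame_type(header_byte), 0)
    return 1 + (bits + 7) // 8


def split_frames(data: bytes) -> list[tuple[int, bytes]]:
    """Split a byte string into (frame_type, frame_bytes) pairs."""
    frames = []
    pos = 0
    while pos < len(data):
        length = frame_length(data[pos])
        if pos + length > len(data):
            raise ValueError("truncated AMR-WB frame")
        frames.append((frame_type(data[pos]), data[pos:pos + length]))
        pos += length
    return frames
```

A frame whose type equals `SID_FRAME_TYPE` would take the comfort noise path, while types 0 to 8 would be passed to the decoder.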

After the first transcoding frames and the second transcoding frames are formed, they are combined in the order of their time stamps, producing the first recording file of the calling side and the first recording file of the called side respectively.

S104, adding a file header to each first recording file according to the sampling rate to generate a second recording file, wherein the second recording file is a pure audio file;

A 44-byte wav file header is added to the first recording file according to the sampling rate to generate a second recording file, wherein the second recording file is a wav format file.

Specifically, after transcoding is finished, the first recording file is an uncompressed PCMS16LE file. A 44-byte wav file header can be added to it according to attributes such as the sampling rate, converting it into a second recording file in 16 kHz, 16-bit L16 (linear PCM) encoding. The second recording file is a wav file, a pure audio file that can be played and processed.
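For illustration, a canonical 44-byte PCM wav header can be built as in the sketch below. The function name and default arguments are assumptions; the field layout is the standard RIFF/WAVE layout for uncompressed PCM.

```python
import struct


def wav_header(data_size: int, sample_rate: int = 16_000,
               channels: int = 1, bits_per_sample: int = 16) -> bytes:
    """Build the 44-byte RIFF/WAVE header for raw PCM data of data_size bytes."""
    block_align = channels * bits_per_sample // 8   # bytes per sample frame
    byte_rate = sample_rate * block_align           # bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",           # RIFF chunk descriptor
        b"fmt ", 16, 1, channels, sample_rate,      # format tag 1 = linear PCM
        byte_rate, block_align, bits_per_sample,
        b"data", data_size,                         # data chunk header
    )
```

Writing `wav_header(len(pcm)) + pcm` to a file turns a raw PCMS16LE buffer into a playable mono wav file without touching the audio samples.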

It will be appreciated that since the calling side and the called side each generate a respective first sound recording file, the second sound recording files of the calling side and the called side are correspondingly generated after conversion.

S105, writing the two second recording files frame by frame in time sequence, adding a dual-channel header, and synthesizing a target dual-channel recording file.

Using SoX (an audio processing tool), the second recording files of the calling side and the called side can be written frame by frame in time order, and a wav dual-channel header is added to form a dual-channel file. The resulting recording file thus keeps the voice of the calling party and the voice of the called party separate without extra workload, and can be used directly for two-way voice detection and value-added service processing.
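The text uses SoX for this step. As an alternative illustration only, the same interleaving can be sketched with Python's standard `wave` module. The sketch assumes both mono streams start at the same instant and have already had their silence gaps filled, so that equal sample indices correspond to equal times. The function names, and the choice of caller on the left channel, are assumptions.

```python
import io
import wave


def interleave_stereo(left: bytes, right: bytes) -> bytes:
    """Interleave two mono PCM S16LE buffers into one stereo buffer.

    The shorter buffer is padded with zero samples (silence).
    """
    n = max(len(left), len(right))
    n += n % 2                       # keep whole 16-bit samples
    left = left.ljust(n, b"\x00")
    right = right.ljust(n, b"\x00")
    out = bytearray(2 * n)
    out[0::4] = left[0::2]           # left sample, low byte
    out[1::4] = left[1::2]           # left sample, high byte
    out[2::4] = right[0::2]          # right sample, low byte
    out[3::4] = right[1::2]          # right sample, high byte
    return bytes(out)


def stereo_wav_bytes(caller_pcm: bytes, callee_pcm: bytes,
                     sample_rate: int = 16_000) -> bytes:
    """Return a complete two-channel wav file: caller left, callee right."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(interleave_stereo(caller_pcm, callee_pcm))
    return buffer.getvalue()
```

Keeping each party on its own channel is what lets later analysis treat the two speakers separately without any speaker separation step.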

The recording file generation method provided by the embodiment of the application greatly reduces the dependence on platform performance, optimizes the handling of AMR-WB comfort noise, simplifies the transcoding work and improves the processing speed. In addition, the front end of the capability platform records the original RTP voice stream without decoding, so system consumption is low and the concurrency of the capability platform is improved. The RTP files recorded by the front-end soft switch of the capability platform are passed to the back-end transcoding service module through a network disk, which offers high reliability and flexible capacity expansion.

In a second aspect, the present application provides a device 20 for generating a recording file of a capability platform based on AMR-WB encoding, comprising:

a voice stream data obtaining module 201, configured to obtain AMR-WB encoded voice stream data of a voice call, and send the voice stream data to a capability platform;

The division marking module 202 is configured to divide the voice stream data into a calling side voice stream and a called side voice stream on the capability platform, mark the calling side voice stream and the called side voice stream, and place the marked voice streams on a preset network disk;

the transcoding module 203 is configured to perform mute compensation on the calling side voice stream and the called side voice stream respectively, and transcode them to generate two first recording files in PCMS16LE format;

the audio generation module 204 is configured to add a file header to each first recording file according to the sampling rate to generate a second recording file, where the second recording file is a pure audio file;

the dual-channel synthesizing module 205 is configured to write the two second recording files frame by frame in time sequence and add a dual-channel header to synthesize a target dual-channel recording file.

Wherein the division marking module 202 includes:

A dividing submodule 2021, configured to divide the voice stream data into a calling side voice stream and a called side voice stream at the capability platform;

A cutting sub-module 2022, configured to divide the calling side voice stream and the called side voice stream into a plurality of data frames, each frame holding 20 ms of data;

an audio writing sub-module 2023, configured to write the plurality of data frames into two RTP audio files respectively;

the marking sub-module 2024 is configured to extract user number information, mark the two RTP audio files with client marks, use the client marks as file names, and place the two RTP audio files on a preset network disk, where the capacity of the preset network disk can be elastically expanded.

An identification submodule 2025 for identifying the plurality of data frames;

a time stamp recording sub-module 2026 for recording a time stamp of each of the data frames.

Wherein the transcoding module 203 comprises:

A padding submodule 2031, configured to obtain the duration of the comfort noise frame when the data frame is a comfort noise frame, and fill in PCMS16LE-format data of that duration in place of the comfort noise frame to form a first transcoding frame;

A decoding submodule 2032, configured to decode the voice data frame into a PCMS16LE format to form a second transcoded frame if the data frame is a voice data frame;

A synthesis submodule 2033, configured to synthesize the first transcoded frame and the second transcoded frame into the first recording file.

The device 20 for generating the recording file of the capability platform based on the AMR-WB encoding according to the embodiment of the present application can implement each process implemented in the embodiment of the method for generating the recording file of the capability platform based on the AMR-WB encoding, and in order to avoid repetition, a detailed description is omitted here.

The AMR-WB coding-based capability platform recording file generation device provided by the application comprises a voice stream data acquisition module 201, a division marking module 202, a transcoding module 203, an audio generation module 204 and a dual-channel synthesis module 205. After the capability platform obtains the AMR-WB coded voice stream data of a voice call, the voice stream data is divided into a calling side voice stream and a called side voice stream, which are marked and placed on a preset network disk. Mute compensation is performed on the calling side voice stream and the called side voice stream respectively, and they are transcoded to generate two first recording files in PCMS16LE format. A file header is added to each first recording file according to the sampling rate to generate a second recording file, which is a pure audio file. The two second recording files are written frame by frame in time sequence and a dual-channel header is added to synthesize a target dual-channel recording file. With the recording file generation scheme provided by the application, on the one hand, synchronous recording of AMR-WB coded VoIP voice calls can be realized by transcoding; on the other hand, processing the calling side and called side voice streams separately and filling comfort noise frames directly can greatly reduce the occupation of system resources and improve system processing efficiency.

The AMR-WB encoding-based capability platform recording file generating device 20 provided by the application optimizes the handling of AMR-WB comfort noise in the recording stage. The soft switch at the front end of the capability platform records the original RTP voice stream directly without decoding, which reduces the resource consumption of the platform system. The front-end RTP recording and the back-end transcoding service are seamlessly connected through the network disk, giving the scheme high reliability and high elasticity.

The virtual device in the embodiments of the application may be a standalone apparatus, or may be a component, an integrated circuit or a chip in a terminal.

The present invention may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention.

The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.

The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.

Computer program instructions for carrying out operations of the present invention may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized with state information of the computer readable program instructions, and the electronic circuitry can then execute the computer readable program instructions to implement aspects of the present invention.

Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Note that all features disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic set of equivalent or similar features. Where the terms "further", "preferably", "still further" or "more preferably" are used, they introduce a brief description of another embodiment that builds on the preceding embodiment: the content following the term, combined with the preceding embodiment, constitutes the complete other embodiment. Several such "further", "preferably", "still further" or "more preferably" provisions following the same embodiment may be combined arbitrarily to form yet another embodiment.

It will be appreciated by persons skilled in the art that the embodiments of the invention described above and shown in the drawings are by way of example only and are not limiting. The objects of the present invention have been fully and effectively achieved. The functional and structural principles of the present invention have been shown and described in the examples, and the embodiments of the invention may be modified or varied without departing from those principles.

Finally, it should be noted that the foregoing embodiments are merely for illustrating the technical solutions of the present disclosure, and not for limiting the same, and although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that the technical solutions described in the foregoing embodiments may be modified or some or all of the technical features may be equivalently replaced, and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the scope of the technical solutions of the embodiments of the present disclosure.

The above description is only an example of the present invention and is not intended to limit the present invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are to be included in the scope of the claims of the present invention.

Claims

1. A method for generating a recording file of a capability platform based on AMR-WB encoding, characterized in that the method comprises:

Acquiring AMR-WB encoded voice stream data of a voice call, and sending the voice stream data to the capability platform;

Dividing, on the capability platform, the voice stream data into a calling side voice stream and a called side voice stream, marking the calling side voice stream and the called side voice stream, and placing them on a preset network disk;

Performing silence compensation on the calling side voice stream and the called side voice stream respectively, and transcoding to generate two first recording files in PCMS16LE format;

Adding a file header to each of the first recording files according to the sampling rate to generate a second recording file, wherein the second recording file is a pure audio file;

Writing the two second recording files frame by frame in time sequence and adding a dual-channel header to synthesize a target dual-channel recording file.

2. The recording file generation method according to claim 1, characterized in that the step of dividing the voice stream data into a calling side voice stream and a called side voice stream on the capability platform, marking the calling side voice stream and the called side voice stream and placing them on a preset network disk comprises:

Dividing the calling side voice stream and the called side voice stream into multiple data frames, each frame holding 20 ms of data;

Writing the multiple data frames into two RTP audio files respectively;

Extracting user number information to add a customer tag to the two RTP audio files, and using the customer tag as the file name to place the two RTP audio files on the preset network disk, wherein the capacity of the preset network disk can be flexibly expanded.

3. The recording file generation method according to claim 2, characterized in that after dividing the calling side voice stream and the called side voice stream into multiple data frames according to a frame of 20 ms, the method further comprises:

Identifying the multiple data frames;

The step of performing silence compensation on the calling side voice stream and the called side voice stream respectively and transcoding to generate two first recording files in PCMS16LE format is specifically as follows:

In the case where the data frame is a comfort noise frame, obtaining a duration of the comfort noise frame, and performing PCMS16LE format encoding and padding on the comfort noise frame according to the duration to form a first transcoded frame;

In the case where the data frame is a voice data frame, decoding the voice data frame into a PCMS16LE format to form a second transcoded frame;

Synthesizing the first transcoded frame and the second transcoded frame into the first recording file.
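As an illustrative sketch only, the per-frame branching described in claim 3 might look as follows in Python. The 16 kHz output rate is the AMR-WB sampling rate and the 20 ms frame length comes from claim 2. The `Frame` tuple, the `decode_voice` callable (a stand-in for a real AMR-WB decoder, which the Python standard library does not provide), and the use of zero-valued samples as the padding are assumptions made for illustration, not details taken from the specification.

```python
from typing import Callable, Iterable, NamedTuple

SAMPLE_RATE_HZ = 16_000   # AMR-WB decodes to 16 kHz audio
BYTES_PER_SAMPLE = 2      # signed 16-bit little-endian samples
FRAME_MS = 20             # frame length stated in claim 2


class Frame(NamedTuple):
    """Hypothetical container for one identified data frame."""
    is_comfort_noise: bool  # True for a comfort noise frame
    payload: bytes          # encoded payload (ignored for comfort noise here)
    duration_ms: int        # time span the frame stands for


def silence_pcm(duration_ms: int) -> bytes:
    """Return zero-valued 16-bit PCM covering duration_ms (assumed padding)."""
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return b"\x00" * (samples * BYTES_PER_SAMPLE)


def transcode(frames: Iterable[Frame],
              decode_voice: Callable[[bytes], bytes]) -> bytes:
    """Build the raw PCM body of one first recording file."""
    out = bytearray()
    for frame in frames:
        if frame.is_comfort_noise:
            # First transcoded frame: padding sized by the frame's duration.
            out += silence_pcm(frame.duration_ms)
        else:
            # Second transcoded frame: decoded speech from the caller-supplied decoder.
            out += decode_voice(frame.payload)
    return bytes(out)
```

Because the padding length is derived from the comfort noise duration, the output stays time-aligned with the original call even though comfort noise frames are sent far less often than voice frames.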

4. The recording file generation method according to claim 2, characterized in that after dividing the calling side voice stream and the called side voice stream into multiple data frames according to a frame of 20ms, the method further comprises recording a timestamp of each of the data frames;

The step of writing the two second recording files frame by frame in time sequence and adding a dual-channel header to generate a target dual-channel recording file is specifically as follows:

Writing the two second recording files frame by frame according to the timestamps and adding a dual-channel header to synthesize the target dual-channel recording file.
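A minimal sketch of the dual-channel merge, under stated assumptions: both inputs are raw 16-bit little-endian PCM bodies (file headers already stripped), both start at the same instant, and the shorter one is padded with silence. The function name `interleave_stereo` and the choice of the calling side as the left channel are hypothetical. Timestamp-driven alignment, as recited in claim 4, is not modeled here.

```python
def interleave_stereo(left_pcm: bytes, right_pcm: bytes) -> bytes:
    """Interleave two mono 16-bit PCM streams as L, R, L, R, ... samples."""
    if len(left_pcm) % 2 or len(right_pcm) % 2:
        raise ValueError("inputs must contain whole 16-bit samples")

    # Pad the shorter stream with zero-valued samples so both cover the same time.
    length = max(len(left_pcm), len(right_pcm))
    left = left_pcm.ljust(length, b"\x00")
    right = right_pcm.ljust(length, b"\x00")

    # Each stereo sample frame is 4 bytes: 2 bytes left, then 2 bytes right.
    stereo = bytearray(length * 2)
    stereo[0::4] = left[0::2]    # low byte of each left sample
    stereo[1::4] = left[1::2]    # high byte of each left sample
    stereo[2::4] = right[0::2]   # low byte of each right sample
    stereo[3::4] = right[1::2]   # high byte of each right sample
    return bytes(stereo)
```

The header placed in front of this interleaved body has the same layout as a mono WAV header; only the channel count, block alignment, and byte rate fields differ.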

5. The recording file generation method according to claim 1, characterized in that the step of adding a file header to each of the first recording files according to a sampling rate to generate a second recording file, wherein the second recording file is a pure audio file, specifically comprises:

Adding a 44-byte WAV file header to the first recording file according to the sampling rate to generate the second recording file, the second recording file being a file in WAV format.
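For illustration, the 44-byte header of claim 5 corresponds to the canonical RIFF/WAVE layout for uncompressed PCM. The sketch below builds it with Python's `struct` module. The function name `wav_header` and its default arguments (16 kHz, mono, 16-bit) are assumptions chosen to match AMR-WB decoded to PCMS16LE, not values quoted from the specification.

```python
import struct


def wav_header(data_size: int, sample_rate: int = 16_000,
               channels: int = 1, bits_per_sample: int = 16) -> bytes:
    """Return a 44-byte RIFF/WAVE header for data_size bytes of PCM audio."""
    block_align = channels * bits_per_sample // 8   # bytes per sample frame
    byte_rate = sample_rate * block_align           # bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",   # RIFF chunk: size excludes first 8 bytes
        b"fmt ", 16,                        # fmt chunk with a 16-byte body
        1,                                  # audio format 1 = uncompressed PCM
        channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", data_size,                 # data chunk header
    )


def to_wav(pcm: bytes, sample_rate: int = 16_000) -> bytes:
    """Prefix a raw mono PCM body with its header to form a WAV file image."""
    return wav_header(len(pcm), sample_rate) + pcm
```

Calling `wav_header(n, channels=2)` yields a two-channel header of the same size, which is one way the dual-channel header of claim 1 could be produced.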

6. A device for generating a recording file of a capability platform based on AMR-WB encoding, characterized in that the device comprises:

A voice stream data acquisition module, used to acquire the AMR-WB encoded voice stream data of the voice call, and send the voice stream data to the capability platform;

A division marking module, used for dividing the voice stream data into a calling side voice stream and a called side voice stream on the capability platform, marking the calling side voice stream and the called side voice stream and placing them in a preset network disk;

A transcoding module, used for performing silence compensation on the calling side voice stream and the called side voice stream respectively, and transcoding to generate two first recording files in PCMS16LE format;

An audio generation module, used for adding a file header to each of the first recording files according to a sampling rate to generate a second recording file, wherein the second recording file is a pure audio file;

A dual-channel synthesis module, used for writing the two second recording files frame by frame in time sequence and adding a dual-channel header to synthesize a target dual-channel recording file.

7. The recording file generating device according to claim 6, characterized in that the division marking module comprises:

A division submodule, used for dividing the voice stream data into a calling side voice stream and a called side voice stream on the capability platform;

A cutting submodule, used for dividing the calling side voice stream and the called side voice stream into multiple data frames, each frame holding 20 ms of data;

An audio writing submodule, used for writing the multiple data frames into two RTP audio files respectively;

A marking submodule, used for extracting user number information to mark the two RTP audio files with a customer tag, and using the customer tag as the file name to place the two RTP audio files on the preset network disk, wherein the capacity of the preset network disk can be flexibly expanded.

8. The recording file generating device according to claim 7, characterized in that the division marking module further comprises:

An identification submodule, used for identifying the multiple data frames;

The transcoding module comprises:

A filling submodule, configured to obtain the duration of the comfort noise frame when the data frame is a comfort noise frame, and perform PCMS16LE format encoding and filling on the comfort noise frame according to the duration to form a first transcoded frame;

A decoding submodule, for decoding the voice data frame into a PCMS16LE format to form a second transcoded frame when the data frame is a voice data frame;

A synthesis submodule, used for synthesizing the first transcoded frame and the second transcoded frame into the first recording file.

9. The recording file generating device according to claim 7, characterized in that the division marking module further comprises:

A timestamp recording submodule, used for recording the timestamp of each of the data frames.

10. The recording file generating device according to claim 6, characterized in that the audio generating module is specifically used for: