CN113851103B

CN113851103B - Audio noise reduction accelerator system and method based on RISC-V custom instruction set expansion

Info

Publication number: CN113851103B
Application number: CN202111037629.XA
Authority: CN
Inventors: 袁军; 赵强; 孟祥胜; 李军
Original assignee: Shenzhen Jusheng Technology Co ltd
Current assignee: Shenzhen Jusheng Technology Co ltd
Priority date: 2021-09-06
Filing date: 2021-09-06
Publication date: 2025-02-07
Anticipated expiration: 2041-09-06
Also published as: CN113851103A

Abstract

The present invention claims protection for an audio noise reduction accelerator system and method based on RISC-V custom instruction set expansion, which belongs to the field of integrated circuit technology and mainly includes: E203_CORE, NICE_CORE, NICE_Interface, E203_SOC, an audio codec WM8731 module, and an audio noise reduction FxLMS algorithm. E203_CORE is connected to NICE_CORE through NICE_Interface; E203_CORE, NICE_CORE and the related peripheral ports together form E203_SOC; E203_SOC is connected to the audio codec WM8731 module; and the audio noise reduction FxLMS algorithm is downloaded through software programming to the RISC-V processor core, where it runs. The point of innovation is that, compared with a processor based on the ARM instruction set architecture, a processor using RISC-V custom instructions can accelerate specific operations of the audio noise reduction FxLMS algorithm; the invention can further optimize area, power consumption, granularity and the like while improving the flexibility and feasibility of the algorithm.

Description

Audio noise reduction accelerator system and method based on RISC-V custom instruction set expansion

Technical Field

The invention belongs to the technical field of integrated circuits, and particularly relates to an audio noise reduction accelerator system and method based on RISC-V custom instruction set expansion.

Background

With economic development and continuing technical progress, audio noise reduction systems are used in more and more scenarios, such as noise reduction in cars, in gas stations, and in earphones. At present, however, passive noise reduction such as physical sound insulation is usually adopted. Active noise reduction, which cancels noise by superimposing an anti-phase signal on the sound source signal, is often difficult to implement in hardware, and improved algorithms are likewise difficult to implement in hardware on an FPGA. Implementing the improved algorithm in software is therefore an important means of solving the problem that audio noise reduction algorithms are difficult to implement.

Meanwhile, RISC-V is an emerging instruction set architecture with the advantages of being open source and of being a late mover. For the embedded field, a soft core based on this architecture supports custom instructions, so dedicated acceleration unit circuits can be customized for the multiply-accumulate and convolution structures in the algorithm, yielding a design dedicated to audio noise reduction. With the vigorous promotion of the RISC-V instruction set architecture in China, the design of domain-specific SoC chips can develop greatly.

With the rise of AIoT in China, many different types of accelerators, such as neural network accelerators for images, have appeared to address problems such as processor core running speed. A conventional pure-hardware FxLMS implementation is not suited to accelerating only specific operations, and as the algorithm grows more complex its multipliers and adders make a hardware implementation more challenging. A software-hardware cooperative FPGA+MCU architecture is therefore adopted: the algorithm is implemented in software, while hardware accelerates specific operations, which opens new directions and possibilities for audio noise reduction systems.

In a traditional noise reduction method, for example, the amplitude and phase-angle values of several noisy audio signals are extracted, a neural network is trained on them to determine the complex spectrum of the clean audio signal, the resulting complex spectrum is inverse-transformed, and audio noise reduction is performed on the result of the inverse transform. This approach is inflexible, since retraining is needed for different noise sources, and it also lacks the flexibility of software-hardware cooperative processing. The audio noise reduction accelerator system based on RISC-V custom instruction set expansion therefore provides a new method for audio noise reduction.

Disclosure of Invention

The present invention aims to solve the above problems of the prior art. It provides an audio noise reduction accelerator system and method based on RISC-V custom instruction set expansion that offers a flexibly modifiable algorithm, stable transmission, fast operation and strong specialization, and that can be widely used for noise reduction of sound sources in fields such as automobile, earphone, high-speed rail and aircraft noise reduction. The technical scheme of the invention is as follows:

An audio noise reduction accelerator system based on RISC-V custom instruction set expansion comprises an E203_CORE (the Hummingbird E203 open-source processor core), a NICE_CORE (a custom coprocessor), a NICE_Interface (the interface circuit between the main processor and the coprocessor), an E203_SOC (a system on chip built with the E203 as its core), an audio codec WM8731 module, and an audio noise reduction FxLMS algorithm, wherein:

The E203_CORE is connected to the NICE_CORE through the NICE_Interface; the E203_CORE, the NICE_CORE and the related peripheral ports together form the E203_SOC; the E203_SOC is connected to the audio codec WM8731 module; and the audio noise reduction FxLMS algorithm is downloaded through software programming into the RISC-V processor core to run. After the system starts, the E203_CORE is first initialized, then the IIC interface circuit on the peripheral bus is accessed in instruction order, and the audio codec module WM8731 is configured according to the instructions. Next, the sound source signal, in which the sound signal and the target noise are superimposed, is collected by a reference microphone, and the residual noise is collected by an error microphone. The collected sound source signal and residual noise pass through the Audio_data_rx of the WM8731 module, are converted from analog to digital by the ADC built into the WM8731 module, and are then filtered by the digital filter built into the WM8731 module. The filtered digital signal is transmitted to the peripheral bus through the IIS interface circuit, and is then fetched from the peripheral bus according to the instructions and processed by the software implementation flow of the FxLMS algorithm. When convolution and multiply-accumulate operations are processed, the decoding stage in the E203_CORE detects a custom instruction, and the custom instruction is passed through the NICE_Interface to the NICE_CORE, which accelerates that specific operation. After processing is complete, the processed digital signal is transmitted back to the peripheral bus and then through the IIS peripheral interface circuit to the WM8731 audio codec module, where it is converted into an analog signal. The analog signal is transmitted through Audio_data_tx to the secondary sound source to produce anti-phase noise, and finally the anti-phase noise interferes destructively with the target noise and cancels it, thereby performing active audio noise reduction and realizing the audio noise reduction system.

Further, the E203_SOC is composed of the E203_CORE, the NICE_CORE, the ICB_APB, a UART serial port, an IIC interface and an IIS interface. The E203_CORE is the processor core: it runs the instructions in sequence and controls the corresponding modules according to the instruction content to process data. The general-purpose registers Reg of the execution unit EXU in the E203_CORE store the processed data returned by the coprocessor, and the data tightly-coupled memory DTCM in the memory control unit LSU of the E203_CORE supplies the data corresponding to the source register index to the coprocessor, so that custom instructions can perform read and write accesses to the data. The NICE_Interface carries the communication between the E203_CORE and the NICE_CORE. The NICE_CORE accelerates processing of the data sent from the processor core: it receives the custom instruction through the NICE_Interface, identifies it through the decoder Decode as one of Lbuf, Sbuf, Conv or Mac, and then performs the data access and storage, or the data processing and write-back, that the instruction requires. The Lbuf instruction loads data: it fetches data from the DTCM in the LSU and stores it into Buf. The Sbuf instruction stores data: it writes the data in Buf to the DTCM in the LSU. The Conv instruction performs a convolution of the weight coefficients and the input signal fetched from Buf and writes the result back to Reg in the EXU. The Mac instruction performs a multiply-accumulate of the weight coefficients and the error component fetched from Buf, temporarily stores the intermediate results in Buf, and writes the complete result back to Reg in the EXU once the operation finishes. State transitions throughout are controlled by a finite state machine FSM. The ICB_APB carries the communication between the processor core and each peripheral interface. The UART serial port transmits ADCDATA and DACDATA to a host computer for storage so that the data can be analyzed conveniently in MATLAB. The IIC interface is used by the processor to configure the registers of the audio codec WM8731 module, and the IIS interface transfers data between the processor core and the audio codec WM8731 module.

Further, the NICE_CORE decodes the custom instruction and the signals, including the source operand register addresses, transmitted from the E203_CORE through the NICE_Interface; it identifies the custom instruction by comparing the specific fields of its format, and then executes the corresponding custom instruction under the control of the state machine.

Further, the audio codec module WM8731 contains an internal ADC, a DAC and a digital filter. The external sound source to be processed is collected through the MIC input, converted from an analog signal to a digital signal by the internal ADC, and filtered by the internal digital filter to obtain the output data ADCDATA. The output data DACDATA produced by the RISC-V soft core is input to the DAC in the audio codec module, which converts the digital signal to an analog signal, and the converted analog signal is output through the MIC.

Furthermore, the audio codec module WM8731 has a built-in IIC interface circuit through which it can be configured: master or slave mode, ADC and DAC enable selection, MIC or LINE mode, volume, and whether digital filtering is turned on. Since the ANC audio noise reduction system works by destructive interference of sound waves, the WM8731 module is configured for slave mode and MIC mode with the ADC and DAC enabled, and the register configuration information is written by the E203_CORE through the IIC interface bus.
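By way of non-limiting illustration, the register writes described above might be prepared as in the following C sketch. It assumes the WM8731 two-wire control format of a 7-bit register address followed by 9 data bits; the register addresses and bit positions are quoted from the WM8731 datasheet register map as understood here and should be verified against it. The function names, the chosen register values and the omission of volume settings are assumptions made for illustration, not part of the described embodiment, and the actual IIC transfer is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* WM8731 register addresses (7-bit); verify against the datasheet. */
enum {
    WM8731_REG_ANALOG_PATH = 0x04, /* analogue audio path control    */
    WM8731_REG_POWER_DOWN  = 0x06, /* power-down control             */
    WM8731_REG_DIGITAL_IF  = 0x07, /* digital audio interface format */
    WM8731_REG_ACTIVE      = 0x09  /* active control                 */
};

#define WM8731_INSEL_MIC  (1u << 2) /* ADC input select: microphone        */
#define WM8731_DACSEL     (1u << 4) /* route the DAC to the output         */
#define WM8731_FORMAT_I2S 0x2u      /* FORMAT[1:0] = 10: I2S data format   */
#define WM8731_MASTER     (1u << 6) /* MS bit; left clear for slave mode   */
#define WM8731_ACTIVE_BIT 0x1u      /* activate the digital interface      */

/* Pack a 7-bit register address and a 9-bit value into the two bytes
 * that follow the device address on the IIC bus. */
static void wm8731_pack(uint8_t reg, uint16_t val, uint8_t out[2])
{
    assert(reg < 0x80 && val < 0x200);
    out[0] = (uint8_t)((reg << 1) | (val >> 8));
    out[1] = (uint8_t)(val & 0xFF);
}

/* Hypothetical helper: build the control words for slave mode, MIC input,
 * and ADC and DAC powered, as the text describes. Returns the word count. */
static size_t wm8731_anc_config(uint8_t words[4][2])
{
    wm8731_pack(WM8731_REG_POWER_DOWN, 0x000, words[0]);  /* all blocks on */
    wm8731_pack(WM8731_REG_ANALOG_PATH,
                WM8731_INSEL_MIC | WM8731_DACSEL, words[1]);
    wm8731_pack(WM8731_REG_DIGITAL_IF, WM8731_FORMAT_I2S, words[2]);
    wm8731_pack(WM8731_REG_ACTIVE, WM8731_ACTIVE_BIT, words[3]);
    return 4;
}
```

Each two-byte word would then be sent by whatever IIC driver the SoC provides; that driver is outside the scope of this sketch.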

Furthermore, the audio noise reduction FxLMS algorithm is written with C as the description language and inline assembly as an auxiliary language. The IDE software compiles, assembles and links it into an executable file that is downloaded into the RISC-V soft core, and the FxLMS audio noise reduction algorithm is realized in software by executing the instructions.
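As a hedged illustration of how C code with inline assembly might issue such custom instructions, the sketch below wraps two of them using the GNU assembler `.insn r` directive, whose operand order is opcode, funct3, funct7, rd, rs1, rs2. The wrapper names, the buffer size `NICE_BUF_WORDS`, the use of funct7 = 0 and the use of x0 in the rd and rs2 positions are assumptions for illustration only. When the code is not compiled for a RISC-V target, a plain software model of the buffer stands in for the coprocessor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NICE_BUF_WORDS 16 /* hypothetical coprocessor buffer size in words */

#if defined(__riscv)
/* RISC-V target: emit the custom instructions directly.
 * 0x0b = 7'b0001011 (load buffer), 0x2b = 7'b0101011 (store buffer),
 * funct3 = 2 (3'b010) marks rs1 as the only register operand. */
static inline void nice_lbuf(const int32_t *src)
{
    __asm__ volatile(".insn r 0x0b, 2, 0, x0, %0, x0"
                     : : "r"(src) : "memory");
}

static inline void nice_sbuf(int32_t *dst)
{
    __asm__ volatile(".insn r 0x2b, 2, 0, x0, %0, x0"
                     : : "r"(dst) : "memory");
}
#else
/* Host fallback: a software model of the coprocessor buffer so that the
 * calling code can be exercised without the hardware. */
static int32_t nice_buf_model[NICE_BUF_WORDS];

static inline void nice_lbuf(const int32_t *src)
{
    assert(src != NULL);
    memcpy(nice_buf_model, src, sizeof nice_buf_model);
}

static inline void nice_sbuf(int32_t *dst)
{
    assert(dst != NULL);
    memcpy(dst, nice_buf_model, sizeof nice_buf_model);
}
#endif
```

Keeping the instruction behind a small inline function lets the rest of the algorithm stay in ordinary C, which is one way to obtain the flexibility the software implementation is intended to provide.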

Further, in the software implementation of the audio noise reduction FxLMS algorithm, the digital signal to be processed is subjected to noise reduction processing using the following formulas:

y(n)=y(n)+w(n)(k-2)x(n)

e(n)=d(n)-y_s(n)

d(n)=p(n)*x(n)

y_s(n)=s(n)*y(n)

wherein y(n) is the secondary sound source, w(n)(k) is the weight coefficient, x(n) is the sound source signal, e(n) is the error signal, d(n) is the residual noise, y_s(n) is the secondary sound source after passing through the secondary path, p(n) is the reference signal from the sound source signal to the error microphone, s(n) is the signal generated along the secondary path from the secondary sound source to the error microphone, and the estimated value of the sound source signal, wherein the constraint on the step factor is specifically:

wherein mu is the step size of the FxLMS algorithm, and lambda_max is the maximum eigenvalue of the autocorrelation matrix.
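For orientation only, the following C sketch shows one per-sample step of the FxLMS algorithm in its common textbook form, using the error convention e(n) = d(n) - y_s(n) given above. It is not asserted to be the exact program of the embodiment: the tap count `FXLMS_TAPS`, the structure and function names, the secondary-path estimate `s_hat` and the use of double precision are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define FXLMS_TAPS 8 /* hypothetical filter order */

typedef struct {
    double w[FXLMS_TAPS];     /* adaptive weight coefficients             */
    double s_hat[FXLMS_TAPS]; /* estimate of the secondary path           */
    double x[FXLMS_TAPS];     /* reference history x(n-k)                 */
    double xf[FXLMS_TAPS];    /* filtered-reference history x'(n-k)       */
} fxlms_t;

static double dot(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Shift a history buffer by one sample and insert the newest value. */
static void shift_in(double *h, size_t n, double v)
{
    for (size_t i = n - 1; i > 0; i--)
        h[i] = h[i - 1];
    h[0] = v;
}

/* Take a new reference sample x(n) and return the anti-noise output
 * y(n) = sum_k w[k] * x(n-k). Also advances the filtered reference
 * x'(n) = sum_k s_hat[k] * x(n-k) needed by the weight update. */
static double fxlms_output(fxlms_t *f, double x_n)
{
    assert(f != NULL);
    shift_in(f->x, FXLMS_TAPS, x_n);
    shift_in(f->xf, FXLMS_TAPS, dot(f->s_hat, f->x, FXLMS_TAPS));
    return dot(f->w, f->x, FXLMS_TAPS);
}

/* Update the weights from the measured error e(n) = d(n) - y_s(n):
 * w[k] <- w[k] + mu * e(n) * x'(n-k). */
static void fxlms_update(fxlms_t *f, double e_n, double mu)
{
    assert(f != NULL && mu > 0.0);
    for (size_t k = 0; k < FXLMS_TAPS; k++)
        f->w[k] += mu * e_n * f->xf[k];
}
```

In use, `fxlms_output` would be called once per reference-microphone sample and `fxlms_update` once per error-microphone sample; the two dot products and the weight update are the convolution and multiply-accumulate steps that the text proposes to accelerate.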

An audio noise reduction method based on the system, comprising the steps of:

Firstly, the executable file generated by compiling, assembling and linking the C language program is stored in an off-chip Flash memory. After the system starts, the E203_CORE is first initialized, then the IIC interface circuit on the peripheral bus is accessed in instruction order, and the audio codec module WM8731 is configured through the IIC interface circuit according to the instructions;

Secondly, a sound source signal after superposition of a sound signal and target noise is collected through a reference microphone, residual noise is collected through an error microphone, the collected sound source signal and the residual noise pass through an audio_data_rx of a WM8731 module, an analog signal is converted into a digital signal through an ADC conversion module built in the WM8731 module, then filtering processing is carried out through a digital filter module built in the WM8731 module, and the filtered digital signal is transmitted to a peripheral bus through an IIS interface circuit;

Then, obtaining a filtered digital signal from a peripheral bus according to an instruction, transmitting the filtered digital signal to an E203_CORE for processing according to a software implementation flow of FxLMS algorithm, wherein a decoded part in the E203_CORE is detected to be a custom instruction when a multiply-accumulate operation and a convolution operation are processed, the E203_CORE transmits the custom instruction and a source operand to the NICE_CORE through the NICE_interface for acceleration of a specific operation part, the multiply-accumulate operation calls a Mac instruction circuit, and the convolution operation calls a Conv instruction circuit;

then, the processed digital signals are transmitted back to the peripheral bus according to the instruction, the digital signals processed by the noise reduction FxLMS algorithm are output, and then are transmitted to the WM8731 audio encoding and decoding module through the IIS peripheral interface circuit to be subjected to DAC conversion, the processed digital signals are converted into analog signals, and the analog signals are transmitted to the secondary sound source through the audio data transmitting end to obtain anti-phase noise;

Finally, the anti-phase noise interferes with the target noise and cancels it, thereby performing active audio noise reduction and realizing the audio noise reduction system.

Further, a custom instruction set encoding format is defined in the coprocessor. Four custom instructions are involved in total: the LBUF, SBUF, CONV and MAC instructions. Because the LBUF and SBUF instructions operate on the BUF at the same address, they must be distinguished by the opcode field: they are assigned custom0 and custom1, namely 7'b0001011 and 7'b0101011, respectively. Because the LBUF and SBUF instructions only read data from or write data to the DTCM and do not need to write data back, only source operand 1, namely rs1, is needed, so the funct3 field is defined as 3'b010, indicating that only source operand register rs1 is operated on. Since rd and rs2 are not used, those fields can be defined as the length selection for the data to be read or written, and the funct7 field is defined as the start address of the load or store within the BUF. The CONV instruction only needs the destination register in order to return data, so only the destination register rd is operated on; the funct7, rs1 and rs2 fields are combined to define the data X[9:0] and the weight coefficients W[9:0] that the CONV instruction processes, which are read from the pseudo dual-port SRAM and passed to the CONV circuit for operation. The MAC instruction needs both the destination register and the source operands, so its funct3 field is defined as 3'b111, and its funct7 field has no defined requirement. The opcodes of the CONV and MAC instructions are defined as custom3 and custom4, namely 7'b1011011 and 7'b1111011.

Furthermore, the data to be processed is loaded through the LBUF instruction and stored in two pseudo dual-port SRAMs, XBUF and QBUF, while WBUF is defined as an empty pseudo dual-port SRAM holding initial values. The CONV instruction then reads data from XBUF and WBUF. This instruction is implemented by a convolution operation unit circuit with an adder-tree structure, and the data length is determined by the filter order: when the filter order is J, the signal to be processed X[J-1:0] and the weight coefficients W[J-1:0] are processed. The operation result is stored in YBUF, a FIFO whose width and depth are defined as the data width and the filter order length. Through the MAC instruction, the product of QBUF and the error coefficient is added to the previous weight coefficients in WBUF to obtain the new weight coefficients, which are written back to WBUF to update the weights. The error coefficient is obtained by subtracting the data stored in YBUF from the primary noise signal Dn and multiplying the difference by the step factor. These steps are repeated until the outputs fill the depth of the FIFO.

The invention has the advantages and beneficial effects as follows:

1. The invention adopts a RISC-V soft-core SoC, which is highly configurable: different peripheral interface circuits can be configured according to different functional requirements, and unused interface peripherals can be removed from the circuit, reducing wasted resources.

2. The invention adopts the RISC-V instruction set architecture, so instructions can be defined for the key operation steps of the algorithm and invoked through inline assembly, and a hardware acceleration unit is designed to accelerate those key steps. The algorithm therefore runs faster, and both the instruction count and the number of execution cycles are reduced.

3. The invention adopts a hardware IIS interface circuit that transmits the audio signal stably; its configurable FIFO width and depth, clock unit and the like allow different sampling rates and sample byte widths to be selected, giving it a wider range of application.

4. The invention adopts four custom instructions to accelerate operation. The CONV instruction forms its convolution operation unit from multipliers and adders only, arranged in an adder-tree structure, so the instruction circuit can be reused for the many convolution operations in the algorithm; the MAC instruction can likewise be reused for the many multiply-accumulate parts of the algorithm. This reduces the hardware resources consumed by the coprocessor as a whole and reduces its granularity.

5. The invention implements the FxLMS algorithm in software, uses the Jacobi formula to calculate the maximum eigenvalue of the matrix, and takes its reciprocal to obtain the step factor. Compared with a pure hardware implementation, the software implementation is more feasible and avoids the difficulty of a hardware implementation, and it is also more flexible. The hardware acceleration unit provides partial acceleration through the RISC-V instruction set, so the software-hardware cooperative processing approach combines the advantages of hardware and software development.
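To make the step-factor calculation in item 5 concrete, the following C sketch estimates the largest eigenvalue of a sample autocorrelation matrix and takes its reciprocal. Note that it substitutes plain power iteration for the Jacobi calculation named in the text, purely because power iteration is short; the matrix dimension `R_DIM`, the function names and the iteration count are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define R_DIM 4 /* hypothetical filter order */

/* Build the Toeplitz sample autocorrelation matrix R[i][j] = r(|i-j|),
 * with r(lag) = (1/n) * sum over t of x[t] * x[t-lag]. */
static void autocorr_matrix(const double *x, size_t n, double R[R_DIM][R_DIM])
{
    double r[R_DIM];
    assert(x != NULL && n >= R_DIM);
    for (size_t lag = 0; lag < R_DIM; lag++) {
        double acc = 0.0;
        for (size_t t = lag; t < n; t++)
            acc += x[t] * x[t - lag];
        r[lag] = acc / (double)n;
    }
    for (size_t i = 0; i < R_DIM; i++)
        for (size_t j = 0; j < R_DIM; j++)
            R[i][j] = r[i > j ? i - j : j - i];
}

/* Estimate the largest eigenvalue of a symmetric positive semi-definite
 * matrix by power iteration, reporting the Rayleigh quotient. The all-ones
 * start vector is assumed not to be orthogonal to the dominant eigenvector. */
static double max_eigenvalue(double R[R_DIM][R_DIM], int iters)
{
    double v[R_DIM], u[R_DIM], lambda = 0.0;
    for (size_t i = 0; i < R_DIM; i++)
        v[i] = 1.0;
    for (int it = 0; it < iters; it++) {
        double num = 0.0, den = 0.0, scale = 0.0;
        for (size_t i = 0; i < R_DIM; i++) {
            u[i] = 0.0;
            for (size_t j = 0; j < R_DIM; j++)
                u[i] += R[i][j] * v[j];
            num += v[i] * u[i];
            den += v[i] * v[i];
            double a = u[i] < 0.0 ? -u[i] : u[i];
            if (a > scale)
                scale = a;
        }
        lambda = num / den;
        if (scale == 0.0)
            break; /* R * v vanished; nothing further to iterate */
        for (size_t i = 0; i < R_DIM; i++)
            v[i] = u[i] / scale; /* rescale to keep the vector bounded */
    }
    return lambda;
}

/* Step factor taken as the reciprocal of the largest eigenvalue. */
static double step_from_lambda_max(double lambda_max)
{
    assert(lambda_max > 0.0);
    return 1.0 / lambda_max;
}
```

A designer might compute the step factor once from a block of recorded reference samples and then keep it fixed while the filter adapts.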

Drawings

FIG. 1 is a diagram of an audio noise reduction accelerator system architecture in accordance with a preferred embodiment of the present invention;

FIG. 2 is a schematic diagram of a NICE_interface circuit;

FIG. 3 is a diagram of custom instruction set encoding;

FIG. 4 is a block diagram of a convolution operation and multiply-accumulate operation circuit;

FIG. 5 is a block diagram of a hardware model of the FxLMS algorithm;

FIG. 6 is a flow chart of the C language implementation of the FxLMS algorithm;

FIG. 7 is a diagram of the VIVADO joint simulation of the MAC custom instruction.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and specifically described below with reference to the drawings in the embodiments of the present invention. The described embodiments are only a few embodiments of the present invention.

The technical scheme for solving the technical problems is as follows:

The invention relates to an audio noise reduction accelerator system based on RISC-V custom instruction set expansion which, as shown in FIG. 1, structurally comprises an E203_CORE, a NICE_CORE, a NICE_Interface, an E203_SOC, an audio codec WM8731 module and an audio noise reduction FxLMS algorithm, wherein:

The E203_SOC is composed of an E203_CORE, a NICE_CORE, an ICB_APB, a UART serial port, an IIC interface and an IIS interface. The executable file generated by compiling, assembling and linking is stored in an off-chip Flash memory. After the system starts, the E203_CORE is first initialized, then the IIC interface circuit on the peripheral bus is accessed in instruction order, and the audio codec module WM8731 is configured through the IIC interface circuit according to the instructions. Next, the sound source signal, in which the sound signal and the target noise are superimposed, is collected by the reference microphone, and the residual noise is collected by the error microphone. The collected sound source signal and residual noise pass through the Audio_data_rx of the WM8731 module, are converted from analog to digital by the ADC built into the WM8731 module, and are then filtered by the digital filter built into the WM8731 module. The filtered digital signal is transmitted to the peripheral bus through the IIS interface circuit, and is then fetched from the peripheral bus according to the instructions and passed to the E203_CORE for processing by the software implementation flow of the FxLMS algorithm. Because the multiply-accumulate and convolution operations are written as custom instructions using inline assembly in the C program, the E203_CORE passes each custom instruction through the NICE_Interface to the NICE_CORE, which accelerates that operation. After processing, the digital signal produced by the noise reduction FxLMS algorithm is transmitted back to the peripheral bus according to the instructions and then through the IIS peripheral interface circuit to the WM8731 audio codec module, where it is converted into an analog signal. The analog signal is transmitted through Audio_data_tx to the secondary sound source to produce anti-phase noise, and finally the anti-phase noise interferes destructively with the target noise and cancels it, thereby performing active audio noise reduction and realizing the audio noise reduction system.

To ensure good timing stability between the NICE_CORE and the E203_CORE, the interface circuit NICE_Interface shown in FIG. 2 is adopted. When the EXU execution module detects, by examining the encoding of the opcode and funct3 fields of the instruction, that the instruction is a custom instruction, it transmits the instruction and the source operand information to the NICE_CORE. Under the control of its internal FSM (finite state machine), the NICE_CORE performs the coprocessor's accelerated operations, in particular the convolution and multiply-accumulate operations. When the coprocessor needs to access data in the DTCM in the LSU, it sends the address, the read or write request, the write data and other information to the LSU; after the LSU returns the read result, the coprocessor processes the data. When processing is finished, it determines from the custom instruction encoding rule whether the data needs to be returned to a destination register, and if so returns the result to the general-purpose registers in the EXU.

The invention uses the custom instruction space reserved by the RISC-V international organization. As shown in FIG. 3, the opcode, funct3 and funct7 fields distinguish the custom instructions, while rd, rs1 and rs2 denote the destination register to be written back, source operand register 1 and source operand register 2, respectively. The invention involves four custom instructions in total: the LBUF, SBUF, CONV and MAC instructions. Because the LBUF and SBUF instructions operate on the BUF at the same address, they are distinguished by opcode and assigned custom0 and custom1, namely 7'b0001011 and 7'b0101011, respectively. Because the SBUF and LBUF instructions only read data from or write data to the DTCM and do not need to write data back, only source operand 1, namely rs1, is needed, so the funct3 field is defined as 3'b010, indicating that only source operand register rs1 is operated on. Since rd and rs2 are not used, those fields can be defined as the length selection for the data to be read or written, and the funct7 field is defined as the start address of the load or store within the BUF. The CONV instruction only needs the destination register in order to return data, so only the destination register rd is operated on; the funct7, rs1 and rs2 fields are combined to define the data X[9:0] and the weight coefficients W[9:0] that the CONV instruction processes, which are read from the pseudo dual-port SRAM and passed to the CONV circuit for operation. The MAC instruction needs both the destination register and the source operands, so its funct3 field is defined as 3'b111, and its funct7 field has no defined requirement. Four custom opcode types are available; for the CONV and MAC instructions the invention uses custom3 and custom4, namely 7'b1011011 and 7'b1111011.
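The field layout described above can be illustrated with a small C sketch that packs and unpacks a 32-bit R-type instruction word. The bit positions are those of the standard RISC-V R-type format. The opcode constants are the binary values quoted in the text (in the RISC-V opcode map the values 7'b1011011 and 7'b1111011 are labelled custom-2 and custom-3). The pairing of CONV with 7'b1011011 and MAC with 7'b1111011, the reading of the funct3 bits as rd, rs1 and rs2 enables from most to least significant, and all function and type names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode values quoted in the description. */
enum {
    OPC_LBUF = 0x0B, /* 7'b0001011 */
    OPC_SBUF = 0x2B, /* 7'b0101011 */
    OPC_CONV = 0x5B, /* 7'b1011011 (assumed pairing) */
    OPC_MAC  = 0x7B  /* 7'b1111011 (assumed pairing) */
};

typedef struct {
    uint32_t opcode, funct3, funct7, rd, rs1, rs2;
    int uses_rd, uses_rs1, uses_rs2; /* derived from funct3 */
} nice_fields_t;

/* Pack the fields of an R-type word:
 * funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]. */
static uint32_t nice_encode(uint32_t opcode, uint32_t funct3, uint32_t funct7,
                            uint32_t rd, uint32_t rs1, uint32_t rs2)
{
    assert(opcode < 128 && funct3 < 8 && funct7 < 128);
    assert(rd < 32 && rs1 < 32 && rs2 < 32);
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}

/* Unpack an instruction word and derive which registers it touches.
 * Assumed convention: funct3 bit 2 = rd, bit 1 = rs1, bit 0 = rs2, which is
 * consistent with 3'b010 meaning "rs1 only" and 3'b111 meaning "all three". */
static nice_fields_t nice_decode(uint32_t insn)
{
    nice_fields_t f;
    f.opcode   = insn & 0x7F;
    f.rd       = (insn >> 7) & 0x1F;
    f.funct3   = (insn >> 12) & 0x7;
    f.rs1      = (insn >> 15) & 0x1F;
    f.rs2      = (insn >> 20) & 0x1F;
    f.funct7   = (insn >> 25) & 0x7F;
    f.uses_rd  = (f.funct3 >> 2) & 1;
    f.uses_rs1 = (f.funct3 >> 1) & 1;
    f.uses_rs2 = f.funct3 & 1;
    return f;
}
```

For example, a load-buffer instruction whose address is held in register x10 would be formed as `nice_encode(OPC_LBUF, 2, 0, 0, 10, 0)` under these assumptions.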

The operation part of the custom instructions in the invention is shown in FIG. 4. The data to be processed is loaded through the LBUF instruction and stored in two pseudo dual-port SRAMs, XBUF and QBUF, while WBUF is defined as an empty pseudo dual-port SRAM holding initial values. The CONV instruction then reads data from XBUF and WBUF. This instruction is implemented by a convolution operation unit circuit with an adder-tree structure, and the data length is determined by the filter order: when the filter order is J, the signal to be processed X[J-1:0] and the weight coefficients W[J-1:0] are processed. The operation result is stored in YBUF, a FIFO whose width and depth are defined as the data width and the filter order length. Through the MAC instruction, the product of QBUF and the error coefficient is added to the previous weight coefficients in WBUF to obtain the new weight coefficients, which are written back to WBUF to update the weights. The error coefficient is obtained by subtracting the data stored in YBUF from the primary noise signal Dn and multiplying the difference by the step factor. These steps are repeated until the outputs fill the depth of the FIFO.
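A behavioural model of these two operations may help in following the data flow. The C sketch below is an assumption-laden software analogue, not the circuit of FIG. 4: the order `TAPS`, the integer widths, the function names and the treatment of the error coefficient as a pre-scaled integer are all chosen for illustration. The convolution is reduced pairwise to mirror an adder tree, and the weight update adds the product of each QBUF entry and the error coefficient to the previous weight.

```c
#include <assert.h>
#include <stdint.h>

#define TAPS 8 /* hypothetical filter order J; a power of two for the tree */

/* CONV model: J parallel products reduced by a binary adder tree. */
static int64_t conv_adder_tree(const int32_t x[TAPS], const int32_t w[TAPS])
{
    int64_t level[TAPS];
    assert((TAPS & (TAPS - 1)) == 0); /* tree below needs a power of two */
    for (int i = 0; i < TAPS; i++)
        level[i] = (int64_t)x[i] * w[i];
    /* Each pass halves the number of partial sums, like one tree stage. */
    for (int width = TAPS; width > 1; width /= 2)
        for (int i = 0; i < width / 2; i++)
            level[i] = level[2 * i] + level[2 * i + 1];
    return level[0];
}

/* Error coefficient: (Dn - y) scaled by the step factor. The step factor is
 * modelled as a hypothetical integer multiplier to stay in integer arithmetic. */
static int32_t error_coeff(int32_t dn, int32_t y, int32_t step)
{
    return (dn - y) * step;
}

/* MAC model: w[k] <- w[k] + q[k] * err for every tap. */
static void mac_update(int32_t w[TAPS], const int32_t q[TAPS], int32_t err)
{
    for (int k = 0; k < TAPS; k++)
        w[k] += q[k] * err;
}
```

A pairwise reduction needs log2(J) adder stages rather than J - 1 sequential additions, which is the usual motivation for an adder tree in a convolution unit.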

The hardware model of a typical FxLMS algorithm is shown in FIG. 5. The input sequence passes through the delay elements, is multiplied by the weight coefficients, and the products are accumulated to obtain the output sequence; the weight coefficients are updated iteratively by the LMS block. This step is repeated many times as the input sequence is updated, yielding the sequence processed by the noise reduction algorithm. The following operation formulas can be extracted from the hardware model of the FxLMS algorithm:

y(n)=y(n)+w(n)(k-2)x(n)

e(n)=d(n)-y_s(n)

d(n)=p(n)*x(n)

y_s(n)=s(n)*y(n)

A C language program as shown in FIG. 6 is written on the basis of these formulas, and dedicated hardware acceleration unit circuits are added for specific operations in the algorithm: a convolution operation unit for the digital filter part and a multiply-accumulate circuit for the weight coefficient update part. This accelerates the operation and reflects the advantages of the RISC-V instruction set architecture.

Finally, FIG. 7 shows the result of the VIVADO joint simulation using the MAC custom instruction. The figure shows the specific instruction format, the loaded data to be processed and the data obtained by the final operation, from which it can be seen that the hardware acceleration part of the whole system is complete.

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The above examples should be understood as illustrative only and not limiting the scope of the invention. Various changes and modifications to the present invention may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the invention as defined in the appended claims.

Claims

1. An audio noise reduction accelerator system based on RISC v custom instruction set expansion, characterized in that it includes: E203_CORE, NICE_CORE, NICE_Interface, E203_SOC, an audio codec WM8731 module, and an audio noise reduction FxLMS algorithm, wherein E203_CORE is the Hummingbird E203 open source processor core, NICE_CORE is a custom coprocessor, NICE_Interface is the interface circuit between the main processor and the coprocessor, and E203_SOC is a system on chip built with E203 as the core; E203_CORE is connected to NICE_CORE through NICE_Interface; E203_CORE, NICE_CORE and the related peripheral ports together form E203_SOC; E203_SOC is connected to the audio codec WM8731 module; the audio noise reduction FxLMS algorithm is downloaded through software programming to the RISC v processor core, where it runs; and the executable file is saved in the off-chip Flash memory. After the system starts, E203_CORE is initialized first, and then the IIC interface circuit on the peripheral bus is accessed according to the instruction sequence. First, the audio codec module WM8731 is configured through the IIC interface circuit. Then, the sound source signal, in which the sound signal and the target noise are superimposed, is collected through the reference microphone, and the residual noise is collected through the error microphone. The collected sound source signal and residual noise pass through Audio_data_rx of the WM8731 module, and the analog signal is converted into a digital signal by the built-in ADC conversion module of the WM8731 module. The digital filter module built into the WM8731 module then performs filtering, and the filtered digital signal is transmitted to the peripheral bus through the IIS interface circuit. Then, according to the instruction, the filtered digital signal is obtained from the peripheral bus and transmitted to E203_CORE, where it is processed according to the software implementation process of the FxLMS algorithm. When the multiply-accumulate operation and the convolution operation are processed, they are detected as custom instructions by the decoding part in E203_CORE, and E203_CORE passes the custom instructions and source operands to NICE_CORE through NICE_Interface to accelerate the specific operation part, wherein the multiply-accumulate operation calls the Mac instruction circuit and the convolution operation calls the Conv instruction circuit. Then, according to the instruction execution, the processed digital signal is transmitted back to the peripheral bus, the digital signal after FxLMS noise reduction processing is output and transmitted through the IIS peripheral interface circuit to the WM8731 audio codec module for DAC conversion, the processed digital signal is converted into an analog signal and transmitted through Audio_data_tx to the secondary sound source to obtain the anti-phase noise, and finally the anti-phase noise interferes with the target noise, thereby performing active audio noise reduction and realizing the audio noise reduction system.

2. The audio noise reduction accelerator system based on RISC v custom instruction set expansion according to claim 1, characterized in that the E203_SOC is composed of E203_CORE, NICE_CORE, ICB_APB, a UART serial port, an IIC interface and an IIS interface; wherein E203_CORE is a processor core used to run and execute instructions in sequence and, at the same time, to control the corresponding module to process data according to the instruction content; the general register Reg of the execution unit EXU in E203_CORE saves the processed data returned by the coprocessor, and the data tightly coupled memory DTCM in the memory access control unit LSU in E203_CORE provides the data corresponding to the source register index to the coprocessor, so that the custom instructions can read and write data; NICE_Interface is used for communication and transmission between E203_CORE and NICE_CORE; NICE_CORE is used to accelerate the processing of data transmitted from the processor core: it obtains the custom instruction through NICE_Interface, identifies the corresponding custom instruction through the decoder Decode, the instruction being one of Lbuf, Sbuf, Conv and Mac, and then, according to the function of that instruction, performs data access and storage or data processing and write-back. Among them, the Lbuf instruction is the data loading process, which obtains data from the DTCM in the LSU and stores it in Buf. The Sbuf instruction is the data storage process, which returns the data in Buf to the DTCM in the LSU. The Conv instruction obtains the weight coefficients and the input signal from Buf for the convolution operation, and writes the operation result back to Reg in EXU. The Mac instruction obtains the weight coefficients and the error component from Buf for the multiply-accumulate operation, and temporarily stores the operation result in Buf. After the operation is completed, the entire result is written back to Reg in EXU. The whole process is controlled by a finite state machine FSM, which performs the state transitions. ICB_APB is used for communication between the processor core and the various peripheral interfaces. The UART serial port is used to transmit ADCDATA and DACDATA to the host computer for storage, which makes it convenient to analyze the data with MATLAB. The IIC interface is used by the processor core to configure the registers of the audio codec WM8731 module, and the IIS interface is used for data transmission between the processor core and the audio codec WM8731 module.

3. The audio noise reduction accelerator system based on RISC v custom instruction set expansion according to claim 1 or 2, characterized in that NICE_CORE decodes the signals transmitted by E203_CORE through NICE_Interface, which include the custom instructions and the source operand register addresses, identifies the corresponding custom instruction by comparing the specific format fields, and then executes that instruction under the control of the state machine.

4. The audio noise reduction accelerator system based on RISC v custom instruction set expansion according to claim 1 or 2, characterized in that the audio codec module WM8731 has a built-in ADC, DAC and digital filter. The MIC acquisition mode is used to collect the external sound source to be processed; the analog signal is converted to a digital signal by the built-in ADC, and the digital signal is then filtered by the built-in digital filter to obtain the output data ADCDATA. The data DACDATA processed by the RISC v soft core is then input to the DAC in the audio codec module, which converts the digital signal to an analog signal, and the converted analog signal is output through the MIC.

5. The audio noise reduction accelerator system based on RISC v custom instruction set expansion according to claim 4, characterized in that an IIC interface circuit is provided inside the audio codec module WM8731, through which operations can be performed including master/slave mode configuration, ADC and DAC enable selection, volume setting in MIC and LINE modes, and enabling or disabling digital filtering. Since the ANC audio noise reduction system works by destructive interference of sound waves, the WM8731 module is configured in slave mode and MIC mode with the ADC and DAC enabled, and the register configuration information is written by E203_CORE through the IIC interface bus.

6. The audio noise reduction accelerator system based on RISC v custom instruction set expansion according to claim 5, characterized in that the audio noise reduction FxLMS algorithm is written with C language as the description language and inline assembly language as the auxiliary language. The program is compiled, assembled and linked in IDE software, the executable file is downloaded to the RISC v soft core, and the FxLMS audio noise reduction algorithm is implemented in software through the execution of instructions.

7. The audio noise reduction accelerator system based on RISC v custom instruction set expansion according to claim 6, characterized in that, in the software implementation of the audio noise reduction FxLMS algorithm, the digital signal to be processed is subjected to noise reduction processing using the following formulas:

y(n)=y(n)+w(n)(k-2)x(n)

e(n)=d(n)-y_s(n)

d(n)=p(n)*x(n)

y_s(n)=s(n)*y(n)

Where y(n): secondary sound source, w(n)(k): weight coefficient, x(n): sound source signal, e(n): error signal, d(n): residual noise, y_s(n): secondary sound source through the secondary path, p(n): reference signal from the sound source signal to the error microphone, s(n): signal generated by the secondary path from the secondary sound source to the error microphone, The estimated value of the sound source signal, where the constraint on the step factor is:

Wherein μ: the step size of the FxLMS algorithm; λ_max: the maximum eigenvalue of the autocorrelation matrix.

8. An audio noise reduction method based on the system according to any one of claims 1 to 7, characterized in that it comprises the following steps:

First, the executable file generated after the C language program is compiled, assembled and linked is saved in the off-chip Flash memory. After the system starts, E203_CORE is initialized first, and then the IIC interface circuit on the peripheral bus is accessed according to the instruction sequence. According to the instruction, the audio codec module WM8731 is first configured through the IIC interface circuit;

Secondly, the reference microphone collects the sound source signal after the sound signal and the target noise are superimposed, and the error microphone collects the residual noise. The collected sound source signal and residual noise are passed through the Audio_data_rx of the WM8731 module, and the built-in ADC conversion module of the WM8731 module converts the analog signal into a digital signal. After that, the built-in digital filter module of the WM8731 module performs filtering processing, and the filtered digital signal is transmitted to the peripheral bus through the IIS interface circuit;

Next, according to the instruction, the filtered digital signal is obtained from the peripheral bus and transmitted to E203_CORE for processing according to the software implementation process of the FxLMS algorithm. When processing the multiplication and accumulation operation and the convolution operation, the decoding part in E203_CORE will detect it as a custom instruction. E203_CORE will pass the custom instruction and source operand to NICE_CORE through NICE_Interface to accelerate the specific operation part. The multiplication and accumulation operation will call the Mac instruction circuit, and the convolution operation will call the Conv instruction circuit.

Then, according to the instruction execution, the processed digital signal is transmitted back to the peripheral bus, the digital signal processed by the noise reduction FxLMS algorithm is output, and then transmitted to the WM8731 audio codec module through the IIS peripheral interface circuit for DAC conversion, the processed digital signal is converted into an analog signal, and transmitted to the secondary sound source through the audio data sending end to obtain the anti-phase noise;

Finally, the anti-phase noise and the target noise interfere with each other, thereby performing active audio noise reduction and realizing an audio noise reduction system.

9. The audio noise reduction method according to claim 8, characterized in that a total of 4 custom instructions are involved, namely the LBUF, SBUF, CONV and MAC instructions. Since the LBUF and SBUF instructions operate on the BUF at the same address, they are distinguished by their opcode fields, which are custom0 and custom1 respectively, i.e., 7'b0001011 and 7'b0101011. Since the SBUF and LBUF instructions only read from and write to the DTCM and do not write data back, they use only source operand 1, i.e., rs1, so the funct3 field is defined as 3'b010, which indicates that only the source operand register rs1 is used. Since rd and rs2 are not used, these fields can be defined as the length selection of the data to be read or written. The funct7 field is defined as the starting address of the load or store in the BUF. The CONV instruction only needs the target register to return data, so only the target register rd is used. The remaining fields (funct3, funct7, rs1, rs2 and so on) are combined and defined as the data X[9:0] and weight coefficient W[9:0] to be processed, which are read from the pseudo dual-port SRAM and passed to the CONV circuit for calculation. The MAC instruction uses both the target register and the source operands in its calculation, so its funct3 field is defined as 3'b111, and its funct7 field has no definition requirement. The opcodes of the CONV and MAC instructions may be any of the four custom opcodes; the present invention uses custom2 and custom3, i.e., 7'b1011011 and 7'b1111011.
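As an illustration of the field layout described in this claim, the following C sketch packs the fields into a 32-bit instruction word using the standard RISC-V R-type bit positions. The function names `encode_rtype` and `nice_mnemonic`, the `OPC_*` constants and the register numbers used in the usage note are hypothetical; only the four opcode values and the funct3 values 3'b010 and 3'b111 come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode values quoted in the claim (7-bit field, bits [6:0]). */
enum {
    OPC_LBUF = 0x0B, /* 7'b0001011 */
    OPC_SBUF = 0x2B, /* 7'b0101011 */
    OPC_CONV = 0x5B, /* 7'b1011011 */
    OPC_MAC  = 0x7B  /* 7'b1111011 */
};

/* Pack the six R-type fields:
 * funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0] */
uint32_t encode_rtype(uint32_t opcode, uint32_t rd, uint32_t funct3,
                      uint32_t rs1, uint32_t rs2, uint32_t funct7)
{
    assert(opcode < 128 && funct7 < 128 && funct3 < 8);
    assert(rd < 32 && rs1 < 32 && rs2 < 32);
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}

/* Map the opcode field of an instruction word back to its mnemonic. */
const char *nice_mnemonic(uint32_t word)
{
    switch (word & 0x7Fu) {
    case OPC_LBUF: return "LBUF";
    case OPC_SBUF: return "SBUF";
    case OPC_CONV: return "CONV";
    case OPC_MAC:  return "MAC";
    default:       return "not a custom opcode";
    }
}
```

For example, `encode_rtype(OPC_LBUF, 0, 2, 10, 0, 0)` builds an LBUF word whose funct3 is 3'b010 and whose rs1 is register 10, and `encode_rtype(OPC_MAC, 10, 7, 11, 12, 0)` builds a MAC word whose funct3 is 3'b111.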

10. The audio noise reduction method according to claim 8, characterized in that the data to be processed is loaded by the LBUF instruction and stored in the two pseudo dual-port SRAMs XBUF and QBUF, and WBUF is defined as a pseudo dual-port SRAM that is initially empty. The data in XBUF and WBUF are then read by the CONV instruction, whose circuit is a convolution operation unit using an adder tree structure; its data length is determined by the filter order, that is, when the filter order is J, the signal to be processed X[J-1:0] and the weight coefficients W[J-1:0] are obtained. The operation result is saved in YBUF, a FIFO whose width and depth are defined as the data width and the filter order, respectively. Afterwards, the MAC instruction multiplies the data in QBUF by the error coefficient and adds the product to the previous weight coefficients in WBUF to obtain the new weight coefficients, which are written back to WBUF by the MAC custom instruction to update the weights, where the error coefficient is obtained by subtracting the primary noise signal Dn from the data stored in YBUF and then multiplying the difference by the step size factor. The above steps are repeated until the outputs fill the depth of the FIFO.
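The two operations in this claim can be modelled in software as follows. This is a behavioural sketch under stated assumptions, not the claimed circuit: the function names `conv_adder_tree` and `mac_update`, the filter order `J` of 8 and the 32-bit integer data type are hypothetical, overflow is not handled, and the error coefficient is passed in as a parameter rather than derived from YBUF and Dn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define J 8  /* assumed filter order; a power of two keeps the tree balanced */

/* CONV model: multiply XBUF and WBUF element-wise, then sum the J products
 * pairwise in log2(J) stages, as an adder tree would. */
int32_t conv_adder_tree(const int32_t x[J], const int32_t w[J])
{
    int32_t stage[J];

    assert((J & (J - 1)) == 0);  /* this sketch only handles powers of two */

    for (size_t i = 0; i < J; ++i)
        stage[i] = x[i] * w[i];

    /* Each pass halves the number of partial sums. */
    for (size_t width = J; width > 1; width /= 2)
        for (size_t i = 0; i < width / 2; ++i)
            stage[i] = stage[2 * i] + stage[2 * i + 1];

    return stage[0];
}

/* MAC model: add the product of each QBUF entry and the error coefficient
 * to the previous weight in WBUF, giving the new weight in place. */
void mac_update(int32_t w[J], const int32_t q[J], int32_t err)
{
    for (size_t i = 0; i < J; ++i)
        w[i] += err * q[i];
}
```

The pairwise reduction gives the same sum as a sequential accumulation but in a number of addition stages that grows with the logarithm of the filter order, which is the usual reason for choosing an adder tree in hardware.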