stmhal: I2S and non-blocking DMA transfers · Issue #1422 · micropython/micropython · GitHub

stmhal: I2S and non-blocking DMA transfers #1422


Closed
blmorris opened this issue Aug 11, 2015 · 15 comments
Labels
ports Relates to multiple ports, or a new/proposed port

Comments

@blmorris
Contributor

In the process of developing an I2S driver for uPy I have run into a few issues that I thought might be relevant to a broader discussion of hardware peripheral implementations.

So far with I2S I've managed to muddle along by finding the closest approximations to what I need in the existing code and adapting them, learning both basic C programming and uPy's specific implementation details as I go. At this point I have gotten a bit stuck: as far as I can tell, the next steps in integrating I2S require techniques that currently have no close approximations in the uPy code. I thought that a discussion of these issues might point the way to some implementation enhancements beyond I2S.

As background: in the existing serial hardware drivers (broadly, UART, SPI, I2C, and CAN), a buffer is set up in memory and data is either sent from the buffer out through the peripheral, or received from the peripheral and stored in the buffer. Even where DMA is implemented, the transactions are blocking (the DMA code waits in a busy-wait loop); a command doesn't return control until the transfer has completed successfully or timed out.

I2S will need to work differently. Even the shortest audio tracks - single words, sound effects, etc. - will be a substantial fraction of a second in length. A sentence is a few seconds, a music track will usually be a few minutes, and there is nothing to preclude a .wav file from being a few GB in size and running for several hours. A 16-bit stereo track at 44.1kHz (CD quality) has a data rate of 176.4 kB/s; 48kHz brings that to 192 kB/s. Even one second of audio far exceeds the available RAM on the pyboard. (Using MP3s does not ameliorate the problem: compressed audio needs to be decompressed before being sent to a codec via I2S.)
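The data-rate figures above follow from sample rate × bytes per sample × channels. A small sketch of that arithmetic; the 4096-byte buffer is an arbitrary size chosen only for illustration:

```python
def pcm_byte_rate(sample_rate_hz, bits_per_sample, channels):
    """Bytes per second of uncompressed PCM audio."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def buffer_duration_s(buf_bytes, byte_rate):
    """How long one buffer of PCM data lasts once playback starts."""
    return buf_bytes / byte_rate


cd_rate = pcm_byte_rate(44_100, 16, 2)   # CD quality: 176400 bytes/s
dat_rate = pcm_byte_rate(48_000, 16, 2)  # 48 kHz: 192000 bytes/s
one_hour_cd = cd_rate * 3600             # 635040000 bytes, roughly 635 MB per hour
buf_ms = 1000 * buffer_duration_s(4096, dat_rate)  # about 21.3 ms per 4 KB buffer

print(cd_rate, dat_rate, one_hour_cd, round(buf_ms, 1))
```

In other words, a few kilobytes of RAM hold only tens of milliseconds of audio, which is why the data has to be streamed rather than loaded whole.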

ST's HAL provides I2S functions which support polling, interrupts, and DMA, as it does for all other serial peripherals. Within uPy, polling simply won't work for I2S: nothing else will get a chance to run while an I2S transaction runs continuously for minutes at a time, and it isn't clear that there would even be an opportunity to get new data in from the SD card. Likewise the interrupt methods will generate 96,000 interrupts per second for 48kHz stereo simplex (possibly twice that for duplex), and if any interrupts are missed because a GC cycle is in progress then there will be glitches in the audio.

The data transfer requirements of I2S audio can be met with DMA, and ST has provided standalone examples of how to do this using their HAL. It isn't yet clear to me how to integrate these into the uPy framework, mostly because there are techniques used which aren't yet implemented anywhere in uPy.

Currently the peripherals which support DMA (SPI, I2C, and DAC) will initialize DMA immediately before a transaction, transfer the data (receiving into or transmitting from a single buffer), block in a busy-wait loop until the transaction is completed, and then immediately de-initialize DMA for the peripheral.

This won't work for I2S; in order to maintain channel synchronization the DMA should remain active as long as the I2S port is enabled. To ensure glitch-free playback, double-buffering is used: when the Tx / Rx complete interrupt is caught the callback must immediately reinitiate the transfer using another pre-filled buffer, and then proceed to refill the just-used buffer with new data before the second buffer is empty. This must continue as long as the file contains new data. (Note: the wave.py module does the work of opening .wav files and providing the playback parameters and frame data.)
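The double-buffering scheme described above can be simulated in plain Python to check the ordering of operations. This is a hypothetical sketch, not uPy code: `sink` stands in for the I2S/DMA hardware and `on_transfer_complete` stands in for the HAL Tx-complete callback; on real hardware the transfer runs asynchronously rather than inside `sink`.

```python
import io


class DoubleBufferPlayer:
    """Ping-pong buffering: 'DMA' drains one buffer while the CPU refills the other."""

    def __init__(self, source, buf_size, sink):
        self.source = source                 # any object with .read(n) -> bytes
        self.sink = sink                     # stand-in for the I2S/DMA transfer
        self.bufs = [bytearray(buf_size), bytearray(buf_size)]
        self.lens = [0, 0]                   # valid bytes held in each buffer
        self.active = 0                      # buffer currently owned by "DMA"

    def _fill(self, i):
        data = self.source.read(len(self.bufs[i]))
        self.bufs[i][:len(data)] = data
        self.lens[i] = len(data)

    def _kick(self):
        """Start a transfer from the active buffer; False means end of stream."""
        n = self.lens[self.active]
        if n == 0:
            return False
        self.sink(bytes(self.bufs[self.active][:n]))
        return True

    def start(self):
        self._fill(0)                        # pre-fill both buffers
        self._fill(1)
        self.active = 0
        return self._kick()

    def on_transfer_complete(self):
        finished = self.active
        self.active ^= 1
        running = self._kick()               # 1. restart on the pre-filled buffer
        self._fill(finished)                 # 2. refill the buffer that just drained
        return running


played = []
player = DoubleBufferPlayer(io.BytesIO(bytes(range(10))), 4, played.append)
running = player.start()
while running:                               # each iteration = one Tx-complete event
    running = player.on_transfer_complete()
```

The key ordering is that the restart happens before the refill, so the output never waits on the data source.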

Given that wave file playback can continue for long periods without interruption, it also needs to be non-blocking. In my case, I have a system that will be monitoring the acoustic environment in real time, adjusting the volume and possibly even changing the digital equalization while audio playback is in process.

Among the mechanisms provided by the HAL are the Tx/RxCplt and HalfCplt callbacks. For demonstration purposes I can use them to print test messages or toggle LEDs to show that they are working, but it isn't clear to me if or how they should be integrated with the Python callback functions; none of the callbacks implemented in uPy currently use the HAL callbacks.

I know that all of this can be done on the pyboard's processor, and I intend to get it working somehow in uPy. I have a colleague who has made I2S work in a standalone application who can help me with the C implementation details.

I wanted to bring this up in a separate discussion from #1361 because implementation of these features will require significant divergence from anything that currently exists in uPy, and may influence further development of existing peripheral drivers - in particular the possibility of non-blocking SPI and I2C transfers, possibly also extending to DAC and ADC. It would be nice to get some ideas before proceeding, even if it ultimately requires a few iterations to get a consensus on the design.

@dhylands
Contributor

In order to implement a DMA callback, you would write a callback function in C with the following prototype:

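```c
void MyDMACallback(DMA_HandleTypeDef *hdma) {
    // 'callback', 'tim', 'irq_mask' and 'dma_channel' are placeholders carried
    // over from the Timer callback code; they have to be looked up for the
    // DMA stream that fired (one way to do that is described below).
    gc_lock();  // lock the GC heap: the Python callback must not allocate here
    nlr_buf_t nlr;
    if (nlr_push(&nlr) == 0) {
        mp_call_function_1(callback, tim);
        nlr_pop();
    } else {
        // Uncaught exception; disable the callback so it doesn't run again.
        tim->callback = mp_const_none;
        __HAL_DMA_DISABLE_IT(hdma, irq_mask);
        printf("uncaught exception in DMA(%u) interrupt handler\n", dma_channel);
        mp_obj_print_exception(&mp_plat_print, (mp_obj_t)nlr.ret_val);
    }
    gc_unlock();
}
```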

and you would store it in the hdma object by doing something like:

```c
hdma->XferHalfCpltCallback = MyDMACallback;
```

Getting the callback/irq_mask etc. can be a bit tricky. I don't see anything in the API that allows a "user" pointer to be provided. So a trick I often use (if I can) is to create my own structure which includes the DMA_HandleTypeDef as the first member and then the additional information as additional members. So something like:

```c
typedef struct {
    DMA_HandleTypeDef hdma;  // must be the first member so the cast below is valid
    int my_custom_field;
    mp_obj_t callback;
} MyDMAHandle;
```

Then in MyDMACallback, hdma not only points to the handle, but with a simple cast, (MyDMAHandle *)hdma, it also points to your own MyDMAHandle structure (of course you need to declare the DMA handles using MyDMAHandle instead of just using DMA_HandleTypeDef).

If that doesn't make sense, feel free to ask questions.

@dpgeorge
Member

There are lots of ways to make streaming I2S work, it just depends how customisable with Python you want it to be.

If you can write the code in C and have it working glitch-free (eg streaming a file from the SD card to I2S) then you could simply provide a uPy function call to start that C code running. Eg i2s.stream('/sd/track.wav'). All the work would be done in C. It could use DMA or interrupts to run in the background.

I don't see why interrupts wouldn't work. Yes they might take a lot of CPU, but that's because you're doing a complex thing, and as long as you have enough cycles you'll be ok (eg 96000 irqs a second would each get at most 1750 cycles to execute). The GC does not disable interrupts when running so wouldn't introduce glitches (as long as no Python code is needed to actually drive the streaming).

But I agree that DMA is the way to go. If the idea is to have a Python function fill the double buffer each time it's emptied, then that will require some analysis to see whether it can keep up within the number of CPU cycles that the F405 has available.
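The two cycle budgets being compared can be worked out directly. A sketch, assuming the F405 runs at 168 MHz (the clock that yields the 1750-cycle figure) and a hypothetical 512-frame buffer per DMA transfer:

```python
CPU_HZ = 168_000_000  # assumed STM32F405 core clock


def cycles_per_irq(irqs_per_second, cpu_hz=CPU_HZ):
    """Upper bound on cycles per interrupt if the CPU did nothing else."""
    return cpu_hz // irqs_per_second


def refill_budget_cycles(frames_per_buffer, sample_rate_hz, cpu_hz=CPU_HZ):
    """Cycles that elapse while one buffer plays, i.e. the time to refill the other."""
    return cpu_hz * frames_per_buffer // sample_rate_hz


per_sample = cycles_per_irq(48_000 * 2)          # 1750 cycles per sample interrupt
per_buffer = refill_budget_cycles(512, 48_000)   # 1792000 cycles per buffer refill
refill_ms = 1000 * 512 / 48_000                  # about 10.7 ms to refill
```

With DMA the per-event budget grows with the buffer size, but so does the RAM cost: a Python refill routine would have to produce 512 stereo 16-bit frames (2048 bytes) inside that window.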

@blmorris can you provide a rough sketch of the code that you would like to run to stream I2S data?

@blmorris
Contributor Author

> The GC does not disable interrupts when running so wouldn't introduce glitches (as long as no Python code is needed to actually drive the streaming).

Okay, that was my misconception. I thought that memory couldn't be allocated during a GC cycle, and that memory can't be allocated during an interrupt handler, and made the wrong connection ;)

> If you can write the code in C and have it working glitch-free (eg streaming a file from the SD card to I2S) then you could simply provide a uPy function call to start that C code running. Eg i2s.stream('/sd/track.wav'). All the work would be done in C. It could use DMA or interrupts to run in the background.

As a first implementation I would be happy with that. I might even aim for the stream function to take a customized file handle instead, something like this:

```python
wf = wave.open('/sd/track.wav')
i2s.stream(wf)
```

This could provide a few benefits: the wave object would pass parameters like sample rate, sample width, and number of channels directly to the stream method. Reaching a bit further, I could imagine having an mp3.open function that could do the same thing for mp3 files (it would be substantially more complicated, and might need to be written in C as well.)

> can you provide a rough sketch of the code that you would like to run to stream I2S data?

I don't have any firm ideas yet, and for now I would be happy to be able to run i2s.stream(wf) in the background while being free to execute other code. i2s.stream should provide options to play audio, record it, and do both simultaneously; i2s should also have methods to pause, resume, or stop a track and to get a callback when it is finished so the next track can be played. This could all be implemented in C, and there are examples in the HAL Cube package.
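To make that wish list concrete, here is a hypothetical model of the requested control surface: stream, pause, resume, stop, and a completion callback used to queue the next track. None of these names are an existing uPy API; `service()` stands in for the DMA-complete handler pulling one buffer's worth of data.

```python
import io


class MockI2S:
    """Hypothetical model of an I2S streaming object; not an existing uPy API."""

    IDLE, PLAYING, PAUSED = "idle", "playing", "paused"

    def __init__(self):
        self.state = self.IDLE
        self._source = None
        self._on_finish = None

    def stream(self, source, on_finish=None):
        self._source, self._on_finish = source, on_finish
        self.state = self.PLAYING

    def pause(self):
        if self.state == self.PLAYING:
            self.state = self.PAUSED

    def resume(self):
        if self.state == self.PAUSED:
            self.state = self.PLAYING

    def stop(self):
        self.state, self._source = self.IDLE, None

    def service(self, nbytes):
        """Stand-in for the DMA-complete handler: fetch the next buffer's worth."""
        if self.state != self.PLAYING:
            return b""                      # nothing to send while paused or idle
        chunk = self._source.read(nbytes)
        if not chunk:                       # end of track
            callback = self._on_finish
            self.stop()
            if callback is not None:
                callback(self)              # e.g. start the next track
        return chunk


playlist = [io.BytesIO(b"abcd"), io.BytesIO(b"ef")]
out = []


def next_track(dev):
    if playlist:
        dev.stream(playlist.pop(0), on_finish=next_track)


i2s = MockI2S()
next_track(i2s)
while i2s.state != MockI2S.IDLE:
    out.append(i2s.service(3))
```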

In the longer term, it would be interesting to be able to process and even generate audio signals on the fly in Python; that would require having access to the double buffers directly from Python, and it would be an open question whether the processing power would be available to do it (might require assembler routines.)

Until you suggested implementing a streaming method completely in C, I had expected that I would be filling the buffers directly in Python; I thought it might look something like these PyAudio examples: Polling Playback and Callback Playback, but I wasn't set on doing it that way.

Mostly, I would like to make the I2S module useful and interesting enough to be a part of MicroPython that other people use, with the understanding that for most users this will require an I2S codec board that is compatible with the pyboard. I'm working on that one too ;)

@blmorris
Contributor Author

@dhylands - Thanks for the example; it looks like the Timer example that you linked to earlier. I think that I see how to make it work, I will talk it over with my colleague Divya.

@blmorris
Contributor Author

@dpgeorge - I guess that the short takeaway message is that I should go ahead and diverge from the existing uPy methods in order to make I2S do what it needs to be useful - we can worry about how it might influence other methods later.

@danicampora
Member

I think the approach of:

```python
wf = wave.open('/sd/track.wav')
i2s.stream(wf)
```

makes the most sense, but I have also been thinking myself about:

> in particular the possibility of non-blocking SPI and I2C transfers, possibly also extending to DAC and ADC.

I really think we should add a way to use SPI, I2C, UART, and other peripherals asynchronously. For instance, if you want to send a lot of SPI data without waiting for the transfer to complete, you pass a buffer and the function returns immediately; then you can poll afterwards to know when the transfer is done. The current API is fine, so I was thinking we could add an extra param (e.g. async) to enable async transfers, plus an extra method to poll the status. Is this reasonable?
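A sketch of the polled, non-blocking transfer being proposed, modelled in plain Python. `spi_send_async`, `AsyncTransfer`, and `_dma_tick` are hypothetical names; `_dma_tick` simulates the DMA engine moving bytes in the background so that the polling loop can be exercised.

```python
class AsyncTransfer:
    """Hypothetical handle returned by a non-blocking send."""

    def __init__(self, buf, chunk=4):
        self._buf = memoryview(buf)
        self._pos = 0
        self._chunk = chunk
        self.sent = bytearray()             # bytes the "bus" has received so far

    def done(self):
        """Polling method: True once every byte has been transferred."""
        return self._pos >= len(self._buf)

    def _dma_tick(self):
        """Simulates the DMA engine moving one chunk in the background."""
        if not self.done():
            end = min(self._pos + self._chunk, len(self._buf))
            self.sent += self._buf[self._pos:end]
            self._pos = end


def spi_send_async(buf):
    """Would start the DMA transfer and return immediately."""
    return AsyncTransfer(buf)


xfer = spi_send_async(b"hello world")
other_work = 0
while not xfer.done():
    other_work += 1        # the caller is free to do something else here
    xfer._dma_tick()       # on hardware this progress happens on its own
```

Because the buffer is fixed and already in RAM, this pattern needs nothing more than a status flag; it is the long-running streaming case that needs more machinery.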

@dhylands
Contributor

I would also like to see the ability to specify a callback to be called when the transfer is complete.

@danicampora
Member

> I would also like to see the ability to specify a callback to be called when the transfer is complete.

Yes also that. I created a generic callback class for the WiPy that is used by the UART, RTC, WLAN, Timers, Pin. It's based on this discussion: #1118

It goes like this:

```python
callback(mode, value, priority, handler, wakes_from)
```

Each peripheral defines its own constants for mode (Pin has INT_RISING, for instance) and decides what values it can accept. Priority is an integer: the higher the number, the higher the priority. handler is the function that gets called when the callback is triggered, and wakes_from can be ACTIVE, SUSPENDED or HIBERNATING.
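The shape of that generic `callback(...)` call can be modelled as a per-peripheral registry. This is a hypothetical sketch based only on the description above, not the WiPy implementation: it validates `mode` against the peripheral's own constants and runs handlers in descending `priority` order; `value` and `wakes_from` are stored but not interpreted, and handlers are called with no arguments purely for the demo.

```python
ACTIVE, SUSPENDED, HIBERNATING = "ACTIVE", "SUSPENDED", "HIBERNATING"


class CallbackRegistry:
    """Hypothetical per-peripheral model of callback(mode, value, priority, handler, wakes_from)."""

    def __init__(self, modes):
        self.modes = set(modes)             # constants this peripheral accepts
        self._entries = []

    def callback(self, mode, value=None, priority=1, handler=None, wakes_from=ACTIVE):
        if mode not in self.modes:
            raise ValueError("mode not supported by this peripheral")
        if wakes_from not in (ACTIVE, SUSPENDED, HIBERNATING):
            raise ValueError("invalid wakes_from")
        self._entries.append((priority, mode, value, handler, wakes_from))

    def trigger(self, mode):
        """Run every handler registered for `mode`, highest priority first."""
        matching = [e for e in self._entries if e[1] == mode and e[3] is not None]
        for entry in sorted(matching, key=lambda e: e[0], reverse=True):
            entry[3]()                      # call the handler


pin = CallbackRegistry({"INT_RISING", "INT_FALLING"})
calls = []
pin.callback("INT_RISING", priority=1, handler=lambda: calls.append("low"))
pin.callback("INT_RISING", priority=5, handler=lambda: calls.append("high"))
pin.trigger("INT_RISING")
```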

The idea was to make a generic callback API for all ports, but we never got there...

@dpgeorge
Member

> The idea was to make a generic callback API for all ports, but we never got there...

@danicampora I feel for you! Please make an issue with a list of things that we need to decide on before WiPy ships.

@danicampora
Member

@dpgeorge sure, thanks! I'll prepare that list today :-)

@blmorris
Contributor Author

@dhylands @dpgeorge @danicampora - Thanks all for the suggestions and feedback!

I opened this issue with the idea that I2S would be implemented around non-blocking DMA buffer transfers, hoping to get a discussion going about how non-blocking DMA transfers could be consistently implemented across different peripherals and ports.

Now, my plan is to base I2S streaming on file handles rather than buffers, and the discussion of non-blocking buffer transfers with callback support can be followed, along with the other WiPy-related issues, at #1425

I'm happy to continue the part of the discussion directly related to I2S at #1361, and close this issue for having served its purpose.
Any objections?

@dpgeorge
Member

> I had expected that I would be filling the buffers directly in Python; I thought it might look something like these PyAudio examples: Polling Playback and Callback Playback

PyAudio certainly provides a clean API for play/record. You really want to use the callback scheme of PyAudio, but note that it uses threads to implement the callback! There's a lot of work before uPy can do such a thing.

> ... hoping to get a discussion going about how non-blocking DMA transfers could be consistently implemented across different peripherals and ports.

That is certainly a worthy thing to do, and we should still try to do it. But it likely won't happen quickly.

There seems to be one big difference between I2S and other streamable peripherals: I2S needs to stream for longer than available RAM, but SPI/I2C/UART don't. So while it's rather straightforward to implement something like spi.send(data, async=True) which assumes data is a buffer and just uses DMA to copy the bytes, it's a different concept with i2s.stream(callback_which_yields_data).

Maybe send/recv or read/write can be used for transferring a fixed buffer, but stream_in/stream_out for streaming using some object which satisfies a protocol. Eg to make a user filesystem the mounted object provides readblocks/writeblocks. So a streaming object can be anything that provides, eg, read/write. Eg:

```python
i2s.stream_out(StringIO('stream this data')) # will call .read on the StringIO object
s = StringIO()
i2s.stream_in(s) # will call .write on s
f = open('/sd/test.wav')
i2s.stream_out(f) # will call .read on f
```

Since a file object provides read/write it can be passed directly to stream_in/stream_out. And this case of streaming to/from a file can be optimised behind the scenes if needed. If you want to generate audio on the fly then just make a class which implements read and yields the generated waveform.
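The protocol idea reduces to two loops: one that calls `.read()` on the source until it is exhausted, and one that calls `.write()` on the sink. A minimal sketch, assuming hypothetical `write_to_hw`/`read_from_hw` functions in place of the I2S DMA buffers; it uses `io.BytesIO` because audio samples are bytes, whereas the example above used `StringIO`.

```python
import io


def stream_out(source, write_to_hw, chunk=512):
    """Call source.read() until it returns nothing, pushing each chunk out."""
    total = 0
    while True:
        data = source.read(chunk)
        if not data:                      # end of stream
            break
        write_to_hw(data)
        total += len(data)
    return total


def stream_in(sink, read_from_hw, nbytes, chunk=512):
    """Pull up to nbytes of captured data and hand it to sink.write()."""
    received = 0
    while received < nbytes:
        data = read_from_hw(min(chunk, nbytes - received))
        if not data:                      # capture source ran dry
            break
        sink.write(data)
        received += len(data)
    return received


played = []
sent = stream_out(io.BytesIO(b"stream this data"), played.append, chunk=5)

captured = io.BytesIO()
mic = io.BytesIO(bytes([1, 2, 3, 4, 5]))  # stand-in for the receive side
got = stream_in(captured, mic.read, 5, chunk=2)
```

Anything with a compatible `read`/`write` (a file, an in-memory buffer, or a generator class) plugs into the same loops, which is what lets the file case be optimised separately without changing the API.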

@danicampora
Member

> Maybe send/recv or read/write can be used for transferring a fixed buffer, but stream_in/stream_out for streaming using some object which satisfies a protocol. Eg to make a user filesystem the mounted object provides readblocks/writeblocks. So a streaming object can be anything that provides, eg, read/write. Eg:
>
> ```python
> i2s.stream_out(StringIO('stream this data')) # will call .read on the StringIO object
> s = StringIO()
> i2s.stream_in(s) # will call .write on s
> f = open('/sd/test.wav')
> i2s.stream_out(f) # will call .read on f
> ```
>
> Since a file object provides read/write it can be passed directly to stream_in/stream_out. And this case of streaming to/from a file can be optimised behind the scenes if needed. If you want to generate audio on the fly then just make a class which implements read and yields the generated waveform.

Sounds good!

@blmorris
Contributor Author

> Maybe send/recv or read/write can be used for transferring a fixed buffer, but stream_in/stream_out for streaming using some object which satisfies a protocol. Eg to make a user filesystem the mounted object provides readblocks/writeblocks. So a streaming object can be anything that provides, eg, read/write. Eg:
>
> ```python
> i2s.stream_out(StringIO('stream this data')) # will call .read on the StringIO object
> s = StringIO()
> i2s.stream_in(s) # will call .write on s
> f = open('/sd/test.wav')
> i2s.stream_out(f) # will call .read on f
> ```
>
> Since a file object provides read/write it can be passed directly to stream_in/stream_out. And this case of streaming to/from a file can be optimised behind the scenes if needed. If you want to generate audio on the fly then just make a class which implements read and yields the generated waveform.

It took me a few times to read through carefully and think it through, but now I'm convinced that this is the right approach.

I had intended to require passing a *.wav file that had been opened by the wave.py module, just to get the audio file parameters and readframes() / writeframes(); it is now clear to me that this would be too restrictive. Specifically it would make stream_in/stream_out dependent on an external module written in Python that isn't guaranteed to be available! Much better instead to require a file-like object providing .read or .write and get the playback parameters to pass to I2S separately.
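Separating the playback parameters from the data source could look like this in CPython, using the standard-library `wave` module (uPy's wave.py may differ). `WaveReader` and `open_for_i2s` are hypothetical helpers: the first adapts `readframes()` to a plain `.read(n)`, the second returns the rate, sample width, and channel count alongside it.

```python
import io
import wave


class WaveReader:
    """Adapt wave.Wave_read.readframes() to a plain .read(nbytes) method."""

    def __init__(self, wav):
        self.wav = wav
        self.frame_bytes = wav.getsampwidth() * wav.getnchannels()

    def read(self, nbytes):
        frames = max(1, nbytes // self.frame_bytes)  # whole frames only
        return self.wav.readframes(frames)


def open_for_i2s(fileobj):
    """Return (params, reader): I2S settings kept separate from the data."""
    wav = wave.open(fileobj, "rb")
    params = {
        "rate": wav.getframerate(),
        "bits": 8 * wav.getsampwidth(),
        "channels": wav.getnchannels(),
    }
    return params, WaveReader(wav)


# Build a tiny 48 kHz, 16-bit stereo WAV in memory so the example is self-contained.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(48_000)
    w.writeframes(bytes(400))            # 100 frames of silence
buf.seek(0)

params, reader = open_for_i2s(buf)
```

The stream functions then only ever see `reader.read`, while `params` would be used to configure the I2S clock and format.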

I especially like the idea that generating audio on the fly just requires creating a class that implements read - keeping it simple.
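For instance, a tone generator only needs a `read()` method. A sketch assuming 16-bit little-endian mono PCM; `SineSource` and its parameters are hypothetical, and a real-time version on the pyboard would likely need a precomputed table rather than calling `math.sin` per sample.

```python
import math
import struct


class SineSource:
    """Hypothetical file-like tone generator: read(n) yields 16-bit LE mono PCM."""

    def __init__(self, freq_hz, sample_rate_hz=48_000, amplitude=0.5, n_samples=None):
        self.step = 2 * math.pi * freq_hz / sample_rate_hz  # phase step per sample
        self.peak = int(amplitude * 32767)
        self.n = 0                                            # samples produced so far
        self.limit = n_samples                                # None = endless tone

    def read(self, nbytes):
        count = nbytes // 2                                   # 2 bytes per sample
        if self.limit is not None:
            count = max(0, min(count, self.limit - self.n))
        samples = [int(self.peak * math.sin(self.step * (self.n + i)))
                   for i in range(count)]
        self.n += count
        return struct.pack("<%dh" % count, *samples)
```

Passed to something like the `stream_out` loop sketched earlier in the thread, an empty `bytes` return marks the end of the tone.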

The next trick - and I don't know yet if it is actually tricky - will be to allow stream_in and stream_out to share a duplex I2S instance. Duplexing on a single I2S instance (which I plan to do) requires using the HAL_I2SEx_TransmitReceive(DMA) function, meaning that stream_in and stream_out will call the same underlying function and can't be completely independent. For example, if a stream_out is in progress and we want to begin a stream_in, my first implementation attempt would be to wait until a buffer-empty callback triggers a switch between the double buffers, and add the stream_in operation to the next HAL_I2SEx_TransmitReceive call. The problem would be if the stream_in and stream_out calls need to start closer together than the time it takes to empty one buffer. That problem can wait, and may simply require adding a stream_in_out method to begin both operations at the same time.
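The deferral described above (record a `stream_in` request and fold it into the transfer started at the next buffer-complete callback) can be modelled as follows. `DuplexScheduler` is hypothetical and only covers starting `stream_in` while `stream_out` is already running; `_start_transfer` stands in for one `HAL_I2SEx_TransmitReceive` DMA call.

```python
class DuplexScheduler:
    """Hypothetical model: one full-duplex DMA transfer per buffer period."""

    def __init__(self):
        self.tx_active = False
        self.rx_active = False
        self._rx_pending = False
        self.transfers = []            # (tx, rx) flags of each transfer started

    def stream_out(self):
        self.tx_active = True
        self._start_transfer()

    def stream_in(self):
        if self.tx_active:
            self._rx_pending = True    # don't disturb the running transfer
        else:
            self.rx_active = True
            self._start_transfer()

    def on_buffer_complete(self):
        """Buffer-swap callback: fold any pending request into the next transfer."""
        if self._rx_pending:
            self.rx_active, self._rx_pending = True, False
        self._start_transfer()

    def _start_transfer(self):
        # Stands in for one HAL_I2SEx_TransmitReceive DMA call.
        self.transfers.append((self.tx_active, self.rx_active))


s = DuplexScheduler()
s.stream_out()           # playback running
s.stream_in()            # recording requested mid-buffer: nothing restarted yet
s.on_buffer_complete()   # next transfer carries both directions
```

The worst-case start latency for the recording side is one buffer period, which is exactly the limitation noted above.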

Plenty to work with here, time to start writing code again...

@pfalcon pfalcon added the ports Relates to multiple ports, or a new/proposed port label Oct 25, 2015
tannewt pushed a commit to tannewt/circuitpython that referenced this issue Jan 7, 2019
@dpgeorge
Member

I2S was implemented with DMA in 8a5bfe4

5 participants