stmhal: uart chars are being dropped, even when only at 115200 baud #1555
Looking at IRQ priorities:
I think that the UART priority is at totally the wrong end (currently almost the lowest priority) when it should be one of the highest. Changing the 0xd, 0xd to 0, 0 made no difference, which suggests that interrupts are being disabled.
If you are writing to the internal flash then it could be that the page erase is hogging the MCU.
The sys.stdin/stdout objects call
I agree.
Strange... did you try using 1,1 or 2,2?
I'll second that. Also... I'm pretty new to the code base so I'm hesitant to make coding-convention comments, but I note that sorting out the interrupt levels requires grepping out hard-coded constants from 9 different .c files. May I suggest collecting interrupt priorities in a single .h file of #defines? Also, perhaps someone can educate me on this point: I see that the UART API has a way to specify a receive buffer, but not a transmit buffer. What is the reason for that?
Strictly speaking, you don't need it. Admittedly not having one slows down the sender, in exchange for using less RAM. I also got quite confused by the receive buffer, whose head doesn't point to the first character to read but rather to the next available space in the buffer. I've found that using the names get_idx and put_idx generally clears up that confusion, but I'll leave that for another day. I think that adding a send buffer is a good idea, especially for lower baud rates.
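As an illustration of that naming, here is a minimal plain-Python ring buffer where put_idx points at the next free slot and get_idx at the next character to read (a sketch for discussion, not the stmhal implementation):

```python
class RingBuf:
    """Ring buffer: empty when get_idx == put_idx, so one slot is
    always kept unused to distinguish empty from full."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.get_idx = 0  # next character to read
        self.put_idx = 0  # next free slot to write

    def put(self, byte):
        nxt = (self.put_idx + 1) % self.size
        if nxt == self.get_idx:
            return False  # buffer full
        self.buf[self.put_idx] = byte
        self.put_idx = nxt
        return True

    def get(self):
        if self.get_idx == self.put_idx:
            return None  # buffer empty
        byte = self.buf[self.get_idx]
        self.get_idx = (self.get_idx + 1) % self.size
        return byte
```

The same structure serves for either direction: the ISR calls put() on receive (or get() on transmit-ready) and the main code does the opposite.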
It would be the same reason why the USB_VCP implementation says "it would be nice to have a circular buffer", while UART does have it. And put in a general manner, that reason would be: "While the uPy core is pretty clean, the ports leave much to be desired, even the stmhal one." For example, I didn't touch the pyboard for a while, then tested #1517 on it. I discovered 5-6 issues. And of course it happens not because someone doesn't do it right (though there's a bit of that too), but because the whole context doesn't provide incentive to do it right - built on messy vendor libs, the result is also messy. And I remember why I worked on https://github.com/pfalcon/PeripheralTemplateLibrary - to break from that vendor NIH cycle, while not compromising on efficiency and clarity. It's all doable in C, though with compromises in those aspects, which gives less incentive to do it, but we need to do it anyway, because otherwise it's all turning into a mess. And I'm writing this exactly to set a new baseline to try to reach. As for UART buffering, see also #1533
The problem is definitely because interrupts are being disabled. Channel 0 is the Rx UART data. I see a couple of options:

1 - Figure out how to do Rx DMA on the UART. I have a sneaky feeling that this will wind up having further implications due to there being a limited number of DMA channels. I suspect we may need a simple DMA manager (been there before).
2 - Turn off the async flushing of sdcard/flash data if we can't really do async I/O. At least if everything was synchronous, there would only be file system I/O when calling a file system function. This would address the issue I'm seeing here but not address the issue in general.
3 - Make the sdcard stuff not need to disable interrupts.

I think I'll at least take a stab at #1.
Indeed. Senders that care can poll.
Yes, thanks, I saw that, which is part of the reason the concept was top-of-mind.
Looking at how pyb.disable_irq() and enable_irq() are defined, they don't play very well with the concept of interrupt levels. It is all-or-nothing. My code snippet in http://forum.micropython.org/viewtopic.php?f=6&t=1066 would be improved if one could say:

    def counter_value():
        s = pyb.disable_irq(pyb.INTR_PRI_COUNTER)  # Lock out counter and lower priority interrupts only.
        t = buf[0]
        pyb.enable_irq(s)
        return t & 0xffffffff  # coerce to arbitrary precision

because the aim is only to create a critical section around the counter interrupt, and there is no need to lock out UART, CAN, etc. This problem generalizes to callbacks that could be quite lengthy. We could tweak interrupt priorities all day, but in the end, unless pyb.disable_irq() is a little more flexible, dropped interrupts are going to be an on-going issue.
I'd suggest going for this one. It means using DMA for the SD card which has been on the todo list for some time (and is partially implemented).
I don't think the MCU supports masking up to a certain interrupt level. You'd need to mask all irq flags individually for each peripheral that you want to disable. That would be expensive.
Yes!
Hmm.... OK. I'll look into the details of that.
@dhylands - If you are going to dig into DMA wrt the UART, could you take a look also at what I have done for I2S in #1361? I don't know enough to claim that it is the right way to do it - my code diverges from the examples set in the SPI and I2C code in that the DMA hardware is left initialized for the entirety of long transfers instead of started and stopped for each transaction. I can say that it is functional.
It looks like the M3/M4 have a BASEPRI register which can mask interrupts based on priority. This article: http://embeddedgurus.com/state-space/2014/02/cutting-through-the-confusion-with-arm-cortex-m-interrupt-priorities/ (near the end of the page) looks useful.
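As a sanity check on the BASEPRI semantics described there (a plain-Python model, not hardware code): a non-zero BASEPRI masks every interrupt whose priority number is greater than or equal to it, and on Cortex-M a numerically lower value means higher urgency.

```python
def basepri_blocks(basepri, irq_priority):
    """Model of Cortex-M BASEPRI masking.

    A non-zero BASEPRI blocks any interrupt whose priority number is
    >= BASEPRI (lower numbers are more urgent).  BASEPRI == 0 is the
    reset state and masks nothing.
    """
    return basepri != 0 and irq_priority >= basepri
```

So writing, say, 4 to BASEPRI lets priorities 0-3 keep firing while everything at 4 and below (numerically higher) is held off.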
If we do decide to implement masking with a priority level, I think that we should extend the HW API so that you can query the interrupt priority (or whatever it is we need to pass into disable_irq to mask it). Then when we later add the ability for python code to manipulate interrupt levels (seems inevitable to me), the code can still just query the peripheral it wants to create a critical section for.
@dhylands I found the same thing that you did: Cortex-M3/M4 have a BASEPRI register for interrupt-level masking.
The current definition of disable_irq() returns a boolean that you are expected to pass back to enable_irq(). So I think the boolean simply becomes an int. I think the idiom for getting the current level is simply disable_irq(really_large_number), which will return the current level. disable_irq() certainly needs to be defined such that it can't reduce the current interrupt priority, so the check to turn disable_irq() into a no-op would need to be included anyway.
Right, but let's go back to your counter example. If you were decrementing the counter in the main python code and incrementing it in a timer callback, then you want to protect the decrement against the timer callback (so prevent timer callbacks from happening while decrementing). UART interrupts would be fine. But if we did have UART callbacks, and you were incrementing the counter in a UART callback, then the python code wants to protect against UART IRQs.
Well, yes, but I don't see how that changes things. The critical section would block interrupts for the highest priority interrupt that could encroach. That seems like an application design issue, not an API issue. The user needs to understand his own critical sections. As to API issues, if the concept of interrupt levels gets pulled in, then disable_irq() and enable_irq() lose their meaning somewhat. A single:

    # Of course, this needs to be implemented as C code in the hardware module....
    def irq_level(new_pri=None):
        old_pri = cur_pri
        if new_pri is not None:
            cur_pri = new_pri
        return old_pri

would be sufficient to query, raise, and lower the interrupt priority. What this does not allow for is writing code that is robustly portable, in that a callback could unintentionally reduce the system interrupt priority when ported to a system with a different ordering of interrupts. It would probably also be good to include a protect_irq_level(new_pri) that only changed the interrupt level in the direction of higher priority interrupts. I'm also assuming the hardware module would include constants like:

    INTR_PRI_UART
    INTR_PRI_CAN
    INTR_PRI_COUNTER
    # ...etc...

With careful forethought, something like:

    block_these = min([INTR_PRI_UART, INTR_PRI_COUNTER])
    saved = raise_irq_level(block_these)
    # do something critical
    irq_level(saved)

could be made to work.
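The irq_level() sketch above, made runnable as plain Python (cur_pri here is ordinary module state standing in for the hardware priority mask, assuming the Cortex-M convention that numerically lower values mean higher priority; all names are proposals, not an existing API):

```python
cur_pri = 15  # mock priority mask; 15 = lowest priority, nothing masked

def irq_level(new_pri=None):
    """Query (no argument), raise, or lower the current level.
    Always returns the previous level for save/restore."""
    global cur_pri
    old_pri = cur_pri
    if new_pri is not None:
        cur_pri = new_pri
    return old_pri

def raise_irq_level(new_pri):
    """Like irq_level(), but only ever moves toward higher priority
    (numerically lower), so a callback cannot accidentally undo a
    critical section established by its caller."""
    global cur_pri
    old_pri = cur_pri
    if new_pri < cur_pri:
        cur_pri = new_pri
    return old_pri
```

The raise-only variant is what makes the save/restore idiom safe to nest, since an inner critical section can never widen the window opened by an outer one.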
Since I suggested it I should probably volunteer. Seems like a simple enough patch. If people are OK with names like:

    #define INTR_PRI_UART    0xd
    #define INTR_SUBPRI_UART 0xd

I will create a patch confined to this simple text substitution.
I realized that there is also a granularity thing that we could take advantage of. For example, if you're just looking for protection against a counter being incremented by your timer callback, then I think it makes sense to be able to call ic.disable_irq(), which would just disable the interrupt from channel 4 of timer 2. This level of granularity would mean that it would have virtually no impact on the rest of the system. I also think that it would make sense to have something like Timer.irq_level(), which would return the currently configured level of the timer interrupt. Then if you needed to disable any events from that level and lower, you would make an appropriate call to do so. I think both concepts are valid and, when used properly, give you a system with maximum responsiveness. I'd still start by putting the irq constants in one central header file. If we allow run-time level specification, then these would be the defaults.
I agree that this level of control makes good sense. I think you still need to deal with the issue where if a particular IRQ is already disabled, you should not re-enable, but restore previous state. So I'd be more inclined to go with saved=ic.enable_irq(False)... ic.enable_irq(saved) to disable/enable. But the per-mask-bit access makes total sense to me.
This frees the application writer from having to understand/juggle constants -- the peripheral simply knows its level and can tell you.
Run-time level specification is a strong argument for peripherals knowing their interrupt level. Run-time level specification also no doubt has other subtle implications... I want to think on that for a while. We are talking about this in the context of the PyBoard HAL; other devices certainly will not have all of these interrupt concepts. We should probably think about how this maps onto a Cortex M0, for instance, or other implementations of the hardware API.
In an environment that didn't have IRQ levels, the irq_level function would probably just return True or False, and you would still do:
it would just wind up disabling all interrupts internally. Masking off individual interrupts would still work (and yes - I was assuming the ic.disable_irq would return something so that it works as a save/restore). |
I've confirmed that I can use rshell to copy files over serial to /sd and not drop any characters when PR #1625 is applied. However, I think it is probably still prudent to raise the interrupt priority level of the UART to allow dealing with higher baud rates. So I'm going to leave this open for now to deal with that particular issue.
@dhylands
And a couple of comments:
I think that it needs to be higher priority than TIM3 and probably USB, but I think those are the only criteria.
I haven't played with that, but plan to soon. By toggling a GPIO in the IRQ handler I should be able to get a pretty good feeling for CPU overhead versus baudrate.
I definitely want to get there as well. My bioloid servos run at 1 Mbaud, and I can achieve reliable 1 Mbaud comms on my 16-MHz AVR (and it also doesn't have a FIFO) using just C code, so I see no reason why this can't be done on the STM. On the AVR this consumes about 50% of the CPU (for interrupt handling) during the actual comms. I'd really expect the CPU overhead to be much lower on the STM, but that remains to be seen. I'm going to guess that the UART will need to be higher priority than flash. I'll need to instrument SysTick and see how much time it takes as well. If the UART priority is lower than SysTick, then the time that SysTick takes will be the limiting factor. It may make sense to have something special where we can boost the UART even higher than SysTick (which really means lowering the priority of SysTick), but I'll wait and see what the numbers say. And there is still the option of using DMA for the UART. Now that I've been mucking about with the DMA code for the sdcard, I understand the DMA stuff a lot better and I have a better idea of how to implement it.
Currently there aren't any callbacks associated with the UART, and if there were it would be good if they were more FIFO relevant than character relevant. The STM supports some SW ISR/Event features that I haven't investigated yet and maybe it makes sense to have the UART trigger something like that, especially if we can make it have a lower priority. It's too bad that there isn't anything between the "main thread" and HW ISRs. I'd really like to see something akin to the software ISRs that linux supports. In the world of python bytecode this would be something that interrupts the main thread, but is actually executed in the context of the main thread, so it would have full heap access and would run with HW interrupts fully enabled.
@dhylands I just pushed my irq-statistics branch dpgeorge@d9a1f24 which you might find useful for inspecting the number of irqs. Maybe we can merge this to mainline with some suitable #define config variable to enable it only for debugging?
You could use the "pending exception" check in the VM to implement something like this. You could set the mp_pending_exception variable to a non-exception object (eg a small int) to indicate that you want to run a software ISR, and the VM would detect such a case, and then call the relevant function. So you'd be implementing software ISRs that only execute between completed opcodes. I think this would play ok with the memory manager. It's like secretly inserting a function call in between 2 opcodes. |
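A rough plain-Python model of that idea (the names schedule/run are invented for illustration; in the real VM the check sits on mp_pending_exception between bytecodes):

```python
pending = None  # stands in for mp_pending_exception

def schedule(fn):
    """Request that fn run between two opcodes (what a HW ISR would do)."""
    global pending
    pending = fn

def run(opcodes):
    """Toy VM loop: each opcode runs to completion, and any pending
    soft ISR runs in the gap, with the 'machine' in a consistent state
    (so heap allocation inside fn is safe)."""
    global pending
    for op in opcodes:
        op()                        # execute one opcode to completion
        if pending is not None:
            fn, pending = pending, None
            fn()                    # soft ISR runs between opcodes
```

The key property is that fn is called from the VM loop, not from an interrupt context, so it behaves like a secretly inserted function call.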
@dpgeorge - Sweet - I'll look at both of those, and I'm definitely interested in taking a stab at the irq-statistics thing. Something along the lines of what you suggested for mp_pending_exception was exactly what I was thinking of.
Soft ISRs would be a good (temporary?) solution to the "I want to allocate on the heap during an interrupt" problem. We could add an option to the callback/irq setup function to indicate soft/hard interrupt. Eg,
Yeah - I was thinking that something along those lines would be useful, as well as the ability to schedule a SW ISR from a HW ISR. That way you could still have a HW ISR and it could do its thing and schedule the SW ISR to do more work later by the VM. To be really useful, it would also need a mutual exclusion mechanism (essentially, a disable/enable sw isr which would parallel the disable/enable_irq functions).
This sounds very much like the "top half, bottom half" concept in Linux device drivers. Top half runs at interrupt priority, the bottom half is scheduled at return-from-interrupt time.
Or... is there any reason why it would not simply be an exception object of a different flavor?
My concern wasn't with callbacks tied to a UART, but currently a callback tied to anything locks out all interrupts. So a timer callback, for instance, might execute enough code to squash a UART interrupt. Also, the current interrupt disable is a very blunt instrument, so arbitrary Python code can lock out all interrupts for arbitrary periods of time. It would be nice to be able to create a fine-tuned critical section on, say, a timer interrupt, without locking out UART interrupts.
Yeah - linux used the term bottom-half in the old days; tasklet is the new terminology. Some real-time OSes call them software ISRs. But they're essentially all the same thing. They run with interrupts enabled and you could think of them as "highest priority" tasks in a real-time OS, except that they're not typically preemptive. They're usually run-to-completion, and normal threads don't continue execution until they've finished. What we're proposing here would be conceptually similar, but would be initiated by the VM rather than running at the bottom of the HW ISR stack (which is the place that you'd put them with a real OS). By having the VM run them at the "bytecode" level it means that they still interrupt the running python code, but they can allocate memory. They'll get delayed by GC (just like regular bytecode).
An exception would unwind the execution stack of the running python code until an appropriate catch handler was hit. We don't want to do that. We want to execute some code more like inserting a function call in the middle of the python bytecode.
No - that's not the case. The callbacks are run at the same interrupt priority as the corresponding HW ISR. Higher priority interrupts are still enabled, and interrupts of the same or lower priority will be deferred (until the currently running handler finishes). We don't disable interrupts while executing a callback, we just disable heap allocations. If we find the need to add the more fine-grained control, we could always add enable/disable_irq functions which are attached to the appropriate peripheral. You could implement those today using the stm module. But it doesn't matter what you do, people have enough rope to hang themselves with.
What is the current status of this issue? Using firmware built from today's source, and boot.py from @dhylands first message, copying even a small (297 byte) file to /sd using rshell causes a hang. rshell otherwise works, including copying from /sd to .
@peterhinch It should be resolved once PR #1625 is merged into master.
@dbc I setup a test where I had the host send 4096 bytes at 1 Mbit/sec. The data consisted of an incrementing pattern so that the receiver could verify that the correct data was received (which it was). The receive FIFO was set at its default of 64 bytes. I was doing a readinto() on a 4K bytearray. https://www.dropbox.com/s/2gnnh9o2rsis4bz/UART-1Mbit.png?dl=0 shows a logic analyzer capture. The top line is the UART data being sent at 1 Mbit. The ISR takes approx 0.5 usec, with 10 usec between chars (on average). So it looks like it's only using 5% of the CPU. The 0.5 usec time should be constant regardless of the baud rate. The limiting factor will be how fast the data can be processed. For something like the bioloid stuff, I don't see any issues since the packets are fairly small, and by setting the FIFO size appropriately (i.e. larger than the largest packet) there shouldn't be any issues. So provided the UART ISR priority is sufficiently high, there shouldn't be any characters dropped even at 1 Mbit/sec.
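Those figures are self-consistent: at 1 Mbit/s with the usual 10-bit character frame (start + 8 data + stop), characters arrive every 10 usec, so a 0.5 usec ISR works out to the 5% CPU load quoted.

```python
baud = 1_000_000
bits_per_char = 10                 # start + 8 data + stop bits
char_period_us = bits_per_char / baud * 1e6   # time per character, usec
isr_us = 0.5                       # measured ISR duration from the capture
cpu_load = isr_us / char_period_us # fraction of CPU spent in the ISR
```

Since the ISR time is roughly constant, halving the baud rate halves the load; only faster links or longer per-character processing change the picture.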
Outstanding. That is a great result.
Except for that pesky pyb.disable_irq(). But if one could pass a level:

    old_level = irq_level(new_level)      # raise or lower
    cur_level = irq_level()               # what is the current level?
    old_level = irq_level_min(new_level)  # change in direction of increasing priority only

then it would make it easier for Python code to stay out of the way of high-priority interrupts.
And for anybody that's interested, I wrote up a wiki page that documents how I grabbed that capture: |
I decided to instrument pyb.disable_irq and pyb.enable_irq and my results show that this code:
takes about 14 usec between disabling and enabling. Even this optimized version:
takes 10 usec between disable/enable. So that tells me that simply using disable/enable_irq will cause dropped chars at 1 Mbit/sec. So we'll definitely need some type of irq_level API.
Perhaps changing disable/enable like the following:

    old_level = pyb.disable_irq(level=None)

level=None implies the current behaviour and returns True or False. Assuming interrupts are fully enabled, pyb.disable_irq(4) would disable interrupts 4..15 and return True; pyb.disable_irq(6) would disable 6..15 and return True.
+1 to adding levels to disable and enable irq.
@dhylands I think it should go along with a general addition to the hardware API that all peripherals be able to report their irq priority. That way code can be implemented in a portable fashion without having to know the internals of the hardware API. So:

    old_level = pyb.disable_irq(aCounter.irq_level())
    # do some critical stuff
    pyb.enable_irq(old_level)

is totally portable, and if aCounter.irq_level() is below the current level then the disable/enable calls become no-ops.
If I try to copy a file to the pyboard over a UART using rshell, then characters get dropped.
For this test, boot.py contains:
I brought a GPIO high at the beginning of USART6_IRQHandler and low at the end. Channel 0 is the data arriving on the UART pin, and Channel 1 is the UART IRQ.
Logic Analyzer Capture: https://www.dropbox.com/s/6mj5kr8sn6ad81j/UART_irq.png?dl=0
The UART IRQs are spaced approx 86 usec or so apart (sometimes a bit more); at 115200 baud, full tilt would be 86.8 usec per character.
The first gap corresponds to the place in the buffer where the first dropped character occurs. The first gap is 847 usec long and the second gap is 694 usec long.
So this tells me that either interrupts are being disabled for these periods or a higher priority interrupt is occurring. I'll add some more instrumentation to see if I can figure out what might be happening.
I also found it interesting that even though there are timeouts set on the uart, the call to sys.stdin.buffer.readinto() never times out.
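For reference, the 86.8 usec figure is just the 10-bit character frame time at 115200 baud, and the measured 847 usec gap therefore spans roughly nine back-to-back character times:

```python
baud = 115200
bits_per_char = 10                # start + 8 data + stop bits
char_us = bits_per_char / baud * 1e6   # frame time per character, usec

# How many character times fit in the first observed IRQ gap:
gap_us = 847
chars_in_gap = gap_us / char_us
```

With only a single-character holding register to absorb the backlog, a stall that long guarantees dropped characters at full-tilt 115200.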