[Figure 3: Histograms of observed scheduling latency in an otherwise idle system; x-axis: overhead in microseconds (bin size = 1.00 µs), y-axis: number of samples (log scale).
(a) LITMUS^RT with the P-FP scheduler plugin;
(b) Linux 3.0;
(c) Linux 3.8.13 without the PREEMPT RT patch: min=1.52 µs, max=19.73 µs, avg=2.89 µs, median=2.58 µs, stdev=0.69 µs; samples: total=5854801;
(d) Linux 3.8.13 with the PREEMPT RT patch: min=1.55 µs, max=11.20 µs, avg=2.74 µs, median=2.57 µs, stdev=0.42 µs; samples: total=5854801.]
…additional overhead. We suspect two primary sources for this additional overhead. First, LITMUS^RT's scheduling path needs to acquire (at least) one additional spin lock compared to stock Linux, which is especially costly in the presence of high cache and memory-bandwidth contention. This additional spin lock acquisition stems from the fact that LITMUS^RT's scheduling state is not protected by Linux's runqueue locks; however, Linux's runqueue locks must still be acquired prior to invoking LITMUS^RT's scheduling framework. And second, the increased average-case overheads might be due to a lack of low-level optimizations in LITMUS^RT (in comparison with the mature codebase of Linux). Given that LITMUS^RT is primarily a research-oriented project focused on algorithmic real-time scheduling issues, a certain lack of low-level tuning is not surprising.
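The lock ordering just described can be illustrated with a small user-space model (this is not LITMUS^RT code; the names and structure are ours, purely to show where the extra acquisition occurs):

    /* Illustrative model of the locking pattern described above: the
     * scheduling path first takes the per-CPU runqueue lock and must
     * then also take a separate lock protecting plugin scheduling state. */
    #include <pthread.h>

    struct runqueue {
        pthread_spinlock_t lock;            /* per-CPU runqueue lock */
    };

    static pthread_spinlock_t plugin_state_lock;  /* shared plugin state */

    static void schedule_path(struct runqueue *rq)
    {
        pthread_spin_lock(&rq->lock);          /* stock Linux needs this */
        pthread_spin_lock(&plugin_state_lock); /* the extra acquisition;
                                                  costly under cache and
                                                  memory-bandwidth contention */
        /* ... pick the next task to run ... */
        pthread_spin_unlock(&plugin_state_lock);
        pthread_spin_unlock(&rq->lock);
    }

    int main(void)
    {
        struct runqueue rq;
        pthread_spin_init(&rq.lock, PTHREAD_PROCESS_PRIVATE);
        pthread_spin_init(&plugin_state_lock, PTHREAD_PROCESS_PRIVATE);
        schedule_path(&rq);
        return 0;
    }

Under contention, the second acquisition adds exactly the kind of additional cache-line traffic that inflates the average-case figures reported above.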
As was already briefly mentioned, the CPU-bound background workload matches the setup that has been used in prior LITMUS^RT-based studies (e.g., [5, 6, 10, 12]). As is apparent when comparing Fig. 3(a) with Fig. 4(a), our data confirms that the CPU-bound workload generates sufficient memory and cache pressure to magnify kernel overheads. Conversely, conducting overhead experiments without a cache-intensive background workload does not yield an accurate picture of kernel overheads. Next, we discuss the impact of interrupt-intensive background workloads.

3.3 I/O-bound Background Workload

Interrupts are challenging from a latency point of view since interrupt handlers typically disable interrupts temporarily and may carry out significant processing, both of which directly affect scheduling latency. It should be noted that Linux has long supported split interrupt handling (e.g., see [9]), wherein interrupt handlers are split into a (short) top half and a (typically longer) bottom half: only the top half is executed in the (hard) interrupt context, while the bottom half is queued for later processing. However, in stock Linux, bottom halves still effectively have higher priority than regular real-time tasks, in the sense that the execution of bottom halves is not under the control of the regular SCHED_FIFO process scheduler^8 and thus may negatively affect scheduling latencies. Further, bottom halves may still disable interrupts and preemptions for prolonged times.

Considerable effort has been invested by the developers of the PREEMPT RT patch to address these very issues. This is accomplished by forcing bottom half processing to take place in kernel threads (which can be scheduled such that they do not delay high-priority real-time tasks), and by identifying and breaking up code segments that disable interrupts and preemptions for prolonged durations. In contrast, since LITMUS^RT is currently based on stock Linux, and since the focus of LITMUS^RT is the exploration and evaluation of new scheduling policies (and not the reengineering of the underlying Linux kernel), no such improvements are present in LITMUS^RT. A key motivation for our experiments was to determine to which extent LITMUS^RT is penalized by the lack of such improvements.

^8 Bottom halves are processed by so-called softirqs, which in stock Linux are invoked from interrupt and exception return paths.
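To make the split-handling mechanism concrete, the sketch below registers a threaded interrupt handler via the mainline request_threaded_irq() API; under PREEMPT RT, handlers are additionally force-threaded by default, which is what moves bottom-half work under scheduler control. The device specifics and IRQ line are hypothetical, and error handling is abbreviated:

    /* Minimal kernel-module sketch of split interrupt handling
     * with a threaded IRQ (hypothetical device and IRQ line). */
    #include <linux/module.h>
    #include <linux/interrupt.h>

    static int irq_line = 19;   /* hypothetical IRQ line */
    module_param(irq_line, int, 0444);

    static irqreturn_t demo_top_half(int irq, void *dev)
    {
        /* hard-IRQ context: acknowledge the device, defer the real work */
        return IRQ_WAKE_THREAD;
    }

    static irqreturn_t demo_bottom_half(int irq, void *dev)
    {
        /* runs in a schedulable kernel thread; under PREEMPT RT such
         * handler threads can be prioritized below real-time tasks */
        return IRQ_HANDLED;
    }

    static int __init demo_init(void)
    {
        return request_threaded_irq(irq_line, demo_top_half,
                                    demo_bottom_half, 0,
                                    "irq-split-demo", &irq_line);
    }

    static void __exit demo_exit(void)
    {
        free_irq(irq_line, &irq_line);
    }

    module_init(demo_init);
    module_exit(demo_exit);
    MODULE_LICENSE("GPL");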
[Figure 4: Histograms of observed scheduling latency in the presence of a CPU-bound background workload; x-axis: overhead in microseconds (bin size = 1.00 µs), y-axis: number of samples (log scale).
(a) LITMUS^RT with the P-FP scheduler plugin: min=2.10 µs, max=47.59 µs, avg=5.17 µs, median=4.37 µs, stdev=2.75 µs; samples: total=5854719;
(b) Linux 3.0: min=2.04 µs, max=72.73 µs, avg=4.22 µs, median=3.86 µs, stdev=1.37 µs; samples: total=5854711;
(c) Linux 3.8.13 without the PREEMPT RT patch: min=2.14 µs, max=64.47 µs, avg=4.02 µs, median=3.67 µs, stdev=1.20 µs; samples: total=5854707;
(d) Linux 3.8.13 with the PREEMPT RT patch: min=1.73 µs, max=17.42 µs, avg=3.40 µs, median=3.02 µs, stdev=1.12 µs; samples: total=5854640.]
[Figure 5: Histograms of observed scheduling latency in the presence of an I/O-bound background workload; x-axis: overhead in microseconds (bin size = 1.00 µs), y-axis: number of samples (log scale).
(a) LITMUS^RT with the P-FP scheduler plugin;
(b) Linux 3.0;
(c) Linux 3.8.13 without the PREEMPT RT patch: min=1.85 µs, max=5464.07 µs, avg=6.23 µs, median=4.60 µs, stdev=15.91 µs; samples: total=5854773;
(d) Linux 3.8.13 with the PREEMPT RT patch: min=1.47 µs, max=44.16 µs, avg=4.12 µs, median=4.07 µs, stdev=0.99 µs; samples: total=5854748.]
Notably, even with the I/O-bound workload, the maximum scheduling latency observed under PREEMPT RT remained below that observed under Linux without the PREEMPT RT patch in the presence of only CPU-bound workloads, which is apparent when comparing Fig. 4(c) with Fig. 5(d). Remarkably, the average and median scheduling latency under PREEMPT RT worsened by less than 0.7 µs with the introduction of the I/O-bound workload.

Finally, we also ran two variations of the I/O-bound workload with varying degrees of disk activity. First, we disabled bonnie++ altogether, which brought down the maximum observed latencies under Linux 3.0, Linux 3.8.13 (without the PREEMPT RT patch), and LITMUS^RT to around 550 µs, which is still too high for practical purposes, but shows that the extreme outliers are caused by disk-related code. And second, we tried launching an instance of bonnie++ on each core, which brought the disk I/O subsystem to its knees and caused latency spikes in the range of 80–200 milliseconds (!) under the three non-PREEMPT RT kernels. Remarkably, the maximum observed scheduling latency under PREEMPT RT remained below 50 µs even in this case.

Overall, our experiments confirm the importance of PREEMPT RT in turning Linux into a viable real-time platform. Given the huge differences in maximum observed latency, LITMUS^RT would be substantially improved if it incorporated PREEMPT RT. Though this will require considerable engineering effort (both patches modify in part the same code regions), there are no fundamental obstacles to rebasing LITMUS^RT on top of the PREEMPT RT patch.

3.4 Scheduling Latency of LITMUS^RT Plugins

In the preceding sections, we have focused on LITMUS^RT's P-FP plugin, since it implements the same scheduling policy as SCHED_FIFO (albeit with a larger number of priorities and support for additional real-time locking protocols) and thus allows for the most direct comparison. We also investigated how scheduling latency varies among the three evaluated LITMUS^RT scheduler plugins. Fig. 6 compares the P-FP, PSN-EDF, and GSN-EDF plugins in LITMUS^RT under each of the three considered background workloads.
[Figure 6: Histograms of observed scheduling latency under the LITMUS^RT P-EDF (PSN-EDF), P-FP, and G-EDF (GSN-EDF) plugins, for each of the three considered background workloads; x-axis: overhead in microseconds (bin size = 1.00 µs), y-axis: number of samples (log scale).
(a) P-EDF, no bg tasks: min=1.76 µs, max=26.17 µs, avg=3.45 µs, median=2.87 µs, stdev=1.24 µs; samples: total=5854783;
(b) P-FP, no bg tasks: min=1.96 µs, max=15.13 µs, avg=3.45 µs, median=3.10 µs, stdev=1.03 µs; samples: total=5854818;
(c) G-EDF, no bg tasks: min=1.59 µs, max=14.34 µs, avg=3.06 µs, median=2.56 µs, stdev=1.18 µs; samples: total=5854797;
(d) P-EDF, CPU-bound bg tasks: min=2.40 µs, max=73.27 µs, avg=5.14 µs, median=4.21 µs, stdev=2.95 µs; samples: total=5854739;
(e) P-FP, CPU-bound bg tasks: min=2.10 µs, max=47.59 µs, avg=5.17 µs, median=4.37 µs, stdev=2.75 µs; samples: total=5854719;
(f) G-EDF, CPU-bound bg tasks: min=1.91 µs, max=60.20 µs, avg=5.81 µs, median=5.39 µs, stdev=2.51 µs; samples: total=5854728;
(g) P-EDF, I/O-bound bg tasks: min=1.98 µs, max=3874.99 µs, avg=6.56 µs, median=5.11 µs, stdev=12.66 µs; samples: total=5854606;
(h) P-FP, I/O-bound bg tasks: min=1.89 µs, max=3956.48 µs, avg=6.60 µs, median=5.17 µs, stdev=12.76 µs; samples: total=5854660;
(i) G-EDF, I/O-bound bg tasks: min=2.26 µs, max=3905.79 µs, avg=10.95 µs, median=7.38 µs, stdev=14.11 µs; samples: total=5854793.]
Comparing insets (g), (h), and (i), it is apparent that the three plugins are equally subject to high scheduling latencies (approaching 4 ms) in the case of the I/O-bound background workload. This is not surprising, since the long tail of high scheduling latencies is caused by the design of the underlying Linux kernel, and is thus independent of the choice of plugin. Further, comparing Fig. 6(a) with Fig. 6(b), and Fig. 6(d) with Fig. 6(e), it is apparent that the PSN-EDF and P-FP plugins yield near-identical scheduling latency distributions, despite the difference in implemented scheduling policy. This, however, is expected since the tests run only one real-time task per processor; the real-time scheduler is hence not stressed, and the cost of the scheduling operation is so small compared to other sources of latency that any differences between fixed-priority and EDF scheduling disappear in the noise. Differences emerge only for higher task counts [10].

However, looking at Fig. 6(f) and Fig. 6(i), it is apparent that the scheduling latency is noticeably higher under GSN-EDF in the average case, which is due to its more complex implementation. Issues such as contention caused by coarse-grained locking, extra bookkeeping, and cache-coherence delays when accessing shared structures increase both the median and average observed scheduling latencies.

While this shows that LITMUS^RT's implementation of global scheduling incurs higher overheads, there is little reason to employ global scheduling when the number of tasks does not exceed the number of available cores (which is the case in the considered cyclictest setup). If the number of tasks actually exceeds the number of available cores (that is, if the scheduling problem is not entirely trivial), then other factors, such as the impact of interference from higher-priority tasks or a need for bounded tardiness [18], can make minor differences in scheduling latency a secondary concern, with only little impact on overall temporal correctness.

3.5 Linux 3.0 vs. Linux 3.8

In this paper, we compared the latency of LITMUS^RT and Linux with the PREEMPT RT patch using the latest versions of each patch, which are based on Linux 3.0 and Linux 3.8.13, respectively. As already discussed in the preceding sections, to verify that comparing the two patches is valid despite the difference in the underlying kernel version, we also measured the scheduling latencies exhibited by the two underlying (unpatched) Linux versions. For ease of comparison, the results are repeated in Fig. 7.
[Figure 7: Histograms of observed scheduling latency under Linux 3.0 and 3.8.13, under each of the three considered background workloads; x-axis: overhead in microseconds (bin size = 1.00 µs), y-axis: number of samples (log scale).
(a) Linux 3.0 (idle): min=1.87 µs, max=13.89 µs, avg=2.89 µs, median=2.77 µs, stdev=0.51 µs; samples: total=5854779;
(b) Linux 3.0 (CPU-bound): min=2.04 µs, max=72.73 µs, avg=4.22 µs, median=3.86 µs, stdev=1.37 µs; samples: total=5854711;
(c) Linux 3.0 (I/O-bound): min=1.85 µs, max=4300.43 µs, avg=6.39 µs, median=4.98 µs, stdev=13.25 µs; samples: total=5854674;
(d) Linux 3.8.13 (idle): min=1.52 µs, max=19.73 µs, avg=2.89 µs, median=2.58 µs, stdev=0.69 µs; samples: total=5854801;
(e) Linux 3.8.13 (CPU-bound): min=2.14 µs, max=64.47 µs, avg=4.02 µs, median=3.67 µs, stdev=1.20 µs; samples: total=5854707;
(f) Linux 3.8.13 (I/O-bound): min=1.85 µs, max=5464.07 µs, avg=6.23 µs, median=4.60 µs, stdev=15.91 µs; samples: total=5854773.]
A comparison of insets (a)-(c) with insets (d)-(f) shows that, though the observed maxima vary (for example, from 13.89 µs to 19.73 µs in the scenario without background tasks), the shapes of the distributions are largely similar. Further, there are no substantial differences in the average and median latencies of the two kernel versions. This indicates that no significant improvements concerning latency and preemptivity have been incorporated since Linux 3.0. Therefore, a direct comparison between the LITMUS^RT patch and the PREEMPT RT patch is valid.

This concludes the discussion of our experimental results. Next, we briefly discuss how the presented cyclictest experiments differ from the overhead and latency tracing typically used to evaluate LITMUS^RT.

4 Limitations of cyclictest

As discussed in Sec. 1, LITMUS^RT is normally evaluated using Feather-Trace, not cyclictest. While cyclictest is a very useful tool to assess and compare different kernel versions (e.g., it can be used to test whether a proposed patch has a negative impact on scheduling latency), it also has some limitations if used as the sole metric for estimating a system's capability to provide temporal guarantees.

The primary advantage of cyclictest is that it provides an easy-to-interpret metric that reflects various sources of unpredictability as a single, opaque measure. That is, it treats the kernel and the underlying hardware as a black box and reports the actual cumulative impact of system overheads and hardware capabilities on real-time tasks. For application developers, this is convenient as it requires neither post-tracing analysis nor a detailed understanding of the kernel. In contrast, Feather-Trace yields a large number of (non-human-readable) event timestamps that require matching, filtering, post-processing, and a statistical evaluation. The resulting overhead profile is primarily intended for integration into schedulability analysis and is less suitable for direct interpretation. However, while cyclictest is arguably more convenient, LITMUS^RT's Feather-Trace approach provides a more complete picture, since it yields the data required to assess the impact of kernel overheads on tasks other than the highest-priority task, as we explain next.
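Before turning to Feather-Trace, it is worth noting that the black-box measurement performed by cyclictest is simple to sketch. The following user-space loop mirrors its basic mode of operation (priority, period, and iteration count are illustrative choices, not cyclictest's defaults): a high-priority SCHED_FIFO thread repeatedly sleeps until an absolute point in time and records by how much it overslept.

    /* Minimal cyclictest-style measurement loop (sketch; error
     * handling omitted). Interval and sample count are arbitrary. */
    #include <stdio.h>
    #include <time.h>
    #include <sched.h>
    #include <sys/mman.h>

    #define NSEC_PER_SEC 1000000000LL
    #define INTERVAL_NS  1000000LL   /* illustrative 1 ms period */

    static void ts_add(struct timespec *t, long long ns)
    {
        t->tv_nsec += ns;
        while (t->tv_nsec >= NSEC_PER_SEC) {
            t->tv_nsec -= NSEC_PER_SEC;
            t->tv_sec++;
        }
    }

    int main(void)
    {
        struct sched_param sp = { .sched_priority = 99 };
        struct timespec next, now;
        long long lat_ns, max_ns = 0;

        /* avoid page faults; become a high-priority SCHED_FIFO task */
        mlockall(MCL_CURRENT | MCL_FUTURE);
        sched_setscheduler(0, SCHED_FIFO, &sp);

        clock_gettime(CLOCK_MONOTONIC, &next);
        for (int i = 0; i < 100000; i++) {
            ts_add(&next, INTERVAL_NS);
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
            clock_gettime(CLOCK_MONOTONIC, &now);
            /* how late did the wake-up occur? */
            lat_ns = (now.tv_sec - next.tv_sec) * NSEC_PER_SEC
                   + (now.tv_nsec - next.tv_nsec);
            if (lat_ns > max_ns)
                max_ns = lat_ns;
        }
        printf("max latency: %lld ns\n", max_ns);
        return 0;
    }

Everything that delays the wake-up (timer interrupt latency, non-preemptible kernel sections, scheduling) is collapsed into the single lat_ns sample, which is precisely the opacity discussed above.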
The main feature of Feather-Trace is that it integrates many tracepoints in the kernel, which can be used to collect fine-grained overheads. By measuring and considering the various sources of delay individually, a detailed analysis of the worst-case cumulative delay can be carried out.

For example, for a task other than the highest-priority task, the cumulative delay incurred depends on the worst-case scheduling latency and on the delays due to preemptions by higher-priority tasks, which in turn depend on context-switching overheads, scheduling overheads in the presence of potentially many ready tasks, and so on. With Feather-Trace in LITMUS^RT, it is possible to measure all these aspects individually, and then to account for them during schedulability analysis (see [10, Ch. 3] for a comprehensive introduction to overhead accounting), such that the observed worst-case overheads are fully reflected in the derived temporal guarantees for all tasks (and not just the highest-priority task).
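To sketch how such accounting can enter a schedulability analysis, consider a much-simplified form of the approach described in [10, Ch. 3] (the notation here is ours and glosses over many details): each task's worst-case execution cost C_i is inflated by the measured worst-case scheduler and context-switch overheads, and the measured scheduling latency is charged as part of the release jitter J_j before a standard fixed-point response-time iteration:

    C_i' = C_i + 2\,(\Delta_{\mathrm{sch}} + \Delta_{\mathrm{cxs}})

    R_i = C_i' + \sum_{j \in \mathrm{hp}(i)} \left\lceil \frac{R_i + J_j}{T_j} \right\rceil C_j'

Here hp(i) denotes the set of tasks with higher priority than task i, T_j is task j's period, and task i's response-time bound relative to its release is R_i + J_i. The point is merely that each Δ term is an individually measured quantity, which is exactly what Feather-Trace provides and what a single end-to-end cyclictest number cannot be decomposed into.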
As another example, consider how tasks are resumed under partitioned schedulers such as the P-FP plugin (or SCHED_FIFO with appropriate processor affinity masks). If a real-time task resumes on a remote processor (i.e., any processor other than its assigned partition), an inter-processor interrupt (IPI) must be sent to its assigned processor to trigger the scheduler. IPIs are of course not delivered and processed instantaneously in a real system and thus affect scheduling latency if they arise. When scheduling cyclictest on hardware platforms with processor-local timers (such as local APIC timers in modern x86 systems), however, such IPIs are not required, because the interrupt signaling the expiry of cyclictest's one-shot timer is handled locally. If we simply execute cyclictest under PSN-EDF, P-FP, or SCHED_FIFO with appropriate processor affinity masks to determine the worst-case latency, it will never trace the impact of such IPIs, even though an actual real-time application that is triggered by interrupts from devices other than timers (e.g., a sensor) would actually be subject to IPI delays. In contrast, in the methodology used to evaluate LITMUS^RT (see [10, Ch. 4]), Feather-Trace is used to measure IPI latencies, which are then correctly accounted for in the schedulability analysis to reflect the worst-case task-activation delay.
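As an aside, the "appropriate processor affinity masks" mentioned above can be established as in the following sketch (the helper name is ours; error handling is omitted). A thread configured this way arms its timers on the processor it is pinned to, so timer expirations are handled locally and no IPIs ever appear in its latency samples:

    /* Pin the calling thread to one CPU and make it a SCHED_FIFO task,
     * mimicking a partitioned fixed-priority setup. */
    #define _GNU_SOURCE
    #include <sched.h>

    static int pin_fifo_to_cpu(int cpu, int prio)
    {
        cpu_set_t mask;
        struct sched_param sp = { .sched_priority = prio };

        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);     /* affinity mask contains only this CPU */
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
            return -1;           /* pid 0 = calling thread */
        return sched_setscheduler(0, SCHED_FIFO, &sp);
    }

Called before a measurement loop such as the one shown earlier, this reproduces a setup analogous to the partitioned configurations considered in the experiments.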
In summary, it is impossible to derive how real-time tasks other than the highest-priority task are affected by overheads from cyclictest-based experiments, because overhead-aware schedulability analysis is fundamentally required to make temporal guarantees for all tasks. Such an analysis is made possible by Feather-Trace's ability to extract specific overheads. While obtaining measurements in a fine-grained manner is more involved than simply running cyclictest, Feather-Trace's fine-grained measurement approach provides a flexibility that is not achievable with coarse-grained approaches such as cyclictest. This, of course, does not diminish cyclictest's value as a quick assessment and debugging aid, but it should not be mistaken for a general measure of a system's real-time capability; it can only show the lack of such capability under certain circumstances, for instance by exposing scheduling latencies in excess of 5 ms in the presence of I/O-bound background tasks.

5 Conclusion and Future Work

We presented an empirical evaluation of scheduling latency under LITMUS^RT using cyclictest. We ported cyclictest to LITMUS^RT's native API and collected samples of scheduling latency under several of its event-driven scheduler plugins, in three system configurations (an idle system, a system with CPU-bound background tasks, and a system with I/O-bound background tasks). For the purpose of comparison, we repeated the same measurements under Linux 3.0, Linux 3.8.13, and Linux 3.8.13 with the PREEMPT RT patch using the original, unmodified cyclictest version.

The results obtained from an idle system and in the presence of CPU-bound background tasks showed that, while LITMUS^RT introduces some additional overheads, the difference is minor in absolute terms and manifests only in the average and median scheduling latencies. Importantly, LITMUS^RT was not observed to affect the maximum scheduling latencies negatively, which is due to the fact that other factors in mainline Linux have a much larger impact on worst-case delays. We conclude from these observations that LITMUS^RT does not impose inherently impractical overheads. Further, we believe that the observed minor increase in average and median scheduling latency is not fundamental, but caused by a lack of low-level optimizations that could be rectified with additional engineering effort.

However, our data also documents that LITMUS^RT inherits mainline Linux's weaknesses in the presence of I/O-bound background tasks. Again, LITMUS^RT did not increase the observed maximum scheduling latency, but the latency profile of the underlying Linux 3.0 kernel renders it unfit for serious (hard) real-time applications. Further, our experiments confirmed that this is still the case with the more recent mainline Linux version 3.8.13. It would thus be highly desirable to combine LITMUS^RT's algorithmic improvements with the increased responsiveness under load achieved by the PREEMPT RT patch, which remains as future work.

References

[1] The LITMUS^RT project. http://www.litmus-rt.org.
[2] Real-time Linux wiki. cyclictest - RTwiki. https://rt.wiki.kernel.org/index.php/Cyclictest.
[3] J. Anderson and A. Srinivasan. Mixed Pfair/ERfair scheduling of synchronous periodic tasks. In Proc. of the 13th Euromicro Conference on Real-Time Systems, pages 76-85. IEEE, 2001.
[4] S. Baruah, N. Cohen, C. Plaxton, and D. Varvel. Proportionate progress: A notion of fairness in resource allocation. Algorithmica, 15(6):600-625, 1996.
[5] A. Bastoni, B. Brandenburg, and J. Anderson. An empirical comparison of global, partitioned, and clustered multiprocessor EDF schedulers. In Proc. of the 31st Real-Time Systems Symposium, pages 14-24, 2010.
[6] A. Bastoni, B. Brandenburg, and J. Anderson. Is semi-partitioned scheduling practical? In Proc. of the 23rd Euromicro Conference on Real-Time Systems, pages 125-135, 2011.
[7] A. Block. Adaptive multiprocessor real-time systems. PhD thesis, University of North Carolina at Chapel Hill, 2008.
[8] A. Block, H. Leontyev, B. Brandenburg, and J. Anderson. A flexible real-time locking protocol for multiprocessors. In Proc. of the 13th IEEE Conference on Embedded and Real-Time Computing Systems and Applications, pages 47-57, 2007.
[9] D. Bovet and M. Cesati. Understanding the Linux Kernel. O'Reilly & Associates Inc., third edition, 2005.
[10] B. Brandenburg. Scheduling and locking in multiprocessor real-time operating systems. PhD thesis, The University of North Carolina at Chapel Hill, 2011.
[11] B. Brandenburg and J. Anderson. Feather-Trace: A light-weight event tracing toolkit. In Proc. of the Workshop on Operating Systems Platforms for Embedded Real-Time Applications, pages 61-70, 2007.
[12] B. Brandenburg and J. Anderson. A comparison of the M-PCP, D-PCP, and FMLP on LITMUS^RT. In Proc. of the 12th Intl. Conference on Principles of Distributed Systems, pages 105-124, 2008.
[13] B. Brandenburg, A. Block, J. Calandrino, U. Devi, H. Leontyev, and J. Anderson. LITMUS^RT: a status report. 9th Real-Time Linux Workshop, 2007.
[14] J. Calandrino, H. Leontyev, A. Block, U. Devi, and J. Anderson. LITMUS^RT: A testbed for empirically comparing real-time multiprocessor schedulers. In Proc. of the 27th IEEE Real-Time Systems Symposium, pages 111-123, 2006.
[15] G. Chanteperdrix and R. Cochran. The ARM fast context switch extension for Linux. Real Time Linux Workshop, 2009.
[16] R. Cochran, C. Marinescu, and C. Riesch. Synchronizing the Linux system time to a PTP hardware clock. In Proc. of the 2011 Intl. IEEE Symposium on Precision Clock Synchronization for Measurement, Control and Communication, pages 87-92, 2011.
[17] R. Coker. bonnie++ - program to test hard drive performance. Linux manual page.
[18] U. Devi. Soft real-time scheduling on multiprocessors. PhD thesis, Chapel Hill, NC, USA, 2006.
[19] G. Elliott and J. Anderson. Globally scheduled real-time multiprocessor systems with GPUs. Real-Time Systems, 48(1):34-74, 2012.
[20] C. Emde. Long-term monitoring of apparent latency in PREEMPT RT Linux real-time systems. 12th Real-Time Linux Workshop, 2010.
[21] L. Fu and R. Schwebel. Real-time Linux wiki. RT PREEMPT HOWTO. https://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO.
[22] L. Henriques. Threaded IRQs on Linux PREEMPT-RT. In Proc. of the 5th Intl. Workshop on Operating Systems Platforms for Embedded Real-Time Applications, pages 23-32, 2009.
[23] C. Kenna, J. Herman, B. Brandenburg, A. Mills, and J. Anderson. Soft real-time on multiprocessors: are analysis-based schedulers really worth it? In Proc. of the 32nd Real-Time Systems Symposium, pages 93-103, 2011.
[24] J. Kiszka. Towards Linux as a real-time hypervisor. In Proc. of the 11th Real-Time Linux Workshop, pages 205-214, 2009.
[25] K. Koolwal. Investigating latency effects of the Linux real-time preemption patches (PREEMPT RT) on AMD's GEODE LX platform. In Proc. of the 11th Real-Time Linux Workshop, pages 131-146, 2009.
[26] A. Lackorzynski, J. Danisevskis, J. Nordholz, and M. Peter. Real-time performance of L4Linux. In Proc. of the 13th Real-Time Linux Workshop, pages 117-124, 2011.
[27] P. McKenney. A realtime preemption overview. LWN, 2005. http://lwn.net/Articles/146861/.
[28] A. Mok. Fundamental design problems of distributed systems for the hard-real-time environment. PhD thesis, 1983.
[29] M. Traut. Real-time CORBA performance on Linux-RT PREEMPT. 9th Real-Time Linux Workshop, 2007.
[30] C. Williams and D. Sommerseth. hackbench - scheduler benchmark/stress test. Linux manual page.
[31] B. Zuo, K. Chen, A. Liang, H. Guan, J. Zhang, R. Ma, and H. Yang. Performance tuning towards a KVM-based low latency virtualization system. In Proc. of the 2nd International Conference on Information Engineering and Computer Science, pages 1-4. IEEE, 2010.