Multi-Line Prefetch Covert Channel With Huge Pages
Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011, USA; xli@iastate.edu
* Correspondence: tyagi@iastate.edu
Abstract
Modern x86 processors incorporate performance-enhancing features such as prefetching
mechanisms, cache coherence protocols, and support for large memory pages (e.g., 2 MB
huge pages). While these architectural innovations aim to reduce memory access latency,
boost throughput, and maintain cache consistency across cores, they can also expose subtle
microarchitectural side channels that adversaries may exploit. This study investigates
how the combination of prefetching techniques and huge pages can significantly enhance
the throughput and accuracy of covert channels in controlled computing environments.
Building on prior work that examined the impact of the MESI cache coherence protocol
using single-cache-line access without huge pages, our approach expands the attack surface
by simultaneously accessing multiple cache lines across all 512 L1 lines under a 2 MB huge
page configuration. As a result, our 9-bit covert channel achieves a peak throughput of
4940 KB/s—substantially exceeding previously reported benchmarks. We further validate
our channel on AMD SEV-SNP virtual machines, achieving up to 88% decoding accuracy
using write-access encoding with 2 MB huge pages, demonstrating feasibility even
under TEE-enforced virtualization environments. These findings highlight the need for
careful consideration and evaluation of the security implications of common performance
optimizations with respect to their side-channel potential.
1. Introduction
Covert channels in computer systems exploit shared resources to transmit information covertly between processes. Traditional cache-based covert channels often suffer from low throughput and high error rates due to the unpredictable nature of cache access times and system interference. Covert channels between cross-core sender and receiver processes are noisier and harder to establish because the two processes must rely on shared microarchitectural elements, such as the last-level cache (LLC), to communicate. This setup introduces several challenges, including timing jitter from the cache coherence protocols, interference from concurrent system activity on other cores, and difficulties in tightly synchronizing the sender and the receiver. These factors make it harder to reliably measure timing differences, increasing the likelihood of decoding errors and reducing the overall throughput. Earlier investigations [1–3] laid the groundwork for understanding covert channels by analyzing the effects of the MESI (Modified, Exclusive, Shared, Invalid) cache coherence protocol on last-level caches (LLCs). Those approaches relied primarily on accessing a single cache line per transmitted bit, using normal-sized pages without incorporating huge pages.
Building upon this foundational research, our method introduces the use of huge pages and facilitates access to multiple cache lines concurrently. This approach is designed to enhance both the accuracy and throughput of covert channels, capitalizing on the combined benefits of prefetching and huge pages.
For a cross-core covert channel, the sender and receiver processes run on different cores, and the only cache they share is the LLC. Cache coherence events provide a mechanism for the sender to signal, or encode, data for the receiver. A sender within a software enclave, such as an Intel SGX enclave, may have access to private data of value to the receiver. Information channels are typically monitored in a secure domain such as a software enclave [4–7] or the secure world of ARM TrustZone [8,9]; these covert channels evade such dynamic information channel monitoring to exfiltrate secret data. How the secret data is acquired in the sender's domain is not a focus of this paper.
Prefetching plays a critical role in optimizing memory access, but it also serves to activate specific cache coherence events in covert channels. Prefetchers, through instructions such
as PREFETCHW, enable data to be proactively loaded into the L1 cache. The PREFETCHW
instruction is a software prefetch hint available in x86 architectures that brings a cache
line into the processor’s cache hierarchy in anticipation of a write [10]. This proactive
behavior reduces the memory access latency, improves cache utilization, and interacts
with the MESI (Modified, Exclusive, Shared, Invalid) cache coherence protocol to maintain
data consistency across cores. For instance, PREFETCHW can transition cache lines to
the modified state, preparing them for faster subsequent write operations while maintain-
ing coherence. These operations enable side-channel and covert-channel vulnerabilities
through observable changes in cache states. Covert channels, by leveraging the interplay
between prefetching and cache coherence protocols, can exploit these microarchitectural
optimizations to improve their effectiveness [1].
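As a concrete illustration (a minimal sketch, not the exact code used in our experiments), a write-intent prefetch can be issued from user space either through the GCC/Clang builtin or directly as an instruction:

    #include <stdint.h>

    /* __builtin_prefetch(addr, rw, locality): rw = 1 requests the line in
     * anticipation of a write; on x86 targets that support it, this is
     * lowered to PREFETCHW. locality = 3 asks for high temporal locality. */
    static inline void prefetch_for_write(const void *addr) {
        __builtin_prefetch(addr, 1, 3);
    }

    /* Equivalent direct encoding of the instruction (AT&T syntax). */
    static inline void prefetchw_asm(const void *addr) {
        __asm__ volatile("prefetchw (%0)" :: "r"(addr) : "memory");
    }

The latency of such a prefetch varies with the target line's coherence state; this timing variation is what the receiver in our channel measures.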
Huge pages, on the other hand, address memory management challenges by signif-
icantly reducing the number of Translation Lookaside Buffer (TLB) entries required for
address translation. With larger page sizes such as 2 MB or 1 GB, a TLB entry covers a
broader range of memory addresses, minimizing TLB misses and reducing the address
translation overhead. This optimization is particularly effective for memory-intensive
applications with spatial locality, as it lowers the latency and improves system efficiency by
decreasing the frequency of page table walks and translations. Spatial locality is also likely
to reduce the page fault frequency, further improving performance. Additionally, huge
pages are commonly employed in cryptographic systems and secure data transmission to
improve the performance and predictability when handling secret or sensitive data, making
them a natural fit for covert communication channels that rely on timing stability.
The combination of prefetching and huge pages amplifies these individual benefits,
leading to enhanced covert channel efficiency. Huge pages facilitate more effective prefetch-
ers, enabling data fetching across larger contiguous memory regions with fewer interrup-
tions driven by page faults. This integration ensures faster address translations, higher
cache hit rates, and a reduced latency, resulting in a significantly improved throughput and
accuracy in covert channels. Our methodology constructs a cross-core covert channel by ex-
ploiting prefetch-induced cache coherence transitions, specifically using the PREFETCHW
instruction to trigger measurable state changes in L1 caches. It further introduces multi-line
encoding, wherein messages are encoded by accessing varying numbers of cache lines
in a huge-page-backed memory region. This enables more robust and high-bandwidth
transmission. Our proposed approach capitalizes on the strengths of prefetching and huge
pages to enhance the covert channel performance, demonstrating notable improvements
in the throughput and accuracy while addressing associated challenges in a controlled
computing environment.
2. Related Work
Microarchitectural side channels have long been used for covert communication. Prior
works such as Prime+Probe, Flush+Reload, and Spectre-family attacks demonstrate how
cache occupancy and speculation artifacts can be exploited [11,12].
Our work is closely related to the “Adversarial Prefetch” attack by Zhang et al. [1],
which showed that PREFETCHW can leak the coherence state of a cache line, enabling high-
throughput covert channels over single-line access patterns. We build on this idea, general-
izing to multi-line prefetching and integrating huge page memory mappings.
Recent studies further extend covert channels’ capacity and stealth:
• PrefetchX (Zhang et al., 2023) [13] discovers a cross-core channel via the XPT prefetcher
shared among cores, achieving 1.7 MB/s on Intel.
• BandwidthBreach (Barth et al., 2023) [14] establishes covert channels through LFB/SQ
contention in the memory pipeline, achieving over 10 Mbps.
• Write+Sync Covert Channels (Kumar et al., 2023) [15] demonstrate durable-write-
based channels exploiting fsync() barriers on disk, effective even under strong
cache defenses.
• SgxPectre Attacks [16] exploit speculative execution vulnerabilities in SGX to leak
enclave secrets, demonstrating how transient execution can be used to bypass SGX’s
isolation guarantees. These highlight the importance of microarchitectural analysis in
trusted computing.
Compared to these, our channel uniquely combines huge page access, multi-line
prefetching, and TEE compatibility via AMD SEV-SNP. Our throughput and accuracy
trends (up to 88%) compare favorably while maintaining low detectability and leveraging
coherence transitions observable even under cloud virtualization.
3. Background
3.1. The Software Prefetcher
A software prefetcher is a mechanism that allows a program to explicitly request
the fetching of data from memory into the cache before it is accessed [17]. The purpose
is to hide the memory latency by ensuring that data is already available in the cache
when needed by the CPU. Software prefetching is typically initiated by inserting special
prefetch instructions (e.g., PREFETCH in x86 or PLD in ARM) into the program code.
These instructions act as hints to the processor that specific memory locations will likely
be accessed soon, prompting the prefetcher to load the data into the appropriate cache
level [18].
The mechanism of software prefetching involves several steps. First, programmers or
compilers strategically place prefetch instructions at points in the code where the memory
access patterns are predictable, such as in loops that iterate over large datasets. For instance,
in a loop processing an array, a prefetch instruction can be placed a few iterations ahead
to ensure data is available when needed. Once executed, these instructions trigger the
processor to fetch the specified memory location from main memory into the cache. On
x86 architectures, instructions like PREFETCHW are used to prepare cache lines for future
writes, while PREFETCHT0 brings data into the L1 cache. Similarly, ARM architectures
utilize PLD for data prefetching and PLI for prefetching instructions. The fetched data
is then stored in a specified cache level, such as L1, L2, or L3, depending on the type of
prefetch instruction used. This operation is non-blocking, meaning it does not stall the
CPU while waiting for the data to be fetched; the processor continues executing other
instructions, allowing the prefetcher to asynchronously load the data into the cache [19].
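For example, the loop pattern described above might look as follows (a sketch; the prefetch distance of 16 elements is an assumed value that would be tuned per platform):

    #include <stddef.h>

    #define PREFETCH_DISTANCE 16   /* assumed; tuned per platform in practice */

    /* Sum a large array, prefetching a fixed distance ahead so each
     * element is already in cache when the loop body reaches it. */
    long sum_with_prefetch(const long *a, size_t n) {
        long sum = 0;
        for (size_t i = 0; i < n; i++) {
            if (i + PREFETCH_DISTANCE < n)
                __builtin_prefetch(&a[i + PREFETCH_DISTANCE], /*rw=*/0, /*locality=*/3);
            sum += a[i];
        }
        return sum;
    }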
Software prefetching can handle both read and write operations. For example, the __builtin_prefetch builtin available in GCC and Clang is versatile: with its write-intent argument set, it compiles to PREFETCHW on x86 and explicitly prepares cache lines for future writes by transitioning them toward the modified state within the MESI protocol [20]. Prefetching does not alter the data itself; it merely ensures that the data is readily available in the cache for subsequent operations, thereby enhancing efficiency without compromising data integrity.
The advantages of software prefetching are significant. It reduces the memory latency
by preloading data into the cache, thus minimizing delays when the data is accessed [21].
By ensuring frequently accessed data is present at the appropriate cache level, it improves
cache utilization and reduces cache misses. This also minimizes pipeline stalls caused
by memory access delays, leading to smoother instruction execution. Software prefetch-
ing is particularly effective in workloads with predictable access patterns, such as matrix
operations prevalent in AI/ML applications, image processing, and large-scale numer-
ical simulations [17]. By leveraging software prefetching effectively, programmers and
compilers can achieve substantial performance gains in memory-intensive applications.
In our baseline tests in Section 4.2, enabling software prefetching reduced the average
memory access latency and provided a performance improvement of approximately 13%.
Huge pages are not universally beneficial, however; they offer little to applications such as those that rely on frequent or dynamic small memory allocations. Moreover, managing huge pages can be complex and may require administrative privileges to configure. In some cases, huge pages are "pinned", meaning they cannot be swapped out, which can reduce the flexibility of memory management. Similarly, if the application does not have enough spatial locality to make use of huge pages, they can lead to significant thrashing, degrading the program's performance.
Typical applications that leverage huge pages include databases like Oracle and Post-
greSQL, which benefit from reduced TLB misses during operations on large datasets [27].
High-performance computing workloads and virtualization systems also use huge pages
to optimize memory access patterns and minimize latency. Similarly, large-scale machine
learning and AI applications rely on huge pages to handle their substantial memory re-
quirements efficiently [21].
For example, on Linux systems, huge pages can be enabled and configured using the
hugepages subsystem or libraries like libhugetlbfs [23]. The standard page size of 4 KB
can be replaced with 2 MB huge pages (default for x86) or even 1 GB pages, depending
on the hardware support and system configuration. By enabling huge pages, developers
and system administrators can unlock substantial performance improvements for memory-
bound applications [23].
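A minimal sketch of this configuration on Linux (assuming huge pages have already been reserved by an administrator) is shown below:

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define HUGE_PAGE_SIZE (2UL * 1024 * 1024)   /* 2 MB */

    int main(void) {
        /* Assumes huge pages have been reserved beforehand, e.g.:
         *   echo 16 | sudo tee /proc/sys/vm/nr_hugepages          */
        char *buf = mmap(NULL, HUGE_PAGE_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (buf == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)");
            return EXIT_FAILURE;
        }
        buf[0] = 1;   /* touch the page; one TLB entry now covers 2 MB */
        munmap(buf, HUGE_PAGE_SIZE);
        return EXIT_SUCCESS;
    }

Transparent huge pages can also back anonymous mappings automatically, but an explicit MAP_HUGETLB mapping guarantees the deterministic 2 MB backing.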
Our baseline evaluation in Section 4.2 shows that using huge pages reduced the
average memory access latency, resulting in a 22% improvement. When combined with
software prefetching, the memory latency decreased by 24%, indicating a synergistic effect
from both techniques.
This hierarchical architecture works in tandem with the MESI protocol to optimize
both the performance and consistency, ensuring efficient data sharing and synchronization
across multi-core systems.
The sender and the receiver must agree in advance on parameters such as memory allocation, data encoding, and error correction protocols. These agreements are critical for maintaining the consistency and accuracy of the covert channel.
When huge pages are enabled, the multi-line prefetch covert channel gains significant
advantages, particularly in scenarios involving n cache lines. Huge pages reduce TLB
misses by mapping larger memory regions with fewer entries, enabling the prefetcher to
operate more efficiently. This optimization allows the sender to access multiple cache lines
within the same page, reducing latency and improving throughput. The larger contiguous
memory provided by huge pages enhances the precision of timing measurements, leading
to better accuracy and reduced error rates. Furthermore, the combination of huge pages
and multi-line prefetching ensures that more data can be encoded and decoded in fewer
iterations, thereby increasing the bandwidth and stealth of the attack.
We outline below the key assumptions of our threat model that enable this covert channel:
• The Shared Last-Level Cache (LLC): The sender and the receiver share the same LLC
(e.g., reside on the same processor or within the same virtual machine), enabling them
to observe coherence traffic.
• Privilege Level and Isolation: In our primary setup, both the sender and the receiver
are unprivileged user-space processes running on separate CPU cores. However,
our threat model also supports scenarios where the sender operates within a secure
execution environment—such as an AMD SEV-encrypted virtual machine—as long
as the sender and the receiver share the same last-level cache (LLC). The shared LLC
enables shared cache coherence events, which in turn enable the covert channel. A
broader discussion of TEE platforms, including SGX and TrustZone, is provided in
Section 6.
• Access to Huge Pages and Prefetch Instructions: Both parties are assumed to have
access to huge pages (e.g., via mmap() with MAP_HUGETLB) and the ability to invoke
prefetch instructions such as __builtin_prefetch(). These capabilities are available
in modern Linux systems with the appropriate configuration.
• Prefetching Only: Applying software prefetching without huge pages results in a 12%
latency reduction. The prefetch instruction (__builtin_prefetch()) helps bring the
cache lines closer to the processor ahead of access, thereby reducing stalls.
• Huge Pages Only: Enabling huge pages alone reduces the average access latency by
approximately 14% compared to that at the baseline. This is primarily due to the
reduced TLB pressure and improved memory translation efficiency provided by 2 MB
page mappings.
• Combined Optimization: The combination of huge pages and prefetching yields the lowest average latency (3.07 cycles). This configuration effectively leverages both the reduced TLB pressure from huge pages and the improved cache readiness from prefetching, making it the most efficient strategy for minimizing access latency in our setup (a measurement sketch follows this list).
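The latency comparisons above can be reproduced with a simple cycle-level timing harness; the sketch below (function names are ours, and the averaging strategy is an assumption) measures the mean per-access cost so that the timer overhead is amortized:

    #include <stddef.h>
    #include <stdint.h>
    #include <x86intrin.h>   /* __rdtscp */

    /* Average per-access latency (in cycles) over a buffer. RDTSCP is
     * partially serializing, so earlier work cannot drift into the timed
     * window; averaging over many accesses amortizes timer overhead. */
    double avg_access_cycles(volatile const char *buf, size_t len, size_t stride) {
        unsigned aux;
        uint64_t start = __rdtscp(&aux);
        for (size_t i = 0; i < len; i += stride)
            (void)buf[i];                    /* the accesses being timed */
        uint64_t end = __rdtscp(&aux);
        return (double)(end - start) / (double)(len / stride);
    }

Running this over the same buffer under the four configurations, 4 KB versus MAP_HUGETLB-backed pages, with and without software prefetching of upcoming elements, yields the relative differences reported above.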
Store the decoded message: The receiver stores the decoded message for further
processing or logging.
Signal readiness: The receiver sets receiver_done_flag to notify the sender to start
the next iteration.
The protocol is shown in Figure 1 and Algorithm 2. The sequence of interactions (labeled edges) is as follows; a code sketch of the handshake appears after the list.
1. The receiver sets receiver_done_flag = 1 after the initial prefetch measurement.
2. The sender detects that receiver_done_flag = 1 and resets it to 0.
3. The sender accesses memory lines (encoded with secret data) during config.interval.
4. The receiver prefetches memory lines and measures the timing (affected by the sender's cache state).
5. The receiver sets receiver_done_flag = 1 to signal readiness for the next message.
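A condensed sketch of this handshake is given below; the shared flag and buffer names follow the description above, while the timing helper, threshold constant, and loop bounds are illustrative assumptions:

    #include <stdatomic.h>
    #include <stdint.h>

    #define LINE_SIZE 64
    #define THRESHOLD 100   /* assumed cutoff; calibrated at runtime in practice */

    /* Shared between sender and receiver via a shared mapping. */
    extern _Atomic int receiver_done_flag;
    extern volatile char *channel;                       /* huge-page-backed buffer */
    extern uint64_t timed_prefetch(volatile char *line); /* assumed timing helper */

    /* Sender: encode `msg` as the number of distinct cache lines touched
     * during the transmission interval (steps 2 and 3). */
    void send_message(int msg, int interval_iters) {
        while (!atomic_load(&receiver_done_flag)) ;      /* wait for step 1/5 */
        atomic_store(&receiver_done_flag, 0);            /* step 2 */
        for (int t = 0; t < interval_iters; t++)         /* step 3 */
            for (int i = 0; i < msg; i++)
                channel[i * LINE_SIZE] = 1;              /* touch line i */
    }

    /* Receiver: time a prefetch of every line (step 4); lines the sender
     * touched exhibit a distinct latency, whose direction depends on the
     * coherence transition involved. Then signal readiness (step 5). */
    int receive_message(int max_lines) {
        int decoded = 0;
        for (int i = 0; i < max_lines; i++)
            if (timed_prefetch(&channel[i * LINE_SIZE]) > THRESHOLD)
                decoded++;
        atomic_store(&receiver_done_flag, 1);            /* step 5 */
        return decoded;
    }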
We also explored loop-based counters [28,29], which estimate the timing by counting
instruction iterations; however, they are highly susceptible to noise from interrupts, context
switches, and scheduling variability in multi-tenant or virtualized environments. These
interrupts are asynchronous and unpredictable. This uncertainty in the location and timing
of interrupts results in significant variance in the loop counter timing. These significant
timing fluctuations over multiple runs make such methods unreliable for the precise latency
discrimination required in covert channel decoding.
Semaphore-Based Alternatives and Trade-Offs:
In such restricted environments, semaphores or barriers provide viable alternatives
for synchronization. These primitives block the receiver until signaled by the sender,
thus avoiding the need for polling and enabling more efficient CPU usage. However,
these mechanisms typically rely on atomic operations or memory fences, which introduce
additional microarchitectural side effects such as cache line invalidation and memory
ordering constraints. Such effects may interfere with the prefetch timing behavior and
degrade the performance and accuracy of timing-based covert channels. As a result, while
semaphores offer an enclave-compatible solution, their influence on the cache state must
be carefully considered when designing prefetch-based transmission mechanisms.
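For reference, a minimal sketch of such a semaphore-based handoff (POSIX; names are illustrative, and the sem_t must reside in memory shared by both processes):

    #include <semaphore.h>

    /* The sem_t must live in memory shared by both processes (e.g., an
     * mmap'd MAP_SHARED region); pshared = 1 marks it process-shared.  */
    static sem_t *ready;   /* receiver -> sender: "measurement finished" */

    void channel_sync_init(sem_t *shared_sem) {
        ready = shared_sem;
        sem_init(ready, /*pshared=*/1, /*initial value=*/0);
    }

    void sender_wait_turn(void)  { sem_wait(ready); /* then access lines */ }
    void receiver_hand_off(void) { sem_post(ready); /* after timing pass */ }

The sem_wait()/sem_post() pair replaces the busy-polled flag, at the cost of the atomic and fence traffic discussed above.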
5. Results
Throughput and Accuracy
To evaluate the efficiency of our multi-line encoding covert channel, we measured both
the throughput and accuracy across different page sizes and encoding strategies. As shown
in Table 2, the read-only multi-line encoding achieved a throughput of approximately
4623 KB/s with 4 KB pages and up to 4940 KB/s with 2 MB huge pages, with an accuracy of
up to 81.23%. The write-access encoding, which leveraged PREFETCHW to induce transitions
from the I to M state rather than S to M, demonstrated a slightly higher precision at 83.34%,
although with a slightly lower throughput—4345 KB/s on 4 KB pages and 4828 KB/s with
2 MB huge pages.
Compared to the original single-line encoding approach from prior work [1], which
achieves a throughput of only 822 KB/s and transmits just a single bit per iteration, our
multi-line encoding achieves up to a 4940 KB/s throughput and transmits 9 bits per
iteration—representing a 6× increase in the bandwidth capacity. The original design cannot
convey meaningful data efficiently due to its limited capacity. In contrast, our approach
can encode and transmit 9 bits per iteration by accessing multiple cache lines, allowing for
the efficient transmission of complex messages. Moreover, if greater decoding accuracy
is desired, a bucket-based method can be employed: for example, transmitting message
“1” by accessing 10 lines in one iteration, message “2” by accessing 20 lines, and so on.
This technique trades throughput for enhanced resilience to noise and improved decoding
reliability, offering flexibility between performance and accuracy.
While the reference paper [1] achieved a reported accuracy of 96.2% using the single-
line encoding scheme, our local reproduction under varying experimental conditions
revealed a broader accuracy range of 60–80%. This divergence in the results suggests
potential sensitivity to environmental factors not fully replicated in our setup. To uphold
transparency and avoid overstating the outcomes, we have opted to omit the accuracy met-
rics for [1]’s results from the table, as they may not reliably reflect the scheme’s performance
in generalized scenarios.
We further evaluated the influence of the bucket size on the decoding accuracy under
the read-only 2 MB huge page setting. Our experiments show that as the bucket size
increases, the accuracy improves up to a point and then plateaus. Specifically, with a bucket
size of 1 or 5, the accuracy remains at 66.67%; increasing the bucket size to 10 improves
the accuracy to 71.43%; and at a bucket size of 25, we achieve the peak accuracy of 81.23%.
Further increases in the bucket size beyond 25 yield no significant improvements, with
the accuracy remaining stable at 81.23%. A bucket size of b introduces redundancy into the encoding. At the receiver, the timing of a coherence event falls above or below a statistical threshold, indicating an encoded 1 or 0. Many system-level factors shape this statistical model: how many other processes are active, what else may engender an aliased coherence event, and timer accuracy or jitter, among others. Once enough redundancy has been added to make the threshold robust, a further increase in the bucket size b does not overcome any additional noise, and the accuracy saturates. This suggests that a moderate bucket size provides a good balance between throughput and accuracy. Figure 2 illustrates the relationship between bucket size and accuracy.
Figure 2. Bucket size vs. accuracy for the 2 MB huge page, read-only setting.
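A sketch of this bucket-based mapping (helper names are ours; the bucket size of 25 reflects the sweep reported above):

    #define BUCKET_SIZE 25   /* lines per symbol step; 25 was optimal in our sweep */

    /* Encode symbol m (m >= 1) as m * BUCKET_SIZE cache-line accesses. */
    static int lines_for_symbol(int m) {
        return m * BUCKET_SIZE;
    }

    /* Decode by rounding the observed line count to the nearest bucket,
     * absorbing miscounts of up to BUCKET_SIZE / 2 lines. */
    static int symbol_from_count(int count) {
        return (count + BUCKET_SIZE / 2) / BUCKET_SIZE;
    }

Rounding to the nearest bucket is what absorbs line-count misreads of up to half a bucket, which explains the accuracy gains observed as b grows.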
In scenarios where maximizing the bandwidth is critical and some errors are acceptable (or can be corrected), the 9+9-bit scheme may be beneficial.
To provide a more holistic evaluation of each encoding strategy, we introduce a
composite metric that combines both the throughput and decoding accuracy to compute the
effective bandwidth in KB/s. This metric estimates the number of correct bits transmitted
per second, capturing the real-world utility of the covert channel under noisy conditions.
For each scheme, we compute the following (a worked example follows this list):
• Effective Bits/Round = Bits per Iteration × Accuracy
• Effective Bandwidth (KB/s) = Raw Bandwidth × Accuracy
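As a worked illustration using the peak figures reported earlier (pairing each configuration's best throughput with its best accuracy, an optimistic pairing made here only for illustration): the read-only 2 MB configuration yields 4940 KB/s × 0.8123 ≈ 4013 KB/s of effective bandwidth, while the write-access 2 MB configuration yields 4828 KB/s × 0.8334 ≈ 4024 KB/s.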
Applying these formulas to the 10-bit and 9+9-bit schemes reveals that while the 9+9-bit scheme achieves a higher raw throughput, its effective bandwidth (factoring in accuracy) also remains superior to that of the 10-bit scheme.
However, the 10-bit scheme retains an advantage in scenarios requiring reliable single-
round decoding (e.g., short-lived channels with no retransmission). The choice ultimately
depends on whether the application prioritizes raw speed or guaranteed correctness.
The adoption of huge pages further enhanced the throughput and stability. Huge
pages reduce TLB misses and maintain consistent memory access timing, benefiting both
accuracy and stealth. Moreover, using varied numbers of cache line accesses per itera-
tion increases the unpredictability of the access patterns, improving the stealth against
side-channel detection mechanisms. Unlike traditional binary encoding, our method mini-
mizes observable LLC misses and system-level anomalies, making it more resilient against
detection through performance monitoring tools.
Overall, the multi-line encoding approach not only provides a higher throughput and
accuracy but also expands the covert channel’s capacity for efficient, robust, and stealthy
data exfiltration.
6.4. Summary
In conclusion, AMD SEV provides the best balance of hardware features and virtual-
ization transparency to evaluate the feasibility and impact of huge-page-aware prefetch-
based covert channels in a real-world TEE deployment. Its support for native huge pages
and x86 cache coherence behavior—including observable effects from instructions like
PREFETCHW—makes it an ideal platform for exploring covert communication across cores
within an encrypted VM.
We implement and evaluate our covert channel design on AMD SEV in the following
section, demonstrating both the feasibility and performance characteristics in a realistic
threat model setting.
As shown in the table, the latency measurements on the AMD SEV VM (e.g.,
920.9–1063.9 ns or ~2256.2–2606.5 cycles) were much lower than those on our local Intel
testbed (~50,000 cycles). This discrepancy can be attributed to time counter virtualization
in cloud VMs. While less precise in absolute terms, this timing still preserved the relative
trends: increasing the number of lines accessed increased the receiver latency, and huge
page usage consistently reduced the timing overhead.
Figure 3. Bucket size vs. accuracy for 2 MB huge page, write-access setting on AMD SEV.
These results show that AMD SEV-SNP platforms can practically support our covert
channel, especially under configurations where the sender writes to shared memory and the re-
ceiver executes prefetch operations. This affirms the importance of cache state transitions and
coherence protocol behaviors in covert channel construction across TEE platforms. Our find-
ings suggest that future attacks could focus on triggering write-induced coherence transitions
or explore alternative microarchitectural features that amplify side-channel observability.
8. Discussion
Our evaluation demonstrates that the proposed multi-line prefetch-based covert chan-
nel significantly outperforms previous single-line encoding schemes in its throughput.
However, several avenues remain for further enhancements of the channels’ reliability,
robustness, and stealthiness.
Accuracy Optimization:
While our current implementation achieves a decoding accuracy of up to 83.34% with write-access encoding and 81.23% with read-only encoding, the accuracy could be improved further through several techniques. First, tuning the synchronization intervals
between the sender and the receiver could mitigate the timing drift and system noise that
degrade the decoding precision. Second, our current use of a bucket-based message en-
coding strategy—where each message corresponds to a specific number of accessed cache
lines—already improves the robustness by reducing the decoding ambiguity. Increasing the
bucket size further (i.e., using larger groups of cache line accesses per message) can improve
the accuracy, especially under noisy conditions, at the expense of a reduced throughput.
Research on AMD SEV: We also evaluated the decoding accuracy on AMD SEV
VMs with 2 MB huge pages under write-access encoding. The results demonstrate a
consistent increase in accuracy with larger bucket sizes, reaching up to 88.03%. This
confirms the effectiveness of the bucket-based strategy in virtualized TEEs. Write access
causes observable coherence-induced latency changes, allowing for higher reliability in
detection even under cloud-based timing variability.
Future research on AMD SEV platforms could explore dynamic channel adaptation
strategies to maintain the decoding accuracy under varying system loads and noise condi-
tions. For instance, runtime adjustment of the bucket sizes or synchronization intervals
could improve the robustness in multi-tenant cloud environments. Additionally, as SEV-
SNP introduces stricter integrity protections and memory access controls, it would be
valuable to evaluate whether covert channels based on cache coherence events remain
practical under these constraints. Investigating the interaction between SEV’s memory
encryption and microarchitectural behaviors—such as cache sharing and prefetch activity—
could offer deeper insights into the resilience or detectability of such attacks in evolving
confidential computing infrastructures.
Machine-Learning-Based Decoding: Integrating a lightweight machine learning model
for classification of the timing traces could enhance the decoding accuracy further, espe-
cially in noisy or unpredictable environments. By training the model on the observed
timing patterns associated with different line access counts or cache states, the receiver can
distinguish between valid message values and false positives caused by system activity or
cache noise better [33–36].
Expanding Coherence Exploits:
Our current design focuses on leveraging the MESI cache coherence protocol, primarily
through read and write operations that trigger transitions from the shared (S) state to the
modified (M) state, as well as from the invalid (I) state to the modified (M) state. Future
work could investigate a broader range of MESI state transitions, including the exclusive
(E) state, which may display distinct timing characteristics or variations in the coherence
traffic patterns. These additional behaviors could potentially enhance the bandwidth of the covert channel, improve the stealth by reducing observable system events, and offer greater flexibility in encoding strategies.
Cross-Platform Considerations: While our implementation and evaluation fo-
cus on Linux due to its flexible support for huge pages, prefetch instructions (e.g.,
__builtin_prefetch()), and high-resolution timing via rdtscp or clock_gettime(), it is
important to briefly consider other major operating systems. On Windows, large pages
can be enabled using the VirtualAlloc() API with specific flags, and although user-space
access to precise timing sources is more restricted, covert channels leveraging shared caches
have still been demonstrated in prior work. On macOS, the situation is more constrained:
strict sandboxing, limited access to low-level timing mechanisms, and Apple Silicon’s dis-
tinct memory and cache architecture pose challenges for direct translation of our method.
Nonetheless, the fundamental principles of cross-core coherence and timing-based side
channels remain relevant, and future work may explore adaptations of this channel to
Windows or macOS with the appropriate privilege levels and architectural adjustments.
9. Conclusions
In this work, we present a high-throughput, cache-based covert channel leveraging
multi-line encoding strategies and the MESI cache coherence protocol. By encoding mes-
sages across multiple cache lines per iteration and utilizing both read-only and write-access
patterns, our approach significantly improves upon the prior single-line encoding tech-
niques. Notably, our implementation achieves a throughput of up to 4940 KB/s with 2 MB huge pages and attains decoding accuracies of 81.23% (read-only) and 83.34% (write-based), outperforming prior single-line Prefetch+Prefetch attacks that are limited to 822 KB/s and binary messages.
We extended our implementation of the multi-line covert channel to AMD SEV VMs
to assess the feasibility in a TEE with hardware support for memory encryption and huge
pages. Our results show that write-based encoding yields clear timing variations, enabling
accurate decoding even in virtualized environments. Using huge pages provides further
latency reductions and improved timing consistency, enhancing both the throughput
and stealth.
Despite the timing imprecision in virtualized environments, overall trends, such as rising latency with larger bucket sizes, remain observable, confirming our scheme's practicality. Our AMD implementation achieved up to 88.03% accuracy with a bucket size of 40 lines under write encoding and demonstrated that write encoding outperforms read encoding due to MOESI-induced state changes.
We demonstrate that huge pages enhance the channel stability and performance, and
our encoding method supports richer message transmissions—up to 9 bits per iteration—
while retaining low detectability. Furthermore, we explore the trade-offs between through-
put and accuracy using a bucket-based encoding method, and we identify tuning opportu-
nities such as synchronization timing and bucket size adjustment.
Future directions include applying machine learning models to improve the decoding robustness, experimenting with other cache state transitions (e.g., E to M, I to E), and evaluating more sophisticated cache activities such as atomic operations or flushes. These extensions could further increase the stealth, bandwidth, and adaptability of covert communication in shared-memory systems.
Author Contributions: Conceptualization: X.L.; methodology: X.L.; formal analysis: X.L.; writing—
original draft: X.L.; writing—review and editing: A.T.; project administration: A.T.; funding acquisi-
tion: A.T. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement: The original contributions presented in this study are included in the
article. Further inquiries can be directed to the corresponding author(s).
Conflicts of Interest: The funders had no role in the design of this study; in the collection, analyses,
or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.
References
1. Guo, Y.; Zigerelli, A.; Zhang, Y.; Yang, J. Adversarial prefetch: New cross-core cache side channel attacks. In Proceedings of the
2022 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 22–26 May 2022; pp. 1458–1473.
2. Trippel, C.; Lustig, D.; Martonosi, M. MeltdownPrime and SpectrePrime: Automatically-synthesized attacks exploiting
invalidation-based coherence protocols. arXiv 2018, arXiv:1802.03802.
3. Fogh, A. Row Hammer, Java Script and MESI. 2016. Available online: https://cyber.wtf/2016/06/27/row-hammer-the-short-summary/ (accessed on 13 July 2025).
4. Götzfried, J.; Eckert, M.; Schinzel, S.; Müller, T. Cache Attacks on Intel SGX. In Proceedings of the 10th European Workshop on
Systems Security, New York, NY, USA, 23–26 April 2017; EuroSec’17. [CrossRef]
5. Han, Y.; Kim, J. A Novel Covert Channel Attack Using Memory Encryption Engine Cache. In Proceedings of the 56th Annual
Design Automation Conference 2019, Las Vegas, NV, USA, 2–6 June 2019; DAC ’19. [CrossRef]
6. Lantz, D. Detection of Side-Channel Attacks Targeting Intel SGX. Master’s Thesis, Linköping University, Linköping, Sweden, 2021.
7. Miketic, I.; Dhananjay, K.; Salman, E. Covert Channel Communication as an Emerging Security Threat in 2.5D/3D Integrated
Systems. Sensors 2023, 23, 2081. [CrossRef] [PubMed]
8. Cho, H.; Zhang, P.; Kim, D.; Park, J.; Lee, C.H.; Zhao, Z.; Doupé, A.; Ahn, G.J. Prime+Count: Novel Cross-world Covert Channels
on ARM TrustZone. In Proceedings of the 34th Annual Computer Security Applications Conference, San Juan, Puerto Rico, USA,
3–7 December 2018; ACSAC ’18, pp. 441–452. [CrossRef]
9. Li, X.; Tyagi, A. Cross-World Covert Channel on ARM Trustzone through PMU. Sensors 2022, 22, 7354. [CrossRef] [PubMed]
10. x86. x86 and amd64 Instruction Reference. 2025. Available online: https://www.felixcloutier.com/x86/ (accessed on 1 May 2020).
11. Younis, Y.A.; Kifayat, K.; Shi, Q.; Askwith, B. A new prime and probe cache side-channel attack for cloud computing. In
Proceedings of the 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and
Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, Liverpool, UK, 26–28
October 2015; pp. 1718–1724.
12. Kocher, P.; Horn, J.; Fogh, A.; Genkin, D.; Gruss, D.; Haas, W.; Hamburg, M.; Lipp, M.; Mangard, S.; Prescher, T.; et al. Spectre
attacks: Exploiting speculative execution. Commun. ACM 2020, 63, 93–101. [CrossRef]
13. Zhang, X.; Wu, Y. PrefetchX: Cross-Core Channels via Shared Hardware Prefetchers. arXiv 2023, arXiv:2306.11195.
14. Barth, P.; Weiss, F.; Gotsman, A. BandwidthBreach: Exploiting Memory Pipeline Congestion for High-Speed Covert Channels.
arXiv 2023, arXiv:2306.01996.
15. Kumar, A.; Heninger, N. Write+Sync: Durable Covert Channels via Filesystem Flushes. arXiv 2023, arXiv:2312.11501.
16. Schwarz, M.; Lipp, M.; Gruss, D.; Mangard, S. SgxPectre Attacks: Stealing Intel Secrets from SGX Enclaves via Speculative
Execution. In Proceedings of the IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019.
17. Vanderwiel, S.P.; Lilja, D.J. Data prefetch mechanisms. ACM Comput. Surv. (CSUR) 2000, 32, 174–199. [CrossRef]
18. Mittal, S. A survey of recent prefetching techniques for processor caches. ACM Comput. Surv. (CSUR) 2016, 49, 1–35. [CrossRef]
19. Kühn, R.; Mühlig, J.; Teubner, J. How to Be Fast and Not Furious: Looking Under the Hood of CPU Cache Prefetching.
In Proceedings of the 20th International Workshop on Data Management on New Hardware, Santiago, Chile, 10 June 2024.
[CrossRef]
20. Oren, N. A Survey of Prefetching Techniques. Technical report, July 2000. Available online: https://www.academia.edu/download/30738526/oren00survey.pdf (accessed on 15 July 2025).
21. TechOverflow. Advantages and Disadvantages of Hugepages. 2017. Available online: https://techoverflow.net/2017/02/18/advantages-and-disadvantages-of-hugepages/ (accessed on 7 January 2025).
22. Ashwathnarayana, S. Understanding Huge Pages. 2023. Available online: https://www.netdata.cloud/blog/understanding-huge-pages/ (accessed on 7 January 2025).
23. Red Hat. Chapter 9. What Huge Pages Do and How They Are Consumed by Applications; n.d. Available online: https://docs.redhat.com/en/documentation/openshift_container_platform/4.2/html/scalability_and_performance/what-huge-pages-do-and-how-they-are-consumed (accessed on 7 January 2025).
24. Luo, T.; Wang, X.; Hu, J.; Luo, Y.; Wang, Z. Improving TLB performance by increasing hugepage ratio. In Proceedings of
the 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Shenzhen, China, 4–7 May 2015;
pp. 1139–1142.
25. Easyperf. Performance Benefits of Using Huge Pages for Code. 2022. Available online: https://easyperf.net/blog/2022/09/01/Utilizing-Huge-Pages-For-Code (accessed on 7 January 2025).
26. Panwar, A.; Prasad, A.; Gopinath, K. Making huge pages actually useful. In Proceedings of the Twenty-Third International
Conference on Architectural Support for Programming Languages and Operating Systems, Williamsburg, VA, USA, 24–28 March
2018; pp. 679–692.
27. Fan, R. A Comprehensive Guide to Using Huge Pages in Oracle Databases. 2023. Available online: https://rootfan.com/huge-pages-in-oracle/ (accessed on 7 January 2025).
28. Dutta, S.B.; Naghibijouybari, H.; Abu-Ghazaleh, N.; Marquez, A.; Barker, K. Leaky buddies: Cross-component covert channels on
integrated CPU-GPU systems. In Proceedings of the 48th Annual International Symposium on Computer Architecture, Valencia,
Spain, 14–18 June 2021; ISCA ’21, pp. 972–984. [CrossRef]
29. Cook, J.; Drean, J.; Behrens, J.; Yan, M. There’s always a bigger fish: A clarifying analysis of a machine-learning-assisted
side-channel attack. In Proceedings of the 49th Annual International Symposium on Computer Architecture, New York, NY,
USA, 18–22 June 2022; ISCA ’22, pp. 204–217. [CrossRef]
30. Van Bulck, J.; Minkin, M.; Weisse, O.; Genkin, D.; Kasikci, B.; Piessens, F.; Silberstein, M.; Wenisch, T.F.; Yarom, Y.; Strackx, R.
Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution. In Proceedings of the 27th
USENIX Security Symposium, Baltimore, MD, USA, 15–17 August 2018.
31. ARM Limited. ARM Architecture Reference Manual, ARMv8, for ARMv8-A Architecture Profile. 2021. Available online: https://developer.arm.com/documentation/ddi0487/latest (accessed on 5 April 2025).
32. AMD. Huge Pages and Performance Optimization. 2021. Available online: https://docs.amd.com/r/en-US/ug1586-onload-user/Huge-Pages (accessed on 1 May 2025).
33. Li, X.; Tyagi, A. Block-active ADMM to Minimize NMF with Bregman Divergences. Sensors 2023, 23, 7229. [CrossRef] [PubMed]
34. Gao, T.; Sun, S.; Liu, H.; Gao, H. Global Convergence in Neural ODEs: Impact of Activation Functions. In Proceedings of the
Thirteenth International Conference on Learning Representations (ICLR), Singapore, 24–28 April 2025.
35. Gao, T.; Huo, X.; Liu, H.; Gao, H. Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models. Adv. Neural Inf. Process. Syst. (NeurIPS) 2023, 36, 54918–54951.
36. Gao, T.; Liu, H.; Liu, J.; Rajan, H.; Gao, H. A Global Convergence Theory for Deep ReLU Implicit Networks via Over-Parameterization. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual, 25–29 April 2022.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.