462 results sorted by ID
OPTIMSM: FPGA hardware accelerator for Zero-Knowledge MSM
Xander Pottier, Thomas de Ruijter, Jonas Bertels, Wouter Legiest, Michiel Van Beirendonck, Ingrid Verbauwhede
Implementation
The Multi-Scalar Multiplication (MSM) is the main barrier to accelerating Zero-Knowledge applications. In recent years, hardware acceleration of this algorithm on both FPGA and GPU has become a popular research topic and the subject of a multi-million dollar prize competition (ZPrize). This work presents OPTIMSM: Optimized Processing Through Iterative Multi-Scalar Multiplication. This novel accelerator focuses on the acceleration of the MSM algorithm for any Elliptic Curve (EC) by improving...
Improved ML-DSA Hardware Implementation With First Order Masking Countermeasure
Kamal Raj, Prasanna Ravi, Tee Kiah Chia, Anupam Chattopadhyay
Implementation
We present the protected hardware implementation of the Module-Lattice-Based Digital Signature Standard (ML-DSA). ML-DSA is an extension of Dilithium 3.1, which is the winner of the Post Quantum Cryptography (PQC) competition in the digital signature category. The proposed design is based on the existing high-performance Dilithium 3.1 design. We implemented existing Dilithium masking gadgets in hardware, which were only implemented in software. The masking gadgets are integrated with the...
PQNTRU: Acceleration of NTRU-based Schemes via Customized Post-Quantum Processor
Zewen Ye, Junhao Huang, Tianshun Huang, Yudan Bai, Jinze Li, Hao Zhang, Guangyan Li, Donglong Chen, Ray C.C. Cheung, Kejie Huang
Implementation
Post-quantum cryptography (PQC) has rapidly evolved in response to the emergence of quantum computers, with the US National Institute of Standards and Technology (NIST) selecting four finalist algorithms for PQC standardization in 2022, including the Falcon digital signature scheme. The latest round of digital signature schemes introduced Hawk, both based on the NTRU lattice, offering compact signatures, fast generation, and verification suitable for deployment on resource-constrained...
Efficient Boolean-to-Arithmetic Mask Conversion in Hardware
Aein Rezaei Shahmirzadi, Michael Hutter
Implementation
Masking schemes are key in thwarting side-channel attacks due to their robust theoretical foundation. Transitioning from Boolean to arithmetic (B2A) masking is a necessary step in various cryptography schemes, including hash functions, ARX-based ciphers, and lattice-based cryptography. While there exists a significant body of research focusing on B2A software implementations, studies pertaining to hardware implementations are quite limited, with the majority dedicated solely to creating...
The SMAesH dataset
Gaëtan Cassiers, Charles Momin
Implementation
Datasets of side-channel leakage measurements are widely used in research to develop and benchmarking side-channel attack and evaluation methodologies. Compared to using custom and/or one-off datasets, widely-used and publicly available datasets improve research reproducibility and comparability. Further, performing high-quality measurements requires specific equipment and skills, while also taking a significant amount of time. Therefore, using publicly available datasets lowers the barriers...
A Combined Design of 4-PLL-TRNG and 64-bit CDC-7-XPUF on a Zynq-7020 SoC
Oğuz Yayla, Yunus Emre Yılmaz
Implementation
True Random Number Generators (TRNGs) and Physically Unclonable Functions (PUFs) are critical hardware primitives for cryptographic systems, providing randomness and device-specific security. TRNGs require complete randomness, while PUFs rely on consistent, device-unique responses. In this work, both primitives are implemented on a System-on-Chip Field-Programmable Gate Array (SoC FPGA), leveraging the integrated Phase-Locked Loops (PLLs) for robust entropy generation in PLLbased TRNGs. A...
Marian: An Open Source RISC-V Processor with Zvk Vector Cryptography Extensions
Thomas Szymkowiak, Endrit Isufi, Markku-Juhani Saarinen
Implementation
The RISC-V Vector Cryptography Extensions (Zvk) were ratified in 2023 and integrated into the main ISA manuals in 2024. These extensions support high-speed symmetric cryptography (AES, SHA2, SM3, SM4) operating on the vector register file and offer significant performance improvements over scalar cryptography extensions (Zk) due to data parallelism. As a ratified extension, Zvk is supported by compiler toolchains and is already being integrated into popular cryptographic middleware such as...
32-bit and 64-bit CDC-7-XPUF Implementations on a Zynq-7020 SoC
Oğuz Yayla, Yunus Emre Yılmaz
Implementation
Physically (or Physical) Unclonable Functions (PUFs) are basic and useful primitives in designing cryptographic systems. PUFs are designed to facilitate device authentication, secure boot, firmware integrity, and secure communications. To achieve these objectives, PUFs must exhibit both consistent repeatability and instance-specific randomness. The Arbiter PUF (APUF), recognized as the first silicon PUF, is capable of generating a substantial number of secret keys instantaneously based on...
Design and Implementation of a Fast, Platform-Adaptive, AIS-20/31 Compliant PLL-Based True Random Number Generator on a Zynq 7020 SoC FPGA
Oğuz Yayla, Yunus Emre Yılmaz
Implementation
Phase-locked loops (PLLs) integrated within field-programmable gate arrays (FPGAs) or System-on-Chip FPGAs (SoCs) represent a promising approach for generating random numbers. Their widespread deployment, isolated functionality within these devices, and robust entropy, as demonstrated in prior studies, position PLL-based true random number generators (PLL-TRNGs) as highly viable solutions for this purpose. This study explicitly examines PLL-TRNG implementations using the ZC702 Rev1.1...
Scabbard: An Exploratory Study on Hardware Aware Design Choices of Learning with Rounding-based Key Encapsulation Mechanisms
Suparna Kundu, Quinten Norga, Angshuman Karmakar, Shreya Gangopadhyay, Jose Maria Bermudo Mera, Ingrid Verbauwhede
Implementation
Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes can work in different environments under different operational constraints such as memory footprint, silicon area, efficiency, power requirement, etc., such variations in the underlying hard problem are very useful for designers...
A Better Kyber Butterfly for FPGAs
Jonas Bertels, Quinten Norga, Ingrid Verbauwhede
Implementation
Kyber was selected by NIST as a Post-Quantum
Cryptography Key Encapsulation Mechanism standard. This
means that the industry now needs to transition and adopt
these new standards. One of the most demanding operations in
Kyber is the modular arithmetic, making it a suitable target for
optimization. This work offers a novel modular reduction design
with the lowest area on Xilinx FPGA platforms. This novel design,
through K-reduction and LUT-based reduction, utilizes 49 LUTs
and 1 DSP...
Fast Low Level Disk Encryption Using FPGAs
Debrup Chakraborty, Sebati Ghosh, Cuauhtemoc Mancillas Lopez, Palash Sarkar
Secret-key cryptography
A fixed length tweakable enciphering scheme (TES) is the appropriate cryptographic functionality for low level disk encryption. Research on TES over the last two decades have led to a number of proposals many of which have already been implemented using FPGAs. This paper considers the FPGA implementations of two more recent and promising TESs, namely AEZ and FAST. The relevant architectures are described and simulation results on the Xilinx Virtex 5 and Virtex 7 FPGAs are presented. For...
FELIX (XGCD for FALCON): FPGA-based Scalable and Lightweight Accelerator for Large Integer Extended GCD
Sam Coulon, Tianyou Bao, Jiafeng Xie
Implementation
The Extended Greatest Common Divisor (XGCD) computation is a critical component in various cryptographic applications and algorithms, including both pre- and post-quantum cryptosystems. In addition to computing the greatest common divisor (GCD) of two integers, the XGCD also produces Bezout coefficients $b_a$ and $b_b$ which satisfy $\mathrm{GCD}(a,b) = a\times b_a + b\times b_b$. In particular, computing the XGCD for large integers is of significant interest. Most recently, XGCD computation...
MSMAC: Accelerating Multi-Scalar Multiplication for Zero-Knowledge Proof
Pengcheng Qiu, Guiming Wu, Tingqiang Chu, Changzheng Wei, Runzhou Luo, Ying Yan, Wei Wang, Hui Zhang
Implementation
Multi-scalar multiplication (MSM) is the most computation-intensive part in proof generation of Zero-knowledge proof (ZKP). In this paper, we propose MSMAC, an FPGA accelerator for large-scale MSM. MSMAC adopts a specially designed Instruction Set Architecture (ISA) for MSM and optimizes pipelined Point Addition Unit (PAU) with hybrid Karatsuba multiplier. Moreover, a runtime system is proposed to split MSM tasks with the optimal sub-task size and orchestrate execution of Processing Elements...
ECO-CRYSTALS: Efficient Cryptography CRYSTALS on Standard RISC-V ISA
Xinyi Ji, Jiankuo Dong, Junhao Huang, Zhijian Yuan, Wangchen Dai, Fu Xiao, Jingqiang Lin
Implementation
The field of post-quantum cryptography (PQC) is continuously evolving. Many researchers are exploring efficient PQC implementation on various platforms, including x86, ARM, FPGA, GPU, etc. In this paper, we present an Efficient CryptOgraphy CRYSTALS (ECO-CRYSTALS) implementation on standard 64-bit RISC-V Instruction Set Architecture (ISA). The target schemes are two winners of the National Institute of Standards and Technology (NIST) PQC competition: CRYSTALS-Kyber and CRYSTALS-Dilithium,...
Rudraksh: A compact and lightweight post-quantum key-encapsulation mechanism
Suparna Kundu, Archisman Ghosh, Angshuman Karmakar, Shreyas Sen, Ingrid Verbauwhede
Public-key cryptography
Resource-constrained devices such as wireless sensors and Internet of Things (IoT) devices have become ubiquitous in our digital ecosystem. These devices generate and handle a major part of our digital data. In the face of the impending threat of quantum computers on our public-key infrastructure, it is impossible to imagine the security and privacy of our digital world without integrating post-quantum cryptography (PQC) into these devices. Usually, due to the resource constraints of these...
A Fast and Efficient SIKE Co-Design: Coarse-Grained Reconfigurable Accelerators with Custom RISC-V Microcontroller on FPGA
Jing Tian, Bo Wu, Lang Feng, Haochen Zhang, Zhongfeng Wang
Implementation
This paper proposes a fast and efficient FPGA-based hardware-software co-design for the supersingular isogeny key encapsulation (SIKE) protocol controlled by a custom RISC-V processor. Firstly, we highly optimize the core unit, the polynomial-based field arithmetic logic unit (FALU), with the proposed fast convolution-like multiplier (FCM) to significantly reduce the resource consumption while still maintaining low latency and constant time for all the four SIKE parameters. Secondly, we pack...
Reading It like an Open Book: Single-trace Blind Side-channel Attacks on Garbled Circuit Frameworks
Sirui Shen, Chenglu Jin
Attacks and cryptanalysis
Garbled circuits (GC) are a secure multiparty computation protocol that enables two parties to jointly compute a function using their private data without revealing it to each other. While garbled circuits are proven secure at the protocol level, implementations can still be vulnerable to side-channel attacks. Recently, side-channel analysis of GC implementations has garnered significant interest from researchers.
We investigate popular open-source GC frameworks and discover that the AES...
Side-Channel and Fault Resistant ASCON Implementation: A Detailed Hardware Evaluation (Extended Version)
Aneesh Kandi, Anubhab Baksi, Peizhou Gan, Sylvain Guilley, Tomáš Gerlich, Jakub Breier, Anupam Chattopadhyay, Ritu Ranjan Shrivastwa, Zdeněk Martinásek, Shivam Bhasin
Implementation
In this work, we present various hardware implementations for the lightweight cipher ASCON, which was recently selected as the winner of the NIST organized Lightweight Cryptography (LWC) competition. We cover encryption + tag generation and decryption + tag verification for the ASCON AEAD and also the ASCON hash function. On top of the usual (unprotected) implementation, we present side-channel protection (threshold countermeasure) and triplication/majority-based fault protection. To the...
Time Sharing - A Novel Approach to Low-Latency Masking
Dilip Kumar S. V., Siemen Dhooghe, Josep Balasch, Benedikt Gierlichs, Ingrid Verbauwhede
Implementation
We present a novel approach to small area and low-latency first-order masking in hardware. The core idea is to separate the processing of shares in time in order to achieve non-completeness. Resulting circuits are proven first-order glitch-extended PINI secure. This means the method can be straightforwardly applied to mask arbitrary functions without constraints which the designer must take care of. Furthermore we show that an implementation can benefit from optimization through EDA tools...
Glitch-Stopping Circuits: Hardware Secure Masking without Registers
Zhenda Zhang, Svetla Nikova, Ventzislav Nikov
Implementation
Masking is one of the most popular countermeasures to protect implementations against power and electromagnetic side channel attacks, because it offers provable security. Masking has been shown secure against d-threshold probing adversaries by Ishai et al. at CRYPTO'03, but this adversary's model doesn't consider any physical hardware defaults and thus such masking schemes were shown to be still vulnerable when implemented as hardware circuits. To addressed these limitations glitch-extended...
An NVMe-based Secure Computing Platform with FPGA-based TFHE Accelerator
Yoshihiro Ohba, Tomoya Sanuki, Claude Gravel, Kentaro Mihara
Implementation
In this paper, we introduce a new approach to secure computing by implementing a platform that utilizes an NVMe-based system with an FPGA-based Torus FHE accelerator, SSD, and middleware on the host-side. Our platform is the first of its kind to offer complete secure computing capabilities for TFHE using an FPGA-based accelerator. We have defined secure computing instructions to evaluate 14-bit to 14-bit functions using TFHE, and our middleware allows for communication of ciphertexts, keys,...
Vision Mark-32: ZK-Friendly Hash Function Over Binary Tower Fields
Tomer Ashur, Mohammad Mahzoun, Jim Posen, Danilo Šijačić
Implementation
Zero-knowledge proof systems are widely used in different applications on the Internet. Among zero-knowledge proof systems, SNARKs are a popular choice because of their fast verification time and small proof size. The efficiency of zero-knowledge systems is crucial for usability, resulting in the development of so-called arithmetization-oriented ciphers. In this work, we introduce Vision Mark-32, a modified instance of Vision defined over binary tower fields, with an optimized number of...
BackMon: IC Backside Tamper Detection using On-Chip Impedance Monitoring
Tahoura Mosavirik, Shahin Tajik
Implementation
The expansion of flip-chip technologies and a lack of backside protection make the integrated circuit (IC) vulnerable to certain classes of physical attacks mounted from the IC's backside. Laser-assisted probing, electromagnetic, and body-biasing injection attacks are examples of such attacks. Unfortunately, there are few countermeasures proposed in the literature, and none are available commercially. Those that do exist are not only expensive but also incompatible with current IC...
Number-Theoretic Transform Architecture for Fully Homomorphic Encryption from Hypercube Topology
Jingwei Hu, Yuhong Fang, Wangchen Dai
Implementation
This paper introduces a high-performance and scalable hardware architecture designed for the Number-Theoretic Transform (NTT), a fundamental component extensively utilized in lattice-based encryption and fully homomorphic encryption schemes.
The underlying rationale behind this research is to harness the advantages of the hypercube topology. This topology serves to significantly diminish the volume of data exchanges required during each iteration of the NTT, reducing it to a complexity of...
Generalized Feistel Ciphers for Efficient Prime Field Masking - Full Version
Lorenzo Grassi, Loïc Masure, Pierrick Méaux, Thorben Moos, François-Xavier Standaert
Secret-key cryptography
A recent work from Eurocrypt 2023 suggests that prime-field masking has excellent potential to improve the efficiency vs. security tradeoff of masked implementations against side-channel attacks, especially in contexts where physical leakages show low noise. We pick up on the main open challenge that this seed result leads to, namely the design of an optimized prime cipher able to take advantage of this potential. Given the interest of tweakable block ciphers with cheap inverses in many...
Hardware Acceleration of the Prime-Factor and Rader NTT for BGV Fully Homomorphic Encryption
David Du Pont, Jonas Bertels, Furkan Turan, Michiel Van Beirendonck, Ingrid Verbauwhede
Implementation
Fully Homomorphic Encryption (FHE) enables computation on encrypted data, holding immense potential for enhancing data privacy and security in various applications. Presently, FHE adoption is hindered by slow computation times, caused by data being encrypted into large polynomials. Optimized FHE libraries and hardware acceleration are emerging to tackle this performance bottleneck. Often, these libraries implement the Number Theoretic Transform (NTT) algorithm for efficient polynomial...
1/0 Shades of UC: Photonic Side-Channel Analysis of Universal Circuits
Dev M. Mehta, Mohammad Hashemi, Domenic Forte, Shahin Tajik, Fatemeh Ganji
Attacks and cryptanalysis
A universal circuit (UC) can be thought of as a programmable circuit that can simulate any circuit up to a certain size by specifying its secret configuration bits. UCs have been incorporated into various applications, such as private function evaluation (PFE). Recently, studies have attempted to formalize the concept of semiconductor intellectual property (IP) protection in the context of UCs. This is despite the observations made in theory and practice that, in reality, the adversary may...
SDitH in Hardware
Sanjay Deshpande, James Howe, Jakub Szefer, Dongze Yue
Implementation
This work presents the first hardware realisation of the Syndrome-Decoding-in-the-Head (SDitH) signature scheme, which is a candidate in the NIST PQC process for standardising post-quantum secure digital signature schemes. SDitH's hardness is based on conservative code-based assumptions, and it uses the Multi-Party-Computation-in-the-Head (MPCitH) construction.
This is the first hardware design of a code-based signature scheme based on traditional decoding problems and only the second for...
Blink: Breaking Lattice-Based Schemes Implemented in Parallel with Chosen-Ciphertext Attack
Jian Wang, Weiqiong Cao, Hua Chen, Haoyuan Li
Attacks and cryptanalysis
As the message recovery-based attack poses a serious threat to lattice-based schemes, we conducted a study on the side-channel secu- rity of parallel implementations of lattice-based key encapsulation mech- anisms. Initially, we developed a power model to describe the power leakage during message encoding. Utilizing this power model, we pro- pose a multi-ciphertext message recovery attack, which can retrieve the required messages for a chosen ciphertext attack through a suitable mes- sage...
In-depth Correlation Power Analysis Attacks on a Hardware Implementation of CRYSTALS-Dilithium
Huaxin Wang, Yiwen Gao, Yuejun Liu, Qian Zhang, Yongbin Zhou
Attacks and cryptanalysis
During the standardisation process of post-quantum cryptography, NIST encourages research on side-channel analysis for candidate schemes. As the recommended lattice signature scheme, CRYSTALS-Dilithium, when implemented on hardware, has only been subjected to the side-channel attack presented by Steffen et al. in IACR ePrint 2022. This attack is not complete and requires excessive traces. Therefore, we investigate the leakage of an FPGA (Kintex7) implementation of CRYSTALS-Dilithium using...
Falcon Takes Off - A Hardware Implementation of the Falcon Signature Scheme
Michael Schmid, Dorian Amiet, Jan Wendler, Paul Zbinden, Tao Wei
Implementation
Falcon is one out of three post-quantum signature schemes which have been selected for standardization by NIST in July 2022. To the best of our knowledge, Falcon is the only selected algorithm that does not yet have a publicly reported hardware description that performs signing or key generation. The reason might be that the Falcon signature and key generation algorithms do not fit well in hardware due to the use of floating-point numbers and recursive functions. This publication describes...
Accelerating Polynomial Multiplication for RLWE using Pipelined FFT
Neil Thanawala, Hamid Nejatollahi, Nikil Dutt
Implementation
The evolution of quantum algorithms threatens to break public key cryptography in polynomial time. The development of quantum-resistant algorithms for the post-quantum era has seen a significant growth in the field of
post quantum cryptography (PQC). Polynomial multiplication is the core of
Ring Learning with Error (RLWE) lattice based cryptography (LBC) which
is one of the most promising PQC candidates. In this work, we present the
design of fast and energy-efficient pipelined Number...
Aloha-HE: A Low-Area Hardware Accelerator for Client-Side Operations in Homomorphic Encryption
Florian Krieger, Florian Hirner, Ahmet Can Mert, Sujoy Sinha Roy
Implementation
Homomorphic encryption (HE) has gained broad attention in recent years as it allows computations on encrypted data enabling secure cloud computing. Deploying HE presents a notable challenge since it introduces a performance overhead by orders of magnitude. Hence, most works target accelerating server-side operations on hardware platforms, while little attention has been given to client-side operations. In this paper, we present a novel design methodology to implement and accelerate the...
Who Watches the Watchers: Attacking Glitch Detection Circuits
Amund Askeland, Svetla Nikova, Ventzislav Nikov
Attacks and cryptanalysis
Over the last decades, fault injection attacks have been demonstrated to be an effective method for breaking the security of electronic devices. Some types of fault injection attacks, like clock and voltage glitching, require very few resources by the attacker and are practical and simple to execute. A cost-effective countermeasure against these attacks is the use of a detector circuit which detects timing violations - the underlying effect that glitch attacks rely on. In this paper, we take...
Designing Efficient and Flexible NTT Accelerators
Ahmet MALAL
Implementation
The Number Theoretic Transform (NTT) is a powerful mathematical tool with a wide range of applications in various fields, including signal processing, cryptography, and error correction codes. In recent years, there has been a growing interest in efficiently implementing the NTT on hardware platforms for lattice-based cryptography within the context of NIST's Post-Quantum Cryptography (PQC) competition. The implementation of NTT in cryptography stands as a pivotal advancement,...
A Black Box Attack Using Side Channel Analysis and Hardware Trojans
Raja Adhithan Radhakrishnan
Attacks and cryptanalysis
The emergence of hardware trojans as significant threats in
various aspects of hardware design, including Firmware, open-source IP,
and PCB design, has raised serious concerns. Simultaneously, AI technologies have been employed to simplify the complexity of Side Channel Analysis (SCA) attacks. Due to the increasing risk posed by these
threats, it becomes essential to test hardware by considering all possible
attack vectors. This paper aims to propose a black box attack using...
StaTI: Protecting against Fault Attacks Using Stable Threshold Implementations
Siemen Dhooghe, Artemii Ovchinnikov, Dilara Toprakhisar
Secret-key cryptography
Fault attacks impose a serious threat against the practical implementations of cryptographic algorithms. Statistical Ineffective Fault Attacks (SIFA), exploiting the dependency between the secret data and the fault propagation overcame many of the known countermeasures. Later, several countermeasures have been proposed to tackle this attack using error detection methods. However, the efficiency of the countermeasures, in part governed by the number of error checks, still remains a...
Threshold Implementations with Non-Uniform Inputs
Siemen Dhooghe, Artemii Ovchinnikov
Implementation
Modern block ciphers designed for hardware and masked with Threshold Implementations (TIs) provide provable security against first-order attacks. However, the application of TIs leaves designers to deal with a trade-off between its security and its cost, for example, the process to generate its required random bits. This generation cost comes with an increased overhead in terms of area and latency. Decreasing the number of random bits for the masking allows to reduce the aforementioned...
Efficient Hardware RNS Decomposition for Post-Quantum Signature Scheme FALCON
Samuel Coulon, Pengzhou He, Tianyou Bao, Jiafeng Xie
Implementation
The recently announced National Institute of Standards and Technology (NIST) Post-quantum cryptography (PQC) third-round standardization process has released its candidates to be standardized and Falcon is one of them. On the other hand, however, very few hardware implementation works for Falcon have been released due to its very complicated computation procedure and intensive complexity. With this background, in this paper, we propose an efficient hardware structure to implement residue...
Whipping the MAYO Signature Scheme using Hardware Platforms
Florian Hirner, Michael Streibl, Florian Krieger, Ahmet Can Mert, Sujoy Sinha Roy
Implementation
NIST issued a new call in 2023 to diversify the portfolio of quantum-resistant digital signature schemes since the current portfolio relies on lattice problems. The MAYO scheme, which builds on the Unbalanced Oil and Vinegar (UOV) problem, is a promising candidate for this new call. MAYO introduces emulsifier maps and a novel 'whipping' technique to significantly reduce the key sizes compared to previous UOV schemes.
This paper provides a comprehensive analysis of the implementation...
A Note on ``Authenticated Key Agreement Protocol for Secure Communication Establishment in Vehicle-to-Grid Environment With FPGA Implementation''
Zhengjun Cao, Lihua Liu
Attacks and cryptanalysis
We show that the key agreement scheme [IEEE Trans. Veh. Technol. 71(4): 3470-3479, 2022] fails to keep user anonymity, not as claimed.
PicoEMP: A Low-Cost EMFI Platform Compared to BBI and Voltage Fault Injection using TDC and External VCC Measurements
Colin O'Flynn
Implementation
Electromagnetic Fault Injection (EMFI) has been demonstrated to be useful for both academic and industrial research. Due to the dangerous voltages involved, most work is done with commercial tools. This paper introduces a safety-focused low-cost and open-source design that can be built for less than \$50 using only off-the-shelf parts.
The paper also introduces an iCE40 based Time-to-Digital Converter (TDC), which is used to visualize the glitch inserted by the EMFI tool. This...
HaMAYO: A Fault-Tolerant Reconfigurable Hardware Implementation of the MAYO Signature Scheme
Oussama Sayari, Soundes Marzougui, Thomas Aulbach, Juliane Krämer, Jean-Pierre Seifert
Implementation
MAYO is a topical modification of the established multivariate signature scheme UOV. Signer and Verifier locally enlarge the public key map, such that the dimension of the oil space and therefore, the parameter sizes in general, can be reduced. This significantly reduces the public key size while maintaining the appealing properties of UOV, like short signatures and fast verification. Therefore, MAYO is considered as an attractive candidate in the NIST call for additional digital signatures...
Randomness Generation for Secure Hardware Masking - Unrolled Trivium to the Rescue
Gaëtan Cassiers, Loïc Masure, Charles Momin, Thorben Moos, Amir Moradi, François-Xavier Standaert
Implementation
Masking is a prominent strategy to protect cryptographic implementations against side-channel analysis. Its popularity arises from the exponential security gains that can be achieved for (approximately) quadratic resource utilization. Many variants of the countermeasure tailored for different optimization goals have been proposed. The common denominator among all of them is the implicit demand for robust and high entropy randomness. Simply assuming that uniformly distributed random bits are...
All You Need Is Fault: Zero-Value Attacks on AES and a New $\lambda$-Detection M&M
Haruka Hirata, Daiki Miyahara, Victor Arribas, Yang Li, Noriyuki Miura, Svetla Nikova, Kazuo Sakiyama
Attacks and cryptanalysis
Deploying cryptography on embedded systems requires security against physical attacks. At CHES 2019, M&M was proposed as a combined countermeasure applying masking against SCAs and information-theoretic MAC tags against FAs.
In this paper, we show that one of the protected AES implementations in the M&M paper is vulnerable to a zero-value SIFA2-like attack. A practical attack is demonstrated on an ASIC board.
We propose two versions of the attack: the first follows the SIFA approach to...
Mask Compression: High-Order Masking on Memory-Constrained Devices
Markku-Juhani O. Saarinen, Mélissa Rossi
Implementation
Masking is a well-studied method for achieving provable security against side-channel attacks. In masking, each sensitive variable is split into $d$ randomized shares, and computations are performed with those shares. In addition to the computational overhead of masked arithmetic, masking also has a storage cost, increasing the requirements for working memory and secret key storage proportionally with $d$.
In this work, we introduce mask compression. This conceptually simple technique is...
A Side-Channel Attack on a Masked Hardware Implementation of CRYSTALS-Kyber
Yanning Ji, Elena Dubrova
Attacks and cryptanalysis
NIST has recently selected CRYSTALS-Kyber as a new public key encryption and key establishment algorithm to be standardized. This makes it important to evaluate the resistance of CRYSTALS-Kyber implementations to side-channel attacks. Software implementations of CRYSTALS-Kyber have already been thoroughly analysed. The discovered vulnerabilities helped improve the subsequently released versions and promoted stronger countermeasures against side-channel attacks. In this paper, we present the...
BALoo: First and Efficient Countermeasure dedicated to Persistent Fault Attacks
Pierre-Antoine Tissot, Lilian Bossuet, Vincent Grosso
Implementation
Persistent fault analysis is a novel and efficient cryptanalysis method. The persistent fault attacks take advantage of a persistent fault injected in a non-volatile memory, then present on the device until the reboot of the device. Contrary to classical physical fault injection, where differential analysis can be performed, persistent fault analysis requires new analyses and dedicated countermeasures. Persistent fault analysis requires a persistent fault injected in the S-box such that the...
Stealthy Logic Misuse for Power Analysis Attacks in Multi-Tenant FPGAs (Extended Version)
Vincent Meyers, Dennis R. E. Gnad, Nguyen Minh Dang, Falk Schellenberg, Amir Moradi, Mehdi B. Tahoori
Implementation
FPGAs have been used in the cloud since several years, as accelerators for various workloads such as machine learning, database processes and security tasks. As for other cloud services, a highly desired feature is virtualization in which multiple tenants can share a single FPGA to increase utilization and by that efficiency. By solely using standard FPGA logic in the untrusted tenant, on-chip logic sensors allow remote power analysis side-channel and covert channel attacks on the victim...
A Hardware-Software Co-Design for the Discrete Gaussian Sampling of FALCON Digital Signature
Emre Karabulut, Aydin Aysu
Implementation
Sampling random values from a discrete Gaussian distribution with high precision is a major and computationally intensive operation of upcoming or existing cryptographic standards. FALCON is one such algorithm that the National Institute of Standards and Technology chose to standardize as a next-generation, quantum-secure digital signature algorithm. The discrete Gaussian sampling of FALCON has both flexibility and efficiency needs—it constitutes 72% of total signature generation in...
Automated Generation of Masked Nonlinear Components: From Lookup Tables to Private Circuits
Lixuan Wu, Yanhong Fan, Bart Preneel, Weijia Wang, Meiqin Wang
Implementation
Masking is considered to be an essential defense mechanism against side-channel attacks, but it is challenging to be adopted for hardware cryptographic implementations, especially for high security orders. Recently, Knichel et al. proposed an automated tool called AGEMA that enables the generation of masked implementations in hardware for arbitrary security orders using composable gadgets. This accelerates the construction and practical application of masking schemes. This article proposes a...
Securing IoT Devices with Fast and Energy Efficient Implementation of PRIDE and PRESENT Ciphers
Vijay Dahiphale, Hrishikesh Raut, Gaurav Bansod, Devendra Dahiphale
Implementation
The rise of low-power, cost-efficient internet-connected devices has led to a need for lightweight cryptography. The lightweight block cipher PRIDE, designed by Martin R. Albrecht, is one of the most efficient ciphers designed for IoT-constrained environments. It is useful for connected devices, requires fewer resources to implement, and has high performance. PRIDE is a software-oriented lightweight cipher optimized for microcontrollers. This paper focuses on the FPGA implementation of the...
Hardware Acceleration of FHEW
Jonas Bertels, Michiel Van Beirendonck, Furkan Turan, Ingrid Verbauwhede
Implementation
The magic of Fully Homomorphic Encryption (FHE) is that it allows operations on encrypted data without decryption. Unfortunately, the slow computation time limits their adoption. The slow computation time results from the vast memory requirements (64Kbits per ciphertext), a bootstrapping key of 1.3 GB, and sizeable computational overhead (10240 NTTs, each NTT requiring 5120 32-bit multiplications). We accelerate the FHEW bootstrapping in hardware on a high-end U280 FPGA.
To reduce the...
HLG: A framework for computing graphs in Residue Number System and its application in Fully Homomorphic Encryption
Shuang Wu, Chunhuan Zhao, Ye Yuan, Shuzhou Sun, Jie Li, Yamin Liu
Implementation
Implementation of Fully Homomorphic Encryption (FHE) is challenging. Especially when considering hardware acceleration, the major performance bottleneck is data transfer. Here we propose an algebraic framework called Heterogenous Lattice Graph (HLG) to build and process computing graphs in Residue Number System (RNS), which is the basis of high performance implementation of mainstream FHE algorithms.
There are three main design goals for HLG framework:
• Design a dedicated IR (HLG...
CPU to FPGA Power Covert Channel in FPGA-SoCs
Mathieu Gross, Robert Kunzelmann, Georg Sigl
Attacks and cryptanalysis
FPGA-SoCs are a popular platform for accelerating a wide
range of applications due to their performance and flexibility. From a
security point of view, these systems have been shown to be vulnerable
to various attacks, especially side-channel attacks where an attacker can
obtain the secret key of a cryptographic algorithm via laboratory mea-
surement equipment or even remotely with sensors implemented inside
the FPGA logic itself. Fortunately, a variety of countermeasures on...
AI Attacks AI: Recovering Neural Network architecture from NVDLA using AI-assisted Side Channel Attack
Naina Gupta, Arpan Jati, Anupam Chattopadhyay
Attacks and cryptanalysis
During the last decade, there has been a stunning progress in the domain of AI with adoption in both safety-critical and security-critical applications. A key requirement for this is highly trained Machine Learning (ML) models, which are valuable Intellectual Property (IP) of the respective organizations. Naturally, these models have become targets for model recovery attacks through side-channel leakage. However, majority of the attacks reported in literature are either on simple embedded...
Generating Secure Hardware using ChatGPT Resistant to CWEs
Madhav Nair, Rajat Sadhukhan, Debdeep Mukhopadhyay
Applications
The development of Artificial Intelligence (AI) based systems to automatically generate hardware systems has gained an impulse that aims to accelerate the hardware design cycle with no human intervention. Recently, the striking AI-based system ChatGPT from OpenAI has achieved a momentous headline and has gone viral within a short span of time since its launch. This chatbot has the capability to interactively communicate with the designers through a prompt to generate software and hardware...
Gate-Level Masking of Streamlined NTRU Prime Decapsulation in Hardware
Georg Land, Adrian Marotzke, Jan Richter-Brockmann, Tim Güneysu
Implementation
Streamlined NTRU Prime is a lattice-based Key Encapsulation Mechanism
(KEM) that is, together with X25519, currently the default algorithm in OpenSSH 9. Being based on lattice assumptions, it is assumed to be secure also against attackers with access to large-scale quantum computers. While Post-Quantum Cryptography (PQC) schemes have been subject to extensive research in the recent years, challenges remain with respect to protection mechanisms against attackers that have additional...
Bake It Till You Make It: Heat-induced Power Leakage from Masked Neural Networks
Dev M. Mehta, Mohammad Hashemi, David S. Koblah, Domenic Forte, Fatemeh Ganji
Applications
Masking has become one of the most effective approaches for securing hardware designs against side-channel attacks. Regardless of the effort put into correctly implementing masking schemes on a field-programmable gate array (FPGA), leakage can be unexpectedly observed. This is due to the fact that the assumption underlying all masked designs, i.e., the leakages of different shares are independent of each other, may no longer hold in practice. In this regard, extreme temperatures have been...
Oil and Vinegar: Modern Parameters and Implementations
Ward Beullens, Ming-Shing Chen, Shih-Hao Hung, Matthias J. Kannwischer, Bo-Yuan Peng, Cheng-Jhih Shih, Bo-Yin Yang
Implementation
Two multivariate digital signature schemes, Rainbow and GeMSS, made it into the third round of the NIST PQC competition. However, either made its way to being a standard due to devastating attacks (in one case by Beullens, the other by Tao, Petzoldt, and Ding). How should multivariate cryptography recover from this blow? We propose that, rather than trying to fix Rainbow and HFEv- by introducing countermeasures, the better approach is to return to the classical Oil and Vinegar scheme. We...
Side-Channel Resistant Implementation Using Arbiter PUF
Raja Adhithan RadhaKrishnan
Implementation
The goals of cryptography are achieved using mathematically strong crypto-algorithms, which are adopted for securing data and
communication. Even though the algorithms are mathematically
secure, the implementation of these algorithms may be vulnerable to
side-channel attacks such as timing and power analysis attacks. One
of the effective countermeasures against such attacks is Threshold Implementation(TI). However, TI realization in crypto-device introduces
hardware complexity, so it...
RDS: FPGA Routing Delay Sensors for Effective Remote Power Analysis Attacks
David Spielmann, Ognjen Glamocanin, Mirjana Stojilovic
Implementation
State-of-the-art sensors for measuring FPGA voltage fluctuations are time-to-digital converters (TDCs). They allow detecting voltage fluctuations in the order of a few nanoseconds. The key building component of a TDC is a delay line, typically implemented as a chain of fast carry propagation multiplexers. In FPGAs, the fast carry chains are constrained to dedicated logic and routing, and need to be routed strictly vertically. In this work, we present an alternative approach to designing...
A Closer Look at the Chaotic Ring Oscillators based TRNG Design
Shuqin Su, Bohan Yang, Vladimir Rožić, Mingyuan Yang, Min Zhu, Shaojun Wei, Leibo Liu
Implementation
TRNG is an essential component for security applications. A vulnerable TRNG could be exploited to facilitate potential attacks or be related to a reduced key space, and eventually results in a compromised cryptographic system. A digital FIRO-/GARO-based TRNG with high throughput and high entropy rate was introduced by Jovan Dj. Golić (TC’06). However, the fact that periodic oscillation is a main failure of FIRO-/GARO-based TRNGs is noticed in the paper (Markus Dichtl, ePrint’15). We verify...
A Holistic Approach Towards Side-Channel Secure Fixed-Weight Polynomial Sampling
Markus Krausz, Georg Land, Jan Richter-Brockmann, Tim Güneysu
Implementation
The sampling of polynomials with fixed weight is a procedure required by round-4 Key Encapsulation Mechanisms (KEMs) for Post-Quantum Cryptography (PQC) standardization (BIKE, HQC, McEliece) as well as NTRU, Streamlined NTRU Prime, and NTRU LPRrime. Recent attacks have shown in this context that side-channel leakage of sampling methods can be exploited for key recoveries. While countermeasures regarding such timing attacks have already been presented, still, there is
no comprehensive work...
Area-time Efficient Implementation of NIST Lightweight Hash Functions Targeting IoT Applications
Safiullah Khan, Wai-Kong Lee, Angshuman Karmakar, Jose Maria Bermudo Mera, Abdul Majeed, Seong Oun Hwang
Implementation
To mitigate cybersecurity breaches, secure communication is crucial for the Internet of Things (IoT) environment. Data integrity is one of the most significant characteristics of security, which can be achieved by employing cryptographic hash functions. In view of the demand from IoT applications, the National Institute of Standards and Technology (NIST) initiated a standardization process for lightweight hash functions. This work presents field-programmable gate array (FPGA) implementations...
CycloneNTT: An NTT/FFT Architecture Using Quasi-Streaming of Large Datasets on DDR- and HBM-based FPGA Platforms
Kaveh Aasaraai, Emanuele Cesena, Rahul Maganti, Nicolas Stalder, Javier Varela, Kevin Bowers
Implementation
Number-Theoretic-Transform (NTT) is a variation of Fast-Fourier-Transform (FFT) on finite fields. NTT is being increasingly used in blockchain and zero-knowledge proof applications. Although FFT and NTT are widely studied for FPGA implementation, we believe CycloneNTT is the first to solve this problem for large data sets ($\ge2^{24}$, 64-bit numbers) that would not fit in the on-chip RAM. CycloneNTT uses a state-of-the-art butterfly network and maps the dataflow to hybrid FIFOs composed of...
FPT: a Fixed-Point Accelerator for Torus Fully Homomorphic Encryption
Michiel Van Beirendonck, Jan-Pieter D'Anvers, Furkan Turan, Ingrid Verbauwhede
Implementation
Fully Homomorphic Encryption (FHE) is a technique that allows computation on encrypted data. It has the potential to drastically change privacy considerations in the cloud, but high computational and memory overheads are preventing its broad adoption. TFHE is a promising Torus-based FHE scheme that heavily relies on bootstrapping, the noise-removal tool invoked after each encrypted logical/arithmetical operation.
We present FPT, a Fixed-Point FPGA accelerator for TFHE bootstrapping. FPT...
A Masked Pure-Hardware Implementation of Kyber Cryptographic Algorithm
Tendayi Kamucheka, Alexander Nelson, David Andrews, Miaoqing Huang
Security against side-channel assisted attacks remains a focus and concern in the ongoing standardization process of quantum-computer-resistant cryptography algorithms. Hiding and masking techniques are currently under investigation to protect the Post-Quantum Cryptography (PQC) algorithms in the NIST PQC standardization process against sophisticated side-channel attacks. Between hiding and masking, masking is emerging as a popular option due to its simplicity and minimized cost of...
A Side-Channel Attack on a Hardware Implementation of CRYSTALS-Kyber
Yanning Ji, Ruize Wang, Kalle Ngo, Elena Dubrova, Linus Backlund
Attacks and cryptanalysis
CRYSTALS-Kyber has been recently selected by the NIST as a new public-key encryption and key-establishment algorithm to be standardized. This makes it important to assess how well CRYSTALS-Kyber implementations withstand side-channel attacks. Software implementations of CRYSTALS-Kyber have been already analyzed and the discovered vulnerabilities were patched in the subsequently released versions. In this paper, we present a profiling side-channel attack on a hardware implementation of...
Towards Automating Cryptographic Hardware Implementations: a Case Study of HQC
Carlos Aguilar-Melchor, Jean-Christophe Deneuville, Arnaud Dion, James Howe, Romain Malmain, Vincent Migliore, Mamuri Nawan, Kashif Nawaz
Implementation
While hardware implementations allow the production of highly efficient and performance oriented designs, exploiting features such as parallelization, their longer time to code and implement often bottlenecks rapid prototyping. On the other hand, high-level synthesis (HLS) tools allow for faster experimentation of software code to a hardware platform while demonstrating a reasonable extrapolation of the expected hardware behavior. In this work, we attempt to show a rapid, fast prototyping of...
Side-Channel Attack Countermeasures Based On Clock Randomization Have a Fundamental Flaw
Martin Brisfors, Michail Moraitis, Elena Dubrova
Implementation
Clock randomization is one of the oldest countermeasures against side-channel attacks. Various implementations have been presented in the past, along with positive security evaluations. However, in this paper we show that it is possible to break countermeasures based on a randomized clock by sampling side-channel measurements at a frequency much higher than the encryption clock, synchronizing the traces with pre-processing, and targeting the beginning of the encryption.
We demonstrate a...
Breaking and Protecting the Crystal: Side-Channel Analysis of Dilithium in Hardware
Hauke Steffen, Georg Land, Lucie Kogelheide, Tim Güneysu
Attacks and cryptanalysis
The lattice-based CRYSTALS-Dilithium signature scheme has been selected for standardization by the NIST. As part of the selection process, a large number of implementations for platforms like x86, ARM Cortex-M4, or – on the hardware side – Xilinx Artix-7 have been presented and discussed by experts. While software implementations have been subject to side-channel analysis with several attacks being published, an analysis of Dilithium hardware implementations and their peculiarities has not...
FPGA Acceleration of Multi-Scalar Multiplication: CycloneMSM
Kaveh Aasaraai, Don Beaver, Emanuele Cesena, Rahul Maganti, Nicolas Stalder, Javier Varela
Implementation
Multi-Scalar Multiplication (MSM) on elliptic curves is one of the primitives and bottlenecks at the core of many zero-knowledge proof systems. Speeding up MSM typically results in faster proof generation, which in turn makes ZK-based applications practical.
We focus on accelerating large MSM on FPGA, and we present speed records for $\texttt{BLS12-377}$ on FPGA: 5.66s for $N=2^{26}$, sub-second for $N=2^{22}$.
We developed a fully-pipelined curve adder in extended Twisted Edwards...
Correlation Electromagnetic Analysis on an FPGA Implementation of CRYSTALS-Kyber
Rafael Carrera Rodriguez, Florent Bruguier, Emanuele Valea, Pascal Benoit
Attacks and cryptanalysis
Post-quantum cryptography represents a category of cryptosystems resistant to quantum algorithms. Recently, NIST launched a process to standardize one or more of such algorithms in the key encapsulation mechanism and signature categories. Such schemes are under the scrutiny of their mathematical security, but they are not side-channel secure at the algorithm level. That is why their side-channel vulnerabilities must be assessed by the research community. In this paper, we present a...
Exploring RNS for Isogeny-based Cryptography
David Jacquemin, Ahmet Can Mert, Sujoy Sinha Roy
Implementation
Isogeny-based cryptography suffers from a long-running time due to its requirement of a great amount of large integer arithmetic. The Residue Number System (RNS) can compensate for that drawback by making computation more efficient via parallelism. However, performing a modular reduction by a large prime which is not part of the RNS base is very expensive. In this paper, we propose a new fast and efficient modular reduction algorithm using RNS. Also, we evaluate our modular reduction method...
Second-Order Low-Randomness $d+1$ Hardware Sharing of the AES
Siemen Dhooghe, Aein Rezaei Shahmirzadi, Amir Moradi
Implementation
In this paper, we introduce a second-order masking of the AES using the minimal number of shares and a total of 1268 bits of randomness including the sharing of the plaintext and key. The masking of the S-box is based on the tower field decomposition of the inversion over bytes where the changing of the guards technique is used in order to re-mask the middle branch of the decomposition. The sharing of the S-box is carefully crafted such that it achieves first-order probing security without...
Fast and Efficient Hardware Implementation of HQC
Sanjay Deshpande, Chuanqi Xu, Mamuri Nawan, Kashif Nawaz, Jakub Szefer
Implementation
This work presents a hardware design for constant-time implementation of the HQC (Hamming Quasi-Cyclic) code-based key encapsulation mechanism. HQC has been selected for the fourth round of NIST's Post-Quantum Cryptography standardization process and this work presents the first, hand-optimized design of HQC key generation, encapsulation, and decapsulation written in Verilog targeting implementation on FPGAs. The three modules further share a common SHAKE256 hash module to reduce area...
KaLi: A Crystal for Post-Quantum Security using Kyber and Dilithium
Aikata Aikata, Ahmet Can Mert, Malik Imran, Samuel Pagliarini, Sujoy Sinha Roy
Implementation
Quantum computers pose a threat to the security of communications over the internet. This imminent risk has led to the standardization of cryptographic schemes for protection in a post-quantum scenario. We present a design methodology for future implementations of such algorithms. This is manifested using the NIST selected digital signature scheme CRYSTALS-Dilithium and key encapsulation scheme CRYSTALS-Kyber. A unified architecture, \crystal, is proposed that can perform key generation,...
PipeMSM: Hardware Acceleration for Multi-Scalar Multiplication
Charles. F. Xavier
Foundations
Multi-Scalar Multiplication (MSM) is a fundamental computational problem. Interest in this problem was recently prompted by its application to ZK-SNARKs, where it often turns out to be the main computational bottleneck.
In this paper we set forth a pipelined design for computing MSM. Our design is based on a novel algorithmic approach and hardware-specific optimizations. At the core, we rely on a modular multiplication technique which we deem to be of independent interest.
We implemented...
XOR Compositions of Physically Related Functions
Harishma Boyapally, Sikhar Patranabis, Debdeep Mukhopadhyay
Foundations
Physically related functions~(PReFs) are hardware primitives proposed to establish key-exchange between resource-constrained devices with no pre-established secrets. In this paper, we introduce XOR composition of PReFs to eliminate the requirement of revealing the complete functionality of the hardware primitive during the setup phase, which is a prerequisite to setup PReFs. We evaluate the quality of XOR\_PReF design by implementing them on Artix-7 FPGAs.
Logic Locking - Connecting Theory and Practice
Elisaweta Masserova, Deepali Garg, Ken Mai, Lawrence Pileggi, Vipul Goyal, Bryan Parno
Due to the complexity and the cost of producing integrated circuits, most hardware circuit designers outsource the manufacturing of their circuits to a third-party foundry. However, a dishonest foundry may abuse its access to the circuit's design in a variety of ways that undermine the designer's investment or potentially introduce vulnerabilities.
To combat these issues, the hardware community has developed the notion of logic locking, which allows the designer to send the foundry a...
A Key-Recovery Side-Channel Attack on Classic McEliece
Qian Guo, Andreas Johansson, Thomas Johansson
Public-key cryptography
In this paper, we propose the first key-recovery side-channel attack on Classic McEliece, a KEM finalist in the NIST Post-quantum Cryptography Standardization Project. Our novel idea is to design an attack algorithm where we submit special ciphertexts to the decryption oracle that correspond to cases of single errors. Decoding of such cipher-texts involves only a single entry in a large secret permutation, which is part of the secret key. Through an identified leakage in the additive...
Lightweight Hardware Accelerator for Post-Quantum Digital Signature CRYSTALS-Dilithium
Naina Gupta, Arpan Jati, Anupam Chattopadhyay, Gautam Jha
Implementation
The looming threat of an adversary with Quantum computing capability led to a worldwide research effort towards identifying and standardizing novel post-quantum cryptographic primitives. Post-standardization, all existing security protocols will need to support efficient implementation of these primitives. In this work, we contribute to these efforts by reporting the smallest implementation of CRYSTALS-Dilithium, a finalist candidate for post-quantum digital signature.
By invoking multiple...
Medha: Microcoded Hardware Accelerator for computing on Encrypted Data
Ahmet Can Mert, Aikata, Sunmin Kwon, Youngsam Shin, Donghoon Yoo, Yongwoo Lee, Sujoy Sinha Roy
Implementation
Homomorphic encryption enables computation on encrypted data, and hence it has a great potential in privacy-preserving outsourcing of computations to the cloud. Hardware acceleration of homomorphic encryption is crucial as software implementations are very slow. In this paper, we present design methodologies for building a programmable hardware accelerator for speeding up the cloud-side homomorphic evaluations on encrypted data.
First, we propose a divide-and-conquer technique that...
Subverting Cryptographic Hardware used in Blockchain Consensus
Pratyush Ranjan Tiwari, Matthew Green
Applications
In this work, we study and formalize security notions for algorithm substitution attacks (ASAs) on em cryptographic puzzles. Puzzles are difficult problems that require an investment of computation, memory, or some other related resource. They are heavily used as a building block for the consensus networks used by cryptocurrencies. These include primitives such as proof-of-work, proof-of-space, and verifiable delay functions (VDFs). Due to economies of scale, these networks increasingly rely...
Complete and Improved FPGA Implementation of Classic McEliece
Po-Jen Chen, Tung Chou, Sanjay Deshpande, Norman Lahr, Ruben Niederhagen, Jakub Szefer, Wen Wang
Implementation
We present the first specification-compliant constant-time FPGA implementation of the Classic McEliece cryptosystem from the third-round of NIST's Post-Quantum Cryptography standardization process. In particular, we present the first complete implementation including encapsulation and decapsulation modules as well as key generation with seed expansion. All the hardware modules are parametrizable, at compile time, with security level and performance parameters. As the most time consuming...
A High-performance ECC Processor over Curve448 based on a Novel Variant of the Karatsuba Formula for Asymmetric Digit Multiplier
Asep Muhamad Awaludin, Jonguk Park, Rini Wisnu Wardhani, Howon Kim
Implementation
In this paper, we present a high-performance architecture for elliptic curve cryptography (ECC) over Curve448, which to the best of our knowledge, is the fastest implementation of ECC point multiplication over Curve448 to date. Firstly, we introduce a novel variant of the Karatsuba formula for asymmetric digit multiplier, suitable for typical DSP primitive with asymmetric input. It reduces the number of required DSPs compared to previous work and preserves the performance via full...
FPGA Design Deobfuscation by Iterative LUT Modification at Bitstream Level
Michail Moraitis, Elena Dubrova
Implementation
Hardware obfuscation through redundancy addition is a well-known countermeasure against reverse engineering. For FPGA designs, such a technique can be implemented with a small overhead, however, its effectiveness is heavily dependent on the stealthiness of the redundant elements. Hardware opaque predicates can provide adequately stealthy constant values that can be used for obfuscation. However, in this report, we show that such obfuscation schemes can be defeated by ensuring the full...
Unprotected and Masked Hardware Implementations of Spook v2
Charles Momin, Gaëtan Cassiers, François-Xavier Standaert
Implementation
We describe FPGA implementations of the Spook candidate to the NIST lightweight cryptography competition in two flavors. First, unprotected implementations that exhibit the excellent throughput and energy consumption for the area target specified by the NIST benchmarking initiative. Second, protected implementations leveraging the leveled implementation concept that the Spook design enables and confirming the significant performance gains that it enables.
High-Performance Hardware Implementation of Lattice-Based Digital Signatures
Luke Beckwith, Duc Tri Nguyen, Kris Gaj
Implementation
Many currently deployed public-key cryptosystems are based on the difficulty of the discrete logarithm and integer factorization problems. However, given an adequately sized quantum computer, these problems can be solved in polynomial time as a function of the key size. Due to the future threat of quantum computing to current cryptographic standards, alternative algorithms that remain secure under quantum computing are being evaluated for future use. As a part of this evaluation,...
RevEAL: Single-Trace Side-Channel Leakage of the SEAL Homomorphic Encryption Library
Furkan Aydin, Emre Karabulut, Seetal Potluri, Erdem Alkim, Aydin Aysu
This paper demonstrates the first side-channel attack on homomorphic encryption (HE), which allows computing on encrypted data. We reveal a power-based side-channel leakage of Microsoft SEAL prior to v3.6 that implements the Brakerski/Fan-Vercauteren (BFV) protocol. Our proposed attack targets the Gaussian sampling in the SEAL’s encryption phase and can extract the entire message with a single power measurement.
Our attack works by (1) identifying each coefficient index being sampled, (2)...
Attacks on the Firekite cipher
Thomas Johansson, Willi Meier, Vu Nguyen
Secret-key cryptography
Firekite is a synchronous stream cipher using a pseudo-random number generator (PRNG) whose security relies on the hardness of the \textit{Learning Parity with Noise} (LPN) problem. It is one of a few LPN-based symmetric encryption schemes and it can be very efficiently implemented on a low-end SoC FPGA. The designers, Bogos, Korolija, Locher, and Vaudenay, demonstrated appealing properties of Firekite such as requiring only one source of cryptographically strong bits, small key size, high...
High-Speed and Unified ECC Processor for Generic Weierstrass Curves over GF(p) on FPGA
Asep Muhamad Awaludin, Harashta Tatimma Larasati, Howon Kim
Implementation
In this paper, we present a high-speed, unified elliptic curve cryptography (ECC) processor for arbitrary Weierstrass curves over GF(p), which to the best of our knowledge, outperforms other similar works in terms of execution time. Our approach employs the combination of the schoolbook long and Karatsuba multiplication algorithm for the elliptic curve point multiplication (ECPM) to achieve better parallelization while retaining low complexity. In the hardware implementation, the substantial...
Where Star Wars Meets Star Trek: SABER and Dilithium on the Same Polynomial Multiplier
Andrea Basso, Furkan Aydin, Daniel Dinu, Joseph Friel, Avinash Varna, Manoj Sastry, Santosh Ghosh
Implementation
Secure communication often require both encryption and digital signatures to guarantee the confidentiality of the message and the authenticity of the parties. However, post-quantum cryptographic protocols are often studied independently. In this work, we identify a powerful synergy between two finalist protocols in the NIST standardization process. In particular, we propose a technique that enables SABER and Dilithium to share the exact same polynomial multiplier. Since polynomial...
An End-to-End Bitstream Tamper Attack Against Flip-Chip FPGAs
Fahim Rahman, Farimah Farahmandi, Mark Tehranipoor
Applications
FPGA bitstream encryption and authentication can be defeated by various techniques and it is critical to understand how these vulnerabilities enable extraction and tampering of commercial FPGA bitstreams. We exploit the physical vulnerability of bitstream encryption keys to readout using failure analysis equipment and conduct an end-to-end bitstream tamper attack. Our work underscores the feasibility of supply chain bitstream tampering and the necessity of guarding against such attacks in...
Light-OCB: Parallel Lightweight Authenticated Cipher with Full Security
Avik Chakraborti, Nilanjan Datta, Ashwin Jha, Cuauhtemoc Manicillas Lopez, Mridul Nandi
Secret-key cryptography
This paper proposes a lightweight authenticated encryption (AE) scheme, called Light-OCB, which can be viewed as a lighter variant of the CAESAR winner OCB as well as a faster variant of the high profi
le NIST LWC competition submission LOCUS-AEAD. Light-OCB is structurally similar to LOCUS-AEAD and uses a nonce-based derived key that provides optimal security, and short-tweak tweakable blockcipher (tBC) for efficient domain separation. Light-OCB improves over LOCUS-AEAD by reducing the...
Ark of the ECC: An open-source ECDSA power analysis attack on a FPGA based Curve P-256 implementation
Jean-Pierre Thibault, Colin O’Flynn, Alex Dewar
Public-key cryptography
Power analysis attacks on ECC have been presented since almost the very beginning of DPA itself, even before the standardization of AES. Given that power analysis attacks against AES are well known and have a large body of practical artifacts to demonstrate attacks on both software and hardware implementations, it is surprising that these artifacts are generally lacking for ECC. In this work we begin to remedy this by providing a complete open-source ECDSA attack artifact, based on a...
High-Speed Hardware Architectures and FPGA Benchmarking of CRYSTALS-Kyber, NTRU, and Saber
Viet Ba Dang, Kamyar Mohajerani, Kris Gaj
Implementation
Performance in hardware has typically played a significant role in differentiating among leading candidates in cryptographic standardization efforts. Winners of two past NIST cryptographic contests (Rijndael in case of AES and Keccak in case of SHA-3) were ranked consistently among the two fastest candidates when implemented using FPGAs and ASICs. Hardware implementations of cryptographic operations may quite easily outperform software implementations for at least a subset of major...
A Unified Cryptoprocessor for Lattice-based Signature and Key-exchange
Aikata Aikata, Ahmet Can Mert, David Jacquemin, Amitabh Das, Donald Matthews, Santosh Ghosh, Sujoy Sinha Roy
Implementation
We propose design methodologies for building a compact, unified and programmable cryptoprocessor architecture that computes post-quantum key agreement and digital signature. Synergies in the two types of cryptographic primitives are used to make the cryptoprocessor compact. As a case study, the cryptoprocessor architecture has been optimized targeting the signature scheme 'CRYSTALS-Dilithium' and the key encapsulation mechanism (KEM) 'Saber', both finalists in the NIST’s post-quantum...
The Multi-Scalar Multiplication (MSM) is the main barrier to accelerating Zero-Knowledge applications. In recent years, hardware acceleration of this algorithm on both FPGA and GPU has become a popular research topic and the subject of a multi-million dollar prize competition (ZPrize). This work presents OPTIMSM: Optimized Processing Through Iterative Multi-Scalar Multiplication. This novel accelerator focuses on the acceleration of the MSM algorithm for any Elliptic Curve (EC) by improving...
We present the protected hardware implementation of the Module-Lattice-Based Digital Signature Standard (ML-DSA). ML-DSA is an extension of Dilithium 3.1, which is the winner of the Post Quantum Cryptography (PQC) competition in the digital signature category. The proposed design is based on the existing high-performance Dilithium 3.1 design. We implemented existing Dilithium masking gadgets in hardware, which were only implemented in software. The masking gadgets are integrated with the...
Post-quantum cryptography (PQC) has rapidly evolved in response to the emergence of quantum computers, with the US National Institute of Standards and Technology (NIST) selecting four finalist algorithms for PQC standardization in 2022, including the Falcon digital signature scheme. The latest round of digital signature schemes introduced Hawk, both based on the NTRU lattice, offering compact signatures, fast generation, and verification suitable for deployment on resource-constrained...
Masking schemes are key in thwarting side-channel attacks due to their robust theoretical foundation. Transitioning from Boolean to arithmetic (B2A) masking is a necessary step in various cryptography schemes, including hash functions, ARX-based ciphers, and lattice-based cryptography. While there exists a significant body of research focusing on B2A software implementations, studies pertaining to hardware implementations are quite limited, with the majority dedicated solely to creating...
Datasets of side-channel leakage measurements are widely used in research to develop and benchmarking side-channel attack and evaluation methodologies. Compared to using custom and/or one-off datasets, widely-used and publicly available datasets improve research reproducibility and comparability. Further, performing high-quality measurements requires specific equipment and skills, while also taking a significant amount of time. Therefore, using publicly available datasets lowers the barriers...
True Random Number Generators (TRNGs) and Physically Unclonable Functions (PUFs) are critical hardware primitives for cryptographic systems, providing randomness and device-specific security. TRNGs require complete randomness, while PUFs rely on consistent, device-unique responses. In this work, both primitives are implemented on a System-on-Chip Field-Programmable Gate Array (SoC FPGA), leveraging the integrated Phase-Locked Loops (PLLs) for robust entropy generation in PLLbased TRNGs. A...
The RISC-V Vector Cryptography Extensions (Zvk) were ratified in 2023 and integrated into the main ISA manuals in 2024. These extensions support high-speed symmetric cryptography (AES, SHA2, SM3, SM4) operating on the vector register file and offer significant performance improvements over scalar cryptography extensions (Zk) due to data parallelism. As a ratified extension, Zvk is supported by compiler toolchains and is already being integrated into popular cryptographic middleware such as...
Physically (or Physical) Unclonable Functions (PUFs) are basic and useful primitives in designing cryptographic systems. PUFs are designed to facilitate device authentication, secure boot, firmware integrity, and secure communications. To achieve these objectives, PUFs must exhibit both consistent repeatability and instance-specific randomness. The Arbiter PUF (APUF), recognized as the first silicon PUF, is capable of generating a substantial number of secret keys instantaneously based on...
Phase-locked loops (PLLs) integrated within field-programmable gate arrays (FPGAs) or System-on-Chip FPGAs (SoCs) represent a promising approach for generating random numbers. Their widespread deployment, isolated functionality within these devices, and robust entropy, as demonstrated in prior studies, position PLL-based true random number generators (PLL-TRNGs) as highly viable solutions for this purpose. This study explicitly examines PLL-TRNG implementations using the ZC702 Rev1.1...
Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes can work in different environments under different operational constraints such as memory footprint, silicon area, efficiency, power requirement, etc., such variations in the underlying hard problem are very useful for designers...
Kyber was selected by NIST as a Post-Quantum Cryptography Key Encapsulation Mechanism standard. This means that the industry now needs to transition and adopt these new standards. One of the most demanding operations in Kyber is the modular arithmetic, making it a suitable target for optimization. This work offers a novel modular reduction design with the lowest area on Xilinx FPGA platforms. This novel design, through K-reduction and LUT-based reduction, utilizes 49 LUTs and 1 DSP...
A fixed length tweakable enciphering scheme (TES) is the appropriate cryptographic functionality for low level disk encryption. Research on TES over the last two decades have led to a number of proposals many of which have already been implemented using FPGAs. This paper considers the FPGA implementations of two more recent and promising TESs, namely AEZ and FAST. The relevant architectures are described and simulation results on the Xilinx Virtex 5 and Virtex 7 FPGAs are presented. For...
The Extended Greatest Common Divisor (XGCD) computation is a critical component in various cryptographic applications and algorithms, including both pre- and post-quantum cryptosystems. In addition to computing the greatest common divisor (GCD) of two integers, the XGCD also produces Bezout coefficients $b_a$ and $b_b$ which satisfy $\mathrm{GCD}(a,b) = a\times b_a + b\times b_b$. In particular, computing the XGCD for large integers is of significant interest. Most recently, XGCD computation...
Multi-scalar multiplication (MSM) is the most computation-intensive part in proof generation of Zero-knowledge proof (ZKP). In this paper, we propose MSMAC, an FPGA accelerator for large-scale MSM. MSMAC adopts a specially designed Instruction Set Architecture (ISA) for MSM and optimizes pipelined Point Addition Unit (PAU) with hybrid Karatsuba multiplier. Moreover, a runtime system is proposed to split MSM tasks with the optimal sub-task size and orchestrate execution of Processing Elements...
The field of post-quantum cryptography (PQC) is continuously evolving. Many researchers are exploring efficient PQC implementation on various platforms, including x86, ARM, FPGA, GPU, etc. In this paper, we present an Efficient CryptOgraphy CRYSTALS (ECO-CRYSTALS) implementation on standard 64-bit RISC-V Instruction Set Architecture (ISA). The target schemes are two winners of the National Institute of Standards and Technology (NIST) PQC competition: CRYSTALS-Kyber and CRYSTALS-Dilithium,...
Resource-constrained devices such as wireless sensors and Internet of Things (IoT) devices have become ubiquitous in our digital ecosystem. These devices generate and handle a major part of our digital data. In the face of the impending threat of quantum computers on our public-key infrastructure, it is impossible to imagine the security and privacy of our digital world without integrating post-quantum cryptography (PQC) into these devices. Usually, due to the resource constraints of these...
This paper proposes a fast and efficient FPGA-based hardware-software co-design for the supersingular isogeny key encapsulation (SIKE) protocol controlled by a custom RISC-V processor. Firstly, we highly optimize the core unit, the polynomial-based field arithmetic logic unit (FALU), with the proposed fast convolution-like multiplier (FCM) to significantly reduce the resource consumption while still maintaining low latency and constant time for all the four SIKE parameters. Secondly, we pack...
Garbled circuits (GC) are a secure multiparty computation protocol that enables two parties to jointly compute a function using their private data without revealing it to each other. While garbled circuits are proven secure at the protocol level, implementations can still be vulnerable to side-channel attacks. Recently, side-channel analysis of GC implementations has garnered significant interest from researchers. We investigate popular open-source GC frameworks and discover that the AES...
In this work, we present various hardware implementations for the lightweight cipher ASCON, which was recently selected as the winner of the NIST organized Lightweight Cryptography (LWC) competition. We cover encryption + tag generation and decryption + tag verification for the ASCON AEAD and also the ASCON hash function. On top of the usual (unprotected) implementation, we present side-channel protection (threshold countermeasure) and triplication/majority-based fault protection. To the...
We present a novel approach to small area and low-latency first-order masking in hardware. The core idea is to separate the processing of shares in time in order to achieve non-completeness. Resulting circuits are proven first-order glitch-extended PINI secure. This means the method can be straightforwardly applied to mask arbitrary functions without constraints which the designer must take care of. Furthermore we show that an implementation can benefit from optimization through EDA tools...
Masking is one of the most popular countermeasures to protect implementations against power and electromagnetic side channel attacks, because it offers provable security. Masking has been shown secure against d-threshold probing adversaries by Ishai et al. at CRYPTO'03, but this adversary's model doesn't consider any physical hardware defaults and thus such masking schemes were shown to be still vulnerable when implemented as hardware circuits. To addressed these limitations glitch-extended...
In this paper, we introduce a new approach to secure computing by implementing a platform that utilizes an NVMe-based system with an FPGA-based Torus FHE accelerator, SSD, and middleware on the host-side. Our platform is the first of its kind to offer complete secure computing capabilities for TFHE using an FPGA-based accelerator. We have defined secure computing instructions to evaluate 14-bit to 14-bit functions using TFHE, and our middleware allows for communication of ciphertexts, keys,...
Zero-knowledge proof systems are widely used in different applications on the Internet. Among zero-knowledge proof systems, SNARKs are a popular choice because of their fast verification time and small proof size. The efficiency of zero-knowledge systems is crucial for usability, resulting in the development of so-called arithmetization-oriented ciphers. In this work, we introduce Vision Mark-32, a modified instance of Vision defined over binary tower fields, with an optimized number of...
The expansion of flip-chip technologies and a lack of backside protection make the integrated circuit (IC) vulnerable to certain classes of physical attacks mounted from the IC's backside. Laser-assisted probing, electromagnetic, and body-biasing injection attacks are examples of such attacks. Unfortunately, there are few countermeasures proposed in the literature, and none are available commercially. Those that do exist are not only expensive but also incompatible with current IC...
This paper introduces a high-performance and scalable hardware architecture designed for the Number-Theoretic Transform (NTT), a fundamental component extensively utilized in lattice-based encryption and fully homomorphic encryption schemes. The underlying rationale behind this research is to harness the advantages of the hypercube topology. This topology serves to significantly diminish the volume of data exchanges required during each iteration of the NTT, reducing it to a complexity of...
A recent work from Eurocrypt 2023 suggests that prime-field masking has excellent potential to improve the efficiency vs. security tradeoff of masked implementations against side-channel attacks, especially in contexts where physical leakages show low noise. We pick up on the main open challenge that this seed result leads to, namely the design of an optimized prime cipher able to take advantage of this potential. Given the interest of tweakable block ciphers with cheap inverses in many...
Fully Homomorphic Encryption (FHE) enables computation on encrypted data, holding immense potential for enhancing data privacy and security in various applications. Presently, FHE adoption is hindered by slow computation times, caused by data being encrypted into large polynomials. Optimized FHE libraries and hardware acceleration are emerging to tackle this performance bottleneck. Often, these libraries implement the Number Theoretic Transform (NTT) algorithm for efficient polynomial...
A universal circuit (UC) can be thought of as a programmable circuit that can simulate any circuit up to a certain size by specifying its secret configuration bits. UCs have been incorporated into various applications, such as private function evaluation (PFE). Recently, studies have attempted to formalize the concept of semiconductor intellectual property (IP) protection in the context of UCs. This is despite the observations made in theory and practice that, in reality, the adversary may...
This work presents the first hardware realisation of the Syndrome-Decoding-in-the-Head (SDitH) signature scheme, which is a candidate in the NIST PQC process for standardising post-quantum secure digital signature schemes. SDitH's hardness is based on conservative code-based assumptions, and it uses the Multi-Party-Computation-in-the-Head (MPCitH) construction. This is the first hardware design of a code-based signature scheme based on traditional decoding problems and only the second for...
As the message recovery-based attack poses a serious threat to lattice-based schemes, we conducted a study on the side-channel secu- rity of parallel implementations of lattice-based key encapsulation mech- anisms. Initially, we developed a power model to describe the power leakage during message encoding. Utilizing this power model, we pro- pose a multi-ciphertext message recovery attack, which can retrieve the required messages for a chosen ciphertext attack through a suitable mes- sage...
During the standardisation process of post-quantum cryptography, NIST encourages research on side-channel analysis for candidate schemes. As the recommended lattice signature scheme, CRYSTALS-Dilithium, when implemented on hardware, has only been subjected to the side-channel attack presented by Steffen et al. in IACR ePrint 2022. This attack is not complete and requires excessive traces. Therefore, we investigate the leakage of an FPGA (Kintex7) implementation of CRYSTALS-Dilithium using...
Falcon is one out of three post-quantum signature schemes which have been selected for standardization by NIST in July 2022. To the best of our knowledge, Falcon is the only selected algorithm that does not yet have a publicly reported hardware description that performs signing or key generation. The reason might be that the Falcon signature and key generation algorithms do not fit well in hardware due to the use of floating-point numbers and recursive functions. This publication describes...
The evolution of quantum algorithms threatens to break public key cryptography in polynomial time. The development of quantum-resistant algorithms for the post-quantum era has seen a significant growth in the field of post quantum cryptography (PQC). Polynomial multiplication is the core of Ring Learning with Error (RLWE) lattice based cryptography (LBC) which is one of the most promising PQC candidates. In this work, we present the design of fast and energy-efficient pipelined Number...
Homomorphic encryption (HE) has gained broad attention in recent years as it allows computations on encrypted data enabling secure cloud computing. Deploying HE presents a notable challenge since it introduces a performance overhead by orders of magnitude. Hence, most works target accelerating server-side operations on hardware platforms, while little attention has been given to client-side operations. In this paper, we present a novel design methodology to implement and accelerate the...
Over the last decades, fault injection attacks have been demonstrated to be an effective method for breaking the security of electronic devices. Some types of fault injection attacks, like clock and voltage glitching, require very few resources by the attacker and are practical and simple to execute. A cost-effective countermeasure against these attacks is the use of a detector circuit which detects timing violations - the underlying effect that glitch attacks rely on. In this paper, we take...
The Number Theoretic Transform (NTT) is a powerful mathematical tool with a wide range of applications in various fields, including signal processing, cryptography, and error correction codes. In recent years, there has been a growing interest in efficiently implementing the NTT on hardware platforms for lattice-based cryptography within the context of NIST's Post-Quantum Cryptography (PQC) competition. The implementation of NTT in cryptography stands as a pivotal advancement,...
The emergence of hardware trojans as significant threats in various aspects of hardware design, including Firmware, open-source IP, and PCB design, has raised serious concerns. Simultaneously, AI technologies have been employed to simplify the complexity of Side Channel Analysis (SCA) attacks. Due to the increasing risk posed by these threats, it becomes essential to test hardware by considering all possible attack vectors. This paper aims to propose a black box attack using...
Fault attacks impose a serious threat against the practical implementations of cryptographic algorithms. Statistical Ineffective Fault Attacks (SIFA), exploiting the dependency between the secret data and the fault propagation overcame many of the known countermeasures. Later, several countermeasures have been proposed to tackle this attack using error detection methods. However, the efficiency of the countermeasures, in part governed by the number of error checks, still remains a...
Modern block ciphers designed for hardware and masked with Threshold Implementations (TIs) provide provable security against first-order attacks. However, the application of TIs leaves designers to deal with a trade-off between its security and its cost, for example, the process to generate its required random bits. This generation cost comes with an increased overhead in terms of area and latency. Decreasing the number of random bits for the masking allows to reduce the aforementioned...
The recently announced National Institute of Standards and Technology (NIST) Post-quantum cryptography (PQC) third-round standardization process has released its candidates to be standardized and Falcon is one of them. On the other hand, however, very few hardware implementation works for Falcon have been released due to its very complicated computation procedure and intensive complexity. With this background, in this paper, we propose an efficient hardware structure to implement residue...
NIST issued a new call in 2023 to diversify the portfolio of quantum-resistant digital signature schemes since the current portfolio relies on lattice problems. The MAYO scheme, which builds on the Unbalanced Oil and Vinegar (UOV) problem, is a promising candidate for this new call. MAYO introduces emulsifier maps and a novel 'whipping' technique to significantly reduce the key sizes compared to previous UOV schemes. This paper provides a comprehensive analysis of the implementation...
We show that the key agreement scheme [IEEE Trans. Veh. Technol. 71(4): 3470-3479, 2022] fails to keep user anonymity, not as claimed.
Electromagnetic Fault Injection (EMFI) has been demonstrated to be useful for both academic and industrial research. Due to the dangerous voltages involved, most work is done with commercial tools. This paper introduces a safety-focused low-cost and open-source design that can be built for less than \$50 using only off-the-shelf parts. The paper also introduces an iCE40 based Time-to-Digital Converter (TDC), which is used to visualize the glitch inserted by the EMFI tool. This...
MAYO is a topical modification of the established multivariate signature scheme UOV. Signer and Verifier locally enlarge the public key map, such that the dimension of the oil space and therefore, the parameter sizes in general, can be reduced. This significantly reduces the public key size while maintaining the appealing properties of UOV, like short signatures and fast verification. Therefore, MAYO is considered as an attractive candidate in the NIST call for additional digital signatures...
Masking is a prominent strategy to protect cryptographic implementations against side-channel analysis. Its popularity arises from the exponential security gains that can be achieved for (approximately) quadratic resource utilization. Many variants of the countermeasure tailored for different optimization goals have been proposed. The common denominator among all of them is the implicit demand for robust and high entropy randomness. Simply assuming that uniformly distributed random bits are...
Deploying cryptography on embedded systems requires security against physical attacks. At CHES 2019, M&M was proposed as a combined countermeasure applying masking against SCAs and information-theoretic MAC tags against FAs. In this paper, we show that one of the protected AES implementations in the M&M paper is vulnerable to a zero-value SIFA2-like attack. A practical attack is demonstrated on an ASIC board. We propose two versions of the attack: the first follows the SIFA approach to...
Masking is a well-studied method for achieving provable security against side-channel attacks. In masking, each sensitive variable is split into $d$ randomized shares, and computations are performed with those shares. In addition to the computational overhead of masked arithmetic, masking also has a storage cost, increasing the requirements for working memory and secret key storage proportionally with $d$. In this work, we introduce mask compression. This conceptually simple technique is...
NIST has recently selected CRYSTALS-Kyber as a new public key encryption and key establishment algorithm to be standardized. This makes it important to evaluate the resistance of CRYSTALS-Kyber implementations to side-channel attacks. Software implementations of CRYSTALS-Kyber have already been thoroughly analysed. The discovered vulnerabilities helped improve the subsequently released versions and promoted stronger countermeasures against side-channel attacks. In this paper, we present the...
Persistent fault analysis is a novel and efficient cryptanalysis method. The persistent fault attacks take advantage of a persistent fault injected in a non-volatile memory, then present on the device until the reboot of the device. Contrary to classical physical fault injection, where differential analysis can be performed, persistent fault analysis requires new analyses and dedicated countermeasures. Persistent fault analysis requires a persistent fault injected in the S-box such that the...
FPGAs have been used in the cloud since several years, as accelerators for various workloads such as machine learning, database processes and security tasks. As for other cloud services, a highly desired feature is virtualization in which multiple tenants can share a single FPGA to increase utilization and by that efficiency. By solely using standard FPGA logic in the untrusted tenant, on-chip logic sensors allow remote power analysis side-channel and covert channel attacks on the victim...
Sampling random values from a discrete Gaussian distribution with high precision is a major and computationally intensive operation of upcoming or existing cryptographic standards. FALCON is one such algorithm that the National Institute of Standards and Technology chose to standardize as a next-generation, quantum-secure digital signature algorithm. The discrete Gaussian sampling of FALCON has both flexibility and efficiency needs—it constitutes 72% of total signature generation in...
Masking is considered to be an essential defense mechanism against side-channel attacks, but it is challenging to be adopted for hardware cryptographic implementations, especially for high security orders. Recently, Knichel et al. proposed an automated tool called AGEMA that enables the generation of masked implementations in hardware for arbitrary security orders using composable gadgets. This accelerates the construction and practical application of masking schemes. This article proposes a...
The rise of low-power, cost-efficient internet-connected devices has led to a need for lightweight cryptography. The lightweight block cipher PRIDE, designed by Martin R. Albrecht, is one of the most efficient ciphers designed for IoT-constrained environments. It is useful for connected devices, requires fewer resources to implement, and has high performance. PRIDE is a software-oriented lightweight cipher optimized for microcontrollers. This paper focuses on the FPGA implementation of the...
The magic of Fully Homomorphic Encryption (FHE) is that it allows operations on encrypted data without decryption. Unfortunately, the slow computation time limits their adoption. The slow computation time results from the vast memory requirements (64Kbits per ciphertext), a bootstrapping key of 1.3 GB, and sizeable computational overhead (10240 NTTs, each NTT requiring 5120 32-bit multiplications). We accelerate the FHEW bootstrapping in hardware on a high-end U280 FPGA. To reduce the...
Implementation of Fully Homomorphic Encryption (FHE) is challenging. Especially when considering hardware acceleration, the major performance bottleneck is data transfer. Here we propose an algebraic framework called Heterogenous Lattice Graph (HLG) to build and process computing graphs in Residue Number System (RNS), which is the basis of high performance implementation of mainstream FHE algorithms. There are three main design goals for HLG framework: • Design a dedicated IR (HLG...
FPGA-SoCs are a popular platform for accelerating a wide range of applications due to their performance and flexibility. From a security point of view, these systems have been shown to be vulnerable to various attacks, especially side-channel attacks where an attacker can obtain the secret key of a cryptographic algorithm via laboratory mea- surement equipment or even remotely with sensors implemented inside the FPGA logic itself. Fortunately, a variety of countermeasures on...
During the last decade, there has been a stunning progress in the domain of AI with adoption in both safety-critical and security-critical applications. A key requirement for this is highly trained Machine Learning (ML) models, which are valuable Intellectual Property (IP) of the respective organizations. Naturally, these models have become targets for model recovery attacks through side-channel leakage. However, majority of the attacks reported in literature are either on simple embedded...
The development of Artificial Intelligence (AI) based systems to automatically generate hardware systems has gained an impulse that aims to accelerate the hardware design cycle with no human intervention. Recently, the striking AI-based system ChatGPT from OpenAI has achieved a momentous headline and has gone viral within a short span of time since its launch. This chatbot has the capability to interactively communicate with the designers through a prompt to generate software and hardware...
Streamlined NTRU Prime is a lattice-based Key Encapsulation Mechanism (KEM) that is, together with X25519, currently the default algorithm in OpenSSH 9. Being based on lattice assumptions, it is assumed to be secure also against attackers with access to large-scale quantum computers. While Post-Quantum Cryptography (PQC) schemes have been subject to extensive research in the recent years, challenges remain with respect to protection mechanisms against attackers that have additional...
Masking has become one of the most effective approaches for securing hardware designs against side-channel attacks. Regardless of the effort put into correctly implementing masking schemes on a field-programmable gate array (FPGA), leakage can be unexpectedly observed. This is due to the fact that the assumption underlying all masked designs, i.e., the leakages of different shares are independent of each other, may no longer hold in practice. In this regard, extreme temperatures have been...
Two multivariate digital signature schemes, Rainbow and GeMSS, made it into the third round of the NIST PQC competition. However, either made its way to being a standard due to devastating attacks (in one case by Beullens, the other by Tao, Petzoldt, and Ding). How should multivariate cryptography recover from this blow? We propose that, rather than trying to fix Rainbow and HFEv- by introducing countermeasures, the better approach is to return to the classical Oil and Vinegar scheme. We...
The goals of cryptography are achieved using mathematically strong crypto-algorithms, which are adopted for securing data and communication. Even though the algorithms are mathematically secure, the implementation of these algorithms may be vulnerable to side-channel attacks such as timing and power analysis attacks. One of the effective countermeasures against such attacks is Threshold Implementation(TI). However, TI realization in crypto-device introduces hardware complexity, so it...
State-of-the-art sensors for measuring FPGA voltage fluctuations are time-to-digital converters (TDCs). They allow detecting voltage fluctuations in the order of a few nanoseconds. The key building component of a TDC is a delay line, typically implemented as a chain of fast carry propagation multiplexers. In FPGAs, the fast carry chains are constrained to dedicated logic and routing, and need to be routed strictly vertically. In this work, we present an alternative approach to designing...
TRNG is an essential component for security applications. A vulnerable TRNG could be exploited to facilitate potential attacks or be related to a reduced key space, and eventually results in a compromised cryptographic system. A digital FIRO-/GARO-based TRNG with high throughput and high entropy rate was introduced by Jovan Dj. Golić (TC’06). However, the fact that periodic oscillation is a main failure of FIRO-/GARO-based TRNGs is noticed in the paper (Markus Dichtl, ePrint’15). We verify...
The sampling of polynomials with fixed weight is a procedure required by round-4 Key Encapsulation Mechanisms (KEMs) for Post-Quantum Cryptography (PQC) standardization (BIKE, HQC, McEliece) as well as NTRU, Streamlined NTRU Prime, and NTRU LPRrime. Recent attacks have shown in this context that side-channel leakage of sampling methods can be exploited for key recoveries. While countermeasures regarding such timing attacks have already been presented, still, there is no comprehensive work...
To mitigate cybersecurity breaches, secure communication is crucial for the Internet of Things (IoT) environment. Data integrity is one of the most significant characteristics of security, which can be achieved by employing cryptographic hash functions. In view of the demand from IoT applications, the National Institute of Standards and Technology (NIST) initiated a standardization process for lightweight hash functions. This work presents field-programmable gate array (FPGA) implementations...
Number-Theoretic-Transform (NTT) is a variation of Fast-Fourier-Transform (FFT) on finite fields. NTT is being increasingly used in blockchain and zero-knowledge proof applications. Although FFT and NTT are widely studied for FPGA implementation, we believe CycloneNTT is the first to solve this problem for large data sets ($\ge2^{24}$, 64-bit numbers) that would not fit in the on-chip RAM. CycloneNTT uses a state-of-the-art butterfly network and maps the dataflow to hybrid FIFOs composed of...
Fully Homomorphic Encryption (FHE) is a technique that allows computation on encrypted data. It has the potential to drastically change privacy considerations in the cloud, but high computational and memory overheads are preventing its broad adoption. TFHE is a promising Torus-based FHE scheme that heavily relies on bootstrapping, the noise-removal tool invoked after each encrypted logical/arithmetical operation. We present FPT, a Fixed-Point FPGA accelerator for TFHE bootstrapping. FPT...
Security against side-channel assisted attacks remains a focus and concern in the ongoing standardization process of quantum-computer-resistant cryptography algorithms. Hiding and masking techniques are currently under investigation to protect the Post-Quantum Cryptography (PQC) algorithms in the NIST PQC standardization process against sophisticated side-channel attacks. Between hiding and masking, masking is emerging as a popular option due to its simplicity and minimized cost of...
CRYSTALS-Kyber has been recently selected by the NIST as a new public-key encryption and key-establishment algorithm to be standardized. This makes it important to assess how well CRYSTALS-Kyber implementations withstand side-channel attacks. Software implementations of CRYSTALS-Kyber have been already analyzed and the discovered vulnerabilities were patched in the subsequently released versions. In this paper, we present a profiling side-channel attack on a hardware implementation of...
While hardware implementations allow the production of highly efficient and performance oriented designs, exploiting features such as parallelization, their longer time to code and implement often bottlenecks rapid prototyping. On the other hand, high-level synthesis (HLS) tools allow for faster experimentation of software code to a hardware platform while demonstrating a reasonable extrapolation of the expected hardware behavior. In this work, we attempt to show a rapid, fast prototyping of...
Clock randomization is one of the oldest countermeasures against side-channel attacks. Various implementations have been presented in the past, along with positive security evaluations. However, in this paper we show that it is possible to break countermeasures based on a randomized clock by sampling side-channel measurements at a frequency much higher than the encryption clock, synchronizing the traces with pre-processing, and targeting the beginning of the encryption. We demonstrate a...
The lattice-based CRYSTALS-Dilithium signature scheme has been selected for standardization by the NIST. As part of the selection process, a large number of implementations for platforms like x86, ARM Cortex-M4, or – on the hardware side – Xilinx Artix-7 have been presented and discussed by experts. While software implementations have been subject to side-channel analysis with several attacks being published, an analysis of Dilithium hardware implementations and their peculiarities has not...
Multi-Scalar Multiplication (MSM) on elliptic curves is one of the primitives and bottlenecks at the core of many zero-knowledge proof systems. Speeding up MSM typically results in faster proof generation, which in turn makes ZK-based applications practical. We focus on accelerating large MSM on FPGA, and we present speed records for $\texttt{BLS12-377}$ on FPGA: 5.66s for $N=2^{26}$, sub-second for $N=2^{22}$. We developed a fully-pipelined curve adder in extended Twisted Edwards...
Post-quantum cryptography represents a category of cryptosystems resistant to quantum algorithms. Recently, NIST launched a process to standardize one or more of such algorithms in the key encapsulation mechanism and signature categories. Such schemes are under the scrutiny of their mathematical security, but they are not side-channel secure at the algorithm level. That is why their side-channel vulnerabilities must be assessed by the research community. In this paper, we present a...
Isogeny-based cryptography suffers from a long-running time due to its requirement of a great amount of large integer arithmetic. The Residue Number System (RNS) can compensate for that drawback by making computation more efficient via parallelism. However, performing a modular reduction by a large prime which is not part of the RNS base is very expensive. In this paper, we propose a new fast and efficient modular reduction algorithm using RNS. Also, we evaluate our modular reduction method...
In this paper, we introduce a second-order masking of the AES using the minimal number of shares and a total of 1268 bits of randomness including the sharing of the plaintext and key. The masking of the S-box is based on the tower field decomposition of the inversion over bytes where the changing of the guards technique is used in order to re-mask the middle branch of the decomposition. The sharing of the S-box is carefully crafted such that it achieves first-order probing security without...
This work presents a hardware design for constant-time implementation of the HQC (Hamming Quasi-Cyclic) code-based key encapsulation mechanism. HQC has been selected for the fourth round of NIST's Post-Quantum Cryptography standardization process and this work presents the first, hand-optimized design of HQC key generation, encapsulation, and decapsulation written in Verilog targeting implementation on FPGAs. The three modules further share a common SHAKE256 hash module to reduce area...
Quantum computers pose a threat to the security of communications over the internet. This imminent risk has led to the standardization of cryptographic schemes for protection in a post-quantum scenario. We present a design methodology for future implementations of such algorithms. This is manifested using the NIST selected digital signature scheme CRYSTALS-Dilithium and key encapsulation scheme CRYSTALS-Kyber. A unified architecture, \crystal, is proposed that can perform key generation,...
Multi-Scalar Multiplication (MSM) is a fundamental computational problem. Interest in this problem was recently prompted by its application to ZK-SNARKs, where it often turns out to be the main computational bottleneck. In this paper we set forth a pipelined design for computing MSM. Our design is based on a novel algorithmic approach and hardware-specific optimizations. At the core, we rely on a modular multiplication technique which we deem to be of independent interest. We implemented...
Physically related functions~(PReFs) are hardware primitives proposed to establish key-exchange between resource-constrained devices with no pre-established secrets. In this paper, we introduce XOR composition of PReFs to eliminate the requirement of revealing the complete functionality of the hardware primitive during the setup phase, which is a prerequisite to setup PReFs. We evaluate the quality of XOR\_PReF design by implementing them on Artix-7 FPGAs.
Due to the complexity and the cost of producing integrated circuits, most hardware circuit designers outsource the manufacturing of their circuits to a third-party foundry. However, a dishonest foundry may abuse its access to the circuit's design in a variety of ways that undermine the designer's investment or potentially introduce vulnerabilities. To combat these issues, the hardware community has developed the notion of logic locking, which allows the designer to send the foundry a...
In this paper, we propose the first key-recovery side-channel attack on Classic McEliece, a KEM finalist in the NIST Post-quantum Cryptography Standardization Project. Our novel idea is to design an attack algorithm where we submit special ciphertexts to the decryption oracle that correspond to cases of single errors. Decoding of such cipher-texts involves only a single entry in a large secret permutation, which is part of the secret key. Through an identified leakage in the additive...
The looming threat of an adversary with Quantum computing capability led to a worldwide research effort towards identifying and standardizing novel post-quantum cryptographic primitives. Post-standardization, all existing security protocols will need to support efficient implementation of these primitives. In this work, we contribute to these efforts by reporting the smallest implementation of CRYSTALS-Dilithium, a finalist candidate for post-quantum digital signature. By invoking multiple...
Homomorphic encryption enables computation on encrypted data, and hence it has a great potential in privacy-preserving outsourcing of computations to the cloud. Hardware acceleration of homomorphic encryption is crucial as software implementations are very slow. In this paper, we present design methodologies for building a programmable hardware accelerator for speeding up the cloud-side homomorphic evaluations on encrypted data. First, we propose a divide-and-conquer technique that...
In this work, we study and formalize security notions for algorithm substitution attacks (ASAs) on em cryptographic puzzles. Puzzles are difficult problems that require an investment of computation, memory, or some other related resource. They are heavily used as a building block for the consensus networks used by cryptocurrencies. These include primitives such as proof-of-work, proof-of-space, and verifiable delay functions (VDFs). Due to economies of scale, these networks increasingly rely...
We present the first specification-compliant constant-time FPGA implementation of the Classic McEliece cryptosystem from the third-round of NIST's Post-Quantum Cryptography standardization process. In particular, we present the first complete implementation including encapsulation and decapsulation modules as well as key generation with seed expansion. All the hardware modules are parametrizable, at compile time, with security level and performance parameters. As the most time consuming...
In this paper, we present a high-performance architecture for elliptic curve cryptography (ECC) over Curve448, which to the best of our knowledge, is the fastest implementation of ECC point multiplication over Curve448 to date. Firstly, we introduce a novel variant of the Karatsuba formula for asymmetric digit multiplier, suitable for typical DSP primitive with asymmetric input. It reduces the number of required DSPs compared to previous work and preserves the performance via full...
Hardware obfuscation through redundancy addition is a well-known countermeasure against reverse engineering. For FPGA designs, such a technique can be implemented with a small overhead, however, its effectiveness is heavily dependent on the stealthiness of the redundant elements. Hardware opaque predicates can provide adequately stealthy constant values that can be used for obfuscation. However, in this report, we show that such obfuscation schemes can be defeated by ensuring the full...
We describe FPGA implementations of the Spook candidate to the NIST lightweight cryptography competition in two flavors. First, unprotected implementations that exhibit the excellent throughput and energy consumption for the area target specified by the NIST benchmarking initiative. Second, protected implementations leveraging the leveled implementation concept that the Spook design enables and confirming the significant performance gains that it enables.
Many currently deployed public-key cryptosystems are based on the difficulty of the discrete logarithm and integer factorization problems. However, given an adequately sized quantum computer, these problems can be solved in polynomial time as a function of the key size. Due to the future threat of quantum computing to current cryptographic standards, alternative algorithms that remain secure under quantum computing are being evaluated for future use. As a part of this evaluation,...
This paper demonstrates the first side-channel attack on homomorphic encryption (HE), which allows computing on encrypted data. We reveal a power-based side-channel leakage of Microsoft SEAL prior to v3.6 that implements the Brakerski/Fan-Vercauteren (BFV) protocol. Our proposed attack targets the Gaussian sampling in the SEAL’s encryption phase and can extract the entire message with a single power measurement. Our attack works by (1) identifying each coefficient index being sampled, (2)...
Firekite is a synchronous stream cipher using a pseudo-random number generator (PRNG) whose security relies on the hardness of the \textit{Learning Parity with Noise} (LPN) problem. It is one of a few LPN-based symmetric encryption schemes and it can be very efficiently implemented on a low-end SoC FPGA. The designers, Bogos, Korolija, Locher, and Vaudenay, demonstrated appealing properties of Firekite such as requiring only one source of cryptographically strong bits, small key size, high...
In this paper, we present a high-speed, unified elliptic curve cryptography (ECC) processor for arbitrary Weierstrass curves over GF(p), which to the best of our knowledge, outperforms other similar works in terms of execution time. Our approach employs the combination of the schoolbook long and Karatsuba multiplication algorithm for the elliptic curve point multiplication (ECPM) to achieve better parallelization while retaining low complexity. In the hardware implementation, the substantial...
Secure communication often require both encryption and digital signatures to guarantee the confidentiality of the message and the authenticity of the parties. However, post-quantum cryptographic protocols are often studied independently. In this work, we identify a powerful synergy between two finalist protocols in the NIST standardization process. In particular, we propose a technique that enables SABER and Dilithium to share the exact same polynomial multiplier. Since polynomial...
FPGA bitstream encryption and authentication can be defeated by various techniques and it is critical to understand how these vulnerabilities enable extraction and tampering of commercial FPGA bitstreams. We exploit the physical vulnerability of bitstream encryption keys to readout using failure analysis equipment and conduct an end-to-end bitstream tamper attack. Our work underscores the feasibility of supply chain bitstream tampering and the necessity of guarding against such attacks in...
This paper proposes a lightweight authenticated encryption (AE) scheme, called Light-OCB, which can be viewed as a lighter variant of the CAESAR winner OCB as well as a faster variant of the high profi le NIST LWC competition submission LOCUS-AEAD. Light-OCB is structurally similar to LOCUS-AEAD and uses a nonce-based derived key that provides optimal security, and short-tweak tweakable blockcipher (tBC) for efficient domain separation. Light-OCB improves over LOCUS-AEAD by reducing the...
Power analysis attacks on ECC have been presented since almost the very beginning of DPA itself, even before the standardization of AES. Given that power analysis attacks against AES are well known and have a large body of practical artifacts to demonstrate attacks on both software and hardware implementations, it is surprising that these artifacts are generally lacking for ECC. In this work we begin to remedy this by providing a complete open-source ECDSA attack artifact, based on a...
Performance in hardware has typically played a significant role in differentiating among leading candidates in cryptographic standardization efforts. Winners of two past NIST cryptographic contests (Rijndael in case of AES and Keccak in case of SHA-3) were ranked consistently among the two fastest candidates when implemented using FPGAs and ASICs. Hardware implementations of cryptographic operations may quite easily outperform software implementations for at least a subset of major...
We propose design methodologies for building a compact, unified and programmable cryptoprocessor architecture that computes post-quantum key agreement and digital signature. Synergies in the two types of cryptographic primitives are used to make the cryptoprocessor compact. As a case study, the cryptoprocessor architecture has been optimized targeting the signature scheme 'CRYSTALS-Dilithium' and the key encapsulation mechanism (KEM) 'Saber', both finalists in the NIST’s post-quantum...