Hardware Security
Debdeep Mukhopadhyay
Secured Embedded Architecture Laboratory (SEAL)
Department of Computer Science and Engineering
Indian Institute of Technology Kharagpur
Kharagpur, West Bengal, INDIA – 721302
E-mail: debdeep.mukhopadhyay@gmail.com
An Overview
Hardware Root of Trust
Just enough security for each end points
Source: Patrick Koeberl – Security Architect at Intel Labs, Intel
Corporation, IDF14
3
Trustworthy Handling of large
Number of Devices
Source: Patrick Koeberl – Security Architect at Intel Labs, Intel
Corporation, IDF14
4
Trust in Cyber Physical Systems
• 50 Billion Devices
to be connected by
2020!
• Devices need to
trust the owner and
also each other.
• Devices connected
through
heterogeneous
network, and are
resource
constrained.
5
Whom can you Trust?
What do we know about the device?
◦ Is it running the correct software?
◦ Is it genuine?
We need to guarantee:
◦ Integrity
◦ Privacy
◦ Quality
IoT endpoints operate under resource
constraints:
◦ CPU
◦ Memory
◦ Energy
◦ Communications
Trust is a major enabler
Traditional Security features do not for IoT
scale down!
◦ The Trusted Computing Base (TCB) must be as
small as possible!
Are there more optimal solutions for the
hardware root of trust?
Threats from Side Channel Attacks
Terminal Data input
IC chip
Data output
00111…
Power supply
Measure power Guess secret information
consumption stored on IC chip memory
Power
consumption
Secret
information 0 0 1 1 1
Side Channel Attacks
Power Attacks
Strong cryptographic algorithms are just the
beginning!
Terminal Data input
IC chip
Data output
00111…
Power supply
Measure power Guess secret information
consumption stored on IC chip memory
Power
consumption
Secret
information 0 0 1 1 1
Experiment Set-up @ IIT KGP
2
Controller PC
Digital
Oscilloscope Current
Amplifier
Vcc Controller Pin 1
Current
Probe
DES/3-DES/AES
CORE LOGIC
(Implemented on
FPGA)
FPGA Board
Power Attacks
SPA – Simple Power Analysis attacks
◦ Fact exploited - Power consumption at an
instant of time is a function of the operation
being carried out by the device
DPA – Differential Power Analysis
◦ Fact exploited - Power consumption of the
same operation at different instants of time
depends on the data being processed.
Simple Power Analysis (SPA)
Directly interprets the power consumption
of the device
Looks for the operations taking place and
also the key!
Trace: A set of power consumptions
across a cryptographic process
1 millisecond operation sampled at 5MHz
yield a trace with 5000 points
A Power Trace
Power Trace of a round of AES.
Observe the variation of power values.
The variations occur because of the operation
dependence of power: leads to SPA.
The variations also occur because of data
dependence of power: leads to DPA.
Correlation of Power with bits
Assume power
leakage follows
Hamming Weight.
Divide the HW(s)
into two bins:
◦ 0 bin: when LSB is 0
◦ 1 bin: when LSB is 1
Difference-of-Mean
(DoM)=20/8-12/8=1
When the partitioning is done wrt.
an uncorrelated bit?
Parititioning
done by bits
simulated
using rand
function in C.
Observe the
DoM is close
to 0, as
expected!
A Toy DPA
Consider the operation z=yxmod 256
Assume attacker knows first 4 bits of the
secret, x.
Probability of guessing the next bit of x is
½ (with no side channel information).
Now we assume, the attacker varies y and
obtains several power traces.
◦ We simulate them through Hamming Weights.
A Toy DPA
For a given y, the attacker now guesses the next
bit and computes the probable s after the 3rd
iteration (note that square and multiply starts
from bit 6 to bit 0).
Based on the LSB of y, the corresponding trace
is put into the 0 bin or 1 bin.
For every guess (there are 2 guesses) the DoM
is computed and plotted.
The correct guess is expected to provide large
DoM.
A Toy DPA
Correct Key is 0x8F
Next bit is thus 1.
DoM computed after 210 traces.
Significant difference: 0.9 vs 0.2!
Differential Power Analysis Attacks on
DES
Distinguishable
Spikes
DPA Bias (mV)
0.1
-0.1
2500
2000
1500
Time (s)
1000
500
0 Key = 39
Key = 20
Key = 41
Key = 51
Key Guess
3D Differential Plot
SBOX – 3 BIT – 3 TRACE COUNT = 4,000
Differential Power Analysis Attacks on
3-DES
0.2
Distinguishable
Spikes
DPA Bias (mV)
0.1
-0.1
-0.2
2500
2000
1500
1000
500 Key = 51
Key = 19
0 Key = 20
Time (s) Key = 39
Key Guess
3D Differential Plot
SBOX – 4 BIT – 2 TRACE COUNT = 10,000
Differential Power Analysis Attacks on
AES
Distinguishable
Spikes
DPA Bias (mV)
0.1
-0.1
2500
2000
1500
Key = 77
1000
Key = 12
500 Key = 13
Time (s) 0 Key = 113
Key Guess
3D Differential Plot
SBOX – 11 BIT – 8 TRACE COUNT = 15,000
Differential Power Analysis: A Summary
Correlation Power Analysis: An Improved DPA technique
Countering DPA
Two broad approachesare taken
◦ Make the power consumption of the device
independent of the data processed
Detached power supplies
Logic styles with a data independent power
consumption
Noise generators
Insertion of random delays
◦ Methods are costly and not in tune with
normal CAD methodologies
Countering DPA
(Second Approach)
◦ Second Approach is to randomize the
intermediate results
◦ Based on the principle that the power
consumption of the device processing
randomized data is uncorrelated to the actual
intermediate results
◦ Masking:Can be applied at the algorithm level
or at the gate level
Gate Level Masking
No wire stores a value that is correlated
to an intermediate result of the algorithm.
Process of converting an unmasked digital
circuit to a masked version can be
automated
Masked AND Gate
am a ma
bm b mb
qm q mq
q f ( a, b)
qm fˆ (am , ma , bm , mb , mq )
Masked AND Gate
q m (a b) m q
(a m m a ).(b m m b ) m q
(((a m .b m b m .m a ) (m b a m )) m a .m b ) m q
Masked AND Gate
There are 45=1024 possible input transmissions that can
occur.
It turns out that the expected value of the energy
required for the processing of q=0 and q=1 are identical.
Thus protected against DPA, under the assumption that
the CMOS gates switch only once in one clock cycles.
But we know there are glitches, and so the output of
gates swing a number of times before reaching a steady
state. Hence... the argument continues.
Masked Multiplier
Same Principle may be applied for multiplier circuits.
FAULT ATTACKS:
“WHEN A SECRET IS REVEALED, IT IS
THE FAULT OF THE MAN WHO CONFIDED
IT.”
Taken from "The Sorcerer's Apprentice Guide to Fault Attacks”, FDTC 2006
Taken from "The Sorcerer's Apprentice Guide to Fault Attacks”, FDTC 2006
Taken from "The Sorcerer's Apprentice Guide to Fault Attacks”, FDTC 2006
Taken from "The Sorcerer's Apprentice Guide to Fault Attacks”, FDTC 2006
Taken from "The Sorcerer's Apprentice Guide to Fault Attacks”, FDTC 2006
Taken from "The Sorcerer's Apprentice Guide to Fault Attacks”, FDTC 2006
Taken from "The Sorcerer's Apprentice Guide to Fault Attacks”, FDTC 2006
Fault Attacks : RSA Cipher
A generate keys:
◦ Select p,q large prime numbers (at least hundred digits) and
denote N=pq
◦ Select a small odd integer e relatively prime (only
common factor is 1) to ( N ) ( p 1)( q 1)
◦ Find integer d so that de 1 mod ( N )
◦ (e,N) – public key; (d,N) – private key
B wants to send A a message M
◦ B encrypts M using A's public key S M e mod N
M is restricted to 0 M N-1
◦ A decrypts using private key d -
S d mod N M de mod N M
Fault Attacks
Another attack – by injecting faults
◦ Vary the supply voltage – generate a spike
◦ Vary the clock frequency – generate a glitch
◦ Overheat the device
◦ Expose to intense light – camera flash or
precise laser beam
◦ Faults injected into a byte or a few bits
Fault Attacks on RSA
Only decryption is subject to attacks
Assume:
◦ 1. Attacker can flip a single bit in key d
2. S and corresponding message M known to
attacker
◦ Decryption device generates M̂ satisfying
Mˆ S 2 di
i
i mod N
M S 2 di
◦ If di 0 then Mˆ M S 2 mod N
i
◦ If di 1 then Mˆ M 1 S 2 mod N
i
Fault Attacks on AES
PLAIN TEXT PLAIN TEXT
ENCRYPTION ENCRYPTION
ALGORITHM ALGORITHM RANDOM
FAULTS
FAULT FREE FAULTY
CIPHER TEXT CIPHER TEXT
ANALYSIS
Fault Attacks on RSA
◦ Assume that the attacker flips randomly a bit in
d.
Example: (e,N)=(7,77), d=43 d5d 4 d3d 2 d1d 0 1010112
◦ Ciphertext=37 producing M=9 if no fault is
injected and Mˆ 67 if a fault is injected
◦ Search for i such that 9 (67 372 ) mod 77 i=3(d3 1)
i
since
(67 378 ) mod 77 (67 53) mod 77 9
Fault Models
M0: One Diagonal affected.
M1: Two Diagonals affected.
M2: Three Diagonals affected.
M3: Four Diagonals affected.
Propagation of Fault Induced
f f’ f’ 2f’
f’
f’
3f’
9th Round
8th Round 8th round 8th Round ByteSub
Byte Sub ShiftRow MixColumnF1
F2
F3
10th Round 10th Round 9th Round F4 9th Round
Shift Row Byte Sub MixColumn ShiftRow
A1 A2 A3 A4 A1 A2 A3 A4 2F1 F4 F3 3F2 F1
A6 A7 A8 A5 A5 A6 A7 A8 F1 F4 3F3 2F2 F2
A11 A12 A9 A10 A9 A10 A11 A12 F1 3F4 2F3 F2 F3
A16 A13 A14 A15 A13 A14 A15 A16 3F1 2F4 F3 F2 F4
Features of the proposed attack
The attack is the strongest of its class in
the literature.
Needs just one byte random fault
induction.
◦ Previous best attack reduced key space to 240.
◦ Our attack reduced to 232 (revealed in 400
sec on an Intel Xeon Server with 8 cores).
Debdeep Mukhopadhyay, An Improved Fault Based Attack of the
Advanced Encryption Standard. AFRICACRYPT 2009: LNCS 5580, pp
421-434
The work was developed as a project from NTT Laboratory, Japan.
DFA with a single Fault!
Current research shows that the AES key size
can be reduced from 232 to 28 for a single byte
fault.
The small complexity of the attack makes it
feasible on real life FPGA implementations of AES
using less sophisticated techniques, like clock
glitching.
Michael Tunstall and DebdeepMukhopadhyay, Differential Fault Analysis of the Advanced Encryption
Standard using a Single Fault, Cryptology ePrint Archive: Report 2009/575
Fault Injection Setup
Tools Used:
◦ AES Core Implemented on Xilinx Spartan 3E.
◦ Agilent Wavefrom (80 MHz)Generator
◦ Xilinx Chipscope Pro Embedded Logic Analyzer.
Experimental Results
ATTACK REGION
Dhiman Saha, Debdeep Mukhopadhyay, Dipanwita Roy Chowdhury: A Diagonal Fault Attack on
the Advanced Encryption Standard. IACR Cryptology ePrint Archive 2009: 581 (2009)
Bypassing the Counter-measures
Parity-1, Parity-16 (linear codes) and
Robust Codes (nonlinear codes) all miss
some faults!
The missed faults also can lie in the DFA-
exploitable space. •An attacker can operate at a region
where the faults are missed.
•Probability of faults getting missed
is much higher if the fault space is
not uniform.
•Can we have fault tolerance
techniques which have provable
100% security. w.r.t. all the DFA
exploitable faults?
Note that the DFA exploitable space
is quite small…
A Quick Look into Cache
Attacks
Cache Memory Leaks Information
Microprocessor
Cache
Memory
Main Memory
If there is a Cache Hit If there is a Cache Miss
◦ Access time is less ◦ Access time is more
◦ Power Consumption is ◦ Power Consumption is
less more
Cache Attacks : The Principle
P0 P1
K0 XOR K1 XOR
Sbox Sbox
Power and Time for the second access
depends on the previous sbox access.
◦ If cache hit :
P0 K 0 P1 K1
K 0 K1 P0 P1
Since we know P0 and P1, we can determine K 0 K1
…but we need to differentiate between a cache hit and
miss.
Classes of Cache Attacks
Three ways to identify cache behavior
Cache Trace Attacks
Cache Access Attacks
Cache Timing Attacks
Cache Trace Attacks
The Power Profile of the system gives away
cache behavior.
Without Cache Miss
With Cache Miss
Cache Access Attacks : Osvik’s Attack
Uses a spy program to determine cache behavior
Cache Memory
Part of the cache memory
occupied by tables
Cache Miss
Microprocessor
Cache Hit
AES
Spy
Memory
Cache Timing Attack
Based on measuring the total time for encryption.
Execution Time N h*Th N m*Tm K
Used to attack a remote server. Trigger Encryption &
Measure time taken
Network
Server running Remote Attacker
Encryption Software
Bernstein’s Cache Timing Experiment
P0 = 00 - FF Random
AES
Cipher Text
For P0 0 to 255
For 215 iterations
Vary the remaining plain texts randomly Characteristic of
Plaintext Byte P0
Invoke AES Encryption (Deviation from
Obtain Time for Encryption mean time)
Bernstein’s Cache Timing Attack
Put key to all ZEROs and
perform experiment
Repeat experiment with
unknown key
Correlate the two results
60
Why Cache Attacks Work on AES
The AES Structure
Substitute 16 bytes from a 256 byte
◦ Add initial Key lookup table
◦ Nine Rounds of
Byte Substitution
Shift Row
Mix Column
Add Round Key
◦ Final Round
Byte Substitution
Shift Row
Add Round Key
Chester Rebeiro, Mainack Mandal, Debdeep Mukhopadhyay, "Pinpointing Cache Timing Attacks on AES",
VLSID 2010,
Software Implementations of AES
(OpenSSL)
Merges all round operations into 4 lookups.
Uses 4 tables T0,T1,T2,T3 , each of size 1024 byte.
The Software Structure is
Round 1
a0 =T0[b024] ^ T1[b116] ^ T2[b28] ^ T4[b30] ^ rk[4]
a1 =T0[b124] ^ T1[b216] ^ T2[b38] ^ T4[b00] ^ rk[5]
a2 =T0[b224] ^ T1[b316] ^ T2[b08] ^ T4[b10] ^ rk[6]
a3 =T0[b324] ^ T1[b016] ^ T2[b18] ^ T4[b20] ^ rk[7]
Round 2
b0 =T0[a024] ^ T1[a116] ^ T2[a28] ^ T4[a30] ^ rk[8]
b1 =T0[a124] ^ T1[a216] ^ T2[a38] ^ T4[a00] ^ rk[9]
b2 =T0[a224] ^ T1[a316] ^ T2[a08] ^ T4[a10] ^ rk[10]
b3 =T0[a324] ^ T1[a016] ^ T2[a18] ^ T4[a20] ^ rk[11]
62
AES Execution Time big and small Tables
Cache Attack
Big Tables works because of
varying
execution time.
Small Tables Cache Attacks
Believed to be
not possible
63
Attack of Clefia: A Block cipher with
small tables
Clefia is a block cipher designed by Sony
Corporations
It has small tables of 256 bytes.
It was designed specifically to be protected against
cache timing attacks.
◦ Our project from NTT Lab, was to analyze cache
timing attacks on a Clefia code written by Sony
itself..
Clefia Attack Results
In around 226 Clefia
encryptions the cipher can be
shown to break in the face of
cache timing attacks
3 GHz Intel Core 2 Duo
32 kB L1 Cache
1 GB RAM
Linux (Ubuntu 8.04)
gcc -4.2.4 with O3
optimization.
Attack Time:
◦ First Phase (with known key):
1300 sec
◦ Second Phase (with unknown
key): 312.5 sec
Chester Rebeiro, Debdeep
Mukhopadhyay, Junko Takahashi and
The work was developed as a project from Toshinori Fukunaga, "Cache Timing
NTT Laboratory, Japan. Attacks on Clefia", Indocrypt 2009.
The attack is mentioned in website of SONY.
Correlation Results on CLEFIA
66
Can Side Channels be Eliminated by
design?
Application of DRECON
Is that the end?
-- Side Channels with Process Scaling
Mathieu Renauld, François-Xavier Standaert, Nicolas Veyrat-Charvillon, Dina
Kamel, Denis Flandre: A Formal Study of Power Variability Issues and
Side-Channel Attacks for Nanoscale Devices. EUROCRYPT 2011: 109-
128
Process Scaling and Side Channels
Process variation is a central issue in deep-
submicron technology.
Research shows that the leakage models change:
they become non-linear!
In the absence of variations, one can develop a
power model, but due to variations one need to
infer correct leakage for each chip.
Also, countermeasures assume independent
leakage
◦ But below 65nm, there would be cross-talk!
Machine Learning as a Classifier
k1 ~
. .
. .
. .
k|K | ~ k|K |
k?
Given a trace, find the key k used for the
encryption.
Develop Suitable Metrics to Certify
Chips
Lt =Φt(Sk) + Nt
SNR: Var(E[Lt|Sk])/Var(Lt-E[Lt|Sk])
Certification of true side channel security
of chips require sophisticated tools!
Thank You for Your Attention!
78