Hunting Malicious TLS Certificates
With Deep Neural Networks
Real Time SSL & TLS Abuse Detector
David Camacho – Lead Data Architect
Alejandro Correa Bahnsen – VP. Research
Common Phishing Scams
What is a Web Certificate?
Phishing TLS Increment
Images from: https://www.thesslstore.com/blog/https-phishing-green-padlock/
How Do People Recognize a Safe Web Site?
Forrester survey asked users: “Some websites receive the following
browser user interface security indicator in the browser. What do
you think the security indicator is intended to tell users?”
Secure | https://ultrabank.com
The website is safe: 82%
The website is encrypted: 75%
The website is trustworthy: 66%
The website is private: 32%
Malware Abuse of TLS
Malware free
encrypted
We want to identify malicious
certificates in real time!
Can we detect a “bad” certificate on the fly?
Certificate + URL → Malware Cert Hunter → Safe site
Certificate + URL → Malware Cert Hunter → Malicious site
Cert-Hunter Data
Data Collected:
• 1,000,000 certificates in legitimate use
• 5,000 certificates used for phishing
• 3,000 certificates used for malware
More than 90% of TLS attacks use non-validated certificates.
55% of legitimate businesses also use non-validated TLS certificates, but 100% of them use real information.
90% of malicious certificates contain common names such as:
• Example.com
• Localhost
• Domain.com
• localdomain
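The placeholder common names listed above lend themselves to a simple lookup. The following is a minimal sketch, not the presenters' implementation: the set contents come from the slide, while the function name and the case-insensitive matching are assumptions.

```python
# Placeholder common names taken from the slide; the matching rule
# (trimmed, case-insensitive, exact match) is an assumption.
PLACEHOLDER_CNS = {"example.com", "localhost", "domain.com", "localdomain"}


def has_placeholder_cn(common_name: str) -> bool:
    """Return True when the CN is one of the known placeholder values."""
    return common_name.strip().lower() in PLACEHOLDER_CNS
```

A flag like this would be one input among many, since a legitimate test server can also present a certificate issued to localhost.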
TLS Certificate Examples
Legitimate Certificates from Alexa Top Million
CN = *.stackexchange.com, O = Stack Exchange, Inc., L = New York, S = NY, C = US
Phishing Certificates from Phishtank
CN = localhost, L = Springfield
Malware Certificates from Abuse.ch, Censys.io & Rapid7
O=Dis, L=Springfield, S=Denial, C=US
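Subject strings like the ones above can be split into fields before any features are computed. This is a hedged sketch: the function name `parse_principal` is hypothetical, and the parser assumes keys are one or two letters (CN, O, OU, L, S, C), so longer attribute names are not handled.

```python
import re

# Split only on commas that are followed by a short "KEY =" token, so that
# commas inside a value (e.g. "Stack Exchange, Inc.") are preserved.
_FIELD_SPLIT = re.compile(r",\s*(?=[A-Za-z]{1,2}\s*=)")


def parse_principal(text: str) -> dict:
    """Parse 'CN = a, O = b' style strings into a {key: value} dict."""
    fields = {}
    for part in _FIELD_SPLIT.split(text):
        key, _, value = part.partition("=")
        fields[key.strip().upper()] = value.strip()
    return fields
```

Applied to the examples, the legitimate certificate yields five fields, while the phishing certificate yields only CN and L.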
Feature engineering
We created 40 features divided into 4 categories:
Boolean: Boolean matrix indicating which fields the certificate has.
SOC: Features drawn from the company’s SOC experience.
Prev_work: Features inherited from previous work (the state of the art).
Text: Statistical features extracted from the subject and issuer strings.
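Two of the four categories can be illustrated briefly. The sketch below is an assumption-laden example, not the actual 40 features: the field list, the feature names, and the choice of length, digit ratio, and Shannon entropy as the text statistics are all hypothetical.

```python
import math
from collections import Counter

# Assumed list of certificate fields to check for presence.
FIELDS = ("CN", "O", "OU", "L", "S", "C")


def boolean_features(fields: dict) -> dict:
    """One 0/1 flag per field: does the certificate populate it?"""
    return {f"has_{name}": int(bool(fields.get(name))) for name in FIELDS}


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def text_features(text: str) -> dict:
    """Simple statistics over a subject or issuer string."""
    length = len(text)
    digits = sum(ch.isdigit() for ch in text)
    return {
        "length": length,
        "digit_ratio": digits / length if length else 0.0,
        "entropy": shannon_entropy(text),
    }
```

For example, `boolean_features({"CN": "localhost", "L": "Springfield"})` sets only `has_CN` and `has_L`, which captures how sparse the phishing example is compared with the legitimate one.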
LSTM (Long Short-Term Memory)
RNN
Short term context: “The dog is…”
What comes next?
• Excited
• Hungry
• Green
• ….
• Affordable
Most probable: Hungry

RNN
Long term context: “When it sees its owner”
Short term context: “the dog is…”
What comes next?
• Excited
• Hungry
• Green
• ….
• Affordable
Most probable: Hungry

LSTM
Long term context: “When it sees its owner”
Short term context: “the dog is…”
What comes next?
• Excited
• Hungry
• Green
• ….
• Affordable
Most probable: Excited
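The difference between the two models comes from the LSTM's cell state, which a forget gate can carry across many steps. The following is a minimal NumPy sketch of the standard LSTM step, included only to show that mechanism; the weight layout and the gate ordering (input, forget, candidate, output) are assumptions of this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W has shape (4*H, D), U has shape (4*H, H), b has shape (4*H,),
    stacked as [input gate, forget gate, candidate, output gate].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # how much new information to write
    f = sigmoid(z[H:2 * H])        # how much of the old cell state to keep
    g = np.tanh(z[2 * H:3 * H])    # candidate values to write
    o = sigmoid(z[3 * H:4 * H])    # how much of the cell state to expose
    c = f * c_prev + i * g         # long-term memory
    h = o * np.tanh(c)             # short-term output
    return h, c


def run_lstm(xs, W, U, b):
    """Run the cell over a sequence, starting from zero states."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
    return h, c
```

When the forget gate is close to 1 and the input gate close to 0, the cell state passes through a step unchanged, which is how an early phrase such as “When it sees its owner” can still influence a much later prediction.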
Deep Learning Architecture
Subject Principal     Issuer Principal     Extracted Features
One hot encoding      One hot encoding
Embedding             Embedding
LSTM                  LSTM                 Dense/ReLU
Dropout               Dropout              Dropout
                      Concatenate
                      Dense/ReLU
                      Dropout
                      Dense/Logit
                      score
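The three-branch layout above could be expressed in Keras roughly as follows. This is a sketch under stated assumptions rather than the presenters' model: the vocabulary size, sequence length, layer widths, dropout rate, sigmoid output, and Adam optimizer are all hypothetical, and only the layer ordering, the 40 extracted features, and lr=0.005 come from the slides. Keras `Embedding` layers take integer character indices, which is equivalent to one-hot encoding followed by a linear projection.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_cert_model(vocab_size=100, max_len=64, n_features=40,
                     embed_dim=32, lstm_units=32, dense_units=32, dropout=0.2):
    """Two character-level LSTM branches plus a dense branch for features."""
    subject_in = keras.Input(shape=(max_len,), dtype="int32", name="subject")
    issuer_in = keras.Input(shape=(max_len,), dtype="int32", name="issuer")
    features_in = keras.Input(shape=(n_features,), name="features")

    def text_branch(x):
        # Embedding -> LSTM -> Dropout, as in the subject and issuer columns.
        x = layers.Embedding(vocab_size, embed_dim)(x)
        x = layers.LSTM(lstm_units)(x)
        return layers.Dropout(dropout)(x)

    subject = text_branch(subject_in)
    issuer = text_branch(issuer_in)

    # Dense/ReLU -> Dropout for the hand-engineered features.
    feats = layers.Dense(dense_units, activation="relu")(features_in)
    feats = layers.Dropout(dropout)(feats)

    # Concatenate -> Dense/ReLU -> Dropout -> single score in [0, 1].
    x = layers.Concatenate()([subject, issuer, feats])
    x = layers.Dense(dense_units, activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    score = layers.Dense(1, activation="sigmoid", name="score")(x)

    model = keras.Model([subject_in, issuer_in, features_in], score)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.005),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Calling `build_cert_model().summary()` prints the layer graph, which can be compared against the diagram column by column.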
Training process
lr=0.005
Malicious Cert Classification Results (Phishing)
5-Fold CV    Accuracy    Recall    Precision
Average      86.41%      83.20%    88.86%
Deviation    1.22%       3.29%     1.04%
Malicious Cert Classification Results (Malware)
5-Fold CV    Accuracy    Recall    Precision
Average      94.65%      95.09%    94.28%
Deviation    0.09%       0.29%     0.11%
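For reference, the three reported metrics and their per-fold average and deviation can be computed as in the sketch below. The function names are hypothetical, labels are assumed to be 1 for malicious and 0 for legitimate, and the use of the sample standard deviation is an assumption, since the slides do not say which deviation was reported.

```python
from statistics import mean, stdev


def fold_metrics(y_true, y_pred):
    """Accuracy, recall, and precision for one fold (1 = malicious)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


def summarize(per_fold):
    """Average and deviation of each metric across folds."""
    summary = {}
    for name in ("accuracy", "recall", "precision"):
        values = [m[name] for m in per_fold]
        summary[name] = (mean(values), stdev(values))
    return summary
```

Recall measures how many malicious certificates were caught, and precision measures how many flagged certificates were actually malicious, so the two tables can be read as a trade-off between missed attacks and false alarms.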
Takeaways
• It is possible to differentiate malicious certificates from legitimate ones because of how attackers create their certificates.
• Attackers won’t expose themselves by submitting to exhaustive validation.
• Phishers are the most sophisticated attackers because they want to look real.
Thanks
@luisdcamachog
luis.camacho@cyxtera.com
https://www.linkedin.com/in/luisdcamachog/
https://github.com/LuisDavidCamacho