Learning from Biometric Fingerprints to prevent Cyber Security Threats

LEARNING FROM
BIOMETRICS
To prevent #CyberSecurity 🕵 threats
Valerio Maggio
@leriomaggio Data Scientist & Pythonista @ FBK
vmaggio@fbk.eu

SORRY, WHO?
• Post Doc Researcher
• Background in CS
• Interested in Machine & Deep Learning
• Core in Biomedicine & Environment
here
We’re looking for students for
Internship & (PhD) Thesis
• Applied Machine Learning (a.k.a. Data Science)
https://mpbalab.fbk.eu

• Geek & Nerd
• Fellow Pythonista since 2006
this is a better me !-)
SORRY, WHO?
100K points if
you get this pun !-)
github.com/leriomaggio

WHAT THE CLOUDS SAY

WHAT THE CLOUDS KEEP
SAYING…

WHAT THE CLOUDS STILL
SAY…

WHAT THE CLOUDS
FINALLY SAY!
Learning from Data
for future predictions

SUPERVISED SETTING
• Input data are accompanied by
labels the ML model can learn from
• In other words, labels are the reference the
model uses to estimate the expected
outcomes
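As a minimal sketch of the supervised idea, the snippet below predicts the label of a new point by copying the label of its closest training example (1-nearest-neighbour). The data, the labels and the function name are made up for illustration; this is not the model used later in the talk.

```python
def nearest_neighbour_predict(train_X, train_y, x):
    """Return the label of the training point closest to x (1-NN)."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Index of the training sample with the smallest distance to x.
    best = min(range(len(train_X)), key=lambda i: squared_distance(train_X[i], x))
    return train_y[best]


# Hypothetical toy data: each input comes with a label to learn from.
train_X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.1), (4.8, 5.3)]
train_y = ["A", "A", "B", "B"]

prediction = nearest_neighbour_predict(train_X, train_y, (4.9, 5.0))
print(prediction)
```

With categorical labels this is classification; with real-valued labels the same lookup would return a number, which is the regression case.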

DIGITS CLASSIFICATION
Labels are
Categories

HOUSE PRICES ESTIMATION
Labels are
Real numbers

UNSUPERVISED
SETTING
• No label is provided
• Learning directly from data
• e.g. Clustering

WHAT THE CLOUDS
FINALLY SAY!

Let’s play with
all of this!

APPLIED ML IN 5 STEPS
• Collect the Data
1. Look at the Data & Clean the Data
2. Prepare the data
3. Train your model(s)
4. Predict with your best model on unseen data
(namely: data NOT used in training)
5. Deploy your system in production

TWO COMMON FRAUDS
Account Hijacking
Card Faking

TWO COMMON FRAUDS
Account Hijacking
User Identification

KEYSTROKE DYNAMICS
Keystroke dynamics consists of analysing the way a user types by monitoring
keyboard inputs thousands of times per second and processing this data through an
algorithm, which then defines a pattern for future comparison.
Identifying an individual based on their way of typing on a physical or virtual keyboard.

KEYSTROKE DYNAMICS
Time between two key presses
Time between one press and one release
Time between one release and one press
Time between two key releases
Intuition:
Users have unique ways to
type on keyboards
(i.e. typing patterns)

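KEYSTROKE DYNAMICS
Down-Down Time: time between two consecutive key presses
Dwell Time: time between the press and the release of the same key
Flight Time: time between one release and the next press
Up-Up Time: time between two consecutive key releases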
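The four timings can be derived from press and release timestamps. The sketch below assumes each keystroke is recorded as a (key, press time, release time) triple in seconds; the `KeyEvent` class and the `keystroke_features` function are hypothetical names, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    key: str
    down: float  # press timestamp, in seconds
    up: float    # release timestamp, in seconds


def keystroke_features(events):
    """Compute dwell, down-down, up-up and flight times for a typed sequence."""
    pairs = list(zip(events, events[1:]))  # consecutive keystrokes
    return {
        "dwell": [e.up - e.down for e in events],           # press -> release, same key
        "down_down": [b.down - a.down for a, b in pairs],   # press -> next press
        "up_up": [b.up - a.up for a, b in pairs],           # release -> next release
        "flight": [b.down - a.up for a, b in pairs],        # release -> next press
    }


# Made-up timestamps for three keystrokes.
events = [KeyEvent("p", 0.00, 0.10), KeyEvent("y", 0.25, 0.33), KeyEvent("t", 0.50, 0.62)]
features = keystroke_features(events)
print(features)
```

Flight time can be negative when the next key is pressed before the previous one is released, which is common for fast typists.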

DATA COLLECTION
Collected timings: Down-Down Time, Dwell Time, Flight Time, Up-Up Time
• Dataset Statistics:
• 50 different users
• 450+ patterns each

STEP 1: LOOK AT
THE DATA AND
CLEAN THEM

UP-UP TIME - USERNAME FIELD - WEB VS APP

UP-UP TIME - PASSWORD FIELD - WEB VS APP

DWELL TIME - USERNAME FIELD - WEB VS APP

DWELL TIME - PASSWORD FIELD - WEB VS APP

DATA
CLEANING
Complexity-Invariant
Distance Measure
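For reference, the complexity-invariant distance (CID) multiplies the Euclidean distance between two equal-length series by a correction factor, the ratio between the larger and the smaller complexity estimate, where complexity is the length of the series when "stretched" into a line. The sketch below follows that textbook definition; how the talk applies it to cleaning is not stated on the slide, and the `eps` guard for flat series is an assumption of this example.

```python
import math


def complexity_estimate(series):
    """Square root of the summed squared differences between consecutive points."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(series, series[1:])))


def euclidean(q, c):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(q, c)))


def cid(q, c, eps=1e-12):
    """Complexity-invariant distance between two equal-length series."""
    if len(q) != len(c):
        raise ValueError("series must have the same length")
    ce_q, ce_c = complexity_estimate(q), complexity_estimate(c)
    low, high = min(ce_q, ce_c), max(ce_q, ce_c)
    # Both series flat: no correction. Otherwise guard against division by zero
    # (assumption of this sketch, not part of the original definition).
    factor = 1.0 if high == 0 else high / max(low, eps)
    return euclidean(q, c) * factor


q = [0.0, 1.0, 0.0, 1.0]
c = [0.0, 2.0, 0.0, 2.0]
print(cid(q, c))
```

One plausible use in a cleaning step is to flag patterns whose CID from a user's typical pattern is unusually large, but that usage is a guess.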

FEATURE SCALING (NORMALISATION)
Original
Feature Data
MinMax Scaling
Standard Scaling
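The two scalings named above can be written in a few lines. This is a plain-Python sketch on made-up dwell times; min-max scaling maps values to [0, 1], while standard scaling subtracts the mean and divides by the (population) standard deviation.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly so that min -> 0 and max -> 1."""
    low, high = min(values), max(values)
    span = high - low
    return [(v - low) / span if span else 0.0 for v in values]


def standard_scale(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]


# Hypothetical dwell times in milliseconds.
dwell_ms = [80.0, 100.0, 120.0, 140.0]
scaled_minmax = min_max_scale(dwell_ms)
scaled_std = standard_scale(dwell_ms)
print(scaled_minmax)
print(scaled_std)
```

In practice the scaling parameters (min, max, mean, standard deviation) should be computed on the training data only and then reused on unseen data.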

STEP 2: PREPARE
THE DATA
TRAIN-TEST CUT

WHAT WE REALLY DO
K-Fold Cross Validation
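A minimal sketch of how K-fold cross validation splits the data: the samples are divided into k folds, and each fold is used once as the test set while the remaining folds form the training set. The function name is hypothetical, and shuffling is left out for brevity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample ends up in the test set exactly once, so the averaged score uses all the data without ever testing on a sample the model was trained on.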

STEP 3-4: TRAIN
AND TEST ML
MODEL

Deep AutoEncoder
Encoder Decoder
…
Classification Deep Network
One AutoEncoder + FC Network
Outlier Detector (per user)
DEEPKS

Deep AutoEncoder
Encoder Decoder
DEEPKS
1. AUTOENCODER
Trained on genuine keystroke patterns
Unsupervised Machine (Deep) Learning
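The idea of training only on genuine patterns can be illustrated with a much simpler stand-in. The sketch below swaps the deep autoencoder for a linear one (PCA computed with NumPy): it encodes patterns into a small bottleneck, decodes them back, and flags a pattern as an outlier when its reconstruction error exceeds a threshold learned from genuine data. All data are synthetic, and the bottleneck size and the 99th-percentile threshold are assumptions of this example, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "genuine" patterns: 200 samples, 10 features, close to a 2-D subspace.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
genuine = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

# Fit the linear autoencoder: the top-2 principal directions act as shared
# encoder/decoder weights (stand-in for the deep encoder and decoder).
mean = genuine.mean(axis=0)
_, _, vt = np.linalg.svd(genuine - mean, full_matrices=False)
components = vt[:2]


def reconstruction_error(x):
    """Encode into the bottleneck, decode, and measure the distance to the input."""
    code = (x - mean) @ components.T
    reconstructed = code @ components + mean
    return np.linalg.norm(x - reconstructed, axis=-1)


# Threshold chosen so that about 99% of genuine patterns are accepted.
threshold = np.percentile(reconstruction_error(genuine), 99)

# A pattern that does not follow the genuine structure.
impostor = 3.0 * rng.normal(size=10)
is_outlier = reconstruction_error(impostor) > threshold
print(bool(is_outlier))
```

A deep autoencoder plays the same role with non-linear encoder and decoder layers; only the reconstruction function changes.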

Deep AutoEncoder
Encoder Decoder
DEEPKS
2. DISCRIMINATOR
Trained on genuine &
adversarial patterns

EVALUATION METRICS
Confusion Matrix
over ~5200 samples
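As a sketch of what a confusion matrix summarises in this setting, the snippet below counts the four outcomes for a binary genuine (1) versus impostor (0) decision and derives accuracy plus the false-accept and false-reject rates. The labels are made up, and the choice of these three metrics is an assumption; the slide does not say which ones were reported.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = genuine, 0 = impostor)."""
    counts = Counter(zip(y_true, y_pred))
    return {
        "tp": counts[(1, 1)],  # genuine accepted
        "fn": counts[(1, 0)],  # genuine rejected
        "fp": counts[(0, 1)],  # impostor accepted
        "tn": counts[(0, 0)],  # impostor rejected
    }


def summary(cm):
    """Derive rates; assumes at least one genuine and one impostor sample."""
    total = sum(cm.values())
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total,
        "false_accept_rate": cm["fp"] / (cm["fp"] + cm["tn"]),
        "false_reject_rate": cm["fn"] / (cm["fn"] + cm["tp"]),
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
cm = confusion_counts(y_true, y_pred)
metrics = summary(cm)
print(cm)
print(metrics)
```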

SAMPLE
SIZE TEST
Q: How many patterns do I
need to be confident about the
accuracy of the model?
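One textbook way to reason about this question is the normal approximation for a proportion: for a desired margin of error e at a given confidence level, n = z² · p(1 − p) / e². The sketch below uses that formula; it is not necessarily the test performed in the talk, and the default p = 0.5 is the worst-case assumption.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def required_samples(margin, confidence=0.95, expected_accuracy=0.5):
    """Samples needed so the accuracy estimate is within +/- margin."""
    z = z_value(confidence)
    p = expected_accuracy
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def margin_of_error(n_samples, confidence=0.95, expected_accuracy=0.5):
    """Half-width of the confidence interval for a given test-set size."""
    z = z_value(confidence)
    p = expected_accuracy
    return z * math.sqrt(p * (1 - p) / n_samples)


n_needed = required_samples(margin=0.05)
margin_5200 = margin_of_error(5200)
print(n_needed, round(margin_5200, 4))
```

Under these assumptions, a test set of roughly 5200 samples pins the accuracy down to within about one and a half percentage points at 95% confidence.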

Feature Importance
rf.fit(X,y_DL)
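The call above suggests fitting a random forest and reading its feature importances. The sketch below assumes `rf` is a scikit-learn `RandomForestClassifier` and that `y_DL` holds labels produced by the deep learning model; both are guesses from the variable names. The data are synthetic, with only the first feature carrying signal.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic feature matrix: 300 samples, 4 features.
X = rng.normal(size=(300, 4))
# Stand-in for the deep model's labels: they depend on feature 0 only.
y_DL = (X[:, 0] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y_DL)

# Impurity-based importances, one value per feature, summing to 1.
importances = rf.feature_importances_
ranking = np.argsort(importances)[::-1]  # most important feature first
print(ranking, importances.round(3))
```

Fitting an interpretable model on the deep model's outputs is one way to see which timing features drive its decisions.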

STEP 5: DEPLOY
YOUR SOLUTION

[Diagram: deployment architecture, with numbered steps 1-12]
Components: Data Collector, Feature Extraction, Feature Database, Feature Detection Orchestration, Model Training Service, Models Database, Model Service, Alarms Database, Alarms Dashboard, SOC
Data flows: Raw Data, Features, Features + Labels, Labels, Models, Prediction Request, Score, Alarm, Confirmation/Rejection

API Engine
Feature
extractor
DL Model
{json}
Raw data, features,
predictions
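A minimal sketch of the request flow suggested by this slide: a JSON payload with raw keystroke data goes in, features are extracted, a model scores them, and a JSON document with raw data, features and prediction comes out. Every name here (`handle_request`, the `events` field, the payload layout, the dummy model) is hypothetical; the slide does not describe the actual API.

```python
import json


def extract_features(raw_events):
    """Hypothetical feature extractor: dwell time of each keystroke."""
    return [event["up"] - event["down"] for event in raw_events]


def handle_request(body, model):
    """Parse a JSON request, extract features, score them, return JSON."""
    raw_events = json.loads(body)["events"]
    features = extract_features(raw_events)
    prediction = model(features)
    return json.dumps(
        {"raw_data": raw_events, "features": features, "prediction": prediction}
    )


def dummy_model(features):
    """Stand-in for the DL model: always returns the same score."""
    return 0.5


request_body = json.dumps({"events": [{"key": "a", "down": 0.0, "up": 0.5}]})
response = json.loads(handle_request(request_body, dummy_model))
print(response)
```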

EUROSCIPY 2018
Fondazione Bruno Kessler | Associazione Python Italia
University of Trento
Northern Italy | Trentino Region
Tentative dates: Aug. 28 - Sept. 01, 2018
Stay posted on euroscipy.org

trento.python.it
Next Meetup: Feb 22, 2018 - 19:00 ➡ @Clab

SHAMELESS
SELF
PROMOTION
https://github.com/leriomaggio/deep-learning-keras-tensorflow

THANK YOU!
🍻
Now it’s time for Cheers
🥓
@leriomaggio
vmaggio@fbk.eu
