MACHINE LEARNING IN PHP
The roots of education are bitter, but the fruit is sweet
PHPtek, Saint Louis, MO, USA, 2016
Agenda
• How to teach tricks to your PHP
• Application : searching for code in comments
• Complex learning
Speaker
• Damien Seguy
• Exakat CTO
• Static analysis of PHP code
Machine Learning
• Teaching the machine
• Supervised learning : learning then applying
• The application builds its own model : training phase
• It then applies that model to real cases : applying phase
Applications
• Play go, chess, tic-tac-toe and beat everyone else
• Fraud detection and risk analysis
• Automated translation or automated transcription
• OCR and face recognition
• Medical diagnostics
• Walk, welcome guests at hotels, play football
• Finding good PHP code
PHP Applications
• Recommendations systems
• Predicting user behavior
• SPAM
• Converting users to customers
• ETA
• Detect code in comments
Real use case
• Identify code in comments
• Classic problem
• Good problem for machine learning
• Complex, no simple solution
• A lot of data and expertise are available
Supervised Training
[Diagram: history data → training → model; real data → model → results]
The Fann Extension
• ext/fann (https://pecl.php.net/package/fann)
• Fast Artificial Neural Network
• http://leenissen.dk/fann/wp/
• Neural networks in PHP
• Works on PHP 7, thanks to the hard work of Jakub Zelenka
• https://github.com/bukka/php-fann
NEURAL NETWORKS
• Imitation of nature
• Input layer
• Output layer
• Intermediate layers
Initialisation
<?php
// Total number of layers: input + hidden + output
$num_layers  = 3;
$num_input  = 5;
$num_neurons_hidden = 3;
$num_output  = 1;
$ann = fann_create_standard($num_layers, $num_input, $num_neurons_hidden, $num_output);
// Activation function
fann_set_activation_function_hidden($ann, FANN_SIGMOID_SYMMETRIC);
fann_set_activation_function_output($ann, FANN_SIGMOID_SYMMETRIC);
Preparing data
Raw data → Extract → Filter → Human review → Fann ready
Expert at work
// Test if the if is in a compressed format
// none need yet
// There is a parser specified in `Parser::$KEYWORD_PARSERS`
// $result should exist, regardless of $_message
// $a && $b and multidimensional
// numGlyphs + 1
// TODO : fix this; var_dump($var);
// if(ob_get_clean()){
//$annots .= ' /StructParent ';
// $cfg['Servers'][$i]['controlpass'] = 'pmapass';
Input vector
• 'length' : size of the comment
• 'countDollar' : number of $
• 'countEqual' : number of =
• 'countObjectOperator' : number of -> operators ($o->p)
• 'countSemicolon' : number of semi-colon ;
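The five characteristics above can be computed with plain string functions. The sketch below is one possible body for the makeVector() helper that is called later in the talk; the slides do not show its implementation, so everything inside the function is an assumption. Note that counting `=` this way also counts `==` and `=>`.

```php
<?php
/**
 * Build the five-value input vector for one comment.
 * Hypothetical implementation: the talk names makeVector()
 * but does not show its body.
 */
function makeVector(string $comment): array
{
    return [
        strlen($comment),              // 'length'
        substr_count($comment, '$'),   // 'countDollar'
        substr_count($comment, '='),   // 'countEqual'
        substr_count($comment, '->'),  // 'countObjectOperator'
        substr_count($comment, ';'),   // 'countSemicolon'
    ];
}

$prose = makeVector('// $x[3] or $x[] and multidimensional');
$code  = makeVector('//$gvars = $this->getGraphicVars();');
```

The values are kept in the same order as the list above, since the network only sees positions, not names.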
Input data
46 5 1
825 0 0 0 1
0
37 2 0 0 0
0
55 2 2 0 1
1
61 2 1 3 1
1
...
 * This file is part of Exakat.
 *
 * Exakat is free software: you can redist
 * it under the terms of the GNU Affero Ge
 * the Free Software Foundation, either ve
 * (at your option) any later version.
 *
 * Exakat is distributed in the hope that 
 * but WITHOUT ANY WARRANTY; without even 
 * MERCHANTABILITY or FITNESS FOR A PARTIC
 * GNU Affero General Public License for m
 *
 * You should have received a copy of the 
 * along with Exakat.  If not, see <http:/
 *
 * The latest code can be found at <http:/
 *
*/
// $x[3] or $x[] and multidimensional
//if ($round == 3) { die('Round '.$round);
//$this->errors[] = $this->language->get('
Number of samples
Number of input values per sample
Number of output values per sample
1 5 1
37 2 0 0 0
0
// $x[3] or $x[] and multidimensional
ext/Fann
It's a comment
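The file layout shown here (one header line, then an input line and an output line per sample) is easy to generate from reviewed data. The following is a minimal sketch, assuming the samples are already available as pairs of input and output arrays; formatTrainingData() is a hypothetical helper name, not part of ext/fann.

```php
<?php
/**
 * Format labelled samples as FANN training-file text.
 * Hypothetical helper: $samples is a list of [inputs, outputs] pairs.
 */
function formatTrainingData(array $samples): string
{
    if ($samples === []) {
        throw new InvalidArgumentException('At least one sample is required.');
    }
    [$firstInputs, $firstOutputs] = $samples[0];
    // Header: number of samples, inputs per sample, outputs per sample.
    $lines = [sprintf('%d %d %d', count($samples), count($firstInputs), count($firstOutputs))];
    foreach ($samples as [$inputs, $outputs]) {
        $lines[] = implode(' ', $inputs);   // input vector
        $lines[] = implode(' ', $outputs);  // expected answer
    }
    return implode("\n", $lines) . "\n";
}

$data = formatTrainingData([
    [[825, 0, 0, 0, 1], [0]],
    [[37, 2, 0, 0, 0], [0]],
    [[55, 2, 2, 0, 1], [1]],
]);
// file_put_contents('incoming.data', $data);
```

The write to 'incoming.data' is left commented out so the sketch has no side effects.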
Training
$max_epochs  = 500000;
$epochs_between_reports = 1000; // assumed value: not defined on the slide
$desired_error  = 0.001;
// the actual training
if (fann_train_on_file($ann, 'incoming.data', $max_epochs, $epochs_between_reports, $desired_error)) {
        fann_save($ann, 'model.out');
}
fann_destroy($ann);
?>
TRAINING
• 47 cases
• 5 characteristics
• 3 hidden neurons
• + 5 input + 1 output
• Duration : 5.711 s
Application
[Diagram: history data → training → model; real data → model → results]
Application
<?php 
$ann = fann_create_from_file('model.out'); 
$comment = '//$gvars = $this->getGraphicVars();';
$input = makeVector($comment);
$results = fann_run($ann, $input); 
if ($results[0] > 0.8) { 
     print "\"$comment\" -> $results[0] \n"; 
} 
?>
Results > 0.8
• Answer between 0 and 1
• Observed values range from -14 to 0.999
• The closer to 1, the more likely it is code. The closer to 0, the more likely it is not.
• Is this a percentage? Is this a carrot count?
• It's a mix of counts…
REAL CASES
• Tested on 14093 comments
• Duration 68.01ms
• Found 1960 issues (14%)
0.99999893
// $cfg['Servers'][$i]['controlhost'] = '';    
0.99999928
//$_SESSION['Import_message'] = $message->getDisplay();    
0.99999928
/*
if (defined('SESSIONUPLOAD')) {
    // write sessionupload back into the loaded PMA session
    $sessionupload = unserialize(SESSIONUPLOAD);
    foreach ($sessionupload as $key => $value) {
        $_SESSION[$key] = $value;
    }
    // remove session upload data that are not set anymore
    foreach ($_SESSION as $key => $value) {
        if (mb_substr($key, 0, mb_strlen(UPLOAD_PREFIX))
            == UPLOAD_PREFIX
            && ! isset($sessionupload[$key])
        ) {
            unset($_SESSION[$key]);
        }
    }
0.98780382
//LEAD_OFFSET = (0xD800 - (0x10000 >> 10)) = 55232    
0.99361396
// We have server(s) => apply default configuration
    
0.98383027
// Duration = as configured    
0.99999928
// original -> translation mapping    
0.97590065
// = (   59 x 84   ) mm  = (  2.32 x 3.31  ) in    
Found by FANN vs. Target
True positive    False positive
True negative    False negative
0.99999923
// $cfg['Servers'][$i]['table_coords'] = 'pma__tabl
0.73295981
//(isset($attribs['height'])?$attribs['height']: 1)
0.99999851
// if ($key != null) did not work for index "0"    
0.2104115
// the PASSWORD() function    
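The four outcomes can be counted once a human has reviewed a batch of scored comments. This is a minimal sketch, assuming the 0.8 threshold used in the application code; classifyOutcome() and tally() are hypothetical helpers, and the true/false verdicts attached to the four scores are illustrative assumptions, not data from the talk.

```php
<?php
/** Name the outcome for one comment, given its score and the human verdict. */
function classifyOutcome(float $score, bool $isCode, float $threshold = 0.8): string
{
    $flagged = $score > $threshold; // reported by the network as code
    if ($flagged) {
        return $isCode ? 'true positive' : 'false positive';
    }
    return $isCode ? 'false negative' : 'true negative';
}

/** Count outcomes over a list of [score, isCode] pairs. */
function tally(array $reviewed, float $threshold = 0.8): array
{
    $counts = [
        'true positive'  => 0,
        'false positive' => 0,
        'true negative'  => 0,
        'false negative' => 0,
    ];
    foreach ($reviewed as [$score, $isCode]) {
        $counts[classifyOutcome($score, $isCode, $threshold)]++;
    }
    return $counts;
}

// Scores from the slide; the verdicts are assumed for illustration.
$counts = tally([
    [0.99999923, true],
    [0.73295981, true],
    [0.99999851, false],
    [0.2104115, false],
]);
```

Tracking these counts across training runs gives a simple way to see whether a change reduces false positives without losing real code.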
RESULTS
• 1960 issues
• More than 50% false positives
• After an easy cleanup, 822 issues reported
• 14k comments, analyzed in 68 ms (367ms in PHP5)
• Total time of coding : 27 mins.
// = (   59 x 84   ) mm  = (  2.32 x 3.31  ) in    
/* vim: set expandtab sw=4 ts=4 sts=4: */
Learn better, not harder
• Better training data
• Improve characteristics
• Configure the neural network
• Change algorithm
• Automate learning
• Update constantly
[Diagram: history data → training → model; real data → model → results; feedback from results]
Better training data
• More data, more data, more data
• Varied situations, real case situations
• Include specific cases
• Experience is essential
• https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
Improve characteristics
• Add new characteristics
• Remove the ones that are less useful
• Find the right set of characteristics
Network Configuration
• Input vector
• Intermediate neurons
• Activation function
• Output vector
[Chart: time of training (ms), x-axis 1 to 10, series: 1 layer, 2 layers, 3 layers, 4 layers]
Change algorithm
• Add more data first, before changing the algorithm
• Try cascade2 algorithm from FANN
• 0.6 => 0 found
• 0.5 => 2 found
• Not found by the first algorithm
Finding the BEST
• Test with 2-4 layers, 10 neurons
• Measure results
[Chart: x-axis 1 to 13, series: 1 layer, 2 layers, 3 layers, 4 layers]
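Such a search can be scripted by listing candidate layouts and timing the training of each. The sketch below is an assumption-heavy illustration: candidateTopologies() is a hypothetical helper, "layers" is read as the number of hidden layers, and the training loop only runs when ext/fann and an 'incoming.data' file are present.

```php
<?php
/**
 * List candidate layer layouts: every combination of hidden-layer count
 * and neurons per hidden layer, framed by the input and output sizes.
 */
function candidateTopologies(int $inputs, int $outputs, array $hiddenLayerCounts, array $neuronCounts): array
{
    $topologies = [];
    foreach ($hiddenLayerCounts as $hiddenLayers) {
        foreach ($neuronCounts as $neurons) {
            $topologies[] = array_merge(
                [$inputs],
                array_fill(0, $hiddenLayers, $neurons),
                [$outputs]
            );
        }
    }
    return $topologies;
}

$topologies = candidateTopologies(5, 1, [1, 2], [3, 10]);

// Training needs ext/fann and a training file, so it is guarded.
if (function_exists('fann_create_standard_array') && is_file('incoming.data')) {
    foreach ($topologies as $layers) {
        $start = microtime(true);
        $ann = fann_create_standard_array(count($layers), $layers);
        fann_train_on_file($ann, 'incoming.data', 500000, 1000, 0.001);
        printf("%s: %.0f ms\n", implode('-', $layers), (microtime(true) - $start) * 1000);
        fann_destroy($ann);
    }
}
```

Training time alone is not enough to pick a winner; each trained network should also be measured against reviewed comments.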
DEEP LEARNING
• Chaining the neural networks
• Auto-encoders
• Unsupervised Learning
• Genetic algorithms, ant colony, random forests, naive Bayes
Other tools
• PHP ext/fann
• R language
• https://github.com/kachkaev/php-r
• Scikit-learn
• https://github.com/scikit-learn/scikit-learn
• Mahout
• https://mahout.apache.org/
Conclusion
• Machine learning is about data, not code
• There are tools to use it with PHP
• Fast to try, easy results or fast fail
• Use it for complex problems that can tolerate errors
@exakat
https://joind.in/talk/a5b3a
THANK YOU!
