Mariusz Gil
Source Ministry
@mariuszgil
Machine Learning to a Rescue
Mariusz Gil "Machine Learning"
CLIENT PROBLEM
1M BACKLINKS
CLASSIFY THEM
OK
NOT OK
I DON’T CARE
T(URL) → [1, 2, 3, …]
IF-OLOGY
UGLY CODE
FOR POC
1ST APPROACH
I DON’T KNOW
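A minimal sketch of the first approach, assuming nothing beyond the slide: T(URL) turns a backlink into a numeric vector, and hand-written if-rules map that vector to a label. The chosen features, the thresholds, and the names url_features and classify_with_ifs are illustrative assumptions, not the code written for the client.

```python
from urllib.parse import urlparse


def url_features(url):
    """T(URL): map a backlink URL to a fixed-length numeric vector."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    return [
        len(url),                                        # total length
        host.count("."),                                 # dots in the host
        sum(ch.isdigit() for ch in host),                # digits in the host
        len([p for p in parsed.path.split("/") if p]),   # path depth
        1 if parsed.scheme == "https" else 0,            # served over HTTPS?
    ]


def classify_with_ifs(features):
    """Hand-written rules ("if-ology"); every threshold here is made up."""
    length, dots, digits, depth, https = features
    if digits > 3 or dots > 3:
        return "NOT OK"
    if https and depth <= 2:
        return "OK"
    return "I DON'T KNOW"


vec = url_features("https://example.com/blog/post")
print(vec, classify_with_ifs(vec))
```

Rules like these are quick to write for a proof of concept, but each new edge case adds another branch, which is probably what the slide means by ugly code.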
NAIVE
MACHINE LEARNING
2ND APPROACH
DATA ML
TASK
SEND TO RESULTS
CALCULATE
RECIPE FOR A FAILURE
DOING WITHOUT KNOWING IS A…
DATA ORIENTED
MACHINE LEARNING
WORKFLOW
3RD APPROACH, FINAL
A COMPUTER PROGRAM
IS SAID TO LEARN FROM EXPERIENCE E
WITH RESPECT TO SOME CLASS OF TASKS T
AND PERFORMANCE MEASURE P
IF ITS PERFORMANCE AT TASKS IN T,
AS MEASURED BY P,
IMPROVES WITH EXPERIENCE E
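The definition can be made concrete with a toy experiment. In this sketch the task T is telling two groups of numbers apart, the experience E is the number of labelled examples seen, and the performance P is accuracy on held-out data. The data is synthetic, and the nearest-centroid learner is an assumption chosen for brevity.

```python
import random


def make_samples(n, rng):
    """Synthetic 1-D data: class 0 around 0.0, class 1 around 3.0 (made up)."""
    return [(rng.gauss(3.0 * (i % 2), 1.0), i % 2) for i in range(n)]


def fit_centroids(train):
    """'Learning': remember the mean value of each class seen so far."""
    means = {}
    for label in (0, 1):
        values = [x for x, y in train if y == label]
        means[label] = sum(values) / len(values)
    return means


def accuracy(means, test):
    """Performance measure P: share of test points put in the right class."""
    hits = 0
    for x, y in test:
        predicted = min(means, key=lambda label: abs(x - means[label]))
        hits += predicted == y
    return hits / len(test)


rng = random.Random(0)
train = make_samples(200, rng)   # pool of experience E
test = make_samples(500, rng)    # held-out data used only to measure P

# Measure P after seeing more and more of E.
curve = [(n, accuracy(fit_centroids(train[:n]), test)) for n in (2, 10, 50, 200)]
for n, acc in curve:
    print(f"E = {n:3d} examples -> P = {acc:.3f}")
```

Accuracy usually rises as more examples are used, though on a single random sample it does not have to rise at every step.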
DATA ML
TASK
PREPARED, INPUT FOR
RESULTS
WITH PERFORMANCE
EXPERIENCE FEEDBACK LOOP
LEARNING, VALIDATING
ML
TASK
CLASSIFICATION
REGRESSION
CLUSTERING
DIMENSIONALITY REDUCTION
ASSOCIATION RULES
ML
METHOD
SUPERVISED LEARNING
UNSUPERVISED LEARNING
REINFORCEMENT LEARNING
REAL PROBLEM
MARCO PIVETTA
CODE REVIEWS
+--------------------------------------------------+--------+
| pull_request                                     | price  |
+--------------------------------------------------+--------+
| https://github.com/octocat/Hello-World/pull/1347 | 100.00 |
+--------------------------------------------------+--------+
| https://github.com/octocat/Hello-World/pull/1347 | 150.00 |
+--------------------------------------------------+--------+
PREDICTING VALUES
REGRESSION
WHAT IS THE PRICE OF PULL REQUEST
WE NEED TO REVIEW?
+--------------+-------+-----------+--------+
| pull_request | files | all_lines | price  |
+--------------+-------+-----------+--------+
| ...          |    10 |      1000 | 100.00 |
+--------------+-------+-----------+--------+
| ...          |    15 |      2000 | 150.00 |
+--------------+-------+-----------+--------+
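As a sketch of what regression means for this table, the snippet below fits a straight line from the files column to the price column with ordinary least squares in plain Python. Using files as the only feature and asking about a 12-file pull request are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


files = [10, 15]           # "files" column of the table
prices = [100.0, 150.0]    # "price" column of the table

slope, intercept = fit_line(files, prices)


def predict_price(n_files):
    """Price predicted by the fitted line for a pull request with n_files files."""
    return slope * n_files + intercept


print(slope, intercept, predict_price(12))
```

Two rows always lie exactly on a line, so the perfect fit here says nothing about how well the model would price real pull requests.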
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from sklearn.linear_model import LinearRegression
from sklearn.isotonic import IsotonicRegression
from sklearn.utils import check_random_state

# Synthetic data: a noisy, roughly logarithmic trend over 100 points.
n = 100
x = np.arange(n)
rs = check_random_state(0)
y = rs.randint(-50, 50, size=(n,)) + 50. * np.log(1 + np.arange(n))

# Isotonic regression: fit a non-decreasing function to the data.
ir = IsotonicRegression()
y_ = ir.fit_transform(x, y)

# Linear regression: scikit-learn expects X as a 2-D array, hence np.newaxis.
lr = LinearRegression()
lr.fit(x[:, np.newaxis], y)
RECIPE FOR A FAILURE
DON’T YOU KNOW YOUR DATA?
UDF
WORKING ON ML, APPLY
*
+--------------+-------+-----------+--------+
| pull_request | files | all_lines | price  |
+--------------+-------+-----------+--------+
| ...          |    10 |      1000 | 100.00 |
+--------------+-------+-----------+--------+
| ...          |    15 |      2000 | 150.00 |
+--------------+-------+-----------+--------+
| ...          |    15 |      2000 |  50.00 |
+--------------+-------+-----------+--------+
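The last two rows have identical features but different prices, so no function of files and all_lines can fit both. A small sketch of a check that surfaces such rows before any model is trained; the helper name find_conflicts is an assumption.

```python
from collections import defaultdict

# Rows copied from the table: (files, all_lines, price).
rows = [
    (10, 1000, 100.00),
    (15, 2000, 150.00),
    (15, 2000, 50.00),
]


def find_conflicts(rows):
    """Return feature tuples that appear with more than one distinct price."""
    prices_by_features = defaultdict(set)
    for *features, price in rows:
        prices_by_features[tuple(features)].add(price)
    return {
        features: sorted(prices)
        for features, prices in prices_by_features.items()
        if len(prices) > 1
    }


conflicts = find_conflicts(rows)
print(conflicts)
```

A non-empty result suggests that a feature is missing from the dataset. With the diff_lines column of the next table included, the same check returns an empty dictionary.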
+-----+-------+-----------+------------+--------+
| pr  | files | all_lines | diff_lines | price  |
+-----+-------+-----------+------------+--------+
| ... |    10 |      1000 |        500 | 100.00 |
+-----+-------+-----------+------------+--------+
| ... |    15 |      2000 |        700 | 150.00 |
+-----+-------+-----------+------------+--------+
| ... |    15 |      2000 |        150 |  50.00 |
+-----+-------+-----------+------------+--------+
T(PR) → [1, 2, 3, …]
FEATURES
CLASSES
INTERFACES
INHERITANCE LEVEL
METHODS CALLS
FUNCTIONS CALLS
AFFERENT COUPLING
EFFERENT COUPLING
LLOC METRIC
LCOM METRIC
…
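One way to picture T(PR) is a record with one field per metric from the list, flattened into a vector in a fixed order. The class name, the field types, and the example values below are hypothetical. How each metric is measured is not covered by the slides.

```python
from dataclasses import dataclass, astuple


@dataclass
class PullRequestMetrics:
    """Hypothetical container; field names follow the feature list."""
    classes: int
    interfaces: int
    inheritance_level: int
    method_calls: int
    function_calls: int
    afferent_coupling: int
    efferent_coupling: int
    lloc: int
    lcom: float


def to_vector(metrics):
    """T(PR): flatten the metrics into a list of floats, in field order."""
    return [float(value) for value in astuple(metrics)]


# Made-up numbers, only to show the shape of the output.
example = PullRequestMetrics(3, 1, 2, 40, 12, 5, 7, 320, 0.4)
print(to_vector(example))
```

Keeping the field order fixed matters because a learning algorithm sees only positions in the vector, not names.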
K-MEANS
CLUSTERING
HOW TO CLUSTER OUR DATASET
AUTOMATED WAY IN THE CONTEXT OF…?
IRIS DATASET
1936, RONALD FISHER
Ronald Fisher 

1936 paper

The use of multiple measurements in
taxonomic problems
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

n_samples = 1500
random_state = 170

# Three blobs with equal spread.
X, y = make_blobs(n_samples=n_samples, random_state=random_state)

# Three blobs with different spreads (standard deviations 1.0, 2.5, 0.5).
X_varied, y_varied = make_blobs(n_samples=n_samples,
                                cluster_std=[1.0, 2.5, 0.5],
                                random_state=random_state)

# Cluster the unequal-variance data into three groups.
y_pred = KMeans(n_clusters=3, random_state=random_state).fit_predict(X_varied)
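The library call hides the algorithm itself. Below is a plain-Python sketch of the two alternating k-means steps, run on the (diff_lines, price) pairs from the pull request table. Taking the first k points as initial centroids and using k=2 are simplifying assumptions; scikit-learn uses a smarter initialisation.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means; the initial centroids are simply the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# (diff_lines, price) pairs taken from the pull request table.
points = [(500, 100.0), (700, 150.0), (150, 50.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

The outcome depends on the starting centroids, which is why the scikit-learn example pins random_state. The features are also left unscaled here, so diff_lines dominates the distance.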
+-----+--------+-------+-----------+------------+--------+
| pr  | lang   | files | all_lines | diff_lines | price  |
+-----+--------+-------+-----------+------------+--------+
| ... | java   |    10 |      1000 |        500 | 100.00 |
+-----+--------+-------+-----------+------------+--------+
| ... | java   |    15 |      2000 |        700 | 150.00 |
+-----+--------+-------+-----------+------------+--------+
| ... | python |    15 |      2000 |        150 |  50.00 |
+-----+--------+-------+-----------+------------+--------+
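The lang column is text, while distance-based algorithms such as k-means need numbers. One common option, shown here as a sketch rather than the approach taken in the talk, is one-hot encoding: one 0/1 column per language. The helper name one_hot_encode is an assumption.

```python
# Feature columns copied from the table (price left out).
rows = [
    {"lang": "java", "files": 10, "all_lines": 1000, "diff_lines": 500},
    {"lang": "java", "files": 15, "all_lines": 2000, "diff_lines": 700},
    {"lang": "python", "files": 15, "all_lines": 2000, "diff_lines": 150},
]


def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        numeric = [value for key, value in row.items() if key != column]
        flags = [1 if row[column] == category else 0 for category in categories]
        encoded.append(flags + numeric)
    return categories, encoded


categories, matrix = one_hot_encode(rows, "lang")
print(categories)
for line in matrix:
    print(line)
```

Mapping java to 1 and python to 2 instead would impose an order and a distance between languages that the data does not contain.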
RESULTS STABILITY
FAST ARTIFICIAL
NEURAL NETWORK
CLASSIFICATION
HOW TO CLASSIFY OUR DATASET
AUTOMATED WAY TO FIND JUNIOR/SENIOR
DEVELOPER?
from sklearn.datasets import fetch_openml
from sklearn.neural_network import MLPClassifier

# Load MNIST (70,000 digit images, 784 pixels each) from OpenML.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)

# rescale the data, use the traditional train/test split
X = X / 255.
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]

# mlp = MLPClassifier(hidden_layer_sizes=(100, 100), max_iter=400, alpha=1e-4,
#                     solver='sgd', verbose=10, tol=1e-4, random_state=1)
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=10, alpha=1e-4,
                    solver='sgd', verbose=10, tol=1e-4, random_state=1,
                    learning_rate_init=.1)
mlp.fit(X_train, y_train)
TECHNOLOGY
FOCUS ON IDEAS
NOT TOOLS
IN 99.99% APPS
YOU WILL NOT WRITE
ALGO FROM SCRATCH
http://scikit-learn.org/stable/_static/ml_map.png
ML IS NOT
A SINGLE RUN
OF ALGORITHM
IT’S A PROCESS
ML
PROCESS
DEFINE A PROBLEM
GATHER YOUR DATA
UNDERSTAND YOUR DATA
PREPARE DATA FOR ML
SELECT & RUN ALGO(S)
TUNE ALGO(S) PARAMETERS
SELECT FINAL MODEL
VALIDATE FINAL MODEL
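The last four steps of the list can be sketched in a few lines. The example below assumes a toy k-nearest-neighbours regressor and synthetic data: it tunes k on a validation set, selects the best value, and only then measures the error on a test set that played no part in the choice.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mean_abs_error(train, data, k):
    """Mean absolute difference between predictions and true targets."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in data) / len(data)


# Synthetic (x, y) pairs: a straight line plus a repeating wobble. Made-up data.
data = [(x, 10.0 * x + (5.0 if x % 2 else -5.0)) for x in range(30)]

# Prepare: hold out separate validation and test sets.
train = [p for i, p in enumerate(data) if i % 5 < 3]
validation = [p for i, p in enumerate(data) if i % 5 == 3]
test = [p for i, p in enumerate(data) if i % 5 == 4]

# Tune: try several values of k, judged on the validation set only.
scores = {k: mean_abs_error(train, validation, k) for k in (1, 2, 4)}

# Select: keep the value with the lowest validation error.
best_k = min(scores, key=scores.get)

# Validate: touch the test set once, with the selected model.
test_error = mean_abs_error(train, test, best_k)
print(scores, best_k, test_error)
```

Keeping the test set out of the tuning loop is what makes the final number an honest estimate of how the model behaves on new data.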
ML
PROCESS
DEFINE A PROBLEM
ANALYZE YOUR DATA
UNDERSTAND YOUR DATA
PREPARE DATA FOR ML
SELECT & RUN ALGO(S)
TUNE ALGO(S) PARAMETERS
SELECT FINAL MODEL
VALIDATE FINAL MODEL
+-----+--------+-------+-----------+------------+--------+----------+
| pr  | lang   | files | all_lines | diff_lines | price  | currency |
+-----+--------+-------+-----------+------------+--------+----------+
| ... | java   |    10 |      1000 |        500 | 100.00 | USD      |
+-----+--------+-------+-----------+------------+--------+----------+
| ... | java   |    15 |      2000 |        700 | 150.00 | USD      |
+-----+--------+-------+-----------+------------+--------+----------+
| ... | python |    15 |      2000 |        150 |  50.00 | CAD      |
+-----+--------+-------+-----------+------------+--------+----------+
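Once the currency column is visible, the prices are clearly not on one scale. A sketch of normalising them before training is shown below. The exchange rate is a placeholder chosen for illustration, not a real quote, and the helper name normalize_prices is an assumption.

```python
# Hypothetical exchange rates to USD; real rates change daily and are not
# given in the slides.
RATES_TO_USD = {"USD": 1.0, "CAD": 0.75}

# Price and currency columns copied from the table.
rows = [
    {"lang": "java", "price": 100.00, "currency": "USD"},
    {"lang": "java", "price": 150.00, "currency": "USD"},
    {"lang": "python", "price": 50.00, "currency": "CAD"},
]


def normalize_prices(rows, rates):
    """Return copies of the rows with every price expressed in USD."""
    normalized = []
    for row in rows:
        if row["currency"] not in rates:
            raise ValueError(f"no rate for {row['currency']}")
        converted = dict(row)
        converted["price"] = round(row["price"] * rates[row["currency"]], 2)
        converted["currency"] = "USD"
        normalized.append(converted)
    return normalized


normalized = normalize_prices(rows, RATES_TO_USD)
for row in normalized:
    print(row)
```

Failing loudly on an unknown currency is deliberate: a silently unconverted price is the kind of data problem the earlier slides warn about.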
THANKS
mariuszgil
HAPPY LEARNING YOUR MACHINES!
