Introduction to
Machine Learning
Max Kleiner 10.2018
ML Agenda
http://www.softwareschule.ch/examples/machinelearning.jpg
• 4 cases with 5 scripts
  – Data Reduction - EKON22_PCA_1.py
  – Regression - EKON22_REG_2.py
  – Clustering - EKON22_CLU_3.py
  – Classification - EKON22_CLA_4.py
  – Decision Tree, Random Forest - EKON32_DET_5.py
• Cluster & classify with different inputs, algorithms, and configurations
  – Define labels, features or topic ratings, hyper-parameters, tests
  – Assumed/implicit labels, predict versus target, random state
• Conclusions / ML process summary
PCA (Principal Component Analysis)
C:\maXbox\EKON22\EKON22_scripts\EKON22_PCA_1.py
https://www.springboard.com/blog/data-mining-python-tutorial/
http://playground.tensorflow.org_maXbox2
Visualizing 2- or 3-dimensional data is not that challenging.
However, even the Iris dataset has 4 dimensions. Use PCA to reduce the
4-dimensional data to 2 or 3 dimensions so that you can plot and understand it better.
Use StandardScaler to bring the features onto unit scale (mean = 0, variance = 1);
PCA is sensitive to feature scale, so this is a requirement for optimal performance.
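The scaling and reduction steps can be sketched as follows. This is a minimal example that assumes scikit-learn is installed; the variable names (`X_std`, `X_2d`) are illustrative and are not taken from EKON22_PCA_1.py.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()

# Standardize the 4 features to mean 0 and variance 1
X_std = StandardScaler().fit_transform(iris.data)

# Project the 4-dimensional data onto the first 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)

print(X_2d.shape)                      # (150, 2)
print(pca.explained_variance_ratio_)   # share of variance kept per component
```

The two columns of `X_2d` can be passed straight to a scatter plot, colored by `iris.target`.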
Topic IRIS Classify Task
https://sebastianraschka.com/images/blog/2015/principal_component_analysis_files/iris.png
CASSANDRASystem
Regression and Correlation
@C:\maXbox\EKON22\EKON22_scripts\EKON22_REG_2.py
2. C:maXboxmX46210ntwdblib.dllUnsharpDetector-masterUnsharpDetector-masterinference_gui.py
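As a hedged illustration of the slide topic, the sketch below computes a Pearson correlation and a least-squares line with NumPy. The choice of the two Iris petal features is an assumption made for this example and is not taken from EKON22_REG_2.py.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
petal_length = iris.data[:, 2]
petal_width = iris.data[:, 3]

# Pearson correlation coefficient between the two features
r = np.corrcoef(petal_length, petal_width)[0, 1]

# Least-squares line: petal_width ~ slope * petal_length + intercept
slope, intercept = np.polyfit(petal_length, petal_width, 1)

print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A correlation close to 1 says the two features carry largely the same information, which is one reason a later data-reduction step can pay off.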
From Correlation to 4 Dim Cluster
https://www.soovle.com/
https://answerthepublic.com/reports/
Finding the question is often more important than finding the answer - John Tukey
Clustering
from module import class
@C:\maXbox\EKON22\EKON22_scripts\EKON22_CLU_3.py
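A minimal clustering sketch, assuming k-means on the Iris data (the deck names k-means in its process chain; the parameters `n_clusters=3`, `n_init=10` and `random_state=20` are choices made for this example, not values from EKON22_CLU_3.py). It follows the "from module import class" pattern and compares the predicted clusters with the known target.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

iris = datasets.load_iris()

# Unsupervised: the labels in iris.target are NOT used for fitting
kmeans = KMeans(n_clusters=3, n_init=10, random_state=20)
clusters = kmeans.fit_predict(iris.data)

# Cluster ids are arbitrary, so compare with a permutation-invariant score
ari = adjusted_rand_score(iris.target, clusters)
print("cluster sizes:", sorted(int((clusters == k).sum()) for k in range(3)))
print("adjusted Rand index vs. species:", round(ari, 3))
```

The adjusted Rand index is used instead of plain accuracy because cluster 0 need not correspond to class 0.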
GEO Cluster Story
An agent or probe that collects threat data from the security sensor and correlation
middleware. A console and associated database for managing the solution and its alerts.
https://www.esecurityplanet.com/views/article.php/1501001/Security-Threat-Correlation-The-Next-Battlefield.htm
IRIS Classification Concept
from sklearn import datasets, metrics, tree

iris = datasets.load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
# Predict on the same data the tree was trained on (train accuracy only)
y_pred = clf.predict(iris.data)
print('Train accuracy_score:', metrics.accuracy_score(iris.target, y_pred))
Demo in VSCode / maXbox4
C:\maXbox\softwareschule\MT-HS12-05\mentor_xml\casra2017\crawler\plot_iris_dataset_mx.py
@C:\maXbox\EKON22\EKON22_scripts\EKON22_CLA_4.py
IRIS Confusion Matrix
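A confusion matrix for the Iris classifier can be sketched like this. The 70/30 split and `random_state=20` mirror the values used elsewhere in this deck; applying them to a decision tree here is an assumption made for illustration.

```python
from sklearn import datasets, metrics, tree
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=20)

clf = tree.DecisionTreeClassifier(random_state=20).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = metrics.confusion_matrix(y_test, y_pred)
print(iris.target_names)
print(cm)
```

The diagonal holds the correctly classified samples; every off-diagonal entry is one kind of mix-up between two species.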
IRIS Decision Tree
@C:\maXbox\EKON22\EKON22_scripts\EKON23_DET_5.py
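The sketch below prints the rules of a small decision tree and the feature importances of a random forest on the Iris data. It is not the content of the script above; `max_depth=2`, `n_estimators=100` and `random_state=20` are assumptions chosen to keep the output short and repeatable.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

iris = datasets.load_iris()

# A shallow tree keeps the printed rules readable
dtree = DecisionTreeClassifier(max_depth=2, random_state=20)
dtree.fit(iris.data, iris.target)
rules = export_text(dtree, feature_names=list(iris.feature_names))
print(rules)

# A random forest combines many trees grown on bootstrap samples
forest = RandomForestClassifier(n_estimators=100, random_state=20)
forest.fit(iris.data, iris.target)
for name, importance in zip(iris.feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.2f}")
```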
MongoDB My Cluster sacred.runs & completed
Task II
What's behind the test? (backend pattern, crossentropy)
60000/60000 [==============================] - 426s 7ms/step - loss: 0.4982 - acc: 0.8510 - val_loss: 0.0788 - val_acc: 0.9749
Using TensorFlow backend.
INFO - MNIST-Convnet4 - Result: 0.9749
INFO - MNIST-Convnet4 - Completed after 0:07:27
Test loss: 0.0788029053777
Test accuracy: 0.9749

59392/60000 [============================>.] - ETA: 5s - loss: 0.0571 - acc: 0.9829
59520/60000 [============================>.] - ETA: 3s - loss: 0.0572 - acc: 0.9829
59648/60000 [============================>.] - ETA: 2s - loss: 0.0572 - acc: 0.9829
59776/60000 [============================>.] - ETA: 1s - loss: 0.0572 - acc: 0.9829
59904/60000 [============================>.] - ETA: 0s - loss: 0.0573 - acc: 0.9829
60000/60000 [==============================] - 513s 9ms/step - loss: 0.0573 - acc: 0.9829 - val_loss: 0.0312 - val_acc: 0.9891
INFO - MNIST-Convnet4 - Result: 0.9891
INFO - MNIST-Convnet4 - Completed after 0:33:28
Test loss: 0.0311644290059
Test accuracy: 0.9891
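To make the loss values in the log easier to read, here is a small NumPy sketch of categorical cross-entropy, assuming that is the loss the slide title refers to. The function name and the two-sample toy data are hypothetical and are not part of the MNIST run.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-12):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))

# Two hypothetical samples with three classes each
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
confident = np.array([[0.9, 0.05, 0.05],
                      [0.1, 0.8, 0.1]])
unsure = np.array([[0.4, 0.3, 0.3],
                   [0.3, 0.4, 0.3]])

print(categorical_crossentropy(y_true, confident))  # lower loss
print(categorical_crossentropy(y_true, unsure))     # higher loss
```

Both toy predictions pick the right class, so their accuracy is identical; only the loss shows how confident the model is, which is why the log reports both numbers.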
What's behind the code? (Classification Summary)
>>> from sklearn import datasets, metrics, svm
>>> from sklearn.model_selection import train_test_split
>>> iris = datasets.load_iris()
>>> X = iris.data[:, 1:3]   # columns 1 and 2: sepal width, petal length
>>> y = iris.target
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.3, random_state=20)
>>> classifier = svm.SVC(kernel='linear', C=1.0)
>>> classifier.fit(X_train, y_train)
>>> y_pred = classifier.predict(X_test)
>>> print("Test - Accuracy SVC:", metrics.accuracy_score(y_test, y_pred))
Test - Accuracy SVC: 0.9555555555555556
https://www.programcreek.com/python/example/103267/keras.datasets.mnist.load_data
What's behind the code II? (Check for duplicates)
>>> import numpy as np
>>> import pandas as pd
>>> y_test
array([0, 1, 1, 2, 1, 1, 2, 0, 2, 0, 2, 1, 2, 0, 0, 2, 0, 1, 2, 1, 1, 2,
       2, 0, 1, 1, 1, 0, 2, 2, 1, 1, 0, 0, 0, 2, 1, 0, 1, 2, 1, 2, 0, 1, 1])
>>> unique, counts = np.unique(y_test, return_counts=True)
>>> dict(zip(unique, counts))
{0: 13, 1: 18, 2: 14}
>>> Xyt = np.column_stack((X_test, y_test))    # features plus label per row
>>> csort = Xyt[Xyt[:, 2].argsort()]           # sort rows by class label
>>> dfiris = pd.DataFrame(csort)
>>> dfiris[0:13].groupby([0, 1]).size()        # the 13 rows of class 0
3.0  1.1    1
     1.4    2
3.1  1.5    2
3.2  1.2    1
     1.4    1
3.4  1.4    1
     1.6    1
     1.7    1
3.5  1.4    1
3.7  1.5    1
3.8  1.6    1
>>> sum(dfiris[0:13].groupby([0, 1]).size() > 1)   # feature pairs that occur more than once
2
https://github.com/pinae/Sacred-MNIST/blob/master/train_convnet.py
What's behind Python: PIP3 Install
pip3 install sacred
Collecting sacred
  Downloading https://files.pythonhosted.org/packages/2d/86/7be3afa4d4c1c0c76a5de03e5ff779797ab2654e377685255c11c13c0ea5/sacred-0.7.3-py2.py3-none-any.whl (82kB)
Collecting pymongo
  Downloading https://files.pythonhosted.org/packages/46/39/b9bb7fed3e3a0ea621a1512a938c105cd996320d7d9894d8239ca9093340/pymongo-3.6.1-cp36-cp36m-win_amd64.whl (291kB)
    100% |████████████████████████████████| 296kB 728kB/s
Installing collected packages: pymongo
Successfully installed pymongo-3.6.1
Machine Learning Process Chain
• Collab (set a control thesis, understand the problem, get resources such as Python)
• Collect (scrape data with Scrapy, store, data mining, filter out inconsistent or incomplete data)
• Consolidate or Clean data (normalization and aggregation, PCA data reduction, regression,
  filters, slice out irrelevant or ambiguous data, fix Unicode character-mapping problems)
• Cluster (k-means for categories, collocates for N keywords; unsupervised)
• Classify (SVM, Sequential, Bayes; supervised)
• Conclude and Control (predict or report on the context thesis and drive data to decision)
http://www.softwareschule.ch/examples/machinelearning.jpg
https://maxbox4.wordpress.com/code/
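Part of the process chain can be sketched as a single scikit-learn pipeline. This is an illustration under stated assumptions: the built-in Iris data stands in for the Collect step, and the chosen estimators (StandardScaler, PCA with 2 components, a linear SVC) and 5-fold cross-validation are example choices, not the author's scripts.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()  # Collect: a ready-made dataset stands in for scraping

# Consolidate (scale, reduce) and Classify (SVM) chained into one estimator
model = make_pipeline(StandardScaler(), PCA(n_components=2),
                      SVC(kernel='linear', C=1.0))

# Control: 5-fold cross-validation instead of a single train/test split
scores = cross_val_score(model, iris.data, iris.target, cv=5)
print("accuracy per fold:", scores.round(3))
print("mean accuracy:", round(scores.mean(), 3))
```

Wrapping the steps in a pipeline means the scaler and PCA are refit on each training fold, so no information from the held-out fold leaks into preprocessing.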
https://www.kaggle.com/
Similarity of doc a to doc b:
$$\mathrm{sim}(a,b) = \frac{\sum_{j \in \text{word}} v(a,j)\, v(b,j)}{\sqrt{\sum_{j'} v(a,j')^{2}}\;\sqrt{\sum_{j'} v(b,j')^{2}}}$$
with v(a,j) the value of word j in document a.
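Document similarity of this kind is commonly computed as the cosine between word-count vectors. The sketch below assumes that measure and uses plain word counts as the word values; the function name and the sample sentences are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    va, vb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    # Only words present in both documents contribute to the dot product
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)

print(cosine_similarity("machine learning with python",
                        "deep learning with python"))   # 0.75
print(cosine_similarity("machine learning", "security threat"))  # 0.0
```

Three of four words are shared in the first pair, giving 3 / (2 * 2) = 0.75; the second pair shares no words and scores 0.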
THE TEST OVERVIEW
Double Trouble with ML → Stackexchange, Stackoverflow

  No. of URLs removed               76,732,515
+ No. of robots.txt requests         3,675,634
- No. of excluded URLs               3,050,768
= No. of HTTP requests              77,357,381
  HTTP requests without response     1,763,850

Status        Description
QUEUED        The run was just queued and not run yet
RUNNING       Currently running (but see below)
COMPLETED     Completed successfully
FAILED        The run failed due to an exception
INTERRUPTED   The run was cancelled with a KeyboardInterrupt
TIMED_OUT     The run was aborted using a TimeoutInterrupt
[custom]      A custom sacred.utils.SacredInterrupt occurred

File "C:\Users\max\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\metrics\cluster\unsupervised.py", line 254, in calinski_harabaz_score
    intra_disp += np.sum((cluster_k - mean_k) ** 2)
MemoryError
https://stats.stackexchange.com/
QUESTIONS?
17:45 - 18:30: Machine Learning II, Artificial Neural Network
Best book in my opinion:
Mastering Machine Learning with Python in Six Steps: A Practical Implementation Guide to Predictive Data Analytics Using Python


EKON22 Introduction to Machinelearning
