Prof. Pier Luca Lanzi
Course Introduction
Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
Data Mining and Text Mining
•  Prof. Pier Luca Lanzi
Dipartimento di Elettronica, Informazione
e Bioingegneria
pierluca.lanzi@polimi.it
voice: 02 23993472
http://www.deib.polimi.it/people/lanzi
•  Office Hours
Wednesday, from 14:30 until 16:00
Course Structure
•  Introduction to basic Data Mining and Text Mining methods
(24 hours)
•  Advanced Techniques and Applications
(16 hours)
•  Final Project will involve an application to real-world data
Course Outline
•  What is Data Mining?
•  Data and knowledge representation
•  Data exploration and preparation
•  Data Mining tasks
§ Associations
§ Clustering
§ Classification
•  Advanced techniques and applications
§ Text Mining
§ Graph Mining
§ Data Streams
syllabus
Course Material
•  “Data Mining and Analysis: Fundamental Concepts and Algorithms,”
Mohammed Zaki and Wagner Meira Jr., Cambridge University Press, 2014.
http://www.dataminingbook.info
•  “Mining of Massive Datasets,” by A. Rajaraman, J. Ullman.
http://www.mmds.org
•  Course slides available on BEEP and articles distributed during the course
•  Software
§ R / RStudio (http://www.rstudio.com)
§ Python/IPython (numpy, scipy, scikit, etc.)
§ BigML (http://www.bigml.com)
§ RapidMiner/Weka (http://rapid-i.com/)
Additional Material
•  “The Elements of Statistical Learning: Data Mining, Inference, and Prediction,”
Second Edition, February 2009, Trevor Hastie, Robert Tibshirani, Jerome
Friedman (http://statweb.stanford.edu/~tibs/ElemStatLearn/)
•  “An Introduction to Data Science,” Jeffrey Stanton
https://ischool.syr.edu/media/documents/2012/3/DataScienceBook1_1.pdf
•  Ian H. Witten, Eibe Frank. “Data Mining: Practical Machine Learning Tools and
Techniques with Java Implementations” 2nd Edition.
R Help Websites
•  Quick-R
http://www.statmethods.net
•  R Cookbook
http://www.cookbook-r.com
•  R Bloggers
http://www.r-bloggers.com
•  R on Stack Overflow
http://stackoverflow.com/tags/r/info
•  Google R Style Guide
https://google-styleguide.googlecode.com/svn/trunk/Rguide.xml
http://www.kdnuggets.com/2012/08/poll-analytics-data-mining-programming-languages.html
http://xkcd.com/353/
Evaluation
•  May 2015 First Midterm (15 points)
•  June 2015 Second Midterm (18 points)
•  July 2015 Full exam for those who failed midterms
Challenges and exercises might be proposed
during the course to substitute for part of the written exam
There is also another way to pass the exam
http://www.kaggle.com
http://www.drivendata.org/
http://tunedit.org/data-competitions


DMTM 2015 - 01 Course Introduction