Advanced Data Analytics: Introduction

                Jeffrey Stanton
         School of Information Studies
             Syracuse University
Introduction to Advanced Analytics Course
Kilo, Mega, Giga, Tera, Peta, Exa
                   Zetta = 10^21 bytes
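The prefix ladder above can be made concrete with a small conversion helper. This is a minimal sketch, assuming decimal (SI) prefixes where each step is a factor of 1,000; the function names `to_bytes` and `humanize` are hypothetical and chosen only for illustration.

```python
# Decimal (SI) prefixes named on the slide, stored as powers of ten.
# Assumption: decimal prefixes (10^3 steps), not binary ones (2^10 steps).
SI_PREFIXES = {
    "kilo": 3, "mega": 6, "giga": 9, "tera": 12,
    "peta": 15, "exa": 18, "zetta": 21,
}


def to_bytes(value, prefix):
    """Convert a value with an SI prefix (for example 10 tera) to bytes."""
    return value * 10 ** SI_PREFIXES[prefix.lower()]


def humanize(n_bytes):
    """Render a byte count using the largest prefix that keeps the number >= 1."""
    best = None
    for name, exp in SI_PREFIXES.items():  # dict preserves ascending order
        if n_bytes >= 10 ** exp:
            best = (name, exp)
    if best is None:
        return f"{n_bytes} bytes"
    name, exp = best
    return f"{n_bytes / 10 ** exp:g} {name}bytes"
```

For instance, `to_bytes(1, "zetta")` returns the integer 10^21, and `humanize` maps a raw byte count back to the nearest named prefix.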
…An organization employing 1,000 knowledge workers loses $5.7 million annually just in time wasted having to reformat information as they move among applications. Not finding information costs that same organization an additional $5.3m a year.

Source: IDC

Over 95% of the digital universe is "unstructured data" – meaning its content can't be truly represented by its field in a record, such as name, address, or date of last transaction. In organizations, unstructured data accounts for more than 80% of all information.

Source: IDC
Major sources of data

• Health-related services, e.g. benefits, medical analyses
• Business:
   – Walmart: 20 million transactions/day, 10 terabyte database
• Science:
   – NASA: 0.5+ terabytes per day per satellite
• Society and everyone: news, digital cameras, YouTube
• DOD and intelligence




Analytics: Multiple Disciplines


[Diagram: Analytics at the center, drawing on Database Technology, Statistics, Visualization, Social Science, Computer Science, Pattern Recognition, and Machine Learning]
Analytics: Multiple Skills
• Curiosity – Interest and intrinsic motivation to figure things
  out, ask why, and pursue solutions
• Skepticism – Seek simplicity and distrust it, go below the
  surface explanation of things, question all assumptions
• Writing – Communicate results, tell stories, convince others
  of the merits of your case
• Visual Reasoning – Develop and present visualizations that
  support your conclusions
• Statistics – Draw inferences from and summarize data to
  develop a case and a story
• Programming – Manipulate software tools to create a chain
  of provenance for data and analysis
Knowledge Development for Industry, Education, Government, Research

• Domain Experts
   – Expertise in specific subject areas
   – Limited opportunity to master technology skills
   – Proliferation of big data & new technology
   – Need for knowledge and information managers
• Data Scientists
   – Information Organization & Visualization
   – Information Analysis
   – Solution Integration
   – Digital Curation
• Infrastructure Professionals
   – Rapid pace of IT development
   – Limited expertise in domain areas
   – Specialized knowledge of HW, FW, MW, SW
   – Communication challenges

Transforming Data Into Decisions
Analytics: Key Steps
• Learn the application domain
• Locate or develop a data source or data set
• Clean and preprocess data: May take 60% of effort!
• Data reduction and transformation
   – Find useful pieces, squeeze out redundancies
• Choose analytical approaches
   – summarize, visualize, organize, describe, explore, find
     patterns, predict, test, infer
• Communicate the results and implications to data users
• Deploy discovered knowledge in a system
• Monitor and evaluate the effectiveness of the system
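Three of the steps above (clean and preprocess, reduce, then summarize) can be sketched in a few lines. This is only an illustration under stated assumptions: the records in `RAW`, the field names `store` and `sales`, and the helper names are all invented here and do not come from the slides.

```python
import statistics

# Hypothetical raw records, invented for illustration only.
RAW = [
    {"store": "A", "sales": "120.5"},
    {"store": "A", "sales": "120.5"},   # exact duplicate (a redundancy)
    {"store": "B", "sales": ""},        # missing value
    {"store": "B", "sales": "98.0"},
    {"store": "C", "sales": "n/a"},     # unparseable value
    {"store": "C", "sales": "143.2"},
]


def clean(records):
    """Clean and preprocess: drop rows whose sales field is not numeric."""
    cleaned = []
    for row in records:
        try:
            cleaned.append({"store": row["store"], "sales": float(row["sales"])})
        except ValueError:
            continue  # skip empty or malformed values
    return cleaned


def reduce_rows(records):
    """Data reduction: remove exact duplicate rows, keeping first occurrences."""
    seen, unique = set(), []
    for row in records:
        key = (row["store"], row["sales"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def summarize(records):
    """One analytical approach (summarize): count, mean, and max of sales."""
    values = [row["sales"] for row in records]
    return {"n": len(values), "mean": statistics.mean(values), "max": max(values)}


summary = summarize(reduce_rows(clean(RAW)))
```

Keeping each step as its own function leaves a readable chain from raw data to result, which fits the earlier point about programming as a way to create a chain of provenance for data and analysis.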


Editor's Notes

  • #3: Facebook friend connections worldwide, a network diagram of the Enron email set, a comparison of similar gene sequences between humans, chimps, and macaques
  • #8: HW, FW, MW, SW: Hardware, Firmware, Middleware, Software