Data Design
                                                           2114.409: Creative Research Practice




HTTP://WWW.FLICKR.COM/PHOTOS/SERGIU_BACIOIU/4370021957/
Reflection
Status Check



Concerns

 Programming

 What can we build




                     HTTP://WWW.FLICKR.COM/PHOTOS/FLOWER87/76719859/
Course Outline
1. Foundations
   Introduction
   Survey Methods / Data Mining
   Visualization and Analysis
   Social Mechanics

2. Methods
   Creativity and Brainstorming
   Prototyping
   Project Management

3. Prototyping
   Crawling
   Text Mining
   To be determined (TBD)
   Project Update

4. Refinement
   TBD x3
   Project Presentations
   Reflection
Last Week: Building Blocks
   Clustering
   Classification & Regression
   Association Rules
   Outlier Detection
                   HTTP://WWW.FLICKR.COM/PHOTOS/OGIMOGI/2253657555/
This Week: Systems




HTTPS://WWW.FACEBOOK.COM/PHOTO.PHP?FBID=407391545956901&SET=A.407391429290246.110679.100000581776191&TYPE=3&THEATER
Data Mining Overview
How do I see and communicate answers?       Visualization, Storytelling
What questions should I ask of the data?    Design, Data Exploration
How do I clean and process the data?        Analysis Techniques
How do I gather meaningful data?            Crawling, Surveys, UX Design
Why might we prefer analysis?

LABOR
   Too many pictures to look at.
   Don’t know which are interesting.

ACCURACY
   Can test for statistical significance, etc.
   Some patterns don’t visualize easily.




                                         HTTP://WWW.FLICKR.COM/PHOTOS/STRIATIC/2144933705/
Clustering
Find natural groupings in the data



Organize data into classes:

‣ high intra-class similarity
‣ low inter-class similarity
Clustering
Input Data:        Points OR Similarities, plus [ # of clusters ]
Output Clusters:   Hard OR Soft OR Hierarchical
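
As a rough illustration of the "points plus number of clusters in, hard clusters out" case, here is a minimal k-means sketch. The sample points, the function names, and the farthest-first starting rule are assumptions made for this example, not something taken from the slides.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Hard clustering: every point ends up with exactly one cluster label."""
    # Assumed starting rule: take the first point, then repeatedly add the
    # point that is farthest from the centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))

    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers


# Hypothetical 2-D data with two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(points, k=2)
print(labels)
```

Points in the same group share a label (high intra-class similarity) and the two centers sit far apart (low inter-class similarity). Soft and hierarchical clustering would need different output structures and are not covered by this sketch.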
Classification: Learn to map objects to categories
Regression: Learn to map objects to continuous variables
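
For the regression half, a least-squares line fit is one simple way to map objects to a continuous variable. The sketch below assumes a single numeric input and uses made-up numbers; `fit_line` is a name chosen here for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```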
Classification
Observations X, Labels Y: learn f(x) = y

Example: X = height, Y = gender (Male or Female)
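
One very small way to learn f(x) = y for the height example is a single threshold (a decision stump). The heights and labels below are invented for illustration, and `train_stump` and `predict` are hypothetical helper names, not part of the lecture.

```python
def train_stump(xs, ys, positive):
    """Choose the threshold t where 'x >= t means positive' makes the fewest training errors."""
    values = sorted(set(xs))
    # Candidate thresholds are the midpoints between neighboring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum((x >= t) != (y == positive) for x, y in zip(xs, ys))

    return min(candidates, key=errors)


def predict(threshold, x, positive, negative):
    """Apply the learned rule f(x) = y."""
    return positive if x >= threshold else negative


# Hypothetical observations X (height in cm) and labels Y.
heights = [152, 158, 163, 167, 171, 176, 180, 186]
genders = ["female", "female", "female", "female", "male", "male", "male", "male"]

threshold = train_stump(heights, genders, positive="male")
print(threshold, predict(threshold, 180, "male", "female"))
```

Real data would overlap far more than this toy set, so a single threshold on height would make many errors; the sketch only shows the shape of the learning step.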
The Whole Process
Data Set -> Featurization -> Featurized data
Featurized data -> Random Split (e.g. 90/10) -> Training Data and Test Data
Training Data -> Training -> Model
Model + Test Data -> Evaluation -> Results
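
The whole process can be sketched end to end in a few functions. Everything concrete here is an assumption for illustration: the synthetic data set, the single feature, the nearest-centroid model, and accuracy as the evaluation measure.

```python
import random


def featurize(record):
    """Turn a raw record into a numeric feature vector."""
    return (record["value"] / 100.0,)


def random_split(items, test_fraction=0.1, seed=0):
    """Shuffle, then hold out a fraction as test data (90/10 by default)."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train(examples):
    """Nearest-centroid model: the mean feature vector of each label."""
    sums, counts = {}, {}
    for features, label in examples:
        counts[label] = counts.get(label, 0) + 1
        previous = sums.get(label, (0.0,) * len(features))
        sums[label] = tuple(s + f for s, f in zip(previous, features))
    return {label: tuple(s / counts[label] for s in sums[label]) for label in sums}


def predict(model, features):
    """Return the label whose centroid is closest to the features."""
    return min(model, key=lambda label: sum((f - c) ** 2 for f, c in zip(features, model[label])))


def evaluate(model, examples):
    """Fraction of held-out examples labeled correctly."""
    return sum(predict(model, f) == y for f, y in examples) / len(examples)


# Hypothetical data set: 100 records labeled "high" or "low".
data_set = [({"value": v}, "high" if v >= 50 else "low") for v in range(100)]
featurized = [(featurize(record), label) for record, label in data_set]
training_data, test_data = random_split(featurized, test_fraction=0.1)
model = train(training_data)
accuracy = evaluate(model, test_data)
print(len(training_data), len(test_data), accuracy)
```

Evaluating only on the held-out test data is the point of the split: the model never sees those examples during training.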
Association Rules
Learn interesting
relations in the data




Support(X) = proportion of events in which X occurs
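
A small sketch of that proportion, computed over a handful of invented shopping baskets. The `confidence` function is an addition not shown on the slide; it is the usual companion measure for a rule X -> Y, included here as an assumption about what would come next.

```python
def support(events, items):
    """Proportion of events in which every item of `items` occurs."""
    items = set(items)
    return sum(items <= set(event) for event in events) / len(events)


def confidence(events, x, y):
    """Of the events containing X, the share that also contain Y (rule X -> Y)."""
    return support(events, set(x) | set(y)) / support(events, x)


# Hypothetical events: each one is a set of items seen together.
events = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(events, {"bread"}))
print(confidence(events, {"bread"}, {"milk"}))
```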
Anomaly Detection

Detect strange events in the data


            Simplest measure:
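
The measure shown on the original slide did not survive extraction. As a stand-in, the sketch below assumes a z-score (distance from the mean in standard deviations), which is one common simple choice; the threshold of 2.0 and the sample values are likewise assumptions.

```python
import statistics


def z_scores(values):
    """Distance of each value from the mean, in population standard deviations."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def find_anomalies(values, threshold=2.0):
    """Flag values whose absolute z-score exceeds the (assumed) threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical event counts with one strange value.
counts = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
anomalies = find_anomalies(counts)
print(anomalies)
```

With only ten values a single point can never reach a z-score of 3, which is why the threshold here is lower than the commonly quoted 3.0.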
What Can We Build?




HTTP://WWW.FLICKR.COM/PHOTOS/BPENDE/6736531173/
Collective Intelligence
Clicks, Scrolls, Time
Likes, Links, Checkins
Updates, Reviews, Comments
Articles, Images, Video

Collective Intelligence

How can we harness the activities of the world’s digital citizens to build new and useful consumer services?

Community
Politics




The Korean elections are coming. How
does the Internet tell us more than
traditional polling ever could?
Politics




What issues are important?
Who are the influencers?
How can we segment/characterize support groups?
How do we spread our opinions more widely?
Who will win the election?
How can we build this?

“Can social media predict election outcomes?”
HTTP://WWW.USATODAY.COM/TECH/NEWS/STORY/2012-03-05/SOCIAL-SUPER-TUESDAY-PREDICTION/53374536/1
Tweet: Author, Date, Body, Retweets, Hashtags
Author: Profile, Tweets, Favorites, Following, Followers, Location

Insert Magic Here?
   Clustering, Classification & Regression, Association Rules, Outlier Detection

Prediction: Candidate, Location, Score, Confidence
Workshop
System Overview

Components: Tweet Inputs, Author Inputs, Sentiment + Candidate, Refinements, Scoring, Correction based on past elections, RMSE Evaluation
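
RMSE (root mean squared error) compares predicted scores against actual outcomes. A minimal sketch follows, with invented vote shares standing in for real predictions and results.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical vote shares per candidate: predicted vs. actual.
predicted_share = [0.52, 0.45, 0.03]
actual_share = [0.50, 0.47, 0.03]
print(rmse(predicted_share, actual_share))
```

Lower is better, and 0 means every prediction matched exactly. Because errors are squared before averaging, one badly wrong prediction costs more than several slightly wrong ones.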
Sentiment Detail
Input Observation -> Feature Extractor (N-Gram Features) -> Classifier -> Output Label
Training Process: Tweet + Label
Confusion Matrix Evaluation
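
The pieces named on this slide can be sketched together as follows. The choice of a naive Bayes classifier with add-one smoothing is an assumption (the slide does not name a classifier), and the labeled tweets are invented.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text, n_max=2):
    """Feature extractor: all word unigrams and bigrams in the text."""
    tokens = text.lower().split()
    features = []
    for n in range(1, n_max + 1):
        features += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return features


def train(labeled_tweets):
    """Training process: count features per label from (tweet, label) pairs."""
    label_counts, feature_counts, vocabulary = Counter(), defaultdict(Counter), set()
    for text, label in labeled_tweets:
        label_counts[label] += 1
        for feature in ngram_features(text):
            feature_counts[label][feature] += 1
            vocabulary.add(feature)
    return label_counts, feature_counts, vocabulary


def classify(model, text):
    """Classifier: naive Bayes with add-one smoothing (an assumed choice)."""
    label_counts, feature_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denominator = sum(feature_counts[label].values()) + len(vocabulary)
        for feature in ngram_features(text):
            if feature in vocabulary:  # features never seen in training are skipped
                score += math.log((feature_counts[label][feature] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def confusion_matrix(model, labeled_tweets):
    """Count (actual label, output label) pairs on labeled data."""
    matrix = Counter()
    for text, actual in labeled_tweets:
        matrix[(actual, classify(model, text))] += 1
    return matrix


# Hypothetical labeled tweets.
training_tweets = [
    ("i love this candidate", "positive"),
    ("great speech tonight", "positive"),
    ("i hate this policy", "negative"),
    ("terrible debate performance", "negative"),
]
test_tweets = [("love this speech", "positive"), ("hate this debate", "negative")]

model = train(training_tweets)
matrix = confusion_matrix(model, test_tweets)
print(dict(matrix))
```

Off-diagonal entries of the matrix, such as ("positive", "negative"), would show which kinds of mistakes the classifier makes rather than just how many.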
Entertainment, Food, Movements
HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/
HTTP://WWW.FLICKR.COM/PHOTOS/WILLIA4/2504379334/
HTTP://WWW.FLICKR.COM/PHOTOS/GILSONROME/6247208325/

Collaboration, Shopping, Travel
HTTP://WWW.FLICKR.COM/PHOTOS/FIDELMAN/4640722483/
HTTP://WWW.FLICKR.COM/PHOTOS/ZOOBOING/4473219605/
HTTP://WWW.FLICKR.COM/PHOTOS/FELIPENEVES/5414239936/

Investing, Medicine, Trust
HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/
HTTP://WWW.FLICKR.COM/PHOTOS/TRAVEL_AFICIONADO/2396819536/
HTTP://WWW.FLICKR.COM/PHOTOS/AGECOMBAHIA/6425101047/
HTTP://WWW.FLICKR.COM/PHOTOS/MARKETINGFACTS/6758968163/
Homework: Data Mining
1. Form groups!

2. Choose a Collective Intelligence topic from
   Lecture 1, or propose similar.

3. Make a list of data sources that might
   provide insights to that topic.

4. Propose a set of meaningful questions about
   the data based on your intuition.

5. How would you have to clean/process your
   data to start answering those questions?

6. Consider clustering, association rules,
   anomaly detection, classification. For each
   technique, how might you apply it to the
   data and what would it show?

7. Document your work and be prepared to
   present.
                                                 HTTP://WWW.FLICKR.COM/PHOTOS/31907740@N00/4860840019/
Feedback
