Amu Prabhjot Singh 10BM60011
 Divya Hamirwasia 10BM60025
   Data Wrangler is an interactive data transformation tool
    developed by the Stanford Visualization
    Group.
   allows direct manipulation of visualized data
   provides automatic suggestions for relevant
    transformations
   used for tasks such as reformatting data values,
    integrating data from multiple sources, and
    handling missing values
   use of Wrangler significantly reduces the time
    needed to specify transformations
   When the user selects data, the tool suggests applicable
    transformations based on the current context of interaction
   Data Wrangler uses a modeling technique to enumerate and rank the
    possible transformations
   The model combines the user's input with the diversity, frequency and
    specification difficulty of the applicable transform types
   Wrangler provides short natural-language descriptions of the
    transforms and visual previews of their results
   This helps analysts assess viable transforms quickly
   Wrangler's interactive history viewer records and shows the steps of
    the transforms applied to the data set, to facilitate reuse
   Wrangler scripts can be run in a web browser using JavaScript, or
    translated into Python code
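
The ranking idea above can be sketched in a few lines of Python. This is only an illustration, not Wrangler's actual model: the `Candidate` fields, the additive score and the "one of each type first" diversity rule are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    transform_type: str  # e.g. "split", "fill"
    description: str     # short natural-language description shown to the user
    frequency: float     # assumed: share of past usage for this type (0..1)
    difficulty: float    # assumed: how hard the transform is to specify by hand (0..1)


def rank(candidates, top_k=3):
    """Order candidates by score, listing the best of each type first (diversity)."""
    # Assumed scoring: frequency and specification difficulty simply add up.
    scored = sorted(candidates, key=lambda c: c.frequency + c.difficulty, reverse=True)
    seen, diverse, rest = set(), [], []
    for c in scored:
        if c.transform_type in seen:
            rest.append(c)       # repeat of a type already suggested
        else:
            seen.add(c.transform_type)
            diverse.append(c)    # first (best) suggestion of its type
    return (diverse + rest)[:top_k]
```

With this rule, a second "split" suggestion is pushed below the best "fill" or "delete" suggestion even when its raw score is higher, so the short list shown to the analyst covers several transform types.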
   built on an underlying declarative data transformation language
   the language consists of 8 classes of transformations
    ◦ Map
         One to zero
         One to one
         One to many
    ◦ Lookups and joins
    ◦ Reshape
         Fold
         Unfold
    ◦ Positional
         Fill
         Lag
    ◦ Sorting
    ◦ Aggregation
    ◦ Key generation
    ◦ Schema transforms
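
To make three of these classes concrete, here is a minimal plain-Python sketch of fill (positional), fold and unfold (reshape). The function names, the `name`/`value` column names and the row-as-dict representation are assumptions for illustration; they are not Wrangler's transformation language.

```python
def fill_down(rows, col):
    """Positional fill: copy the last non-empty value of `col` into empty cells below it."""
    last = None
    out = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not modified
        if row[col] in ("", None):
            row[col] = last
        else:
            last = row[col]
        out.append(row)
    return out


def fold(rows, key_col, value_cols):
    """Reshape wide to long: one output row per (key, column name, value)."""
    return [
        {key_col: row[key_col], "name": c, "value": row[c]}
        for row in rows
        for c in value_cols
    ]


def unfold(rows, key_col, name_col="name", value_col="value"):
    """Reshape long to wide: the inverse of fold."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row[key_col], {key_col: row[key_col]})
        record[row[name_col]] = row[value_col]
    return list(wide.values())
```

For example, folding a table with one column per year gives one row per state and year, and unfolding that result restores the original layout.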
   This is the example data set provided with Data
    Wrangler.
   Housing crime data from the U.S. Bureau of
    Justice Statistics
   Data is in CSV format
[Figure: Data Wrangler inference pipeline. Inputs: user interactions, current working transform, data descriptions, corpus of historical usage statistics. Steps: inferring transform parameters, generating candidate transforms, ranking the results.]
   GETTING STARTED
    ◦ Browser-based tool: http://vis.stanford.edu/wrangler/
   DATA ENTRY
    ◦ Copy and paste the data to be wrangled into the input window.
    ◦ Input formats: CSV files, TSV files and manual entry
   TRANSFORMS
     • Cut                              • Merge
     • Delete                           • Promote
     • Drop                             • Split
     • Edit                             • Translate
     • Extract                          • Transpose
     • Fill                             • Unfold
     • Fold
   OUTPUT
    Two types of output:
    ◦ Data output
       CSV, TSV, row-oriented JSON, column-oriented JSON, lookup tables
    ◦ Script
       Python, JavaScript
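
The difference between the two JSON outputs can be shown with a short Python sketch. The slides do not give the exact layout Wrangler writes, so this assumes the usual meaning of the terms: row-oriented is a list of records, column-oriented is one list of values per column. The function names are made up for the example.

```python
import csv
import io
import json


def _parse(text):
    """Read CSV text into (column names, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return reader.fieldnames or [], rows


def to_row_json(text):
    """Row-oriented: one JSON object per data row."""
    _, rows = _parse(text)
    return json.dumps(rows)


def to_column_json(text):
    """Column-oriented: one JSON array per column name."""
    names, rows = _parse(text)
    return json.dumps({name: [row[name] for row in rows] for name in names})
```

Row-oriented output suits tools that process one record at a time, while column-oriented output is convenient when a chart or statistic needs a whole column at once.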
   helps speed up the process of data
    manipulation
   helps managers spend more time analyzing
    and learning from their data rather than
    spending much of their time rearranging it
   allows interactive transformation of messy, real-world
    data and export of the results for use in
    Excel, R, Tableau, Protovis, etc.
   LIMITATION: input is limited to 40 columns
    and 1000 rows; larger data cannot be wrangled
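
Because of this limit, it can help to check a file's size before pasting it into the tool. The sketch below is an assumption-laden helper, not part of Wrangler: it treats exceeding either limit as too large and counts the header line as a row.

```python
import csv
import io

MAX_COLUMNS = 40   # limit stated on the slide
MAX_ROWS = 1000    # limit stated on the slide


def fits_wrangler_limits(text, delimiter=","):
    """Return True if the delimited text stays within both limits."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    widest = max((len(row) for row in rows), default=0)
    # Assumption: the header line counts towards the row limit.
    return len(rows) <= MAX_ROWS and widest <= MAX_COLUMNS
```

A larger data set could be sampled or split into smaller pieces first, and the exported script then applied to the full data outside the browser.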

DataWrangler @VGSOM
