
CCLS Internship Presentation

  • 1. Data Mining and Feeder Attribute Analysis
      Interns: Charles Naut, Lawrence Ng
      Columbia University’s Center for Computational Learning Systems
  • 2. Transformer Attribute Time Series
      • 7 years' worth of data from Con Edison databases
      • Attributes for transformers: temperature, phase voltage, and phase load
      • B phase load for one week: 168 data points (one point per hour)

      Hour    Load
      1       18.9
      2       18.6
      …       …
      60      40.9
      61      41.9
      …       …
      127     19.3
      128     19.2
  • 3. Piecewise Aggregate Approximation (PAA) and Symbolic Aggregate Approximation (SAX)
      PAA [1]
      • Time series is divided into equally sized frames
      • Value of a frame is the average of the data falling in that frame
      • Reduces dimensionality of the time series
      • Allows lower bounding of Euclidean distance
      SAX [2]
      • Discretizes time series data
      • PAA values are given symbols based on calculated breakpoints
      • Retains the reduced dimensionality of PAA
      • Allows lower bounding of Euclidean distance
      [1] E. Keogh, J. Lin & A. Fu (2005). HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence. In Proc. of the 5th IEEE International Conference on Data Mining (ICDM 2005), pp. 226-233, Houston, Texas, Nov 27-30, 2005.
      [2] B. Yi & C. Faloutsos (2000). Fast time sequence indexing for arbitrary Lp norms. In Proc. of the 26th Int'l Conf. on Very Large Databases, pp. 385-394.
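
The PAA step described on this slide can be sketched in a few lines of Python. This is only an illustration, not the interns' Matlab code: the function name `paa`, the sample load values, and the requirement that the series length be divisible by the frame count are all assumptions made to keep the sketch short.

```python
from statistics import mean


def paa(series, n_frames):
    """Piecewise Aggregate Approximation: one mean value per equally sized frame."""
    n = len(series)
    if n_frames <= 0 or n % n_frames != 0:
        # Simplifying assumption of this sketch: frames divide the series evenly.
        raise ValueError("len(series) must be a positive multiple of n_frames")
    size = n // n_frames
    return [mean(series[i:i + size]) for i in range(0, n, size)]


# Hypothetical hourly load readings (not taken from the Con Edison data).
hourly_load = [18.9, 18.6, 19.3, 19.2, 40.9, 41.9, 40.1, 41.1]
print(paa(hourly_load, 4))  # four frame averages instead of eight raw points
```

Each frame average stands in for all the points in that frame, which is where the dimensionality reduction comes from.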
  • 4. SAX Conversion Process
      Steps:
      1. Obtain the raw time series
      2. Normalize the time series
      3. Convert to PAA format
      4. Convert to a SAX string
      Result: cbbbddaa
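
The four steps above can be sketched end to end as follows. The slide does not say how the breakpoints were calculated; this sketch assumes the usual SAX convention of cutting the standard normal distribution into equiprobable regions, and it assumes z-normalization for step 2. The alphabet size of 4 and word length of 8 are inferred from the result string "cbbbddaa"; all function names and the sample data are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Step 2: rescale to zero mean and unit standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]  # a flat series carries no shape information
    return [(x - mu) / sigma for x in series]


def paa(series, n_frames):
    """Step 3: average each equally sized frame (length must divide evenly)."""
    size, rem = divmod(len(series), n_frames)
    if rem:
        raise ValueError("len(series) must be a multiple of n_frames")
    return [mean(series[i:i + size]) for i in range(0, len(series), size)]


def breakpoints(alphabet_size):
    """Assumed convention: equiprobable cut points of the standard normal."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, n_frames, alphabet_size):
    """Step 4: map each PAA value to a letter according to its breakpoint region."""
    cuts = breakpoints(alphabet_size)
    frames = paa(znormalize(series), n_frames)
    return "".join(chr(ord("a") + bisect_right(cuts, v)) for v in frames)


# Hypothetical week of hourly loads: 168 points reduced to an 8-letter word.
week = [20 + (h % 24) for h in range(168)]
print(sax(week, n_frames=8, alphabet_size=4))
```

Using equiprobable breakpoints means each letter is about equally likely for normalized data, which is one common reason for that choice.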
  • 5. SAX Goals
      • Detect abnormalities in transformer time series data by comparing SAX strings of baseline data with SAX strings of live data
      • Predict when a transformer will fail by using dynamic (time series) data to indicate how stressed it is
      B phase loads for two feeders:
      Top: caaadddb (Normal)
      Bottom: cbbbddaa (Failure)
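
One way to compare a baseline SAX string with a live one is the MINDIST measure from the SAX literature, which lower-bounds the Euclidean distance between the original normalized series. The slides do not state which comparison the interns used, so treat this as a sketch of one option. It assumes equiprobable Gaussian breakpoints, an alphabet of 4 letters, and an original series length of 168 (one week of hourly points); the function names are hypothetical.

```python
import math
from statistics import NormalDist


def breakpoints(alphabet_size):
    """Assumed convention: equiprobable cut points of the standard normal."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def mindist(word_a, word_b, series_len, alphabet_size):
    """MINDIST between two equal-length SAX words."""
    if len(word_a) != len(word_b):
        raise ValueError("SAX words must have the same length")
    cuts = breakpoints(alphabet_size)
    total = 0.0
    for sa, sb in zip(word_a, word_b):
        lo, hi = sorted((ord(sa) - ord("a"), ord(sb) - ord("a")))
        # Identical or adjacent symbols contribute zero distance.
        if hi - lo > 1:
            total += (cuts[hi - 1] - cuts[lo]) ** 2
    return math.sqrt(series_len / len(word_a)) * math.sqrt(total)


# The two strings shown on the slide, compared under the assumptions above.
print(mindist("caaadddb", "cbbbddaa", series_len=168, alphabet_size=4))
```

Because adjacent letters count as zero distance, only positions where the two strings differ by more than one symbol contribute to the result.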
  • 6. Tarzan and Hot SAX
      Methods for finding time series discords
      Tarzan [3]
      • Detects novel time series patterns
      • Novelty based on expected pattern frequency
      Hot SAX [4]
      • Finds patterns most unlike others
      • Aids in clustering and discovery of motifs
      [3] S. Lonardi, J. Lin, E. Keogh & B. Chiu (2007). Efficient Discovery of Unusual Patterns in Time Series. Special Issue of New Generation Computing Journal. To appear.
      [4] E. Keogh, J. Lin & A. Fu (2005). HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence. In Proc. of the 5th IEEE International Conference on Data Mining (ICDM 2005), pp. 226-233, Houston, Texas, Nov 27-30, 2005.
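
To make "patterns most unlike others" concrete, here is a brute-force discord search. It is not the Hot SAX algorithm itself: Hot SAX finds the same kind of answer but uses SAX words to order the search so that most comparisons can be abandoned early. The sketch also skips per-subsequence normalization for brevity. The function names and the toy series are hypothetical.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_discord(series, m):
    """Return (start, distance) of the length-m subsequence whose nearest
    non-overlapping neighbour is farthest away."""
    n_sub = len(series) - m + 1
    best_loc, best_dist = None, -1.0
    for p in range(n_sub):
        nearest = math.inf
        for q in range(n_sub):
            if abs(p - q) >= m:  # ignore trivial matches that overlap p
                d = euclidean(series[p:p + m], series[q:q + m])
                nearest = min(nearest, d)
        if nearest != math.inf and nearest > best_dist:
            best_loc, best_dist = p, nearest
    return best_loc, best_dist


# Toy periodic signal with one injected spike at index 10.
signal = [0, 1] * 10
signal[10] = 5
print(brute_force_discord(signal, m=2))
```

The nested loops make this quadratic in the number of subsequences, which is the cost that the heuristic ordering in Hot SAX is designed to reduce.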
  • 7. Feeder Attribute Analysis
      • Determine the similarity of feeders and create a dendrogram displaying the information
      • Paired feeders allow “treatments”, such as Hipots, to be studied statistically
      • Build on SQL queries previously written
  • 8. Dendrogram of Distances Between Feeders Based on Feeder Attributes
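
A dendrogram is a drawing of the merge order produced by agglomerative hierarchical clustering. The slides do not name the distance measure or linkage rule that was used, so the sketch below assumes Euclidean distance and single linkage. The feeder names and attribute vectors are invented for illustration, and real attributes would normally be scaled to comparable ranges first.

```python
import math


def agglomerate(points):
    """Single-linkage agglomerative clustering.

    points maps a feeder name to a tuple of numeric attributes.
    Returns the merges in order as (cluster_a, cluster_b, distance),
    which is the information a dendrogram plots.
    """
    clusters = [(name,) for name in points]

    def link(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[x], points[y]) for x in a for y in b)

    merges = []
    while len(clusters) > 1:
        d, i, j = min(
            (link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical feeders with two made-up attributes each.
feeders = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (10.0, 0.0), "D": (10.0, 2.0)}
for left, right, dist in agglomerate(feeders):
    print(left, right, round(dist, 2))
```

Feeders that merge at a small distance are the natural candidates for pairing, since they are the most alike under the chosen attributes.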
  • 9. Pairing of Feeders
      List of feeders and companions based on increasing coverage area
  • 10. Accomplishments
      • ABF dynamic attribute on feeder outages
      • SAX strings based on time series from transformer RMS data
      • Matlab implementation of the Hot SAX algorithm
      • Introduction of a new method for machine learning on feeders and their components
      • Dendrogram showing the distance between feeders based on feeder attributes
      • SQL queries creating feeder pairs
  • 11. Growth
      • Gained experience with Matlab, SQL, Python, R, Unix, and Microsoft Office
      • Acquired new knowledge of SAX, machine learning, data mining, pattern recognition, databases, suffix trees, and time series
      • Learned about the Con Edison Distribution System and its relevance to our work
      • Developed good work habits and communication skills
  • 12. Special Thanks
      Albert Boulanger, Ansaf Salleb-Aouissi, Phillip Gross, Roger Anderson, Leon Bukhman, Eugene Klitenik