CONTACT: PRAVEEN KUMAR. L (, +91 – 9791938249)
MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com
Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com
ANALYTIC QUERIES OVER GEOSPATIAL TIME-SERIES DATA USING DISTRIBUTED
HASH TABLES
Abstract
As remote sensing equipment and networked observational devices continue
to proliferate, their corresponding data volumes have surpassed the storage
and processing capabilities of commodity computing hardware. This trend has
led to the development of distributed storage frameworks that incrementally
scale out by assimilating resources as necessary. While challenging in its own
right, storing and managing voluminous datasets is only the precursor to a
broader field of research: extracting insights, relationships, and models from
the underlying datasets. The focus of this study is twofold: exploratory and
predictive analytics over voluminous, multidimensional datasets in a
distributed environment. Both of these types of analysis represent a
higher-level abstraction over standard query semantics; rather than indexing every
discrete value for subsequent retrieval, our framework autonomously learns
the relationships and interactions between dimensions in the dataset and
makes the information readily available to users. This functionality includes
statistical synopses, correlation analysis, hypothesis testing, probabilistic
structures, and predictive models that not only enable the discovery of
nuanced relationships between dimensions, but also allow future events and
trends to be predicted. The algorithms presented in this work were evaluated
empirically on a real-world geospatial time-series dataset in a production
environment and are broadly applicable across other storage frameworks.
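To illustrate the kind of statistical synopsis and correlation analysis described above, here is a minimal sketch in Python. It assumes a Welford-style online update, which is one common way to maintain such summaries without storing raw observations; the abstract does not say which algorithm the framework uses. The class name RunningSynopsis and its methods are hypothetical and are not part of the authors' framework.

```python
import math


class RunningSynopsis:
    """Online summary of two dimensions, updated one observation at a time.

    Hypothetical helper for illustration only; not the framework's API.
    """

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # running sum of squared deviations of x
        self.m2_y = 0.0  # running sum of squared deviations of y
        self.c_xy = 0.0  # running sum of co-deviations of x and y

    def update(self, x, y):
        """Fold one (x, y) observation into the summary in O(1) time."""
        self.n += 1
        dx = x - self.mean_x  # deviation from the old mean of x
        dy = y - self.mean_y  # deviation from the old mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def variance_x(self):
        """Sample variance of x (0.0 until two observations are seen)."""
        return self.m2_x / (self.n - 1) if self.n > 1 else 0.0

    def correlation(self):
        """Pearson correlation between x and y (0.0 if undefined)."""
        denom = math.sqrt(self.m2_x * self.m2_y)
        return self.c_xy / denom if denom > 0 else 0.0
```

A summary like this occupies constant memory regardless of how many observations it has absorbed, so it could stay resident in memory while the raw data remains on disk.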
CONCLUSION:
Support for analytic queries over voluminous datasets entails accounting for:
(1) the speed differential between memory accesses and disk I/O,
(2) how metadata is organized and managed,
(3) the performance impact of the data structures,
(4) dispersion of query loads, and
(5) the avoidance of I/O hotspots.
Accounting for these factors enables us to provide a rich set of exploratory
analysis functions as well as predictive models that produce insights beyond just the
trends present in the dataset. One key aspect of our approach is minimizing
disk accesses. This is achieved by carefully maintaining metadata graphs that
retain expressiveness for query evaluations but preserve compactness to
ensure memory residency while avoiding page faults and thrashing. The graphs
remain compact even in situations where individual nodes store hundreds of
millions of files. Further, statistical synopses ensure the knowledge base is
continually updated as live data streams arrive. We achieve this via the use and
adaptation of online algorithms, compact data structures, and lightweight
models. This also allows us to perform query evaluations at multiple
geographic scales. We avoid query hotspots by propagating the queries to
nodes likely to satisfy them, performing in-memory evaluations and avoiding
disk accesses. This reduces the likelihood of queries building up and
overflowing request queues at individual nodes. By targeting only a specific
subset of the nodes, we minimize cases where a node evaluates a query that
produces no results. Our use of Geohashes also allows us to localize queries
efficiently. Hotspot avoidance ensures faster overall turnaround times for
individual queries. Combined with efficient pipelining, this allows multiple
queries to be evaluated concurrently at a high rate, which is validated by our
empirical results.
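The role of Geohashes in localizing queries can be sketched as follows. The encoder implements the standard public Geohash scheme (interleaved longitude and latitude bits, base-32 alphabet). The routing function node_for, its prefix_len parameter, and the use of SHA-1 to pick a node are assumptions made for illustration only; the text above does not specify how the framework maps Geohashes to nodes.

```python
import hashlib

# Standard Geohash base-32 alphabet (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=5):
    """Encode a latitude/longitude pair as a Geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, nbits = 0, 0
    refine_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if refine_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        refine_lon = not refine_lon
        nbits += 1
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


def node_for(geohash, num_nodes, prefix_len=2):
    """Pick a node index from a Geohash prefix (hypothetical routing rule).

    Points that share the first prefix_len characters lie in the same
    spatial cell and therefore map to the same node.
    """
    digest = hashlib.sha1(geohash[:prefix_len].encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


code = geohash_encode(42.6, -5.6)  # "ezs42"
target = node_for(code, num_nodes=8)
```

Because nearby points share a Geohash prefix, a query restricted to one region only needs to reach the nodes responsible for the matching prefixes, which is the intuition behind evaluating queries on a subset of nodes rather than broadcasting them.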