A HOLISTIC APPROACH TO DISTRIBUTED
DIMENSIONALITY REDUCTION OF BIG DATA
With the exponential growth of data volume, big data have placed an unprecedented
burden on current computing infrastructure.
Dimensionality reduction of big data has attracted a great deal of attention in recent
years as an efficient way to extract the core data, which is smaller to store and faster
to process.
This paper addresses three fundamental problems closely related to distributed
dimensionality reduction of big data: big data fusion, the dimensionality reduction
algorithm, and the construction of a distributed computing platform.
A chunk tensor method is presented to fuse unstructured, semi-structured and
structured data into a unified model in which all characteristics of the heterogeneous
data are appropriately arranged along the tensor orders.
A Lanczos-based High Order Singular Value Decomposition (L-HOSVD) algorithm
is proposed to reduce the dimensionality of the unified model.
Theoretical analyses of the algorithm are provided in terms of storage
scheme, convergence property and computation cost.
To execute the dimensionality reduction task, this paper employs the
Transparent Computing paradigm to construct a distributed computing
platform and uses a linear predictive model to partition the data
blocks.
Experimental results demonstrate that the proposed holistic approach is
efficient for distributed dimensionality reduction of big data.
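
The abstract names a Lanczos-based HOSVD but the slides do not give its steps. As a rough illustration only, the sketch below assumes one plausible reading: for each tensor mode, run Lanczos iteration on the symmetric Gram matrix of the mode unfolding, take the leading Ritz vectors as the factor matrix, and project the tensor onto those factors to obtain the core. All function names (`unfold`, `mode_product`, `lanczos`, `mode_factor`, `lanczos_hosvd`, `reconstruct`) are hypothetical, the code is dense single-machine NumPy rather than a distributed implementation, and it should not be taken as the authors' algorithm.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode, columns index all others."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)
    return np.moveaxis(moved @ matrix.T, -1, mode)


def lanczos(sym, steps, seed=0):
    """Lanczos tridiagonalization of a symmetric matrix with full
    reorthogonalization. Returns Q (orthonormal columns) and tridiagonal T."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(sym.shape[0])
    basis = [q / np.linalg.norm(q)]
    alphas, betas = [], []
    tol = 1e-10 * np.linalg.norm(sym)
    for j in range(steps):
        w = sym @ basis[j]
        alphas.append(basis[j] @ w)
        for v in basis:  # remove components along all earlier Lanczos vectors
            w = w - (v @ w) * v
        beta = np.linalg.norm(w)
        if j == steps - 1 or beta <= tol:  # done, or Krylov space exhausted
            break
        betas.append(beta)
        basis.append(w / beta)
    t = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    return np.column_stack(basis), t


def mode_factor(tensor, mode, rank, steps):
    """Approximate the leading left singular vectors of the mode unfolding
    by the top Ritz vectors of its Gram matrix (assumed construction)."""
    a = unfold(tensor, mode)
    gram = a @ a.T  # symmetric, size I_mode x I_mode
    q, t = lanczos(gram, min(steps, gram.shape[0]))
    evals, evecs = np.linalg.eigh(t)  # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:rank]
    return q @ evecs[:, top]


def lanczos_hosvd(tensor, ranks, steps=20):
    """Truncated HOSVD: one factor per mode, then project to get the core."""
    factors = [mode_factor(tensor, m, r, steps) for m, r in enumerate(ranks)]
    core = tensor
    for m, u in enumerate(factors):
        core = mode_product(core, u.T, m)
    return core, factors


def reconstruct(core, factors):
    """Expand a core tensor back to full size with the factor matrices."""
    out = core
    for m, u in enumerate(factors):
        out = mode_product(out, u, m)
    return out


# Demo on a synthetic tensor with exact multilinear rank (2, 3, 2).
rng = np.random.default_rng(1)
true_core = rng.standard_normal((2, 3, 2))
true_factors = [np.linalg.qr(rng.standard_normal((n, r)))[0]
                for n, r in zip((8, 9, 7), (2, 3, 2))]
x = reconstruct(true_core, true_factors)
core, factors = lanczos_hosvd(x, ranks=(2, 3, 2))
rel_err = np.linalg.norm(reconstruct(core, factors) - x) / np.linalg.norm(x)
print(core.shape, rel_err)
```

In this demo the 8 x 9 x 7 tensor is represented by a 2 x 3 x 2 core plus three thin factor matrices, which is the sense in which the core data is smaller to store. On real data the ranks would be chosen by the user and the reconstruction would be approximate rather than exact.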
- Decomposition
- Storage Scheme for Symmetric Matrix during Lanczos Iteration
- Convergence and Accuracy of the L-HOSVD Algorithm
- Computation Cost and Memory Usage
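
The outline lists a storage scheme for the symmetric matrix used during Lanczos iteration, but the slides do not describe it. One common option, assumed here purely for illustration, is packed triangular storage: keep only the lower triangle (n(n+1)/2 values instead of n^2) and implement the matrix-vector product, which is the only operation Lanczos needs from the matrix, directly on the packed array. The class name `PackedSymmetric` and its methods are hypothetical.

```python
class PackedSymmetric:
    """Symmetric n x n matrix that stores only its lower triangle, row by row."""

    def __init__(self, n):
        self.n = n
        self.data = [0.0] * (n * (n + 1) // 2)

    def _index(self, i, j):
        # Entry (i, j) and entry (j, i) share one slot in the packed array.
        if i < j:
            i, j = j, i
        return i * (i + 1) // 2 + j

    def get(self, i, j):
        return self.data[self._index(i, j)]

    def set(self, i, j, value):
        self.data[self._index(i, j)] = value

    def matvec(self, x):
        """Return A @ x, visiting each stored entry exactly once."""
        y = [0.0] * self.n
        k = 0
        for i in range(self.n):
            for j in range(i):
                a = self.data[k]
                y[i] += a * x[j]  # contribution of A[i][j]
                y[j] += a * x[i]  # contribution of the mirrored A[j][i]
                k += 1
            y[i] += self.data[k] * x[i]  # diagonal entry A[i][i]
            k += 1
        return y


# Demo: the matrix [[2, 1, 0], [1, 3, 4], [0, 4, 5]] needs 6 stored values.
m = PackedSymmetric(3)
for (i, j), value in {(0, 0): 2.0, (1, 0): 1.0, (1, 1): 3.0,
                      (2, 1): 4.0, (2, 2): 5.0}.items():
    m.set(i, j, value)
product = m.matvec([1.0, 2.0, 3.0])
print(len(m.data), product)
```

The saving approaches a factor of two for large n, at the cost of slightly more index arithmetic per access.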
This paper provides a holistic approach to distributed dimensionality reduction of
big data. First, a chunk tensor model is proposed to fuse heterogeneous data from
multiple sources into a unified tensor model.
The concepts and operations of the chunk tensor model are established in this paper.
Second, a Lanczos-based High Order Singular Value Decomposition (L-HOSVD) algorithm
is proposed to obtain the core data, which are small but contain valuable information.
The storage scheme and convergence property of the L-HOSVD algorithm are studied.
Third, the transparent computing paradigm is employed to construct a distributed
computing platform, and a linear predictive model is used to partition and distribute
data blocks to autonomic devices.
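
The slides do not say how the linear predictive model drives the partitioning. A minimal sketch, under the assumption that each device's processing time is modelled as a straight line in the number of blocks it receives (fitted by least squares from past runs) and that blocks are then handed out so predicted finish times stay balanced, might look like this. The function names and the timing figures are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def partition_blocks(total_blocks, models):
    """Greedy assignment: each block goes to the device whose predicted
    finish time after taking that block would be the smallest."""
    counts = [0] * len(models)
    for _ in range(total_blocks):
        best = min(
            range(len(models)),
            key=lambda d: models[d][0] + models[d][1] * (counts[d] + 1),
        )
        counts[best] += 1
    return counts


# Hypothetical history: (blocks processed, seconds taken) for two devices.
history = {
    "device_a": ([10, 20, 40], [2.0, 3.0, 5.0]),  # roughly 1.0 + 0.1 * blocks
    "device_b": ([10, 20, 40], [2.5, 4.5, 8.5]),  # roughly 0.5 + 0.2 * blocks
}
models = [fit_line(xs, ys) for xs, ys in history.values()]
counts = partition_blocks(30, models)
finish = [b0 + b1 * c for (b0, b1), c in zip(models, counts)]
print(dict(zip(history, counts)), finish)
```

The faster device ends up with more blocks, and the predicted finish times of the two devices differ by no more than the cost of a single block on the slower device.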

