Big Data Cloud Meetup
Big Data & Cloud Computing - Help, Educate & Demystify.
June 3rd 2011

Kitenga, Mark Davis, CTO
June 3rd 2011 Meetup
Unlocking Big Data through Analytics and Search

Big Data
  • Enormous transactional data
  • Enormous unstructured information
  • Too big for databases
  • New tools are needed

Unstructured data explosion
  • Multimedia Content: Imagery, Audio, Video, Sensor Streams, Biometric data, 3D
  • Text: Email, Documents, Web pages, Tweets, Posts
  • Structured Enterprise Data (<5%): Data warehouse, CDRs, Financial records, Access logs

Big Data
  • Trillions of user interactions/transactions == Big Data
  • Scale tiers: >100M, <10M, <1M
  • Traditional (DBMS-based) solutions: Open source (MySQL, PHP); Data warehousing, Parallel SQL, Big hardware
  • Emerging technologies: NoSQL, Hadoop/MapReduce, HBase/Hive

The Structured/Unstructured Chasm
  • Structured: SQL, RDBMS, Transactional Data, BI Tools
  • Unstructured: Search, Documents, Text Classification, Taxonomies, Ontologies

Unstructured Analytics: Surfacing Metadata

Information Extraction
  • Machine Learning
  • Finite State Transducer (x3)
  • Parts-of-Speech Tagging
  • Lemmatization
  • Tokenization
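
The information extraction stages named on this slide can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not Kitenga's implementation: the lexicon, the suffix-stripping lemmatizer, the capitalization heuristic, and all function names are hypothetical, and the machine-learning stage is left out. The finite state transducer here is a two-state machine that reads part-of-speech tags and emits entity labels.

```python
import re
from typing import List, Tuple

# Hypothetical toy lexicon keyed by lemma; a real system would use trained models.
POS_LEXICON = {
    "the": "DET", "a": "DET",
    "analyst": "NOUN", "report": "NOUN",
    "read": "VERB", "tag": "VERB",
}

def tokenize(text: str) -> List[str]:
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def lemmatize(token: str) -> str:
    """Rough stand-in for a lemmatizer: lowercase and strip a plural 's'."""
    lower = token.lower()
    if len(lower) > 3 and lower.endswith("s") and not lower.endswith("ss"):
        return lower[:-1]
    return lower

def pos_tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Tag by lexicon lookup on the lemma; unknown capitalized words become PROPN."""
    tagged = []
    for tok in tokens:
        lemma = lemmatize(tok)
        if lemma in POS_LEXICON:
            tag = POS_LEXICON[lemma]
        elif tok[0].isupper():
            tag = "PROPN"
        else:
            tag = "X"
        tagged.append((tok, tag))
    return tagged

# Two-state transducer: (state, input symbol) -> (next state, output label).
TRANSITIONS = {
    ("OUT", "PROPN"): ("IN", "B-ENT"),
    ("OUT", "OTHER"): ("OUT", "O"),
    ("IN", "PROPN"): ("IN", "I-ENT"),
    ("IN", "OTHER"): ("OUT", "O"),
}

def transduce(tags: List[str]) -> List[str]:
    """Run the transducer over a tag sequence and return one label per tag."""
    state, labels = "OUT", []
    for tag in tags:
        symbol = "PROPN" if tag == "PROPN" else "OTHER"
        state, label = TRANSITIONS[(state, symbol)]
        labels.append(label)
    return labels

def extract_entities(text: str) -> List[str]:
    """Tokenize, tag, transduce, then join labelled runs into entity strings."""
    tokens = tokenize(text)
    labels = transduce([tag for _, tag in pos_tag(tokens)])
    entities, current = [], []
    for tok, label in zip(tokens, labels):
        if label in ("B-ENT", "I-ENT"):
            current.append(tok)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities
```

Calling `extract_entities` on a sentence returns the runs of consecutive proper-noun tokens as candidate named entities. Each stage is a separate function so that any one of them could be swapped for a stronger component.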

Search + Analytics
  • Resource Integration
  • Facet Browsing
  • Facet Charting
  • Autosuggest
  • Spellcheck
  • Query Language
  • Indexing
  • Metadata Extraction
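
To make the combination of indexing, facet browsing, and autosuggest concrete, here is a small in-memory sketch. It is an illustration only and assumes a hypothetical `TinyIndex` class with made-up documents and metadata fields; the engines named later in the deck (Lucene, SOLR) do this at scale with very different internals.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set

class TinyIndex:
    """Hypothetical toy inverted index with metadata facets and prefix suggestions."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)  # term -> document ids
        self.metadata: Dict[int, Dict[str, str]] = {}          # document id -> extracted fields

    def add(self, doc_id: int, text: str, metadata: Dict[str, str]) -> None:
        """Index whitespace-separated lowercase terms and store the document's metadata."""
        self.metadata[doc_id] = metadata
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query: str) -> Set[int]:
        """Return ids of documents containing every query term (AND semantics)."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

    def facets(self, doc_ids: Set[int], field: str) -> Counter:
        """Count the values of one metadata field across a result set."""
        return Counter(
            self.metadata[d][field] for d in doc_ids if field in self.metadata[d]
        )

    def suggest(self, prefix: str, limit: int = 5) -> List[str]:
        """Return indexed terms that start with the prefix, in alphabetical order."""
        prefix = prefix.lower()
        return sorted(t for t in self.postings if t.startswith(prefix))[:limit]

# Made-up example documents; the "type" field stands in for extracted metadata.
index = TinyIndex()
index.add(1, "drug discovery patent", {"type": "patent"})
index.add(2, "drug trial journal article", {"type": "journal"})
index.add(3, "gene sequence drug target", {"type": "journal"})

hits = index.search("drug")
type_counts = index.facets(hits, "type")
```

The facet counts are computed over the search result set rather than the whole collection, which is what lets a user narrow a query by clicking a facet value. The same counts could feed a chart, which is the idea behind facet charting.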

Defense Intelligence
Analyst support staff needs to convert raw data into actionable intelligence
  • Named Entity Extraction, Image tagging, Video analytics, Linkage Analysis, Network Visualization, Search
  • Improve Force Effectiveness
  • Hadoop/MapReduce, GPUs, HDFS, HBase, SOLR
  • Situation Reports, Geo-tagged Imagery
  • US Army, Navy, DHS, NSA

CASE STUDY: US ARMY
The Solution
  • >200 data feeds
  • <0.5s queries
  • Fast analysis cycles
  • Machine Learning
  • Analytics
  • Biometrics
  • Linkage Analysis
  • Face recognition
  • Video tagging
  • Collaborative systems
  • Enabling technologies: GPU clouds, Hadoop/MapReduce, Katta, Lucene, NoSQL, HBase
Analysis Bottlenecks
  • 200 data feeds
  • Unacceptable response time
  • Analysts avoid complete searches
  • Basic entity extraction
  • Slow analysis cycles
  • Distribution by PowerPoint
  • Enabling technologies: Oracle and custom thick clients

Pharma Bioinformatics
Increase speed of drug discovery
  • Biological Named Entity Extraction, Author Name Extraction and Normalization, Linkage Analysis, Timelines, Faceted Search
  • ZettaVox
  • Faster Discovery
  • Hadoop/MapReduce, HDFS, HBase, GPUs, SOLR
  • Patents, Genetic Sequence Data, Journal Articles

Pharma Treemap
Big Data Cloud June 3rd Meetup - Presentation by Mark Davis
Demo

Summary
  • Big Data spans unstructured and structured data
  • Effective tools for managing both require understanding their differences and similarities
  • Bridging the chasm between them means merging search and analytics

Questions?

Contact Info
mark@kitenga.com
http://www.kitenga.com
Kitenga, Inc.
2953 Bunker Hill Lane, Suite 400
Santa Clara, CA 95054
1-(408)-462-KITE
1-(253)-541-6799 (FAX)
