Structuring big data
 Mark Wilson
 January 2012




#CloudCamp              UNCLASSIFIED   © Copyright 2012 Fujitsu Services Limited
The problem with big data, and a solution
The problem:
    “New reference architectures will include both big data and enterprise data warehouses” [IDC, 19 January 2012]
    Two worlds: structured and unstructured data (plus external data sources, documents stored in structured databases, etc.)
    Siloes create issues with management, integration, etc.
The solution:
    Linked data – a single reference point for all data in the enterprise




Some history



Fixed structure
    Difficult to change schema
Simple reporting capabilities
    Complex to create new reports




Some history


Completed transactions transferred to separate database for analysis
    “Data warehouse”
Better reporting, data mining, etc.
    Still highly structured
Data is historical
    May be aggregated




The smart guys



Real-time update of completed transactions
    Transactions moved to data warehouse upon completion
    Smaller transactional database
Allows alerts to be generated, and action taken, when specific conditions are met




A third “data silo”



Masses of unstructured/semi-structured data being processed in NoSQL databases
May or may not be transferred to/from structured databases
    Time-consuming and inefficient
Three types of data, each with its own limitations and management considerations




Data everywhere!




Linked Data
Tie records together – even from separate data sets
We can express these links as triples with a specific grammar: subject, predicate, object




Build up a graph to show machine-readable data in human-readable form
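
As a rough illustration of the triple idea on this slide, the Python sketch below stores a few subject, predicate, object statements as tuples and groups them into a graph. The identifiers, the sample records and the `build_graph` helper are all hypothetical and are not taken from the talk. A real deployment would more likely use RDF with URIs as identifiers.

```python
from collections import defaultdict

# Hypothetical sample data: each triple is (subject, predicate, object).
# Records from two separate data sets are tied together by shared identifiers.
crm_triples = [
    ("customer:42", "hasName", "Alice Example"),
    ("customer:42", "placedOrder", "order:1001"),
]
warehouse_triples = [
    ("order:1001", "containsProduct", "product:7"),
    ("product:7", "hasLabel", "Widget"),
]


def build_graph(triples):
    """Group triples by subject so each node lists its outgoing edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(crm_triples + warehouse_triples)

# Print the machine-readable triples in a form a person can follow.
for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --{predicate}--> {obj}")
```

Because `order:1001` appears as an object in one data set and as a subject in the other, the two sets join into a single graph without any shared schema.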




Then add lots more data…




Source: http://lod-cloud.net/
        Each node is itself another graph (zoom in)
Aren’t we missing a trick?
Use linked data as the optimal reference source
    Broker of all data sources
Single view on structured and unstructured data
    Bring in external sources too
Mapping, interconnecting, indexing and feeding
    In real time
Query linked data to derive new value from old
    Infer relationships
    Gain new insights
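
To make "query linked data" and "infer relationships" a little more concrete, here is a minimal Python sketch under stated assumptions. It treats linked data as a set of (subject, predicate, object) triples drawn from three kinds of source, queries them with a wildcard pattern, and derives one new fact by chaining two existing links. The sample triples, the `match` and `infer_purchases` helpers and the `bought` predicate are hypothetical. The talk does not prescribe an implementation, and a production system would more plausibly use a triple store and a query language such as SPARQL.

```python
# Hypothetical triples gathered from three kinds of source.
triples = {
    ("customer:42", "placedOrder", "order:1001"),    # transactional database
    ("order:1001", "containsProduct", "product:7"),  # data warehouse
    ("tweet:9", "mentionsProduct", "product:7"),     # unstructured / NoSQL source
    ("tweet:9", "postedBy", "customer:42"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def infer_purchases(store):
    """Infer (customer, 'bought', product) by chaining two existing links."""
    inferred = set()
    for customer, _, order in match(store, predicate="placedOrder"):
        for _, _, product in match(store, subject=order, predicate="containsProduct"):
            inferred.add((customer, "bought", product))
    return inferred


new_facts = infer_purchases(triples)
linked = triples | new_facts

# Everything known about product:7, regardless of which source it came from.
for triple in match(linked, obj="product:7"):
    print(triple)
```

The inferred `bought` link exists in none of the original sources. It only appears once the sources are viewed through a single set of linked statements, which is the sense in which new value is derived from old data.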


About the author
Mark Wilson, Strategy Manager, Fujitsu
Mark is an analyst working within Fujitsu’s UK and
Ireland Office of the CTO, providing thought
leadership both internally and to customers,
shaping business and technology strategy. He has
17 years' experience of working in the IT industry,
12 of which have been with Fujitsu. Mark has a
background in leading large IT infrastructure
projects with customers in the UK, mainland
Europe and Australia. He has a degree in
Computer Studies from the University of
Glamorgan. Mark is also active in social media and
won the Individual IT Professional (Male) award in
the 2010 Computer Weekly IT Blog Awards. Mark
may be found on Twitter @markwilsonit.

If you would like to comment on the topics in this
presentation, Mark would welcome your feedback,
by email to mark.a.wilson@uk.fujitsu.com.

Editor's Notes

  • #2: Everyone’s talking about big data, but the bulk of the conversation seems to focus on a new level of business intelligence and an ever-increasing volume of data organised into OLTP, OLAP and NoSQL siloes. In this talk, Mark Wilson puts forward the view that the real value comes not from the big data itself but from how we can employ linked data concepts to integrate structured, unstructured and semi-structured data sets – and then use this unified data source to derive new value.