Big Data
Introduction
Dr. Akram Alkouz
Princess Sumaya University for Technology
1 PSUT Big Data Class,  introduction
Data Information  Understanding
• Big Data is data whose volume is beyond the storage and processing capabilities of a single machine
• Big Data: a huge volume of data that comes from a variety of sources, in a variety of formats, at high velocity
• Big Data is similar to ‘small data’, but bigger
• Bigger data requires different approaches: techniques, tools, and architecture
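To make the "beyond a single machine" point concrete, the sketch below estimates how long one full read of a dataset takes on one machine versus many. The dataset size (1 PB), the disk read rate (150 MB/s), and the cluster size (1000 machines) are assumptions chosen for illustration; none of them come from the slides, and the even split across machines ignores coordination overhead.

```python
# Back-of-the-envelope sketch; every figure here is an assumption for illustration.
DATASET_BYTES = 10**15                  # 1 PB (decimal petabyte), hypothetical dataset
DISK_READ_BYTES_PER_S = 150 * 10**6     # hypothetical 150 MB/s sequential read rate
SECONDS_PER_DAY = 24 * 60 * 60


def scan_time_days(total_bytes: int, bytes_per_second: int, machines: int = 1) -> float:
    """Days needed to read total_bytes once when the work is split evenly over `machines`."""
    seconds = total_bytes / (bytes_per_second * machines)
    return seconds / SECONDS_PER_DAY


single = scan_time_days(DATASET_BYTES, DISK_READ_BYTES_PER_S)
cluster = scan_time_days(DATASET_BYTES, DISK_READ_BYTES_PER_S, machines=1000)

print(f"one machine:   {single:.1f} days")               # one machine:   77.2 days
print(f"1000 machines: {cluster * 24 * 60:.1f} minutes")  # 1000 machines: 111.1 minutes
```

Under these assumptions a single pass drops from months to under two hours, which is the practical reason bigger data calls for a different architecture rather than a bigger disk.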
• Volume: data quantity
• Velocity: data speed
• Variety: data types
• Veracity: accuracy
  • Big Data - Veracity = incorrect inferences?
• Validity: logic or fact?
  • Volume - Validity = worthlessness?
• Value: usefulness
  • Big Data = Data + Value?
• Visibility
  • Big Data - Visibility = black hole?
• High trend
• Real data problems
Market Size
Source: Wikibon, Taming Big Data
By 2015: 4.4 million IT jobs in Big Data, 1.9 million of them in the US alone
MENA – Big Data
• Gaining traction
• Huge market opportunities for IT services (82.9% of
revenues) and analytics firms (17.1%)
• The current market size for the GCC is 135.7 million; by
2020 it is projected to reach 635.5 million
• The opportunity for MENA service providers lies in
offering services around Big Data implementation
and analytics for global multinationals
Why Big Data became possible
• Key enablers of the emergence and growth of
Big Data:
–Increase of storage capacities
–Increase of processing power
–Availability of data
•Every day we create 2.5 quintillion bytes of
data; 90% of the data in the world today has
been created in the last two years alone
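The daily figure above is easier to picture in familiar units. The short sketch below converts it using decimal (SI) prefixes; the only input taken from the slide is the 2.5 quintillion bytes per day, and the choice of decimal rather than binary units is an assumption.

```python
# Unit conversion for the "2.5 quintillion bytes per day" figure quoted above.
BYTES_PER_DAY = 25 * 10**17      # 2.5 quintillion = 2.5 x 10^18 bytes
EXABYTE = 10**18                 # decimal (SI) exabyte, assumed here
TERABYTE = 10**12                # decimal (SI) terabyte, assumed here
SECONDS_PER_DAY = 24 * 60 * 60

exabytes_per_day = BYTES_PER_DAY / EXABYTE
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / TERABYTE

print(f"{exabytes_per_day} EB per day")             # 2.5 EB per day
print(f"{terabytes_per_second:.1f} TB per second")  # 28.9 TB per second
```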
Applications for Big Data Analytics
Homeland Security
Finance
Smarter Healthcare
Multi-channel sales
Telecom
Manufacturing
Traffic Control
Trading Analytics
Fraud and Risk
Log Analysis
Search Quality
Retail: churn, next best offer (NBO)
Healthcare
• 80% of medical data is unstructured and is clinically
relevant
• Data resides in multiple places, such as individual EMRs,
lab and imaging systems, physician notes, medical
correspondence, claims, etc.
• Leveraging Big Data
• Build sustainable healthcare systems
• Collaborate to improve care and outcomes
• Increase access to healthcare
NoSQL: non-relational, or at least non-SQL, database
solutions such as HBase (also part of the Hadoop
ecosystem), Cassandra, MongoDB, Riak, CouchDB, and
many others.
Hadoop: an ecosystem of software packages,
including MapReduce, HDFS, and a whole host of
others.
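Since MapReduce is named here without explanation, a minimal single-process sketch of the programming model may help: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step combines each group. This is plain Python for illustration only, not the Hadoop API; the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (in one process, for illustration)."""
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread across machines, which is what makes the model suit data that does not fit on one node.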
Hadoop, MapReduce, Hive, Pig, Cascading,
HBase, Hypertable, Cassandra, Flume, Sqoop,
MongoDB, Voldemort, Storm, Kafka, Drill, Dremel,
Impala, ZooKeeper, Ambari, Oozie, YARN, Redis,
Riak, Pregel, Gremlin, Giraph, Solr, Lucene, R,
Mahout, Weka
• Google: 24 PB of data processed daily
• Twitter: 340 mln tweets daily + 1.6 bln
search queries + 7 TB added daily
• Facebook: 750 mln users + 12 TB of content
daily + 2.7 bln “likes” and comments daily
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
  • Social Network, Semantic Web (RDF), …
• Streaming Data
  • You can only scan the data once
• Unstructured Data (Documents)
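The "scan once" constraint on streaming data listed above means a computation cannot store the stream and revisit it. One common way to work within that constraint is an online update rule; the sketch below uses Welford's method to keep a running mean and variance in constant memory. The class name `RunningStats` and the sample readings are hypothetical and serve only as illustration.

```python
from typing import Iterator


class RunningStats:
    """Mean and population variance of a stream, updated one value at a time (Welford's method)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one new value into the statistics; the value is not stored."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of the values seen so far (0.0 for an empty stream)."""
        return self._m2 / self.count if self.count else 0.0


def sensor_readings() -> Iterator[float]:
    """Hypothetical stream; a generator can be consumed only once, like a real stream."""
    yield from (21.0, 22.5, 19.5, 23.0)


stats = RunningStats()
for reading in sensor_readings():
    stats.update(reading)

print(stats.count, stats.mean, stats.variance)
```

Memory use stays constant no matter how long the stream runs, at the cost of supporting only statistics that have such an incremental form.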
• RFID
• Web logs
• User interaction logs
• User transaction history
• Social Network, Semantic Web (RDF), …
• Climate sensors
• Internal
  • Transactions
  • Emails
  • Log data
• External
  • Social Networks
  • Web
  • Media
What to do with this data?
Analyze it
Why Big Data Analytics?
• Examining large amounts of data
• Appropriate information
• Identification of hidden patterns, unknown correlations
• Competitive advantage
• Better business decisions: strategic and operational
• Effective marketing, customer satisfaction, increased
revenue
• Vital Information discovery
• Trends detection and prediction
• Personalized user services
• Identify the most important customers
• Identify the best time to perform
maintenance based on the usage patterns
• Analyze brand reputation in social media
How can such a huge amount of data be processed?
Distributed systems
[Figure: Architecture, showing application servers, storage servers, and a storage area network]
Problems
• Dependency on Network and big demand of
network bandwidth
• Scaling up and down is not smooth
• Partial failure is problematic
• Transferring data consumes processing power
• Data synchronization is a headache
Big Data revolution comes to the stage
Big data revolution
• Google: GFS, MapReduce, BigTable
• Yahoo: Hadoop
• Amazon: DynamoDB
• Facebook: Cassandra, HBase
• Twitter: FlockDB, Storm
• LinkedIn: Voldemort, Kafka
• Machine Learning
• Data Mining
• Statistics
• Software Engineering
• Hadoop/MapReduce/HBase/Hive/Pig
• Java, Python, C/C++, SQL
“By 2018, the United States alone could face a shortage of 140,000
to 190,000 people with deep analytical skills as well as 1.5 million
managers and analysts with the know-how to use the analysis of big
data to make effective decisions.”
Thank you



Editor's Notes

  • #2: ICP: according to IBM
  • #3: ICP: according to IBM
  • #8: According to IBM
  • #9: According to IBM
  • #14: Explain well. Quote practical examples.