Big Data
Storages
Agenda
[Big] Data source: when does it become big?
What is a cluster? Horizontal and vertical scaling
[Big]Data Storage challenges
Disadvantages
NoSQL = Not only SQL
Most popular and trendy
Big Data Storage Concepts
Only stores facts (events); doesn't analyze them
Immutable
Time series data (based on timestamps and, maybe, origin)
Store everything, delete nothing
Where: messages (email, Twitter), social networks, sensor data (IoT), log files, locations
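The "immutable, store everything, delete nothing" idea can be illustrated with a minimal in-memory sketch. The class and field names (`Event`, `EventLog`, `origin`) are made up for illustration and are not taken from any particular storage engine; the point is only that a new value is appended as a newer fact instead of overwriting the old one.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)  # frozen: a stored fact cannot be modified afterwards
class Event:
    origin: str        # hypothetical source id, e.g. a sensor name
    timestamp: float   # seconds since the epoch
    value: Any


class EventLog:
    """Append-only store: facts are inserted, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, origin: str, timestamp: float, value: Any) -> None:
        self._events.append(Event(origin, timestamp, value))

    def latest(self, origin: str) -> Optional[Event]:
        """The 'current value' is simply the newest fact for an origin."""
        candidates = [e for e in self._events if e.origin == origin]
        return max(candidates, key=lambda e: e.timestamp, default=None)

    def history(self, origin: str) -> List[Event]:
        """The full history stays available, so batch jobs can be re-run."""
        return sorted((e for e in self._events if e.origin == origin),
                      key=lambda e: e.timestamp)


log = EventLog()
log.append("sensor-1", 100.0, 21.5)
log.append("sensor-1", 160.0, 22.0)   # an "update" is just a newer fact
log.append("sensor-2", 120.0, 19.0)
```

Here `log.latest("sensor-1")` returns the reading taken at timestamp 160.0, while `log.history("sensor-1")` still contains both readings.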
Cluster. Horizontal and vertical scaling
What is a cluster?
Load balancer
Communication: master/slave architecture
Fault tolerance and replication factor
Big Data Storage Challenges
Size (keep and search huge amount of data)
Speed (data acquisition, data search)
Availability (fault tolerance, partition tolerance)
Disadvantages of Big Data Storages
No transactions (ACID)
Less mature
Big variety of concepts, lack of standardization
No BI or analytics in queries
Administration
Distributed File storage
Amazon
 Tatyana Matvienko,Senior Java Developer, Big data storages
Storages: Key-Value
Examples: Redis, DynamoDB, MemcacheDB, Riak KV, Aerospike, OrientDB
Storages: Document oriented
Examples: Apache CouchDB, Couchbase, MongoDB
Storages: Graphs
Examples: AllegroGraph, Neo4j, OrientDB, Titan
Storages: Column based
Examples: Cassandra, HBase, Accumulo, Vertica
Why Cassandra?
Apache Cassandra: basics
Masterless architecture with read/write anywhere design
All nodes are the same
No single point of failure
Zone support
Linear scalability
CQL: Cassandra Query Language
Availability and Partition Tolerance but Eventual Consistency
Partitioning and Replication
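A rough sketch of how a hash ring can place data on nodes and pick replicas. This is a simplification made for illustration: each node owns a single token, MD5 stands in for the partitioner's hash function, and the names `HashRing`, `token` and `replicas` are hypothetical. Real clusters typically use many virtual nodes and a configurable replication strategy.

```python
import bisect
import hashlib
from typing import List


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 is only a stand-in here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes: List[str], replication_factor: int = 3) -> None:
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed node count")
        self.replication_factor = replication_factor
        # Each node owns one token; sorting lets us walk the ring clockwise.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str) -> List[str]:
        """Return the nodes that should hold a copy of the given key."""
        # First node clockwise from the key's token, then its successors.
        start = bisect.bisect_right(self._tokens, token(partition_key))
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(self.replication_factor)]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("sensor-1")
```

With four nodes and a replication factor of 3, every key lands on three distinct nodes, so no single node has to hold all the data and the loss of one node does not lose the key.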
Data modeling
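The roles of the partition key and the clustering column can be pictured with a small in-memory model. This is a sketch under assumptions, not Cassandra's storage engine: the `Table` class, the `sensor_id` partition key and the `timestamp` clustering column are illustrative names. The partition key decides which partition a row belongs to, and the clustering column keeps rows ordered inside that partition, which makes range reads cheap.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, float]  # (timestamp, value); timestamp is the clustering column


class Table:
    def __init__(self) -> None:
        # partition key -> rows kept sorted by the clustering column
        self._partitions: Dict[str, List[Row]] = defaultdict(list)

    def insert(self, sensor_id: str, timestamp: int, value: float) -> None:
        # The partition key selects the partition; the clustering column
        # decides where the row sits inside it.
        bisect.insort(self._partitions[sensor_id], (timestamp, value))

    def range_query(self, sensor_id: str, start: int, end: int) -> List[Row]:
        """Rows of one partition with start <= timestamp < end, in order."""
        rows = self._partitions.get(sensor_id, [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]


t = Table()
t.insert("sensor-1", 30, 22.0)
t.insert("sensor-1", 10, 21.5)
t.insert("sensor-1", 20, 21.8)
t.insert("sensor-2", 15, 19.0)
window = t.range_query("sensor-1", 10, 30)
```

Rows come back ordered by timestamp regardless of insertion order, and the query only ever touches one partition.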
Demo

Tatyana Matvienko,Senior Java Developer, Big data storages

Editor's Notes

  • #4: Materialized views, functions, procedures and triggers in an RDBMS, and how Big Data storages moved away from them (example: Oracle and a financial report). UPDATE is dropped in favor of INSERT with a newer timestamp. Because of that, the data is usually called time series. Since analytics happens outside the database (batch jobs), it is best not to delete anything: if the jobs turn out to have bugs or problems, we can always re-run them and get new results. Go over the main sources of time series data.
  • #5: Definition. Communication protocols -> master/slave architecture. Single point of failure. Data distribution across the cluster, fault tolerance and replication.
  • #6: Reminder about the CAP theorem (I was asked about it after the lecture). Explain once more that it is not dogma but rather an important principle to keep in mind. Consistency, for example, can be interpreted in different ways.
  • #7: Go over the traditional notion of a transaction and spell out ACID. Walk through the properties: atomicity, consistency, isolation, durability (example: transferring money to an account). Big Data storages appeared relatively recently compared with RDBMS. There is a large number of concepts and implementations for different tasks. RDBMS have database normal forms; here there are none. For analytics you need other components (which means learning them, plus the cost of launching and administering them). Administering a cluster is in itself a more complex task.
  • #8: S3 is a web service; HDFS is software. S3 consistency: eventual, with read-after-write for newly created objects (as of this talk; S3 has offered strong read-after-write consistency since December 2020). S3 communication: REST and SOAP. S3 replication: you don't control it, but you can enable cross-region replication. HDFS: master/slave architecture (NameNode, DataNodes). HDFS: files are split into blocks. HDFS: automatic recovery. Adding nodes to a cluster is fine, but removing them is a challenge.
  • #9: Explain here why SQL queries cannot be executed on NoSQL DBs (spell out the term; walk through UPDATE, DELETE, COMMIT, ROLLBACK as examples).
  • #10: Talk about caching here, using Redis as the example: open source; in-memory (Redis holds its database entirely in memory, using the disk only for persistence); scalable; all Redis operations are atomic; rich set of data types.
  • #11: Example: MongoDB. JSON-based documents (sets of key-value pairs); dynamic schema; supports indexing and aggregation queries.
  • #16: There is no point in storing all the data on every node. How to distribute it across the cluster: hash ring. Data safety: replication.
  • #17: Replication is asynchronous. Nodes communicate with each other via the Gossip protocol. Every node can handle requests; the node that receives a request acts as the coordinator for that request. Hinted handoff: if a node drops out, the data that should have been sent to it is kept for a while and waits until the node comes back.
  • #18: Partition key, clustering column, ordering.
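
The hinted handoff behaviour described in note #17 can be sketched roughly as follows. This is a toy model under assumptions, not Cassandra's implementation: `Node`, `Coordinator`, `write` and `replay_hints` are hypothetical names, and hint expiry (hints are only kept "for a while") is not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: Dict[str, str] = {}


class Coordinator:
    """The node that received the request and forwards it to the replicas."""

    def __init__(self, replicas: List[Node]) -> None:
        self.replicas = replicas
        # node name -> writes that could not be delivered yet
        self.hints: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def write(self, key: str, value: str) -> int:
        """Write to every reachable replica; return the number of acks."""
        acks = 0
        for node in self.replicas:
            if node.up:
                node.data[key] = value
                acks += 1
            else:
                # Replica is down: keep a hint instead of failing the write.
                self.hints[node.name].append((key, value))
        return acks

    def replay_hints(self, node: Node) -> int:
        """Deliver stored hints once the node is back; return how many."""
        if not node.up:
            return 0
        pending = self.hints.pop(node.name, [])
        for key, value in pending:
            node.data[key] = value
        return len(pending)


a, b, c = Node("a"), Node("b"), Node("c")
coordinator = Coordinator([a, b, c])
b.up = False
acks = coordinator.write("sensor-1", "21.5")   # b misses this write
b.up = True
replayed = coordinator.replay_hints(b)         # b catches up
```

After the replay, all three replicas hold the same value even though one of them was unavailable at write time.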