Tonight’s Meetup
Michael Sexton
"Testing Big Data in AWS"
23rd September 2021
Travel Plan
What is Big Data in AWS
What is Testing Big Data in AWS Cloud
Types of Testing
Deployment Testing and Deployments
Tools & Automation
Challenges
Conclusion
Q & A
Testing Big Data in AWS - Sept 2021
What is Big Data?
- IoT, Netflix, Cybersecurity, Social Media, Eurovision, Traffic on Google Maps, Machine Learning, Video, Communications, etc.
https://bigdatapath.wordpress.com/
What Work is There?
- Senior QA Engineer - Data Analytics Software - Dublin - Selenium/Python - 65k - 75k
- Senior SDET - Big Data/Analytics - Dublin/Remote - Python, System Testing, Performance Testing - 80k - 90k
- Lead SDET - Big Data/Analytics - Dublin/Remote - Python, System Testing, Performance Testing - 90k - 100k
- Senior QA Automation Engineer - Data Management Software - Galway/Remote - Python/Selenium - 60k - 70k
Reperio Human Capital
Who Are the Cloud Providers?
- Various cloud providers: AWS, Azure, Google
Cloud, Alibaba, IBM, Dell, Tencent
What is Big Data in AWS?
Services Provided By AWS
What is Testing Big Data in AWS?
- Testing that the application carrying the data works well (no anomalies)
- Functional, Performance testing
- Testing of migration from on-prem to the
cloud
- Verifying resultant big data analysis is
correct
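One way to check a migration is to reconcile the same aggregate computed on-prem and in the cloud. The sketch below is only an illustration under stated assumptions: the function name `reconcile`, the daily-total dictionaries, and the tolerance are hypothetical and do not come from the talk.

```python
from math import isclose


def reconcile(on_prem, cloud, rel_tol=1e-9):
    """Compare two {key: numeric value} result sets and report differences."""
    shared = set(on_prem) & set(cloud)
    return {
        # Keys the on-prem system produced but the cloud pipeline did not.
        "missing_in_cloud": sorted(set(on_prem) - set(cloud)),
        # Keys that only appear in the cloud results.
        "extra_in_cloud": sorted(set(cloud) - set(on_prem)),
        # Keys present on both sides whose values disagree beyond the tolerance.
        "mismatched": sorted(
            key for key in shared
            if not isclose(on_prem[key], cloud[key], rel_tol=rel_tol)
        ),
    }


# Hypothetical daily totals, invented for this example.
on_prem_totals = {"2021-09-21": 1200.0, "2021-09-22": 980.5, "2021-09-23": 1101.25}
cloud_totals = {"2021-09-21": 1200.0, "2021-09-22": 980.6, "2021-09-24": 15.0}

report = reconcile(on_prem_totals, cloud_totals)
print(report)
```

An empty list under every key would mean the two result sets agree within the chosen tolerance.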
Functional Testing
- Testing with varied valid and invalid input
- Boundary cases, Calculations
- Scripts - Latin, Sanskrit, Arabic, encoded
- Testing against existing on-prem results
- Failure cases
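The bullets above can be sketched as a small table-driven check. Everything here is a hypothetical stand-in: `clean_name`, the 64-character limit, and the sample values are assumptions made for illustration, not the validation logic of any real pipeline.

```python
import unicodedata

MAX_NAME_LENGTH = 64  # hypothetical boundary chosen for this example


def clean_name(raw):
    """Validate and normalise a name field; raise ValueError on invalid input."""
    if isinstance(raw, bytes):
        # Encoded input: invalid UTF-8 raises UnicodeDecodeError, a ValueError subclass.
        raw = raw.decode("utf-8")
    if not isinstance(raw, str):
        raise ValueError("name must be text")
    # NFC makes composed and decomposed forms of the same text compare equal.
    name = unicodedata.normalize("NFC", raw).strip()
    if not 1 <= len(name) <= MAX_NAME_LENGTH:
        raise ValueError("name length out of range")
    return name


# Valid input: Latin, Devanagari (Sanskrit), Arabic, encoded bytes, upper boundary.
VALID_CASES = ["Cork", "संस्कृतम्", "العربية", "Cork".encode("utf-8"), "x" * MAX_NAME_LENGTH]
# Invalid input: empty, whitespace only, one past the boundary, wrong types, bad bytes.
INVALID_CASES = ["", "   ", "x" * (MAX_NAME_LENGTH + 1), None, 42, b"\xff\xfe"]


def run_cases():
    """Return the accepted values and the number of invalid cases rejected."""
    accepted = [clean_name(case) for case in VALID_CASES]
    rejected = 0
    for case in INVALID_CASES:
        try:
            clean_name(case)
        except ValueError:
            rejected += 1
    return accepted, rejected


accepted, rejected = run_cases()
print(len(accepted), "accepted,", rejected, "rejected")
```

In a Pytest suite the two case lists would typically become parametrized tests, so that each input is reported separately.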
Performance and Security Testing
- Data ingestion (many different sources of
data e.g. v1 and v3)
- Data processing and throughput - soak
- Sub-component performance
- Security of pipeline and stored data
- Robustness (no data arrives - what then?)
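The "no data arrives" case and a rough throughput figure can both be exercised locally before touching a real pipeline. This is a minimal sketch that assumes an in-memory `queue.Queue` stands in for the ingestion source; the function name `consume` and the timeout values are hypothetical.

```python
import queue
import time


def consume(source, idle_timeout=0.1, max_records=None):
    """Drain a queue; stop when no record arrives within idle_timeout seconds."""
    count = 0
    started = time.monotonic()
    last_record_at = started
    idle_timeout_hit = False
    while max_records is None or count < max_records:
        try:
            source.get(timeout=idle_timeout)
        except queue.Empty:
            # Nothing arrived in time: report it instead of blocking forever.
            idle_timeout_hit = True
            break
        count += 1
        last_record_at = time.monotonic()
    # Measure only the time spent receiving data, not the final idle wait.
    active_seconds = last_record_at - started
    rate = count / active_seconds if active_seconds > 0 else 0.0
    return {
        "records": count,
        "records_per_second": rate,
        "idle_timeout_hit": idle_timeout_hit,
    }


# Robustness case: an empty source must end cleanly with zero records.
empty_result = consume(queue.Queue())

# Throughput case: a pre-filled source with 1000 hypothetical records.
filled = queue.Queue()
for record_id in range(1000):
    filled.put({"id": record_id})
filled_result = consume(filled)
print(empty_result, filled_result)
```

A soak test follows the same pattern but runs for hours against a steady feed while memory and latency are watched.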
Pythagorean Cup
Deployment Testing and Deployments
- Deployment Testing (upgrades, timings)
- Code merged to Master branch, deploy
scripts & documentation written & tested
- Testers are the go-to people for DevOps during
deployments and upgrades
- Staging Environment, Production Environment
- Integration testing with other teams.
Tools & Automation
- AWS EC2 machine with Linux/Python scripts
- Pytest/Robot Framework for automation &
regression testing
- AWS ecosystem (Athena, CloudWatch, X-Ray,
QuickSight)
- Excel (max = 1,048,576 rows) & VLOOKUP
- PySpark
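When an extract is too large for a worksheet, the same exact-match lookup can be done in a script. The sketch below assumes rows are plain dictionaries; the helper names `fits_in_excel` and `lookup` and the sample data are hypothetical.

```python
EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit, header row included


def fits_in_excel(data_rows, header_rows=1):
    """Return True if the data plus its header rows fit on one worksheet."""
    return data_rows + header_rows <= EXCEL_MAX_ROWS


def lookup(rows, key_field, table, table_key, value_field, default=None):
    """VLOOKUP-style exact match: add value_field from table to each row."""
    index = {}
    for entry in table:
        # Keep the first match for a key, as an exact-match VLOOKUP does.
        index.setdefault(entry[table_key], entry[value_field])
    return [{**row, value_field: index.get(row[key_field], default)} for row in rows]


# Hypothetical sample data, invented for this example.
events = [
    {"device": "d1", "bytes": 10},
    {"device": "d2", "bytes": 20},
    {"device": "d9", "bytes": 5},
]
devices = [
    {"id": "d1", "region": "eu-west-1"},
    {"id": "d2", "region": "us-east-1"},
]

joined = lookup(events, "device", devices, "id", "region", default="unknown")
print(fits_in_excel(2_000_000), joined)
```

For data that does not fit in memory either, the same join would be expressed in PySpark or as an Athena query instead.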
Challenges
- Large varied dataset
- Automation and scripting skills
- Costs & AWS Knowledge
- Knowing the big picture
- Deployments to staging and production
- Communicating with other teams
Conclusion
- What is Big Data … in AWS?
- What is Big Data Testing in AWS
- Types of Testing (Functional & Performance)
- Deployment Testing & Deployments
- Tools & Automation
- Challenges
Any Questions?
Thanks
Ministry of Testing
Poppulo
Twitter: @MinistryCork

Editor's Notes

  • #19: The Ministry of Testing aims to change and lead within the software testing world. We are doing this through a strong focus on learning, collaboration and resources. You are part of the story too, and we hope you can join us along the way. We run several conferences around the world under the name TestBash. We also offer an e-learning platform called the Dojo, which offers numerous online courses to help you improve your testing skills.