TESTING BIG DATA SOLUTIONS FAST AND FURIOUSLY
ABOUT ME
Dmitriy Sobko
Lead QA
Zoral
dmitriy.sobko@gmail.com
AGENDA
• Big Data
• BI / ETL
• DWH
• Cloud
• Testing concepts
• Framework example
WHAT IS BIG DATA
First, we had data. Now we have big data.
The more data there is, the more you know about things and the sharper your decisions become
BUSINESS INTELLIGENCE (BI)
• Know your data to make better decisions
• Set of practices, architectures and technologies for gathering, processing and analyzing data
BI. CLOSER VIEW
• Daily transactions and correspondence are recorded
• Records are collected in databases
• Data are processed and transformed into usable information
• Information is analyzed to generate insight
ETL
• Extracts data from multiple, disparate source systems, such as records databases
• Transforms this data into usable information for decision makers
• Loads the data into data warehouses, from which end users can readily extract usable data for query and analysis
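The three ETL steps above can be sketched in a few lines of Kotlin. This is a minimal illustration, not the pipeline from the talk: the SourceRow and TargetRow types, the CSV layout, and the in-memory "warehouse" list are all assumptions made for the example.

```kotlin
// Hypothetical record types; the slides do not define a schema.
data class SourceRow(val user: String, val amount: String)
data class TargetRow(val user: String, val amountCents: Long)

// Extract: parse CSV lines (header skipped) into source rows.
fun extract(csv: String): List<SourceRow> =
    csv.lines()
        .drop(1)
        .filter { it.isNotBlank() }
        .map { line ->
            val (user, amount) = line.split(",")
            SourceRow(user.trim(), amount.trim())
        }

// Transform: normalise names and convert decimal amounts to integer cents.
fun transform(rows: List<SourceRow>): List<TargetRow> =
    rows.map { row ->
        TargetRow(row.user.lowercase(), row.amount.toBigDecimal().movePointRight(2).toLong())
    }

// Load: append to an in-memory list that stands in for a warehouse table.
fun load(rows: List<TargetRow>, warehouse: MutableList<TargetRow>) {
    warehouse.addAll(rows)
}

fun main() {
    val csv = "user,amount\nAlice,10.50\nBob,3.00\n"
    val warehouse = mutableListOf<TargetRow>()
    load(transform(extract(csv)), warehouse)
    println(warehouse)
}
```

Keeping each step a separate pure function makes it possible to test extraction, transformation and loading independently.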
QA Fest 2019. Дмитрий Собко. Testing Big Data solutions fast and furiously
INPUT CSV → STAGING TABLE → TARGET TABLE → REPORT
Amount of Spotify’s Delivered Events over time
https://labs.spotify.com/2016/02/25/spotifys-event-delivery-the-road-to-the-cloud-part-i/
MOVING TO CLOUD
https://www.alooma.com/blog/best-practices-for-migrating-data-from-on-prem-to-cloud
Worldwide Cloud IT Infrastructure Market Forecast
TEST TYPES
Accuracy Testing
Completeness Testing
Data Validation Testing
Metadata Testing
Performance Testing
ACCURACY TESTING
It checks whether the data is accurately transformed
and loaded from the source to the data warehouse
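One way to sketch an accuracy check is to recompute the transformation for each source record and compare it with what was loaded. The types and the rule totalViews = views + xViews below are assumptions for illustration only; a real check would use the project's own transformation rules.

```kotlin
// Hypothetical source and warehouse shapes.
data class SourceEvent(val id: Int, val views: Int, val xViews: Int)
data class WarehouseRow(val id: Int, val totalViews: Int)

// Assumed transformation rule under test: totalViews = views + xViews.
fun expectedRow(e: SourceEvent) = WarehouseRow(e.id, e.views + e.xViews)

// Returns the ids whose loaded row differs from the recomputed expectation.
fun inaccurateIds(source: List<SourceEvent>, warehouse: List<WarehouseRow>): List<Int> {
    val loaded = warehouse.associateBy { it.id }
    return source.mapNotNull { e ->
        // Rows that are absent from the warehouse are a completeness concern, so skip them here.
        val actual = loaded[e.id] ?: return@mapNotNull null
        if (actual != expectedRow(e)) e.id else null
    }
}

fun main() {
    val source = listOf(SourceEvent(1, 2, 3), SourceEvent(2, 1, 1))
    val warehouse = listOf(WarehouseRow(1, 5), WarehouseRow(2, 3))
    println("Inaccurate ids: ${inaccurateIds(source, warehouse)}")
}
```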
COMPLETENESS TESTING
This verifies whether all the data from the source are
loaded into the data warehouse
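A completeness check can be sketched as a comparison of row counts plus a search for source keys that never reached the target. The CompletenessReport type and the use of string keys are assumptions for this example; in practice the keys would come from queries against the source and the warehouse.

```kotlin
// Hypothetical summary of a completeness comparison.
data class CompletenessReport(
    val sourceCount: Int,
    val targetCount: Int,
    val missingKeys: Set<String>
) {
    // Complete only if no key is missing and the row counts match.
    val isComplete: Boolean
        get() = missingKeys.isEmpty() && sourceCount == targetCount
}

// Compares business keys read from the source with keys found in the target.
fun checkCompleteness(sourceKeys: List<String>, targetKeys: List<String>): CompletenessReport =
    CompletenessReport(
        sourceCount = sourceKeys.size,
        targetCount = targetKeys.size,
        missingKeys = sourceKeys.toSet() - targetKeys.toSet()
    )

fun main() {
    val report = checkCompleteness(listOf("a", "b", "c"), listOf("a", "c"))
    println(report)
    println("Complete: ${report.isComplete}")
}
```

Comparing counts as well as keys also catches duplicated rows in the target, which a key-only comparison would miss.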
DATA VALIDATION TESTING
This assesses whether the values of the data post-transformation are the same as their expected values with respect to the source values
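A field-by-field comparison of an expected record with the loaded one is one possible shape for such a check. The sketch below assumes records are represented as maps from field name to value, which is an illustrative choice and not something the slides specify.

```kotlin
// Compares an expected record with the loaded one and describes every mismatching field.
fun diffRecord(expected: Map<String, Any?>, actual: Map<String, Any?>): List<String> =
    (expected.keys + actual.keys).sorted().mapNotNull { field ->
        val e = expected[field]
        val a = actual[field]
        if (e == a) null else "$field: expected=$e actual=$a"
    }

fun main() {
    val expected = mapOf("name" to "alpha", "views" to 10)
    val actual = mapOf("name" to "alpha", "views" to 12)
    diffRecord(expected, actual).forEach(::println)
}
```

Reporting all mismatching fields at once, rather than failing on the first, tends to make failures in wide tables quicker to diagnose.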
METADATA TESTING
This checks whether data retains its integrity down to the metadata level: its length, indexes, constraints, and type
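A metadata check can be sketched as a comparison of the expected column definitions with the ones actually found in the target table. The Column type below is hypothetical; in a real project the actual columns would typically be read from the database catalog.

```kotlin
// Hypothetical column descriptor covering type, length and nullability.
data class Column(val name: String, val type: String, val length: Int?, val nullable: Boolean)

// Lists every expected column that is missing or defined differently in the target.
fun schemaMismatches(expected: List<Column>, actual: List<Column>): List<String> {
    val actualByName = actual.associateBy { it.name }
    return expected.mapNotNull { exp ->
        val act = actualByName[exp.name]
        when {
            act == null -> "${exp.name}: column is missing"
            act != exp -> "${exp.name}: expected $exp but found $act"
            else -> null
        }
    }
}

fun main() {
    val expected = listOf(
        Column("name", "VARCHAR", 64, false),
        Column("views", "BIGINT", null, false)
    )
    val actual = listOf(Column("name", "VARCHAR", 32, false))
    schemaMismatches(expected, actual).forEach(::println)
}
```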
PERFORMANCE TESTING
• How long it takes to process streaming data and batch data
• How long it takes to calculate reports, data marts and data feeds
• SLA
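A simple way to turn an SLA into an automated check is to time the job and compare the elapsed time with the agreed limit. The sketch below uses measureTimeMillis from the Kotlin standard library; the meetsSla helper and the sample workload are assumptions for illustration.

```kotlin
import kotlin.system.measureTimeMillis

// Runs the job once and reports whether it finished within the SLA (in milliseconds).
fun meetsSla(slaMillis: Long, job: () -> Unit): Boolean {
    val elapsed = measureTimeMillis(job)
    return elapsed <= slaMillis
}

fun main() {
    // Stand-in for a batch calculation; a real test would trigger the actual job.
    val withinSla = meetsSla(slaMillis = 5_000) {
        (1..1_000_000).sumOf { it.toLong() }
    }
    println("Within SLA: $withinSla")
}
```

A single wall-clock measurement is noisy, so in practice several runs or a percentile over them would give a more reliable signal.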
TEST APPROACHES
• Test on real data
• Test code with mocks/stubs
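The second approach can be sketched with a hand-written stub in place of the real data store. FeedRepository, FeedChecker and StubFeedRepository are hypothetical names introduced for this example; a mocking library could play the same role as the stub.

```kotlin
// Hypothetical abstraction over the warehouse queries used by the checks.
interface FeedRepository {
    fun countRowsWithNull(field: String): Int
}

// Logic under test: a field counts as populated when no row has a null in it.
class FeedChecker(private val repo: FeedRepository) {
    fun fieldIsPopulated(field: String): Boolean = repo.countRowsWithNull(field) == 0
}

// Stub that returns canned counts, so the test needs no database or cluster.
class StubFeedRepository(private val nullCounts: Map<String, Int>) : FeedRepository {
    override fun countRowsWithNull(field: String): Int = nullCounts[field] ?: 0
}

fun main() {
    val checker = FeedChecker(StubFeedRepository(mapOf("views" to 3)))
    println("name populated: ${checker.fieldIsPopulated("name")}")
    println("views populated: ${checker.fieldIsPopulated("views")}")
}
```

Tests written this way run fast and deterministically, but they only verify the code's logic, not the real data.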
TEST ON REAL DATA
TEST ON MOCKS/STUBS
MIXTURE OF BOTH APPROACHES
UNIT TESTS
"WordCount" should "work" in {
  JobTest[com.spotify.scio.examples.WordCount.type]
    .args("--input=in.txt", "--output=out.txt")
    // Feed the job in-memory input instead of reading in.txt
    .input(TextIO("in.txt"), inData)
    .output(TextIO("out.txt")) { coll =>
      coll should containInAnyOrder(expected)
      () // explicit Unit result for the assertion block
    }
    .run()
}
Check that the method correctly processes the input data file
INTEGRATION TESTS
val stream = testStreamOf[GameActionInfo]
  .advanceWatermarkTo(bTime)
  // Add some elements ahead of the watermark
  .addElements(
    event(blue1, 3, Duration.standardSeconds(3)),
    event(blue2, 2, Duration.standardMinutes(1)),
    event(red1, 3, Duration.standardSeconds(22))
  )
  // The watermark advances slightly, but not past the end of the window
  .advanceWatermarkTo(bTime.plus(Duration.standardMinutes(3)))
Check that the method correctly reads data from the streaming pipeline
ACCEPTANCE TESTS
• Make each test self-sufficient and independent
• Rely on data contract, not implementation
• Assert data as fully as possible
TESTS SHOULD BE
• Stable
• Resistant to constant code changes
• Fast
• Extensible
• Easily supported
TECHNOLOGY STACK
KOTLIN
Kotlin is a general-purpose, open-source, statically typed “pragmatic” programming language for the JVM that combines object-oriented and functional programming features.
It is focused on interoperability, safety, clarity, and tooling support.
SPRING
Spring Boot makes it easy to create stand-alone, production-grade Spring-based applications that you can “just run”.
The same applies to testing frameworks: you can get started with minimum fuss and very little pre-configuration.
CUCUMBER
Cucumber is a software tool to run automated tests written in a behavior-driven development (BDD) style.
Central to the Cucumber BDD approach is its plain-language parser called Gherkin. It allows expected software behaviors to be specified in a logical language that customers can understand.
GRADLE
Gradle is an open-source build automation tool focused on flexibility and performance.
Gradle build scripts are written using a Groovy or Kotlin DSL.
COURGETTE TEST RUNNER
Courgette Test Runner is an extension of Cucumber-JVM with added capabilities to run Cucumber tests in parallel on a feature level or on a scenario level.
CODE
WHAT AN AUTOTEST LOOKS LIKE
Feature: River project test feature

  Scenario: Check Alpha feed
    Given I check Alpha name field is correct
    And I check Alpha views field is correct
    And I check Alpha xViews field is correct
    And I check Alpha yViews field is correct
    And I check Alpha otherViews field is correct
    And I check Alpha reportDate field is correct

  Scenario: Check Beta feed
    Given I check Beta passName field is correct
    And I check Beta views field is correct
    And I check Beta channelName field is correct
    And I check Beta reportDate field is correct
WHAT THE CODE LOOKS LIKE
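// Step definition: binds the Gherkin step to a service call
@Given("^I check Alpha views field is correct$")
fun assertAlphaViewsField() {
    service.checkAlphaViewsField()
}

// Service method: delegates to the count-query check for this field
fun checkAlphaViewsField() =
    execCheckCountQuery(ALPHA_VIEWS_FIELD)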
WHAT THE RUNNER LOOKS LIKE
@RunWith(Courgette::class)
@CourgetteOptions(
    threads = 4,
    runLevel = CourgetteRunLevel.FEATURE,
    rerunFailedScenarios = false,
    cucumberOptions = CucumberOptions(
        features = arrayOf("resources/features"),
        glue = arrayOf("com.dsobko.test"),
        tags = arrayOf("@Ready", "~@Bug"),
        plugin = arrayOf("pretty", "html:build/cucumber-report")
    )
)
object CucumberFeaturesRunner
TEST REPORT
ALTERNATIVE SOLUTIONS
LINKS
https://labs.spotify.com/2016/03/10/spotifys-event-delivery-the-road-to-the-cloud-part-iii/
https://kotlinlang.org/
https://spring.io/projects/spring-boot
https://cucumber.io/
THANKS
