Amazon Athena overview
Raman Maskalenka, Software Engineer at EPAM
November 30
© 2019 EPAM Systems, Inc.
Table of contents
• Amazon Athena overview
• Supported data types
• Technologies under the hood
• Simple use case
• Integration with other services
• Things to consider while using Athena
Amazon Athena Overview
Serverless, interactive, highly available SQL query service
• Serverless
  • No infrastructure to set up
  • Zero spin-up time
  • Transparent upgrades
• Interactive
  • High query execution speed
  • Descriptive error messages
• Highly available
  • Athena uses warm compute pools across multiple Availability Zones
  • Your data is stored in S3, which is also designed for availability
• Core effective
  • Queries are parallelized automatically
  • Results are streamed to the console
  • Tuned for performance
• Uses ANSI SQL
  • Supports complex joins, nested queries, and window functions
  • Supports complex data types (arrays, structs)
  • Supports partitioning by almost any key, except datetime timestamp
• Cost effective
  • Pay per query
  • $5 per TB scanned
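The partitioning bullet above can be illustrated with a small sketch. It assumes the common Hive-style layout, where each partition key becomes a key=value segment of the S3 prefix, and it uses year, month, and day keys rather than a single timestamp key. The bucket name and helper functions are hypothetical and only show how a date filter narrows the set of prefixes that would need to be read.

```python
from datetime import date, timedelta
from typing import List


def partition_prefix(base: str, day: date) -> str:
    """Return a Hive-style key=value prefix for one day of data."""
    return f"{base}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"


def prefixes_for_range(base: str, start: date, end: date) -> List[str]:
    """List the prefixes for every day from start to end, inclusive."""
    days = (end - start).days
    return [partition_prefix(base, start + timedelta(days=i)) for i in range(days + 1)]


# Hypothetical bucket and table location, used for illustration only.
BASE = "s3://example-bucket/logs"

# A query filtered to these three days only needs data under three prefixes.
wanted = prefixes_for_range(BASE, date(2019, 11, 29), date(2019, 12, 1))
for prefix in wanted:
    print(prefix)
```

Because pricing is per TB scanned, a filter on partition keys that limits the prefixes read also limits the amount billed.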
Supported data types
• Text files (CSV, raw)
• Apache Web Logs, TSV
• JSON (simple, nested)
• Compressed files
• Apache Parquet and Apache ORC
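As a rough illustration of the "Compressed files" item, the sketch below gzip-compresses a small synthetic CSV and compares the sizes. The column names and row contents are made up for the example; the actual ratio depends entirely on the data.

```python
import csv
import gzip
import io


def make_csv(rows: int) -> bytes:
    """Build a small synthetic CSV file in memory (made-up columns)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["id", "status", "path"])
    for i in range(rows):
        writer.writerow([i, 200 if i % 10 else 500, f"/items/{i % 50}"])
    return buffer.getvalue().encode("utf-8")


raw = make_csv(10_000)
compressed = gzip.compress(raw)

# Decompressing returns the original bytes, so no information is lost.
restored = gzip.decompress(compressed)

print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes")
```

Fewer bytes stored means fewer bytes scanned per query, which is the reason compressed storage appears among the cost-saving points later in the deck.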
Technologies under the hood
Presto: originally created by Facebook for its data analysis, to run interactive queries on large amounts of data.
• In-memory distributed query engine, ANSI-SQL compatible with extensions
• Used by Athena for SQL queries
Apache Hive: data warehouse software project built on top of Apache Hadoop for data query and analysis. It allows running SQL queries over distributed data.
• Used by Athena for data definition language (DDL) functionality
• Supports complex data types and multiple formats
• Supports partitioning
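To make the DDL role more concrete, here is a minimal sketch that assembles a Hive-style CREATE EXTERNAL TABLE statement for comma-delimited files. The table name, columns, partition key, and S3 location are hypothetical, and the helper only builds a string; submitting it to Athena is left out.

```python
from typing import Dict


def build_create_table(table: str, columns: Dict[str, str],
                       partitions: Dict[str, str], location: str) -> str:
    """Assemble a Hive-style DDL statement for a partitioned external table."""
    cols = ",\n  ".join(f"`{name}` {typ}" for name, typ in columns.items())
    parts = ", ".join(f"`{name}` {typ}" for name, typ in partitions.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"PARTITIONED BY ({parts})\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table definition, for illustration only.
ddl = build_create_table(
    table="access_logs",
    columns={"host": "string", "status": "int"},
    partitions={"year": "int"},
    location="s3://example-bucket/logs/",
)
print(ddl)
```

The table is only metadata over files that already sit in S3; SELECT statements against it are then handled by the SQL engine.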
Simple use case
Integration with other services
Things to consider while using Athena: pros
• Data is not transformed in S3
• You can write complex regexes for table creation
• You don’t pay for data transformation
• You can store your data in a compressed format to lower costs
• Rich access control (IAM, ACLs, S3 bucket policies)
• Can be integrated with many business intelligence (BI) tools
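The regex point can be sketched as follows. In a regex-based table definition, each capture group of the pattern supplies one column. The example below mimics that idea in plain Python for an Apache-style access log line; the pattern and column names are assumptions chosen for illustration, and Athena itself would expect Java regex syntax in the table definition rather than this code.

```python
import re
from typing import Dict, Optional

# One capture group per column, in column order (illustrative pattern).
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-)$'
)
COLUMNS = ["host", "ident", "user", "time", "method",
           "path", "protocol", "status", "size"]


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Map each capture group to a column name, or return None on no match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
row = parse_line(sample)
print(row)
```

Testing a pattern locally against a few sample lines like this is a cheap way to catch mistakes before paying to scan the real data.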
Things to consider while using Athena: cons
• Canceled queries are still charged for the data scanned
• The data scanned by each query is rounded up to the nearest MB, with a 10 MB minimum
• Query execution cost consists of S3 data read charges plus Athena scanned-data rates
• Not all Hive DDL statements are supported by Athena
• Hive or Presto transactions are not supported by Athena
• User-defined functions and stored procedures are not supported
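The rounding rule and the $5 per TB rate quoted earlier can be combined into a small estimate. This is a sketch under stated assumptions: it treats 1 MB as 1024² bytes and 1 TB as 1024⁴ bytes, applies the 10 MB minimum to every query passed in, and ignores S3 request and transfer charges. The function names are hypothetical.

```python
PRICE_PER_TB_USD = 5.0   # rate quoted in the slides
MB = 1024 ** 2           # assumption: binary megabyte
TB = 1024 ** 4           # assumption: binary terabyte
MIN_BILLED_MB = 10       # minimum charge per query, from the slides


def billed_megabytes(bytes_scanned: int) -> int:
    """Round scanned bytes up to whole MB and apply the 10 MB minimum."""
    rounded_up = (bytes_scanned + MB - 1) // MB  # integer ceiling division
    return max(MIN_BILLED_MB, rounded_up)


def query_cost_usd(bytes_scanned: int) -> float:
    """Estimate the Athena charge for one query, excluding S3 charges."""
    return billed_megabytes(bytes_scanned) * MB / TB * PRICE_PER_TB_USD


for scanned in (1, 10 * MB + 1, 250 * 1024 ** 3, TB):
    print(f"{scanned} bytes -> {billed_megabytes(scanned)} MB billed, "
          f"${query_cost_usd(scanned):.6f}")
```

The minimum matters mostly for many small queries: a query that scans a single byte is estimated at the same price as one that scans a full 10 MB.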