1CONFIDENTIAL
A ready-made, comprehensive infrastructure
solution for Hadoop Big Data on
AWS with Cloudera CDH 5.x
May, 2017
2CONFIDENTIAL
CLIENT
BUSINESS PROBLEM
CLIENT turned to EPAM to resolve the
performance and stability issues linked to
growing data volumes.
CLIENT also has business requirements to
implement advanced analytics and advanced
sales techniques.
IMPLEMENTATION
• EPAM is responsible for the system engineering of the
enterprise data lake, which is based on the Hadoop
technology stack.
• EPAM is responsible for implementing ETL from
other internal systems as well as external data
providers.
• EPAM is responsible for the development and
maintenance of one of the key areas: Customer Value
Management flows.
• EPAM developed flexible store-level dashboards in
Tableau that provide real-time insight into the sales
process.
3CONFIDENTIAL
Before EPAM
4CONFIDENTIAL
Proposed state
5CONFIDENTIAL
EMR vs CLOUDERA CDH 5.x
6CONFIDENTIAL
CLOUDERA CDH 5.x
7CONFIDENTIAL
HUE with CDH-5
[desktop]
#database_logging=True
django_debug_mode=True
collect_usage=False
use_new_editor=True
use_new_side_panels=True
app_blacklist=spark,zookeeper,search,indexer,sqoop,pig,jobsub,rdbms
[[auth]]
backend=desktop.auth.backend.PamBackend,desktop.auth.backend.AllowFirstUserDjangoBackend
pam_service=sudo sshd login
#idle_session_timeout=120
[[session]]
expire_at_browser_close=True
[hbase]
#hbase_conf_dir=/etc/hbase/conf
hbase_conf_dir={{HBASE_CONF_DIR}}
hbase_cluster=(Peach|ip-172-31-46-118.eu-west-1.compute.internal:9090)
[impala]
[[ssl]]
enabled=true
validate=false
[beeswax]
hive_server_host=ip-172-31-46-119.eu-west-1.compute.internal
[[ssl]]
enabled=false
validate=false
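The configuration above uses nested sections, where the number of brackets gives the nesting depth (for example `[[ssl]]` belongs to the preceding `[impala]` or `[beeswax]` section). Python's standard `configparser` does not understand this nesting, so the following is a minimal sketch of a reader for it, together with a small review step. The function names `parse_nested_ini` and `review`, the `SAMPLE` text, and the two checks (debug mode left on, SSL enabled without certificate validation) are illustrative assumptions, not part of Hue or of the original deck.

```python
import re

# Trimmed copy of the settings shown on the slide (illustrative sample).
SAMPLE = """
[desktop]
django_debug_mode=True
[[auth]]
pam_service=sudo sshd login
[[session]]
expire_at_browser_close=True
[impala]
[[ssl]]
enabled=true
validate=false
[beeswax]
[[ssl]]
enabled=false
validate=false
"""


def parse_nested_ini(text):
    """Parse INI text where [a] is depth 1, [[b]] is depth 2, and so on."""
    root = {}
    stack = [root]  # stack[d] is the dict of the section currently open at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        header = re.fullmatch(r"(\[+)([^\[\]]+)(\]+)", line)
        if header:
            depth = len(header.group(1))
            if depth != len(header.group(3)) or depth > len(stack):
                raise ValueError(f"malformed section header: {line}")
            del stack[depth:]  # close any deeper or sibling sections
            section = stack[-1].setdefault(header.group(2).strip(), {})
            stack.append(section)
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {line}")
        stack[-1][key.strip()] = value.strip()
    return root


def review(config):
    """Return human-readable notes about settings worth a second look."""
    findings = []
    desktop = config.get("desktop", {})
    if desktop.get("django_debug_mode", "").lower() == "true":
        findings.append("desktop.django_debug_mode is enabled")
    for app in ("impala", "beeswax"):
        ssl = config.get(app, {}).get("ssl", {})
        enabled = ssl.get("enabled", "").lower() == "true"
        unvalidated = ssl.get("validate", "").lower() == "false"
        if enabled and unvalidated:
            findings.append(f"{app}.ssl is enabled without certificate validation")
    return findings


if __name__ == "__main__":
    for note in review(parse_nested_ini(SAMPLE)):
        print(note)
```

A check like this could run in a CI job before the configuration is pushed to the cluster; whether these two settings are acceptable depends on the environment (they are common in test setups).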
8CONFIDENTIAL
AWS CLOUD FORMATION
9CONFIDENTIAL
Let’s Encrypt implementation
# renew_before_expiry = 30 days
version = 0.13.0
archive_dir = /etc/letsencrypt/archive/cdm.aptest.CLIENT.com
cert = /etc/letsencrypt/live/cdm.aptest.CLIENT.com/cert.pem
privkey = /etc/letsencrypt/live/cdm.aptest.CLIENT.com/privkey.pem
chain = /etc/letsencrypt/live/cdm.aptest.CLIENT.com/chain.pem
fullchain = /etc/letsencrypt/live/cdm.aptest.CLIENT.com/fullchain.pem
# Options used in the renewal process
[renewalparams]
authenticator = standalone
installer = None
account = 69eadfb4d56ff298317fea965987659a
standalone_supported_challenges = http-01
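The commented-out `renew_before_expiry = 30 days` line in the renewal file above is the default renewal window: a certificate is due for renewal once it has fewer than 30 days left. The date arithmetic behind that rule can be sketched as follows. This is an illustration only, not certbot's own code; the function name `needs_renewal` and the example dates are assumptions.

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(not_after, now=None, renew_before=timedelta(days=30)):
    """Return True once `now` falls inside the renewal window before expiry.

    not_after    -- certificate expiry time (timezone-aware datetime)
    now          -- current time; defaults to the current UTC time
    renew_before -- length of the renewal window (30 days mirrors the
                    commented default in the renewal file)
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= not_after - renew_before


if __name__ == "__main__":
    # Hypothetical expiry date, chosen only for illustration.
    expiry = datetime(2017, 8, 1, tzinfo=timezone.utc)
    print(needs_renewal(expiry, now=datetime(2017, 6, 15, tzinfo=timezone.utc)))
    print(needs_renewal(expiry, now=datetime(2017, 7, 10, tzinfo=timezone.utc)))
```

Because the file uses the `standalone` authenticator with the `http-01` challenge, the renewal run needs to bind port 80 on the host itself, so any service already listening there has to be stopped for the duration of the renewal.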
10CONFIDENTIAL
Zabbix implementation
11CONFIDENTIAL
Chef server implementation
name 'users'
maintainer 'Chef Software, Inc.'
maintainer_email 'cookbooks@chef.io'
license 'Apache 2.0'
description 'Creates users from a databag search'
long_description IO.read(File.join(File.dirname(__FILE__), 'README.md'))
version '1.8.3'
recipe 'users::default', 'Empty recipe for including LWRPs'
recipe 'users::sysadmins', 'Create and manage sysadmin group'
%w( ubuntu debian redhat centos fedora freebsd mac_os_x scientific oracle amazon ).each do |os|
  supports os
end
source_url 'https://github.com/chef-cookbooks/users' if respond_to?(:source_url)
issues_url 'https://github.com/chef-cookbooks/users/issues' if respond_to?(:issues_url)
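The metadata above describes the cookbook as one that "creates users from a databag search". A rough sketch of what such data bag items might look like, and of how a group-based selection works, is given below in Python. The field names (`id`, `uid`, `groups`, `shell`, `comment`, `ssh_keys`, `action`) follow the cookbook's README as far as I recall it and should be treated as assumptions; the helper names `make_user_item` and `selected_by_group`, the user names, and the key strings are hypothetical. The filter only approximates the search the cookbook performs on the Chef server.

```python
import json


def make_user_item(user_id, uid, ssh_keys, groups=("sysadmin",),
                   shell="/bin/bash", comment=""):
    """Build one data bag item (a plain dict) describing a user."""
    return {
        "id": user_id,
        "uid": uid,
        "groups": list(groups),
        "shell": shell,
        "comment": comment,
        "ssh_keys": list(ssh_keys),
    }


def selected_by_group(items, group):
    """Return ids of users in `group` that are not marked for removal."""
    return [
        item["id"]
        for item in items
        if group in item.get("groups", []) and item.get("action") != "remove"
    ]


if __name__ == "__main__":
    items = [
        make_user_item("alice", 2001, ["ssh-rsa AAAA... alice@example"]),
        {"id": "bob", "groups": ["sysadmin"], "action": "remove"},
        make_user_item("carol", 2003, [], groups=("analysts",)),
    ]
    # Each item would be stored as its own JSON file in the data bag.
    print(json.dumps(items[0], indent=2))
    print(selected_by_group(items, "sysadmin"))
```

Keeping users in data bags means that adding or revoking access for an administrator becomes a change to one JSON document, which the Chef server then applies on every node at the next client run.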
12CONFIDENTIAL
Jenkins implementation
13CONFIDENTIAL
Tableau Implementation
14CONFIDENTIAL
Tableau Implementation
15CONFIDENTIAL
Kerberos Implementation
16CONFIDENTIAL
CLIENT
Server deployment diagram
[Diagram; components shown: Internet, Client PCs, Bastion, NAT, Hue LB, Tableau LB, Hive LB, Hue, Hive2, Oozie, Cloudera, Name Node, Resource Manager, Data Node, Node Manager, Zabbix, Jenkins, Chef-Server, Tableau DC1, Tableau DC2, Tableau Prod, Tableau Worker1, Tableau Worker2, Tableau Backup, Jupyter, R-Studio, Test nodes, Peach-Cluster-n1 … Peach-Cluster-n4, Peach Hue LB, Peach Hive2 LB, Peach CDH5 LB]
Key features:
- Production and staging environments
- HA mode (Hadoop and Tableau)
- DMZ configuration
- Centralized configuration and monitoring
- Dedicated analytics server DMZ configuration (Jupyter and R-Studio)
- CM env
17CONFIDENTIAL
CLIENT: BIG DATA EXPERIENCE
KEY TECHNICAL HIGHLIGHTS
• Data integration projects that bring the new
platform into the enterprise fabric, based
on an Enterprise Data Hub
• Developed a security model for the Big Data
solution (Kerberos)
• Implemented production and staging
environments
• Rapid ETL development using Python,
Pig, Hive, and MapReduce over Hadoop
• >30 ETL jobs
• Ingestion of unstructured and semi-structured
data
• Integration with enterprise infrastructure
• Tableau for rich data visualization
18CONFIDENTIAL
EPAM BIG DATA COMPETENCY CENTER
Big Data Architecture Design, Implementation, and Support
Big Data CC value for clients
Deep expertise with cutting-edge technologies
Top Facts
• Data Strategy, Data Governance Consulting
• Data Hub/Lake architecture
• Advanced Solutions Development, Predictive & Prescriptive
Analytics
• Infrastructure Implementation and Integration with Enterprise
Security
• Big Data Solutions & Platform Support
• 300+ Engineers, Architects and Consultants
• 50+ Successfully delivered Big Data and HPC solutions
• 10+ years of BI product development history (for SAP, Oracle,
Pentaho)
• We understand business and how to make BI & BigData
technology work
• Our design is straightforward while following industry best
practices
• EPAM process is interactive, iterative and highly effective
• Quick development and timely implementation
• Proven delivery approach
19CONFIDENTIAL
EPAM OFFERING
TOP MESSAGES AND KEY FACTS
TECHNOLOGIES
EPAM Data Science includes Data Scientists and Senior Solution Architects
with MS/PhD Degrees in Applied Math, Physics, Computer Science, &
Predictive Analytics. Along with a strong mathematical background, the
group has vertical expertise in multiple industries and extensive practical
software development skills related to Big Data and conventional DW
technologies
• 100+ Data Scientists
• 20+ Data Modelers
• 10+ Data Strategists
• 50+ successfully delivered projects
TOP STORIES
DATA SCIENCE: DISCOVER AND PREDICT
Services
• Large scale information
solutions design
• Predictive model building and
validation
• Customer segmentation
analysis
• Data profiling and preparation
• Dimension reduction
techniques
Mathematical foundation
• Probability, statistics, and
stochastic processes
• Supervised and unsupervised
machine learning techniques
• Numerical methods and
implementations
• Optimization theory
Predictive models: linear and logistic regression, decision trees, clustering, naïve bayes, support vector machines,
neural networks, kernel estimation, panel data analysis, survival/duration analysis, and time series analysis

ITsubbotnik Spring 2017: Dmitriy Yatsyuk "A ready-made, comprehensive infrastructure solution for Hadoop Big Data and AWS with Cloudera CDH 5.x"


Editor's Notes

  • #2: A ready-made, comprehensive infrastructure solution for Hadoop Big Data on Amazon AWS, based on CDH 5.x. The output of this solution will be used by business analysts. Example: a company that retails branded clothing and perfume. Rationale: resource savings, security, ease of use and support.
  • #3: "School of Repair". What the client came to us with: everything was in bad shape and kept crashing. We dug our way out of the mess, and the client is happy.
  • #18: And now, why this worked: because we have the EPAM Competency Center.
  • #19: We understand business and how BI and Big Data technology works. Our design is simple and follows industry best practices. The EPAM process is interactive, iterative and highly effective. Quick development and timely implementation. Proven delivery approach. Data strategy and data governance consulting. Data hub/lake architecture. Advanced solutions development, predictive and prescriptive analytics. Infrastructure implementation and integration with enterprise security. Big Data solutions and platform support.
  • #20: EPAM Data Science includes data scientists and senior solution architects with MS/PhD degrees in applied mathematics, physics, computer science and predictive analytics. Along with a strong mathematical background, the group has experience across multiple industries and extensive practical software development skills related to Big Data and conventional DW technologies. Large-scale information solutions design. Predictive model building and validation. Customer segmentation analysis. Data profiling and preparation. Dimension reduction techniques. Who would like to take part? Colleagues, thank you for your attention; I will be glad if someone finds this useful and applies it in their work and career.