Foundations for Successful Data
Projects
Strata Data Conference, London 2019
Ted Malaska | @ted_malaska
Jonathan Seidman | @jseidman
tiny.cloudera.com/uk2019questions
tiny.cloudera.com/uk2019slides
About the presenters
Ted Malaska
▪ Capital One: Director of Enterprise Architecture
▪ Blizzard Ent: Director of Engineering of Global Insights
▪ Cloudera: Principal Solution Architect
▪ FINRA: Lead Architect
▪ Contributor: Apache Spark, Hadoop, Hive, Sqoop, YARN, Flume, others
Jonathan Seidman
▪ Software Engineer at Cloudera
▪ Previously Technical Lead on the big data team at Orbitz
▪ Co-founder of the Chicago Hadoop User Group and Chicago Big Data
Ted Malaska & Jonathan Seidman
Foundations for Architecting Data Solutions
Managing Successful Data Projects
Foundations of Successful Data Projects
▪ Understand the problem
▪ Select software
▪ Manage risk
▪ Build effective teams
▪ Build maintainable architectures
Agenda
▪ Understanding the key data project types
▪ Building effective teams
▪ Selecting data management solutions
▪ Managing risk in projects
▪ Ensuring data integrity
▪ Metadata management
Understanding the Key Data Project Types
▪ Major Data Project Types
▪ Primary Considerations & Risk Management
▪ Team Makeup
Major Data Project Types
▪ Data Pipelines and Data Staging
▪ Data Processing and Analysis
▪ Application Development
Data Pipelines and Data Staging
▪ Sourcing Data
▪ Transmitting Data
▪ Staging Data
▪ Accessibility Options
▪ Discovery
Data Processing and Analysis
▪ Curating Data
▪ Cultivating Ideas
▪ Data Product Generation
- Reports, Models, Insight, Charts, ...
Application Development
▪ Traditional or Model Serving
▪ Inner Loop
▪ Outer Loop
Primary Considerations
▪ Data Pipelines and Data Staging
▪ Data Processing and Analysis
▪ Application Development
Primary Considerations
Data Pipelines and Data Staging
Data Pipelines and Data Staging – Considerations
▪ Onboarding paths for Data Suppliers
- Files
- Embedded code
- APIs (REST, WebSocket, gRPC, Syslog, ...)
- Agents
Data Pipelines and Data Staging – Considerations
▪ Transmission
- At Least Once, Duplication, Latency, and Ordering
▪ Tokenization & Auditing & Governance
- GDPR, CA Protection Laws, Misuse, Data Breach
▪ Quality
- Schema Validation, Rules Validation, Cardinality Variance
▪ Access
- Security, Matching the use case to the storage system
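With at-least-once transmission, the same record can arrive more than once, so the receiving side usually has to tolerate duplicates. A minimal sketch of an idempotent consumer follows. The `Event` and `IdempotentConsumer` names, the producer-assigned `event_id`, and the in-memory set of seen IDs are all assumptions made for illustration; a real pipeline would need a durable store with a retention window.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass(frozen=True)
class Event:
    event_id: str  # hypothetical unique ID assigned by the producer
    payload: str


@dataclass
class IdempotentConsumer:
    handler: Callable[[Event], None]
    seen_ids: Set[str] = field(default_factory=set)  # in-memory only (assumption)

    def consume(self, event: Event) -> bool:
        """Handle the event once; return False if it was a duplicate delivery."""
        if event.event_id in self.seen_ids:
            return False
        self.handler(event)
        # Record the ID only after the handler succeeds, so a failed
        # attempt can be redelivered and retried.
        self.seen_ids.add(event.event_id)
        return True


processed: List[str] = []
consumer = IdempotentConsumer(handler=lambda e: processed.append(e.payload))

# The third delivery repeats the first one, as at-least-once delivery allows.
deliveries = [Event("a1", "signup"), Event("a2", "login"), Event("a1", "signup")]
results = [consumer.consume(e) for e in deliveries]
```

This sketch handles duplication only. Ordering and latency, also listed above, need separate treatment.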
Data Pipelines and Data Staging – Considerations
▪ Metadata Management
- New and mutated Datasets
- Security
▪ Access
- Matching the use case to the storage system
- SQL is King
- No one tool
- Trade Offs
- Cost vs Time to Value vs Value of Data
Primary Considerations
Data Processing and Analysis
Data Processing and Analysis – Considerations
▪ Curating Data
- Working with Producers
- Joining
- Time series
- CDC
▪ Understanding Quality of Data
- SLAs
- Correctness of the Data
- Stability of the Data
- Coupling
Data Processing and Analysis – Considerations
▪ Cultivating Ideas
- Defining Real Goals
- Evaluating ROI
▪ Productionization of Pipelines
- Site Reliability Engineering
▪ Culture
- ML vs AI vs Engineer
Data Processing and Analysis – Considerations
▪ Understanding
- Explainable Outcomes
- Defendable Solutions
▪ Promotion Paths
- Deploying Products
- Historical Evaluation
- Up to Date Auditing
Primary Considerations
Application Development
Application Development – Considerations
▪ Availability and Failure
- How will it fail
- How will failure impact customers
- What level of failure should be tested for
- Levels of failure design
▪ State Locality and Consistency
- What are the requirements
- Speed, cost, or truth
- Transactions and Locking
Application Development
▪ Latency and Throughput
- Expectations and Throughput
- Is it really big data?
- Inner and Outer Looping
▪ Granularity of Deployments
- Monolith single deployment
- Monolith microservices
▪ Culture
- Development Towers
- Over the wall
- Development Granularity
Team Makeup
▪ Data Pipelines and Data Staging
▪ Data Processing and Analysis
▪ Application Development
Data Pipelines and Data Staging – Team Makeup
▪ Data Engineers
▪ Site Reliability Engineers (SRE)
▪ Application Engineers
▪ Data Architects
▪ Governance
▪ Solution Engineers/Architects
Data Processing and Analysis – Team Makeup
▪ Visionaries
▪ The Brains
▪ Problem Seekers
▪ Engineers
▪ Duct Tapers
▪ Tech Debt Payers
▪ Site Reliability Engineers (SRE)
Application Development – Team Makeup
▪ Web Developers
▪ Front end Developers
▪ Data Engineers (DBAs)
▪ Performance Focused Engineers
▪ SOA / Queue Engineers
▪ Site Reliability Engineers (SRE)
Building Successful Teams
Lessons Learned Building Big Data Teams
Build well rounded teams
▪ Sysadmins
▪ Developers
▪ Analysts
▪ Data Scientists
Other roles:
▪ Data Protection Officer
▪ Network/Systems Engineers
▪ Product Managers
▪ SRE
How to find people?
Start with people you already have, but make sure you invest in
training…
▪ Linux, network, DBAs -> sysadmins
▪ Developers -> developers
- Easy if you’re at a company like Orbitz, otherwise maybe not so much
▪ Analysts -> analysts
▪ It’s not an easy path though
- Set goals instead of micro-managing development
- Be prepared to iterate, don’t be afraid to fail
Also don’t forget other teams
Communication is key
DBAs Other Project Teams
Also, don’t do this:
Data Scientists | Admins
Differing Skill Sets
▪ Detail-Oriented
▪ Experimental
▪ The Communicator
▪ …
Think beyond just skills
▪ Also look for complementary personalities
▪ And avoid toxic personalities
- But what if they’re really talented?
- See above.
Customer Engagement
▪ Your teams should work closely with your customers, whether they’re external or internal
Evaluating and Selecting Data Solutions
▪ Solution Life Cycles
▪ Tipping Point Considerations
▪ Considerations for Technology Selection
Solution Life Cycles
▪ Private Incubation Stage
▪ Release Stage
▪ “Curing Cancer” Stage
▪ Broken Promises Stage
▪ Hardening Stage
▪ Enterprise Stage
▪ Decline and Slow Death Stage
Private Incubation Stage
▪ Technology Trigger
▪ Vision
Release Stage
▪ Changes
- Inviting People In
- Documentation
- Marketing
▪ Reasons for Releasing
- Money
- Hiring
- Culture
- Future Building
▪ Big Promises
“Curing Cancer” Stage
▪ Big Promise
▪ Maybe outside area of expertise
- Promise to push internally
- Promises to gain influence
- Promises to get attribution
▪ Promises can be good and bad
Broken Promises Stage
▪ Cracks in the Dream
- Scale
- Usability
- Use Case
- Security
- Practicality
- Skill Requirements
- Auditability
- Maintainability
- Integration
- Quality
- Lies
Hardening
▪ Balance Features
▪ Technical Debt
▪ Partnering
▪ Corp Partnerships
▪ Leadership Stories
▪ Easy Success Paths
Enterprise Stage
▪ Stable
▪ Predictable
▪ Easy to hire for
▪ Supportable / Maintainable
▪ Pragmatists outnumber innovators
▪ No longer cool, but still very lucrative
Slow Decline Stage
▪ Not Worth Retiring
▪ Not Worth Investing In
▪ Good Enough
Tipping Point Considerations
- Mavericks
- Connectors
- Salesman
- Stickiness
- Context
Mavericks
▪ Passion Driven
▪ Helpful
▪ Bottom Up Power
▪ They see the future or may see shadows
Connectors
▪ High triangles
▪ Trusted weak ties
▪ Gateways for pain, needs, and opportunities
▪ Considering the towered companies
Salesman
▪ Make the Deal Happen
▪ Right or wrong doesn’t matter as much as action
▪ Momentum starters
Stickiness
▪ Think about gravity
- Data
- Code
- User’s Favor
- Results
Context
▪ Where is the company
▪ Looking for Opportunities
- Holding down the fort
- Lower cost
- Play around
▪ The Swinging Pendulum Effect
- Where is the ball now and where is it heading
Considerations for Technology Selection
▪ Demand
▪ Fit
▪ Visibility
▪ Risks
Evaluating the Demand
▪ Business Needs
▪ Internal Demand
▪ Desire to live on the edge
Evaluating the Fit
▪ Primary Capability
▪ Skill Sets
▪ Level of Commitments
▪ Level of Alignment
Evaluating the Visibility
▪ Benchmarks
- Hidden biases, Motivated Biases, Unfair Comparisons
▪ Fundamentals
- There is no magic
▪ Leaders' Success
▪ Market Trends
Reviewing Fundamentals
▪ Relative Location of Data to Readers
▪ Compression formats and rates
▪ Data Structures
▪ Partitioning, Replication, and Failure
▪ API and Interfaces
▪ Resource Allocations and Tuning
Reviewing Market Trends
▪ Google Trends
▪ GitHub activity
▪ Jira counts and charts
▪ Conferences and meetups
▪ Also:
- Community Interest
- Email Lists and Forums
- Contributors
- Follow the Money $$$
Evaluating the Risks
▪ Risk Tolerance
▪ Stress Tolerance
▪ Leader vs follower
Future Proofing
▪ Assume Change
▪ Interface Design
▪ Producer & Consumer Experience
Assume Change
▪ Remember the Logical and the Physical
▪ Think Logical and Implementation
Interface Design
▪ Standards
▪ SQL
▪ DataFrames / DataSets
▪ REST, gRPC
▪ Avro, Parquet, Protobuf, Thrift, JSON, CSV
Managing Project Risk
Managing Risk
Shark photo: http://www.travelbag.co.uk/
1 in 11.5 million 1 in 4292
Managing Risk – Risk Categories
▪ Technology Risk
▪ Team Risk
▪ Requirements Risk
Managing Risk – Categorizing Risk
Risk Weighting
▪ Technology Risk
- How much experience do we have with this technology?
- Do we have production experience with the technology?
- We know SQL, but what about Cassandra CQL?
- …
Risk Weighting
▪ Team Risk
- Experience level of team members
- Team skill sets
- Size of team
- …
▪ Don’t forget about other teams
- System dependencies
- …
Risk Weighting
▪ Requirements Risk
- Vaguely defined requirements
- Novel requirements (e.g. stringent latency requirements)
Managing Risk – Categorizing Risk
▪ Cassandra
- Limited technical experience (team risk)
- Need to validate data model (reqs risk)
- Stringent uptime requirements (tech risk)
Mitigating Risk
▪ Requirements Risk
- Ensure good functional requirements
- Break requirements up – don’t boil the ocean
- Share requirements and get buy-in from all stakeholders
- Get agreement on scope
Mitigating Risk
▪ Technology Risk
- Tackle important/complex components first
- Use external resources to help fill knowledge gaps
- Consider replacing riskier technologies with more familiar ones
Mitigating Risk
▪ Technology Risk
- Use proofs of concept
- Then throw them away
Mitigating Risk
▪ Technology Risk
- Use abstractions to minimize dependencies
- Ensure repeatable build, deployment, monitoring processes
- Make it easy to deploy dev environments – leverage containers, etc.
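One way to read "use abstractions to minimize dependencies" is to have application code depend on a small interface rather than on a specific storage client. The sketch below is illustrative only: `RecordStore`, `InMemoryStore`, `save_profile`, and `load_profile` are hypothetical names, and the in-memory backend stands in for whatever system a project actually selects.

```python
from typing import Dict, Optional, Protocol


class RecordStore(Protocol):
    """Hypothetical storage interface the application code depends on."""

    def put(self, key: str, value: bytes) -> None: ...

    def get(self, key: str) -> Optional[bytes]: ...


class InMemoryStore:
    """Stand-in backend for tests and local development."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


def save_profile(store: RecordStore, user_id: str, profile: bytes) -> None:
    # Application logic only sees the interface, never a concrete client.
    store.put(f"profile:{user_id}", profile)


def load_profile(store: RecordStore, user_id: str) -> Optional[bytes]:
    return store.get(f"profile:{user_id}")


store = InMemoryStore()
save_profile(store, "42", b'{"name": "Ada"}')
```

Swapping in a riskier or newer technology then means writing one more class with the same two methods, which also keeps a throwaway proof of concept cheap.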
Mitigating Risk
▪ Technology Risk
- Start building early
Mitigating Risk
▪ Team Risk
- Build well rounded teams
- Ensure communication with other teams
- But work to reduce coupling
Communicating Risk
▪ Make sure stakeholders are aware of risks
- But remember there can be risks to overstating risk
▪ Collaborate and get buy-in
▪ Share risk
▪ Risk can be a negotiation tool
Other Types of Risk
▪ Bias in your models
▪ Security
▪ Compliance
Ensuring Data Integrity
▪ Pre-defined vs Derived via Discovery
▪ Path of Fidelity
▪ Validation of Quality
Pre-Defined vs Derived via Discovery
▪ Producer - Productivity vs Audit
▪ Consumer - Consistency
Producer - Productivity vs Audit
▪ Red Tape Predefined
▪ Flexible/Limited Predefined
▪ After the Fact Discovery
Axes: Easy to Audit vs Short-Term Productivity
Predefined Traps
▪ Centralized Reviewing Org
▪ High bar to onboard
▪ Unclear schema evolution paths
Discovery Traps
▪ Uncommon output
▪ Data quality standards
▪ Uncommon SLAs
▪ The balloon problem
Consumer's Point of View
▪ Consistency is Key
▪ Access to Powerful Tools
▪ Multiple Landing Areas are Key
- Long Term
- Indexed
- Lucene Indexed
- Streams
▪ Future Proofing
Path of Fidelity
▪ What is Fidelity
▪ What can we mutate
What is Full Fidelity
▪ The cells and their values are preserved
▪ Field names and definitions are preserved
▪ No matter where or how you access the data
▪ No Filtering
▪ No Irreversible Mutations
What can we mutate
▪ Tokenization
▪ Underlying file structures
▪ Storage system
▪ Access Path
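Tokenization appears on this list because it can be done reversibly, which keeps it compatible with the "No Irreversible Mutations" rule of full fidelity. A minimal sketch, assuming a simple lookup-based vault: `TokenVault` and the `tok_` prefix are hypothetical, and a production vault would need durable, access-controlled storage.

```python
import secrets
from typing import Dict


class TokenVault:
    """Hypothetical in-memory vault mapping sensitive values to opaque tokens."""

    def __init__(self) -> None:
        self._token_by_value: Dict[str, str] = {}
        self._value_by_token: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to the
        # same token, which keeps joins on the tokenized column working.
        token = self._token_by_value.get(value)
        if token is None:
            token = "tok_" + secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._value_by_token[token]


vault = TokenVault()
record = {"email": "ada@example.com", "country": "UK"}

# The staged copy carries the token; the original stays recoverable.
staged = {**record, "email": vault.tokenize(record["email"])}
restored = {**staged, "email": vault.detokenize(staged["email"])}
```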
Validate Quality
▪ Validation of Fidelity
▪ Validation of Quality
Validation of Fidelity
▪ Row Counts
▪ Check Sums
▪ Reversible byte by byte check
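The first two checks can be combined into one fingerprint compared between source and destination. The sketch below is an assumption-laden illustration: `fingerprint` is a hypothetical helper, rows are tuples of strings, and cells are assumed not to contain the `\x1f` separator. It does not cover the reversible byte-by-byte check.

```python
import hashlib
from typing import Iterable, List, Tuple

Row = Tuple[str, ...]


def fingerprint(rows: Iterable[Row]) -> Tuple[int, str]:
    """Return (row count, order-independent SHA-256 digest) for a dataset."""
    row_hashes: List[str] = []
    for row in rows:
        # Join cells with the ASCII unit separator so ("ab", "c") and
        # ("a", "bc") produce different encodings.
        encoded = "\x1f".join(row).encode("utf-8")
        row_hashes.append(hashlib.sha256(encoded).hexdigest())
    # Sorting the per-row hashes makes the digest independent of row order
    # while still counting duplicate rows.
    combined = hashlib.sha256("".join(sorted(row_hashes)).encode("ascii"))
    return len(row_hashes), combined.hexdigest()


source = [("1", "alice", "UK"), ("2", "bob", "US")]
staged_ok = [("2", "bob", "US"), ("1", "alice", "UK")]   # same rows, reordered
staged_bad = [("1", "alice", "UK"), ("2", "bob", "CA")]  # one cell changed
```

Comparing `fingerprint(source)` with `fingerprint(staged_ok)` passes, while the changed cell in `staged_bad` yields a different digest even though the row count still matches.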
Validation of Quality
▪ Column level rules
▪ Null counts
▪ Field cardinality
▪ Record counts
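These four checks can be computed per column in a single pass. A minimal sketch follows, with `profile_column` as a hypothetical helper and records represented as plain dictionaries. What counts as an acceptable null count or cardinality is left to the team, since the slides do not specify thresholds.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Optional[Any]]


def profile_column(
    records: List[Record], column: str, rule: Callable[[Any], bool]
) -> Dict[str, int]:
    """Compute record count, null count, cardinality, and rule violations."""
    values = [r.get(column) for r in records]
    non_null = [v for v in values if v is not None]
    return {
        "record_count": len(values),
        "null_count": len(values) - len(non_null),
        "cardinality": len(set(non_null)),  # number of distinct non-null values
        "rule_violations": sum(1 for v in non_null if not rule(v)),
    }


records: List[Record] = [
    {"country": "UK", "age": 34},
    {"country": "US", "age": -1},   # violates the age rule below
    {"country": None, "age": 51},   # null country
    {"country": "UK", "age": 27},
]

age_stats = profile_column(records, "age", rule=lambda v: 0 <= v <= 130)
country_stats = profile_column(records, "country", rule=lambda v: len(v) == 2)
```

Tracking these numbers per load makes a sudden shift in null count or cardinality visible before consumers notice it.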
Metadata Management
▪ What do we mean?
- Understanding what data you have
- Knowing what the data is
- Knowing where the data is
▪ This is complex
- Large number of data sources, storage systems, processing…
- Ease of data access and creation of new data sets
- Start planning at the beginning of your project!
Why Do We Care?
▪ Visibility – know what data you collect and how to access it
- Faster time to market
- Avoid duplication of work
- Derive more value from data
- Identify gaps
▪ Relationships between datasets
▪ Regulations: GDPR, etc.
Types of Metadata
▪ Data at rest
▪ Data in motion
▪ Source data
▪ Data processing
▪ Reports, dashboards, etc.
Data At Rest
▪ Files, database tables, Lucene indexes, etc.
Data At Rest – Database Table Example
Data At Rest – Other Metadata Types
▪ Audit Logs
▪ Comments
▪ Tags
▪ Lineage
Data In Motion
▪ This is data that’s moving through the system
- Batch or streaming ingestion
- Data processing
- Derived data
Data in Motion – What to Capture
▪ Paths
▪ Sources
▪ Transformations
▪ Destinations
▪ Reports/Dashboards
Data in Motion – Paths
▪ How does the data move through the system?
- Source systems
- Data collection systems
- Routing
- Transformations
- Etc.
Source Data
▪ External systems
▪ Internal systems
Data In Motion – Transformations
▪ Data format changes, for example JSON to protocol buffers
▪ Data fidelity – is the data filtered or changed?
▪ Metadata about processing – job names, technologies, inputs, outputs, etc.
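The three bullets above map naturally onto one record per processing job. The sketch below shows one possible shape. The `TransformationRecord` class, its field names, and the job and dataset names are all hypothetical; the slides do not prescribe a schema or a tool.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class TransformationRecord:
    """Hypothetical metadata captured for one transformation job."""

    job_name: str
    technology: str
    inputs: List[str]
    outputs: List[str]
    input_format: str
    output_format: str
    full_fidelity: bool  # False if the job filters or irreversibly changes data


record = TransformationRecord(
    job_name="clickstream_to_protobuf",   # illustrative job name
    technology="Spark",
    inputs=["raw.clickstream_json"],      # illustrative dataset names
    outputs=["staged.clickstream_pb"],
    input_format="JSON",
    output_format="Protocol Buffers",
    full_fidelity=True,
)

# Serialize so the record can be written to whichever catalog is in use.
serialized = json.dumps(asdict(record), sort_keys=True)
```

Emitting a record like this from the job itself is one way to follow the declarative approach, where metadata is created as data is added to the system.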
Data Processing – Machine Learning
▪ More complex algorithms can require special considerations
- Purpose of a model
- Technologies, algorithms, etc.
- Features
- Datasets – training, test, etc.
- Goals of the model
- Who owns the model?
Reports and Dashboards
▪ Data sources
▪ Any data transformations
▪ Information on the report’s creator
▪ Log of modifications
▪ Purpose of report
▪ Tags
Approaches to Metadata Collection
▪ Declarative
- Require and enable metadata to be created as data is added to the system
▪ Discovery
- After the fact cataloging of data
How?
▪ Create your own solution
▪ Use tools provided by your vendor
▪ Use third party tools
▪ Some or all of the above
Remember…
The perfect is the enemy of the good
Having something is better than nothing
or…
Thank you!
Ted Malaska | @ted_malaska
Jonathan Seidman | @jseidman
