Welcome to
Big Data Scotland 2017
#datascot
Mark Stephen
BBC Scotland
@bbcscotland
#scotdata
Ray Bugg
DIGIT
@digitfyi
#scotdata
www.digit.fyi
50,000 Monthly Page Views
30,000 Unique Visitors
Monthly News, Views,
Opinion, Insight
Our Next Event
DT2018
3rd Annual Digital
Transformation Conference
www.digifutures.co.uk
Kate Goldman
KBG Solutions
@digitfyi
#scotdata
New Foundations for a
Data-Driven Organisation
2018: A Confluence of Factors
The Time is Now
• The shift to data-driven processes is underway.
• Burgeoning data availability, open source, more affordable technology,
and consumer demand for betterment on every front suggest it might be prime
time for analytics teams.
• New and advanced analytics technologies are on the rise: predictive and prescriptive
analytics, decision management software, smart machines, event stream
processing applications and operational intelligence platforms.
• As a result, the demand for analytics has moved beyond specialist IT
communities and is now led largely by business units as well.
Where’s the Beef?
• Building Blocks
• Fundamentally, build out the underlying data architecture as well as data
collection or generation capabilities.
• Switch from legacy data systems to a more nimble and flexible architecture to store
and harness big data.
• Digitize operations more fully in order to capture more data from customer
interactions, supply chains, equipment, and internal processes.
• But… Data is not information, information is not knowledge,
knowledge is not understanding, understanding is not wisdom.
• Second (but no less important), find in your data the answers to the
innumerable business opportunities your organisation faces.
So Here is What you Need to Do. Now.
5. Silo Busting to Let the Data Flow
• Silos, created by structural issues, political factors, growth challenges,
or vendor lock-in, can seriously hinder creating insight and
advantage from data.
• The pressure is on, but incremental, practical, methodical approaches are best for
those without the luxury of building from scratch.
• Find a good target.
• Use engagement practices to identify high-value opportunities.
• Analyze business needs and choose a problem where data could provide a tangible benefit and
value, perhaps in enhancing sales or preemptive incident response.
• Draw in the data from around the organization and invest in these use cases first. This is not
a proof of concept (you should do those earlier as a way of identifying opportunities) but
a banner project that can drive subsequent investments.
• Tie the integration to its application, so you get value early.
• Every progressive step moves towards integration. (But beware the planning fallacy!)
Dan Fiehn
Markerstudy Group
@danfiehn
#scotdata
DATA INNOVATION
The Gap Smart Automation Smart Learning Smart Devices
THE GAP
Device Obsession
Exceptional Experience
Connect Anywhere
Systems Unable to Cope
Capability to Change
SMART AUTOMATION
Predicting Issues
Intelligent Services
Autonomous Teams
SMART
LEARNING
Award Winning
AUTOMATION
+34% Machine Beats Man
1. Model Production
2. Data Enrichment
3. Scenario Simulations
4. Model Deployment
5. AI Aggregation
6. Actionable Insights
Intelligence Engine
Increase Reduction
Customer Retention
Sweet Spot
Insurance in a TRiCE
SMART
DEVICES
For the first time in history,
greater insight into our
driver behaviour.
ARE YOU DRIVING
DATA
INNOVATION?
uk.linkedin.com/in/danfiehn
@danfiehn
No algorithms were hurt in the m
Malachy Devlin
Clyde Space
@ClydeSpace
#scotdata
Advancing your Data Analytics Capabilities
Malachy Devlin
Big Data - Edinburgh | 7th December 2017
Company Journey – Initial Data
[Chart: Company Revenue, May-16 to Mar-23]
[Chart: Company Growth, May-16 to Mar-23]
Data Query Language - Relational Data
EQL
Relational Data
SQL
1. Migrate to Databases
• Data
2. Migrate to MRP/ERP
• Data
• Business Logic
Manufacture
Customers
Finance
Quality
HR
Supply Chain
Management
Manufacturing/Enterprise Resource Planning
Business Project
•IT is an enabler
Data Matured?
•Ready for Extract/Transform/Load (ETL)
Right Software for Right Requirements
• Different companies, different formula
Business Process Upgrade
•Don’t implement bad processes
Training
•Tools and Process
Team - authority & understanding
• Budget, Resource, Technology
• Processes, Data, Requirements
Going Live is not the end game!
MRP/ERP - Real Time Business
• End to End records
• Critical for some markets
• Single point of truth
• Portfolio not Project view
• Accelerate intelligence
• 2 weeks to 1 minute in SC
• 15 minutes to 15 seconds production status
• Materials/Labour variability
• More managed data recorded, reduced effort
• Order to customer receipt <18 hours
• Partner Integration
• Post Merger Integrations
Agility Accuracy
Traceability Informed
The traditional space industry
Copyright © 2017 Clyde Space Ltd. All rights reserved.
The future of nanosatellite technology
Quality innovation
SmallSat market
Clyde Space Commercial in Confidence
Where do satellites fit in?
Satellites provide the main network connectivity for a localised area,
which in turn relies on a localised network (PAN or LAN) to distribute this
connectivity between devices within the area.
IoT from Space
Earth observation applications
SeaHawk
▪ Pair of 3U CubeSats flying multispectral imagers
▪ x30 smaller than existing SeaWIFS asset
▪ Putting Moore’s Law into space
▪ Revisit rate for IOD of 7 days
▪ Downlinked via X-band to NASA NEN
Earth observation applications
PICASSO
▪ ESA and BISA mission
▪ Hyperspectral imager supplied by VTT
▪ Ozone monitoring mission
▪ Single 3U IOD
IPP Mission: FireSat
Quantum applications
Thank You
Launch and deployment
Clyde Space nanosatellite solutions
Launch and deployment
Ground control
Clyde Space ground station solution:
▪ Based at our headquarters in Glasgow, Scotland
▪ Further sites established in ME and US
▪ SDR based architecture
▪ Fully automated
▪ Working to develop partnerships with recognised and emerging
ground station solutions globally
SmallSat excellence
Clyde Space Cubesats
Meeting and generating demand
enquiries@clyde.space
Questions & Discussion
#scotdata
Refreshments & Networking
Please check rear of
badges for breakouts
#scotdata
How to make an impact
with your data:
Going from Boring to Beautiful
Louis Archer
Tableau
@louisarcher
TECHNOLOGY IS NOT THE
PROBLEM
BUSINESS CULTURE IS
THE PROBLEM
Collaboration Iteration
Training
“Data visualisation
is a language. It’s
a means to convey
an opinion, an
argument.”
Kim Rees – Founding Partner,
Periscopic
Iraq: Deaths on the decline
Iraq: Deaths on the decline
Which do you prefer?
Design by Tableau; inspired by the CTT Wireless Dashboard by Mike Cisneros
Functional Beautiful
Pleasurable experiences:
The three levels of processing
Visceral
Behavioural
Reflective
Visceral
Don Norman’s Pleasurable experiences:
The three levels of processing
Visceral
Behavioural
Reflective
Behavioural
Behavioural:
Chart Choice
Behavioural:
Tableau Research:
Eye-tracking
http://tabsoft.co/designmonth #VisualDesignTricks @acotgreave
Don Norman’s Pleasurable experiences:
The three levels of processing
Visceral
Behavioural
Reflective
How to make an impact with your data?
The three levels of processing:
Visceral
Behavioural
Reflective
Collaboration Iteration
Training
However….
DASHBOARDS ARE THE PROBLEM
“Impact” is NOT just about
beautiful/functional
dashboards
“The interesting thing was, we thought we were doing well, and then
we discovered there was this big negative cost. It was like, ‘Oh my
God.’ Suddenly you go and say, ‘Okay, I've discovered a new aspect
of engine cost that we hadn't realized.’
Suddenly you're going, ‘Bang, bang, bang, two minutes in Tableau’
and you can see the average per month, the average per day, and it's
like, ‘Oh, wow, we can do this slightly differently.’
Within two days, I'd literally re-worked the whole instruction, sent it
out to people, and off we went. As a result, it’s been a very
significant difference in terms of U.S. dollars.”
Jonathan Capper
Production Planning Manager
Bang, bang,
bang, two
minutes…
Jonathan Capper
Production Planning Manager
Why?
Why?
Why?
Data visualisation
Known unknowns
Predefined answers only
Visual analytics
Unknown unknowns
Instant answers to new questions
Bang, bang, bang, two minutes…
We help people see and understand data
Welcome Back to
Big Data Scotland 2017
#datascot
Matthieu Poyade
The Glasgow School of Art
@GSofASimVis
#datascot
Data Visualisation
Dr. Matthieu Poyade
Data Visualisation
Graphical communication process which
empowers one to gain understanding and
insight into data.
“Visualization offers a method
for seeing the unseen” - McCormick (1987)
Data Visualisation
“People are generally better persuaded by the reasons
which they have themselves discovered than by those
which have come into the mind of others.”
Pascal (1623 – 1662)
Pragmatic Approach to Visualisation
• Data Communication
– Effectively and efficiently make the user understand
• Visual Efficiency
– Perceptually efficient, make maximum use of the visual
channels
• Data is given
– Visualization is not about generating data, although
sometimes data requires pre-processing
Looking at History… London’s Cholera
outbreak - Dr John Snow (1854)
Looking at history…
The London Underground through time
Looking at History…
The Glasgow Subway through time
We are drowning in a sea of data…
The Flood of Data
A picture is worth a thousand words!!
How can Data Visualisation help to
envision decision making impact
Challenger Disaster
January 28, 1986
• Engineers knew there was a problem:
failure of the O-ring joints at low launch
temperatures
• Technical data recommending not to launch
were presented to management in 13
charts (notes, tables & diagrams)
• The data were presented poorly and
did not make visible the correlation
between cooler temperatures and an
increased chance of damage
Avoidable Tragedy?
?
“Bad slides or bad engineering don’t kill
people: bad decisions do” (Tufte)
How can VR/AR enhance Data
Visualisation for Businesses?
AUGMENTED REALITY
AR and VR in Data
Visualisation
Steven Faull
Aggreko
@stevenfaull
#datascot
DECEMBER 17
Steven Faull
Head of Software & Analytics
@stevenfaull
From Reactive to Proactive to Predictive:
Using Data to Drive Customer Benefit
Reactive
The way things used to be…
• Service scheduling
• Customer calls for support
• Technicians have no visibility of faults
• Spare parts?
• Low customer service level
Proactive
The way it is today…
• Telemetry-enabled fleet
• The ROC
• Visibility of every alert and alarm globally
• Highly proactive
• Enhanced reliability & customer service
Innovation
Examples…
• Enterprise Social Networking
• Geo-spatial Sales application
• Cloud technology
• Micro-services architecture on Service Fabric
• Big Data Analytics
Applications
Predictive
The future…using data to enable:
• Near-time data analytics
• Predictive alerting to the ROC
• ‘Just-in-Time’ servicing
• Zero breakdowns
• ‘Best-in-Class’ customer service
Fire Prevention
High Temperature
Approach
1. Build a great team
2. Empower for innovation and exploration
3. Find a great sponsor
4. Sell the vision through stories
5. Collaborate, collaborate, collaborate
6. Agile delivery
What’s Next?
• Reliability
• Market Intelligence
• Service Intervals
• Condition Based Service
• Product Development
• Procurement
• Audit
• Sales Effectiveness
Thank You
Steven Faull
Head of Software & Analytics
@stevenfaull
Sarah Forbes
Peterson
@srf1980
#datascot
GUIDING PRINCIPLE 1
Full supply chain visibility for everyone
GUIDING PRINCIPLE 2
Predict the future state based on data
GUIDING PRINCIPLE 3
Enable ‘next best step’ decision-making or
management by exception to flourish
GUIDING PRINCIPLE 4
Enforcing a single source of truth encourages
collaboration across the entire supply chain
GUIDING PRINCIPLE 5
Provide a real-time trigger for smarter
communication
Scott Krueger
Skyscanner
@skrueg
#datascot
From Producers To Consumers
Data Engineering @ Skyscanner
Scott Krueger
Principal Data Engineer
EVOLUTION
Brief history of travel
Skyscanner growth: interesting challenges
But wait - we have been designing during growth
Designing for growth
Producers and Consumers
Unified Flow - Data Platform
CHALLENGES
Perfection is in the eye of the data holder
Data systems old and
new
have constraints.
It is a fact of life.
Accept it, find them, and
change things for what
you need.
Scale – often the first business driver?
Data Quality
Organise your data organisation
Conway's law (1967): org structure
determines the technical system
Process
Ingest
Unified
Log
Transport
Archive
Science /
Report /
Analytics
(Re)Organise for growth
Process
Ingest
Unified
Log
Transport
Archive
Science /
Report /
Analytics
The Data Organisation
As a Business Strategy
Educate To Empower
The problem with Data Engineers and Data
Scientists
is that they’re Data Engineers and Data
Scientists....
(and no one else is)
"I'm not going to find that
out for you - you are"
Operational Strategy
Total Ownership
"You build it, you run it." -
Werner Vogels
"This brings developers into contact with the day-to-day operation
of their software. It also brings them into day-to-day contact with
the customer. This customer feedback loop is essential for
improving the quality of the service."
Product Evolution - Data Assets into
Meaningful Information
Operational
Diagnostics
Your organisation
and strategy
Business
KPIs
Financial
Reporting
Just what do I do with 4 TB of Data Events Per Day?
Customer
Products
Partners
and Suppliers
Your
customers
(t)
The Future
* Barrier to entry gets lower and lower
* Data literacy and engineering up-skilling continues
* SQL is back
* Machine Learning / Auto-code generation
* 5G
Wrap Up
Unifying our data into a single platform…
…frees up energy
…to be invested into meaningful data products
…that allow us to innovate faster
…to better serve the world's travellers
?
Questions & Discussion
#datascot
Drinks & Networking
#datascot
TAKING DATA SEARCH, DISCOVERY AND ANALYSIS TO
THE NEXT LEVEL
DAVID RIVETT
CHIEF OPERATING OFFICER
Agenda
• Unstructured Data
• Information worker challenges
• The power of precision search
• When you don’t know where to start
• Data Science & AI
• Feral Data & GDPR
“Unstructured” Big Data
In every organisation data is “relatively” Big
• Organisational memory
• Staff turnover
• Personal and shared drives
• Email
• Document Management
• Business Operational Systems
Today’s challenges…
According to a McKinsey report,
“employees spend 1.8 hours every day - 9.3 hours per week, on
average - searching and gathering information.
Put another way, businesses hire 5 employees but only 4 show
up to work; the fifth is off searching for answers, but not
contributing any value.”
Source: Time Searching for Information.
Turning Data into Knowledge
WHOLE
DOCUMENTS
A SINGLE
SENTENCE
CONTENT BY
PARAGRAPH
DOCUMENTS
BY SECTION
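The idea of returning results at different granularities can be sketched in a few lines of plain Python. This is only an illustration, not how any particular search product works: `split_units` and `search` are hypothetical names, paragraphs are assumed to be separated by blank lines, sentences are split naively on end punctuation, and the "by section" level is left out because it depends on the document format.

```python
import re
from typing import List


def split_units(document: str, level: str) -> List[str]:
    """Split a document into units at the requested granularity."""
    if level == "document":
        return [document.strip()]
    # Assumption: a blank line separates paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    if level == "paragraph":
        return paragraphs
    if level == "sentence":
        sentences: List[str] = []
        for p in paragraphs:
            # Naive split: whitespace that follows '.', '!' or '?'.
            sentences.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+", p) if s.strip())
        return sentences
    raise ValueError(f"unknown level: {level}")


def search(document: str, term: str, level: str) -> List[str]:
    """Return every unit at the given level that mentions the term (case-insensitive)."""
    needle = term.lower()
    return [u for u in split_units(document, level) if needle in u.lower()]


# Illustrative text only.
doc = (
    "The committee met in March. It reviewed the budget.\n\n"
    "A second review followed in June. No budget changes were made."
)
print(search(doc, "budget", "sentence"))
```

Searching the same text at "paragraph" or "document" level returns progressively larger units, which is the trade-off the four labels above describe.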
The Chilcot Report
• 58 separate PDF files.
SPARK VS FLINK
STEFAN PAPP, VDSG
BERNHARD ORTNER, THINK BIG ANALYTICS
Stefan Papp
Sees himself as a Data Evangelist who has focused on data since 2010. He is passionate
about how data can transform how we live and work and keen to explore all data
with an open mindset and in an agile manner.
He is a member of the Vienna Data Science Group, has worked across all
industries, and consults various companies on data strategies. He also works on
projects as a data architect.
Bernhard Ortner
He is a Senior Data Engineer at Think Big Analytics, A Teradata Company. He has
solid experience implementing several big data technologies across industries like
telecommunications, finance, energy and government. In his experience, he has
covered the whole spectrum in big data including visualization, data ingestion,
fusion and integration.
TERADATA PORTFOLIO
High-
impact
business
outcomes
Analytic Solutions
Analytics Business Consulting
Business Value Framework
Data Science
Business Solutions
Strategy and Roadmaps
Design and Implementation
Ecosystem Architecture
Managed Services
Architecture Expertise
Public Cloud,
Private Cloud,
Managed Cloud,
Hybrid Cloud,
On-Premises
Teradata Database,
Teradata Aster®
Analytics, Hadoop
Teradata
QueryGrid™, Presto,
Listener™, Unity,
AppCenter
Technology
Solutions
• ~1,400 + Customers in 77 Countries
• ~10,000 Employees including
~5,000 Consultants
• Market Cap: U.S. $4 Billion+
• World’s Most Ethical Companies –
Ethisphere Institute
• Fortune: Top 10 U.S. Software Company
• The leader in Gartner Magic Quadrant
Data Warehouse and Data Management
Solutions for Analytics
• The Forrester Wave™:
Big Data Hadoop-Optimized Systems
In-Memory Database Platforms
Teradata at-a-glance
© 2017 Teradata
VIENNA DATA SCIENCE GROUP
Mission
• Nonprofit association which aims to promote knowledge about data science methods/big
data/AI techniques.
Diverse members
• Academics, professionals, students and all other Data Science enthusiasts
Different fields
• Mathematics, physics, econometrics, electrical engineering, medical science, finance, real estate,
computer science and social sciences
Traditional Data:
• ERPs, CRM Databases
• Highly structured
[Chart: data volume in zettabytes per year, 2006 to 2020; y-axis 0 to 40]
Schema on Write Schema on Read
“New Data” :
• Human or Machine
Generated
• “unstructured”
SPARK – “THE SMART PHONE FOR BIG DATA”
What do 4 Gen. Processing Engines have in
common with the iPhone?
In summary
Administration
• Power BI uses Azure Active
Directory to authenticate users
who login to the Power BI service
User Friendly / Self-Serve
• Uses a graphical user interface
built around self-serve
Security & Privacy
• Currently complies with both EU-
U.S. Privacy Shield and EU Model
Clauses.
• Committed to GDPR compliance
Cross Platform
Functionality & Embedding
• Cloud and desktop versions
• iOS and Android apps available
• Can embed Power BI code within
websites
Sources & Connections
• Wide list of out-of-the-box
connectors including Excel, SQL,
MySQL, and widely used APIs (e.g.
Google Analytics)
Managing, Cleaning &
Transforming
• Self-serve ETL process
• Flexible to perform more
advanced queries, e.g. merging and
cleaning files
Storing & Querying
• Does not automatically store raw
data; however, there is an option to
save data within the report
• Can refresh data from source
automatically
Reports
• Allows users to build visual reports
• Can export the reports to
PowerPoint
Dashboards
• Elements from reports can be
reproduced within dashboards
• Best viewed on devices
Visualisation
• Out of the box visuals provided
• Online library of custom visuals
• Ability to create own visuals using
R
Analytics
• Functionality similar to Excel
• Has natural language processing
built in
Collaboration
• Able to share with selected people
or open access
• Creates an associated
conversation stream
Explain
Enlighten
Engage
Good
Data
Sufficient
Observations
Representative
Timely
ANSWER
                  Before Spark     After Spark         After Flink
Program-API       Map Reduce       Spark Core          Flink API
Abstraction       Pig, Cascading   Spark Core          Flink API
Streaming         Apache Storm     Spark Streaming     Flink API
Machine Learning  Mahout           Spark ML            FlinkML
Graph Engine      Giraph           Spark GraphFrames   Gelly
SQL               Hive             Spark SQL           TableAPI
SPARK ARCHITECTURE
• APIs: Scala, Java, Python or R
• Core Modules Custom Modules
H2O
DISTRIBUTED DATA SETS
• Data Containers to keep data in-memory
• RDDs: no schema, opaque objects
• DataFrames / Datasets: schema-aware; Datasets are also statically typed
• Processing with data sharing instead of shared nothing
• complex, multi-pass analytics (e.g. ML, graph)
• interactive ad-hoc queries
• real-time stream processing
Query
Input
Query
Query
STREAMING IN DETAIL
• Data is micro-batched, i.e. divided into n RDDs
• For each RDD a task is launched, and it is destroyed once it has completed
• A task is a function you apply, e.g. filter, map, …
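The micro-batching idea above can be sketched in plain Python, without Spark. This is an illustration under stated assumptions only: `micro_batches`, `run_task` and `keep_even` are hypothetical names, a list stands in for an RDD, and batches are cut by element count rather than by a time interval as Spark Streaming does.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def micro_batches(stream: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Cut a (possibly endless) stream into small lists, one per micro-batch."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_task(batch: List[int], fn: Callable[[List[int]], List[int]]) -> List[int]:
    """One task per batch: it applies fn and ends as soon as fn returns."""
    return fn(batch)


def keep_even(batch: List[int]) -> List[int]:
    """Example task body: a filter, as named on the slide."""
    return [x for x in batch if x % 2 == 0]


# Ten events, cut into batches of four, each processed by its own task.
results = [run_task(b, keep_even) for b in micro_batches(range(10), batch_size=4)]
print(results)
```

Each batch is processed independently, which is why latency in this model can never drop below the batch interval.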
ANALYTIC
• MLlib: out-of-the-box ML library
• New algorithms are added continuously
• Sparkling Water: port of H2O for Spark; adds additional DS algorithms
• H2O: ML/AI library for more advanced analytics
• Deep Learning
• Generalized Linear model
• PCA
• ….
SPARK - PIPELINES
• Concept: sequence of algorithms to process and learn from data
• A pipeline consists of
• Transformers: transform a DataFrame into another one, e.g. ETL steps
• Estimators: train a model on a DataFrame; the output is the trained model
SPARK PROGRAM
• Step I: create a distributed data set
• Step II: apply filters, models, …
• Step III: get the result
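import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression

// Step I: create a distributed data set (JDBC options are elided on the slide)
val training = spark.read.format("jdbc").option("…").load().toDF("col1", "col2")
// Step II: configure the estimator, wrap it in a pipeline and train
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
val pipeline = new Pipeline().setStages(Array(lr))
val model = pipeline.fit(training)
pipeline.write.overwrite().save("name")
// Step III: get the result (testData is assumed to be loaded elsewhere)
model.transform(testData).select("col1", "col2").collect().foreach(print…)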
https://spark.apache.org/docs/2.2.0/ml-pipeline.html
WHAT HAPPENS BEHIND THE SCENES
• There is also a different approach….
... a micro-batch is a collection of streaming events…
But you can also use pure streaming
[Chart: throughput in msgs/sec (axis 0 to 16,000,000) for Storm, Flink, and Flink (10 GigE); 10 GigE end-to-end: 15m msgs/sec]
SURVIVAL OF THE FASTEST….
STREAMING DATA EVERYWHERE
Streaming: continuous processing on data that is
continuously produced
Sources
Message
Broker
Stream
processor
collect publish/subscribe analyze Serve & store
CONNECTORS AND ADAPTERS
DataBases
…
TwitterFeeds
SensorData
DataPlatforms
Feeds
Applications
…
FLINK ECOSYSTEM
SOURCE -> PROCESS -> SINK
Source
Transformation
Transformation
Sink
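// Source: read raw lines from Kafka (consumer arguments are elided on the slide)
val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer09(…))
// Transformation: parse each line into an Event
val events: DataStream[Event] = lines.map((line) => parse(line))
// Transformation: aggregate per sensor over 5-second windows
val stats: DataStream[Statistic] = events
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .apply(new MyAggregationFunction())
// Sink: write the aggregated statistics out
stats.addSink(new RollingSink(path))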
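What keying by sensor and then applying a 5-second window computes can be sketched in plain Python. The assumptions are mine, not the slide's: events are `(timestamp_seconds, sensor, value)` tuples, the aggregation is a mean (the slide does not show `MyAggregationFunction`), windows are tumbling and aligned to multiples of 5 seconds, and `window_means` is a hypothetical helper rather than part of Flink.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str, float]  # (timestamp in seconds, sensor id, reading)


def window_means(events: Iterable[Event], size: float = 5.0) -> Dict[Tuple[str, int], float]:
    """Group events by (sensor, window index) and average each group."""
    groups: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for ts, sensor, value in events:
        window = int(ts // size)  # 0 covers [0, 5), 1 covers [5, 10), ...
        groups[(sensor, window)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Illustrative input only. Out-of-order arrival is harmless here because the
# window is derived from each event's own timestamp.
sample = [
    (0.5, "a", 10.0),
    (6.0, "a", 30.0),
    (1.5, "b", 4.0),
    (4.9, "a", 20.0),
    (7.0, "b", 8.0),
]
stats = window_means(sample)
print(stats)
```

This version sees all events at once; a real stream processor emits each window's result when the window closes and has to decide how long to wait for late events.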
Source
[1]
map()
[1]
keyBy()/
window()/
apply()
[1]
Sink
[1]
Source
[2]
map()
[2]
keyBy()/
window()/
apply()
[2]
Streaming
Dataflow
Streaming
Source
Streaming
Source
Streaming
Source
Consumer
Forward events immediately
to pub/sub bus
Stream
Processor
Process at event time &
update serving layer
Message
Broker
Low latency
High throughput
Windowing / Out
of order events
State handling
Fault tolerance
and correctness
Huge Database
Application Application
Client Server
NoSQL
Application Application
Micro Services
Application Application
Continuous Streaming
Application Application
RDBMS
SECOND USE CASE: CONTINUOUS STREAMING
DO YOU HAVE CASES THAT REQUIRE PROCESSING MORE
THAN ONE MILLION MESSAGES (PER SECOND)?
Fraud Prevention
CEP – Complex Event Processing
Spam Prevention
Network anomaly detection
Predictive Maintenance
Deep Packet Inspection
File Processing
CRM Import
BATCH
STREAM
ERP Data
Social Media
Internet of
Everything
Image Recognition
DO YOUR OPERATIONAL SYSTEMS DELIVER
STREAMING DATA?
REAL TIME ANALYTICS
• Anomaly detection: detect outliers (an outlier is an observation that deviates markedly from the others)
• Distance- or density-based => clustering
• Spark is used to train the models, but it’s too slow for production
• Flink natively supports streams, but has fewer algorithms than Spark
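A distance-based outlier test of the kind mentioned above can be sketched in plain Python. This is a minimal illustration, assuming one-dimensional readings and a simple rule: a value is an outlier if fewer than `min_neighbours` other values lie within `radius` of it. The function name, parameters and sample readings are hypothetical, and the split between training in Spark and scoring in Flink is not modelled.

```python
from typing import List


def distance_outliers(values: List[float], radius: float, min_neighbours: int) -> List[float]:
    """Return the values that have fewer than min_neighbours other values within radius."""
    outliers = []
    for i, v in enumerate(values):
        # Count the other points that are close enough to v.
        neighbours = sum(
            1 for j, other in enumerate(values) if j != i and abs(other - v) <= radius
        )
        if neighbours < min_neighbours:
            outliers.append(v)
    return outliers


# Illustrative readings only (not from the talk).
readings = [20.1, 19.8, 20.4, 20.0, 35.7, 19.9]
print(distance_outliers(readings, radius=1.0, min_neighbours=2))
```

The pairwise comparison is quadratic in the number of points, so it suits small batches; at streaming volumes the neighbourhood counts would have to be maintained incrementally or approximated.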
SPARK & FLINK AT A GLANCE
Bernhard Ortner, Sr. Data Engineer
bernhard.ortner@thinkbiganalytics.com
Mobile +43 664
Contact:
Stefan Papp, Data Architect
stefan.papp@icloud.com
Mobile: +43 699 10209453
Precision Search
- The Internet Experience
Precision Search
– The Desktop Experience
Precision Search
- The ultimate experience
Use Cases
• Research
• Detailed Analysis & Review
• Litigation
• Mergers & Acquisitions
• Investigative Journalism
• Better informed decisions
• Faster, more confident results and process
Where to start?
“You don’t know what you don’t know!”
“To really get answers out of your data
you need the right questions!”
Data Science & AI
• Topic Clustering and Categorisation
• Semantic Enhancement
• Dictionary Enhancement
• Pattern Matching
• Sentiment
• Inherent Structure
Abuse: 60%
Neglect: 25%
Dysfunctional Family: 10%
Disability: 5%
SAMPLE PREDICTION
Train Model using real case
documents
Model learns topics based
on training data
Model can now identify
topics in case documents
Model then predicts the
probability of topics
present in document
Topic Clustering
Visualization of Topic Model
Surfacing Knowledge -
Enhancement
Feral Data
• What does this mean for GDPR?
Feral Data
• Pattern matching PII
• Bank, Credit card numbers
• Passport numbers
• Driving License details
• Address, postcode, email
• …
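Pattern matching for the PII types listed above can be sketched with regular expressions. The patterns below are deliberately simplified assumptions, not production rules and not the product's actual method: the e-mail and UK postcode expressions are loose approximations, and card-number candidates are confirmed with the Luhn checksum to cut false positives. All names (`PATTERNS`, `luhn_ok`, `find_pii`) are hypothetical.

```python
import re
from typing import Dict, List

# Simplified, illustrative patterns; real PII detection needs stricter,
# locale-aware rules and validation.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "uk_postcode": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b"),
    "card_number": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_ok(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> Dict[str, List[str]]:
    """Return the matches per PII type; card numbers must also pass the Luhn check."""
    hits: Dict[str, List[str]] = {}
    for label, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if label == "card_number":
            found = [m for m in found if luhn_ok(m)]
        if found:
            hits[label] = found
    return hits


# Made-up sample text; 4111 1111 1111 1111 is a widely used test card number.
sample = (
    "Contact jane.doe@example.com, postcode EH1 1YZ, "
    "card 4111 1111 1111 1111, ref 1234 5678 9012 3456."
)
hits = find_pii(sample)
print(hits)
```

The reference number in the sample has the shape of a card number but fails the checksum, so it is not reported; this is the kind of validation step that keeps a scan over uncontrolled ("feral") files from drowning in false positives.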
GDPR – Dutch Example
Thank You!
More questions?
Please visit us at our stand
Take the Nalytics challenge!
Big Data Scotland 2017

  • 1. Welcome to Big Data Scotland 2017 #datascot
  • 4. www.digit.fyi 50,000 Monthly Page Views 30,000 Unique Visitors Monthly News, Views, Opinion, Insight
  • 5. Our Next Event DT2018 3rd Annual Digital Transformation Conference www.digifutures.co.uk
  • 7. New Foundations for a Data-Driven Organisation
  • 9. 2018: A Confluence of Factors
  • 14. The Time is Now  The shift to data-driven process is underway.  Burgeoning data availability, open source, more affordable technology, and consumer demand for betterment on every front suggest it might be prime time for analytics teams.  New and advanced analytics technologies on the rise: predictive and prescriptive analytics, decision management software, smart machines, event stream processing applications and operational intelligence platforms.  As a result, the demand for analytics has moved outside specialty IT-based communities and is now being headed up largely by business units as well.
  • 17. Where’s the Beef?  Building Blocks  Fundamentally, building out the underlying data architecture as well as data collection or generation capabilities.  Switch from legacy data systems to a more nimble and flexible architecture to store and harness big data.  Digitize operations more fully in order to capture more data from customer interactions, supply chains, equipment, and internal processes.  But… Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom.  Secondary (but no less important), is to find the answers to the innumerable business opportunities your organisation faces in your data.
  • 18. So Here is What you Need to Do. Now.
  • 23. 5. Silo Busting to Let the Data Flow  Silos, created because of structural issues, political factors, growth challenges, or vendor lock-in can create significant challenges to creating insight and advantage from data.  The pressure is on, but incremental practical, methodical approaches are best for those without the luxury of building from scratch.  Find a good target.  Use engagement practices to identify high-value opportunities.  Analyze business needs, choose a problem where data could provide a tangible benefit and value… perhaps in enhancing sales or preemptive incident response.  Draw in the data from around the organization and invest in these use cases first. This is not a proof of concept — you should do these earlier as a way of identifying opportunities — but a banner project that can drive subsequent investments.  Tie the integration to its application, so you get value early.  Every Progressive Step, moves towards integration. (But beware planning fallacy!)
  • 28. DATA INNOVATION The Gap Smart Automation Smart Learning Smart Devices
  • 30. Device Obsession
  • 31. Exceptional Experience
  • 32. Connect Anywhere
  • 33. Systems Unable to Cope
  • 34. Capability to Change
  • 36. Predicting Issues
  • 37. Intelligent Services
  • 38. Autonomous Teams
  • 40. Award Winning AUTOMATION
  • 42. +34% Machine Beats Man
  • 43. 1. Model Production 2. Data Enrichment 3. Scenario Simulations 4. Model Deployment 5. AI Aggregation 6. Actionable Insights Intelligence Engine
  • 44. Increase Reduction Customer Retention Sweet Spot
  • 45. Insurance in a TRiCE
  • 47. For the first time in history, greater insight into our driver behaviour.
  • 49. DATA INNOVATION uk.linkedin.com/in/danfiehn @danfiehn No algorithms were hurt in the m GAP AUTOMATION LEARNING DEVICES
  • 51. Advancing your Data Analytics Capabilities Malachy Devlin Big Data - Edinburgh | 7th December 2017
  • 52. Company Journey – Initial Data [charts: Company Revenue and Company Growth, May-16 to Mar-23]
  • 53. Data Query Language - Relational Data EQL
  • 54. Relational Data SQL 1. Migrate to Databases • Data 2. Migrate to MRP/ERP • Data • Business Logic Manufacture Customers Finance Quality HR Supply Chain Management
  • 55. Manufacturing/Enterprise Resource Planning Manufacture Customers Finance Quality HR Supply Chain Management Business Project • IT is an enabler Data Matured? • Ready for Extract/Transform/Load (ETL) Right Software for Right Requirements • Different companies, different formula Business Process Upgrade • Don’t implement bad processes Training • Tools and Process Team - authority & understanding • Budget, Resource, Technology • Processes, Data, Requirements Going Live is not the end game!
  • 56. MRP/ERP - Real Time Business • End to End records • Critical for some markets • Single point of truth • Portfolio not Project view • Accelerate intelligence • 2 weeks to 1 minute in SC • 15 minutes to 15 seconds production status • Materials/Labour variability • More managed data recorded, reduced effort • Order to customer receipt <18 hours • Partner Integration • Port Merger Integrations Agility Accuracy Traceability Informed
  • 57. The traditional space industry Copyright © 2017 Clyde Space Ltd. All rights reserved.
  • 58. The future of nanosatellite technology
  • 59. Quality innovation
  • 61. SmallSat market Clyde Space Commercial in Confidence
  • 62. Where do satellites fit in? Satellites are used to provide the main network connectivity for a localised area which in turn relies on localised network (PAN or LAN) to distribute this connectivity between devices within this area.
  • 63. IoT from Space
  • 64. Earth observation applications SeaHawk ▪ Pair of 3U CubeSats flying multispectral imagers ▪ x30 smaller than existing SeaWIFS asset ▪ Putting Moore’s Law into space ▪ Revisit rate for IOD of 7 days ▪ Downlinked via X-band to NASA NEN
  • 65. Earth observation applications PICASSO ▪ ESA and BISA mission ▪ Hyperspectral imager supplied by VTT ▪ Ozone monitoring mission ▪ Single 3U IOD
  • 66. IPP Mission: FireSat
  • 67. Quantum applications
  • 71. Clyde Space nanosatellite solutions
  • 72. Launch and deployment
  • 73. Ground control Clyde Space ground station solution: ▪ Based at our headquarters in Glasgow, Scotland ▪ Further sites established in ME and US ▪ SDR based architecture ▪ Fully automated ▪ Working to develop partnerships with recognised and emerging ground station solutions globally
  • 74. SmallSat excellence
  • 75. Clyde Space Cubesats
  • 76. Meeting and generating demand
  • 77. enquiries@clyde.space
  • 79. Refreshments & Networking Please check rear of badges for breakouts #scotdata
  • 80. How to make an impact with your data: Going from Boring to Beautiful Louis Archer Tableau @louisarcher
  • 83. TECHNOLOGY IS NOT THE PROBLEM
  • 87. “Data visualisation is a language. It’s a means to convey an opinion, an argument.” Kim Rees – Founding Partner, Periscopic
  • 89. Iraq: Deaths on the decline
  • 90. Iraq: Deaths on the decline
  • 94. Which do you prefer?
  • 95. Design by Tableau; inspired by CTT Wireless Dashboard from
  • 99. Pleasurable experiences: The three levels of processing
  • 100. Pleasurable experiences: The three levels of processing Visceral Behavioural Reflective
  • 113. Don Norman’s Pleasurable experiences: The three levels of processing Visceral Behavioural Reflective
  • 125. Don Norman’s Pleasurable experiences: The three levels of processing Visceral Behavioural Reflective
  • 129. How to make an impact with your data? The three levels of processing: Visceral Behavioural Reflective
  • 132. DASHBOARDS ARE THE PROBLEM
  • 133. “Impact” is NOT just about beautiful/functional dashboards
  • 134. The interesting thing was, we thought we were doing well, and then we discovered there was this big negative cost. It was like, “Oh my God.” Suddenly you go and say, “Okay, I've discovered a new aspect of engine cost that we hadn't realized.” Suddenly you're going, “Bang, bang, bang, two minutes in Tableau” and you can see the average per month, the average per day, and it's like, “Oh, wow, we can do this slightly differently.” Within two days, I'd literally re-worked the whole instruction, sent it out to people, and off we went. As a result, it’s been a very significant difference in terms of U.S. dollars. Jonathan Capper, Production Planning Manager
  • 136. Bang, bang, bang, two minutes… Jonathan Capper Production Planning Manager
  • 137. Why? Why? Why? Data visualisation Known unknowns Predefined answers only Visual analytics Unknown unknowns Instant answers to new questions Bang, bang, bang, two minutes…
  • 138. We help people see and understand data
  • 140. Welcome Back to Big Data Scotland 2017 #datascot
  • 141. Matthieu Poyade The Glasgow School of Art @GSofASimVis #datascot
  • 143. Data Visualisation Graphical communication process which empowers one to gain understanding and insight into data. “Visualization offers a method for seeing the unseen” - McCormick (1987)
  • 144. “People are generally better persuaded by the reasons which they have themselves discovered than by those which have come into the mind of others.” Pascal (1623 – 1662)
  • 145. Pragmatic Approach to Visualisation • Data Communication – Effectively and efficiently make the user understand • Visual Efficiency – Perceptually efficient, make maximum use of the visual channels • Data is given – Visualization is not about generating data, although sometimes data requires pre-processing
  • 146. Looking at History… London’s Cholera outbreak - Dr John Snow (1854)
  • 147. Looking at history… The London Underground through time
  • 148. Looking at History… The Glasgow Subway through time
  • 149. We are drowning in a sea of data… The Flood of Data
  • 150. A picture is worth a thousand words!!
  • 151. How can Data Visualisation help to envision decision making impact? Challenger Disaster, January 28, 1986
  • 152. • Engineers knew there was a problem: Failure of O-ring joint at low launch temperatures • Technical data presented to management officers through 13 charts (notes, tables & diagrams) recommended not to launch • Data were presented poorly and didn’t reveal the correlation between cooler temperatures and an increased chance of damage
  • 153. Avoidable Tragedy? “Bad slides or bad engineering don’t kill people: bad decisions do” (Tufte)
  • 154. How can VR/AR enhance Data Visualisation for Businesses?
  • 159. AR and VR in Data Visualisation
  • 162. DECEMBER 17 Big Data Scotland 2017 Steven Faull Head of Software & Analytics @stevenfaull From Reactive to Proactive to Predictive: Using Data to Drive Customer Benefit
  • 164. Reactive The way things used to be… • Service scheduling • Customer calls for support • Technicians have no visibility of faults • Spare parts? • Low customer service level
  • 165. Proactive The way it is today… • Telemetry-enabled fleet • The ROC • Visibility of every alert and alarm globally • Highly proactive • Enhanced reliability & customer service
  • 166. Innovation Examples… • Enterprise Social Networking • Geo-spatial Sales application • Cloud technology • Micro-services architecture on Service Fabric • Big Data Analytics
  • 168. Predictive The future…using data to enable: • Near-time data analytics • Predictive alerting to the ROC • ‘Just-in-Time’ servicing • Zero breakdowns • ‘Best-in-Class’ customer service
  • 172. Approach 1. Build a great team
  • 173. Approach 1. Build a great team 2. Empower for innovation and exploration
  • 174. Approach 1. Build a great team 2. Empower for innovation and exploration 3. Find a great sponsor
  • 175. Approach 1. Build a great team 2. Empower for innovation and exploration 3. Find a great sponsor 4. Sell the vision through stories
  • 176. Approach 1. Build a great team 2. Empower for innovation and exploration 3. Find a great sponsor 4. Sell the vision through stories 5. Collaborate, collaborate, collaborate
  • 177. Approach 1. Build a great team 2. Empower for innovation and exploration 3. Find a great sponsor 4. Sell the vision through stories 5. Collaborate, collaborate, collaborate 6. Agile delivery
  • 178. What’s Next? • Reliability • Market Intelligence • Service Intervals • Condition Based Service • Product Development • Procurement • Audit • Sales Effectiveness
  • 179. Thank You Steven Faull Head of Software & Analytics @stevenfaull
  • 189. GUIDING PRINCIPLE 1 Full supply chain visibility for everyone GUIDING PRINCIPLE 2 Predict the future state based on data GUIDING PRINCIPLE 3 Enable ‘next best step’ decision-making or management by exception to flourish GUIDING PRINCIPLE 4 Enforcing a single source of truth encourages collaboration across the entire supply chain GUIDING PRINCIPLE 5 Provide a real-time trigger for smarter communication
  • 195. From Producers To Consumers Data Engineering @ Skyscanner Scott Krueger Principal Data Engineer
  • 197. Brief history of travel
  • 200. But wait - we have been designing during growth
  • 203. Unified Flow - Data Platform
  • 205. Perfection is in the eye of the data holder Data systems old and new have constraints. It is a fact of life. Accept it, find them, and change things for what you need.
  • 206. Scale – often the first business driver?
  • 208. Organise your data organisation Conway's law (1967): Technical system determines org structure Process Ingest Unified Log Transport Archive Science / Report / Analytics
  • 210. The Data Organisation As a Business Strategy Educate To Empower The problem with Data Engineers and Data Scientists is that they’re Data Engineers and Data Scientists.... (and no one else is) "I'm not going to find that out for you - you are"
  • 211. Operational Strategy Total Ownership "You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service." - Werner Vogels
  • 212. Product Evolution - Data Assets into Meaningful Information Operational Diagnostics Your organisation and strategy Business KPIs Financial Reporting Just what do I do with 4TB's of Data Events Per Day? Customer Products Partners and Suppliers Your customers (t)
  • 213. The Future * Barrier to entry gets lower and lower * Data literacy and engineering up-skilling continues * SQL is back * Machine Learning / Auto-code generation * 5G
  • 214. Wrap Up Unifying our data into a single platform… …frees up energy …to be invested into meaningful data products …that allow us to innovate faster …to better serve the world's travellers
  • 215. ?
  • 218. TAKING DATA SEARCH, DISCOVERY AND ANALYSIS TO THE NEXT LEVEL DAVID RIVETT CHIEF OPERATING OFFICER
  • 219. Agenda  Unstructured Data  Information worker challenges  The power of precision search  When you don’t know where to start  Data Science & AI  Feral Data & GDPR
  • 220. “Unstructured” Big Data In every organisation data is “relatively” Big  Organisational memory  Staff turnover  Personal and shared drives  Email  Document Management  Business Operational Systems
  • 221. Today’s challenges… According to a McKinsey report, “employees spend 1.8 hours every day - 9.3 hours per week, on average - searching and gathering information. Put another way, businesses hire 5 employees but only 4 show up to work; the fifth is off searching for answers, but not contributing any value.” Source: Time Searching for Information.
  • 222. Turning Data into Knowledge WHOLE DOCUMENTS A SINGLE SENTENCE CONTENT BY PARAGRAPH DOCUMENTS BY SECTION
  • 223. The Chilcot Report  58 separate PDF files.
  • 224. SPARK VS FLINK STEFAN PAPP, VDSG BERNHARD ORTNER, THINK BIG ANALYTICS
  • 225. Stefan Papp Sees himself as a Data Evangelist who has focused on data since 2010. He is passionate about how data can transform how we live and work and keen to explore all data with an open mindset and in an agile manner. He is a member of the Vienna Data Science Group, has worked across all industries and consults for various companies on data strategies. He also works on projects as a data architect. Bernhard Ortner He is a Senior Data Engineer at Think Big Analytics, A Teradata Company. He has solid experience implementing several big data technologies across industries like telecommunications, finance, energy and government. In his experience, he has covered the whole spectrum in big data including visualization, data ingestion, fusion and integration.
  • 226. TERADATA PORTFOLIO > High- impact business outcomes Analytic Solutions Analytics Business Consulting Business Value Framework Data Science Business Solutions Strategy and Roadmaps Design and Implementation Ecosystem Architecture Managed Services Architecture Expertise Public Cloud, Private Cloud, Managed Cloud, Hybrid Cloud, On-Premises Teradata Database, Teradata Aster® Analytics, Hadoop Teradata QueryGrid™, Presto, Listener™, Unity, AppCenter Technology Solutions • ~1,400 + Customers in 77 Countries • ~10,000 Employees including ~5,000 Consultants • Market Cap: U.S. $4 Billion+ • World’s Most Ethical Companies – Ethisphere Institute • Fortune: Top 10 U.S. Software Company • The leader in Gartner Magic Quadrant Data Warehouse and Data Management Solutions for Analytics • The Forrester Wave™: Big Data Hadoop-Optimized Systems In-Memory Database Platforms Teradata at-a-glance © 2017 Teradata
  • 227. VIENNA DATA SCIENCE GROUP Mission • Nonprofit association which aims to promote knowledge about data science methods/big data/AI techniques. Diverse members • Academics, professionals, students and all other Data Science enthusiasts Different fields • Mathematics, physics, econometrics, electrical engineering, medical science, finance, real estate, computer science and social sciences
  • 228. Traditional Data: • ERPs, CRM Databases • Highly structured [chart: data volume in zettabytes, 2006 to 2020] Schema on Write Schema on Read “New Data”: • Human or Machine Generated • “unstructured”
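The contrast between schema on write and schema on read named on the slide above can be sketched in a few lines of plain Python. This is an illustration only: the schema, the helper names `write_with_schema` and `read_with_schema`, and the sample records are assumptions made for the example, not material from the talk.

```python
import json

# Hypothetical schema for the example: field name -> required Python type.
SCHEMA = {"customer_id": int, "country": str}


def write_with_schema(table, record):
    """Schema on write: validate before storing, reject anything that does not fit."""
    for field, expected in SCHEMA.items():
        if not isinstance(record.get(field), expected):
            raise ValueError(f"bad or missing field: {field}")
    table.append({field: record[field] for field in SCHEMA})


def read_with_schema(raw_lines):
    """Schema on read: raw text is stored as-is and interpreted only when queried."""
    rows = []
    for line in raw_lines:
        data = json.loads(line)
        rows.append({
            "customer_id": int(data.get("customer_id", -1)),
            "country": str(data.get("country", "unknown")),
        })
    return rows


# Structured path: records that do not match the schema never reach the table.
table = []
write_with_schema(table, {"customer_id": 1, "country": "UK"})

# Raw path: anything is stored, structure is applied at read time.
raw_store = ['{"customer_id": "2", "country": "DE", "clicks": 7}', '{"note": "free text"}']
rows = read_with_schema(raw_store)
```

The trade-off the sketch tries to show: the first path pays the validation cost up front, while the second defers it to every reader and has to decide what to do with missing fields.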
  • 229. SPARK – “THE SMART PHONE FOR BIG DATA” What do 4 Gen. Processing Engines have in common with the iPhone?
  • 245. Administration • Power BI uses Azure Active Directory to authenticate users who log in to the Power BI service User Friendly / Self-Serve • Uses a graphical user interface built around self-serve Security & Privacy • Currently complies with both EU-U.S. Privacy Shield and EU Model Clauses. • Committed to GDPR compliance Cross Platform Functionality & Embedding • Cloud and desktop versions • iOS and Android apps available • Can embed Power BI code within websites Sources & Connections • Wide list of out of the box connectors including Excel, SQL, MySQL, and widely used APIs (e.g. Google Analytics) Managing, Cleaning & Transforming • Self-serve ETL process • Flexible to perform more advanced queries, e.g. merging and cleaning files Storing & Querying • Does not automatically store raw data; however, there is an option to save data within the report • Can refresh data from source automatically Reports • Allows users to build visual reports • Can export the reports to PowerPoint Dashboards • Elements from reports can be reproduced within dashboards • Best viewed on devices Visualisation • Out of the box visuals provided • Online library of custom visuals • Ability to create own visuals using R Analytics • Functionality similar to Excel • Has natural language processing built in Collaboration • Able to share with selected people or open access • Creates an associated conversation stream
  • 254. ANSWER (Before Spark / After Spark / After Flink) • Program-API: Map Reduce / Spark Core / Flink API • Abstraction: Pig, Cascading / Spark Core / Flink API • Streaming: Apache Storm / Spark Streaming / Flink API • Machine Learning: Mahout / Spark ML / FlinkML • Graph Engine: Giraph / Spark GraphFrames / Gelly • SQL: Hive / Spark SQL / TableAPI
  • 255. SPARK ARCHITECTURE • APIs: Scala, Java, Python or R • Core Modules Custom Modules H2O
  • 256. DISTRIBUTED DATA SETS • Data Containers to keep data in-memory • RDDs: untyped • DataFrames / DataSets: typed / object-oriented • Processing with data sharing instead of shared nothing • complex, multi-pass analytics (e.g. ML, graph) • interactive ad-hoc queries • real-time stream processing Query Input Query Query
  • 257. STREAMING IN DETAIL • Data is micro-batched, i.e. divided into n RDDs • For each RDD a task is launched and destroyed once it’s completed • A task is a function you apply, e.g. filter, map, …
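The micro-batching idea on the slide above can be sketched without Spark. This is a simplified illustration: `micro_batches`, `run_task` and `keep_even` are hypothetical names, and batches are cut by element count here for brevity, whereas Spark Streaming cuts them by a time interval.

```python
from itertools import islice


def micro_batches(events, batch_size):
    """Group an event stream into consecutive batches of at most batch_size items."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def run_task(batch, task):
    """Apply one task (a plain function such as a filter or map) to one batch."""
    return task(batch)


def keep_even(batch):
    """Example task: a filter that keeps even numbers."""
    return [x for x in batch if x % 2 == 0]


# Each batch is processed independently, one task per batch.
results = [run_task(batch, keep_even) for batch in micro_batches(range(10), 4)]
```

Because every batch is handled as a small, finite job, latency is bounded below by the batch size, which is the point the later slides contrast with pure streaming.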
  • 258. ANALYTIC • MLlib: Out-of-box ML library • New algorithms are added continuously • Sparkling Water: port of H2O for Spark, adds additional DS algorithms • H2O: ML, AI library, for more advanced analytics • Deep Learning • Generalized Linear model • PCA • ….
  • 259. SPARK - PIPELINES • Concept: sequence of algorithms to process and learn from data • A pipeline consists of • Transformers: transform a DataFrame into another one, e.g. ETL Steps • Estimators: train a model on a DataFrame, output is the trained model
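The transformer/estimator split described on the slide above can be mimicked in plain Python. This is a conceptual sketch, not Spark's API: the classes `Scale`, `ThresholdEstimator`, `ThresholdModel` and `Pipeline`, the helper `apply_stages`, and the toy data are all assumptions made for illustration, and rows are simple `(feature, label)` tuples rather than DataFrames.

```python
class Scale:
    """Transformer: maps rows to new rows (here: divides the feature by a constant)."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, rows):
        return [(x / self.factor, label) for x, label in rows]


class ThresholdModel:
    """Trained model, itself a transformer: appends a 0/1 prediction to each row."""

    def __init__(self, threshold):
        self.threshold = threshold

    def transform(self, rows):
        return [(x, label, int(x >= self.threshold)) for x, label in rows]


class ThresholdEstimator:
    """Estimator: fit() learns a threshold halfway between the two class means."""

    def fit(self, rows):
        pos = [x for x, label in rows if label == 1]
        neg = [x for x, label in rows if label == 0]
        return ThresholdModel((sum(pos) / len(pos) + sum(neg) / len(neg)) / 2)


class Pipeline:
    """Runs stages in order; estimators are fitted and replaced by their models."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)
            fitted.append(stage)
            rows = stage.transform(rows)
        return fitted


def apply_stages(fitted, rows):
    """Apply an already fitted list of stages to new rows."""
    for stage in fitted:
        rows = stage.transform(rows)
    return rows


training = [(10.0, 0), (20.0, 0), (80.0, 1), (90.0, 1)]
model = Pipeline([Scale(100.0), ThresholdEstimator()]).fit(training)
predictions = apply_stages(model, [(30.0, 0), (70.0, 1)])
```

Fitting returns a list of transformers only, so the same preprocessing is guaranteed to be applied at training time and at prediction time.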
  • 260. SPARK PROGRAM • Step I: create a distributed data set • Step II: apply filter, models,… • Step III: get result
      // load() is needed to turn the reader into a DataFrame
      val training = spark.read.format("jdbc").option("…").load().toDF("col1", "col2")
      val lr = new LogisticRegression()
      lr.setMaxIter(10).setRegParam(0.01)
      val pipeline = new Pipeline().setStages(Array(lr))
      val model = pipeline.fit(training)
      pipeline.write.overwrite().save("name")
      model.transform(testData).select("col1", "col2").collect().foreach(print…)
    https://spark.apache.org/docs/2.2.0/ml-pipeline.html
  • 262. • There is also a different approach…. ... a micro-batch is a collection of streaming events… But you can also use pure streaming
  • 263. 0 4,000,000 8,000,000 12,000,000 16,000,000 Storm Flink Flink (10 GigE) Throughput: msgs/sec 10 GigE end-to-end 15m msgs/sec SURVIVAL OF THE FASTEST….
  • 265. Streaming: continuous processing on data that is continuously produced Sources Message Broker Stream processor collect publish/subscribe analyze Serve & store
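The source, message broker and stream processor chain on the slide above can be sketched as an in-memory publish/subscribe loop. This is a toy illustration under stated assumptions: the `Broker` class, the `readings` topic and the running-average processor are invented for the example and stand in for a real broker such as Kafka and a real stream processor.

```python
from collections import defaultdict


class Broker:
    """Minimal in-memory publish/subscribe bus (hypothetical, single process)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Forward each event immediately to every subscriber of the topic.
        for handler in self.subscribers[topic]:
            handler(message)


state = {"count": 0, "sum": 0.0}
served = []  # stands in for the serving/storage layer


def process(message):
    """Stream processor: update state per event and serve the running average."""
    state["count"] += 1
    state["sum"] += message
    served.append(state["sum"] / state["count"])


broker = Broker()
broker.subscribe("readings", process)
for reading in [2.0, 4.0, 6.0]:  # the "source" collecting events
    broker.publish("readings", reading)
```

Each event updates the result as soon as it arrives, rather than waiting for a batch to fill.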
  • 268. SOURCE -> PROCESS -> SINK Source Transformation Transformation Sink
      val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer09(…))
      val events: DataStream[Event] = lines.map((line) => parse(line))
      // window the parsed events (the slide referred to an undefined `stream`)
      val stats: DataStream[Statistic] = events
        .keyBy("sensor")
        .timeWindow(Time.seconds(5))
        .apply(new MyAggregationFunction())
      stats.addSink(new RollingSink(path))
    Streaming Dataflow: Source [1] map() [1] keyBy()/window()/apply() [1] Sink [1] Source [2] map() [2] keyBy()/window()/apply() [2]
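What the `keyBy("sensor")` plus 5-second `timeWindow` step on the slide above computes can be shown with a small Python sketch. It is a batch re-implementation over a finite list, not Flink: the function name `tumbling_window_average`, the `(timestamp, sensor, value)` event layout and the averaging aggregate are assumptions chosen for illustration, and a real stream processor would emit each result as its window closes.

```python
from collections import defaultdict


def tumbling_window_average(events, window_seconds):
    """Average `value` per (sensor, window start); events are (timestamp, sensor, value)."""
    buckets = defaultdict(list)
    for timestamp, sensor, value in events:
        # Tumbling windows do not overlap: each event belongs to exactly one window.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[(sensor, window_start)].append(value)
    return {key: sum(values) / len(values) for key, values in sorted(buckets.items())}


events = [
    (0.5, "a", 10.0),
    (1.0, "b", 4.0),
    (4.9, "a", 20.0),
    (5.0, "a", 30.0),
    (7.2, "b", 6.0),
]
stats = tumbling_window_average(events, 5)
```

Keying first and windowing second means each sensor gets its own independent sequence of windows, which is also what lets the work be spread across parallel instances.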
  • 269. Streaming Sources • Message Broker: forward events immediately to pub/sub bus • Stream Processor: process at event time & update serving layer • Consumer • Requirements: low latency • high throughput • windowing / out-of-order events • state handling • fault tolerance and correctness
  • 270. SECOND USE CASE: CONTINUOUS STREAMING • [Diagram: applications across four architectures: Client/Server with a huge RDBMS database, NoSQL, Micro Services, Continuous Streaming]
  • 271. DO YOU HAVE CASES THAT REQUIRE PROCESSING MORE THAN ONE MILLION MESSAGES (PER SECOND)? • [Diagram: use cases arranged between BATCH and STREAM] • Fraud Prevention • CEP – Complex Event Processing • Spam Prevention • Network Anomaly Detection • Predictive Maintenance • Deep Packet Inspection • File Processing • CRM Import • ERP Data • Social Media • Internet of Everything • Image Recognition
  • 272. DO YOUR OPERATIONAL SYSTEMS DELIVER STREAMING DATA?
  • 273. REAL TIME ANALYTICS • Anomaly Detection: detect outliers (= observations that deviate markedly from the others) • Distance or density based => clustering • Spark is used to train the models, but it’s too slow for production • Flink natively supports streams, but has fewer algorithms than Spark
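A minimal sketch of one distance-based rule, in plain Scala rather than Spark or Flink: a point counts as an outlier when fewer than `minNeighbours` other points lie within `radius` of it. The rule, the parameter names and the `OutlierSketch` object are assumptions chosen for illustration; the slide does not say which algorithm the speakers used.

```scala
// Distance-based outlier detection, brute force (quadratic in the number
// of points). Names and thresholds are hypothetical, for illustration only.
object OutlierSketch {
  type Point = Vector[Double]

  // Euclidean distance between two points of equal dimension.
  def distance(a: Point, b: Point): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Number of other points within `radius` of the point at `index`.
  def neighbourCount(points: Seq[Point], index: Int, radius: Double): Int =
    points.indices.count(j => j != index && distance(points(index), points(j)) <= radius)

  // Points with fewer than `minNeighbours` neighbours are flagged.
  def outliers(points: Seq[Point], radius: Double, minNeighbours: Int): Seq[Point] =
    points.indices
      .filter(i => neighbourCount(points, i, radius) < minNeighbours)
      .map(points)
}
```

Density-based methods refine the same idea by comparing a point's local density with that of its neighbours instead of using one global radius.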
  • 274. SPARK & FLINK AT A GLANCE • Contact: • Bernhard Ortner, Sr. Data Engineer, bernhard.ortner@thinkbiganalytics.com, Mobile: +43 664 • Stefan Papp, Data Architect, stefan.papp@icloud.com, Mobile: +43 699 10209453
  • 275. Precision Search - The Internet Experience
  • 276. Precision Search – The Desktop Experience
  • 277. Precision Search - The Ultimate Experience
  • 278. Use Cases • Research • Detailed Analysis & Review • Litigation • Mergers & Acquisitions • Investigative Journalism • Better informed decisions • Faster, more confident results and process
  • 279. Where to start? “You don’t know what you don’t know!” “To really get answers out of your data you need the right questions!”
  • 280. Data Science & AI • Topic Clustering and Categorisation • Semantic Enhancement • Dictionary Enhancement • Pattern Matching • Sentiment • Inherent Structure
  • 281. Topic Clustering • Train model using real case documents • Model learns topics based on training data • Model can now identify topics in case documents • Model then predicts the probability of topics present in a document • Sample prediction: Abuse 60%, Neglect 25%, Dysfunctional Family 10%, Disability 5%
  • 284. Feral Data • What does this mean for GDPR?
  • 285. Feral Data • Pattern matching PII • Bank, credit card numbers • Passport numbers • Driving License details • Address, postcode, email • …
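Pattern matching for PII can be sketched with regular expressions in plain Scala. This is an illustration under stated assumptions, not the product's method: the two patterns, the object `PiiSketch` and its function names are invented here. Card-number candidates are additionally checked with the Luhn checksum to reduce false positives from arbitrary digit runs.

```scala
// Hypothetical PII matcher for illustration; the patterns are deliberately
// simple and would need tuning before use on real documents.
object PiiSketch {
  private val Email = """[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}""".r
  // 13 to 19 digits, optionally separated by single spaces or hyphens.
  private val CardCandidate = """\b\d(?:[ -]?\d){12,18}\b""".r

  // Luhn checksum: double every second digit from the right, subtract 9
  // from results above 9, and require the total to be divisible by 10.
  def luhnValid(digits: String): Boolean = {
    val total = digits.reverse.map(_.asDigit).zipWithIndex.map {
      case (d, i) if i % 2 == 1 =>
        val doubled = d * 2
        if (doubled > 9) doubled - 9 else doubled
      case (d, _) => d
    }.sum
    total % 10 == 0
  }

  def findEmails(text: String): List[String] =
    Email.findAllIn(text).toList

  // Returns matching card numbers with separators stripped.
  def findCardNumbers(text: String): List[String] =
    CardCandidate.findAllIn(text).map(_.filter(_.isDigit)).filter(luhnValid).toList
}
```

Passport numbers, driving licence details and postcodes follow country-specific formats, so each would need its own pattern and, where one exists, its own checksum.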
  • 286. GDPR – Dutch Example
  • 287. Thank You! More questions? Please visit us at our stand. Take the Nalytics challenge!