Hadoop Jungle
Alexey Zinovyev, Java/BigData Trainer at EPAM
About
I am a <graph theory, machine learning, traffic jam prediction, BigData algorithms> scientist
But I'm a <Java, Scala, NoSQL, Hadoop, Spark> programmer and trainer
Contacts
E-mail : Alexey_Zinovyev@epam.com
Twitter : @zaleslaw @BigDataRussia
vk.com/big_data_russia Big Data Russia
vk.com/java_jvm Java & JVM langs
Are you a Hadoop developer?
Let’s do THIS!
WHAT IS BIG DATA?
Joke about Excel
Every 60 seconds…
From Mobile Devices
We started to store and process stupid new things!
10^6 rows in MySQL
GB->TB->PB->?
Is BigData about PBs?
It’s hard to:
• store
• process
• search
• visualize
• send over the network
Just do it … in parallel
Scale-up vs scale-out
• Scale-up: 16 CPUs → 48 CPUs in one bigger machine
• Scale-out: 16 CPUs → 16 CPUs + 16 CPUs + 16 CPUs on separate machines
Motivation: Fault tolerance
You need to write:
• a distributed on-disk storage
• an in-memory storage (or shared memory buffer)
• a thread pool to run hundreds of threads
• code that synchronizes all components
• an API that other developers can reuse
We all love reinventing the wheel, but…
HADOOP
Hadoop
Disk Performance
The main concept
Let’s read data in parallel
“Cheap” cluster
It must survive
Parallel Computing vs Distributed Computing
Hadoop and its workers: send me your little code
Hadoop Jobs
Main components
• Hadoop Common
• Hadoop Clients
• HDFS
• YARN
• MapReduce
Hadoop frameworks
• Universal (MapReduce, Tez, Kudu, RDD in Spark)
• Abstract (Pig, Pipeline Spark)
• SQL-like (Hive, Impala, Spark SQL)
• Graph processing (Giraph, GraphX)
• Machine Learning (Mahout, MLlib)
• Stream processing (Spark Streaming, Storm)
Hadoop Architecture
Key features
• Automatic parallelization and distribution
• Fault tolerance
• Data locality
• You write only the Map and Reduce functions
• Single-threaded model
HDFS DAEMONS
The main idea
'Time to transfer' > 'Time to seek'
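This rule is what pushes HDFS toward large blocks. A back-of-the-envelope sketch, assuming illustrative disk figures of a 10 ms average seek and a 100 MB/s transfer rate (not measurements): to keep seeking at about 1% of the transfer time, a block must take about 1 s to stream, i.e. roughly 100 MB. The class name BlockSizeEstimate is hypothetical.

```java
// Back-of-the-envelope block sizing: choose a block size so that seeking is only a
// small fraction of the time spent streaming the block. Disk figures are assumptions.
final class BlockSizeEstimate {

    /**
     * @param seekMs         average seek time in milliseconds
     * @param transferMbPerS sustained transfer rate in MB/s
     * @param seekFraction   target ratio seekTime / transferTime (0.01 = 1%)
     * @return block size in MB that meets the target ratio
     */
    static double blockSizeMb(double seekMs, double transferMbPerS, double seekFraction) {
        // transferTime must be seekTime / seekFraction
        double transferSeconds = (seekMs / 1000.0) / seekFraction;
        return transferSeconds * transferMbPerS;
    }

    /** Share of the total read time of one block that is spent seeking. */
    static double seekShare(double seekMs, double transferMbPerS, double blockMb) {
        double seekS = seekMs / 1000.0;
        double transferS = blockMb / transferMbPerS;
        return seekS / (seekS + transferS);
    }

    public static void main(String[] args) {
        System.out.printf("block size for 1%% seek overhead: %.0f MB%n", blockSizeMb(10, 100, 0.01));
        // Tiny blocks: seeks dominate. Large blocks: seeks are amortized.
        System.out.printf("seek share, ~4 KB blocks:  %.3f%n", seekShare(10, 100, 0.004));
        System.out.printf("seek share, 128 MB blocks: %.4f%n", seekShare(10, 100, 128));
    }
}
```

With these figures a 4 KB read spends over 99% of its time seeking, while a 128 MB block (the HDFS default in Hadoop 2) spends under 1%.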
Main idea
Files in HDFS
HDFS node types
• NameNode
• DataNode
• Secondary NameNode (not for HA)
• Standby NameNode
• Checkpoint Node
• Backup Node
The main thought about HDFS
An HDFS node is a JVM daemon
You can do this with an HDFS node:
• monitor it with JMX
• use jmap, jps, and similar tools
• configure the NameNode heap size
• use the power of JVM flags
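Because an HDFS daemon is an ordinary JVM, the standard platform MXBeans that JMX clients such as jconsole or VisualVM read from a NameNode or DataNode exist in every Java process. A minimal sketch, run against the current JVM rather than a real daemon (the class name JvmSnapshot is hypothetical), reading the heap, GC, and launch-flag figures you would typically watch:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.List;

// Reads the same platform MXBeans a remote JMX client would query on a
// NameNode/DataNode; here they are read locally from the current JVM.
final class JvmSnapshot {

    /** Used heap as a fraction of the maximum heap (-Xmx), or -1 if the max is undefined. */
    static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when undefined
        return max > 0 ? (double) heap.getUsed() / max : -1.0;
    }

    /** Flags the JVM was started with, e.g. -Xmx or -XX:+PrintGCDetails. */
    static List<String> jvmFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %.1f%% of max%n", 100 * heapUsedFraction());
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("GC %s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("JVM flags: " + jvmFlags());
    }
}
```

For a running daemon, the same beans can be read remotely over JMX; Hadoop daemons also expose them as JSON at the /jmx path of their web UI, which is often the simpler route for scripts.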
MAPREDUCE THEORY
MapReduce in different languages
MR Typical Tasks
• WordCount
• Log handling
• Filtering
• Report preparation
Why should we use MapReduce?
We try to reduce the ‘time to act’ while still keeping Big Data
Classic Batch
Do you like batches?
Fixed Windows
Filter all elements
Yet another YARN lover
Pay attention!
Hadoop != MapReduce
Think in Key-Value style
map (k1, v1) → list(k2, v2)
reduce (k2, list(v2)) → list(k3, v3)
Main steps
• Map
• Shuffle
• Reduce
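To make the key-value contract and the three steps concrete, here is a single-process WordCount sketch in plain Java (no Hadoop classes; LocalWordCount and its methods are illustrative names). map emits (word, 1) pairs, the shuffle groups values by key in sorted key order, as Hadoop does before the reduce phase, and reduce sums each group:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of map -> shuffle -> reduce for WordCount, mirroring
// map(k1, v1) -> list(k2, v2) and reduce(k2, list(v2)) -> list(k3, v3). Not Hadoop code.
final class LocalWordCount {

    /** map(lineOffset, line) -> list((word, 1)) */
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    /** shuffle: group values by key; TreeMap keeps keys sorted, like Hadoop's sort phase. */
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    /** reduce(word, list(counts)) -> sum */
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            mapped.addAll(map(offset, line));
            offset += line.length() + 1; // offset of the next line (TextInputFormat keys are offsets)
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        shuffle(mapped).forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("cat dog", "dog bird dog"))); // {bird=1, cat=1, dog=3}
    }
}
```

In a real job, map and reduce run in separate tasks on different machines, and the shuffle moves data over the network; that transfer is exactly what the next slide asks you to minimize.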
The main performance idea
Reduce shuffle time & resources
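Minimal Runner
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// No Mapper or Reducer is set, so the job runs with the identity Mapper and
// Reducer and the default TextInputFormat / TextOutputFormat.
public class MinimalMapReduce extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    Job job = Job.getInstance(getConf()); // replaces the deprecated new Job(conf)
    job.setJarByClass(getClass());
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input file or directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner parses generic options (-D, -files, -libjars) into getConf()
    int exitCode = ToolRunner.run(new MinimalMapReduce(), args);
    System.exit(exitCode);
  }
}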
Job Config
// The defaults the minimal job above runs with, spelled out explicitly
job.setInputFormatClass(TextInputFormat.class);
job.setMapperClass(Mapper.class);          // identity mapper
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);
job.setPartitionerClass(HashPartitioner.class);
job.setNumReduceTasks(1);
job.setReducerClass(Reducer.class);        // identity reducer
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);
job.setOutputFormatClass(TextOutputFormat.class);
Customize MapReduce!
Hadoop club rules
MAPREDUCE ADVANCED
MapReduce for WordCount
WordCount Combiner
Combine
Combiner as a Local Reducer
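The payoff of a combiner is fewer records crossing the network in the shuffle. A plain-Java sketch (CombinerDemo is a hypothetical name, no Hadoop API) that simulates two map tasks for WordCount, runs the reducer's summing logic locally on each task's output, and counts how many (key, value) pairs would be shuffled with and without it. Reusing the reducer this way is only safe because summing is commutative and associative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulates WordCount map tasks and compares shuffle volume with and without
// a combiner (a local reducer applied to each map task's output).
final class CombinerDemo {

    /** Map output of one task: one (word, 1) pair per word. */
    static List<Map.Entry<String, Integer>> mapTask(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : split.split("\\s+")) {
            if (!w.isEmpty()) {
                out.add(Map.entry(w, 1));
            }
        }
        return out;
    }

    /** Combiner: the reducer's summing logic, applied to a single task's output. */
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> mapOut) {
        Map<String, Integer> partial = new HashMap<>();
        for (Map.Entry<String, Integer> e : mapOut) {
            partial.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return new ArrayList<>(partial.entrySet());
    }

    /** Number of pairs that would be sent over the network to the reducers. */
    static int shuffledPairs(List<String> splits, boolean useCombiner) {
        int total = 0;
        for (String split : splits) {
            List<Map.Entry<String, Integer>> out = mapTask(split);
            total += (useCombiner ? combine(out) : out).size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("to be or not to be", "be quick be quiet");
        System.out.println("without combiner: " + shuffledPairs(splits, false)); // 10
        System.out.println("with combiner:    " + shuffledPairs(splits, true));  // 7
    }
}
```

Note that Hadoop treats the combiner as an optimization: it may run zero, one, or several times, so the job must produce the same result whether or not it runs.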
Can we do something around the map() and reduce() calls?
Setup
/**
 * Called once at the start of the task.
 */
protected void setup(Context context) throws IOException, InterruptedException {
  // Prepare something for each Mapper or Reducer
  // Validate external sources
}
Cleanup
/**
 * Called once at the end of the task.
 */
protected void cleanup(Context context) throws IOException, InterruptedException {
  // Finish something after each Mapper or Reducer
  // Handle specific exceptions
}
Full control
Run Mapper
/**
 * Expert users can override this method for more complete control over the
 * execution of the Mapper.
 * @param context
 * @throws IOException
 */
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  try {
    while (context.nextKeyValue()) {
      map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
  } finally {
    cleanup(context);
  }
}
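The run() template (setup once, map per record, cleanup in a finally block) is what enables the "in-mapper combining" pattern: allocate state in setup(), aggregate in map(), and emit once per key in cleanup(). A plain-Java imitation of that lifecycle; Emitter, SimpleMapper, and InMapperWordCount are hypothetical stand-ins for Hadoop's Context and Mapper, not the real API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Imitates the Mapper lifecycle to show in-mapper combining.
final class InMapperCombiningDemo {

    /** Stand-in for Context.write(key, value). */
    interface Emitter {
        void write(String key, int value);
    }

    /** Mirrors Mapper.run(): setup() once, map() per record, cleanup() in finally. */
    abstract static class SimpleMapper {
        protected void setup(Emitter out) {}
        protected abstract void map(long recordIndex, String value, Emitter out);
        protected void cleanup(Emitter out) {}

        final void run(List<String> records, Emitter out) {
            setup(out);
            try {
                long recordIndex = 0;
                for (String r : records) {
                    map(recordIndex++, r, out);
                }
            } finally {
                cleanup(out);
            }
        }
    }

    /** Aggregates counts in map() and emits one pair per distinct word in cleanup(). */
    static final class InMapperWordCount extends SimpleMapper {
        private Map<String, Integer> counts;

        @Override
        protected void setup(Emitter out) {
            counts = new HashMap<>(); // per-task state, created once
        }

        @Override
        protected void map(long recordIndex, String value, Emitter out) {
            for (String w : value.split("\\s+")) {
                if (!w.isEmpty()) {
                    counts.merge(w, 1, Integer::sum); // no write() per word
                }
            }
        }

        @Override
        protected void cleanup(Emitter out) {
            counts.forEach((word, count) -> out.write(word, count));
        }
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        new InMapperWordCount().run(List.of("a b a", "b a"), (k, v) -> emitted.add(k + "=" + v));
        System.out.println(emitted); // a=3 and b=2, in HashMap iteration order
    }
}
```

Compared with a separate combiner, this guarantees the local aggregation happens, at the cost of holding one entry per distinct key in the task's memory.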
Run Reducer
/**
 * Advanced application writers can use the
 * {@link #run(*.Reducer.Context)} method to
 * control how the reduce task works.
 */
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  try {
    while (context.nextKey()) {
      reduce(context.getCurrentKey(), context.getValues(), context);
      // If a back up store is used, reset it
      Iterator<VALUEIN> iter = context.getValues().iterator();
      if (iter instanceof ReduceContext.ValueIterator) {
        ((ReduceContext.ValueIterator<VALUEIN>) iter).resetBackupStore();
      }
    }
  } finally {
    cleanup(context);
  }
}
MR APPROACHES
How many Reducers do we need?
Data Flow with single Reducer
One Reducer
• receives all keys in sorted order
• its output is completely sorted
• can take a long time to run
• can throw OutOfMemoryError
Data Flow with multiple Reducers
How many Reducers should we use?
Ideal case: data grouped by day of week
• The key is the weekday
• Seven Reducers are specified
• A custom Partitioner sends each key to its own Reducer
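A sketch of the partitioning rule for the weekday case, written as a plain Java function with the same shape as getPartition(key, value, numReduceTasks). WeekdayPartitioning is a hypothetical name, and java.time.DayOfWeek stands in for whatever Writable key a real job would use; in Hadoop this logic would live in a Partitioner subclass registered with job.setPartitionerClass(...) alongside job.setNumReduceTasks(7):

```java
import java.time.DayOfWeek;

// Routes each weekday to its own reducer: MONDAY -> 0, ..., SUNDAY -> 6.
// Falls back to modulo when fewer reducers are configured, so the result
// always stays in [0, numReduceTasks - 1].
final class WeekdayPartitioning {

    static int getPartition(DayOfWeek key, String value, int numReduceTasks) {
        if (numReduceTasks <= 0) {
            throw new IllegalArgumentException("numReduceTasks must be positive");
        }
        int index = key.getValue() - 1; // getValue() is 1 (Monday) .. 7 (Sunday)
        return index % numReduceTasks;
    }

    public static void main(String[] args) {
        for (DayOfWeek day : DayOfWeek.values()) {
            System.out.println(day + " -> reducer " + getPartition(day, "record", 7));
        }
    }
}
```

With seven reducers, each output file then holds exactly one day. The trade-off is skew: if one day carries most of the data, its reducer becomes the bottleneck.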
Could we skip the shuffle step?
Map-Only ‘MapReduce’ Jobs
job.setNumReduceTasks(0): no shuffle, and map output is written directly by the OutputFormat
HADOOP ADVANCED
The main concept of Distributed Cache
Share common data
How to implement?
• From the command line via ToolRunner: the -files, -archives, and -libjars options
• In code: Job.addCacheFile(URI uriToFileInHdfs);
• Job.addCacheArchive(URI uriToArchiveFileInHdfs);
• Job.addArchiveToClassPath(Path pathToJarInHdfs);
How to get data from the cache?
To list the contents of the Distributed Cache (inside a task, e.g. in setup()):
// getLocalCacheFiles() is deprecated since Hadoop 2; context.getCacheFiles() returns URI[]
Path[] paths = context.getLocalCacheFiles();
Then check each path to pick up your file:
for (Path p : paths) {
  if (p.getName().toUpperCase().contains("TARGET")) {
    ….
  }
}
Counters usage
Counters
Can I customize the data flow before shuffling?
HashPartitioner just does this:
public class HashPartitioner<K2, V2> extends Partitioner<K2, V2> {
public int getPartition(K2 key, V2 value, int numReduceTasks) {
return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}
}
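Why `& Integer.MAX_VALUE` rather than Math.abs? hashCode() can be negative, and Math.abs(Integer.MIN_VALUE) is still negative, so a naive version can return an invalid partition number. A small plain-Java check (PartitionMath is a hypothetical name) comparing the two:

```java
// Compares HashPartitioner's masking arithmetic with a naive Math.abs variant.
final class PartitionMath {

    /** Same arithmetic as HashPartitioner.getPartition: clear the sign bit, then modulo. */
    static int masked(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Naive variant: Math.abs(Integer.MIN_VALUE) overflows and stays negative. */
    static int naiveAbs(int hash, int numReduceTasks) {
        return Math.abs(hash) % numReduceTasks;
    }

    public static void main(String[] args) {
        int hash = Integer.valueOf(Integer.MIN_VALUE).hashCode(); // Integer's hash is its value
        System.out.println("masked:   " + masked(hash, 7));   // 0
        System.out.println("naiveAbs: " + naiveAbs(hash, 7)); // -2, not a valid partition
    }
}
```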
Partitioner’s Role in Shuffle and Sort
How to customize?
• Use it when you want to route records by another rule (e.g. a hash of the value)
• Secondary sort also relies on a custom Partitioner
• Extend the abstract Partitioner class
• Implement getPartition(key, value, number of Reducers)
• Return an int in [0, number of Reducers - 1]
• Don’t forget job.setPartitionerClass(MyPartitioner.class);
Full power
JOINS
Reduce-side JOIN
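In a reduce-side join, each mapper tags its record with the table it came from and emits the join key; the shuffle brings all records with the same key to one reduce() call, which pairs them up. A single-process sketch using Employees/Department-style rows (the same tables as the SQL query on a later slide); the classes, tags, and sample rows are hypothetical, and the shuffle is simulated with a TreeMap:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Reduce-side (repartition) join simulated in one process; not Hadoop code.
final class ReduceSideJoin {

    static final class Employee {
        final String name;
        final int age;
        final int deptId;

        Employee(String name, int age, int deptId) {
            this.name = name;
            this.age = age;
            this.deptId = deptId;
        }
    }

    static final class Department {
        final int deptId;
        final String name;

        Department(int deptId, String name) {
            this.deptId = deptId;
            this.name = name;
        }
    }

    /** Map output value: a row tagged with the table it came from. */
    static final class Tagged {
        final char source; // 'E' = Employees, 'D' = Department
        final Object row;

        Tagged(char source, Object row) {
            this.source = source;
            this.row = row;
        }
    }

    static List<String> join(List<Employee> employees, List<Department> departments) {
        // Map phase: both inputs emit (Dept_Id, tagged row); the TreeMap plays the shuffle.
        TreeMap<Integer, List<Tagged>> shuffled = new TreeMap<>();
        for (Employee e : employees) {
            shuffled.computeIfAbsent(e.deptId, k -> new ArrayList<>()).add(new Tagged('E', e));
        }
        for (Department d : departments) {
            shuffled.computeIfAbsent(d.deptId, k -> new ArrayList<>()).add(new Tagged('D', d));
        }

        // Reduce phase: one call per Dept_Id; split values by tag, emit the cross product.
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<Tagged>> group : shuffled.entrySet()) {
            List<Employee> emps = new ArrayList<>();
            List<Department> depts = new ArrayList<>();
            for (Tagged t : group.getValue()) {
                if (t.source == 'E') {
                    emps.add((Employee) t.row);
                } else {
                    depts.add((Department) t.row);
                }
            }
            for (Employee e : emps) {
                for (Department d : depts) { // INNER JOIN: nothing if either side is empty
                    out.add(e.name + "," + e.age + "," + d.name);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Employee> emps = List.of(new Employee("Ann", 30, 1),
                new Employee("Bob", 41, 2), new Employee("Eve", 25, 3));
        List<Department> depts = List.of(new Department(1, "R&D"), new Department(2, "Sales"));
        join(emps, depts).forEach(System.out::println); // Eve has no department row
    }
}
```

This works for any table sizes, but every row of both tables goes through the shuffle, which is why the next slides look for ways to skip it.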
The main concept of JOIN
Let’s skip Reduce + Shuffle
Map-side join
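A map-side (replicated) join avoids the shuffle when one side is small: every mapper loads the small table into a hash map, typically in setup() from the Distributed Cache, and streams the large table through map(). A plain-Java sketch of that idea with hypothetical comma-separated rows; here loading from the cache is replaced by an in-memory list, and MapSideJoin is an illustrative name:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Map-side (replicated / broadcast hash) join simulated in memory.
final class MapSideJoin {

    private final Map<String, String> deptNameById = new HashMap<>();

    /** Plays the role of setup(): build a lookup table from small "deptId,deptName" rows. */
    void setup(List<String> smallTableRows) {
        for (String row : smallTableRows) {
            String[] f = row.split(",", 2);
            deptNameById.put(f[0], f[1]);
        }
    }

    /** Plays the role of map(): join one "name,age,deptId" row; no shuffle or reduce needed. */
    List<String> map(String employeeRow) {
        String[] f = employeeRow.split(",");
        String deptName = deptNameById.get(f[2]);
        List<String> out = new ArrayList<>();
        if (deptName != null) { // INNER JOIN semantics: drop unmatched rows
            out.add(f[0] + "," + f[1] + "," + deptName);
        }
        return out;
    }

    public static void main(String[] args) {
        MapSideJoin mapper = new MapSideJoin();
        mapper.setup(List.of("1,R&D", "2,Sales"));
        for (String emp : List.of("Ann,30,1", "Bob,41,2", "Eve,25,3")) {
            mapper.map(emp).forEach(System.out::println);
        }
    }
}
```

The limit is memory: the small table must fit in every mapper's heap, which is what motivates the later slides on really large tables.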
Map-side join for large datasets
What about really large tables?
SELECT Employees.Name, Employees.Age, Department.Name
FROM Employees INNER JOIN Department
ON Employees.Dept_Id = Department.Dept_Id
The main JOIN idea for large tables
A Redis or Memcached cluster as a Distributed Cache
PERFORMANCE
Dig to DAG
The main concept of DAG
Decompose your problem, build a DAG, skip intermediate HDFS writes
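"Build a DAG" means describing a job as stages plus dependencies and running them in a dependency-respecting order, as DAG engines like Tez and Spark do, passing data between stages without writing every intermediate result to HDFS. A minimal sketch of the ordering part only, using Kahn's topological sort over hypothetical stage names; it does not model data passing or scheduling:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Orders the stages of a job DAG so each stage runs after its inputs (Kahn's algorithm).
final class StageDag {

    /** @param deps stage -> stages it depends on; every stage must appear as a key */
    static List<String> executionOrder(Map<String, List<String>> deps) {
        Map<String, Integer> inDegree = new HashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            inDegree.put(e.getKey(), e.getValue().size());
            for (String upstream : e.getValue()) {
                dependents.computeIfAbsent(upstream, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((stage, d) -> {
            if (d == 0) {
                ready.add(stage); // stages with no inputs can start immediately
            }
        });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String stage = ready.poll();
            order.add(stage);
            for (String next : dependents.getOrDefault(stage, List.of())) {
                if (inDegree.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next); // all inputs of 'next' are done
                }
            }
        }
        if (order.size() != deps.size()) {
            throw new IllegalArgumentException("cycle detected: not a DAG");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("readLogs", List.of());
        deps.put("readUsers", List.of());
        deps.put("filter", List.of("readLogs"));
        deps.put("join", List.of("filter", "readUsers"));
        deps.put("aggregate", List.of("join"));
        System.out.println(executionOrder(deps));
    }
}
```

Independent stages (here readLogs and readUsers) can run in parallel; a chain of separate MapReduce jobs would instead write each intermediate result to HDFS.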
Let’s run on the JVM!
JVM SETTINGS
Typical mistakes
• Keeping large collections in memory
• Sorting all data in memory
• Logging every input key-value pair
• Too many JARs
• JVM issues
Performance tips
• Correct data storage (on the JVM)
• Don’t forget about the combiner
• Use the appropriate Writable type
• Use the minimum required replication factor
• Tune your JVM
• Think in terms of Big-O
JVM tuning
• mapred.child.java.opts (mapreduce.map.java.opts / mapreduce.reduce.java.opts in Hadoop 2)
• -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
• Low-latency GC collector: -XX:+UseConcMarkSweepGC, -XX:ParallelGCThreads
• -Xmx == -Xms (max and starting heap size)
JVM Reusing: Uber Task
TESTING & DEVELOPMENT
MRUnit idea
Do it separately
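Simple Test
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class MRUnitHelloWorld {
  // WordMapper is the WordCount mapper under test (not shown on the slide)
  MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

  @Before
  public void setUp() {
    WordMapper mapper = new WordMapper();
    mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
    mapDriver.setMapper(mapper);
  }

  @Test
  public void testMapper() throws IOException {
    mapDriver.withInput(new LongWritable(1), new Text("cat dog"));
    // Expected output pairs, in order
    mapDriver.withOutput(new Text("cat"), new IntWritable(1));
    mapDriver.withOutput(new Text("dog"), new IntWritable(1));
    mapDriver.runTest();
  }
}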
Testing strategies
• First develop/test in local mode using a small amount of data
• Test in pseudo-distributed mode with more data
• Test in fully distributed mode with even more data
• Final execution: fully distributed mode & all data
And we can DO IT!
[Architecture diagram. Sources: Internal, External, Social. Ingestion: real-time data ingestion and batch data ingestion. Processing: real-time ETL & CEP; batch ETL & raw area, with a scheduler. Storage and serving: real-time data marts, batch data marts, relations graph, ontology metadata, search index, time-series data, events & alarms. Outputs: real-time dashboarding; push events & alarms (email, SNMP, etc.). Notes: all raw data backup is stored in HDFS (CFS as an option); Titan & KairosDB store data in Cassandra.]
It reminds me …
Infrastructure issues are waiting for YOU!
Hadoop 3: Roadmap
• Move to Java 8
• Support more than 2 NameNodes (multiple standby NameNodes)
• Derive heap size or mapreduce.*.memory.mb automatically
• Work with SSD, RAM, HDD, CPU as resources for YARN
• Support Docker containers
MapReduce is not an ideal approach! But it works!
