Table of Contents
1. Abstract
2. Architecture
3. Tools
4. Configurations
5. Code Snippets
6. Screenshots
7. References
Table of Figures
1 Log Analysis using Kafka Streaming
2 Log Analysis Web Page with Statistics
3 Top Endpoints
4 Frequent IP Addresses
5 Frequent IP Addresses Last Window
6 Spark Environment
7 Spark Jobs triggered during execution
8 RDD Storage
9 Streaming Statistics
10 Streaming Statistics after burst input
Abstract
This project analyzes logs streamed into Spark using Kafka. It provides an interactive web page showing the number of logs streamed overall and in the last time window, response code counts, frequent IP addresses, and the top endpoints by request frequency.
Design and Architecture:
Figure 1: Log Analysis using Kafka Streaming
Tools Used:
● Scala 2.10
● Java 8
● Apache Spark 1.5.2
● Apache Kafka 0.8.2.0 (Scala 2.10 build, kafka_2.10-0.8.2.0)
● Ubuntu Linux Server
Configurations:
Setting up a Multi-Broker Kafka Cluster:
Start ZooKeeper
Kafka ships with a reasonable default ZooKeeper configuration for our simple use case. The
following command launches a local ZooKeeper instance.
bin/zookeeper-server-start.sh config/zookeeper.properties
Note: By default, the ZooKeeper server listens on *:2181/tcp.
Configure and start the Kafka brokers
We will create two Kafka brokers whose configurations are based on the default config/server.properties. Apart from the settings below, the brokers' configurations are identical.
The first broker:
Create the config file for broker 1
cp config/server.properties config/server1.properties
Edit config/server1.properties and replace the existing config values as follows:
broker.id=1
port=9092
log.dir=/tmp/kafka-logs-1
The second broker:
Create the config file for broker 2
cp config/server.properties config/server2.properties
Edit config/server2.properties and replace the existing config values as follows:
broker.id=2
port=9093
log.dir=/tmp/kafka-logs-2
Now you can start each Kafka broker in a separate console:
Start the first broker in its own terminal session:
bin/kafka-server-start.sh config/server1.properties
Start the second broker in its own terminal session:
bin/kafka-server-start.sh config/server2.properties
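Optionally, verify that both brokers have registered with ZooKeeper. A minimal check, assuming the zookeeper-shell.sh utility that ships with Kafka:
bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids
Once both brokers are up, the output should list the broker ids [1, 2].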
Create a Kafka topic:
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic topicOne --partitions 3 --replication-factor 2
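To confirm the partition and replica assignment, the same topic tool can describe the topic:
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic topicOne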
Commands:
Start Zookeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
Start Kafka Server (Broker):
bin/kafka-server-start.sh config/server.properties
Create topics:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topicOne
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topicTwo
Start Producer:
bin/kafka-console-producer.sh --broker-list localhost:9091,localhost:9092 --topic topicOne
bin/kafka-console-producer.sh --broker-list localhost:9091,localhost:9092 --topic topicTwo
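In this project the producers carry web-server log lines. As an illustration only (access.log here is a hypothetical sample file, not part of the project), an existing log file can be piped into the console producer:
tail -f access.log | bin/kafka-console-producer.sh --broker-list localhost:9091,localhost:9092 --topic topicOne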
Spark command to execute KafkaLogAnalyzerApplication:
Note: Locate the jar file according to the project hierarchy.
bin/spark-submit --class "com.cs696.bigdata.loganalyzer.KafkaLogAnalyzerApplication" --master local[20] projectFinal/app/java8/target/uber-log-analysis-1.0.jar --output_html_file /tmp/log_stats.html
Code Snippets:
Integrating Apache Kafka with the Log Analyzer Application:
// We stream the logs in through Apache Kafka using multiple brokers, which are
// configured in the producer.properties file under the config directory.
HashSet<String> topicsSet = new HashSet<String>(
    Arrays.asList(LogAnalyzerFlags.getInstance().getTopics().split(",")));
HashMap<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", LogAnalyzerFlags.getInstance().getBrokers());

// Create a pair input DStream (a direct Kafka stream) from the brokers and topics.
JavaPairInputDStream<String, String> logRecords = KafkaUtils.createDirectStream(
    jssc,                // JavaStreamingContext
    String.class,        // key class
    String.class,        // value class
    StringDecoder.class, // key decoder
    StringDecoder.class, // value decoder
    kafkaParams,
    topicsSet
);
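As an illustration of how the resulting DStream can feed the statistics shown on the web page, here is a minimal sketch (hypothetical, not the project's actual code) that counts response codes per batch, assuming Apache-style access log lines in which the HTTP status code is the ninth whitespace-separated field:

// Minimal sketch (hypothetical): per-batch response-code counts from the stream.
// Requires: import org.apache.spark.streaming.api.java.JavaDStream;
//           import org.apache.spark.streaming.api.java.JavaPairDStream;
//           import scala.Tuple2;
JavaDStream<String> logLines = logRecords.map(record -> record._2());

JavaPairDStream<String, Long> responseCodeCounts = logLines
    .mapToPair(line -> {
        String[] fields = line.split(" ");
        // In the common Apache log format the status code is field 9.
        String code = fields.length > 8 ? fields[8] : "unknown";
        return new Tuple2<>(code, 1L);
    })
    .reduceByKey((a, b) -> a + b);

responseCodeCounts.print(); // Prints each batch's counts to the driver console.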
Screenshots:
Figure 2: Log Analysis Web Page with Statistics
Figure 3: Top Endpoints
Figure 4: Frequent IP Addresses
Figure 5: Frequent IP Addresses Last Window
Figure 6: Spark Environment
Figure 7: Spark Jobs triggered during execution
Figure 8: RDD Storage
Figure 9: Streaming Statistics
Figure 10: Streaming Statistics after burst input
References:
● http://spark.apache.org/docs/latest/streaming-kafka-integration.html
● http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-cluster-on-a-single-node/
● https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter1/streaming.html
