Alex Johnson
alex (at) white.net
@alex_cestrian
USING SERVER LOGS TO YOUR ADVANTAGE
@alex_cestrian #OptimiseOxford
What are server logs?
A server log is a simple text file that records activity on a server.
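To show what one of those records looks like, here is a minimal sketch that assumes the Apache/Nginx "combined" access log format (your server may be configured differently) and splits a line into named fields with a regular expression. The `parse_line` helper and the sample line are illustrative, not taken from the talk.

```python
import re
from typing import Optional

# Assumed layout: the "combined" log format used by many Apache/Nginx setups.
# host ident user [time] "METHOD URL PROTOCOL" status bytes "referrer" "user agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one log line as a dict, or None if it doesn't match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])  # status code as a number
    return fields


sample = (
    '66.249.66.1 - - [10/Oct/2016:13:55:36 +0100] '
    '"GET /summer-2016-offers/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
print(parse_line(sample))
```

Each parsed line gives you the URL requested, the status code returned and the user agent that asked for it, which is everything the scenarios below rely on.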
So why bother looking at server logs?
There is only one resource that tells you what search engines are looking for on a domain, including pages they found 13 years ago: your web server logs.
How do we analyse all that data?
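A first step in any tool is to keep only search engine bot events and discard ordinary visitor traffic. A rough sketch of that filter, assuming each event has already been reduced to a `(url, user_agent)` pair, is to look for known crawler tokens in the user-agent string. The token list and the `bot_name`/`bot_events` helpers are illustrative assumptions. User-agent strings can be faked, which is why Google documents verifying real Googlebot traffic with a reverse DNS lookup; this sketch does not do that.

```python
from collections import Counter

# Illustrative, non-exhaustive tokens that identify search engine crawlers.
BOT_TOKENS = ("Googlebot", "bingbot")


def bot_name(user_agent: str):
    """Return the first bot token found in a user-agent string, or None."""
    for token in BOT_TOKENS:
        if token.lower() in user_agent.lower():
            return token
    return None


def bot_events(events):
    """Keep only (url, user_agent) events made by a known search engine bot."""
    return [(url, ua) for url, ua in events if bot_name(ua) is not None]


events = [
    ("/", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/54.0"),
    ("/dresses/", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("/skirts/", "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"),
]
kept = bot_events(events)
print(Counter(bot_name(ua) for _, ua in kept))
```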
2 SCENARIOS
Scenario 1
IDENTIFY ORPHAN PAGES
An orphan is a page that is not linked to from any other page on the site.
[Diagram: Homepage linking to Dresses, Skirts and Our offers, with a "Summer 2016 offers" page; the next slide singles out "Summer 2016 Offers" as the example orphan page.]
Why are orphan pages bad?
• There may be a lot of them, and they may be
competing with your ‘live’ content
• They waste GoogleBot’s crawl budget for your
domain
So how do we find orphan pages using log files?
Upload a crawl of your website (from Screaming Frog, DeepCrawl, etc.)
Look for URLs that return a 200 ✅ status code in the logs but don’t appear in the crawl of your site: these are your orphan pages.
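The comparison above is a set difference. A minimal sketch, assuming you have already exported the bot hits from your logs as `(url, status)` pairs and a list of URLs from your crawler; `find_orphans` and `normalise` are hypothetical helpers, and reducing URLs to their path assumes a single host (`www.example.com` is a placeholder).

```python
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Reduce a URL to path (plus query) so log entries and crawl entries compare equally."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


def find_orphans(log_hits, crawled_urls):
    """Return URLs that bots fetched with a 200 status but the site crawl never found.

    log_hits: iterable of (url, status) pairs from bot events in the logs.
    crawled_urls: iterable of URLs exported from a site crawler.
    """
    crawled = {normalise(u) for u in crawled_urls}
    live_in_logs = {normalise(url) for url, status in log_hits if status == 200}
    return sorted(live_in_logs - crawled)


log_hits = [
    ("/", 200),
    ("/dresses/", 200),
    ("/summer-2016-offers/", 200),
    ("/old-page/", 404),
]
crawl = ["https://www.example.com/", "https://www.example.com/dresses/"]
print(find_orphans(log_hits, crawl))  # ['/summer-2016-offers/']
```

Only 200 responses are kept because URLs that already return a 404 or a redirect are being handled; the live-but-unlinked ones are the ones that need a decision.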
• Redundant content, of little value: return a 404/410 status code
• Relevant, valuable but out-of-date: 301 redirect to a relevant live page
• Useful content that was orphaned accidentally: re-attach the page to the website
If GoogleBot is wasting lots of time in a specific folder full of orphan pages that hold no value, block GoogleBot from that folder via robots.txt.
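Before deploying a rule like that, you can sanity-check it with Python's standard `urllib.robotparser`. The `/old-offers/` folder is a hypothetical example. This parser implements basic prefix matching and does not reproduce every detail of Google's own matching (it ignores `*` and `$` wildcards, for instance), so treat it as a quick check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule blocking a folder of low-value orphan pages for Googlebot.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /old-offers/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text instead of fetching a URL

for url in ("https://www.example.com/old-offers/summer-2014/",
            "https://www.example.com/dresses/"):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(verdict, url)
```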
Scenario 2
IMPROVING CRAWL EFFICIENCY
Find where GoogleBot is wasting time
Find parameter-driven pages
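Parameter-driven URLs can be grouped by the set of query parameters they carry, so repeated combinations stand out. This sketch assumes you already have the list of URLs GoogleBot requested; `parameter_combinations` is a hypothetical helper, and `replytocom` (a WordPress comment-reply parameter) and the `colour`/`size` filters are example data only.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def parameter_combinations(bot_urls):
    """Count bot hits for each distinct set of query-parameter names."""
    combos = Counter()
    for url in bot_urls:
        query = urlsplit(url).query
        if not query:
            continue  # no parameters, so not a parameter-driven URL
        # Sort the names so ?a=1&b=2 and ?b=2&a=1 count as the same combination.
        names = tuple(sorted({name for name, _ in parse_qsl(query, keep_blank_values=True)}))
        combos[names] += 1
    return combos


googlebot_urls = [
    "/blog/post-1/?replytocom=12",
    "/blog/post-1/?replytocom=13",
    "/dresses/?colour=red&size=10",
    "/dresses/?size=12&colour=blue",
    "/dresses/",
]
for names, hits in parameter_combinations(googlebot_urls).most_common():
    print(hits, "&".join(names))
```

Grouping by parameter names rather than full URLs is a deliberate choice: one filter combination can generate thousands of distinct URLs, but it shows up here as a single line with a large count.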
Block GoogleBot from crawling these URLs
Find infrequently visited pages: order by number of events, low to high
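Sorting by event count has one blind spot: a page the bot never requested does not appear in the logs at all. A minimal sketch that merges in a list of known URLs (from a crawl or sitemap, assumed to be available) so those pages appear with 0 hits; `crawl_frequency` is a hypothetical helper and the URLs are example data.

```python
from collections import Counter


def crawl_frequency(bot_urls, known_urls):
    """Return (url, hits) pairs sorted from least to most crawled.

    known_urls is merged in so pages the bot never requested show up
    with 0 hits instead of being silently missing.
    """
    hits = Counter(bot_urls)
    for url in known_urls:
        hits.setdefault(url, 0)
    # Sort by hit count, then by URL so ties are listed predictably.
    return sorted(hits.items(), key=lambda item: (item[1], item[0]))


bot_urls = ["/", "/", "/dresses/", "/dresses/", "/dresses/", "/skirts/"]
site_urls = ["/", "/dresses/", "/skirts/", "/our-offers/"]
for url, n in crawl_frequency(bot_urls, site_urls):
    print(n, url)
```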
• Is this URL in the XML sitemap?
• Is the page too deep within the architecture?
• Is internal linking to this page optimal?
• Are links to this page travelling through multiple redirects?
• Can GoogleBot actually parse the links pointing to this page?
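The first question on that list is easy to automate. A minimal sketch using the standard library's `xml.etree.ElementTree` and the sitemaps.org namespace; `sitemap_urls` is a hypothetical helper, it reads a single `<urlset>` file (not a sitemap index), and the sitemap content and URLs are example data.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set:
    """Return every <loc> value in a <urlset> sitemap document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)}


SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/dresses/</loc></url>
</urlset>"""

# URLs that came out at the bottom of the crawl-frequency list (example data).
rarely_crawled = ["https://www.example.com/dresses/", "https://www.example.com/our-offers/"]
listed = sitemap_urls(SITEMAP)
for url in rarely_crawled:
    print(url, "in sitemap" if url in listed else "NOT in sitemap")
```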
Find slow-loading pages
Look at all URLs, and filter by average response time
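Note that response time is not part of the standard combined log format; it has to be added to the server's log configuration (for example Apache's `%D`, in microseconds, or Nginx's `$request_time`, in seconds). Assuming that is done and each bot event has been converted to a `(url, response_time_ms)` pair, ranking URLs by average time can be sketched as follows; `slowest_urls` is a hypothetical helper and the timings are example data.

```python
from collections import defaultdict
from statistics import mean


def slowest_urls(events, top=10):
    """Rank URLs by average response time in ms, slowest first.

    events: iterable of (url, response_time_ms) pairs from bot log lines.
    """
    times = defaultdict(list)
    for url, ms in events:
        times[url].append(ms)
    averages = {url: mean(values) for url, values in times.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:top]


events = [
    ("/", 180), ("/", 220),
    ("/dresses/", 950), ("/dresses/", 1100),
    ("/skirts/", 300),
]
for url, avg in slowest_urls(events):
    print(f"{avg:.0f} ms  {url}")
```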
If time taken is consistently high, you need to look at how you can reduce the page’s load time.
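An average can be dragged up by one unlucky request, so "consistently high" is worth checking separately. One way to sketch it, assuming the same `(url, response_time_ms)` pairs: flag a URL only when most of its bot requests exceed a threshold and it has enough hits to judge. The `consistently_slow` helper is hypothetical, and its 1000 ms threshold, 80% share and 3-hit minimum are arbitrary illustrations, not recommended values.

```python
from collections import defaultdict
from statistics import median


def consistently_slow(events, threshold_ms=1000, share=0.8, min_hits=3):
    """Return {url: median_ms} for URLs where at least `share` of requests exceed threshold_ms.

    min_hits skips URLs with too few requests to call the slowness "consistent".
    """
    times = defaultdict(list)
    for url, ms in events:
        times[url].append(ms)
    flagged = {}
    for url, values in times.items():
        if len(values) < min_hits:
            continue
        slow_share = sum(ms > threshold_ms for ms in values) / len(values)
        if slow_share >= share:
            flagged[url] = median(values)
    return flagged


events = [
    ("/dresses/", 1400), ("/dresses/", 1250), ("/dresses/", 1600), ("/dresses/", 1300),
    ("/skirts/", 200), ("/skirts/", 2500), ("/skirts/", 220),
    ("/", 1800),
]
print(consistently_slow(events))
```

Here `/skirts/` has one very slow request but is usually fast, and `/` has only a single hit, so only `/dresses/` is flagged.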
“See what GoogleBot is actually consuming. Improve GoogleBot’s diet.”
Oliver Mason at Brighton SEO 2016
THANK YOU
ALEX JOHNSON
@alex_cestrian

Editor's Notes

  • #4: I’m going to talk you through two scenarios where log files can help you.
  • #6: This is a raw server log file. Boring, isn’t it? So what do you do with this?
  • #10: Well, there are a few options, including tools like Botify and OnCrawl, but one of the most usable, affordable (and idiot-friendly) ones to come onto the market in the past few years is the Log File Analyser from Screaming Frog.
  • #11: It’s really easy to use: you can drag and drop your raw log files (or a zip file) directly into the program, and it sorts them into manageable sets of data.
  • #12: By default the Log File Analyser only analyses search engine bot events, so the ‘Store Bot Events Only (Improves Performance)’ box is ticked. We recommend keeping this setting ticked: storing and compiling only search bot events, rather than all event data from users and other user agents, massively reduces the processing time.
  • #13: And you end up with a pretty dashboard like this. Doing that alone isn’t going to solve anything, so I’m now going to show you…
  • #14: two actionable scenarios where log files can help you do your job.
  • #15: Let’s start with the basics: what is an orphan page?
  • #16: Some websites stop linking to old, expired content but don’t return the right status code for it (such as a 404, or a redirect to a newer version), so the expired page is still available.
  • #21: What do you do with orphan pages when you identify them?
  • #24: Look for large quantities of parameter-driven pages, and combinations of parameters. These are often areas where GoogleBot is losing time and wasting resources.
  • #25: One common example of this is on WordPress blogs. You’ll often find things like this in your log files:
  • #27: If you see category pages or main service pages at the top of this list, they need further investigation.
  • #28: Investigate why these pages haven’t been visited by search engines.
  • #30: Review each bot event for these URLs.
  • #31: Oliver Mason put this eloquently in his recent talk at the Brighton SEO conference:
  • #32: That’s just an overview of a few things you can do with log files. Once you start playing around and analysing the data, it’s really rather interesting.