Jamie Alberico | @Jammer_Volts | #TechSEOBoost
Watching Googlebot Watching You: Optimizing with Server Logs
All of us share a common goal: to be crawled, indexed, and ranked.
We spend a lot of time and energy figuring out how to do it better.
Hey! It’s me talking about how to pwn JS SEO. (That was rad.)
Aaaaaand here’s my Twitter feed two weeks later… #whompwhomp
Happy ending, but more fable than fairy tale.
Actions > Words.
What exactly is Googlebot crawling?
Things this report is not:
● solely traditional web pages
● details about which Googlebot is crawling
● just pages with 200 response codes
● reflective of how many unique pages are crawled
● bigger ≠ better
Does reality match our expectations?
https://twitter.com/JohnMu/status/856449976351825921
Was the crawl healthy?
"Googlebot is designed to be a good citizen of the web... For Googlebot a speedy site is a sign of healthy servers... If the site slows down or responds with server errors, the [crawl rate] limit goes down and Googlebot crawls less."
Politeness is job 0.
https://webmasters.googleblog.com/2017/01/what-crawl-budget-means-for-googlebot.html
Is it already too late?
https://twitter.com/JohnMu/status/1032553570468552704
Server logs are a record of every request a server receives.
How do I get logs?
Credit: https://flic.kr/p/cnorAf
Make new allies.
Ask: Is there already a log management platform in place?
Be clear: We do not want personally identifiable information (PII); request that it be removed.
BE SPECIFIC.
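Being specific can mean naming the exact fields or parameters to scrub. As a minimal sketch, assuming personal data leaks through query-string parameters, the Python below blanks out their values before a URI is shared. The parameter names in `SENSITIVE_PARAMS` are hypothetical placeholders, not a standard list.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that might carry personal data on your site.
SENSITIVE_PARAMS = {"email", "name", "phone", "token"}


def redact_uri(uri: str, sensitive: set = SENSITIVE_PARAMS) -> str:
    """Replace the values of sensitive query parameters with 'REDACTED'.

    Other parameters are kept, although urlencode may re-escape them.
    """
    parts = urlsplit(uri)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    cleaned = [(k, "REDACTED" if k.lower() in sensitive else v) for k, v in pairs]
    return urlunsplit(parts._replace(query=urlencode(cleaned)))


print(redact_uri("/account/confirm?email=jane%40example.com&step=2"))
```

Scrubbing before logs leave the server is usually easier to get approved than promising to clean them afterward.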
Logs can come from multiple places in your stack: CDN, DDoS Mitigation/Bot Manager, Load Balancer, Web Server 1, Web Server 2, Web Server 3.
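Once logs come from several layers, they need to be combined into one timeline before analysis. A minimal sketch, assuming each source has already been exported as time-sorted `(timestamp, line)` pairs using the bracketed timestamp layout of the sample log line shown later in this deck; the source names are hypothetical. `heapq.merge` interleaves the already-sorted streams, and timezone-aware parsing keeps servers that log in different offsets in the right order.

```python
import heapq
from datetime import datetime

# Timestamp layout of the sample log line, e.g. "[07/Mar/2018:16:11:58 -0800]".
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_ts(raw: str) -> datetime:
    """Parse a bracketed log timestamp into a timezone-aware datetime."""
    return datetime.strptime(raw.strip("[]"), TS_FORMAT)


def merge_sources(sources: dict) -> list:
    """Merge per-source lists of (timestamp, line) pairs into one time-ordered list.

    Each source list must already be sorted by time. Every output item is
    (datetime, source_name, line), so you always know where a hit came from.
    """
    streams = [
        [(parse_ts(ts), name, line) for ts, line in entries]
        for name, entries in sources.items()
    ]
    return list(heapq.merge(*streams, key=lambda item: item[0]))


# Hypothetical exports from two layers of the stack, logged in different time zones.
sample = {
    "cdn": [
        ("[07/Mar/2018:16:11:58 -0800]", "GET /en/"),
        ("[07/Mar/2018:16:12:30 -0800]", "GET /en/products/"),
    ],
    "web-server-1": [("[08/Mar/2018:00:12:00 +0000]", "GET /en/about")],
}
merged = merge_sources(sample)
for ts, source, line in merged:
    print(ts.isoformat(), source, line)
```

The same request can show up in both CDN and origin logs, so you may also need to de-duplicate before counting hits.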
Accessing Log Files
Apache (Linux Server)
NGINX (Linux Server)
IIS log files (Windows Server)
AWS Load Balancer (Load Balancer)
Google Cloud Load Balancer (Load Balancer)
AWS CloudFront (CDN)
Cloudflare log files (CDN)
Incapsula (CDN/DDoS Mitigation)
Akamai logs (CDN/DDoS Mitigation)
Log Source 1 → Aggregate Log Data → Validate Googlebot → Read Log Data (parse logs for meaningful search and analysis)
Some assembly required.
Ways to read logs
Paid: Botify, Logz.io, Sumo Logic, Splunk
Free(mium): Screaming Frog Log File Analyser, BigQuery
Masochistic: Excel, command line
216.150.168.131 emeasrvr003 [07/Mar/2018:16:11:58 -0800] 66.249.66.1 GET /twiki/bin/view/TWiki/WikiSyntax?q=ntoon HTTP/1.1 www.arrow.com 200 7352 616 - Mozilla/5.0+(Linux;+Android+6.0.1;+Nexus+5X+Build/MMB29P)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/41.0.2272.96+Mobile+Safari/537.36+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) https://www.arrow.com/en/ indiegogo

• Server IP: 216.150.168.131
• Server Name*: emeasrvr003
• Date & Time: [07/Mar/2018:16:11:58 -0800]
• Requester’s IP: 66.249.66.1
• Request Method: GET
• URI: /twiki/bin/view/TWiki/WikiSyntax?q=ntoon
• Hostname*: www.arrow.com
• Response Code: 200
• Response Size*: 7352
• Response Time: 616
• Requester’s User Agent: the long Mozilla/5.0 string containing Googlebot/2.1
• Referring URL*: https://www.arrow.com/en/
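To turn a line like this into fields you can query, a regular expression with named groups is one option. This is a sketch that assumes the exact field order of this sample line; real formats (Apache, NGINX, IIS/W3C, CDN exports) differ, so expect to adjust the pattern. The group names are illustrative, `unlabeled` holds the `-` token the slide doesn't annotate, and the trailing `indiegogo` token is simply ignored. The user agent is left `+`-encoded, because the literal `+` before `http` would be lost if every `+` were turned back into a space.

```python
import re
from typing import Optional

# Field order follows the annotated sample line. Your servers will almost
# certainly use a different layout, so treat this pattern as a starting point.
LOG_PATTERN = re.compile(
    r"(?P<server_ip>\S+) (?P<server_name>\S+) \[(?P<timestamp>[^\]]+)\] "
    r"(?P<client_ip>\S+) (?P<method>[A-Z]+) (?P<uri>\S+) (?P<protocol>\S+) "
    r"(?P<host>\S+) (?P<status>\d{3}) (?P<response_size>\d+) (?P<response_time>\d+) "
    r"(?P<unlabeled>\S+) (?P<user_agent>\S+) (?P<referrer>\S+)"
)


def parse_line(line: str) -> Optional[dict]:
    """Split one log line into named fields, or return None if it doesn't fit."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    for key in ("status", "response_size", "response_time"):
        record[key] = int(record[key])  # numeric fields become ints for sorting/summing
    return record


SAMPLE = (
    "216.150.168.131 emeasrvr003 [07/Mar/2018:16:11:58 -0800] 66.249.66.1 GET "
    "/twiki/bin/view/TWiki/WikiSyntax?q=ntoon HTTP/1.1 www.arrow.com 200 7352 616 - "
    "Mozilla/5.0+(Linux;+Android+6.0.1;+Nexus+5X+Build/MMB29P)+AppleWebKit/537.36+"
    "(KHTML,+like+Gecko)+Chrome/41.0.2272.96+Mobile+Safari/537.36+"
    "(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) "
    "https://www.arrow.com/en/ indiegogo"
)
print(parse_line(SAMPLE))
```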
Validate Googlebot IPs: Manual
https://support.google.com/webmasters/answer/80553?hl=en
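Google's help page linked above describes a two-step DNS check: a reverse lookup on the requesting IP, which should return a googlebot.com or google.com host, then a forward lookup on that host, which should return the same IP. A minimal Python sketch of that check for IPv4 follows. The resolvers are injectable so it can be demoed without network access, and the suffix list is an assumption to check against Google's current documentation.

```python
import socket

# Domains Google's verification guide has listed for Googlebot hosts; check the
# current documentation, since the accepted list can change.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for an IPv4 address that claims to be Googlebot."""
    try:
        hostname = reverse(ip)[0].rstrip(".")  # reverse (PTR) lookup
    except OSError:
        return False
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False  # the PTR record doesn't point at a Google domain
    try:
        addresses = forward(hostname)[2]  # forward (A) lookup of that hostname
    except OSError:
        return False
    return ip in addresses  # the forward lookup must lead back to the same IP


# Offline demo with stand-in resolvers; call is_verified_googlebot(ip) with no
# extra arguments to use real DNS.
def fake_reverse(ip):
    return ("crawl-66-249-66-1.googlebot.com", [], [ip])


def fake_forward(hostname):
    return (hostname, [], ["66.249.66.1"])


print(is_verified_googlebot("66.249.66.1", fake_reverse, fake_forward))
```

The forward step matters: anyone can set a PTR record for their own IP that claims to be googlebot.com, but they can't make Google's DNS resolve that name back to their address.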
Validate Googlebot IPs: Bulk with Script
https://dzone.com/articles/shell-script-to-detect-if-the-ip-address-is-google-1
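The linked article does bulk checks with a shell script. A different approach, swapped in here, is to compare each unique requester IP against the Googlebot IP ranges Google publishes as a JSON file. The sketch below assumes that file's `prefixes` / `ipv4Prefix` / `ipv6Prefix` shape; `SAMPLE_RANGES` is made-up demo data, not Google's real list, so load the current file instead. Deduplicating with `set()` means each IP is checked once, however many hits it made.

```python
import ipaddress
import json

# Illustrative sample only, NOT Google's real list. Download the current
# Googlebot ranges file and pass its text to load_networks instead.
SAMPLE_RANGES = json.dumps({
    "prefixes": [
        {"ipv4Prefix": "66.249.66.0/27"},
        {"ipv6Prefix": "2001:db8:4801::/48"},
    ]
})


def load_networks(ranges_json: str) -> list:
    """Turn a {"prefixes": [{"ipv4Prefix" or "ipv6Prefix": ...}]} document into networks."""
    networks = []
    for entry in json.loads(ranges_json).get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def classify_ips(ips, networks) -> dict:
    """Map each unique IP string to True if it falls inside any listed network."""
    results = {}
    for raw in set(ips):
        try:
            address = ipaddress.ip_address(raw)
        except ValueError:
            results[raw] = False  # not a valid IP at all
            continue
        # An IPv4 address is never "in" an IPv6 network, so mixed lists are safe.
        results[raw] = any(address in net for net in networks)
    return results


networks = load_networks(SAMPLE_RANGES)
print(classify_ips(["66.249.66.1", "203.0.113.9", "66.249.66.1"], networks))
```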
Validate Googlebot IPs: Log Analyser Functionality
Unlock logs in ≤ 6 lines:
• Data Source
• Condition
• Parse
• Aggregate
• Sort
• Limit
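Those six steps map closely onto a SQL query (FROM, WHERE, a parsing expression, GROUP BY, ORDER BY, LIMIT), which is one way you might write it in BigQuery. The same shape in plain Python, as a sketch over already-parsed records: the `uri` and `user_agent` keys are hypothetical field names, and matching the string "Googlebot" only checks what the requester claims, so validate IPs first.

```python
from collections import Counter


def top_googlebot_paths(records, limit=10):
    """Six-step query: source, condition, parse, aggregate, sort, limit."""
    # 1. Data source: any iterable of parsed log records (dicts).
    # 2. Condition: keep requests whose user agent claims to be Googlebot.
    hits = (r for r in records if "Googlebot" in r["user_agent"])
    # 3. Parse: reduce each URI to its path by dropping the query string.
    paths = (r["uri"].split("?", 1)[0] for r in hits)
    # 4. Aggregate: count hits per path.
    counts = Counter(paths)
    # 5. Sort + 6. Limit: most_common sorts by count, highest first, and truncates.
    return counts.most_common(limit)


sample = [
    {"uri": "/en/products/blam-o/log-12345?ref=nav", "user_agent": "Googlebot/2.1"},
    {"uri": "/en/products/blam-o/log-12345", "user_agent": "Googlebot/2.1"},
    {"uri": "/en/about", "user_agent": "Googlebot/2.1"},
    {"uri": "/en/about", "user_agent": "Mozilla/5.0 (Windows NT 10.0)"},
]
print(top_googlebot_paths(sample, limit=2))
```

Keeping the limit step in every query also keeps result sets small, which matters when the logs live on shared infrastructure.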
Every site will be different. Make a new engineering ally.
Use Case: Site section with low index coverage
Parsing URL Structure: /en/products/blam-o/log-12345
• en: Language
• products: App
• blam-o: Manufacturer
• log-12345: SKU
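Splitting paths into these components lets you group crawl activity by language, app, manufacturer, or SKU. A minimal sketch, assuming every product detail URL follows the four-segment layout shown above; the extra sample paths are made up for the demo.

```python
from collections import Counter
from typing import NamedTuple, Optional


class ProductUrl(NamedTuple):
    language: str      # e.g. "en"
    app: str           # e.g. "products"
    manufacturer: str  # e.g. "blam-o"
    sku: str           # e.g. "log-12345"


def parse_product_url(path: str) -> Optional[ProductUrl]:
    """Split /{language}/{app}/{manufacturer}/{sku} into named parts."""
    segments = [s for s in path.split("?", 1)[0].split("/") if s]
    if len(segments) != 4:
        return None  # not a product detail URL in this assumed layout
    return ProductUrl(*segments)


crawled = [
    "/en/products/blam-o/log-12345",
    "/en/products/blam-o/log-67890",
    "/de/products/acme/rocket-1",
    "/en/about",
]
parsed = [p for p in map(parse_product_url, crawled) if p]
print(Counter(p.manufacturer for p in parsed))  # Googlebot hits per manufacturer
```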
LPT: Limit is how you keep your access to server logs.
If 40% of my site is articles, should those URLs represent 80% of crawl?
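One way to answer that from logs is to compare each section's share of your URLs with its share of Googlebot hits. In this sketch the section is assumed to be the second path segment (as in /en/products/...), and the list of site URLs is hypothetical; in practice it might come from a sitemap or a crawl. The demo data is built to mirror the 40% / 80% example.

```python
from collections import Counter


def section_of(path: str) -> str:
    """Assumed layout /{language}/{section}/...; returns the section segment."""
    parts = [p for p in path.split("?", 1)[0].split("/") if p]
    return parts[1] if len(parts) > 1 else "(root)"


def crawl_vs_inventory(crawled_paths, site_paths) -> dict:
    """Return {section: (share of site URLs, share of Googlebot hits)}."""
    crawl = Counter(section_of(p) for p in crawled_paths)
    site = Counter(section_of(p) for p in site_paths)
    total_crawl = sum(crawl.values()) or 1  # avoid dividing by zero on empty input
    total_site = sum(site.values()) or 1
    return {
        section: (site[section] / total_site, crawl[section] / total_crawl)
        for section in sorted(set(crawl) | set(site))
    }


site_urls = ["/en/articles/a", "/en/articles/b", "/en/products/x", "/en/products/y", "/en/products/z"]
googlebot_hits = ["/en/articles/a"] * 4 + ["/en/products/x"]
for section, (site_share, crawl_share) in crawl_vs_inventory(googlebot_hits, site_urls).items():
    print(f"{section}: {site_share:.0%} of URLs, {crawl_share:.0%} of crawl")
```

A gap between the two shares is not automatically a problem; it is a prompt to ask whether the crawl matches what you consider important.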
Use Case: Google chose a different canonical than the user.
Query: Duplicate domains, found by looking for ‘hostname’ values
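A minimal sketch of that query: count Googlebot hits per `host` value and surface anything other than the host you expect to be canonical. The field names and example.com hosts are hypothetical, and a Googlebot-verified subset of your logs is assumed.

```python
from collections import Counter


def unexpected_hosts(records, canonical_host: str) -> dict:
    """Count Googlebot hits per Host value, keeping every host except the canonical one."""
    counts = Counter(
        r["host"].lower() for r in records if "Googlebot" in r["user_agent"]
    )
    return {host: n for host, n in counts.items() if host != canonical_host.lower()}


sample = [
    {"host": "www.example.com", "user_agent": "Googlebot/2.1"},
    {"host": "origin.example.com", "user_agent": "Googlebot/2.1"},
    {"host": "ORIGIN.example.com", "user_agent": "Googlebot/2.1"},
    {"host": "staging.example.com", "user_agent": "Mozilla/5.0"},
]
print(unexpected_hosts(sample, "www.example.com"))
```

Hostnames are lowercased because DNS names are case-insensitive, so "ORIGIN.example.com" and "origin.example.com" count as one duplicate domain.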
Use Case: Sudden crawl flux
Query: Count by response code
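Counting by response code per day makes it easier to see whether a crawl spike or drop lines up with a change in errors. A sketch over parsed records with hypothetical `timestamp` and `status` keys, reusing the timestamp layout from the sample log line.

```python
from collections import Counter, defaultdict
from datetime import datetime

TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. "07/Mar/2018:16:11:58 -0800"


def status_by_day(records) -> dict:
    """Count responses per (day, status code) so a sudden shift stands out."""
    table = defaultdict(Counter)
    for r in records:
        day = datetime.strptime(r["timestamp"], TS_FORMAT).date()  # local date of the log offset
        table[day][r["status"]] += 1
    return dict(table)


sample = [
    {"timestamp": "07/Mar/2018:16:11:58 -0800", "status": 200},
    {"timestamp": "07/Mar/2018:17:02:11 -0800", "status": 200},
    {"timestamp": "08/Mar/2018:09:30:00 -0800", "status": 503},
    {"timestamp": "08/Mar/2018:09:31:00 -0800", "status": 200},
]
for day, counts in sorted(status_by_day(sample).items()):
    print(day, dict(counts))
```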
No clear answers? Dig deeper.
Query: Broken JS, CSS, or AJAX endpoints
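When status-code counts don't explain a change, the next question is whether Googlebot is failing to fetch the resources it needs to render pages. This sketch flags Googlebot requests for .js, .css, .json, or an assumed /api/ prefix that returned 4xx or 5xx; the extension list and the prefix are placeholders for whatever your site actually serves.

```python
from collections import Counter

# Assumed markers for render-critical resources; adjust to your site.
RESOURCE_EXTENSIONS = (".js", ".css", ".json")
AJAX_PREFIXES = ("/api/",)  # hypothetical AJAX endpoint prefix


def broken_resources(records) -> Counter:
    """Count Googlebot requests for JS, CSS, or AJAX endpoints that returned 4xx/5xx."""
    broken = Counter()
    for r in records:
        if "Googlebot" not in r["user_agent"] or r["status"] < 400:
            continue
        path = r["uri"].split("?", 1)[0]  # ignore cache-busting query strings
        lowered = path.lower()
        if lowered.endswith(RESOURCE_EXTENSIONS) or lowered.startswith(AJAX_PREFIXES):
            broken[(path, r["status"])] += 1
    return broken


sample = [
    {"uri": "/static/app.js?v=3", "status": 404, "user_agent": "Googlebot/2.1"},
    {"uri": "/static/site.css", "status": 200, "user_agent": "Googlebot/2.1"},
    {"uri": "/api/stock?sku=log-12345", "status": 500, "user_agent": "Googlebot/2.1"},
    {"uri": "/en/about", "status": 404, "user_agent": "Googlebot/2.1"},
]
print(broken_resources(sample).most_common())
```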
Use Case: Intermittent crawl errors
Query: Server parity
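If only one machine behind the load balancer is misbehaving, errors look intermittent from the outside. Grouping by the Server Name field from the log line (emeasrvr003 in the sample) is one way to check. A sketch with hypothetical server names, treating 5xx responses as errors:

```python
from collections import Counter


def error_rate_by_server(records) -> dict:
    """Share of 5xx responses per Server Name; one outlier box suggests a parity problem."""
    totals, errors = Counter(), Counter()
    for r in records:
        totals[r["server_name"]] += 1
        if r["status"] >= 500:
            errors[r["server_name"]] += 1
    return {server: errors[server] / totals[server] for server in sorted(totals)}


sample = [
    {"server_name": "web-1", "status": 200},
    {"server_name": "web-1", "status": 200},
    {"server_name": "web-2", "status": 200},
    {"server_name": "web-2", "status": 503},
]
print(error_rate_by_server(sample))
```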
Analysing Log Files
Screaming Frog Log Analysis
BigQuery + Google Cloud Storage Services
Excel + .csv
BigQuery + .csv
Command Line
Making the most of logs means adapting to your environment and making new friends.
Iterate.
Test.
Share what you learn.
Thank you for your time, energy, and being part of this wonderful community.
@Jammer_Volts
totally@not-a-robot.com

Optimizing with Server Logs | Jamie Alberico @ #TechSEO Boost 2018
