Web Analytics. Jim Jansen, Associate Professor, The Pennsylvania State University
Who is Jim Jansen? Associate professor at the College of Information Sciences and Technology, The Pennsylvania State University, USA. Senior Fellow at the Pew Research Center (Pew Internet and American Life Project), http://www.pewinternet.org. Active research and teaching efforts: http://ist.psu.edu/faculty_pages/jjansen/. Several funded and non-funded research projects. Teaches several courses, including keyword advertising. Forthcoming book, Understanding Sponsored Search (Cambridge): theory of keyword advertising. Editor of the journal Internet Research (Emerald). Book, Understanding User-Web Interactions via Web Analytics (Morgan & Claypool): basics of web analytics.
Let's talk web analytics! We'll discuss context, theory, and application. Let's begin by setting the stage: what are we facing?
Moving toward ‘everything’ recorded and indexed: a lot of it global, but much will remain local. Search (along with data summarization, trend detection, and information and knowledge extraction and discovery) is a foundational technology. This raises issues, including: infrastructure requirements (how, and who pays?); changes to the nature of privacy and anonymity; and, as publishers or providers, how do we make sense of how people are using this data? The answer to that last question is web analytics. Explosion of Information: the Zettabytes are coming. “There will be nearly 15 billion devices connected to the Internet, generating nearly a Zettabyte (one sextillion bytes) of global IP traffic by 2015” (Cisco's fifth annual Visual Networking Index (VNI) Forecast).
How much is a Zettabyte?
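For a rough sense of scale, the arithmetic below converts one zettabyte (one sextillion, or 10^21, bytes) into gigabytes and spreads it over the 15 billion devices in the Cisco forecast. It assumes decimal (SI) units and treats the zettabyte as one year of traffic; both are simplifying assumptions for illustration, not figures from the forecast itself.

```python
# Decimal (SI) units, an assumption: 1 GB = 10**9 bytes, 1 ZB = 10**21 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21

# One zettabyte expressed in gigabytes: a trillion (10**12) GB.
zb_in_gb = BYTES_PER_ZB // BYTES_PER_GB

# Spread ~1 ZB of traffic (assumed to be a yearly total) over ~15 billion devices.
DEVICES = 15 * 10**9
gb_per_device = BYTES_PER_ZB / DEVICES / BYTES_PER_GB

print(f"1 ZB = {zb_in_gb:,} GB")
print(f"~{gb_per_device:.1f} GB per device per year")
```

In other words, a zettabyte is about a trillion gigabytes, or on the order of tens of gigabytes for every connected device.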
The volume of data is exploding (information growth). The complexity of data is growing (information architecture). Users have less time (attention economy). Users expect improved features (technological sophistication).
Web analytics can help us: deal with the volume of data (information growth); understand the growing complexity of data (information architecture); address users' lack of time (attention economy); and deliver the improved features users expect (technological sophistication). How does web analytics do this?
A thousand years ago: science was mainly naturalistic, describing natural phenomena. Last few hundred years: a theoretical branch using models and generalizations. Last few decades: a computational branch simulating complex phenomena. Today: data exploration (eScience), unifying theory, experiment, and simulation. Data is captured by sensors and instruments or generated by simulators, processed by humans and software, and stored in computers as information and knowledge; researchers analyze database and collection content using data management and statistics. Network and Web Science: Data → Information → Knowledge. This is the realm of Web analytics!
What is web analytics? The Web Analytics Association (WAA) defines Web analytics as “the measurement, collection, analysis, and reporting of Internet data for the purposes of understanding and optimizing Web usage” (http://www.webanalyticsassociation.org/). It shares common theoretical and methodological characteristics with all forms of log analysis (e.g., intranet logs, system logs, OPAC logs, search logs).
Let's break that definition down. Collection: accumulate and store over a period of time. Internet data: Internet facts and statistics collected together for reference or analysis. Measurement: ascertain the size, amount, or degree of something by using an instrument or device. Analysis: examine methodically the structure of information for purposes of explanation and interpretation. Reporting: give a spoken or written account of something that one has investigated. Understanding: perceive the significance, explanation, or cause of something. Optimizing: make the best or most effective use of a resource. Web usage: employ or deploy something as a means of accomplishing a purpose or achieving a result. Data → Information → Knowledge.
How is the data collected?
W3C Extended Log Format: a variety of fields for examining visitors to Web sites. Another common format is the NCSA Separate Log, which is composed of three logs: the Common log (actions on the server), the Referral log (where visitors came from), and the Agent log (information about the client computer). Rather than server-side logging, other methods can be used, such as page tagging, image cookies, and Flash cookies, but the data is still stored in a log.
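To make these formats concrete, here is a minimal Python sketch that reads a W3C Extended Log Format file by honoring its `#Fields:` directive, and parses one NCSA Common log line with a regular expression. It assumes whitespace-separated fields with no embedded spaces (typical of IIS output, which writes spaces as `+`); the function names and sample lines are hypothetical, made up for illustration.

```python
import re

def parse_w3c_extended(lines):
    """Yield one dict per entry, keyed by the names in the latest #Fields directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line.startswith("#"):
            continue  # other directives: #Version, #Date, #Software, ...
        elif fields:
            yield dict(zip(fields, line.split()))

# NCSA Common log: host ident authuser [date] "request" status bytes
NCSA_COMMON = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_ncsa_common(line):
    """Return the named fields of one Common log line, or None if it does not match."""
    m = NCSA_COMMON.match(line)
    return m.groupdict() if m else None

# Hypothetical sample data.
w3c_sample = [
    "#Version: 1.0",
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
    "2011-06-01 12:00:01 192.0.2.10 GET /index.html 200",
]
ncsa_sample = '192.0.2.10 - - [01/Jun/2011:12:00:01 -0400] "GET /index.html HTTP/1.0" 200 2326'

entries = list(parse_w3c_extended(w3c_sample))
print(entries[0]["cs-uri-stem"])                  # /index.html
print(parse_ncsa_common(ncsa_sample)["status"])   # 200
```

Reading field names from the `#Fields:` directive, rather than hard-coding positions, lets the same parser handle servers configured to log different field sets.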
Okay, that's collection. What about analysis and reporting?
A variety of tools help make sense of this log data.
With that context, let's look at the foundational aspects.
Theoretical Foundations. Web analytics is based on the behaviorism paradigm. Behaviorism: an approach focused on the outward behavioral aspects of thought that emphasizes observed behaviors. Key figures in behaviorism: Pavlov, Watson, and Skinner (pictured: Burrhus Frederic Skinner, John B. Watson, Ivan Petrovich Pavlov).
Behaviorism Characteristics. Inductive, data-driven, and characterized by empirical observation of measurable behavior. Grounded on somebody doing something in a situation (all the environmental and situational features are embedded in behaviors). Critics of behaviorism as a psychological theory take issue with its rejection of mental processes. I agree: people are more than “mediators between behavior and the environment” (Skinner, 1993, p. 428) (cf. social learning theory). However, don't throw out the baby with the bath water.
What is a Behavior? An observable activity of a person, animal, team, organization, or system. One can classify behaviors into three general categories: (1) something that one can detect and record; (2) actions, or specific goal-driven events with some purpose other than the specific action that is observable; and (3) reactive responses to environmental stimuli.
What is a Behavior? Behavior is the essential construct of behaviorism and of web analytics. Logs record the behaviors of users and systems (they record behavior but can't capture affective, cognitive, or situational aspects... yet, but we're working on it!). A behavior is the key variable (i.e., an entity representing a set of events, where each event may have a different value).
Data Collection: Trace Data. One can view the data collected in log files as trace data. People conducting the activities of their daily lives often create things, create marks, induce wear, or reduce some existing material. Within the confines of research, these things, marks, and wear become data. Classically, trace data are the physical remains of people's interactions (pictured: wear on a carpet, a trash heap, surfing the Web).
Trace Data. In the past, trace data was often time-consuming to gather and process, making such data costly. Logging software makes collecting trace data on the Internet easy and cheap. Log data is controlled accretion data, where the researcher or some other entity alters the environment in order to create the accretion data. With the use of client apps (such as desktop search bars), the collection of data is nearly unlimited from a technology perspective. What is cool about trace data for researchers?
Data Collection. Log data (trace data) has significant advantages as a data collection approach for the study and investigation of behaviors, including: Scale: not a limiting factor, as it is in lab user studies. Power: large sample sizes for inference testing; in fact, samples are so large that one must account for effect size. Scope: naturalistic; researchers can investigate a range of interactions in a multi-variable context. Location: data can be collected in distributed environments. Duration: log data can be collected over an extended period.
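The point about power and effect size is easy to see numerically. The sketch below runs a standard two-sided, two-proportion z-test on hypothetical log-scale samples (one million visits per group, conversion rates of 2.0% and 2.1%) and computes Cohen's h, a common effect-size measure for proportions. The counts are invented for illustration; the pattern to look for is a very small p-value alongside an effect far below the conventional 0.2 threshold for a “small” effect.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value

def cohens_h(p1, p2):
    """Cohen's h effect size for the difference between two proportions."""
    return 2 * math.asin(math.sqrt(p2)) - 2 * math.asin(math.sqrt(p1))

n = 1_000_000                                   # hypothetical visits per group
z, p = two_proportion_z(20_000, n, 21_000, n)   # 2.0% vs 2.1% conversion
h = cohens_h(0.020, 0.021)
print(f"z = {z:.2f}, p = {p:.1e}, h = {h:.3f}")
```

With samples of this size, a practically negligible difference is “statistically significant,” so reporting an effect size (or a confidence interval) alongside the p-value is the safer habit.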
Methodological Foundations. Using logs to collect trace data is an unobtrusive method (a.k.a. non-reactive or low-constraint). Unobtrusive methods allow data collection without directly interfering in the context and do not require a direct response from participants (illustrations: customer behavior (video); chemistry (surface marking)).
Methodological Foundations. Three justifications for unobtrusive methods: Uncertainty principle: researchers interjected into an environment become part of the system. Observer effect: the difference that is made to an activity or a person's behavior by being observed. Observer bias: observers overemphasize behavior they expect to find and fail to notice behavior they do not expect. Trace data helps in overcoming the uncertainty principle, the observer effect, and observer bias in data collection. Note: trace data addresses observer bias in data collection, but not in data analysis. Examples: ethnographic studies (where the researcher “bird dogs” a study participant); no one searches for porn in a lab study of Web searching; this is why medical trials are double blind rather than single blind.
Methodological Foundations. Because of characteristics inherent in the log data collection method, web analytics has issues to address: Abstraction: how does one relate low-level data to higher-level concepts? Selection: how does one separate the necessary from the unnecessary data? Reduction: how does one reduce the complexity and size of the data set? Context: how does one interpret the significance of events? Evolution: how can one collect data without impacting application deployment or use?
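Here is one simple way the first three issues can surface in practice, shown on a few hypothetical, already-parsed log records. Selection drops requests assumed to be noise (image and CSS assets, self-identified crawlers); abstraction maps raw URLs onto higher-level page categories; reduction collapses the remaining rows into counts. The suffix list, bot markers, and category mapping are illustrative assumptions, not a standard.

```python
from collections import Counter

# Hypothetical pre-parsed log records: (client id, user agent, requested path).
records = [
    ("u1", "Mozilla/5.0", "/products/123"),
    ("u1", "Mozilla/5.0", "/static/logo.png"),
    ("u2", "Googlebot/2.1", "/products/123"),
    ("u2", "Mozilla/5.0", "/search?q=boots"),
    ("u3", "Mozilla/5.0", "/products/456"),
    ("u3", "Mozilla/5.0", "/checkout"),
]

ASSET_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js")   # assumed noise
BOT_MARKERS = ("bot", "crawler", "spider")                  # assumed crawler hints

def is_relevant(agent, path):
    """Selection: keep page requests from apparently human clients."""
    if path.endswith(ASSET_SUFFIXES):
        return False
    return not any(marker in agent.lower() for marker in BOT_MARKERS)

def page_category(path):
    """Abstraction: map a low-level URL to a higher-level concept."""
    if path.startswith("/products/"):
        return "product"
    if path.startswith("/search"):
        return "search"
    if path.startswith("/checkout"):
        return "checkout"
    return "other"

# Reduction: many raw rows become a handful of category counts.
category_counts = Counter(
    page_category(path) for _, agent, path in records if is_relevant(agent, path)
)
print(category_counts)
```

Each rule here is a judgment call; changing the bot markers or the category mapping changes the counts, which is exactly why these issues need to be addressed explicitly.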
Okay, nice, but how do we apply it?
Web analytics process. Every consulting firm has a web analytics process (which is fine). However, the effective ones all boil down to four essential steps.
Essential steps to any effective web analytics process:
1. Collection of data. Typically counts; basically, data collection. Examples: time stamp, referral URL, query term.
2. Processing of data into information. Typically ratios; data becomes metrics. Examples: time on page, bounce rate, unique visitors.
3. Developing key performance indicators. Counts and ratios infused with business strategy. Examples: conversion rate, average order value, task completion rate.
4. Formulating online strategy. Online goals, objectives, or standards for the organization. Examples: save money, make money, market share.
Each step drives the next.
Three types (plus 1) of Web analytics metrics. Count: the most basic unit of measure; a single number. Ratio: typically, a count divided by a count, although a ratio can use either a count or a ratio in the numerator or denominator. KPI (Key Performance Indicator): can be either a count or a ratio, but is frequently a ratio. A KPI is infused with business strategy, and therefore the set of appropriate KPIs typically differs between site and process types. Plus 1, Dimension: data that can be used to define various types of segments and that represents a fundamental dimension of visitor behavior or site dynamics; typically not associated with a number. Source: Burby, J., Brown, A., and WAA Standards Committee (2007). Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
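A small illustration of the three types plus one, on hypothetical visit records: total page views is a count, page views per visit is a ratio, and average order value stands in as the KPI for a site whose assumed strategy is to grow revenue per order. Traffic source is a dimension: a label for segmenting, with no numeric value of its own. The records and the choice of KPI are assumptions for illustration.

```python
# Hypothetical visit records for one reporting period.
# "source" is a dimension: a label used for segmenting, not a number.
visits = [
    {"source": "search", "page_views": 5, "order_value": 80.0},
    {"source": "search", "page_views": 2, "order_value": None},
    {"source": "email",  "page_views": 1, "order_value": None},
    {"source": "email",  "page_views": 4, "order_value": 40.0},
    {"source": "direct", "page_views": 3, "order_value": None},
]

# Count: the most basic unit of measure, a single number.
total_page_views = sum(v["page_views"] for v in visits)

# Ratio: a count divided by a count.
page_views_per_visit = total_page_views / len(visits)

# KPI: average order value, chosen here on the (illustrative) assumption that
# the site's strategy is to grow revenue per order.
orders = [v["order_value"] for v in visits if v["order_value"] is not None]
average_order_value = sum(orders) / len(orders)

# Dimension: the distinct values available for segmenting the metrics.
sources = sorted({v["source"] for v in visits})

print(total_page_views, page_views_per_visit, average_order_value, sources)
```

The same arithmetic becomes a KPI or stays a plain ratio depending only on whether it is tied to a business goal, which is why KPI sets differ between sites.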
These metrics can be applied at three levels of granularity. Aggregate: total site traffic for a defined period of time (typically used for market comparisons). Segmented: a subset of the site traffic for a defined period of time, filtered in some way to gain greater analytical insight (e.g., by developing personas and profiles in Google Analytics). Individual: the activity of a single Web visitor for a defined period of time (excellent for persona development and outlier analysis). (Burby, Brown, and WAA Standards Committee, 2007)
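The same page-view data can be read at each of the three levels. In this sketch, built on a hypothetical page-view log, the aggregate view counts all page views in a one-week reporting period, the segmented view keeps only visitors arriving from search, and the individual view follows one visitor's path. The visitor IDs, dates, and the "search" segment are assumptions for illustration.

```python
from datetime import date

# Hypothetical page-view log: (visitor id, day, traffic source, path).
page_views = [
    ("v1", date(2011, 6, 1), "search", "/"),
    ("v1", date(2011, 6, 1), "search", "/products"),
    ("v2", date(2011, 6, 2), "email",  "/"),
    ("v3", date(2011, 6, 3), "search", "/products"),
    ("v1", date(2011, 6, 9), "direct", "/"),   # outside the reporting period
]

# Every level uses the same defined period of time.
start, end = date(2011, 6, 1), date(2011, 6, 7)
in_period = [pv for pv in page_views if start <= pv[1] <= end]

aggregate = len(in_period)                                        # all traffic
segmented = len([pv for pv in in_period if pv[2] == "search"])    # one segment
individual = [pv[3] for pv in in_period if pv[0] == "v1"]         # one visitor's path

print(aggregate, segmented, individual)   # 4 3 ['/', '/products']
```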
Classifications of Metrics. Building Block: foundational metrics. Visit Characterization: metrics aimed at understanding visits, either single or aggregate. Content Characterization: metrics aimed at understanding content or its use. Conversion: metrics aimed at linking visits and content. (Burby, Brown, and WAA Standards Committee, 2007)
Building Block. Page: an analyst-definable unit of content. Page Views: the number of times a page was viewed. Visits/Sessions: a visit is an interaction by an individual with a website, consisting of one or more requests for a page. Unique Visitors: the number of inferred individual people, within a designated reporting timeframe, with activity consisting of one or more visits to a site. New Visitor: the number of Unique Visitors with activity including a first-ever visit to a site during a reporting period. Repeat Visitor: the number of Unique Visitors with activity consisting of two or more visits to a site during a reporting period. Return Visitor: the number of Unique Visitors with activity consisting of a visit to a site during a reporting period, where the Unique Visitor also visited the site prior to the reporting period. (Burby, Brown, and WAA Standards Committee, 2007)
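A minimal sketch of how these building blocks can be derived from raw hits. It assumes each hit carries a persistent visitor ID (for example, from a cookie), which is only an approximation of “inferred individual people,” and it splits visits after 30 minutes of inactivity, a common convention assumed here rather than taken from the definitions above. The data and helper name are hypothetical.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity timeout

# Hypothetical page views: (visitor id, timestamp).
hits = [
    ("a", datetime(2011, 5, 20, 9, 0)),    # before the reporting period
    ("a", datetime(2011, 6, 1, 9, 0)),
    ("a", datetime(2011, 6, 1, 9, 10)),
    ("b", datetime(2011, 6, 2, 14, 0)),
    ("b", datetime(2011, 6, 2, 15, 0)),    # > 30 min gap: a second visit
    ("c", datetime(2011, 6, 3, 8, 0)),
]

def sessionize(hits):
    """Group hits into visits; a new visit starts after SESSION_TIMEOUT of inactivity."""
    visits = []        # list of (visitor, [timestamps])
    last_seen = {}     # visitor -> (index of current visit, last timestamp)
    for visitor, ts in sorted(hits):
        if visitor in last_seen and ts - last_seen[visitor][1] <= SESSION_TIMEOUT:
            idx = last_seen[visitor][0]
            visits[idx][1].append(ts)
        else:
            visits.append((visitor, [ts]))
            idx = len(visits) - 1
        last_seen[visitor] = (idx, ts)
    return visits

start, end = datetime(2011, 6, 1), datetime(2011, 6, 8)   # reporting period
all_visits = sessionize(hits)
period_visits = [v for v in all_visits if start <= v[1][0] < end]

page_view_count = sum(len(ts) for _, ts in period_visits)
unique_visitors = {who for who, _ in period_visits}
prior_visitors = {who for who, ts in all_visits if ts[0] < start}
visits_per_visitor = {w: sum(1 for who, _ in period_visits if who == w)
                      for w in unique_visitors}

new_visitors = unique_visitors - prior_visitors        # first-ever visit in period
repeat_visitors = {w for w, n in visits_per_visitor.items() if n >= 2}
return_visitors = unique_visitors & prior_visitors     # also visited before period

print(page_view_count, len(period_visits), sorted(unique_visitors))
print(sorted(new_visitors), sorted(repeat_visitors), sorted(return_visitors))
```

Note that “new” can only be judged against the history the log actually holds; if older data is missing, new visitors are overcounted and return visitors undercounted.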
Visit Characterization. Entry Page: the first page of a visit. Landing Page: a page intended to identify the beginning of the user experience. Exit Page: the last page on a site accessed during a visit, signifying the end of a visit/session. Visit Duration: the length of time in a session. Referrer: the page URL that originally generated the request for the current page view or object. Click-through: the number of times a link was clicked by a visitor. Click-through Rate: the number of click-throughs for a specific link divided by the number of times that link was viewed. Page Views per Visit: the number of page views in a reporting period divided by the number of visits in the same reporting period. (Burby, Brown, and WAA Standards Committee, 2007)
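Several of these visit metrics fall straight out of an ordered list of page views per visit. The sketch below uses hypothetical visits and link counts. It measures visit duration from the first to the last page view, a common simplification assumed here, because page-view logs alone do not show how long the visitor stayed on the exit page.

```python
from datetime import datetime

# Hypothetical visits: each is an ordered list of (timestamp, page) page views.
visits = [
    [(datetime(2011, 6, 1, 9, 0), "/"),
     (datetime(2011, 6, 1, 9, 4), "/products"),
     (datetime(2011, 6, 1, 9, 9), "/checkout")],
    [(datetime(2011, 6, 1, 10, 0), "/products")],
]

summaries = []
for pages in visits:
    entry_page = pages[0][1]    # first page of the visit
    exit_page = pages[-1][1]    # last page of the visit
    # Duration from the first to the last page view: time spent on the exit
    # page is not visible in page-view logs, so one-page visits score 0 s.
    duration = (pages[-1][0] - pages[0][0]).total_seconds()
    summaries.append((entry_page, exit_page, duration))

# Page views per visit: page views divided by visits in the same period.
page_views_per_visit = sum(len(pages) for pages in visits) / len(visits)

# Click-through rate for one link: click-throughs / times the link was viewed.
link_views, link_clicks = 400, 18     # hypothetical counts
click_through_rate = link_clicks / link_views

print(summaries)
print(page_views_per_visit, click_through_rate)
```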
Content Characterization. Page Exit Ratio: the number of exits from a page divided by the total number of page views of that page. Single Page Visits: visits that consist of one page, regardless of the number of times the page was viewed. Single Page View Visits (Bounces): visits that consist of one page view. Bounce Rate: single page view visits divided by entry pages (i.e., visits that entered on that page). (Burby, Brown, and WAA Standards Committee, 2007)
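The distinctions between these content metrics are easiest to see side by side. This sketch, on hypothetical visits, computes per-page exit ratio and bounce rate, and shows how a visit that reloads one page counts as a single page visit but not as a bounce. Computing bounce rate per entry page follows the definition above; the data and helper names are illustrative.

```python
from collections import Counter

# Hypothetical visits, each an ordered list of page views (paths).
visits = [
    ["/", "/products", "/checkout"],
    ["/products"],                 # bounce: one page view
    ["/products", "/products"],    # single-page visit, but not a bounce (reload)
    ["/", "/products"],
    ["/"],                         # bounce
]

views = Counter(p for v in visits for p in v)
entries = Counter(v[0] for v in visits)
exits = Counter(v[-1] for v in visits)
bounces = Counter(v[0] for v in visits if len(v) == 1)            # one page view
single_page_visits = sum(1 for v in visits if len(set(v)) == 1)   # one distinct page

def page_exit_ratio(page):
    """Exits from `page` divided by total page views of `page`."""
    return exits[page] / views[page]

def bounce_rate(page):
    """Bounces that entered on `page` divided by entries on `page`."""
    return bounces[page] / entries[page]

print(single_page_visits, sum(bounces.values()))
for page in sorted(entries):
    print(page, round(page_exit_ratio(page), 2), round(bounce_rate(page), 2))
```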
Conversion Metrics. Event: any logged or recorded action that has a specific date and time assigned to it by either the browser or the server. Conversion: a visitor completing a target action. (Burby, Brown, and WAA Standards Committee, 2007)
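Events and conversions connect directly: a conversion is an event whose action matches the site's target. In this sketch the target action `"purchase"` and the event records are hypothetical. The conversion rate divides converting visits by all visits; dividing by visitors instead is another common choice, so the denominator should always be stated.

```python
from datetime import datetime

# Hypothetical timestamped events: (visit id, timestamp, action name).
events = [
    (1, datetime(2011, 6, 1, 9, 0), "page_view"),
    (1, datetime(2011, 6, 1, 9, 5), "add_to_cart"),
    (1, datetime(2011, 6, 1, 9, 9), "purchase"),
    (2, datetime(2011, 6, 1, 10, 0), "page_view"),
    (3, datetime(2011, 6, 1, 11, 0), "page_view"),
    (3, datetime(2011, 6, 1, 11, 2), "newsletter_signup"),
]

TARGET_ACTION = "purchase"   # assumption: the site's chosen target action

visits = {visit for visit, _, _ in events}
converted = {visit for visit, _, action in events if action == TARGET_ACTION}
conversion_rate = len(converted) / len(visits)   # conversions per visit
print(len(converted), len(visits), round(conversion_rate, 3))   # 1 3 0.333
```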
Translating these metrics. Translating these metrics into meaningful and accurate knowledge is not always easy. Real-world example: the hotel problem (an excellent illustration of the importance of proper period selection).
The hotel (three rooms, seven days):
Room 1: Sam, Sam, Sam, Sam, Sam, Sam, Sam
Room 2: Ted, Scott, Ara, Chi, Tom, Yen, Tim
Room 3: Jane, Jane, Jane, Jane, Jane, Jane, Jane
Using daily uniques: 3 guests on each of days 1 to 7, so total daily uniques = 21.
Using weekly uniques: Sam counts once, Jane counts once, and the Room 2 guests count 7, so total weekly uniques = 9.
Bottom line: the time qualifier matters! You can't just add daily uniques to get weekly uniques; you have to scrub the data. This is just one example of the many issues one can face when digging into the data to get meaningful web analytics!
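The hotel arithmetic can be checked in a few lines. The guest arrangement below follows the hotel example (Sam in Room 1 and Jane in Room 3 every night, a different guest in Room 2 each night). Summing the daily counts counts Sam and Jane seven times each, while taking the union of the week's guests counts every person once.

```python
# The hotel problem: three rooms observed for seven days.
# Room 1: Sam every night; Room 3: Jane every night; Room 2: a new guest each night.
room2_guests = ["Ted", "Scott", "Ara", "Chi", "Tom", "Yen", "Tim"]
nights = [{"Sam", guest, "Jane"} for guest in room2_guests]   # guests per day

daily_uniques = [len(guests) for guests in nights]
sum_of_daily_uniques = sum(daily_uniques)       # repeats Sam and Jane every day
weekly_uniques = len(set().union(*nights))      # each person counted once

print(daily_uniques, sum_of_daily_uniques, weekly_uniques)
# [3, 3, 3, 3, 3, 3, 3] 21 9
```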
Fifty minutes can't cover everything, so here are some starting points for further reading.
Research Work (mine). Book: Jansen, B. J., Spink, A., and Taksa, I. (2009). Handbook of Research on Web Log Analysis. Hershey, PA: Idea Group Publishing. The first chapter, on the theory of log analysis, is free! Lecture: Jansen, B. J. (2009). Understanding User-Web Interactions via Web Analytics. Morgan & Claypool Lecture Series, G. Marchionini (Ed.). San Rafael, CA: Morgan & Claypool. A manuscript about Web analytics, soup to nuts; companion website: http://faculty.ist.psu.edu/jjansen/webanalytics/understanding_web_analytics.html
Research Work (mine). Article: Jansen, B. J. (2006). Search log analysis: What it is, what's been done, how to do it. Library and Information Science Research, 28(3), 407-432.
Great how-to books for web analytics: Web Analytics: An Hour a Day by Avinash Kaushik (Jun 5, 2007); Web Analytics 2.0: The Art of Online Accountability and Science of Customer Centricity by Avinash Kaushik (Oct 2009); Advanced Web Metrics with Google Analytics, 2nd Edition, by Brian Clifton (Mar 15, 2010); Web Analytics Demystified: A Marketer's Guide to Understanding How Your Web Site Affects Your Business by Eric Peterson (Mar 2004).
Thanks! (Questions and discussion welcome!)
Before we end …
Follow-on Discussion. Happy to chat with anyone: catch me today or contact me via email. Email: [email_address]. LinkedIn: http://www.linkedin.com/in/jjansen. Twitter: jimjansen.
Again, thanks!

Web analytics webinar

  • 1. Web Analytics Jim Jansen Associate Professor, The Pennsylvania State University
  • 2. Who is Jim Jansen? Associate professor at College of Information Sciences and Technology, The Pennsylvania State University , USA Senior Fellow at the Pew Research Center (Pew Internet and American Life Project) - http://www.pewinternet.org Active research and teaching efforts - http://ist.psu.edu/faculty_pages/jjansen/ Several funded and non-funded research project Teach several courses, including keyword advertising Forthcoming book, Understanding Sponsored Search (Cambridge) … theory of keyword advertising Editor of journal, Internet Research (Emerald) Book, Understanding User-Web Interactions via Web Analytics (Morgan & Claypool) - basics of web analytics
  • 3. Let talk web analytics ! We’ll discuss: context theory application Begin by setting the stage … what are we facing ?
  • 4. Moving too ‘ everything ’ recorded and indexed A lot global but much will remain local Search (along with data summarization, trend detection, information and knowledge extraction and discovery) is foundational technology Raises issues, including: Infrastructure requirements. How and who pays? Changes the nature of privacy and anonymity As publishers or providers, how do we make sense of how people are using this data? --- Web analytics Explosion of Information - the Zettabytes are coming There will be nearly 15 billion devices connected to the Internet, generating nearly a Zettabyte (one sextillion bytes) of global IP traffic by 2015, Cisco's fifth annual Visual Networking Index (VNI) Forecast
  • 5. How much is a Zettabyte?
  • 6. The volume of data is exploding ( information growth ) The complexity of data is growing ( information architecture ) The users have less time ( attention economy ) The user expects improved features ( technological sophistication ) Explosion of Information - the Zettabytes are coming There will be nearly 15 billion devices connected to the Internet, generating nearly a Zettabyte (one sextillion bytes) of global IP traffic by 2015, Cisco's fifth annual Visual Networking Index (VNI) Forecast
  • 7. Web analytics can help us … Deal with the volume of data ( information growth ) Understand the growing complexity of data ( information architecture ) Address users’ less time ( attention economy ) Lead to improved features ( technological sophistication ) expected by users How does web analytics do this?
  • 8. Thousand years ago: science was mainly naturalistic describing natural phenomena Last few hundred years: theoretical branch using models, generalizations Last few decades: a computational branch simulating complex phenomena Today: data exploration (eScience) unifying theory, experiment, and simulation Data captured by sensors, instruments, or generated by simulator Processed by humans and software Information / knowledge stored in computer Analyzes database / collection content using data management and statistics Network and Web Science Data  Information  Knowledge This is the realm of Web analytics!
  • 9. What is web analytics? The Web Analytics Association (WAA) defines Web analytics as: the measurement, collection, analysis, and reporting of Internet data for the purposes of understanding and optimizing Web usage ( http://www.webanalyticsassociation.org/ ) Shares common theoretical and methodology characteristics with all forms of log analysis (e.g., Intranet logs, systems logs, OPAC logs, search logs, etc.)
  • 10. Let’s break that definition down … Collection - accumulate and store over a period of time Internet data - internet facts and statistics collected together for reference or analysis Measurement – ascertain the size, amount, or degree of something by using an instrument or device Analysis - examine methodically the structure of information for purposes of explanation and interpretation. Reporting - giving a spoken or written account of something that one has investigated. Understanding - perceive the significance, explanation, or cause of something Optimizing - make the best or most effective use of a resource Web usage – employ or deploy something as a means of accomplishing a purpose or achieving a result Data Information Knowledge
  • 11. How is the data collected?
  • 12. W3C Extended Log Format: a variety of fields for examining visitors to Web sites. Another common format is the NCSA Separate Log format, composed of three logs: the Common log (actions on the server), the Referral log (where visitors came from), and the Agent log (details about the client computer). Instead of server-side logging, sites can use other methods such as page tagging, image cookies, or Flash cookies, but the data is still stored in a log.
  • 13. Okay, that's collection. What about analysis and reporting?
  • 14. A variety of tools help make sense of this log data.
  • 15. With that context, let's look at the foundational aspects …
  • 16. Theoretical Foundations. Web analytics is based on the behaviorism paradigm. Behaviorism is an approach focused on the outward, behavioral aspects of thought that emphasizes observed behaviors. Key figures: Pavlov, Watson, and Skinner (pictured: Burrhus Frederic Skinner, John B. Watson, Ivan Petrovich Pavlov).
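To make the W3C Extended Log Format concrete, here is a minimal Python sketch of a reader: the `#Fields:` directive names the columns, and each later data line is split into those columns. It assumes values contain no embedded spaces (IIS, for example, writes spaces in user agents as `+`); quoted strings with spaces are not handled. The sample lines and the function name `parse_w3c_extended` are invented for illustration.

```python
from typing import Dict, Iterable, List


def parse_w3c_extended(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse W3C Extended Log Format lines into one dict per request.

    Assumes space-separated values with no embedded spaces.
    """
    fields: List[str] = []
    records: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The directive lists the column names for the lines that follow.
            fields = line[len("#Fields:"):].split()
        elif line.startswith("#"):
            # Other directives (#Version, #Date, #Software, ...) are skipped.
            continue
        else:
            values = line.split()
            # Lines whose value count does not match the field list are dropped.
            if len(values) == len(fields):
                records.append(dict(zip(fields, values)))
    return records


# Hypothetical sample log; this field list is one common choice, not a requirement.
SAMPLE = """\
#Version: 1.0
#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(Referer)
2011-03-01 10:15:02 192.0.2.10 GET /index.html 200 -
2011-03-01 10:15:09 192.0.2.10 GET /products.html 200 http://example.com/index.html
"""

if __name__ == "__main__":
    for rec in parse_w3c_extended(SAMPLE.splitlines()):
        print(rec["time"], rec["cs-uri-stem"], rec["cs(Referer)"])
```

Because the field list is read from the file itself, the same reader works for logs configured with different fields. Silently dropping malformed lines is a simplification; a production parser would likely count or report them.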
  • 17. Behaviorism Characteristics. Inductive, data-driven, and characterized by empirical observation of measurable behavior. Grounded on somebody doing something in a situation (all the environmental and situational features are embedded in behaviors). Critics of behaviorism as a psychological theory take issue with its rejection of mental processes. I agree: people are more than “mediators between behavior and the environment” (Skinner, 1993, p. 428) (cf. social learning theory) … however, don't throw out the baby with the bath water.
  • 18. What is a Behavior? … an observable activity of a person, animal, team, organization, or system. One can classify behaviors into three general categories: behaviors are something that one can detect and record; behaviors are actions or specific goal-driven events with some purpose other than the specific action that is observable; and behaviors are reactive responses to environmental stimuli.
  • 19. What is a Behavior? Behavior is the essential construct of behaviorism and of web analytics. Logs record the behaviors of users and systems (they record behavior but cannot capture affective, cognitive, or situational aspects … yet, but we're working on it!). A behavior is the key variable (i.e., an entity representing a set of events, where each event may have a different value).
  • 20. Data Collection: Trace Data. One can view the data collected in log files as trace data. People conducting the activities of their daily lives often create things, create marks, induce wear, or reduce some existing material. Within the confines of research, these things, marks, and wear become data. Classically, trace data are the physical remains of people's interactions. Examples: wear on a carpet, a trash heap, surfing the Web.
  • 21. Trace Data. In the past, trace data was often time-consuming to gather and process, making such data costly. Logging software makes collecting trace data on the Internet easy and cheap. Log data is controlled accretion data, where the researcher or some other entity alters the environment in order to create the accretion data. With the use of client apps (such as desktop search bars), the collection of data is nearly unlimited from a technology perspective. What is cool about trace data for researchers?
  • 22. Data Collection. Log data/trace data has significant advantages as a data collection approach for the study and investigation of behaviors, including: Scale: not a limiting factor, as it is in lab user studies. Power: large sample sizes for inference testing; in fact, samples are so large that one must account for effect size. Scope: naturalistic; researchers can investigate a range of interactions in a multi-variable context. Location: data can be collected in distributed environments. Duration: log data can be collected over an extended period.
  • 23. Methodological Foundations. Using logs to collect trace data is an unobtrusive method (a.k.a. non-reactive or low-constraint). Unobtrusive methods allow data collection without directly interfering in the context and do not require a direct response from participants. Examples: customer behavior (video), chemistry (surface marking).
  • 24. Methodological Foundations. Three justifications for unobtrusive methods: Uncertainty principle: researchers interjected into an environment become part of the system. Observer effect: the difference that is made to an activity or a person's behaviors by being observed. Observer bias: observers overemphasize behavior they expect to find and fail to notice behavior they do not expect. Trace data helps in overcoming the uncertainty principle, the observer effect, and observer bias in data collection. Note: trace data addresses observer bias in data collection, but not in data analysis. Example: ethnography studies (where the researcher “bird dogs” a study participant). Example: no one searches for porn in a lab study of Web searching. Example: this is why medical trials are double blind rather than single blind.
  • 25. Methodological Foundations. Because of characteristics inherent in log data collection, web analytics has several issues to address: Abstraction – how does one relate low-level data to higher-level concepts? Selection – how does one separate the necessary from unnecessary data? Reduction – how does one reduce the complexity and size of the data set? Context – how does one interpret the significance of events? Evolution – how can one collect data without impacting application deployment or use?
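The "Power" point deserves a quick illustration. This minimal sketch uses made-up conversion counts, and the two-proportion z-test and Cohen's h are effect-size tools chosen here for illustration, not methods prescribed by the slides. With a million visits per group, a 0.1-percentage-point difference comes out highly "significant" while the effect size stays negligible (Cohen's conventions treat h of about 0.2 as small).

```python
import math


def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int):
    """Return (z, two-sided p-value) for H0: p_a == p_b, using a pooled SE."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def cohens_h(p_a: float, p_b: float) -> float:
    """Cohen's h effect size for the difference between two proportions."""
    return abs(2 * math.asin(math.sqrt(p_b)) - 2 * math.asin(math.sqrt(p_a)))


if __name__ == "__main__":
    # Hypothetical: 2.0% vs 2.1% conversion with one million visits per variant.
    z, p = two_proportion_z(20_000, 1_000_000, 21_000, 1_000_000)
    h = cohens_h(0.020, 0.021)
    print(f"z = {z:.2f}, p = {p:.2e}, Cohen's h = {h:.4f}")
```

The p-value answers "is there any difference?", which log-scale samples almost always answer "yes"; the effect size answers "does the difference matter?", which is the question an analyst usually cares about.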
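Here is a minimal sketch of how selection, reduction, and abstraction often appear when preparing log records. It drops requests for embedded resources and from self-identified crawlers (selection), keeps only three fields (reduction), and maps URLs to higher-level content categories (abstraction). The record keys, the suffix list, the `bot`/`crawler`/`spider` test, and the `CATEGORY_PREFIXES` mapping are all hypothetical rules; real bot filtering is considerably harder.

```python
from typing import Dict, List

# Hypothetical rules; a real site would tune these to its own content.
RESOURCE_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")
BOT_MARKERS = ("bot", "crawler", "spider")
CATEGORY_PREFIXES = {"/products/": "catalog", "/cart": "checkout", "/help/": "support"}


def is_page_request(uri: str) -> bool:
    """Selection: keep page views, drop embedded resources."""
    return not uri.lower().endswith(RESOURCE_SUFFIXES)


def is_bot(user_agent: str) -> bool:
    """Selection: crude test for self-identified crawlers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def categorize(uri: str) -> str:
    """Abstraction: relate a low-level URL to a higher-level content category."""
    for prefix, category in CATEGORY_PREFIXES.items():
        if uri.startswith(prefix):
            return category
    return "other"


def prepare(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Apply selection, then reduction (three fields) and abstraction."""
    cleaned = []
    for rec in records:
        if not is_page_request(rec["uri"]) or is_bot(rec["agent"]):
            continue
        cleaned.append({
            "visitor": rec["visitor"],
            "time": rec["time"],
            "category": categorize(rec["uri"]),
        })
    return cleaned
```

Context and evolution are harder to show in code: context needs information the log does not hold, and evolution concerns how the logging is deployed rather than how records are processed.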
  • 26. Okay, nice, but how do we apply it …
  • 27. Web analytics process. Every consulting firm has a web analytics process … (which is fine). However, the effective ones all boil down to four essential steps.
  • 28. Essential steps to any effective web analytics process. Step 1, collection of data: typically counts; basically, data collection (examples: time stamp, referral URL, query term). Step 2, processing of data into information: typically ratios; data becomes metrics (examples: time on page, bounce rate, unique visitors). Step 3, developing key performance indicators: counts and ratios infused with business strategy (examples: conversion rate, average order value, task completion rate). Step 4, formulating online strategy: online goals, objectives, or standards for the organization (examples: save money, make money, market share). The diagram links adjacent steps with arrows labeled "Drives".
  • 29. Three types (plus 1) of Web analytics metrics. Count: the most basic unit of measure; a single number. Ratio: typically, a count divided by a count, although a ratio can use either a count or a ratio in the numerator or denominator. KPI (Key Performance Indicator): can be either a count or a ratio, but it is frequently a ratio. A KPI is infused with business strategy, and therefore the set of appropriate KPIs typically differs between site and process types. Dimension (the "plus 1"): data that can be used to define various types of segments and represents a fundamental dimension of visitor behavior or site dynamics; typically not associated with a number. Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
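A minimal sketch, with invented data, of how the metric types relate. Counts are tallied, a ratio divides one count by another, and the same ratio becomes a KPI once it is judged against a business target. The `VisitSummary` fields, the `source` dimension values, and the 2% target are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VisitSummary:
    source: str      # Dimension: describes the visit; not a number to add up
    page_views: int
    converted: bool


def counts(visits: List[VisitSummary]) -> Dict[str, int]:
    """Counts: single numbers, the most basic unit of measure."""
    return {
        "visits": len(visits),
        "page_views": sum(v.page_views for v in visits),
        "conversions": sum(1 for v in visits if v.converted),
    }


def conversion_rate_kpi(visits: List[VisitSummary], target: float) -> Dict[str, object]:
    """A ratio (conversions / visits) read as a KPI against a business target."""
    c = counts(visits)
    rate = c["conversions"] / c["visits"] if c["visits"] else 0.0
    return {"conversion_rate": rate, "meets_target": rate >= target}
```

What turns the ratio into a KPI is the `target` argument, which stands in for business strategy. A different site type would plausibly pick a different ratio and threshold altogether.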
  • 30. Metrics can be applied at three levels of granularity. Aggregate: total site traffic for a defined period of time (typically used for market comparisons). Segmented: a subset of the site traffic for a defined period of time, filtered in some way to gain greater analytical insight (e.g., by developing personas and profiles in Google Analytics). Individual: activity of a single Web visitor for a defined period of time (excellent for persona development and outlier analysis). (Burby, Brown, and WAA Standard Committee, 2007)
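The three levels of granularity can be sketched as three views over the same page-view records: all of them, a filtered subset, and one visitor's trail. A minimal sketch; the `(visitor_id, referrer_type, page)` record layout and the referrer-based segment are assumptions for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical page-view records: (visitor_id, referrer_type, page)
PageView = Tuple[str, str, str]


def aggregate(views: List[PageView]) -> int:
    """Aggregate: total page views for the period."""
    return len(views)


def segmented(views: List[PageView], keep: Callable[[PageView], bool]) -> int:
    """Segmented: page views in a filtered subset of the traffic."""
    return sum(1 for v in views if keep(v))


def individual(views: List[PageView], visitor_id: str) -> List[str]:
    """Individual: the ordered pages viewed by one visitor."""
    return [page for vid, _, page in views if vid == visitor_id]
```

For example, `segmented(views, lambda v: v[1] == "search")` counts only search-referred traffic. Passing the filter in as a function keeps the segment definition separate from the counting logic.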
  • 31. Classifications of Metrics. Building Block – foundational metrics. Visit Characterization – metrics aimed at understanding visits, either single or aggregate. Content Characterization – metrics aimed at understanding content or its use. Conversion – metrics aimed at linking visits and content. (Burby, Brown, and WAA Standard Committee, 2007)
  • 32. Building Block. Page: A page is an analyst-definable unit of content. Page Views: The number of times a page was viewed. Visits/Sessions: A visit is an interaction, by an individual, with a website, consisting of one or more requests for a page. Unique Visitors: The number of inferred individual people, within a designated reporting timeframe, with activity consisting of one or more visits to a site. New Visitor: The number of Unique Visitors with activity including a first-ever Visit to a site during a reporting period. Repeat Visitor: The number of Unique Visitors with activity consisting of two or more Visits to a site during a reporting period. Return Visitor: The number of Unique Visitors with activity consisting of a Visit to a site during a reporting period and where the Unique Visitor also Visited the site prior to the reporting period. (Burby, Brown, and WAA Standard Committee, 2007)
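Here is a minimal sketch of turning timestamped hits into visits and then classifying visitors for one reporting period. The definitions leave the visit-ending rule to the analyst. The 30-minute inactivity timeout used here is a common convention assumed for illustration; it is not part of the quoted text. Visitor IDs are assumed to be already inferred (e.g., from a cookie), and the hypothetical `seen_before` set stands in for whatever record of earlier visits a site keeps.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple

TIMEOUT = timedelta(minutes=30)  # assumed inactivity timeout


def sessionize(hits: List[Tuple[str, datetime]]) -> Dict[str, List[List[datetime]]]:
    """Group (visitor_id, timestamp) hits into visits per visitor.

    A new visit starts when the gap since the visitor's previous hit
    exceeds TIMEOUT.
    """
    visits: Dict[str, List[List[datetime]]] = {}
    for visitor, ts in sorted(hits):
        sessions = visits.setdefault(visitor, [])
        if sessions and ts - sessions[-1][-1] <= TIMEOUT:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return visits


def classify_visitors(visits: Dict[str, List[List[datetime]]],
                      seen_before: Set[str]) -> Dict[str, int]:
    """Count unique, new, repeat, and return visitors for one reporting period."""
    return {
        "unique_visitors": len(visits),
        "new_visitors": sum(1 for v in visits if v not in seen_before),
        "repeat_visitors": sum(1 for s in visits.values() if len(s) >= 2),
        "return_visitors": sum(1 for v in visits if v in seen_before),
    }
```

As in the definitions, the categories overlap: a visitor can be both new and repeat within the same period. The timeout choice directly changes visit counts, so it is worth stating alongside any reported number.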
  • 33. Visit Characteristics. Entry Page: The first page of a visit. Landing Page: A page intended to identify the beginning of the user experience. Exit Page: The last page on a site accessed during a visit, signifying the end of a visit/session. Visit Duration: The length of time in a session. Referrer: The referrer is the page URL that originally generated the request for the current page view or object. Click-through: Number of times a link was clicked by a visitor. Click-through Rate: The number of click-throughs for a specific link divided by the number of times that link was viewed. Page Views per Visit: The number of page views in a reporting period divided by the number of visits in the same reporting period. (Burby, Brown, and WAA Standard Committee, 2007)
  • 34. Content Characterization. Page Exit Ratio: Number of exits from a page divided by the total number of page views of that page. Single Page Visits: Visits that consist of one page, regardless of the number of times the page was viewed. Single Page View Visits (Bounces): Visits that consist of one page view. Bounce Rate: Single page view visits divided by entry pages. (Burby, Brown, and WAA Standard Committee, 2007)
  • 35. Conversion Metrics. Event: Any logged or recorded action that has a specific date and time assigned to it by either the browser or server. Conversion: A visitor completing a target action. (Burby, Brown, and WAA Standard Committee, 2007)
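Given visits as ordered lists of `(timestamp, page)` hits (an assumed shape), several visit characteristics fall out directly. A minimal sketch. Visit duration here is simply the last hit minus the first hit, so a single-page visit has zero duration, a well-known limitation of log-based duration.

```python
from datetime import datetime
from typing import List, Tuple

Visit = List[Tuple[datetime, str]]  # ordered (timestamp, page) hits


def entry_page(visit: Visit) -> str:
    """The first page of the visit."""
    return visit[0][1]


def exit_page(visit: Visit) -> str:
    """The last page accessed during the visit."""
    return visit[-1][1]


def visit_duration_seconds(visit: Visit) -> float:
    """Last hit minus first hit; time spent on the final page is not observed."""
    return (visit[-1][0] - visit[0][0]).total_seconds()


def page_views_per_visit(visits: List[Visit]) -> float:
    """Total page views divided by the number of visits."""
    return sum(len(v) for v in visits) / len(visits)


def click_through_rate(clicks: int, impressions: int) -> float:
    """Click-throughs on a link divided by the number of times it was viewed."""
    return clicks / impressions if impressions else 0.0
```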
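A minimal sketch of the content characterization metrics, computed from visits represented as ordered lists of page names (an assumed shape). It follows the slide's definition of bounce rate as single-page-view visits divided by entry pages. Computed per page, the denominator is the number of visits that entered on that page. The helper `is_single_page_visit` shows the difference between a single page visit (one distinct page, possibly reloaded) and a bounce (exactly one page view).

```python
from collections import Counter
from typing import Dict, List

Visit = List[str]  # ordered page names viewed in one visit


def page_exit_ratio(visits: List[Visit]) -> Dict[str, float]:
    """Exits from each page divided by that page's total page views."""
    views = Counter(page for v in visits for page in v)
    exits = Counter(v[-1] for v in visits if v)
    return {page: exits[page] / views[page] for page in views}


def is_single_page_visit(visit: Visit) -> bool:
    """One distinct page, regardless of how many times it was viewed."""
    return len(set(visit)) == 1


def bounce_rate(visits: List[Visit]) -> Dict[str, float]:
    """Single-page-view visits divided by entries, per entry page."""
    entries = Counter(v[0] for v in visits if v)
    bounces = Counter(v[0] for v in visits if len(v) == 1)
    return {page: bounces[page] / entries[page] for page in entries}
```

Pages that were entered but never bounced get a rate of 0.0, because a `Counter` returns 0 for missing keys.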
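Conversion metrics build on recorded events. A minimal sketch, assuming each visit is a list of event names and that the target action is a hypothetical `purchase` event. Conversion rate here is converting visits divided by all visits. That is one common choice of denominator; a per-visitor rate is another, and the two can differ noticeably.

```python
from typing import List

Visit = List[str]  # ordered event names recorded during one visit


def converted(visit: Visit, target: str = "purchase") -> bool:
    """A visit converts if the visitor completed the target action."""
    return target in visit


def conversion_rate(visits: List[Visit], target: str = "purchase") -> float:
    """Converting visits divided by total visits (0.0 when there are none)."""
    if not visits:
        return 0.0
    return sum(converted(v, target) for v in visits) / len(visits)
```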
  • 36. Translating these metrics. Translating these metrics into meaningful and accurate knowledge is not always easy. Real-world example: the hotel problem (an excellent illustration of the importance of proper period selection).
  • 37. The hotel: 3 rooms over 7 days. Each day, Sam and Jane each occupy a room, and the third room has a different guest each day (Ted, Scott, Ara, Chi, Tom, Yen, Tim). Using daily uniques: 3 guests per day × 7 days = 21 total daily uniques. Using weekly uniques: Sam counts 1, Jane counts 1, and the seven other guests count 7, for 9 total weekly uniques.
  • 38. Bottom line: the time qualifier matters! You can't just add daily uniques to get weekly uniques; you have to scrub the data. This is just one example of the many issues one can face when digging into the data to get meaningful web analytics!
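The hotel arithmetic can be checked in a few lines. The sketch sums each day's distinct guests, then counts distinct guests across the whole week. The day-by-day guest sets follow the slide's example: Sam and Jane stay every day, and a different third guest stays each day.

```python
from typing import List, Set

# One set of guests per day, following the hotel example.
OTHER_GUESTS = ["Ted", "Scott", "Ara", "Chi", "Tom", "Yen", "Tim"]
WEEK: List[Set[str]] = [{"Sam", "Jane", guest} for guest in OTHER_GUESTS]


def sum_of_daily_uniques(days: List[Set[str]]) -> int:
    """Adds each day's distinct guests, so repeat guests are counted every day."""
    return sum(len(day) for day in days)


def weekly_uniques(days: List[Set[str]]) -> int:
    """Distinct guests across the whole period."""
    return len(set().union(*days))


if __name__ == "__main__":
    print(sum_of_daily_uniques(WEEK))  # 21
    print(weekly_uniques(WEEK))        # 9
```

The gap between 21 and 9 comes entirely from Sam and Jane, who are counted seven times each by the daily sum but once each by the weekly union.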
  • 39. 50 minutes = Can’t Cover Everything … some starting points for further reading
  • 40. Research Work (mine). Book: Jansen, B. J., Spink, A., and Taksa, I. (2009) Handbook of Research on Web Log Analysis. Hershey, PA: Idea Group Publishing. The first chapter, on the theory of log analysis, is free! Lecture: Jansen, B. J. (2009) Understanding User-Web Interactions via Web Analytics. Morgan-Claypool Lecture Series. Gary Marchionini (Ed.). Morgan-Claypool: San Rafael, CA. A manuscript about Web analytics, soup to nuts; companion website: http://faculty.ist.psu.edu/jjansen/webanalytics/understanding_web_analytics.html
  • 41. Research Work (mine). Article: Jansen, B. J. (2006) Search log analysis: What is it; what's been done; how to do it. Library and Information Science Research, 28(3), 407-432.
  • 42. Great ‘how to books’ for web analytics Web Analytics: An Hour a Day by Avinash Kaushik (Jun 5, 2007) Web Analytics 2.0: The Art of Online Accountability and Science of Customer Centricity by Avinash Kaushik (Oct 2009) Advanced Web Metrics with Google Analytics , 2nd Edition by Brian Clifton (Mar 15, 2010) Web Analytics Demystified: A Marketer's Guide to Understanding How Your Web Site Affects Your Business by Eric Peterson (Mar 2004)
  • 43. Thanks! (welcome questions / discussion!)
  • 45. Follow-on Discussion. Happy to chat with anyone (catch me today or contact me via email). Email: [email_address] LinkedIn: http://www.linkedin.com/in/jjansen Twitter: jimjansen
  • 46. Again, thanks!