Web2.0.2012 - lesson 8 - Google world

  • 1. Web 2.0: blog, wiki, tag, social network: what they are, how to use them and why they are important. Lesson 8: the Google world
  • 2. This material is distributed under the Creative Commons "Attribution - NonCommercial - Share Alike 3.0" license, available at http://creativecommons.org/licenses/by-nc-sa/3.0/. Some of the slides are the result of a welcome remote collaboration with Prof. Roberto Polillo, University of Milano-Bicocca (http://www.rpolillo.it)
  • 3. Google: searching
    Each search engine has three main components:
    - Crawler
    - Database (index)
    - Interface and query software
    The crawler is a software program that surfs the net and adds the pages it fetches to the index. The crawler also takes note of the links it finds and uses them to gradually reach new pages with new links.
    The index is a huge database where pages are stored with all their metadata and where all the words are "inverted", that is, an index entry (key) is created for each word, pointing to the pages that contain it.
    The interface receives the user's request, tries to interpret it and passes it to the "query processor", which works on the index.
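The three components above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions: the `WEB` dictionary, the `*.example` addresses and the function names are hypothetical, the "crawler" reads from memory instead of the network, and the query processor simply requires every word to match.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example": ("google search engine", ["b.example", "c.example"]),
    "b.example": ("search engine crawler", ["c.example"]),
    "c.example": ("inverted index database", ["a.example"]),
    "d.example": ("unreachable page", []),
}

def crawl(seed):
    """Crawler: fetch a page, store it, and follow its links to new pages."""
    pages, queue, seen = {}, deque([seed]), {seed}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        pages[url] = text
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Index: inverted mapping from each word to the URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Query processor: return the URLs that contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

pages = crawl("a.example")
index = build_index(pages)
print(sorted(search(index, "search engine")))
```

Note that `d.example` never enters the index: no crawled page links to it, which is exactly why real crawlers depend on links to discover new pages.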
  • 4. Google: searching. Search engine schema: http://en.wikipedia.org/wiki/Search_engine
  • 5. Google: searching
    Searches are usually very short: about 20% consist of a single word, almost 50% of two or three words, and only 5% of more than six words.
    Searches are also distributed according to a "long tail" curve: approximately 50% of daily searches are unique. Do you know GoogleWhacking?
    About 90% of users use the top four engines: Google, Yahoo!, AOL and Bing (Google > 50%).
    Traffic on search engines has two peaks: one in the morning (at the office) and one in the evening (after returning home).
    The approximate cost of acquiring a customer ranges from $70 for mail advertising and $50 for online advertising, through $20 for the yellow pages, down to $8 (!) for search-related links.
  • 6. Google: “old” searching
    First search engines:
    - Archie, 1990 (FTP, command-line queries)
    - Veronica, 1993 (Gopher; searched document titles only)
    - WebCrawler, 1994, the first to index the text of the pages
    First good search engine: AltaVista (1995), born in the DEC laboratories; thanks to the 64-bit Alpha processor it could run a thousand crawlers simultaneously. In its first year AltaVista answered 4 billion searches! After being sold to Compaq, AltaVista was transformed into a portal.
    Yahoo!: born as "Jerry and David's Guide to the World Wide Web" with a directory approach (see archive.org), it became a great success thanks to its link with Netscape. Yahoo! used its own directory service, while for search it relied on external engines: OpenText, AltaVista, then Inktomi and Google. 2009: agreement between Yahoo! and Microsoft on Bing.
    http://ppcblog.com/search-history/
    http://www.searchenginehistory.com/
    http://performancing.com/search-engine-history/
  • 7. Google: born
    Brin and Page studied at Stanford, and Page chose “the Web as a graph” as his thesis topic, with Terry Winograd as advisor.
    The BackRub project (1995) was a system to find links on the Web, then store and republish them for analysis, in order to see which pages point to a given page.
    In 1996 BackRub began to index the Web and, by interpreting the link graph, also to assess the relative importance of sites. This was the origin of the basic concept behind the PageRank algorithm, which takes into account both the number of links a site receives and the number of links received in turn by each of the sites linking to it.
    In 1998 Brin and Page described PageRank in the paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" and founded Google Inc., based in the classic garage.
    [Image caption: Then (1994)]
  • 8. Google: the algorithm
    The secret of Google's success lies in its algorithm, which is obviously kept confidential, even though its most important features can be found on the net.
    An SEO expert has proposed the “Randfish theorem" (http://www.seomoz.org/), a hypothesis about Google's scoring method:
    Google Score = (Keywords used * 0.3) + (Domain relevance * 0.25) + (Incoming links * 0.25) + (User data * 0.1) + (Content quality * 0.1) + (Manual push) - (Automatic & manual penalties)
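The hypothesized score is a plain weighted sum, which a short Python sketch makes concrete. The weights come from the slide; the function name, the factor keys and the assumption that every factor is rated on a 0-to-1 scale are illustrative only, and nothing here reflects how Google actually computes rankings.

```python
# Weights taken from the hypothesized formula on the slide.
WEIGHTS = {
    "keywords": 0.30,
    "domain_relevance": 0.25,
    "incoming_links": 0.25,
    "user_data": 0.10,
    "content_quality": 0.10,
}

def google_score(factors, manual_push=0.0, penalty=0.0):
    """Weighted sum of the factor scores, plus manual push, minus penalties.

    `factors` maps each key of WEIGHTS to a score; a 0..1 scale is assumed.
    """
    weighted = sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
    return weighted + manual_push - penalty

# Hypothetical page with made-up factor scores.
page = {
    "keywords": 0.8,
    "domain_relevance": 0.6,
    "incoming_links": 0.4,
    "user_data": 0.5,
    "content_quality": 0.9,
}
print(round(google_score(page, penalty=0.05), 3))
```

Since the five weights add up to 1, a page with perfect scores on every factor and no push or penalty would score exactly 1 under this hypothesis.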
  • 9. Google: the algorithm
    Factors in keyword use:
    * Keywords in the title tag
    * Keywords in header tags
    * Keywords in the document text
    * Keywords in internal links pointing to the page
    * Keywords in the domain name and / or URL
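As a rough illustration of the factors listed above, the following Python sketch checks where a keyword appears in a page, using the standard library `html.parser` module. The class and function names and the sample page are hypothetical, the "internal links" factor is left out for brevity, and a real ranking system would of course weigh far more than simple presence.

```python
from html.parser import HTMLParser

class KeywordScanner(HTMLParser):
    """Sorts page text into three buckets: title, header tags, body."""
    HEADERS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags
        self.text = {"title": [], "header": [], "body": []}

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if "title" in self.stack:
            self.text["title"].append(data)
        elif self.HEADERS & set(self.stack):
            self.text["header"].append(data)
        else:
            self.text["body"].append(data)

def keyword_factors(keyword, url, html):
    """Report whether the keyword occurs in the title, headers, body and URL."""
    scanner = KeywordScanner()
    scanner.feed(html)
    kw = keyword.lower()
    found = {
        place: kw in " ".join(chunks).lower()
        for place, chunks in scanner.text.items()
    }
    found["url"] = kw in url.lower()
    return found

# Hypothetical page used only for illustration.
html = (
    "<html><head><title>Google world</title></head>"
    "<body><h1>Search engines</h1><p>How Google ranks pages.</p></body></html>"
)
print(keyword_factors("google", "http://example.org/google-lesson", html))
```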
  • 10. Google: the algorithm
    Domain relevance:
    * Registration history
    * Domain “age”
    * Importance of the links pointing to the domain
    * Relevance of the domain to the subject, based on incoming and outgoing links
    * Historical use and patterns of the links to the domain
    Score of incoming links:
    * Link “age”
    * Quality of the domains sending the link
    * Quality of the pages sending the link
    * Link text
    * Assessment of the quantity / weight of the links (PageRank)
    * Relevance of the pages sending the link
  • 11. Google: the algorithm
    User data:
    * Historical click-through rate (CTR) on search engine results pages
    * Time spent by users on the page
    * Number of searches for the URL / domain name
    * History of visits to / usage of the URL / domain name that Google can monitor through its users (toolbar, Wi-Fi, Analytics, etc.)
    Content quality:
    * Potentially rated by hand for the most popular searches and pages
    * Provided by Google's internal evaluators
    * Automated algorithms that assess the text (quality, readability, etc.)
  • 12. Google: the algorithm
    The original patent (1998): U.S. Patent # 6,285,999, METHOD FOR NODE RANKING IN A LINKED DATABASE
    "A method assigns importance ranks to nodes in a linked database, such as any database of documents containing citations, the world wide web or any other hypermedia database. The rank assigned to a document is calculated from the ranks of documents citing it. In addition, the rank of a document is calculated from a constant representing the probability that a browser through the database will randomly jump to the document. The method is particularly useful in enhancing the performance of search engine results for hypermedia databases, such as the world wide web, whose documents have a large variation in quality."
    Inventor: Page; Lawrence (Stanford, CA)
    Assignee: The Board of Trustees of the Leland Stanford Junior University (Stanford, CA)
  • 13. Google: the algorithm
    The simplified formula (see http://en.wikipedia.org/wiki/PageRank), where:
    * PR[A] is the PageRank value of page A
    * PR[B] ... PR[n] are the PageRank values of pages B ... n linking to A
    * L[B] ... L[n] are the total numbers of outgoing links on pages B ... n
    * d (damping factor) is the probability that an imaginary surfer who is randomly clicking on links will go on clicking. The damping factor is generally assumed to be around 0.85. It represents the share of PageRank passed on from one page to another
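The formula image itself did not survive in this transcript. A form consistent with the symbols defined above, and matching the one published in the Brin and Page paper, is PR[A] = (1 - d) + d * (PR[B]/L[B] + ... + PR[n]/L[n]); treat this as an assumption about what the slide showed (the Wikipedia article uses a variant that divides the (1 - d) term by the number of pages). A minimal Python sketch of the iterative computation, with a hypothetical three-page graph:

```python
def pagerank(links, d=0.85, iterations=50):
    """Iteratively apply PR[A] = (1 - d) + d * sum(PR[B] / L[B]).

    `links` maps each page to the list of pages it links to.
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # arbitrary starting values
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each page B linking here passes on PR[B] split over its L[B] links.
            incoming = sum(
                pr[src] / len(links[src])
                for src in pages
                if page in links[src]
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Hypothetical graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page, value in sorted(ranks.items()):
    print(page, round(value, 3))
```

In this small graph C ends up with the highest value because it receives links from both other pages, while B, which only gets half of A's vote, ends up lowest.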
  • 14. Google: the algorithm PageRank in detail (from www.google.com/corporate/tech.html ) PageRank reflects our view of the importance of web pages by considering more than 500 million variables and 2 billion terms. Pages that we believe are important pages receive a higher PageRank and are more likely to appear at the top of the search results. PageRank also considers the importance of each page that casts a vote, as votes from some pages are considered to have greater value, thus giving the linked page greater value. We have always taken a pragmatic approach to help improve search quality and create useful products, and our technology uses the collective intelligence of the web to determine a page's importance.
  • 15. Google: the algorithm Hypertext-Matching Analysis: Our search engine also analyzes page content. However, instead of simply scanning for page-based text (which can be manipulated by site publishers through meta-tags), our technology analyzes the full content of a page and factors in fonts, subdivisions and the precise location of each word. We also analyze the content of neighboring web pages to ensure the results returned are the most relevant to a user's query.
  • 16. Other links about search engines
    http://docs.google.com/View?id=dfvwdtqp_1c8x6bmd8
    https://docs.google.com/present/view?id=dfvwdtqp_31dqxqk8g9&ndplr=1
    https://docs.google.com/present/view?hl=en&id=dfvwdtqp_35hq27gfhk
    http://www.wired.com/magazine/2010/02/ff_google_algorithm/all/1
  • 17. Google
    The Google search engine is now the most important access point to the network: http://gs.statcounter.com/#search_engine-ww-monthly-200807-201104
    Searching on Google, or "to google", is now part of common language. You don't know? Ask Google!
    Google (BigG!) now offers many services, and a big part of the Web 2.0 world belongs to it: YouTube, Google Earth / Maps / Calendar / Reader, ...
    Google has also entered the browser market with Chrome and the mobile market with Android:
    http://en.wikipedia.org/wiki/Usage_share_of_web_browsers
    http://blog.nielsen.com/nielsenwire/online_mobile/who-is-winning-the-u-s-smartphone-battle/
    http://blog.nielsen.com/nielsenwire/consumer/more-us-consumers-choosing-smartphones-as-apple-closes-the-gap-on-android/
  • 18. Google Dance
    Google periodically updates its engine's algorithms to penalize what it considers spam produced by SEM / SEO (Search Engine Marketing / Optimization) specialists: position in the index is so important that many websites are created containing nothing but links, in order to push up the ranking of the sites that pay for them.
    There is no doubt that this continuous fight against commercial spamming also serves to "push" the AdWords advertising service.
    Other frauds are possible with AdSense, where site owners earn from clicks on the sponsored links shown on their sites: sometimes robot programs are used, sometimes offshore workers, to click on the links and collect the money (an estimated 30% of advertising budgets is lost this way).
    AdSense has helped to create the long tail of advertising, bringing hundreds of thousands of businesses to advertise and thousands of sites to host their ads.
    https://www.google.com/adsense/static/en/Publishertools.html
  • 19. Google
    In 2007 Big Brother Award Italy awarded Google the dubious prize for "most invasive technology", motivating the decision this way:
    "Brin, one of the founders of Google, likes to tell his employees "Don't Be Evil", and this became the company slogan. Admiration for Google, its services and its success as a company cannot hide the fact that every search, every e-mail and every post on Google Groups is recorded and analyzed, even when anonymous, and that all the analysis converges on profiling the user. Google, given its size, is potentially the most threatening entity in the world for privacy. With the recent purchase of DoubleClick.com, a giant of online advertising and profiling, which enlarges Google's data mining potential, it seems that the motto could now become "Don't Be Evil, buy the Devil.""
    http://en.wikipedia.org/wiki/Criticism_of_Google
  • 20. Google AdWords
    AdWords (introduced in 2000) is Google's main advertising service and its main source of revenue (> $28 billion in 2010).
    Advertisers specify the search words that make their ads appear on the right of the search results page ("sponsored links").
    The advertiser pays when the user clicks on the ad (Pay Per Click), and the price per click is determined by complex rules.
    The service is managed online: the software does all the work (negotiation, sale, execution).
    http://en.wikipedia.org/wiki/AdWords
    http://adwords.google.com
    http://investor.google.com/financial/tables.html (advertising accounts for a big part of income)
  • 22. Google AdWords: the top queries cover only 3% of the total -> long tail
    http://bnoopy.typepad.com/bnoopy/2005/03/the_long_tail_o.html
    (see Google AdWords Intro.odp)
  • 23. Google AdSense
    With this service, Google "administers" the advertising space on the web pages of its customers' sites.
    Google places ads in the web pages according to criteria of semantic correlation with the pages of the host site.
    The host site is paid "per click".
    AdSense has brought hundreds of thousands of small businesses to advertise and thousands of sites to host their ads.
    Google currently shares 68% of the revenues generated by AdSense with content network partners.
    http://en.wikipedia.org/wiki/AdSense
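The pay-per-click revenue split described above comes down to simple arithmetic. The sketch below uses the 68% share quoted on the slide; the function name, the click count and the cost per click are made-up values for illustration, and actual payouts depend on rules not covered here.

```python
PUBLISHER_SHARE = 0.68  # share of AdSense revenue quoted on the slide

def split_revenue(clicks, cost_per_click):
    """Split click revenue between the host site and Google."""
    revenue = clicks * cost_per_click
    publisher = revenue * PUBLISHER_SHARE
    return publisher, revenue - publisher

# Hypothetical month: 1000 clicks at $0.50 each.
publisher, google = split_revenue(1000, 0.50)
print(f"host site: ${publisher:.2f}, Google: ${google:.2f}")
```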
  • 24. Google AdSense (R. Polillo - October 2010)
  • 25. Google Operating Systems
    Android: open-source, Linux-based platform for developing mobile device applications
    Google Chrome OS: platform for netbooks / notebooks
    “Google Chrome OS is an open source, lightweight operating system that will initially be targeted at netbooks. Later this year we will open-source its code, and netbooks running Google Chrome OS will be available for consumers in the second half of 2010. (...) Google Chrome OS will run on both x86 as well as ARM chips and we are working with multiple OEMs to bring a number of netbooks to market next year. The software architecture is simple: Google Chrome running within a new windowing system on top of a Linux kernel.”
    http://getchrome.eu/index.php
  • 26. Google Operating Systems
    Android: see Android.ppt
    http://www.android.com/about/
    Google Chrome OS: first systems in 2011
    http://www.google.com/chromeos/features.html
    http://www.chromium.org/chromium-os
    http://www.chromium.org/chromium-os/chromiumos-design-docs/software-architecture
  • 27. Google tricks
    Google explains what information is collected when using the search engine and what is done to protect users' privacy: http://www.youtube.com/watch?v=iPkvNr2cpqg
    http://www.google.com/webmasters/docs/search-engine-optimization-starter-guide.pdf
    Search in blogs: http://blogsearch.google.it/
    Search history: http://www.google.com/history
    Site comparison: http://www.google.com/insights/search/
    Other: http://www.google.com/intl/en/options/ and http://labs.google.com/
  • 28. Exercise 8: shortest GoogleWhacking (one or two words)
  • 29. Try some searches on Google, Bing and Yahoo! and report on the differences between them
  • 30. Analyze AdWords and give your opinion on it
  • 31. Give your opinion about the future of Chrome OS