Effective Use of the Twitter Search API
Eric Jensen
Twitter Search

Submit your questions via
http://bit.ly/chirpsearch
or hashtag #chirpsearch
Agenda
•   Mission of the Twitter Search API

•   History

•   Most recently: ranking the top results

•   What’s next
Search API Mission

Connect users with what's most
important and interesting to
them in the here and now

(return the best stuff for a query)
Search Stats
•   Over 600 million queries per day

•   Typically less than 200 milliseconds per query

•   Typically less than 20 seconds indexing
    latency

•   Index of hundreds of millions of tweets
Search API Use Cases
•   Search interfaces: collecta, oneriot, crowdeye, ...

•   Dashboard clients: tweetdeck, seesmic, ...

•   Widgets: twitter, tweetgrid, monitter, ...

•   Location search: trendsmap, foursquare, ...

•   Visualizations: radian6, crimsonhexagon, twistori, ...

•   Analytics: stocktwits, trendrr, tweetstats, ...

•   Recommenders: mrtweet, ...

•   Thousands not listed here + not invented yet
Search vs. Streaming
•   Do use the search API for your app when:

    •   The user can input a query

    •   You need immediate results, not tracking

•   Don’t use the search API for your app when:

    •   Your user experience requires comprehensive
        results (all the tweets, not just the best ones)

    •   You only need tweets from/to/at particular users
Refreshing Results
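[Diagram: request/response sequence between Client and API]

Client -> API:   search.json?q=twitter
API -> Client:   "refresh_url":"?since_id=9290798834&q=twitter"

(~20 seconds)

Client -> API:   search.json?since_id=9290798834&q=twitter
API -> Client:   "refresh_url":"?since_id=9290800152&q=twitter"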
Why is this OK?
[Diagram: two requests and the layers behind them]

Left:    search.json?q=twitter
         Timeline Cache
         Search Index

Right:   search.json?since_id=9290798834&q=twitter
         Timeline Cache (entry for q=twitter holding tweets 1 2 3 4)
         Tweets
Search API History

[Timeline: Twitter Search API]

•   Apr 4, 2008: Summize Launches Twitter Search

•   Jul 14, 2008: Summize Acquired by Twitter

•   Apr 1, 2009: Search on Twitter.com

•   Nov 5, 2009: Quality Filtering on Trends

•   Jan 6, 2010: Local Trends

•   Apr 1, 2010: Top Results Include Popular

•   Apr 15, 2010: Chirp!
Ranking Top Results
             • Best stuff for a query

             • Many factors

             • First step

             • Available from API
Top Results API
•   New parameter: result_type

    •   mixed: Eventually this will become the
        default value. Include both popular and real
        time results in the response.

    •   recent: The current default value. Return
        only the most recent results in the response.

    •   popular: Return only the most popular
        results in the response.
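
As a rough sketch of how a client might guard the new parameter, the helper below accepts only the three values listed above and URL-encodes the query. The function name buildSearchUrl and the choice to throw on an unknown value are assumptions made for illustration, as is the endpoint host search.twitter.com; none of this is taken from an official client.

```javascript
// The three values named on the slide; "recent" was the default at the time of the talk.
const RESULT_TYPES = ["mixed", "recent", "popular"];

// buildSearchUrl is a hypothetical helper, not part of any Twitter library.
function buildSearchUrl(query, resultType = "recent", format = "json") {
  if (!RESULT_TYPES.includes(resultType)) {
    throw new RangeError("unknown result_type: " + resultType);
  }
  // Encode the query so characters such as "#" and spaces survive in the URL.
  return "http://search.twitter.com/search." + format +
    "?q=" + encodeURIComponent(query) +
    "&result_type=" + resultType;
}

console.log(buildSearchUrl("#chirp search", "mixed"));
// prints http://search.twitter.com/search.json?q=%23chirp%20search&result_type=mixed
```

Rejecting unknown values early is a design choice for the sketch; a client could just as well fall back to "recent".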
Top Results Metadata
{"results":[
     {"text":"@twitterapi  http://tinyurl.com/ctrefg",
      "from_user":"jkoum",
      "metadata":
      {
       "result_type":"popular",
       "recent_retweets": 100
      },
      "id":1478555574,
      ...
Top Results API Example
        • Initial load includes top results

        • Metadata annotates them

        • Refreshes recent results on top
Include Top Results
url =
  'http://search.twitter.com/search.' +
  format +
  '?q=' + encodeURIComponent(query) +
  '&result_type=mixed'
Annotate w/ Metadata
if (tweet.metadata.result_type ==
     'popular') {
    return '<div class="twtr-popular">' +
     tweet.metadata.recent_retweets +
     ' recent retweets</div>';
}
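
One possible way to use the same metadata before rendering is to separate popular results from recent ones, so a client can decide where each group goes. The sketch below assumes the response shape shown on the Top Results Metadata slide; splitByResultType is a hypothetical name, the second sample tweet is invented, and treating a missing metadata block as "recent" is an assumption rather than documented behavior.

```javascript
// Sample shaped like the slide's response; the second entry is invented for illustration.
const response = {
  results: [
    { id: 1478555574, from_user: "jkoum",
      text: "@twitterapi  http://tinyurl.com/ctrefg",
      metadata: { result_type: "popular", recent_retweets: 100 } },
    { id: 1478555600, from_user: "example_user", text: "hello",
      metadata: { result_type: "recent" } }
  ]
};

// splitByResultType is a hypothetical helper name.
function splitByResultType(results) {
  const popular = [];
  const recent = [];
  for (const tweet of results) {
    // Assumption: a result without metadata is treated as a plain recent result.
    const type = tweet.metadata ? tweet.metadata.result_type : "recent";
    (type === "popular" ? popular : recent).push(tweet);
  }
  return { popular, recent };
}

const groups = splitByResultType(response.results);
console.log(groups.popular.length, groups.recent.length);
```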
Refresh Recent Results
refresh_url = response.refresh_url

...

url =
  'http://search.twitter.com/search.' +
  format +
  refresh_url
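
Putting the refresh step into a loop could look roughly like the sketch below: the first request sends the query, each later request reuses the refresh_url cursor (which carries since_id), and new results are placed on top of those already shown. The names pollOnce and fakeApi, the state object, and the canned responses are all assumptions for illustration. The HTTP call is injected as fetchJson so the sketch runs without network access.

```javascript
const BASE = "http://search.twitter.com/search.json";

// fetchJson is injected; a real client would issue the HTTP request and parse the body.
async function pollOnce(state, fetchJson) {
  // First request uses the query; later requests reuse the refresh_url cursor.
  const suffix = state.refreshUrl || "?q=" + encodeURIComponent(state.query);
  const response = await fetchJson(BASE + suffix);
  return {
    query: state.query,
    refreshUrl: response.refresh_url || state.refreshUrl,
    // New results go on top of the ones already shown.
    tweets: response.results.concat(state.tweets)
  };
}

// Stand-in for the API, for illustration only: returns one canned page per call.
function fakeApi(pages) {
  const requested = [];
  const fetchJson = async (url) => { requested.push(url); return pages.shift(); };
  return { fetchJson, requested };
}

async function demo() {
  const api = fakeApi([
    { results: [{ id: 2 }, { id: 1 }], refresh_url: "?since_id=2&q=twitter" },
    { results: [{ id: 3 }], refresh_url: "?since_id=3&q=twitter" }
  ]);
  let state = { query: "twitter", refreshUrl: null, tweets: [] };
  state = await pollOnce(state, api.fetchJson);
  // A real client would wait between polls (the slides show roughly 20 seconds).
  state = await pollOnce(state, api.fetchJson);
  return { state, requested: api.requested };
}

demo().then(({ state }) => console.log(state.tweets.map((t) => t.id)));
```

Keeping the cursor in a small state object, rather than rebuilding since_id by hand, means the client never has to interpret tweet ids itself.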
The Near Future
•   Remove duplicates (retweets)

•   Deeper index

•   Hit highlighting in the API

•   More consistency (with the REST API)

•   Better rate limiting
The Future (cont)
•   More relevance

•   More metadata

•   More stuff

•   More operators

    •   places, @anywhere, annotations
Open Source in Search
•   http://twitter.com/about/opensource

    •   mysql, hadoop, kestrel, twitter-text, etc.

•   lucene

•   commons-pipeline

•   varnish

•   jmeter

•   nutch language identifier

•   mecab
We’re Hiring
•   http://twitter.com/jobs

•   Data Analyst - Search

•   Product Manager - Search

•   Software Engineer - Search

•   Software Engineer - Search Front-End

•   Software Engineer - Search Relevance
Questions?

http://bit.ly/chirpsearch
or hashtag #chirpsearch

Also join us at the Real-Time
Search Birds of a Feather @
1:30 in The Coop

Editor's Notes

  • #4: i will talk about:
    - start by giving some of our thinking about why we have a search api and what differentiates it from the other apis twitter offers
    - i’ll get into some technical implications of these differences with respect to polling on search versus tracking keywords on the streaming api
    - next, i’ll talk briefly about how the search api has changed over time
    - and then we’ll dig into the most recent change where we began ranking the top results beyond recency order. i’ll show you how i’ve modified one of our own search api clients to take advantage of that change
  • #5: simple definition: user provides a query by engaging with an api application, we provide the best stuff (currently tweets and trends) for that query. Obviously the “best” stuff for twitter has a lot to do with how recent it is, so our primary focus is on the “here and now”
  • #6: Just to give you an idea of the parameters search operates under:
    - as ev told you yesterday we are doing more than 600M queries per day, seen up to 750M on a day recently
    - while realtime is our main focus, our index does contain hundreds of millions of tweets and we’ve roughly doubled its size in the last six months.
    - of course, the amount of tweets has grown even faster than we’ve increased that index size, so this only covers about a week of them right now, but that is something we’re currently working on expanding
  • #7: So obviously we’re operating at a large scale, but what’s really interesting to me about the search API is the variety of applications you as developers have found for it. I’ve listed just a few here to illustrate what people are currently doing with the API.
  • #8: So that’s what people are doing with the search api, but the streaming api also supports tracking keywords and some location and language filtering. So, if you’re developing a new app, how do you decide which to use?
  • #9: The biggest difference between the search API and the track API is how you get new results matching your standing query. On the streaming API the push model makes this obvious: new results are sent to you as they come in. Since the focus of the search API is on apps that let the user manipulate the query (whether explicitly or implicitly), registering a standing query for every request makes less sense. Instead, the search API uses a polling model with a cursor. --- make sure you explain this diagram by pointing at it (or at least describing it). It took me a minute to get the visual presentation
  • #10: One question that comes up frequently is why we encourage apps to use this cursor to poll and how that helps us to support refreshes more efficiently, so here’s a diagram of what happens under the covers. A lot like the streaming API, when you make any query to search we actually do register that as a standing query, but only in one of our caching layers we call the timeline cache.
  • #11: Next I’d like to take a step back and talk briefly about the history of the search API and how our thinking about it has developed. twitter search and the API have been around for about two years now, and we made a lot of changes early on like supporting location search, but after that we had to shift our focus to scaling the system to support the growth in tweets and queries. It’s really just in the last six months that we’ve made enough progress with scaling and grown the search team enough to be able to focus more on relevance and figuring out what that means for twitter search.
  • #12: Our mission: ---- Under “many factors” you should note that it’s not always the popular users that show up here -- that seems to be an early misconception. Our algorithm looks to find things that are interesting from any user - things that “resonate,” to use a word that Dick talked about yesterday (good to tie it in to other things being said at Chirp). Rather than “not final” (which seems to imply there is a “final” step when we won’t be improving this) I’d say something like “First step of a long road of relevance improvements” (implying that we’ve got lots of ideas and we’ll be delivering cool stuff for a long way).
  • #13: right now at the top
  • #18: explain that this uses since_id
  • #23: we want to hear from you