Floyd Morgan
Floyd_Morgan@intuit.com
       @fmorgan
 Lucene Revolution, 2011
Agenda
•    About Me
•    About Live Community
•    Live Community Search
•    NLP
•    Next Steps
•    Questions? Answers?
Intuit QuickBase




Intuit Inc. is a leading provider of business and financial management solutions for small and mid-sized businesses; financial institutions, including banks and credit unions; consumers and accounting professionals.

More than 200 applications and 7700 employees worldwide.
TurboTax is the nation’s No. 1 rated, best-selling, do-it-yourself tax preparation software. TurboTax helps more than 20 million people a year.
$1 billion in revenue
About Me
•  Principal Software Engineer at Intuit
•  TurboTax Engineering
  –  Core tax engine
  –  TurboTax Online
  –  TurboTax Live Community
•  Central Technology Organization
  –  Live Community Platform
Morgan Floyd - Intuit's Live Community
About Live Community
•  It’s a user contribution system
    –  Q&A
•  It can be integrated into an application, contextually
    –  Page-to-page relevance
•  We use social, technology and data
    –  To create our value proposition…assisting users
•  We launched our Beta in 2007
    –  TurboTax Online Home & Business
•  We use open source…primarily open source
    –  Apache HTTP, Ruby on Rails, MySQL, memcached ...
•  It’s a platform
    –  APIs, skinning, dynamic provisioning (AWS in progress)
Intuit Money Manager, India
QuickBooks Online, UK
devZone, Intuit dev
QuickBooks Online, US
TurboTax Desktop & Online, US
Terminology
Consumers (in the millions)
Contributors (in the thousands)
Top Contributors (in the hundreds)
Employees (contribute too)
Tax Season


Officially begins on December 1 and ends on April 15.
About TurboTax Live Community
•  Largest community
    –  150+ servers, 200 thousand concurrent users
•  Over 23 million users have used the service
    –  Over 8 million last tax season alone
•  Over 32 million page views last tax season
    –  In-product views in the billions
•  Over 750 thousand answered questions
    –  10 thousand questions asked on peak day
•  Our contributors answer thousands of questions
    –  Top contributor – 70 thousand answers
Demo
Live Community Search
•    Why Solr?
•    Auto suggest
•    In-product search
•    Web-site search
•    Instant answer
•    Instant question
•    Answer bot
•    Advertising
•    Search everywhere
•    Architecture
Why Solr?
•  Lots of features/functionality
•  Ease of integration
•  We can scale it independently
•  You’ll need some search expertise…that’s ok
   –  Community and Lucid Imagination!
•  Search is really important
   –  Search everywhere…
Auto suggest
•    Provides a glimpse of our vast content
•    Facet query (Solr 1.2)
•    We use NLP…
•    It’s used on every search touch point
•    Second most frequent request
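One common way to implement auto-suggest with a facet query is Solr's `facet.prefix` parameter: facet over a field of suggestion phrases and keep only the terms that start with what the user has typed. The slides say only "facet query", so this is a sketch under that assumption; the field name `suggest_terms` (assumed to be an untokenized field of whole phrases) and the helper names are hypothetical. It builds the request URL and turns Solr's flat JSON `facet_fields` list into suggestions.

```ruby
require 'uri'

# Hypothetical helper: asks Solr for indexed phrases that start with the typed
# prefix by faceting on an (assumed) untokenized field of suggestion phrases.
def autosuggest_url(base_url, prefix, field: 'suggest_terms', limit: 10)
  params = {
    'q'              => '*:*',            # facet over the whole index
    'rows'           => 0,                # only facet counts, no documents
    'facet'          => 'true',
    'facet.field'    => field,
    'facet.prefix'   => prefix.downcase,  # assumes the field is indexed lowercase
    'facet.limit'    => limit,
    'facet.mincount' => 1,
    'wt'             => 'json'
  }
  "#{base_url}/select?#{URI.encode_www_form(params)}"
end

# Solr's JSON writer returns facet_fields as a flat list by default:
# ["term1", count1, "term2", count2, ...]
def suggestions(response, field: 'suggest_terms')
  flat = response.dig('facet_counts', 'facet_fields', field) || []
  flat.each_slice(2).map { |term, count| { term: term, count: count } }
end
```

Faceting suits this because the counts come back already ranked by how often each phrase occurs, which doubles as a popularity signal for the suggestions.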
In-product “mini” search
•      Primary search interface for consumers
•      It appears integrated
•      Now the most utilized search interface
•      It makes all content available
•      Over 3 million users last tax season

# using Solr is easy!
require 'solr'

c = Solr::Connection.new("http://localhost:8090/solr/posts")
# only answered posts (Post::ANSWERED is the app's status constant)
c.search("how do i input 1099",
         :filter_queries => "post_status:#{Post::ANSWERED}")
Web-site “full” search
•  Primary search interface for contributors
   and employees
•  More real estate, more facets, more
   suggestions ...
•  Faceted search lets development teams narrow in on issues
•  200+ TurboTax issues discovered last tax
   season
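Faceted narrowing like this is usually built from two kinds of Solr parameters: `facet.field` to return counts for each facet, and `fq` filter queries to apply the values a user has clicked. A minimal sketch that assembles them; the field names `product` and `tax_year` and the helper `faceted_params` are hypothetical, not the Live Community schema.

```ruby
require 'uri'

# Hypothetical sketch: builds Solr request parameters for a faceted search.
# Keys such as facet.field and fq repeat, so this uses an array of pairs
# rather than a Hash.
def faceted_params(query, facet_fields, selected = {})
  params = [['q', query], ['facet', 'true'], ['facet.mincount', '1']]
  facet_fields.each { |f| params << ['facet.field', f] }
  # each facet value the user clicked becomes a filter query;
  # values are quoted, so they must not themselves contain quotes
  selected.each { |field, value| params << ['fq', %(#{field}:"#{value}")] }
  params
end

# Example: search for "bug", show product and tax-year facets,
# and narrow to tax year 2010
query_string = URI.encode_www_form(
  faceted_params('bug', %w[product tax_year], { 'tax_year' => '2010' })
)
```

Filter queries are cached by Solr independently of the main query, which is one reason facet selections are commonly expressed as `fq` rather than folded into `q`.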
# using Solr is easy!
require 'solr'

c = Solr::Connection.new("http://localhost:8090/solr/posts")
# open (unanswered) posts that mention "bug"
c.search("bug", :filter_queries => "post_status:#{Post::OPEN}")
Instant answer
•    Present similar answered questions
•    Search with the terms of the new question
•    Narrow the focus to the subject
•    Show snippet of a recommended answer
•    Accidental A/B test
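The instant-answer steps above map onto a DisMax request that searches only the `subject` field, filters to answered posts, and uses highlighting to produce the snippet. This is a sketch of the request parameters under assumptions: Solr 1.3 or later (for `defType`), a hypothetical `answer` field holding the recommended answer, and a hypothetical helper name. The talk does not say how its snippets were actually generated.

```ruby
require 'uri'

# Hypothetical sketch of an "instant answer" lookup request.
def instant_answer_params(question, answered_status, rows: 3)
  {
    'q'           => question,                          # terms of the new question
    'defType'     => 'dismax',                          # parser designed for raw user input
    'qf'          => 'subject',                         # narrow the match to subjects
    'fq'          => "post_status:#{answered_status}",  # answered questions only
    'rows'        => rows,
    'hl'          => 'true',                            # highlighting yields the snippet
    'hl.fl'       => 'answer',                          # assumed field with the answer text
    'hl.snippets' => 1,
    'hl.fragsize' => 150
  }
end

query_string = URI.encode_www_form(instant_answer_params('how do i input 1099', 2))
```

Solr's highlighter does not require the highlighted field to be the one that matched by default, so query terms found in the answer text can be highlighted even though only `subject` is searched.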
Demo
# using Solr is easy!
require 'solr'

c = Solr::Connection.new("http://localhost:8090/solr/posts")
# match the new question's terms against post subjects only
c.search("how do i input 1099",
         { :query_fields => "subject",
           :filter_queries => "post_status:#{Post::ANSWERED}" })
Instant question
•  Present similar unanswered questions
•  Answer reuse
•  Search with the terms of the answered
   question
•  Narrow the focus to the subject
•  We also use a date filter
“Aren’t we addicted enough!”
Demo
# using Solr is easy!
require 'solr'

c = Solr::Connection.new("http://localhost:8090/solr/posts")
# start of today (local time); beginning_of_day and 7.days.ago come from ActiveSupport (Rails)
today = Time.now.beginning_of_day
# Solr date ranges expect UTC ISO 8601, e.g. 2011-05-14T00:00:00Z
date_from = 7.days.ago(today).utc.iso8601
c.search("how do i input 1099",
         { :query_fields => "subject",
           :filter_queries => "post_status:#{Post::OPEN} AND created_at_d:[#{date_from} TO *]" })
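The example above relies on Rails/ActiveSupport for its date arithmetic. For readers without Rails, here is a plain-Ruby sketch that builds the same kind of filter string. `recent_open_filter` is a hypothetical name, and the fixed 86,400-second day ignores daylight-saving shifts that ActiveSupport's calendar-aware `days.ago` would handle.

```ruby
require 'time'

SECONDS_PER_DAY = 86_400

# Hypothetical helper: filter for open questions created since local
# midnight `days` days ago, formatted as the UTC ISO 8601 string Solr expects.
def recent_open_filter(now, open_status, days: 7)
  midnight = Time.new(now.year, now.month, now.day, 0, 0, 0, now.utc_offset)
  from = (midnight - days * SECONDS_PER_DAY).utc.iso8601
  "post_status:#{open_status} AND created_at_d:[#{from} TO *]"
end
```

For example, with `now` at 15:30 UTC on 2011-05-21 and an open status of 2, the range starts at `2011-05-14T00:00:00Z`.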
Answer bot
•  We continue to search for you
  –  The day after you ask
•  Send an email
•  Runs for 7 days
•  We only send another email if the results
   have changed
•  From our explicit feedback
  –  39% said it answered their question
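The answer bot's scheduling rules above can be sketched as a single decision function. This is a minimal sketch under assumptions: the bot runs once a day, results are identified by post IDs, a change in rank order alone does not count as new results, and `should_email?` is a hypothetical name.

```ruby
require 'date'

RUN_DAYS = 7 # the bot keeps searching for 7 days

# Hypothetical sketch: decide whether today's answer-bot run should email the asker.
# current_ids: post IDs returned by today's search
# last_sent_ids: post IDs included in the last email (empty if none sent yet)
def should_email?(asked_on, today, current_ids, last_sent_ids)
  age = (today - asked_on).to_i                  # whole days since the question
  return false unless (1..RUN_DAYS).cover?(age)  # start the day after, stop after 7 days
  return false if current_ids.empty?             # nothing worth sending
  current_ids.sort != last_sent_ids.sort         # email only when the results changed
end
```

Comparing sorted ID lists means a re-ranking of the same posts does not trigger a second email; comparing the lists unsorted would treat any rank change as new results.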
Advertising
•  We use our user-generated content in advertising
•  Has a 300% higher click-through rate than static banner ads
•  Ads displayed throughout the tax season on many ad networks
•  Content selection is automated and continuous
[Figure: content-selection pipeline: Logs → MapReduce → Carrot2 → Solr → Heuristics]
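The first stage of this pipeline, where raw logs are aggregated with MapReduce into trending queries like the `lc_trending` feed that follows, can be illustrated in miniature. This is a single-process sketch of the map and reduce steps that assumes a tab-separated `timestamp<TAB>query` log format. The format and the function names are hypothetical, and the Carrot2 clustering and heuristic stages are not shown.

```ruby
# Hypothetical sketch of the log -> MapReduce step, run in one process.
# Assumed log line format: "<timestamp>\t<query>".

# Map: emit [normalized_query, 1] for every non-empty query.
def map_phase(lines)
  lines.flat_map do |line|
    _timestamp, query = line.chomp.split("\t", 2)
    next [] if query.nil? || query.strip.empty?
    [[query.strip.downcase.squeeze(' '), 1]]
  end
end

# Reduce: sum the counts per query.
def reduce_phase(pairs)
  pairs.group_by(&:first).transform_values { |group| group.sum(&:last) }
end

# Rank the most frequent queries, ties broken alphabetically.
def trending(lines, top: 3)
  reduce_phase(map_phase(lines))
    .sort_by { |query, count| [-count, query] }
    .first(top)
    .each_with_index.map { |(query, count), i| { rank: i + 1, text: query, count: count } }
end
```

On a real cluster the map and reduce functions would run as separate distributed jobs over many log files; the logic per record stays the same.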
<?xml version="1.0" encoding="UTF-8"?>
<lc_trending end_date="2011-05-21" include_popular="true" type="queries" duration="day">
  <topic>
    <rank>1</rank>
    <text>Ptp</text>
    <post>
      <post_id>aBHMBWxzar4lKMacfArRo0</post_id>
      <subject>Final K-1 Disposition of PTP Units</subject>
      <detail>I bought units in a PTP in five separate transactions in 2008; I sold all my units in five separate transactions in 2010. TT does not allow me to report all 5 transactions while stepping through the K-1 form -- these transactions are reported on Schedule D, but also need to be on Form 4797, Part II, Box 10. I can't seem to make the linkage work. I would appreciate some guidance on how to make this happen.</detail>
      <response>OK, several steps needed for your situation:

1) on the K-1 on the screen entitled Describe the Partnership Disposal, choose "Disposition was not via a sale"

2) Then search for the topic "sale of business property" -   you will be taked to a topic entitled "Any Other Property Sales?" - select the first option. Ove rthe next few screens here you will have the opportunityut to enter the sale amounts associated witht he Form 4797.

3) then choose the topic on the income landing table for "Stocke, Mutual Funds, Bonds, other - here you will enter the rest of the sale, that portion attributable to capital gains.

Hope this helps you,
</response>
      <viewsCount>60</viewsCount>
      <answersCount>2</answersCount>
      <asker>Xuxan</asker>
      <display_post_url>https://ttlc.intuit.com/post/show_full/aBHMBWxzar4lKMacfArRo0?rmode=ad</display_post_url>
    </post>
  </topic>
</lc_trending>
Search everywhere
•  Search first, ask second
    –  Used to be ask first, search later or never!
•  Auto complete everywhere too
    –  64-bit Linux, 10 slaves (8 cores each), 300 req/s
•  Search requests
    –  900% increase
•  Questions asked
    –  50% decrease…is that good?
•  Increased consumption
    –  38% more users, 43% more content…very good!
Architecture
[Figure: architecture diagram with a search cluster, an app server, an indexing server, and a database cluster]
NLP
•  Search is not enough…unfortunately
•  Our domain is noisy…ugly at times
Uh, what?
Too much what!
?
I wish NLP could help!
NLP
•  Search is not enough…unfortunately
•  Our domain is noisy…ugly at times
•  How it works…
HwO do iput 10 99 i don,t know what to do need help help me.
Where do I enter a 1099?
schema.xml
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>
dictionary
<?xml version="1.0" encoding="US-ASCII"?>
<dictionary>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="suitcas">suitcase</entry>
  <entry score="10" root="form" synonym="none" domain="ttlc" id="2210"></entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="xrai">x-ray</entry>
  <entry score="10" root="none" synonym="townhom" domain="ttlc" id="townhous">townhouse</entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="grosssal">gross sale</entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="trinidad">Trinidad</entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="home"></entry>
  <entry score="10" root="none" synonym="know" domain="ttlc" id="knew"></entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="massachusett">Massachusetts</entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="denver">Denver</entry>
  <entry score="5" root="none" synonym="none" domain="ttlc" id="instead"></entry>
  <entry score="10" root="none" synonym="unallow" domain="ttlc" id="disallow">not allowed</entry>
  <entry score="5" root="none" synonym="see" domain="ttlc" id="saw"></entry>
</dictionary>
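The `id` and `synonym` values in this dictionary look like Porter stems (`suitcas`, `townhous`, `massachusett`), which suggests entries are matched against tokens after stemming. Assuming that reading, applying the synonym column could look like the sketch below. The struct, the lookup hash, and `substitute_synonyms` are hypothetical, and the `root` and `score` attributes are left out because their meaning is not stated.

```ruby
# Hypothetical sketch: synonym substitution on stemmed tokens, using a few of
# the dictionary entries above (id => synonym; "none" means no substitution).
Entry = Struct.new(:id, :synonym)

DICTIONARY = [
  Entry.new('townhous', 'townhom'),
  Entry.new('knew',     'know'),
  Entry.new('disallow', 'unallow'),
  Entry.new('saw',      'see'),
  Entry.new('suitcas',  'none')
].map { |e| [e.id, e] }.to_h

# Replace each stemmed token with its dictionary synonym, if it has one.
def substitute_synonyms(stemmed_tokens)
  stemmed_tokens.map do |tok|
    entry = DICTIONARY[tok]
    entry && entry.synonym != 'none' ? entry.synonym : tok
  end
end
```

Mapping several surface forms onto one stem this way lets "townhouse" and "townhome" questions match each other without listing every inflection.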

    

      	
  
regular expressions (many)
# assumes text is lowercased and padded with a space at each end,
# so the / word / patterns can also match the first and last words
if text =~ / any/
  text.gsub!(/ any where /, ' anywhere ')
  text.gsub!(/ any(body| body| one) /, ' anyone ')
  text.gsub!(/ any( thing| things|things) /, ' anything ')
  # \1 re-inserts the captured group: "anyone else" -> "anyone"
  text.gsub!(/ any(one|thing|where) else /, ' any\1 ')
end

if text =~ / don /
  text.gsub!(/ don i /, ' do not i ')
  text.gsub!(/ don (have|know|see|want) /, ' do not \1 ')
  text.gsub!(/ (are|be|have|is|was|were) don /, ' \1 done ')
  text.gsub!(/ don (not|nt|t) /, ' do not ')
end

text.gsub!(/ (do|can) (ai|ii) /, ' \1 i ')
text.gsub!(/ d (oyou|you) /, ' do you ')
text.gsub!(/ (1|ai|ii|my) (did|do|had|have|was) /, ' i \2 ')
text.gsub!(/ crap{1,10} /, ' crap ')   # "crapppp" -> "crap"
text.gsub!(/ gr{1,} /, ' ')           # drop "gr", "grr", "grrr", ...
•  Spell Checker
•  Stemmer (Porter)
•  Word Collocation
•  Stop Phrase Correction
•  Stop Word Removal
•  Synonym Substitution
•  Tax Domain Correction
•  Phrase Encoding
# NLP is not easy!

# this class wraps our NLP
sf = SemanticFilter.new

# does it work?
sf.act_on_post("HwO do iput 10 99 i don,t know what to do need help help me.")
# => [" wheretoent 1099 "]

sf.act_on_post("Where do I enter a 1099?")
# => [" wheretoent 1099 "]
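To make the stage list above concrete, here is a toy normalizer that runs a few of those stages (spelling correction, a tax-domain fix, synonym substitution, phrase encoding, stop-word removal) over the two example questions. It is not Intuit's `SemanticFilter`. Every table is a tiny hypothetical stand-in for the real dictionaries, stemming and word collocation are skipped, and `wheretoent` is borrowed from the output above only as a target code.

```ruby
# Toy normalizer, NOT Intuit's SemanticFilter: every table here is a tiny
# hypothetical stand-in for the real dictionaries.
SPELLING  = { 'hwo' => 'how', 'iput' => 'input' }.freeze
SYNONYMS  = { 'input' => 'enter' }.freeze
PHRASES   = { /\b(how|where) do (i )?enter\b/ => 'wheretoent' }.freeze
STOPWORDS = %w[a i do don t know what to need help me].freeze

def normalize(text)
  s = text.downcase.gsub(/[^a-z0-9]+/, ' ').strip         # punctuation -> spaces
  s = s.split.map { |w| SPELLING.fetch(w, w) }.join(' ')  # spelling correction
  s = s.gsub(/\b10 99\b/, '1099')                         # tax-domain fix: "10 99" -> "1099"
  s = s.split.map { |w| SYNONYMS.fetch(w, w) }.join(' ')  # synonym substitution
  PHRASES.each { |re, code| s = s.gsub(re, code) }        # phrase encoding
  (s.split - STOPWORDS).uniq.join(' ')                    # stop-word removal, de-dup
end

normalize("HwO do iput 10 99 i don,t know what to do need help help me.")
normalize("Where do I enter a 1099?")
```

Both calls reduce to the same short key, so the noisy question and the clean one retrieve the same content. Stage order matters: spelling and synonyms must run before phrase encoding, or "iput" would never become part of the encoded phrase.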
	
  
“Stop guessing what
  I’m looking for!”
NLP
•    Search is not enough…unfortunately
•    Our domain is noisy…ugly at times
•    How it works…
•    It works well, but it’s not perfect
•    Not just for search…
Recommendations
•  Deliver unanswered questions to
   contributors
•  Too much content to scan manually
•  Based on past answering behavior
•  Recommend a question to multiple
   contributors
•  Uses the Mahout machine learning library
[Figure: recommendation pipeline: answered posts → NLP → user vectors; unanswered posts → NLP → post vectors; both → Mahout → Heuristics]
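The diagram suggests that answered posts become user vectors and unanswered posts become post vectors before Mahout scores them. The slides do not say which Mahout algorithm is used, so the sketch below swaps in a plain technique: term-frequency vectors compared by cosine similarity, with each question recommended to the top-k contributors. All names are hypothetical, and inputs are assumed to be NLP-normalized text.

```ruby
# Hypothetical stand-in for the Mahout step: cosine similarity between
# term-frequency vectors. Inputs are assumed to be NLP-normalized text.

def tf_vector(text)
  text.split.each_with_object(Hash.new(0)) { |term, vec| vec[term] += 1 }
end

# A contributor's vector: summed term counts of the posts they answered.
def user_vector(answered_texts)
  answered_texts.each_with_object(Hash.new(0)) do |text, vec|
    tf_vector(text).each { |term, n| vec[term] += n }
  end
end

def cosine(a, b)
  dot = a.sum { |term, n| n * b.fetch(term, 0) }
  norm_a = Math.sqrt(a.values.sum { |n| n * n })
  norm_b = Math.sqrt(b.values.sum { |n| n * n })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

# Recommend one unanswered question to the k most similar contributors.
def recommend(question_text, contributors, k: 2)
  post_vec = tf_vector(question_text)
  contributors
    .map { |name, vec| [name, cosine(post_vec, vec)] }
    .select { |_, score| score > 0 }
    .sort_by { |name, score| [-score, name] }
    .first(k)
    .map(&:first)
end
```

Returning several contributors per question matches the slide's "recommend a question to multiple contributors" and hedges against any one of them not responding.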
Next Steps
•  We’re going to rewrite it! … most of it ;)
•  Real-time indexing
•  Question vs. Query
•  Social feedback
   – Page ranking
•  Social dictionaries
   – Content classification
•  Beer?!
Thank you.

Floyd_Morgan@intuit.com
@fmorgan

Appendix
•  User search
•  SEO

More Related Content

PDF
Social Media analysis of Intuit
PPT
Mujer, pajaro y estrella
PPT
Adobe Photoshop
PDF
Integration of apache solr with crawlers
PDF
Lucene rev preso bialecki solr crawlers-lr
PPT
Hellosong
PPTX
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
PDF
最新ブラウザー UI 比較
Social Media analysis of Intuit
Mujer, pajaro y estrella
Adobe Photoshop
Integration of apache solr with crawlers
Lucene rev preso bialecki solr crawlers-lr
Hellosong
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
最新ブラウザー UI 比較

Viewers also liked (18)

PPTX
Cmd Training Institute - New Premises
PDF
Open Source Search Applications
PPTX
Presentation to the Old Dominion University (ODU) MBA Association, 3/20/13
DOCX
Already, just, still, yet
PPTX
まっちゃ4451LT「IE の InPrivateブラウズ」
PDF
IAMAS 2010 First presentation
PPTX
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
PPTX
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
PDF
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
PPTX
Pista American Idiot
PDF
Discover the new techniques about search application
PPTX
Нестандартные методы интернет рекламы
PDF
Integrating Advanced Text Analytics into Solr
PPTX
基于成本代理模型的Ip长途网络成本仿真研究
PDF
"A Study of I/O and Virtualization Performance with a Search Engine based on ...
PDF
Indexing Text and HTML Files with Solr
PDF
What’s new in apache lucene 3.0
PPTX
Lucy in the sky[1]
Cmd Training Institute - New Premises
Open Source Search Applications
Presentation to the Old Dominion University (ODU) MBA Association, 3/20/13
Already, just, still, yet
まっちゃ4451LT「IE の InPrivateブラウズ」
IAMAS 2010 First presentation
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Pista American Idiot
Discover the new techniques about search application
Нестандартные методы интернет рекламы
Integrating Advanced Text Analytics into Solr
基于成本代理模型的Ip长途网络成本仿真研究
"A Study of I/O and Virtualization Performance with a Search Engine based on ...
Indexing Text and HTML Files with Solr
What’s new in apache lucene 3.0
Lucy in the sky[1]
Ad

Similar to Morgan Floyd - Intuit's Live Community (20)

PPTX
Solr site search makes shopping simple
KEY
Solr 101
PDF
Building Lanyrd
PDF
Faster, Cheaper, Better - Replacing Oracle with Hadoop & Solr
PDF
Faster Cheaper Better-Replacing Oracle with Hadoop & Solr
PDF
In search of: A meetup about Liferay and Search 2016-04-20
PDF
Solr @ eBay Kleinanzeigen
PDF
Migrating Fast to Solr
PPTX
Intro to Apache Lucene and Solr
PDF
Download full ebook of Apache Solr Search Patterns Jayant Kumar instant downl...
PDF
Parse.ly: Inside a modern RIA built with Solr
PDF
E-commerce Search Engine with Apache Lucene/Solr
PPTX
This Ain't Your Parents' Search Engine
PDF
Alfresco tech talk live on solr august 2012
PDF
Getting started faster with LucidWorks for Solr
KEY
Intro to Apache Solr for Drupal
PPTX
Open Source Search FTW
PDF
Krellenstein lucene revolution_2011_keynote_once_future_history_enterprise se...
PPTX
Leveraging Solr and Mahout
PDF
Ease of use in Apache Solr
Solr site search makes shopping simple
Solr 101
Building Lanyrd
Faster, Cheaper, Better - Replacing Oracle with Hadoop & Solr
Faster Cheaper Better-Replacing Oracle with Hadoop & Solr
In search of: A meetup about Liferay and Search 2016-04-20
Solr @ eBay Kleinanzeigen
Migrating Fast to Solr
Intro to Apache Lucene and Solr
Download full ebook of Apache Solr Search Patterns Jayant Kumar instant downl...
Parse.ly: Inside a modern RIA built with Solr
E-commerce Search Engine with Apache Lucene/Solr
This Ain't Your Parents' Search Engine
Alfresco tech talk live on solr august 2012
Getting started faster with LucidWorks for Solr
Intro to Apache Solr for Drupal
Open Source Search FTW
Krellenstein lucene revolution_2011_keynote_once_future_history_enterprise se...
Leveraging Solr and Mahout
Ease of use in Apache Solr
Ad

More from Lucidworks (Archived) (20)

PDF
Integrating Hadoop & Solr
PDF
The Data-Driven Paradigm
PDF
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
PDF
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
PPTX
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
PPTX
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
PPTX
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
PPTX
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
PPTX
What's new in solr june 2014
PPTX
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
PDF
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
PDF
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
PPTX
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
PPTX
Solr At AOL, Presented by Sean Timm at SolrExchage DC
PPTX
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
PPTX
Building a data driven search application with LucidWorks SiLK
PPTX
Introducing LucidWorks App for Splunk Enterprise webinar
PDF
Solr4 nosql search_server_2013
PPTX
Lucene/Solr Revolution 2013: Paul Doscher Opening Remarks
PDF
Seeley yonik solr performance key innovations
Integrating Hadoop & Solr
The Data-Driven Paradigm
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Morgan Floyd - Intuit's Live Community

  • 1. Floyd Morgan Floyd_Morgan@intuit.com @fmorgan Lucene Revolution, 2011
  • 2. Agenda •  About Me •  About Live Community •  Live Community Search •  NLP •  Next Steps •  Questions? Answers?
  • 3. About Me •  Principal Software Engineer at Intuit  
  • 4. Intuit QuickBase Intuit Inc. is a leading provider of business and financial management solutions for small and mid-sized businesses; financial institutions, including banks and credit unions; consumers and accounting professionals. More than 200 applications and 7700 employees worldwide.
  • 5. About Me •  Principal Software Engineer at Intuit •  TurboTax Engineering  
  • 6. TurboTax is the nation’s No. 1 rated, best-selling, do-it-yourself tax preparation software. TurboTax helps more than 20 million people a year. $1 billion in revenue
  • 10. About Me •  Principal Software Engineer at Intuit •  TurboTax Engineering –  Core tax engine –  TurboTax Online –  TurboTax Live Community •  Central Technology Organization –  Live Community Platform
  • 17. About Live Community •  It’s a user contribution system –  Q&A •  It can be integrated into an application, contextually –  Page-to-page relevance •  We use social, technology and data –  To create our value proposition…assisting users •  We launched our Beta in 2007 –  TurboTax Online Home & Business •  We use open source…primarily open source –  Apache HTTP, Ruby on Rails, MySQL, memcached ... •  It’s a platform –  APIs, skinning, dynamic provisioning (AWS in progress)
  • 22. TurboTax Desktop & Online, US
  • 24. Consumers (in the millions)
  • 25. Contributors (in the thousands)
  • 26. Top Contributors (in the hundreds)
  • 28. Tax season officially begins on December 1 and ends on April 15.
  • 33. About TurboTax Live Community •  Largest community –  150+ servers, 200 thousand concurrent users •  Over 23 million users have used the service –  Over 8 million last tax season alone •  Over 32 million page views last tax season –  In-product views in the billions •  Over 750 thousand answered questions –  10 thousand questions asked on peak day •  Our contributors answer thousands of questions –  Top contributor – 70 thousand answers
  • 34. Demo
  • 35. Live Community Search •  Why Solr? •  Auto suggest •  In-product search •  Web-site search •  Instant answer •  Instant question •  Answer bot •  Advertising •  Search everywhere •  Architecture
  • 41. Why Solr? •  Lots of features/functionality •  Ease of integration •  We can scale it independently •  You’ll need some search expertise…that’s ok –  Community and Lucid Imagination! •  Search is really important –  Search everywhere…  
  • 52. Auto suggest •  Provides a glimpse of our vast content •  facet query (Solr 1.2) •  We use NLP… •  It’s used on every search touch point •  Second most frequent request
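The slide says auto suggest is built on a facet query (Solr 1.2) but does not show the request. Below is a minimal sketch, assuming the common `facet.prefix` technique. The field name `suggest_terms` is hypothetical, and the core URL is copied from the deck's examples. The code only builds the request URL.

```ruby
require 'uri'

# Hypothetical values: the core URL follows the deck's examples, the field name is invented.
SUGGEST_SELECT_URL = 'http://localhost:8090/solr/posts/select'
SUGGEST_FIELD = 'suggest_terms'

# Ask Solr for no documents (rows=0), only for the most frequent indexed
# terms in SUGGEST_FIELD that begin with what the user has typed so far.
def suggest_url(typed, limit: 10)
  params = {
    'q' => '*:*',
    'rows' => 0,
    'facet' => 'true',
    'facet.field' => SUGGEST_FIELD,
    'facet.prefix' => typed.downcase,
    'facet.limit' => limit,
    'facet.mincount' => 1,
    'wt' => 'json'
  }
  "#{SUGGEST_SELECT_URL}?#{URI.encode_www_form(params)}"
end

puts suggest_url('How do I')
```

`facet.prefix` matches indexed terms. A multi-word prefix such as "how do i" therefore only works if the field indexes whole normalized questions as single terms (for example a `string` field). A tokenized text field would suggest single words instead.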
  • 60. In-product “mini” search •  Primary search interface for consumers •  It appears integrated •  Now the most utilized search interface •  It makes all content available •  Over 3 million users last tax season  
  • 61. # using Solr is easy!
    require 'solr'
    c = Solr::Connection.new("http://localhost:8090/solr/posts")
    c.search("how do i input 1099", :filter_queries => "post_status:#{Post::ANSWERED}")
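The solr-ruby call above hides the HTTP request it sends. The sketch below builds a comparable request using only the standard library. It assumes the same core URL and uses the standard query parser rather than whatever handler solr-ruby selects. The value `2` for `Post::ANSWERED` is a hypothetical placeholder.

```ruby
require 'uri'
require 'net/http'
require 'json'

POSTS_SELECT_URI = URI('http://localhost:8090/solr/posts/select')
ANSWERED_STATUS = 2 # hypothetical stand-in for Post::ANSWERED

# q holds the user's words and drives relevance scoring; fq only restricts
# the result set, does not change scores, and is cached by Solr on its own.
def search_uri(text, status, rows: 10)
  uri = POSTS_SELECT_URI.dup
  uri.query = URI.encode_www_form({ 'q' => text, 'fq' => "post_status:#{status}",
                                    'rows' => rows, 'wt' => 'json' })
  uri
end

# Needs a running Solr core at POSTS_SELECT_URI; returns the matching docs.
def search(text, status)
  res = Net::HTTP.get_response(search_uri(text, status))
  raise "Solr returned #{res.code}" unless res.is_a?(Net::HTTPSuccess)
  JSON.parse(res.body).dig('response', 'docs')
end

puts search_uri('how do i input 1099', ANSWERED_STATUS)
```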
  • 68. Web-site “full” search •  Primary search interface for contributors and employees •  More real estate, more facets, more suggestions ... •  Faceted search empowers development teams to narrow on issues •  200+ TurboTax issues discovered last tax season
  • 71. # using Solr is easy!
    require 'solr'
    c = Solr::Connection.new("http://localhost:8090/solr/posts")
    c.search("bug", :filter_queries => "post_status:#{Post::OPEN}")
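Faceting is what lets the full search narrow on issues. Below is a sketch of the request parameters. It assumes hypothetical facet fields `product_s` and `topic_s`, and uses the value `1` as a stand-in for `Post::OPEN`.

```ruby
require 'uri'

# Hypothetical facet fields; the real names depend on the posts schema.
FACET_FIELDS = %w[product_s topic_s].freeze

# Request facet counts for every FACET_FIELDS entry, plus one fq per
# value the user has clicked, so Solr caches each filter separately.
def drilldown_query(text, status, selected = {})
  params = [['q', text], ['fq', "post_status:#{status}"],
            ['facet', 'true'], ['facet.mincount', '1']]
  FACET_FIELDS.each { |field| params << ['facet.field', field] }
  selected.each { |field, value| params << ['fq', %(#{field}:"#{value}")] }
  URI.encode_www_form(params)
end

puts drilldown_query('bug', 1, { 'product_s' => 'TurboTax Online' })
```

Each clicked value gets its own `fq`. That lets Solr's filter cache reuse the filter whenever another query combines the same selections.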
  • 78. Instant answer •  Present similar answered questions •  Search with the terms of the new question •  Narrow the focus to the subject •  Show a snippet of a recommended answer •  Accidental A/B test
  • 79. Demo
  • 80. # using Solr is easy!
    require 'solr'
    c = Solr::Connection.new("http://localhost:8090/solr/posts")
    c.search("how do i input 1099", {
      :query_fields => "subject",
      :filter_queries => "post_status:#{Post::ANSWERED}"
    })
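The instant answer shows a snippet of a recommended answer. One way to get that snippet is Solr's highlighter (`hl`, `hl.fl`, `hl.snippets`, `hl.fragsize`). The sketch below assumes a hypothetical stored field `answer_text` and a JSON response (`wt=json`). It reads the highlighted snippet for the top document and falls back to truncating the plain text. The sample response is made up.

```ruby
require 'json'

# Highlighter parameters to add to the instant-answer request; the field
# name answer_text is an assumption.
HIGHLIGHT_PARAMS = {
  'hl' => 'true', 'hl.fl' => 'answer_text', 'hl.snippets' => '1', 'hl.fragsize' => '150'
}.freeze

# Return the highlighted fragment for the top document, or a plain
# truncation of its text when the highlighter returned nothing for it.
def instant_answer_snippet(body, id_field: 'id', text_field: 'answer_text', max: 150)
  data = JSON.parse(body)
  doc = data.dig('response', 'docs', 0)
  return nil unless doc

  fragments = data.dig('highlighting', doc[id_field].to_s, text_field)
  return fragments.first if fragments && !fragments.empty?

  text = doc[text_field].to_s
  text.length > max ? "#{text[0, max]}..." : text
end

# Made-up response in Solr's JSON layout, for illustration only.
sample = {
  'response' => { 'docs' => [{ 'id' => 'p1', 'answer_text' => 'Go to Wages & Income and pick 1099-MISC.' }] },
  'highlighting' => { 'p1' => { 'answer_text' => ['Go to Wages & Income and pick <em>1099</em>-MISC.'] } }
}.to_json
puts instant_answer_snippet(sample)
```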
  • 87. Instant question •  Present similar unanswered questions •  Answer reuse •  Search with the terms of the answered question •  Narrow the focus to the subject •  We also use a date filter
  • 88. “Aren’t we addicted enough!”
  • 89. Demo
  • 90. # using Solr is easy!
    # at_beginning_of_day and 7.days come from ActiveSupport (Rails)
    require 'solr'
    c = Solr::Connection.new("http://localhost:8090/solr/posts")
    today = DateTime.now.at_beginning_of_day.utc.to_time
    date_from = 7.days.ago(today).getutc.iso8601
    c.search("how do i input 1099", {
      :query_fields => "subject",
      :filter_queries => "post_status:#{Post::OPEN} AND created_at_d:[#{date_from} TO *]"
    })
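The date arithmetic above relies on ActiveSupport. Below is a standard-library sketch of the same seven-day filter. One difference to note: it counts from midnight UTC, while the code above starts from the local beginning of the day. Solr's own date math is shown as an alternative. The value `1` used for `Post::OPEN` is a placeholder.

```ruby
require 'time'

SECONDS_PER_DAY = 24 * 60 * 60

# Start of the UTC day `days` days before `now`, in the ISO 8601 form with a
# trailing Z that Solr date fields accept.
def cutoff_iso8601(now, days)
  utc = now.getutc
  start_of_day = Time.utc(utc.year, utc.month, utc.day)
  (start_of_day - days * SECONDS_PER_DAY).iso8601
end

def recent_open_filter(open_status, now: Time.now, days: 7)
  "post_status:#{open_status} AND created_at_d:[#{cutoff_iso8601(now, days)} TO *]"
end

# The same window using Solr date math, resolved by Solr itself; NOW/DAY rounds
# down to midnight UTC, so the resolved range only changes once a day.
DATE_MATH_FILTER = 'created_at_d:[NOW/DAY-7DAYS TO *]'

puts recent_open_filter(1, now: Time.utc(2011, 5, 21, 15, 30))
```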
  • 97. Answer bot •  We continue to search for you –  The day after you ask •  Send an email •  Runs for 7 days •  We only send another email if the results have changed •  From our explicit feedback –  39% said it answered their question
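The answer bot rules on the slide fit in one predicate: start the day after the question is asked, run for 7 days, and email again only when the results change. This is a sketch under assumptions. The method name is hypothetical, and results are compared as sets of post IDs, so a pure reordering does not trigger an email.

```ruby
require 'date'

# Hypothetical rule for one daily answer-bot run on a single question.
# asked_on and today are Dates; the id arrays hold post IDs of search results.
def send_answer_bot_email?(asked_on, today, last_sent_ids, current_ids, window_days: 7)
  day = (today - asked_on).to_i
  return false unless day.between?(1, window_days) # starts the day after asking
  return false if current_ids.empty?               # nothing worth sending
  current_ids.sort != last_sent_ids.sort           # only when the results changed
end
```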
  • 104. Advertising •  We use our user-generated content in advertising •  Has 300% higher click-through rate than static banner ads •  Ads displayed throughout the tax season on many ad networks •  Content selection is automated and continuous
  • 106. [Diagram: content selection pipeline built from logs, MapReduce, Carrot2, Solr and heuristics]
  • 107. <?xml version="1.0" encoding="UTF-8"?> 
 <lc_trending end_date="2011-05-21" include_popular="true" type="queries" duration="day"> 
 <topic> 
 <rank>1</rank> 
 <text>Ptp</text> 
 <post> 
 <post_id>aBHMBWxzar4lKMacfArRo0</post_id> 
 <subject>Final K-1 Disposition of PTP Units</subject> 
 <detail>I bought units in a PTP in five separate transactions in 2008; I sold all my units in five separate transactions in 2010. TT does not allow me to report all 5 transactions while stepping through the K-1 form -- these transactions are reported on Schedule D, but also need to be on Form 4797, Part II, Box 10. I can't seem to make the linkage work. I would appreciate some guidance on how to make this happen.</detail> 
 <response>OK, several steps needed for your situation:
 1) on the K-1 on the screen entitled Describe the Partnership Disposal, choose "Disposition was not via a sale"
 2) Then search for the topic "sale of business property" - you will be taked to a topic entitled "Any Other Property Sales?" - select the first option. Ove rthe next few screens here you will have the opportunityut to enter the sale amounts associated witht he Form 4797.
 
 3) then choose the topic on the income landing table for "Stocke, Mutual Funds, Bonds, other - here you will enter the rest of the sale, that portion attributable to capital gains.
 
 Hope this helps you,
 </response> 
 <viewsCount>60</viewsCount> 
 <answersCount>2</answersCount> 
 <asker>Xuxan</asker> 
 <display_post_url>https://ttlc.intuit.com/post/show_full/aBHMBWxzar4lKMacfArRo0?rmode=ad</display_post_url> 
 </post> 
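The content-selection pipeline starts by counting queries in the logs with MapReduce. The sketch below is an in-process stand-in for that map and reduce step, not Hadoop. It leaves out Carrot2 clustering and the heuristics, and the tab-separated `timestamp<TAB>query` log format is an assumption.

```ruby
# In-process stand-in for MapReduce: map each log line to [query, 1],
# reduce by summing per query, then rank by count (ties broken alphabetically).
def trending_queries(log_lines, day, top: 3)
  pairs = log_lines.filter_map do |line|
    timestamp, query = line.chomp.split("\t", 2)
    next if timestamp.nil? || query.nil? || !timestamp.start_with?(day)
    normalized = query.strip.downcase.squeeze(' ')
    next if normalized.empty?
    [normalized, 1] # map
  end
  counts = pairs.group_by(&:first).transform_values { |ps| ps.sum(&:last) } # reduce
  ranked = counts.sort_by { |query, count| [-count, query] }.first(top)
  ranked.each_with_index.map { |(query, count), i| { rank: i + 1, text: query, count: count } }
end

logs = [
  "2011-05-21T10:00:01\tPTP",
  "2011-05-21T10:05:12\tptp ",
  "2011-05-21T11:00:00\thome office",
  "2011-05-20T09:00:00\tptp"
]
p trending_queries(logs, '2011-05-21')
```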
  
  • 117. Search everywhere •  Search first, ask second –  Used to be ask first, search later or never! •  Auto complete everywhere too –  64-bit Linux, 10 (8-core) slaves, 300 req/s •  Search requests –  900% increase •  Questions asked –  50% decrease…is that good? •  Increased consumption –  38% users, 43% content…very good!
  • 119. [Diagram: architecture with app servers, a search cluster, an indexing server and a database cluster]
  • 121. NLP •  Search is not enough…unfortunately •  Our domain is noisy…ugly at times
  • 124. ?
  • 125. I wish NLP could help!
  • 126. NLP •  Search is not enough…unfortunately •  Our domain is noisy…ugly at times •  How it works…
  • 127. HwO do iput 10 99 i don,t know what to do need help help me.
  • 128. Where do I enter a 1099?
  • 129. schema.xml <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
 <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
 <analyzer type="query">
 <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer> </fieldtype>
  
  • 130. dictionary <?xml version="1.0" encoding="US-ASCII"?>
 <dictionary>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="suitcas">suitcase</entry>
 <entry score="10" root="form" synonym="none" domain="ttlc" id="2210"></entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="xrai">x-ray</entry>
 <entry score="10" root="none" synonym="townhom" domain="ttlc" id="townhous">townhouse</entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="grosssal">gross sale</entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="trinidad">Trinidad</entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="home"></entry>
 <entry score="10" root="none" synonym="know" domain="ttlc" id="knew"></entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="massachusett">Massachusetts</entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="denver">Denver</entry>
 <entry score="5" root="none" synonym="none" domain="ttlc" id="instead"></entry>
 <entry score="10" root="none" synonym="unallow" domain="ttlc" id="disallow">not allowed</entry>
 <entry score="5" root="none" synonym="see" domain="ttlc" id="saw"></entry>
 
  
  • 131. regular expressions (many)
    # '\1' and '\2' in a replacement refer to the pattern's capture groups
    if text =~ / any/
      text.gsub!(/ any where /, ' anywhere ')
      text.gsub!(/ any(body| body| one) /, ' anyone ')
      text.gsub!(/ any( thing| things|things) /, ' anything ')
      text.gsub!(/ any(one|thing|where) else /, ' any\1 ')
    end
    if text =~ / don /
      text.gsub!(/ don i /, ' do not i ')
      text.gsub!(/ don (have|know|see|want) /, ' do not \1 ')
      text.gsub!(/ (are|be|have|is|was|were) don /, ' \1 done ')
      text.gsub!(/ don (not|nt|t) /, ' do not ')
    end
    text.gsub!(/ (do|can) (ai|ii) /, ' \1 i ')
    text.gsub!(/ d (oyou|you) /, ' do you ')
    text.gsub!(/ (1|ai|ii|my) (did|do|had|have|was) /, ' i \2 ')
    text.gsub!(/ crap{1,10} /, ' crap ')
    text.gsub!(/ gr{1,} /, ' ')
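The patterns above are delimited by spaces (`/ don /`), so they only match whole words. They also only work if the text has already been lowercased, stripped of punctuation, and padded with a space at each end. The sketch below shows that assumed preprocessing (the `prepare` step is hypothetical) and applies a few of the slide's rules.

```ruby
# Assumed preprocessing: lowercase, turn punctuation into spaces, collapse
# runs of spaces, and pad both ends so every word is surrounded by spaces.
def prepare(raw)
  " #{raw.downcase.gsub(/[^a-z0-9]+/, ' ').strip} "
end

# A few of the slide's rules, applied in the slide's order.
def normalize(raw)
  text = prepare(raw)
  if text =~ / don /
    text.gsub!(/ don (have|know|see|want) /, ' do not \1 ')
    text.gsub!(/ don (not|nt|t) /, ' do not ')
  end
  text.gsub!(/ d (oyou|you) /, ' do you ')
  text.gsub!(/ gr{1,} /, ' ')
  text.strip
end

puts normalize('I don,t know what to do')
```

This style has one limitation. Each match consumes its trailing space, so two matches back to back (" don t don t ") leave the second one unmatched after a single `gsub!` pass.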
 

  • 132. NLP stages: spell checker, stemmer (Porter), word collocation, stop phrase correction, stop word removal, synonym substitution, tax domain correction, phrase encoding
  • 133. # NLP is not easy!
    # this class wraps our NLP
    sf = SemanticFilter.new
    # does it work?
    sf.act_on_post("HwO do iput 10 99 i don,t know what to do need help help me.")
    # => [" wheretoent 1099 "]
    sf.act_on_post("Where do I enter a 1099?")
    # => [" wheretoent 1099 "]
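The slide lists the stages but not how they are chained. Below is a toy sketch of the chaining idea, in which every stage maps a token array to a token array. The tables are tiny hypothetical stand-ins for the real dictionaries. Question words are treated as noise purely for this demo, and stemming, collocation and phrase encoding are left out. Both example questions reduce to the same tokens, in the spirit of the `wheretoent 1099` output above.

```ruby
# Toy tables for illustration only; not Intuit's dictionaries.
SPELLING = { 'hwo' => 'how', 'iput' => 'input' }.freeze
NOISE_WORDS = %w[a do i to what need help me how where don t know].freeze
SYNONYMS = { 'input' => 'enter', 'type' => 'enter' }.freeze

# Each stage takes and returns an array of tokens, so stages can be
# reordered, added or removed independently.
STAGES = [
  ->(tokens) { tokens.map { |t| SPELLING.fetch(t, t) } },          # spell correction
  ->(tokens) { tokens.join(' ').gsub(/\b10 99\b/, '1099').split }, # tax domain correction
  ->(tokens) { tokens.reject { |t| NOISE_WORDS.include?(t) } },    # stop word removal
  ->(tokens) { tokens.map { |t| SYNONYMS.fetch(t, t) } },          # synonym substitution
  ->(tokens) { tokens.uniq }                                       # drop repeats
].freeze

def semantic_tokens(text)
  tokens = text.downcase.scan(/[a-z0-9]+/)
  STAGES.reduce(tokens) { |acc, stage| stage.call(acc) }
end

p semantic_tokens('HwO do iput 10 99 i don,t know what to do need help help me.')
p semantic_tokens('Where do I enter a 1099?')
```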
  • 134. NLP •  Search is not enough…unfortunately •  Our domain is noisy…ugly at times •  How it works… •  It works well, but it’s not perfect
  • 135. “Stop guessing what I’m looking for!”
  • 136. NLP •  Search is not enough…unfortunately •  Our domain is noisy…ugly at times •  How it works… •  It works well, but it’s not perfect •  Not just for search…
  • 142. Recommendations •  Deliver unanswered questions to contributors •  Too much content to scan manually •  Based on past answering behavior •  Recommend a question to multiple contributors •  Uses Mahout machine learning library
  • 143. [Diagram: answered and unanswered posts pass through NLP to produce user vectors and post vectors, which feed Mahout and then heuristics]
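The deck does not show Mahout's API. The sketch below illustrates only the underlying idea in plain Ruby; it is not Mahout. Each contributor gets a term vector built from the posts they answered, and each unanswered question gets one from its own text. The question then goes to the `k` contributors with the highest cosine similarity. All names and data are hypothetical.

```ruby
# Sparse term-count vector as a Hash of term => count.
def vectorize(tokens)
  tokens.tally
end

def cosine(a, b)
  dot = a.sum { |term, w| w * b.fetch(term, 0) }
  norm = ->(v) { Math.sqrt(v.values.sum { |w| w * w }) }
  denom = norm.(a) * norm.(b)
  denom.zero? ? 0.0 : dot / denom
end

# profiles: contributor => term vector built from the posts they answered.
# Returns up to `k` contributors with a positive score, best match first.
def recommend_contributors(question_tokens, profiles, k: 2)
  q = vectorize(question_tokens)
  profiles.map { |user, vec| [user, cosine(q, vec)] }
          .select { |_, score| score.positive? }
          .max_by(k) { |_, score| score }
          .map(&:first)
end

profiles = {
  'alice' => vectorize(%w[1099 misc enter 1099 income]),
  'bob' => vectorize(%w[mortgage interest deduction home])
}
p recommend_contributors(%w[enter 1099 income], profiles)
```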
  • 151. Next Steps •  We’re going to rewrite it! … most of it ;) •  Real-time indexing •  Question vs. Query •  Social feedback – Page ranking •  Social dictionaries – Content classification •  Beer?!
  • 152. Thank you. Floyd_Morgan@intuit.com @fmorgan
  • 153. Appendix •  User search •  SEO