Products Classification
Joyce Chan

Preliminary Knowledge
  • the words labeling, tagging, classification, and categorization are used interchangeably
  • the words taxonomy and hierarchy are used interchangeably
  • facet: one of the paths through the hierarchy
  • static taxonomy: products are mapped to the hierarchy manually (editorially)
  • dynamic taxonomy: the mapping of products to the hierarchy is generated automatically by the system, with no help from people
  • document: a term commonly used when talking about search; here it refers to a product plus all the metadata associated with that product
  • e.g. for the document "Beatrice homo milk", the metadata or attributes can be:
      - title is Beatrice homo milk
      - it is a type of recipe ingredient
      - description is that it's tasty and rich & creamy
      - made by the Beatrice company
      - its price is $3.60 per bag, etc.
      - it is on sale at the Oakville location Loblaws
      - users highly recommend buying this milk
      - its image name is b-milk.jpg
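
As a rough illustration of the "document" idea above, the milk example could be held in a small record type. This is only a sketch: the class name, field names, and the `searchable_text` helper are assumptions made for illustration, not part of any existing product database schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProductDocument:
    """A product plus its metadata, i.e. a 'document' in search terms.

    All field names are hypothetical; they mirror the attributes on the slide.
    """
    title: str
    product_type: str
    description: str
    manufacturer: str
    price: float                      # dollars
    price_unit: str                   # e.g. "bag"
    on_sale_at: List[str] = field(default_factory=list)
    recommended_by_users: bool = False
    image_name: str = ""


milk = ProductDocument(
    title="Beatrice homo milk",
    product_type="recipe ingredient",
    description="tasty and rich & creamy",
    manufacturer="Beatrice",
    price=3.60,
    price_unit="bag",
    on_sale_at=["Loblaws, Oakville"],
    recommended_by_users=True,
    image_name="b-milk.jpg",
)


def searchable_text(doc: ProductDocument) -> str:
    """Concatenate the text fields a search engine would typically index."""
    parts = [doc.title, doc.product_type, doc.description, doc.manufacturer]
    return " ".join(parts).lower()
```

Keeping the metadata in one record means every classification strategy discussed later can work from the same input.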

Classification: Different strategies
  • No classification
      - products are not tagged to anything
  • Single level classification
      - single level static taxonomy
  • Multi level classification
      - 1 dimensional static taxonomy applying tree breakdown
      - hybrid of one dimensional & single level static taxonomy w/ Jeremy's tree breakdown
      - multi-dimensional static taxonomy applying Jeremy's tree breakdown
      - dynamic taxonomy w/ supervised extraction of facets from annotated text documents
      - dynamic taxonomy w/ unsupervised extraction of facets
      - static taxonomy w/ data providers' labels & unsupervised extraction of facets

No tagging / No Classification
  • all products are retrieved directly through SQL or search engine queries
  • we assume users can find relevant information quickly with no further assistance
  Pros
  • simple to implement; this is done already, as we have a product database
  Cons
  • with a large product database, it is confusing to users
  • e.g. users search "milk", many types of results are returned, and they may have to flip through a few pages before finding what they need
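
The "milk" problem can be reproduced with a plain SQL search. The sketch below uses an in-memory SQLite table; the table layout, the UPCs, and every product title other than Beatrice homo milk are made up for illustration.

```python
import sqlite3

# Hypothetical, tiny product table; the real product database is not shown in the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (upc TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        ("0001", "Beatrice homo milk"),
        ("0002", "Milk chocolate bar"),
        ("0003", "Coconut milk"),
        ("0004", "Milk of magnesia"),
        ("0005", "Orange juice"),
    ],
)


def search(term):
    """Unclassified retrieval: return every title containing the term."""
    rows = conn.execute(
        "SELECT title FROM products WHERE title LIKE ? ORDER BY title",
        (f"%{term}%",),
    )
    return [title for (title,) in rows]


results = search("milk")
```

Here `results` mixes dairy, candy, canned goods, and medicine, because nothing in the query says which kind of "milk" the user meant. With thousands of products, the same query would span several pages.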

The case for having a products taxonomy
  Pros
  • helps people find & explore what they are looking for on the website and concierge device if they cannot quickly find it through direct searching
  • users have become used to e-commerce interfaces with product taxonomies

The case for static/editorially classified taxonomy
  Pros
  • highly mappable to product shelving, kind of like the Dewey classification system for the library
  Cons
  • a lot of manual labor to maintain the classification structure that we provide, since we have thousands and thousands of products and hope to expand our product database in the future

Single Level Static Taxonomy - only labeling / tagging
  • each product has one label
  • e.g. Beatrice brand homo milk <= 'dairy'
  Pros
  • provided by Gladson already
  • very straightforward to implement
  Cons
  • not very descriptive, so not useful to users (customers, managers, inventory staff, or us)

One dimensional (one path) static taxonomy with fixed levels of classification
  • here there is a path from the root (department) down to the product
  • for instance, Beatrice homo milk is classified as dairy, milk, homo, upc=1234567890
  Pros
  • easy to implement
  • everything is classified under a standard number of levels of concepts
  • improves searching quite a bit
  Cons
  • does not allow a product to be classified in multiple 'classes'
  • labour intensive to editorially edit product classifications
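
A one-path taxonomy with fixed levels can be sketched as a mapping from each product to exactly one tuple. The level names in `LEVELS` and the helper names `classify` and `browse` are assumptions for illustration; the slide only gives the example path.

```python
# Hypothetical level names; the slide only shows the path dairy, milk, homo, upc.
LEVELS = ("department", "category", "subcategory", "upc")

paths = {}  # upc -> the single path assigned to that product


def classify(path):
    """Editorially assign a product to exactly one fixed-depth path."""
    if len(path) != len(LEVELS):
        raise ValueError(f"a path must have exactly {len(LEVELS)} levels")
    upc = path[-1]
    if upc in paths:
        # This is the main limitation noted above: one product, one path.
        raise ValueError(f"{upc} is already classified")
    paths[upc] = tuple(path)


def browse(prefix):
    """Return the UPCs of all products filed under the given path prefix."""
    prefix = tuple(prefix)
    return sorted(upc for upc, p in paths.items() if p[:len(prefix)] == prefix)


classify(("dairy", "milk", "homo", "1234567890"))
```

Browsing by prefix, for example `browse(("dairy", "milk"))`, is what narrows a search down. The second `ValueError` shows why a product cannot sit in multiple classes under this scheme.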

The case for dynamic taxonomy
  Pros
  • cheap to have the computer place products on the taxonomy by itself every time we add a new product to the database
  Cons
  • we're probably going to be applying a fairly complex taxonomy scheme, such as Amazon's
  • some possible implementation challenges, such as the correct use of machine learning libraries

Dynamic taxonomy predefined w/ a fixed product db & supervised facet extraction from collections of annotated text items
  • we would have a predefined taxonomy, with some data already mapped under it
  • when a new item appears that has not been mapped to the base taxonomy, machine learning algorithms are used to put it in the correct place
  Pros
  • completely automated classification with Amazon's taxonomy, because it's the most feature-complete multi-level grocery taxonomy that I found (Tesco's being another good one, but it's n/a right now)
  Cons
  • new types of facets cannot be discovered, because we're using the predefined ones
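
The slide does not say which machine learning algorithm would place a new item. As one possible illustration, here is a minimal multinomial naive Bayes classifier over product titles, written with the standard library only. The taxonomy node names and training titles (other than Beatrice homo milk) are invented for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTagger:
    """Assigns a new product title to one node of a predefined taxonomy."""

    def __init__(self):
        self.term_counts = defaultdict(Counter)  # node -> term frequencies
        self.doc_counts = Counter()              # node -> number of training products
        self.vocab = set()

    def train(self, title, node):
        terms = title.lower().split()
        self.term_counts[node].update(terms)
        self.doc_counts[node] += 1
        self.vocab.update(terms)

    def predict(self, title):
        terms = title.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_node, best_score = None, float("-inf")
        for node, n_docs in self.doc_counts.items():
            counts = self.term_counts[node]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)            # log prior
            for term in terms:
                score += math.log((counts[term] + 1) / denom)  # Laplace smoothing
            if score > best_score:
                best_node, best_score = node, score
        return best_node


tagger = NaiveBayesTagger()
# Products already mapped under the predefined taxonomy (hypothetical data).
tagger.train("Beatrice homo milk", "dairy/milk")
tagger.train("Skim milk 2L", "dairy/milk")
tagger.train("Cheddar cheese block", "dairy/cheese")
tagger.train("Mozzarella cheese shredded", "dairy/cheese")
tagger.train("Whole wheat bread", "bakery/bread")

predicted = tagger.predict("Organic homo milk")
```

The classifier can only ever answer with a node it was trained on, which matches the drawback listed above: new kinds of facets are never discovered.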

Dynamic taxonomy & unsupervised facet extraction for collections of text documents
  • no prior facets to begin with; the algorithm builds the taxonomy all by itself
  • usually used on things like unclassified articles, etc.
  Algorithm
  1. for each item in the products collection, identify which terms are important
  2. for each important term, query one or more external resources & get the contextual terms that appear in the results; add the retrieved terms to the original document as part of its metadata, so that it is now a context-aware document
  3. analyze the frequency of the terms in both the original collection & the expanded collection to identify the candidate facet terms
  Pros
  • new facet keywords can be created and automatically inserted into the taxonomy with no human intervention
  Cons
  • each step in the above algorithm needs an ML algorithm
  • hard (for our company) to evaluate recall & precision given our small and non-standardized set of data
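
The three steps above can be sketched in a few lines, under heavy simplifications. Every non-stopword counts as "important", the external resource is a hard-coded dictionary standing in for something like an encyclopedia or web search, and candidate facets are terms whose document frequency grows after expansion. All names and data here are assumptions, not the method from any particular paper or library.

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "with"}

# Hypothetical stand-in for an external resource lookup.
EXTERNAL_CONTEXT = {
    "milk": ["dairy", "beverage"],
    "cheddar": ["cheese", "dairy"],
    "yogurt": ["dairy"],
    "baguette": ["bread", "bakery"],
    "rye": ["bread", "bakery"],
}


def important_terms(text):
    """Step 1 (simplified): treat every non-stopword as important."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def expand(text):
    """Step 2: append contextual terms, making the document context-aware."""
    terms = important_terms(text)
    context = [c for t in terms for c in EXTERNAL_CONTEXT.get(t, [])]
    return terms + context


def document_frequency(docs):
    """Count in how many documents each term occurs."""
    df = Counter()
    for terms in docs:
        df.update(set(terms))
    return df


def candidate_facets(texts, min_gain=2):
    """Step 3: keep terms that are much more frequent after expansion."""
    df_original = document_frequency([important_terms(t) for t in texts])
    df_expanded = document_frequency([expand(t) for t in texts])
    gains = {t: df_expanded[t] - df_original[t] for t in df_expanded}
    return sorted(t for t, gain in gains.items() if gain >= min_gain)


products = ["Beatrice homo milk", "Old cheddar", "Plain yogurt",
            "French baguette", "Rye loaf"]
facets = candidate_facets(products)
```

Terms such as "dairy" never appear in the product titles themselves, which is why the expansion step is needed before the frequency comparison can surface them.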

Hybrid: i) dynamic taxonomy w/ a fixed hierarchy & supervised facet extraction + ii) social tagging (a.k.a. folksonomy)
  • unsupervised learning is not suitable for our dataset, so I propose a hybrid scheme to enable taxonomy creation
  • we can use our dynamic taxonomy scheme and also allow users to create new facet keywords, but maybe only the moderator can add a new keyword to the taxonomy
  • the rest of the tags float freely outside the taxonomy
  • e.g. http://www.amazon.com/gp/product/tags-on-product/B001EO5XTO/ref=tag_dpp_cust_edpp_sa
      - Amazon has allowed its customers to create their own product tags that are helpful for their own purposes
  • possibly even merge our tags with Facebook: http://techcrunch.com/2010/07/27/amazon-now-taps-into-facebook-for-social-product-recommendations/
  Pros
  • possibly more useful to shoppers, who can use tags to remember their own stuff
  Cons
  • we'd have to get comfortable with having a plethora of tags not necessarily related to each other
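
The moderation rule in this hybrid scheme can be sketched as a small class: anyone may attach a tag to a product, but only a moderator can promote a tag into the taxonomy. The class and method names are hypothetical, and the `is_moderator` flag stands in for whatever permission check a real system would use.

```python
class HybridTaxonomy:
    """Curated facets plus free-floating user tags (folksonomy)."""

    def __init__(self, facets):
        self.facets = {f.lower() for f in facets}  # keywords inside the taxonomy
        self.user_tags = {}                        # upc -> set of user-supplied tags

    def tag(self, upc, keyword):
        """Any user may attach any tag to a product."""
        self.user_tags.setdefault(upc, set()).add(keyword.lower())

    def promote(self, keyword, is_moderator):
        """Only a moderator may turn a tag into a taxonomy facet."""
        if not is_moderator:
            raise PermissionError("only a moderator can add facet keywords")
        self.facets.add(keyword.lower())

    def facets_for(self, upc):
        """Tags on this product that are part of the taxonomy."""
        return sorted(self.user_tags.get(upc, set()) & self.facets)

    def floating_tags_for(self, upc):
        """Tags on this product that remain outside the taxonomy."""
        return sorted(self.user_tags.get(upc, set()) - self.facets)


taxonomy = HybridTaxonomy(["dairy", "milk"])
taxonomy.tag("1234567890", "milk")
taxonomy.tag("1234567890", "lactose")
taxonomy.tag("1234567890", "for grandma")
taxonomy.promote("lactose", is_moderator=True)
```

After the promotion, "lactose" behaves like any other facet, while personal tags such as "for grandma" stay visible on the product without cluttering the hierarchy.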

Hybrid: 4 level (Jeremy's) taxonomy creation, w/ Gladson or GS1 labels and unsupervised facet extraction
  Pros
  • sounds the closest to what we're trying to accomplish
  • possible extensions with social tagging as well
  • works kind of well w/ shelving
  Cons
  • not as richly descriptive, due to having fewer levels in the taxonomy
  • since the taxonomy is confined to a certain number of levels, I don't really know how to implement this right now (I can research)
