Crab
                A Python Framework for Building
                    Recommendation Engines
                       Scipy 2011, Austin TX


Marcel Caraciolo Ricardo Caspirro             Bruno Melo
   @marcelcaraciolo       @ricardocaspirro        @brunomelo

What is Crab?

 A Python framework for building recommendation engines
 A Scikit module for collaborative, content-based and hybrid filtering
 A Mahout alternative for Python developers :D
 Open source under the BSD license


             https://github.com/muricoca/crab




When did it start?

It began one year ago
Community-driven, 4 members
In April 2011 the Muriçoca open-source labs incorporated it
Since April 2011 it is being rewritten as a Scikit


                https://github.com/muricoca/
Knowing Scikits
Scikits are SciPy Toolkits: independent projects hosted
                under a common namespace.


                       Scikits Image
                     Scikits MlabWrap
                     Scikits AudioLab
                      Scikit Learn
                             ....

           http://scikits.appspot.com/scikits



Knowing Scikits

                        Scikit-Learn

    Machine learning algorithms + scientific Python packages
                (NumPy, SciPy and Matplotlib)

           http://scikit-learn.sourceforge.net/


Our goal: turn Crab into a Scikit and contribute
           some parts of it to Scikit-learn


Why Recommendations?
The world is an over-crowded place




Why Recommendations?
We are overloaded
  Thousands of news articles and blog posts each day
  Millions of movies, books and music tracks online
  Several places, offers and events
  Even with friends, we are sometimes overloaded!




Why Recommendations?
We really need and consume only a few of them!

   “A lot of times, people don’t know what
   they want until you show it to them.”
                                         Steve Jobs

  “We are leaving the Information age, and
  entering into the Recommendation age.”
                      Chris Anderson, from the book The Long Tail



Why Recommendations?
Can Google help?
  Yes, but only when we really know what we are looking for
  But what does “interesting” mean?
Can Facebook help?
  Yes, I tend to find my friends’ stuff interesting
  But what if I have only a few friends, and what they like does not always
  attract me?
Can experts help?
  Yes, but it won’t scale well
  And it is what they like, not what I like! Everyone gets exactly the same advice!


Why Recommendations?
         Recommendation Systems
Systems designed to recommend to me something I may like




Why Recommendations?
     Recommendation Systems

     [Figure: Graph Representation]




The current Crab

Collaborative Filtering algorithms
  User-Based, Item-Based and Slope One

Evaluation of the Recommender Algorithms
 Precision, Recall, F1-Score, RMSE




                           Precision-Recall Charts
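
As a rough illustration of the user-based approach listed above, the sketch below predicts a rating as a similarity-weighted average of the ratings given by other users. It is not Crab's implementation: the toy `ratings` matrix and the names `cosine_similarity` and `predict_user_based` are assumptions made up for this example, and treating 0 as "not rated" inside the similarity is a simplification.

```python
import numpy as np

# Toy ratings matrix (hypothetical data): rows are users, columns are items,
# and 0.0 means "not rated".
ratings = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 4.0, 4.0],
])


def cosine_similarity(a, b):
    """Cosine similarity between two rating vectors (0.0 if either is all zeros)."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm else 0.0


def predict_user_based(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num, den = 0.0, 0.0
    for other in range(ratings.shape[0]):
        # Skip the target user and anyone who has not rated the item.
        if other == user or ratings[other, item] == 0:
            continue
        sim = cosine_similarity(ratings[user], ratings[other])
        num += sim * ratings[other, item]
        den += abs(sim)
    return num / den if den else 0.0


# Predict how user 1 would rate item 1, which they have not rated yet.
print(predict_user_based(ratings, user=1, item=1))
```

An item-based variant would follow the same pattern with the matrix transposed, comparing item columns instead of user rows.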

The current Crab




Using REST APIs to deploy the recommender
          django-piston, django-rest, django-tastypie




Crab is already in production

  Brazilian social network called Atepassar.com
        Educational network with more than 60,000 students and 3,000 video classes

  Running on Python + NumPy + SciPy and Django

  Backend for recommendations: MongoDB (mongoengine)

  Daily recommendations with explanations



Evaluating your recommender
 Crab implements the most used recommender metrics.
     Precision, Recall, F1-Score, RMSE



     Using matplotlib
     for a plotter utility

 Implement new metrics

Simulations support maybe (??)
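
For readers unfamiliar with the metrics named above, here is a minimal sketch of how they are usually defined. The function names `precision_recall_f1` and `rmse` are hypothetical helpers written for this example and are not taken from Crab's API.

```python
import math


def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 for one list of recommendations.

    `recommended` holds the items the recommender returned and `relevant`
    the items the user actually liked.
    """
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# 2 of the 4 recommended items are relevant, and 2 of the 3 relevant
# items were recommended.
print(precision_recall_f1(["a", "b", "c", "d"], ["a", "c", "e"]))
print(rmse([1.0, 2.0], [3.0, 2.0]))
```

Precision, recall and F1 judge a ranked list of recommendations, while RMSE judges predicted rating values, so the two families answer different questions about a recommender.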




Evaluating your recommender
All you have to do is implement your Evaluator




Distributing the recommendation computations


Use Hadoop and MapReduce intensively
  Investigating the Yelp mrjob framework   https://github.com/pfig/mrjob



Implement newer state-of-the-art techniques, such as those used by Netflix
   Matrix factorization, Singular Value Decomposition (SVD), Boltzmann machines



The most commonly used is the Slope One technique.
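
Since Slope One is singled out here, a small sketch of the weighted Slope One scheme may help. It assumes ratings stored as a dict of dicts; the function name `slope_one_predict` and the toy data are illustrative choices for this example, not Crab's code.

```python
def slope_one_predict(ratings, user, target):
    """Weighted Slope One prediction of `user`'s rating for `target`.

    `ratings` maps each user to a dict of {item: rating}.
    Returns None when no co-rated item pairs are available.
    """
    num, den = 0.0, 0
    for item, rating in ratings[user].items():
        if item == target:
            continue
        # Rating differences (target - item) over users who rated both items.
        diffs = [r[target] - r[item]
                 for r in ratings.values()
                 if target in r and item in r]
        if not diffs:
            continue
        count = len(diffs)
        avg_dev = sum(diffs) / count
        # Weight each item's estimate by how many users support it.
        num += (avg_dev + rating) * count
        den += count
    return num / den if den else None


# Toy data (hypothetical): Lucy has not rated item "A".
ratings = {
    "John": {"A": 5, "B": 3, "C": 2},
    "Mark": {"A": 3, "B": 4},
    "Lucy": {"B": 2, "C": 5},
}
print(slope_one_predict(ratings, "Lucy", "A"))
```

The appeal of the technique is that it only needs average rating differences between item pairs, which are cheap to precompute and to update as new ratings arrive.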



Why migrate?
The old Crab ran on pure Python only
     Recommendations demand heavy math and lots of processing

Compatible with the NumPy and SciPy libraries
   High-quality, popular libraries optimized for scientific computing in Python

Scikits projects are amazing!
    Active communities, scientific conferences and up-to-date projects (e.g. scikit-learn)

Make the Crab framework visible to the community
    Invite scientific researchers and machine learning developers around the globe who code in
    Python to help us with this project


                              Be Fast and Furious
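
To make the pure-Python versus NumPy point concrete, the sketch below computes the same user-to-user cosine similarity matrix twice: once with nested Python loops and once with vectorized NumPy operations. Both function names and the sample data are assumptions for illustration; no timings are claimed here, since they depend on the data size and the machine.

```python
import numpy as np


def cosine_matrix_pure(rows):
    """Pairwise cosine similarities using plain Python loops."""
    n = len(rows)
    sims = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(rows[i], rows[j]))
            norm_i = sum(a * a for a in rows[i]) ** 0.5
            norm_j = sum(b * b for b in rows[j]) ** 0.5
            sims[i][j] = dot / (norm_i * norm_j) if norm_i and norm_j else 0.0
    return sims


def cosine_matrix_numpy(matrix):
    """The same computation as a single matrix product."""
    matrix = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows keep a similarity of 0
    unit = matrix / norms
    return unit @ unit.T


data = [
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
# Both versions should agree up to floating-point rounding.
print(np.allclose(cosine_matrix_pure(data), cosine_matrix_numpy(data)))
```

The vectorized version pushes the inner loops into compiled code, which is the kind of gain the migration argument relies on.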

Why migrate?



NumPy optimized with PyPy

     2x to 48x faster



  http://morepypy.blogspot.com/2011/05/numpy-in-pypy-status-and-roadmap.html




How are we working?
            Sprints, online discussions and issues




https://github.com/muricoca/crab/wiki/UpcomingEvents

Future Releases
        Planned Release 0.1
   Collaborative Filtering Algorithms working, sample datasets to load and test


        Planned Release 0.11
       Evaluation of Recommendation Algorithms and Database Models support


        Planned Release 0.12
   Recommendation as Services with REST APIs




....



Join us!

1. Read our wiki page
    https://github.com/muricoca/crab/wiki/Developer-Resources

2. Check out our current sprints and open issues
    https://github.com/muricoca/crab/issues

3. Forks and pull requests are mandatory
4. Join us at irc.freenode.net #muricoca or on our discussion list
    http://groups.google.com/group/scikit-crab




Crab
              A Python Framework for Building
                  Recommendation Engines

           https://github.com/muricoca/crab

Marcel Caraciolo Ricardo Caspirro                            Bruno Melo
   @marcelcaraciolo           @ricardocaspirro                 @brunomelo

                      {marcel, ricardo,bruno}@muricoca.com


Introduction to Crab - Python Framework for Building Recommender Systems
