Generic Framework for Knowledge Classification-1
Generic Framework for Knowledge Classification
By Venkata Vineel
Agenda
•  Introduction
•  Problem at Hand
•  How is it solved?
•  Challenges
•  Skills and Career Alignment
•  Q & A
Introduction
•  Masters in Computer Science
University of Utah, Salt Lake City, UT
•  Systems Engineering Intern
Internal tools team - Knowledge Management
Interests:
Scalability challenges, Machine Learning and Visualization.
Problem at Hand
•  Generic Framework for classifying knowledge
•  Classifying questions in Answer Hub
How Did I Solve It?
•  Developed a generic classification algorithm.
•  Built an Answer Hub knowledge base that learns.
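The slides do not name the algorithm behind the "knowledge base that learns". As one possible stand-in, assumed here and not taken from the deck, a multinomial Naive Bayes model can absorb newly labelled questions one at a time and return categories ranked from most to least likely. The class and method names (`IncrementalNaiveBayes`, `learn`, `rank`) and the toy questions are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class IncrementalNaiveBayes:
    """Hypothetical stand-in: the deck does not say which algorithm was used.

    Multinomial Naive Bayes with Laplace smoothing that can learn from new
    labelled questions at any time and rank categories for a new question.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant
        self.doc_counts = Counter()              # questions seen per category
        self.term_counts = defaultdict(Counter)  # term frequencies per category
        self.total_terms = Counter()             # total tokens per category
        self.vocab = set()

    def learn(self, text, category):
        """Update the counts with one labelled question."""
        tokens = tokenize(text)
        self.doc_counts[category] += 1
        self.term_counts[category].update(tokens)
        self.total_terms[category] += len(tokens)
        self.vocab.update(tokens)

    def rank(self, text):
        """Return all known categories sorted by log-posterior, best first."""
        tokens = tokenize(text)
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for cat, n in self.doc_counts.items():
            score = math.log(n / n_docs)  # log prior
            denom = self.total_terms[cat] + self.alpha * v
            for tok in tokens:
                score += math.log((self.term_counts[cat][tok] + self.alpha) / denom)
            scores[cat] = score
        return sorted(scores, key=scores.get, reverse=True)


nb = IncrementalNaiveBayes()
nb.learn("hadoop job fails with out of memory on mapper", "Hadoop")
nb.learn("how do I run a hive query on the hadoop cluster", "Hadoop")
nb.learn("teradata sql query timeout", "Teradata")
print(nb.rank("mapper out of memory in hadoop")[0])  # → Hadoop
```

Because `learn` only increments counters, new answered questions can be folded in without retraining from scratch, which is one simple way to make a knowledge base "learn".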
Project High Points
•  72% accuracy achieved.
[Chart: Rank Statistics, plotting No. of Questions (0 to 18,000) against Rank (1 to 23); legend: Categories]
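Only the chart's title and axis labels survive. Assuming it counts, for each rank, how many questions had their correct category at that position in the classifier's ranked output (an interpretation, not stated in the deck), the statistic could be computed as below. `rank_statistics`, `top_k_accuracy` and the three toy questions are hypothetical.

```python
from collections import Counter


def rank_statistics(ranked_predictions, true_labels):
    """Count how many questions had their correct category at each rank.

    ranked_predictions: one list per question, ordered most to least likely.
    true_labels: the correct category for each question.
    Returns a Counter mapping 1-based rank -> number of questions.
    """
    histogram = Counter()
    for ranking, truth in zip(ranked_predictions, true_labels):
        # list.index raises ValueError if the true label was never ranked.
        histogram[ranking.index(truth) + 1] += 1
    return histogram


def top_k_accuracy(histogram, k):
    """Fraction of questions whose correct category appears within the top k."""
    total = sum(histogram.values())
    return sum(n for rank, n in histogram.items() if rank <= k) / total


# Toy data for illustration only, not project results.
preds = [["Hadoop", "DAL", "V3"], ["DAL", "V3", "Hadoop"], ["V3", "Hadoop", "DAL"]]
truth = ["Hadoop", "V3", "Hadoop"]
hist = rank_statistics(preds, truth)
print(sorted(hist.items()))               # [(1, 1), (2, 2)]
print(round(top_k_accuracy(hist, 1), 3))  # 0.333
```

Under this reading, the share of questions at rank 1 is ordinary top-1 accuracy, and the tail of the histogram shows how far down the list the correct category tends to fall.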
Confusion matrix
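| Categories | V3 | GBX | C3 | Hadoop | BES | DAL | Raptor | Stratus | Security Platform | General | User Tracking | Experimentation | Service Frameworks | Search Services | Sherlock | Batch Framework | Trinity | Commerce OS | Teradata | Analytics Platform | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| V3 | 1552 | 2 | 1 | 2 | 6 | 263 | 217 | 3 | 23 | 455 | 2 | 41 | 290 | 9 | 3 | 6 | 0 | 0 | 0 | 0 | 2875 |
| GBX | 1 | 68 | 0 | 0 | 0 | 6 | 37 | 0 | 1 | 9 | 1 | 26 | 4 | 8 | 0 | 0 | 0 | 1 | 0 | 0 | 162 |
| C3 | 0 | 0 | 318 | 1 | 1 | 25 | 27 | 54 | 5 | 32 | 1 | 6 | 1 | 4 | 0 | 1 | 0 | 1 | 1 | 0 | 478 |
| Hadoop | 0 | 0 | 2 | 173 | 1 | 10 | 8 | 0 | 0 | 20 | 1 | 3 | 4 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 225 |
| BES | 11 | 0 | 0 | 0 | 300 | 59 | 39 | 1 | 0 | 5 | 0 | 1 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 438 |
| DAL | 67 | 0 | 1 | 0 | 3 | 2307 | 89 | 0 | 2 | 16 | 0 | 13 | 99 | 5 | 0 | 1 | 0 | 0 | 0 | 0 | 2603 |
| Raptor | 11 | 10 | 5 | 2 | 25 | 396 | 5352 | 3 | 62 | 212 | 26 | 184 | 337 | 25 | 6 | 17 | 0 | 0 | 1 | 0 | 6674 |
| Stratus | 1 | 0 | 82 | 2 | 1 | 40 | 188 | 435 | 4 | 40 | 0 | 13 | 6 | 0 | 2 | 1 | 0 | 1 | 0 | 0 | 816 |
| Security Platform | 4 | 0 | 0 | 0 | 0 | 32 | 38 | 0 | 174 | 11 | 0 | 6 | 129 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 396 |
| General | 100 | 2 | 12 | 15 | 6 | 129 | 258 | 16 | 13 | 1200 | 3 | 88 | 64 | 29 | 4 | 3 | 0 | 0 | 5 | 0 | 1947 |
| User Tracking | 3 | 0 | 0 | 1 | 0 | 16 | 43 | 0 | 3 | 8 | 126 | 41 | 10 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 252 |
| Experimentation | 1 | 1 | 0 | 0 | 0 | 27 | 40 | 0 | 1 | 8 | 0 | 868 | 29 | 1 | 0 | 0 | 0 | 0 | 3 | 0 | 979 |
| Service Frameworks | 124 | 3 | 0 | 0 | 6 | 90 | 299 | 2 | 67 | 83 | 0 | 56 | 1977 | 38 | 5 | 3 | 0 | 11 | 0 | 0 | 2764 |
| Search Services | 0 | 1 | 1 | 0 | 1 | 5 | 9 | 1 | 2 | 8 | 0 | 4 | 32 | 163 | 0 | 0 | 0 | 0 | 0 | 0 | 227 |
| Sherlock | 2 | 0 | 0 | 4 | 0 | 67 | 31 | 2 | 0 | 17 | 0 | 29 | 19 | 0 | 85 | 0 | 0 | 0 | 0 | 0 | 256 |
| Batch Framework | 11 | 0 | 0 | 2 | 2 | 100 | 92 | 2 | 2 | 10 | 0 | 2 | 22 | 0 | 0 | 67 | 0 | 0 | 1 | 0 | 313 |
| Trinity | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 6 |
| Commerce OS | 0 | 0 | 0 | 0 | 0 | 10 | 48 | 0 | 4 | 15 | 0 | 14 | 15 | 8 | 0 | 0 | 0 | 103 | 0 | 0 | 217 |
| Teradata | 0 | 0 | 1 | 1 | 0 | 10 | 0 | 0 | 0 | 0 | 1 | 16 | 2 | 1 | 0 | 1 | 0 | 0 | 49 | 0 | 82 |
| Analytics Platform | 0 | 0 | 1 | 1 | 0 | 5 | 1 | 0 | 1 | 23 | 1 | 14 | 0 | 3 | 1 | 0 | 0 | 0 | 1 | 11 | 63 |
| Total | 1888 | 87 | 424 | 204 | 352 | 3597 | 6816 | 519 | 364 | 2172 | 162 | 1429 | 3063 | 297 | 109 | 101 | 0 | 117 | 61 | 11 | 21773 |
| Percentage correct (diagonal ÷ column total) | 82.20 | 78.16 | 75.00 | 84.80 | 85.23 | 64.14 | 78.52 | 83.82 | 47.80 | 55.25 | 77.78 | 60.74 | 64.54 | 54.88 | 77.98 | 66.34 | n/a | 88.03 | 80.33 | 100.00 | |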
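The "Percentage correct" row can be reproduced from the matrix: each value is the diagonal count for a category divided by that category's column total (Trinity's column is empty, hence no value). The table does not say whether rows or columns hold the true labels, so this sketch only mirrors the arithmetic. The counts are copied from the table; `column_percent_correct` and `overall_accuracy` are hypothetical helper names, and since the deck does not say how its headline accuracy was computed, `overall_accuracy` simply reports the share of all counts that lie on the diagonal.

```python
CATEGORIES = ["V3", "GBX", "C3", "Hadoop", "BES", "DAL", "Raptor", "Stratus",
              "Security Platform", "General", "User Tracking", "Experimentation",
              "Service Frameworks", "Search Services", "Sherlock", "Batch Framework",
              "Trinity", "Commerce OS", "Teradata", "Analytics Platform"]

# Counts copied from the confusion matrix, one row per category, Total column omitted.
MATRIX = [
    [1552, 2, 1, 2, 6, 263, 217, 3, 23, 455, 2, 41, 290, 9, 3, 6, 0, 0, 0, 0],
    [1, 68, 0, 0, 0, 6, 37, 0, 1, 9, 1, 26, 4, 8, 0, 0, 0, 1, 0, 0],
    [0, 0, 318, 1, 1, 25, 27, 54, 5, 32, 1, 6, 1, 4, 0, 1, 0, 1, 1, 0],
    [0, 0, 2, 173, 1, 10, 8, 0, 0, 20, 1, 3, 4, 0, 3, 0, 0, 0, 0, 0],
    [11, 0, 0, 0, 300, 59, 39, 1, 0, 5, 0, 1, 22, 0, 0, 0, 0, 0, 0, 0],
    [67, 0, 1, 0, 3, 2307, 89, 0, 2, 16, 0, 13, 99, 5, 0, 1, 0, 0, 0, 0],
    [11, 10, 5, 2, 25, 396, 5352, 3, 62, 212, 26, 184, 337, 25, 6, 17, 0, 0, 1, 0],
    [1, 0, 82, 2, 1, 40, 188, 435, 4, 40, 0, 13, 6, 0, 2, 1, 0, 1, 0, 0],
    [4, 0, 0, 0, 0, 32, 38, 0, 174, 11, 0, 6, 129, 1, 0, 1, 0, 0, 0, 0],
    [100, 2, 12, 15, 6, 129, 258, 16, 13, 1200, 3, 88, 64, 29, 4, 3, 0, 0, 5, 0],
    [3, 0, 0, 1, 0, 16, 43, 0, 3, 8, 126, 41, 10, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 27, 40, 0, 1, 8, 0, 868, 29, 1, 0, 0, 0, 0, 3, 0],
    [124, 3, 0, 0, 6, 90, 299, 2, 67, 83, 0, 56, 1977, 38, 5, 3, 0, 11, 0, 0],
    [0, 1, 1, 0, 1, 5, 9, 1, 2, 8, 0, 4, 32, 163, 0, 0, 0, 0, 0, 0],
    [2, 0, 0, 4, 0, 67, 31, 2, 0, 17, 0, 29, 19, 0, 85, 0, 0, 0, 0, 0],
    [11, 0, 0, 2, 2, 100, 92, 2, 2, 10, 0, 2, 22, 0, 0, 67, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 10, 48, 0, 4, 15, 0, 14, 15, 8, 0, 0, 0, 103, 0, 0],
    [0, 0, 1, 1, 0, 10, 0, 0, 0, 0, 1, 16, 2, 1, 0, 1, 0, 0, 49, 0],
    [0, 0, 1, 1, 0, 5, 1, 0, 1, 23, 1, 14, 0, 3, 1, 0, 0, 0, 1, 11],
]


def column_percent_correct(matrix):
    """Diagonal count / column total * 100 for each column.

    Returns None for an empty column (the spreadsheet showed #DIV/0!).
    """
    result = []
    for j in range(len(matrix)):
        col_total = sum(row[j] for row in matrix)
        result.append(None if col_total == 0 else 100.0 * matrix[j][j] / col_total)
    return result


def overall_accuracy(matrix):
    """Share of all counts that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


for name, pct in zip(CATEGORIES, column_percent_correct(MATRIX)):
    label = "n/a" if pct is None else f"{pct:.2f}"
    print(f"{name:<20} {label}")
print(f"diagonal share: {overall_accuracy(MATRIX):.4f}")
```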
Challenges and How We Overcame Them
•  Sparse data.
•  Large number of features.
•  The chi-square test came to the rescue for feature selection.
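The slide does not describe how the chi-square test was applied. A common approach, assumed here, is to score every term against every category with the 2x2 chi-square statistic and keep only the highest-scoring terms, which shrinks the feature space and drops rare terms that carry little evidence. The statistic below is the standard contingency-table formula; `select_features`, the max-over-categories aggregation and the toy questions are illustrative assumptions.

```python
from collections import defaultdict


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/category contingency table.

    n11: questions in the category that contain the term
    n10: questions outside the category that contain the term
    n01: questions in the category without the term
    n00: questions outside the category without the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, k):
    """Keep the k terms whose best chi-square score over all categories is highest.

    docs: one iterable of tokens per question; labels: category per question.
    """
    n = len(docs)
    cat_sizes = defaultdict(int)                       # questions per category
    df = defaultdict(int)                              # questions containing each term
    df_in_cat = defaultdict(lambda: defaultdict(int))  # term -> category -> questions
    for tokens, cat in zip(docs, labels):
        cat_sizes[cat] += 1
        for term in set(tokens):
            df[term] += 1
            df_in_cat[term][cat] += 1

    best = {}
    for term in df:
        scores = []
        for cat, size in cat_sizes.items():
            n11 = df_in_cat[term][cat]
            n10 = df[term] - n11
            n01 = size - n11
            n00 = n - n11 - n10 - n01
            scores.append(chi_square(n11, n10, n01, n00))
        best[term] = max(scores)  # one common way to combine per-category scores
    return sorted(best, key=best.get, reverse=True)[:k]


docs = [{"hadoop", "mapper", "job"}, {"hadoop", "hive", "query"},
        {"teradata", "sql", "query"}, {"teradata", "sql", "timeout"}]
labels = ["Hadoop", "Hadoop", "Teradata", "Teradata"]
print(sorted(select_features(docs, labels, 3)))  # ['hadoop', 'sql', 'teradata']
```

In the toy data, "query" appears once in each category and scores zero, so it is discarded, while terms that separate the categories perfectly score highest.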
Skills Obtained
•  Lucene
•  Literature survey of existing techniques
•  Machine Learning and NLP
•  Exposure to productizing research
Alignment With My Career Path
•  Interested in text analysis and machine learning.
•  eBay has tons of data.
Future Scope for Improvement
•  User profile
•  Support Vector Machines, TF-IDF weighting and k-NN algorithms
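As an illustration of one of these future directions, and not something the project is stated to have implemented, TF-IDF weighting can be combined with a k-nearest-neighbour vote over cosine similarity. The function names, the idf variant log(N / df) and the toy questions are assumptions; an SVM would typically come from a library such as scikit-learn and is not shown.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn token lists into sparse TF-IDF dicts using idf = log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def knn_classify(query_tokens, vectors, labels, idf, k=3):
    """Majority vote among the k training questions most similar to the query."""
    # Terms never seen in training get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query_tokens).items()}
    ranked = sorted(range(len(vectors)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


train = [["hadoop", "mapper", "job"], ["hadoop", "hive", "query"],
         ["teradata", "sql", "query"], ["teradata", "sql", "timeout"],
         ["hadoop", "cluster", "job"]]
labels = ["Hadoop", "Hadoop", "Teradata", "Teradata", "Hadoop"]
vectors, idf = tf_idf_vectors(train)
print(knn_classify(["sql", "timeout", "teradata"], vectors, labels, idf))  # → Teradata
```

k-NN needs no training step beyond storing vectors, so newly answered questions can be added directly, at the cost of comparing each query against every stored question.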
Q&A
