Adventures in Real-World Data Science
Automated Patent Classification
Rollie D. Goodman
TechLink Center
• Established as a technology transfer center in 1996
• Facilitates ~60% of DoD’s license agreements with industry
• Helps small companies secure R&D contracts
training set: ~9,000 labeled patents
Patent Data
Document Number: US 9,832,220
Assignee: US Air Force
Inventors: Kwiat, Luke; Kamhoua, Charles; Kwiat, Kevin
Attorneys: Mancini, Joseph A.
Title: Security Method for Allocation of Virtual Machines in a Cloud Computing Network
Abstract: A method for enhancing security in a cloud computing system by allocating virtual machines over hypervisors, in a cloud computing environment, in a security-aware fashion. The invention solves the cloud user risk problem by inducing a state such that, unless there is a change in the conditions under which the present invention operates, the cloud users do not gain by deviating from the allocation induced by the present invention. The invention’s methods include grouping virtual machines of similar loss potential on the same hypervisor, creating hypervisor environments of similar total loss, and implementing a risk tiered system of hypervisors based on expense factors.
Publication Date: 11-28-2017
CPC Classes: H04L63/1441, G06F9/45558, H04L63/1408, H04L63/20
CPC Terms
A01B33/028
A: Human necessities
A01: Agriculture
A01B: Machines for soil working in agriculture or industry
A01B33: Tilling implements with rotary driven tools
A01B33/02: …with tools on horizontal shaft transverse to direction of travel
A01B33/028: …of the walk-behind type
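The hierarchy on this slide can be read partly off the symbol itself. A minimal sketch, assuming symbols follow the section / class / subclass / main group / subgroup layout shown above; the function name `cpc_levels` is hypothetical. Intermediate subgroups such as A01B33/02 cannot be derived from the string alone, so they are left out.

```python
import re

# Layout assumed from the slide: section letter, two-digit class,
# subclass letter, main group digits, "/", subgroup digits.
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d+)/(\d+)$")


def cpc_levels(symbol):
    """Return the ancestors of a CPC symbol, broadest first."""
    match = CPC_PATTERN.match(symbol)
    if match is None:
        raise ValueError(f"not a recognised CPC symbol: {symbol!r}")
    section, cls, subclass, group, _subgroup = match.groups()
    return [
        section,                           # e.g. "A"
        section + cls,                     # e.g. "A01"
        section + cls + subclass,          # e.g. "A01B"
        section + cls + subclass + group,  # e.g. "A01B33"
        symbol,                            # full symbol, e.g. "A01B33/028"
    ]


levels = cpc_levels("A01B33/028")
print(levels)
```

Truncating a symbol at one of these levels is what makes it usable as a coarse feature later on.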
CPC Vectorization
vocabulary: {A61K036, A61K038, A61K039, A61K041, A61K045}
instance: {A61K038/00, A61K038/005, A61K039/00}
truncated: {A61K038, A61K038, A61K039}
count vector: [ 0, 2, 1, 0, 0 ]
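The vectorization steps on this slide can be sketched in a few lines. This is an illustration only, assuming each code is truncated at the slash and then counted against the vocabulary; the function name `vectorize` is hypothetical and not taken from the talk.

```python
from collections import Counter


def vectorize(cpc_codes, vocabulary):
    """Count how often each vocabulary term occurs among a patent's CPC codes."""
    # Truncate each code at the slash, e.g. "A61K038/005" -> "A61K038".
    groups = [code.split("/")[0] for code in cpc_codes]
    counts = Counter(groups)
    # One position per vocabulary term, in vocabulary order.
    return [counts[term] for term in vocabulary]


vocabulary = ["A61K036", "A61K038", "A61K039", "A61K041", "A61K045"]
instance = ["A61K038/00", "A61K038/005", "A61K039/00"]
vector = vectorize(instance, vocabulary)
print(vector)  # [0, 2, 1, 0, 0]
```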
Support Vector Machines
[Figure: diagram built up over several slides, with elements labeled a, b, x, y, and z]
Cross-Validation
[Figure: the randomized training data is split into fold 1 through fold 5; experiments 1 through 5 are run over the folds and combined into an overall accuracy]
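The five-fold procedure can be sketched as follows. This is a minimal illustration, assuming each experiment holds out one fold for testing and trains on the other four, and that overall accuracy is pooled over all held-out predictions. The majority-class learner and the toy labels are hypothetical stand-ins, not the classifiers or data from the talk.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train_fn, k=5, seed=0):
    """Run k experiments, each holding out one fold; return pooled accuracy."""
    folds = k_fold_indices(len(samples), k, seed)
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(model(samples[i]) == labels[i] for i in held_out)
    return correct / len(samples)


def train_majority(train_samples, train_labels):
    """Hypothetical baseline learner: always predict the most common label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda sample: majority


# Toy data (made up for illustration): 7 samples of class "a", 3 of class "b".
samples = list(range(10))
labels = ["a"] * 7 + ["b"] * 3
accuracy = cross_validate(samples, labels, train_majority, k=5)
print(accuracy)
```

Because every sample is held out exactly once, the pooled figure uses all of the labeled data for testing without ever testing on a sample the model was trained on.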
Ensemble Learners
• Train and combine multiple learners to solve a single problem
• also: “multiple classifier systems”
• Often outperform single classifiers
• e.g. Netflix Competition, KDD 2009, and Kaggle
Text Processing
• Stopwords: remove words that appear frequently but do not
give any information about content
• a, an, and, for, from, is, it, the, to, with…
• Stemming: reduce derived words to root (“stemmed”) form
• different, differently, differ, differing, differed → differ
• Weighting: term frequency – inverse document frequency
$$\text{tfidf}_i = \text{term frequency}_i \times \log\left(\frac{\text{number of documents}}{\text{number of documents where term } i \text{ occurs}}\right)$$
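The weighting formula can be sketched directly. This is a minimal illustration, assuming raw counts for term frequency and the natural logarithm (the slide does not state the base); the three tiny documents are made up for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (documents are token lists)."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    weights = []
    for doc in documents:
        term_freq = Counter(doc)
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights


# Hypothetical pre-processed documents (stopwords removed, stemmed).
docs = [
    ["result", "comput", "result", "gener", "text"],
    ["text", "classifi"],
    ["patent", "text"],
]
weights = tf_idf(docs)
print(weights[0])
```

A term that occurs in every document ("text" here) gets weight zero, while a term repeated within one document and absent elsewhere ("result") is weighted most heavily.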
Text Processing
original: the results are computed from the resulting generated text
stopwords removed: results computed resulting generated text
stemmed: result comput result gener text
weighted: 3.03, 1.24, 0.68, 4.79. . .
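The first two steps of this example can be reproduced with a short sketch. The stopword set extends the list from the earlier slide with "are" (an assumption, since that list ends in an ellipsis), and `toy_stem` is a hypothetical suffix stripper tuned to this one sentence, not the stemmer used in the talk; a real pipeline would use an established algorithm such as Porter stemming.

```python
# Stopwords from the slide, plus "are" (assumed to be covered by the ellipsis).
STOPWORDS = {"a", "an", "and", "are", "for", "from", "is", "it", "the", "to", "with"}

# Suffixes tried in this order; longer ones first so "ated" wins over "ed".
SUFFIXES = ("ated", "ing", "ed", "s")


def remove_stopwords(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [word for word in text.lower().split() if word not in STOPWORDS]


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


sentence = "the results are computed from the resulting generated text"
kept = remove_stopwords(sentence)
stems = [toy_stem(word) for word in kept]
print(kept)   # ['results', 'computed', 'resulting', 'generated', 'text']
print(stems)  # ['result', 'comput', 'result', 'gener', 'text']
```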
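[Diagram: the CPC terms {A61K036, A61K038, A61K039, A61K041, A61K045} feed the CPC classifier (SVM), which outputs class 1; the text “The results are computed from the resulting generated text…” feeds the text classifier (SVM), which outputs class 2; the pair [class 1, class 2] passes through a still-unknown combiner (?) to give the final classification]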
Decision Trees
outlook: {sunny, overcast, rainy}
humidity: {high, low}
wind: {high, low}
hiking: {Yes, No}
[Figure: decision tree with outlook at the root (branches sunny, overcast, rainy), humidity and wind as inner nodes (branches high, low), and Y/N leaves]
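A tree like this one can be written as nested data and walked from the root. In the sketch below the attribute names and values come from the slide, but the leaf labels and which subtree hangs off which outlook branch are assumptions borrowed from the common textbook version of this example.

```python
# Inner nodes are (attribute, {value: subtree}) pairs; leaves are "Y" or "N".
# Leaf labels and branch placement are assumed, not read off the slide.
TREE = ("outlook", {
    "sunny": ("humidity", {"high": "N", "low": "Y"}),
    "overcast": "Y",
    "rainy": ("wind", {"high": "N", "low": "Y"}),
})


def classify(tree, example):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree


day = {"outlook": "sunny", "humidity": "low", "wind": "high"}
decision = classify(TREE, day)
print(decision)  # Y
```

Only the attributes on the path from root to leaf are consulted, so wind is never tested on a sunny day.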
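[Diagram: the CPC terms {A61K036, A61K038, A61K039, A61K041, A61K045} feed the CPC classifier (SVM), which outputs class 1; the text “The results are computed from the resulting generated text…” feeds the text classifier (SVM), which outputs class 2; the pair [class 1, class 2] feeds a decision tree, which gives the final classification. Accuracy figures shown: 87% and 76% for the two SVM classifiers, 98% for the final classification]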
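The combining step can be sketched without any library. Here the decision tree is replaced by a plain lookup table that learns the majority true label for each [class 1, class 2] pair, which is roughly what a fully grown tree over two categorical inputs amounts to. The class names, the training pairs, and the fallback rule are all hypothetical.

```python
from collections import Counter, defaultdict


def train_combiner(base_predictions, true_labels):
    """Learn the majority true label for each (cpc_pred, text_pred) pair."""
    votes = defaultdict(Counter)
    for pair, label in zip(base_predictions, true_labels):
        votes[pair][label] += 1
    table = {pair: counts.most_common(1)[0][0] for pair, counts in votes.items()}

    def combine(pair):
        # Assumed fallback: trust the CPC prediction for pairs never seen in training.
        return table.get(pair, pair[0])

    return combine


# Made-up outputs of the two base classifiers, with the true labels.
base_predictions = [
    ("chem", "chem"), ("chem", "bio"), ("chem", "bio"),
    ("bio", "bio"), ("bio", "chem"),
]
true_labels = ["chem", "bio", "bio", "bio", "chem"]

combine = train_combiner(base_predictions, true_labels)
final = combine(("chem", "bio"))
print(final)  # bio
```

The combiner never sees the patent itself, only the two base predictions, so it can learn which classifier to trust when they disagree.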
Questions?

