Similarity Computation Exploiting the Semantic and Syntactic Inherent Structure Among Job Titles
Authors: Sarthak Ahuja¹, Joydeep Mondal¹, Sudhanshu Shekhar Singh¹ and David Glenn George²
¹ IBM Research Lab, India
² IBM Talent Management Solutions, Portsmouth, UK
Presenter: Joydeep
What exactly is it trying to solve?
List of Available Job Titles
• System Engineer
• Software Developer
• Senior Software Engineer
• Junior Network Engineer
• Junior Software Tester
Query Job Title
• Junior Software Engineer
No other information (job descriptions or any details other than the TITLE) is available for these jobs.
Similarity computation is performed between the query title and each available title, and the highest-scoring title is returned as the Best Match.
Business Application (Where and why is it needed?)
• IBM Watson Recruitment (IWR): https://www.ibm.com/talent-management/hr-solutions/recruiting-software
Mapping requisition jobs to the available job taxonomy by narrowing down the search space, without using computation-intensive and time-consuming state-of-the-art document similarity methods
How has the problem been solved?
Job Title Matching
Split title keywords into three categories (Domain, Functional, Attribute) -> Map each category of one job title to the same category of the other title
Example
• Title = “Junior Software Engineer”
• Domain keywords set = [“Software”]
• Functional keywords set = [“Engineer”]
• Attribute keywords set = [“Junior”]
The Domain, Functional and Attribute keyword sets of one title are then mapped to the corresponding sets of the other title.
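To see why comparing category by category matters, the following Python sketch hand-labels the titles from the first slides (the labels are our own assumption, not output of the authors' classifier) and compares each with the query using exact word overlap (Jaccard) per category. Exact overlap is a deliberately naive stand-in for the semantic mapping described in the Methods slides.

```python
from typing import Dict, Set

Parts = Dict[str, Set[str]]  # category ("A", "F", "D") -> words

# Hand-labelled decompositions (assumed, not produced by a trained classifier).
QUERY: Parts = {"A": {"junior"}, "F": {"engineer"}, "D": {"software"}}
TITLES: Dict[str, Parts] = {
    "System Engineer": {"A": set(), "F": {"engineer"}, "D": {"system"}},
    "Software Developer": {"A": set(), "F": {"developer"}, "D": {"software"}},
    "Senior Software Engineer": {"A": {"senior"}, "F": {"engineer"}, "D": {"software"}},
    "Junior Network Engineer": {"A": {"junior"}, "F": {"engineer"}, "D": {"network"}},
    "Junior Software Tester": {"A": {"junior"}, "F": {"tester"}, "D": {"software"}},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of intersection over size of union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def per_category_overlap(q: Parts, t: Parts) -> Dict[str, float]:
    """Compare domain, functional and attribute words separately."""
    return {cat: jaccard(q[cat], t[cat]) for cat in ("D", "F", "A")}


if __name__ == "__main__":
    for title, parts in TITLES.items():
        print(f"{title:26} {per_category_overlap(QUERY, parts)}")
```

Senior Software Engineer and Junior Network Engineer each share two of the query's three words, so a flat bag-of-words comparison cannot separate them. Per category, the first matches on domain and function (D = 1.0, F = 1.0, A = 0.0) while the second matches only on function and seniority (D = 0.0, F = 1.0, A = 1.0).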
Methods
• Objective: split any job title into attribute, functional and core descriptor/domain words.
• Input:
• Job Title (T)
• Output:
• 3 sets: attribute words set (SA), functional words set (SF) and core descriptor/domain words set (SD)
• Resources/existing techniques used:
• Acronym dictionary (DictA), spell checker technique (TechS), classifier model (Mclass)
• Algorithm:
• Step 1: SWord = split the title T into separate words
• Step 2: for each word in SWord
• Step 2.1: word = resolve acronyms in word using DictA
• Step 2.2: word = correct spelling mistakes in word using TechS
• Step 2.3: classify word using Mclass as an attribute word (A), a functional word (F) or a core descriptor/domain word (D)
• Step 2.4: append word to the corresponding set (SA, SF, SD) according to its class label (A, F, D)
• Feature vector used in the classifier model (Mclass):
• [POS (part of speech) of the word, position of the word in job title T (first word/last word/in-between word), POS of the root word, whether the word ends with “er”/“or”/“ar”]
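Steps 1 to 2.4 and the feature vector can be sketched in Python as below. This is a minimal sketch under stated assumptions: the acronym table, vocabulary, POS lexicon and root-word table are tiny hypothetical stand-ins for DictA, TechS, a POS tagger and the online dictionary; spelling correction is approximated with the standard library's `difflib.get_close_matches` (cutoff 0.8 is our choice); and `stub_classifier` is a hypothetical placeholder for the trained Mclass. Lists are used instead of sets only to preserve word order.

```python
from difflib import get_close_matches
from typing import Callable, Dict, List, Tuple

# Toy resources (hypothetical): stand-ins for DictA, the spell checker's
# vocabulary, a POS tagger and a root-word dictionary.
ACRONYMS: Dict[str, str] = {"sr": "senior", "jr": "junior", "sw": "software"}
VOCABULARY: List[str] = ["senior", "junior", "software", "network", "system",
                         "engineer", "developer", "tester", "administrator"]
POS: Dict[str, str] = {"senior": "ADJ", "junior": "ADJ", "software": "NOUN",
                       "network": "NOUN", "system": "NOUN", "engineer": "NOUN",
                       "developer": "NOUN", "tester": "NOUN",
                       "administrator": "NOUN", "develop": "VERB",
                       "test": "VERB", "administer": "VERB"}
ROOT: Dict[str, str] = {"developer": "develop", "tester": "test",
                        "administrator": "administer"}

Features = Tuple[str, str, str, bool]
Classifier = Callable[[Features], str]  # returns "A", "F" or "D"


def normalise(word: str) -> str:
    word = word.lower()
    word = ACRONYMS.get(word, word)                      # Step 2.1: acronyms
    close = get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
    return close[0] if close else word                   # Step 2.2: spelling


def position(i: int, n: int) -> str:
    return "first" if i == 0 else "last" if i == n - 1 else "middle"


def features(word: str, i: int, n: int) -> Features:
    """[POS of word, position in title, POS of root word, ends in er/or/ar]."""
    root = ROOT.get(word, word)
    return (POS.get(word, "UNK"), position(i, n), POS.get(root, "UNK"),
            word.endswith(("er", "or", "ar")))


def split_title(title: str, classify: Classifier) -> Dict[str, List[str]]:
    words = title.split()                                # Step 1
    sets: Dict[str, List[str]] = {"A": [], "F": [], "D": []}
    for i, raw in enumerate(words):                      # Step 2
        word = normalise(raw)
        label = classify(features(word, i, len(words)))  # Step 2.3
        sets[label].append(word)                         # Step 2.4
    return sets


def stub_classifier(f: Features) -> str:
    """Hypothetical placeholder for the trained Mclass."""
    pos, _, _, ends_like_agent = f
    if pos == "ADJ":
        return "A"
    return "F" if ends_like_agent else "D"


if __name__ == "__main__":
    print(split_title("Jr Sw Enginer", stub_classifier))
```

For the misspelled, abbreviated input “Jr Sw Enginer”, the acronym and spelling steps recover “junior software engineer”, and the stub classifier yields A = [junior], F = [engineer], D = [software]. Any trained model that maps the same 4-tuple to a label can be passed in place of the stub.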
• Why did we use these features?
• POS (part of speech) of the word: we found that most attribute words are adjectives (e.g. Senior, Junior), most functional words are nouns (e.g. developer, tester, teacher), and most core descriptor/domain words are also nouns (e.g. Software, Network).
• Position of the word in the job title (T) (first word/last word/in-between word): we found that attribute words are generally the first or last word of the title, e.g. Senior in “Senior software developer” and junior in “Network administrator junior”. Most functional words appear as an in-between or last word (administrator, developer), and most core descriptor/domain words appear as an in-between or first word (software, Network).
• POS of the root word: our analysis showed that the root of a functional word is usually a verb, e.g. in “Senior software developer” the root of “developer” is “develop”, which is a verb. We used the open source online dictionary https://www.vocabulary.com/dictionary/ to get the root words.
• Whether the word ends with “er”/“or”/“ar”: we also found that most functional words end with one of these three suffixes, e.g. teacher, developer, engineer.
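The four observations can be turned into a transparent rule-based baseline, shown here as a Python sketch. It is not the authors' trained Mclass: it simply gives each class one point per observation that supports it and returns the highest-scoring class. The features are assumed to arrive as a (POS, position, root POS, suffix flag) tuple, and the tie-breaking order is our own assumption.

```python
from typing import Dict, Tuple

Features = Tuple[str, str, str, bool]  # (POS, position, root POS, ends with er/or/ar)


def heuristic_label(f: Features) -> str:
    """Score A/F/D with one point per supporting observation; highest wins."""
    pos, position, root_pos, has_suffix = f
    score: Dict[str, int] = {"A": 0, "F": 0, "D": 0}
    if pos == "ADJ":                      # attribute words: mostly adjectives
        score["A"] += 1
    elif pos == "NOUN":                   # functional and domain words: mostly nouns
        score["F"] += 1
        score["D"] += 1
    if position in ("first", "last"):     # attribute words: first or last
        score["A"] += 1
    if position in ("middle", "last"):    # functional words: in-between or last
        score["F"] += 1
    if position in ("middle", "first"):   # domain words: in-between or first
        score["D"] += 1
    if root_pos == "VERB":                # functional words: verb root
        score["F"] += 1
    if has_suffix:                        # functional words: -er/-or/-ar
        score["F"] += 1
    # Tie-break order A, D, F is an assumption; it keeps "junior"
    # (which ends in "or") an attribute word.
    return max(("A", "D", "F"), key=lambda c: score[c])


if __name__ == "__main__":
    # Hand-written features for "Network administrator junior".
    for word, f in [("Network", ("NOUN", "first", "NOUN", False)),
                    ("administrator", ("NOUN", "middle", "VERB", True)),
                    ("junior", ("ADJ", "last", "ADJ", True))]:
        print(word, heuristic_label(f))
```

The example labels Network as D, administrator as F and junior as A. It also shows why a single feature is not enough: “junior” ends in “or”, so the suffix rule alone would call it functional, and only the POS and position evidence pull it back to attribute.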
Functional classifier output -> input of the Attribute classifier
Functional classifier output + Attribute classifier output -> input of the Domain classifier
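One way to wire this cascade is to append each earlier classifier's prediction to the feature tuple seen by the next one. The Python sketch below shows only that data flow. The three stages are hypothetical binary rules standing in for the trained models, and the order used to resolve a final label when several stages fire (F, then A, then D) is our assumption, not something the slides specify.

```python
from typing import Callable, List, Sequence, Tuple

Feat = Tuple[object, ...]        # (POS, position, root POS, suffix flag, ...)
Stage = Callable[[Feat], bool]   # a binary classifier over an extended feature tuple


def cascade(words: Sequence[Feat], functional: Stage, attribute: Stage,
            domain: Stage) -> List[str]:
    labels: List[str] = []
    for f in words:
        is_f = functional(f)                 # Functional classifier
        is_a = attribute(f + (is_f,))        # its output is an extra attribute feature
        is_d = domain(f + (is_f, is_a))      # both outputs feed the domain classifier
        # Assumed resolution order when several stages fire: F, then A, then D.
        labels.append("F" if is_f else "A" if is_a else "D" if is_d else "?")
    return labels


# Hypothetical stand-in rules for the three trained classifiers.
def toy_functional(f: Feat) -> bool:
    return bool(f[3]) and f[2] == "VERB"            # agent suffix + verb root


def toy_attribute(f: Feat) -> bool:
    return not f[-1] and f[0] == "ADJ"              # not functional, adjective


def toy_domain(f: Feat) -> bool:
    return not f[-2] and not f[-1] and f[0] == "NOUN"  # neither F nor A, noun


if __name__ == "__main__":
    # Hand-written features for "Senior software developer".
    title = [("ADJ", "first", "ADJ", True),
             ("NOUN", "middle", "NOUN", False),
             ("NOUN", "last", "VERB", True)]
    print(cascade(title, toy_functional, toy_attribute, toy_domain))
```

Because the attribute and domain stages see the earlier decisions, they can learn rules such as “not functional and an adjective” instead of re-deriving the functional evidence themselves.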
Methods
• Objective: map the three category sets of words (attribute, functional and core descriptor/domain) of one title onto the corresponding sets of the other title by solving a classical imbalanced assignment problem. The mapping scores are then combined using a weighted or hierarchical scoring scheme to produce the job title similarity.
• Input:
• Job Title1 (T1), Job Title2 (T2)
• Output:
• Similarity score (S) between T1 and T2
• Resources/ Existing techniques used:
• WordNet dictionary API (W), Hungarian method to solve the imbalanced assignment problem (TH)
• Algorithm:
• Step 1: extract (SA1, SF1, SD1) from T1 and (SA2, SF2, SD2) from T2 using the previous method
• Step 2: get the mappings MA(SA1 : SA2), MF(SF1 : SF2) and MD(SD1 : SD2) using TH
• Step 3: calculate the mapping similarity scores simA, simF and simD for MA, MF and MD respectively
• Step 4: $S = \dfrac{\mathrm{sim}_D\,\bigl(1 + \mathrm{sim}_F\,(1 + \mathrm{sim}_A)\bigr)}{\mathrm{Indicator}_D + \mathrm{Indicator}_F + \mathrm{Indicator}_A}$ (importance order: D, F and A respectively)
• We used the WordNet dictionary API (W) to calculate the semantic similarity between two words. We built a semantic similarity score matrix for each pair of sets (SA1 : SA2), (SF1 : SF2) and (SD1 : SD2) and provided each matrix to TH as input. We also used the same matrices to calculate simA, simF and simD for MA, MF and MD.
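The matching and scoring steps can be sketched in Python as follows. Several assumptions are not stated in the slides and are labelled in the code: `toy_sim` is a hypothetical hand-made score table standing in for a WordNet measure; the optimal mapping is found by brute force over permutations, which reaches the same optimum as the Hungarian method but is only practical for the handful of words in a job title; simX is taken as the summed similarity of matched pairs divided by the size of the larger set, so unmatched words count against the score; and IndicatorX is taken as 1 when category X is non-empty in at least one title.

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence, Tuple

WordSim = Callable[[str, str], float]  # assumed symmetric, values in [0, 1]
Parts = Dict[str, List[str]]           # {"A": [...], "F": [...], "D": [...]}

# Hypothetical toy scores standing in for a WordNet-based word similarity.
TOY: Dict[frozenset, float] = {
    frozenset({"engineer", "developer"}): 0.8,
    frozenset({"junior", "senior"}): 0.5,
    frozenset({"software", "network"}): 0.3,
}


def toy_sim(a: str, b: str) -> float:
    return 1.0 if a == b else TOY.get(frozenset({a, b}), 0.0)


def best_assignment(xs: Sequence[str], ys: Sequence[str], sim: WordSim
                    ) -> Tuple[List[Tuple[str, str]], float]:
    """Maximum-total-similarity one-to-one mapping between two word sets.

    Brute force over permutations: same optimum as the Hungarian method on
    the rectangular similarity matrix, but O(n!), so only for tiny sets.
    """
    if not xs or not ys:
        return [], 0.0
    swap = len(xs) > len(ys)
    small, large = (ys, xs) if swap else (xs, ys)
    best_pairs: List[Tuple[str, str]] = []
    best_total = float("-inf")
    for chosen in permutations(large, len(small)):
        pairs = list(zip(small, chosen))
        total = sum(sim(a, b) for a, b in pairs)
        if total > best_total:
            best_pairs, best_total = pairs, total
    if swap:  # restore (word from xs, word from ys) order
        best_pairs = [(b, a) for a, b in best_pairs]
    return best_pairs, best_total


def category_sim(xs: Sequence[str], ys: Sequence[str], sim: WordSim) -> float:
    """simX: matched similarity divided by the larger set size (assumption)."""
    _, total = best_assignment(xs, ys, sim)
    size = max(len(xs), len(ys))
    return total / size if size else 0.0


def title_similarity(t1: Parts, t2: Parts, sim: WordSim) -> float:
    """S = simD (1 + simF (1 + simA)) / (IndicatorD + IndicatorF + IndicatorA)."""
    s = {c: category_sim(t1[c], t2[c], sim) for c in "AFD"}
    # Assumption: IndicatorX = 1 if category X is non-empty in either title.
    ind = {c: int(bool(t1[c] or t2[c])) for c in "AFD"}
    denom = ind["D"] + ind["F"] + ind["A"]
    if denom == 0:
        return 0.0
    return s["D"] * (1 + s["F"] * (1 + s["A"])) / denom


if __name__ == "__main__":
    query = {"A": ["junior"], "F": ["engineer"], "D": ["software"]}
    candidates = {
        "Senior Software Engineer": {"A": ["senior"], "F": ["engineer"], "D": ["software"]},
        "Junior Network Engineer": {"A": ["junior"], "F": ["engineer"], "D": ["network"]},
        "Software Developer": {"A": [], "F": ["developer"], "D": ["software"]},
    }
    for name, parts in candidates.items():
        print(f"{name}: {title_similarity(query, parts, toy_sim):.3f}")
```

With these invented toy scores the three candidates score 0.833, 0.300 and 0.600 respectively, so the nested form of Step 4 lets a domain mismatch (software vs network) outweigh a perfect attribute match, matching the stated importance order D, F, A. For larger word sets, SciPy's `scipy.optimize.linear_sum_assignment(matrix, maximize=True)` solves the same rectangular assignment problem in polynomial time.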
System Architecture Diagram
System Architecture Diagram + Example
Results
Core Novelty
1. Any job title can be split into three categories: attribute, functional and core descriptor/domain words.
2. Job title similarity is computed by mapping these three categories of words between the two titles, treating each category as a classical imbalanced assignment problem. The mapping scores can then be combined using a weighted or hierarchical scoring scheme to produce the job title similarity.