Finding
Potential Candidates
via GitHub Social Network
1
Smart
Recruitment
Kritsana Jongsathitphaibul
Watcharapong Wongrattanasirikul
Rangsarid Pringwanid
Theptat Santanon
Chawanvit Ekarittikrai
2
Agenda
I. Business Values
• Existing Business Pain Points
• Business Problem Solving
II. Data Collections
• Data Gathering
• Data Pre-processing
III. Exploratory Analysis
• Network Analysis
• Geographic Search
IV. Recommendation System
• Node Embedding technique
• Similarities
3
I. Business Concept
Customer Request
Headhunter agency
Variety Sources
Potential Candidate
Customer Request
Headhunter Targeted Ads
Potential Candidate
Criteria Matching
in variety sources
Automatic email
Job description to
target selection cluster
Update CV
Automatic email
CV + skills record
reference to customer
• Link candidate with
skillsets
• Matching Qualification
Request
Send
Job Qualification
Searching
Review
Matching
By human
Send
Job Qualification
Select &
Contact
Update CV
Matching
proposal
Filled
experience
Data in
many
websites
Pass on
to CV
customer
Business Pain Points (Time-consuming, High Cost, High Human Effort)
Business Solution (Less Time, Lower Cost, Lower Effort, Automated)
Reactive
Proactive
Proactive
Reactive
II. Data Collections (Data Gathering)
GitHub API, Python
Limit: 1,500 requests/hour
4
GitHub API
Multiple
GitHub Users
II. Data Collections (Data Gathering)
Token
5
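A minimal sketch of how several user tokens could be rotated to stay under the hourly request limit. The `TokenPool` class, its `per_token_limit` default (taken from the 1,500 figure on the previous slide), and the placeholder token strings are illustrative assumptions, not the authors' code. The HTTP request itself is left out so the sketch runs offline.

```python
import itertools


class TokenPool:
    """Round-robin over several GitHub tokens, counting requests per token.

    Hypothetical helper: the per-token budget defaults to the
    1,500 requests/hour figure quoted on the slide.
    """

    def __init__(self, tokens, per_token_limit=1500):
        if not tokens:
            raise ValueError("at least one token is required")
        self.per_token_limit = per_token_limit
        self.used = {token: 0 for token in tokens}
        self._cycle = itertools.cycle(tokens)

    def next_headers(self):
        """Return request headers for the next token that has budget left."""
        for _ in range(len(self.used)):
            token = next(self._cycle)
            if self.used[token] < self.per_token_limit:
                self.used[token] += 1
                # "Authorization: token <PAT>" is a header form the GitHub REST API accepts.
                return {"Authorization": f"token {token}"}
        raise RuntimeError("all tokens exhausted; wait for the hourly reset")


# Placeholder tokens and a tiny budget, only to show the rotation order.
pool = TokenPool(["TOKEN_A", "TOKEN_B"], per_token_limit=2)
headers = [pool.next_headers()["Authorization"] for _ in range(4)]
```

Each call hands out the next token in turn, so the load is spread evenly. Once every token has used its budget, the pool raises instead of sending requests that would be rejected.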
II. Data Collections (Data Gathering)
Repository &
Contributor
Repository
Name
Repository
Score
Programming
Language
Contributor
Information
Contributor
Name
Follower
Name
Follower
Address
Contributor &
Follower
6
II. Data Cleansing (Data Preprocessing)
GitHub user location
7
II. Data Cleansing (Data Preprocessing)
Ambiguous location
Incorrect location
Typos and multiple spellings
Bangalore
8
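The location problems listed above (ambiguous entries, typos, multiple spellings) could be handled with an alias table plus fuzzy matching. This is a sketch under assumptions: the `ALIASES` and `CANONICAL` tables, the function name, and the 0.8 cutoff are invented for illustration, and the slides do not say which method the authors used.

```python
import difflib

# Hypothetical alias table: variant spellings mapped to one canonical name.
ALIASES = {
    "bengaluru": "Bangalore",
    "bangalore, india": "Bangalore",
    "bkk": "Bangkok",
}
# Hypothetical list of canonical city names to match against.
CANONICAL = ["Bangalore", "Bangkok", "San Francisco"]


def normalize_location(raw, cutoff=0.8):
    """Map a free-text GitHub location to a canonical city, or None."""
    if not raw:
        return None
    # Lower-case and collapse whitespace before any lookup.
    text = " ".join(raw.strip().lower().split())
    if text in ALIASES:
        return ALIASES[text]
    lowered = {name.lower(): name for name in CANONICAL}
    # Fuzzy matching catches typos such as "Banglore".
    match = difflib.get_close_matches(text, list(lowered), n=1, cutoff=cutoff)
    return lowered[match[0]] if match else None
```

Entries that match nothing return `None`, so ambiguous or incorrect locations can be reviewed or dropped rather than silently assigned to a city.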
III. Exploratory Analysis
9
10
III. Exploratory Analysis
Projected Graph
11
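A sketch of the projection step: a bipartite repository-contributor edge list is collapsed into a repository-repository graph (two repositories linked when they share a contributor) and a contributor-contributor graph (two contributors linked when they share a repository). The function name and the toy data are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def project(edges, side):
    """Project (repository, contributor) pairs onto one node set.

    side=0 links repositories that share at least one contributor;
    side=1 links contributors that share at least one repository.
    """
    groups = defaultdict(set)
    for pair in edges:
        # Group nodes of the kept side by their neighbour on the other side.
        groups[pair[1 - side]].add(pair[side])
    projected = set()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            projected.add((a, b))
    return projected


# Toy data; repository and user names are placeholders.
edges = [("airflow", "alice"), ("airflow", "bob"), ("presto", "bob"), ("hadoop", "carol")]
repo_graph = project(edges, side=0)
contributor_graph = project(edges, side=1)
```

Edges are stored as sorted pairs, so each undirected link appears once regardless of the order in which it was discovered.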
III. Exploratory Analysis
Overview
Repository graph
Average Degree : 5.12
Density : 0.018
Assortativity : -0.095
Average Clustering : 0.29
Contributor graph
Average Degree : 198.76
Density : 0.024
Assortativity : 0.482
Average Clustering : 0.96
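For reference, three of these statistics can be computed from an undirected edge list as sketched below. The function name and toy graph are assumptions. Assortativity is omitted for brevity, and nodes without any edge are not counted because the input is an edge list.

```python
from collections import defaultdict


def graph_stats(edges):
    """Average degree, density, and average clustering of an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    n = len(adj)
    if n == 0:
        return 0.0, 0.0, 0.0
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    avg_degree = 2 * m / n
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    def local_clustering(node):
        nbrs = list(adj[node])
        k = len(nbrs)
        if k < 2:
            return 0.0
        # Count edges that exist among the node's neighbours.
        links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
        return 2 * links / (k * (k - 1))

    avg_clustering = sum(local_clustering(v) for v in adj) / n
    return avg_degree, density, avg_clustering


# Toy graph: a triangle a-b-c with a pendant node d attached to c.
stats = graph_stats([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
```

A high average clustering, as reported for the contributor graph, is expected after projection: everyone who worked on the same repository becomes mutually connected.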
III. Exploratory Analysis
12
Influential Repositories
13
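The speaker notes say influential repositories were found with PageRank. A minimal power-iteration sketch on an undirected projected graph follows. The damping factor of 0.85, the iteration count, and the toy edges are assumptions, not values taken from the slides.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph given as edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = list(adj)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbour shares its rank equally among its own neighbours.
            incoming = sum(rank[nbr] / len(adj[nbr]) for nbr in adj[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Toy projected repository graph; the edges are placeholders.
ranks = pagerank([("airflow", "presto"), ("airflow", "hadoop"), ("airflow", "spark")])
most_influential = max(ranks, key=ranks.get)
```

Because every node in an edge list has at least one neighbour, no special handling of dangling nodes is needed and the ranks keep summing to one.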
III. Exploratory Analysis
Influential Contributors
14
III. Exploratory Analysis (Country Analysis)
26 contributors
727 Followers
Coding Together at Scale: GitHub as a
Collaborative Social Network.
Antonio Lima, Luca Rossi and Mirco
Musolesi
878 contributors
34,840 Followers
Top 10 countries by contributors and followers:
- The majority of users are located in North America and Europe
- The leading countries in both graphs are the United States (USA), China, and Canada
15
III. Exploratory Analysis (Country Analysis)
Overview with Thailand
Repository graph
Average Degree : 3.14
Density : 0.006
Assortativity : -0.452
Average Clustering : 0.26
User graph (Contributor + Follower)
Average Degree : 34.32
Density : 0.006
Assortativity : 0.788
Average Clustering : 0.68
III. Exploratory Analysis (Country Analysis)
16
Red : Users in Thailand
Blue : Users in Other Countries
Number of Repository Nodes : 499
Number of User Nodes : 753
Number of Edges : 324
IV. Recommendation System
Job Description Skill Extraction
Recommendation Potential Candidates
Network Modeling
17
IV. Recommendation System
Node2Vec
A node embedding model that extends the Skip-gram architecture to networks.
It uses biased random-walk sampling to balance the exploration-exploitation
trade-off while exploring neighborhoods in both local and global structure.
Yu Qu, Ting Liu. 2018
Concept:
- Breadth-First Sampling (BFS): focuses on local neighborhoods.
- Depth-First Sampling (DFS): focuses on global neighborhoods.
Parameters:
p and q control the random-walk transition probabilities.
- p is the return parameter
- q is the "walk away" (in-out) parameter
18
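A sketch of the biased second-order random walk that the return and "walk away" parameters control. The function name, the adjacency-dict input, and the toy graph are assumptions for illustration. Training the Skip-gram model on the resulting walks is omitted.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Generate one biased random walk of up to `length` nodes."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(adj[current])
        if not neighbours:
            break
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbours))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)  # return to the previous node
            elif candidate in adj[previous]:
                weights.append(1.0)      # stay near the previous node (BFS-like)
            else:
                weights.append(1.0 / q)  # move further away (DFS-like)
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Toy graph as an adjacency dict; node names are placeholders.
adj = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol", "dave"},
    "carol": {"alice", "bob"},
    "dave": {"bob"},
}
walk = node2vec_walk(adj, "alice", length=10, p=0.5, q=2.0, rng=random.Random(0))
```

A small p makes the walk return to the previous node more often, and a large q keeps it close to where it came from. A small q pushes it outward toward more distant parts of the graph.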
IV. Recommendation System
Airflow Potential Candidates
Follows
Contributor
19
IV. Recommendation System
Airflow Potential Candidates Contributor
Follows
20
IV. Recommendation System
Airflow Potential Candidates
Follows
Contributor
21
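The agenda lists "Similarities" as the last step of the recommendation system. One common choice, assumed here because the slides do not name the measure, is to rank users by cosine similarity between their embedding and that of the target repository. The two-dimensional vectors below are made up and only stand in for learned node embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(target, embeddings, top_k=3):
    """Rank every other node by cosine similarity to the target node."""
    scores = [
        (name, cosine(embeddings[target], vec))
        for name, vec in embeddings.items()
        if name != target
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_k]


# Made-up vectors; real ones would come from the node embedding model.
embeddings = {
    "airflow": [1.0, 0.0],
    "user_a": [0.9, 0.1],
    "user_b": [0.0, 1.0],
    "user_c": [0.7, 0.7],
}
ranked = recommend("airflow", embeddings, top_k=2)
```

In practice the candidate list would likely be filtered to user nodes in the target country before ranking, in line with the country scoping described in the speaker notes.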
Thank You

Editor's Notes

  • #10: From the Repository and Contributor data, we can see that each repository has many contributors, and each contributor can likewise join many repositories. This gives us a bipartite graph.
  • #11: After that, we project the graph onto each side, which yields a Repository-Repository graph and a Contributor-Contributor graph. Two repositories are connected when at least one contributor has joined both of them. Likewise, two contributors are connected when they have worked on the same repository.
  • #12: Next, this slide shows the overall statistics of the Repo-Repo and Contri-Contri graphs, with the values listed.
  • #13: Next, we identified influential repositories using PageRank. This surfaced repositories related to data engineering work, because that was the keyword we used, such as Airflow, Presto, Hadoop, and others.
  • #14: For the Contributor-Contributor graph, we identified influential contributors in the same way, which gave us a group of people. However, this group is rather hard to recruit because its members are quite high-profile. That does not yet match what we want, since we are looking for people with potential. We therefore narrowed the scope down to a single country, because the analysis so far was still global.
  • #16: Next, this slide shows the overall statistics of the Repo-Repo and Contri-Contri graphs, with the values listed.