Ana	Paula	Appel
Data	Scientist	&	Master	Inventor
Discovering the hidden treasure of data using graph analytic
© 2015 IBM Corporation
IBM Research – Brazil
view from Rio de Janeiro Lab
Mission: To be known for our science and technology and vital to IBM, Brazil, our clients in the region and worldwide
Healthcare Data
• Medical attention transactional data
• Large healthcare insurance company in Brazil
• Nationwide
• Spanning 1.5 years (2013-2014)
• 0.6 TB (compressed)
Healthcare Data:	Stakeholders
Physicians
Patients
Healthcare	providers
Health	Services
Claims
Health Insurance Company
Healthcare Data: Claims
• Paid claims
  • Total: 109M
  • Doctors: 220k (almost half of all doctors in Brazil)
  • Patients: 2.2M
  • Unique doctor-patient pairs: 11.6M
• Other support data:
  • Company
  • Providers
  • Authorizations: ~3M
  • Claim denials: ~13M
  • Geolocation
  • ...
Over 40 tables, hundreds of fields
CLAIM
• Physician	ID
• Patient	ID
• Timestamp
• Service	code
• Disease	– ICD9
• (80+	extra	rows)
A	Complex Network	Perspective
A Network Approach
PhysID   ICD9  PatientID  DATE
SP45962  -     1001       09/04/13
SP45962  Z017  1001       26/04/13
SP47108  Z017  1001       06/12/13
SP47108  Z017  1001       16/12/13
SP45962  -     1002       11/07/13
SP45962  Z017  1002       12/07/13
SP45962  -     1002       19/08/13
SP59938  Z000  1002       24/10/13
…        …     …          …
Graph representations: bipartite graph, weighted graph, directed graph
• Bipartite network of doctors and patients
• |V| = 2.4M, |E| = 11.6M
• Keep only the largest connected component (92%-99% of all links)
• Remove multiple edges and map to weights
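The preprocessing steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the pipeline used in the talk: the rows reuse the physician and patient IDs from the sample table, the extra isolated pair is made up, and the names `build_weighted_edges` and `largest_component` are hypothetical.

```python
from collections import Counter, defaultdict, deque

# One (physician_id, patient_id) entry per claim, taken from the sample table.
claims = [
    ("SP45962", "1001"), ("SP45962", "1001"),
    ("SP47108", "1001"), ("SP47108", "1001"),
    ("SP45962", "1002"), ("SP45962", "1002"), ("SP45962", "1002"),
    ("SP59938", "1002"),
    ("SP00000", "9999"),  # made-up isolated pair, not from the slides
]

def build_weighted_edges(rows):
    """Collapse repeated (physician, patient) claims into one weighted edge."""
    return Counter(rows)

def largest_component(edges):
    """Return the node set of the largest connected component (BFS)."""
    adjacency = defaultdict(set)
    for physician, patient in edges:
        # Tag each node with its side so physician and patient IDs cannot collide.
        adjacency[("phys", physician)].add(("pat", patient))
        adjacency[("pat", patient)].add(("phys", physician))
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

edges = build_weighted_edges(claims)
giant = largest_component(edges)
# Keep only the edges whose physician lies in the largest component.
kept = {edge: w for edge, w in edges.items() if ("phys", edge[0]) in giant}
```

At the scale quoted on the slide (millions of nodes), a graph library or a distributed engine would be the more realistic choice; the logic stays the same.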
Patient-Sharing Networks
Links represent a shared patient
Network            Nodes  Links
Phys - Patient     402    403
Patient - Patient  377    5488
Phys - Phys        25     30
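A patient-sharing network is a one-mode projection of the bipartite graph: two physicians are linked when at least one patient visited both. The sketch below shows one way to compute it, assuming the link weight is the number of shared patients; the edge list and the function name `project_physicians` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (physician, patient) edges of the bipartite network.
bipartite_edges = [
    ("A", "p1"), ("B", "p1"),
    ("A", "p2"), ("B", "p2"), ("C", "p2"),
    ("C", "p3"),
]

def project_physicians(edges):
    """Link two physicians once per shared patient; weight = shared patients."""
    physicians_of = defaultdict(set)
    for physician, patient in edges:
        physicians_of[patient].add(physician)
    shared = defaultdict(int)
    for physicians in physicians_of.values():
        # Every pair of physicians seen by the same patient gets one more unit.
        for a, b in combinations(sorted(physicians), 2):
            shared[(a, b)] += 1
    return dict(shared)

phys_phys = project_physicians(bipartite_edges)
```

Swapping the two columns of the edge list gives the patient-patient projection, which is typically much denser because a single busy physician links all of his or her patients pairwise.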
Physician and Patient Degree Distributions
Patient histogram:
• One patient with 123 different physicians
• 409k patients with only 1 physician
Physician histogram:
• 26 physicians with more than 5k different patients, 1 with 30k (possibly spurious)
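The two histograms count, for each side of the bipartite graph, how many nodes have a given number of distinct neighbors. A minimal sketch, assuming a list of unique physician-patient pairs (the data and the helper name `degree_histogram` are made up for illustration):

```python
from collections import Counter

# Hypothetical physician-patient pairs.
pairs = [("A", "p1"), ("A", "p2"), ("A", "p3"), ("B", "p1"), ("C", "p4")]

def degree_histogram(pairs, side):
    """Map degree -> number of nodes with that degree (side 0 = physician, 1 = patient)."""
    degree = Counter(pair[side] for pair in set(pairs))  # set() drops repeated pairs
    return Counter(degree.values())

physician_hist = degree_histogram(pairs, side=0)
patient_hist = degree_histogram(pairs, side=1)
```

Extreme entries in the tail of such a histogram, like the physician with 30k patients, are natural candidates for a data-quality check before any metric is computed.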
Network-Derived Metrics
• Aim: extend the doctors' description with relevant metrics
• Metrics which, in combination with other data, will make it possible to:
  • classify
  • filter
  • reduce
[Figure: example metric vectors per doctor, e.g. (35, 0.1, 3.2, 0, 4%, 7%, ...) and (17, 0.2, 5.1, 1, 9%, 1%, ...); groups labeled "Compliant doctors" and "Non-compliant doctors"]
Case: Building Metrics to Describe Physicians Using Complex Networks
Mutual Reference, Centrality, Loyalty
Health Insurance: Similarity between Complex Networks
[Figure panels: Friendship network, Physician network]
Mutual	Reference
Mutual Reference
Same patient visits two doctors + happens in both directions = reciprocal link
[Figure: timeline of patient visits to doctors. Patient 1 visits doctor a and then doctor b; patient 2 visits doctor b and then doctor a. Example link weights: w(ab) = 17 with Δt = 7 days, w(ba) = 8 with Δt = 2 days.]
Goal
Identify	strong	connections	between	each	pair	of	physicians,	in	particular,	the	outliers.
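The slides do not spell out how w(ab) and Δt are extracted from the claims. One plausible reading, assumed here, is that each pair of consecutive visits by the same patient to two different doctors counts as one a→b transition, and Δt is the gap in days between the two visits. The data and the function names below are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical visits: (patient_id, physician_id, visit_date).
visits = [
    (1, "a", date(2013, 4, 2)), (1, "b", date(2013, 4, 9)),
    (2, "b", date(2013, 5, 1)), (2, "a", date(2013, 5, 3)),
    (3, "a", date(2013, 6, 1)), (3, "c", date(2013, 6, 11)),
]

def directed_transitions(visits):
    """Collect the day gaps of every a->b transition, keyed by (a, b)."""
    by_patient = defaultdict(list)
    for patient, physician, day in visits:
        by_patient[patient].append((day, physician))
    gaps = defaultdict(list)
    for history in by_patient.values():
        history.sort()  # chronological order per patient
        for (day_a, a), (day_b, b) in zip(history, history[1:]):
            if a != b:  # assumption: only consecutive visits to different doctors count
                gaps[(a, b)].append((day_b - day_a).days)
    return gaps

def reciprocal_links(gaps):
    """Keep pairs seen in both directions: (w(ab), mean dt ab, w(ba), mean dt ba)."""
    result = {}
    for (a, b), forward in gaps.items():
        backward = gaps.get((b, a))
        if backward and a < b:  # report each unordered pair once
            result[(a, b)] = (
                len(forward), sum(forward) / len(forward),
                len(backward), sum(backward) / len(backward),
            )
    return result

links = reciprocal_links(directed_transitions(visits))
```

Pairs with unusually high weights in both directions, or with very short gaps, would then be the outliers the goal refers to. How the deck turns these counts into its `rm` score is not stated, so no formula for it is attempted here.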
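Mutual Reference
[Figure: Top 20 and Top 50 Mutual Reference networks for the states BA, DF, SP, PE and RJ, with densities (Dens.) 0.809, 0.447, 0.805, 0.845, 0.913, 0.963, 0.834, 0.568, 0.802 and 0.576]
Mutual Reference
[Figure panels: Allergy, Ophthalmology]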
Mutual	Reference
Conclusions	and	Insights
• Claim data is rich enough to identify connections among physicians and how partnerships are formed.
• Mutual Reference is an indicator of a relationship between physicians and can potentially seed other analyses, especially with a large volume of data.
• The proposed metric makes it possible to analyze that relationship computationally on a frequent basis.
Physician A  Physician B  rm    Rank
MMS028       MMS027       1     1
MSP145       MSP144       0.31  10
Mutual	Reference
• Specialties that appear most often
  • Ophthalmology to ophthalmology
  • Gynecology and obstetrics to gynecology and obstetrics
• DF has most of the consultations with irregular intervals
  • MDF010 and MDF009 with 267 consultations and an average interval of 0 days
• Top pair:
  • 205 from MMS028 to MMS027
  • 196 from MMS027 to MMS028
Patient Loyalty
Patient Loyalty
Goal
Identify (and quantify)	doctors that have recurring patients in	a	systematic way,	
suggesting ‘loyalty’
1.	Consider	patients	with	many	visits	to	doctors
2.	Compute	the	relative	weight	for	each	doctor	visited
3.	Count	the	relative	number	of	‘loyal’	patients	for	that	doctor
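The three steps can be sketched as follows. The cutoffs `MIN_VISITS` and `MIN_RW` are assumptions made only for this illustration (the slides give no thresholds), the visit log is made up, and the resulting score is not necessarily the `mf` metric reported in the deck.

```python
from collections import Counter, defaultdict

# Hypothetical visit log: one (patient, doctor) entry per consultation.
visit_log = (
    [("p1", "d1")] * 8 + [("p1", "d2")] * 2
    + [("p2", "d1")] * 3 + [("p2", "d2")] * 3
    + [("p3", "d1")] * 1
)

MIN_VISITS = 5  # assumed cutoff for "many visits" (step 1)
MIN_RW = 0.7    # assumed relative weight that marks a patient as 'loyal'

def loyal_fraction(log, min_visits=MIN_VISITS, min_rw=MIN_RW):
    """For each doctor, the fraction of frequent patients that are 'loyal'."""
    weight = Counter(log)                  # visits of each patient to each doctor
    strength = Counter(p for p, _ in log)  # all visits of each patient
    loyal, total = defaultdict(int), defaultdict(int)
    for (patient, doctor), w in weight.items():
        if strength[patient] < min_visits:   # step 1: skip infrequent patients
            continue
        total[doctor] += 1
        if w / strength[patient] >= min_rw:  # step 2: relative weight per doctor
            loyal[doctor] += 1
    return {d: loyal[d] / total[d] for d in total}  # step 3: relative count

scores = loyal_fraction(visit_log)
```

Filtering on total visits first matters: a patient with a single consultation trivially has a relative weight of 1 and would otherwise inflate every doctor's score.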
[Figure: consultations over time]
Patient Loyalty
[Figure: São Paulo; axes: strength s and degree k; panels: high rw, low rw]
• Weight wij represents the number of visits of patient i to doctor j
• Strength s: sum of the weights attached to links belonging to a node (i.e., all visits from i)
• Relative weight rw(ij): fraction of weight ij over the total strength s
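Written out, the three quantities relate as below. The notation s_i and rw_ij is a direct transcription of the definitions on this slide; the numeric example is made up.

```latex
s_i = \sum_{j} w_{ij},
\qquad
rw_{ij} = \frac{w_{ij}}{s_i},
\qquad
\sum_{j} rw_{ij} = 1 .
% Example: a patient with 9 visits to one doctor and 3 to another has
% s_i = 12, so rw = 9/12 = 0.75 and 3/12 = 0.25.
```

A patient with high strength and one relative weight close to 1 concentrates many visits on a single doctor, which is the pattern the loyalty analysis looks for.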
Patient Loyalty
• The more patients with high rw and high s a doctor has, the more likely the doctor is a candidate to have ‘loyalty’ capacity
• Stability: many doctors maintain sustained values of the metric across time.
  • A given doctor is in rank 1 or 2 during all 5 quarters.
  • 20% mean turnover across quarters
• Top 5 specialties among physicians with higher loyalty (mf > 0.5)
  • Orthopedics and traumatology (5 in top 10)
  • Ophthalmology (3)
  • Gynecology and obstetrics (2)
  • Pediatrics (1)
[Figure: relative weight vs. strength, two panels labeled Cardio]
Loyalty
Physician  mf    Rank
MSP 139    1.54  175
MSP 261    1.18  432
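The slides do not say how the 20% mean turnover across quarters is computed. One plausible reading, assumed here, is the share of the top-k physicians in one quarter that are missing from the top-k of the next quarter, averaged over consecutive quarters. The rankings and the function name `mean_turnover` are hypothetical.

```python
def mean_turnover(rankings, k):
    """Average share of the top-k set that is replaced from one quarter to the next."""
    changes = []
    for previous, current in zip(rankings, rankings[1:]):
        dropped = set(previous[:k]) - set(current[:k])
        changes.append(len(dropped) / k)
    return sum(changes) / len(changes)

# Hypothetical per-quarter rankings, best physician first.
quarters = [
    ["d1", "d2", "d3", "d4", "d5"],
    ["d2", "d1", "d3", "d4", "d6"],
    ["d1", "d2", "d7", "d4", "d6"],
]
turnover = mean_turnover(quarters, k=5)
```

Under this definition a reordering inside the top-k does not count as turnover; only physicians entering or leaving the set do.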
Centrality
Physician Centrality
Goal
Identify each physician's role in the network using their importance relative to other physicians.
• We applied several centrality measures:
  • Eigenvector
  • Degree
  • Betweenness
  • Closeness
• Do the values of these metrics change over time?
• Are they seasonal?
2Q 2014
Physician  Eigen  Rank  Degree
MSP 153    1      1     253
MSP 139    0.55   8     335
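Two of the listed measures, degree and eigenvector centrality, are easy to sketch without a graph library. The version below is an illustration on a made-up physician-physician network, not the implementation used in the talk. It runs power iteration on A + I (a common trick to avoid oscillation) and scales scores so the top node gets 1, which matches the scale of the table above but is otherwise an assumption.

```python
def degree_centrality(adjacency):
    """Number of neighbors of each node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}

def eigenvector_centrality(adjacency, iterations=200):
    """Power iteration on (A + I), rescaled so the top node scores 1."""
    score = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        # Each node keeps its own score and adds the scores of its neighbors.
        updated = {
            node: score[node] + sum(score[n] for n in neighbors)
            for node, neighbors in adjacency.items()
        }
        top = max(updated.values())
        score = {node: value / top for node, value in updated.items()}
    return score

# Hypothetical undirected physician-physician network.
network = {
    "d1": {"d2", "d3", "d4"},
    "d2": {"d1", "d3"},
    "d3": {"d1", "d2"},
    "d4": {"d1"},
}
degrees = degree_centrality(network)
eigen = eigenvector_centrality(network)
```

The table above already hints at why both measures are useful: MSP 139 has the higher degree (335 against 253) but a much lower eigenvector score, since eigenvector centrality rewards being connected to other well-connected physicians rather than having many links. Betweenness and closeness need shortest paths and are usually taken from a graph library.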
Centrality
Conclusions and insights
• Centrality indicates which physicians are important in the physician community
• There is a set of physicians with high scores
• This set of physicians has a higher number of patients in common, forming a block
• The relative centrality is positively correlated among close physicians
• This group of high-scoring physicians is stable over time, with few changes in each quarter.
Summary &	Take Home	Messages
• Networks	are	all	about	relationships,	as	most	data	is.	
• Network-derived	insights	are	usually	not	reachable	from	other	analyses.
• Complex	Networks	methods	are	very	valuable	to	data	science.
• Large	Healthcare	claim	database	from	Brazilian	insurance	company.
• Applied	complex	network	methods	to	find	how	physicians	build	their	
network.
• Examples:	Temporality,	reciprocity	and	‘loyalty’.
Where to find more information
[Figure: resources grouped as Introduction, Basic, Advanced, Database, APIs, Visualization]
GRAPH	ANALYTICS
Thanks!
apappel@br.ibm.com

Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM research) @PAPIs Connect — São Paulo 2017