Large-scale analysis of bibliometric data sources
Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University
8th LCDS Meeting: Statistics & Data Science
Leiden, November 13, 2015
About myself
• Master in computer science
• PhD thesis on bibliometric mapping of science
• Researcher at CWTS since 2009
• Research focus on analysis and visualization of bibliometric networks
Centre for Science and Technology Studies (CWTS)
• Research center at Leiden University focusing on science and technology studies
• About 30 staff members
• History of more than 25 years in bibliometric and scientometric research
• Contract research
• Full access to large bibliographic databases (Web of Science and Scopus)
Bibliographic databases: ‘Big data’
              Web of Science   Scopus
Journals      12,000           20,000
Publications  45 million       35 million
Citations     1 billion        0.9 billion
Bibliometric networks
Networks derived from a bibliographic database (Web of Science, Scopus):
• Citation network of publications
• Co-authorship network of authors / organizations
• Co-citation network of publications / authors / journals
• Co-occurrence network of terms
• Bibliographic coupling network of publications / authors / journals
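As a rough illustration of how some of these networks can be derived from publication records, here is a minimal sketch. The toy records and the function names are hypothetical and are not taken from the talk or from VOSviewer; real Web of Science or Scopus data would first need parsing and cleaning.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: publication id -> authors and cited references.
publications = {
    "p1": {"authors": ["A", "B"], "refs": ["r1", "r2"]},
    "p2": {"authors": ["B", "C"], "refs": ["r2", "r3"]},
    "p3": {"authors": ["A", "B", "C"], "refs": ["r1", "r2", "r3"]},
}

def co_authorship(pubs):
    """Edge weight = number of publications two authors wrote together."""
    weights = Counter()
    for rec in pubs.values():
        for a, b in combinations(sorted(set(rec["authors"])), 2):
            weights[(a, b)] += 1
    return weights

def bibliographic_coupling(pubs):
    """Edge weight = number of cited references two publications share."""
    weights = Counter()
    for p, q in combinations(sorted(pubs), 2):
        shared = set(pubs[p]["refs"]) & set(pubs[q]["refs"])
        if shared:
            weights[(p, q)] = len(shared)
    return weights

def co_citation(pubs):
    """Edge weight = number of publications that cite both references."""
    weights = Counter()
    for rec in pubs.values():
        for r, s in combinations(sorted(set(rec["refs"])), 2):
            weights[(r, s)] += 1
    return weights

if __name__ == "__main__":
    print(dict(co_authorship(publications)))
    print(dict(bibliographic_coupling(publications)))
    print(dict(co_citation(publications)))
```

The sketch uses full counting (every shared publication or reference adds 1 to an edge); other counting schemes are possible and are not covered here.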
Outline
• Software tools
• Network analysis techniques
• Analysis of data science
Software tools
• VOSviewer (www.vosviewer.com)
– Tool for constructing and visualizing bibliometric networks
• CitNetExplorer (www.citnetexplorer.nl)
– Tool for visualizing and analyzing citation networks of publications
• Both tools have been developed together with my colleague Ludo Waltman
VOSviewer
Map of university co-authorship network
Map of journal citation network
CitNetExplorer
Network analysis techniques
Layout:
• Visualization of similarities (VOS)
Community detection:
• Weighted modularity
• Smart local moving algorithm
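To make the quantity being optimized more concrete, the following sketch computes the standard weighted modularity Q of a given community assignment. This is the textbook definition, used here as an assumption; the exact weighted variant used in the tools mentioned in this talk may differ (for example by including a resolution parameter). The function name and the toy network are hypothetical.

```python
def modularity(edges, community):
    """Standard weighted modularity Q of an undirected network.

    edges: dict mapping (u, v) -> weight, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    total = float(sum(edges.values()))  # m: total edge weight
    strength = {}   # weighted degree per node
    internal = {}   # edge weight inside each community
    for (u, v), w in edges.items():
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
        if community[u] == community[v]:
            c = community[u]
            internal[c] = internal.get(c, 0.0) + w
    comm_strength = {}  # summed weighted degree per community
    for node, k in strength.items():
        c = community[node]
        comm_strength[c] = comm_strength.get(c, 0.0) + k
    # Q = sum over communities of (internal share - expected share)
    return sum(
        internal.get(c, 0.0) / total - (tot / (2.0 * total)) ** 2
        for c, tot in comm_strength.items()
    )

# Hypothetical toy network: two triangles joined by one bridge edge.
toy_edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in two_groups}

if __name__ == "__main__":
    print(modularity(toy_edges, two_groups))
    print(modularity(toy_edges, one_group))
```

Splitting the toy network into its two triangles scores higher than keeping everything in one community, which is the behaviour a community detection algorithm exploits.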
Smart local moving algorithm
[Figure: panels labelled Original network, Local moving heuristic, Local moving heuristic in subnetworks, and Reduced network; modularity values shown: Q = 0.3791 and Q = 0.4198]
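The building block named on this slide, the local moving heuristic, can be sketched as follows. This is only the basic greedy step under the standard modularity function, written as an assumption about how such a heuristic is usually implemented; it does not include the subnetwork step or the reduced-network step shown on the slide, and it is not the authors' implementation. The function name and toy network are hypothetical.

```python
from collections import defaultdict

def local_moving(edges, max_passes=100):
    """Greedy local moving heuristic for modularity (sketch).

    edges: dict mapping (u, v) -> weight, each undirected edge listed once,
           without self-loops.
    Returns a dict mapping node -> community label.
    """
    neighbors = defaultdict(dict)
    for (u, v), w in edges.items():
        neighbors[u][v] = neighbors[u].get(v, 0.0) + w
        neighbors[v][u] = neighbors[v].get(u, 0.0) + w
    m = float(sum(edges.values()))
    strength = {n: sum(nb.values()) for n, nb in neighbors.items()}
    community = {n: n for n in neighbors}  # start from singleton communities
    comm_total = dict(strength)            # summed strength per community

    for _ in range(max_passes):
        moved = False
        for node in sorted(neighbors):
            k = strength[node]
            old = community[node]
            comm_total[old] -= k           # temporarily take the node out
            links = defaultdict(float)     # weight from node to each community
            for nb, w in neighbors[node].items():
                links[community[nb]] += w

            def gain(c):
                # Modularity gain of joining c, up to a constant factor.
                return links.get(c, 0.0) - k * comm_total.get(c, 0.0) / (2.0 * m)

            best, best_gain = old, gain(old)
            for c in sorted(links):
                g = gain(c)
                if g > best_gain + 1e-12:
                    best, best_gain = c, g
            community[node] = best
            comm_total[best] = comm_total.get(best, 0.0) + k
            moved = moved or best != old
        if not moved:                      # stop when a full pass changes nothing
            break
    return community

# Hypothetical toy network: two triangles joined by one bridge edge.
toy_edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}

if __name__ == "__main__":
    print(local_moving(toy_edges))
```

On the toy network the heuristic ends with one community per triangle. The result of a greedy heuristic like this depends on the order in which nodes are visited, which is one reason refinements such as the steps shown on the slide are of interest.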
Algorithmically constructed classification system of science
• 16.2 million publications from the period 2000–2014 indexed in Web of Science
• 241.7 million citation relations
• Classification system of 3 hierarchical levels:
– 28 broad disciplines
– 813 fields
– 3,822 subfields
Breakdown of scientific literature into 813 fields
[Figure: map of the 813 fields; labelled areas: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering]
Publications in scientometrics subfield
Time-line map of highly cited scientometrics publications
Analysis of data science
What is data science?
• Empirical operationalization of data science based on publications with ‘data’ in title or abstract
Wikipedia: “Data Science is an interdisciplinary field about processes and systems to extract knowledge or insights from data … which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics”
LCDS: “Data Science … deals with finding, analyzing and validating complex patterns in data. Data Science methods are indispensable for maintaining a competitive edge in all disciplines in science”
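The operationalization on this slide (publications with ‘data’ in title or abstract) can be sketched as a simple text filter plus a per-year share. The matching rule below (case-insensitive, whole word only) is an assumption, since the talk does not say how the term was matched, and the records are hypothetical toy data.

```python
import re
from collections import defaultdict

# Assumption: match 'data' as a whole word, ignoring case.
DATA_WORD = re.compile(r"\bdata\b", re.IGNORECASE)

def mentions_data(title, abstract):
    """True if the word 'data' occurs in the title or the abstract."""
    return bool(DATA_WORD.search(title) or DATA_WORD.search(abstract))

def share_by_year(records):
    """Percentage of 'data' publications per publication year."""
    counts = defaultdict(lambda: [0, 0])  # year -> [data pubs, all pubs]
    for rec in records:
        entry = counts[rec["year"]]
        entry[1] += 1
        if mentions_data(rec["title"], rec["abstract"]):
            entry[0] += 1
    return {
        year: 100.0 * hits / total
        for year, (hits, total) in sorted(counts.items())
    }

# Hypothetical toy records.
records = [
    {"year": 2000, "title": "A theory of citation", "abstract": "We propose a model."},
    {"year": 2000, "title": "Mapping science", "abstract": "We analyse data from a citation index."},
    {"year": 2010, "title": "Big data in astronomy", "abstract": "Survey results."},
    {"year": 2010, "title": "Database design", "abstract": "Metadata standards are reviewed."},
    {"year": 2010, "title": "Climate trends", "abstract": "Satellite data are used."},
]

if __name__ == "__main__":
    print(share_by_year(records))
```

With the whole-word rule, titles such as "Database design" or abstracts mentioning only "metadata" are not counted; a looser substring match would count them, so the choice of rule affects the resulting percentages.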
Growth of data-driven research
[Figure: percentage of publications per year, 1990–2015 (vertical axis 0%–20%); series: % 'data' publications and % 'theory' publications]
Breakdown of scientific literature into 813 fields
[Figure: map of the 813 fields; labelled areas: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering]
Data-driven nature of different scientific fields
[Figure: map of fields showing the % of publications with ‘data’ in title or abstract; labelled areas: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering]
Data-driven nature of different scientific fields
[Figure: map of fields showing the % of publications with ‘data’ in title or abstract; labelled topics: artificial intelligence, statistics, bioinformatics, neuroimaging, pattern recognition, astronomy, earth, water, weather, climate, remote sensing, nutrition, obesity, addiction]
Data science fields (at least 20% ‘data’ publications)
[Figure: map highlighting the data science fields; labelled areas: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering]
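The selection rule on this slide (a field counts as a data science field when at least 20% of its publications are ‘data’ publications) amounts to a threshold on per-field shares. A minimal sketch, with hypothetical field names and counts:

```python
def data_science_fields(field_counts, threshold=0.20):
    """Return the fields whose share of 'data' publications meets the threshold.

    field_counts: dict mapping field -> (number of 'data' publications,
                                         total number of publications).
    """
    return sorted(
        field
        for field, (hits, total) in field_counts.items()
        if total > 0 and hits / total >= threshold
    )

# Hypothetical counts, not taken from the talk.
field_counts = {
    "field_A": (30, 100),
    "field_B": (20, 100),
    "field_C": (19, 100),
    "field_D": (0, 0),
}

if __name__ == "__main__":
    print(data_science_fields(field_counts))
```

Fields without any publications are skipped to avoid a division by zero; how such edge cases were handled in the actual analysis is not stated in the slides.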
Term map of data science fields
Leiden University’s publication output in data science fields
[Figure: map of Leiden University’s publication output; labelled areas: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering]
Leiden University’s institutes with most publications in data science fields
• Leiden Observatory
• LUMC
• Faculty of Archaeology
• Institute of Psychology (FSW)
• Centre for Science and Technology Studies (FSW)
• Mathematical Institute (Science)
• Institute of Biology Leiden (Science)
• Leiden Institute of Advanced Computer Science (Science)
LUMC departments with most publications in data science fields
• Medical Statistics and Bioinformatics
• Rheumatology
• Psychiatry
• Radiology
• Clinical Epidemiology
• Human Genetics
• Neurosurgery
• Cardiology
• Clinical Oncology
• Endocrinology
Term map based on Leiden University’s publications in data science fields
Do it yourself!
www.vosviewer.com www.citnetexplorer.nl
Thank you for your attention!
