Scientometric approaches to classification
Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University
Colloquium Research Information Systems and Science Classifications: Revisiting the NARCIS Classification
Museum Meermanno, The Hague, The Netherlands
September 28, 2018
Outline
• Bibliographic databases
• Classification systems of scientific literature
• CWTS publication-level classification system of science
– Methodology
– Structure
– Applications
• Quality of classification systems
Bibliographic databases
               Web of Science   Scopus
Journals       20,000           24,000
Publications   55 million       45 million
Citations      1.2 billion      1.2 billion
Classification systems of scientific literature
• Mono-disciplinary vs. multidisciplinary
• Journal-level vs. publication-level
• Manual vs. algorithmic
• Mono-disciplinary:
– Chemical Abstracts: 80 different sections and 5 broad headings
– EconLit: Journal of Economic Literature (JEL) classification system
– PubMed: Medical Subject Headings (MeSH)
• Multidisciplinary:
– Web of Science: 250 categories
– Scopus (ASJC): bottom level has 304 categories and top level includes 27 categories
– Science-Metrix: 176 categories
– National Science Foundation (NSF): 125 categories
– University of California, San Diego (UCSD): more than 500 categories
– Australian and New Zealand Standard Research Classification (FoR): 3 hierarchical levels
CWTS publication-level classification system of science
Algorithmic classification system of science
• First version created in 2012
• Publications (not journals) are clustered into research areas based on citation
relations
• Research areas are defined at different levels of granularity and are
organized hierarchically
• Clustering is performed using the smart local moving algorithm (improved
Louvain algorithm; Waltman & Van Eck, 2013)
Objectives
To create a classification system
• in a fully algorithmic manner
• covering all sciences and social sciences
• at the level of individual publications
• with a hierarchical structure
• using transparent, freely available algorithms
• without excessive computational requirements
Main challenges
• Dealing with huge volumes of data
• Avoiding disciplinary biases
• Reaching a high level of accuracy
• Being flexible in terms of number of hierarchical levels and size of research
areas
• Obtaining proper labels for the research areas
• Keeping the methodology reasonably simple and transparent
Dealing with huge volumes of data
• Linking publications based on direct citations only; no co-citations,
bibliographic coupling, or word co-occurrences
• Efficient clustering algorithm based on ideas taken from:
– Newman (2004): Modularity-based clustering
– Blondel et al. (2008): ‘Louvain method’
– Waltman et al. (2010): VOS clustering technique
– Rotta & Noack (2011): Multilevel local search algorithms
Avoiding disciplinary biases
• $c_{ij}$: Relatedness of publications i and j, i.e., 1 if there is a direct citation relation between i and j, 0 otherwise
• $a_{ij}$: Normalized relatedness of publications i and j, defined as
  $a_{ij} = \frac{c_{ij}}{\sum_k c_{ik}}$
• Similar to fractional citation counting (Small & Sweeney, 1985)
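The normalization can be illustrated with a minimal Python sketch. It assumes citation links are given as (citing, cited) pairs and that the relation is treated as undirected, as the definition of the relatedness suggests; the function name and the toy publication identifiers are hypothetical and not part of the CWTS implementation.

```python
from collections import defaultdict


def normalized_relatedness(citation_pairs):
    """Return a dict mapping (i, j) to a_ij = c_ij / sum_k c_ik.

    c_ij is taken to be 1 if i cites j or j cites i, and 0 otherwise.
    Pairs with c_ij = 0 are simply absent from the result.
    """
    neighbors = defaultdict(set)
    for citing, cited in citation_pairs:
        if citing == cited:
            continue  # assumption: self-citations of a publication are ignored
        neighbors[citing].add(cited)
        neighbors[cited].add(citing)

    a = {}
    for i, linked in neighbors.items():
        total = len(linked)  # sum over k of c_ik
        for j in linked:
            a[(i, j)] = 1.0 / total
    return a


# Toy data: p1 has three citation relations, p2 has only one.
pairs = [("p1", "p2"), ("p1", "p3"), ("p4", "p1")]
a = normalized_relatedness(pairs)
print(a[("p1", "p2")], a[("p2", "p1")])
```

Note that the normalized relatedness is not symmetric: each link of a publication with many citation relations weighs less than each link of a publication with few. This is presumably what dampens differences in citation density between disciplines.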
Reaching a high level of accuracy
• Clustering technique based on maximization of a quality function:
  $\sum_i \sum_j \delta(x_i, x_j)\,(a_{ij} - r)$
• $x_i$ denotes the cluster (research area) to which publication i is assigned
• $\delta(x_i, x_j) = 1$ if $x_i = x_j$ and 0 otherwise
• r denotes a resolution parameter
• Quality function is maximized with respect to $x_1, \ldots, x_n$
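To make the role of the resolution parameter concrete, here is a small Python sketch that evaluates the quality function and improves it with a naive greedy local moving pass. This is a toy stand-in, not the smart local moving algorithm itself: it recomputes the full quality for every candidate move and would not scale. The function names, the toy network, and the choice to skip pairs with i = j are all assumptions made for illustration.

```python
from collections import defaultdict


def normalized_relatedness(edges):
    """a_ij = c_ij / sum_k c_ik for an undirected list of citation links."""
    neighbors = defaultdict(set)
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    return {(i, j): 1.0 / len(linked)
            for i, linked in neighbors.items() for j in linked}


def quality(a, assignment, r):
    """Sum of (a_ij - r) over ordered pairs i != j assigned to the same cluster."""
    total = 0.0
    for i in assignment:
        for j in assignment:
            if i != j and assignment[i] == assignment[j]:
                total += a.get((i, j), 0.0) - r
    return total


def local_moving(a, pubs, r, max_passes=20):
    """Greedy heuristic: move one publication at a time while quality increases."""
    assignment = {p: p for p in pubs}  # start from singleton clusters
    for _ in range(max_passes):
        moved = False
        for p in pubs:
            best_cluster = assignment[p]
            best_q = quality(a, assignment, r)
            for candidate in set(assignment.values()):
                trial = dict(assignment)
                trial[p] = candidate
                q = quality(a, trial, r)
                if q > best_q + 1e-12:  # accept strict improvements only
                    best_cluster, best_q = candidate, q
            if best_cluster != assignment[p]:
                assignment[p] = best_cluster
                moved = True
        if not moved:
            break
    return assignment


# Toy network: two triangles joined by a single citation link (a3, b1).
edges = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
         ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
         ("a3", "b1")]
pubs = ["a1", "a2", "a3", "b1", "b2", "b3"]
a = normalized_relatedness(edges)

coarse = local_moving(a, pubs, r=0.2)   # lower resolution
fine = local_moving(a, pubs, r=0.45)    # higher resolution
print(len(set(coarse.values())), len(set(fine.values())))
```

In this toy case the lower resolution recovers the two triangles, while the higher resolution splits them further. That is the general effect of r: a larger value penalizes every within-cluster pair more strongly and therefore favors smaller research areas.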
Being flexible in terms of number of hierarchical levels
and size of research areas
• Three types of parameters:
– Number of hierarchical levels
– Each level’s resolution parameter
– Each level’s minimum number of publications per research area
Obtaining proper labels for the research areas
1. Identification of terms in titles and abstracts of articles using part-of-speech
tagging
2. Calculation of term relevance scores based on a combination of a term’s
absolute and relative frequency of occurrence
3. Selection of the most relevant terms based on term relevance scores
combined with a filter for removing similar terms
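Steps 2 and 3 can be sketched in Python as follows. The slides do not give the actual relevance formula or similarity filter, so both are stand-ins chosen for illustration: the score multiplies a term's document count in the research area (absolute frequency) by the share of its occurrences that fall in that area (relative frequency), and the filter drops a term that shares a word with an already selected term. Terms are assumed to have been extracted beforehand (step 1), and all names and data are hypothetical.

```python
from collections import Counter


def label_cluster(cluster_docs, all_docs, top_n=3):
    """Pick up to top_n labels for one research area.

    cluster_docs: term lists of the publications in the research area
    all_docs: term lists of all publications (must include cluster_docs)
    """
    in_cluster = Counter(t for doc in cluster_docs for t in set(doc))
    overall = Counter(t for doc in all_docs for t in set(doc))

    # Hypothetical relevance score: absolute frequency times relative frequency.
    scores = {term: n * (n / overall[term]) for term, n in in_cluster.items()}

    # Highest score first; ties are broken alphabetically for reproducibility.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))

    labels = []
    for term in ranked:
        words = set(term.split())
        # Hypothetical similarity filter: skip terms sharing a word with a chosen label.
        if any(words & set(chosen.split()) for chosen in labels):
            continue
        labels.append(term)
        if len(labels) == top_n:
            break
    return labels


cluster_docs = [
    ["h index", "citation"],
    ["h index", "hirsch index", "impact factor"],
    ["citation", "peer review"],
]
other_docs = [
    ["citation", "graphene"],
    ["graphene", "bilayer graphene"],
]
labels = label_cluster(cluster_docs, cluster_docs + other_docs)
print(labels)
```

In this toy data, "citation" scores lower than "h index" because it also occurs outside the research area, and "hirsch index" is filtered out as too similar to "h index".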
CWTS publication-level classification system of
science
• 21.2 million publications from the period 2000–2017 indexed in Web of
Science
• 374.1 million citation relations
• Classification system of 3 hierarchical levels:
– 22 broad disciplines
– 868 fields
– 4,047 subfields
• Computational performance: less than 2 hours
Breakdown of scientific literature into 22 broad disciplines
[Figure: map of the 22 broad disciplines. Labeled areas: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering]
Breakdown of scientific literature into 868 fields
[Figure: map of the 868 fields, with the same five labeled areas]
Breakdown of scientific literature into 4,047 subfields
[Figure: map of the 4,047 subfields, with the same five labeled areas; a second view labels the Scientometrics subfield]
Summary of scientometrics subfield
Cluster: 145
No. publications: 16,312
• Top 5 terms (No. pubs)
– bibliometric analysis: 852
– impact factor: 495
– h index: 264
– peer review: 515
– citation: 642
• Top 5 publications (No. cits)
– hirsch, je (2005). an index to quantify an individual's scientific research output. p natl acad sci usa, 102(46), 16569-16572. (2,635 citations)
– wuchty, s; et al. (2007). the increasing dominance of teams in production of knowledge. science, 316(5827), 1036-1039. (699 citations)
– egghe, l (2006). theory and practise of the g-index. scientometrics, 69(1), 131-152. (609 citations)
– king, da (2004). the scientific impact of nations. nature, 430(6997), 311-316. (496 citations)
– newman, mej (2004). coauthorship networks and patterns of scientific collaboration. p natl acad sci usa, 101, 5200-5205. (488 citations)
• Top 5 authors (No. pubs)
– bornmann, l: 221
– thelwall, m: 202
– leydesdorff, l: 175
– rousseau, r: 161
– egghe, l: 133
• Top 5 journals (No. pubs)
– scientometrics: 2,865
– journal of informetrics: 700
– journal of the american society for information science and technology: 613
– plos one: 339
– research evaluation: 324
• Top 5 institutes (No. pubs)
– univ granada: 316
– kathol univ leuven: 256
– leiden univ: 249
– indiana univ: 246
– univ wolverhampton: 216
• Top 5 departments (No. pubs)
– sch lib & informat sci (indiana univ): 106
– amsterdam sch commun res ascor (univ amsterdam): 97
– ctr sci & technol studies (leiden univ): 90
– sch publ policy (georgia inst technol - atlanta): 88
– trend res ctr (asia univ): 84
[Figure: Publications in scientometrics subfield; No. publications per year]
Term map of scientometrics subfield
[Figure: term map. Labeled clusters: Peer review, OA, careers, and gender; Collaboration; Scientometric indicators and networks; Medical research; Country-level analyses]
Time-line map of highly cited scientometrics publications
[Figure: time-line map]
Overlay visualizations
[Figure: map of subfields. Labeled areas: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering]
Time trend
[Figure: time trend overlay on the map of subfields; a second view labels the MicroRNA and Graphene subfields]
Summary of graphene subfield
Cluster: 9
No. publications: 27,771
• Top 5 terms (No. pubs)
– bilayer graphene: 836
– epitaxial graphene: 491
– silicene: 401
– graphene nanoribbon: 1,035
– graphene field effect transistor: 207
• Top 5 publications (No. cits)
– novoselov, ks; et al. (2004). electric field effect in atomically thin carbon films. science, 306(5696), 666-669. (27,743 citations)
– geim, ak; et al. (2007). the rise of graphene. nat mater, 6(3), 183-191. (20,073 citations)
– novoselov, ks; et al. (2005). two-dimensional gas of massless dirac fermions in graphene. nature, 438(7065), 197-200. (11,359 citations)
– castro neto, ah; et al. (2009). the electronic properties of graphene. rev mod phys, 81(1), 109-162. (11,368 citations)
– zhang, yb; et al. (2005). experimental observation of the quantum hall effect and berry's phase in graphene. nature, 438(7065), 201-204. (8,110 citations)
• Top 5 authors (No. pubs)
– watanabe, k: 249
– taniguchi, t: 240
– peeters, fm: 233
– lin, mf: 178
– katsnelson, mi: 177
• Top 5 journals (No. pubs)
– physical review b: 4,013
– applied physics letters: 1,834
– carbon: 994
– nano letters: 906
– journal of applied physics: 841
• Top 5 institutes (No. pubs)
– chinese acad sci: 1,394
– russian acad sci: 778
– peking univ: 557
– natl univ singapore: 482
– tsing hua univ: 458
• Top 5 departments (No. pubs)
– dept phys (natl univ singapore): 257
– inst phys (chinese acad sci): 226
– inst mol & mat (radboud univ nijmegen): 216
– dept phys (mit): 209
– dept phys (univ calif berkeley and berkeley national lab): 206
[Figure: No. publications per year]
Open access
[Figure: open access overlay on the map of subfields]
University profiles
[Figure: university profile overlays for Delft University of Technology and Leiden University]
Applications
• Field normalization
– CWTS Leiden Ranking/U-Multirank
– Dutch University Medical Centers
• Field delineation
– European research funders
• High-resolution research strengths analysis
– European universities
– European research funders
• Identification of interdisciplinary and emerging research areas
– UK Engineering and Physical Sciences Research Council
Adopters and potential adopters
• Adopters:
– CWTS
– SciTech Strategies (e.g. SciVal)
– KTH Royal Institute of Technology, Stockholm
• Potential adopters:
– Chinese Academy of Sciences
– European Research Council
– Max Planck
Quality of classification systems
Empirical micro study using papers on overall water
splitting
• Haunschild et al. (2018)
• Case study comparing CWTS classification to
journal-based and manually constructed
classifications
• Ability of CWTS classification to distinguish
between fields is questioned
Accuracy of the journal classification systems of Web
of Science and Scopus
• Wang and Waltman (2016)
• Two criteria to identify journals with questionable
classifications:
– journals that have weak connections with their assigned
categories
– journals that are not assigned to categories with which they
have strong connections
• Web of Science performs significantly better than
Scopus
Field classification of publications in Dimensions
• Bornmann (2018)
• Field classification in Dimensions:
– Based on Fields of Research (FOR) from Australian and New
Zealand Standard Research Classification (ANZSRC)
– Machine learning approach
– Each publication is assigned to at least one field
• Based on Bornmann’s own publications
• Questions reliability and validity of Dimensions
classification
Response from Dimensions
• Herzog and Lunn (2018)
• Implementation at launch was first step and
requires improvements:
– Improvement of training sets
– Adding new subcategories to FOR system
Large-scale system to organize publications into
hierarchical concept structure
• Shen et al. (2018)
• Core component in Microsoft Academic
• Iterative approach to:
– concept discovery (Wikipedia)
– concept tagging to publications (both textual data and graph
structure are considered)
– concept hierarchy construction
• Based on 2000 initial seed concepts, over 228K
concepts have been identified
• Concepts are organized in six-level hierarchy
• 1 billion publication-concept relations
Conclusions
• Algorithmic approaches can be used to construct large-scale classifications
• Algorithmic classifications at the level of publications are gaining popularity
• Algorithmic possibilities depend on data availability
• Algorithmic classifications may have the disadvantage of mixing up different
principles for classifying items (e.g., research topic, research method,
scientific community, theoretical tradition, basic vs. applied)
Thank you for your attention!