This document presents a method for clustering authors' citation distributions in order to categorize authors semantically and predict their future citations. It applies hierarchical clustering with a normalized Euclidean distance to the citation distributions, and evaluates the resulting clusters by how homogeneous their citation patterns are over time. Semantic features of the authors' bibliometric data are represented with the BiDO ontology, which links numeric and categorical data over time. The method was evaluated on a dataset of 20,000 computer scientists covering 1990 to 2010. Future work includes augmenting the feature set, applying the method to groups, extending the ontology, and building a linked bibliometric triplestore.
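The clustering step can be illustrated with a minimal, dependency-free sketch. It assumes that "normalized Euclidean distance" means the Euclidean distance between per-author yearly citation vectors that have each been scaled to sum to 1, and it assumes average linkage; the summary specifies neither. The function names (`normalize`, `hierarchical_cluster`, `mean_intra_distance`) and the toy data are hypothetical, and `mean_intra_distance` is only a stand-in homogeneity score, not the measure used in the original work.

```python
import math
from itertools import combinations


def normalize(counts):
    """Scale yearly citation counts to sum to 1 (assumed normalization)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters (assumed linkage)."""
    total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_cluster(citations, n_clusters):
    """Agglomerative clustering of authors' normalized citation distributions.

    citations: dict mapping author -> list of yearly citation counts (equal lengths).
    Returns a list of clusters, each a sorted list of author names.
    """
    names = list(citations)
    vectors = [normalize(citations[n]) for n in names]
    clusters = [[i] for i in range(len(names))]  # start with one cluster per author
    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because combinations() guarantees a < b
    return [sorted(names[i] for i in c) for c in clusters]


def mean_intra_distance(cluster, citations):
    """Hypothetical homogeneity score: mean pairwise distance inside a cluster.

    Lower values indicate more similar citation patterns over time.
    """
    vecs = [normalize(citations[n]) for n in cluster]
    pairs = list(combinations(vecs, 2))
    if not pairs:
        return 0.0
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)


# Toy data (invented for illustration): yearly citation counts over five years.
citations = {
    "early_peak": [10, 8, 4, 2, 1],
    "early_peak_2": [20, 15, 9, 4, 2],
    "late_riser": [1, 2, 4, 8, 10],
    "late_riser_2": [2, 3, 9, 15, 20],
}

if __name__ == "__main__":
    for cluster in hierarchical_cluster(citations, 2):
        print(cluster, round(mean_intra_distance(cluster, citations), 4))
```

Because each vector is normalized before comparison, two authors with very different citation volumes but the same temporal shape (for example, both peaking early) end up close together, which matches the goal of grouping by citation pattern rather than by raw counts. For 20,000 authors, a library implementation of hierarchical clustering would be far more practical than this quadratic-per-merge loop.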