A paper has been accepted at the 7th International Conference on Information System and Data Mining
- Title: Exploiting Topic Modelling for the Identification of Untapped Scientific Collaborations
- Authors: Arman Arzani, Marcus Handte, Matteo Zella, Pedro José Marrón
- Abstract: Finding potential collaborators has become a challenge due to the growing number of scientists in organizations such as universities, research institutes, or companies. Collaboration Recommendation Systems (CRSs) have been developed to help researchers identify possible collaboration partners, but they often rely on citation graphs or paper abstracts, which may not be readily available in organizational databases or online sources. However, scientific publication titles offer consistent bibliometric data that can provide insights into research areas. TOMOSCO is a topic modelling framework that uses transformer-based methods to extract research area information from small amounts of text, such as publication titles or brief project descriptions. TOMOSCO can classify, cluster, and match research topics across different disciplines, uncovering relationships among scientists and suggesting potential interdisciplinary collaborations. In experiments, TOMOSCO was able to identify existing collaborations with over 90% accuracy based solely on publication titles and propose new collaborations based on previously unseen publications and project descriptions.
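
To make the idea of matching researchers by publication titles more concrete, here is a minimal sketch. It is not TOMOSCO and not the authors' method: a plain bag-of-words vector with cosine similarity stands in for the transformer-based text representations described in the abstract, so that the example runs with the Python standard library alone. The researcher names, the titles, and the helper functions `title_vector`, `cosine`, and `rank_pairs` are all hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical sample data (invented for illustration):
# researcher name -> list of publication titles.
TITLES = {
    "researcher_a": [
        "Topic modelling of scientific publication titles",
        "Clustering research topics with text embeddings",
    ],
    "researcher_b": [
        "Embeddings for clustering scientific research topics",
    ],
    "researcher_c": [
        "Corrosion resistance of welded steel joints",
    ],
}


def title_vector(titles):
    """Build bag-of-words term counts over all titles of one researcher.

    Assumption: this replaces a transformer-based embedding with a much
    simpler representation, purely to keep the sketch self-contained.
    """
    words = re.findall(r"[a-z]+", " ".join(titles).lower())
    # Crude stop-word filter: drop words of three letters or fewer.
    return Counter(w for w in words if len(w) > 3)


def cosine(u, v):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    norm = norm_u * norm_v
    return dot / norm if norm else 0.0


def rank_pairs(titles_by_researcher):
    """Score every pair of researchers and return pairs sorted by similarity, highest first."""
    vectors = {name: title_vector(t) for name, t in titles_by_researcher.items()}
    names = sorted(vectors)
    pairs = [
        (cosine(vectors[a], vectors[b]), a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]
    return sorted(pairs, reverse=True)


for score, a, b in rank_pairs(TITLES):
    print(f"{a} - {b}: {score:.2f}")
```

In this toy data, the two researchers whose titles share vocabulary rank first, while the unrelated researcher scores zero against both. Word overlap is a weak signal for very short texts, since related titles often share no words at all, which is one reason a framework might prefer learned text representations over raw term counts.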