Which fields drive the h-index: Difference between revisions

Revision as of 11:10, 16 September 2024


Published
September 16, 2024
Title
Which fields drive the h-index?
Authors
Paolo Giudici, Luca Boscolo.
DOI
10.62684/FSOZ4761
Keywords
H-index, Poisson models, Scaling

Paolo Giudici^(a), Luca Boscolo^(b).

^(a) Department of Economics and Management Sciences, University of Pavia, Italy.

^(b) Founder, Top Italian Scientists.

Abstract

The quality of academic research is often measured by means of the h-index. Although widely accepted, the h-index has some issues; in particular, it may depend on the scientific field in which a researcher operates. To date there is no definitive answer as to whether this difference holds and, if it does, to what extent. To fill this gap, we propose to measure operationally the difference in h-index across the sectors of a relatively homogeneous population: all the scientists of a single nation. To address the heterogeneity issue we apply three explainable machine learning models: linear regression, Poisson regression and tree models. Our results show that the latter two models explain the data better, and that the only sectors for which the difference in h-index is significant are Physics, Biology and Clinical Sciences.

Introduction

The measurement of the quality of academic research is a rather controversial issue. In the 2000s, [11] proposed a measure that has the advantage of summarizing, in a single statistic, the information contained in the citation counts of each author. Since that seminal paper, a large body of research has been produced, focusing in particular on the development of correction factors for the h-index ([13], [3], [9], [2], [12]) that can take into account differences between sectors.

In this stream of research, [9] analysed the mathematical properties of the h-index, and [3] proposed a stochastic model for an author’s production/citation patterns. With this mathematical formalisation, it becomes possible to analyse the h-index of individual researchers, whether or not they work in the same field, and to compare them with each other.

Along a more empirical line of research, [13] proposed a simple multiplicative correction to the h-index that accounts for differences among researchers from different sectors, thus allowing a fair and sustainable comparison. In particular, they provide a table of such normalizing factors, derived under specific distributional assumptions on the citations. Their approach offers a simple way to explain and measure differences between scientific fields. In a similar vein, [2] propose a rescaling procedure based on the Gini entropy, and [12] propose a different rescaling that takes into account the number of co-authors: the fractional h-index.
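To make the idea of a multiplicative correction concrete, the Python sketch below divides a raw h-index by a field-specific factor so that researchers from different fields are placed on a common scale. The factor values, the reference field and the names `HYPOTHETICAL_FACTORS` and `normalized_h` are illustrative assumptions; they are not the normalizing factors tabulated in [13].

```python
from typing import Mapping

# Hypothetical normalizing factors (illustrative only, NOT the values of [13]).
# A factor > 1 marks a field whose typical h-index is higher than in the
# reference field (factor 1.0); dividing by it shrinks that field's values.
HYPOTHETICAL_FACTORS: Mapping[str, float] = {
    "Physics": 1.0,
    "Biology": 1.25,
    "Mathematics": 0.5,
}


def normalized_h(h: int, field: str,
                 factors: Mapping[str, float] = HYPOTHETICAL_FACTORS) -> float:
    """Rescale a raw h-index to the reference field by dividing by its factor."""
    if field not in factors:
        raise KeyError(f"no normalizing factor for field {field!r}")
    factor = factors[field]
    if factor <= 0:
        raise ValueError("normalizing factors must be positive")
    return h / factor


# Under these assumed factors, a Biology h of 15 and a Mathematics h of 6
# are mapped to the same normalized value.
print(normalized_h(15, "Biology"), normalized_h(6, "Mathematics"))
```

The whole correction lives in the factor table: any disagreement about fairness across fields becomes a disagreement about how those factors are estimated, which is where distributional assumptions such as those of [13] enter.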

We take both streams of research as a starting point. More precisely, we follow [6], who, extending the contribution of [9], propose a statistical approach indicating that the Poisson distribution is a well-suited approximation for the distribution of the h-index. In this paper we show that a Poisson model is also well suited to explain the drivers of the h-index, and we employ this theoretical result to understand whether the h-index of a scientist depends on his or her field of research, following the research line of [13], also followed by [15].
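As an illustration of how a Poisson model can expose field effects, the following Python sketch fits the log-linear model [math]\displaystyle{ \log E[h] = \beta_0 + \beta_g }[/math] for a scientist in field [math]\displaystyle{ g }[/math] (with [math]\displaystyle{ \beta_{\mathrm{ref}} = 0 }[/math] for a reference field) in the special case where field indicators are the only covariates. In that case the maximum-likelihood fit reproduces each field's sample mean, so [math]\displaystyle{ \exp(\beta_g) }[/math] is the ratio between the mean h-index of field [math]\displaystyle{ g }[/math] and that of the reference field, and the Poisson Fisher information gives [math]\displaystyle{ \mathrm{Var}(\hat\beta_g) \approx 1/S_g + 1/S_{\mathrm{ref}} }[/math], where [math]\displaystyle{ S }[/math] is the total h-index of a field. The function name, the toy data and the choice of reference field are assumptions for illustration; this is not the authors' implementation, and the numbers are not results of the paper.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def poisson_field_effects(h_values: Sequence[int], fields: Sequence[str],
                          reference: str) -> Dict[str, Tuple[float, float]]:
    """Poisson regression of h-index on field indicators (closed form).

    Returns {field: (rate_ratio, wald_z)} for every non-reference field, where
    rate_ratio = exp(beta_g) = mean_g / mean_ref and
    wald_z = beta_g / sqrt(1/S_g + 1/S_ref), S being a field's total h-index.
    """
    if len(h_values) != len(fields):
        raise ValueError("h_values and fields must have the same length")
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for h, field in zip(h_values, fields):
        totals[field] += h
        counts[field] += 1
    if totals[reference] == 0:
        raise ValueError("reference field needs a positive total h-index")

    ref_mean = totals[reference] / counts[reference]
    effects: Dict[str, Tuple[float, float]] = {}
    for field in totals:
        if field == reference:
            continue
        if totals[field] == 0:
            raise ValueError(f"field {field!r} has zero total h-index")
        # MLE of the log rate ratio: log of the ratio of sample means.
        beta = math.log((totals[field] / counts[field]) / ref_mean)
        # Standard error from the inverse Poisson Fisher information.
        se = math.sqrt(1 / totals[field] + 1 / totals[reference])
        effects[field] = (math.exp(beta), beta / se)
    return effects


# Toy, made-up data for three anonymous fields: NOT results from the paper.
h = [10, 12, 14, 20, 22, 24, 11, 13]
f = ["Field A"] * 3 + ["Field B"] * 3 + ["Field C"] * 2
for name, (rr, z) in poisson_field_effects(h, f, reference="Field A").items():
    print(f"{name}: rate ratio {rr:.2f}, z = {z:.2f}, "
          f"significant at 5%: {abs(z) > 1.96}")
```

With further covariates there is no closed form, and the model would be fitted numerically, for instance with a generalized linear model routine such as `statsmodels.api.GLM` with a Poisson family. Poisson standard errors also understate uncertainty when the h-index is overdispersed relative to the Poisson assumption.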

The paper is organized as follows: the Methodology section reviews the proposal of [6] and formalises the model; the Application section applies the new approach to a database of scientists who are homogeneous by nationality and, therefore, by scientific culture. Finally, the Discussion section contains some concluding remarks.

Methodology

The paper of [11] proposed a “transparent, unbiased and very hard to rig measure” ([1]): the h-index.

According to the definition, a scientist has index [math]\displaystyle{ h }[/math] if [math]\displaystyle{ h }[/math] of his or her [math]\displaystyle{ n }[/math] papers have at least [math]\displaystyle{ h }[/math] citations each and the other [math]\displaystyle{ n-h }[/math] papers have [math]\displaystyle{ \leq h }[/math] citations each.
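The definition translates directly into a computation: sort the citation counts in decreasing order and take the largest rank [math]\displaystyle{ h }[/math] at which the paper in position [math]\displaystyle{ h }[/math] still has at least [math]\displaystyle{ h }[/math] citations. The Python sketch below assumes that each paper's citation count is available as a non-negative integer; the function name `h_index` is illustrative.

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break     # counts never increase from here, so h cannot grow
    return h


# Five papers cited 10, 8, 5, 4 and 3 times: four papers have at least 4
# citations each, and the remaining paper has 3 <= 4 citations.
print(h_index([10, 8, 5, 4, 3]))
```

Ties are handled by the same rule: six papers with 5 citations each give an h-index of 5, since the sixth paper has 5 citations, which is fewer than its rank of 6.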

Following the work of Hirsch, many papers have discussed its application, especially in the bibliometric community. Some papers have focused on the statistical learning aspects behind the h-index; among them, [9] stressed the relevance of a “statistical background” for the h-index. Recently, [?] provided a complete statistical framework for the h-index that holds for all sample sizes and respects the discrete nature of the citation data behind it. We now recall their proposal, as it forms the basis of our analysis.

Let [math]\displaystyle{ X_1, \ldots, X_n }[/math] be random variables describing the number of citations of the [math]\displaystyle{ n }[/math] articles of a scientist. We assume that [math]\displaystyle{ X_1, \ldots, X_n }[/math] are independent with a common citation distribution function [math]\displaystyle{ F }[/math]. We further assume that [math]\displaystyle{ F }[/math] is continuous, at least asymptotically, even though citation counts are integers. Under this assumption, the h-index can be formally defined as follows:

Declarations

Conflict of Interest

The Authors declare that there is no conflict of interest.

References
