READING NOTES: P.C. MAHALANOBIS


Historical note on the D²-statistic (1948),
Sankhya, 9, 237-240
Karl Pearson had devised the Coefficient of Racial Likeness (C.R.L.) for comparing the resemblance of racial groups and this coefficient was first used in 1921 by Miss M.L. Tildesley in her paper on "A First Study of the Burmese Skull" (Biom. Vol. 13, 1921, 247-251). At about the same time I had started the statistical analysis of anthropological measurements of Anglo-Indians taken by the late Dr. N. Annandale, and had been considering the problem of classification. I soon realised that Karl Pearson's C² was properly a test of divergence between two samples rather than a measure of the actual magnitude of the divergence. I, however, wanted an actual measure of the group distance, and adopting the notion of a generalized distance I formulated an early expression and used it in my paper on the "Analysis of Race Mixture in Bengal" in 1925.
The point to be emphasized is that the Pearsonian C.R.L. is, properly speaking, a test and not a measure of group divergence. The magnitude of C² determines the degree of certainty with which differentiation between the samples under consideration can be asserted but does not supply any information regarding the magnitude or extent of such divergence. So long as samples are drawn from the same population, C² would be approximately equal to 0 (within the margin of errors of sampling), whatever the size of the samples. When the two samples are however drawn from two different populations, the magnitude of C² would depend on the size of the samples. The value of D², however, would remain sensibly constant (within the margin of errors of sampling) when samples are drawn from the same two different populations, whatever might be the size of the respective samples. In other words, D² supplies a measure of the actual magnitude of divergence between the two groups under comparison. This was my chief reason for using D² in preference to C².
In June 1927, when I was working in the University College, London, during a period of leave abroad, I showed my work on D² to Karl Pearson and discussed with him on several occasions the difficulties in using C.R.L. arising from the fluctuating size of samples. Karl Pearson was unable to accept my views, and pointed out that I had obtained only an approximate expression for the standard error of D² as I had not retained deviations of an order higher than the second. A second line of argument used by him was that the C.R.L. had actually proved to be of great value in the classification of races (based on skull measurements), a subject which he had discussed a short time ago in considerable detail in a paper "On the Coefficient of Racial Likeness."
After my return to India in 1927, I made further calculations retaining statistical deviations up to much higher orders, and succeeded in getting the exact distribution of D² for the null-hypothesis, that is, for the case when the two samples were drawn from the same population. I also obtained expressions for moment-coefficients up to the fourth order (which later on turned out to be exact) of the D²-statistic in the non-null case when the samples were drawn from different populations. These results were communicated to the Indian Science Congress towards the end of 1928.
As already stated, Pearson had shown in a review of about 750 computed values of C² that in actual practice the C.R.L. had been found to be an extremely useful tool in craniometric researches (Biom. 1926, 105-177). For purposes of comparison, I obtained by direct calculation values of C² reviewed by Pearson. I found that a very large number of C²-coefficients (reviewed by Pearson) referred to closely associated groups for which both C² and D² would have low values. Further, owing to paucity of material, the number of skulls in each sample was also usually small, so that the size of samples did not fluctuate very widely from sample to sample. In other words, in craniometric work reviewed by Pearson, values of C² and D² gave more or less concordant results in a large number of cases. I, however, came across a certain number of comparisons for which C² and D² gave widely different results. In most of these critical cases, values of D² (rather than of C²) were more in accordance with known anthropological facts showing the superiority of D² for purposes of classifications.
I also used the D²-statistic for the analysis of extensive measurements given by H. Lundborg and F. J. Linders in their great publication on "Racial Characters of the Swedish Nation" (Swedish Institute for Race Biology, Upsala, 1926). I prepared a memoir containing (a) theoretical work on D²-statistics, (b) its application to the Swedish material, and (c) the comparison of about 750 values of C² and D² to which I have already referred, and sent it to Karl Pearson in 1929 for Biometrika, but the paper was not accepted for publication. I, therefore, sent a second copy to R. A. Fisher requesting that he might communicate to some other journal for publication. In the meantime Karl Pearson, without sending any information to me, had published the anthropological portion of the work on the Swedish material under my name in a paper on "A Statistical Study of Certain Anthropometric Measurements from Sweden" (Biometrika, Vol. 22, 1930, 94-108). This naturally prevented the publication of the full paper in England. The theoretical portion of the work was subsequently published in the form of a paper "On Tests and Measures of Group Divergence", in the Journal of the Asiatic Society of Bengal, New Series Vol. 26, 1930, No. 4. The second portion on the Swedish material, as already noted, was published in Biometrika. I had sent to the Asiatic Society of Bengal the third portion of the original memoir (which dealt with the comparison of about 750 values of C² and D²). It appeared, however, that papers dealing only with Asiatic matters could be published in the Journal of the Asiatic Society of Bengal; and as the C² and D² coefficients related mostly to non-Asiatic races it was held that this portion of the paper could not be published in this journal. The work on the comparison of C² and D² values was thus never published....



Commentary
In my view, reading this article, some sixty years old and rarely cited, can usefully spare one the reading of the myriad later papers devoted to one of those drawn-out pseudo-controversies of which the Anglo-Saxons hold the secret: significance testing revisited twenty years later, ... forty years later, etc.
To assess the divergence between two groups of observations on a set of variables, Mahalanobis explains that he introduced the D² statistic "as a measure of a group divergence", in opposition to the C² coefficient, or C.R.L. (K. Pearson's "Coefficient of Racial Likeness"). Indeed, D² is technically what we call a descriptive statistic (like the difference between two means, of which it is a generalization), in the sense that it does not depend on the sample sizes; whereas C², which is a test of homogeneity, does depend on the sample sizes.
In his argument, Mahalanobis does not think of giving descriptive statistics a status of their own, nor of defending the "description first" approach; he places himself (like Pearson) within the sampling framework. If the two samples are drawn from different populations, he tells us, D² remains roughly constant, up to sampling error, whereas C² is an increasing function of the group sizes. In other words, D² is a statistic that estimates the divergence between the populations of which the groups are samples, whereas C² is a test statistic, which estimates nothing at all.
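
The contrast can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a reconstruction of Mahalanobis's or Pearson's computations: the variables are treated as uncorrelated (so the pooled covariance matrix is diagonal), the two populations are normal with a true D² of 1.0, and the test statistic is taken to be D² multiplied by n1*n2/(n1+n2), a Hotelling-type stand-in for a sample-size-dependent statistic such as C². The function names `d2_and_test_stat`, `draw` and `simulate` are hypothetical.

```python
import random
import statistics


def d2_and_test_stat(sample1, sample2):
    """Return (D2, test statistic) for two samples given as lists of rows.

    Simplifying assumption: variables are treated as uncorrelated, so D2 is
    a sum of squared mean differences, each divided by a pooled variance.
    """
    n1, n2 = len(sample1), len(sample2)
    p = len(sample1[0])
    d2 = 0.0
    for j in range(p):
        col1 = [row[j] for row in sample1]
        col2 = [row[j] for row in sample2]
        # Pooled within-group variance of variable j.
        pooled_var = ((n1 - 1) * statistics.variance(col1)
                      + (n2 - 1) * statistics.variance(col2)) / (n1 + n2 - 2)
        diff = statistics.fmean(col1) - statistics.fmean(col2)
        d2 += diff * diff / pooled_var
    # Assumed test statistic: D2 scaled by a factor that grows with n1, n2.
    test_stat = n1 * n2 / (n1 + n2) * d2
    return d2, test_stat


def draw(rng, n, means):
    """Draw n rows of independent unit-variance normal variables."""
    return [[rng.gauss(m, 1.0) for m in means] for _ in range(n)]


def simulate(sizes=(50, 500, 5000), seed=0):
    """Compare the same two populations at several sample sizes."""
    rng = random.Random(seed)
    mu1 = [0.0, 0.0, 0.0, 0.0]
    mu2 = [0.5, 0.5, 0.5, 0.5]  # population D2 = 4 * 0.5**2 = 1.0
    results = []
    for n in sizes:
        d2, stat = d2_and_test_stat(draw(rng, n, mu1), draw(rng, n, mu2))
        results.append((n, d2, stat))
    return results


if __name__ == "__main__":
    for n, d2, stat in simulate():
        print(f"n = {n:5d}   D2 = {d2:6.3f}   test statistic = {stat:9.1f}")
```

Under these assumptions, the D² column should stay near 1.0 at every sample size, while the test statistic grows roughly in proportion to n, although the two populations never change.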

One notes that Mahalanobis did not manage to convince K. Pearson, and that Pearson moreover objected (a major charge in the "sample-minded" approach) that the standard error of D² he had calculated was only approximate!

Curiously, it is the D² statistic that "history" has retained: the Mahalanobis distance is classical today, whereas the C.R.L. coefficient (apparently supplanted by Hotelling's T² test statistic) has fallen into oblivion. But is it for good reasons that the C.R.L., designed to differentiate races, disappeared along with craniometry, a cutting-edge science turned accursed science? What would have become of it had it been designed to differentiate occupational categories and named "coefficient of social likeness"?

Today, the misunderstandings are the same as sixty years ago. For many a researcher, "comparing two means" amounts to running a Student's t-test (without regard for the value of the difference), at the risk of falling into the significance fallacy by interpreting the p-value as an index of proximity between the groups. As for academic statistics, it treats this major problem with disdain. After devoting pages to Hotelling's T² test, the Kendall & Stuart treatise mentions in passing the D² statistic, defined from T² up to a factor (which depends on the sample sizes, but this "detail" receives no comment).
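
For reference, the factor in question can be written out in the usual two-sample setting. The notation is an assumption added here, not taken from Kendall & Stuart: group sizes n_1 and n_2, mean vectors \bar{x}_1 and \bar{x}_2, and pooled within-group covariance matrix S.

```latex
\[
D^2 = (\bar{x}_1 - \bar{x}_2)^{\top} S^{-1} (\bar{x}_1 - \bar{x}_2),
\qquad
T^2 = \frac{n_1 n_2}{n_1 + n_2}\, D^2 .
\]
```

With equal group sizes n_1 = n_2 = n the factor is n/2, so the same divergence D² = 1 gives T² = 25 for n = 50 and T² = 2500 for n = 5000. The "detail" is precisely what separates a measure of divergence from a test statistic.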

P.S. Mahalanobis's note also sheds an edifying light on the editorial practices of the 1930s (truncated articles published without the author's consent, etc.). To support his argument empirically, Mahalanobis had computed about 750 values of the C² and D² coefficients on real data; this work was never published.


