Measure vs Variable Duality in Correspondence Analysis

 

Henry Rouanet and Brigitte Le Roux

Université René Descartes, Paris, France

Rouanet@math-info.univ-paris5.fr and Lerb@math-info.univ-paris5.fr

 

Abstract

 

The formal approach used by Benzécri to develop correspondence analysis (CA) was not an accidental matter of notation, but an integral part of the construction (Benzécri & Coll., 1973). The properties of CA are entirely founded on the underlying mathematical theory, essentially abstract linear algebra as found in the classical mathematical books by MacLane, Halmos, etc.: finite-dimensional vector spaces, homomorphisms, scalar products, etc. The cornerstone of the formal-geometric approach is the measure vs variable duality, which formalizes the distinction between two sorts of quantities. Those for which grouping units entails summing (adding up) values, such as weights, frequencies, etc., we call measures (as in mathematical measure theory); those for which grouping units entails averaging values, such as scores, rates, etc., we call variables. This duality is reflected in the duality notation (alias transition notation), which puts lower indices on measures and upper indices on variables. See Rouanet & Le Roux (1993) and Le Roux & Rouanet (2004).
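
The grouping rule that separates measures from variables can be illustrated with a minimal numerical sketch. The data, the grouping, and the helper names `group_measure` and `group_variable` are hypothetical and chosen only for illustration; the averaging is assumed to be weighted by the reference measure, so that the pairing between a measure and a variable is preserved under grouping.

```python
import numpy as np

# Hypothetical data: four units carrying a measure (weights f_i, lower index)
# and a variable (scores x^i, upper index).
weights = np.array([0.1, 0.2, 0.3, 0.4])
scores = np.array([2.0, 4.0, 6.0, 8.0])

# Hypothetical grouping of the units into two classes.
groups = [np.array([0, 1]), np.array([2, 3])]


def group_measure(m, idx):
    """Grouping units sums the values of a measure."""
    return m[idx].sum()


def group_variable(x, ref, idx):
    """Grouping units averages the values of a variable,
    weighting by the reference measure `ref` (an assumption of this sketch)."""
    return np.average(x[idx], weights=ref[idx])


grouped_weights = np.array([group_measure(weights, g) for g in groups])
grouped_scores = np.array([group_variable(scores, weights, g) for g in groups])

# The pairing <measure, variable> = sum_i f_i x^i is unchanged by grouping.
pairing_before = float(weights @ scores)
pairing_after = float(grouped_weights @ grouped_scores)
```

With these conventions the total mass and the weighted mean are the same before and after grouping, which is one way to see why the two kinds of quantities live in dual spaces.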

In this paper, we describe the measure vs variable duality in CA at the following two crucial stages of geometric modeling. i) Construction of clouds and the chi-square metric. The marginal frequencies of the table first provide reference measures over rows and columns. Second, they define Euclidean isomorphisms from the variable vector spaces to the dual measure vector spaces, hence scalar products and Euclidean norms; they therefore determine the chi-square metric over those spaces without arbitrariness. ii) Principal directions of clouds and principal coordinates. The fundamental mathematical result is that the solution of the spectral equations is the singular value decomposition of two adjoint homomorphisms and/or of the associated bilinear form. Applying these results to CA immediately yields the transition equations and the reconstitution formulas.
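
The two stages above can be checked numerically on a small table. The sketch below deliberately uses the usual matrix formulation (standardized residuals followed by a singular value decomposition) as a convenient numerical check; it is not the coordinate-free construction advocated here. The contingency table `N` is hypothetical, and the variable names are assumptions of this sketch.

```python
import numpy as np

# Hypothetical 4 x 3 contingency table.
N = np.array([[20.0, 10.0, 5.0],
              [10.0, 20.0, 10.0],
              [5.0, 10.0, 20.0],
              [15.0, 5.0, 10.0]])

P = N / N.sum()      # joint frequencies
r = P.sum(axis=1)    # row marginal frequencies (reference measure on rows)
c = P.sum(axis=0)    # column marginal frequencies (reference measure on columns)

# Stage i: the marginals fix the chi-square metric. In matrix terms this is
# the scaling of the residuals by 1 / sqrt(r_i * c_j).
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# Stage ii: singular value decomposition; keep the min(I, J) - 1 axes that
# can carry non-zero inertia.
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
k = min(N.shape) - 1
U, sigma, V = U[:, :k], sigma[:k], Vt[:k].T

# Principal coordinates of the row and column clouds.
F = (U * sigma) / np.sqrt(r)[:, None]
G = (V * sigma) / np.sqrt(c)[:, None]

# Transition equations: each cloud's coordinates are weighted averages of
# the other cloud's coordinates, rescaled by 1 / sigma.
row_profiles = P / r[:, None]      # shape (I, J), rows sum to 1
col_profiles = (P / c).T           # shape (J, I), rows sum to 1
F_from_G = row_profiles @ G / sigma
G_from_F = col_profiles @ F / sigma

# Reconstitution formula: p_ij = r_i c_j (1 + sum_k F_ik G_jk / sigma_k).
P_rebuilt = np.outer(r, c) * (1.0 + (F / sigma) @ G.T)

# Total inertia equals the chi-square statistic divided by the grand total.
total_inertia = float((sigma ** 2).sum())
phi2 = float(((P - np.outer(r, c)) ** 2 / np.outer(r, c)).sum())
```

In this sketch `F_from_G` and `G_from_F` agree with `F` and `G`, and `P_rebuilt` recovers `P`, which is the numerical counterpart of the transition equations and the reconstitution formulas.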

CA is a case in point, exemplifying the superiority of the formal approach to multivariate statistics over the usual matrix approach.

 

References

Benzécri J.P. & Coll. (1973). Analyse des Données, Volume 2, Analyse des Correspondances. Paris, Dunod.

Rouanet H. & Le Roux B. (1993). Analyse des Données Multidimensionnelles. Paris, Dunod.

Le Roux B. & Rouanet H. (2004). Geometric Data Analysis (in press). Dordrecht, Kluwer.