Measure vs Variable Duality in Correspondence Analysis
Henry Rouanet and Brigitte Le Roux
Université René Descartes, Paris, France
Rouanet@math-info.univ-paris5.fr and Lerb@math-info.univ-paris5.fr
Abstract
The formal approach used by Benzécri to develop CA was not an accidental matter of notation, but an integral part of the construction (Benzécri & Coll., 1973). The properties of CA are entirely founded on the underlying mathematical theory, essentially abstract linear algebra, as found in the classical mathematical books by MacLane, Halmos, etc.: finite-dimensional vector space, homomorphism, scalar product, etc. The cornerstone of the formal-geometric approach is the measure vs variable duality, which formalizes the distinction between two sorts of quantities. Measures (as in mathematical measure theory) are those for which grouping units entails summing (adding up) values, such as weights, frequencies, etc.; variables are those for which grouping units entails averaging values, such as scores, rates, etc. This duality is reflected in the duality notation (alias transition notation), which puts lower indices on measures and upper indices on variables. See Rouanet & Le Roux (1993), Le Roux & Rouanet (2004).
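The grouping rule that separates measures from variables can be illustrated with a small numerical sketch. The data, the grouping, and all names below are hypothetical, and the averaging of the variable is assumed to be weighted by the measure; this is an illustration, not the authors' formal construction.

```python
import numpy as np

# Hypothetical data: four units, each with a weight (a measure)
# and a score (a variable).
weights = np.array([2.0, 1.0, 3.0, 4.0])
scores = np.array([10.0, 40.0, 20.0, 30.0])

# Hypothetical grouping: units {0, 1} form one class, units {2, 3} another.
groups = [np.array([0, 1]), np.array([2, 3])]

# Measure: grouping units entails summing the values.
grouped_weights = np.array([weights[g].sum() for g in groups])

# Variable: grouping units entails averaging the values
# (assumption: the average is weighted by the measure).
grouped_scores = np.array(
    [np.average(scores[g], weights=weights[g]) for g in groups]
)

# The pairing between measure and variable is unchanged by grouping.
total_before = (weights * scores).sum()
total_after = (grouped_weights * grouped_scores).sum()
```

Under these assumptions, the weighted total of the scores is the same before and after grouping, which is the sense in which the two kinds of quantities are dual to each other.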
In the paper, we describe measure vs variable duality in CA at the following two crucial stages of geometric modeling: i) Construction of clouds and the chi-square metric. First, the marginal frequencies of the table provide reference measures over rows and columns. Second, they define Euclidean isomorphisms from variable vector spaces to dual measure vector spaces, hence scalar products and Euclidean norms; they therefore determine the chi-square metric over those spaces without arbitrariness. ii) Principal directions of clouds and principal coordinates. The fundamental mathematical result is that the solution of the spectral equations is the singular decomposition of two adjoint homomorphisms and/or the associated bilinear form. Applying these results to CA immediately yields the transition equations and the reconstitution formulas.
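The two stages above can be checked numerically. The following is a minimal sketch in matrix terms (not the formal construction advocated in the paper), using a hypothetical 3 x 3 contingency table; all variable names are illustrative assumptions. It computes principal coordinates from a singular value decomposition of the standardized residuals, then verifies the chi-square distance, the transition equations, and the reconstitution formula.

```python
import numpy as np

# Hypothetical contingency table (rows x columns).
N = np.array([[20.0, 10.0, 5.0],
              [10.0, 20.0, 10.0],
              [5.0, 10.0, 30.0]])

F = N / N.sum()          # joint frequencies
r = F.sum(axis=1)        # marginal frequencies of rows (reference measure)
c = F.sum(axis=0)        # marginal frequencies of columns (reference measure)

# Stage i: the margins fix the chi-square metric. Standardized residuals:
S = (F - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# Stage ii: singular value decomposition; keep the nontrivial axes only.
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
keep = sv > 1e-12
U, sv, Vt = U[:, keep], sv[keep], Vt[keep, :]

# Principal coordinates of the row and column clouds.
row_coords = (U / np.sqrt(r)[:, None]) * sv
col_coords = (Vt.T / np.sqrt(c)[:, None]) * sv

# Chi-square distance between two row profiles equals the Euclidean
# distance between their principal coordinates (all axes kept).
row_profiles = F / r[:, None]
col_profiles = (F / c[None, :]).T
d2_profiles = (((row_profiles[0] - row_profiles[1]) ** 2) / c).sum()
d2_coords = ((row_coords[0] - row_coords[1]) ** 2).sum()

# Total inertia equals the chi-square statistic divided by the grand total.
inertia = (sv ** 2).sum()
chi2_over_n = (((F - np.outer(r, c)) ** 2) / np.outer(r, c)).sum()

# Transition equations: each cloud's coordinates are profile-weighted
# averages of the other cloud's coordinates, rescaled by 1 / singular value.
row_from_cols = row_profiles @ col_coords / sv
col_from_rows = col_profiles @ row_coords / sv

# Reconstitution formula: f_ij = r_i c_j (1 + sum_k y_ik z_jk / sv_k).
F_rebuilt = np.outer(r, c) * (1.0 + (row_coords / sv) @ col_coords.T)
```

In this sketch the transition equations average a variable (coordinates) with respect to a measure (profiles), which is where the measure vs variable duality shows up in the computation.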
CA is a case in point that exemplifies the superiority of the formal approach to multivariate statistics over the usual matrix approach.
References
Benzécri J.P. & Coll. (1973). Analyse des Données, Volume 2, Analyse des Correspondances. Paris, Dunod.
Rouanet H. & Le Roux B. (1993). Analyse des Données Multidimensionnelles. Paris, Dunod.
Le Roux B. & Rouanet H. (in press). Geometric Data Analysis. Dordrecht, Kluwer.