PREFACE: "The statistics" vs "Statistics"
Statisticians should fit the needs of the users, not the reverse! - J.W. Tukey
From the statistics to Statistics
There are statistics and statistics. "The statistics" - familiarly the "stats" -
are the statistical data (averages, percentages, numbers of all sorts) that are
ubiquitous in the media and found in every imaginable area: official statistics,
surveys, etc. By "Statistics" - often written with a capital S (the "science of
Statistics") - is meant the scientific discipline dealing with the methods for
analyzing statistical data. My work has been concerned with "Statistics".
Before we proceed, two massive facts that overhang any development must be
mentioned. Firstly, there is the overwhelming dominance (from 1945 to the
present day) of Anglo-Saxon statistics; see "A quasi-monopoly". Secondly, there
is the phenomenon of hyperspecialization, which fragments a single topic (such
as statistics) into isolated subspecialties.
Academic Statistics and Statistics for researchers
The statistical discipline is a "metadiscipline", whose raw material lies outside
the discipline. By its very nature, it lies at the junction of two lines of
thinking, namely mathematics and the empirical sciences. Among the founding
fathers of statistics, the two lines were always present, whereas nowadays they
are well separated, with academic statistics on the one hand and statistics for
researchers on the other.
One finds academic statistics in the mathematics departments of universities and
in the "theoretical" teaching of institutions like (in France) INSEE (Economic
Research) or INSERM (Medical Research). This discipline calls itself
"mathematical statistics" and purports to be a deductive theory, like
mathematical physics.
One finds statistics for researchers in laboratories and empirical studies, from
the natural sciences to the social sciences. It is an essentially normative
discipline, which aims at providing legitimate "scientific proof", controlled by
the referees of scientific journals.
Let us be clear: we say "statistics for researchers", not "applied statistics",
because even though the canons of academic statistics are recognized by
statistics for researchers in principle, they are hardly applied in practice.
My conviction is that, while separating the two lines, it is vital to maintain
the unity of Statistics (on the unfortunate consequences of the current
division, see "Medical statistics on the carpet").
What justifies Statistics is its role as an auxiliary science
("Hilfswissenschaft") of the empirical disciplines. Statistics for researchers
should guide theoretical statistics. The ideal situation is one where the
statistician participates in large-scale empirical research, with scientists
specifying the questions, and works out the statistical procedures that help
answer these questions. The key ideas that give sense to my work have all
emerged in interaction with research problems, and my contributions have tended
toward constructing an autonomous statistics for researchers.
The foundations of statistics; the history of statistics
The statistical discipline is a recent one, highly dependent on computational
tools. Not surprisingly, it has faced persistent identity problems. Originally a
branch of probability theory, it was then, in the heyday of Operations Research,
nearly absorbed into the "science of decisions". Nowadays, it would rather tend
to become a part of algorithmics (a field surely more creative).
Our key ideas do indeed refer to the fundamentals of statistics. But talk of the
"foundations" of a discipline suggests a specialized area, on the sidelines of a
discipline whose content is "well established." The key ideas, in contrast, call
for a restructuring of the traditional chapters of statistics.
The same goes for the history of statistics, to which I was introduced by
G.Th. Guilbaud and B. Bru: cf. Rouanet & Bru (1994b). In the age of the
Internet, browsing the "Electronic Journal of the history of probability and
statistics" is a real pleasure for me. However, I must confess that epistemology
is not my strong point. If history fascinates me, it is (to paraphrase Marc
Ferro about history in general) "provided that its study provides an
understanding of the problems of our time." Rather than scrutinize the
forerunners of currently dominant trends, I try to (re)discover neglected paths
that the tools of our time can make practicable.
It is clear that many theoretical constructions of the past were built in order
to bypass the obstacle of computation: for example, the normal model. Other
theories remained in sketch form: for example, classification procedures, or
permutation modeling. Now that the computational obstacle has virtually been
removed, and the era of statistical tables is (or should be) over, one can and
should prefer, I believe, a direct approach to tackling the real issues that
justify using statistical methods. In fact, what were the problems that Binet,
or Durkheim, were attempting to solve? What if they had had computers at their
disposal, with their colossal databases and their fabulous means of calculation?
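As a purely illustrative aside (it is not taken from the texts referenced on
this page), the "direct approach" can be made concrete with permutation
modeling, mentioned above as one of the theories long left in sketch form. The
sketch below, in Python, enumerates every way of splitting the pooled
observations into two groups and reports the proportion of splits whose
difference in means is at least as extreme as the observed one, with no appeal
to the normal model. The function name `exact_permutation_p_value` and the toy
data are assumptions introduced for the example.

```python
from itertools import combinations


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in means.

    Hypothetical helper written for illustration only: it enumerates
    every split of the pooled data into groups of the original sizes.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_diff(sum_a):
        # Difference between the mean of the first group and the mean
        # of the remaining observations, given the first group's sum.
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(sum(group_a)))
    n_extreme = 0
    n_splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        split_sum = sum(pooled[i] for i in indices)
        n_splits += 1
        # Small tolerance so that ties with the observed value count.
        if abs(mean_diff(split_sum)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_splits


# Toy data (invented): two groups of three observations each.
p_value = exact_permutation_p_value([1, 2, 3], [4, 5, 6])
print(p_value)
```

With three observations per group there are only 20 possible splits, so full
enumeration is immediate. The number of splits grows combinatorially with the
sample sizes, so for larger data sets one would typically draw random splits
instead of enumerating them all.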
Statistics in Human Sciences
My work has focused on statistics in the human sciences, mainly psychology and
social science, in other words, the behavioral sciences, bordered by biomedical
statistics on one side and econometrics on the other. As far as statistics is
concerned, this constitutes a quite homogeneous field: there is "statistics for
human sciences", not really "statistics for psychologists," "statistics for
sociologists", and so on.
In my view, the role of statistics in a research paper should always conform to the following pattern:
Research problem --> Relevant data --> Statistical analysis --> Statistical results --> Research conclusions.
Relevant data must constitute a representative inventory of the area under
study. This is the "completeness requirement" of Benzécri, close to the notion
of "field" of Bourdieu. The statistical analysis should either bring an answer
to the research questions, or else show that the available data are insufficient
to answer them. Adhering to the foregoing scheme should facilitate the critical
examination of a research report and enable one to pinpoint at which stage(s)
errors may have been committed: 1) Relevant data have been omitted; 2) The
statistical analysis carried out is inadequate; 3) The conclusions drawn exceed
those authorized by the statistical results (over-interpretation).
In academic statistics, "real-life data" are often invoked merely to illustrate
techniques, while research problems are ignored. Blatant violations of the
completeness requirement abound. Suffice it to mention an article by Goodman
(1991), which purports to discuss seriously the comparative merits of methods on
a simple 4x5 table of social mobility, disconnected from any context. In his
reply, D.R. Cox shrewdly notes: "A key question concerns how the models are to
be adapted to address detailed substantive questions (etc.); for example, there
may be further dimensions or concurrent comment on the individuals concerned..."
Two crucial distinctions
Beyond the diversity of disciplines, two distinctions are essential:
1) Between experimental data (the factors of interest are controlled)
and observational data (the factors of interest are only observed).
2) Between descriptive procedures (the findings relate to the data)
and inductive ones, alias statistical inference (the conclusions go beyond
the data), with, in the background, the perennial problem of the role of
probabilities in statistics.
Texts and publications
The references to my texts and publications are given on the one hand in
chronological order, on the other hand by themes (domains). Some texts are
mathematically oriented and may appeal to mathematicians interested in
applications. Other texts are case studies, where the statistical approach is
presented "in context", and are directly readable by researchers (not
necessarily versed in mathematics).
Organization of the heading "Statistical Work" (travaux statistiques)
. Key ideas: Formalization, geometric, descriptive-inductive, specific, probability.
. Domains: Stochastic models, analysis of variance and structured data, Combinatorial inference, Bayesian inference, Geometric Data Analysis, Regression.
. Software, teaching, etc.
. Reading Notes.
Heading "Personalia"
. CV and Scientific trajectory.
Please note: the headings "Loisirs" and "Feuilles et Bons Mots" lie outside my work.