
François Bavaud

We show how the power divergence family proposed by Cressie and Read (1984) makes it possible to link various aspects of log-likelihood model selection and factorial data description. Our approach, illustrated on bigram textual frequencies, generalizes Factorial Correspondence Analysis beyond the independence model, as exemplified by the symmetry model and an "independence-within classes" model, the latter seeming promising
Minimizing the relative inertia of a statistical group with respect to the inertia of the overall sample defines a unique point, the in-focus, which constitutes a context-dependent measure of typical group tendency, biased in comparison to the group centroid. Maximizing the relative inertia yields a unique out-focal point, polarized in the reverse direction. This mechanism evokes the relative variability reduction
Schoenberg transformations, mapping Euclidean configurations into Euclidean configurations, define in turn a transformed inertia, whose minimization produces robust location estimates. The procedure only depends upon Euclidean distances between observations, and applies equivalently to univariate and multivariate data. The choice of the family of transformations and their parameters defines a flexible location strategy, generalizing M-estimators. Two regimes of solutions are identified.
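As a rough illustration only, and not the procedure of the paper itself, the sketch below assumes the power family φ(D) = D^q with 0 < q ≤ 1 applied to squared Euclidean distances D (a standard example of a Schoenberg transformation) and minimizes the sum of transformed distances to a candidate location by iteratively reweighted means. The function name `power_location`, the parameter defaults and the sample data are hypothetical.

```python
def power_location(points, q=0.5, iterations=200, eps=1e-12):
    """Approximate the minimizer of sum_i (||x_i - c||^2)^q over c.

    Assumption: phi(D) = D^q, 0 < q <= 1, stands in for the family of
    transformations. q = 1 gives the ordinary mean; q = 0.5 gives the
    spatial median (Weiszfeld-type iteration).
    """
    dim = len(points[0])
    # Start from the ordinary centroid.
    c = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iterations):
        weights = []
        for p in points:
            d = sum((p[k] - c[k]) ** 2 for k in range(dim))
            # Derivative of phi at d; eps avoids division by zero at d = 0.
            weights.append(q * (d + eps) ** (q - 1))
        total = sum(weights)
        c = [sum(w * p[k] for w, p in zip(weights, points)) / total
             for k in range(dim)]
    return c


# Hypothetical univariate sample with one gross outlier.
sample = [(0.0,), (1.0,), (2.0,), (3.0,), (100.0,)]
mean_estimate = power_location(sample, q=1.0)[0]
robust_estimate = power_location(sample, q=0.5)[0]
```

Only distances between the candidate location and the observations enter the update, so the same function handles points of any dimension; with the outlier above, the q = 0.5 estimate stays close to the bulk of the data while the q = 1 estimate is the mean.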
Quantifying the concept of co-occurrence and iterated co-occurrence yields indices of similarity between words or between documents. These similarities are associated with a reversible Markov transition matrix, whose formal properties enable us to define Euclidean distances, which in turn allow us to perform words-documents correspondence analysis as well as word (or document) classifications at various co-occurrence orders.
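One simple way to obtain a reversible Markov matrix from a words-documents count table is the two-step walk word → document → word. The sketch below is an assumption about how such a chain can be built, not necessarily the construction used in the paper; the function names and the small count table are hypothetical, and every word and document is assumed to have at least one occurrence.

```python
def word_transition_matrix(counts):
    """counts[i][d] = occurrences of word i in document d.

    Returns W with W[i][j] = sum_d (n_id / n_i.) * (n_jd / n_.d):
    pick a document at random given word i, then a word at random
    given that document.
    """
    n_words, n_docs = len(counts), len(counts[0])
    row = [sum(r) for r in counts]
    col = [sum(counts[i][d] for i in range(n_words)) for d in range(n_docs)]
    return [[sum(counts[i][d] / row[i] * counts[j][d] / col[d]
                 for d in range(n_docs))
             for j in range(n_words)]
            for i in range(n_words)]


def matmul(a, b):
    """Plain matrix product; W @ W gives the iterated (second-order) chain."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical 3 words x 3 documents count table.
counts = [[2, 0, 1],
          [1, 3, 0],
          [0, 1, 2]]
total = sum(sum(r) for r in counts)
f = [sum(r) / total for r in counts]   # relative word frequencies
W = word_transition_matrix(counts)
W2 = matmul(W, W)                      # iterated co-occurrence
```

Reversibility here means f[i] * W[i][j] == f[j] * W[j][i] with f the relative word frequencies; the same balance condition holds for the iterated matrix W2.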
A fuzzy partition assigns to each of n objects a distribution over a categories. Elementary linear algebraic methods make it possible to introduce and investigate concepts and properties such as a) variance and inertia decomposition; b) coarse- and fine-graining (nestedness); c) iteration of fuzzy partitions; d) stability of a group with regard to another partition; e) (Euclidean embeddable) dissimilarities between objects; f)
Spectral clustering is a procedure aimed at partitioning a weighted graph into minimally interacting components. The resulting eigen-structure is determined by a reversible Markov chain, or equivalently by a symmetric transition matrix F. On the other hand, multidimensional scaling procedures (and factorial correspondence analysis in particular) consist in the spectral decomposition of a kernel matrix K. This paper shows how
Formalising the confrontation of opinions (models) with observations (data) is the task of Inferential Statistics. Information Theory provides us with a basic functional, the relative entropy (or Kullback-Leibler divergence), an asymmetrical measure of dissimilarity between the empirical and the theoretical distributions. The formal properties of the relative entropy turn out to be able to capture every aspect of Inferential Statistics, as illustrated here, for simplicity, on dice (= i.i.d. processes with finitely many outcomes): refutability (strict or probabilistic): the asymmetry data / models; small deviations: rejecting a single hypothesis; competition between hypotheses and model selection; maximum likelihood: model inference and its limits; maximum entropy: reconstructing partially observed data; EM-algorithm; flow data and gravity modelling; determining the order of a Markov chain.
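The relative entropy of an empirical distribution f with respect to a model m is K(f‖m) = Σ f_i log(f_i / m_i). The minimal sketch below computes it for a die and shows the asymmetry between data and models; the function name and the counts are hypothetical, and the reading of the infinite value as "strict" refutation is an interpretation rather than a quotation from the paper.

```python
import math


def relative_entropy(f, m):
    """Kullback-Leibler divergence K(f || m) in nats.

    Terms with f_i = 0 contribute 0; an observed outcome (f_i > 0)
    that the model declares impossible (m_i = 0) gives infinity.
    """
    total = 0.0
    for fi, mi in zip(f, m):
        if fi > 0:
            if mi == 0:
                return math.inf
            total += fi * math.log(fi / mi)
    return total


# Hypothetical counts from 60 throws of a die.
counts = [8, 12, 9, 11, 10, 10]
n = sum(counts)
empirical = [c / n for c in counts]
fair = [1 / 6] * 6

k = relative_entropy(empirical, fair)
# 2 * n * K is the usual likelihood-ratio statistic, approximately
# chi-square distributed with 5 degrees of freedom under the fair model.
lr_statistic = 2 * n * k

# Asymmetry: data showing an outcome the model forbids refute it outright,
# whereas the reverse comparison remains finite.
forbidden = relative_entropy([0.5, 0.5], [1.0, 0.0])
allowed = relative_entropy([1.0, 0.0], [0.5, 0.5])
```

Small values of `lr_statistic` are compatible with the fair-die hypothesis; `forbidden` is infinite while `allowed` equals log 2.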
Classical factorial treatments applied to words-documents count matrices (such as Correspondence Analysis (FCA), Latent Semantic Indexing (LSI), as well as non-linear generalizations of FCA (NLCA)) can be described in the framework of kernels associated with Support Vector Machines (SVM). This paper sets out the relationships between those formalisms, and demonstrates how textual pre-processing by a "power kernel" can improve (with respect to the classical FCA kernel) document classification in the Reuters-21578 corpus.
Minerality has emerged as a widespread term in the language of professionals and wine consumers, yet it lacks a precise and broadly shared definition. This contribution studies three parallel corpora, consisting of responses from 1697 consumers, underlining what minerality evokes (or not), how consumers define it and what terms can be considered as synonymous. Two methods are compared, namely Correspondence Analysis on the one hand, highlighting the textual salience within each corpus, and clustering of textual networks generated by the renormalized Markov Associativities on the other hand, based on associations between terms. The two analyses, which are complementary, distinguish and identify the various consumers' views regarding the concept of minerality in wines.