Thesis
Year: 2012
Abstract
Telling cow from sheep is effortless for most animals, but requires much engineering for computers. In this thesis, we seek to tease out basic principles that underlie many recent advances in image recognition. First, we recast many methods into a common unsupervised feature extraction framework based on an alternation of coding steps, which encode the input by comparing it with a collection of reference patterns, and pooling steps, which compute an aggregation statistic summarizing the codes within some region of interest of the image. Within that framework, we conduct extensive comparative evaluations of many coding or pooling operators proposed in the literature. Our results demonstrate a robust superiority of sparse coding (which decomposes an input as a linear combination of a few visual words) and max pooling (which summarizes a set of inputs by their maximum value). We also propose macrofeatures, which import into the popular spatial pyramid framework the joint encoding of nearby features commonly practiced in neural networks, and obtain significantly improved image recognition performance. Next, we analyze the statistical properties of max pooling that underlie its better performance, through a simple theoretical model of feature activation. We then present results of experiments that confirm many predictions of the model. Beyond the pooling operator itself, an important parameter is the set of pools over which the summary statistic is computed. We propose locality in feature configuration space as a natural criterion for devising better pools. Finally, we propose ways to make coding faster and more powerful through fast convolutional feedforward architectures, and examine how to incorporate supervision into feature extraction schemes. Overall, our experiments offer insights into what makes current systems work so well, and state-of-the-art results on several image recognition benchmarks.
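As a rough illustration of the coding and pooling alternation described in the abstract, the sketch below encodes a few toy descriptors against a small codebook and then summarizes the codes with max pooling and average pooling. It is not taken from the thesis: the coding step uses hard vector quantization (nearest reference pattern) as a simple stand-in for sparse coding, and the function names, the codebook, and the descriptor values are all hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def hard_code(x: Vector, codebook: Sequence[Vector]) -> List[float]:
    """Coding step (assumed variant): one-hot code of the nearest codeword."""
    # Squared Euclidean distance from x to every reference pattern.
    dists = [sum((a - b) ** 2 for a, b in zip(x, w)) for w in codebook]
    nearest = dists.index(min(dists))
    return [1.0 if k == nearest else 0.0 for k in range(len(codebook))]


def max_pool(codes: Sequence[Vector]) -> List[float]:
    """Pooling step: per-codeword maximum over all codes in the region."""
    return [max(column) for column in zip(*codes)]


def avg_pool(codes: Sequence[Vector]) -> List[float]:
    """Pooling step: per-codeword mean over all codes in the region."""
    return [sum(column) / len(codes) for column in zip(*codes)]


# Hypothetical toy data: three reference patterns and four local descriptors
# that are assumed to fall within one region of interest.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 0.2]]

codes = [hard_code(x, codebook) for x in descriptors]
pooled_max = max_pool(codes)
pooled_avg = avg_pool(codes)

print(pooled_max)
print(pooled_avg)
```

With one-hot codes, max pooling records only whether a codeword was activated anywhere in the region, while average pooling records how often it was activated.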
Contributor: Minsu Cho
https://theses.hal.science/tel-01063353
Submitted on: Tuesday, September 16, 2014, 11:11:39
Last modified on: Wednesday, February 26, 2025, 15:24:03
Long-term archiving on: Wednesday, December 17, 2014, 10:16:08
Dates and versions
- HAL Id: tel-01063353, version 1
Cite
Y-Lan Boureau. Learning Hierarchical Feature Extractors For Image Recognition. Computer Vision and Pattern Recognition [cs.CV]. New York University, 2012. English. ⟨NNT : ⟩. ⟨tel-01063353⟩
Views: 261
Downloads: 626