Active Perception perspectives claim that action is closely related to perception. One empirical approach that supports these theories is the minimalist approach, in which participants perform a task using an interface that provides minimal information. Their exploratory movements are crucial to generating a meaningful sequence of information. Previous studies analyzed sensorimotor trajectories by describing qualitative strategies and by linearly quantifying participants' movement performance, but that approach struggles to capture the behavior of non-stationary data. In the present study, we applied recurrence plots (RP) and recurrence quantification analysis (RQA) to study the structure of the sensorimotor trajectories produced by participants trying to discriminate between two invisible geometric shapes (Triangle or Rectangle). Participants explored with a computer mouse and received sonification-mediated feedback that depended exclusively on whether the pointer was inside or outside the shape. We applied RP and RQA to the sensorimotor trajectories to study their fine-structure characteristics, focusing on their repetitive patterns. Recurrence analysis proved useful for quantifying differences in the dynamic behavior that emerges when participants explore invisible virtual geometric shapes. The differences in the RQA measures associated with vertical structures allowed us to postulate the existence of particular exploration strategies for each figure. It was also possible to determine that the complexity of the dynamics changed according to the shape. We discuss these results in light of antecedents in haptic and visual perceptual exploration.
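The abstract does not list the exact RQA settings. As a minimal sketch, assuming the raw 2D mouse positions are used directly as the phase-space trajectory (no time-delay embedding), a fixed radius `eps`, no Theiler window, and a minimum vertical line length `vmin` of 2 (all illustrative choices), the recurrence matrix and two measures based on vertical structures, laminarity (LAM) and trapping time (TT), could be computed as follows. All function names are hypothetical.

```python
import numpy as np

def recurrence_matrix(traj, eps):
    """Binary recurrence matrix: R[i, j] = 1 if ||x_i - x_j|| <= eps."""
    traj = np.asarray(traj, dtype=float)
    if traj.ndim == 1:
        traj = traj[:, None]
    # Pairwise Euclidean distances between all trajectory samples
    dists = np.linalg.norm(traj[:, None, :] - traj[None, :, :], axis=-1)
    return (dists <= eps).astype(int)

def recurrence_rate(R):
    """RR: fraction of recurrent points (line of identity included here)."""
    return R.mean()

def vertical_line_lengths(R):
    """Lengths of all vertical runs of recurrent points, column by column."""
    lengths = []
    for col in R.T:
        run = 0
        for v in col:
            if v:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return np.array(lengths, dtype=int)

def laminarity(R, vmin=2):
    """LAM: share of recurrent points lying on vertical lines of length >= vmin."""
    lengths = vertical_line_lengths(R)
    total = lengths.sum()
    if total == 0:
        return 0.0
    return lengths[lengths >= vmin].sum() / total

def trapping_time(R, vmin=2):
    """TT: mean length of the vertical lines of length >= vmin."""
    lengths = vertical_line_lengths(R)
    long_lines = lengths[lengths >= vmin]
    return float(long_lines.mean()) if long_lines.size else 0.0
```

For example, a trajectory in which the pointer rests at the same position for several samples produces vertical lines in the recurrence plot and therefore raises LAM and TT, which is the kind of "laminar" state these measures are designed to capture.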
Bartolomé Drozdowicz 1,2, Adrián Salvatelli 1, Gustavo Bizai 1,3, Alejandro Hadad 1,2, Diego Evin 4 and Rodrigo Torres 5. 1 Laboratorio de Sistemas de Información, Facultad de Ingeniería, Univ. Nac. de Entre Ríos, Ruta 11 Km. 10, Oro Verde, Entre Ríos, Argentina; 2 Facultad de Ciencia y Tecnología, Universidad Autónoma de Entre Ríos; 3 Facultad de Ciencias de la Vida y la Salud, Universidad Autónoma de Entre Ríos; 4 Laboratorio de Investigaciones Sensoriales, INIGEM, CONICET-UBA; 5 Centro de Ojos Dr. Lódolo, Paraná, Entre Ríos
The Research and Development Project "Sistema de Información Plenóptica como medio diagnóstico para Lámparas de Hendidura" (Plenoptic Information System as a Diagnostic Tool for Slit Lamps) proposes using light fields as a strategy for adding three-dimensional information to images of the ocular fundus. One line of work consists of developing algorithms to obtain dynamic focus, multiple perspectives and depth maps from plenoptic images captured in a single shot. Depth perception is a valuable aid to decision-making in many applications. We present several methodologies for the computational processing of plenoptic camera captures, using public-domain images while the optics associated with a slit lamp are being implemented. We show the results of the algorithms and discuss their efficiency in terms of complexity and processing time. Future work is to optimize these proposals,...
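The abstract does not describe the authors' refocusing algorithm. A common technique for synthetic refocusing from a single plenoptic capture is shift-and-add: each sub-aperture image is shifted in proportion to its angular offset from the central view, and the shifted images are averaged. A minimal sketch, assuming the raw lenslet image has already been decoded into a 4D array indexed `(u, v, s, t)` and using integer, circular shifts for simplicity (the function name `refocus` and the `slope` parameter are illustrative):

```python
import numpy as np

def refocus(light_field, slope):
    """Synthetic refocusing of a 4D light field by shift-and-add.

    light_field: array of shape (U, V, S, T), one S x T sub-aperture image
    per angular sample (u, v). slope: pixel shift per unit of angular offset
    from the central view; different slopes focus on different depth planes.
    """
    U, V, S, T = light_field.shape
    uc, vc = (U - 1) / 2, (V - 1) / 2
    acc = np.zeros((S, T), dtype=float)
    for u in range(U):
        for v in range(V):
            # Integer circular shifts keep the sketch simple; real pipelines
            # interpolate sub-pixel shifts and handle image borders explicitly.
            du = int(round(slope * (u - uc)))
            dv = int(round(slope * (v - vc)))
            acc += np.roll(light_field[u, v], shift=(du, dv), axis=(0, 1))
    return acc / (U * V)
```

Sweeping `slope` over a range and measuring local sharpness at each pixel is one way to derive a rough depth map from the resulting focal stack; that step is not shown.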
This paper presents two acoustic speech analysis systems, applied to the description of spontaneous speech segments, and an automatic spontaneous speech recognition system oriented to word detection. The first analysis system presents in detail ...
Developing computer systems capable of interacting with their users in the most natural and efficient way possible is one of the essential requirements for integrating the technological world into society. In this context, speech is one of the most [...]
This paper describes an approach to predicting non-verbal cues from speech-related features. Our previous investigations of audiovisual speech showed strong correlations between the two modalities. In this work we developed two models based on different kinds of recurrent artificial neural networks, Elman and NARX, to predict head-motion activity parameters from linguistic and prosodic inputs, and compared their performance. Prosodic inputs included F0 and intensity, while linguistic inputs included these plus additional information such as the type of syllables and phrases and the different relations between them. Using speaker-specific models for six subjects, performance measures in terms of root mean square error (RMSE) showed significant differences between the models with respect to the input parameters, and the NARX network outperformed the Elman network on the prediction task.
This paper explores the relationship between perceived syllable prominence and the acoustic properties of a speech utterance. It aims to establish a link between the linguistic meaning of an utterance, in terms of sentence modality and focus, and its underlying prosodic features. Applications of such knowledge can be found in computer-based pronunciation training as well as in general automatic speech recognition and understanding. Our acoustic analysis confirms earlier findings that focus and sentence mode modify the fundamental frequency contour, syllabic durations and intensity. However, we could not find consistent differences between utterances produced with non-contrastive and contrastive focus. Only one third of the utterances with broad focus were identified as such. Ratings of syllable prominence are strongly correlated with the amplitude Aa of the underlying accent commands, syllable duration, maximum intensity and mean harmonics-to-noise ratio.
Affiliation: Guirao, Miguelina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Inmunología, Genética y Metabolismo. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Inmunología, Genética y Metabolismo; Argentina. Universidad de Buenos Aires. Facultad de Medicina. Hospital de Clínicas General San Martín; Argentina
The goal of this study is to explore the association between tonal accents and fundamental frequency parameters obtained from the Fujisaki model in Buenos Aires Spanish. Results indicate that three-syllable words in final position stressed on the third syllable are associated with early peaks. In non-final word accents, late peaks are found for words stressed on the first and second syllables. Alignments of accent commands were calculated relative to the onset of both the accented syllable and its nucleus. As in an earlier study on German, the results support anchoring to tonal transitions rather than to F0 peaks, given that the onset and offset times of accent commands are aligned earlier for final than for non-final accents.
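For reference, the Fujisaki model represents ln F0(t) as the sum of a base value ln Fb, phrase components (impulse responses Gp(t) = α²t·e^(−αt) of a critically damped second-order system, triggered at times T0 with amplitudes Ap) and accent components (step responses Ga(t) = min[1 − (1 + βt)e^(−βt), γ] switched on at T1 and off at T2 with amplitudes Aa). A minimal sketch of this standard formulation follows; the default constants α = 3.0/s, β = 20.0/s and γ = 0.9 are commonly used values assumed here rather than taken from the study, and the command parameters are supplied by hand instead of being estimated from speech.

```python
import numpy as np

def phrase_response(t, alpha=3.0):
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    tp = np.maximum(t, 0.0)  # avoids overflow in exp() for negative t
    return np.where(t >= 0, alpha**2 * tp * np.exp(-alpha * tp), 0.0)

def accent_response(t, beta=20.0, gamma=0.9):
    """Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    tp = np.maximum(t, 0.0)
    g = np.minimum(1.0 - (1.0 + beta * tp) * np.exp(-beta * tp), gamma)
    return np.where(t >= 0, g, 0.0)

def fujisaki_f0(t, fb, phrases=(), accents=(), alpha=3.0, beta=20.0, gamma=0.9):
    """F0 contour (Hz) from a base frequency fb and phrase/accent commands.

    phrases: iterable of (Ap, T0); accents: iterable of (Aa, T1, T2).
    """
    t = np.asarray(t, dtype=float)
    log_f0 = np.full_like(t, np.log(fb))
    for ap, t0 in phrases:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for aa, t1, t2 in accents:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return np.exp(log_f0)
```

In this formulation, the alignment of an accent command is given by its onset T1 and offset T2, which is what the alignment measurements above refer to.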
This paper presents a novel method for rescoring n-best recognition hypotheses using intonation knowledge. The model synthesizes an F0 contour for each of the n-best hypotheses and estimates an intonative matching index between the synthetic contours and the real F0 contour. This index, which can be viewed as the degree of intonational compatibility between each hypothesis and the input sentence, is applied in the rescoring process. F0 prediction is based on classification and regression trees and the Fujisaki model. We evaluate the approach using a single speaker from the Buenos Aires Spanish LIS-SECYT database under clean and babble-noise conditions. Under the no-grammar condition, the proposed model reduces the mean absolute word error rate by 3.1% with respect to the baseline system, consistently across the different noise conditions.
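The abstract does not define the intonative matching index. As an illustration only, the sketch below assumes the index is the Pearson correlation between the synthetic and observed log-F0 contours after linear resampling to a common length, and that it is added, with a hypothetical weight, to each hypothesis's recognizer log score. The CART and Fujisaki-based prediction of the synthetic contours is not shown; those contours are taken as given.

```python
import numpy as np

def intonation_match(synthetic_f0, observed_f0):
    """Hypothetical matching index: Pearson correlation of log-F0 contours.

    Both contours are linearly resampled to a common length first.
    """
    n = max(len(synthetic_f0), len(observed_f0))
    grid = np.linspace(0.0, 1.0, n)

    def resample(contour):
        c = np.log(np.asarray(contour, dtype=float))
        return np.interp(grid, np.linspace(0.0, 1.0, len(c)), c)

    s, o = resample(synthetic_f0), resample(observed_f0)
    if np.std(s) == 0 or np.std(o) == 0:
        return 0.0  # correlation undefined for flat contours
    return float(np.corrcoef(s, o)[0, 1])

def rescore_nbest(hypotheses, observed_f0, weight=1.0):
    """Re-rank a list of (text, recognizer_score, synthetic_f0) tuples.

    recognizer_score is assumed to be a log score (higher is better).
    Returns the texts sorted by recognizer_score + weight * match index.
    """
    scored = [
        (score + weight * intonation_match(syn, observed_f0), text)
        for text, score, syn in hypotheses
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in scored]
```

With `weight=0.0` the original recognizer ranking is returned unchanged, which makes it easy to compare the rescored list against the baseline.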
Two concentrations (8% and 15%) of ethanol were combined with three concentrations (135, 303 and 683 mM) of sucrose and three concentrations (5, 15 and 45 mM) of citric acid. Prompted by a computer, ten trained panelists assessed time-intensity responses to sweetness and sourness. Mixed and unmixed solutions of the same taste were evaluated in triplicate in the same experiment. Maximum intensity, plateau time at maximum intensity, total time and area were extracted from the response curves. Ethanol enhanced all four sweetness properties. The size of the increase diminished as concentration rose. Responses were not significantly affected by increasing the amount of added ethanol. Persistence was more clearly augmented than the other attributes. The effect on sourness differed for each concentration. When the weak sample was tasted, all four dimensions increased with ethanol. At the moderate concentration, sourness was suppressed by the 8% ethanol level but enhanced by 15% ethanol. At the...
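The four time-intensity parameters can be extracted from a sampled response curve. The abstract does not give their operational definitions, so the sketch below assumes: plateau time is the span between the first and last samples within a tolerance of the maximum, total time runs from the first sample until the rating returns to zero, and area is computed with the trapezoidal rule. The function name `ti_parameters` and the `plateau_tol` parameter are hypothetical.

```python
import numpy as np

def ti_parameters(t, intensity, plateau_tol=0.0):
    """Extract Imax, plateau time, total time and area from a T-I curve."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(intensity, dtype=float)
    imax = float(y.max())
    # Plateau: span between the first and last samples within plateau_tol of the maximum
    near_max = np.flatnonzero(y >= (1.0 - plateau_tol) * imax)
    plateau = float(t[near_max[-1]] - t[near_max[0]])
    # Total time: from the first sample until the rating returns to zero
    nonzero = np.flatnonzero(y > 0)
    if nonzero.size == 0:
        total = 0.0
    else:
        last = nonzero[-1]
        end = t[last + 1] if last + 1 < len(t) else t[last]
        total = float(end - t[0])
    # Area under the curve with the trapezoidal rule
    area = float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)
    return {"Imax": imax, "plateau": plateau, "total": total, "area": area}
```

A non-zero `plateau_tol` (for example 0.05) treats ratings within 5% of the maximum as part of the plateau, which may be more robust to small fluctuations in manual tracking.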
IEEE/ACM Transactions on Audio, Speech, and Language Processing
This paper addresses local disturbances in the fundamental frequency (F0) contour of speech caused by the articulation of voiced and unvoiced consonant phonemes. Depending on the intended use of the F0 contour, these disturbances are usually eliminated by filtering, smoothing or stylization. These procedures, which seek to preserve only the perceptually relevant F0 points, are generally applied coarsely at a global level, so in some cases they may not completely eliminate micro-intonation and in others they may distort macro-intonation. In this work we propose a local filtering algorithm based on a fine-grained analysis of microprosodic morphologies. The performance of the algorithm is validated by a perceptual experiment. Assuming that the algorithm allows partial or total elimination of the disturbances, we perform a statistical description of the perturbation morphologies. Statistics were collected from a corpus of 741 sentences designed to study Argentine Spanish prosody, recorded by four professional announcers who are native speakers from the city of Buenos Aires. The results show that perturbation morphologies are affected by consonant phoneme identity, global F0 contour shape and speaker identity. As an application case, we use the proposed filtering algorithm as a pre-processing stage in our automatic prominent syllable detection system, obtaining a statistically significant improvement in its performance.
This paper presents the results of experiments carried out with an automatic continuous speech recognition system for Argentine Spanish. The word-based recognizer used context-independent units, known in the literature as "monophones", as the basic units of the acoustic model. These models were built with 3-state left-to-right semi-continuous hidden Markov models (SC-HMM), one for each of the 31 monophones (30 phonemes and allophones, plus a silence model). The acoustic database consisted of 741 sentences containing 2,837 distinct words, covering 97% of Spanish syllables, recorded in an acoustic chamber by two professional speakers. The optimal parameter values were selected to maximize the recognition rate while reducing processing time. The average recognition rate...
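In a semi-continuous HMM, all states share one codebook of Gaussian densities and differ only in their mixture weights; in a 3-state left-to-right model, each state can only loop or advance to the next. A minimal sketch of this topology with Viterbi decoding, assuming one-dimensional observations (real recognizers use multidimensional acoustic feature vectors), hand-set parameters, and a path that must start in the first state and end in the last (function names are illustrative):

```python
import numpy as np

def left_to_right_transitions(n_states=3, p_stay=0.6):
    """Left-to-right topology: each state may only loop or advance by one."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i], A[i, i + 1] = p_stay, 1.0 - p_stay
    A[-1, -1] = 1.0  # exit from the model is not modeled in this sketch
    return A

def semicontinuous_loglik(obs, means, variances, weights):
    """log b_j(o_t) with a Gaussian codebook shared by all states.

    obs: (T,), means/variances: (K,), weights: (N, K) per-state mixture weights.
    Returns a (T, N) array of log-likelihoods.
    """
    obs = np.asarray(obs, dtype=float)[:, None]
    dens = np.exp(-0.5 * (obs - means) ** 2 / variances) / np.sqrt(2 * np.pi * variances)
    return np.log(dens @ weights.T + 1e-300)  # (T, K) @ (K, N) -> (T, N)

def viterbi(log_b, A):
    """Most likely state sequence in the log domain."""
    T, N = log_b.shape
    with np.errstate(divide="ignore"):
        log_A = np.log(A)  # forbidden transitions become -inf
    delta = np.full(N, -np.inf)
    delta[0] = log_b[0, 0]  # decoding must start in the first state
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + log_A  # cand[i, j]: best path in i, then i -> j
        back[t] = np.argmax(cand, axis=0)
        delta = cand[back[t], np.arange(N)] + log_b[t]
    path = [N - 1]  # decoding must end in the last state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Because only self-loops and forward transitions have non-zero probability, every decoded state sequence is non-decreasing, which is what makes the topology suitable for modeling the temporal progression of a phone.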
The goal of this study is to explore the position of pitch accent commands relative to the accented syllable in final and non-final words in absolute interrogative sentences in Spanish. Fundamental frequency parameters are obtained from the Fujisaki model. Results indicate that accent commands for three-syllable words in final position are associated with late peaks regardless of the position of the stressed syllable. In non-final words, accent commands are associated with early peaks, likewise for all stressed-syllable positions. These results are presented and compared with those obtained for declarative sentences. The influence of phrase accents and boundary tones on pitch accents shows that: 1) F0 contours ending with a high tone attract H* accents; 2) F0 contours ending with a low tone tend to keep their distance from the realization of H* accents.
This paper defines a guide for phonetic segmentation and transcription using the SAMPA alphabet (Speech Assessment Methods: Phonetic Alphabet). SAMPA phonetic transcription is increasingly used in speech technologies, both in preparing acoustic data for training speech and speaker recognition systems and in preparing concatenation units for text-to-speech systems. Examples of each phonetic realization, the segmentation criteria and their labeling are presented.
This paper explores the relationship between perceived syllable prominence and the acoustic properties of a speech utterance. It aims to establish a link between the linguistic meaning of an utterance, in terms of sentence modality and focus, and its underlying prosodic features. Our acoustic analysis compares traditional parameters modified by focus and sentence mode, such as fundamental frequency, syllabic duration and intensity, against Fujisaki model accent command parameters. Listeners identified narrow focus correctly, but identified only one third of the utterances with no focus. Ratings of perceived prominence are moderately correlated with most prosodic parameters. The ratio of syllable duration to the duration of the underlying accent command proved to be the parameter combination that correlates best with prominence. A simple classifier based on a regression model is presented to detect prominences automatically; this model could explain up to 60% of the observed variance.
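The abstract does not specify the regression model. A minimal sketch, assuming an ordinary least-squares fit of prominence ratings on acoustic predictors (for instance the ratio of syllable duration to accent command duration), R² as the "explained variance" figure, and a hypothetical rating threshold for labeling a syllable as prominent:

```python
import numpy as np

def _design(X):
    """Prepend an intercept column to the predictor matrix."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])

def fit_linear(X, y):
    """Ordinary least squares with intercept; returns (coefficients, R^2)."""
    D = _design(X)
    y = np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    residuals = y - D @ coef
    r2 = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return coef, float(r2)

def predict(coef, X):
    """Predicted prominence ratings."""
    return _design(X) @ coef

def classify_prominent(coef, X, threshold):
    """Label a syllable prominent when its predicted rating exceeds threshold."""
    return predict(coef, X) > threshold
```

Here R² is the proportion of rating variance accounted for by the fitted predictors; the threshold would in practice be tuned on held-out labeled data.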
This paper presents two acoustic speech analysis systems, applied to the description of spontaneous speech segments, and an automatic spontaneous speech recognition system oriented to word detection. The first analysis system presents in detail, and simultaneously, all the distinctive segmental and suprasegmental features of speech associated with frequency, energy and duration. The second automatically displays the physical parameters associated with intonation on a surface that quantifies the speaker's vocal field and measures vocal and dynamic range in spoken discourse. A fundamental frequency histogram is also presented, which is useful for comparing intonation tendencies from session to session. Finally, a recognition tool with acoustic models for Spanish as spoken in Argentina has been developed. It transcribes recorded sounds into text and enables the application of other natural language processing tools.
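The abstract does not describe how the histogram is built or compared. One simple possibility, sketched here under assumptions (voiced frames only, a semitone scale relative to a hypothetical 100 Hz reference, 1-semitone bins from -12 to +24 semitones, and histogram intersection as the session-to-session similarity), is:

```python
import numpy as np

def f0_histogram(f0, ref_hz=100.0, lo=-12.0, hi=24.0, bin_st=1.0):
    """Normalized histogram of voiced F0 values on a semitone scale.

    Unvoiced frames (F0 <= 0) are discarded; semitones are relative to ref_hz.
    Returns (histogram, bin_edges).
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0[f0 > 0]
    st = 12.0 * np.log2(voiced / ref_hz)
    edges = np.arange(lo, hi + bin_st, bin_st)
    counts, _ = np.histogram(st, bins=edges)
    total = counts.sum()
    hist = counts / total if total else counts.astype(float)
    return hist, edges

def histogram_overlap(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint."""
    return float(np.minimum(h1, h2).sum())
```

Using a semitone scale rather than Hz makes the bins perceptually more uniform, so sessions recorded at slightly different overall pitch levels can be compared on a common footing.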