Performance of distant-talking speech recognizers in real noisy environments can be increased using a microphone array. In this work we propose an N-best extension of the Limabeam algorithm, a likelihood-based adaptive filter-and-sum beamformer. We show that this algorithm can be used to optimize the noisy acoustic features using, in parallel, the N-best hypothesized transcriptions generated in a first recognition step. The parallel, independent optimizations increase the likelihood of the minimal-word-error-rate hypotheses, and the resulting N-best hypothesis list is automatically re-ranked. Results show improvements over delay-and-sum beamforming and Unsupervised Limabeam on a real database with a considerable amount of noise and limited reverberation.
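As an illustration of the filter-and-sum core that Limabeam optimizes (a minimal sketch, not the authors' implementation; equal-length channels and filters are assumed):

```python
import numpy as np

def filter_and_sum(channels, filters):
    """Filter-and-sum beamformer: convolve each microphone channel with
    its own FIR filter and sum the filtered signals into one output."""
    n_out = len(channels[0]) + len(filters[0]) - 1
    out = np.zeros(n_out)
    for x, h in zip(channels, filters):
        out += np.convolve(x, h)  # 'full' convolution, length n_out
    return out
```

Limabeam's contribution is how the filter taps are chosen: they are adapted to maximize the recognizer's likelihood of a hypothesized transcription rather than a purely signal-level criterion.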
Autonomous navigation in novel environments still represents a challenge for people with visual impairment (VI). Pin array matrices (PAM) are an effective way to display spatial information to VI people in educational/rehabilitative contexts, as they provide high flexibility and versatility. Here, we tested the effectiveness of a PAM with VI participants in an orientation and mobility task. They haptically explored a map showing a scaled representation of a real room on the PAM. The map further included a symbol indicating a virtual target position. Then, participants entered the room and attempted to reach the target three times. While a control group only reviewed the same, unchanged map on the PAM between trials, an experimental group also received an updated map representing, in addition, the position they had previously reached in the room. The experimental group significantly improved across trials, with both reduced self-location errors and reduced completion time, unlike the control group.
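The scaled map here is just a coordinate transform from room space to pin space; a small sketch with hypothetical room and display sizes:

```python
def room_to_pins(x_m, y_m, room_w_m, room_d_m, pins_x, pins_y):
    """Map a position in the real room (metres) to the nearest pin of a
    pin array matrix displaying a scaled map of that room."""
    col = min(pins_x - 1, int(x_m / room_w_m * pins_x))
    row = min(pins_y - 1, int(y_m / room_d_m * pins_y))
    return row, col

# Hypothetical sizes: a 6 m x 4 m room shown on a 32 x 24 pin display.
target_row, target_col = room_to_pins(4.5, 1.0, 6.0, 4.0, 32, 24)
```

Under this mapping, the updated map for the experimental group simply raises one extra pin at the participant's previously reached position.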
Objective: To investigate whether training with tactile matrices displayed on a programmable tactile display improves recall performance of spatial images in blind, low-vision and sighted youngsters, and to code and understand the behavioral underpinnings of learning two-dimensional tactile dispositions, in terms of spontaneous exploration strategies. Methods: Three groups of blind, low-vision and sighted youngsters between 6 and 18 years old performed four training sessions on a weekly schedule in which they were asked to memorize single or double spatial layouts, featured as two-dimensional matrices. Results: Results showed that all groups of participants significantly improved their recall performance compared to the first-session baseline in the single-matrix task. No statistical difference in performance between groups emerged in this task. Instead, the learning effect in visually impaired participants was reduced in the double-matrix task, whereas it was still robust in blind...
We present a fully latching and scalable 4 × 4 haptic display with 4 mm pitch, 5 s refresh time, 400 mN holding force, and 650 μm displacement per taxel. The display serves to convey dynamic graphical information to blind and visually impaired users. Combining significant holding force with high taxel density and large-amplitude motion in a very compact overall form factor was made possible by exploiting the reversible, fast, hundred-fold change in the stiffness of a thin shape memory polymer (SMP) membrane when heated above its glass transition temperature. Local heating is produced using an addressable array of stretchable microheaters, 3 mm in diameter, patterned on the SMP. Each taxel is selectively and independently actuated by synchronizing the local Joule heating with a single pressure supply. Switching off the heating locks each taxel into its position (up or down), enabling any array configuration to be held with zero power consumption. A 3D-printed pin array is mounted over the...
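The latching cycle described above can be sketched as a control sequence; `heater_on`, `heater_off` and `set_pressure` are hypothetical driver calls and the timings are assumptions, not the authors' firmware:

```python
import time

T_SOFTEN = 1.0  # s, assumed time to heat above the glass transition
T_LATCH = 2.0   # s, assumed cooling time before the taxel is locked

def set_taxel(display, row, col, up):
    """One latching actuation cycle for a single taxel of the SMP display
    (hypothetical driver API with a single shared pressure supply)."""
    display.heater_on(row, col)   # locally soften the SMP membrane
    time.sleep(T_SOFTEN)
    display.set_pressure(up)      # shared supply moves only the softened taxel
    display.heater_off(row, col)  # cooling re-stiffens the membrane...
    time.sleep(T_LATCH)           # ...locking the pin: zero-power hold
    display.set_pressure(False)   # supply can then be released
```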
Actuator density is an important parameter in the design of vibrotactile displays. When it comes to obstacle detection or navigation tasks, a high number of tactors may provide more information, but not necessarily better performance. Depending on the body site and vibration parameters adopted, high density can make it harder to detect tactors in an array. In this paper, we explore the trade-off between actuator density and precision by comparing three kinds of directional cues. After a within-subject naive search task using a head-mounted vibrotactile display, we found that locally increasing the density of the array yields better performance in detecting directional cues.
Conveying spatial information to visually impaired people is possible by leveraging residual tactile abilities. It is still unclear how to effectively evaluate mental map construction beyond performance-based metrics. Here we use a minimalistic mouse-shaped tactile device to display tactile virtual objects. We study how task complexity and visual deprivation influence behavioral, subjective and performance variables in both blind and sighted subjects. Complexity proves to be a factor affecting both groups equally. We also show that performance, the amount of acquired information and subjective judgments of task difficulty do not depend on visual deprivation. These results can help target technological solutions in rehabilitation programs for impaired individuals.
Distant-talking speech recognition in noisy environments is generally tackled by using a microphone array and related multi-channel processing. Within that framework, this paper proposes an N-best extension of the Limabeam algorithm, an adaptive maximum-likelihood beamformer. N-best hypothesized transcriptions are generated in a first recognition step and then optimized independently of one another. As a result, the N-best list is re-ranked, which allows selection of the transcription that is maximally likely given the clean speech models.
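A schematic of the N-best extension described above, with all recognizer-dependent steps left as hypothetical callables:

```python
def nbest_limabeam(channels, hypotheses, optimize_filters, beamform, score):
    """Sketch of the N-best extension: each first-pass hypothesis drives
    its own filter optimization, and the re-scored list is re-ranked.
    optimize_filters, beamform and score stand in for the recognizer's
    likelihood optimization, beamforming and scoring steps."""
    rescored = []
    for hyp in hypotheses:                        # independent: parallelizable
        filters = optimize_filters(channels, hyp)  # maximize P(features | hyp)
        features = beamform(channels, filters)
        rescored.append((score(features, hyp), hyp))
    rescored.sort(key=lambda t: t[0], reverse=True)
    return rescored[0][1]                         # maximally likely transcription
```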
The purpose of this work is to describe the Microphone Network presently used at ITC-irst for multi-microphone data collection and prototype development, with the specific aim of conducting research inside the CHIL European Project. In... more
The purpose of this work is to describe the Microphone Network presently used at ITC-irst for multi-microphone data collection and prototype development, with the specific aim of conducting research inside the CHIL European Project. In the project, we define a generic multi-sensor system which consists of two main components: a distributed multi-camera system for visual room observation, including several calibrated cameras, and a multi-microphone system for acoustic scene analysis, which consists of microphone arrays, ...
This work aims at improving speech recognition in noisy environments using a microphone array. The proposed approach is based on a preliminary generation of N-best hypotheses. The use of an adaptive maximum-likelihood beamformer (the Limabeam algorithm), applied in parallel to each hypothesis, leads to an updated set of transcriptions, among which the one maximally likely given clean speech models is selected. Results show that this method improves recognition accuracy over both delay-and-sum beamforming and unsupervised Limabeam.
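For reference, the delay-and-sum baseline the results are compared against can be sketched as follows (integer sample delays assumed for brevity):

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Delay-and-sum baseline: advance each channel by its steering delay
    (in samples) and average across microphones. Delays come from array
    geometry, e.g. d_i = round((dist_i - min_dist) / c * fs)."""
    n = min(len(x) - d for x, d in zip(channels, delays))
    return np.mean([x[d:d + n] for x, d in zip(channels, delays)], axis=0)
```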
This work describes an activity that led to the realization of a modified NIST Microphone Array Mark III. This system is able to acquire 64 synchronous audio signals at 44.1 kHz and is primarily conceived for far-field automatic speech recognition, speaker localization and, in general, for hands-free voice message acquisition and enhancement. Preliminary experiments conducted on the original array had shown that coherence between a generic pair of signals was affected by a bias due to common-mode electrical noise, ...
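One way to check for such a coherence bias (a sketch using SciPy, not the authors' measurement code):

```python
from scipy.signal import coherence

def channel_pair_coherence(x, y, fs=44100, nperseg=4096):
    """Magnitude-squared coherence between two array channels. For widely
    spaced microphones in a diffuse field, coherence should fall off with
    frequency; a broadband floor well above zero hints at common-mode
    electrical noise of the kind found on the original Mark III."""
    f, cxy = coherence(x, y, fs=fs, nperseg=nperseg)
    return f, cxy
```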
The paper addresses the problem of noise robustness from the standpoint of the sensitivity to noise estimation errors. Since the noise is usually estimated in the power-spectral domain, we show that the implied error in the cepstral domain has interesting properties. These properties allow us to compare two key methods used in noise robust speech recognition: spectral subtraction and parallel model combination. We show that parallel model combination has an advantage over spectral subtraction because it is less sensitive to noise estimation errors.
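A minimal sketch of the two domains involved: subtraction acts additively in the power spectrum, while the log + DCT of a cepstral front end turns a noise-estimate error into a nonlinear perturbation (plain cepstra here for brevity; real front ends use mel filter banks):

```python
import numpy as np
from scipy.fft import dct

def spectral_subtraction(noisy_ps, noise_ps, floor=1e-3):
    """Power-spectral subtraction with a spectral floor: an error in the
    noise estimate noise_ps propagates additively in this domain."""
    return np.maximum(noisy_ps - noise_ps, floor * noisy_ps)

def cepstral_features(ps, n_ceps=13):
    """Log + DCT: the same additive power-spectral error becomes a
    nonlinear perturbation of the cepstra, which is what makes the
    cepstral-domain error analysis interesting."""
    return dct(np.log(ps), type=2, norm='ortho')[:n_ceps]
```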
We have recently shown that vision is important for improving spatial auditory cognition. In this study, we investigate whether touch is as effective as vision in creating a cognitive map of a soundscape. In particular, we tested whether the creation of a mental representation of a room, obtained through tactile exploration of a 3D model, can influence the perception of a complex auditory task in sighted people. We tested two groups of blindfolded sighted people – one experimental and one control group – in an auditory space bisection task. In the first group, the bisection task was performed three times: the participants explored the 3D tactile model of the room with their hands and were led along the perimeter of the room between the first and second executions of the space bisection. Then, they were allowed to remove the blindfold for a few minutes and look at the room between the second and third executions. The control group instead repeated the space bisection task twice in a row without performing any environmental exploration in between. Taking the first execution as a baseline, we found an improvement in precision after the tactile exploration of the 3D model. Interestingly, room observation following the tactile exploration produced no additional gain, suggesting that visual cues added nothing once the spatial tactile cues had been internalized. No improvement was found between the first and second executions in the control group, indicating that the improvement was not due to task learning. Our results show that tactile information modulates the precision of an ongoing auditory spatial task just as visual information does. This suggests that cognitive maps elicited by touch may participate in cross-modal calibration and supra-modal representations of space that increase implicit knowledge about sound propagation.
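Precision in a space bisection task is typically quantified as the slope of a fitted psychometric function; a sketch of such a fit (a standard cumulative-Gaussian choice, assumed here rather than taken from the paper):

```python
from scipy.optimize import curve_fit
from scipy.stats import norm

def bisection_precision(positions_deg, p_resp):
    """Fit a cumulative Gaussian to the proportion of trials in which the
    middle sound was judged closer to the last source; sigma is the
    precision measure (smaller = more precise)."""
    psy = lambda x, mu, sigma: norm.cdf(x, loc=mu, scale=sigma)
    (mu, sigma), _ = curve_fit(psy, positions_deg, p_resp, p0=[0.0, 5.0])
    return mu, sigma  # PSE (deg) and precision (deg)
```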
Some blind people have developed a unique technique, called echolocation, to orient themselves in unknown environments. By self-generating a clicking noise with the tongue, echolocators gain knowledge about the external environment by perceiving detailed object features. It is not clear to date whether sighted individuals can also develop such an extremely useful technique. To investigate this, here we test the ability of novice sighted participants to perform a depth echolocation task. Moreover, in order to evaluate whether the type of room (anechoic or reverberant) and the type of clicking sound (with the tongue or with the hands) influence the learning of this technique, we divided the sample into four groups. Half of the participants produced the clicking sound with their tongue, the other half with their hands. Half of the participants performed the task in an anechoic chamber, the other half in a reverberant room. Subjects stood in front of five bars, each of a different size, placed at five different distances from the subject. The dimensions of the bars ensured a constant subtended angle across the five distances considered. The task was to identify the correct distance of the bar. We found that, even by the second session, the participants were able to judge the correct depth of the bar at a rate greater than chance. Improvements in both precision and accuracy were observed in all experimental sessions. More interestingly, we found significantly better performance in the reverberant room than in the anechoic chamber. The type of clicking did not modulate our results. This suggests that the echolocation technique can also be learned by sighted individuals and that room reverberation can influence this learning process. More generally, this study shows that total loss of sight is not a prerequisite for echolocation skills; this suggests important potential implications for rehabilitation settings for persons with residual vision.
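The constant-subtended-angle design follows from simple geometry, h = 2d·tan(θ/2); a small worked example with a hypothetical angle and distances:

```python
import math

def bar_height(distance_m, angle_deg):
    """Bar height keeping the subtended angle constant: h = 2 d tan(theta/2)."""
    return 2.0 * distance_m * math.tan(math.radians(angle_deg) / 2.0)

# Hypothetical values: a 20-degree angle at five distances (metres).
heights = [round(bar_height(d, 20.0), 3) for d in (0.5, 1.0, 1.5, 2.0, 2.5)]
# -> [0.176, 0.353, 0.529, 0.705, 0.882]
```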
This paper presents a multimodal interactive system for non-visual (auditory-haptic) exploration of virtual maps. The system is able to display the height profile of a map haptically, through a tactile mouse. Moreover, spatial auditory information is provided in the form of virtual anchor sounds located at specific points of the map, and delivered through headphones using customized Head-Related Transfer Functions (HRTFs). The validity of the proposed approach is investigated through two experiments on non-visual exploration of virtual maps. The first experiment is preliminary in nature and is aimed at assessing the effectiveness and complementarity of auditory and haptic information in a goal-reaching task. The second experiment investigates the potential of the system for providing subjects with spatial knowledge, specifically in helping with the construction of a cognitive map depicting simple geometrical objects. Results from both experiments show that the proposed concept, design, and implementation effectively exploit the complementary natures of the "proximal" haptic modality and the "distal" auditory modality. Implications for orientation & mobility (O&M) protocols for visually impaired subjects are discussed.
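Anchor-sound delivery of this kind amounts to convolving the source with direction-specific head-related impulse responses; a minimal binaural-rendering sketch (the HRIRs are assumed given, e.g. from a customized HRTF set):

```python
import numpy as np
from scipy.signal import fftconvolve

def render_anchor(mono, hrir_left, hrir_right):
    """Binaural rendering of a virtual anchor sound: convolve the mono
    source with the left/right HRIRs for the anchor's direction and
    stack the two ear signals into a stereo buffer."""
    return np.stack([fftconvolve(mono, hrir_left),
                     fftconvolve(mono, hrir_right)])
```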
Visual information is paramount to space perception. Vision influences auditory space estimation. Many studies show that simultaneous visual and auditory cues improve the precision of the final multisensory estimate. However, the amount or temporal extent of visual information that is sufficient to influence auditory perception is still unknown. It is therefore interesting to know whether vision can improve auditory precision through a short-term environmental observation preceding the audio task, and whether this influence is task-specific, environment-specific, or both. To test these issues we investigated possible improvements of acoustic precision with sighted blindfolded participants in two audio tasks [minimum audible angle (MAA) and space bisection] and two acoustically different environments (normal room and anechoic room). With respect to a baseline of auditory precision, we found an improvement of precision in the space bisection task, but not in the MAA, after the observation of a normal room. No improvement was found when performing the same task in an anechoic chamber. In addition, no difference was found between a condition of short environment observation and a condition of full vision during the whole experimental session. Our results suggest that even short-term environmental observation can calibrate auditory spatial performance. They also suggest that echoes can be the cue that underpins visual calibration. Echoes may mediate the transfer of information from the visual to the auditory system.
Tactile maps are efficient tools to improve the spatial understanding and mobility skills of visually impaired people. Their limited adaptability can be compensated for with haptic devices which display graphical information, but their assessment is frequently limited to performance-based metrics, which can hide potential spatial abilities in O&M protocols. We assess a low-tech tactile mouse able to deliver three-dimensional content, considering how performance, mental workload, behavior, and anxiety status vary with task difficulty and gender in congenitally blind, late blind, and sighted subjects. Results show that task difficulty coherently modulates the efficiency and difficulty of building mental maps, regardless of visual experience. Although exhibiting similar, gender-independent attitudes, the females had lower performance and higher cognitive load, especially when congenitally blind. All groups showed a significant decrease in anxiety after using the device. Tactile graphics with our device therefore seem applicable across different visual experiences, with no negative emotional consequences of mentally demanding spatial tasks. Going beyond performance-based assessment, our methodology can help better target technological solutions in orientation and mobility protocols.
Vision loss has severe impacts on physical, social and emotional well-being. The education of blind children poses issues, as many school disciplines (e.g. geometry, mathematics) are normally taught by relying heavily on vision. Touch-based assistive technologies are potential tools to provide graphical content to blind users, improving learning possibilities and social inclusion. Raised-line drawings are still the gold standard, but stimuli cannot be reconfigured or adapted and the blind person constantly requires assistance.
Recent evidence in early blind individuals supports the idea that the visual modality might be fundamental to calibrating complex auditory space perception (Gori et al. 2014). Here we examined, in blindfolded sighted participants, whether observation of the room in which they are immersed improves complex auditory accuracy. We asked two groups of blindfolded sighted participants to perform two tasks requiring different skills: auditory spatial bisection and spatial discrimination. The first group performed the tasks in an anechoic chamber, the first time without the possibility of seeing the room, and the second time after the room had been visually inspected. The second group followed the same procedure, but the tasks were performed in a normal room. The accuracy of responses in both tasks in the anechoic chamber was the same before and after seeing the room. Interestingly, the second group showed an improvement in bisection accuracy, but not in discrimination accuracy, after seeing the normal room. This evidence suggests that the visual system aids the auditory system in exploiting spatial knowledge of the room when localizing complex sounds. Meeting abstract presented at VSS 2015.
Due to the perceptual characteristics of the head, vibrotactile head-mounted displays are built with low actuator density. Therefore, vibrotactile guidance is mostly assessed by pointing towards objects in the azimuthal plane. When it comes to multisensory interaction in 3D environments, it is also important to convey information about objects in the elevation plane. In this paper, we design and assess a haptic guidance technique for 3D environments. First, we explore the modulation of vibration frequency to indicate the position of objects in the elevation plane. Then, we assess a vibrotactile HMD designed to render the position of objects in the 3D space around the subject by varying both stimulus locus and vibration frequency. Results show that frequencies modulated with a quadratic growth function allowed more accurate, precise, and faster target localization in an active head-pointing task. The technique presented high usability and a strong learning effect in a haptic search across different scenarios in an immersive VR setup.
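A sketch of the kind of quadratic growth mapping described above, from elevation angle to vibration frequency; the endpoint values are illustrative assumptions, not the paper's parameters:

```python
def elevation_to_frequency(elev_deg, f_min=40.0, f_max=250.0,
                           elev_min=-45.0, elev_max=45.0):
    """Quadratic growth mapping from elevation angle to vibration
    frequency (Hz). Frequency rises slowly near the bottom of the range
    and faster towards the top, per the quadratic growth function."""
    t = (elev_deg - elev_min) / (elev_max - elev_min)  # normalize to [0, 1]
    return f_min + (f_max - f_min) * t ** 2
```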
Several studies have evaluated vibrotactile stimuli on the head to aid orientation and communication. However, the acuity of the head's skin for vibration still needs to be explored. In this paper, we report an assessment of spatial resolution on the head. We performed a 2AFC psychophysical experiment systematically varying the distance between pairs of stimuli in a standard-comparison approach. We took into consideration not only the perceptual thresholds but also reaction times and subjective factors, like workload and vibration pleasantness. Results show that the region around the forehead is not only the most sensitive, with thresholds under 5 mm, but also the region where spatial discrimination was felt to be easiest to perform. We also found that acuity on the head for vibrating stimuli can be described as a function of skin type (hairy or glabrous) and of the distance of the stimulated loci from the head midline.
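Thresholds like the sub-5 mm forehead value are read off the psychometric data; a simple interpolation sketch (a stand-in for a full psychometric fit, assuming proportion correct grows monotonically with inter-stimulus distance):

```python
import numpy as np

def threshold_75(distances_mm, p_correct):
    """Distance at 75% correct in the 2AFC discrimination task, by
    linear interpolation of the measured psychometric points."""
    d = np.asarray(distances_mm, dtype=float)
    p = np.asarray(p_correct, dtype=float)
    order = np.argsort(p)                  # np.interp needs ascending xp
    return float(np.interp(0.75, p[order], d[order]))
```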