We describe the architecture and operation of Elija, a computational infant that learns to pronounce speech sounds. Elija is modelled as an agent who can interact with his environment but who has no a priori articulatory or perceptual knowledge of speech. His sensory system responds to touch and acoustic input. He judges the value of action and response using a reward mechanism, and can associate and remember the correspondences between his actions, their reward, and prior and subsequent sensory inputs. Elija first develops the ability to babble using unsupervised learning, which is formulated as an optimization problem. Then he takes advantage of tutored interactions with his caregivers. Such interactions consist of naturalistic exchanges in which the caregivers reformulate Elija’s output. He uses these to learn the importance of his productions, and this process selects for good productions and discards poor ones. In addition, using associative memory, the reformulations build up a...
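The abstract describes babbling as unsupervised learning formulated as an optimization problem: motor actions are explored and retained according to a reward signal. A minimal sketch of that idea, using random-restart hill climbing over a toy 3-D motor space; the `salience` reward and all parameters are hypothetical stand-ins, not Elija's actual mechanism:

```python
import numpy as np

rng = np.random.default_rng(2)

def salience(action):
    # Toy reward: peaks when the 3-D motor action hits a "salient" target.
    # A hypothetical stand-in for Elija's reward mechanism.
    return -float(np.sum((action - 0.6) ** 2))

# Babbling as optimization: perturb the current action, keep it if the
# sensory reward improves.
best = rng.uniform(0.0, 1.0, 3)
for _ in range(2000):
    candidate = np.clip(best + rng.normal(0.0, 0.05, 3), 0.0, 1.0)
    if salience(candidate) > salience(best):
        best = candidate
```

After enough exploration steps, the retained action sits near the high-reward region, analogous to a babbled gesture being kept because it produced a salient result.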
Converging Clinical and Engineering Research on Neurorehabilitation III, 2018
Prior work has shown that independent motor memories of opposing dynamics can be learned when the movements are preceded by unique lead-in movements, each associated with a different direction of dynamics. Here we examine generalization effects using visual lead-in movements. Specifically, we test how variations in lead-in kinematics, in terms of duration, speed and distance, affect the expression of the learned motor memory. We show that the motor system is more strongly affected by changes in the duration of the movement, whereas longer movement distances have no effect.
The Time of Excitation (Tx) of speech, also widely known as the Glottal Closure Instant (GCI), denotes the point in time at which the vocal folds close during the production of voiced speech. In this paper, we extend a previous approach based on a multilayer perceptron (MLP) using Echo State Networks (ESN), a variant of a Recurrent Neural Network (RNN). We show that the MLP and ESN approaches lead to similar results. The ESN model performed better than the MLP when the latter used only a single input sample (0.86 vs 0.75 area under the ROC plot), whereas the MLP slightly outperformed the ESN (0.98 vs 0.97 area under the ROC plot) when it was provided with a sufficient number of surrounding speech samples.
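The ESN idea above can be sketched on synthetic data: a fixed random recurrent reservoir is driven by the input signal and only a linear readout is trained. The reservoir size, spectral radius, and the toy pulse-train labels below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def esn_states(u, n_res=50, rho=0.9, seed=0):
    """Run a simple echo state network reservoir over a 1-D input signal."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, (n_res, 1))
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= rho / max(abs(np.linalg.eigvals(W)))  # scale to spectral radius rho
    x = np.zeros(n_res)
    states = np.empty((len(u), n_res))
    for t, ut in enumerate(u):
        x = np.tanh(W_in[:, 0] * ut + W @ x)   # reservoir update
        states[t] = x
    return states

def ridge_readout(states, targets, lam=1e-3):
    """Closed-form ridge-regression readout from reservoir states to targets."""
    X = states
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ targets)

# Toy stand-in for speech with excitation-instant labels: a sinusoid whose
# peaks play the role of glottal closures.
u = np.sin(np.linspace(0, 40 * np.pi, 2000))
y = (u > 0.95).astype(float)
S = esn_states(u)
w = ridge_readout(S, y)
pred = S @ w
```

Only `w` is learned; the reservoir weights stay fixed, which is what distinguishes an ESN from a fully trained RNN.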
Advances in Virtual Reality (VR) technologies allow the investigation of simulated moral actions in visually immersive environments. Using a robotic manipulandum and an interactive sculpture, we now also incorporate realistic haptic feedback into virtual moral simulations. In two experiments, we found that participants responded with greater utilitarian actions in virtual and haptic environments when compared to traditional questionnaire assessments of moral judgments. In experiment one, when incorporating a robotic manipulandum, we found that the physical power of simulated utilitarian responses (calculated as the product of force and speed) was predicted by individual levels of psychopathy. In experiment two, which integrated an interactive and life-like sculpture of a human into a VR simulation, greater utilitarian actions continued to be observed. Together, these results support a disparity between simulated moral action and moral judgment. Overall this research combines state-o...
Imitation is almost always assumed to be the mechanism by which infants learn to pronounce speech sounds, which are the elements from which words are made up. Specifically, it is believed that auditory matching enables a child to reproduce speech sounds by copying those that he hears. For several reasons, we believe that this is not the way that this systemic aspect of pronunciation is acquired. We test an alternative account involving a non-imitative mechanism using Elija, a computational model of an infant. Elija started by learning to babble in an unsupervised fashion. Three separate experiments were then run with Elija using one native speaker of English, French and German to play the role of the caregiver. Each caregiver interacted with a different instance of Elija in his or her native language. Using the tutored interactions from each caregiver, which involved their reformulations of his putative speech sounds, Elija learned (1) the importance of his productions, and (2) the correspondence between his and adult speech tokens, thereby developing an ability to imitate a series of such tokens, that is, a word. Finally, using his newly acquired ability to parse input speech sounds in terms of the equivalents to his own tokens, each caregiver taught Elija to say some simple words by serial imitation. We present results from these experiments and discuss the implications of this work.
Running head: Statistics of Natural Movements
Ian S. Howard, Computational and Biological Learning Laboratory, Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK. Email: ish22@cam.ac.uk
James N. Ingram, Computational and Biological Learning Laboratory, Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK. Email: jni20@cam.ac.uk
Konrad P. Körding, Physiology and PM and R, Rehabilitation Institute of Chicago, Northwestern University, 345 E Superior Street, Chicago, IL 60611, USA. Email: kk@northwestern.edu
Daniel M. Wolpert, Computational and Biological Learning Laboratory, Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK. Email: wolpert@eng.cam.ac.uk
Here we consider the application of state feedback control to stabilize an articulatory speech synthesizer during the generation of speech utterances. We first describe the architecture of such an approach from a signal flow perspective. We explain that an internal model is needed for effective operation, which can be acquired during a babbling phase. The required inverse mapping between the synthesizer’s control parameters and their auditory consequences can be learned using a neural network. Such an inverse model provides a means to map outputs that occur in the acoustic speech domain back to the articulatory domain, where they can assist in compensatory adjustments. We show that it is possible to build such an inverse model for the Birkholz articulatory synthesizer for vowel production. Finally, we illustrate the operation of the inverse model with some simple vowel sequences and static vowel qualities.
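The babbling-then-inversion scheme above can be sketched with a toy forward model: sample random articulatory configurations, observe their "acoustic" consequences, then fit a map from acoustics back to articulation. The forward function, data sizes, and the linear least-squares inverse are all illustrative stand-ins; the paper uses the Birkholz synthesizer and a neural network, neither of which is reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy forward model: 2 articulatory parameters -> 3 "acoustic" features.
# A hypothetical stand-in for the Birkholz articulatory synthesizer.
A = rng.normal(size=(3, 2))
def forward(p):
    return np.tanh(p @ A.T)

# "Babbling" phase: random articulatory configurations and their outputs.
P = rng.uniform(-1, 1, (500, 2))   # articulatory samples
F = forward(P)                     # acoustic consequences

# Inverse model: least-squares linear map from acoustic space back to
# articulatory space (a linear stand-in for the paper's neural network).
W, *_ = np.linalg.lstsq(F, P, rcond=None)

# Given a new sound, recover an articulation that could have produced it.
p_true = np.array([0.3, -0.4])
p_hat = forward(p_true[None, :]) @ W
```

In the control setting, `p_hat` is the articulatory-domain estimate that lets an acoustic error be turned into a compensatory articulatory adjustment.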
Rapid learning can be critical to ensure elite performance in a changing world or to recover basic movement after neural injuries. Recently it was shown that the variability of follow-through movements affects the rate of motor memory formation. Here we investigate whether lead-in movement has a similar effect on learning rate. We hypothesized that both modality and variability of lead-in movement would play critical roles, with simulations suggesting that only changes in active lead-in variability would exhibit slower learning. We tested this experimentally using a two-movement paradigm, with either visual or active initial lead-in movements preceding a second movement performed in a force field. As predicted, increasing active lead-in variability reduced the rate of motor adaptation, whereas changes in visual lead-in variability had little effect. This demonstrates that distinct neural tuning activity is induced by different lead-in modalities, subsequently influencing the access to, ...
Visual observation of movement plays a key role in action. For example, tennis players have little time to react to the ball, but still need to prepare the appropriate stroke. Therefore, it might be useful to use visual information about the ball trajectory to recall a specific motor memory. Past visual observation of movement (as well as passive and active arm movement) affects the learning and recall of motor memories. Moreover, when passive or active, these past contextual movements exhibit generalization (or tuning) across movement directions. Here we extend this work, examining whether visual motion also exhibits similar generalization across movement directions and whether such generalization functions can explain patterns of interference. Both the adaptation movement and contextual movement exhibited generalization beyond the training direction, with the visual contextual motion exhibiting much broader tuning. A second experiment demonstrated that this pattern was consistent ...
Humans are able to adapt their motor commands to make accurate movements in novel sensorimotor environments, such as when wielding tools that alter limb dynamics. However, it is unclear to what extent sensorimotor representations, obtained through experience with one limb, are available to the opposite, untrained limb, and in which form they are available. Here, we compared cross-limb transfer of force-field compensation after participants adapted to a velocity-dependent curl field, oriented either in the sagittal or the transverse plane. Due to the mirror symmetry of the limbs, the force field had identical effects for both limbs in joint and extrinsic coordinates in the sagittal plane but conflicting joint-based effects in the transverse plane. The degree of force-field compensation exhibited by the opposite arm in probe trials immediately after initial learning was significantly greater after sagittal (26 ± 5%) than transverse plane adaptation (9 ± 4%; P < 0.001), irrespective o...
Almost all theories of child speech development assume that an infant learns speech sounds by direct imitation, performing an acoustic matching of adult output to his own speech. Some theories also postulate an innate link between perception and production. We present a computer model which has no requirement for acoustic matching on the part of the infant and which treats speech production and perception as separate processes with no innate link. Instead we propose that the infant initially explores his speech apparatus and reinforces his own actions on the basis of sensory salience, developing vocal motor schemes [1]. As the infant’s production develops, he will start to generate utterances which are sufficiently speech-like to provoke a linguistic response from his mother. Such interactions are particularly important, because she is better...
Here we extend previous work for the estimation of the time of excitation (Tx) from the speech signal using a shallow neural network. We make use of a dataset that consists of the simultaneously recorded speech and Laryngograph signals from drama students speaking a phonetically balanced passage. We first use the Laryngograph signal to estimate the location of vocal fold closures as a function of time. Then, by considering the problem as a supervised learning task, we train a multilayer perceptron to map between raw speech samples, selected using a sliding input window, and a single output target sample that represents the presence or absence of an excitation point. We present results of operation across several male speakers and also demonstrate that it is possible to reconstruct the Laryngograph signal directly from the speech signal.
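The sliding-window setup above can be made concrete: each training example is a fixed-size window of raw speech samples, labelled by the target (excitation present or absent) at the window's centre. The window size and the toy waveform with pseudo-labels are illustrative assumptions, not the paper's data:

```python
import numpy as np

def windowed_dataset(speech, targets, half_width=5):
    """Slice a waveform into overlapping windows of 2*half_width + 1 samples,
    each paired with the target label at its centre sample."""
    X, y = [], []
    for t in range(half_width, len(speech) - half_width):
        X.append(speech[t - half_width : t + half_width + 1])
        y.append(targets[t])
    return np.array(X), np.array(y)

# Toy stand-in for a speech/Laryngograph pair: a sinusoid with pseudo
# "excitation" labels where it falls steeply.
speech = np.sin(np.linspace(0, 20 * np.pi, 400))
tx = (np.diff(speech, prepend=speech[0]) < -0.15).astype(float)
X, y = windowed_dataset(speech, tx)
```

The `(X, y)` pairs are then exactly the supervised-learning inputs and targets an MLP would be trained on; widening `half_width` corresponds to giving the network more surrounding speech samples.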
In our daily life we often make complex actions comprised of linked movements, such as reaching for a cup of coffee and bringing it to our mouth to drink. Recent work has highlighted the role of such linked movements in the formation of independent motor memories, affecting the learning rate and ability to learn opposing force fields. In these studies, distinct prior movements (lead-in movements) allow adaptation of opposing dynamics on the following movement. Purely visual or purely passive lead-in movements exhibit different angular generalization functions of this motor memory as the lead-in movements are modified, suggesting different neural representations. However, we currently have no understanding of how different movement kinematics (distance, speed or duration) affect this recall process and the formation of independent motor memories. Here we investigate such kinematic generalization for both passive and visual lead-in movements to probe their individual characteristics. ...
Papers by Ian Howard