Introduction

Dystonia is a neurological disorder characterised by abnormal movements and postures caused by involuntary muscle contractions1. It is recognised as the third most prevalent movement disorder, with recent estimates as high as 732 per 100,000 individuals2. Despite advancements in understanding the epidemiological, neurogenetic, and neurobiological factors associated with dystonia, the identification of objective biomarkers remains challenging. Consequently, the diagnosis, monitoring of treatment outcomes, and classification of dystonia heavily rely on clinical phenomenology. This entails considering various factors, such as the distribution of affected body regions, which allows for categorising dystonia along a severity spectrum of focal, segmental and generalised manifestations2. However, dystonic movements exhibit highly complex spatiotemporal characteristics, involving a combination of tonic and phasic elements, such as twisting, tremulous oscillations, and overflow to other body regions, occurring on variable time scales and exacerbated or alleviated by certain movements1,3,4,5. Achieving precise clinical phenotyping of dystonia poses a significant challenge, demanding expert visual perception skills6.

To accurately assess disease progression and therapeutic outcomes in dystonia, it is essential to employ reliable and well-defined operational measures that can be consistently measured and interpreted across diverse clinical settings and practitioners. This is of particular relevance for assessing outcomes of available therapies, ranging from oral medication to Botulinum neurotoxin injections for selective muscle weakening and deep brain stimulation (DBS)7. To date, clinical rating scales such as the Toronto Western Spasmodic Torticollis Rating Scale (TWSTRS) for cervical dystonia and the Burke-Fahn-Marsden Dystonia Rating Scale (BFMDRS) for generalized dystonia have been extensively utilised for this purpose8,9,10. These scales aim to condense complex clinical observations into simplified representations, relying on a limited set of categorical items, such as head-angle deviations in attempted neutral head position, encoded by a few ordinal values. Although this simplification offers advantages in time-sensitive clinical settings, it is accompanied by significant clinimetric limitations, including substantial inter-rater variability11,12,13. Furthermore, the original versions of these scales fail to quantify important information regarding abnormal movement trajectories, action-induced changes of dystonia, dystonic overflow (i.e., the spread of dystonic posturing/movement to adjacent body parts), and tremor, which has recently been recognised as affecting a majority of dystonia patients14. Yet, emerging evidence from animal models highlights the critical role played by the rich spatiotemporal structure of motor behaviour in understanding the pathocircuitry of dystonia, thereby shaping our approach to investigation and treatment15,16,17. The lack of standardised operational and shared measures hampers translational efforts, thus necessitating the development of objective outcome measures3,18.

To address the challenges of dystonia assessment, researchers have explored various instrumented solutions, such as electromyography7 or body-worn sensors19,20. However, the successful integration of these approaches into clinical practice has proven elusive21. Contactless, vision-based methods utilising multiple and/or special depth cameras have shown promise in extracting head angles in cervical dystonia. Nevertheless, their clinical validity has been limited, especially when operating under monocular conditions22,23. In this context, computer vision, a branch of contemporary artificial intelligence, has emerged as a disruptive and promising technology in clinical neuroscience and broader medical applications24,25,26,27,28. By leveraging convolutional neural networks (CNNs), visual perceptive frameworks offer several advantages, including real-time 3D human pose tracking derived from monocular 2D videos captured by consumer-grade camera hardware29,30. These advancements have significantly improved head pose estimation31,32,33, some of which have been employed to semi-automate TWSTRS and TWSTRS-2 ratings21,34,35. However, these studies have primarily focused on reproducing the rating score by quantifying static and dynamic head34 angular deviations in a single fixed head position, thereby reinforcing the aforementioned limitations and biases associated with the rating scale. Our hypothesis is that a naturalistic approach, which incorporates both gestalt aspects and the dynamics of head movement, will lead to a more accurate and ecologically valid assessment of dystonia. This approach will enable us to capture subtle variations and intricate patterns that may have been overlooked by previous constrained methods. Furthermore, we propose that including healthy controls and different dystonia subgroups, with repeated recordings at various therapeutic states (e.g., different DBS settings), will allow us to explore multiple facets of specificity in these digital physiomarkers.

In this study, we have developed a visual perceptive deep learning framework that utilises computer vision to analyse the dynamics of natural head movement (Fig. 1). The goal was to identify distinct patterns, or pathosignatures, that have diagnostic and therapeutic implications. By doing so, we aimed to enhance our understanding of the underlying pathophysiology of dystonia and effects of therapeutic neuromodulation. Specifically, we trained a novel convolutional neural network to predict movement states, and we combined the outputs with head angles obtained from a benchmark algorithm, MediaPipe, to extract both static and kinematic features from patients undergoing clinical dystonia examinations. To demonstrate the feasibility of our approach, we conducted a retrospective cohort study to assess the agreement between predicted severity and clinical ratings, establishing how both static and dynamic variables change in response to DBS. Lastly, through application of our framework to an additional cohort of patients with generalised dystonia, we provide insights into the added value of the dynamic variables in differentiating between patients with cervical dystonia and those with generalised dystonia.

Fig. 1: Measurement of static and kinematic features using computer vision workflow.
figure 1

Videos comprising individual frames are fed into convolutional neural network models that predict the movement state, i.e., the probability of the head direction of a patient, and track face-mesh coordinates to derive head angles for each frame. Head angle deviations can be extracted directly during periods of the video where a patient attempts a neutral face forward position. Using the full video, kinematic features can be constructed from the movement states predictions and angles, e.g., the correlation between axis of rotation or dystonic tremors. Features can be stored and compared across groups, such as operation status or disease.

Results

A total of N = 86 patients were retrospectively rated in both treatment conditions by three independent raters using the TWSTRS severity rating scale, as well as the TWSTRS-2 tremor item. We ensured that the raters were blinded to the disease and treatment status of the patients. For severity ratings, we focused on the attempted neutral, ’null’ head position captured in each video, aiming to capture dystonic head deviations in three principal axes: yaw for torticollis, tilt for laterocollis, and pitch for antero-/retrocollis. We observed considerable variation in the annotated scores among the raters, which differed between axes: while clinical raters strongly agreed (mean Cohen-kappa score: 0.86) on torticollis severity, they only moderately agreed on laterocollis and antero-/retrocollis scores (mean Cohen-kappa scores: 0.65, 0.67, respectively) (Supplementary Fig. 1). Across all axes, DBS treatment led to a significant reduction in clinical ratings, i.e., severity (Supplementary Fig. 2A). However, effect sizes differed considerably within each axis: 0.71 for torticollis, 0.49 for laterocollis and 0.85 for anteroretrocollis. Moreover, individual clinical scores exhibited a strong correlation between the pre- and post-operative DBS conditions (Supplementary Fig. 3A). Longitudinally, post-operative clinical ratings in the torticollis and laterocollis directions demonstrated a negative correlation with the duration between pre- and post-operative evaluations, in line with the clinical observation of delayed effects (Supplementary Fig. 3B). However, there was no correlation between the difference in clinical rating from pre- to post-operation and the duration of time (Supplementary Fig. 3C).

We proceeded to assess the clinical relevance of the visual perceptive framework in accurately capturing angular deviations of the head. We extracted the excursion of head angles from attempted neutral head positions for each patient. The head angles strongly agreed with clinical scores for all prinicipal axes of motion (r ≥ 0.66, Fig. 2a). We further observed a significant reduction of head angle deviations in each axis by DBS (Fig. 2b) with largest effect sizes in torticollis (0.59), followed by laterocollis (0.46) and anteroretrocollis (0.38). To further investigate the relationship between head angle deviations and clinical characteristics, we divided the patients into three phenotypic groups based on their dominant axis of deviation. We discovered that each group of patients exhibited a significant change from pre- to post-operative evaluations only in their respective dominant axis of deviation (effect sizes: torticollis 0.76; laterocollis 0.93; anteroretrocollis 0.60; Fig. 2c). For instance, patients with a dominant torticollis excursion only demonstrated a significant change in yaw but not in other axes. Furthermore, we found no systematic excursion in a particular direction for any axis of movement (Supplementary Fig. 4). The pre- and post-operative head angles exhibited a strong correlation (Fig. 2d), indicating a reduction in angle excursion following treatment but not a complete elimination. However, we did not observe a systematic correlation between head angles in different axes of motion among both patients and controls (Fig. 2e).

Fig. 2: Computer vision analysis of head angle during periods of face-forward.
figure 2

a 2-D histograms for comparing video derived head-angle (absolute angle) and clinically assigned TWSTRS scores for each axis of motion. b Box plots showing (absolute) pre- (grey) and post- (purple) operative angles, for each axes of movement. Median and interquartile ranges are displayed in each plot. c Like (b) but patients are separated into their dominant phenotypes, i.e., their dominant axis of deviation from face-forward. d Scatter plot showing correlation of predicted pre- and post-operative head angles for each movement axis. e Scatter plots correlating each pair of axes of motion for (i) patients and (ii) healthy controls, for pre- and post-op combined. All subplots used N = 86. Correlations were Pearson r tests. Group tests were Mann–Whitney U-tests: *p < 0.05; **p < 0.01; ***p < 0.001.

We argue that the exclusive reliance on measurements of static head angular deviations falls short of capturing the multifaceted abnormalities in dystonic movements that are often evident in clinical assessments. An example of the head-angle kinematics from a full clinical examination both pre- and post-operation is shown in Fig. 3a—highlighting the importance of taking a dynamic approach to clinical assessment. We conducted an explorative analysis of videos encompassing the entire TWSTRS severity assessment, utilising a comprehensive set of clinically inspired kinematic variables (Fig. 3c left). First, we find that several kinematic variables are significantly larger pre-operatively. The top five differentiating features included oscillatory characteristics in each axis (ranging from 2 to 10 Hz) and correlations of movement states. Notably, the effect sizes of these kinematic features were generally larger than those of angle deviations during attempted neutral face-forward positioning, suggesting that they are more responsive to DBS intervention. To identify kinematic features that are predominantly associated with a favourable treatment response, we further divided the sample into responder and non-responder groups based on the degree of improvement in overall clinical rater scores (i.e., patients with ≥ or <30% improvement, respectively, Fig. 3b). After repeating statistical tests between DBS conditions for the responder and non-responder groups, respectively, we found that the top five kinematic features were also more strongly modulated in the responder group (calculated as effect size of responders minus the effect size of non-responders, Fig. 3c right). To understand the time-scales at which dystonic movements were modulated by DBS, we applied multiscale entropy analysis to the head-angle time-series. With DBS on, patients displayed less complex head movements at shorter timescales (i.e., <1 s) (Supplementary Fig. 5), but no significant differences were observed at longer scales (i.e., >1 s), indicating that neural circuit interventions restore movement regularity on subsecond time scales.

Fig. 3: Statistical analysis of kinematic variables from full-videos.
figure 3

Kinematic variables (e.g., head tremor amplitude and frequency, correlations of movement states) were derived from the full-video as patients performed a series of clinically assigned movements. a Example of the head-angle kinematics for a randomly chosen patient pre- and post-operation reveals a more structured movement with DBS. b Responders are patients who observed a 30% relative improvement in their clinically rated score from pre-operative DBS off to post-operative DBS on (N = 86). c Summary of statistical analysis, showing (i) effect size of Wilcoxon tests between pre- and post-operation (rank-biserial correlation, positive effect indicating variable is larger during pre-operation period) and (ii) the difference in effect sizes of the responder group and non-responder group (N = 71, all tests Benjamini–Hochberg FDR corrected). d A scatter plot showing the relationship between predicted values of total severity scores using additive sequential feature selection on a linear model with a combination of kinematic and static features (mean absolute error 4.79). The dotted red line corresponds to line of perfect agreement between predicted and true holistic scores (subset of N = 28 patients for which holistic pre- and post-op scores were available). e 2-D histograms for comparing video derived oscillations for each axis of motion and a clinically assigned tremor severity score (not defined by axis of motion). Fitted linear model in black (N = 86). Significance levels: *p < 0.05; **p < 0.01; ***p < 0.001.

Despite the original purpose of the scores to capture head-angle deviations from the natural face-forward position, we hypothesized a significant influence of a broader clinical impression beyond pure angular deviations. Hence, we investigated whether the kinematic variables also correlated with the clinically annotated scores. We found that various kinematic features positively and negatively correlated with the scores of each axis of head motion (FDR corrected p-values, Supplementary Fig. 6). These kinematic features included symmetries of movement, correlations of movement states, and oscillation amplitudes and frequencies. By collapsing the TWSTRS sub-item scores into an average, we further defined a holistic, clinical dystonia severity measure. Notably, the correlation strength of kinematic features to the holistic score was larger compared to the scores of each independent axis of motion, in some cases surpassing the correlation strength of the head-angle deviations (Supplementary Fig. 6). To independently verify the holistic score, we collected the original total TWSTRS severity score for a sub-cohort of cervical dystonia patients (N = 28, scores ranging from 0 to 27). Using a linear model with additive sequential feature selection, we found the optimal model to predict the total TWSTRS severity score included a combination of static head-angle deviations and kinematic features (mean absolute error 4.79, Fig. 3d). Repeating sequential feature selection with only head-angle deviations produced inferior predictions (mean absolute error 5.63), suggesting that head-angle deviations must be accompanied by kinematic features for the automated assessment of overall dystonic severity. We further examined the oscillatory kinematic features along each axis and found that they correlated with clinically assigned tremor scores (Fig. 3e). However, we found no significant correlation between the oscillation amplitudes and face-forward angle deviations (Supplementary Fig. 7), suggesting that they capture distinct dimensions of dystonic movements independent of the angle deviations in static head position.

Finally, we asked whether kinematic features could differentiate between cervical and generalised dystonia. To do this, we used an independent, out-of-sample dataset comprising pre- and post-operative videos of 30 patients with generalised dystonia. It should be noted that patients with generalised dystonia tend to also have craniocervical disease manifestations13. Due to the video framing, only the upper-half poses of patients were captured, thereby excluding additional signs of generalised dystonia such as twisting in limbs from the analysis (Fig. 4a). As clinical ratings and periods of static face-forward were unavailable for the generalised dystonia dataset, only kinematic variables (and not head angle excursions) were extracted. First, we repeated the statistical analysis of kinematic variables as modulated by DBS in generalised dystonia patients (Supplementary Fig. 8). Remarkably, we observed that the same five dynamic variables that exhibited the strongest response to DBS in cervical dystonia were also significantly modulated in the generalised dystonia patients (Supplementary Fig. 8A). A direct comparison of the generalised and cervical patients in the kinematic feature space revealed seven features that displayed a clear differentiation between the disease sub-types (Fig. 4b). These features were consistently larger in generalised dystonia patients. Among the significant features, four corresponded to the previously identified five kinematic features that were preferentially modulated by DBS: oscillatory features in all three axes and a mean movement correlation with the face-forward position. Additionally, two frequency-related variables capturing the frequency of head oscillations in the laterocollis and anteroretrocollis axes were also significantly larger in generalised dystonia patients. Considering the prominent involvement of oscillatory kinematic features, which are associated with tremor, we further examined the strength of harmonics in cervical and generalised dystonia patients. Our analysis revealed that generalised dystonia patients exhibited stronger harmonics in head tremor oscillations across all axes of motion, in comparison to cervical dystonia patients (Fig. 4d). Moreover, multiscale entropy analysis showed that generalised dystonia patients displayed maximal entropy at much earlier timescales relative to cervical dystonia patients (Supplementary Fig. 9). This suggests that longer-scale movement patterns hold valuable information to distinguish between different types and symptom stages of dystonia.

Fig. 4: Comparison of generalised and cervical dystonia patients using kinematics variables.
figure 4

Annotations of face-forward periods were unavailable and thus only kinematic variables from the full-videos were extracted. a Schematic describing the typical visibility of patient pose captured by videos. Markers indicate common symptoms in cervical and generalised dystonia. b Effect sizes (rank-biserial correlation) of Mann–Whitney U-tests (Benjamini–Hochberg FDR corrected) between generalised and cervical dystonia patients (positive effective indicating variable is larger in generalised dystonia patients). c Violin plots of variables that are significantly larger in generalised dystonia patients relative to cervical dystonia patients (none were found as statistically significant vice-versa). d Comparison of oscillation harmonic strengths between cervical and generalised dystonia patients along each axis of motion. Harmonic strength is measured per patient as the distance correlation between the phases of the dominant tremor frequency and its harmonic (twice the dominant frequency). N(cervical) = 71, N(generalised) = 30. Mann–Whitney U-tests: *p < 0.05; **p < 0.01; ***p < 0.001.

Discussion

In this study, we developed a visual perceptive framework using convolutional neural networks to comprehensively evaluate dystonia based on clinical video recordings. Our unique dataset comprised longitudinal video documentation of cervical and generalised dystonia patients’ full clinical assessments at multiple medical centres, including those with and without DBS. This enabled us to comprehensively evaluate head movements in both task-constrained, static conditions and quasi-naturalistic, dynamic conditions, providing a holistic assessment of dystonia. Beyond technical validation, we demonstrate the framework’s utility to augment clinical judgement and facilitate insights pertinent to disease states and the readout of neural circuit intervention effects.

Clinical scales, commonly used to assess dystonia and other neurological disorders, have inherent limitations due to clinimetric issues, likely stemming from the oversimplification of complex disease phenomenology into low-dimensional ordinal parameters9,11,12,13. While necessary in time-sensitive clinical settings, this oversimplification comes at the expense of precision, granularity, and ultimately, ecological validity. An illustration of this is evident when comparing the relatively simple contemporary scoring approaches with Oppenheim’s detailed phenotypical account of dystonia from 191136. For example, TWSTRS omits some key clinical features of dystonia which only become evident with dynamic, voluntary movements and undoubtedly influence overall clinical judgement. While adaptions to TWSTRS have incorporated tremor-related features9,37, there is still a growing demand for more objective and granular disease metrics3,18. Already widely used for quantitative phenotyping in experimental neuroscience, computer vision approaches have recently emerged as a promising new tool for clinical assessments21,24,25,34,38.

We first demonstrate the robustness and clinical applicability of our deep learning framework in accurately inferring head-angle deviations during attempted ’null’ head positions from diverse clinical videos captured using consumer-grade hardware. Our visual perceptive approach surpasses various vision-based frameworks that relied on multiple or specialized depth cameras to automate ratings22,23, and achieves comparable performance to a recent study by Zhang et al.21 (we note that we use TWSTRS 0-3 ratings for latero-, antero-, and retrocollis, which differs from the TWSTRS-2 0-4 ratings used in Zhang et al.21, which limits quantitative comparison). However, a distinguishing feature of our approach is the ability to estimate head angles in real-time using a portable device such as smartphones or tablets. This capability enables its practical implementation in clinical point-of-care settings and at-home monitoring, enhancing accessibility and convenience. Moreover, we have applied our framework to diverse clinical videos from multiple centres with slightly differing protocols, showing the effects of neuromodulation in both focal and generalised dystonia, highlighting the generalisability of our tool and specificity of our findings.

The key advantage of our framework lies in its capability to analyse full video examinations of patients. To showcase this, we reverse-engineered complex clinical observations such as dystonic overflow, tremor and the action-dependent dynamics of dystonic movements into objectively measurable kinematic features. Albeit highly informative and clinically indispensable, these dimensions are not explicitly part of the TWSTRS. To this end, we first fine-tuned a convolutional neural network to parse naturalistically occurring 3D head positions into discrete, geometrically defined states. By projecting each sample into a high-dimensional space comprising clinically inspired and interpretable kinematic features, we successfully identified a set of five kinematic variables that exhibited maximal differentiation across therapy states. These were in addition to expected improvements in head-angle deviations, which have been shown in prior studies7. These kinematic features demonstrated the most pronounced response to neuromodulation, rendering them highly specific to the behavioural downstream effects of the neural circuit intervention. Furthermore, our analysis revealed that these same features were closely associated with a favourable treatment response to DBS, as defined clinically by a relative score reduction of more than 30%39,40,41. This finding not only underscores the relevance of these features but also highlights their potential as reliable indicators of effect and efficacy of neural circuit interventions.

Our analysis revealed that three of the kinematic features associated with DBS effect were related to head oscillations. Additionally, multi-scale entropy analysis highlighted that neuromodulation exerts the most profound effects on movement regularity on a subsecond time scale, pointing rather to high frequency phasic than low frequency, tonic aspects of dystonic movements. This finding aligns with recent evidence indicating that head tremor is a prevalent manifestation in the majority of patients with cervical dystonia14,42,43. The recognition of tremor as a core clinical characteristic only recently led to the inclusion of a quantifiable tremor item in the revised version of the TWSTRS9. Tremor-related features emerged most consistently across contrasts, strongly highlighting the previously less well-documented role of oscillatory aspects in dystonia pathophysiology and therapy. Notably, tremor has been associated with impaired physical functioning and pain, which are crucial dimensions of quality of life in dystonia44,45. Therefore, the linkage between kinematic features and patient-centred outcomes provides an avenue for further investigations into ‘disease architectures’ comprised by multiple phenotyping axes. The remaining correlational features we identified in our analysis provide further insights into potential manifestations of dystonic overflow and multiaxial involvement, as expressed in an abnormal covariance of head movement trajectories. These features were evaluated throughout dynamic movement trajectories, capturing a key characteristic of dystonia, namely the provocation of involuntary, dystonic movements through voluntary action. Using an independent dataset comprising 30 generalised dystonia patients, we found the same kinematic features reflected pallidal DBS effects, confirming aforementioned results in cervical dystonia patients. Furthermore, experimental investigations in rodent models of dystonia suggest that similar correlational features are independent predictors of genetic susceptibility factors in rodent models of dystonia, establishing a first hint for their neurobiological and translational relevance15.

To gain further insights into the discriminatory potential of these kinematic features, we attempted to distinguish different disease states, namely focal-cervical and generalised dystonia, within the kinematic feature space. A total of seven features exhibited significant differences, with four of the previously identified kinematic features among them. Notably, control patients exhibited the lowest values, followed by cervical and then generalised dystonia patients. Multi-scale entropy analysis further highlighted that focal and generalised dystonias show a pronounced difference of regularity in both subsecond and longer time scales > 1 second, pointing to a more profound dysfunction of motor control in generalised dystonia at short timescales, but improved motor control at longer timescales. This increased complexity at shorter timescales (higher entropy), may speculatively be the cause of the non-linearities generating the larger harmonics that we also observed in generalised patients. However, these harmonics may also be generated by distinct brain signals that vary between the disease sub-types46 or result from mechanisms such as frequency mixing47. Overall, these observations align well with the concept of a dystonic phenotypical continuum wherein severity progressively increases13, and suggests that these kinematic features sensitively capture disease state and progression, which is of critical relevance for interventional studies. Moreover, the emergence of a unified feature set specific to both cervical and generalised dystonia aligns with recent findings demonstrating the convergence of a multisynaptic neural network underlying both dystonia subgroups48. The observed motor behavioural disorganisation is mirrored on the neurobiological level by pathologically irregular neuronal firing patterns associated with the dystonic state15,49, overall suggesting that kinematic features can be a powerful readout of brain circuit function.

To better understand what information the kinematic features captured, we next correlated them with clinically annotated scores and measured angular head deviations. Despite the clinically annotated scores purposed to capture natural head-angle deviations from attempted null position, we found various dynamic features that were correlated with a holistic clinical score, but generally not with individual scores along each axis, nor with head-angle deviations. These results imply that the kinematic features are, at least partially, encoded by different neurobiological substrates. In terms of oscillatory features, this finding aligns with recent work on diverging symptom-specific circuit components for tremor versus dystonia50. Moreover, it suggests that clinicians may incorporate more complex kinematic aspects from a patient’s dystonic symptomatology into their clinical scores either subconciously reflecting the global clinical impression or the presence of dynamics introduces inaccuracies into manually determining clinical scores. This could explain some of the discussed limitations of current scoring approaches, which may be confounding factors for score-based therapeutic or brain-behaviour association studies. Within the context of rapidly emerging adaptive neurotechnologies51 and connectomic neuroimaging techniques48, intriguing use cases for our deep learning approach come to mind, such as pathophysiologically motivated circuit interrogation or guidance of adaptive and personalized neuromodulatory treatment regimes51,52.

Our study has several limitations that should be considered. Firstly, our assessments were limited to videos focusing on the upper body, thereby neglecting dystonic phenomena occurring in other regions. However, it is important to note that cervical dystonia is one of the most common forms of dystonia, and studying head kinematics provides a valuable entry point for investigating digital pathosignatures of dystonia, given the relative simplicity of head movements compared to whole-body movements. Secondly, although our measurements of oscillation amplitudes demonstrated substantial clinical validity, it is important to note that the degree of validity was slightly lower compared to previous investigations that exclusively focused on oscillations occurring in the head’s null position34. We deliberately opted to derive tremor amplitudes from the full videos, considering that tremor in cervical dystonia exhibits variation in relation to head position42,53. This approach allowed for a more ecologically valid estimation of tremor but also introduced natural variability into the measurements. Thirdly, we did not incorporate information on DBS parametrization. The location of the implanted lead and the electrical stimulation fields are known to be important predictors of therapy response in dystonia39,48,54. This omission may have influenced the performance of our kinematic features in capturing therapy state contrasts, as suboptimal responses could reduce the overall distance between therapy states in the feature space. To partially address this limitation, we conducted a subgroup analysis specifically focusing on clinically determined good responders. Fourthly, the retrospective integration of data across three studies may have introduced heterogeneity for which we didn’t control. On the other hand, it also highlights the generalisability and robustness of the identified biomarkers and our results to variation in measurement paradigms across centres. Finally, there is a possibility of undetected monogenetic dystonias in our sample. Some mutations are known to be associated with favourable (e.g. TOR1A, SCGE), others with mixed or even poor DBS outcomes (e.g. THAP1, ANO3)55. While this might introduce some unexplained variance in DBS outcomes and potentially also baseline phenotype, there is no evidence to suggest it negatively affected our framework’s accuracy. Given that responsiveness to DBS can be indicative of specific genetic variants, our granular phenotyping may even provide critical insights guiding neurogenetic investigations, e.g., in cases with conspicuous DBS outcomes which are otherwise unexplained.

Overall, these findings highlight the potential of our visual perceptive framework to enhance and augment dystonia diagnosis, monitoring and therapy by uncovering consistent latent pathosignatures. The proposed modern vision-based approach expands upon traditional principles of ‘medical cinematography’ in movement science. Video-derived kinematic pathosignatures may not only inform neural circuit therapeutics but also address the critical need for objective and standardized evaluation methods in the form of digital biomarkers. Their high sensitivity has recently been shown to facilitate clinical trials, genotype predictions and continuous monitoring in neurological disorders25,38,56. Moreover, our framework may bridge methodological gaps between clinical and experimental neuroscience, which has already widely adapted computer vision for phenotyping animal models of dystonia15,16,17. We envisage the proposed tool to strengthen translational and precision medicine approaches in modern neurology.

Methods

Study design and participants

We sourced clinical video data documenting the severity of cervical and generalised/segmental dystonia from two prospective, longitudinal, multi-centre cohort studies investigating the therapeutical effect of pallidal DBS on dystonia40,41,57 and a third, multi-centre retrospective investigation analysing clinical outcomes using advanced neuroimaging techniques39. The original DBS studies only included idiopathic, primary dystonia patients as per their inclusion criteria—determined using a routine pre-operative MRI.

The sourced data was split into two datasets, based on dystonia subtype: (i) cervical and (ii) generalised dystonia. The cervical dystonia cohort comprised 86 cervical dystonia patients from Rostock, Heidelberg, Dusseldorf, Berlin, Innsbruck, Oslo, Hannover, Kiel, Würzburg. The generalised dystonia cohort comprised 30 patients from the same centres. The characteristics of the datasets are detailed in Table 1. Individual datasets were included if (i) they contained at least one pre-operative clinical rating video showing the full dystonic phenotype and a video from the chronic post-operative phase (3-36 months post surgery) documenting the effects of clinically programmed DBS and (ii) both videos fulfilled minimal criteria ensuring video quality, which were chosen to reflect the current best practice in clinical computer vision approaches21,24,25. These were: (i) front view perspective of a single individual sitting on a chair, (ii) no significantly obscuring items on patients (e.g., excessive head dressings with externalised DBS device), (ii) no excessive camera movements, variable zoom depths or lighting insufficient to identify typical body landmarks (e.g., eyes), (iii) continuous presence of head and neck in the camera frame. A final set of 232 videos, comprising a total of 116 individual patients, was analysed in this study. All videos were recorded with standard consumer grade camera hardware, in most instances mounted on a tripod. The minimal spatiotemporal resolution was 540 × 540 pixels and 24 frames per second. An additional cohort of 22 healthy controls underwent a structured TWSTRS examination and a head position matching task. This task was precisely timed to map ground truths of head movement range along each of the three principal rotational axes (pitch, yaw, tilt; see Supplementary Fig. 10 for the detailed protocol).

Table 1 Demographic and patient characteristics

Ethics approval

This study was approved by the Julius-Maximilians University ethics committee (AZ 301/20). The original studies had been approved by the responsible ethics committees. Informed consent was obtained from all human participants.

Clinical scoring

Respective dystonia severity rating scales, i.e., Burke-Fahn-Marsden dystonia rating scale (BFMDRS) for generalised dystonia and the Toronto Western Spasmodic Torticollis severity part (TWSTRS) for cervical dystonia, had originally been administered in an open-label approach or by one expert rater. In order to eliminate potential scoring biases and to extend the clinical rating to include head tremor9, all videos were re-scored. To this end, video segments in which patients were asked to let their head drift to its natural null position were annotated. These segments partly reflect the individual dystonic phenotype and its severity (corresponding to TWSTRS severity element I). Three raters, two blinded senior movement disorders experts (DZ, CWI) and one junior investigator (LF) specifically trained using the TWSTRS teaching tape8, applied the TWSTRS severity part. Three raters, two blinded senior movement disorders experts (DZ, SRS) and one junior investigator (LF), also applied an additional head tremor subscore from TWSTRS-29. A subset of N = 28 cervical dystonia patients also had a holistic severity score that was available from the original clinical examinations. For subsequent analyses, we mainly focused on TWSTRS severity item assessing the time-weighted deviation of head posture from neutral straight ahead along three main rotational axes, namely pitch for antero-/retrocollis, yaw for torticollis and tilt for laterocollis. The original TWSTRS contains further items, which, however, failed to meet criteria for utility in subsequent investigations9. Each item is scored on an ordinal scale from 0–3 (laterocollis, anterocollis, retrocollis) or 0–4 (torticollis, head tremor), corresponding to increasing angle deviations of the head from the midline or in case of tremor, its amplitude, duration and dominant direction. Assessors’ ratings were collapsed into one ‘mean score’ for subsequent model evaluations.

Computer vision models

We built a comprehensive framework for assessing dystonia phenotype and severity, enabling automated kinematic evaluation directly from video. Our approach involved combining the outputs of two convolutional neural networks: one tracking facial landmarks and the other one for extracting gestalt information, represented as movement states. From each video, utilizing the deep learning outputs, we derived static variables during periods when patients were instructed to allow their heads to drift to a neutral position (referred to as the null position). In addition, dynamic variables capturing the patients’ natural movement patterns were extracted using the entire duration of the TWSTRS video examination. We should note that these clinical examinations often didn’t follow the full TWSTRS protocol, nor did they necessarily follow the prescribed ordering of movements.

To achieve face and head tracking, we utilized a pre-trained model from MediaPipe32. We opted for MediaPipe due to its real-time applicability and compatibility with mobile devices, which holds potential for point-of-care applications (Model 1). The default tracking values of MediaPipe’s video mode (detection confidence: 0.5; tracking confidence: 0.5) were employed. Head angles were calculated relative to a neutral face-forward position along three axes of movement (torticollis, laterocollis, and antero-/retrocollis) using the face mask. We employed the orthogonal Procrustes technique to compute the rotation necessary to minimize the discrepancy between the rotated 3D face mask and a face-forward face mask, thus obtaining accurate head angles58.

To track gestalt patterns throughout the videotaped examinations, which lacked a fixed protocol and order, we aimed to classify the head movement states on a frame-by-frame basis along the three principal axes. For this purpose, we developed a custom model trained on videos of healthy controls and a subset of cervical dystonia patients (Model 2). We fine-tuned a pre-trained resnet50 convolutional neural network model in PyTorch for 30 epochs to achieve loss convergence. We took a subset of 23 cervical dystonia patients and combined them with the 22 healthy controls, giving us a total of 45 participants to train and validate the model. We split the 45 participants into two groups: 30 participants for training (15 cervical dystonia and 15 healthy controls) and 15 participants for test (8 cervical dystonia and 7 healthy controls)—see Table 1. Participants were exclusively assigned to either the training set or test set with no overlap. Video frames were then randomly sampled from the patients within the training and test sets separately. The training set comprised a total of 4600 video frames taken from the 15 cervical dystonia patients (1437 frames) and 15 healthy controls (3163 frames). The testing set comprised 1433 video frames taken from the 8 cervical dystonia (288 frames) and 7 healthy controls (1145 frames). Note that the 8 cervical dystonia patients used to train the custom model were not included in the post hoc statistical analyses of kinematic features that were derived from the custom CNN (e.g., movement correlations with face-forward). Movement states (e.g., ‘face forward’ or ‘tilt left’) were labelled by a junior movement disorders expert (MF) in the patients, while the healthy controls followed a set protocol of head movements (Supplementary Fig. 10). The custom model demonstrated training and test accuracies of 83.8% and 84.6%, respectively. We employed multilabel classification with a binary cross-entropy loss function during model training.

Feature engineering

Using the outputs of the two convolutional neural network models, we engineered several kinematic features that capture the temporal evolution of patients’ head trajectories beyond simple angular deviations. These kinematic features aimed to quantify clinically relevant observations in dystonia that are commonly noted but seldom quantified in clinical settings, such as movement overflow to other bodyparts as well as action-induced changes of dystonia, both resulting in asymmetrical or abnormal movement trajectories, dystonic tremor14 and the complexity of dystonia characterised by the involvement of multiple axes in phasic or tonic movements and movement predictability over time. The features were partially harmonised with kinematic features recently reported to be relevant to dystonia phenotype and genetics in rodent models of dystonia15,16 as well as the characterisation of brain dynamics more broadly59,60.

Both the head-angle measurements derived from Mediapipe (Model 1) and the softmax outputs from the custom-trained CNN (Model 2) are continuous values assigned to each frame of the video. Each feature was engineered from either the head-angle measurements or the movement state predictions, respectively. The derived features primarily included correlations, symmetries, oscillatory and entropy-related characteristics, which are further described below.

During the time periods where patients attempted a neutral face-forward head position (annotated by clinical expert), we derived static head angle deviations from face-forward (derived from Mediapipe face mesh tracking). These features were considered ’static’ since the patients are not attempting to make any particular movements, such as looking to the side.

The primary frequency and amplitude of head-angle oscillations (derived from Mediapipe face mesh tracking) along each axis of motion were assessed using a Fourier spectrogram. To isolate the relevant oscillatory signals and remove intended head movements dictated by examination protocol, a bandpass filter with an order 6 Butterworth filter was applied, limiting the frequencies to the range of 2–10 Hz. This frequency range was chosen to remove unrelated movements including low-frequency camera movements or patient swaying, and high-frequency noise associated with the marker tracking. However, we recommend where possible to use a 1 Hz low-cut when these artifacts are not present. These features aim to capture phasic characteristics, such as dystonic jerks and tremors. It is expected that healthy controls will exhibit minimal or no head oscillations in these frequency ranges.

To investigate the interdependence between movement states, we calculated the Pearson correlation between the softmax outputs from the custom-trained CNN. For instance, a high correlation between the prediction probabilities of rotation left and tilt left would indicate that the movement vectors blend or exhibit a certain degree of overlap when the patient rotates left. This suggests that the movement states become ‘entangled’ or ‘intermixed’ during specific actions, as recently suggested in experimental studies15. Healthy controls are expected to show minimal correlations between movement states and head angles, indicating precise and distinct control of head movements. These features aim to capture phenomena such as overflow and complexity, as well as abnormal movement trajectories.

Each axis of head motion can exhibit movement in opposite directions from a neutral face-forward position, e.g., rotation left and rotation right. To quantify the symmetry of each motion axis, we calculated the proportion of time the head was oriented in one direction compared to the opposite direction using the movement state predictions from our custom-trained CNN. For instance, if a patient spent 7 seconds in rotation left and only 3 seconds in rotation right, the symmetry value would be calculated as (7 − 3)/10 = 4/10 = 0.4. Values closer to zero indicate a stronger symmetry, while large positive or negative values indicate a significant asymmetry. These features aim to capture fixed, tonically abnormal head deviations and asymmetrical movement trajectories. Healthy controls are expected to demonstrate a high degree of symmetry in their head movements.

Entropy measures provide a quantitative way to assess the irregularity or complexity of time series data, making them well-suited for capturing the intricate and nonlinear dynamics often observed in dystonic movements59,60. Abnormal movements in dystonia often exhibit both short-term irregularities (e.g., tremor) and long-term temporal patterns (e.g., sustained postures) that are not easily captured by traditional measures. MSE quantifies the complexity and regularity of dystonic movements at different temporal scales. By applying MSE to the time series of head angles (derived from Mediapipe face mesh), a scale-dependent measure of complexity can be obtained, potentially revealing specific temporal patterns or fluctuations associated with disease states or treatment effects. We hypothesize that DBS will increase the regularity and predictability of their movements, indicative of improved motor control.

Harmonics have been previously identified as reliable markers in differential diagnosis of tremors46,61. The strength of the fundamental tremor frequency and its first harmonic (double the fundamental frequency) were calculated using the distance correlation between their instantaneous phases of the head angles derived from Mediapipe face-mesh tracking. The harmonic strengths were determined using the head angles for each axis of motion, respectively. Distance correlations were calculated in Python with the dcor package (version 0.6).

Evaluation of the visual perceptive framework

Performance in evaluating predominant direction and severity of dystonic head deviation was measured by Pearson correlation between the clinically annotated TWSTRS-2 and the head angle excursion along each axis of motion, respectively. Performance in evaluating the tremor component of patients was measured by Pearson correlation between the clinically annotated tremor score and the head angle oscillation amplitude along each axis of motion, respectively.

Statistical analyses

Univariate variable analysis was performed to discover kinematic features that differed (i) between stimulation conditions in cervical dystonia, and between cervical and generalised dystonia. To establish significance, we used either Wilcoxon (when paired between pre- and postoperatively) or Mann–Whitney U-tests, and report p-values adjusted for multiple comparisons (Benjamini–Hochberg false discovery rate correction, FDR). Effect sizes were computed using rank-biserial correlation. To aid interpretation, we ranked variables by their effect sizes. Statistical analyses were done in Python with the statmodels package (0.15.0). Correlation analysis was performed to identify relationships between head angle excursions and annotated scores. Pearson correlations were calculated in Python with the scipy package (1.4.1).