Marco Antoniotti
  • Department of Informatics, Systems and Communication
    Università degli Studi di Milano Bicocca
    U14
    Viale Sarca 336
    I-20126 Milan (MI), ITALY
The primary goal of the NYU educational robot project is to create a disseminable, multi-functional and inexpensive laboratory course sequence, aimed at improving the practical skills of undergraduate students specializing in robotics, vision, AI and manufacturing disciplines. The main work-horse of the NYU educational project was chosen to be a multifunctional ED I robot system, consisting of a 4 DOF DD arm and several auxiliary devices. The system was designed to be simple, inexpensive, flexible and safe. In this report, we describe the history, design, structure and evaluation of this robot system. We also describe several robotics and related course sequences that can use the ED I system effectively, and we provide some example experiments that have been run on ED I successfully. This report has benefited from the labor, contributions, discussions, advice and criticisms of several people on the ED I project team, and the credit for the final product goes to the entire team. ED I Project Team: M. Antoniotti, A-B. Cen, R. Even, I. Greenfeld, L. Gurvits, F. Hansen, A. Rajkumar, C. Li, J. Li, Z-X. Li, B. Mishra, S. Mallat, E. Pavlakos, J. Schwartz, N. Silver and R. Wallace. Supported by NSF Grant #CDA-9018673. Address: Robotics Research Laboratory, Courant Institute of Mathematical Sciences, New York University, 715/719 Broadway, 12th Floor, New York, NY 10003. ED I: NYU Educational Robot Design and Evaluation. Bud Mishra & Marco Antoniotti, with contributions from F. Hansen, N. Silver, R. Wallace and R. Even. Robotics & Manufacturing Research Lab, Courant Institute, New York University.
Abstract: Describes the authors' experience with a prototype system capable of synthesizing supervisor controller programs based largely on the theory of discrete event systems (DES) first proposed by Ramadge and Wonham (1987). The authors augment the theory by also allowing continuous time trajectories modeling transitions between events. The authors illustrate their approach by an example, the discrete control of a walking machine, which poses some challenges to the applicability of the theory and, finally, discuss some …
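To make the synthesis step concrete, here is a minimal Python sketch of the classical Ramadge-Wonham construction that the paper builds on (not the authors' prototype): the plant and the specification are composed via synchronous product, and product states from which an uncontrollable event escapes the legal behavior are iteratively pruned. The toy machine, the event names and the dict-based encoding are illustrative assumptions.

```python
# A minimal sketch of Ramadge-Wonham supervisor synthesis on finite automata.
# Automata are dicts: state -> {event: next_state}. Events listed as
# uncontrollable cannot be disabled by the supervisor. The toy plant and
# specification below are illustrative, not taken from the paper.

def product(plant, spec, p0, s0):
    """Synchronous product of plant and specification automata."""
    trans, frontier, seen = {}, [(p0, s0)], {(p0, s0)}
    while frontier:
        p, s = frontier.pop()
        trans[(p, s)] = {}
        for ev, p2 in plant[p].items():
            if ev in spec[s]:                    # the spec must also allow ev
                s2 = spec[s][ev]
                trans[(p, s)][ev] = (p2, s2)
                if (p2, s2) not in seen:
                    seen.add((p2, s2))
                    frontier.append((p2, s2))
    return trans

def synthesize(plant, trans, uncontrollable):
    """Prune product states where the plant can fire an uncontrollable
    event that the candidate supervisor would have to block."""
    states = set(trans)
    changed = True
    while changed:
        changed = False
        for (p, s) in list(states):
            for ev in plant[p]:
                if ev in uncontrollable and trans[(p, s)].get(ev) not in states:
                    states.discard((p, s))
                    changed = True
                    break
    return {q: {e: t for e, t in trans[q].items() if t in states}
            for q in states}

# Toy example: a machine that may 'break' (uncontrollable) while busy;
# the specification requires that 'break' never occurs.
PLANT = {'idle': {'start': 'busy'},
         'busy': {'finish': 'idle', 'break': 'down'},
         'down': {'repair': 'idle'}}
SPEC = {'ok': {'start': 'ok', 'finish': 'ok', 'repair': 'ok'}}
sup = synthesize(PLANT, product(PLANT, SPEC, 'idle', 'ok'), {'break'})
print(sup)
```

On this input the result is the maximally permissive safe supervisor: since 'break' cannot be disabled once the machine is busy, the only way to guarantee the specification is to disable 'start' in the initial state.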
Background: The increasing availability of omics data collected from patients affected by severe pathologies, such as cancer, is fostering the development of data science methods for their analysis. Introduction: The combination of data integration and machine learning approaches can provide new powerful instruments to tackle the complexity of cancer development and deliver effective diagnostic and prognostic strategies. Methods: We explore the possibility of exploiting the topological properties of sample-specific metabolic networks as features in a supervised classification task. Such networks are obtained by projecting transcriptomic data from RNA-seq experiments on genome-wide metabolic models to define weighted networks modeling the overall metabolic activity of a given sample. Results: We show the classification results on a labeled breast cancer dataset from the TCGA database, including 210 samples (cancer vs. normal). In particular, we investigate how the performance is affected by a threshold-based pruning of the networks by comparing Artificial Neural Networks, Support Vector Machines and Random Forests. Interestingly, the best classification performance is achieved within a small threshold range for all methods, suggesting that it might represent an effective choice to recover useful information while filtering out noise from data. Overall, the best accuracy is achieved with SVMs, which exhibit performances similar to those obtained when gene expression profiles are used as features. Conclusion: These findings demonstrate that the topological properties of sample-specific metabolic networks are effective in classifying cancer and normal samples, suggesting that useful information can be extracted from a relatively limited number of features.
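A minimal end-to-end sketch of this kind of pipeline, on synthetic data, is given below: per-sample weighted networks are pruned at a fixed threshold, node-level topological features (weighted degree and clustering coefficient) are extracted, and an SVM is evaluated by cross-validation. The feature set, the threshold value and the random data are assumptions for illustration; the paper works on TCGA-derived metabolic networks.

```python
# A minimal sketch, with synthetic data, of threshold-pruned network
# classification. Feature choices and the threshold are illustrative.
import numpy as np
import networkx as nx
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
N_NODES, N_SAMPLES = 30, 60

def sample_network(shift):
    """Synthetic stand-in for a sample-specific metabolic network."""
    w = rng.random((N_NODES, N_NODES)) + shift
    w = (w + w.T) / 2
    np.fill_diagonal(w, 0.0)            # no self-loops
    return w

def topo_features(w, threshold):
    """Prune edges below `threshold`, then summarize node-level topology."""
    g = nx.from_numpy_array(np.where(w >= threshold, w, 0.0))
    deg = [d for _, d in g.degree(weight='weight')]
    clust = list(nx.clustering(g, weight='weight').values())
    return np.concatenate([deg, clust])

# Two classes ('normal' vs 'cancer') differing in average edge weight.
labels = np.array((0, 1) * (N_SAMPLES // 2))
X = np.array([topo_features(sample_network(0.0 if y == 0 else 0.15), 0.8)
              for y in labels])
print(cross_val_score(SVC(), X, labels, cv=5).mean())
```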
ABSTRACT: Biology thrives on complexity, and yet our approaches to deciphering complex biological systems have been simple, observational, reductionist, and qualitative. The observational nature of biology may even seem self-evident, as expressed more than three centuries ago by Robert Hooke, whose work Micrographia of 1665 contained his microscopical investigations that included the first identification of biological cells: “The truth is, the science of Nature has already been too long made only a work of the brain and the fancy. It is now high time that it should return to the plainness and soundness of observations on material and obvious things.” As we begin to observe, infer, and list the fundamental “parts” out of which biology is created, we cannot stop marveling at how these same components and their variants and homologues interconnect, intertwine, and interact via universal principles that still remain to be fully deciphered. To unravel this biological complexity, of which we only have a hint so far, it has become necessary to develop novel tools and approaches that augment and rigorously formalize those human reasoning processes—tools that until now could be used for only tiny toy-like subsystems in biology. To this end, the anticipated computational systems biology tools aim to draw upon constructive mathematical approaches developed in the context of dynamical systems, kinetic analysis, computational theory, and logic. The resulting toolkit aspires to build powerful simulation, analysis, and reasoning facilities that can be used by working biologists for multiple purposes: in making sense of existing data, in devising new experiments, and ultimately in understanding functional properties of genomes, proteomes, cells, organs, and organisms. If this ambitious program is to ultimately succeed, there are certain critical components that require special attention of computer scientists and applied mathematicians. This chapter studies the nature of these components, software architecture for integrating them, and illustrative examples of how such an integrated system may function in practice.
Abstract: We address the problem of the synthesis of controller programs for a variety of robotics and manufacturing tasks. The problem we choose for test and illustrative purposes is the standard “walking machine problem”, a representative instance of a real hybrid problem with both logical/discrete and continuous properties and strong mutual influence without any reasonable separation. We aim to produce a “compiler technology” for this class of problems in a manner analogous to the development of the so-called “silicon compilers” …
A current challenge in cancer research is the development of therapeutic strategies aimed at reducing the toxicity of treatments, since Adverse Events (AEs) typically cause substantial problems and long-term damage to the patients. A possible solution to this issue lies in the personalization of therapy dosages according to demographic factors and in the employment of optimized data-driven drug administration protocols. Control theory can be exploited to this end, as its application in pharmacology allows one to define optimized dosages and schedules, aimed at minimizing AEs and maximizing therapy efficacy. However, an effective application of control theory approaches to this issue is constrained by our ability to infer the parameters of the mathematical models from currently available data. We here present a closed-loop optimization framework of patient-specific pharmacokinetics (PK) and pharmacodynamics (PD) models, combined with a mathematical model of a liquid tumor, which aims at overcoming such limitations. The most relevant feature of our framework is the ability to learn the value of patient-specific parameters via a Bayesian update, by exploiting a feedback signal obtained by monitoring the tumor burden dynamics of the patient. Our framework employs CasADi, an open-source tool for nonlinear optimization, and guarantees a good and robust numerical estimation of the optimized schedule and a parsimonious use of computational time. As a case study, we present the application of our framework to Tyrosine Kinase Inhibitor administration in Chronic Myeloid Leukemia (CML), in which we show that our optimized protocols result in a faster decay of CSCs and in a reduction of the overall toxicity.
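The feedback idea at the heart of the framework can be illustrated with a deliberately tiny model: a single patient-specific parameter (a tumor kill rate) is learned from noisy tumor-burden measurements via a grid-based Bayesian update, after which the refined estimate would drive the dose optimizer. The one-parameter exponential model and all numbers below are illustrative assumptions; the actual framework couples full PK/PD and tumor models through CasADi.

```python
# A minimal sketch of the Bayesian feedback loop: a patient-specific kill
# rate is learned from noisy tumor-burden measurements via a grid posterior.
import numpy as np

rng = np.random.default_rng(1)
TRUE_K, SIGMA = 0.35, 0.1              # latent patient parameter, noise sd
grid = np.linspace(0.05, 1.0, 200)     # candidate kill rates
posterior = np.ones_like(grid) / grid.size   # flat prior

def burden(k, t, b0=1.0):
    """Toy tumor burden under constant dosing: exponential decay."""
    return b0 * np.exp(-k * t)

for t in [1, 2, 3, 4, 5]:              # monitoring times (e.g., months)
    obs = burden(TRUE_K, t) + rng.normal(0, SIGMA)
    lik = np.exp(-0.5 * ((obs - burden(grid, t)) / SIGMA) ** 2)
    posterior = posterior * lik        # Bayes update on the grid
    posterior /= posterior.sum()
    k_hat = grid[posterior.argmax()]
    print(f"t={t}: MAP kill rate = {k_hat:.3f}")
# k_hat would then feed the dose-optimization step (e.g., via CasADi).
```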
By leveraging the ever-increasing availability of cancer omics data and the continuous advances in cancer data science and machine learning, we have discovered the existence of cancer type-specific evolutionary signatures associated with different disease outcomes. These signatures represent “favored trajectories” of acquisition of driver mutations that are repeatedly detected in patients with similar prognosis. In this work, we present a novel framework named ASCETIC (Agony-baSed Cancer EvoluTion InferenCe) that extracts such signatures from NGS experiments generated by different technologies such as bulk and single-cell sequencing data. In our study, we applied ASCETIC to (i) single-cell sequencing data from 146 patients with distinct myeloid malignancies and bulk whole-exome sequencing data from 366 acute myeloid leukemia patients, (ii) multi-region sequencing data from 100 early-stage lung cancer patients from the TRACERx project, (iii) whole-exome/genome sequencing data from more than …
Background: Longitudinal single-cell sequencing experiments of patient-derived models are increasingly employed to investigate cancer evolution. In this context, robust computational methods are needed to properly exploit the mutational profiles of single cells generated via variant calling, in order to reconstruct the evolutionary history of a tumor and characterize the impact of therapeutic strategies, such as the administration of drugs. To this end, we have recently developed the LACE framework for the Longitudinal Analysis of Cancer Evolution. Results: The LACE 2.0 release, aimed at inferring longitudinal clonal trees, enhances the original framework with new key functionalities: improved data management for preprocessing of standard variant calling data, a reworked inference engine, and direct connection to public databases. Conclusions: All of this is accessible through a new and interactive Shiny R graphical interface offering the possibility to apply filters helpful in discriminating …
Mass Spectrometry (MS)-based technologies represent a promising area of research in clinical analysis. They are primarily concerned with measuring the relative intensity (abundance) of many protein/peptide molecules associated with their mass-to-charge ratios over a particular range of molecular masses. These measurements (generally referred to as proteomic signals or spectra) constitute a huge amount of information which requires adequate tools to be investigated and interpreted. Following the methodology for testing hypotheses, we investigate the proteomic signals of the most common type of Renal Cell Carcinoma, the Clear Cell variant (ccRCC). Specifically, the aim of our investigation is to detect changes of the signal correlations from control to case group (ccRCC or non-ccRCC). To this end, we sample and represent each population group through a graph providing, as it will be defined below, the observed signal correlation structure. This way, graphs establish abstract frames of …
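The following sketch illustrates the general construction on synthetic spectra: intensities are correlated across samples within each group, correlations are thresholded into a graph over m/z bins, and the two edge sets are compared. The threshold and the data are illustrative assumptions, not the paper's estimator.

```python
# A minimal sketch of comparing signal-correlation structure between two
# groups: correlation matrices are thresholded into graphs over m/z bins.
import numpy as np

rng = np.random.default_rng(2)
N_SPECTRA, N_BINS = 40, 100

def corr_graph(spectra, threshold=0.7):
    """Edges between m/z bins whose intensities correlate across samples."""
    c = np.corrcoef(spectra.T)
    return set(zip(*np.where(np.triu(np.abs(c) >= threshold, k=1))))

control = rng.random((N_SPECTRA, N_BINS))                 # synthetic spectra
case = control + 0.5 * rng.random((N_SPECTRA, N_BINS))    # perturbed group
e_control, e_case = corr_graph(control), corr_graph(case)
print("edges only in case group:", len(e_case - e_control))
print("edges only in control group:", len(e_control - e_case))
```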
Combined analysis of chromosomal instabilities and gene expression for colon cancer progression inference
Analysis of the spatial and dynamical properties of a multiscale model of intestinal crypts
… technologies provide new opportunities at the interface between basic biological research and medical practice. The unprecedented completeness, accuracy, and volume of genomic and molecular data necessitate a new kind of computational biology for translational research. Key challenges are standardization of data capture and communication, organization of easily accessible repositories, and algorithms for integrated analysis based on heterogeneous sources of information. Also required are new ways of using complementary clinical and biological data, such as computational methods for predicting disease phenotype from molecular and genetic profiling. New combined experimental and computational methods hold the promise of more accurate diagnosis and prognosis as well as more effective prevention and therapy.
Svetlana Cojocaru (in honour of her 60th anniversary)
Gheorghe Păun, Mario J. Pérez-Jiménez: Languages and P Systems: Recent Developments
Artiom Alhazov, Marco Antoniotti, Rudolf Freund, Alberto Leporati, Giancarlo Mauri: Self-Stabilization in Membrane Systems
Ana-Maria Suduc, Mihai Bîzoi, Florin Gheorghe Filip: Usability in Scientific Databases
Horia-Nicolai L. Teodorescu, Mariana Rusu …
We outline the features of the R package SparseSignatures and its application to determine the signatures contributing to mutation profiles of tumor samples. We describe installation details and illustrate a step-by-step approach to (1) prepare the data for signature analysis, (2) determine the optimal parameters, and (3) employ them to determine the signatures and related exposure levels in the point mutation dataset. For complete details on the use and execution of this protocol, please refer to Lal et al. (2021).
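For readers who want to see the underlying computation rather than the R package interface, the sketch below runs the core technique (non-negative matrix factorization of a samples-by-96-channels mutation count matrix) on synthetic data in Python. The number of signatures and all matrix shapes are illustrative assumptions; SparseSignatures itself adds sparsity constraints, a background signature, and cross-validation for parameter selection.

```python
# A minimal sketch of de novo mutational-signature extraction by NMF,
# not the SparseSignatures R API; data and parameters are illustrative.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(3)
# Synthetic data: 50 tumors x 96 trinucleotide mutation channels,
# generated from 4 hidden signatures plus Poisson noise.
true_sigs = rng.dirichlet(np.ones(96) * 0.1, size=4)       # 4 x 96
exposures = rng.gamma(2.0, 50.0, size=(50, 4))             # 50 x 4
counts = rng.poisson(exposures @ true_sigs)                # 50 x 96

model = NMF(n_components=4, init='nndsvda', max_iter=1000, random_state=0)
W = model.fit_transform(counts)          # per-sample exposures
H = model.components_                    # signatures (rows)
H = H / H.sum(axis=1, keepdims=True)     # normalize each signature to sum 1
print("reconstruction error:", round(model.reconstruction_err_, 2))
print("first signature, top channel:", H[0].argmax())
```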
This document presents a new set of portable type specifiers that can be used to improve the "precision" of type declarations in Common Lisp numerical code.
Many large national and transnational studies have been dedicated to the analysis of the SARS-CoV-2 genome, most of which focused on missense and nonsense mutations. However, approximately 30% of the SARS-CoV-2 variants are synonymous, therefore changing the target codon without affecting the corresponding protein sequence. By performing a large-scale analysis of sequencing data generated from almost 400,000 SARS-CoV-2 samples, we show that silent mutations increasing the similarity of viral codons to the human ones tend to fixate in the viral genome over time. This indicates that SARS-CoV-2 codon usage is adapting to the human host, likely improving its effectiveness in using the human aminoacyl-tRNA set through the accumulation of deceitfully neutral silent mutations. One-Sentence Summary: Synonymous SARS-CoV-2 mutations related to the activity of different mutational processes may positively impact viral evolution by increasing its adaptation to human codon usage.
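The scoring idea can be reduced to a few lines: a synonymous substitution counts as "human-adapted" when it replaces a codon with a synonymous one that is more frequent in human coding sequences. The miniature usage table below (two amino acids only, approximate human frequencies) is an illustrative assumption; the study uses complete codon-usage tables over all viral ORFs.

```python
# A minimal sketch of classifying a synonymous mutation as moving toward
# or away from preferred human codon usage. Tiny illustrative tables only.
HUMAN_USAGE = {  # relative synonymous codon usage (approximate values)
    'CTG': 0.40, 'TTA': 0.07,   # leucine codons
    'GCC': 0.40, 'GCG': 0.11,   # alanine codons
}
AMINO_ACID = {'CTG': 'L', 'TTA': 'L', 'GCC': 'A', 'GCG': 'A'}

def human_adapted(ref_codon, alt_codon):
    """True if the mutation is synonymous and increases human-likeness."""
    if AMINO_ACID.get(ref_codon) != AMINO_ACID.get(alt_codon):
        return False                      # not synonymous: out of scope
    return HUMAN_USAGE[alt_codon] > HUMAN_USAGE[ref_codon]

print(human_adapted('TTA', 'CTG'))   # True: moves toward the preferred codon
print(human_adapted('GCC', 'GCG'))   # False: moves away from it
```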
In the definition of fruitful strategies to counter the worldwide diffusion of SARS-CoV-2, maximum efforts must be devoted to the early detection of dangerous variants. An effective help to this end is granted by the analysis of deep sequencing data of viral samples, which are typically discarded after the creation of consensus sequences. Indeed, only with deep sequencing data is it possible to identify intra-host low-frequency mutations, which are a direct footprint of mutational processes that may eventually lead to the origination of functionally advantageous variants. Accordingly, a timely and statistically robust identification of such mutations might inform political decision-making with significant anticipation with respect to standard analyses based on consensus sequences. To support our claim, we here present the largest study to date of SARS-CoV-2 deep sequencing data, which involves 220,788 high quality samples, collected over 20 months from 137 distinct studies. …
The rise of longitudinal single-cell sequencing experiments on patient-derived cell cultures, xenografts and organoids is opening new opportunities to track cancer evolution, assess the efficacy of therapies and identify resistant subclones. We introduce LACE, the first algorithmic framework that processes single-cell mutational profiles from samples collected at different time points to reconstruct longitudinal models of cancer evolution. The approach maximizes a weighted likelihood function computed on longitudinal data points to solve a Boolean matrix factorization problem, via Markov chain Monte Carlo sampling. On simulations, LACE outperforms state-of-the-art methods for both bulk and single-cell sequencing data with respect to the reconstruction of the ground-truth clonal phylogeny and dynamics, also in conditions of unbalanced datasets, significant rates of sequencing errors and sampling limitations. As the results are robust with respect to data-specific errors, LACE is effective with mutational profiles generated by calling variants from (full-length) scRNA-seq data, and this allows one to investigate the relation between genomic and phenotypic evolution of tumors at the single-cell level. Here, we apply LACE to a longitudinal scRNA-seq dataset of patient-derived xenografts of BRAF V600E/K mutant melanomas, dissecting the impact of BRAF/MEK-inhibition on clonal evolution, also in terms of clone-specific gene expression dynamics. Furthermore, the analysis of breast cancer PDXs from longitudinal targeted scDNA-sequencing experiments delivers a high-resolution temporal characterization of intra-tumor heterogeneity.
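The core of the objective can be sketched compactly: each cell's noisy mutation profile is scored against candidate clonal genotypes under a false-positive rate alpha and a false-negative rate beta, and per-time-point weights enter the summed log-likelihood. The error rates, weights and tiny matrices below are illustrative assumptions, and the sketch omits the tree structure and the MCMC search over it.

```python
# A minimal sketch of the weighted likelihood at the core of this class of
# methods: noisy single-cell profiles scored against clonal genotypes.
import numpy as np

ALPHA, BETA = 0.01, 0.2        # FP / FN rates of single-cell variant calls

def cell_loglik(observed, genotype):
    """Log P(observed cell profile | true clonal genotype)."""
    p = np.where(genotype == 1,
                 np.where(observed == 1, 1 - BETA, BETA),    # real mutation
                 np.where(observed == 1, ALPHA, 1 - ALPHA))  # absent mutation
    return np.log(p).sum()

def weighted_loglik(cells, weights, clone_genotypes):
    """Each cell attaches to its best clone; time points weight the sum."""
    total = 0.0
    for obs, w in zip(cells, weights):
        total += w * max(cell_loglik(obs, g) for g in clone_genotypes)
    return total

cells = np.array([[1, 1, 0], [1, 0, 0], [1, 1, 1]])   # 3 cells x 3 mutations
weights = [1.0, 1.0, 0.5]                              # e.g., per time point
clones = [np.array([1, 1, 0]), np.array([1, 1, 1])]
print(weighted_loglik(cells, weights, clones))
```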
It is well known that tumors originating from the same tissue have different prognoses and sensitivity to treatments. Over the last decade, cancer genomics consortia like the Cancer Genome Atlas (TCGA) have been generating thousands of cross-sectional datasets for thousands of human primary tumors originating from various tissues. Thanks to such public databases, it is today possible to analyze a broad range of relevant information, such as gene sequences, expression profiles or metabolite footprints, to capture tumor molecular heterogeneity and improve patient stratification and clinical management. To this aim, it is common practice to analyze datasets grouped into clusters based on clinical observations and/or molecular features. However, the identification of specific properties of each cluster that may be effectively targeted by therapeutic drugs still represents a challenging task. We define a method to generate an activity score for the metabolic reactions of different clusters of …
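A minimal sketch of the score construction follows: gene expression is mapped to per-reaction activities through gene-protein-reaction (GPR) rules, with the common convention of min for AND (enzyme complexes) and sum for OR (isoenzymes); cluster-level scores would then aggregate these values over the samples of each cluster. The two-reaction model and the expression values are illustrative assumptions.

```python
# A minimal sketch of per-reaction activity scores from gene expression,
# using the common convention AND -> min, OR -> sum. Illustrative model.
expression = {'geneA': 5.0, 'geneB': 1.0, 'geneC': 3.0}

# GPR rules as nested tuples: ('and', ...) complexes, ('or', ...) isoforms.
gpr = {
    'reaction_1': ('and', 'geneA', 'geneB'),   # complex: limited by geneB
    'reaction_2': ('or', 'geneB', 'geneC'),    # isoenzymes: contributions add
}

def activity(rule):
    """Recursively evaluate a GPR rule into an activity score."""
    if isinstance(rule, str):
        return expression[rule]
    op, *args = rule
    vals = [activity(a) for a in args]
    return min(vals) if op == 'and' else sum(vals)

for rxn, rule in gpr.items():
    print(rxn, activity(rule))
```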
Abstract: In this paper we describe an application of a prototype system that combines synthesis and verification techniques, capable of building discrete controller software for a variety of robotics and manufacturing tasks. We developed and used the CONTROL-D tool to specify the requirements of a real life example: a tray pack line built for the Combat Ration Advanced Manufacturing Technology Demonstration of Rutgers University.

A key task of genomic surveillance of infectious viral diseases lies in the early detection of dangerous variants. Unexpected help to this end is provided by the analysis of deep sequencing data of viral samples, which are typically discarded after creating consensus sequences. Such analysis allows one to detect intra-host low-frequency mutations, which are a footprint of mutational processes underlying the origination of new variants. Their timely identification may improve public-health decision-making with respect to traditional approaches exploiting consensus sequences. We present the analysis of 220,788 high-quality deep sequencing SARS-CoV-2 samples, showing that many spike and nucleocapsid mutations of interest associated with the most widely circulating variants, including Beta, Delta, and Omicron, might have been intercepted several months in advance. Furthermore, we show that a refined genomic surveillance system leveraging deep sequencing data might allow one to pinpoint emerging mutation patterns, providing automated data-driven support to virologists and epidemiologists.
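A minimal sketch of the detection step: from pileup-style read counts, a site is reported as an intra-host low-frequency mutation when its alternate-allele count is statistically incompatible with the sequencing error rate yet still below consensus frequency. The error rate, thresholds and counts are illustrative assumptions, not the study's exact pipeline.

```python
# A minimal sketch of calling intra-host low-frequency mutations from read
# counts with a one-sided binomial test against the error model.
from scipy.stats import binomtest

ERR = 0.001   # assumed per-base sequencing error rate

def low_freq_variant(alt_reads, depth, max_vaf=0.5, alpha=1e-6):
    """Significant against the error model, but below consensus frequency."""
    vaf = alt_reads / depth
    p = binomtest(alt_reads, depth, ERR, alternative='greater').pvalue
    return p < alpha and vaf < max_vaf

# (position, alt reads, total depth) -- illustrative pileup summaries
for pos, alt, dp in [(23403, 40, 8000), (11083, 3, 8000), (28881, 5000, 8000)]:
    print(pos, low_freq_variant(alt, dp), f"VAF={alt/dp:.4f}")
```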
Matters Arising from: Sharma, A., Cao, E.Y., Kumar, V. et al. Longitudinal single-cell RNA sequencing of patient-derived primary cells reveals drug-induced infidelity in stem cell hierarchy. Nat Commun 9, 4931 (2018). https://doi.org/10.1038/s41467-018-07261-3. In Sharma, A. et al. Nat Commun 9, 4931 (2018) the authors employ longitudinal single-cell transcriptomic data from patient-derived primary and metastatic oral squamous cell carcinoma cell lines to investigate possible divergent modes of chemo-resistance in tumor cell subpopulations. We integrated the analyses presented in the manuscript by performing variant calling from scRNA-seq data via GATK Best Practices. As a main result, we show that an extremely high number of single-nucleotide variants representative of the identity of a specific patient is unexpectedly found in the scRNA-seq data of the cell line derived from a second patient, and vice versa. This finding likely suggests the existence of a sample swap, thus jeopardizing some of the translational conclusions of the article. Our results prove the efficacy of a joint analysis of the genotypic and transcriptomic identity of single cells.
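The check itself reduces to simple set algebra once patient-specific variants are available, as in this illustrative sketch: for each cell line, count what fraction of each patient's private SNVs appears among the line's scRNA-seq variant calls; a line matching the "wrong" patient flags a swap. The variant sets below are synthetic stand-ins for real VCF-derived calls.

```python
# A minimal sketch of a sample-swap check via patient-specific SNV overlap.
# All variant sets are illustrative stand-ins for real variant calls.
patient_snvs = {
    'patient_A': {('chr1', 10432, 'A>G'), ('chr2', 88211, 'C>T')},
    'patient_B': {('chr3', 5012, 'G>A'), ('chr7', 140453, 'T>C')},
}
cell_line_calls = {
    'line_from_A': {('chr3', 5012, 'G>A'), ('chr7', 140453, 'T>C')},  # swapped?
    'line_from_B': {('chr1', 10432, 'A>G'), ('chr2', 88211, 'C>T')},
}
for line, calls in cell_line_calls.items():
    for patient, snvs in patient_snvs.items():
        frac = len(calls & snvs) / len(snvs)   # fraction of private SNVs found
        print(f"{line} matches {patient}: {frac:.0%}")
```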
We introduce VERSO, a two-step framework for the characterization of viral evolution from sequencing data of viral genomes, which improves over phylogenomic approaches for consensus sequences. VERSO exploits an efficient algorithmic strategy to return robust phylogenies from clonal variant profiles, also in conditions of sampling limitations. It then leverages variant frequency patterns to characterize the intra-host genomic diversity of samples, revealing undetected infection chains and pinpointing variants likely involved in homoplasies. On simulations, VERSO outperforms state-of-the-art tools for phylogenetic inference. Notably, the application to 6726 Amplicon and RNA-seq samples refines the estimation of SARS-CoV-2 evolution, while co-occurrence patterns of minor variants unveil undetected infection paths, which are validated with contact tracing data. Finally, the analysis of the SARS-CoV-2 mutational landscape uncovers a temporal increase of overall genomic diversity, and highlights variants transiting from minor to clonal state and homoplastic variants, some of which fall on the spike gene. Available at: https://github.com/BIMIB-DISCo/VERSO.
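As a drastically simplified picture of the first step (a tree over samples from clonal variant profiles), the sketch below connects samples by a minimum spanning tree over Hamming distances between binary profiles. This illustrates the data structure involved, under stated simplifying assumptions; it is not VERSO's actual algorithm, and the profiles are synthetic.

```python
# A minimal sketch: a minimum spanning tree over binary clonal-variant
# profiles as a crude stand-in for phylogenetic reconstruction.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

# Rows: samples; columns: presence/absence of clonal variants (synthetic).
profiles = np.array([[0, 0, 0, 0],
                     [1, 0, 0, 0],
                     [1, 1, 0, 0],
                     [1, 0, 1, 0],
                     [1, 1, 0, 1]])
dist = squareform(pdist(profiles, metric='hamming'))
mst = minimum_spanning_tree(dist).toarray()
for i, j in zip(*np.nonzero(mst)):
    print(f"sample {i} -- sample {j} (distance {mst[i, j]:.2f})")
```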
The metabolic processes related to the synthesis of the molecules needed for a new round of cell division underlie the complex behaviour of cell populations in multi-cellular systems, such as tissues and organs, whereas their deregulation can lead to pathological states, such as cancer. Even within genetically homogeneous populations, complex dynamics, such as population oscillations or the emergence of specific metabolic and/or proliferative patterns, may arise, and this aspect is highly amplified in systems characterized by extreme heterogeneity. To investigate the conditions and mechanisms that link metabolic processes to cell population dynamics, we here employ a previously introduced multi-scale model of multi-cellular systems, named FBCA (Flux Balance Analysis with Cellular Automata), which couples biomass accumulation, simulated via Flux Balance Analysis of a metabolic network, with the simulation of population and spatial dynamics via Cellular Potts Models. In this work, we investigate the influence that different modes of nutrient diffusion within the system may have on the emerging behaviour of cell populations. In our model, metabolic communication among cells is allowed by letting secreted metabolites diffuse over the lattice, in addition to the diffusion of nutrients from given sources. The inclusion of diffusion processes in the model proved effective in characterizing plausible biological scenarios.
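The diffusion ingredient can be sketched independently of the rest of the model: an explicit finite-difference update spreads a nutrient from a constant source over a lattice, the kind of field that the cellular dynamics would then consume. Grid size, diffusion coefficient and the single central source are illustrative assumptions.

```python
# A minimal sketch of nutrient diffusion on a lattice with an explicit
# finite-difference update (periodic boundaries via np.roll).
import numpy as np

N, D, DT, STEPS = 50, 0.2, 1.0, 200   # lattice size, diffusion coeff., time
grid = np.zeros((N, N))

for _ in range(STEPS):
    grid[N // 2, N // 2] = 1.0         # constant nutrient source at center
    lap = (np.roll(grid, 1, 0) + np.roll(grid, -1, 0) +
           np.roll(grid, 1, 1) + np.roll(grid, -1, 1) - 4 * grid)
    grid += D * DT * lap               # explicit step (stable since D*DT < 0.25)

print("nutrient at distance 10 from source:",
      round(grid[N // 2, N // 2 + 10], 4))
```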
Existing techniques to reconstruct tree models of progression for accumulative processes, such as cancer, seek to estimate causation by combining correlation and a frequentist notion of temporal priority. In this paper, we define a novel theoretical framework called CAPRESE (CAncer PRogression Extraction with Single Edges) to reconstruct such models based on the notion of probabilistic causation defined by Suppes. We consider a general reconstruction setting complicated by the presence of noise in the data due to biological variation, as well as experimental or measurement errors. To improve tolerance to noise we define and use a shrinkage-like estimator. We prove the correctness of our algorithm by showing asymptotic convergence to the correct tree under mild constraints on the level of noise. Moreover, on synthetic data, we show that our approach outperforms the state-of-the-art, that it is efficient even with a relatively small number of samples and that its performance quickly converges to its asymptote as the number of samples increases. For real cancer datasets obtained with different technologies, we highlight biologically significant differences in the progressions inferred with respect to other competing techniques and we also show how to validate conjectured biological relations with progression models.
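The two Suppes conditions translate directly into code, as in this minimal sketch on synthetic binary data: an event's candidate parents must occur more frequently (temporal priority) and must raise its conditional probability (probability raising); the best-raising candidate is chosen as parent. The shrinkage estimator that gives CAPRESE its noise tolerance is omitted here, and the data are synthetic.

```python
# A minimal sketch of Suppes-style tree reconstruction from binary mutation
# data: temporal priority plus probability raising select each event's parent.
import numpy as np

rng = np.random.default_rng(4)
# Synthetic cohort: mutation a tends to precede b, which precedes c.
a = rng.random(500) < 0.6
b = a & (rng.random(500) < 0.7)
c = b & (rng.random(500) < 0.6)
data = np.column_stack([a, b, c]).astype(float)
names = ['a', 'b', 'c']

def cond(x, given):
    """P(x = 1 | given = 1)."""
    return x[given == 1].mean() if given.sum() else 0.0

for j in range(data.shape[1]):
    best, best_score = None, 0.0
    for i in range(data.shape[1]):
        if i == j or data[:, i].mean() <= data[:, j].mean():
            continue                      # temporal priority violated
        raising = cond(data[:, j], data[:, i]) - cond(data[:, j], 1 - data[:, i])
        if raising > best_score:          # probability raising as score
            best, best_score = i, raising
    print(f"parent of {names[j]}: {names[best] if best is not None else 'root'}")
```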
The Spatial Processes package enables an explicit definition of a spatial environment on top of the normal dynamic modeling SBML capabilities. The possibility of an explicit representation of spatial dynamics increases the representation power of SBML. In this work we used those new SBML features to define an extensive model of colonic crypts composed of the main cellular types (from stem cells to fully differentiated cells), alongside their spatial dynamics.
The emergence and development of cancer is a consequence of the accumulation over time of genomic mutations involving a specific set of genes, which provides the cancer clones with a functional selective advantage. In this work, we model the order of accumulation of such mutations during the progression, which eventually leads to the disease, by means of probabilistic graphical models, i.e., Bayesian Networks (BNs). We investigate how to perform the task of learning the structure of such BNs, according to experimental evidence, adopting a global optimization meta-heuristic. In particular, in this work we rely on Genetic Algorithms, and to strongly reduce the execution time of the inference—which can also involve multiple repetitions to collect statistically significant assessments of the data—we distribute the calculations using both multi-threading and a multi-node architecture. The results show that our approach is characterized by good accuracy and specificity; we also demonstrate its feasibility, thanks to an 84× reduction of the overall execution time with respect to a traditional sequential implementation.
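A compact sketch of the approach (minus the distribution over threads and nodes) is given below: individuals encode DAGs as upper-triangular adjacency matrices over a fixed variable order, which sidesteps acyclicity checks, and fitness is a BIC-style decomposable score on binary data. The encoding, the mutation-only evolutionary loop and the synthetic chain data are simplifying assumptions relative to the paper.

```python
# A minimal sketch of BN structure learning with a genetic algorithm.
import numpy as np
from itertools import product

rng = np.random.default_rng(5)
n_vars, n_obs = 4, 400
# Synthetic data with a chain dependency 0 -> 1 -> 2 (variable 3 independent).
d0 = rng.random(n_obs) < 0.5
d1 = np.where(d0, rng.random(n_obs) < 0.8, rng.random(n_obs) < 0.2)
d2 = np.where(d1, rng.random(n_obs) < 0.8, rng.random(n_obs) < 0.2)
d3 = rng.random(n_obs) < 0.5
data = np.column_stack([d0, d1, d2, d3]).astype(int)

def bic(adj):
    """Decomposable log-likelihood with a BIC-style complexity penalty."""
    score = 0.0
    for j in range(n_vars):
        parents = np.where(adj[:, j])[0]
        for pa_vals in product([0, 1], repeat=len(parents)):
            mask = np.all(data[:, parents] == pa_vals, axis=1)
            n = mask.sum()
            if n == 0:
                continue
            k = data[mask, j].sum()
            for c in (k, n - k):           # log-likelihood of the CPT cell
                if c > 0:
                    score += c * np.log(c / n)
        score -= 0.5 * np.log(n_obs) * (2 ** len(parents))   # penalty
    return score

def random_dag():
    """Upper-triangular adjacency over a fixed order: acyclic by construction."""
    return np.triu(rng.random((n_vars, n_vars)) < 0.3, k=1).astype(int)

def mutate(adj):
    child = adj.copy()
    i, j = sorted(rng.choice(n_vars, size=2, replace=False))
    child[i, j] ^= 1                       # flip one (still acyclic) edge
    return child

pop = [random_dag() for _ in range(30)]
for _ in range(40):                        # elitist selection + mutation only
    pop.sort(key=bic, reverse=True)
    pop = pop[:10] + [mutate(p) for p in pop[:10] for _ in range(2)]
best = max(pop, key=bic)
print("learned edges:", list(zip(*np.nonzero(best))))
```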
Structural learning of Bayesian Networks (BNs) is an NP-hard problem, which is further complicated by many theoretical issues, such as the I-equivalence among different structures. In this work, we focus on a specific subclass of BNs, named Suppes-Bayes Causal Networks (SBCNs), which include specific structural constraints based on Suppes’ probabilistic causation to efficiently model cumulative phenomena. Here we compare the performance, via extensive simulations, of various state-of-the-art search strategies, such as local search techniques and Genetic Algorithms, as well as of distinct regularization methods. The assessment is performed on a large number of simulated datasets from topologies with distinct levels of complexity, various sample sizes and different rates of errors in the data. Among the main results, we show that the introduction of Suppes’ constraints dramatically improves the inference accuracy, by reducing the solution space and providing a temporal ordering on the variables. We also report on trade-offs among different search techniques that can be efficiently employed in distinct experimental settings. This manuscript is an extended version of the paper “Structural Learning of Probabilistic Graphical Models of Cumulative Phenomena” presented at the 2018 International Conference on Computational Science.