WO2022082199A1

WO2022082199A1 - Method for detecting amyotrophic lateral sclerosis

Info

Publication number: WO2022082199A1
Application number: PCT/US2021/071865
Authority: WO
Inventors: Sean James MILLER; Brian G. Williams; Robert Logan; Nicolas A. SCHCOLNICOV
Original assignee: Pluripotent Diagnostics Corp.
Priority date: 2020-10-16
Filing date: 2021-10-14
Publication date: 2022-04-21

Abstract

The inventors have identified 23 genomic loci and established that a majority of amyotrophic lateral sclerosis (ALS) patients have mutations in at least one of the target loci. The present invention is directed to a method for detecting ALS in a subject, comprising the steps of: (a) amplifying DNA extracted from a biological sample of a subject by target-specific polymerase chain reaction to amplify specific genomic loci comprising 23 specific chromosome positions of chr1:25854953, chr1:3624870, chr3:158557839, chr3:185543848, chr3:186923875, chr4:17685198, chr4:180358067, chr5:53655366, chr5:82813472, chr5:94666955, chr7:5338617, chr8:62196626, chr9:71428255, chr9:89866631, chr9:130224292, chr10:119712877, chr10:119712899, chr12:82295320, chr15:25687571, chr15:74926032, chr17:2562894, chr17:40390624, and chr22:41330858, (b) purifying and sequencing the amplified DNA, (c) analyzing each of the amplified DNA sequences and comparing it with its corresponding DNA sequence of the normal genomic loci, (d) identifying one or more mutations, if present, at the 23 chromosome positions, and (e) detecting ALS in the subject if the 23 chromosome positions have one or more mutations.

Description

METHOD FOR DETECTING AMYOTROPHIC LATERAL SCLEROSIS

REFERENCE TO SEQUENCE LISTING, TABLE OR COMPUTER PROGRAM

The Sequence Listing is concurrently submitted herewith with the specification as an ASCII formatted text file via EFS-Web with a file name of Sequence Listing.txt with a creation date of October 13, 2021, and a size of 8.16 kilobytes. The Sequence Listing filed via EFS-Web is part of the specification and is hereby incorporated in its entirety by reference herein.

FIELD OF THE INVENTION

The present invention relates to methods for detecting amyotrophic lateral sclerosis. Each method comprises sequencing 16 target genes or 23 target genomic loci from a biological sample of a subject, and identifying one or more mutations, such as single nucleotide polymorphisms or insertions/deletions, if present, in the 16 target genes or 23 target genomic loci.

BACKGROUND

Amyotrophic lateral sclerosis (ALS) is clinically characterized by the loss and degeneration of motor neurons¹. Patients typically live 2-5 years after diagnosis². To date, there remains no cure or genetic test that can pre-determine ALS in a majority of patients³. On a genetic level, prior evidence supports the notion that ALS is a multifactorial disorder, with many genes and cell types influencing the disease⁴'⁷. Currently, there are over 50 known ALS-linked genes. However, these known associated genes are found mutated in less than 10 percent of the total ALS population, with most patients displaying a sporadic etiology⁸’⁹.

ALS cases can be grouped into two categories: familial ALS (fALS), where the patient has a genetically related family member who is also affected, and sporadic ALS (sALS), where the patient has no family history of ALS⁹. Historically, 5-10% of cases are fALS, and the other 90-95% of cases are sALS. In the past ten years, the C9ORF72 hexanucleotide repeat expansion has been identified as the most prevalent genomic mutation found in the ALS disease population¹³. C9ORF72 repeat expansions can be found in up to 34% of fALS and 5% of sALS cases. Strong associations between C9ORF72 and the pathologically related neurodegenerative disorder frontotemporal dementia (FTD) have also been shown¹⁷. There is a need to develop a method to detect ALS early and to elucidate the pathogenesis of both familial and sporadic ALS.

BRIEF DESCRIPTION OF THE DRAWINGS AND TABLES

FIG. 1: Distribution of SNPs present only in the 338-patient ALS sample. rs767982303 and rs760890146 (SNP IDs) are each in greater than 25% of the ALS population. The dots represent the percentage of the ALS sample for each selected SNP with the 68.5% Confidence Level (CL) Clopper-Pearson interval on the true binomial proportion. The grey area represents the range of the possible percentage in the healthy population, with a 95% CL Clopper-Pearson interval.

FIG. 2: SNPs that are not mutated in the control sample. The number of ALS cases out of the overall 338-patient cohort, percent of total ALS cases with the 99% CL Clopper-Pearson interval, and p-value.

FIG. 3: Distribution of mutated genes found only in the 338 ALS sample. Dots represent percentage of the ALS group for each selected gene. The grey area represents an upper-bound on the potential false-positive percentage in the healthy population. This upper bound is set via the 99% CL Clopper-Pearson interval on the binomial proportion. MIR7155 mutations are detected in 51% of the ALS cohort.

FIG. 4: 16 genes that are not mutated in the control sample. The number of ALS cases out of the 338-patient cohort, number of unique SNPs, percent of total ALS cases, and p-value with the 99% CL Clopper-Pearson interval are shown, respectively.

FIGs. 5A-5B: Classifier Analysis using candidate ALS-only mutated genes. Selecting patients with three or more genes mutated of the 16 candidate genes yields a false-positive rate less than 0.1% and a false-negative rate less than 59% at 99% CL. 52% of the ALS cases have at least three of the 16 candidate genes mutated. (5A) The distribution of the number of genes out of the top 16 candidates found in each of the 338 ALS cases. (5B) The percentage of ALS cases with at least the given number of genes mutated from the candidate list (light). The maximum false-positive rate at 99% CL (dark).

FIG. 6: Distribution of candidate ALS-only mutated genes and probability of having ALS based on number of mutations. The distribution of the number of genes out of the top 22 candidates found in each of the 713 ALS cases is shown in grey. The probability of having ALS and the probability of not having ALS are shown.

DETAILED DESCRIPTION OF THE INVENTION

Definitions

A “locus” is a specific, fixed position on a chromosome where a particular gene or genetic marker is located.

A “single nucleotide polymorphism” (SNP) is a germline substitution of a single nucleotide at a specific position in the genome. For example, at a specific base position in the human genome, the G nucleotide may appear in most individuals, but in a minority of individuals, the position is occupied by an A. This means that there is a SNP at this specific position, and the two possible nucleotide variations - G or A - are the alleles for this specific position.

Amyotrophic Lateral Sclerosis (ALS), a multifactorial neurodegenerative disorder, is widely characterized by the degeneration of motor neurons and by neuro-inflammation. Currently, no cure is known, and no known genetic test can diagnose and classify all forms of ALS.

The present invention identifies a set of mutations in genomic-coding regions that are present in ALS patients but not in healthy control samples. The present invention provides methods to detect and diagnose ALS before clinical and pathological onset, which is imperative to prolonging patient lifespan, understanding the pathobiology, and designing therapies for early intervention.

The inventors compute and analyze large datasets of genomes of over 1,500 ALS disease patients and healthy controls. The inventors unravel mutations such as single nucleotide polymorphisms (SNPs) and Indels (insertions and deletions) in gene-coding and inter-genic regions that are associated with ALS disease diagnosis and always absent in healthy control patients.

To identify novel genetic associations to ALS, the inventors have analyzed next-generation genomic sequencing data from two cohorts of ALS patients and healthy controls from the Answer ALS Consortium. In doing so, the inventors discovered mutations in protein-coding genes that have not been associated with ALS previously.

This application provides evidence that the detection of selected mutations such as SNPs and indels with genetic sequencing can be correlated with the pathobiology of ALS in a significant percentage of cases. These genetic biomarkers can be used as an early ALS disease diagnostic tool with a rapid and non-invasive technique. The present invention is directed to methods for detecting amyotrophic lateral sclerosis in a subject by detecting one or more mutations in specific genes or gene loci.

The First Aspect

In the first aspect of the invention, the inventors have discovered that 16 target genes of the human genome, MIR7155, NPM1P49, RP11-20B24.3, HNRNPA1P44, OXR1, H2AFZP1, TAB3P1, RPL5P35, ZNF92P2, CIR1P3, GNAI2, CCDC42, RP11-370110.6, ADIPOR1P1, KIAA1841, and AC008074.4, are important for detecting ALS.

The invention provides a method for detecting ALS in a subject, comprising obtaining a biological sample from a subject, and from the sample, detecting one or more mutations in 16 target genes selected from the groups consisting of: MIR7155, NPM1P49, RP11-20B24.3, HNRNPA1P44, OXR1, H2AFZP1, TAB3P1, RPL5P35, ZNF92P2, CIR1P3, GNAI2, CCDC42, RP11-370110.6, ADIPOR1P1, KIAA1841, and AC008074.4.

In one embodiment, the method comprises the steps of: (a) sequencing 16 target genes from a biological sample of a human subject, wherein the target genes are MIR7155, NPM1P49, RP11-20B24.3, HNRNPA1P44, OXR1, H2AFZP1, TAB3P1, RPL5P35, ZNF92P2, CIR1P3, GNAI2, CCDC42, RP11-370110.6, ADIPOR1P1, KIAA1841, and AC008074.4, (b) comparing each of the DNA sequences of the 16 target genes with its corresponding normal gene, (c) identifying one or more mutations such as SNPs, if present, in each of the DNA sequences of the 16 target genes, and (d) detecting amyotrophic lateral sclerosis in the subject if at least one of the 16 target genes has one or more mutations. With at least one target gene found mutated, 67% to 80% of ALS cases can be detected at 99% CL (Clopper-Pearson), with a false positive rate less than 9.5% at 99% CL (Clopper-Pearson).

In a preferred method, ALS is detected in the subject if at least two of the 16 target genes have one or more mutations. With at least two target genes mutated, 50% to 64%, at 99% CL, of ALS can be detected, with a false positive rate less than 0.9% at 99% CL.

In a further preferred method, ALS is detected in the subject if at least three of the 16 target genes have one or more mutations. With at least three target genes mutated, 45% to 59%, at 99% CL, of ALS can be detected, with a false positive rate less than 0.09% at 99% CL.

In step (a) of the method, the DNA is first extracted from a biological sample of a human subject. For example, the biological sample is blood (such as peripheral whole blood), a tissue sample (such as a fibroblast (skin) biopsy or a mucosal sample), or any cell derived from the human subject. Methods for extracting DNA from a biological sample are well known to a person skilled in the art. For example, see the protocols for extracting DNA from blood in Thermo Fisher product sheet catalog CS11040.

The DNA extracted from the biological sample of the human subject is then subjected to target-specific amplification and target-specific sequencing to sequence each of the 16 target genes: MIR7155, NPM1P49, RP11-20B24.3, HNRNPA1P44, OXR1, H2AFZP1, TAB3P1, RPL5P35, ZNF92P2, CIR1P3, GNAI2, CCDC42, RP11-370110.6, ADIPOR1P1, KIAA1841, and AC008074.4. Whole genome sequencing (WGS), a technique for sequencing the entire genome, is not performed in this method.

The procedures of target-specific sequencing of DNA libraries from human genomic DNA extracted from a biological sample are known to a person skilled in the art. A range of sequencing platforms can be used, such as PacBio Sequencing (Rhoads & Au, Genomics, Proteomics and Bioinformatics, 13(5), 278-289, 2015), Oxford Nanopore (Jain, et al, Genome Biology, 17(1), 1-11, 2016), Illumina, or 10x Genomics (Zheng et al., Nature Biotechnology, 34(3), 303-311, 2016). For example, see the Product Instruction Sheet of the NextSeq™ 550Dx Instrument (Illumina).

In step (b), each of the DNA sequences of the 16 specific target genes is compared with its corresponding reference gene sequence. Targeted gene data are processed through an automated pipeline to perform read alignment and mutation analysis including variants such as SNPs, indels and substitutions in either introns, exons or both. In one embodiment, paired-end 150 bp reads are aligned to the GRCh38 human reference using the Burrows-Wheeler Aligner (BWA-MEM) and processed using the GATK best-practices workflow that includes marking of duplicate reads by the use of Picard tools, local realignment around indels, and base quality score recalibration (BQSR) via Genome Analysis Toolkit (GATK). See “The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data”, McKenna A, et al, 2010 GENOME RESEARCH 20: 1297-303.

In step (c), single nucleotide variant analysis is performed to identify one or more mutations, if present, in each of the DNA sequences of the 16 target genes.

Variant discovery is a two-step process. HaplotypeCaller is run on each sample separately in gVCF mode (GATK v3.5). This produces an intermediate file format called gVCF (genomic VCF). For projects with a large number of samples, gVCFs are combined by batches into merged gVCFs. gVCFs are then run through a joint genotyping step (GATK v3.5) to produce a multi-sample VCF. Variant filtration is performed using Variant Quality Score Recalibration (VQSR), which identifies annotation profiles of variants that are likely to be real and assigns a score (VQSLOD) to each variant. Variant effects annotation is performed using SnpEff (PMID: 22728672), bcftools (http://github.com/samtools/bcftools) and in-house software. Other functional annotations include variant frequencies in different populations from the 1000 Genomes project (PMID: 20981092), Exome Aggregation Consortium - ExAC (http://biorxiv.org/content/early/2015/10/30/030338), dbSNP147 (PMID: 11125122); cross-species conservation scores from PhyloP (PMID: 15965027), Genomic Evolutionary Rate Profiling (GERP; PMID: 21152010), PhastCons (PMID: 21278375); functional prediction scores from Polyphen2 (PMID: 20354512) and SIFT (PMID: 19561590); Clinvar (http://www.ncbi.nlm.nih.gov/clinvar/); regulatory annotations from ENCODE (PMID: 15499007) and Regulome (PMID: 22955989). Variants and annotations are exported to tabular formats for ease of downstream analysis. Additional filtration based on functional annotation is applied to extract variants with predicted effects on protein coding.
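The two-step variant discovery described above can be sketched as command lines assembled in Python. This is a minimal sketch, assuming a GATK 3.x-style invocation through `GenomeAnalysisTK.jar`; the file names and the helper names `haplotypecaller_cmd` and `joint_genotyping_cmd` are hypothetical, and the commands are only built here, not executed.

```python
from typing import List

GATK_JAR = "GenomeAnalysisTK.jar"   # hypothetical path to a GATK 3.x jar
REFERENCE = "GRCh38.fa"             # hypothetical reference FASTA file name


def haplotypecaller_cmd(bam: str, gvcf: str) -> List[str]:
    """Step 1: per-sample variant calling in gVCF mode."""
    return ["java", "-jar", GATK_JAR, "-T", "HaplotypeCaller",
            "-R", REFERENCE, "-I", bam,
            "--emitRefConfidence", "GVCF", "-o", gvcf]


def joint_genotyping_cmd(gvcfs: List[str], out_vcf: str) -> List[str]:
    """Step 2: joint genotyping of per-sample gVCFs into a multi-sample VCF."""
    cmd = ["java", "-jar", GATK_JAR, "-T", "GenotypeGVCFs", "-R", REFERENCE]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    return cmd + ["-o", out_vcf]


# Build (but do not run) the commands for two hypothetical samples.
step1 = [haplotypecaller_cmd(f"{s}.bam", f"{s}.g.vcf") for s in ("s1", "s2")]
step2 = joint_genotyping_cmd(["s1.g.vcf", "s2.g.vcf"], "cohort.vcf")
```

Each list could be passed to `subprocess.run` once the paths point at real files; batching of gVCFs and the VQSR step are left out of this sketch.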

Variant discovery, for example, is described in the following references: “A framework for variation discovery and genotyping using next-generation DNA sequencing data” DePristo M, et al, 2011 NATURE GENETICS 43:491-498; and “From FastQ data to high-confidence variant calls: the genome analysis toolkit best practices pipeline.” Van der Auwera G, et al., Curr Protoc Bioinformatics. 2013; 43: 1-33.

In step (d), ALS is detected in the subject, if at least one of the 16 target genes has one or more mutations, preferably at least two of the 16 target genes have one or more mutations, and more preferably at least three of the 16 target genes have one or more mutations.
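The decision rule of step (d) amounts to counting how many of the 16 target genes carry at least one mutation and comparing that count with a threshold of one, two, or three. A minimal sketch follows, assuming the upstream variant calling has already produced the set of mutated gene names for a subject; the function names are hypothetical.

```python
from typing import Iterable

# The 16 target genes as listed in the first aspect.
TARGET_GENES = frozenset({
    "MIR7155", "NPM1P49", "RP11-20B24.3", "HNRNPA1P44", "OXR1", "H2AFZP1",
    "TAB3P1", "RPL5P35", "ZNF92P2", "CIR1P3", "GNAI2", "CCDC42",
    "RP11-370110.6", "ADIPOR1P1", "KIAA1841", "AC008074.4",
})


def count_mutated_targets(mutated_genes: Iterable[str]) -> int:
    """Number of target genes that carry at least one mutation."""
    return len(TARGET_GENES & set(mutated_genes))


def als_detected(mutated_genes: Iterable[str], min_genes: int = 1) -> bool:
    """Apply the threshold rule: 1 (basic), 2 (preferred), 3 (more preferred)."""
    return count_mutated_targets(mutated_genes) >= min_genes


# Hypothetical subject with mutations called in three target genes.
subject = {"OXR1", "GNAI2", "MIR7155"}
print(als_detected(subject, min_genes=3))
```

Raising `min_genes` trades sensitivity for a lower false positive bound, which is the trade-off the three embodiments describe.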

The present invention provides a method to detect and diagnose ALS before clinical- and pathological-onset, which is imperative to prolonging patient lifespan, understanding the pathobiology, and designing therapies for early intervention. ALS is a devastating neurodegenerative disorder, with no cures or genetic diagnostics. The present method detects 45%-59% of the ALS-only population, at 99% CL, with the 16 target genomic signatures, when at least 3 of the 16 target genes contain a mutation. The present method provides use of genetic screening in early ALS diagnosis and therapeutic intervention.

On the gene level, this application shows that the detection of single mutations can identify up to 59% of the ALS population with genes that are never found mutated in the healthy control sample. This application illustrates two novel mutations in gene-coding regions of the genome that are never present in the healthy group yet are found in over 25% of the ALS cohort.

The Second Aspect

In a second aspect of the invention, the inventors have discovered that 22 target genes of the human genome, AL033528.3, THRAP3, AC106707.1, LIPH, AC007690.1, FAM184B, AC096747.1-NDUFB5P1, NDUFS4, RPL5P16-AC008885.1, SLF1, TNRC18, AC023095.1, TRPM3, AL161629.1, NCS1, TXNP1-INPP5F, CCDC59, ATP10A, COX5A, RN7SL33P, TOP2A, and ZC3H7B, which do not mutate in a normal subject, are important for detecting ALS.

The invention provides a method for detecting ALS in a subject, comprising obtaining a biological sample from a subject, and from the sample, detecting one or more mutations in 22 target genes selected from the groups consisting of: AL033528.3, THRAP3, AC106707.1, LIPH, AC007690.1, FAM184B, AC096747.1-NDUFB5P1, NDUFS4, RPL5P16-AC008885.1, SLF1, TNRC18, AC023095.1, TRPM3, AL161629.1, NCS1, TXNP1-INPP5F, CCDC59, ATP10A, COX5A, RN7SL33P, TOP2A, and ZC3H7B, and detecting amyotrophic lateral sclerosis in the subject if the 22 genes have one or more mutations.

The invention also provides a method for detecting ALS in a subject, comprising obtaining a biological sample from a subject, and from the sample, detecting one or more mutations in 23 genomic loci selected from the groups consisting of: chr1:25854953 (chromosome 1 at nucleotide position 25854953), chr1:3624870, chr3:158557839, chr3:185543848, chr3:186923875, chr4:17685198, chr4:180358067, chr5:53655366, chr5:82813472, chr5:94666955, chr7:5338617, chr8:62196626, chr9:71428255, chr9:89866631, chr9:130224292, chr10:119712877, chr10:119712899, chr12:82295320, chr15:25687571, chr15:74926032, chr17:2562894, chr17:40390624, and chr22:41330858, and detecting amyotrophic lateral sclerosis in the subject if the 23 chromosome positions have one or more mutations.

In one embodiment, the method comprises the steps of: (a) amplifying DNA extracted from a biological sample of a subject by target-specific polymerase chain reaction to amplify specific genomic loci comprising 23 specific chromosome positions of chr1:25854953, chr1:3624870, chr3:158557839, chr3:185543848, chr3:186923875, chr4:17685198, chr4:180358067, chr5:53655366, chr5:82813472, chr5:94666955, chr7:5338617, chr8:62196626, chr9:71428255, chr9:89866631, chr9:130224292, chr10:119712877, chr10:119712899, chr12:82295320, chr15:25687571, chr15:74926032, chr17:2562894, chr17:40390624, and chr22:41330858, (b) purifying and sequencing the amplified DNA, (c) analyzing each of the amplified DNA sequences and comparing it with its corresponding DNA sequence of the normal genomic loci, (d) identifying one or more mutations, if present, at the 23 chromosome positions, and (e) detecting amyotrophic lateral sclerosis in the subject if the 23 chromosome positions have one or more mutations such as SNPs or Indels.
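Steps (c) to (e) can be pictured as filtering a subject's called variants down to those that fall on the 23 target positions. The following is a minimal sketch, reading the chromosome labels in the list above as chr1, chr10, chr12, chr15, and chr17; the `Variant` record and the function names are hypothetical and stand in for whatever the variant-calling pipeline emits.

```python
from typing import List, NamedTuple, Tuple

# The 23 target loci, written as "chromosome:position".
TARGET_LOCI = [
    "chr1:25854953", "chr1:3624870", "chr3:158557839", "chr3:185543848",
    "chr3:186923875", "chr4:17685198", "chr4:180358067", "chr5:53655366",
    "chr5:82813472", "chr5:94666955", "chr7:5338617", "chr8:62196626",
    "chr9:71428255", "chr9:89866631", "chr9:130224292", "chr10:119712877",
    "chr10:119712899", "chr12:82295320", "chr15:25687571", "chr15:74926032",
    "chr17:2562894", "chr17:40390624", "chr22:41330858",
]


def parse_locus(text: str) -> Tuple[str, int]:
    """Turn 'chr4: 17685198' or 'chr4:17685198' into ('chr4', 17685198)."""
    chrom, pos = text.replace(" ", "").split(":")
    return chrom, int(pos)


TARGET_SET = {parse_locus(locus) for locus in TARGET_LOCI}


class Variant(NamedTuple):
    """Hypothetical minimal variant record (one row of a VCF)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def variants_at_targets(variants: List[Variant]) -> List[Variant]:
    """Keep only the variants located exactly at a target position."""
    return [v for v in variants if (v.chrom, v.pos) in TARGET_SET]


# Hypothetical calls for one subject: one on-target, one off-target.
calls = [Variant("chr7", 5338617, "G", "A"), Variant("chr7", 5338000, "C", "T")]
hits = variants_at_targets(calls)
print(len(hits) >= 1)
```

The final comparison mirrors step (e) with a threshold of one mutated locus; indels that start near, but not at, a target position would need a range check that this sketch omits.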

Whole genome sequencing, a technique for sequencing the entire genome, is not performed in this method.

In step (a) of the method, the DNA is first extracted from a biological sample of a human subject, as described in the first method.

The DNA extracted from the biological sample of the human subject is then subjected to target-specific amplification to amplify the 23 loci of the 22 genes.

Table 1 shows the 22 genes that frequently have at least one mutation in ALS patients and the position of each mutation in terms of nucleotide position on a chromosome. Gene TXNP1-INPP5F has two mutated loci, chr10:119712877 and chr10:119712899, in ALS patients.

One way of amplifying these loci, commonly used because it takes less time and effort, is to amplify all the different regions at the same time in one polymerase chain reaction (PCR), which is called "multiplexing PCR". Multiplexing PCR can be done using more than one sample DNA at a time (multi-template), or it can be done using just one sample DNA per reaction (single template).

Single-template multiplexing PCR is explained below, but both multi-template and single-template techniques can be applied to the genes. Besides the classic requirements for a PCR, known to a person skilled in the art, there are a few considerations when designing primers for multiplexing PCR. These preliminary steps are: (a) confirming that the melting temperatures (Tm) of all 23 sets of primers are similar, with no set having a Tm that differs by more than 5°C from any other set; (b) evaluating for competitive binding, since each primer should anneal to only one specific region in the DNA sample; and (c) checking for primer dimers, which is important because the large number of primer sets makes primer dimer formation more probable. More information on multiplex PCR can be found in "Multiplex polymerase chain reaction: a practical approach" Markoulatos P. et al. 2002 Clinical Laboratory Analysis 16(1):47-51. Different software programs are available for primer design, each with its own advantages and disadvantages; for example, Jingwen et al. (2020) classified and reviewed some of the free programs in "Classification and review of free PCR primer design software" Jingwen et al. 2020 Bioinformatics 36:22-23, which can make primer design simpler and more precise. The forward primer and reverse primer can be designed according to the methods described above or other methods known to a person skilled in the art to amplify the DNA comprising a target site. In general, the forward and reverse primers are designed to be 30-400 bases away from the target site, e.g., 40-250 bases, 40-200 bases, 40-150 bases, or 40-100 bases. Table 1 illustrates one design of the forward primer and reverse primer for each of the 23 target loci. The primer design shown in Table 1 is an example, and the present invention is not limited to such specific primer sequences.

The two loci chr10:119712877 and chr10:119712899 of gene TXNP1-INPP5F are only 22 bases apart from each other, and therefore one set of forward and reverse primers can conveniently amplify both loci.
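The three preliminary primer checks and the 30-400 base distance guideline can be sketched as simple screening functions. This is a rough sketch only: the Wallace rule used for Tm is a coarse approximation chosen here for brevity and is not stated in the source, the dimer test looks only at perfectly complementary 3' ends, the Tm spread is taken across all individual primers, and the primer sequences are hypothetical.

```python
from typing import Dict, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def wallace_tm(primer: str) -> int:
    """Rough Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = primer.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def tm_spread_ok(primers: Dict[str, Tuple[str, str]], max_spread: float = 5.0) -> bool:
    """Check (a): no primer differs from any other by more than max_spread."""
    tms = [wallace_tm(p) for pair in primers.values() for p in pair]
    return max(tms) - min(tms) <= max_spread


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def three_prime_dimer(p: str, q: str, n: int = 4) -> bool:
    """Check (c), crude form: do the last n bases of p and q pair perfectly?"""
    return p.upper()[-n:] == revcomp(q[-n:])


def distance_ok(bases_to_target: int, low: int = 30, high: int = 400) -> bool:
    """Primer should sit 30-400 bases from the target site."""
    return low <= bases_to_target <= high


# Hypothetical primer pair for one locus (not taken from Table 1).
primers = {"locus_A": ("ATGCATGCATGCATGCATGC", "ATGCATGCATGCATGCATGG")}
print(tm_spread_ok(primers), distance_ok(40), distance_ok(22))
# The two chr10 loci lie 119712899 - 119712877 = 22 bases apart, so one
# amplicon flanked by a single primer pair can cover both positions.
```

Check (b), competitive binding, requires aligning each primer against the whole genome and is better left to a dedicated primer design tool.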

Table 1.

* shows number of bases between the forward primer and target locus (before) and number of bases between the target locus and reverse primer (after)

In step (b), the amplified DNA is purified and sequenced according to methods known to a person skilled in the art. DNA purification removes everything that is not the amplicon from the PCR product, including unused primers, nucleotides, enzymes, and other impurities. Sequencing includes library preparation and the act of DNA sequencing itself, done by a sequencing system. Library preparation typically consists of fragmenting the DNA sample and adding to the fragments the sequencing adapters that are needed for the sequencing step (next-generation sequencing). The act of sequencing itself includes reading the nucleotides in the DNA sample and saving them sequentially into a digital file. The specific protocol for DNA purification and DNA sequencing may differ depending on a number of factors, including the method used for DNA amplification and the sequencing system used. There are different well-known methods for purifying and sequencing the amplified DNA. For example, the amplified DNA can be purified using the QIAquick PCR Purification Kit, following the QIAquick® Spin Handbook protocol; the library can be prepared using the TruSeq DNA LT kit (see the product sheet of TruSeq DNA Library Prep Kits®, Illumina), following the TruSeq® DNA Sample Preparation Guide available at Illumina's website; and sequencing can be done on the Illumina MiSeq system (see the MiSeq™ System specification sheet, Illumina).

In step (c), each of the amplified DNA sequences of (b) is analyzed and compared with its corresponding DNA sequence of the normal genomic loci. See description in the first method.

In step (d), single nucleotide variant analysis is performed to identify one or more mutations, if present, in each of the DNA sequences of the 23 target loci. See description in the first method.

In step (e), ALS is detected in the subject if at least one of the 23 target loci has a mutation, preferably if at least two of the 23 target loci have mutations, and more preferably if at least three of the 23 target loci have mutations.

The present method detects over 30% of the ALS-only population, at 99% CL, with the 23 target genomic signatures, when at least 1 of the 23 target loci contains a mutation. The present method provides use of genetic screening in early ALS diagnosis and therapeutic intervention. In the inter-genic and genic analyses, the inventors show that at least two genes in the list of 23 top candidates must be mutated to achieve 35.7-44.9% accuracy in detecting ALS.

The following examples further illustrate the present invention. These examples are intended merely to be illustrative of the present invention and are not to be construed as being limiting.

EXAMPLES

Example 1. Frequency of SNP Associations in 338 ALS patients

Methods

Statistical Methods

Clopper-Pearson Interval: Bounds are set on the true fractions of either population with a given feature(s). The number of people within a sample that are positive for the feature-of-interest will have a binomial distribution. Clopper-Pearson intervals on the binomial proportion are calculated for true population proportions¹¹.
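The Clopper-Pearson interval can be computed directly from the binomial distribution. The sketch below is one way to do it with the standard library only, finding each bound by bisection on the binomial tail probability; the two-sided convention (alpha/2 in each tail) is an assumption, and the function names are hypothetical.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def _root(g: Callable[[float], float]) -> float:
    """Zero of a function that increases on [0, 1], found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, conf: float = 0.99) -> Tuple[float, float]:
    """Exact two-sided interval for a binomial proportion with x hits in n."""
    alpha = 1.0 - conf
    # Lower bound: the p at which P(X >= x) equals alpha/2.
    lower = 0.0 if x == 0 else _root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= x) equals alpha/2.
    upper = 1.0 if x == n else _root(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


# A mutation seen in 0 of 53 controls: bound on its rate in healthy people.
print(clopper_pearson(0, 53, conf=0.99))
```

With zero positives among 53 controls the upper bound comes out just under 10%, which is the kind of conservative false positive limit the description relies on.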

Fisher's Exact Test: This test gives the p-value under the null hypothesis that a mutation is present in the ALS population in the same proportion as in the control population¹². This test statistic is well suited to this study because it gives an exact rather than approximate p-value and, given the sample sizes, can still be calculated in a reasonable amount of time. t-tests are approximations of this probability which converge to the exact value in the limit of large sample size.
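For a 2x2 table of mutated versus non-mutated counts in ALS patients and controls, Fisher's exact test sums hypergeometric probabilities over all tables at least as extreme as the observed one. A minimal standard-library sketch follows; the two-sided definition used (sum of all table probabilities not exceeding the observed one) is one common convention and is an assumption here, as is the example count of 41 mutated ALS cases.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. ALS, control); columns are mutated / not mutated.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x mutated subjects in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical SNP: mutated in 41 of 338 ALS cases and in 0 of 53 controls.
p_value = fisher_exact_two_sided(41, 338 - 41, 0, 53)
print(p_value)
```

A SNP carried by roughly 12% of the ALS sample and by no controls already yields a small p-value, which is why that frequency serves as a natural cut-off in this kind of analysis.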

Row-wise Conditional Percentage: To quantify how often a pair of our top 16 genes is mutated in the same patient, we calculated a conditional probability considering the independent probabilities of each mutation occurring on its own. For every possible ordered pairing of two genes (240 combinations), we counted the number of cases which have both gene mutations and divided by the total number of cases in which the first gene was mutated. This metric is visually represented as a matrix, with each row and column representing a particular mutated gene from the set of 16, and each element representing the conditional probability of the column and row mutations happening in the same patient, adjusted for the baseline prevalence of the row mutation. The probability is converted into a percentage and can provide insights into how often two gene mutations co-occur in each patient.
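The row-wise conditional percentage described above can be sketched as follows, assuming each case is represented by the set of genes found mutated in that patient. The function name, the input layout, and the toy gene labels are hypothetical.

```python
from typing import Dict, List, Set


def rowwise_conditional_percentage(
    cases: Dict[str, Set[str]], genes: List[str]
) -> Dict[str, Dict[str, float]]:
    """For each ordered pair (row, col): percent of row-mutated cases that
    also carry a mutation in col."""
    matrix: Dict[str, Dict[str, float]] = {}
    for row in genes:
        row_cases = [mutated for mutated in cases.values() if row in mutated]
        matrix[row] = {}
        if not row_cases:
            continue  # conditional percentage is undefined for this row
        for col in genes:
            if col == row:
                continue
            both = sum(1 for mutated in row_cases if col in mutated)
            matrix[row][col] = 100.0 * both / len(row_cases)
    return matrix


# Toy cohort of three hypothetical patients and three gene labels.
cohort = {"p1": {"A", "B"}, "p2": {"A"}, "p3": {"B", "C"}}
table = rowwise_conditional_percentage(cohort, ["A", "B", "C"])
print(table["C"]["B"])  # every C-mutated case here also has B mutated
```

The matrix is not symmetric: the entry for row C and column B is 100%, while the entry for row B and column C is 50%. With 16 genes there are 16 x 15 = 240 ordered pairs, matching the count given in the text.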

Genetic Data Acquisition

Answer ALS Data were provided by the Answer ALS consortium.

Patient Cohorts: The entire Answer ALS consortium genetic database was used, for a total of 338 ALS patients and 53 healthy control samples. Of the ALS patients tested for mutations in the known ALS-linked genes C9ORF72, SOD1, and TDP43, 30 of 130, 6 of 27, and 1 of 14 tested positive, respectively. Our statistical models were implemented with careful consideration of these numbers.

Results:

The results are shown in FIGs. 1-5.

SNPs in the Coding-Genome Found Only in ALS patients.

Identification of patients in the ALS population that carry SNPs not present in the healthy control population renders valuable insight into disease pathogenesis and genetic diagnostics. After sorting through the entire cohort of ALS patients and the healthy control sample, we detected 100,143 SNPs in coding genes that were found only in the ALS sample.

C9ORF72 hexanucleotide repeats are the most prevalent ALS mutation known to date, affecting 5-10% of all cases and up to 34% of familial ALS (fALS) cases¹³. Of the detected SNPs in the ALS-only sample, we focused on the mutations present in greater than 12% of the sample to identify candidate genes with a higher association than previously accepted. We found 21 candidate SNPs that reached this level, each with a p-value of less than 1.5 × 10⁻³, all of which are more significant than the reported C9ORF72 population frequency (FIGs. 1 and 2).

Of the 21 SNP candidates, rs767982303 and rs760890146 were each found in 25% of the total ALS population yet are absent in controls (FIGs. 1 and 2). rs767982303 (located on the OXR1 gene) and rs760890146 (located on the NPM1P49 gene) both lead to an acceptor variant mutation. Other top SNPs-of-interest and their significance are illustrated.

Frequency of Candidate SNP Associations in ALS patients

To determine if there are associations between the candidate SNPs, we generated a row-wise conditional percentage heatmap. We found that roughly half of the ALS population containing one of the top 21 candidate SNPs also shared at least one other mutation. Mutations rs62220927 and rs62220926, in the protein-coding gene TSPEAR, were both found in the same ALS patients, suggesting a dual-mutation linkage at those genomic sites.

Genomic Mutations in Genes of the ALS-only Population.

We identify mutations at the gene level that are present in the ALS population and not in healthy controls. We sorted and analyzed candidate genes based on the presence of at least one SNP, rather than individual coding-region SNPs. This approach would allow us to identify gene modifiers involved in ALS pathology.

We found 16 individual genes that were each mutated in over 12% of the ALS cases (FIG. 4). One gene, the miRNA gene MIR7155, was mutated in over 50% of the population (FIGs. 3 and 4). OXR1, which was identified in the SNP mutation list for its association in over 25% of the ALS-only population, was consistent in our candidate genes list, with all mutations resulting in either a missense mutation or an acceptor variant. This suggests that mutations in any genomic region of MIR7155, OXR1 and other top candidates are associated with ALS disease in a large proportion of patients.

Associations of multiple mutations in the same population and biological pathway could provide insight into the pathology of ALS. We generated lists of interacting protein-coding genes and determined the frequency of mutation in those associated genes. For our top candidates, we found that genes in the same family, such as those encoding proteins that interact with OXR1 or RP11, were mutated in the control group at a rate similar to that observed in the ALS population¹⁴. This suggests an influence of only our candidate genes and not of their functionally associated protein-coding genes.

Frequency of Candidate Gene Associations in ALS patients

To determine if there are associations between the top 16 candidate genes and the top ALS-linked genes, C9ORF72, SOD1, and TDP43, we generated a row-wise conditional percentage heatmap. In the ALS-only patient population, mutations in the genes TDP43, RP11, CIR1P3, and H2AFZP1 were always accompanied by a MIR7155 mutation. Interestingly, C9ORF72, CCDC42, and SOD1 mutant patients consisted of only half the population of MIR7155, suggestive of MIR7155 representing a novel ALS-correlated gene.

Clinical Evaluation of ALS patients

Multiple clinical phenotypes can be associated with the onset of ALS, including age and area of clinical disease initiation¹⁵. We compared the overall age of onset for the entire ALS population and the patients we can diagnose with our classifier analysis. There are no statistically significant differences detected in the age of onset for our diagnosable patients as compared to the overall ALS population.

Next, we compared the three different areas of disease onset, “axial”, “bulbar”, and “limb”, and each potential correlation to our candidate genetic mutations. We found no significant difference between the clinical appearance of disease onset in our top gene candidate patients and the symptoms displayed in the overall ALS population.

Classifier Analysis for the Top Candidate Genes in the ALS-only Population

Diagnostic testing based on novel gene sequence identification could serve as an early disease detection tool. To determine whether our top 16 gene candidates, singly or in combination, could be used as a statistical tool to associate and identify the ALS-only population, we designed a classifier analysis. Evaluation of the top 16 candidates led to the discovery that a majority of the ALS sample had at least 3 of our 16 genes mutated, with a peak at 7 genes (FIGs. 5A and 5B).

We propose a simple classifier that requires at least three of the 16 genes to be mutated. A conservative upper limit on the rate in the healthy population of having a gene mutation for each of these top 16 genes is estimated to be less than 10% (at 99% CL) using the Clopper-Pearson interval, since no mutation in any of these genes was found in the 53 control patients¹¹.

The proportion of patients with N independently mutated genes of our 16 will be less than 0.1 to the Nth power at 99% CL. Therefore, this classifier, which requires at least three of any of our 16 genes to be mutated, has a false-positive rate of less than 0.1% (1/1000), meaning that the specificity is greater than 99.9% at 99% CL. The sensitivity of this classifier is 52% ± 7% at 99% CL, identifying just over half of the ALS sample.
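The arithmetic behind these bounds can be sketched as follows. When zero events are observed in n subjects, the one-sided Clopper-Pearson upper limit has the closed form 1 - alpha^(1/n). The sketch assumes a one-sided interval (the text does not state whether the interval is one-sided or two-sided) and, as in the text, assumes that the genes are mutated independently. The function and variable names are illustrative.

```python
def zero_event_upper_bound(n_subjects: int, confidence: float) -> float:
    """One-sided Clopper-Pearson upper limit when 0 of n_subjects are positive.

    With zero observed events the exact bound solves (1 - p)**n = alpha,
    so p = 1 - alpha**(1/n), where alpha = 1 - confidence.
    """
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_subjects)


# Per-gene upper limit for 0 mutated out of 53 controls at 99% CL.
per_gene_bound = zero_event_upper_bound(53, 0.99)

# The text rounds the per-gene rate up to a conservative 10%.
conservative_rate = 0.1
min_genes = 3  # classifier threshold: at least three of the 16 genes

# Assumes independence between genes, as stated in the text.
false_positive_bound = conservative_rate ** min_genes
specificity_bound = 1.0 - false_positive_bound

print(f"per-gene upper limit: {per_gene_bound:.4f}")
print(f"false-positive bound: {false_positive_bound:.4f}")
print(f"specificity bound:    {specificity_bound:.4f}")
```

The per-gene limit computed this way is below the 10% figure used in the text, so raising 0.1 to the third power is a conservative bound for the three-gene requirement.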

Summary

The data of Example 1 demonstrate SNPs in coding regions, or entire genes, that are associated with a majority of the ALS population. In this clinical and biomedical trial, the Answer ALS consortium utilized the latest next-generation sequencing technology and annotation with the highest quality control and protocols to allow us to perform unbiased genetic analyses on protein-coding genes and other genomic areas of interest. We are the first to report on this novel genomic database using these statistical and computational methods.

We designed an analysis focused on the identification of SNPs in the coding genome. This model allowed us to detect individual SNPs in 27% of the ALS cohort. Using a gene-level approach, we robustly identify a majority (>50%) of the ALS population at 79% CL using a one-sided Clopper-Pearson interval. Each of the other known ALS-linked genes accounts for less than 10% of the overall ALS population.

We have identified OXR1 as a gene mutated in 27% of the ALS group. OXR1 is an essential member of the antioxidant defense mechanisms in the cell.

In our gene-level analyses, we have discovered that the microRNA, MIR7155, is mutated in 52% of the ALS test-sample.

We found 52% of ALS cases (177 out of 338) with at least three of 16 target genes mutated, with a false-positive rate of less than 0.1%, at 99% confidence level (CL) (see methods on the Clopper-Pearson interval). This establishes that a majority of our ALS patients have mutations in at least 3 of our 16 target genes.
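For reference, an exact Clopper-Pearson interval for the 177-of-338 sensitivity can be computed with a short, self-contained sketch such as the one below. It inverts the binomial distribution by bisection and assumes a two-sided 99% interval. The helper names are illustrative, and this is not presented as the inventors' own implementation.

```python
import math
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _root(f: Callable[[float], float]) -> float:
    """Bisection root of a function that is increasing in p on (0, 1)."""
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def clopper_pearson(k: int, n: int, confidence: float) -> Tuple[float, float]:
    """Two-sided exact (Clopper-Pearson) interval for k successes in n trials."""
    alpha = 1.0 - confidence
    # Lower limit: P(X >= k | p) = alpha / 2 (left side increases with p).
    lower = 0.0 if k == 0 else _root(
        lambda p: (1.0 - binom_cdf(k - 1, n, p)) - alpha / 2.0
    )
    # Upper limit: P(X <= k | p) = alpha / 2 (the CDF decreases with p).
    upper = 1.0 if k == n else _root(
        lambda p: alpha / 2.0 - binom_cdf(k, n, p)
    )
    return lower, upper


lower, upper = clopper_pearson(177, 338, 0.99)
point = 177 / 338
print(f"sensitivity = {point:.3f}, 99% CL interval = [{lower:.3f}, {upper:.3f}]")
```

The half-width of the resulting interval is close to seven percentage points, in line with the 52% ± 7% figure reported for this classifier.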

Example 2. Frequency of Candidate SNP Associations in 713 ALS patients

Methods

Statistical Methods

Statistical Methods are the same as described in Example 1.

Genetic Data Acquisition

Answer ALS Data were provided by the Answer ALS consortium and Alzheimer’s Disease Neuroimaging Initiative.

Patient Cohorts: A total of 713 ALS patients and 93 control samples were used from the ALS consortium genetic database. A total of 818 healthy control samples were used from the Alzheimer’s Disease Neuroimaging Initiative genetic dataset.

Results:

SNPs in the Coding-Genome Found Only in ALS patients.

Identification of patients in the ALS population that contain SNPs that are not present in the healthy control population renders valuable insight into disease pathogenesis and genetic diagnostics. After sorting through the entire cohort of ALS patients and the healthy control samples, we found that we could detect 44,156,401 variants present only in the ALS population.
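A minimal sketch of this kind of ALS-only filtering is given below, assuming that each sample has already been reduced to a set of variant calls. The data layout, the function name, the `min_fraction` parameter, and the toy coordinates are hypothetical and do not correspond to the Answer ALS or ADNI file formats.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def als_only_variants(
    als_calls: Dict[str, Set[Variant]],
    control_calls: Dict[str, Set[Variant]],
    min_fraction: float = 0.0,
) -> Dict[Variant, float]:
    """Variants seen in ALS samples, never in controls, above a frequency floor."""
    # Every variant observed in at least one control sample.
    seen_in_controls: Set[Variant] = set().union(*control_calls.values())
    # Number of ALS samples carrying each variant (sets prevent double counting).
    counts = Counter(v for calls in als_calls.values() for v in calls)
    n_als = len(als_calls)
    return {
        variant: count / n_als
        for variant, count in counts.items()
        if variant not in seen_in_controls and count / n_als > min_fraction
    }


# Toy data with made-up coordinates.
A: Variant = ("chrA", 100, "A", "G")
B: Variant = ("chrB", 200, "C", "T")
C: Variant = ("chrC", 300, "G", "A")
als = {"a1": {A, B}, "a2": {A, B}, "a3": {A}, "a4": {C}, "a5": set()}
controls = {"c1": {B}, "c2": set()}

result = als_only_variants(als, controls, min_fraction=0.22)
print(result)
```

In the toy data, variant B is discarded because it appears in a control, variant C is ALS-only but falls below the frequency floor, and only variant A is retained.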

C9ORF72 hexanucleotide repeats are the most prevalent ALS mutation known to date, affecting 5-10% of all cases and up to 34% of familial ALS (fALS) cases. Of the detected variants in the ALS-only samples, we focused on the variants present in greater than 22% of the ALS population to determine candidate genes with a higher association than previously accepted. We found that there were 23 variants in 22 different genes that reached this level of significance, with a p-value of less than 2.2 × 10⁻¹⁶, all of which are more significant than the reported C9ORF72 population (Table 2). Top variants of interest and their significance are shown. Table 2 shows the 22 genes that are not mutated in the control sample. The gene names, the number of ALS cases out of the 713-patient cohort, the percentage of total ALS cases with the 99% CL Clopper-Pearson interval, and the p-value are shown, respectively.

Table 2.

Table 3 shows the sensitivity and specificity of combined loci in detecting ALS. The sensitivity for any number of combined mutations and the specificity are shown.

Table 3.

Genomic Mutations in Genes of the ALS-only Population.

We identified variants at a gene level that were present in the ALS population and not in healthy controls. We sorted and analyzed candidate genes based on the presence of at least one variant rather than on individual variants. This approach would allow us to identify gene modifiers involved in ALS pathology. We found 22 individual genes that were each mutated in over 21% of the ALS cases (Table 2). One gene, NDUFS4, was mutated in over 30% of the ALS population. The high proportion of ALS patients with variants in the 22 identified genes suggests that mutations in these genes are associated with ALS disease in a large proportion of patients.
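The gene-level counting rule described above (a patient counts once for a gene if the patient carries at least one variant in that gene) can be sketched as follows. The input is assumed to have been filtered already to ALS-only variants. The variant identifiers, gene labels, and function name are placeholders.

```python
from typing import Dict, Set


def gene_level_frequency(
    patient_variants: Dict[str, Set[str]],
    variant_to_gene: Dict[str, str],
) -> Dict[str, float]:
    """Fraction of patients carrying at least one variant in each gene."""
    carriers: Dict[str, Set[str]] = {}
    for patient, variants in patient_variants.items():
        for variant in variants:
            gene = variant_to_gene.get(variant)
            if gene is not None:
                # A set ensures a patient is counted once per gene, even
                # when several variants fall in the same gene.
                carriers.setdefault(gene, set()).add(patient)
    n_patients = len(patient_variants)
    return {gene: len(who) / n_patients for gene, who in carriers.items()}


# Toy data: v1 and v2 map to the same placeholder gene.
variant_to_gene = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B"}
patient_variants = {
    "p1": {"v1", "v2"},  # two variants in GENE_A, counted once
    "p2": {"v3"},
    "p3": {"v2"},
    "p4": set(),
}
freq = gene_level_frequency(patient_variants, variant_to_gene)
print(freq)
```

Counting carriers per gene rather than per variant is what allows a gene to reach a high frequency even when no single variant in it does.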

Clinical Evaluation of ALS patients

Multiple clinical phenotypes can be associated with the onset of ALS, including age and area of clinical disease initiation¹⁵. We compared the overall age of onset for the entire ALS population and the patients we can diagnose with our classifier analysis. There are no statistically significant differences detected in the age of onset for our diagnosable patients compared to the overall ALS population.

Next, we compared the three different areas of disease onset, “axial,” “bulbar,” and “limb,” and each potential correlation to our candidate genetic mutations. We found no significant difference between the clinical appearance of disease onset in our top gene candidate patients and the symptoms displayed in the overall ALS population.

Classifier Analysis for the Top Candidate Genes in the ALS-only Population

Diagnostic testing based on novel gene sequence identification could serve as an early disease detection tool. We designed a classifier analysis to determine whether our top 22 gene candidates, singly or in combination, could be used as a statistical tool to associate and identify the ALS-only population. Evaluation of the top 22 candidates led to the discovery that a majority of the ALS samples had at least one or two of our 23 specific loci mutated and peaked at 17-20 loci and 22-23 loci, respectively. Our results show that the sensitivity of detecting ALS was 58.62% ± 4.8% when at least one of the 23 loci was mutated, and 40.4% ± 4.7% at 99% CL when at least two of the 23 loci were mutated, with a specificity of 100% because none of the 22 genes or 23 loci was mutated in the normal population.

Our results also show that the probability of detecting ALS increases as the number of positive test results for the 23 loci increases (Table 3).
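The sensitivity of an "at least k of 23 loci" rule, together with an approximate margin, can be sketched as below. The margin uses the normal (Wald) approximation as a simplification; it is a stand-in for, not a reproduction of, the Clopper-Pearson method cited in the Statistical Methods. The per-patient counts in the toy list are invented, and only the two percentages and the cohort size of 713 are taken from the text.

```python
import math
from typing import List, Tuple

Z_99 = 2.5758  # two-sided 99% quantile of the standard normal distribution


def wald_margin(p: float, n: int, z: float = Z_99) -> float:
    """Normal-approximation half-width for a proportion p observed in n cases."""
    return z * math.sqrt(p * (1.0 - p) / n)


def sensitivity_at_least(loci_counts: List[int], k: int) -> Tuple[float, float]:
    """Sensitivity of the rule 'at least k mutated loci', with its Wald margin."""
    n = len(loci_counts)
    p = sum(1 for c in loci_counts if c >= k) / n
    return p, wald_margin(p, n)


# Margins implied by the reported sensitivities for the 713-patient cohort.
margin_one = wald_margin(0.5862, 713)
margin_two = wald_margin(0.404, 713)
print(f"at least one locus:  ± {100 * margin_one:.1f}%")
print(f"at least two loci:   ± {100 * margin_two:.1f}%")

# Toy per-patient counts of mutated loci (invented for illustration).
toy_counts = [0, 0, 1, 2, 3, 0, 1, 5, 0, 2]
sens_one, _ = sensitivity_at_least(toy_counts, 1)
sens_two, _ = sensitivity_at_least(toy_counts, 2)
print(sens_one, sens_two)
```

Under this approximation the margins come out at about 4.8 and 4.7 percentage points, in agreement with the reported figures. Raising the threshold k can only lower the sensitivity, since every patient who meets the stricter rule also meets the looser one.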

FIG. 6 illustrates the distribution of candidate ALS-only mutated genes and the probability of having ALS or not having ALS based on the number of positive or negative results on mutations. The distribution of numbers of variants found out of the 23 genomic loci in the 713 ALS cases is shown in grey. The diamond plus represents the probability of having ALS, which shows an increasing probability with increasing positive numbers of variants. Further, the star represents the probability of not having ALS, which shows a decreasing probability with increasing positive numbers of variants.

References

1 Tandan R., B. W. G. Amyotrophic Lateral Sclerosis: Part 1. Clinical Features, Pathology, and Ethical Issues in Management. Annals of Neurology 18, 271-280 (1985).

2 Petrov, D., Mansfield, C., Moussy, A. & Hermine, O. ALS Clinical Trials Review: 20 Years of Failure. Are We Any Closer to Registering a New Treatment? Front Aging Neurosci 9, 68, doi: 10.3389/fnagi.2017.00068 (2017).

3 Zinman, L. & Cudkowicz, M. Emerging targets and treatments in amyotrophic lateral sclerosis. The Lancet Neurology 10, 481-490, doi: 10.1016/S1474-4422(11)70024-2 (2011).

4 Eisen, A. Amyotrophic Lateral Sclerosis is a Multifactorial Disease. Muscle & Nerve 18, 741-752 (1995).

5 Miller, S. J. Astrocyte Heterogeneity in the Adult Central Nervous System. Front Cell Neurosci 12, 401, doi: 10.3389/fncel.2018.00401 (2018).

6 Miller, S. J., Glatzer, J. C., Hsieh, Y. C. & Rothstein, J. D. Cortical astroglia undergo transcriptomic dysregulation in the G93A SOD1 ALS mouse model. J Neurogenet 32, 322-335, doi: 10.1080/01677063.2018.1513508 (2018).

7 Miller, S. J., Zhang, P. W., Glatzer, J. & Rothstein, J. D. Astroglial transcriptome dysregulation in early disease of an ALS mutant SOD1 mouse model. J Neurogenet 31, 37-48, doi: 10.1080/01677063.2016.1260128 (2017).

8 Boylan, K. Familial Amyotrophic Lateral Sclerosis. Neurol Clin 33, 807-830, doi: 10.1016/j.ncl.2015.07.001 (2015).

9 Mejzini, R. et al. ALS Genetics, Mechanisms, and Therapeutics: Where Are We Now? Front Neurosci 13, 1310, doi: 10.3389/fnins.2019.01310 (2019).

10 Desvignes, J. P. et al. VarAFT: a variant annotation and filtration system for human next generation sequencing data. Nucleic Acids Res 46, W545-W553, doi: 10.1093/nar/gky471 (2018).

11 Clopper, C. J., Pearson, E.S. The Use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26, 404-413 (1934).

12 Fisher, R. A. On the interpretation of χ² from contingency tables, and the calculation of P. Journal of the Royal Statistical Society 85, 87-94 (1922).

13 Renton, A. E. et al. A hexanucleotide repeat expansion in C9ORF72 is the cause of chromosome 9p21-linked ALS-FTD. Neuron 72, 257-268, doi: 10.1016/j.neuron.2011.09.010 (2011).

14 Szklarczyk, D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res 47, D607-D613, doi: 10.1093/nar/gky1131 (2019).

15 Ravits, J., Paul, P., Jorg C. Focality of upper and lower motor neuron degeneration at the clinical onset of ALS. Neurology 68, 1571-1575 (2007).

16 Rothstein, J. D., Svendsen, C.N., Cudkowicz, M., Berry, J., Maragakis, N., Sherman, A., Sareen, D., Finkbeiner, S., Fraenkel E. Answer ALS: A clinical and comprehensive multi-omics signature for ALS employing induced pluripotent stem cell derived motor neurons from 1000 sporadic and familial ALS patients nationwide. Annals of Neurology 80, S243-S243 (2016).

17 Zhang, K. et al. The C9orf72 repeat expansion disrupts nucleocytoplasmic transport. Nature 525, 56-61, doi: 10.1038/nature14973 (2015).

18 Renton, A. E., Chio, A. & Traynor, B. J. State of play in amyotrophic lateral sclerosis genetics. Nat Neurosci 17, 17-23, doi: 10.1038/nn.3584 (2014).

19 Felbecker, A. et al. Four familial ALS pedigrees discordant for two SOD1 mutations: are all SOD1 mutations pathogenic? J Neurol Neurosurg Psychiatry 81, 572-577, doi: 10.1136/jnnp.2009.192310 (2010).

20 Liu, K. X. et al. Neuron-specific antioxidant OXR1 extends survival of a mouse model of amyotrophic lateral sclerosis. Brain 138, 1167-1181, doi: 10.1093/brain/awv039 (2015).

21 Tabula Muris, C. et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367-372, doi: 10.1038/s41586-018-0590-4 (2018).

22 Molasy, M., Walczak, A., Szaflik, J., Szaflik, J. P. & Majsterek, I. MicroRNAs in glaucoma and neurodegenerative diseases. J Hum Genet 62, 105-112, doi: 10.1038/jhg.2016.91 (2017).

Claims

WHAT IS CLAIMED IS:

1. A method for detecting amyotrophic lateral sclerosis in a subject, comprising the step of: amplifying DNA extracted from a biological sample of a subject by target-specific polymerase chain reaction to amplify specific genomic loci comprising 23 specific chromosome positions of chr1:25854953 (chromosome 1 at nucleotide position 25854953), chr1:3624870, chr3:158557839, chr3:185543848, chr3:186923875, chr4:17685198, chr4:180358067, chr5:53655366, chr5:82813472, chr5:94666955, chr7:5338617, chr8:62196626, chr9:71428255, chr9:89866631, chr9:130224292, chr10:119712877, chr10:119712899, chr12:82295320, chr15:25687571, chr15:74926032, chr17:2562894, chr17:40390624, and chr22:41330858, purifying and sequencing the amplified DNA; analyzing each of the amplified DNA sequences and comparing with its corresponding DNA sequence of the normal genomic loci, identifying one or more mutations, if present, at the 23 chromosome positions, and detecting amyotrophic lateral sclerosis in the subject if one or more mutations are present in the 23 chromosome positions.

2. The method according to Claim 1, wherein amyotrophic lateral sclerosis is detected in the subject if two or more mutations are present in the 23 chromosome positions.

3. The method according to Claim 1, wherein the one or more mutations include single nucleotide polymorphisms or insertions and deletions.

4. The method according to Claim 1, wherein the 23 chromosome positions are located in 22 genes selected from the group consisting of: AL033528.3, THRAP3, AC106707.1, LIPH, AC007690.1, FAM184B, AC096747.1-NDUFB5P1, NDUFS4, RPL5P16-AC008885.1, SLF1, TNRC18, AC023095.1, TRPM3, AL161629.1, NCS1, TXNP1-INPP5F, CCDC59, ATP10A, COX5A, RN7SL33P, TOP2A, and ZC3H7B.

5. The method according to Claim 1, wherein the biological sample is blood, a tissue sample, or a cell.

6. A method for detecting amyotrophic lateral sclerosis in a subject, comprising the step of: sequencing 16 target genes from a biological sample of a subject, wherein the specific genes are MIR7155, NPM1P49, RP11-20B24.3, HNRNPA1P44, OXR1, H2AFZP1, TAB3P1, RPL5P35, ZNF92P2, CIR1P3, GNAI2, CCDC42, RP11-370110.6, ADIPOR1P1, KIAA1841, and AC008074.4, comparing each of the DNA sequences of the 16 target genes with its corresponding normal genes, identifying one or more mutations, if present, in each of the DNA sequences of the 16 target genes, and detecting amyotrophic lateral sclerosis in the subject if at least one of the 16 target genes has one or more mutations.

7. The method according to Claim 6, wherein amyotrophic lateral sclerosis is detected in the subject if at least two of the 16 target genes have one or more mutations.

8. The method according to Claim 6, wherein amyotrophic lateral sclerosis is detected in the subject if at least three of the 16 target genes have one or more mutations.

9. The method according to Claim 6, wherein the biological sample is blood, a tissue sample, or a cell.