WO2025014806A1 - Multiple-ancestry-based polygenic risk assessment for breast cancer - Google Patents
- Publication number
- WO2025014806A1 (application PCT/US2024/036902)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- ancestry
- trait
- bases
- risk
- subject
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61P—SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
- A61P35/00—Antineoplastic agents
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Definitions
- This disclosure relates to the fields of genetics and medicine. More particularly, this disclosure relates to methods for assessing and predicting polygenic traits and breast cancer risks for medical use, as well as treating breast cancer.
- An important drawback of conventional methods for characterizing the risk of a trait from genomic data is that baseline data for a trait in one population may not accurately predict the same trait in a population of a different heritage.
- Conventional methods that use genomic data from a population of one heritage can overestimate the risk of a particular trait in a population of a different heritage. Overestimation of risk is a significant drawback, especially for disease traits.
- Another drawback of conventional methods for determining traits such as cancer risk is that the calculations often depend on self-reported heritage information. Errors in self-reported heritage can prevent appropriate determination of cancer risk for a global population.
- A further significant drawback of conventional methods for determining the risk of a trait is poor discrimination between low and high risk across different populations. For example, a conventional breast cancer risk method based on genomic data from one heritage may be unable to distinguish low risk from high risk in a population of a different heritage. This can confound prevention and treatment strategies for a disease trait and jeopardize patient outcomes.
- This disclosure provides improved methods for determining polygenic traits, such as risks for breast cancer.
- The methods of this disclosure can be used in medicine, including for treating diseases for which risk is identified and/or assessed.
- Methods of this disclosure may provide superior prediction of clinical risk in breast cancer patients.
- The methods of this disclosure can provide polygenic risk prediction for breast cancer that can be applied globally to patients of all heritage groups.
- A multiple-ancestry polygenic risk score of this disclosure can be used to assess the expectation of a clinical trait or condition such as cancer.
- Aspects of this disclosure can characterize an individual's risk of a trait from genomic data obtained for the trait in one or more particular heritage groups or populations, where the individual may be of the same or a different heritage group or population.
- Embodiments of this disclosure can provide a multiple-ancestry polygenic risk score for a trait in an individual using genomic data from a population of a different heritage than the individual's, without overestimating the risk of the trait in the individual.
- This disclosure contemplates accurately determining a trait such as cancer risk using genomic data of individuals who self-report heritage information.
- A multiple-ancestry polygenic risk score of this disclosure can be used to accurately determine cancer risk for a global population, regardless of any errors in self-reported heritage information.
- This disclosure provides methods for determining the risk of a trait with sufficient discrimination between low and high risk across different populations.
- This disclosure includes methods for assessing breast cancer risk based on a multiple-ancestry polygenic risk score that can distinguish between low and high risk for a population or individual of any heritage.
- The methods of this disclosure can provide prevention and treatment strategies for a disease trait and improve patient outcomes.
- This disclosure provides well-calibrated, accurate methods for determining multiple-ancestry polygenic risk scores for traits such as breast cancer risk that avoid overestimation. The methods of this disclosure can be useful for all heritage populations, regardless of self-reported patient data, and can improve medical care and patient treatment.
- This disclosure provides methods for assessing traits such as breast cancer risk with enhanced discrimination of risk level for all populations, regardless of heritage. Methods of this disclosure can be efficiently brought to the point of care.
- Methods of this disclosure further contemplate using various trait risk markers, which may be single nucleotide polymorphisms (SNPs) or genomic loci.
- SNPs of this disclosure may be associated with breast cancer risk in one or more heritage groups. Combinations of SNPs can be used to provide a multiple-ancestry polygenic risk score (MA-PRS), which can stratify unaffected patients for breast cancer risk, irrespective of the presence or absence of a family history of the disease.
- The genomic loci of this disclosure may be specific portions of the genome that include a SNP associated with breast cancer risk in one or more heritage groups, together with additional bases in high linkage disequilibrium with that risk-associated SNP.
- These additional bases may be physically close to the SNP.
- These genomic loci may be within about 200, 100, 50, 40, 30, 20, or 10 bases of the breast cancer risk-associated SNP. Additionally or alternatively, they may be within about 0.2, 0.1, or 0.05 centimorgans (cM) of the breast cancer risk-associated SNP.
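The base-pair and centimorgan windows above amount to a membership test around each risk-associated SNP. The sketch below is illustrative only: the `Variant` record, the function name `within_locus`, and the choice to require every supplied window are assumptions, not part of the disclosure. It assumes physical positions (bp) and genetic-map positions (cM) are already known for each variant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record: chromosome plus physical and genetic-map position."""
    chrom: str
    pos_bp: int      # physical position in base pairs
    pos_cm: float    # genetic-map position in centimorgans


def within_locus(candidate: Variant, anchor: Variant,
                 max_bp: Optional[int] = 200,
                 max_cm: Optional[float] = None) -> bool:
    """Return True if `candidate` lies inside the locus around the `anchor` SNP.

    Every window that is supplied (bp and/or cM) must be satisfied; pass None
    to disable a criterion. If no criterion is supplied, nothing matches.
    """
    if candidate.chrom != anchor.chrom:
        return False
    checks = []
    if max_bp is not None:
        checks.append(abs(candidate.pos_bp - anchor.pos_bp) <= max_bp)
    if max_cm is not None:
        checks.append(abs(candidate.pos_cm - anchor.pos_cm) <= max_cm)
    return bool(checks) and all(checks)
```

Requiring all supplied windows is one reading of "additionally or alternatively"; passing only one window gives the "alternatively" case.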
- Aspects of this disclosure provide methods for polygenic risk scoring that rely on a unique set of SNPs or a unique set of genomic loci discovered through designated criteria.
- The unique set of SNP markers or genomic loci may be selected using a novel synthetic stepwise regression methodology that accounts for linkage disequilibrium.
- This unique set of SNPs or genomic loci for polygenic risk estimation can discriminate risk for all heritage groups and populations.
- The set of SNP markers for polygenic risk estimation disclosed herein provides accurate results for all heritage groups and populations, substantially without bias toward any population or heritage group.
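The synthetic stepwise regression is not spelled out here, so the sketch below swaps in a simpler, well-known technique, greedy LD clumping, purely to illustrate what accounting for linkage disequilibrium during marker selection can look like. Markers are taken in order of significance, and any marker in strong LD with one already kept is skipped. The function name and thresholds (`r2_max=0.1`, `p_max=5e-8`) are illustrative assumptions; this is not the disclosure's method.

```python
import numpy as np


def ld_clump(pvalues, r2, r2_max=0.1, p_max=5e-8):
    """Greedy LD clumping (a stand-in, not the synthetic stepwise regression).

    pvalues: association p-value for each candidate SNP.
    r2: square matrix of pairwise LD (r^2) between the same SNPs.
    Returns indices of retained SNPs, most significant first.
    """
    pvalues = np.asarray(pvalues, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    selected = []
    for idx in np.argsort(pvalues):            # most significant first
        if pvalues[idx] > p_max:
            break                              # all remaining SNPs are weaker
        # Keep the SNP only if it is not in strong LD with any kept SNP.
        if all(r2[idx, s] <= r2_max for s in selected):
            selected.append(int(idx))
    return selected
```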
- Additional classes of markers or elements can include age, family history, breast density, and hormone exposure.
- The clinical utility of this disclosure may include superior prediction of clinical risk for breast cancer patients of all ancestries.
- A multiple-ancestry polygenic risk score obtained by the methods of this disclosure can provide surprisingly increased accuracy in determining breast cancer risk.
- Methods of this disclosure can provide surprisingly accurate determination of polygenic traits and risks by assessing and including contributions of a wide range of markers for different ancestries.
- Embodiments of this disclosure contemplate determining the levels of polygenic traits and risks in the form of a score based on various genomic risk loci.
- The genomic risk loci can be discretely identified and defined, so that accurate determination can be done by genotyping subjects.
- The genomic risk loci can include genomic risk markers for breast cancer, which are combined with additional risk markers that can be specifically informative for breast cancer.
- Embodiments of this disclosure include a method for assessing a risk of a trait in a subject, the method comprising: selecting a plurality of trait-associated SNP markers and a plurality of ancestry-informative SNP markers; measuring a genotype of the subject; and calculating a multiple-ancestry polygenic risk score for the risk of the trait in the subject based on the trait-associated SNP markers and the ancestry-informative SNP markers.
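A common way to compute each ancestry-specific component is an additive score: the number of effect alleles at each marker times that marker's per-allele log odds ratio, summed over markers. The sketch below assumes that standard additive form and a shared marker panel (a marker unused by one ancestry simply gets weight 0). Function names and test weights are hypothetical; the disclosure's own weights are not reproduced here.

```python
import numpy as np


def polygenic_score(dosages, log_odds_ratios):
    """Additive PRS: sum of effect-allele dosage x per-allele log odds ratio.

    dosages: effect-allele counts (0, 1, 2), or imputed dosages in [0, 2].
    log_odds_ratios: per-allele log odds ratios, aligned marker-by-marker.
    """
    x = np.asarray(dosages, dtype=float)
    w = np.asarray(log_odds_ratios, dtype=float)
    if x.shape != w.shape:
        raise ValueError("dosages and weights must be aligned marker-by-marker")
    return float(np.dot(x, w))


def ancestry_specific_scores(dosages, weights_by_ancestry):
    """Score one genotype with a separate weight vector per ancestry.

    Assumes every weight vector covers the same marker panel, with zero
    weight for markers that a given ancestry's score does not use.
    """
    return {name: polygenic_score(dosages, w)
            for name, w in weights_by_ancestry.items()}
```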
- The trait-associated SNPs can be selected using a synthetic stepwise regression methodology that accounts for linkage disequilibrium between variants.
- The trait-associated SNPs can comprise one or more of European, African, East Asian, and Amerindian breast cancer-associated SNPs.
- In the method above, the trait may be a risk of a disease in the subject, such as cancer.
- The plurality of ancestry-informative SNP markers can comprise from 10 to 50,000 SNP markers.
- The plurality of ancestry-informative SNP markers can comprise from 10 to 56 SNP markers.
- The trait-associated SNP markers can be a plurality of cancer-associated SNP markers.
- The trait-associated SNP markers can comprise from 10 to 50,000 breast cancer-associated SNP markers.
- The trait-associated SNP markers can comprise from 10 to 329 breast cancer-associated SNP markers.
- In the method above, the multiple-ancestry polygenic risk score for the risk of the trait in the subject may be calculated using training clinical data from a reference group.
- The multiple-ancestry polygenic risk score for the risk of the trait in the subject may be calculated using validation clinical data from a reference group.
- The genotype of the subject may be measured by next-generation sequencing (NGS).
- The genotype of the subject may be determined with a sequencing chip.
- The plurality of ancestry-informative SNP markers may determine a fractional heritage in the subject's genotype for each of four or more different heritage populations.
- The plurality of ancestry-informative SNP markers may determine a fractional heritage in the subject's genotype for each of the African, European, East Asian, and Amerindian heritage populations.
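Fractional heritage can be estimated from ancestry-informative markers in several ways. One standard approach, assumed here and not necessarily the disclosure's, is supervised admixture estimation: hold reference-population allele frequencies fixed and fit the subject's ancestry fractions by expectation-maximization (EM) under a binomial genotype model. The function and argument names are hypothetical.

```python
import numpy as np


def estimate_ancestry_fractions(genotypes, ref_freqs, n_iter=500, tol=1e-8):
    """Supervised admixture estimate by EM with fixed reference allele frequencies.

    genotypes: (n_snps,) counts (0, 1, 2) of the counted allele in the subject.
    ref_freqs: (n_pops, n_snps) frequency of that allele in each reference population.
    Returns q: (n_pops,) non-negative ancestry fractions summing to 1.
    """
    g = np.asarray(genotypes, dtype=float)
    p = np.clip(np.asarray(ref_freqs, dtype=float), 1e-6, 1 - 1e-6)
    k = p.shape[0]
    q = np.full(k, 1.0 / k)                       # start from equal fractions
    for _ in range(n_iter):
        f = q @ p                                 # subject's expected allele frequency per SNP
        # E-step: posterior share of each population for counted and other alleles.
        a = (q[:, None] * p) / f
        b = (q[:, None] * (1 - p)) / (1 - f)
        # M-step: average those shares over all 2 * n_snps allele copies.
        q_new = (a @ g + b @ (2 - g)) / (2 * g.size)
        if np.abs(q_new - q).max() < tol:
            return q_new
        q = q_new
    return q
```

Each update keeps the fractions summing to 1 without renormalization. For the four populations named above, `ref_freqs` would have four rows; with few markers the estimates carry more uncertainty.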
- The trait may be a risk of a disease in the subject, such as cancer.
- In the method above, calculating the multiple-ancestry polygenic risk score may comprise using clinical cohorts of women of African, East Asian, and European self-reported ancestry.
- In the method above, calculating the multiple-ancestry polygenic risk score may comprise summing ancestry-specific polygenic risk scores weighted according to fractional ancestral composition.
- The multiple-ancestry polygenic risk score can be strongly associated with breast cancer in a reference cohort and in sub-cohorts defined by self-reported ancestry.
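Discrimination within self-reported ancestry sub-cohorts is often summarized by the area under the ROC curve (AUC). The sketch below is an assumption about how such an evaluation could be run, not the disclosure's stated metric. It computes AUC through the Mann-Whitney statistic and reports it per group; function names are hypothetical.

```python
import numpy as np


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic; tied case/control pairs count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    cases, controls = scores[labels], scores[~labels]
    if cases.size == 0 or controls.size == 0:
        raise ValueError("need at least one case and one control")
    # All case/control pairs are compared, so memory is O(n_cases * n_controls).
    greater = (cases[:, None] > controls[None, :]).sum()
    ties = (cases[:, None] == controls[None, :]).sum()
    return float((greater + 0.5 * ties) / (cases.size * controls.size))


def auc_by_group(scores, labels, groups):
    """AUC computed separately within each group (e.g. self-reported ancestry)."""
    scores, labels, groups = map(np.asarray, (scores, labels, groups))
    return {str(g): auc(scores[groups == g], labels[groups == g])
            for g in np.unique(groups)}
```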
- The multiple-ancestry polygenic risk score can be combined with clinical and/or biological risk factors for accurate risk stratification of women of all ancestries.
- In the method above, calculating the multiple-ancestry polygenic risk score may comprise calculating and combining: an African-specific PRS ($\mathrm{PRS}_{AT}$), an East Asian-specific PRS ($\mathrm{PRS}_{EA}$), a European-specific PRS ($\mathrm{PRS}_{EU}$), and an Amerindian-specific PRS ($\mathrm{PRS}_{AI}$); an estimated weight for each ancestry, namely African ($\beta_{AT}$), East Asian ($\beta_{EA}$), European ($\beta_{EU}$), and Amerindian ($\beta_{AI}$); and an Amerindian SNP genotype ($x_{Am}$) with weight $\beta_{Am}$, according to the following equation:
$$\mathrm{MA\text{-}PRS} = \beta_{AT}\,\mathrm{PRS}_{AT} + \beta_{EA}\,\mathrm{PRS}_{EA} + \beta_{EU}\,\mathrm{PRS}_{EU} + \beta_{AI}\,\mathrm{PRS}_{AI} + \beta_{Am}\,x_{Am}$$
- $\mathrm{PRS}_{AT}$: African-specific PRS
- $\mathrm{PRS}_{EA}$: East Asian-specific PRS
- $\mathrm{PRS}_{EU}$: European-specific PRS
- $\mathrm{PRS}_{AI}$: Amerindian-specific PRS
- $x_{Am}$: Amerindian SNP genotype
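Given the ancestry-specific scores and weights, the multiple-ancestry combination is a plain weighted sum. In this minimal sketch the weights come from the caller. The sketch leaves open whether each beta is a fitted coefficient or the subject's fractional ancestry, as the weighting "according to fractional ancestral composition" would suggest. Treating `x_am` as an allele count is an assumption, and all test numbers are made up.

```python
ANCESTRIES = ("AT", "EA", "EU", "AI")  # African, East Asian, European, Amerindian


def ma_prs(prs, beta, x_am, beta_am):
    """MA-PRS = sum_k beta_k * PRS_k + beta_Am * x_Am over the four ancestry terms.

    prs, beta: dicts keyed by the ancestry labels above.
    x_am: Amerindian SNP genotype (assumed here to be an allele count 0, 1 or 2).
    beta_am: weight applied to x_am.
    """
    missing = [k for k in ANCESTRIES if k not in prs or k not in beta]
    if missing:
        raise KeyError(f"missing ancestry terms: {missing}")
    return sum(beta[k] * prs[k] for k in ANCESTRIES) + beta_am * x_am
```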
- A method for assessing a risk of a trait in a subject comprises: selecting a plurality of trait-associated genomic loci and a plurality of ancestry-informative genomic loci; measuring a genotype of the subject; and calculating a multiple-ancestry polygenic risk score for the risk of the trait in the subject based on the trait-associated genomic loci and the ancestry-informative genomic loci.
- The trait-associated genomic loci can be selected using a synthetic stepwise regression methodology that accounts for linkage disequilibrium between variants.
- The trait-associated genomic loci can comprise one or more of European, African, East Asian, and Amerindian breast cancer-associated genomic loci.
- The genomic loci may be regions of the genome within about 200, 100, 50, 40, 30, 20, or 10 bases of the trait-associated SNPs.
- The genomic loci may be regions of the genome within about 0.2, 0.1, or 0.05 centimorgans (cM) of the trait-associated SNPs.
- The genomic loci may be regions of the genome within about 200, 100, 50, 40, 30, 20, or 10 bases of the ancestry-informative genomic loci.
- The genomic loci may be regions of the genome within about 0.2, 0.1, or 0.05 cM of the ancestry-informative genomic loci.
- The multiple-ancestry polygenic risk score for the risk of the trait in the subject may be calculated with additional clinical variables of the subject, such as age, personal medical history, and family medical history.
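Combining the multiple-ancestry score with clinical variables is often done with a regression model. The sketch below assumes a logistic model with only age and family history for brevity; personal medical history is omitted. The coefficient names and values are hypothetical placeholders, not values from the disclosure. Real coefficients would be fit on a reference cohort.

```python
import math


def combined_risk(ma_prs, age, family_history, coef):
    """Hypothetical logistic combination of MA-PRS with clinical covariates.

    coef: dict with keys 'intercept', 'prs', 'age', 'family_history'
          (illustrative placeholders only).
    Returns a probability in (0, 1).
    """
    eta = (coef["intercept"] + coef["prs"] * ma_prs
           + coef["age"] * age + coef["family_history"] * float(family_history))
    # Numerically stable logistic function (avoids overflow for large |eta|).
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)
```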
- In the method above, the trait may be a risk of a disease in the subject, such as cancer.
- The plurality of ancestry-informative genomic loci can comprise from 10 to 50,000 genomic loci.
- The plurality of ancestry-informative genomic loci can comprise from 10 to 56 genomic loci.
- The trait-associated genomic loci can be a plurality of cancer-associated genomic loci.
- The trait-associated genomic loci can comprise from 10 to 50,000 breast cancer-associated genomic loci.
- The trait-associated genomic loci can comprise from 10 to 329 breast cancer-associated genomic loci.
- In the method above, the multiple-ancestry polygenic risk score for the risk of the trait in the subject may be calculated using training clinical data from a reference group.
- The multiple-ancestry polygenic risk score for the risk of the trait in the subject may be calculated using validation clinical data from a reference group.
- The genotype of the subject may be measured by NGS.
- In the method above, the plurality of ancestry-informative genomic loci may determine a fractional heritage in the subject's genotype for each of four or more different heritage populations.
- In the method above, the plurality of ancestry-informative genomic loci may determine a fractional heritage in the subject's genotype for each of the African, European, East Asian, and Amerindian heritage populations.
- The multiple-ancestry polygenic risk score for the risk of the trait in the subject can be calibrated for subjects in four or more different heritage populations so that the risk of the trait is not overestimated in any heritage population.
- The multiple-ancestry polygenic risk score for the risk of the trait in the subject can be calibrated for subjects in the African, European, East Asian, and Amerindian heritage populations so that the risk of the trait is not overestimated in any heritage population.
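Whether a score overestimates risk in a population can be checked by comparing observed events with the sum of predicted risks in that group. This is a minimal sketch. It assumes per-subject predicted absolute risks and observed outcomes are available; the function name is hypothetical. An observed/expected ratio well below 1 indicates overestimation in that group.

```python
import numpy as np


def observed_expected_by_group(predicted_risk, outcome, groups):
    """Observed/expected event ratio per group (O/E < 1 means risk is overestimated)."""
    predicted_risk = np.asarray(predicted_risk, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    groups = np.asarray(groups)
    result = {}
    for g in np.unique(groups):
        mask = groups == g
        expected = predicted_risk[mask].sum()   # events the model predicts
        observed = outcome[mask].sum()          # events that actually occurred
        result[str(g)] = float(observed / expected) if expected > 0 else float("nan")
    return result
```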
- The trait may be a risk of a disease in the subject, such as cancer.
- In the method above, calculating the multiple-ancestry polygenic risk score may comprise using clinical cohorts of women of African, East Asian, and European self-reported ancestry.
- In the method above, calculating the multiple-ancestry polygenic risk score may comprise summing ancestry-specific polygenic risk scores weighted according to fractional ancestral composition.
- In the method above, calculating the multiple-ancestry polygenic risk score may comprise calculating and combining: an African-specific PRS ($\mathrm{PRS}_{AT}$), an East Asian-specific PRS ($\mathrm{PRS}_{EA}$), a European-specific PRS ($\mathrm{PRS}_{EU}$), and an Amerindian-specific PRS ($\mathrm{PRS}_{AI}$); an estimated weight for each ancestry, namely African ($\beta_{AT}$), East Asian ($\beta_{EA}$), European ($\beta_{EU}$), and Amerindian ($\beta_{AI}$); and an Amerindian SNP genotype ($x_{Am}$) with weight $\beta_{Am}$, according to the following equation:
$$\mathrm{MA\text{-}PRS} = \beta_{AT}\,\mathrm{PRS}_{AT} + \beta_{EA}\,\mathrm{PRS}_{EA} + \beta_{EU}\,\mathrm{PRS}_{EU} + \beta_{AI}\,\mathrm{PRS}_{AI} + \beta_{Am}\,x_{Am}$$
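- $\mathrm{PRS}_{AT}$: African-specific PRS
- $\mathrm{PRS}_{EA}$: East Asian-specific PRS
- $\mathrm{PRS}_{EU}$: European-specific PRS
- $\mathrm{PRS}_{AI}$: Amerindian-specific PRS
- $\beta_{AT}$: estimated weight for African ancestry
- $\beta_{EA}$: estimated weight for East Asian ancestry
- $\beta_{EU}$: estimated weight for European ancestry
- $\beta_{AI}$: estimated weight for Amerindian ancestry
- $x_{Am}$: Amerindian SNP genotype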
- Embodiments of this disclosure further contemplate methods for treating a disease in a subject in need thereof, the method comprising: selecting a plurality of disease-associated SNP markers and a plurality of ancestry-informative SNP markers; measuring a genotype of the subject; calculating a multiple-ancestry polygenic risk score for the risk of the disease in the subject based on the disease-associated SNP markers and the ancestry-informative SNP markers, wherein the score indicates a need for treating the subject; and administering to the subject a therapy for the disease.
- The method above may further comprise calculating the multiple-ancestry polygenic risk score with additional variables for age, personal medical history, and family medical history.
- The therapy may be a cancer therapy selected from one or more of surgery, cryoablation, radiation therapy, bone marrow transplant, chemotherapy, immunotherapy, hormone therapy, stem cell therapy, drug therapy, biological therapy, and administration of a pharmaceutical, prophylactic, or therapeutic compound.
- The disease may be breast cancer, and the therapy may be a breast cancer therapy.
- Embodiments of this disclosure further contemplate methods for treating a disease in a subject in need thereof, the method comprising: selecting a plurality of disease-associated genomic loci and a plurality of ancestry-informative genomic loci; measuring a genotype of the subject; calculating a multiple-ancestry polygenic risk score for the risk of the disease in the subject based on the disease-associated genomic loci and the ancestry-informative genomic loci, wherein the score indicates a need for treating the subject; and administering to the subject a therapy for the disease.
- The method above may further comprise calculating the multiple-ancestry polygenic risk score with additional variables for age, personal medical history, and family medical history.
- The therapy may be a cancer therapy selected from one or more of surgery, cryoablation, radiation therapy, bone marrow transplant, chemotherapy, immunotherapy, hormone therapy, stem cell therapy, drug therapy, biological therapy, and administration of a pharmaceutical, prophylactic, or therapeutic compound.
- The disease may be breast cancer, and the therapy may be a breast cancer therapy.
- This disclosure includes methods for diagnosing or prognosing a subject having a disease, the method comprising: selecting a plurality of disease-associated SNP markers and a plurality of ancestry-informative SNP markers; measuring a genotype of the subject; and calculating a multiple-ancestry polygenic risk score for the risk of the disease in the subject based on the disease-associated SNP markers and the ancestry-informative SNP markers, wherein the score indicates a diagnosis or prognosis for the subject.
- The disease may be cancer.
- This disclosure includes methods for generating data for assessing a trait in a subject, the method comprising: selecting a plurality of disease-associated SNP markers and a plurality of ancestry-informative SNP markers; measuring a genotype of the subject; and measuring trait-associated SNP markers in the genotype of the subject.
- The method above may further comprise determining additional clinical variables of the subject, such as age, personal medical history, and family medical history.
- The trait may be a risk of a disease in the subject, such as cancer.
- The trait-associated SNP markers may be a plurality of cancer-associated SNP markers.
- The trait-associated SNP markers may comprise from 10 to 50,000 breast cancer-associated SNP markers, or from 10 to 329 breast cancer-associated SNP markers.
- This disclosure includes methods for diagnosing or prognosing a subject having a disease, the method comprising: selecting a plurality of disease-associated genomic loci and a plurality of ancestry-informative genomic loci; measuring a genotype of the subject; and calculating a multiple-ancestry polygenic risk score for the risk of the disease in the subject based on the disease-associated genomic loci and the ancestry-informative genomic loci, wherein the score indicates a diagnosis or prognosis for the subject.
- The disease may be cancer.
- This disclosure includes methods for generating data for assessing a trait in a subject, the method comprising: selecting a plurality of disease-associated genomic loci and a plurality of ancestry-informative genomic loci; measuring a genotype of the subject; and measuring trait-associated genomic loci in the genotype of the subject.
- The method above may further comprise determining additional clinical variables of the subject, such as age, personal medical history, and family medical history.
- The trait may be a risk of a disease in the subject, such as cancer.
- In the method above, the trait-associated genomic loci may be a plurality of cancer-associated genomic loci.
- The trait-associated genomic loci may comprise from 10 to 50,000 breast cancer-associated genomic loci, or from 10 to 329 breast cancer-associated genomic loci.
- The genomic loci may be regions of the genome within about 200, 100, 50, 40, 30, 20, or 10 bases of the trait-associated SNPs.
- The genomic loci may be regions of the genome within about 0.2, 0.1, or 0.05 centimorgans (cM) of the trait-associated SNPs.
- The genomic loci may be regions of the genome within about 200, 100, 50, 40, 30, 20, or 10 bases of the ancestry-informative genomic loci.
- The genomic loci may be regions of the genome within about 0.2, 0.1, or 0.05 cM of the ancestry-informative genomic loci.
- This disclosure further includes systems for assessing the risk of a disease, such as cancer, in a subject, the system comprising: a processor for receiving a genotype of the subject; one or more processors for calculating a multiple-ancestry polygenic risk score for the risk of the disease in the subject based on a plurality of ancestry-informative SNP markers, a plurality of disease-associated SNP markers of the genotype, and additional variables for age, personal medical history, and family medical history; and a display for displaying and/or reporting the risk score.
- This disclosure further includes systems for assessing the risk of a disease, such as cancer, in a subject, the system comprising: a processor for receiving a genotype of the subject; one or more processors for calculating a multiple-ancestry polygenic risk score for the risk of the disease in the subject based on a plurality of ancestry-informative genomic loci, a plurality of disease-associated genomic loci of the genotype, and additional variables for age, personal medical history, and family medical history; and a display for displaying and/or reporting the risk score.
- Additional embodiments include non-transitory machine-readable storage media storing instructions that, when executed by a processor, cause the processor to perform a method for assessing the risk of a disease, such as cancer, in a subject, the method comprising: receiving a genotype of the subject; calculating a multiple-ancestry polygenic risk score for the risk of the disease in the subject based on a plurality of ancestry-informative SNP markers, a plurality of disease-associated SNP markers of the genotype, and additional variables for age, personal medical history, and family medical history; and sending output to a processor for displaying and/or reporting the risk score.
- Additional embodiments include non-transitory machine-readable storage media storing instructions that, when executed by a processor, cause the processor to perform a method for assessing the risk of a disease, such as cancer, in a subject, the method comprising: receiving a genotype of the subject; calculating a multiple-ancestry polygenic risk score for the risk of the disease in the subject based on a plurality of ancestry-informative genomic loci, a plurality of disease-associated genomic loci of the genotype, and additional variables for age, personal medical history, and family medical history; and sending output to a processor for displaying and/or reporting the risk score.
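The genotype-receiving step depends on the upstream assay (NGS or a chip). For illustration only, assume calls arrive as VCF-style `GT` strings such as `0/1` or `1|1`. Converting them to effect-allele dosages could then look like the sketch below. The function name is hypothetical, and missing calls return `None` rather than being imputed.

```python
from typing import Optional


def gt_to_dosage(gt: str, effect_allele_index: int = 1) -> Optional[int]:
    """Count copies of the effect allele in a VCF-style GT string.

    '0' is the reference allele and '1', '2', ... are alternate alleles;
    '/' separates unphased and '|' phased alleles; '.' marks a missing call.
    """
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None  # leave missing-call handling (imputation, exclusion) to the caller
    return sum(int(a) == effect_allele_index for a in alleles)
```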
- FIG. 1 shows an illustration of ancestry in terms of contributions from different continents.
- FIG. 2 shows an illustration of a distribution of genotypes based on ancestry for Hispanic, White/non-Hispanic, Black/African, and Asian genotypes.
- This disclosure includes methods for determining a multiple-ancestry polygenic risk score which can be predictive for a trait in a subject.
- a multiple-ancestry polygenic risk score can be predictive for risk assessment for breast cancer.
- This multiple-ancestry polygenic risk score can utilize SNPs that are associated with disease presence and ancestry and/or genomic loci in linkage disequilibrium with these disease-associated or ancestry -informative SNPs to predict risk of disease, including of breast cancer.
- this disclosure provides methods for multiple-ancestry polygenic risk prediction with surprisingly increased accuracy of risk assessment for a trait in a subject. [00102] Embodiments of this disclosure further provide reliable breast cancer risk associations applicable to all populations of all ancestries.
- This disclosure provides various methods for clinical risk management, risk magnitude assessment, as well as multiple-ancestry polygenic risk scores, and non-clinical trait prediction. Methods of this disclosure can provide predictive ability that is surprisingly accurate for populations of all ancestries.
- aspects of this disclosure include genotyping a subject using various markers associated with a disease and combining the genotypes in the form of a multiple-ancestry polygenic risk score to predict risk of a trait, such as a clinical condition or an extent of manifestation of a biological trait.
- a plurality of trait risk markers can be used to provide a multiple-ancestry polygenic risk prediction for the trait.
- the plurality of trait risk markers may include various disease-associated gene markers.
- the plurality of trait risk markers may include from 1- 1,000,000 SNP markers. [00108] In certain embodiments, the plurality of trait risk markers may include from 1- 10,000 SNP markers, or from 1-1000 SNP markers, or from 1-100 SNP markers. A plurality of trait risk markers may be from 1-1000 breast cancer SNP markers, or from 1-500 breast cancer SNP markers, or from 1-100 breast cancer SNP markers.
- the plurality of trait risk markers may include 56 SNP markers to 385 SNP markers.
- the method may also utilize detection of bases within genomic loci, or of genomic loci, that are in linkage disequilibrium with the disease-associated SNPs in order to provide a multiple-ancestry polygenic risk prediction for the trait, because alleles in these genomic loci are highly associated with one another owing to the reduced frequency of crossing over between them.
- the plurality of trait risk markers may include genomic loci associated with from 1-1,000,000 SNP markers.
- the plurality of trait risk markers may include genomic loci associated with from 1-10,000 SNP markers, or genomic loci associated with from 1-1,000 SNP markers, or genomic loci associated with from 1-100 SNP markers.
- the plurality of trait risk markers may be genomic loci associated with from 1-1,000 breast cancer SNP markers, or genomic loci associated with from 1-500 breast cancer SNP markers, or genomic loci associated with from 1-100 breast cancer SNP markers.
- the plurality of trait risk markers may include genomic loci associated with 56 SNP markers to 385 SNP markers.
- the plurality of trait risk markers may include bases within genomic loci associated with from 1-1,000,000 SNP markers.
- the plurality of trait risk markers may include bases within genomic loci associated with from 1-10,000 SNP markers, or bases within genomic loci associated with from 1-1,000 SNP markers, or bases within genomic loci associated with from 1-100 SNP markers.
- the plurality of trait risk markers may be bases within genomic loci associated with from 1-1,000 breast cancer SNP markers, or bases within genomic loci associated with from 1-500 breast cancer SNP markers, or bases within genomic loci associated with from 1-100 breast cancer SNP markers. In certain embodiments, the plurality of trait risk markers may include bases within genomic loci associated with 56 SNP markers to 385 SNP markers.
- the bases within the genomic loci or the genomic loci associated with the SNP markers may be within about 500 bases, within about 400 bases, within about 300 bases, within about 250 bases, within about 200 bases, within about 150 bases, within about 100 bases, within about 90 bases, within about 80 bases, within about 70 bases, within about 60 bases, within about 50 bases, within about 40 bases, within about 30 bases, within about 20 bases, or within about 10 bases of the SNP marker.
- the bases in the genomic loci or the genomic loci may be within 15 centimorgans (cM) or less, 10 cM or less, 9 cM or less, 8 cM or less, 7 cM or less, 6 cM or less, 5 cM or less, 4 cM or less, 3 cM or less, 2 cM or less, 1 cM or less, 0.75 cM or less, 0.5 cM or less, 0.25 cM or less, 0.2 cM or less, 0.1 cM or less, or 0.05 cM or less of the SNP marker.
- cM: centimorgans
- the bases in the genomic loci or the genomic loci associated with the SNP markers may have a linkage disequilibrium value of above 0.1, above 0.2, above 0.5, above 0.6, above 0.7, above 0.8, above 0.9, or about 1.0 with the SNP marker.
- the bases in the genomic loci or the genomic loci associated with the SNP markers may have a logarithm of the odds (LOD) score with the SNP markers above 2, above 3, above 4, above 5, above 6, above 7, above 8, above 9, above 10, above 20, above 30, above 40, or above 50.
- the bases in the genomic loci or the genomic loci associated with the SNP markers and the SNP markers may exhibit recombination during meiosis with each other at a frequency of less than or equal to about 20%, about 19%, about 18%, about 17%, about 16%, about 15%, about 14%, about 13%, about 12%, about 11%, about 10%, about 9%, about 8%, about 7%, about 6%, about 5%, about 4%, about 3%, about 2%, about 1%, about 0.75%, about 0.5%, about 0.25%, or about 0.1% or less.
- This disclosure provides methods for determining polygenic traits, such as risks for disease including breast cancer.
- the methods of this disclosure can be used for treating diseases for which risk is determined through polygenic scoring.
- methods of this disclosure may provide superior prediction of clinical risk in breast cancer patients.
- the methods of this disclosure can provide multiple-ancestry polygenic risk prediction for diseases such as breast cancer, which can be applied globally to patients of all heritage groups.
- a multiple-ancestry polygenic risk score of this disclosure can be used to assess the expectation of a clinical trait or condition such as cancer in a subject.
- this disclosure can calculate an individual’s risk of a trait from genomic data obtained for the trait in one or more particular heritage groups or populations, where the individual may be of the same or a different heritage group or population.
- Embodiments of this disclosure can therefore provide a multiple-ancestry polygenic risk score for a trait in an individual using genomic data from a population drawn from a plurality of different heritages, including heritages different than the heritage to which the individual belongs or has self-identified, without overestimating the risk of the trait in the individual.
- this disclosure contemplates accurately determining a trait such as cancer risk using genomic data of individuals who self-report heritage information.
- a multiple-ancestry polygenic risk score of this disclosure can be used to accurately determine cancer risk for a subject of any heritage, regardless of any errors in self-reported heritage information.
- this disclosure provides methods for determining risk of a trait in an individual with sufficient discrimination between low risk and high risk for the trait, regardless of the heritage group or population to which the subject belongs or with which the subject has self-identified.
- this disclosure includes methods for assessing breast cancer risk based on a multiple-ancestry polygenic risk score that can, surprisingly, distinguish between low risk and high risk for an individual of any heritage.
- the methods of this disclosure can provide prevention and treatment strategies for a disease trait to improve patient outcomes.
- this disclosure can provide multiple-ancestry polygenic risk scores that are highly calibrated and accurate.
- the multiple-ancestry polygenic risk scores can be used in methods for determining traits such as breast cancer risk in subjects which avoid overestimation.
- the methods of this disclosure can be useful for all heritage groups and/or populations, regardless of the use of self-reported patient data, and can improve medical care and patient treatment.
- this disclosure provides methods for assessing traits such as breast cancer risk with enhanced discrimination of risk level for all populations, regardless of heritage. Methods of this disclosure can be efficiently brought to the point of medical care.
- Methods of this disclosure further contemplate using various trait risk markers, which may be single nucleotide polymorphisms (SNPs), bases in genomic loci associated with the SNPs, or genomic loci associated with the SNPs.
- SNPs of this disclosure may be associated with breast cancer risk in one or more different heritage groups.
- Combinations of SNPs or bases within genomic loci associated with the SNP or genomic loci associated with the SNP can be used to provide a multiple-ancestry polygenic risk score (MA-PRS), which can stratify unaffected patients for breast cancer risk, irrespective of the presence or absence of a family history of the disease.
- Additional classes of markers or elements can include age, family history, breast density, and hormone exposure.
- the clinical utility of this disclosure may include superior prediction of clinical risk for breast cancer patients of all ancestries.
- a multiple-ancestry polygenic score obtained by the methods of this disclosure can provide surprisingly increased accuracy in determining breast cancer risks.
- Methods of this disclosure can provide surprisingly accurate determination of polygenic traits and risks by assessing and including contributions of a wide range of markers for different ancestries.
- Embodiments of this disclosure contemplate determining the levels of polygenic traits and risks in the form of a score based on various genomic risk loci.
- the genomic risk loci can be discretely identified and defined, so that accurate determination can be done by genotyping subjects.
- the genomic risk loci can include genomic risk markers for breast cancer, which are combined with additional risk markers that can be specifically breast cancer-informative.
- the plurality of trait risk markers may include from 1-100 family history elements, or from 1-20 family history elements, or from 1-10 family history elements.
- Embodiments of this disclosure may include a plurality of trait risk markers such as from 1-100 clinical elements, or from 1-20 clinical elements, or from 1-10 clinical elements.
- Embodiments herein can provide improved multiple-ancestry polygenic risk prediction for breast cancer.
- a polygenic risk score of this disclosure may be surprisingly more accurate for breast cancer than conventional methods.
- an association between the multiple-ancestry polygenic risk scores and breast cancer may be evaluated by fixed stratification methods.
- the fixed stratification may be adjusted for age and family history, among other variables and elements.
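Fixed stratification places each score into predefined risk categories. The disclosure does not give the cut points in this section, so the strata below are hypothetical. The sketch also assumes the reference-population scores are approximately normal, so a raw score can be converted to a percentile with `statistics.NormalDist`. Adjustment for age and family history would normally happen in a regression model fitted on top of these strata and is not shown.

```python
from statistics import NormalDist
from typing import List, Tuple

# Hypothetical fixed strata: (lower percentile bound, label). Not taken from the disclosure.
STRATA: List[Tuple[float, str]] = [
    (0.00, "low"),
    (0.20, "average"),
    (0.80, "elevated"),
    (0.95, "high"),
]


def reference_percentile(score: float, ref_mean: float, ref_sd: float) -> float:
    """Percentile of `score`, assuming reference-population scores are normally distributed."""
    return NormalDist(ref_mean, ref_sd).cdf(score)


def fixed_stratum(score: float, ref_mean: float, ref_sd: float) -> str:
    """Return the label of the highest stratum whose lower bound the percentile reaches."""
    pct = reference_percentile(score, ref_mean, ref_sd)
    label = STRATA[0][1]
    for lower, name in STRATA:
        if pct >= lower:
            label = name
    return label


# Usage: a score one standard deviation above the reference mean (about the 84th percentile).
print(fixed_stratum(1.0, 0.0, 1.0))
```

Because the cut points are fixed on the percentile scale, the same labels can be reused across cohorts as long as each cohort supplies its own reference mean and standard deviation.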
- Embodiments of this disclosure can provide women an estimated lifetime risk for breast cancer with increased accuracy. Such risk estimation is useful to inform decisions based on a threshold for more aggressive screening, including consideration of breast magnetic resonance imaging (MRI).
- MRI: magnetic resonance imaging (as in breast MRI)
- disclosed herein are methods that can utilize breast cancer SNP markers or bases within genomic loci associated with breast cancer SNP markers or genomic loci associated with breast cancer SNP markers to provide a multiple-ancestry polygenic risk score for breast cancer.
- breast cancer risk markers are given in Mavaddat et al., Prediction of breast cancer risk based on profiling with common genetic variants, J Natl Cancer Inst., 2015 Apr 8, Vol. 107(5), djv036.
- breast cancer risk markers are given in Michailidou et al., Genome-wide association analysis of more than 120,000 individuals identifies 15 new susceptibility loci for breast cancer, Nat Genet., 2015, Vol. 47, pp. 373.
- breast cancer risk markers are given in Feng et al., Characterizing Genetic Susceptibility to Breast Cancer in Women of African Ancestry, Cancer Epidemiol Biomarkers Prev., 2017 Jul, Vol. 26(7), pp. 1016-1026.
- breast cancer risk markers are given in Rainville, I. et al., Breast Cancer Research and Treatment, 2020, Vol. 180, pp. 503-509.
- algorithm encompasses any formula, model, mathematical equation, algorithmic, analytical or programmed process, or statistical technique or classification analysis that takes one or more inputs or parameters, whether continuous or categorical, and calculates an output value, index, index value or score.
- algorithms include but are not limited to ratios, sums, regression operators such as exponents or coefficients, biomarker value transformations and normalizations (including, without limitation, normalization schemes that are based on clinical parameters such as age, gender, ethnicity, etc.), rules and guidelines, statistical classification models, and neural networks trained on populations.
- algorithms also include linear and non-linear equations and statistical classification analyses, e.g., to determine the relationship between (a) the number of mutations detected in a subject sample and (b) the level of the respective subject’s mutation load.
- allele means one of two or more different nucleotide sequences (DNA or RNA) that occur or are encoded at a specific locus, or two or more different polypeptide sequences encoded by such a locus.
- a first allele can occur on one chromosome while a second allele occurs on a second homologous chromosome, e.g., as occurs for different chromosomes of a heterozygous individual, or between different homozygous or heterozygous individuals in a population.
- an allele generally refers to the nucleotide base present on a chromosome (one of the expected two) at that specific locus.
- a patient may have an adenine (A) on one chromosome and a guanine (G) on the other, in which case it can be said that the patient has one A allele and one G allele.
- homozygous means an individual or subject has only one type of allele at a given locus (e.g., a diploid individual has a copy of the same allele at a locus for each of two homologous chromosomes, such as A/A in the preceding example).
- An individual is “heterozygous” if more than one allele type is present at a given locus (e.g., a diploid individual with one copy each of two different alleles, such as A/G in the preceding example).
- homozygosity indicates the degree to which members of a group have the same genotype at one or more specific loci.
- heterozygosity is used to indicate the degree to which individuals within the group differ in genotype at one or more specific loci (e.g., all homozygous, all the same type of heterozygosity, etc.).
- An allele “positively” correlates with a trait when it is linked to it and when presence of the allele is an indicator that the trait or trait form will occur in an individual comprising the allele.
- An allele “negatively” correlates with a trait when it is linked to it and when presence of the allele is an indicator that a trait or trait form will not occur in an individual comprising the allele.
- Allele frequency refers to the frequency (e.g., proportion or percentage) at which an allele (e.g., adenine versus guanine in the example above) is present at a locus within an individual, within a line or within a population (or subpopulation).
- for example, diploid individuals of genotype “A/A”, “A/G,” or “G/G” have A-allele frequencies of 1.0, 0.5, or 0.0, respectively.
- the term “allele frequency” is used to define the minor allele frequency (MAF).
- MAF refers to the frequency at which the least common allele (where two alleles are observed) occurs in a given population, or the frequency at which the second most common allele (where more than two alleles are observed) occurs in a given population.
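The two definitions above, within-individual allele frequency and population minor allele frequency, can be computed directly from genotype strings. A minimal sketch, assuming genotypes are written as slash-separated alleles like the "A/G" examples; the function names are illustrative, not from the disclosure:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def _alleles(genotype: str) -> List[str]:
    """Split a genotype string such as 'A/G' (or 'A/ A') into its alleles."""
    return [a.strip() for a in genotype.split("/")]


def allele_frequency(genotype: str, allele: str) -> float:
    """Frequency of `allele` within one individual, e.g. 'A/G' -> 0.5 for 'A'."""
    alleles = _alleles(genotype)
    return alleles.count(allele) / len(alleles)


def minor_allele_frequency(genotypes: Iterable[str]) -> Tuple[str, float]:
    """(allele, frequency) of the second most common allele across a population sample."""
    counts: Counter = Counter()
    for genotype in genotypes:
        counts.update(_alleles(genotype))
    if len(counts) < 2:
        raise ValueError("locus is monomorphic in this sample; MAF is undefined here")
    allele, n = counts.most_common(2)[1]
    return allele, n / sum(counts.values())


# Usage: A appears on 5 of 8 chromosomes and G on 3, so G is the minor allele.
print(minor_allele_frequency(["A/A", "A/G", "G/G", "A/A"]))
```

Taking the second entry of `most_common(2)` matches both cases in the definition: for two alleles it is the least common one, and for more than two it is the second most common.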
- amplifying in the context of nucleic acid amplification means any process or reaction whereby additional copies of a nucleic acid (or a transcribed form thereof) comprising a particular nucleotide sequence are produced.
- Amplification techniques include, but are not limited to, various polymerase based replication methods, including the polymerase chain reaction (PCR), ligase mediated methods such as the ligase chain reaction (LCR) and RNA polymerase based amplification (e.g., by transcription) methods.
- PCR: polymerase chain reaction
- LCR: ligase chain reaction
- An “amplicon” is an amplified nucleic acid, e.g., a nucleic acid (or population of such nucleic acids, e.g., in solution) that is (or are) produced by amplifying a template nucleic acid by an amplification technique (e.g., PCR, LCR, transcription, or the like).
- the term “analyze” or “analyzing” generally includes “measure,” “measuring,” “detect,” “detecting,” “identify,” “identifying,” “assay,” “assaying,” “quantify,” or “quantifying,” and refers to the process of evaluating a biological sample (or a sample derived therefrom) for the presence, absence, amount, level, or quality of some physical, chemical, or electromagnetic property(ies). This is often done by determining a value or set of values associated with such properties (e.g., number of sequencing reads in which a fluorescence signal indicating the presence of an adenine was observed at a particular position within the read corresponding to a particular position in a gene, chromosome or genome).
- Specific examples particularly relevant to the present disclosure include analyzing a sample to determine the sequence at one or more particular genomic loci in the sample, and may further comprise comparing test nucleotide sequence(s) detected in a patient’s sample against reference nucleotide sequence(s) and/or comparing the test number of any such test sequences to one or more reference numbers of such reference sequences.
- breast cancer encompasses any type of breast cancer that can develop in a subject.
- the breast cancer may be characterized as Luminal A (ER+ and/or PR+, HER2-, low Ki67), Luminal B (ER+ and/or PR+, HER2+, or HER2- with high Ki67), Triple negative/basal-like (ER-, PR-, HER2-) or HER2 type (ER-, PR-, HER2+).
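The subtype criteria listed above form a small decision rule on receptor status. A minimal sketch of that rule, assuming receptor and Ki67 status are already available as booleans (the text gives no Ki67 cut-off, so `high_ki67` is a yes/no input); the function name is hypothetical:

```python
def intrinsic_subtype(er: bool, pr: bool, her2: bool, high_ki67: bool) -> str:
    """Map ER/PR/HER2/Ki67 status to the surrogate subtypes named in the text."""
    hormone_receptor_positive = er or pr  # "ER+ and/or PR+"
    if hormone_receptor_positive:
        # Luminal B: HER2+, or HER2- with high Ki67; otherwise Luminal A (HER2-, low Ki67).
        if her2 or high_ki67:
            return "Luminal B"
        return "Luminal A"
    # ER- and PR-: split on HER2 status only.
    if her2:
        return "HER2 type"
    return "Triple negative/basal-like"


# Usage: ER+, PR-, HER2-, low Ki67.
print(intrinsic_subtype(er=True, pr=False, her2=False, high_ki67=False))
```

For ER-/PR- tumors the rule ignores Ki67, because the listed definitions of HER2 type and triple negative do not depend on it.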
- the breast cancer may be resistant to therapy or therapies such as alkylating agents, platinum agents, taxanes, vinca agents, anti-estrogen drugs, aromatase inhibitors, ovarian suppression agents, endocrine/hormonal agents, bisphosphonate therapy agents or targeted biological therapy agents.
- a locus (e.g., a SNP or genomic locus) or allele is “correlated” or “associated” with a specified phenotype (e.g., increased risk of developing breast cancer) when it can be statistically linked (positively or negatively) to the phenotype.
- a specified polymorphism may occur more commonly in a case population (e.g., breast cancer patients) than in a control population (e.g., individuals that do not have breast cancer).
- This correlation may suggest some natural or biological causal link (e.g., a natural law or phenomenon), but it typically does not prove or require such a link (i.e., the correlation is not such a law or phenomenon per se).
- correlation refers instead to an artificial statistical linkage between a locus and a trait that underlies the phenotype.
- a region or genomic locus may be “associated” with a SNP when it is statistically linked (positively or negatively) with the SNP.
- a specified polymorphism in a region near the disease-associated SNP may occur more commonly in conjunction with the disease-associated SNP than other polymorphisms do. This correlation may arise from linkage disequilibrium between the specified polymorphism and the disease-associated SNP, because little crossing over occurs between them.
- diagnosis refers to methods by which a determination can be made as to whether an individual has or is likely to have a given clinical characteristic (e.g., risk of developing cancer).
- the skilled artisan often makes a diagnosis on the basis of one or more diagnostic indicators, e.g., a biomarker, the presence, absence, amount, or change in amount of which may indicate the presence, severity, or absence of the condition.
- diagnostic indicators can include patient history; physical symptoms, e.g., unexplained weight loss, fever, fatigue, pains, or skin anomalies; phenotype; genotype; or environmental or heredity factors.
- diagnostic often refers to an increased probability or likelihood that a given clinical characteristic is present or will occur; that is, that a clinical characteristic is more likely to be present or to occur in a patient exhibiting a given feature, e.g., the presence or level of a diagnostic indicator, when compared to individuals not exhibiting the feature. Diagnostic methods can be used independently, or in combination with other diagnosing methods known in the art to determine whether a clinical characteristic is present or is more likely to occur in a patient exhibiting a given feature.
- disease can encompass any disorder, condition, sickness, ailment, etc. that manifests in, e.g., a disordered or incorrectly functioning organ, part, structure, or system of the body, and results from, e.g., genetic or developmental errors, infection, poisons, nutritional deficiency or imbalance, toxicity, or unfavorable environmental factors.
- genotype means the genetic constitution of an individual (or group of individuals) at one or more genetic loci. Genotype is defined by the allele(s) of one or more known loci of the individual, typically, the compilation of alleles inherited from its parents. In most aspects and embodiments of the present disclosure, the genotype will be the nucleotide (adenine, thymine (or uracil), cytosine, guanine) at a particular locus in either one or both (typically both) alleles of a subject’s genome or chromosomes.
- the nucleotide(s) at that locus or equivalent thereof in one or both alleles form the genotype of the locus.
- a genotype can typically be homozygous (e.g., A/A) or heterozygous (e.g., A/B), though more complex genotypes are possible (e.g., AA/A, AA/B, etc.).
- genotyping or determining the genotype for a particular locus means determining the nucleotide(s) at a particular gene locus.
- genotyping means determining through a physical assay the physical presence (and optionally quantity) of the nucleotides at a given locus in a patient’s genome or chromosomes. Genotyping can also be done by determining the amino acid variant at a particular position of a protein which can be used to deduce the corresponding nucleotide variant(s).
- haplotype means the genotype of an individual at a plurality of genetic loci on a single DNA strand. Typically, the genetic loci described by a haplotype are physically and genetically linked, i.e., on the same chromosome strand.
- high stringency hybridization conditions when used in connection with nucleic acid hybridization, means conditions capable of restricting hybridization between nucleic acid molecules in a reaction to only those molecules sufficiently homologous to hybridize under the following conditions: hybridization conducted overnight at 42°C in a solution containing 50% formamide, 5x SSC (750 mM NaCl, 75 mM sodium citrate), 50 mM sodium phosphate, pH 7.6, 5x Denhardt’s solution, 10% dextran sulfate, and 20 microgram/ml denatured and sheared salmon sperm DNA, with hybridization filters washed in 0.1x SSC at about 65°C.
- “high stringency hybridization conditions” means the preceding hybridization conditions.
- “moderate stringency hybridization conditions,” when used in connection with nucleic acid hybridization, means conditions capable of restricting hybridization between nucleic acid molecules in a reaction to only those molecules sufficiently homologous to hybridize under the following conditions: hybridization conducted overnight at 37°C in a solution containing 50% formamide, 5x SSC (750 mM NaCl, 75 mM sodium citrate), 50 mM sodium phosphate, pH 7.6, 5x Denhardt’s solution, 10% dextran sulfate, and 20 microgram/ml denatured and sheared salmon sperm DNA, with hybridization filters washed in 1x SSC at about 50°C.
- “moderate stringency hybridization conditions” means the preceding hybridization conditions. It is noted that many other hybridization methods, solutions and temperatures can be used to achieve comparable stringent hybridization conditions as will be apparent to skilled artisans apprised of the present disclosure.
- a patient has an “increased risk” of a particular cancer if the probability of the patient developing that cancer (e.g., over the patient’s lifetime, over some defined period of time (e.g., within 10 years), etc.) exceeds some reference probability or value.
- the reference probability may be the probability (i.e., prevalence) of the cancer across the general relevant patient population (e.g., all patients; all patients of a particular age, gender, ethnicity; patients having a particular cancer (and thus looking at the risk of a different cancer or an independent second primary of the same type as the first cancer); etc.).
- for example, if the lifetime probability of a particular cancer in the general population is X% and a particular patient has been determined by the methods, systems or kits of the present disclosure to have a lifetime probability of that cancer of Y%, and Y > X, then the patient has an “increased risk” of that cancer.
- the tested patient’s probability may only be considered “increased” when it exceeds the reference probability by some threshold amount (e.g., at least 0.5, 0.75, 0.85, 0.90, 0.95, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more fold or standard deviations greater than the reference probability; at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% greater than the reference probability).
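The "increased risk" rule above reduces to comparing Y with X, optionally requiring a margin. A minimal sketch, assuming probabilities are given as fractions in [0, 1] and modeling only the "percent greater than the reference" form of the threshold (the fold and standard-deviation forms are not modeled); the function and parameter names are hypothetical:

```python
def is_increased_risk(patient_prob: float, reference_prob: float,
                      min_relative_excess: float = 0.0) -> bool:
    """True if patient_prob exceeds reference_prob by more than a relative margin.

    min_relative_excess=0.0 is the plain "Y > X" rule; 0.2 requires Y > 1.2 * X,
    i.e. a patient probability more than 20% greater than the reference probability.
    """
    for p in (patient_prob, reference_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return patient_prob > reference_prob * (1.0 + min_relative_excess)


# Usage: 15% patient lifetime risk against a 12% population reference.
print(is_increased_risk(0.15, 0.12))        # plain Y > X rule
print(is_increased_risk(0.13, 0.12, 0.2))   # 13% is not more than 20% above 12%
```

Expressing the margin as a relative excess keeps the rule meaningful across cancers with very different baseline prevalence.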
- LD: linkage disequilibrium
- D′: Lewontin’s normalized linkage disequilibrium coefficient
- r: Pearson correlation coefficient (LD is commonly reported as r²)
- Linkage disequilibrium can be calculated following the application of the expectation maximization (EM) algorithm for the estimation of haplotype frequencies (Slatkin and Excoffier, 1996).
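The EM step can be illustrated for the simplest case: two biallelic SNPs genotyped without phase. Only double heterozygotes are ambiguous (AB/ab versus Ab/aB), and EM repeatedly splits them in proportion to the current haplotype frequencies. This is a generic two-locus EM sketch, not necessarily the exact procedure of Slatkin and Excoffier; the genotype coding (copies of allele A and allele B) and all function names are assumptions. The estimated haplotype frequencies are then turned into D, Lewontin's D′ and r².

```python
from typing import Dict, List, Tuple

Hap = Tuple[int, int]  # (allele at locus 1, allele at locus 2); 1 = allele A / B, 0 = a / b
HAPS: List[Hap] = [(1, 1), (1, 0), (0, 1), (0, 0)]


def em_haplotype_freqs(genotypes: List[Tuple[int, int]], max_iter: int = 500,
                       tol: float = 1e-10) -> Dict[Hap, float]:
    """Estimate two-locus haplotype frequencies from unphased genotypes by EM.

    Each genotype is (g1, g2): copies of allele A at locus 1 and of allele B at locus 2.
    Only double heterozygotes (1, 1) have ambiguous phase (AB/ab versus Ab/aB).
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    freqs = {h: 0.25 for h in HAPS}
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = {h: 0.0 for h in HAPS}
        for g1, g2 in genotypes:
            if g1 not in (0, 1, 2) or g2 not in (0, 1, 2):
                raise ValueError("genotypes must be allele counts 0, 1 or 2")
            if g1 == 1 and g2 == 1:
                # E-step: posterior probability that the double heterozygote is AB/ab (cis).
                cis = freqs[(1, 1)] * freqs[(0, 0)]
                trans = freqs[(1, 0)] * freqs[(0, 1)]
                w = 0.5 if cis + trans == 0 else cis / (cis + trans)
                counts[(1, 1)] += w
                counts[(0, 0)] += w
                counts[(1, 0)] += 1 - w
                counts[(0, 1)] += 1 - w
            else:
                # Phase is unambiguous when at least one locus is homozygous.
                a1 = [1] * g1 + [0] * (2 - g1)
                a2 = [1] * g2 + [0] * (2 - g2)
                for hap in zip(a1, a2):
                    counts[hap] += 1
        # M-step: new frequencies are expected counts divided by the number of chromosomes.
        new = {h: c / n_chrom for h, c in counts.items()}
        converged = max(abs(new[h] - freqs[h]) for h in HAPS) < tol
        freqs = new
        if converged:
            break
    return freqs


def ld_stats(freqs: Dict[Hap, float]) -> Tuple[float, float, float]:
    """Return (D, D', r^2); D' keeps the sign of D (|D'| is often reported)."""
    p_a = freqs[(1, 1)] + freqs[(1, 0)]
    p_b = freqs[(1, 1)] + freqs[(0, 1)]
    d = freqs[(1, 1)] - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a locus is monomorphic; LD is undefined")
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, d / d_max, d * d / denom


# Usage: the unambiguous AABB and aabb individuals pull the double heterozygotes toward AB/ab.
print(ld_stats(em_haplotype_freqs([(2, 2), (0, 0), (1, 1), (1, 1)])))
```

In this toy sample the EM resolves both double heterozygotes as AB/ab, giving D′ and r² close to 1, i.e. the two SNPs act as near-perfect proxies for each other.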
- LD values according to the present disclosure for genotypes/loci are selected above 0.1, above 0.2, above 0.5, above 0.6, above 0.7, above 0.8, above 0.9, or about 1.0.
- LOD stands for “logarithm of the odds”, a statistical estimate of whether two loci (e.g., or a locus and a disease locus) are likely to be located near each other on a chromosome and are therefore likely to be inherited together.
- a LOD score of about 2 to 3 or higher is generally understood to suggest that two genes are located close to each other on the chromosome.
- LOD values according to the present disclosure for genotypes/loci are selected above 2, above 3, above 4, above 5, above 6, above 7, above 8, above 9, above 10, above 20, above 30, above 40, or above 50.
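The LOD score compares the likelihood of the data at a recombination fraction θ with the likelihood under free recombination (θ = 0.5). A minimal sketch, assuming the simplest textbook setting of phase-known, fully informative meioses with R recombinants and NR non-recombinants, so that LOD(θ) = log10[θ^R (1 − θ)^NR / 0.5^(R+NR)]; the function names are hypothetical and pedigree-based likelihoods are not modeled:

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """LOD = log10[L(theta) / L(0.5)] for phase-known, fully informative meioses."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    if theta == 0.0 and recombinants > 0:
        return -math.inf  # any recombinant is impossible under complete linkage
    n = recombinants + nonrecombinants
    log_l_linked = nonrecombinants * math.log10(1.0 - theta)
    if recombinants:
        log_l_linked += recombinants * math.log10(theta)
    log_l_unlinked = n * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def max_lod(recombinants: int, nonrecombinants: int) -> float:
    """LOD at the maximum-likelihood estimate theta_hat = R / (R + NR)."""
    n = recombinants + nonrecombinants
    if n == 0:
        raise ValueError("need at least one informative meiosis")
    theta_hat = recombinants / n
    if theta_hat >= 0.5:
        return 0.0  # best constrained estimate is free recombination, so no evidence of linkage
    return lod_score(recombinants, nonrecombinants, theta_hat)


# Usage: 2 recombinants in 20 meioses gives theta_hat = 0.1.
print(max_lod(2, 18))
```

With 2 recombinants out of 20 meioses the maximum LOD is just under 3.2, which would satisfy the "above 3" threshold listed above but not "above 4".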
- SNPs in linkage disequilibrium with the SNPs of the present disclosure can have a specified genetic recombination distance of less than or equal to about 20 centimorgans (cM). For example, 15 cM or less, 10 cM or less, 9 cM or less, 8 cM or less, 7 cM or less, 6 cM or less, 5 cM or less, 4 cM or less, 3 cM or less, 2 cM or less, 1 cM or less, 0.75 cM or less, 0.5 cM or less, 0.25 cM or less, or 0.1 cM or less.
- two linked loci within a single chromosome segment can undergo recombination during meiosis with each other at a frequency of less than or equal to about 20%, about 19%, about 18%, about 17%, about 16%, about 15%, about 14%, about 13%, about 12%, about 11%, about 10%, about 9%, about 8%, about 7%, about 6%, about 5%, about 4%, about 3%, about 2%, about 1%, about 0.75%, about 0.5%, about 0.25%, or about 0.1% or less.
- SNPs in linkage disequilibrium with the SNPs of the present disclosure are within 100 kb or less (which corresponds in humans to about 0.1 cM, depending on local recombination rate), 50 kb or less, or 20 kb or less of each other.
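The physical, genetic and recombination-frequency thresholds above are related through two conversions. The sketch assumes the genome-wide human average of about 1 cM per Mb (matching the "100 kb is about 0.1 cM" figure; local rates vary) and uses the Haldane map function, which assumes no crossover interference. The disclosure does not name a map function, so this choice is an assumption, as are the function names.

```python
import math


def kb_to_cm(distance_kb: float, cm_per_mb: float = 1.0) -> float:
    """Approximate genetic distance from physical distance at an assumed recombination rate."""
    return distance_kb / 1000.0 * cm_per_mb


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Haldane map function: r = 0.5 * (1 - exp(-2d)), with d in Morgans."""
    if distance_cm < 0:
        raise ValueError("distance must be non-negative")
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


# Usage: 100 kb -> about 0.1 cM -> about 0.1% recombination per meiosis.
print(haldane_recombination_fraction(kb_to_cm(100)))
# A 20 cM separation corresponds to roughly 16.5% recombination under Haldane.
print(haldane_recombination_fraction(20))
```

For small distances the recombination fraction is close to d (1 cM is about 1%), but the two diverge as distance grows. This is why the cM thresholds and the recombination-percentage thresholds listed above are not numerically identical at the upper end.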
- surrogate markers for a particular SNP involves a strategy that presumes that SNPs surrounding the target SNP are in linkage disequilibrium and can therefore provide information about disease susceptibility.
- surrogate markers can therefore be identified from publicly available databases, such as HapMap, by searching for SNPs fulfilling certain criteria which have been found in the scientific community to be suitable for the selection of surrogate marker candidates.
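Once candidate SNPs and their LD with a target SNP have been retrieved from a reference panel, surrogate selection is a filter-and-rank step. A minimal sketch, assuming each candidate already carries a base-pair position and an r² value. The defaults (r² ≥ 0.8, within 100 kb) are taken from values listed earlier in the text. The `Candidate` record, the `rs_*` IDs and the tie-breaking rule (closer SNP wins) are hypothetical, and no HapMap query interface is modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    rsid: str      # hypothetical identifier of a candidate proxy SNP
    position: int  # base-pair position on the same chromosome as the target SNP
    r2: float      # LD with the target SNP, e.g. from a reference panel


def best_surrogate(target_position: int, candidates: List[Candidate],
                   min_r2: float = 0.8, max_distance_bp: int = 100_000) -> Optional[Candidate]:
    """Pick the candidate with the highest r^2 that passes both thresholds; None if none do."""
    eligible = [c for c in candidates
                if c.r2 >= min_r2 and abs(c.position - target_position) <= max_distance_bp]
    if not eligible:
        return None
    # Prefer higher r^2; among equal r^2, prefer the physically closer SNP.
    return max(eligible, key=lambda c: (c.r2, -abs(c.position - target_position)))


# Usage with made-up candidates around a target at position 1,000,000.
panel = [
    Candidate("rs_x", 1_050_000, 0.95),
    Candidate("rs_y", 1_010_000, 0.95),
    Candidate("rs_z", 1_500_000, 0.99),  # strong LD but too far away
    Candidate("rs_w", 1_001_000, 0.50),  # close but weak LD
]
print(best_surrogate(1_000_000, panel))
```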
- locus or “genomic locus” or “genomic loci” means a specific position or site in a gene (or protein), chromosomal region, chromosome, or genome.
- a “test locus” is a genomic locus (e.g., single nucleotide at a specified position within a chromosome) whose sequence or genotype is assessed according to the present disclosure.
- a test locus in the present disclosure is often, though not necessarily, a single nucleotide polymorphism.
- single nucleotide polymorphism or “SNP” means a genetic variation between individuals; e.g., a single nitrogenous base position in the DNA of organisms that is polymorphic or variable.
- SNPs is the plural of SNP.
- the identifier used herein for SNP loci (e.g., in Tables 1 and 2) is the “rs” identifier often used in the art. This identifier is used, e.g., in the dbSNP database available through the NCBI website and may be updated or changed for any given locus over time.
- any “rs” identifier used herein is expressly meant to include new or modified “rs” identifiers assigned to the same locus (i.e., the locus to which the “rs” identifier is assigned in Tables 1 or 2 and in dbSNP as of the date of the filing of this disclosure).
- References to DNA herein may include derivatives of any given source of DNA such as amplicons, RNA transcripts thereof, etc.
- a “polymorphism” or “polymorphic” locus or position is a locus or position that is variable; that is, within a population, the nucleotide sequence at a polymorphism has more than one version or allele.
- in some cases, a polymorphism is a “single nucleotide polymorphism.”
- marker refers to a nucleotide sequence or encoded product thereof (e.g., a protein) used as a point of reference when identifying a locus or a linked locus.
- a marker can be derived from genomic nucleotide sequence or from expressed nucleotide sequences (e.g., from an RNA, nRNA, mRNA, a cDNA, etc.), or from an encoded polypeptide.
- the term also refers to nucleic acid sequences complementary to or flanking the marker sequences, such as nucleic acids used as probes or primer pairs capable of amplifying the marker sequence.
- a “marker probe” is a nucleic acid sequence or molecule that can be used to identify the presence of a marker locus, e.g., a nucleic acid probe that is complementary to a marker locus sequence or to sequences adjacent to or near such marker locus sequence. Nucleic acids are “complementary” when they specifically hybridize in solution, e.g., according to Watson-Crick base pairing rules and at certain minimum hybridization conditions (e.g., medium stringency).
- a “marker locus” is a locus that can be used to track the presence of a second linked locus, e.g., a linked or correlated locus that encodes or contributes to the population variation of a phenotypic trait.
- a marker locus can be used to monitor segregation of alleles at a locus that is genetically or physically linked to the marker locus.
- a “marker allele,” alternatively an “allele of a marker locus” is one of a plurality of polymorphic nucleotide sequences found at a marker locus in a population that is polymorphic for the marker locus.
- Each of the identified markers is expected to be in close physical and genetic proximity (resulting in physical and/or genetic linkage) to a genetic element that contributes to the relevant phenotype.
- Markers corresponding to genetic polymorphisms between members of a population can be analyzed (e.g., detected, measured, quantified) by several techniques.
- such techniques include, e.g., PCR-based sequence specific amplification methods, detection of restriction fragment length polymorphisms (RFLP), detection of isozyme markers, detection of allele specific hybridization (ASH), detection of single nucleotide extension, detection of amplified variable sequences of the genome, detection of self-sustained sequence replication, detection of simple sequence repeats (SSRs), detection of single nucleotide polymorphisms (SNPs), or detection of amplified fragment length polymorphisms (AFLPs).
- RFLP: restriction fragment length polymorphisms
- ASH: allele specific hybridization
- SSRs: simple sequence repeats
- SNPs: single nucleotide polymorphisms
- AFLPs: amplified fragment length polymorphisms
- NGS: next generation sequencing
- in NGS, first, DNA sequencing libraries are generated by clonal amplification by PCR in vitro.
- second, the DNA is sequenced by synthesis, such that the DNA sequence is determined by the addition of nucleotides to the complementary strand rather than through the chain-termination chemistry typical of Sanger sequencing.
- third, the spatially segregated, amplified DNA templates are sequenced simultaneously in a massively parallel fashion, typically without the requirement for a physical separation step.
- Because its sequencing reactions are parallelized, NGS can generate hundreds of megabases to gigabases of nucleotide sequence reads in a single instrument run.
- In contrast to conventional sequencing techniques such as Sanger sequencing, which typically report the average genotype of an aggregate collection of molecules, NGS technologies typically digitally tabulate the sequence of numerous individual DNA fragments (sequence reads, discussed in detail below), such that low-frequency variants (e.g., variants present at less than about 10%, 5%, or 1% frequency in a heterogeneous population of nucleic acid molecules) can be detected.
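To make the dependence on read depth concrete, here is a minimal Python sketch, offered as an illustration rather than as part of the disclosed method. It assumes each read independently samples one allele, so the number of variant-supporting reads at a locus is binomial with `depth` trials and success probability `vaf` (the variant fraction). The function name and the `min_alt_reads` threshold are hypothetical.

```python
from math import comb


def prob_detect_variant(depth, vaf, min_alt_reads):
    """Probability that at least `min_alt_reads` of `depth` reads carry a variant
    present at fraction `vaf`, assuming X ~ Binomial(depth, vaf).
    Sequencing error is ignored in this sketch."""
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be a fraction between 0 and 1")
    # Sum the probability of seeing fewer than min_alt_reads supporting reads...
    p_below = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    # ...and take the complement.
    return 1.0 - p_below
```

For example, `prob_detect_variant(1000, 0.01, 5)` is the chance of observing at least five supporting reads for a 1% variant at 1,000x depth, which is far higher than at 100x; real variant callers also model base-calling error, which this sketch omits.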
- the term “massively parallel” can also be used to refer to the simultaneous generation of sequence information from many different template molecules by NGS.
- NGS strategies can include several methodologies, including, but not limited to: (i) microelectrophoretic methods; (ii) sequencing by hybridization; (iii) real-time observation of single molecules, and (iv) cyclic-array sequencing.
- Cyclic-array sequencing refers to technologies in which a sequence of a dense array of DNA is obtained by iterative cycles of template extension and imaging-based data collection.
- Cyclic-array sequencing technologies include, but are not limited to, 454 sequencing, for example, as used in 454 Genome Sequencers (Roche Applied Science; Basel); Solexa technology, for example, as used in the Illumina Genome Analyzer, Illumina HiSeq, MiSeq, and NextSeq (San Diego, CA); the SOLiD platform (Applied Biosystems; Foster City, CA); the Polonator (Dover/Harvard); and HeliScope Single Molecule Sequencer technology (Helicos).
- NGS methods include single molecule real time sequencing (e.g., Pacific Bio) and ion semiconductor sequencing (e.g., Ion Torrent sequencing). See, e.g., Shendure & Ji, Next Generation DNA Sequencing, NAT. BIOTECH. (2008) 26: 1135-1145 for a more detailed discussion of NGS sequencing technologies.
- A “patient,” “individual,” or “subject” refers to a human.
- a subject can be male or female.
- A “sample” or “biological sample” refers to samples such as biopsy or tissue samples, frozen samples, blood and blood fractions or products (e.g., serum, platelets, red blood cells, and the like), tumor samples, sputum, bronchoalveolar lavage, cultured cells (e.g., primary cultures, explants, and transformed cells), stool, urine, etc.
- a “biopsy” refers to the process of removing a tissue sample for diagnostic or prognostic evaluation, and to the tissue specimen derived from such a process. Any suitable biopsy technique can be applied to the methods of the present disclosure.
- The biopsy technique applied will depend on the tissue type to be evaluated (e.g., lung), the size and type of the tumor, and other factors.
- Representative biopsy techniques include, but are not limited to, excisional biopsy, incisional biopsy, needle biopsy, surgical biopsy, and bone marrow biopsy.
- An “excisional biopsy” refers to the removal of an entire tumor mass with a small margin of normal tissue surrounding it.
- An “incisional biopsy” refers to the removal of a wedge of tissue that includes a cross-sectional diameter of the tumor.
- A diagnosis made by endoscopy or fluoroscopy can require a “core-needle biopsy” or a “fine-needle aspiration biopsy,” which generally obtains a suspension of cells from within a target tissue.
- A “bodily fluid” includes all fluids obtained from a mammalian body, either processed (e.g., serum) or unprocessed, including, for example, blood, plasma, serum, urine, lymph, gastric juices, bile, saliva, sweat, and spinal and brain fluids.
- a biological sample is typically obtained from a subject.
- A “cancer cell sample” or “tumor sample” means a specimen comprising at least one cancer cell, or biomolecules derived therefrom, from a cancer including, without limitation, lung cancer (e.g., non-small cell lung cancer (NSCLC)), ovarian cancer, colorectal cancer, breast cancer, endometrial cancer, or prostate cancer.
- NSCLC: non-small cell lung cancer
- Biomolecules “derived” from a cancer cell sample include molecules located within or extracted from the sample as well as artificially synthesized copies or versions of such biomolecules.
- One illustrative, non-limiting example of such artificially synthesized molecules includes PCR amplification products in which nucleic acids from the sample serve as PCR templates.
- “Nucleic acids of” a cancer cell sample include nucleic acids located in a cancer cell or biomolecules derived from a cancer cell.
- A “sequence read” means the sequence of an individual DNA molecule sequenced in a sequencing reaction.
- Individual DNA molecules used for sequencing can be relatively short (e.g., ranging from 50 nt to 1,000 nt). These molecules typically overlap heavily in their sequences, so any individual test locus is contained within numerous distinct DNA molecules in the sample.
- The numerous resulting sequence reads can be aligned against each other and/or against a larger reference sequence (e.g., a reference human genome sequence such as the hg19 version of the human genome assembly available at the University of California Santa Cruz (UCSC) Genome Browser website).
- A test locus, or an allele at that locus, may be counted only if it is covered by at least some minimum number of sequence reads in the sequencing reaction(s).
- A “score” means a value or set of values selected so as to provide a quantitative measure or assessment of a variable or characteristic of a subject or the subject’s condition or physiology.
- the value(s) comprising the score can be based on, derived from or incorporate, for example, quantitative data resulting in a measured amount of one or more sample constituents obtained from the subject.
- In some embodiments, the score can be derived from a single constituent, parameter, or assessment, while in other embodiments the score is derived from multiple constituents, parameters, and/or assessments.
- the score can be based upon or derived from an interpretation function; e.g., an interpretation function derived from a particular predictive model using any of various statistical algorithms.
- A “change in score” can refer to the absolute change in score (e.g., from one time point to the next), the percent change in score, or the change in score per unit time (i.e., the rate of score change).
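The three kinds of score change listed above can be computed as in the following Python sketch. The function name is hypothetical, and two design choices are assumptions rather than anything the disclosure specifies: percent change is taken relative to the magnitude of the baseline, and it is reported as `None` when the baseline is exactly zero (which can happen for scores centered at zero).

```python
def score_changes(score_before, score_after, time_before, time_after):
    """Return (absolute change, percent change, change per unit time)
    between two time points for a single subject's score."""
    if time_after == time_before:
        raise ValueError("time points must differ")
    absolute = score_after - score_before
    # Percent change is undefined for a zero baseline, so report None there;
    # dividing by abs() keeps the sign equal to the direction of change.
    percent = None if score_before == 0 else 100.0 * absolute / abs(score_before)
    rate = absolute / (time_after - time_before)
    return absolute, percent, rate
```

For instance, a score moving from 2.0 to 2.5 over 5 time units gives an absolute change of 0.5, a 25% change, and a rate of 0.1 per unit time.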
- “Treatment” and “therapeutic regimen” include all clinical management of a subject and interventions, whether biological, chemical, physical, or a combination thereof, intended to sustain, ameliorate, improve, or otherwise alter the condition of a subject. These terms may be used synonymously herein. Treatments include, but are not limited to, administration of prophylactic or therapeutic compounds (including small molecule and biologic drugs, whether prescription or over-the-counter), exercise regimens, physical therapy, dietary modification and/or supplementation, bariatric surgical intervention, and any other treatments known in the art as efficacious in preventing, delaying the onset of, or ameliorating disease characterized by HML.
- a “response to treatment” includes a subject’s response to any of the above-described treatments, whether biological, chemical, physical, or a combination of the foregoing.
- a “treatment course” relates to the dosage, duration, extent, etc. of a particular treatment or therapeutic regimen.
- An initial therapeutic regimen as used herein is the first line of treatment.
- A “variant allele ratio” means the number of informative sequence reads harboring a particular nucleotide at a specific locus as a proportion of the total informative sequence reads at that locus. For example, if a test locus is covered by 100 informative sequence reads in a particular sequencing reaction and 15 of those reads carry a particular nucleotide (e.g., a risk-modifying allele), then the risk-modifying allele ratio is 15%. In some contexts, variant allele ratios that are too low or too high may indicate an unreliable allele or genotype call (sometimes referred to herein as a call failure).
- a test locus (or an allele or specific nucleotide at that locus) may be counted only if the variant allele ratio is within a specific (e.g., pre-specified) range.
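The two counting rules above (a minimum read depth and a pre-specified variant allele ratio range) can be sketched in Python as follows. The function names, the default `min_depth` of 20, and the default ratio range are hypothetical placeholders, not values taken from the disclosure.

```python
def variant_allele_ratio(alt_reads, informative_reads):
    """Fraction of informative reads at a locus that carry the allele of interest."""
    if informative_reads <= 0:
        raise ValueError("locus has no informative reads")
    return alt_reads / informative_reads


def locus_passes_qc(alt_reads, informative_reads, min_depth=20, ratio_range=(0.0, 1.0)):
    """Count a locus only if it is covered by at least `min_depth` informative reads
    and its variant allele ratio lies inside the (hypothetical) pre-specified range."""
    if informative_reads < min_depth:
        return False  # insufficient coverage
    low, high = ratio_range
    return low <= variant_allele_ratio(alt_reads, informative_reads) <= high
```

With the worked example from the text, `variant_allele_ratio(15, 100)` returns 0.15, and that locus would be counted under a range such as (0.1, 0.9) but rejected as a possible call failure under a stricter range.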
- the study sets were derived from cohorts of women referred for hereditary cancer testing with a multi-gene panel.
- Patient data were eligible for inclusion if they were from women aged 18 to 84 years who were referred for hereditary cancer testing and tested negative for pathogenic variants in breast cancer-risk genes including one or more of BRCA1, BRCA2, TP53, PTEN, STK11, CDH1, PALB2, CHEK2, ATM, NBN, and BARD1.
- Cases were defined as patients who had a personal history of invasive breast cancer (BC) and controls were defined as patients with no personal history of BC, ductal carcinoma in situ, lobular carcinoma in situ, atypical hyperplasia, or other breast disease at the time of consent.
- BC: invasive breast cancer
- Self-reported ancestry was collected according to the 1997 United States Office of Management and Budget (OMB) standards.
- Study subjects were screened for the germline presence of pathogenic mutations in the following genes: APC, ATM, BARD1, BMPR1A, BRCA1, BRCA2, BRIP1, CDH1, CDK4, CDKN2A (p14ARF, p16), CHEK2, EPCAM, MLH1, MSH2, MSH6, MUTYH, NBN, PALB2, PMS2, PTEN, RAD51C, RAD51D, SMAD4, STK11, and TP53. Details of the screening methodology and statistical variant classification methods have been previously described.
- long-range and nested PCR were applied to segments of the CHEK2 gene to exclude pseudogene sequences.
- Sequencing may be performed using methods known in the art, for example using Illumina instruments (Illumina Inc., San Diego, CA), to identify both sequence variants and large rearrangements (including deletions and duplications).
- SNP markers are genotyped using standard methods. For example, genotyping may be performed using hybrid selection of SNP targets for breast cancer risk and SNPs for genetic ancestry, followed by NGS as described previously (Hughes et al., JCO Precision Oncology 2020:585-92). SNP markers associated with breast cancer risk are detailed in Table 2. SNPs used to determine genetic ancestry are listed in Table 1. For a subset of samples, ancestry variants were determined by targeted PCR with a custom rhAmp genotyping pool (Integrated DNA Technologies, Coralville, Iowa) followed by NGS.
- a polygenic determination of a trait in a subject can be done with a set of polygenic SNP markers.
- the trait can be ancestry.
- methods of this disclosure can use SNPs associated with one or more different heritage groups.
- methods of this disclosure can use bases in genomic loci associated with SNPs associated with one or more different heritage groups.
- methods of this disclosure can use genomic loci associated with SNPs associated with one or more different heritage groups. Combinations of SNPs or bases within genomic loci associated with these SNPs or genomic loci associated with these SNPs can be used for assessing the ancestry of a subject.
- a genotype of a subject can be determined based on fractional ancestry of one or more different heritage groups.
- Embodiments of this disclosure provide methods for assessing ancestry of a subject by selecting a plurality of ancestry-informative SNP markers. Embodiments of this disclosure also provide methods for assessing ancestry of a subject by selecting a plurality of bases in genomic loci associated with ancestry-informative SNP markers. Embodiments of this disclosure also provide methods for assessing ancestry of a subject by selecting a plurality of genomic loci associated with ancestry-informative SNP markers.
- The ancestry-informative SNP markers can be selected based on one or more criteria, such as the ability to substantially cover the entirety of the human genome, having at least 1% genomic frequency, and having different frequencies in different heritage populations.
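The frequency-based selection criteria above could be screened roughly as in this Python sketch, which is illustrative only. It interprets “at least 1% genomic frequency” as a minor allele frequency of at least 1% in the pooled reference panel (an equal-weight average across populations), measures between-population difference as the largest frequency gap, and uses a hypothetical `min_delta` cutoff of 0.3. The genome-coverage criterion is not modeled, and the function name is hypothetical.

```python
import numpy as np


def select_ancestry_informative(freqs, min_maf=0.01, min_delta=0.3):
    """Return indices of candidate SNPs that pass two frequency filters.

    freqs: (K, J) array; freqs[k, j] is the alternate-allele frequency of
           candidate SNP j in reference population k.
    """
    P = np.asarray(freqs, dtype=float)
    pooled = P.mean(axis=0)                 # equal-weight pooled frequency
    maf = np.minimum(pooled, 1.0 - pooled)  # minor allele frequency
    delta = P.max(axis=0) - P.min(axis=0)   # largest between-population gap
    return np.flatnonzero((maf >= min_maf) & (delta >= min_delta))
```

A real panel design would also balance genomic spacing and the number of markers informative for each population pair; this sketch only applies the two frequency filters.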
- A fractional heritage in the genotype of the subject can be calculated for each of the different heritage populations based on the plurality of ancestry-informative SNP markers, bases in genomic loci associated with ancestry-informative SNP markers, or genomic loci associated with ancestry-informative SNP markers.
- the ancestry-informative SNP markers can have different frequencies in four or more different heritage populations, such as in African, European, East Asian, and Amerindian heritage populations.
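The disclosure does not spell out how the fractional heritage is computed. One common approach, shown here only as an assumption-laden Python sketch, is supervised admixture estimation: treat each reference population's allele frequencies as known, model the subject's expected allele frequency at SNP j as the ancestry-weighted mix of those frequencies, and find the ancestry fractions that maximize the binomial likelihood of the observed genotypes using the standard expectation-maximization (EM) update. The function name and its parameters are hypothetical.

```python
import numpy as np


def estimate_ancestry_fractions(genotypes, ref_freqs, n_iter=2000, tol=1e-10):
    """Supervised EM estimate of ancestry fractions for one subject.

    genotypes: (J,) alternate-allele counts in {0, 1, 2}
    ref_freqs: (K, J) alternate-allele frequency of SNP j in population k
    returns:   (K,) non-negative fractions that sum to 1
    """
    g = np.asarray(genotypes, dtype=float)
    P = np.clip(np.asarray(ref_freqs, dtype=float), 1e-6, 1 - 1e-6)
    K, J = P.shape
    q = np.full(K, 1.0 / K)  # start from equal fractions
    for _ in range(n_iter):
        f = q @ P  # expected alternate-allele frequency for this subject
        # E-step: posterior share of each population for alt and ref allele copies
        alt_resp = (q[:, None] * P) / f
        ref_resp = (q[:, None] * (1.0 - P)) / (1.0 - f)
        # M-step: average ancestry share over all 2J allele copies
        q_new = (alt_resp @ g + ref_resp @ (2.0 - g)) / (2.0 * J)
        converged = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if converged:
            break
    return q
```

In use, `ref_freqs` would hold African, European, East Asian, and Amerindian reference frequencies for the ancestry-informative SNPs. Tools such as ADMIXTURE use related likelihood models, but this sketch is not claimed to match the method used in the disclosure.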
- A plurality of from 10 to 50,000 ancestry-informative SNP markers, 10 to 50,000 bases within genomic loci associated with ancestry-informative SNP markers, or 10 to 50,000 genomic loci associated with ancestry-informative SNP markers can be used.
- the plurality of ancestry-informative SNP markers may include from 1-1,000,000 SNP markers.
- A plurality of 10 to 56 ancestry-informative SNP markers can be used.
- The bases associated with ancestry-informative SNP markers may include 1-1,000,000 bases in genomic loci associated with SNP markers or 1-1,000,000 genomic loci associated with the SNP markers. In certain embodiments, a plurality of 10 to 56 bases within genomic loci associated with ancestry-informative SNP markers can be used. In certain embodiments, a plurality of 10 to 56 genomic loci associated with ancestry-informative SNP markers can be used.
- Methods of this disclosure can combine the use of the ancestry-informative SNP markers with additional SNP markers that may be associated with a biological trait.
- Methods of this disclosure can combine the use of the bases in genomic loci associated with the ancestry-informative SNP markers, or genomic loci associated with the ancestry-informative SNP markers, with additional SNP markers that may be associated with a biological trait.
- Combinations of SNPs or bases within genomic loci associated with the informative SNPs can be used to provide a multiple-ancestry polygenic risk score (MA-PRS), which can stratify subjects for risk of the trait regardless of heritage.
- A multiple-ancestry polygenic risk score can inherently incorporate genomic information based on fractional ancestry.
- Methods of this disclosure can combine the use of the ancestry-informative SNP markers with additional SNP markers that may be associated with a biological trait, bases in genomic loci associated with those additional SNP markers, or genomic loci associated with those additional SNP markers. Additionally or alternatively, methods of this disclosure can combine the use of the bases in genomic loci associated with the ancestry-informative SNP markers, or genomic loci associated with the ancestry-informative SNP markers, with additional SNP markers that may be associated with a biological trait, bases in genomic loci associated with those additional SNP markers, or genomic loci associated with those additional SNP markers.
- aspects of this disclosure provide methods for multiple-ancestry polygenic risk scoring that rely on a unique set of SNPs or a unique set of bases in genomic loci associated with the informative SNPs or a unique set of genomic loci associated with the informative SNPs discovered through design criteria.
- This unique set of SNPs, bases in genomic loci associated with the informative SNPs, or genomic loci associated with the informative SNPs for multiple-ancestry polygenic risk estimation can discriminate risk for all heritage groups and populations.
- the unique set of SNP markers, bases in genomic loci associated with the informative SNPs, or genomic loci associated with the informative SNPs for polygenic risk estimation disclosed herein provide accurate results for all heritage groups and populations, substantially without bias toward any population or heritage group.
- a set of ancestry-informative SNP markers, bases in genomic loci associated with the informative SNPs, or genomic loci associated with the informative SNPs was discovered using estimates of individual SNP risk betas in different ancestry groups.
- individual SNP risk betas were determined from known values, from data obtained in myRisk patients, and through meta-analysis of combined data of the foregoing.
- Estimates of individual SNP risk betas in different ancestry groups can be used to determine a unique set of SNP markers, bases in genomic loci associated with the informative SNPs, or genomic loci associated with the informative SNPs that can provide a multiple-ancestry polygenic risk score for risk of a trait, such as cancer risk, and which can stratify unaffected patients for risk irrespective of ancestry.
- African SNP risk betas can be determined from 1,000 or more, or from 5,000 or more, or from 10,000 or more myRisk measurements of patients of self-reported African ancestry.
- About seventy Asian SNP risk betas can be determined from Shu et al., Nat Commun., 2020, Vol. 11, pp. 1217-1226; Ho et al., Genetics in Medicine, 2022, pp. 586-600; and Ho et al., Nat. Commun., 2020, Vol. 11, pp. 3833.
- Hispanic SNP risk betas can be determined from 1,000 or more, 5,000 or more, or 10,000 or more myRisk measurements of patients of self-reported Hispanic ancestry.
- Amerindian SNP risk betas can be determined from myRisk measurements of patients of genetic Amerindian ancestry in Simmons et al., Prevention, Risk, Reduction, and Genetics, 2024, Vol. 42: 16, pp. 10533.
- Embodiments of this disclosure can provide a multiple-ancestry polygenic risk score for a trait that can be clinically validated for all women of all heritage groups and populations.
- a multiple-ancestry polygenic risk score of this disclosure can provide meaningful risk discrimination of a trait for all women of all heritage groups and populations.
- a multiple-ancestry polygenic risk score of this disclosure can provide a statistical distribution of scores for a trait for a population, where the scores can be centered at zero with no bias for any ancestry-specific subpopulation.
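One simple way to check the zero-centering property described above is to summarize scores within each self-reported ancestry subgroup and confirm that every subgroup mean sits near zero. The Python sketch below does only that descriptive step; the function name is hypothetical, and a formal assessment of bias would add confidence intervals or a hypothesis test.

```python
from collections import defaultdict
from statistics import mean, pstdev


def subgroup_summary(scores, groups):
    """Return {group label: (mean score, population SD)} for paired scores and labels."""
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)
    return {g: (mean(v), pstdev(v)) for g, v in by_group.items()}
```

A score that is well calibrated across ancestries would show subgroup means close to zero; a subgroup whose mean drifts away from zero would indicate residual ancestry bias.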
- FIG. 1 shows an illustration of ancestry in terms of contributions from different continents.
- FIG. 2 shows an illustration of a distribution of genotypes based on ancestry for Hispanic, White/non-Hispanic, Black/African, and Asian self-reported ancestry categories.
- A unique set of ancestry-informative SNPs, bases in genomic loci associated with ancestry-informative SNPs, or genomic loci associated with ancestry-informative SNPs for multiple-ancestry polygenic risk estimation can be obtained by characterizing the ancestry of a subject in terms of contributions from different continents.
- A set of ancestry-informative SNP markers, bases in genomic loci associated with ancestry-informative SNPs, or genomic loci associated with ancestry-informative SNPs can be obtained using design criteria that distinguish between four continental ancestries: African, East Asian, European, and Amerindian. Examples of ancestry-informative SNPs may be found in Table 1.
- Ancestry-informative SNPs of this disclosure include those in Table 1.
- Bases in genomic loci associated with, or genomic loci associated with, these and other ancestry-informative SNPs can be identified by determining regions that are in linkage disequilibrium with the ancestry-informative SNPs. Linkage disequilibrium can be determined by estimating the frequency of recombination during meiosis between the SNP markers and the associated bases or genomic loci.
- the frequency may be less than or equal to about 20%, about 19%, about 18%, about 17%, about 16%, about 15%, about 14%, about 13%, about 12%, about 11%, about 10%, about 9%, about 8%, about 7%, about 6%, about 5%, about 4%, about 3%, about 2%, about 1%, about 0.75%, about 0.5%, about 0.25%, or about 0.1% or less.
- determining linkage disequilibrium can be done by identifying bases in genomic loci or genomic loci that are within 0.2 cM or less, 0.1 cM or less, or within 0.05 cM or less of the SNP marker.
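The recombination-frequency thresholds and the centimorgan (cM) windows above can be related through a genetic map function. The Python sketch below uses Haldane's map function, which assumes no crossover interference; that choice is an assumption made for illustration (the Kosambi function is a common alternative), and the function names are hypothetical.

```python
import math


def haldane_cm(recomb_fraction):
    """Map distance in cM for a recombination fraction r, using Haldane's
    map function d = -50 * ln(1 - 2r) (assumes no interference)."""
    if not 0.0 <= recomb_fraction < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * recomb_fraction)


def haldane_recomb_fraction(cm):
    """Inverse of haldane_cm: r = (1 - exp(-d / 50)) / 2 for d in cM."""
    return 0.5 * (1.0 - math.exp(-cm / 50.0))
```

Under this assumption, a 20% recombination frequency corresponds to roughly 25.5 cM, while a 0.2 cM window corresponds to a recombination frequency of roughly 0.2%, so the cM-based windows are much tighter than the loosest recombination-frequency bound.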
- Determining linkage disequilibrium can be done by identifying bases in genomic loci, or genomic loci, that have a linkage disequilibrium value with the SNP marker above 0.1, above 0.2, above 0.5, above 0.6, above 0.7, above 0.8, above 0.9, or of about 1.0. Additionally or alternatively, determining linkage disequilibrium can be done by identifying bases in genomic loci, or genomic loci, that have a logarithm of odds (LOD) score with the SNP markers above 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, or 50.
- Determining linkage disequilibrium can be done by identifying bases in genomic loci, or genomic loci, that are within about 200, about 150, 100, about 90, about 80, about 70, about 60, about 50, about 40, about 30, about 20, or about 10 bases of the SNP marker.
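The linkage-disequilibrium value and physical-window criteria above can be combined into a proxy search, sketched here in Python. The disclosure does not name the LD statistic, so this sketch assumes r², estimated as the squared Pearson correlation of allele dosages (a common approximation when haplotype phase is unknown). The defaults `min_r2=0.8` and `max_distance=200` are picked from the ranges in the text, and the function names are hypothetical.

```python
import numpy as np


def genotype_r2(g1, g2):
    """Squared Pearson correlation of allele dosages (0/1/2) across individuals,
    used as an assumed stand-in for LD r^2 when phase is unknown."""
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    if g1.std() == 0 or g2.std() == 0:
        return 0.0  # a monomorphic site carries no LD information
    r = np.corrcoef(g1, g2)[0, 1]
    return float(r * r)


def ld_proxies(index_pos, index_geno, candidates, min_r2=0.8, max_distance=200):
    """Return positions of candidate loci that lie within `max_distance` bases of
    the index SNP and have r^2 >= `min_r2` with it.

    candidates: iterable of (position, dosage vector) pairs on the same chromosome.
    """
    proxies = []
    for pos, geno in candidates:
        if abs(pos - index_pos) > max_distance:
            continue  # outside the physical window
        if genotype_r2(index_geno, geno) >= min_r2:
            proxies.append(pos)
    return proxies
```

Note that a perfectly anti-correlated site (r = -1) still qualifies, because LD strength is measured by r² regardless of which allele is coupled to the index allele.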
- Embodiments of this disclosure further contemplate combining a unique set of additional SNP markers, genomic loci associated with the additional SNP markers, or bases within genomic loci associated with the additional SNP markers that are associated with a trait, such as risk of cancer, with ancestry-informative SNP markers, genomic loci associated with ancestry-informative SNP markers, or bases within genomic loci associated with ancestry-informative SNP markers.
- Methods of this disclosure can combine the use of ancestry-informative SNP markers, genomic loci associated with ancestry-informative SNP markers, or bases within genomic loci associated with ancestry-informative SNP markers with additional SNP markers, genomic loci associated with the additional SNP markers, or bases within genomic loci associated with the additional SNP markers that may be associated with cancer risk in one or more different heritage groups.
- Combinations of such SNPs, genomic loci associated with the ancestry-informative SNP markers, or bases within genomic loci associated with ancestry-informative SNP markers can be used to provide a multiple-ancestry polygenic risk score (MA-PRS) for cancer risk, which can stratify unaffected patients for cancer risk, irrespective of ancestry and the presence or absence of a family history of the disease.
- MA-PRS: multiple-ancestry polygenic risk score
- methods of this disclosure can combine the use of ancestry-informative SNP markers, genomic loci associated with the ancestry-informative SNP markers, or bases within genomic loci associated with ancestry-informative SNP markers with additional SNP markers, genomic loci associated with the additional SNP markers, or bases within genomic loci associated with the additional SNP markers that may be associated with breast cancer risk in women of one or more different heritage groups.
- Combinations of such SNPs, genomic loci associated with the ancestry-informative SNP markers, or bases within genomic loci associated with ancestry-informative SNP markers can be used to provide a multiple-ancestry polygenic risk score (MA-PRS) for breast cancer risk, which can stratify unaffected women for breast cancer risk, irrespective of ancestry and the presence or absence of a family history of the disease.
- A multiple-ancestry polygenic risk estimation may characterize the risk of cancer in a subject regardless of the subject’s genetic ancestry by using a combination of ancestry-informative SNP markers, genomic loci associated with the ancestry-informative SNP markers, or bases within genomic loci associated with ancestry-informative SNP markers and cancer-associated SNPs, genomic loci associated with the cancer-associated SNP markers, or bases within genomic loci associated with the cancer-associated SNP markers.
- the cancer-associated SNPs and the cancer-associated genomic loci may be derived from one or more different heritage groups or populations.
- A multiple-ancestry polygenic risk estimation may characterize the risk of breast cancer in a woman regardless of genetic ancestry by using a combination of ancestry-informative SNP markers, genomic loci associated with the ancestry-informative SNP markers, or bases within genomic loci associated with ancestry-informative SNP markers and breast cancer-associated SNPs, genomic loci associated with the breast cancer-associated SNP markers, or bases within genomic loci associated with the breast cancer-associated SNP markers.
- the breast cancer-associated SNPs, genomic loci associated with the breast cancer-associated SNP markers, or bases within genomic loci associated with the breast cancer-associated SNP markers may be derived from one or more different heritage groups or populations.
- A multiple-ancestry polygenic risk score may characterize the risk of breast cancer in a woman regardless of genetic ancestry by using a combination of from 10 to 56 ancestry-informative SNP markers (Table 1), genomic loci associated with the ancestry-informative SNP markers, or bases within genomic loci associated with ancestry-informative SNP markers, and from 10 to 329 breast cancer-associated SNPs (Table 2), genomic loci associated with the breast cancer-associated SNP markers, or bases within genomic loci associated with the breast cancer-associated SNP markers.
- The breast cancer-associated SNPs can include 93 previously published breast cancer-associated SNPs (Hughes et al., JCO Precis Oncol 6:e2200084), comprising up to 92 European breast cancer-associated SNPs and one Hispanic breast cancer SNP at 6q25 (rs140068132), as well as an additional 236 breast cancer-associated SNPs (Table 2) that were selected using a novel synthetic stepwise regression methodology that accounts for linkage disequilibrium.
- A multiple-ancestry polygenic risk score may characterize the risk of breast cancer in a woman regardless of genetic ancestry by using a combination of 56 ancestry-informative SNP markers and 329 breast cancer-associated SNPs, comprising 92 European breast cancer-associated SNPs and 237 non-European breast cancer-associated SNPs, including one Hispanic breast cancer SNP at 6q25 (rs140068132).
- A multiple-ancestry polygenic risk score may characterize the risk of breast cancer in a woman by measuring ancestry at each SNP or genomic locus, applying ancestry-specific risks, and calibrating the overall risk score according to ancestry-specific frequencies.
- a multiple-ancestry polygenic risk score of this disclosure can achieve a high level of accuracy for all women of all heritage groups and populations in terms of surprisingly high risk discrimination and superior accuracy of calibration.
- MA-PRS was defined as a combination of ancestry-specific PRSs on the basis of genetic ancestral composition.
- SNP weights (effect sizes) for the ancestry-specific PRSs were estimated as log odds ratios (ORs) for association with BC, for each reference ancestry, on the basis of the largest accessible data set.
- African SNP weights were based on meta-analysis of 77,625 Black/African patients and the literature (Du et al., JNCI, 2021, 1168-1176).
- East Asian weights were based on meta-analysis of 150,201 East Asian patients and the literature (Ho et al., Genetics in Medicine, 2022, 586-600; Ho et al., Nature Commun., 2020, 11:3833; and Shu et al., Nature Commun., 2020, 11:1217).
- European SNP weights were determined from the literature (Zhang et al., Nature Genetics, 2020, 52(6):572-581). For one Amerindian SNP (rs140068132), which is prevalent only among Hispanics, a weight, denoted β_Am, was estimated on the basis of 6,718 self-reported Hispanic patients referred for hereditary cancer testing. At the individual SNP level, Amerindian SNP weights are set equal to the East Asian weights.
- an ancestry-specific PRS was defined as the sum of BC risk alleles, centered by the ancestry-specific allele frequency and weighted by the ancestry-specific effect size multiplied by the probability of the allele having been inherited from that ancestry.
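The ancestry-specific PRS definition above can be written as PRS_k = Σ_j β_kj · a_kj · (x_j − 2 p_kj), where x_j is the subject's risk-allele count at SNP j, p_kj and β_kj are the ancestry-k allele frequency and log-OR weight, and a_kj is the probability that the allele was inherited from ancestry k. The Python sketch below implements that reading. Two points are assumptions rather than statements from the disclosure: centering uses the expected allele count 2p_kj for a diploid genotype, and genome-wide ancestry fractions may be passed in place of per-SNP (local) ancestry probabilities. The function name is hypothetical.

```python
import numpy as np


def ancestry_specific_prs(genotypes, effect_sizes, allele_freqs, ancestry_probs):
    """Compute one PRS per reference ancestry for a single subject.

    genotypes:      (J,) risk-allele counts in {0, 1, 2}
    effect_sizes:   (K, J) ancestry-specific log-OR weights
    allele_freqs:   (K, J) ancestry-specific risk-allele frequencies
    ancestry_probs: (K, J) probability that SNP j's allele came from ancestry k,
                    or (K,) genome-wide fractions broadcast to every SNP
    returns:        (K,) ancestry-specific PRS values
    """
    g = np.asarray(genotypes, dtype=float)
    B = np.asarray(effect_sizes, dtype=float)
    F = np.asarray(allele_freqs, dtype=float)
    A = np.asarray(ancestry_probs, dtype=float)
    if A.ndim == 1:
        A = np.broadcast_to(A[:, None], B.shape)
    centered = g[None, :] - 2.0 * F  # observed minus expected risk-allele count
    return np.sum(B * A * centered, axis=1)
```

Because each term is centered on the expected allele count, a subject whose genotypes match the ancestry-specific expectation scores zero, which is consistent with the zero-centered score distribution described earlier. How the K values are combined into the final MA-PRS is not reproduced here.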
- Weights for the African-, East Asian-, European-, and Amerindian-specific PRSs, denoted β_AF, β_EA, β_EU, and β_AI, respectively, were estimated as log ORs from a multivariable logistic regression model with BC status as the dependent variable and age, personal and family cancer history, self-reported ancestry, genetic ancestry, and the interaction between self-reported and genetic ancestry as independent variables.
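Estimating weights as log ORs from a multivariable logistic regression can be sketched with a plain maximum-likelihood fit. The Python code below uses Newton-Raphson (iteratively reweighted least squares) in NumPy, and each fitted non-intercept coefficient is a log OR. This is a generic illustration: the function name is hypothetical, no covariate coding or interaction terms from the study are reproduced, and a production analysis would normally use a statistics package that also reports standard errors.

```python
import numpy as np


def fit_logistic_irls(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson / IRLS.

    X: (n, p) design matrix; include a column of ones for the intercept
    y: (n,) binary outcome (1 = case, 0 = control)
    returns: (p,) coefficients; each non-intercept entry is a log odds ratio
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))  # fitted case probabilities
        w = mu * (1.0 - mu)                      # IRLS weights
        hessian = X.T @ (X * w[:, None])         # X^T W X
        step = np.linalg.solve(hessian, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

In this framing, the columns of `X` would include the four ancestry-specific PRSs alongside age and the other covariates, and `np.exp(beta)` would convert the fitted log ORs into ORs.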
- the final MA-PRS was defined as:
- the MA-PRS was evaluated for improved discrimination compared with a previously published MA-PRS-93 (Hughes et al., JCO Precis Oncol 6:e2200084), which overlaps with the 327 BC-associated SNPs used in the MA-PRS.
- the primary and first secondary analyses were repeated within subcohorts defined by self-reported ancestry.
- Cancer therapy can include surgery, cryoablation, radiation therapy, bone marrow transplant, chemotherapy, immunotherapy, hormone therapy, stem cell therapy, drug therapy, biological therapy, and administration of a pharmaceutical, prophylactic or therapeutic compound including, for example, a biologic or exogenous active agent.
- treatments include bariatric surgical intervention, physical therapy, diet, and diet supplementation.
- Examples of a cancer biological therapy include adoptive cell transfer, angiogenesis inhibitors, bacillus Calmette-Guerin therapy, biochemotherapy, cancer vaccines, chimeric antigen receptor (CAR) T-cell therapy, cytokine therapy, gene therapy, immune checkpoint modulators, immunoconjugates, monoclonal antibodies, oncolytic virus therapy, and targeted drug therapy.
- CAR: chimeric antigen receptor
- Examples of a cancer surgery include lumpectomy, partial mastectomy, total mastectomy, simple mastectomy, modified radical mastectomy, radical mastectomy, and Halsted radical mastectomy.
- Examples of a cancer drug include drugs approved to prevent breast cancer including Evista (Raloxifene Hydrochloride), Raloxifene Hydrochloride, and Tamoxifen Citrate.
- Examples of a cancer drug include drugs approved to treat breast cancer including Abemaciclib, Abraxane (Paclitaxel Albumin-stabilized Nanoparticle Formulation), Ado-Trastuzumab Emtansine, Afinitor (Everolimus), Afinitor Disperz (Everolimus), Alpelisib, Anastrozole, Aredia (Pamidronate Disodium), Arimidex (Anastrozole), Aromasin (Exemestane), Atezolizumab, Capecitabine, Cyclophosphamide, Docetaxel, Doxorubicin Hydrochloride, Ellence (Epirubicin Hydrochloride), Enhertu (Fam-Trastuzumab Deruxtecan-nxki), Epirubicin Hydrochloride, Eribulin Mesylate, Everolimus, Exemestane, 5-FU (Fluorouracil Injection), Fam-Trastuzumab Deruxtecan-n
- A “sample” includes any biological sample that is isolated from a subject.
- a sample can include, without limitation, a single cell or multiple cells, fragments of cells, an aliquot of body fluid, whole blood, platelets, serum, plasma, red blood cells, white blood cells or leucocytes, endothelial cells, tissue biopsies, synovial fluid, lymphatic fluid, ascites fluid, and interstitial or extracellular fluid.
- sample also encompasses the fluid in spaces between cells, including synovial fluid, gingival crevicular fluid, bone marrow, cerebrospinal fluid (CSF), saliva, mucous, sputum, semen, sweat, urine, or any other bodily fluids.
- a blood sample can include whole blood or any fraction thereof, including blood cells, red blood cells, white blood cells or leucocytes, platelets, serum and plasma.
- the term “subject” includes humans. Humans generally include women and men and others such as non-binary.
- this disclosure can provide methods for recommending therapeutic regimens, including withdrawal from therapeutic regiments.
- an odds ratio can provide a clinician with a prognostic picture of a subject’s biological state.
- Such embodiments may provide subject-specific prognostic information, which can be informative for a therapy decision, and may also facilitate monitoring therapy response.
- Such embodiments may result in a surprisingly improved treatment, such as better control of a disease, or an increase in the proportion of subjects achieving amelioration of symptoms.
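The odds ratio referred to above is conventionally computed from a 2x2 table of cases and controls. A minimal sketch follows; the counts are hypothetical, and the Woolf (log-scale) confidence interval is one standard choice rather than a method specified in this disclosure.

```python
import math


def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio from a 2x2 table, with a Woolf (log-scale) confidence interval.

    a = exposed cases,   b = exposed controls
    c = unexposed cases, d = unexposed controls
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive; add a continuity correction for zeros")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: "exposed" = women in a high-PRS category.
or_est, ci_low, ci_high = odds_ratio_with_ci(a=120, b=80, c=400, d=600)
print(f"OR = {or_est:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")  # OR = 2.25 (95% CI 1.65 to 3.07)
```

An odds ratio above 1 means the exposure is associated with higher odds of disease. An interval that excludes 1 suggests the association is unlikely to be due to chance at the chosen level.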
- The term “biologic” can include pharmaceutical therapy products manufactured in or extracted from a biological substance.
- A biologic can include vaccines, blood or blood components, allergenics, somatic cells, gene therapies, tissues, recombinant proteins, and living cells; and can be composed of sugars, proteins, nucleic acids, living cells or tissues, or combinations thereof.
- A therapeutic regimen can include any clinical management of a subject, as well as interventions, whether biological, chemical, physical, or a combination thereof, intended to sustain, ameliorate, improve, or otherwise alter the condition of a subject.
- Administering can include placing a composition into a subject by a method or route that results in at least partial localization of the composition at a desired site, such that a desired effect is produced.
- Routes of administration include both local and systemic administration. Generally, local administration results in more of the composition being delivered to a specific location as compared to the entire body of the subject, whereas systemic administration results in delivery to essentially the entire body of the subject.
- Administering also includes performing physical actions on a subject’s body, including physical therapy, chiropractic care, massage, and acupuncture.
- A machine-readable storage medium can comprise, for example, a data storage material encoded with machine-readable data or data arrays.
- The data and the machine-readable storage medium can be used for a variety of purposes by a machine programmed with instructions for using the data. Such purposes include storing, accessing, and manipulating information relating to the risk of a subject or population over time or in response to treatment, or for drug discovery for inflammatory disease.
- Data comprising genomic measurements can be used in computer programs executing on programmable computers, each of which may comprise a processor, a data storage system, one or more input devices, and one or more output devices. Program code can be applied to the input data to perform the functions described herein and to generate output information, which can then be applied to one or more output devices.
- A computer can be, for example, a personal computer, a microcomputer, or a workstation.
- A computer program can be instruction code implemented in a high-level procedural or object-oriented programming language to communicate with a computer system.
- Alternatively, the program may be implemented in assembly or machine language.
- The programming language can be a compiled or an interpreted language.
- Each computer program can be stored on a storage medium or device, such as ROM or a magnetic diskette, that is readable by a programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer to perform the described procedures.
- A health-related or genomic data management system can be implemented as a computer-readable storage medium configured with a computer program, where the storage medium causes a computer to operate in a specific manner to perform various functions.
- A reference SNP ID number is an identification tag assigned by NCBI to a group, or cluster, of SNPs that map to an identical location.
- The rs ID number, or rs tag, is assigned after submission.
- A submitted SNP is evaluated to determine whether it maps to the same location as previously submitted SNPs. If it does, the submitted SNP is linked into the reference set of the existing reference SNP record.
- These SNP rs IDs are mapped to external resources and databases, including NCBI databases.
- The SNP rs ID number is noted on the records of these external resources and databases in order to point users back to the original dbSNP records.
- A reference SNP record has the format NCBI
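The clustering rule described above links a submitted SNP to an existing reference record when it maps to the same location and otherwise creates a new rs ID. It can be illustrated with a toy in-memory catalog. This is a hypothetical simplification for illustration only, not NCBI's dbSNP pipeline. Locations are assumed to be exact (chromosome, position) pairs on one assembly, and the class, method, and submission-ID names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class RefSnpRecord:
    rs_id: str
    location: tuple[str, int]  # hypothetical: (chromosome, position) on one assembly
    submitted: list[str] = field(default_factory=list)  # linked submissions


class RefSnpCatalog:
    """Toy model of clustering submitted SNPs into reference SNP (rs) records.

    Simplifying assumption: "maps to an identical location" means exact
    equality of the (chromosome, position) pair.
    """

    def __init__(self, next_number: int = 1) -> None:
        self._by_location: dict[tuple[str, int], RefSnpRecord] = {}
        self._next_number = next_number

    def submit(self, submission_id: str, chrom: str, pos: int) -> str:
        loc = (chrom, pos)
        record = self._by_location.get(loc)
        if record is None:
            # No existing record at this location: assign a new rs ID.
            record = RefSnpRecord(rs_id=f"rs{self._next_number}", location=loc)
            self._next_number += 1
            self._by_location[loc] = record
        # Link the submission into the reference set of the (new or existing) record.
        record.submitted.append(submission_id)
        return record.rs_id


catalog = RefSnpCatalog()
first = catalog.submit("ss_A", "chr1", 1000)
second = catalog.submit("ss_B", "chr1", 1000)  # same location -> same rs ID
third = catalog.submit("ss_C", "chr2", 5000)   # new location -> new rs ID
print(first, second, third)  # rs1 rs1 rs2
```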
- Words specifically defined herein have the meaning provided in the context of the present disclosure as a whole, and as are typically understood by those skilled in the art. As used herein, the singular forms “a,” “an,” and “the” include the plural.
- Example 1: Multiple-ancestry polygenic breast cancer risk assessment
- Accurate breast cancer risk assessment is essential to identify women for whom screening and preventative interventions may be lifesaving. Incorporating polygenic risk scores (PRSs) into clinical models can improve risk prediction, but most PRSs have shown suboptimal performance among non-European subjects.
- Breast cancer risk was assessed using single-nucleotide polymorphisms (SNPs) with small individual effects. These SNPs were aggregated into a multiple-ancestry polygenic risk score (MA-PRS) developed and validated for populations of European, African, East Asian, and Amerindian descent.
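Small-effect SNPs are conventionally aggregated with an additive model. Each SNP contributes its effect-allele count multiplied by a per-allele log-odds weight. The sketch below assumes that standard model. The SNP names and weights are placeholders, not values from this disclosure.

```python
import math


def polygenic_risk_score(dosages: dict[str, float], weights: dict[str, float]) -> float:
    """Additive PRS: sum of effect-allele dosage x per-allele log-odds weight.

    dosages: SNP ID -> number of effect alleles carried (0, 1 or 2; imputed
             dosages may be fractional).
    weights: SNP ID -> per-allele log odds ratio for the trait.
    A KeyError is raised if a weighted SNP is missing from the genotype.
    """
    return sum(w * dosages[snp] for snp, w in weights.items())


# Placeholder SNP names and per-allele odds ratios (not the disclosure's values).
weights = {"snp_1": math.log(1.10), "snp_2": math.log(0.95), "snp_3": math.log(1.20)}
dosages = {"snp_1": 2, "snp_2": 1, "snp_3": 0}
score = polygenic_risk_score(dosages, weights)
print(f"PRS = {score:.4f}, implied odds multiplier = {math.exp(score):.4f}")
```

Because the weights are log odds ratios, exponentiating the score gives the multiplicative change in odds relative to a carrier of no effect alleles. In practice, the raw score is usually standardized against a reference population before it is interpreted.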
- An earlier MA-PRS, MA-149, significantly outperformed the Tyrer-Cuzick (TC) model, and integrating MA-149 with TC improved predictive accuracy roughly two-fold over TC alone.
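The document does not state how MA-149 was integrated with TC. One common simplifying approach, assumed here purely for illustration, treats the PRS as independent of the clinical factors. It then multiplies the clinical model's absolute risk by the relative risk implied by a standardized PRS. The function names, the odds ratio per standard deviation, and the clinical risk value are all hypothetical.

```python
import math


def prs_relative_risk(z: float, log_or_per_sd: float) -> float:
    """Relative risk for a standardized PRS z, assuming z ~ N(0, 1) in the population.

    exp(beta * z) is divided by its population mean, exp(beta**2 / 2), so the
    average relative risk across the population is 1. The per-SD odds ratio is
    treated as a relative risk, an approximation that suits uncommon outcomes.
    """
    return math.exp(log_or_per_sd * z - log_or_per_sd ** 2 / 2)


def combined_risk(clinical_risk: float, z: float, log_or_per_sd: float) -> float:
    """Hypothetical multiplicative combination, capped at a probability of 1."""
    return min(1.0, clinical_risk * prs_relative_risk(z, log_or_per_sd))


beta = math.log(1.6)  # placeholder odds ratio per SD of the PRS
tc_risk = 0.12        # placeholder lifetime risk from a clinical model
print(f"combined risk: {combined_risk(tc_risk, z=1.5, log_or_per_sd=beta):.3f}")
```

The normalization keeps the population-average risk anchored to the clinical model, so the PRS only redistributes risk between individuals.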
- A set of 385 SNPs (56 ancestry-informative and 329 breast cancer (BC)-associated) was selected for the new PRS (MA-385) (Tables 1 and 2).
- The strongest associations were observed in Ashkenazi Jewish and Hispanic women (Table 3).
- MA-385 identified more women at >2-fold increased risk than MA-149 (6.5% vs. 2%). Goodness-of-fit tests showed that MA-385 was well-calibrated, whereas a 385-SNP PRS with European weights was miscalibrated for non-Europeans.
- MA-385 was well-calibrated, improved upon clinical factors, and outperformed existing PRSs in all tested ancestries. Incorporating MA-385 into risk assessment could improve the early detection and prevention of BC.
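The document does not spell out how the 56 ancestry-informative SNPs enter MA-385. The sketch below is a purely hypothetical illustration of how an ancestry-aware score could differ from one with single-ancestry weights. Each SNP's weight is an ancestry-proportion-weighted average of ancestry-specific log-odds weights. The ancestry proportions are assumed to have been estimated already, for example from ancestry-informative SNPs. All labels and numbers are placeholders.

```python
import math


def ancestry_weighted_prs(
    dosages: dict[str, float],
    ancestry_props: dict[str, float],
    weights_by_ancestry: dict[str, dict[str, float]],
) -> float:
    """Hypothetical ancestry-aware PRS.

    Each SNP's weight is sum_k q_k * beta_k(snp), where q_k is the subject's
    estimated proportion of ancestry k and beta_k is the per-allele log-odds
    weight estimated in ancestry k.
    """
    if not math.isclose(sum(ancestry_props.values()), 1.0, abs_tol=1e-6):
        raise ValueError("ancestry proportions must sum to 1")
    score = 0.0
    for snp, dose in dosages.items():
        weight = sum(q * weights_by_ancestry[anc][snp] for anc, q in ancestry_props.items())
        score += weight * dose
    return score


props = {"EUR": 0.6, "AFR": 0.4}  # placeholder ancestry proportions
weights = {
    "EUR": {"snp_1": 0.10, "snp_2": -0.05},
    "AFR": {"snp_1": 0.02, "snp_2": -0.08},
}
print(round(ancestry_weighted_prs({"snp_1": 2, "snp_2": 1}, props, weights), 4))  # 0.074
```

With all weight on one ancestry, this reduces to an ordinary single-ancestry additive score.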
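The document does not name the goodness-of-fit test used to assess calibration. One widely used option for predicted risks is the Hosmer-Lemeshow statistic. The plain-Python sketch below illustrates it; it is not the analysis performed here.

```python
def hosmer_lemeshow(predicted: list[float], observed: list[int], groups: int = 10) -> float:
    """Hosmer-Lemeshow goodness-of-fit statistic for predicted risks.

    Subjects are sorted by predicted risk and split into `groups` bins of
    (nearly) equal size. In each bin, the observed number of cases is compared
    with the expected number (the sum of predicted risks); larger values
    suggest poorer calibration.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    # Sort on predicted risk only, so ties keep their original order.
    pairs = sorted(zip(predicted, observed), key=lambda pair: pair[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)
        cases = sum(y for _, y in chunk)
        mean_p = expected / len(chunk)
        variance = len(chunk) * mean_p * (1.0 - mean_p)
        if variance > 0:
            stat += (cases - expected) ** 2 / variance
    return stat


# Placeholder data: predicted risks that match the observed case rate.
print(hosmer_lemeshow([0.5] * 10, [1, 0] * 5, groups=2))
```

The statistic is conventionally compared with a chi-square distribution on `groups - 2` degrees of freedom, for example with `scipy.stats.chi2.sf(stat, groups - 2)`. A small p-value indicates miscalibration.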
Abstract
The present invention relates to methods for assessing the risk posed by a trait in a subject. The methods comprise selecting a plurality of ancestry-informative SNP markers on the basis of objective design criteria, measuring a genotype of the subject, and obtaining SNP markers associated with the trait. They further comprise calculating a multiple-ancestry polygenic risk score for the trait risk in the subject on the basis of the plurality of ancestry-informative SNP markers and the trait-associated SNP markers. The trait can be the risk of developing cancer. The present invention also relates to methods for assessing the ancestry of a subject.
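The abstract also describes assessing a subject's ancestry from ancestry-informative SNP markers. Given allele frequencies in reference populations, a standard way to do this is the expectation-maximization (EM) update used in supervised admixture estimation. The sketch below assumes that approach, Hardy-Weinberg equilibrium, and unlinked markers. It is an illustration, not the claimed method, and the reference frequencies are placeholders.

```python
def estimate_ancestry(
    genotypes: list[int],
    ref_freqs: list[list[float]],
    n_iter: int = 500,
) -> list[float]:
    """EM estimate of one subject's ancestry proportions with fixed reference frequencies.

    genotypes[j]    -- copies (0, 1 or 2) of the counted allele at marker j
    ref_freqs[k][j] -- frequency of that allele at marker j in reference population k
    Assumes Hardy-Weinberg equilibrium and independent markers.
    Returns proportions q[k] that sum to 1.
    """
    n_pops, n_markers = len(ref_freqs), len(genotypes)
    q = [1.0 / n_pops] * n_pops
    for _ in range(n_iter):
        totals = [0.0] * n_pops
        for j, g in enumerate(genotypes):
            # Clamp frequencies away from 0 and 1 so posteriors stay defined.
            f = [min(max(ref_freqs[k][j], 1e-6), 1 - 1e-6) for k in range(n_pops)]
            # E-step: posterior population of origin for a copy of the counted
            # allele (a) and for a copy of the other allele (b).
            a = [q[k] * f[k] for k in range(n_pops)]
            b = [q[k] * (1 - f[k]) for k in range(n_pops)]
            sum_a, sum_b = sum(a), sum(b)
            for k in range(n_pops):
                totals[k] += g * a[k] / sum_a + (2 - g) * b[k] / sum_b
        # M-step: each marker contributes two allele copies.
        q = [t / (2 * n_markers) for t in totals]
    return q


# Placeholder reference frequencies for two populations at four markers.
freqs = [[0.9, 0.8, 0.7, 0.9], [0.1, 0.2, 0.3, 0.1]]
print([round(x, 3) for x in estimate_ancestry([2, 2, 2, 2], freqs)])
```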
Applications Claiming Priority
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US63/525,534 (US202363525534P) | 2023-07-07 | 2023-07-07 | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025014806A1 (fr) | 2025-01-16 |
Family ID: 94216393
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/036902 (pending; published as WO2025014806A1, fr) | 2023-07-07 | 2024-07-05 | Multiple-ancestry polygenic risk assessment for breast cancer |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025014806A1 (fr) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24840329; Country of ref document: EP; Kind code of ref document: A1 |