HK1139737A1 - Genetic analysis systems and methods - Google Patents

Info

Publication number
HK1139737A1
Authority
HK
Hong Kong
Prior art keywords
individual
phenotype
profile
genotype
genomic
Application number
HK10106416.1A
Other languages
Chinese (zh)
Other versions
HK1139737B (en)
Inventor
D. A. Stephan
M. F. Filippone
J. Wessel
M. Cargill
E. Halperin
Original Assignee
Navigenics, Inc.
Priority claimed from US 11/781,679 (US20080131887A1)
Application filed by Navigenics, Inc.
Publication of HK1139737A1
Publication of HK1139737B

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6827 Hybridisation assays for detection of mutation or polymorphism
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6834 Enzymatic or biochemical coupling of nucleic acids to a solid phase
    • C12Q1/6837 Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/40 Population genetics; Linkage disequilibrium
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/172 Haplotypes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Genetics & Genomics (AREA)
  • Analytical Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Organic Chemistry (AREA)
  • Molecular Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Physiology (AREA)
  • Ecology (AREA)
  • Hospice & Palliative Care (AREA)
  • Oncology (AREA)
  • Hematology (AREA)
  • General Physics & Mathematics (AREA)
  • Medicinal Chemistry (AREA)
  • Food Science & Technology (AREA)
  • Urology & Nephrology (AREA)
  • Biomedical Technology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides methods of determining a Genetic Composite Index score by assessing the association between an individual's genotype and at least one disease or condition. The assessment comprises comparing an individual's genomic profile with a database of medically relevant genetic variations that have been established to associate with at least one disease or condition.

Description

Genetic analysis systems and methods
Background
Sequencing of the human genome and other recent advances in human genomics have revealed that the genomes of any two people are more than 99.9% similar. The relatively small number of DNA variations between individuals gives rise to differences in phenotypic traits and is associated with many human diseases, with susceptibility to various diseases, and with response to treatment. Variations in DNA between individuals occur in both coding and non-coding regions and include changes of bases at particular positions in the genomic DNA sequence as well as insertions and deletions of DNA. Changes that occur at a single base position in the genome are referred to as single nucleotide polymorphisms, or "SNPs".
Although SNPs are relatively rare in the human genome, they account for a large portion of the DNA sequence variation between individuals, occurring approximately once every 1,200 base pairs (see International HapMap Project, www.hapmap.org). As more human genetic information becomes available, the complexity of SNPs is beginning to be understood. In turn, the occurrence of SNPs in the genome is being correlated with the presence of, and/or susceptibility to, a variety of diseases and conditions.
As these correlations and other advances in human genetics are made, medicine and personal health care are moving toward a personalized approach in which a patient makes appropriate medical and other choices in light of his or her genomic information, among other factors. There is therefore a need to provide individuals and their health care providers with information specific to the individual's personal genome, so that medical and other decisions can be personalized.
Disclosure of Invention
The present invention provides a method of assessing the genotype correlations of an individual, the method comprising: a) obtaining a genetic sample of the individual; b) generating a genomic profile of the individual; c) determining the individual's genotype correlations with phenotypes by comparing the individual's genomic profile to a current database of human genotype correlations with phenotypes; d) reporting the results of step c) to the individual or to a health care manager of the individual; e) updating the database of human genotype correlations with an additional human genotype correlation when that correlation becomes known; f) updating the individual's genotype correlations by comparing the individual's genomic profile from step c), or a portion thereof, to the additional human genotype correlation and determining an additional genotype correlation of the individual; and g) reporting the results of step f) to the individual or to a health care manager of the individual.
The present invention further provides a business method of assessing the genotype correlations of an individual, the method comprising: a) obtaining a genetic sample of the individual; b) generating a genomic profile of the individual; c) determining the individual's genotype correlations by comparing the individual's genomic profile to a database of human genotype correlations; d) providing the results of determining the individual's genotype correlations to the individual in an encrypted manner; e) updating the database of human genotype correlations with an additional human genotype correlation when that correlation becomes known; f) updating the individual's genotype correlations by comparing the individual's genomic profile, or a portion thereof, to the additional human genotype correlation and determining an additional genotype correlation of the individual; and g) providing the results of updating the individual's genotype correlations to the individual or to a health care manager of the individual.
Another aspect of the invention is a method of generating a phenotype profile of an individual, the method comprising: a) providing a rule set comprising rules, each rule indicating a correlation between at least one genotype and at least one phenotype; b) providing a data set comprising the genomic profile of each of a plurality of individuals, wherein each genomic profile comprises a plurality of genotypes; c) periodically updating the rule set with at least one new rule, wherein the at least one new rule indicates a correlation between a genotype and a phenotype not previously correlated with each other in the rule set; d) applying each new rule to the genomic profile of at least one of the individuals, thereby correlating at least one genotype with at least one phenotype for that individual; and optionally e) generating a report comprising the phenotype profile of the individual.
The present invention also provides a system comprising: a) a rule set comprising rules, each rule indicating a correlation between at least one genotype and at least one phenotype; b) code for periodically updating the rule set with at least one new rule, wherein the at least one new rule indicates a correlation between a genotype and a phenotype not previously correlated with each other in the rule set; c) a database comprising the genomic profiles of a plurality of individuals; d) code for applying the rule set to the genomic profiles of the individuals to determine a phenotype profile for each individual; and e) code for generating a report for each individual.
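The system elements listed above (a rule set, code that updates it, a database of genomic profiles, code that applies the rules, and code that generates reports) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Rule` and `RuleSet` structures, the function names, and the placeholder marker and phenotype names are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """One genotype-phenotype correlation (hypothetical structure)."""
    marker: str      # e.g. a SNP identifier
    genotype: str    # genotype at that marker that triggers the rule
    phenotype: str   # phenotype correlated with that genotype


@dataclass
class RuleSet:
    rules: list = field(default_factory=list)

    def add_rule(self, rule):
        """Add a rule only if this correlation is not already in the set."""
        if rule in self.rules:
            return False
        self.rules.append(rule)
        return True

    def apply(self, genomic_profile):
        """Return the sorted phenotypes whose rules match the profile."""
        matched = {
            r.phenotype
            for r in self.rules
            if genomic_profile.get(r.marker) == r.genotype
        }
        return sorted(matched)


def generate_reports(rule_set, profiles):
    """Apply the rule set to every stored genomic profile."""
    return {person: rule_set.apply(profile) for person, profile in profiles.items()}


# Placeholder marker and phenotype names, not real associations.
profiles = {
    "individual_1": {"snp_a": "AA", "snp_b": "CT"},
    "individual_2": {"snp_a": "AG", "snp_b": "CT"},
}
rule_set = RuleSet([Rule("snp_a", "AA", "phenotype_x")])
before = generate_reports(rule_set, profiles)

# A periodic update adds a correlation not previously in the rule set.
was_added = rule_set.add_rule(Rule("snp_b", "CT", "phenotype_y"))
after = generate_reports(rule_set, profiles)
```

Because the genomic profiles are stored, the update step only has to rerun the rules; no new sample is needed to produce the `after` reports.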
In another aspect of the invention, data in the methods and systems described above are transmitted over a network in an encrypted or unencrypted manner.
Incorporation by reference
All publications and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.
Drawings
FIG. 1 is a flow diagram illustrating a method aspect of the invention.
FIG. 2 is an example of a genomic DNA quality control measure.
FIG. 3 is an example of a hybridization quality control measure.
FIG. 4 is a set of tables of representative genotype correlations from the published literature, with the SNPs tested and effect estimates. A-I) show single-locus genotype correlations; J) shows a two-locus genotype correlation; K) shows a three-locus genotype correlation; L) is an index of the ethnicity and country abbreviations used in A-K; M) is an index of the Short Phenotype Name abbreviations used in A-K, with heritabilities and the references for the heritabilities.
FIGS. 5A-J are tables of representative genotype correlations with effect estimates.
FIGS. 6A-F are tables of representative genotype correlations and estimated relative risks.
FIG. 7 is a sample report.
FIG. 8 is an illustration of a system for analyzing and transmitting genomic and phenotypic profiles over a network.
FIG. 9 is a flow chart illustrating a business method aspect of the invention.
FIG. 10: Effect of the prevalence estimate on the relative risk estimates. Hardy-Weinberg equilibrium is assumed, and each curve corresponds to a different value of the allele frequency in the population. The two black lines correspond to odds ratios of 9 and 6, the two red lines to odds ratios of 6 and 4, and the two blue lines to odds ratios of 3 and 2.
FIG. 11: Effect of the allele frequency estimate on the relative risk estimates. Each curve corresponds to a different value of the prevalence in the population. The two black lines correspond to odds ratios of 9 and 6, the two red lines to odds ratios of 6 and 4, and the two blue lines to odds ratios of 3 and 2.
FIG. 12: Pairwise comparison of the absolute values of the different models.
FIG. 13: Pairwise comparison of the rank values (GCI scores) based on the different models. Spearman correlations between the different pairs are given in Table 2.
FIG. 14: Effect of the reported prevalence on the GCI scores. The Spearman correlation between any two prevalence values is at least 0.99.
FIG. 15: Illustration of a sample web page from a personal portal.
FIG. 16: Illustration of a sample web page from a personal portal showing an individual's risk of prostate cancer.
FIG. 17: Illustration of a sample web page from a personal portal showing an individual's risk of Crohn's disease.
FIG. 18: Histogram of GCI scores for multiple sclerosis based on HapMap, using 2 SNPs.
FIG. 19: Lifetime risk of multiple sclerosis for an individual, using GCI Plus.
FIG. 20: Histogram of GCI scores for Crohn's disease.
FIG. 21: Table of multi-locus correlations.
FIG. 22: Table of SNP and phenotype correlations.
FIG. 23: Table of phenotypes and prevalences.
FIG. 24: Glossary of the abbreviations used in FIGS. 21, 22 and 25.
FIG. 25: Table of SNP and phenotype correlations.
Detailed Description
The present invention provides methods and systems for generating phenotype profiles from stored genomic profiles of individuals or groups of individuals, and for readily generating both original and updated phenotype profiles from those stored genomic profiles. A genomic profile is generated by determining genotypes from a biological sample obtained from an individual. The biological sample may be any sample from which a genetic sample can be derived, for example a buccal swab, saliva, blood, hair, or any other type of tissue sample. Genotypes are then determined from the biological sample. A genotype may be any genetic variant or biological marker, for example a single nucleotide polymorphism (SNP), a haplotype, or a sequence of the genome. The genotype may be the entire genomic sequence of the individual. Genotypes may result from high-throughput analysis that generates thousands or millions of data points, for example microarray analysis of most or all known SNPs. In other embodiments, genotypes may be determined by high-throughput sequencing.
The genotypes form the genomic profile of the individual. The genomic profile is stored digitally and can be readily accessed at any time to generate a phenotype profile. A phenotype profile is generated by applying rules that correlate or associate genotypes with phenotypes. Rules can be made based on scientific research that demonstrates a correlation between a genotype and a phenotype. The correlations may be curated or validated by a committee of one or more experts. By applying the rules to an individual's genomic profile, the association between the individual's genotypes and a phenotype can be determined, and this determination becomes part of the individual's phenotype profile. The determination may be a positive association between the individual's genotype and a given phenotype, meaning that the individual has the phenotype or will develop it. Alternatively, the determination may be that the individual does not have, or will not develop, the phenotype. In other embodiments, the determination may be a risk factor, an estimate, or a probability that the individual has or will develop the phenotype.
The determination may be based on a number of rules; for example, a plurality of rules may be applied to a genomic profile to determine the association of an individual's genotypes with a specific phenotype. The determination may also incorporate factors specific to the individual, such as ethnicity, gender, lifestyle (e.g., diet and exercise habits), age, environment (e.g., place of residence), family medical history, personal medical history, and other known phenotypes. These factors may be incorporated by modifying existing rules to include them. Alternatively, separate rules may be generated from these factors and applied to the individual's phenotype determination after the existing rules have been applied.
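The second option described above, in which separate factor rules are applied after the existing genotype rules, can be sketched as a two-stage pipeline. This is only an illustration under stated assumptions: the specification does not say how a factor changes a determination, so the multiplicative adjustment, the tuple layouts, the function names, and all values below are hypothetical placeholders.

```python
def apply_genotype_rules(genomic_profile, genotype_rules):
    """Stage 1: return {phenotype: estimate} for every genotype rule that matches."""
    determination = {}
    for marker, genotype, phenotype, estimate in genotype_rules:
        if genomic_profile.get(marker) == genotype:
            determination[phenotype] = estimate
    return determination


def apply_factor_rules(determination, factors, factor_rules):
    """Stage 2: adjust an existing determination using individual-specific factors.

    The multiplier is an assumed form of adjustment, not one given in the source.
    """
    adjusted = dict(determination)
    for factor, value, phenotype, multiplier in factor_rules:
        if factors.get(factor) == value and phenotype in adjusted:
            adjusted[phenotype] *= multiplier
    return adjusted


# Placeholder rules and values, chosen only to make the arithmetic easy to follow.
genotype_rules = [("snp_a", "AA", "phenotype_x", 0.25)]
factor_rules = [("exercise", "low", "phenotype_x", 2.0)]

genomic_profile = {"snp_a": "AA"}
factors = {"exercise": "low", "age": 50}

base = apply_genotype_rules(genomic_profile, genotype_rules)
final = apply_factor_rules(base, factors, factor_rules)
```

Keeping the factor rules in a second stage leaves the genotype rules untouched, so the same stored genomic profile can be re-evaluated when an individual's lifestyle or medical history changes.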
A phenotype may be any measurable trait or characteristic, such as susceptibility to a disease or response to a drug treatment. Other phenotypes that may be included are physical and mental traits such as height, weight, hair color, eye color, sensitivity to sunburn, size, memory, intelligence, level of optimism, and general disposition. Phenotypes may also include genetic comparisons with other individuals or organisms. For example, an individual may be interested in the similarity between his or her genomic profile and that of a celebrity. Individuals may also have their genomic profile compared with those of other organisms (e.g., bacteria, plants, or other animals).
Together, the collection of correlated phenotypes determined for an individual constitutes that individual's phenotype profile. The phenotype profile may be accessed through an online portal, which may optionally be an encrypted online portal. Alternatively, the phenotype profile as it exists at a particular time may be provided in paper form, with subsequent updates also provided in paper form. Access to the phenotype profile may be provided to registered users, that is, individuals who subscribe to a service that generates rules correlating genotypes with phenotypes, determines the genomic profile of an individual, applies the rules to the genomic profile, and generates the individual's phenotype profile. Access may also be provided to non-registered users, who may have limited access to their phenotype profiles and/or reports, or who may have an initial report or phenotype profile generated but receive updated reports only through a paid subscription. Health care managers and providers, such as caregivers, physicians, and genetic counselors, may also have access to the phenotype profile.
In another aspect of the invention, genomic profiles may be generated for registered and non-registered users and stored digitally, while access to phenotype profiles and reports is limited to registered users. In another variation, both registered and non-registered users may access their genotype and phenotype profiles, but non-registered users have limited access or may have only a limited report generated, whereas registered users have full access and may have a full report generated. In another embodiment, both registered and non-registered users may initially have full access or a full initial report, but only registered users may access reports updated from their stored genomic profile.
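The three access variations in this paragraph can be written as a small decision function. The sketch below is one possible reading, not a definitive implementation; the `Policy` names, the `allowed_report` function, and the "full" / "limited" / "none" return values are hypothetical labels introduced for illustration.

```python
from enum import Enum


class Policy(Enum):
    # Phenotype profiles and reports are limited to registered users.
    PROFILES_FOR_REGISTERED_ONLY = 1
    # Non-registered users receive limited access or a limited report.
    LIMITED_FOR_NON_REGISTERED = 2
    # Everyone receives a full initial report; only registered users get updates.
    UPDATES_FOR_REGISTERED_ONLY = 3


def allowed_report(policy, registered, is_update):
    """Return 'full', 'limited', or 'none' for a requested report."""
    if registered:
        return "full"
    if policy is Policy.PROFILES_FOR_REGISTERED_ONLY:
        return "none"
    if policy is Policy.LIMITED_FOR_NON_REGISTERED:
        return "limited"
    # Policy.UPDATES_FOR_REGISTERED_ONLY
    return "none" if is_update else "full"


initial_for_guest = allowed_report(Policy.UPDATES_FOR_REGISTERED_ONLY, registered=False, is_update=False)
update_for_guest = allowed_report(Policy.UPDATES_FOR_REGISTERED_ONLY, registered=False, is_update=True)
```

In all three variations the genomic profile itself is stored regardless of registration status; only what is generated from it differs.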
In another aspect of the invention, information on the association of multiple genetic markers with one or more diseases or conditions is combined and analyzed to produce a Genetic Composite Index (GCI) score. The score incorporates known risk factors as well as other information and assumptions, such as allele frequencies and the prevalence of the disease. The GCI can be used to quantitatively estimate the association of a disease or condition with the combined effect of a set of genetic markers. Based on current scientific research, the GCI score gives people without training in genetics a reliable (i.e., robust), understandable, and/or intuitive sense of how their individual risk of a disease compares with that of a relevant population. The GCI score may be used to generate a GCI Plus score. The GCI Plus score may contain all the GCI assumptions, including the risk (e.g., lifetime risk), age-defined prevalence, and/or age-defined incidence of the condition. The lifetime risk of an individual can then be calculated as a GCI Plus score that is proportional to the individual's GCI score divided by the average GCI score. The average GCI score may be determined from a group of individuals of similar ancestral background, such as a group of Caucasians, Asians, East Indians, or another group with a common ancestral background. The group may consist of at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55 or 60 individuals. In certain embodiments, the average GCI score may be determined from at least 75, 80, 95, or 100 individuals. The GCI Plus score can be determined by determining the GCI score of the individual, dividing the GCI score by the average relative risk, and multiplying by the lifetime risk of the condition or phenotype. For example, GCI Plus scores such as those in FIG. 19 may be calculated using data from FIG. 22 and/or FIG. 25 together with the information in FIG. 24.
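The GCI Plus calculation described in this paragraph (the individual's GCI score, divided by the average for a reference group, multiplied by the lifetime risk of the condition) can be sketched in a few lines of Python. The function name, parameter names, and numeric values are illustrative assumptions, and the sketch takes the average over the reference group's GCI scores; it does not show how a GCI score itself is computed.

```python
from statistics import mean


def gci_plus(individual_gci, reference_gci_scores, lifetime_risk):
    """Scale the lifetime risk by the individual's GCI relative to a reference group.

    reference_gci_scores: GCI scores of a group with a similar ancestral background.
    lifetime_risk: lifetime risk of the condition, as a fraction between 0 and 1.
    """
    if not reference_gci_scores:
        raise ValueError("reference group must not be empty")
    average_gci = mean(reference_gci_scores)
    if average_gci <= 0:
        raise ValueError("average GCI score must be positive")
    return individual_gci / average_gci * lifetime_risk


# Placeholder numbers, not data from the figures.
reference_group = [0.5, 1.0, 1.5]     # average is 1.0
score = gci_plus(1.5, reference_group, lifetime_risk=0.1)
```

With these placeholder inputs the individual's GCI is 1.5 times the group average, so the estimated lifetime risk is 1.5 times the population lifetime risk. The sketch does not cap the result at 1; a real implementation would need to decide how to treat such values.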
The present invention encompasses the use of the GCI scores described herein, and one of skill in the art will readily recognize that GCI Plus scores, or variants thereof, may be used in place of the GCI scores described herein.
In one embodiment, a GCI score is generated for each disease or condition of interest. These GCI scores can be collected to form a risk profile for the individual. The GCI scores may be stored digitally so that they are readily accessible at any time to generate a risk profile. The risk profile may be broken down by broad disease class, such as cancer, heart disease, metabolic disorders, mental disorders, bone disease, or age-onset disorders. Broad disease classes may be further broken down into subclasses. For example, for the broad class of cancer, subclasses may be listed by type (sarcoma, carcinoma, leukemia, etc.) or by tissue specificity (nerve, breast, ovary, testis, prostate, bone, lymph node, pancreas, esophagus, stomach, liver, brain, lung, kidney, etc.).
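The breakdown of a risk profile into broad disease classes and subclasses can be pictured as a nested mapping. The sketch below is a hypothetical illustration: the `build_risk_profile` function, the classification table, and the scores are assumptions, and conditions missing from the table are simply placed under "unclassified".

```python
from collections import defaultdict


def build_risk_profile(gci_scores, classification):
    """Group per-condition GCI scores by (disease class, subclass)."""
    profile = defaultdict(lambda: defaultdict(dict))
    for condition, score in gci_scores.items():
        disease_class, subclass = classification.get(
            condition, ("unclassified", "unclassified")
        )
        profile[disease_class][subclass][condition] = score
    # Convert to plain dicts so the result is easy to print or serialize.
    return {cls: dict(subclasses) for cls, subclasses in profile.items()}


# Tissue-specific subclasses of cancer, following the example in the text.
classification = {
    "prostate cancer": ("cancer", "prostate"),
    "breast cancer": ("cancer", "breast"),
}
# Placeholder scores, not real results.
gci_scores = {"prostate cancer": 1.2, "breast cancer": 0.8, "condition_z": 1.0}

risk_profile = build_risk_profile(gci_scores, classification)
```

Because only the per-condition scores are stored, the same scores can be regrouped by type instead of tissue simply by supplying a different classification table.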
In another embodiment, a GCI score is generated for an individual that provides readily understandable information about the individual's risk of acquiring, or susceptibility to, at least one disease or condition. In one embodiment, multiple GCI scores are generated for different diseases or conditions. In another embodiment, at least one GCI score is accessible through an online portal. Alternatively, the at least one GCI score may be provided in paper form, with subsequent updates also provided in paper form. In one embodiment, access to the at least one GCI score is provided to registered users, who are individuals that subscribe to the service. In an alternative embodiment, access is provided to non-registered users, who may have limited access to at least one of their GCI scores, or who may have an initial report on at least one of their GCI scores generated but receive updated reports only through a paid subscription. In another embodiment, health care managers and providers, such as caregivers, physicians, and genetic counselors, may also have access to at least one of the individual's GCI scores.
There may also be a basic registration level. A basic registration may provide a phenotype profile for which registered users choose to apply all existing rules to their genomic profile, or only a subset of the existing rules. For example, they may choose to apply only the rules for disease phenotypes that are actionable. The basic registration may itself have different levels. For example, the levels may depend on the number of phenotypes a registered user wants correlated with his or her genomic profile, or on the number of people who may access his or her phenotype profile. Another level of basic registration may incorporate factors specific to the individual, such as already known phenotypes (e.g., age, gender, or medical history), into the phenotype profile. Yet another level of basic registration may allow an individual to generate at least one GCI score for a disease or condition. A variation of this level may further allow the individual to specify that an automatic update of the at least one GCI score be generated whenever the score changes because of a change in the analysis used to generate it. In some embodiments, the individual may be notified of the automatic update by email, voice message, text message, mail delivery, or fax.
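Choosing to apply all existing rules or only a subset, such as the rules for actionable disease phenotypes, amounts to filtering the rule set before it is applied. The following sketch assumes each rule carries an `actionable` flag and a `phenotype` name; both fields, the function name, and the sample rules are hypothetical.

```python
def select_rules(rules, actionable_only=False, phenotypes=None):
    """Return the subset of rules a registered user has chosen to apply.

    actionable_only: keep only rules whose phenotype is flagged as actionable.
    phenotypes: optional set of phenotype names to keep; None keeps all.
    """
    selected = []
    for rule in rules:
        if actionable_only and not rule["actionable"]:
            continue
        if phenotypes is not None and rule["phenotype"] not in phenotypes:
            continue
        selected.append(rule)
    return selected


# Placeholder rules; the 'actionable' flags are invented for illustration.
rules = [
    {"marker": "snp_a", "genotype": "AA", "phenotype": "phenotype_x", "actionable": True},
    {"marker": "snp_b", "genotype": "CT", "phenotype": "phenotype_y", "actionable": False},
    {"marker": "snp_c", "genotype": "GG", "phenotype": "phenotype_z", "actionable": True},
]

all_rules = select_rules(rules)
actionable = select_rules(rules, actionable_only=True)
chosen = select_rules(rules, actionable_only=True, phenotypes={"phenotype_z"})
```

A registration level that limits the number of phenotypes could be enforced the same way, by checking the size of the `phenotypes` set before filtering.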
Registered users may also generate reports that contain their phenotype profile together with information about the phenotypes, such as genetic and medical information. For example, a report may include the prevalence of the phenotype in the population, the genetic variant used for the correlation, the molecular mechanism that causes the phenotype, therapies for the phenotype, treatment options for the phenotype, and preventative actions. In other embodiments, the report may also include information such as the similarity between the individual's genotype and the genotypes of other individuals, such as celebrities or other well-known people. The information on similarity may be, but is not limited to, percentage homology, the number of identical variants, and phenotypes that may be similar. The report may further contain at least one GCI score.
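Two of the similarity measures mentioned here, the number of identical variants and a percentage, can be sketched for profiles stored as marker-to-genotype mappings. This is a simplified stand-in, under the assumption that the percentage is taken over markers present in both profiles; it is not a sequence homology calculation, and the function and field names are hypothetical.

```python
def normalize(genotype):
    """Treat 'AG' and 'GA' as the same unordered genotype."""
    return "".join(sorted(genotype))


def compare_profiles(profile_a, profile_b):
    """Count identical genotypes over the markers present in both profiles."""
    shared = profile_a.keys() & profile_b.keys()
    identical = sum(
        1 for m in shared if normalize(profile_a[m]) == normalize(profile_b[m])
    )
    percent = 100.0 * identical / len(shared) if shared else 0.0
    return {
        "shared_markers": len(shared),
        "identical_variants": identical,
        "percent_identical": percent,
    }


# Placeholder marker names and genotypes.
person = {"m1": "AG", "m2": "CC", "m3": "TT", "m4": "AA"}
other = {"m1": "GA", "m2": "CT", "m3": "TT", "m5": "GG"}

similarity = compare_profiles(person, other)
```

Markers typed in only one of the two profiles (`m4` and `m5` here) are ignored rather than counted as mismatches, which is a design choice of this sketch.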
If the report is accessed online, the report may also provide links to other sites with further information about the phenotype, links to online support groups and message boards of people with the same phenotype or one or more similar phenotypes, a link to contact an online genetic counselor or physician, or a link to schedule a telephone call or in-person appointment with a genetic counselor or physician. If the report is in paper form, the information may be the address of the linked site or the telephone number and address of the genetic counselor or physician. Registered users can also select which phenotypes to include in their phenotype profile and which information to include in their reports. The profile and report may also be made available to the individual's health care manager or provider, such as a caregiver, physician, psychiatrist, psychologist, therapist, or genetic counselor. The registered user can also choose whether the profile and report, or portions thereof, are available to the individual's health care manager or provider.
The present invention may also include a premium level of registration. At the premium level, the registered user's genomic profile is maintained digitally after the initial phenotype profile and report are generated, and registered users can generate phenotype profiles and reports using updated correlations from recent studies. In another embodiment, registered users can generate risk profiles and reports using updated correlations from recent studies. As studies reveal new correlations between genotypes and phenotypes, diseases, or conditions, new rules will be generated based on these new correlations and can be applied to genomic profiles that have been stored and maintained. The new rules may correlate genotypes that have not previously been correlated with any phenotype, correlate genotypes with new phenotypes, modify existing correlations, or provide a basis for adjusting GCI scores based on newly discovered associations between genotypes and diseases or conditions. Registered users may be notified of the new correlations via email or other electronic means, and if a phenotype is of interest, they may choose to update their phenotype profile with the new correlations. Registered users may select a registration that is paid per update, for multiple updates within a specified time period (e.g., 3 months, 6 months, or 1 year), or for unlimited updates. At another level of registration, rather than the individual selecting when to update their phenotype profile or risk profile, the profile is updated automatically whenever new rules are generated based on new correlations.
In another aspect of registration, a registered user may refer a non-registered user to the following services: generating rules for correlations between phenotypes and genotypes, determining a genomic profile of the individual, applying the rules to the genomic profile, and generating a phenotype profile of the individual. A registered user who makes a referral may receive a preferential subscription price for the service or an upgrade of their existing registration. Referred individuals may have free access for a limited period of time or may receive a discounted registration fee.
Phenotype profiles and reports, and risk profiles and reports, may be generated for both human and non-human individuals. For example, individuals may include other mammals, such as cattle, horses, sheep, dogs, or cats. As used herein, a registered user is a human individual who subscribes to a service by purchasing or paying for one or more services. Services may include, but are not limited to, one or more of the following: determining a genomic profile of themselves or another individual (e.g., a registered user's child or pet); obtaining a phenotype profile; updating phenotype profiles; and obtaining reports based on their genomic and phenotype profiles.
In another aspect of the invention, a "field-deployed" mechanism can be used to gather information from individuals in order to generate phenotype profiles for those individuals. In a preferred embodiment, the individual may have an initial phenotype profile generated based on genetic information. For example, an initial phenotype profile is generated that includes risk factors for different phenotypes and suggested therapeutic or preventive measures. For example, the phenotype profile may include information about available medications for a condition and/or recommendations for dietary changes or exercise regimens. Individuals may choose to see a physician or genetic counselor, or to contact one through a web portal or by telephone, to discuss their phenotype profile. The individual may decide to take a certain course of action, e.g., take a specific medication or change their diet.
The individual may subsequently submit a biological sample to assess changes in their physical condition and possible changes in risk factors. The individual may have the changes determined by submitting the biological sample directly to the entity that generates the genomic profile and the phenotype profile (or to a related entity, such as an entity contracted by the entity that generates the genomic profile and the phenotype profile). Alternatively, the individual may use a "field-deployed" mechanism, in which the individual submits their saliva, blood, or other biological sample to a detection device at their home, the sample is analyzed by a third party, and the data are transmitted for incorporation into another phenotype profile. For example, an individual may receive an initial phenotype report, based on their genetic data, reporting that the individual has an increased lifetime risk of myocardial infarction (MI). The report may also include recommendations for preventive measures to reduce the risk of MI, such as cholesterol-lowering drugs and dietary changes. The individual may choose to contact a genetic counselor or physician to discuss the report and the preventive measures, and decide to change their diet. After following the new diet for a period of time, the individual may visit their personal physician to have their cholesterol level measured. The new information (the cholesterol level) may be transmitted (e.g., via the Internet) to the entity holding the genomic information and used to generate a new phenotype profile for the individual, including new risk factors for myocardial infarction and/or other conditions.
Individuals may also use the "field-deployed" mechanism or the direct mechanism to determine their individual response to a particular drug treatment. For example, an individual may have their response to a drug measured, and this information may be used to determine a more effective treatment. Information that can be measured includes, but is not limited to, metabolite levels, glucose levels, ion levels (e.g., calcium, sodium, potassium, iron), vitamin levels, blood cell counts, body mass index (BMI), protein levels, transcript levels, and heart rate. These can be determined by readily available methods and can be incorporated into algorithms that combine them with the initial genomic profile to determine a revised overall risk assessment score.
The term "biological sample" refers to any biological sample that can be isolated from an individual, including samples from which genetic material can be isolated. As used herein, "genetic sample" refers to DNA and/or RNA obtained from or derived from an individual.
As used herein, the term "genome" is intended to mean the entire set of chromosomal DNA found in the nucleus of a human cell. The term "genomic DNA" refers to one or more chromosomal DNA molecules, or a portion of a chromosomal DNA molecule, that naturally occurs in the nucleus of a human cell.
The term "genomic profile" refers to a set of information about an individual's genes, such as the presence or absence of a particular SNP or mutation. The genomic profile includes the genotype of the individual. The genomic profile may also be a substantially complete genomic sequence of an individual. In some embodiments, the genomic profile may be at least 60%, 80%, or 95% of the entire genomic sequence of the individual. The genomic profile may be about 100% of the entire genomic sequence of an individual. When referring to a genomic profile, "a portion thereof" refers to the genomic profile of a subset of the genomic profile of the whole genome.
The term "genotype" refers to the specific genetic composition of an individual's DNA. The genotype may include genetic variants and genetic markers of the individual. Genetic markers and genetic variants may include nucleotide repeats, nucleotide insertions, nucleotide deletions, chromosomal translocations, chromosomal duplications, or copy number variations. Copy number variations may include microsatellite repeats, nucleotide repeats, centromeric repeats, or telomeric repeats. The genotype may also be a SNP, haplotype, or diplotype. A haplotype may refer to a locus or an allele. A haplotype can also refer to a set of single nucleotide polymorphisms (SNPs) on a single chromatid that are statistically correlated. A diplotype is a set of haplotypes.
The term single nucleotide polymorphism, or "SNP," refers to a particular locus on a chromosome at which the identity of the nitrogenous base present varies within the human population (e.g., by at least one percent (1%)). For example, where one individual may have adenosine (A) at a particular nucleotide position of a given gene, another individual may have cytosine (C), guanine (G), or thymine (T) at that position, such that a SNP is present at that particular position.
As used herein, the term "SNP genomic profile" refers to the base content of a given individual's DNA at the SNP sites throughout the individual's entire genomic DNA sequence. "SNP profile" may refer to the complete genomic profile or to a portion thereof, such as a more localized SNP profile that may be associated with a particular gene or a particular set of genes.
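The 1% criterion in the definition above can be illustrated with a short sketch. It assumes that "variation" is measured as the fraction of observed bases at the locus that differ from the most common base; the text does not fix this measure. The function name `is_snp` and the sample data are hypothetical.

```python
from collections import Counter

def is_snp(observed_bases, min_variant_fraction=0.01):
    """Return True if the locus varies in at least `min_variant_fraction` of observations.

    Assumption: variation is the share of observations that are not the
    most common base at this locus.
    """
    counts = Counter(base.upper() for base in observed_bases)
    if len(counts) < 2:
        return False  # only one base observed, so no polymorphism
    total = sum(counts.values())
    most_common_count = counts.most_common(1)[0][1]
    variant_fraction = (total - most_common_count) / total
    return variant_fraction >= min_variant_fraction

# Hypothetical observations at one nucleotide position across a population.
common_variant = ["A"] * 980 + ["C"] * 20   # 2% of observations are not A
rare_variant = ["A"] * 999 + ["C"] * 1      # 0.1% of observations are not A

print(is_snp(common_variant), is_snp(rare_variant))
```

Under this reading, the first locus qualifies as a SNP and the second does not.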
The term "phenotype" is used to describe a quantitative trait or characteristic of an individual. Phenotypes include, but are not limited to, medical and non-medical conditions. Medical conditions include diseases and disorders. Phenotypes may also include physical traits such as hair color, physiological traits such as lung capacity, mental traits such as memory retention, emotional traits such as the ability to control anger, ethnic characteristics such as ethnic background, familial characteristics such as an individual's place of origin, and age characteristics such as life expectancy or the age of onset of different phenotypes. Phenotypes may also be monogenic, where one gene is believed to be associated with a phenotype, or polygenic, where more than one gene is associated with a phenotype.
"Rules" are used to define the correlation between genotype and phenotype. A rule may define the correlation by a numerical value, such as a percentage, risk factor, or confidence score. A rule may include correlations of multiple genotypes with a phenotype. A "rule set" includes more than one rule. A "new rule" may be a rule that indicates a correlation between a genotype and a phenotype for which no rule currently exists. A new rule may correlate a previously uncorrelated genotype with a phenotype. A new rule may also correlate a genotype that has already been correlated with one phenotype with a previously uncorrelated phenotype. A "new rule" may also be an existing rule that is modified by other factors, including another rule. Existing rules may be modified because of known characteristics of the individual, such as race, ancestry, geography, gender, age, family history, or other previously determined phenotypes.
As used herein, "genotype correlation" refers to the statistical correlation between an individual's genotype (e.g., the presence of a certain mutation or mutations) and the likelihood of being predisposed to a phenotype (e.g., a particular disease, condition, physical state, and/or mental state). The frequency with which a particular phenotype is observed in the presence of a particular genotype determines the degree of genotype correlation, or the likelihood that the phenotype will occur. For example, as detailed herein, SNPs that give rise to the apolipoprotein E4 isoform are correlated with a predisposition to early onset Alzheimer's disease. Genotype correlations may also refer to negative correlations, in which there is no predisposition to a phenotype. Genotype correlations may also represent an estimate that an individual has a phenotype or is predisposed to developing a phenotype. Genotype correlations can be represented by numerical values, such as percentages, relative risk factors, effect estimates, or confidence scores.
The term "phenotype profile" refers to a collection of multiple phenotypes associated with a genotype or genotypes of an individual. The phenotype profile may include information generated by applying one or more rules to the genomic profile or information about genotype correlations applied to the genomic profile. A phenotype profile may be generated by applying rules that relate multiple genotypes to a phenotype. The probability or the evaluation can be expressed as a numerical value, for example as a percentage, as a numerical risk factor or as a numerical confidence interval. The probability may also be expressed as high, medium, or low. The phenotype profile may also indicate the presence or risk of a phenotype. For example, the phenotype profile may indicate the presence of blue eyes or a high risk of developing diabetes. The phenotypic profile may also indicate a predicted prognosis, therapeutic effect, or response to treatment of the medical condition.
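The relationship between rules, a genomic profile, and the resulting phenotype profile can be sketched as follows. This is a minimal illustration, not the method of the invention: the `Rule` structure, the exact-match test, the SNP identifiers, and the numerical values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Rule:
    phenotype: str
    required_genotypes: Dict[str, str]  # SNP identifier -> genotype that must be present
    value: float                        # e.g. a relative risk factor (illustrative only)

def apply_rules(genomic_profile: Dict[str, str], rules: List[Rule]) -> Dict[str, List[float]]:
    """Build a phenotype profile: phenotype -> values of every rule whose genotypes all match."""
    phenotype_profile: Dict[str, List[float]] = {}
    for rule in rules:
        matches = all(
            genomic_profile.get(snp) == genotype
            for snp, genotype in rule.required_genotypes.items()
        )
        if matches:
            phenotype_profile.setdefault(rule.phenotype, []).append(rule.value)
    return phenotype_profile

# Hypothetical stored genomic profile and rule set.
stored_profile = {"rs_hypothetical_1": "AG", "rs_hypothetical_2": "CC"}
rule_set = [
    Rule("phenotype X", {"rs_hypothetical_1": "AG"}, 1.4),
    # A rule correlating multiple genotypes with one phenotype.
    Rule("phenotype Y", {"rs_hypothetical_1": "AG", "rs_hypothetical_2": "TT"}, 2.0),
]
initial = apply_rules(stored_profile, rule_set)

# A "new rule" generated later can be applied to the same stored profile.
rule_set.append(Rule("phenotype Z", {"rs_hypothetical_2": "CC"}, 0.8))
updated = apply_rules(stored_profile, rule_set)
print(initial, updated)
```

Because the genomic profile is stored separately from the rules, an updated phenotype profile only requires re-running the rule set; no new sample is needed.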
The term risk profile refers to a collection of GCI scores for more than one disease or condition. GCI scores are based on analysis of associations between an individual's genotype and one or more diseases or conditions. The risk profile may display GCI scores grouped by disease category. Further, the risk profile may show information on how to predict changes in GCI scores as the individual ages or as various risk factors are adjusted. For example, the GCI score for a particular disease may take into account changes in diet or the effects of precautions taken (smoking cessation, medication, bilateral radical mastectomy, hysterectomy). The GCI score may be displayed as a numerical metric, a graphical display, an auditory feedback, or a combination of any of the foregoing.
As used herein, the term "online portal" refers to a source of information that an individual conveniently accesses through a computer and Internet website, telephone, or other means that allows similar access to the information. The online portal may be an encrypted website. The website may provide links to other encrypted and unencrypted websites, such as links to encrypted websites having a phenotype profile of the individual or links to unencrypted websites (e.g., message boards of individuals sharing a particular phenotype).
The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology, cell biology, biochemistry, and immunology, which are within the skill of the art. These conventional techniques include nucleic acid isolation, polymer array synthesis, hybridization, ligation, and detection of hybridization using a label. Specific illustrations of suitable techniques can be had by reference to the examples herein. However, other equivalent conventional procedures may also be used. Other conventional techniques and descriptions can be found in standard laboratory manuals and literature, such as Genome Analysis: A Laboratory Manual Series (Vols. I-IV), PCR Primer: A Laboratory Manual, and Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press); Stryer, L. (1995) Biochemistry (4th Ed.), Freeman, New York; Gait, "Oligonucleotide Synthesis: A Practical Approach," 1984, IRL Press, London; Nelson and Cox (2000), Lehninger, Principles of Biochemistry, 3rd Ed., W.H. Freeman Pub., New York, N.Y.; and Berg et al. (2002) Biochemistry, 5th Ed., W.H. Freeman Pub., New York, N.Y., all of which are incorporated herein by reference in their entirety.
The methods of the invention include analyzing the genomic profile of an individual to provide the individual with molecular information about phenotypes. As detailed herein, an individual provides a genetic sample from which a personal genomic profile is generated. The individual's genomic profile is queried for genotype correlations by comparing it against a database of established and validated human genotype correlations. The database of established and validated genotype correlations can be drawn from the peer-reviewed literature and further reviewed and validated by a committee of one or more experts in the field, such as geneticists, epidemiologists, or statisticians. In a preferred embodiment, rules are formulated based on the validated genotype correlations and applied to the genomic profile of the individual to generate a phenotype profile. The results of the analysis of the individual's genomic profile (the phenotype profile) are provided to the individual or the individual's health care manager, along with explanatory and supporting information, to enable personalized choices about the individual's health care.
The method of the invention is outlined in FIG. 1, in which a genomic profile of an individual is first generated. The individual's genomic profile will contain information about the individual's genes based on genetic variations and genetic markers. Genetic variations are genotypes, which make up the genomic profile. Such genetic variations or genetic markers include, but are not limited to, single nucleotide polymorphisms, single and/or multiple nucleotide repeats, single and/or multiple nucleotide deletions, microsatellite repeats (small numbers of nucleotide repeats, typically with 5 to 1,000 repeat units), dinucleotide repeats, trinucleotide repeats, sequence rearrangements (including translocations and duplications), copy number variations (both losses and gains at specific loci), and the like. Other genetic variations include chromosomal duplications and translocations as well as centromeric and telomeric repeats.
Genotypes may also include haplotypes and diplotypes. In some embodiments, the genomic profile may have at least 100,000, 300,000, 500,000, or 1,000,000 genotypes. In some embodiments, the genomic profile may be substantially the entire genomic sequence of an individual. In other embodiments, the genomic profile is at least 60%, 80%, or 95% of the entire genomic sequence of the individual. The genomic profile may be about 100% of the entire genomic sequence of an individual. Genetic samples containing the target substance include, but are not limited to, unamplified genomic DNA or RNA samples or amplified DNA (or cDNA). The target substance may be a specific region of genomic DNA comprising a genetic marker of particular interest.
In step 102 of FIG. 1, a genetic sample of an individual is isolated from a biological sample of the individual. These biological samples include, but are not limited to, blood, hair, skin, saliva, semen, urine, fecal material, sweat, oral cavity (buccal), and various body tissues. In some embodiments, the tissue sample may be collected directly from the individual, e.g., the oral sample may be obtained by swabbing the inside of the cheek of the individual with a swab. Other samples such as saliva, semen, urine, fecal material, or sweat may also be provided by the individual himself. Other biological samples may be taken by a health care professional (e.g., a phlebotomist, nurse, or doctor). For example, a blood sample may be drawn from an individual by a nurse. Tissue biopsies can be performed by health care professionals, and health care professionals can also utilize the kit to efficiently obtain a sample. Small cylindrical skin samples may be removed or small tissue or fluid samples may be removed using a needle.
In some embodiments, a kit is provided to an individual having a sample collection container for a biological sample of the individual. The kit may also provide instructions for the individual to directly collect their own sample, such as how much hair, urine, sweat or saliva to provide. The kit may also include instructions for the individual to request that a tissue sample be extracted by a health care professional. The kit may include a location where the sample may be collected by a third party, for example, the kit may be provided to a healthcare facility where the sample is subsequently collected from the individual. The kit may also provide a return package for delivering the sample to a sample processing facility where the genetic material is isolated from the biological sample (step 104).
Genetic samples of DNA or RNA can be isolated from biological samples according to any of several well-known biochemical and molecular biological methods; see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, N.Y.) (1989). There are also several commercially available kits and reagents for isolating DNA or RNA from biological samples, such as those available from DNA Genotek, Gentra Systems, Qiagen, Ambion, and other suppliers. Buccal sample kits are readily available commercially, e.g., the MasterAmp™ Buccal Swab DNA extraction kit from Epicentre Biotechnologies, as are kits for extracting DNA from blood samples, e.g., Extract-N-Amp™ from Sigma Aldrich. DNA from other tissues can be obtained by digesting the tissue with proteases and heat, centrifuging the sample, and using phenol-chloroform to extract the unwanted materials, leaving the DNA in the aqueous phase. The DNA may then be further isolated by ethanol precipitation.
In a preferred embodiment, genomic DNA is isolated from saliva. For example, using DNA self-collection kit technology available from DNA Genotek, an individual collects a saliva sample for clinical processing. The sample can be conveniently stored and transported at room temperature. After the sample is delivered to an appropriate laboratory for processing, the DNA is isolated by heat denaturing and protease digesting the sample (typically for at least 1 hour at 50 ℃, using reagents supplied by the collection kit supplier). The sample is then centrifuged, and the supernatant is ethanol precipitated. The DNA pellet is suspended in a buffer suitable for subsequent analysis.
In another embodiment, RNA can be used as the genetic sample. In particular, genetic variations that are expressed can be identified from mRNA. The term "messenger RNA" or "mRNA" includes, but is not limited to, pre-mRNA transcripts, transcript processing intermediates, mature mRNA ready for translation, transcripts of a gene or genes, and nucleic acids derived from mRNA transcripts. Transcript processing may include splicing, editing, and degradation. As used herein, a nucleic acid derived from an mRNA transcript refers to a nucleic acid for whose synthesis the mRNA transcript, or a subsequence thereof, has ultimately served as a template. Thus, cDNA reverse transcribed from mRNA, DNA amplified from the cDNA, RNA transcribed from the amplified DNA, and the like are all derived from the mRNA transcript. RNA can be isolated from any of several body tissues using methods known in the art, e.g., RNA can be isolated from unfractionated whole blood using the PAXgene™ Blood RNA System available from PreAnalytiX. Typically, mRNA will be used to reverse transcribe cDNA, which is then used, or amplified, for genetic variation analysis.
Prior to genomic profile analysis, the genetic sample is typically amplified, either from DNA or from cDNA reverse transcribed from RNA. DNA can be amplified by a variety of methods, many of which employ PCR. See, for example, PCR Technology: Principles and Applications for DNA Amplification (Ed. H.A. Erlich, Freeman Press, NY, N.Y., 1992); PCR Protocols: A Guide to Methods and Applications (Eds. Innis et al., Academic Press, San Diego, Calif., 1990); Mattila et al., Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR (Eds. McPherson et al., IRL Press, Oxford); and U.S. Pat. Nos. 4,683,202, 4,683,195, 4,800,159, 4,965,188, and 5,333,675, each of which is incorporated herein by reference in its entirety.
Other suitable amplification methods include the ligase chain reaction (LCR) (e.g., Wu and Wallace, Genomics 4: 560 (1989), Landegren et al., Science 241: 1077 (1988), and Barringer et al., Gene 89: 117 (1990)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86: 1173-1177 (1989) and WO88/10315), self-sustained sequence replication (Guatelli et al., Proc. Nat. Acad. Sci. USA 87: 1874-1878 (1990) and WO90/06995), selective amplification of target polynucleotide sequences (U.S. Pat. No. 6,410,276), consensus sequence primed polymerase chain reaction (CP-PCR) (U.S. Pat. No. 4,437,975), arbitrarily primed polymerase chain reaction (AP-PCR) (U.S. Pat. No. 5,413,909), nucleic acid based sequence amplification, rolling circle amplification (RCA), multiple displacement amplification, and circle-to-circle amplification (C2CA) (Dahl et al., Proc. Natl. Acad. Sci. 101: 4548-4553 (2004)). (See U.S. Pat. Nos. 5,409,818, 5,554,517, and 6,063,603, each of which is incorporated herein by reference.) Other amplification methods that may be used are described in U.S. Pat. Nos. 5,242,794, 5,494,810, 5,409,818, 4,988,617, 6,063,603, and 5,554,517, and U.S. patent application No. 09/854,317, each of which is incorporated herein by reference.
The genomic profile of step 106 is generated using any of several methods. Several methods are known in the art for identifying genetic variations, including, but not limited to, DNA sequencing by any of several methodologies, PCR-based methods, fragment length polymorphism assays (restriction fragment length polymorphism (RFLP), cleavage fragment length polymorphism (CFLP)), hybridization methods using an allele-specific oligonucleotide as a template (e.g., the TaqMan PCR method, the invader method, the DNA chip method), methods using a primer extension reaction, mass spectrometry (the MALDI-TOF/MS method), and the like.
In one embodiment, high density DNA arrays are used for SNP identification and profile generation. These arrays are commercially available from Affymetrix and Illumina (see Affymetrix GeneChip® 500K Assay Manual, Affymetrix, Santa Clara, CA (incorporated by reference); Sentrix® humanHap650Y genotyping beadchip, Illumina, San Diego, CA).
For example, a SNP profile can be generated by genotyping more than 900,000 SNPs using the Affymetrix Genome Wide Human SNP Array 6.0. Alternatively, more than 500,000 SNPs can be determined through whole-genome sampling analysis using the Affymetrix GeneChip Human Mapping 500K Array Set. In these assays, a subset of the human genome is amplified by a single-primer amplification reaction using restriction enzyme digested, adaptor-ligated human genomic DNA. As shown in FIG. 2, the concentration of the ligated DNA can then be determined. The amplified DNA is then fragmented, and the quality of the sample is determined before continuing with step 106. If the sample meets the PCR and fragmentation criteria, it is denatured, labeled, and then hybridized to a microarray consisting of small DNA probes at specific locations on a coated quartz surface. The amount of label that hybridizes to each probe, as a function of the amplified DNA sequence, is monitored to yield sequence information and, ultimately, SNP genotypes.
The Affymetrix GeneChip 500K Assay is used according to the manufacturer's instructions. Briefly, isolated genomic DNA is first digested with either the NspI or the StyI restriction endonuclease. The digested DNA is then ligated to an NspI or StyI adaptor oligonucleotide that anneals to the NspI- or StyI-restricted DNA, respectively. The adaptor-ligated DNA is then amplified by PCR to produce amplified DNA fragments of about 200 to 1100 base pairs, as confirmed by gel electrophoresis. PCR products that meet the amplification criteria are purified and quantified for fragmentation. The PCR products are fragmented with DNase I for optimal DNA chip hybridization. After fragmentation, the DNA fragments should be less than 250 base pairs, and on average about 180 base pairs, as confirmed by gel electrophoresis. Samples that meet the fragmentation criteria are then labeled with a biotin compound using terminal deoxynucleotidyl transferase. The labeled fragments are then denatured and hybridized to a GeneChip 250K array. After hybridization, and prior to scanning, the array is stained in a three-step process: streptavidin phycoerythrin (SAPE) staining, followed by an antibody amplification step with a biotinylated anti-streptavidin antibody (goat), and a final staining with streptavidin phycoerythrin (SAPE). After labeling, the array is covered with array holding buffer and then scanned with a scanner such as the Affymetrix GeneChip Scanner 3000.
After the Affymetrix GeneChip Human Mapping 500K Array Set is scanned, data analysis is performed according to the manufacturer's instructions, as shown in FIG. 3. Briefly, raw data are acquired using GeneChip Operating Software (GCOS). Data may also be acquired using Affymetrix GeneChip Command Console™. The acquired raw data are then analyzed with GeneChip Genotyping Analysis Software (GTYPE). For the purposes of the present invention, samples with a GTYPE call rate of less than 80% are excluded. The samples are then examined using the BRLMM and/or SNiPer algorithms. Samples with a BRLMM call rate of less than 95% or a SNiPer call rate of less than 98% are excluded. Finally, an association analysis is performed, and samples with a SNiPer quality index of less than 0.45 and/or a Hardy-Weinberg p-value of less than 0.00001 are excluded.
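The successive exclusion thresholds described above can be expressed as a simple filter. The sketch below is an illustration only and does not call GCOS, GTYPE, BRLMM, or SNiPer; it assumes their metrics have already been computed and stored as fractions. The `SampleQC` structure, its field names, and the sample values are hypothetical, and a metric left as `None` is treated as "that analysis was not run."

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SampleQC:
    sample_id: str
    gtype_call_rate: float                        # fraction, e.g. 0.97 for 97%
    brlmm_call_rate: Optional[float] = None
    sniper_call_rate: Optional[float] = None
    sniper_quality_index: Optional[float] = None
    hwe_p_value: Optional[float] = None

def exclusion_reason(s: SampleQC) -> Optional[str]:
    """Return the first threshold the sample fails, or None if it is retained."""
    if s.gtype_call_rate < 0.80:
        return "GTYPE call rate < 80%"
    if s.brlmm_call_rate is not None and s.brlmm_call_rate < 0.95:
        return "BRLMM call rate < 95%"
    if s.sniper_call_rate is not None and s.sniper_call_rate < 0.98:
        return "SNiPer call rate < 98%"
    if s.sniper_quality_index is not None and s.sniper_quality_index < 0.45:
        return "SNiPer quality index < 0.45"
    if s.hwe_p_value is not None and s.hwe_p_value < 0.00001:
        return "Hardy-Weinberg p-value < 0.00001"
    return None

# Hypothetical samples.
samples = [
    SampleQC("S1", 0.97, brlmm_call_rate=0.99, sniper_call_rate=0.99,
             sniper_quality_index=0.80, hwe_p_value=0.3),
    SampleQC("S2", 0.75),                         # fails the GTYPE threshold
    SampleQC("S3", 0.92, brlmm_call_rate=0.93),   # fails the BRLMM threshold
]
retained = [s.sample_id for s in samples if exclusion_reason(s) is None]
print(retained)
```

Returning the reason for exclusion, rather than a bare boolean, makes it easier to audit why a given sample was dropped.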
Alternatively, or in addition to DNA microarray analysis, genetic variations such as SNPs and mutations can be detected by DNA sequencing. DNA sequencing may also be used to sequence a substantial portion, or all, of an individual's genomic sequence. Traditionally, DNA sequencing has been based on polyacrylamide gel fractionation to resolve a population of chain-terminated fragments (Sanger et al., Proc. Natl. Acad. Sci. USA 74: 5463-5467 (1977)). Alternative methods have been developed, and continue to be developed, to increase the speed and ease of DNA sequencing. For example, high throughput and single molecule sequencing platforms are commercially available from, or are being developed by, 454 Life Sciences (Branford, CT) (Margulies et al., Nature (2005) 437: 376-.
After the genomic profile of the individual is generated in step 106, the profile is stored digitally in step 108, and may be stored in an encrypted manner. The genomic profile is encoded in a computer-readable format for storage as part of a data set, and may be stored in a database, where the genomic profile may be "deposited" and accessed again at a later time. The data set includes a plurality of data points, where each data point relates to an individual. Each data point may have a plurality of data elements. One data element is a unique identifier that is used to identify the genomic profile of the individual. The identifier may be a bar code. Another data element is genotype information, such as the SNPs or the nucleotide sequence of the individual's genome. Data elements corresponding to the genotype information can also be included in the data point. For example, if the genotype information includes SNPs identified by microarray analysis, other data elements may include the microarray SNP identification number, the SNP rs number, and the polymorphic nucleotide. Other data elements may be the chromosomal location of the genotype information, quality measures of the data, raw data files, images of the data, and extracted intensity scores.
Individual-specific factors, such as physical data, medical data, race, ancestry, geography, gender, age, family history, known phenotypes, demographic data, exposure data, lifestyle data, behavioral data, and other known phenotypes, may also be included as data elements. For example, these factors may include, but are not limited to, the individual's: place of birth, parents and/or grandparents, relatives' ancestry, place of residence, ancestors' place of residence, environmental conditions, known health conditions, known drug interactions, family health conditions, lifestyle conditions, diet, exercise habits, marital status, and physical measurements (e.g., weight, height, cholesterol level, heart rate, blood pressure, glucose level, and other measurements known in the art). The same factors for the individual's relatives or ancestors (e.g., parents and grandparents) may also be included as data elements and used to determine the individual's risk for a phenotype or condition.
The specific factors may be obtained from a questionnaire or from the individual's health care manager. Information from the "deposited" profile can then be accessed and used as needed. For example, in the initial assessment of an individual's genotype correlations, the individual's entire information (typically SNPs or other genomic sequences across, or taken from, the entire genome) will be analyzed for genotype correlations. In subsequent analyses, all or a portion of the information from the stored or deposited genomic profile may be accessed as needed or appropriate.
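The data point and data elements described above might be laid out as in the following sketch. The class and field names are hypothetical, JSON is chosen only as one example of a computer-readable format, and encryption of the stored record is deliberately left out.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional

@dataclass
class SnpCall:
    array_snp_id: str               # microarray SNP identification number
    rs_number: str                  # SNP rs number
    polymorphic_nucleotides: str    # e.g. "A/G"
    chromosome: Optional[str] = None
    position: Optional[int] = None  # chromosomal location
    quality: Optional[float] = None # quality measure of the call

@dataclass
class GenomicProfileRecord:
    unique_id: str                  # unique identifier (could be printed as a bar code)
    calls: List[SnpCall] = field(default_factory=list)
    individual_factors: Dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        """Encode the record in a computer-readable form for storage."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "GenomicProfileRecord":
        """Restore a previously deposited record."""
        raw = json.loads(text)
        calls = [SnpCall(**call) for call in raw["calls"]]
        return cls(raw["unique_id"], calls, raw["individual_factors"])

# Hypothetical record with placeholder identifiers.
record = GenomicProfileRecord(
    unique_id="ID-0001",
    calls=[SnpCall("ARRAY_SNP_1", "rs0000001", "A/G",
                   chromosome="1", position=12345, quality=0.99)],
    individual_factors={"gender": "female"},
)
restored = GenomicProfileRecord.from_json(record.to_json())
print(restored == record)
```

Keeping the genotype calls and the individual-specific factors in one record keyed by the unique identifier is what allows the deposited profile to be retrieved and re-analyzed later.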
Comparison of genomic profiles with a database of genotype correlations
In step 110, genotype correlations are obtained from the scientific literature. The genotype correlation of a genetic variation is determined by analyzing a population of individuals who have been tested for the presence or absence of one or more phenotypic traits of interest and for whom a genotype profile has been obtained. The alleles of each genetic variation or polymorphism in the genotype profile are then examined to determine whether the presence of a particular allele is associated with the trait of interest. Correlation analysis can be performed by standard statistical methods, and statistically significant correlations between genetic variations and phenotypic characteristics are recorded. For example, it may be determined that the presence of allele A1 of polymorphism A is associated with heart disease. As a further example, it may be found that the combination of allele A1 at polymorphism A and allele B1 at polymorphism B is associated with an increased risk of cancer. The results of the analysis may be published in peer-reviewed literature, confirmed by other research groups, and/or reviewed and validated by an expert committee (e.g., geneticists, statisticians, epidemiologists, and doctors).
Examples of correlations between genotypes and phenotypes are shown in figs. 4, 5 and 6; the rules relating genotypes to phenotypes that are applied to the genomic profile are based on these correlations. For example, in fig. 4A and B, each row corresponds to a phenotype/locus/ethnicity, with fig. 4C through I providing further information on the correlation for each of these rows. By way of example, the "phenotype name abbreviation" BC in fig. 4A stands for breast cancer, as noted in the index of phenotype name abbreviations in fig. 4M. In row BC_4 (the class name of the locus), gene LSP1 is associated with breast cancer. As shown in fig. 4C, the published or functional SNP identified for this correlation is rs3817198, the published risk allele is C, and the non-risk allele is T. The published SNPs and alleles are identified from publications (e.g., the primary publications in fig. 4E-G). In the LSP1 example of fig. 4E, the primary publication is Easton et al, Nature 447: 713-720 (2007). Figs. 22 and 25 list further correlations. The correlations in figs. 22 and 25 can be used to calculate an individual's risk for a condition or phenotype, e.g., by calculating a GCI or GCI Plus score. The GCI or GCI Plus score may also incorporate information such as the prevalence of the condition, as in fig. 23.
Alternatively, correlations may be derived from stored genomic profiles. For example, individuals with stored genomic profiles may also have known phenotype information stored. Analysis of the stored genomic profiles and known phenotypes can yield genotype correlations. As an example, 250 individuals with stored genomic profiles also have stored information indicating a previous diagnosis of diabetes. Their genomic profiles are analyzed and compared with those of a control group of non-diabetic individuals. If the individuals previously diagnosed with diabetes are found to carry a particular genetic variant at a higher rate than the control group, a genotype correlation can be made between that genetic variant and diabetes.
In step 112, rules are formed based on the validated correlations between genetic variants and particular phenotypes. Rules may be generated, for example, based on the correlated genotypes and phenotypes listed in table 1. Rules based on correlations may incorporate other factors, such as gender (e.g., fig. 4) or ethnicity (figs. 4 and 5), to produce effect estimates as in figs. 4 and 5. Other measures produced by the rules may be estimated relative risk increases, as in fig. 6. The effect estimates and estimated relative risk increases may be taken or calculated from the published literature. Alternatively, the rules may be based on correlations generated from stored genomic profiles and previously known phenotypes. In some embodiments, the rules may be based on the correlations in figs. 22 and 25.
In a preferred embodiment, the genetic variant is a SNP. Although SNPs occur at single sites, an individual who carries a particular SNP allele at one site often predictably carries particular SNP alleles at other sites. The association between SNPs and an allele that predisposes an individual to a disease or condition arises through linkage disequilibrium, in which alleles at two or more loci occur together in a population non-randomly, at a frequency greater or less than would be expected from random recombination.
Other genetic markers or variants (e.g., nucleotide repeats or insertions) may also be in linkage disequilibrium with genetic markers that have been shown to be associated with a particular phenotype. For example, a nucleotide insertion is correlated with a phenotype, and a SNP is in linkage disequilibrium with the nucleotide insertion. A rule is formed based on the correlation between the SNP and the phenotype. A rule based on the correlation between the nucleotide insertion and the phenotype may also be formed. Either or both rules may be applied to the genomic profile, as the presence of the SNP may give one risk factor, the other rule may give another risk factor, and when combined the risk may be increased.
Through linkage disequilibrium, a disease-predisposing allele cosegregates with a particular allele of a SNP or with a particular combination of SNP alleles. The particular combination of SNP alleles along a chromosome is called a haplotype, and the region of DNA in which they occur in combination may be called a haplotype block. Although a haplotype block may consist of one SNP, a typical haplotype block represents a series of 2 or more contiguous SNPs that exhibit low haplotype diversity between individuals and generally have a low recombination frequency. A haplotype can be identified by identifying one or more SNPs located in the haplotype block. Thus, in general, SNP profiling can be used to identify a haplotype block without having to identify all SNPs in that block.
Genotype correlations between SNP haplotype patterns and diseases, conditions, or physical states are becoming increasingly known. For a given disease, the haplotype patterns of a group of people known to have the disease are compared with those of a group of people without the disease. By analyzing many individuals, the frequencies of polymorphisms in a population can be determined, and these frequencies or genotypes can then be associated with a particular phenotype (e.g., a disease or condition). Examples of known SNP-disease associations include complement factor H polymorphisms in age-related macular degeneration (Klein et al, Science, 308: 385-) and a variant of the INSIG2 gene (Herbert et al, Science, 312: 279-283 (2006)). Other known SNP associations include, for example, polymorphisms in the 9p21 region including CDKN2A and B (e.g., rs10757274, rs2383206, rs13333040, rs2383207, and rs10116277 associated with myocardial infarction (Helgadottir et al, Science, 316: 1491-.
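One standard way to quantify such a case-control comparison is an odds ratio with an approximate confidence interval. The sketch below is a generic statistical illustration, not the method of the specification: the function name `odds_ratio_with_ci` is hypothetical, the interval uses the common log-odds (Woolf) approximation, and the carrier counts are invented for the example. Only the figure of 250 diagnosed individuals comes from the text.

```python
import math


def odds_ratio_with_ci(case_carriers, case_total, control_carriers, control_total, z=1.96):
    """Odds ratio for carrying a variant in cases vs. controls, with an
    approximate 95% confidence interval (log-odds / Woolf method)."""
    a = case_carriers                      # cases carrying the variant
    b = case_total - case_carriers         # cases not carrying it
    c = control_carriers                   # controls carrying the variant
    d = control_total - control_carriers   # controls not carrying it
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 100 of 250 diabetic individuals and 60 of 250
# controls carry the variant.
estimate, ci_low, ci_high = odds_ratio_with_ci(100, 250, 60, 250)
# A confidence interval that excludes 1.0 suggests a correlation worth
# passing on for validation.
variant_is_candidate = ci_low > 1.0
```

In practice a correlation found this way would still need the confirmation and expert review described for step 110 before a rule is built on it.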
SNPs may be functional or non-functional. For example, functional SNPs have an effect on cellular function, resulting in a phenotype, whereas non-functional SNPs are functionally silent, but may be in linkage disequilibrium with functional SNPs. SNPs may also be synonymous or non-synonymous. Synonymous SNPs are SNPs in which the different forms result in the same polypeptide sequence, and are non-functional SNPs. If a SNP results in different polypeptides, the SNP is non-synonymous and may be functional or non-functional. SNPs or other genetic markers used to identify haplotypes in a diplotype (which is 2 or more haplotypes) may also be used to associate phenotypes associated with the diplotype. Information about the haplotype, diplotype, and SNP profile of an individual may be in the genomic map of the individual.
In a preferred embodiment, when a rule is generated from a genetic marker that is in linkage disequilibrium with another genetic marker correlated with a phenotype, the genetic marker has an r² or D' score greater than 0.5; these scores are commonly used in the art to measure linkage disequilibrium. In preferred embodiments, the score is greater than 0.6, 0.7, 0.8, 0.90, 0.95, or 0.99. As a result, in the present invention, the genetic marker used to correlate a phenotype with an individual's genomic profile may be the same as or different from the functional or published SNP correlated with the phenotype. For example, for BC_4 the test SNP and the published SNP are the same, as are the tested and published risk and non-risk alleles (fig. 4A and C). However, for BC_5 (CASP8 and its correlation with breast cancer), the test SNP differs from the functional or published SNP, as do the tested risk and non-risk alleles from the published risk and non-risk alleles. The tested and published alleles are oriented with respect to the plus strand of the genome, and from these columns the homozygous risk or non-risk genotypes can be inferred, from which rules can be generated for application to the genomic profiles of individuals, e.g., registered users. In some embodiments, a test SNP has not been identified, and the published SNP information is instead used to determine the allelic difference or SNP by another assay (e.g., TaqMan). For example, for AMD_5 in fig. 25A, the published SNP is rs1061170, but no test SNP is identified. A test SNP can be identified by LD analysis with the published SNP. Alternatively, rather than using a test SNP, the individual's genome can be assessed for the SNP using TaqMan or another comparable assay.
The test SNPs may be "DIRECT" or "TAG" SNPs (fig. 4E-G, fig. 5). A direct SNP is a test SNP that is the same as the published or functional SNP, e.g., for BC_4. A direct SNP can also be used for the FGFR2 correlation with breast cancer, using the SNP rs1073640 in Europeans and Asians, where the minor allele is A and the other allele is G (Easton et al, Nature 447: 1087-. Another published or functional SNP for the FGFR2 correlation with breast cancer in Europeans and Asians is rs1219648 (Hunter et al, Nat. Genet. 39: 870-874 (2007)). A tag SNP is a test SNP that differs from the functional or published SNP, as for BC_5. Tag SNPs may also be used for other genetic variants, e.g., for CAMTA1 (rs4908449), 9p21 (rs10757274, rs2383206, rs13333040, rs2383207, rs10116277), COL1A1 (rs1800012), FVL (rs6025), HLA-DQA1 (rs 498888889, rs2588331), eNOS (rs1799983), MTHFR (rs1801133) and APC (rs 28933380).
Databases of SNPs are publicly available from, for example, the International HapMap Project (see www.hapmap.org, The International HapMap Consortium, Nature, 426: 789-. These databases provide SNP haplotype patterns or enable them to be determined. Thus, these SNP databases enable the study of genetic risk factors underlying a wide range of diseases and conditions (e.g., cancer, inflammatory diseases, cardiovascular diseases, neurodegenerative diseases, and infectious diseases). These diseases or conditions may be actionable, meaning that treatments and therapies for them currently exist. Treatment may include prophylactic treatment as well as treatment that ameliorates symptoms and conditions, including lifestyle changes.
Many other phenotypes can also be examined, such as physical traits, physiological traits, mental traits, emotional traits, ethnicity, ancestry, and age. Physical traits may include height, hair color, eye color, body type, or traits such as energy, endurance, and agility. Mental traits may include intelligence, memory, or learning ability. Ethnicity and ancestry may include identification of ancestors or ethnicity, or of where an individual's ancestors originated. Age may be a determination of the individual's actual age, or the age that the individual's genetic characteristics correspond to relative to the general population. For example, an individual may be 38 years old chronologically, but the individual's genetic characteristics may indicate that his or her memory or physical health is that of an average 28-year-old. An additional age trait may be the individual's predicted lifespan.
Other phenotypes may also include non-medical conditions, such as "entertainment" phenotypes. These phenotypes may include comparisons with well-known individuals, e.g., foreign dignitaries, politicians, celebrities, inventors, athletes, musicians, artists, business people, and notorious individuals (e.g., criminals). Other "entertainment" phenotypes may include comparisons with other organisms, such as bacteria, insects, plants, or non-human animals. For example, an individual may be interested in seeing how his or her genomic profile compares with that of a pet dog or of a president.
In step 114, the rules are applied to the stored genomic profile to generate the phenotype profile of step 116. For example, the information in figs. 4, 5 or 6 may form the basis of the rules or tests applied to the genomic profile of an individual. The rules may include the information in fig. 4 on the test SNP and alleles and on the effect estimate, where UNITS gives the units of the effect estimate, e.g., OR (odds ratio, with 95% confidence interval) or mean. In a preferred embodiment, the effect estimate may be a genotypic risk (fig. 4C-G), such as the risk for risk homozygotes (homoz or RR), risk heterozygotes (heteroz or RN), and non-risk homozygotes (homoz or NN). In other embodiments, the effect estimate may be a carrier risk, which compares RR or RN with NN. In still further embodiments, the effect estimate may be based on the allele, i.e., an allelic risk, e.g., R versus N. There may also be two-locus (fig. 4J) or three-locus (fig. 4K) genotypic effect estimates (e.g., the 9 possible genotype combinations for a two-locus effect estimate: RRRR, RRNN, etc.). The frequencies of the test SNPs in the public HapMap are also recorded in fig. 4H and I.
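The two linkage disequilibrium scores mentioned above have standard population-genetics definitions, which can be sketched as follows. This is a textbook calculation offered for illustration, not code from the specification: the function names `ld_scores` and `passes_ld_threshold` and the example frequencies are assumptions, and reading "r² or D' greater than 0.5" as "either score exceeds the threshold" is one interpretation.

```python
def ld_scores(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b   # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = (d * d) / denom if denom > 0 else 0.0
    return d_prime, r_squared


def passes_ld_threshold(d_prime, r_squared, threshold=0.5):
    # Assumed reading of the text: either score above the threshold qualifies.
    return d_prime > threshold or r_squared > threshold


# Hypothetical frequencies for a candidate test SNP and a published SNP.
d_prime, r_squared = ld_scores(p_ab=0.27, p_a=0.3, p_b=0.3)
usable_as_tag = passes_ld_threshold(d_prime, r_squared)
```

Raising `threshold` to 0.8 or 0.95 corresponds to the stricter preferred embodiments listed in the text.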
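Applying a genotypic-risk rule to a stored profile can be sketched as a lookup followed by a classification into RR, RN, or NN. The sketch below is illustrative only: the names `GenotypeRule`, `classify`, and `apply_rule` are hypothetical, and the odds ratios are placeholder numbers rather than values from fig. 4. The SNP rs3817198 with risk allele C and non-risk allele T is the LSP1 example given earlier in the text.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Optional, Tuple


@dataclass
class GenotypeRule:
    phenotype: str
    rs_number: str
    risk_allele: str
    non_risk_allele: str
    effect: Dict[str, float]   # effect estimate per class: "RR", "RN", "NN"


def classify(genotype: str, rule: GenotypeRule) -> str:
    """Map a two-letter genotype to RR, RN or NN by counting risk alleles."""
    allowed = {rule.risk_allele, rule.non_risk_allele}
    if len(genotype) != 2 or not set(genotype) <= allowed:
        raise ValueError(f"unexpected genotype {genotype!r} for {rule.rs_number}")
    return {2: "RR", 1: "RN", 0: "NN"}[genotype.count(rule.risk_allele)]


def apply_rule(profile: Dict[str, str], rule: GenotypeRule) -> Optional[Tuple[str, float]]:
    """Return (genotype class, effect estimate), or None if the SNP is absent."""
    genotype = profile.get(rule.rs_number)
    if genotype is None:
        return None
    genotype_class = classify(genotype, rule)
    return genotype_class, rule.effect[genotype_class]


# Placeholder effect estimates; not taken from the figures.
bc_4 = GenotypeRule("breast cancer", "rs3817198", "C", "T",
                    {"RR": 1.2, "RN": 1.1, "NN": 1.0})
result = apply_rule({"rs3817198": "CT"}, bc_4)

# The 9 two-locus genotype combinations (RRRR, RRRN, ..., NNNN).
two_locus_classes = ["".join(pair) for pair in product(["RR", "RN", "NN"], repeat=2)]
```

A carrier-risk or allelic-risk rule could be expressed in the same shape by changing only how `classify` groups the genotypes.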
In other embodiments, the information from fig. 21, 22, 23, and/or 25 can be used to generate information to apply to a genomic profile of an individual. For example, the information can be used to generate a GCI or GCI Plus score for the individual (e.g., fig. 19). The score can be used to generate information of genetic risk (e.g., estimated lifetime risk) for one or more states in a phenotypic profile of an individual (e.g., fig. 15). The method allows calculation of an estimated lifetime risk or relative risk for one or more phenotypes or states as listed in fig. 22 or 25. The risk of a single state may be based on one or more SNPs. For example, the estimated risk for a phenotype or state may be based on at least 2, 3, 4,5, 6, 7, 8, 9, 10, 11, or 12 SNPs, wherein the SNP used to estimate risk may be a public SNP, a test SNP, or both (e.g., fig. 25).
The estimated risk for the state may be based on the SNPs listed in fig. 22 or 25. In some embodiments, the risk of a condition may be based on at least one SNP. For example, an individual's assessment of risk for Alzheimer's Disease (AD), colorectal cancer (CRC), Osteoarthritis (OA), or exfoliative glaucoma (XFG) may be based on 1 SNP (e.g., rs4420638 for AD, rs 69883267 for CRC, rs4911178 for OA, and rs2165241 for XFG). For other states, such as obesity (BMIOB), graves' disease (GD), or Hemochromatosis (HEM), the estimated risk of an individual may be based on at least 1 or 2 SNPs (e.g., rs9939609 and/or rs9291171 for BMIOB; DRB1 0301DQA1 and/or rs3087243 for GD; rs 0501800562 and/or rs129128 for HEM). For states such as, but not limited to, Myocardial Infarction (MI), Multiple Sclerosis (MS) or Psoriasis (PS), 1,2 or 3 SNPs may be used to assess an individual's risk for these states (e.g., rs1866389, rs1333049 and/or rs6922269 for MI; rs6897932, rs12722489 and/or DRB1 x 1501 for MS; rs6859018, rs11209026 and/or HLAC 0602 for PS). To assess the individual risk of Restless Legs Syndrome (RLS) or celiac disease (CelD), 1,2, 3 or 4 SNPs may be used (e.g., rs6904723, rs2300478, rs1026732 and/or rs9296249 for RLS; rs6840978, rs11571315, rs2187668 and/or DQA1 x 0301 DQB1 x 0302 for CelD). For Prostate Cancer (PC) or lupus (SLE), 1,2, 3, 4 or 5 SNPs may be used to assess an individual's risk for PC or SLE (e.g., rs4242384, rs 69883267, rs 169901979, rs17765344 and/or rs4430796 for PC, rs12531711, rs10954213, rs2004640, DRB1 0301 and/or DRB1 1501 for SLE). To assess the lifetime risk of an individual for macular degeneration (AMD) or Rheumatoid Arthritis (RA), 1,2, 3, 4,5 or 6 SNPs may be used (e.g. rs10737680, rs10490924, rs541862, rs2230199, rs1061170, and/or rs9332739 for AMD, rs6679677, rs11203367, rs6457617, DRB 0101, DRB1 × 1, and/or DRB 04084 × 0404 for RA). 
To assess an individual's lifetime risk for breast cancer (BC), 1, 2, 3, 4, 5, 6 or 7 SNPs may be used (e.g., rs3803662, rs2981582, rs4700485, rs3817198, rs17468277, rs6721996 and/or rs 3803662). To assess an individual's lifetime risk for Crohn's disease (CD) or type 2 diabetes (T2D), 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or 11 SNPs may be used (e.g., rs2066845, rs5743293, rs10883365, rs17234657, rs10210302, rs9858542, rs11805303, rs1000113, rs17221417, rs2542151 and/or rs10761659 for CD; rs13266634, rs4506565, rs10012946, rs 10006992, rs10811661, rs 77512288738, rs8050136, rs1111875, rs4402960, rs5215 and/or rs1801282 for T2D). In some embodiments, the SNP used as a basis for the risk determination may be in linkage disequilibrium with a SNP described above or listed in fig. 22 or 25.
The phenotype profile of an individual may include a number of phenotypes. In particular, assessing a patient's risk of a disease or other condition (e.g., likely drug response, including metabolism, efficacy, and/or safety) by the methods of the invention enables prognostic or diagnostic analysis of susceptibility to multiple, unrelated diseases and conditions, whether in symptomatic, presymptomatic, or asymptomatic individuals, including carriers of one or more disease/condition-predisposing alleles. Thus, these methods provide a general assessment of an individual's susceptibility to diseases or conditions without any preconceived notion of testing for a specific disease or condition. For example, the methods of the invention enable the assessment of an individual's susceptibility to any of the conditions listed in table 1 or figs. 4, 5 or 6, based on the individual's genomic profile. Moreover, these methods allow an individual's estimated lifetime risk or relative risk to be assessed for one or more phenotypes or conditions, such as those in fig. 22 or 25.
The assessment preferably provides information on 2 or more of these conditions, and more preferably on 3, 4, 5, 10, 20, 50, 100 or even more of these conditions. In a preferred embodiment, at least 20 rules are applied to the genomic profile of an individual to obtain the phenotype profile. In other embodiments, at least 50 rules are applied to the genomic profile of the individual. A single rule may be applied for a phenotype, for example a monogenic phenotype. More than one rule may also be applied for a single phenotype, such as a polygenic phenotype, or a monogenic phenotype in which multiple genetic variants within a single gene affect the probability of having the phenotype.
After an initial screening of an individual patient's genomic profile, the individual's genotype correlations are updated (or made available for updating) by comparison with additional nucleotide variants (e.g., SNPs) as these become known. For example, step 110 can be performed periodically, e.g., daily, weekly, or monthly, by one or more persons of ordinary skill in the field of genetics, who search the scientific literature for new genotype correlations. The new genotype correlations may then be further validated by a committee of one or more experts in the field. Step 112 may then also be updated periodically with new rules based on the newly validated correlations.
A new rule may involve a genotype or phenotype not covered by an existing rule. For example, a genotype not previously correlated with any phenotype is found to correlate with a new or existing phenotype. A new rule may also be for a correlation involving a phenotype with which no genotype was previously correlated. New rules may also be determined for genotypes and phenotypes that already have existing rules. For example, a rule exists based on the correlation between genotype A and phenotype A. New research reveals that genotype B correlates with phenotype A, and a new rule is made based on this correlation. Another example is the discovery that phenotype B is associated with genotype A, so that a new rule is made accordingly.
Rules may also be made on the discovery of correlations that are based on known findings but were not initially identified in the published scientific literature. For example, it may be reported that genotype C is correlated with phenotype C. Another publication reports that genotype D is correlated with phenotype D. Phenotypes C and D are related symptoms; e.g., phenotype C may be tachypnea and phenotype D a small lung capacity. A correlation between genotype C and phenotype D, or between genotype D and phenotype C, may be discovered and validated by statistical methods using the existing stored genomic profiles of individuals with genotypes C and D and phenotypes C and D, or by further research. New rules may then be generated based on the newly discovered and validated correlations. In another embodiment, the stored genomic profiles of multiple individuals with a specific or related phenotype can be studied to determine the genotypes these individuals have in common, and a correlation can be determined. New rules may be generated based on this correlation.
Rules may also be made to modify existing rules. For example, correlations between genotypes and phenotypes may be partly determined by known individual characteristics, such as ethnicity, ancestry, geography, gender, age, family history, or any other known phenotype of the individual. Rules based on these known individual characteristics may be made and incorporated into an existing rule to provide a modified rule. Whether the modified rule is applied depends on the specific individual factors of the individual. For example, a rule may be based on a 35% probability that an individual has phenotype E when the individual has genotype E. However, if the individual is of a particular ethnicity, the probability is 5%. A new rule may be made based on this result and applied to individuals of that ethnicity. Alternatively, the existing rule, which gives a value of 35%, may be applied first, followed by another rule based on ethnicity for that phenotype. Rules based on known individual characteristics can be determined from the scientific literature or from studies of stored genomic profiles. As new rules are developed, they may be added and applied to the genomic profile in step 114, or they may be applied periodically, for example at least once a year.
Information on an individual's disease risk can also be expanded as technological advances allow finer-resolution SNP genomic profiles. As described above, an initial SNP genomic profile can readily be generated using microarray technology that scans 500,000 SNPs. Given the nature of haplotype blocks, this number allows a representative profile of all SNPs in an individual's genome. Nonetheless, it is estimated that about 10 million SNPs commonly occur in the human genome (the International HapMap Project; www.hapmap.org). As technological advances make it practical and cost-efficient to resolve SNPs at a finer level of detail (e.g., microarrays of 1,000,000, 1,500,000, 2,000,000, 3,000,000 or more SNPs) or to perform whole genome sequencing, more detailed SNP genomic profiles can be generated. Likewise, advances in computational analysis methods will enable more cost-efficient and finer analysis of SNP genomic profiles and updating of the master database of SNP-disease correlations.
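The 35% versus 5% example above can be sketched as a base rule plus an override keyed on an individual factor. This is a minimal illustration under stated assumptions: the function name `probability_of_phenotype_e`, the `OVERRIDES` table, and the label `"ethnicity_x"` are hypothetical stand-ins for "a particular ethnicity", and only the two probabilities come from the text.

```python
from typing import Dict, Optional


def probability_of_phenotype_e(has_genotype_e: bool,
                               ethnicity: Optional[str] = None,
                               base_probability: float = 0.35,
                               ethnicity_overrides: Optional[Dict[str, float]] = None
                               ) -> Optional[float]:
    """Apply the existing rule, then the modifying rule if one matches.

    Returns None when the rule does not apply (genotype E is absent).
    """
    if not has_genotype_e:
        return None
    overrides = ethnicity_overrides or {}
    # The modified rule takes precedence only for individuals it covers.
    return overrides.get(ethnicity, base_probability)


# "ethnicity_x" is a placeholder label, not a category from the specification.
OVERRIDES = {"ethnicity_x": 0.05}

default_case = probability_of_phenotype_e(True, "other", ethnicity_overrides=OVERRIDES)
modified_case = probability_of_phenotype_e(True, "ethnicity_x", ethnicity_overrides=OVERRIDES)
```

Keeping the override separate from the base rule mirrors the alternative described in the text, where the existing rule is applied first and an ethnicity-based rule is applied afterward.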
After generating the phenotype profile at step 116, the registered users or their healthcare managers may access their genomic or phenotype profiles through an online portal or website as in step 118. Reports including phenotype profiles and other information about phenotype profiles and genomic profiles may also be provided to registered users or their healthcare managers, as described in steps 120 and 122. The report may be printed out, stored in a registered user's computer, or viewed online.
FIG. 7 illustrates an example online report. The registered user may choose to display a single phenotype or more than one phenotype. The registered user may also have different viewing options, for example, the "Quick View" option shown in FIG. 7. The phenotype may be a medical condition, and the treatments and symptoms in the quick report may link to other web pages containing further information about them. For example, clicking on a drug may lead to a website with information about dosage, cost, side effects, and efficacy. The drug may also be compared with other treatments. The website may also include a link to the drug manufacturer's website. Another link may give the registered user the option of generating a pharmacogenomic profile, which would include information on the user's likely response to the drug based on his or her genomic profile. Links to alternatives to drugs may also be provided, such as preventive actions (e.g., fitness and weight loss), as may links to dietary supplements, diet plans, and nearby health clubs, health clinics, health and rehabilitation providers, day spas, and the like. Educational and informational videos, summaries of available treatments, possible remedies, and general recommendations may also be provided.
The online report may also provide a link to schedule individual doctors or genetic counseling appointments or to access an online genetic counselor or doctor, thereby providing the registered user with the opportunity to query more information about their phenotype profile. Links to online genetic consultation and physician queries may also be provided on the online report.
Reports may also be viewed in other forms, such as a comprehensive view of a single phenotype, in which more detail is provided for each category. For example, there may be more detailed statistics on the registered user's likelihood of developing the phenotype; more information about typical symptoms or phenotypes, such as a range of symptoms representative of a medical condition or of a physical non-medical condition (e.g., height); or more information about genes and genetic variants, such as population prevalence, e.g., worldwide or in different countries, or in different age ranges or genders. For example, fig. 15 shows a summary of estimated lifetime risks for a number of conditions. The individual may view more information about a particular condition, such as prostate cancer (fig. 16) or Crohn's disease (fig. 17).
In another embodiment, the report may be of an "entertainment" phenotype, e.g., the similarity of an individual's genomic profile to that of a well-known individual (e.g., Albert Einstein). The report can show the percent similarity between the individual's genomic profile and Einstein's, and can further show Einstein's predicted IQ and the individual's predicted IQ. Further information may include how the genomic profile and IQ of the general population compare with those of the individual and of Einstein.
In another embodiment, the report may display all phenotypes that have been correlated with the registered user's genomic profile. In other embodiments, the report may show only the phenotypes determined to be positively correlated with the individual's genomic profile. In other forms, individuals may choose to display particular subgroups of phenotypes, such as only medical phenotypes or only actionable medical phenotypes. For example, actionable phenotypes and their correlated genotypes may include Crohn's disease (correlated with IL23R and CARD15), type 1 diabetes (HLA-DR/DQ), lupus (HLA-DRB1), psoriasis (HLA-C), multiple sclerosis (HLA-DQA1), Graves' disease (HLA-DRB1), rheumatoid arthritis (HLA-DRB1), type 2 diabetes (TCF7L2), breast cancer (BRCA2), colon cancer (APC), episodic memory (KIBRA), and osteoporosis (COL1A1). Individuals may also choose to display subgroups of phenotypes in the report, e.g., only inflammatory diseases among medical conditions or only physical traits among non-medical conditions. In some embodiments, an individual may choose to display all conditions for which an estimated risk was calculated, with those conditions highlighted (e.g., fig. 15A, D), only the conditions with elevated risk (fig. 15B), or only the conditions with reduced risk (fig. 15C).
The information delivered and communicated to the individual may be encrypted and kept confidential, and the individual's access to the information may be controlled. Information derived from the complex genomic profile can be provided to the individual as regulatory-approved, understandable, medically relevant, and/or high-impact data. Information may also be of general interest without being medically relevant. Information may be delivered to the individual in encrypted form in several ways, including, but not limited to, a portal interface and/or mail. More preferably, the information is provided in encrypted form (if the individual so chooses) through a portal interface to which the individual has secure and confidential access. This interface is preferably provided through an online internet web portal or, alternatively, by phone or other means that allow private, secure, and easy-to-use access. Genomic profiles, phenotype profiles and reports are provided to the individual or the individual's healthcare manager by data transmission over a network.
Accordingly, FIG. 8 is a block diagram illustrating a representative example logic device through which phenotype profiles and reports may be generated. FIG. 8 shows a computer system (or digital device) 800 for receiving and storing a genomic profile, analyzing genotype correlations, generating rules based on genotype correlations, applying rules to the genomic profile, and generating phenotypic profiles and reports. The computer system 800 may be understood as a logical device capable of reading instructions from the media 811 and/or the network port 805, the network port 805 optionally being connected to a server 809 having a fixed media 812. The system shown in fig. 8 includes a CPU 801, a disk drive 803, an optional input device (e.g., keyboard 815 and/or mouse 816), and an optional monitor 807. Data communication with the server 809 at the local or remote location may be accomplished via the communication medium shown. A communication medium may include any means for transmitting and/or receiving data. The communication medium may be, for example, a network connection, a wireless connection, or an internet connection. This connection may provide communication over the World Wide Web. It is envisioned that data pertaining to the present invention may be transmitted over such means over a network or connection for receipt and/or verification by a party 822. Recipient 822 may be, but is not limited to, an individual, a registered user, a healthcare provider, or a healthcare manager. In one embodiment, the computer-readable medium comprises a medium adapted to convey the results of an analysis of a biological sample or genotype correlation. The medium may comprise results on a phenotype profile of an individual subject, wherein such results are obtained using the methods described herein.
The personal portal will preferably serve as the basic interface for the individual receiving and evaluating the genomic data. The portal will enable an individual to track the progress of their samples from collection to testing and to track the results. Through portal visits, individuals are presented with the relative risk of common genetic diseases based on their genomic profiles. The registered user can select through the portal which rules to apply to their genomic profile.
In one embodiment, one or more web pages will have a list of phenotypes and a box next to each phenotype that registered users can select to include it in their phenotype profile. Each phenotype can be linked to related information to help registered users make an informed choice about which phenotypes to include in their phenotype profile. The web page may also organize phenotypes by disease group (e.g., treatable or non-treatable diseases). For example, the registered user may select only treatable phenotypes, such as HLA-DQA1 and celiac disease. Registered users may also choose to display the pre-symptomatic or post-symptomatic treatments for a phenotype. For example, an individual may select treatable phenotypes that have pre-symptomatic treatments (beyond further screening), which for celiac disease is a gluten-free diet. Another example may be Alzheimer's disease, with pre-symptomatic treatments of statins, exercise, vitamins, and mental activity. Thrombosis is another example, with pre-symptomatic treatments of avoiding oral contraceptives and avoiding prolonged sitting. An example of a phenotype with an approved post-symptomatic treatment is wet AMD associated with CFH, for which an individual may undergo laser treatment of the condition.
Phenotypes may also be organized by type or class of disease or condition, such as neurological, cardiovascular, endocrine, immunological, and the like. Phenotypes can also be grouped into medical and non-medical phenotypes. Other groupings of phenotypes on web pages can be made in terms of physical traits, physiological traits, mental traits, or emotional traits. The web page may further allow a whole group of phenotypes to be selected by checking a single box, for example, all phenotypes, only medically relevant phenotypes, only non-medically relevant phenotypes, only treatable phenotypes, only non-treatable phenotypes, different disease groups, or "entertainment" phenotypes. "Entertainment" phenotypes may include comparisons to a celebrity or other well-known individual, or comparisons to other animals or even other organisms. A list of genomic profiles available for comparison may also be provided on a web page, from which the registered user can select profiles to compare with their own genomic profile.
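The grouping and check-box selection described above can be pictured as a simple filter over a phenotype catalog. The following is only a minimal sketch under stated assumptions: the `Phenotype` record, the `select_phenotypes` helper, and the flags and group labels in `CATALOG` are hypothetical placeholders, not part of the described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenotype:
    name: str
    medical: bool      # medical vs. non-medical phenotype
    treatable: bool    # whether a treatment is listed for it
    group: str         # disease or trait group shown on the page


# Illustrative catalog; the flags and group labels are placeholders.
CATALOG = [
    Phenotype("Celiac disease", True, True, "immunological"),
    Phenotype("Wet AMD", True, True, "ophthalmological"),
    Phenotype("Eye color", False, False, "physical trait"),
]


def select_phenotypes(catalog, medical=None, treatable=None, group=None):
    """Return names of phenotypes matching every filter that is not None."""
    selected = []
    for p in catalog:
        if medical is not None and p.medical != medical:
            continue
        if treatable is not None and p.treatable != treatable:
            continue
        if group is not None and p.group != group:
            continue
        selected.append(p.name)
    return selected
```

Calling `select_phenotypes(CATALOG, treatable=True)` would correspond to checking a "treatable phenotypes only" box, while calling it with no filters corresponds to selecting all phenotypes.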
The online portal may also provide a search engine to help registered users navigate the portal, search for a particular phenotype, or search for particular terms or information disclosed by their phenotype profile or report. Links to partner services and product offerings may also be provided by the portal. Additional links to support groups, message boards, and chat rooms for individuals with common or similar phenotypes may also be provided. The online portal may also provide links to other sites with more information about the phenotypes in the registered user's phenotype profile. The online portal may also provide a service that allows registered users to share their phenotype profile and reports with friends, family, or healthcare managers. Registered users may choose which phenotypes in their phenotype profile are shown to their friends, family, or healthcare managers.
The phenotype profiles and reports provide personalized genotype correlations for individuals. The genotype correlations provided to an individual can be used to inform personal health care and lifestyle choices. If a strong correlation is found between a genetic variant and a disease that can be treated, detection of the genetic variant can help in deciding to begin treatment of the disease and/or monitoring of the individual. Where a correlation is statistically significant but not considered strong, the individual may discuss this information with a personal physician and decide on an appropriate, beneficial course of action. Potential courses of action that may benefit an individual in view of a particular genotype correlation include administering therapeutic treatment, monitoring for potential need of treatment or effects of treatment, or making lifestyle changes in diet, exercise, and other personal habits or activities. For example, a treatable phenotype (e.g., celiac disease) may have a pre-symptomatic treatment of a gluten-free diet. Likewise, through pharmacogenomics, genotype correlation information can be applied to predict an individual's likely response to treatment with a particular drug or course of drug therapy, such as the likely efficacy or safety of a particular drug therapy.
The registered user may choose to provide the genomic profile and the phenotype profile to their healthcare manager, such as a physician or genetic counselor. The genomic and phenotype profiles may be accessed directly by the healthcare manager, printed out by the registered user for delivery to the healthcare manager, or sent directly to the healthcare manager through the online portal (e.g., via a link on an online report).
The delivery of this relevant information will enable the patient to act in concert with their physician. In particular, discussions between a patient and their physician may be facilitated by the personal portal, by links to medical information, and by the ability to integrate the patient's genomic information into their medical records. The medical information may include prevention and wellness information. The information provided to an individual patient by the present invention will enable the patient to make informed choices about their health care. In this way, patients can make choices that may help them avoid and/or delay diseases that their individual genomic profile (inherited DNA) makes more likely. In addition, patients will be able to adopt a treatment regimen tailored to their own specific medical needs. Individuals will also be able to access their genotype data if they develop a disease and need this information to help their physician develop a treatment strategy.
Genotype correlation information can also be used in conjunction with genetic counseling to advise couples considering having children about potential genetic concerns for the mother, father, and/or child. Genetic counselors can provide information and support to registered users whose phenotype profiles show an increased risk for a particular condition or disease. They can interpret information about the condition, analyze inheritance patterns and risks of recurrence, and discuss the available options with registered users. Genetic counselors can also provide supportive counseling and refer registered users to community or national support services. Genetic counseling may be included in specific registration plans. In some embodiments, genetic counseling may be scheduled within 24 hours of a request and may be available at times such as evenings, Saturdays, Sundays, and/or holidays.
The individual's portal will also facilitate the delivery of additional information beyond the initial screening. Individuals will be informed of new scientific discoveries relating to their personal genetic profile, such as information about new therapeutic or preventive strategies for their current or potential conditions. New discoveries may also be communicated to their healthcare managers. In a preferred embodiment, the registered user or their healthcare provider is notified electronically of new genotype correlations and new studies about the phenotypes in the registered user's phenotype profile. In other embodiments, emails about "entertainment" phenotypes are sent to registered users; for example, an email can inform them that their genomic profile is 77% identical to that of Abraham Lincoln and that further information is available through the online portal.
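The text does not say how a figure such as 77% would be computed. One simple possibility, shown here purely as an illustrative sketch, is the percentage of SNPs present in both profiles at which the two genotypes agree. The function name `percent_identity`, the dictionary layout, and the SNP identifiers in the usage note are assumptions.

```python
def percent_identity(profile_a, profile_b):
    """Percentage of shared SNPs at which two profiles carry the same genotype.

    Each profile maps a SNP identifier to a two-letter genotype string.
    """
    shared = profile_a.keys() & profile_b.keys()
    if not shared:
        raise ValueError("profiles share no SNPs")
    # Genotypes are unordered allele pairs, so "AG" and "GA" count as equal.
    same = sum(
        1 for snp in shared
        if sorted(profile_a[snp]) == sorted(profile_b[snp])
    )
    return 100.0 * same / len(shared)
```

For instance, two hypothetical profiles that share four SNPs and agree at three of them would score 75.0 under this measure.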
The present invention also provides a computer code system for generating new rules, revising rules, combining rules, periodically updating rule sets with new rules, securely maintaining a genomic profile database, applying rules to genomic profiles to determine phenotypic profiles, and for generating reports. The computer code informs the registered user of new or revised correlations and new or revised reports, such as reports with new prevention and health information, information about new treatments under development, or available new treatments.
Business method
The present invention provides a business method for assessing genotype correlations of individuals based on a comparison of a patient's genomic profile to a clinical database of established medically relevant nucleotide variants. The present invention further provides a business method that uses an individual's stored genomic profile to assess novel correlations that were not known initially, generating an updated phenotype profile for the individual without requiring the individual to submit additional biological samples. FIG. 9 is a flow chart illustrating the business method.
The revenue stream for the business method of the present invention is generated in part in step 101, when an individual initially requests and purchases a personal genomic profile covering genotype correlations for a variety of common human diseases, conditions, and physical states. The request and purchase may be made through a number of sources including, but not limited to, an online web portal, an online health service, and the individual's personal physician or a similar source of personal medical attention. In alternative embodiments, the genomic profile may be provided free of charge and the revenue stream may be generated in a subsequent step (e.g., step 103).
A registered user or consumer makes a request to purchase a phenotype profile. In response to the request and purchase, a collection kit for collecting a biological sample for genetic sample isolation is provided to the consumer in step 103. When the request is made online, by telephone, or through another source such that the consumer cannot readily obtain the collection kit in person, the collection kit is provided by expedited delivery, such as a same-day or overnight courier service. The collection kit includes a container for the sample and packaging materials for expedited delivery of the sample to the laboratory where the genomic profile is generated. The kit may also include instructions for sending the sample to a sample processing facility or laboratory and instructions for accessing the consumer's genomic and phenotype profiles, which may be done through an online portal.
As explained in detail above, genomic DNA can be obtained from any of a variety of types of biological samples. Preferably, genomic DNA is isolated from saliva using a commercially available collection kit (e.g., a kit available from DNA Genotek). The use of saliva and such a kit enables non-invasive sample collection, since it is convenient for the consumer to provide a saliva sample in a container from the collection kit, and then seal the container. Additionally, saliva samples can be stored and transported at room temperature.
After the biological sample is deposited in the collection or specimen container, the consumer delivers the sample to the laboratory for processing in step 105. Typically, the consumer may use the packaging material provided in the collection kit to send the sample to the laboratory by expedited delivery, such as a same-day or overnight courier service.
Laboratories that process samples and generate genomic profiles may follow appropriate government agency guidelines and regulations. For example, in the United States, a processing laboratory may be regulated by one or more federal agencies and/or one or more state agencies, such as the Food and Drug Administration (FDA) or the Centers for Medicare and Medicaid Services (CMS). Clinical laboratories in the United States may be certified or approved under the Clinical Laboratory Improvement Amendments of 1988 (CLIA).
In step 107, the sample is processed by the laboratory as previously described to isolate a genetic sample of DNA or RNA. The isolated genetic sample is then analyzed and a genomic profile is generated in step 109. Preferably, a genomic SNP profile is generated. As described above, several methods can be used to generate SNP profiles. Preferably, high density arrays (e.g., commercially available platforms from Affymetrix or Illumina) are used for SNP identification and profiling. For example, as described in more detail above, SNP profiles may be generated using an Affymetrix GeneChip assay. As technology evolves, other technology vendors may emerge that can generate high density SNP profiles. In another embodiment, the genomic profile of the registered user will be the genomic sequence of the registered user.
After the genomic profile of the individual is generated, the genotype data is preferably encrypted, imported in step 111, and deposited in an encrypted database or vault where the information is stored for future use in step 113. The genomic profile and related information may be confidential, with access to this private information and genomic profile restricted according to the instructions of the individual and/or the individual's personal physician. Others (e.g., the individual's family and genetic counselor) may also be granted access by the registered user.
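The access restriction described here, in which only the individual decides who else may view the stored profile, can be sketched as a small access list. This sketch covers only the grant and check logic, not encryption or storage, and the class name `ProfileAccess` and the party identifiers in the usage note are hypothetical.

```python
class ProfileAccess:
    """Tracks who may view one individual's stored genomic profile."""

    def __init__(self, owner):
        self.owner = owner
        self._granted = set()

    def grant(self, requester, party):
        # Only the profile owner may extend access to others.
        if requester != self.owner:
            raise PermissionError("only the owner can grant access")
        self._granted.add(party)

    def revoke(self, requester, party):
        # Only the profile owner may withdraw access.
        if requester != self.owner:
            raise PermissionError("only the owner can revoke access")
        self._granted.discard(party)

    def can_view(self, party):
        return party == self.owner or party in self._granted
```

A registered user identified as, say, "user-1" could grant access to "genetic-counselor" and later revoke it; any other party is refused by default.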
The database or vault may be located locally at the processing laboratory. Alternatively, the database may be located at a separate location. In this case, the genomic map data generated by the processing laboratory may be transported to a separate facility comprising a database in step 111.
After the genomic profile of the individual is generated, the individual's genetic variations are compared to a clinical database of established medically relevant genetic variants in step 115. Alternatively, genotype correlations may not be medically relevant but may still be included in the genotype correlation database, for example, physical traits such as eye color, or "entertainment" phenotypes such as similarity to a celebrity's genomic profile.
Medically relevant SNPs can be established through the scientific literature and related sources. Non-SNP genetic variants can also be established as being associated with a phenotype. Typically, the association of SNPs with a given disease is established by comparing the haplotype patterns of a group of people known to have the disease to those of a group of people without the disease. By analyzing many individuals, the frequencies of polymorphisms in a population can be determined, and in turn these genotype frequencies can be correlated with a particular phenotype (e.g., a disease or condition). Alternatively, the phenotype may be a non-medical condition.
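As a rough illustration of comparing a group with a disease to a group without it, a common summary statistic is the allelic odds ratio computed from allele counts in cases and controls. This is a generic textbook calculation offered as a sketch, not a method stated in the text; the function names and the counts in the usage note are assumptions.

```python
def allele_frequency(risk_count, other_count):
    """Fraction of observed alleles that are the candidate risk allele."""
    total = risk_count + other_count
    if total == 0:
        raise ValueError("no alleles observed")
    return risk_count / total


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds of carrying the risk allele in cases relative to controls."""
    if case_other == 0 or control_risk == 0:
        raise ValueError("odds ratio undefined when a denominator count is zero")
    return (case_risk * control_other) / (case_other * control_risk)
```

With made-up counts of 60 risk and 40 other alleles among cases, and 30 risk and 70 other alleles among controls, the risk-allele frequencies are 0.6 and 0.3 and the odds ratio is 3.5. An odds ratio near 1 would suggest no association at that SNP.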
Relevant SNPs and non-SNP genetic variants can also be determined by analyzing the stored genomic profiles of individuals, rather than from available published literature. Individuals with stored genomic profiles may disclose phenotypes that have previously been determined. The genotypes and disclosed phenotypes of these individuals can be compared with those of individuals without the phenotype to determine correlations that can then be applied to other genomic profiles. Individuals whose genomic profiles are determined may fill out a questionnaire about phenotypes that have previously been determined. The questionnaire may include questions about medical and non-medical conditions, such as previously diagnosed diseases, family history of medical conditions, lifestyle, physical traits, mental traits, age, social life, environment, and the like.
In one embodiment, individuals who fill out a questionnaire can have their genomic profile determined free of charge. In some embodiments, individuals fill out questionnaires periodically to gain free access to their phenotype profiles and reports. In other embodiments, individuals who have filled out a questionnaire may be given a registration upgrade, so that they have a higher level of access than under their previous registration, or they may purchase or renew a registration at a reduced price.
To ensure scientific accuracy and importance, all information deposited in the database of medically relevant genetic variants in step 121 is first approved by a research/clinical advisory board and, if warranted, is reviewed and overseen by an appropriate governmental agency in step 119. For example, in the United States, the FDA may provide oversight by approving the algorithms used to validate data relating to genetic variants (typically SNPs, transcript levels, or mutations). In step 123, the scientific literature and other relevant sources are monitored for additional correlations between genetic variants and diseases or conditions, and after their accuracy and importance are confirmed, and upon review and approval by governmental agencies, these additional genotype correlations are added to the master database in step 125.
The combination of a database of approved and validated medically relevant genetic variants with a genome-wide individual profile advantageously allows genetic risk assessment for a large number of diseases or conditions. After an individual's genomic profile is compiled, the individual's genotype correlations may be determined by comparing the individual's nucleotide (genetic) variants or genetic markers to a database of human nucleotide variants that have been associated with a particular phenotype (e.g., a disease, condition, or physical state). By comparing the individual's genomic profile to a master database of genotype correlations, individuals can be informed whether they are positive or negative for a genetic risk factor, and to what degree. Individuals will receive relative risk and/or predisposition data for a wide range of scientifically validated disease states (e.g., Alzheimer's disease, cardiovascular disease, blood clotting). For example, the genotype correlations in Table 1 may be included. In addition, SNP disease correlations in the database may include, but are not limited to, those shown in FIG. 4. Other correlations in FIGS. 5 and 6 may also be included. The business method of the present invention thus provides risk analysis for a large number of diseases and conditions without any preconceived notion of what those diseases and conditions might be.
In other embodiments, the genotype correlations paired with the genome-wide individual profile are for non-medically relevant phenotypes, such as "entertainment" phenotypes or physical traits such as hair color. In a preferred embodiment, a rule or rule set is applied to the genomic profile or SNP profile of the individual, as described above. Applying the rules to the genomic profile generates a phenotype profile for the individual.
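Applying a rule set to a genomic profile can be sketched as a lookup of each rule's SNP and genotype in the individual's profile. The `Rule` fields, the matching on exact genotype, and the identifiers in the usage note are simplifying assumptions made for illustration; the rules described in this document may take other forms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    snp: str         # SNP identifier the rule examines
    genotype: str    # genotype that triggers the rule, e.g. "AG"
    phenotype: str   # phenotype the rule reports on
    effect: str      # reported effect, e.g. "increased risk"


def apply_rules(genomic_profile, rules):
    """Return a phenotype profile: phenotype -> list of matched effects."""
    phenotype_profile = {}
    for rule in rules:
        observed = genomic_profile.get(rule.snp)
        # Allele order within a genotype is ignored ("AG" equals "GA").
        if observed is not None and sorted(observed) == sorted(rule.genotype):
            phenotype_profile.setdefault(rule.phenotype, []).append(rule.effect)
    return phenotype_profile
```

For a hypothetical profile `{"snp1": "GA"}` and a rule `Rule("snp1", "AG", "phenotype X", "increased risk")`, the resulting phenotype profile would be `{"phenotype X": ["increased risk"]}`. SNPs that are absent from the profile simply produce no entry.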
Thus, as new correlations are discovered and validated, the master database of human genotype correlations is expanded with the additional genotype correlations. Updates may be made by accessing the relevant information from the individual genomic profiles stored in the database, as needed or appropriate. For example, a newly known genotype correlation may be based on a particular gene variant. Whether an individual may be affected by the new genotype correlation can then be determined by retrieving and comparing only the relevant gene portion of the individual's complete genomic profile.
The results of the genomic query are preferably analyzed and interpreted for presentation to the individual in an understandable format. In step 117, the results of the initial screening are then provided to the patient in a secure, confidential manner, either by mail or through an online portal interface as described in detail above.
The report may include a phenotype profile as well as genomic information about the phenotypes in the phenotype profile, e.g., basic genetic information about the genes involved or statistical information about the genetic variants in different populations. Other information based on the phenotype profile that may be included in the report includes prevention strategies, wellness information, treatment methods, symptom recognition, early detection protocols, intervention protocols, and further identification and classification of phenotypes. After the initial screening of the individual's genomic profile, controlled, moderated updates are or can be made.
When new genotype correlations emerge and are validated and approved, the individual's genomic profile is updated, or made available for updating, in conjunction with updates to the master database. New rules based on the new genotype correlations may be applied to the initial genomic profile to provide an updated phenotype profile. An updated genotype correlation profile may be generated by comparing the relevant portion of the individual's genomic profile to the new genotype correlations in step 127. For example, if a new genotype correlation is found based on variation in a particular gene, that gene portion of the individual's genomic profile can be analyzed for the new genotype correlation. In this case, only the one or more new rules need be applied to generate the updated phenotype profile, rather than reapplying the entire rule set, whose other rules have already been applied. In step 129, the results of the individual's updated genotype correlations are provided in an encrypted manner.
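The idea of applying only the new rules to the stored genomic profile, instead of rerunning the whole rule set, can be sketched as a merge into the previously generated phenotype profile. The tuple layout of a rule, the function name, and the identifiers in the usage note are assumptions made for illustration.

```python
def update_phenotype_profile(stored_profile, genomic_profile, new_rules):
    """Apply only new rules and merge the results into a copy of the stored profile.

    stored_profile: phenotype -> list of effects from earlier rule applications
    genomic_profile: SNP identifier -> genotype string, as stored at first screening
    new_rules: iterable of (snp, genotype, phenotype, effect) tuples
    """
    # Copy so the previously issued profile is left untouched.
    updated = {ph: list(effects) for ph, effects in stored_profile.items()}
    for snp, genotype, phenotype, effect in new_rules:
        observed = genomic_profile.get(snp)
        if observed is not None and sorted(observed) == sorted(genotype):
            effects = updated.setdefault(phenotype, [])
            if effect not in effects:
                effects.append(effect)
    return updated
```

No new biological sample is involved in this step: the function reads only the stored genomic profile, which mirrors the point that updates are generated from data already on file.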
The initial and updated phenotype profiles may be services provided to registered users or consumers. Different levels of registration for genomic profiling and combinations thereof may be provided. Likewise, the registration level may be varied to provide individuals with a choice of the amount of service they wish to receive with their genotype correlations. In this way, the level of service provided will vary with the registration level of the service purchased by the individual.
An entry-level registration may include a genomic profile and an initial phenotype profile. This may be the basic registration level. There may be different levels of service within the basic registration level. For example, a particular registration level may provide referrals to genetic counseling, to physicians with particular expertise in treating or preventing a particular disease, and to other service options. Genetic counseling can be obtained online or by telephone. In another embodiment, the price of the registration may depend on the number of phenotypes the individual selects for their phenotype profile. Another option might be whether the registered user chooses to access online genetic counseling.
In another case, a registration may provide an initial genome-wide genotype correlation while the individual's genomic profile is maintained in the database; this database may be encrypted if the individual so chooses. After this initial analysis, subsequent analyses and additional results may be completed upon request and additional payment by the individual. This may be an advanced registration.
In one embodiment of the business method of the present invention, the individual's risks are updated and the corresponding information is made available to the individual on a registration basis. Registered users who purchase an advanced registration may obtain updates. A registration for genotype correlation analysis can provide updates for particular types or subclasses of new genotype correlations according to the individual's preferences. For example, an individual may wish to learn only of genotype correlations for which there is a known course of treatment or prevention. To help the individual decide whether to have additional analyses performed, the individual may be provided with information about additional genotype correlations that have become available. This information can be conveniently mailed or emailed to registered users.
Within the advanced registration, there may be further levels of service, such as those mentioned for the basic registration. Other registration models may be provided within the advanced level. For example, the highest level may provide unlimited updates and reports to registered users. The profiles of registered users may be updated whenever new correlations and rules are determined. At this level, registered users may also grant access to an unlimited number of individuals, such as family members and healthcare managers. Registered users may also have unlimited access to online genetic counselors and physicians.
The next registration level within the advanced level may be more limited in some respects, such as providing a limited number of updates. Registered users may have a limited number of updates to their genomic profile during the registration period, e.g., 4 times a year. At another registration level, registered users may have their stored genomic profile updated once a week, once a month, or once a year. In another embodiment, registered users may select only a limited number of phenotypes against which their genomic profile is updated.
The personal portal will also conveniently enable individuals to maintain a registration for risk or correlation updates and/or information updates, or to request updated risk assessments and information. As described above, different levels of registration may be provided to enable an individual to select various levels of genotype correlation results and updates, and registered users may select different levels of registration through their personal portals.
Any of these registration options will contribute to the revenue stream for the business method of the present invention. The revenue stream for the business method of the present invention is also increased by adding new consumers and registered users, whose new genomic profiles are added to the database.
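The update limits attached to different registration levels can be pictured as a small lookup of allowed updates per year. This is only a sketch: the level names `"highest"` and `"limited"` and the function `may_update` are hypothetical, and the limit of 4 per year is taken from the example given above.

```python
# Hypothetical level names; None means the level allows unlimited updates.
UPDATES_PER_YEAR = {"highest": None, "limited": 4}


def may_update(level, updates_used_this_year):
    """Return True if a registered user at this level may receive another update."""
    limit = UPDATES_PER_YEAR[level]
    return limit is None or updates_used_this_year < limit
```

A user at the limited level who has already received four updates in the year would be refused a fifth, while a user at the highest level is never refused.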
Table 1: Representative genes having genetic variants associated with a phenotype.
Gene Phenotype
A2M Alzheimer's disease
ABCA1 Cholesterol, HDL
ABCB1 HIV
ABCB1 Epilepsy
ABCB1 Complications of renal transplantation
ABCB1 Digoxin, serum concentration
ABCB1 Crohn's disease; ulcerative colitis
ABCB1 Parkinson's disease
ABCC8 Type 2 diabetes mellitus
ABCC8 Diabetes mellitus, type 2
ABO Myocardial infarction
ACADM Medium chain acyl-CoA dehydrogenase deficiency
ACDC Type 2 diabetes mellitus
ACE Type 2 diabetes mellitus
ACE Hypertension
ACE Alzheimer's disease
ACE Myocardial infarction
ACE Cardiovascular
ACE Left ventricular hypertrophy
ACE Coronary artery disease
ACE Atherosclerosis, coronary
ACE Retinopathy, diabetic
ACE Systemic Lupus Erythematosus (SLE)
ACE Blood pressure, arterial
ACE Erectile dysfunction
ACE Lupus
ACE Polycystic kidney disease
ACE Stroke
ACP1 Diabetes mellitus, type 1
ACSM1(LIP)c Cholesterol levels
ADAM33 Asthma
ADD1 Hypertension
ADD1 Blood pressure, arterial
ADH1B Alcohol abuse
ADH1C Alcohol abuse
ADIPOQ Diabetes mellitus, type 2
ADIPOQ Obesity
ADORA2A Panic disorder
ADRB1 Hypertension
ADRB1 Heart failure
ADRB2 Asthma
ADRB2 Hypertension
ADRB2 Obesity
ADRB2 Blood pressure, arterial
ADRB2 Type 2 diabetes mellitus
ADRB3 Obesity
ADRB3 Type 2 diabetes mellitus
ADRB3 Hypertension
AGT Hypertension
AGT Type 2 diabetes mellitus
AGT Essential hypertension
AGT Myocardial infarction
AGTR1 Hypertension
AGTR2 Hypertension
AHR Breast cancer
ALAD Lead toxicity
ALDH2 Alcoholism
ALDH2 Alcohol abuse
ALDH2 Colorectal cancer
ALDRL2 Type 2 diabetes mellitus
ALOX5 Asthma
ALOX5AP Asthma
APBB1 Alzheimer's disease
APC Colorectal cancer
APEX1 Lung cancer
APOA1 Atherosclerosis, coronary
APOA1 Cholesterol, HDL
APOA1 Coronary artery disease
APOA1 Type 2 diabetes mellitus
APOA4 Type 2 diabetes mellitus
APOA5 Triglycerides
APOA5 Atherosclerosis, coronary
APOB Hypercholesterolemia
APOB Obesity
APOB Cardiovascular
APOB Coronary artery disease
APOB Coronary heart disease
APOB Type 2 diabetes mellitus
APOC1 Alzheimer's disease
APOC3 Triglycerides
APOC3 Type 2 diabetes mellitus
APOE Alzheimer's disease
APOE Type 2 diabetes mellitus
APOE Multiple sclerosis
APOE Atherosclerosis, coronary
APOE Parkinson's disease
APOE Coronary heart disease
APOE Myocardial infarction
APOE Stroke
APOE Alzheimer's disease
APOE Coronary artery disease
APP Alzheimer's disease
AR Prostate cancer
AR Breast cancer
ATM Breast cancer
ATP7B Wilson's disease
ATXN8OS Spinocerebellar ataxia
BACE1 Alzheimer's disease
BCHE Alzheimer's disease
BDKRB2 Hypertension
BDNF Alzheimer's disease
BDNF Bipolar disorder
BDNF Parkinson's disease
BDNF Schizophrenia
BDNF Memory
BGLAP Bone mineral density
BRAF Thyroid cancer
BRCA1 Breast cancer
BRCA1 Breast cancer; ovarian cancer
BRCA1 Ovarian cancer
BRCA2 Breast cancer
BRCA2 Breast cancer; ovarian cancer
BRCA2 Ovarian cancer
BRIP1 Breast cancer
C4A Systemic Lupus Erythematosus (SLE)
CALCR Bone mineral density
CAMTA1 Episodic memory
CAPN10 Diabetes mellitus, type 2
CAPN10 Type 2 diabetes mellitus
CAPN3 Muscular dystrophy
CARD15 Crohn's disease
CARD15 Crohn's disease; ulcerative colitis
CARD15 Inflammatory bowel disease
CART Obesity
CASR Bone mineral density
CCKAR Schizophrenia
CCL2 Systemic Lupus Erythematosus (SLE)
CCL5 HIV
CCL5 Asthma
CCND1 Colorectal cancer
CCR2 HIV
CCR2 HIV infection
CCR2 Hepatitis C
CCR2 Myocardial infarction
CCR3 Asthma
CCR5 HIV
CCR5 HIV infection
CCR5 Hepatitis C
CCR5 Asthma
CCR5 Multiple sclerosis
CD14 Atopy
CD14 Asthma
CD14 Crohn's disease
CD14 Crohn's disease; ulcerative colitis
CD14 Periodontitis
CD14 Total IgE
CDH1 Prostate cancer
CDH1 Colorectal cancer
CDKN2A Melanoma
CDSN Psoriasis vulgaris
CEBPA Leukemia, myeloid
CETP Atherosclerosis, coronary
CETP Coronary heart disease
CETP Hypercholesterolemia
CFH Macular degeneration
CFTR Cystic fibrosis
CFTR Pancreatitis
CFTR Cystic fibrosis
CHAT Alzheimer's disease
CHEK2 Breast cancer
CHRNA7 Schizophrenia
CMA1 Atopic dermatitis
CNR1 Schizophrenia
COL1A1 Bone mineral density
COL1A1 Osteoporosis
COL1A2 Bone mineral density
COL2A1 Osteoarthritis
COMT Schizophrenia
COMT Breast cancer
COMT Parkinson's disease
COMT Bipolar disorder
COMT Obsessive-compulsive disorder
COMT Alcoholism
CR1 Systemic Lupus Erythematosus (SLE)
CRP C-reactive protein
CST3 Alzheimer's disease
CTLA4 Type 1 diabetes mellitus
CTLA4 Graves' disease
CTLA4 Multiple sclerosis
CTLA4 Rheumatoid arthritis
CTLA4 Systemic Lupus Erythematosus (SLE)
CTLA4 Lupus erythematosus
CTLA4 Celiac disease
CTSD Alzheimer's disease
CX3CR1 HIV
CXCL12 HIV
CXCL12 HIV infection
CYBA Atherosclerosis, coronary
CYBA Hypertension
CYP11B2 Hypertension
CYP11B2 Left ventricular hypertrophy
CYP17A1 Breast cancer
CYP17A1 Prostate cancer
CYP17A1 Endometriosis
CYP17A1 Endometrial cancer
CYP19A1 Breast cancer
CYP19A1 Prostate cancer
CYP19A1 Endometriosis
CYP1A1 Lung cancer
CYP1A1 Breast cancer
CYP1A1 Colorectal cancer
CYP1A1 Prostate cancer
CYP1A1 Esophageal cancer
CYP1A1 Endometriosis
CYP1A1 Cytogenetic studies
CYP1A2 Schizophrenia
CYP1A2 Colorectal cancer
CYP1B1 Breast cancer
CYP1B1 Glaucoma
CYP1B1 Prostate cancer
CYP21A2 21-hydroxylase deficiency
CYP21A2 Congenital adrenal hyperplasia
CYP21A2 Adrenal hyperplasia, congenital
CYP2A6 Smoking behaviour
CYP2A6 Nicotine
CYP2A6 Lung cancer
CYP2C19 Helicobacter pylori infection
CYP2C19 Phenytoin
CYP2C19 Stomach disease
CYP2C8 Malaria, Plasmodium falciparum
CYP2C9 Anticoagulant complications
CYP2C9 Warfarin sensitivity
CYP2C9 Warfarin therapy, response to
CYP2C9 Colorectal cancer
CYP2C9 Phenytoin
CYP2C9 Acenocoumarol, response to
CYP2C9 Blood coagulation disorders
CYP2C9 Hypertension
CYP2D6 Colorectal cancer
CYP2D6 Parkinson's disease
CYP2D6 CYP2D6 poor metabolizer phenotype
CYP2E1 Lung cancer
CYP2E1 Colorectal cancer
CYP3A4 Prostate cancer
CYP3A5 Prostate cancer
CYP3A5 Esophageal cancer
CYP46A1 Alzheimer's disease
DBH Schizophrenia
DHCR7 Smith-Lemli-Opitz syndrome
DISC1 Schizophrenia
DLST Alzheimer's disease
DMD Muscular dystrophy
DRD2 Alcoholism
DRD2 Schizophrenia
DRD2 Smoking behaviour
DRD2 Parkinson's disease
DRD2 Tardive dyskinesia
DRD3 Schizophrenia
DRD3 Tardive dyskinesia
DRD3 Bipolar disorder
DRD4 Attention deficit hyperactivity disorder
DRD4 Schizophrenia
DRD4 Novelty seeking
DRD4 ADHD
DRD4 Personality traits
DRD4 Heroin abuse
DRD4 Alcohol abuse
DRD4 Alcoholism
DRD4 Personality disorder
DTNBP1 Schizophrenia
EDN1 Hypertension
EGFR Lung cancer
ELAC2 Prostate cancer
ENPP1 Type 2 diabetes mellitus
EPHB2 Prostate cancer
EPHX1 Lung cancer
EPHX1 Colorectal cancer
EPHX1 Cytogenetic studies
EPHX1 Chronic obstructive pulmonary disease/COPD
ERBB2 Breast cancer
ERCC1 Lung cancer
ERCC1 Colorectal cancer
ERCC2 Lung cancer
ERCC2 Cytogenetic studies
ERCC2 Bladder cancer
ERCC2 Colorectal cancer
ESR1 Bone mineral density
ESR1 Bone mineral density
ESR1 Breast cancer
ESR1 Endometriosis
ESR1 Osteoporosis
ESR2 Bone mineral density
ESR2 Breast cancer
Estrogen receptors Bone mineral density
F2 Coronary heart disease
F2 Stroke
F2 Thromboembolism, venous
F2 Pre-eclampsia
F2 Thrombosis
F5 Thromboembolism, venous
F5 Pre-eclampsia
F5 Myocardial infarction
F5 Stroke
F5 Stroke, ischemic
F7 Atherosclerosis, coronary
F7 Myocardial infarction
F8 Hemophilia
F9 Hemophilia
FABP2 Type 2 diabetes mellitus
FAS Alzheimer's disease
FASLG Multiple sclerosis
FCGR2A Systemic Lupus Erythematosus (SLE)
FCGR2A Lupus erythematosus
FCGR2A Periodontitis
FCGR2A Rheumatoid arthritis
FCGR2B Lupus erythematosus
FCGR2B Systemic Lupus Erythematosus (SLE)
FCGR3A Systemic Lupus Erythematosus (SLE)
FCGR3A Lupus erythematosus
FCGR3A Periodontitis
FCGR3A Arthritis
FCGR3A Rheumatoid arthritis
FCGR3B Periodontitis
FCGR3B Periodontal disease
FCGR3B Lupus erythematosus
FGB Fibrinogen
FGB Myocardial infarction
FGB Coronary heart disease
FLT3 Leukemia, myeloid
FLT3 Leukemia
FMR1 Fragile X syndrome
FRAXA Fragile X syndrome
FUT2 Helicobacter pylori infection
FVL Factor V Leiden
G6PD G6PD deficiency
G6PD Hyperbilirubinemia
GABRA5 Bipolar disorder
GBA Gaucher disease
GBA Parkinson's disease
GCGR(FAAH,ML4R,UCP2) Body weight/obesity
GCK Type 2 diabetes mellitus
GCLM(F12,TLR4) Atherosclerosis, myocardial infarction
GDNF Schizophrenia
GHRL Obesity
GJB1 Charcot-Marie-Tooth disease
GJB2 Deafness
GJB2 Hearing loss, sensorineural nonsyndromic
GJB2 Hearing loss, sensorineural
GJB2 Hearing loss/deafness
GJB6 Hearing loss, sensorineural nonsyndromic
GJB6 Hearing loss/deafness
GNAS Hypertension
GNB3 Hypertension
GPX1 Lung cancer
GRIN1 Schizophrenia
GRIN2B Schizophrenia
GSK3B Bipolar disorder
GSTM1 Lung cancer
GSTM1 Colorectal cancer
GSTM1 Breast cancer
GSTM1 Prostate cancer
GSTM1 Cytogenetic studies
GSTM1 Bladder cancer
GSTM1 Esophageal cancer
GSTM1 Head and neck cancer
GSTM1 Leukemia
GSTM1 Parkinson's disease
GSTM1 Stomach cancer
GSTP1 Lung cancer
GSTP1 Colorectal cancer
GSTP1 Breast cancer
GSTP1 Cytogenetic studies
GSTP1 Prostate cancer
GSTT1 Lung cancer
GSTT1 Colorectal cancer
GSTT1 Breast cancer
GSTT1 Prostate cancer
GSTT1 Bladder cancer
GSTT1 Cytogenetic studies
GSTT1 Asthma
GSTT1 Benzene toxicity
GSTT1 Esophageal cancer
GSTT1 Head and neck cancer
GYS1 Type 2 diabetes mellitus
HBB Thalassemia
HBB Thalassemia, beta-
HD Huntington's chorea
HFE Hemochromatosis
HFE Iron level
HFE Colorectal cancer
HK2 Type 2 diabetes mellitus
HLA Rheumatoid arthritis
HLA Type 1 diabetes mellitus
HLA Behcet's disease
HLA Celiac disease
HLA Psoriasis vulgaris
HLA Graves' disease
HLA Multiple sclerosis
HLA Schizophrenia
HLA Asthma
HLA Diabetes mellitus
HLA Lupus
HLA-A Leukemia
HLA-A HIV
HLA-A Diabetes mellitus, type 1
HLA-A Graft versus host disease
HLA-A Multiple sclerosis
HLA-B Leukemia
HLA-B Behcet's disease
HLA-B Celiac disease
HLA-B Diabetes mellitus, type 1
HLA-B Graft versus host disease
HLA-B Sarcoidosis
HLA-C Psoriasis vulgaris
HLA-DPA1 Measles
HLA-DPB1 Diabetes mellitus, type 1
HLA-DPB1 Asthma
HLA-DQA1 Diabetes mellitus, type 1
HLA-DQA1 Celiac disease
HLA-DQA1 Cervical cancer
HLA-DQA1 Asthma
HLA-DQA1 Multiple sclerosis
HLA-DQA1 Diabetes, type 2; diabetes mellitus, type 1
HLA-DQA1 Lupus erythematosus
HLA-DQA1 Pregnancy loss, recurrent
HLA-DQA1 Psoriasis vulgaris
HLA-DQB1 Diabetes mellitus, type 1
HLA-DQB1 Celiac disease
HLA-DQB1 Multiple sclerosis
HLA-DQB1 Cervical cancer
HLA-DQB1 Lupus erythematosus
HLA-DQB1 Pregnancy loss, recurrent
HLA-DQB1 Arthritis
HLA-DQB1 Asthma
HLA-DQB1 HIV
HLA-DQB1 Lymphoma
HLA-DQB1 Tuberculosis
HLA-DQB1 Rheumatoid arthritis
HLA-DQB1 Diabetes mellitus, type 2
HLA-DQB1 Graft versus host disease
HLA-DQB1 Narcolepsy
HLA-DQB1 Arthritis, rheumatoid
HLA-DQB1 Cholangitis, sclerosing
HLA-DQB1 Diabetes, type 2; diabetes mellitus, type 1
HLA-DQB1 Graves' disease
HLA-DQB1 Hepatitis C
HLA-DQB1 Hepatitis C, chronic
HLA-DQB1 Malaria
HLA-DQB1 Malaria, Plasmodium falciparum
HLA-DQB1 Melanoma
HLA-DQB1 Psoriasis vulgaris
HLA-DQB1 Sjogren's syndrome
HLA-DQB1 Systemic Lupus Erythematosus (SLE)
HLA-DRB1 Diabetes mellitus, type 1
HLA-DRB1 Multiple sclerosis
HLA-DRB1 Systemic Lupus Erythematosus (SLE)
HLA-DRB1 Rheumatoid arthritis
HLA-DRB1 Cervical cancer
HLA-DRB1 Arthritis
HLA-DRB1 Celiac disease
HLA-DRB1 Lupus erythematosus
HLA-DRB1 Sarcoidosis
HLA-DRB1 HIV
HLA-DRB1 Tuberculosis
HLA-DRB1 Graves' disease
HLA-DRB1 Lymphoma
HLA-DRB1 Psoriasis vulgaris
HLA-DRB1 Asthma
HLA-DRB1 Crohn's disease
HLA-DRB1 Graft versus host disease
HLA-DRB1 Hepatitis C, chronic
HLA-DRB1 Narcolepsy
HLA-DRB1 Sclerosis, systemic
HLA-DRB1 Sjogren's syndrome
HLA-DRB1 Type 1 diabetes mellitus
HLA-DRB1 Arthritis, rheumatoid
HLA-DRB1 Cholangitis, sclerosing
HLA-DRB1 Diabetes, type 2; diabetes mellitus, type 1
HLA-DRB1 Helicobacter pylori infection
HLA-DRB1 Hepatitis C
HLA-DRB1 Juvenile arthritis
HLA-DRB1 Leukemia
HLA-DRB1 Malaria
HLA-DRB1 Melanoma
HLA-DRB1 Pregnancy loss, recurrent
HLA-DRB3 Psoriasis vulgaris
HLA-G Pregnancy loss, recurrent
HMOX1 Atherosclerosis, coronary
HNF4A Diabetes mellitus, type 2
HNF4A Type 2 diabetes mellitus
HSD11B2 Hypertension
HSD17B1 Breast cancer
HTR1A Depression, major
HTR1B Alcohol dependence
HTR1B Alcoholism
HTR2A Memory
HTR2A Schizophrenia
HTR2A Bipolar disorder
HTR2A Depression
HTR2A Depression, major
HTR2A Suicide
HTR2A Alzheimer's disease
HTR2A Anorexia nervosa
HTR2A Hypertension
HTR2A Obsessive compulsive disorder
HTR2C Schizophrenia
HTR6 Alzheimer's disease
HTR6 Schizophrenia
HTRA1 Wet age-related macular degeneration
IAPP Type 2 diabetes mellitus
IDE Alzheimer's disease
IFNG Tuberculosis
IFNG Type 1 diabetes mellitus
IFNG Graft versus host disease
IFNG Hepatitis B
IFNG Multiple sclerosis
IFNG Asthma
IFNG Breast cancer
IFNG Kidney transplantation
IFNG Complications of renal transplantation
IFNG Longevity
IFNG Pregnancy loss, recurrent
IGFBP3 Breast cancer
IGFBP3 Prostate cancer
IL10 Systemic Lupus Erythematosus (SLE)
IL10 Asthma
IL10 Graft versus host disease
IL10 HIV
IL10 Kidney transplantation
IL10 Complications of renal transplantation
IL10 Hepatitis B
IL10 Juvenile arthritis
IL10 Longevity
IL10 Multiple sclerosis
IL10 Pregnancy loss, recurrent
IL10 Rheumatoid arthritis
IL10 Tuberculosis
IL12B Type 1 diabetes mellitus
IL12B Asthma
IL13 Asthma
IL13 Atopy
IL13 Chronic obstructive pulmonary disease/COPD
IL13 Graves' disease
IL1A Periodontitis
IL1A Alzheimer's disease
IL1B Periodontitis
IL1B Alzheimer's disease
IL1B Stomach cancer
IL1R1 Type 1 diabetes mellitus
IL1RN Stomach cancer
IL2 Asthma; eczema; allergic diseases
IL4 Asthma
IL4 Atopy
IL4 HIV
IL4R Asthma
IL4R Atopy
IL4R Total serum IgE
IL6 Bone mineralization
IL6 Kidney transplantation
IL6 Complications of renal transplantation
IL6 Longevity
IL6 Multiple sclerosis
IL6 Bone mineral density
IL6 Bone mineral density
IL6 Colorectal cancer
IL6 Juvenile arthritis
IL6 Rheumatoid arthritis
IL9 Asthma
INHA Premature ovarian failure
INS Type 1 diabetes mellitus
INS Type 2 diabetes mellitus
INS Diabetes mellitus, type 1
INS Obesity
INS Prostate cancer
INSIG2 Obesity
INSR Type 2 diabetes mellitus
INSR Hypertension
INSR Polycystic ovarian syndrome
IPF1 Diabetes mellitus, type 2
IRS1 Type 2 diabetes mellitus
IRS1 Diabetes mellitus, type 2
IRS2 Diabetes mellitus, type 2
ITGB3 Myocardial infarction
ITGB3 Atherosclerosis, coronary
ITGB3 Coronary heart disease
ITGB3 Myocardial infarction
KCNE1 EKG, abnormal
KCNE2 EKG, abnormal
KCNH2 EKG, abnormal
KCNH2 QT interval prolongation syndrome
KCNJ11 Diabetes mellitus, type 2
KCNJ11 Type 2 diabetes mellitus
KCNN3 Schizophrenia
KCNQ1 EKG, abnormal
KCNQ1 QT interval prolongation syndrome
KIBRA Episodic memory
KLK1 Hypertension
KLK3 Prostate cancer
KRAS Colorectal cancer
LDLR Hypercholesterolemia
LDLR Hypertension
LEP Obesity
LEPR Obesity
LIG4 Breast cancer
LIPC Atherosclerosis, coronary
LPL Coronary artery disease
LPL Hyperlipidemia
LPL Triglycerides
LRP1 Alzheimer's disease
LRP5 Bone mineral density
LRRK2 Parkinson's disease
LRRK2 Parkinson's disease
LTA Type 1 diabetes mellitus
LTA Asthma
LTA Systemic Lupus Erythematosus (SLE)
LTA Septicemia
LTC4S Asthma
MAOA Alcoholism
MAOA Schizophrenia
MAOA Bipolar disorder
MAOA Smoking behaviour
MAOA Personality disorder
MAOB Parkinson's disease
MAOB Smoking behaviour
MAPT Parkinson's disease
MAPT Alzheimer's disease
MAPT Dementia
MAPT Frontotemporal dementia
MAPT Progressive supranuclear palsy
MC1R Melanoma
MC3R Obesity
MC4R Obesity
MECP2 Rett syndrome
MEFV Familial Mediterranean fever
MEFV Amyloidosis
MICA Type 1 diabetes mellitus
MICA Behcet's disease
MICA Celiac disease
MICA Rheumatoid arthritis
MICA Systemic Lupus Erythematosus (SLE)
MLH1 Colorectal cancer
MME Alzheimer's disease
MMP1 Lung cancer
MMP1 Ovarian cancer
MMP1 Periodontitis
MMP3 Myocardial infarction
MMP3 Ovarian cancer
MMP3 Rheumatoid arthritis
MPO Lung cancer
MPO Alzheimer's disease
MPO Breast cancer
MPZ Charcot-Marie-Tooth disease
MS4A2 Asthma
MS4A2 Atopy
MSH2 Colorectal cancer
MSH6 Colorectal cancer
MSR1 Prostate cancer
MTHFR Colorectal cancer
MTHFR Type 2 diabetes mellitus
MTHFR Neural tube defect
MTHFR Homocysteine
MTHFR Thromboembolism, venous
MTHFR Atherosclerosis, coronary
MTHFR Alzheimer's disease
MTHFR Esophageal cancer
MTHFR Pre-eclampsia
MTHFR Pregnancy loss, recurrent
MTHFR Stroke
MTHFR Thrombosis, deep vein
MT-ND1 Diabetes mellitus, type 2
MTR Colorectal cancer
MT-RNR1 Hearing loss, sensorineural nonsyndromic
MTRR Neural tube defect
MTRR Homocysteine
MT-TL1 Diabetes mellitus, type 2
MUTYH Colorectal cancer
MYBPC3 Cardiomyopathy
MYH7 Cardiomyopathy
MYOC Glaucoma, primary open angle
MYOC Glaucoma
NAT1 Colorectal cancer
NAT1 Breast cancer
NAT1 Bladder cancer
NAT2 Colorectal cancer
NAT2 Bladder cancer
NAT2 Breast cancer
NAT2 Lung cancer
NBN Breast cancer
NCOA3 Breast cancer
NCSTN Alzheimer's disease
NEUROD1 Type 1 diabetes mellitus
NF1 Neurofibromatosis 1
NOS1 Asthma
NOS2A Multiple sclerosis
NOS3 Hypertension
NOS3 Coronary heart disease
NOS3 Atherosclerosis, coronary
NOS3 Coronary artery disease
NOS3 Myocardial infarction
NOS3 Acute coronary syndrome
NOS3 Blood pressure, arterial
NOS3 Pre-eclampsia
NOS3 Nitric oxide
NOS3 Alzheimer's disease
NOS3 Asthma
NOS3 Type 2 diabetes mellitus
NOS3 Cardiovascular diseases
NOS3 Behcet's disease
NOS3 Erectile dysfunction
NOS3 Renal failure, chronic
NOS3 Lead toxicity
NOS3 Left ventricular hypertrophy
NOS3 Pregnancy loss, recurrent
NOS3 Retinopathy, diabetic
NOS3 Stroke
NOTCH4 Schizophrenia
NPY Alcohol abuse
NQO1 Lung cancer
NQO1 Colorectal cancer
NQO1 Benzene toxicity
NQO1 Bladder cancer
NQO1 Parkinson's disease
NR3C2 Hypertension
NR4A2 Parkinson's disease
NRG1 Schizophrenia
NTF3 Schizophrenia
OGG1 Lung cancer
OGG1 Colorectal cancer
OLR1 Alzheimer's disease
OPA1 Glaucoma
OPRM1 Alcohol abuse
OPRM1 Drug dependence
OPTN Glaucoma, primary open angle
P450 Drug metabolism
PADI4 Rheumatoid arthritis
PAH Phenylketonuria/PKU
PAI1 Coronary heart disease
PAI1 Asthma
PALB2 Breast cancer
PARK2 Parkinson's disease
PARK7 Parkinson's disease
PDCD1 Lupus erythematosus
PINK1 Parkinson's disease
PKA Memory
PKC Memory
PLA2G4A Schizophrenia
PNOC Schizophrenia
POMC Obesity
PON1 Atherosclerosis, coronary
PON1 Parkinson's disease
PON1 Type 2 diabetes mellitus
PON1 Atherosclerosis of arteries
PON1 Coronary artery disease
PON1 Coronary heart disease
PON1 Alzheimer's disease
PON1 Longevity
PON2 Atherosclerosis, coronary
PON2 Premature delivery
PPARG Type 2 diabetes mellitus
PPARG Obesity
PPARG Diabetes mellitus, type 2
PPARG Colorectal cancer
PPARG Hypertension
PPARGC1A Diabetes mellitus, type 2
PRKCZ Type 2 diabetes mellitus
PRL Systemic Lupus Erythematosus (SLE)
PRNP Alzheimer's disease
PRNP Creutzfeldt-Jakob disease
PRNP Jakob-Creutzfeldt disease
PRODH Schizophrenia
PRSS1 Pancreatitis
PSEN1 Alzheimer's disease
PSEN2 Alzheimer's disease
PSMB8 Type 1 diabetes mellitus
PSMB9 Type 1 diabetes mellitus
PTCH Skin cancer, non-melanoma
PTGIS Hypertension
PTGS2 Colorectal cancer
PTH Bone mineral density
PTPN11 Noonan syndrome
PTPN22 Rheumatoid arthritis
PTPRC Multiple sclerosis
PVT1 End stage renal disease
RAD51 Breast cancer
RAGE Retinopathy, diabetic
RB1 Retinoblastoma
RELN Schizophrenia
REN Hypertension
RET Thyroid cancer
RET Hirschsprung's disease
RFC1 Neural tube defect
RGS4 Schizophrenia
RHO Retinitis pigmentosa
RNASEL Prostate cancer
RYR1 Malignant hyperthermia
SAA1 Amyloidosis
SCG2 Hypertension
SCG3 Obesity
SCGB1A1 Asthma
SCN5A Brugada syndrome
SCN5A EKG, abnormal
SCN5A QT interval prolongation syndrome
SCNN1B Hypertension
SCNN1G Hypertension
SERPINA1 COPD
SERPINA3 Alzheimer's disease
SERPINA3 COPD
SERPINA3 Parkinson's disease
SERPINE1 Myocardial infarction
SERPINE1 Type 2 diabetes mellitus
SERPINE1 Atherosclerosis, coronary
SERPINE1 Obesity
SERPINE1 Pre-eclampsia
SERPINE1 Stroke
SERPINE1 Hypertension
SERPINE1 Pregnancy loss, recurrent
SERPINE1 Thromboembolism, venous
SLC11A1 Tuberculosis
SLC22A4 Crohn's disease; ulcerative colitis
SLC22A5 Crohn's disease; ulcerative colitis
SLC2A1 Type 2 diabetes mellitus
SLC2A2 Type 2 diabetes mellitus
SLC2A4 Type 2 diabetes mellitus
SLC3A1 Cystinuria
SLC6A3 Attention deficit hyperactivity disorder
SLC6A3 Parkinson's disease
SLC6A3 Smoking behaviour
SLC6A3 Alcoholism
SLC6A3 Schizophrenia
SLC6A4 Depression
SLC6A4 Depression, major
SLC6A4 Schizophrenia
SLC6A4 Suicide
SLC6A4 Alcoholism
SLC6A4 Bipolar disorder
SLC6A4 Personality traits
SLC6A4 Attention deficit hyperactivity disorder
SLC6A4 Alzheimer's disease
SLC6A4 Personality disorder
SLC6A4 Panic disorder
SLC6A4 Alcohol abuse
SLC6A4 Affective disorders
SLC6A4 Anxiety disorders
SLC6A4 Smoking behaviour
SLC6A4 Depression, major; bipolar disorder
SLC6A4 Heroin abuse
SLC6A4 Irritable bowel syndrome
SLC6A4 Migraine headache
SLC6A4 Obsessive compulsive disorder
SLC6A4 Suicidal behavior
SLC7A9 Cystinuria
SNAP25 ADHD
SNCA Parkinson's disease
SOD1 ALS/amyotrophic lateral sclerosis
SOD2 Breast cancer
SOD2 Lung cancer
SOD2 Prostate cancer
SPINK1 Pancreatitis
SPP1 Multiple sclerosis
SRD5A2 Prostate cancer
STAT6 Asthma
STAT6 Total IgE
SULT1A1 Breast cancer
SULT1A1 Colorectal cancer
TAP1 Type 1 diabetes mellitus
TAP1 Lupus erythematosus
TAP2 Type 1 diabetes mellitus
TAP2 Diabetes mellitus, type 1
TBX21 Asthma
TBXA2R Asthma
TCF1 Diabetes mellitus, type 2
TCF1 Type 2 diabetes mellitus
TF Alzheimer's disease
TGFB1 Breast cancer
TGFB1 Kidney transplantation
TGFB1 Complications of renal transplantation
TH Schizophrenia
THBD Myocardial infarction
TLR4 Asthma
TLR4 Crohn's disease; ulcerative colitis
TLR4 Septicemia
TNF Asthma
TNFA Cerebrovascular disease
TNF Type 1 diabetes mellitus
TNF Rheumatoid arthritis
TNF Systemic Lupus Erythematosus (SLE)
TNF Kidney transplantation
TNF Psoriasis vulgaris
TNF Septicemia
TNF Type 2 diabetes mellitus
TNF Alzheimer's disease
TNF Crohn's disease
TNF Diabetes mellitus, type 1
TNF Hepatitis B
TNF Complications of renal transplantation
TNF Multiple sclerosis
TNF Schizophrenia
TNF Celiac disease
TNF Obesity
TNF Pregnancy loss, recurrent
TNFRSF11B Bone mineral density
TNFRSF1A Rheumatoid arthritis
TNFRSF1B Rheumatoid arthritis
TNFRSF1B Systemic Lupus Erythematosus (SLE)
TNFRSF1B Arthritis
TNNT2 Cardiomyopathy
TP53 Lung cancer
TP53 Breast cancer
TP53 Colorectal cancer
TP53 Prostate cancer
TP53 Cervical cancer
TP53 Ovarian cancer
TP53 Smoking
TP53 Esophageal cancer
TP73 Lung cancer
TPH1 Suicide
TPH1 Depression, major
TPH1 Suicidal behavior
TPH1 Schizophrenia
TPMT Thiopurine methyltransferase activity
TPMT Leukemia
TPMT Inflammatory bowel disease
TPMT Thiopurine S-methyltransferase phenotype
TSC1 Tuberous sclerosis
TSC2 Tuberous sclerosis
TSHR Graves' disease
TYMS Colorectal cancer
TYMS Stomach cancer
TYMS Esophageal cancer
UCHL1 Parkinson's disease
UCP1 Obesity
UCP2 Obesity
UCP3 Obesity
UGT1A1 Hyperbilirubinemia
UGT1A1 Gilbert syndrome
UGT1A6 Colorectal cancer
UGT1A7 Colorectal cancer
UTS2 Diabetes mellitus, type 2
VDR Bone mineral density
VDR Prostate cancer
VDR Bone mineral density
VDR Type 1 diabetes mellitus
VDR Osteoporosis
VDR Bone mass
VDR Breast cancer
VDR Lead toxicity
VDR Tuberculosis
VDR Type 2 diabetes mellitus
VEGF Breast cancer
Vit D rec Idiopathic short stature
VKORC1 Warfarin therapy, response thereto
WNK4 Hypertension
XPA Lung cancer
XPC Lung cancer
XPC Cytogenetic studies
XRCC1 Lung cancer
XRCC1 Cytogenetic studies
XRCC1 Breast cancer
XRCC1 Bladder cancer
XRCC2 Breast cancer
XRCC3 Breast cancer
XRCC3 Cytogenetic studies
XRCC3 Lung cancer
XRCC3 Bladder cancer
ZDHHC8 Schizophrenia
Genetic Composite Index (GCI)
The etiology of many conditions or diseases is attributed to both genetic and environmental factors. Recent advances in genotyping technology have provided opportunities to identify new associations between diseases and genetic markers across the whole genome. Indeed, many recent studies have found such associations, in which a particular allele or genotype is correlated with an increased risk of disease. Some of these studies collect a set of cases and a set of controls and compare the allelic distribution of genetic markers between the two populations. In some of these studies, the association between a particular genetic marker and a disease was determined in isolation from other genetic markers, which were treated as background and played no role in the statistical analysis.
Genetic markers and variants may include SNPs, nucleotide repeats, nucleotide insertions, nucleotide deletions, chromosomal translocations, chromosomal duplications, or copy number variations. Copy number variations may include microsatellite repeats, nucleotide repeats, centromere repeats or telomere repeats.
In one aspect of the invention, information about the association of multiple genetic markers with one or more diseases or conditions is combined and analyzed to derive a GCI score. GCI scoring can be used to provide people without genetic training with reliable (i.e., robust), understandable, and/or intuitive knowledge of their individual risk of disease compared to a relevant population based on current scientific research. In one embodiment, the method of generating a reliable GCI score for the combined effect of different loci is based on the reported individual risk for each locus studied. For example, a disease or condition of interest is identified and then sources of information (including, but not limited to, databases, patent publications, and scientific literature) are queried for information regarding the association of the disease or condition with one or more genetic loci. These sources of information are validated and evaluated using quality criteria. In some embodiments, the evaluation process includes multiple steps. In other embodiments, the information sources are evaluated against a plurality of quality criteria. Information derived from information resources is used to identify odds ratios or relative risks of one or more genetic loci for each disease or condition of interest.
In alternative embodiments, the odds ratio (OR) or relative risk (RR) for at least one genetic locus is not available from the available sources of information. The RR is then calculated using (1) the reported ORs of multiple alleles of the same locus, (2) allele frequencies from datasets (e.g., the HapMap dataset), and/or (3) disease/condition prevalence from available resources (e.g., the CDC, the National Center for Health Statistics, etc.) to derive the RRs for all alleles of interest. In one embodiment, the ORs of multiple alleles of the same locus are assessed separately or independently. In a preferred embodiment, the ORs of multiple alleles of the same locus are combined to account for the dependencies between the ORs of different alleles. In some embodiments, established disease models (including, but not limited to, models such as multiplicative, additive, Harvard-modified, and dominant effect) are used to generate intermediate scores representing individual risk according to the selected model.
In another embodiment, a method is used that analyzes multiple models of a disease or condition of interest and correlates the results obtained from these different models; this minimizes the errors that may be introduced by the choice of a particular disease model. This approach minimizes the impact of reasonable errors in the prevalence, allele frequency, and OR estimates derived from the information sources on the calculation of relative risk. Because the effect of the prevalence estimate on the RR is "linear" or monotonic, an incorrect prevalence estimate has little or no effect on the final score, provided that the same model is applied consistently to all individuals for whom a report is generated.
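As a rough illustration of how relative risks might be derived from reported odds ratios, an allele frequency, and a prevalence estimate, the following Python sketch solves for the disease risk of the non-risk homozygote by bisection and then converts each OR to an RR. The function names, the use of Hardy-Weinberg genotype frequencies, and the bisection solver are assumptions made for this example; they are not taken from the description above and other derivations are possible.

```python
def genotype_frequencies(risk_allele_freq):
    """Frequencies of 0, 1, 2 risk alleles, assuming Hardy-Weinberg equilibrium."""
    q = risk_allele_freq
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]


def genotype_risk(odds_ratio, baseline_risk):
    """Disease probability of a genotype whose odds are odds_ratio times the baseline odds."""
    return odds_ratio * baseline_risk / (1 - baseline_risk + odds_ratio * baseline_risk)


def relative_risks(or_het, or_hom, risk_allele_freq, prevalence, tol=1e-12):
    """Return (baseline_risk, [RR_0, RR_1, RR_2]) consistent with the given prevalence."""
    freqs = genotype_frequencies(risk_allele_freq)
    odds_ratios = [1.0, or_het, or_hom]
    low, high = 0.0, 1.0
    # The implied prevalence increases with the baseline risk, so bisection applies.
    while high - low > tol:
        mid = (low + high) / 2
        implied = sum(f * genotype_risk(o, mid) for f, o in zip(freqs, odds_ratios))
        if implied < prevalence:
            low = mid
        else:
            high = mid
    baseline = (low + high) / 2
    # RR = P(D | genotype) / P(D | non-risk homozygote)
    rrs = [o / (1 - baseline + o * baseline) for o in odds_ratios]
    return baseline, rrs
```

For a rare condition the baseline risk is small and each RR stays close to its OR; as prevalence grows, the RRs shrink toward 1 relative to the ORs.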
In another embodiment, a method is used that treats environmental/behavioral/demographic data as additional "loci". In related embodiments, such data may be obtained from information sources such as medical or scientific literature or databases (e.g., the association of smoking with lung cancer, or insurance health risk assessments). In one embodiment, a GCI score is generated for one or more complex diseases. Complex diseases can be influenced by multiple genes, environmental factors, and their interactions. When studying complex diseases, a large number of possible interactions need to be analyzed. In one embodiment, a procedure such as the Bonferroni correction is used to correct for multiple comparisons. In an alternative embodiment, when the tests are independent or show a particular type of dependency, the overall significance level (also known as the "family-wise error rate") is controlled using the Simes test (Sarkar S. (1998) Some probability inequalities for ordered MTP2 random variables: a proof of the Simes conjecture. Ann Stat 26:494-504). The Simes test rejects the global null hypothesis that all K test-specific null hypotheses are true if p(k) ≤ αk/K for any k in 1, ..., K, where p(1) ≤ ... ≤ p(K) are the ordered p-values (Simes RJ (1986) An improved Bonferroni procedure for multiple tests of significance. Biometrika 73:751-754).
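The two corrections named above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the default significance level of 0.05 are assumptions for the example, and the Simes function tests only the global null hypothesis, as in the description.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Per-test decisions: reject test j when p_j <= alpha / K."""
    k_total = len(p_values)
    return [p <= alpha / k_total for p in p_values]


def simes_global_reject(p_values, alpha=0.05):
    """Reject the global null if p_(k) <= alpha * k / K for any ordered p-value."""
    k_total = len(p_values)
    ordered = sorted(p_values)
    return any(p <= alpha * k / k_total for k, p in enumerate(ordered, start=1))
```

The Simes criterion is never more conservative than Bonferroni for the global null: whenever the smallest p-value passes the Bonferroni threshold, the Simes condition holds at k = 1.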
Other embodiments that may be used in the context of multi-gene and multi-environmental-factor analysis control the false discovery rate, i.e., the expected proportion of rejected null hypotheses that are falsely rejected. This approach is particularly useful when, as in microarray studies, a portion of the null hypotheses can be assumed to be false. Devlin et al. (2003, Analysis of multilocus models of association. Genet Epidemiol 25:36-47) proposed a variant of the Benjamini and Hochberg (1995, Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 57:289-300) step-up procedure that controls the false discovery rate when testing a large number of possible gene-gene interactions in a multilocus association study. The Benjamini and Hochberg procedure is related to the Simes test; setting k* = max k such that p(k) ≤ αk/K, it rejects the k* null hypotheses corresponding to p(1), ..., p(k*). In fact, when all null hypotheses are true, the Benjamini and Hochberg procedure reduces to the Simes test (Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Ann Stat 29:1165-1188).
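The Benjamini and Hochberg step-up rule can be written compactly as follows. This Python sketch is illustrative; the function name and default level are assumptions, and it implements the basic procedure rather than the variant attributed to Devlin et al.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans, True where the corresponding null hypothesis is rejected."""
    k_total = len(p_values)
    # Indices of the p-values in increasing order.
    order = sorted(range(k_total), key=lambda j: p_values[j])
    k_star = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= alpha * rank / k_total:
            k_star = rank  # keep the largest rank that satisfies the bound
    rejected = [False] * k_total
    for j in order[:k_star]:
        rejected[j] = True
    return rejected
```

Because the rule uses the largest qualifying rank, a p-value that misses its own threshold can still be rejected when a larger p-value meets the threshold for its rank.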
In some embodiments, the individual is ranked against a population of individuals based on the individual's intermediate score to generate a final score, which may be expressed as a rank in the population, such as the 99th percentile, or the 99th, 98th, 97th, 96th, 95th, 94th, 93rd, 92nd, 91st, 90th, 89th, 88th, 87th, 86th, 85th, 84th, 83rd, 82nd, 81st, 80th, 79th, 78th, 77th, 76th, 75th, 74th, 73rd, 72nd, 71st, 70th, 69th, 65th, 60th, 55th, 50th, 45th, 40th, 35th, 30th, 25th, 20th, 15th, 10th, 5th, or 0th. In another embodiment, the score may be displayed as a range, such as the 100th to 95th percentile, the 95th to 85th percentile, the 85th to 60th percentile, or any subrange between the 100th and 0th percentile. In yet another embodiment, the individual is ranked in quartiles, such as the top 75th quartile or the lowest 25th quartile. In further embodiments, the individual is ranked in comparison to the mean or median score of the population.
In one embodiment, the population to which the individual is compared includes a large number of people from different geographic and ethnic backgrounds, such as a global population. In other embodiments, the population to which the individual is compared is limited to a particular geography, family, race, gender, age (fetal, neonatal, child, juvenile, adolescent, adult, elderly individual), or disease state (e.g., symptomatic, asymptomatic, carrier, early onset, late onset). In some embodiments, the population to which the individual is compared is derived from information reported in public and/or private information sources.
In one embodiment, the GCI score or GCIPlus score of the individual is visualized using a display. In some embodiments, a screen (e.g., a computer monitor or television screen) is used for the visual display, such as a personal portal with relevant information. In another embodiment, the display is a static display, such as a printed page. In one embodiment, the display may include, but is not limited to, one or more of the following: bins (e.g., 1-5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35, 36-40, 41-45, 46-50, 51-55, 56-60, 61-65, 66-70, 71-75, 76-80, 81-85, 86-90, 91-95, 96-100), a color or grayscale gradient, a thermometer, a scale, a pie chart, a histogram, or a bar graph. For example, FIGS. 18 and 19 are different displays for MS, and FIG. 20 is for Crohn's disease. In another embodiment, a thermometer is used to display the GCI score and the disease/condition prevalence. In another embodiment, the thermometer displays a level that changes with the reported GCI score, e.g., FIGS. 15-17, with a color corresponding to the risk. The thermometer may display a colorimetric change as the GCI score increases (e.g., gradually changing from blue at a lower GCI score to red at a higher GCI score). In a related embodiment, the thermometer displays both a level that changes with the reported GCI score and a colorimetric change as the risk level increases.
In an alternative embodiment, the individual's GCI score is communicated to the individual using auditory feedback. In one embodiment, the auditory feedback is a verbal statement that the risk level is high or low. In another embodiment, the auditory feedback is a recitation of a specific GCI score, such as a number, percentile, range, quartile, or comparison to the population mean or median GCI score. In one embodiment, a live person delivers the auditory feedback in person or through a communication device, such as a telephone (landline, cellular, or satellite), or through a personal portal. In another embodiment, the auditory feedback is delivered by an automated system (e.g., a computer). In one embodiment, the auditory feedback is delivered as part of an interactive voice response (IVR) system, a technology that allows a computer to detect voice and touch tones during a normal telephone call. In another embodiment, the individual may interact with a central server through the IVR system. The IVR system can respond with pre-recorded or dynamically generated audio to interact with individuals and provide them with auditory feedback on their risk level. In one embodiment, the individual may call a number that is answered by the IVR system. After the individual optionally enters an identification code or security code, or passes a voice recognition procedure, the IVR system prompts the individual to select an option from a menu, such as a touch-tone or voice menu. One of these options may provide the individual with his or her risk level.
In another embodiment, the GCI score of an individual is visualized using a display and communicated using auditory feedback, for example through a personal portal. This combination may include a visual display of the GCI score and auditory feedback that discusses the relevance of the GCI score to the overall health of the individual and the preventive measures that may be recommended.
In one embodiment, the GCI score is generated using a multi-step method. Initially, for each condition to be studied, the relative risk is calculated from the odds ratio of each genetic marker. For each prevalence value p = 0.01, 0.02, ..., 0.5, the GCI scores of the HapMap CEU population are calculated based on the prevalence and the HapMap allele frequencies. If the GCI scores do not vary under the varying prevalence, then the only assumption taken into account is that of a multiplicative model. Otherwise, it is determined that the model is sensitive to prevalence. For any combination of no-call values, the distribution of the relative risks and scores in the HapMap population is obtained. For each new individual, the individual's score is compared to the HapMap distribution, and the resulting score is the individual's rank in this population. The resolution of the reported score may be low because of the assumptions made in the process. The population is divided into quantiles (3-6 bins), and the reported bin is the one into which the individual's rank falls. The number of bins may differ between diseases based on considerations such as the resolution of the score for each disease. In the case of ties between the scores of different HapMap individuals, the average rank is used.
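The final ranking and binning steps described above might look like the following Python sketch. The function names, the mid-rank convention used for ties, and the equal-width percentile bins are assumptions chosen for illustration; the reference scores would come from a population such as HapMap CEU, which is not reproduced here.

```python
def percentile_rank(score, reference_scores):
    """Percentile of score in the reference population; ties count at their average rank."""
    below = sum(1 for s in reference_scores if s < score)
    tied = sum(1 for s in reference_scores if s == score)
    return 100.0 * (below + 0.5 * tied) / len(reference_scores)


def quantile_bin(percentile, n_bins):
    """Map a percentile in [0, 100] to a 1-based bin among n_bins equal-width bins."""
    return min(n_bins, int(percentile * n_bins // 100) + 1)
```

With 3 to 6 bins, small errors in the underlying score rarely move an individual across a bin boundary, which matches the low reporting resolution the description calls for.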
In one embodiment, a higher GCI score is interpreted as indicating an increased risk of acquiring or being diagnosed with a condition or disease. In another embodiment, a mathematical model is used to derive the GCI score. In some embodiments, the GCI score is based on a mathematical model that accounts for the incomplete nature of the underlying information about a population and/or a disease or condition. In some embodiments, the mathematical model includes at least one explicit assumption as part of the basis for calculating the GCI score, wherein the assumption includes, but is not limited to: an assumption that the odds ratios are given; an assumption that the prevalence of the condition is known; an assumption that the genotype frequencies in the population are known; an assumption that the customers are from the same ancestral background as the populations used in the studies and as HapMap; and an assumption that the combined risk is the product of the different risk factors of the individual genetic markers. In some embodiments, the GCI may also include the assumption that the multi-genotype frequency of a genotype is the product of the allele frequencies of each SNP or individual genetic marker (e.g., the different SNPs or genetic markers are independent across the population).
Multiplicative model
In one embodiment, the GCI score is calculated under the assumption that the risk attributed to the set of genetic markers is the product of the risks attributed to the individual genetic markers. This means that the different genetic markers contribute to the risk of the disease independently of the other genetic markers. Formally, there are k genetic markers with risk alleles $r_1, \ldots, r_k$ and non-risk alleles $n_1, \ldots, n_k$. For SNP i, we denote the three possible genotype values by $r_i r_i$, $n_i r_i$, and $n_i n_i$. The genotype information of an individual can be described by a vector $(g_1, \ldots, g_k)$, where $g_i$ may be 0, 1, or 2 according to the number of risk alleles at position i. We denote by $\lambda_1^i$ the relative risk of the heterozygous genotype at position i compared to the homozygous non-risk genotype at the same position. In other words, we define
$$\lambda_1^i = \frac{P(D \mid n_i r_i)}{P(D \mid n_i n_i)}.$$
Similarly, we denote the relative risk of the $r_i r_i$ genotype by
$$\lambda_2^i = \frac{P(D \mid r_i r_i)}{P(D \mid n_i n_i)}.$$
Under the multiplicative model, we assume that the risk of an individual with genotype $(g_1, \ldots, g_k)$ is
$$GCI(g_1, \ldots, g_k) = \prod_{i=1}^{k} \lambda_{g_i}^i, \qquad \lambda_0^i = 1.$$
Multiplicative models have previously been used in the literature to simulate case-control studies or for visualization purposes.
Assessing relative risk
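In another embodiment, the relative risks for the different genetic markers are known and the multiplicative model can be used for risk assessment. However, in some embodiments that involve association studies, the study design prevents the reporting of relative risk. In some case-control studies, the relative risk cannot be calculated directly from the data without further assumptions. Instead of reporting the relative risk, it is customary to report the odds ratio (OR) of the genotype, which is the ratio of the odds of having the disease given a risk genotype ($r_i r_i$ or $n_i r_i$) to the odds of having the disease given the non-risk genotype ($n_i n_i$). Formally,
$$OR_1^i = \frac{P(D \mid n_i r_i)}{1 - P(D \mid n_i r_i)} \cdot \frac{1 - P(D \mid n_i n_i)}{P(D \mid n_i n_i)}, \qquad OR_2^i = \frac{P(D \mid r_i r_i)}{1 - P(D \mid r_i r_i)} \cdot \frac{1 - P(D \mid n_i n_i)}{P(D \mid n_i n_i)}.$$
Finding the relative risk from the odds ratio may require additional assumptions. For example, assume that the frequencies in the entire population of the three genotypes, $a = f_{n_i n_i}$, $b = f_{n_i r_i}$, and $c = f_{r_i r_i}$, are known or estimated (these may be estimated from existing datasets, e.g., the HapMap dataset comprising 120 chromosomes), and/or that the prevalence of the disease, $p = p(D)$, is known. Together with the two odds ratio definitions above, this gives a third equation:
$$p = a \cdot P(D \mid n_i n_i) + b \cdot P(D \mid n_i r_i) + c \cdot P(D \mid r_i r_i)$$
By the definition of relative risk, after dividing by the term $p \cdot P(D \mid n_i n_i)$, the prevalence equation can be rewritten as:
$$\frac{1}{P(D \mid n_i n_i)} = \frac{a + b\lambda_1 + c\lambda_2}{p}$$
and therefore the two odds ratio equations can be rewritten as:
$$OR_1 = \lambda_1 \frac{a + b\lambda_1 + c\lambda_2 - p}{a + b\lambda_1 + c\lambda_2 - p\lambda_1}, \qquad OR_2 = \lambda_2 \frac{a + b\lambda_1 + c\lambda_2 - p}{a + b\lambda_1 + c\lambda_2 - p\lambda_2} \qquad (1)$$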
It should be noted that when a = 1 (a non-risk genotype frequency of 1), equation system 1 is equivalent to the Zhang and Yu formula (Zhang J and Yu KF, What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA, 280: 1690-1, 1998, the entire contents of which are incorporated by reference). In contrast to the Zhang and Yu formula, some embodiments of the invention take into account the allele frequencies in the population, which may affect the relative risk. Still other embodiments take into account the interdependence of the relative risks, in contrast to calculating each relative risk independently.
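Equation system 1 can be rewritten as two quadratic equations with up to four possible solutions. A gradient descent algorithm can be used to solve these equations, where the starting point is set to the odds ratios, for example, $\lambda_1 = OR_1$ and $\lambda_2 = OR_2$.
For example:
$$f_1(\lambda_1, \lambda_2) = OR_1 (a + b\lambda_1 + c\lambda_2 - p\lambda_1) - \lambda_1 (a + b\lambda_1 + c\lambda_2 - p)$$
$$f_2(\lambda_1, \lambda_2) = OR_2 (a + b\lambda_1 + c\lambda_2 - p\lambda_2) - \lambda_2 (a + b\lambda_1 + c\lambda_2 - p)$$
Finding a solution to these equations is equivalent to finding the minimum of the function $g(\lambda_1, \lambda_2) = f_1(\lambda_1, \lambda_2)^2 + f_2(\lambda_1, \lambda_2)^2$.
Therefore,
$$\frac{\partial g}{\partial \lambda_1} = 2 f_1 \left[ OR_1 (b - p) - (a + 2b\lambda_1 + c\lambda_2 - p) \right] + 2 f_2 \, b \, (OR_2 - \lambda_2)$$
$$\frac{\partial g}{\partial \lambda_2} = 2 f_1 \, c \, (OR_1 - \lambda_1) + 2 f_2 \left[ OR_2 (c - p) - (a + b\lambda_1 + 2c\lambda_2 - p) \right]$$
In this example, we start by setting $x_0 = OR_1$, $y_0 = OR_2$. We set the value $\varepsilon = 10^{-10}$ as a tolerance constant throughout the algorithm. In iteration i, a step size is defined, and $(x_{i+1}, y_{i+1})$ is then obtained from $(x_i, y_i)$ by taking a step of that size along the negative gradient of g.
These iterations are repeated until $g(x_i, y_i) <$ tolerance, where the tolerance is set to $10^{-7}$ in the provided code.
In this embodiment, these equations give the correct solutions for different values of a, b, c, p, $OR_1$, and $OR_2$ (FIG. 10).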
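The procedure above can be sketched in Python as follows. This is a sketch under stated assumptions rather than the code referred to in the text: the function name and the backtracking step-size rule are assumptions, and the stopping tolerance on g is tightened to 1e-14 for illustration. The residuals f1 and f2 are equation system 1 with the denominators cleared.

```python
def solve_relative_risks(or1, or2, a, b, c, p, tol=1e-14, max_iter=10000):
    """Estimate (lambda_1, lambda_2) from odds ratios by gradient descent.

    a, b, c: population frequencies of the nn, nr, rr genotypes; p: prevalence.
    Assumption: the step size is chosen by backtracking line search.
    """
    def residuals(l1, l2):
        s = a + b * l1 + c * l2
        f1 = or1 * (s - p * l1) - l1 * (s - p)
        f2 = or2 * (s - p * l2) - l2 * (s - p)
        return f1, f2

    def g(l1, l2):
        f1, f2 = residuals(l1, l2)
        return f1 * f1 + f2 * f2

    def grad(l1, l2):
        f1, f2 = residuals(l1, l2)
        s = a + b * l1 + c * l2
        df1_d1 = or1 * (b - p) - (s - p) - b * l1
        df1_d2 = c * (or1 - l1)
        df2_d1 = b * (or2 - l2)
        df2_d2 = or2 * (c - p) - (s - p) - c * l2
        return (2 * f1 * df1_d1 + 2 * f2 * df2_d1,
                2 * f1 * df1_d2 + 2 * f2 * df2_d2)

    x, y = or1, or2  # start from the odds ratios, as in the text
    for _ in range(max_iter):
        gx = g(x, y)
        if gx < tol:
            break
        dx, dy = grad(x, y)
        norm2 = dx * dx + dy * dy
        step = 1.0
        # Halve the step until g decreases sufficiently (Armijo condition).
        while step > 1e-12 and g(x - step * dx, y - step * dy) > gx - 0.5 * step * norm2:
            step *= 0.5
        x, y = x - step * dx, y - step * dy
    return x, y


if __name__ == "__main__":
    # Illustrative inputs: HWE genotype frequencies for a risk allele frequency of 0.3.
    print(solve_relative_risks(1.5, 2.25, a=0.49, b=0.42, c=0.09, p=0.1))
```

A convenient check on any returned solution is to convert the relative risks back to genotype-specific disease probabilities and confirm that the original odds ratios are recovered.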
Robustness of relative risk assessment
In some embodiments, the effect of different parameters (prevalence, allele frequencies, and odds ratio errors) on the estimates of the relative risks is determined. To determine the effect of the allele frequency and prevalence estimates on the relative risk values, the relative risks are calculated from a set of values of different odds ratios and different allele frequencies (under Hardy-Weinberg equilibrium, HWE), and the results of these calculations are plotted for prevalence values in the range of 0 to 1 (FIG. 10). In addition, for fixed prevalence values, the resulting relative risks can be plotted as a function of the risk allele frequency (FIG. 11). When $p = 0$, $\lambda_1 = OR_1$ and $\lambda_2 = OR_2$, and when $p = 1$, $\lambda_1 = \lambda_2 = 1$. This can be calculated directly from the equations. In addition, in some embodiments, when the risk allele frequency is high, $\lambda_1$ is closer to a linear function and $\lambda_2$ is closer to a concave function with a bounded second derivative. In the limiting case, when $c = 1$, $\lambda_2 = OR_2 + p(1 - OR_2)$ and
$$\lambda_1 = \frac{OR_1 \left( OR_2 + p(1 - OR_2) \right)}{OR_2 (1 - p) + p \, OR_1}.$$
If $OR_1 \approx OR_2$, the latter is also close to a linear function. When the risk allele frequency is low, $\lambda_1$ and $\lambda_2$ approach the behavior of the function 1/p. In the limiting case, when $c = 0$ (so that, under HWE, $a = 1$),
$$\lambda_1 = \frac{OR_1}{1 - p + p \, OR_1}, \qquad \lambda_2 = \frac{OR_2}{1 - p + p \, OR_2}.$$
This indicates that for high risk allele frequencies, an incorrect prevalence estimate will not significantly affect the resulting relative risks. In addition, for low risk allele frequencies, if the correct prevalence p is replaced by a prevalence value $p' = \alpha p$, then the resulting relative risks will be off by a factor of at most $1/\alpha$. This is illustrated in panels (c) and (d) of FIG. 11. It should be noted that for high risk allele frequencies the two plots are quite similar, whereas for low allele frequencies there is a larger deviation in the relative risk values, although the difference is still less than a factor of 2.
Calculating GCI score
In one embodiment, the genetic composite index is calculated using a reference set representing the relevant population. This reference set may be one of the populations in the HapMap or another genotype dataset.
In this embodiment, the GCI is calculated as follows. For each of the k risk loci, the relative risk is calculated from the odds ratio using equation system 1. Then, a multiplicative score is calculated for each individual in the reference set. The GCI of an individual with a multiplicative score of s is the fraction of all individuals in the reference dataset with a score s' ≤ s. For example, if 50% of the individuals in the reference set have a multiplicative score no greater than s, then the individual's final GCI score will be 0.5.
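A minimal sketch of this calculation is shown below, assuming the relative risks have already been obtained for each locus. The function names and the example relative risk values are hypothetical and chosen only for illustration. Each locus is represented by a triple holding the relative risk for 0, 1, and 2 risk alleles, with the first entry fixed at 1.

```python
from math import prod


def multiplicative_score(genotype, relative_risks):
    """Product of per-locus relative risks for a genotype vector (g_1, ..., g_k).

    relative_risks[i] = (1.0, lambda_1, lambda_2) for locus i; g_i is 0, 1, or 2.
    """
    return prod(rr[g] for g, rr in zip(genotype, relative_risks))


def gci(genotype, relative_risks, reference_genotypes):
    """Fraction of reference individuals whose score is <= the individual's."""
    s = multiplicative_score(genotype, relative_risks)
    ref_scores = [multiplicative_score(g, relative_risks)
                  for g in reference_genotypes]
    return sum(1 for r in ref_scores if r <= s) / len(ref_scores)


if __name__ == "__main__":
    # Hypothetical relative risks for two loci and a tiny reference set.
    rel = [(1.0, 1.2, 1.5), (1.0, 1.4, 2.0)]
    reference = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2)]
    print(gci((1, 1), rel, reference))
```

In practice the reference set would be a genotyped population such as one of the HapMap populations rather than the five illustrative genotype vectors used here.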
Other models
In one embodiment, a multiplicative model is used. In alternative embodiments, other models may be used for the purpose of determining the GCI score. Other suitable models include, but are not limited to:
Additive model. In the additive model, the risk of an individual with genotype $(g_1, \ldots, g_k)$ is assumed to be $\sum_{i=1}^{k} \lambda_{g_i}^i$.
Generalized additive model. In the generalized additive model, it is assumed that there is a function f such that the risk of an individual with genotype $(g_1, \ldots, g_k)$ is $\sum_{i=1}^{k} f(\lambda_{g_i}^i)$.
Harvard modified score (Het). This score is derived from the score of G. A. Colditz et al., modified so that it applies to genetic markers (Harvard report on cancer prevention volume 4: Harvard cancer risk index. Cancer Causes and Control, 11: 477-). Although the function f operates on odds ratios rather than relative risks, the Het score is essentially a generalized additive score. This is useful in situations where the relative risk is difficult to estimate. To define the function f, an intermediate function g is first defined.
A normalizing value het is then calculated, where $p_{het}^i$ is the frequency of individuals heterozygous for SNP i in the entire reference population. The function f is then defined as $f(x) = g(x)/het$, and the Harvard modified score (Het) is simply defined as $\sum_{i=1}^{k} f(OR_{g_i}^i)$.
Harvard modified score (Hom). This score is similar to the Het score, except that the value het is replaced by a value hom, which is calculated in the same way using $p_{hom}^i$, the frequency of individuals homozygous for the risk allele of SNP i.
Maximum odds ratio. In this model, it is assumed that one of the genetic markers (the one with the greatest odds ratio) gives a lower bound on the combined risk of the entire panel. Formally, the score of an individual with genotype $(g_1, \ldots, g_k)$ is $\max_{i} OR_{g_i}^i$.
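Two of the alternative scores listed above, the additive score and the maximum odds ratio score, can be sketched as follows. The function names and the numeric values are hypothetical; each locus is again given as a triple indexed by the number of risk alleles (0, 1, or 2).

```python
def additive_score(genotype, relative_risks):
    """Sum of per-locus relative risks for a genotype vector (g_1, ..., g_k)."""
    return sum(rr[g] for g, rr in zip(genotype, relative_risks))


def max_odds_ratio_score(genotype, odds_ratios):
    """Largest per-locus odds ratio carried by the individual."""
    return max(ors[g] for g, ors in zip(genotype, odds_ratios))


if __name__ == "__main__":
    # Hypothetical per-locus values; the same table is reused for both scores
    # purely to keep the example short.
    table = [(1.0, 1.2, 1.5), (1.0, 1.4, 2.0)]
    print(additive_score((1, 2), table), max_odds_ratio_score((1, 2), table))
```

Because the maximum odds ratio score depends on a single marker, many individuals receive identical scores, which is why ties need to be handled when these scores are converted to ranks.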
Comparison between scores
In one embodiment, GCI scores were calculated based on multiple models across the entire HapMap CEU population for 10 SNPs associated with type 2 diabetes (T2D). The relevant SNPs are rs7754840, rs4506565, rs7756992, rs10811661, rs12804210, rs8050136, rs1111875, rs4402960, rs5215, and rs1801282. Odds ratios for the three possible genotypes of each of these SNPs are reported in the literature. The CEU population consists of thirty mother-father-child trios. To avoid dependencies, only the sixty parents from these trios were used. One individual with a no-call in one of the 10 SNPs was excluded, resulting in a set of 59 individuals. The GCI rank of each individual was then calculated using several different models.
It can be observed that for this data set, the different models produce highly correlated results (FIGS. 12 and 13). The Spearman correlation was calculated between pairs of models (Table 2), which showed that the additive and the multiplicative models had a correlation coefficient of 0.97, and thus the GCI score was robust to the choice between the additive and the multiplicative model. Similarly, the correlation between the Harvard modified score and the multiplicative model was 0.83, and the correlation coefficient between the Harvard score and the additive model was 0.7. However, using the maximum odds ratio as the genetic score results in a dichotomous score defined by a single SNP. Overall, these results show that ranking the scores provides a stable framework that minimizes model dependence.
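For reference, the Spearman correlation used in this comparison is the Pearson correlation of the ranks of the two score lists. A minimal pure-Python sketch follows; the function names are hypothetical, tied values are assumed to receive their average rank, and the inputs are assumed not to be constant.

```python
def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation between two equally long score lists."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((u - mx) * (v - my) for u, v in zip(rx, ry))
    vx = sum((u - mx) ** 2 for u in rx)
    vy = sum((v - my) ** 2 for v in ry)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))
```

Applied to the per-individual scores produced by two models, a value close to 1 indicates that both models order the individuals in nearly the same way, which is the property the rank-based GCI relies on.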
Table 2: spearman correlation of score distribution of CEU data between model pairs.
The effect of variation in T2D prevalence on the resulting distribution was determined. The prevalence values were varied between 0.001 and 0.512 (FIG. 14). For the case of T2D, it can be seen that different prevalence values result in the same ordering of individuals (Spearman correlation > 0.99), so an arbitrary fixed prevalence value of 0.001 can be assumed.
Extending a model to an arbitrary number of variants
In another embodiment, the model may be extended to the case where an arbitrary number of possible variants occur. The previous considerations relate to the case where there are three possible variants (nn, nr, rr). In general, when multiple SNP associations are known, any number of variants can be found in a population. For example, when the interaction between two genetic markers is associated with a condition, there are nine possible variants. This results in eight different odds ratios.
To generalize the original formulas, it can be assumed that there are k+1 possible variants $a_0, \ldots, a_k$ with frequencies $f_0, f_1, \ldots, f_k$, measured odds ratios $1, OR_1, \ldots, OR_k$, and unknown relative risk values $1, \lambda_1, \ldots, \lambda_k$. It can further be assumed that all relative risks and odds ratios are determined with respect to $a_0$, and therefore
$$\lambda_i = \frac{P(D \mid a_i)}{P(D \mid a_0)} \quad \text{and} \quad OR_i = \frac{P(D \mid a_i)}{1 - P(D \mid a_i)} \cdot \frac{1 - P(D \mid a_0)}{P(D \mid a_0)}.$$
Based on:
$$p = \sum_{i=0}^{k} f_i \, P(D \mid a_i)$$
it can be determined that
$$\frac{1}{P(D \mid a_0)} = \frac{\sum_{i=0}^{k} f_i \lambda_i}{p}.$$
And, if we set $C = \sum_{i=0}^{k} f_i \lambda_i$, this results in the following equation:
$$OR_i = \lambda_i \frac{C - p}{C - p\lambda_i}$$
and therefore
$$\lambda_i = \frac{C \cdot OR_i}{C - p + p \cdot OR_i}$$
or
$$1 = \sum_{i=0}^{k} \frac{f_i \, OR_i}{C - p + p \cdot OR_i}, \qquad OR_0 = 1.$$
The latter is an equation in one variable (C). This equation can yield many different solutions (essentially, up to k+1 different solutions). A standard optimization tool (e.g., gradient descent) may be used to find the solution closest to $C_0 = \sum f_i t_i$.
The present invention uses a stable scoring framework for risk factor quantification. Although different genetic models may result in different scores, the results are often correlated. Thus, the quantification of risk factors is generally independent of the model used.
Estimating relative risk in case-control studies
Methods for estimating relative risk from the odds ratios of multiallelic loci in case-control studies are also provided in the present invention. In contrast to previous approaches, this approach takes into account the allele frequencies, the prevalence of the disease, and the dependence between the relative risks of the different alleles. The performance of this method on simulated case-control studies was determined and found to be extremely accurate.
Methods
In the case of testing for association of a particular SNP with a disease D, R and N denote the risk and non-risk alleles of this particular SNP. P(D|RR), P(D|RN), and P(D|NN) denote the probability of being affected by the disease given that the individual is homozygous for the risk allele, heterozygous, or homozygous for the non-risk allele, respectively. $f_{RR}$, $f_{RN}$, and $f_{NN}$ denote the frequencies of the three genotypes in the population. Using these definitions, the relative risks are defined as
$$\lambda_{RR} = \frac{P(D \mid RR)}{P(D \mid NN)}, \qquad \lambda_{RN} = \frac{P(D \mid RN)}{P(D \mid NN)}.$$
In case-control studies, the values P(RR|D) and P(RR|~D) (i.e., the frequency of RR in cases and in controls), as well as P(RN|D), P(RN|~D), P(NN|D), and P(NN|~D) (i.e., the frequencies of RN and NN in cases and in controls), can be estimated. To estimate the relative risk, Bayes' law can be used to derive:
$$\lambda_{RR} = \frac{P(RR \mid D) \, f_{NN}}{P(NN \mid D) \, f_{RR}}, \qquad \lambda_{RN} = \frac{P(RN \mid D) \, f_{NN}}{P(NN \mid D) \, f_{RN}}.$$
Thus, if the genotype frequencies are known, one can use them to calculate the relative risks. The genotype frequencies in the population cannot be calculated from the case-control study itself, as they depend on the prevalence of the disease in the population. In particular, if the prevalence of the disease is p(D):
fRR=P(RR|D)p(D)+P(RR|~D)(1-p(D))
fRN=P(RN|D)p(D)+P(RN|~D)(1-p(D))
fNN=P(NN|D)p(D)+P(NN|~D)(1-p(D))。
When p(D) is sufficiently small, the genotype frequencies can be approximated by those in the control population, but when the prevalence is high, this will not be an accurate estimate. However, if a reference data set (e.g., HapMap [cite]) is given, one can estimate the genotype frequencies based on the reference data set.
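The relationship between the case and control genotype frequencies, the prevalence, and the relative risks can be sketched as follows. The function names and the frequency values are hypothetical; the relative risk is computed via Bayes' law as P(g|D)/f_g normalized by the same quantity for the NN genotype.

```python
def population_genotype_freqs(case_freqs, control_freqs, prevalence):
    """f_g = P(g|D) p(D) + P(g|~D) (1 - p(D)) for each genotype g."""
    return {g: case_freqs[g] * prevalence + control_freqs[g] * (1.0 - prevalence)
            for g in case_freqs}


def relative_risks(case_freqs, control_freqs, prevalence):
    """Relative risks of RR and RN versus NN, using Bayes' law."""
    f = population_genotype_freqs(case_freqs, control_freqs, prevalence)
    base = case_freqs["NN"] / f["NN"]  # proportional to P(D|NN)
    return {g: (case_freqs[g] / f[g]) / base for g in ("RR", "RN")}


def odds_ratios(case_freqs, control_freqs):
    """Odds ratios of RR and RN versus NN from case and control frequencies."""
    base = case_freqs["NN"] / control_freqs["NN"]
    return {g: (case_freqs[g] / control_freqs[g]) / base for g in ("RR", "RN")}


if __name__ == "__main__":
    # Hypothetical genotype frequencies in cases and controls.
    cases = {"RR": 0.15, "RN": 0.50, "NN": 0.35}
    controls = {"RR": 0.08, "RN": 0.42, "NN": 0.50}
    print(relative_risks(cases, controls, prevalence=0.1))
    print(odds_ratios(cases, controls))
```

With a prevalence of zero the population frequencies coincide with the control frequencies and the relative risks equal the odds ratios; as the prevalence grows, the two diverge.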
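Most recent studies do not estimate relative risk using a reference dataset and only report odds ratios. The odds ratios can be written as
$$OR_{RR} = \frac{P(RR \mid D) \, P(NN \mid \sim D)}{P(NN \mid D) \, P(RR \mid \sim D)}, \qquad OR_{RN} = \frac{P(RN \mid D) \, P(NN \mid \sim D)}{P(NN \mid D) \, P(RN \mid \sim D)}.$$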
The odds ratio is often advantageous since it is usually not necessary to have an estimate of allele frequency in the population; to calculate odds ratios, what is generally required is genotype frequency in cases and controls.
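In some cases, the genotype data itself is not available, but summary data (e.g., odds ratios) is available. This is the case when a meta-analysis is performed based on results from previous case-control studies. For this case, we show how to find the relative risks from the odds ratios. We use the following fact:
$$p(D) = f_{RR} P(D \mid RR) + f_{RN} P(D \mid RN) + f_{NN} P(D \mid NN)$$
If this equation is divided by P(D|NN), we get
$$\frac{p(D)}{P(D \mid NN)} = f_{RR} \lambda_{RR} + f_{RN} \lambda_{RN} + f_{NN}.$$
This enables the odds ratio to be written in the form:
$$OR_{RR} = \frac{P(D \mid RR) \left( 1 - P(D \mid NN) \right)}{P(D \mid NN) \left( 1 - P(D \mid RR) \right)} = \lambda_{RR} \frac{f_{RR} \lambda_{RR} + f_{RN} \lambda_{RN} + f_{NN} - p(D)}{f_{RR} \lambda_{RR} + f_{RN} \lambda_{RN} + f_{NN} - p(D) \lambda_{RR}}.$$
By a similar calculation, the following system of equations is obtained:
$$OR_{RR} = \lambda_{RR} \frac{f_{RR} \lambda_{RR} + f_{RN} \lambda_{RN} + f_{NN} - p(D)}{f_{RR} \lambda_{RR} + f_{RN} \lambda_{RN} + f_{NN} - p(D) \lambda_{RR}}, \qquad OR_{RN} = \lambda_{RN} \frac{f_{RR} \lambda_{RR} + f_{RN} \lambda_{RN} + f_{NN} - p(D)}{f_{RR} \lambda_{RR} + f_{RN} \lambda_{RN} + f_{NN} - p(D) \lambda_{RN}} \qquad \text{(Equation 1)}$$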
If odds ratios, genotype frequencies in the population, and prevalence of disease are known, then relative risk can be obtained by solving this system of equations.
It should be noted that these are two quadratic equations, so they have a maximum of four solutions. However, as shown below, there is typically only one feasible solution to this system.
It should be noted that when $f_{NN} = 1$, equation system 1 is equivalent to the Zhang and Yu formula; however, the allele frequencies in the population are taken into account here. Moreover, our method takes into account the fact that the two relative risks depend on each other, whereas previous methods propose calculating each relative risk independently.
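Relative risk of multiallelic loci. The calculations are somewhat more complex if multiple markers or other multiallelic variants are considered. Let $a_0, a_1, \ldots, a_k$ denote the k+1 possible alleles, where $a_0$ is the non-risk allele. The frequencies in the population of the k+1 possible alleles are assumed to be $f_0, f_1, f_2, \ldots, f_k$. For allele i, the relative risk and the odds ratio are defined as
$$\lambda_i = \frac{P(D \mid a_i)}{P(D \mid a_0)}, \qquad OR_i = \frac{P(D \mid a_i)}{1 - P(D \mid a_i)} \cdot \frac{1 - P(D \mid a_0)}{P(D \mid a_0)}.$$
The following equation applies to the prevalence of the disease:
$$p(D) = \sum_{i=0}^{k} f_i \, P(D \mid a_i)$$
Thus, by dividing both sides of the equation by $P(D \mid a_0)$, we get:
$$\frac{p(D)}{P(D \mid a_0)} = \sum_{i=0}^{k} f_i \lambda_i$$
obtaining:
$$OR_i = \lambda_i \frac{\sum_{j=0}^{k} f_j \lambda_j - p(D)}{\sum_{j=0}^{k} f_j \lambda_j - p(D) \lambda_i}$$
By setting $C = \sum_{i=0}^{k} f_i \lambda_i$, we obtain $\lambda_i = \dfrac{C \cdot OR_i}{C - p(D) + p(D) \cdot OR_i}$. Thus, by the definition of C, we derive:
$$1 = \sum_{i=0}^{k} \frac{f_i \, OR_i}{C - p(D) + p(D) \cdot OR_i}$$
This is a polynomial equation in one variable C. Once C is determined, the relative risks are determined. The polynomial is of degree k+1, and therefore we expect to have at most k+1 solutions. However, since the right side of the equation is strictly decreasing as a function of C, there is typically only one solution to this equation. This solution is easily found using a binary search, since the solution is bounded between $C = 1$ and $C = \sum_{i=0}^{k} f_i \, OR_i$.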
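The binary search for C can be sketched as follows. This is an illustrative sketch rather than a definitive implementation: the function name is hypothetical, and the bracket from 1 to the frequency-weighted sum of the odds ratios is assumed to hold, which is the case when every odds ratio is at least 1.

```python
def multiallelic_relative_risks(odds_ratios, freqs, prevalence, tol=1e-12):
    """Relative risks for k+1 alleles from odds ratios, frequencies, prevalence.

    odds_ratios[0] must be 1.0 (reference allele a_0).
    Assumption: all odds ratios are >= 1, so C lies in [1, sum(f_i * OR_i)].
    """
    def rhs(c):
        # Right-hand side of 1 = sum_i f_i OR_i / (C - p + p OR_i); decreasing in C.
        return sum(f * o / (c - prevalence + prevalence * o)
                   for f, o in zip(freqs, odds_ratios))

    lo = 1.0
    hi = sum(f * o for f, o in zip(freqs, odds_ratios))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rhs(mid) > 1.0:
            lo = mid  # rhs still too large: the root lies to the right
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return [c * o / (c - prevalence + prevalence * o) for o in odds_ratios]


if __name__ == "__main__":
    # Illustrative inputs: three variants with frequencies summing to 1.
    print(multiallelic_relative_risks([1.0, 1.5, 2.25], [0.49, 0.42, 0.09], 0.1))
```

Because the right-hand side is monotone in C, the bisection needs no starting guess and no derivative, which makes it a simpler choice here than a general-purpose optimizer.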
Stability of relative risk assessment. The effect of various parameters (prevalence, allele frequency, and odds ratio error) on the estimate of relative risk was determined. To determine the impact of allele frequencies and prevalence estimates on relative risk values, relative risk was calculated from a set of values for different odds ratios, different allele frequencies (at HWE), and the results of these calculations were plotted for prevalence values in the range of 0 to 1.
In addition, for a fixed prevalence value, the resulting relative risk is plotted as a function of the risk allele frequency. In all cases, when $p(D) = 0$, $\lambda_{RR} = OR_{RR}$ and $\lambda_{RN} = OR_{RN}$, and when $p(D) = 1$, $\lambda_{RR} = \lambda_{RN} = 1$. This can be calculated directly from equation 1. In addition, when the risk allele frequency is high, $\lambda_{RR}$ is close to linear behavior and $\lambda_{RN}$ is close to a concave function with a bounded second derivative. When the risk allele frequency is low, $\lambda_{RR}$ and $\lambda_{RN}$ approach the behavior of the function 1/p(D). This means that for high risk allele frequencies, an incorrect estimate of the prevalence will not greatly affect the resulting relative risk.
The following examples illustrate and explain the present invention. The scope of the invention is not limited to these examples.
Example I
Generation and analysis of SNP profiles
The individual is provided with a sample tube in a kit (e.g., purchased from DNA Genotek) in which the individual deposits a saliva sample (approximately 4ml) from which genomic DNA will be extracted. Saliva samples were delivered to CLIA certified laboratories for processing and analysis. Typically, the sample is delivered to the testing facility by overnight mailing in a shipping container that is conveniently provided to the individual within the collection kit.
In a preferred embodiment, the genomic DNA is isolated from saliva. For example, using the DNA self-collection kit technology provided by DNA Genotek, an individual collects approximately 4ml of saliva samples for clinical processing. After delivery of the sample to an appropriate laboratory for processing, the DNA is isolated by heat denaturation and protease digestion of the sample (typically for at least one hour at 50 ℃ using reagents provided by the collection kit supplier). Subsequently, the sample was centrifuged, and the supernatant was subjected to ethanol precipitation. The DNA pellet is suspended in a buffer suitable for subsequent analysis.
Genomic DNA of an individual is isolated from a saliva sample according to well known procedures and/or procedures provided by the manufacturer of the collection kit. Typically, the sample is first heat denatured and protease digested. Next, the sample was centrifuged, and the supernatant was retained. The supernatant was then subjected to ethanol precipitation to obtain a precipitate containing approximately 5-16 ug of genomic DNA. The DNA pellet was suspended in 10mM Tris (pH 7.6), 1mM EDTA (TE). SNP profiles were generated by hybridizing genomic DNA to commercially available high density SNP arrays (e.g., those provided by Affymetrix or Illumina) using instrumentation and instructions provided by the array manufacturer. Individual SNP profiles are stored in an encrypted database or vault.
The patient's data is queried for risk-conferring SNPs by comparison with a clinical database of established, medically relevant SNPs whose presence in the genome is associated with a given disease or condition. The database includes information on the statistical association of specific SNPs and SNP haplotypes with a particular disease or condition. For example, as shown in Example III, polymorphisms in the apolipoprotein E gene lead to distinct isoforms of the protein, which in turn are associated with a statistical likelihood of developing Alzheimer's disease. As another example, individuals with a variant of the coagulation protein factor V, known as factor V Leiden, have an increased tendency to clot. Many genes in which SNPs are associated with disease or condition phenotypes are shown in Table 1. The scientific accuracy and importance of the information in the database is approved by a research/clinical advisory committee and can be reviewed by a supervising governmental agency. The database can be continuously updated as more SNP-disease associations emerge from the scientific community.
The results of the analysis of the individual's SNP profile are securely provided to the patient through an online portal or email. The patient is provided with explanatory and supportive information, such as the information on factor V Leiden shown in Example IV. Secure access to the individual's SNP profile information (e.g., through an online portal) facilitates discussion with the patient's physician and supports personal choices in personalized medicine.
Example II
Updating of genotype correlations
In response to a request to initially determine the genotype correlations of an individual, a genomic profile is generated, the genotype correlations are obtained, and the results are provided to the individual as described in Example I. After the initial determination of the individual's genotype correlations, updated correlations are determined, or can be determined, later as additional genotype correlations become known. The subscriber has a premium-level subscription, and the subscriber's genotype profile is stored in an encrypted database. The updated correlations are performed on the stored genotype profile.
For example, as described in Example I above, the initial genotype correlations have determined that a particular individual does not have ApoE4, and therefore is not predisposed to early-onset Alzheimer's disease, and that this individual does not have factor V Leiden. After this initial determination, a new correlation becomes known and validated, such that polymorphisms in a given gene (say gene XYZ) are correlated with a given condition (say condition 321). This new genotype correlation is added to the master database of human genotype correlations. An update is then provided to the particular individual by first obtaining the data for the relevant gene XYZ from the genomic profile of the particular individual stored in the encrypted database. The gene XYZ data of the particular individual is compared to the gene XYZ information of the updated master database. From this comparison, the susceptibility or predisposition of the particular individual to condition 321 is determined. The result of this determination is added to the genotype correlations of the particular individual. The updated result of whether the particular individual is susceptible or genetically predisposed to condition 321 is provided to the particular individual along with explanatory and supportive information.
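The update flow in this example can be sketched as follows. Everything in the sketch is hypothetical: the function name, the dictionary layout of the stored profile, and the genotype labels are illustrative only, and gene XYZ and condition (state) 321 are the placeholders used in the example above.

```python
def update_genotype_correlations(stored_profile, individual_correlations, new_rules):
    """Apply newly added correlations to a stored profile without re-genotyping.

    stored_profile: gene name -> genotype string from the stored genomic profile.
    new_rules: list of dicts with 'gene', 'condition', and 'risk_genotypes'.
    Returns the individual's correlations with the new determinations added.
    """
    updated = dict(individual_correlations)
    for rule in new_rules:
        genotype = stored_profile.get(rule["gene"])
        if genotype is None:
            continue  # the stored profile has no data for this gene
        updated[rule["condition"]] = genotype in rule["risk_genotypes"]
    return updated


if __name__ == "__main__":
    profile = {"XYZ": "AG", "APOE": "e3/e3"}           # hypothetical stored data
    current = {"early-onset Alzheimer's disease": False}
    new_rules = [{"gene": "XYZ", "condition": "condition 321",
                  "risk_genotypes": {"AG", "GG"}}]
    print(update_genotype_correlations(profile, current, new_rules))
```

The point of the design is that only the stored profile and the newly added rules are needed, so an update does not require a new sample from the individual.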
Example III
Association of the ApoE4 locus with Alzheimer's disease
The risk of Alzheimer's disease (AD) has been shown to be associated with polymorphisms in the apolipoprotein E (APOE) gene, which result in three isoforms of the protein known as ApoE2, ApoE3, and ApoE4. These isoforms differ from each other by one or two amino acids at residues 112 and 158 of the ApoE protein. ApoE2 contains cysteine/cysteine at positions 112/158; ApoE3 contains cysteine/arginine at positions 112/158; and ApoE4 contains arginine/arginine at positions 112/158. As shown in Table 3, the risk of onset of Alzheimer's disease at a younger age increases with the number of copies of the APOE ε4 gene. Also, as shown in Table 4, the relative risk of AD increases with the number of copies of the APOE ε4 gene.
Table 3: Prevalence of AD risk alleles (Corder et al., Science 261: 921-3, 1993)
APOE ε4 copies | Prevalence | Risk of Alzheimer's disease | Age of onset
0 | 73% | 20% | 84
1 | 24% | 47% | 75
2 | 3% | 91% | 68
Table 4: Relative risk of AD with ApoE4 (Farrer et al., JAMA 278: 1349-56, 1997)
APOE genotype | Odds ratio
ε2ε2 | 0.6
ε2ε3 | 0.6
ε3ε3 | 1.0
ε2ε4 | 2.6
ε3ε4 | 3.2
ε4ε4 | 14.9
Example IV
Information on factor V Leiden-positive patients
The following information is an example of information that may be provided to individuals whose genomic SNP profile shows the presence of the factor V Leiden gene. The individual may have a basic subscription, under which the information may be provided in an initial report.
What is factor V Leiden?
Factor V Leiden is not a disease; it is the presence of a specific gene inherited from one's parents. Factor V Leiden is a variant of the protein factor V (5), which is required for blood coagulation. Persons with factor V deficiency are more likely to bleed severely, while persons with factor V Leiden have blood with an increased tendency to clot.
People carrying the factor V Leiden gene have a 5-fold higher risk of developing a blood clot (thrombosis) than the rest of the population. However, many people with this gene never develop blood clots. In the UK and the USA, 5% of the population carries one or more factor V Leiden genes, which is far more than the number of people who will actually suffer from thrombosis.
How do you get factor V Leiden?
The factor V gene is inherited from one's parents. As with all genetic traits, one gene is inherited from the mother and one from the father. Thus, it is possible to inherit two normal genes, one factor V Leiden gene and one normal gene, or two factor V Leiden genes. Having one factor V Leiden gene results in a slightly higher risk of developing thrombosis, but having two genes results in a much greater risk.
What are the symptoms of factor V Leiden?
There are no signs unless you have a blood clot (thrombosis).
What are the danger signals?
The most common problem is a blood clot in the leg. Swelling, pain, and redness of the leg indicate this problem. In rarer cases, a blood clot in the lungs (pulmonary thrombosis) may occur, which leads to breathing difficulties. Depending on the size of the blood clot, this can range from being barely noticeable to causing severe breathing difficulty. In even rarer cases, blood clots may occur in the arm or other parts of the body. Since these clots form in the veins that carry blood to the heart rather than in the arteries that carry blood from the heart, factor V Leiden does not increase the risk of coronary thrombosis.
What can be done to avoid blood clots?
Factor V Leiden only slightly increases the risk of blood clots, and many people with this condition never develop thrombosis. One can do many things to avoid blood clots. Avoid standing or sitting in the same position for a long time. When traveling long distances, it is important to exercise regularly; the blood must not be allowed to "stand still". Being overweight or smoking will greatly increase the risk of blood clots. Women carrying the factor V Leiden gene should not take the contraceptive pill because this would significantly increase the chance of thrombosis. Women carrying the factor V Leiden gene should also consult their physician before becoming pregnant because pregnancy also increases the risk of thrombosis.
How do doctors find out if you have the factor V Leiden?
The gene for factor V Leiden can be found in blood samples.
Blood clots in the leg or arm are typically determined by ultrasound examination.
After a substance is injected into the blood to visualize the clot, the clot may also be detected by X-ray. Blood clots in the lungs are more difficult to find, but often physicians will use radioactive materials to test the distribution of blood flow in the lungs and the distribution of air flow into the lungs. The two distribution patterns should match-a mismatch indicates the presence of a blood clot.
How is factor V Leiden treated?
Persons with factor V Leiden do not require treatment unless their blood begins to clot, in which case the physician will prescribe a blood-thinning (anticoagulant) drug, such as warfarin (e.g., warfarin sodium) or heparin, to prevent further blood clotting. Treatment will typically last three to six months, but may take longer if there are several blood clots. In severe cases, the course of medication may continue indefinitely; in extremely rare cases, blood clots may require surgical removal.
How is factor V Leiden treated during pregnancy?
Women carrying two factor V Leiden genes will need to receive treatment with the anticoagulant drug heparin during pregnancy. The same treatment applies to women carrying only one factor V Leiden gene who have previously had a blood clot themselves or who have a family history of blood clots.
All women carrying the factor V Leiden gene may need to wear special stockings to prevent blood clots in the latter half of pregnancy. After the birth of the child, they may be prescribed the anticoagulant drug heparin.
Prognosis
The risk of developing blood clots increases with age, but in an age-based survey of 100 people carrying the gene, only a few were found to have suffered from thrombosis. The National Society of Genetic Counselors (NSGC) can provide a list of genetic counselors in your area and information about establishing a family history. Their online database can be searched at www.nsgc.org/consumer.
While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that many alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that the invention cover methods and structures within the scope of these claims and their equivalents.

Claims (78)

1. A method of assessing genotype correlations of an individual, the method comprising:
a) obtaining a genetic sample of the individual, wherein the genetic sample is DNA;
b) generating a genomic profile of the individual from the genetic sample;
c) determining a genotype correlation for the individual by comparing the genomic profile of the individual to a database of correlations of current human genotypes and phenotypes to determine, for each phenotype of interest, a plurality of relative risk or odds ratios for a plurality of alleles, including risk alleles or non-risk alleles, of the individual;
d) updating the human genotype correlations database with additional human genotype correlations, when the additional human genotype correlations are known; and
e) updating the genotype correlation of the individual by comparing the genomic profile of the individual of step c), or a portion thereof, to the additional human genotype correlation and determining additional genotype correlations for the individual.
2. The method of claim 1, wherein the genetic sample is obtained by a third party.
3. The method of claim 1, wherein the generating a genomic profile is performed by a third party.
4. The method of claim 1, further comprising calculating a GCI score, wherein the GCI is calculated from a plurality of relative risks or odds ratios.
5. The method of claim 1, wherein the genomic profile comprises single nucleotide polymorphisms, nucleotide insertions, nucleotide deletions, chromosomal translocations, chromosomal duplications, or copy number variations.
6. The method of claim 1, wherein the genomic profile is the entire genome of the individual.
7. The method of claim 1, wherein the method comprises assessing 2 or more genotype correlations.
8. The method of claim 1, wherein the method comprises assessing 10 or more genotype correlations.
9. The method of claim 1, wherein said human genotype correlation database comprises genetic variants in one or more genes listed in table 1, figure 4, figure 5, 6, 22, or 25 and phenotypes associated with the genetic variants.
10. The method of claim 1, wherein said human genotype correlation database comprises genetic variants determined from said genomic profile of said individual and predetermined phenotypes revealed by said individual.
11. The method of claim 9 or 10, wherein the genetic variant is a single nucleotide polymorphism, a nucleotide insertion, a nucleotide deletion, a chromosomal translocation, a chromosomal duplication, or a copy number variation.
12. The method of claim 1, wherein the genetic sample is blood, hair, skin, saliva, semen, urine, fecal matter, sweat, or an oral sample.
13. The method of claim 1, wherein the genomic profile is generated using a high density DNA microarray, DNA sequencing, or PCR-based methods.
14. The method of claim 4, wherein at least one of physical data, medical data, ethnicity, family, geography, gender, age, family history, known phenotype, demographic data, exposure data, lifestyle data, or behavioral data of the individual is incorporated into the calculation of the GCI.
15. The method of claim 1, wherein the genomic profile of the individual is compared to a correlation between a SNP and a phenotype, wherein the SNP is:
rs 69883267 when the phenotype is colorectal cancer, rs2165241 when the phenotype is exfoliative glaucoma, rs9939609 when the phenotype is obesity, rs3087243 or DRB1 0301 when the phenotype is Graves' disease, rs1800562 when the phenotype is hemochromatosis, rs 6969 when the phenotype is myocardial infarction, rs6897932, rs12722489 or DRB1 1501 when the phenotype is multiple sclerosis, rs11209026 when the phenotype is Psoriasis (PS), rs2300478, rs1026732 or rs9296249 when the phenotype is restless leg syndrome, rs6840978 or rs2187668 when the phenotype is celiac disease, rs 69883267, rs 30909 or rs 30906 when the phenotype is prostate cancer, dr5744798 when the phenotype is lupus erythematosus, dr5431711, DRB 3809, DRB 125355639 or DRB 36933 9 when the phenotype is rheumatoid arthritis, dr6446579 or DRB 4705, DRB 125357243 when the phenotype is lupus erythematosus, DRB 125357263, DRB 12535729, or DRB 3564049 when the phenotype is lupus erythematosus, rs2981582, rs3817198 or rs3803662, rs 2066846845, rs10883365, rs17234657, rs10210302, rs9858542, rs11805303, rs1000113, rs2542151 or rs10761659 when the phenotype is crohn's disease, rs13266634, rs4506565, rs7756992, rs10811661, rs8050136, rs1111875, rs4402960, rs5215 or rs1801282 when the phenotype is type 2 diabetes.
16. The method of claim 15, further comprising:
f) calculating at least one GCI score for said phenotype in combination with said relative risk or odds ratio.
17. A method of assessing genotype correlations of an individual, the method comprising:
a) obtaining a plurality of genetic samples from a plurality of individuals;
b) providing a rule set comprising rules, each rule indicating a correlation between at least one genotype and at least one phenotype;
c) providing a data set comprising a genomic profile of each individual of the plurality of individuals, wherein each genomic profile comprises a plurality of genotypes;
d) determining a genotype correlation for the individual by comparing the genomic profile of the individual to a database of correlations of current human genotypes and phenotypes to determine, for each phenotype of interest, a plurality of relative risk or odds ratios for a plurality of alleles, including risk alleles or non-risk alleles, of the individual;
e) periodically updating the rule set with at least one new rule, wherein the at least one new rule indicates a correlation between genotypes and phenotypes that were previously unrelated to each other in the rule set; and
f) applying each new rule to the genomic profile of at least one of the individuals, thereby correlating at least one genotype with at least one phenotype for the individual.
18. The method of claim 17, further comprising:
f) generating a report comprising the phenotype profile of the individual.
19. The method of claim 17, further comprising, after step b):
i) applying the rules of the rule set to the genomic profile of the individual to determine a set of phenotypic profiles of the individual; and
ii) generating a report comprising the initial phenotype profile of the individual.
20. The method of claim 18 or 19, wherein providing the report comprises transmitting the report over a network.
21. The method of claim 18 or 19, wherein the report is provided in an encrypted manner.
22. The method of claim 18 or 19, wherein the report is provided in an unencrypted manner.
23. The method of claim 18 or 19, wherein the report is provided through an online portal.
24. The method of claim 18 or 19, wherein the report is provided on paper or by email.
25. The method of claim 17, wherein the new rule associates an unassociated genotype with a phenotype.
26. The method of claim 17, wherein the new rule associates an associated genotype with a phenotype not previously associated therewith in the rule set.
27. The method of claim 17, wherein the new rule changes a rule in the rule set.
28. The method of claim 17, wherein the new rule is generated by correlation of a genotype of the genomic profile from the individual and a predetermined phenotype of the individual.
29. The method of claim 17, wherein the rule associates a plurality of genotypes with a phenotype.
30. The method of claim 17, wherein applying the new rule further comprises determining the phenotype profile based at least in part on a characteristic of the individual selected from the group consisting of ethnicity, pedigree, geography, gender, age, family history, and a predetermined phenotype.
31. The method of claim 17, wherein the genotype comprises a nucleotide repeat, a nucleotide insertion, a nucleotide deletion, a chromosomal translocation, a chromosomal repeat, or a copy number variation.
32. The method of claim 31, wherein the copy number variation is a microsatellite repeat, a nucleotide repeat, a centromeric repeat or a telomeric repeat.
33. The method of claim 17, wherein the genotype comprises a single nucleotide polymorphism.
34. The method of claim 17, wherein the genotypes comprise a haplotype and a diplotype.
35. The method of claim 17, wherein the genotype comprises a genetic marker in linkage disequilibrium with a phenotype-associated single nucleotide polymorphism.
36. The method of claim 17, wherein the phenotype profile indicates the presence or risk of a quantitative trait.
37. The method of claim 17, wherein the phenotype profile indicates a probability that an individual having a genotype has or will have a phenotype.
38. The method of claim 37, wherein the probability is based on a GCI or GCI Plus score.
39. The method of claim 37, wherein the probability is an estimated lifetime risk.
40. The method of claim 17, wherein the correlation is validated.
41. The method of claim 17, wherein the rule set includes at least 20 rules.
42. The method of claim 17, wherein the rule set includes at least 50 rules.
43. The method of claim 17, wherein the rule set comprises rules based on the genotype correlations in table 1.
44. The method of claim 17, wherein the rule set comprises rules based on the genotype correlations in figures 4,5, 6, 22, or 25.
45. The method of claim 17, wherein the phenotype comprises a quantitative trait.
46. The method of claim 45, wherein the quantitative trait comprises a medical condition.
47. The method of claim 46, wherein said phenotype profile indicates the presence or absence of said medical condition, the risk of developing said medical condition, the prognosis of said medical condition, the effect of treatment of said medical condition, or the response to treatment of said medical condition.
48. The method of claim 45, wherein the quantitative trait comprises a phenotype of a non-medical condition.
49. The method of claim 45, wherein the quantitative trait is selected from the group consisting of a physical trait, a physiological trait, a mental trait, an emotional trait, ethnicity, pedigree, or age.
50. The method of claim 17, wherein the individual is a human.
51. The method of claim 17, wherein the individual is a non-human.
52. The method of claim 17, wherein the individual is a registered user.
53. The method of claim 17, wherein the individual is a non-registered user.
54. The method of claim 17, wherein the genomic profile comprises at least 100,000 genotypes.
55. The method of claim 17, wherein the genomic profile comprises at least 400,000 genotypes.
56. The method of claim 17, wherein the genomic profile comprises at least 900,000 genotypes.
57. The method of claim 17, wherein the genomic profile comprises at least 1,000,000 genotypes.
58. The method of claim 17, wherein the genomic profile comprises a substantially complete whole genome sequence.
59. The method of claim 17, wherein the data set comprises a plurality of data points, wherein each data point relates to an individual and comprises a plurality of data elements, wherein the data elements comprise at least one element selected from the group consisting of a unique identifier of the individual, genotype information, microarray SNP identification number, SNP rs number, chromosomal location, polymorphic nucleotides, quality metrics, raw data files, images, extracted intensity scores, physical data, medical data, ethnicity, pedigree, geography, gender, age, family history, known phenotype, demographic data, exposure data, lifestyle data, and behavioral data.
60. The method of claim 17, wherein the periodic updating and applying occurs at least once a year.
61. The method of claim 17, wherein providing the data set comprises obtaining a genomic profile for each individual of a plurality of individuals by:
i) performing genetic analysis on a genetic sample obtained from said individual, and
ii) encoding the analysis in a computer readable form.
62. The method of claim 17, wherein said phenotype profile comprises a monogenic phenotype.
63. The method of claim 17, wherein said phenotype profile comprises a multigenic phenotype.
64. The method of claim 17, wherein the report includes an initial phenotype profile.
65. The method of claim 17, wherein the report comprises an updated phenotype profile.
66. The method of claim 17, wherein said report further comprises information about said phenotype of said phenotypic profile selected from one or more of the following: preventive countermeasures, health information, therapy, symptom recognition, early detection protocols, intervention protocols, and precise identification and subclassification of the phenotypes in the phenotype profile.
67. The method of claim 17, further comprising:
e) adding a new genomic profile of a new individual to the individual dataset;
f) applying the rule set to the genomic profile of the new individual; and
g) generating an initial report of the phenotype profile of the new individual.
68. The method of claim 17, comprising:
e) adding a new genomic profile of the individual;
f) applying the rule set to the new genomic profile of the individual; and
g) generating a new report of the phenotype profile of the individual.
69. A system for assessing genotype correlations of an individual, the system comprising:
a) means for storing a rule set comprising rules, each rule indicating a correlation between at least one genotype and at least one phenotype, wherein the genotype correlation is determined by comparing the genomic profile of the individual to a database of correlations of current human genotypes and phenotypes to determine a plurality of relative risk or odds ratios for a plurality of alleles, including risk alleles or non-risk alleles, of the individual for each phenotype of interest;
b) means for periodically updating said rule set with at least one new rule, wherein said at least one new rule indicates a correlation between genotypes and phenotypes not previously correlated with each other in said rule set;
c) means for generating a genomic profile of an individual, thereby obtaining a database comprising genomic profiles of a plurality of individuals;
d) means for applying the rule set to the genomic profile of an individual to determine a phenotypic profile of the individual; and
e) means for generating a report for each individual.
70. The system of claim 69, wherein the report is transmitted over a network.
71. The system of claim 69, wherein the report is provided in an encrypted manner.
72. The system of claim 69, wherein said report is provided in an unencrypted manner.
73. The system of claim 69, wherein said report is provided through an online portal.
74. The system of claim 69, wherein the report is provided via paper or email.
75. The system of claim 69, further comprising means for notifying said individual of a new or revised association.
76. The system of claim 69, further comprising code for notifying said individual of new or revised rules applicable to said genomic profile of said individual.
77. The system of claim 69, further comprising means for notifying said individual of new or revised prevention and health information regarding said phenotype of said phenotype profile of said individual.
78. A kit for performing the method of claim 1, the kit comprising:
a) at least one sample collection container;
b) instructions for obtaining a sample from an individual;
c) instructions for accessing a genomic profile of the individual obtained from the sample through an online portal;
d) instructions for accessing, via an online portal, a phenotype profile of the individual obtained from the sample; and
e) a package for delivering the sample collection container to a sample processing facility.
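
By way of illustration only, and not as part of the claims or as the disclosed implementation, the rule-set workflow recited in claims 17 and 69 (storing rules, applying them to stored genomic profiles, then periodically adding new rules and applying only those) could be sketched in Python roughly as follows. The class and function names (`Rule`, `GenomicProfile`, `RuleSet`, `apply_rules`), the genotype calls, and the odds-ratio values are all hypothetical placeholders chosen for this sketch; only the SNP identifiers and phenotype names are taken from claim 15.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a genotype at one marker correlated with one phenotype."""
    marker: str        # SNP identifier, e.g. an rs number
    genotype: str      # genotype call that triggers the rule (placeholder values below)
    phenotype: str
    odds_ratio: float  # placeholder number, not a value from the patent


@dataclass
class GenomicProfile:
    """Hypothetical stored profile: marker -> genotype call for one individual."""
    individual_id: str
    genotypes: Dict[str, str]


def apply_rules(profile: GenomicProfile, rules: Iterable[Rule]) -> Dict[str, List[float]]:
    """Collect, per phenotype, the odds ratios of every rule the profile matches."""
    matched: Dict[str, List[float]] = {}
    for rule in rules:
        if profile.genotypes.get(rule.marker) == rule.genotype:
            matched.setdefault(rule.phenotype, []).append(rule.odds_ratio)
    return matched


class RuleSet:
    """Holds the current rules and applies newly added ones to stored profiles."""

    def __init__(self, rules: Iterable[Rule] = ()):
        self.rules: List[Rule] = list(rules)

    def update(self, new_rules: Iterable[Rule],
               profiles: Iterable[GenomicProfile]) -> Dict[str, Dict[str, List[float]]]:
        """Add new rules, then apply only the new rules to each stored profile."""
        new_rules = list(new_rules)
        self.rules.extend(new_rules)
        return {p.individual_id: apply_rules(p, new_rules) for p in profiles}


# Placeholder data: the genotype calls and odds ratios are invented for the example.
profiles = [GenomicProfile("ind-1", {"rs9939609": "AA", "rs1800562": "AA"})]
rule_set = RuleSet([Rule("rs9939609", "AA", "obesity", 2.0)])

# Initial phenotype profile from the rules known at the start.
initial = {p.individual_id: apply_rules(p, rule_set.rules) for p in profiles}

# Later update: a new correlation becomes known and is applied to the stored profile.
updated = rule_set.update([Rule("rs1800562", "AA", "hemochromatosis", 3.0)], profiles)
```

In this sketch the stored genomic profile is never regenerated; only the rule set grows, which mirrors the claimed idea that an existing profile can be re-evaluated whenever new correlations become available. How the matched odds ratios are combined into a score such as the GCI is deliberately left out, since that calculation is not defined in this section.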
HK10106416.1A 2006-11-30 2007-11-30 Genetic analysis systems and methods HK1139737B (en)

Applications Claiming Priority (13)

Application Number Priority Date Filing Date Title
US86806606P 2006-11-30 2006-11-30
US60/868,066 2006-11-30
US95112307P 2007-07-20 2007-07-20
US60/951,123 2007-07-20
US11/781,679 2007-07-23
US11/781,679 US20080131887A1 (en) 2006-11-30 2007-07-23 Genetic Analysis Systems and Methods
US97219807P 2007-09-13 2007-09-13
US60/972,198 2007-09-13
US98562207P 2007-11-05 2007-11-05
US60/985,622 2007-11-05
US98968507P 2007-11-21 2007-11-21
US60/989,685 2007-11-21
PCT/US2007/086138 WO2008067551A2 (en) 2006-11-30 2007-11-30 Genetic analysis systems and methods

Publications (2)

Publication Number Publication Date
HK1139737A1 HK1139737A1 (en) 2010-09-24
HK1139737B HK1139737B (en) 2014-04-11

Also Published As

Publication number Publication date
EP2102651A2 (en) 2009-09-23
GB0723512D0 (en) 2008-01-09
CA2671267A1 (en) 2008-06-05
EP2102651A4 (en) 2010-11-17
CN103642902B (en) 2016-01-20
GB2444410B (en) 2011-08-24
JP2014140387A (en) 2014-08-07
AU2007325021B2 (en) 2013-05-09
GB2444410A (en) 2008-06-04
WO2008067551A3 (en) 2008-12-11
KR20090105921A (en) 2009-10-07
CN103642902A (en) 2014-03-19
TW200847056A (en) 2008-12-01
WO2008067551A2 (en) 2008-06-05
JP2010522537A (en) 2010-07-08
TWI363309B (en) 2012-05-01
AU2007325021A1 (en) 2008-06-05

Legal Events

Date Code Title Description
PC Patent ceased (i.e. patent has lapsed due to the failure to pay the renewal fee)

Effective date: 20171130