HK1139737A1

HK1139737A1 - Genetic analysis systems and methods

Info

Publication number: HK1139737A1
Application number: HK10106416.1A
Authority: HK
Inventors: D‧A‧斯坦芬; M‧F‧菲利普庞; J‧韦塞尔; M‧卡吉尔; E‧哈尔佩里恩
Original assignee: 纳维哲尼克斯公司
Priority date: 2006-11-30
Filing date: 2007-11-30
Publication date: 2010-09-24
Also published as: EP2102651A2; GB0723512D0; CA2671267A1; EP2102651A4; CN103642902B; GB2444410B; JP2014140387A; AU2007325021B2; GB2444410A; WO2008067551A3; KR20090105921A; CN103642902A; TW200847056A; WO2008067551A2; JP2010522537A; TWI363309B; AU2007325021A1

Abstract

The present invention provides methods of determining a Genetic Composite Index score by assessing the association between an individual's genotype and at least one disease or condition. The assessment comprises comparing an individual's genomic profile with a database of medically relevant genetic variations that have been established to associate with at least one disease or condition.

Description

Genetic analysis system and method

Background

Recent advances in human genome sequencing and human genomics have revealed that the genomes of any two people are more than 99.9% similar. The relatively small amount of variation in DNA between individuals is responsible for differences in phenotypic traits and is associated with many human diseases, susceptibility to various diseases, and response to disease treatment. Inter-individual variation in DNA occurs in coding and non-coding regions and includes base changes at specific sites in the genomic DNA sequence, as well as insertions and deletions of DNA. Changes that occur at a single base position in the genome are referred to as single nucleotide polymorphisms, or "SNPs".

Although SNPs are relatively rare in the human genome, they account for a large portion of inter-individual DNA sequence variation, with one SNP occurring approximately every 1,200 base pairs in the human genome (see International HapMap Project, www.hapmap.org). The complexity of SNPs is becoming better understood as more human genetic information becomes available. In turn, the occurrence of SNPs in the genome is being associated with the presence of and/or susceptibility to a variety of diseases and conditions.

As these correlations and other advances in human genetics emerge, medical and personal care are generally moving toward personalized approaches in which a patient makes appropriate medical and other choices taking into account his or her genomic information, among other factors. Thus, there is a need to provide individuals and their health care providers with information specific to the individual's personal genome, thereby supporting personalized medical and other decisions.

Disclosure of Invention

The present invention provides a method of assessing genotype correlations in an individual, the method comprising: a) obtaining a genetic sample of the individual, b) generating a genomic profile of the individual, c) determining the genotype correlations of the individual by comparing the genomic profile of the individual to a current database of human genotype correlations, d) reporting the results from step c) to the individual or to a health care manager of the individual, e) when additional human genotype correlations become known, updating the database of human genotype correlations with the additional human genotype correlations, f) updating the genotype correlations of the individual by comparing the genomic profile of the individual obtained from step c), or a portion thereof, with the additional human genotype correlations, and determining additional genotype correlations for the individual, and g) reporting the results obtained from step f) to the individual or to a health care manager of the individual.

The present invention further provides a business method of assessing genotype correlations in an individual, the method comprising: a) obtaining a genetic sample of the individual; b) generating a genomic profile of the individual; c) determining the genotype correlations of the individual by comparing the genomic profile of the individual to a database of human genotype correlations; d) providing the results of determining the genotype correlations of the individual to the individual in an encrypted manner; e) updating the database of human genotype correlations with additional human genotype correlations when the additional human genotype correlations become known; f) updating the genotype correlations of the individual by comparing the genomic profile of the individual, or a portion thereof, to the additional human genotype correlations and determining additional genotype correlations for the individual; and g) providing the individual or a health care manager of the individual with the results of updating the genotype correlations of the individual.

Another aspect of the invention is a method of generating a phenotype profile of an individual, the method comprising: a) providing a rule set comprising rules, each rule indicating a correlation between at least one genotype and at least one phenotype; b) providing a data set comprising a genomic profile of each of a plurality of individuals, wherein each genomic profile comprises a plurality of genotypes; c) periodically updating the rule set with at least one new rule, wherein the at least one new rule indicates a correlation between a genotype and a phenotype not previously associated with each other in the rule set; d) applying each new rule to the genomic profile of at least one individual, thereby correlating at least one genotype with at least one phenotype of the individual; and optionally, e) generating a report comprising the phenotype profile of the individual.

The present invention also provides a system comprising: a) a rule set comprising rules, each rule indicating a correlation between at least one genotype and at least one phenotype; b) code for periodically updating the rule set with at least one new rule, wherein the at least one new rule indicates a correlation between genotypes and phenotypes not previously associated with each other in the rule set; c) a database comprising genomic profiles of a plurality of individuals; d) code for applying the rule set to a genomic profile of the individual to determine a phenotypic profile of the individual; and e) code to generate a report for each individual.
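The components listed above (a rule set, code for updating it, a database of genomic profiles, code for applying rules, and code for generating reports) can be pictured with a minimal Python sketch. All class names, field names, marker identifiers, and numbers here are hypothetical illustrations and are not taken from the specification; the single-SNP exact-match rule is an assumed simplification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """One genotype-phenotype correlation (all field names are illustrative)."""
    snp_id: str      # hypothetical marker identifier
    genotype: str    # genotype the rule matches, e.g. "AG"
    phenotype: str   # phenotype the genotype is correlated with
    effect: float    # numerical value attached to the rule, e.g. a relative risk


@dataclass
class ProfileStore:
    """Stored genomic profiles plus the phenotype profiles derived from them."""
    rules: list = field(default_factory=list)
    genomic: dict = field(default_factory=dict)     # person -> {snp_id: genotype}
    phenotypic: dict = field(default_factory=dict)  # person -> {phenotype: effect}

    def add_profile(self, person, genotypes):
        """Store a genomic profile and apply every existing rule to it."""
        self.genomic[person] = dict(genotypes)
        self.phenotypic[person] = {}
        for rule in self.rules:
            self._apply(rule, person)

    def add_rule(self, rule):
        """Add a new rule and apply only that rule to all stored profiles.

        Returns the people whose phenotype profile changed.
        """
        self.rules.append(rule)
        return [p for p in self.genomic if self._apply(rule, p)]

    def _apply(self, rule, person):
        """Record the rule's phenotype if the person's genotype matches."""
        if self.genomic[person].get(rule.snp_id) != rule.genotype:
            return False
        self.phenotypic[person][rule.phenotype] = rule.effect
        return True

    def report(self, person):
        """Return a plain-text report of the person's phenotype profile."""
        lines = [f"Phenotype profile for {person}"]
        for phenotype, effect in sorted(self.phenotypic[person].items()):
            lines.append(f"  {phenotype}: {effect}")
        return "\n".join(lines)


# Made-up data, for illustration only.
store = ProfileStore()
store.add_rule(Rule("snp_1", "AG", "phenotype_X", 1.4))
store.add_profile("person_a", {"snp_1": "AG", "snp_2": "CC"})
store.add_profile("person_b", {"snp_1": "GG", "snp_2": "CT"})
updated = store.add_rule(Rule("snp_2", "CT", "phenotype_Y", 2.0))
print(updated)
print(store.report("person_b"))
```

Because the genomic profiles stay in the store, adding a rule later only requires re-reading stored genotypes, not re-genotyping a sample.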

In another aspect of the present invention, transmission over a network in the above-described methods and systems may be performed in an encrypted or unencrypted manner.

Incorporation by Reference

All publications and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.

Drawings

FIG. 1 is a flow diagram illustrating a method aspect of the invention.

FIG. 2 is an example of a means for controlling the quality of genomic DNA.

FIG. 3 shows an example of hybridization quality control means.

FIG. 4 is a table of typical genotype correlations from publications, with the SNPs tested and effect estimates. A-I) show single-locus genotype correlations; J) shows two-locus genotype correlations; K) shows three-locus genotype correlations; L) is an index of the ethnicity and country abbreviations used in A-K; M) is an index of the Short Phenotype Name abbreviations used in A-K, with heritability and references for the heritability.

FIGS. 5A-J are tables with typical genotype correlations with effect evaluation.

FIGS. 6A-F are tables of typical genotype correlations and estimated relative risk.

FIG. 7 is an example report.

FIG. 8 is an illustration of a system for analyzing and transmitting genomic and phenotypic profiles over a network.

FIG. 9 is a flow chart illustrating a business method aspect of the invention.

FIG. 10: effect of the prevalence estimate on relative risk estimates. Assuming Hardy-Weinberg equilibrium, each curve corresponds to a different value of allele frequency in the population. The two black lines correspond to odds ratios of 9 and 6, the two red lines correspond to odds ratios of 6 and 4, and the two blue lines correspond to odds ratios of 3 and 2.

FIG. 11: effect of the allele frequency estimate on relative risk estimates. Each curve corresponds to a different value of prevalence in the population. The two black lines correspond to odds ratios of 9 and 6, the two red lines correspond to odds ratios of 6 and 4, and the two blue lines correspond to odds ratios of 3 and 2.

FIG. 12: pairwise comparison of absolute values of different models.

FIG. 13: pairwise comparisons of rank values (GCI scores) based on different models. Spearman correlations between the different pairs are given in Table 2.

FIG. 14: effect of prevalence reporting on GCI scores. The Spearman correlation between any two prevalence values is at least 0.99.

FIG. 15: a diagram of an example web page from a personal portal.

FIG. 16: a diagram of an example web page from a personal portal illustrating a person's risk of prostate cancer.

FIG. 17: a diagram of an example web page from a personal portal illustrating a person's risk of Crohn's disease.

FIG. 18: histogram of HapMap-based GCI scores for multiple sclerosis using 2 SNPs.

FIG. 19: lifetime risk of multiple sclerosis for an individual, using GCI Plus.

FIG. 20: histogram of GCI scores for Crohn's disease.

FIG. 21: table of multi-locus correlations.

FIG. 22: table of SNPs and phenotypic associations.

FIG. 23: table of phenotypes and prevalence.

FIG. 24: glossary of abbreviations used in FIGS. 21, 22 and 25.

FIG. 25: table of SNPs and phenotypic associations.

Detailed Description

The present invention provides methods and systems for generating phenotypic profiles based on stored genomic profiles of individuals or groups of individuals, and for conveniently generating original and updated phenotypic profiles based on stored genomic profiles. The genomic profile is generated by genotyping a biological sample obtained from the individual. The biological sample obtained from an individual may be any sample from which a genetic sample may be derived. The sample may be from a buccal swab, saliva, blood, hair, or any other type of tissue sample. The genotype may then be determined from the biological sample. The genotype may be any genetic variant or biomarker, for example, single nucleotide polymorphisms (SNPs), haplotypes, or sequences of the genome. The genotype may be the entire genomic sequence of the individual. Genotypes can be derived from high-throughput analysis that generates thousands or millions of data points, e.g., microarray analysis for most or all known SNPs. In other embodiments, the genotype may also be determined by high-throughput sequencing.

The genotypes form the individual's genomic profile. Genomic profiles are stored digitally and can be readily accessed at any point in time to generate phenotype profiles. A phenotype profile is generated by applying rules that correlate or associate a genotype with a phenotype. Rules may be formulated based on scientific studies that indicate a correlation between a genotype and a phenotype. The correlation may be validated or confirmed by a committee of one or more experts. By applying rules to the genomic profile of an individual, associations between the genotype and phenotype of the individual can be determined. The individual's phenotype profile will include this determination. The determination may be a positive correlation between the genotype of the individual and a given phenotype, such that the individual has the given phenotype or will develop the phenotype. Alternatively, it may be determined that the individual does not have or will not develop a given phenotype. In other embodiments, the determination may be a risk factor, an estimate, or a probability that the individual has or will develop a phenotype.

The determination may be based on multiple rules; for example, several rules may be applied to a genomic profile to determine the association of an individual's genotype with a particular phenotype. The determination may also incorporate individual-specific factors such as race, gender, lifestyle (e.g., diet and exercise habits), age, environment (e.g., location of residence), family medical history, personal medical history, and other known phenotypes. These factors may be incorporated by modifying existing rules. Alternatively, separate rules may be generated from these factors and applied to the individual's phenotypic determination after existing rules have been applied.

A phenotype may include any measurable trait or characteristic, such as susceptibility to a disease or response to a drug treatment. Other phenotypes that may be included are physical and mental traits such as height, weight, hair color, eye color, sunburn sensitivity, size, memory, intelligence, optimism, and overall temperament. Phenotypes may also include genetic comparisons with other individuals or organisms. For example, individuals may be interested in the similarity between their genomic profile and that of celebrities. They may also compare their genomic profile to those of other organisms (e.g., bacteria, plants, or other animals).

Together, the collection of correlated phenotypes determined for an individual constitutes the phenotype profile for that individual. The phenotype profile may be accessed through an online portal, which may optionally be an encrypted online portal. Alternatively, the phenotype profile may be provided in paper form as it existed at a particular time, with subsequent updates also being provided in paper form. Access to the phenotype profile may be provided to registered users who subscribe to a service that generates rules for correlations between phenotypes and genotypes, determines a genomic profile of an individual, applies the rules to the genomic profile, and generates a phenotype profile of the individual. Access may also be provided to non-registered users, who may have limited rights to access their phenotype profiles and/or reports, or may be allowed to generate an initial report or phenotype profile but obtain updated reports only through a paid subscription. Healthcare managers and providers, such as caregivers, physicians, and genetic counselors, may also have access to the phenotype profile.

In another aspect of the invention, genomic profiles may be generated for registered and non-registered users and stored digitally, but access to phenotypic profiles and reports may be limited to registered users. In another variation, both registered and non-registered users may have access to their genotype and phenotype profiles, but non-registered users have restricted access or may be allowed to generate only limited reports, whereas registered users have full access and may be allowed to generate full reports. In another embodiment, registered and non-registered users may initially have full access or full initial reports, but only registered users may access reports updated based on their stored genomic profile.
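As a minimal sketch of the last variation above (everyone may see an initial report, but only registered users may see updated reports), the access check could look like the following. The enum names and the `may_access` function are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Tier(Enum):
    NON_REGISTERED = "non-registered"
    REGISTERED = "registered"


class ReportKind(Enum):
    INITIAL = "initial"
    UPDATED = "updated"


def may_access(tier, kind):
    """Everyone may see the initial report; updates require registration."""
    if kind is ReportKind.INITIAL:
        return True
    return tier is Tier.REGISTERED


# Print the full access matrix for the two tiers and two report kinds.
for tier in Tier:
    for kind in ReportKind:
        print(tier.value, kind.value, may_access(tier, kind))
```

The other variations described above would change only the body of `may_access`, for example by denying non-registered users all phenotype reports.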

In another aspect of the invention, information regarding the association of a plurality of genetic markers with one or more diseases or conditions is combined and analyzed to obtain a Genetic Composite Index (GCI) score. This score incorporates known risk factors as well as other information and assumptions, such as allele frequencies and the prevalence of the disease. The GCI can be used to quantitatively assess the association of a disease or condition with the combined effects of a set of genetic markers. The GCI score can be used to give persons without training in genetics a reliable (e.g., robust), understandable, and/or intuitive sense of their individual risk of disease compared with a relevant population, based on existing scientific studies. The GCI score may be used to generate a GCI Plus score. The GCI Plus score may incorporate all of the GCI assumptions, together with the risk of the condition (e.g., lifetime risk), age-defined prevalence, and/or age-defined incidence. The lifetime risk of an individual can then be calculated as a GCI Plus score, which is proportional to the individual's GCI score divided by the average GCI score. The average GCI score may be determined from a group of individuals with a similar ancestral background, such as a group of Caucasians, Asians, East Indians, or another group with a common ancestral background. The group may consist of at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55 or 60 individuals. In certain embodiments, the average GCI score may be determined from at least 75, 80, 95, or 100 individuals. The GCI Plus score can be determined by determining the GCI score of the individual, dividing the GCI score by the average relative risk, and multiplying by the lifetime risk of the condition or phenotype. For example, the GCI Plus score may be calculated using the data from FIG. 22 and/or FIG. 25 and the information in FIG. 24, as in FIG. 19.
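The GCI Plus arithmetic described above (individual GCI score, divided by an average, multiplied by lifetime risk) can be sketched in a few lines of Python. This sketch assumes the divisor is the mean GCI score of a reference group; the function name, parameter names, and all numbers are hypothetical, and how the GCI score itself is computed is outside the sketch.

```python
from statistics import mean


def gci_plus(individual_gci, reference_gci_scores, lifetime_risk):
    """Scale an individual's GCI score into a lifetime-risk estimate.

    reference_gci_scores: GCI scores of a reference group with a shared
    ancestral background; lifetime_risk: population lifetime risk (0..1).
    """
    if not reference_gci_scores:
        raise ValueError("reference group must not be empty")
    if not 0.0 <= lifetime_risk <= 1.0:
        raise ValueError("lifetime_risk must be a probability")
    average_gci = mean(reference_gci_scores)
    # GCI Plus = (individual GCI / average GCI) * lifetime risk
    return individual_gci / average_gci * lifetime_risk


# Made-up numbers, for illustration only.
reference = [0.8, 1.0, 1.2, 1.0]
score = gci_plus(1.5, reference, lifetime_risk=0.1)
print(round(score, 3))
```

An individual whose GCI score equals the group average gets exactly the population lifetime risk back. The sketch does not cap the result at 1, which a real implementation would need to consider.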

The present invention encompasses the use of the GCI scores described herein, and one of skill in the art will readily recognize the use of GCI Plus scores or variants thereof in place of the GCI scores described herein.

In one embodiment, a GCI score is generated for each disease or condition of interest. These GCI scores can be pooled to form a risk profile for the individual. The GCI scores may be stored digitally so that they can be conveniently accessed at any point in time to generate a risk profile. The risk profile may be broken down by broad disease category, such as cancer, heart disease, metabolic disorders, mental disorders, bone disease, or age-onset disorders. Broad disease categories can be further broken down into subclasses. For example, within the broad category of cancer, subclasses may be listed by type (sarcoma, carcinoma, leukemia, etc.) or by tissue specificity (nerve, breast, ovary, testis, prostate, bone, lymph node, pancreas, esophagus, stomach, liver, brain, lung, kidney, etc.).
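One way to picture a risk profile broken down by broad category and subclass is a nested mapping. The sketch below is an assumption about representation only; the `CATEGORY` table, the function name, and the scores are made up for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from condition to (broad category, subclass).
CATEGORY = {
    "prostate cancer": ("cancer", "prostate"),
    "breast cancer": ("cancer", "breast"),
    "myocardial infarction": ("heart disease", None),
}


def build_risk_profile(gci_scores):
    """Group per-condition GCI scores by broad category, then by subclass."""
    profile = defaultdict(dict)
    for condition, score in gci_scores.items():
        category, subclass = CATEGORY.get(condition, ("uncategorized", None))
        # Fall back to the condition name when no subclass is defined.
        key = subclass if subclass is not None else condition
        profile[category][key] = score
    return dict(profile)


risk_profile = build_risk_profile(
    {"prostate cancer": 1.3, "myocardial infarction": 0.9, "breast cancer": 1.1}
)
print(risk_profile)
```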

In another embodiment, a GCI score is generated for the individual that provides readily understandable information about the individual's risk of acquiring, or susceptibility to, at least one disease or condition. In one embodiment, multiple GCI scores are generated for different diseases or conditions. In another embodiment, at least one GCI score may be accessed through an online portal. Alternatively, the at least one GCI score may be provided in paper form, with subsequent updates also being provided in paper form. In one embodiment, access to at least one GCI score is provided to registered users, who are individuals subscribed to the service. In an alternative embodiment, access is provided to non-registered users, who may have limited rights to access at least one of their GCI scores, or may be allowed to generate an initial report of at least one of their GCI scores but obtain updated reports only through a paid subscription. In another embodiment, healthcare managers and providers, such as caregivers, doctors, and genetic counselors, may also have access to at least one of the individual's GCI scores.

There may also be a basic level of registration. The basic registration may provide a phenotype profile in which registered users may choose to apply all existing rules to their genomic profile, or to apply a subset of existing rules. For example, they may choose to apply only rules for actionable disease phenotypes. The basic registration may itself have different levels. For example, the different levels may depend on the number of phenotypes registered users want to associate with their genomic profile, or on the number of people who can access their phenotype profile. Another level of basic registration may incorporate individual-specific factors, such as already known phenotypes (e.g., age, gender, or medical history), into the phenotype profile. Yet another level of basic registration may allow an individual to generate at least one GCI score for a disease or condition. A variant of this level may further allow an individual to specify that an automatic update of the at least one GCI score be generated whenever the score changes because of a change in the analysis used to generate it. In some implementations, the individual can be notified of the automatic update by email, voice message, text message, postal delivery, or facsimile.

Registered users may also generate reports with their phenotype profiles and information about the phenotypes (e.g., genetic and medical information about the phenotypes). For example, the prevalence of a phenotype in a population, the genetic variants used for the correlation, the molecular mechanisms that cause the phenotype, treatment methods and treatment options for the phenotype, and preventive actions may be included in the report. In other embodiments, the report may also include information such as the similarity between the genotype of the individual and the genotypes of other individuals (e.g., celebrities or other well-known individuals). Information about similarity can be, but is not limited to, percent homology, number of identical variants, and phenotypes that may be similar. The reports may further include at least one GCI score.

If the report is accessed online, the report may also provide a link to other locations with further information about the phenotype, a link to online support groups and message boards of people with the same phenotype or one or more similar phenotypes, a link to contact an online genetic counselor or physician, or a link to schedule a telephone call or in-person appointment with a genetic counselor or physician. If the report is in paper form, the information may be the location of the linked site or the telephone number and address of the genetic counselor or physician. Registered users can also select which phenotypes to include in their phenotype profile and which information to include in their reports. The profile and report may also be made available to the individual's health care manager or provider, such as a caregiver, doctor, psychiatrist, psychologist, therapist, or genetic counselor. The registered user can also choose whether the profile and report, or portions thereof, are available to the individual's healthcare manager or provider.

The present invention may also include a premium level of registration. At the premium level, the registered user's genomic profile is maintained digitally after the initial phenotype profile and report are generated, and the registered user can generate phenotype profiles and reports with updated correlations from recent studies. In another embodiment, registered users can generate risk profiles and reports using updated correlations from recent studies. As studies reveal new correlations between genotypes and phenotypes, diseases or conditions, new rules will be generated based on these new correlations and can be applied to genomic profiles that have been stored and maintained. The new rules may associate genotypes that have not previously been associated with any phenotype, associate genotypes with new phenotypes, correct existing correlations, or provide a basis for adjusting GCI scores based on newly discovered associations between genotypes and diseases or conditions. Registered users may be notified of the new correlations via email or other electronic means, and if the phenotype is of interest, they may choose to update their phenotype profile with the new correlations. Registered users may select a registration mode in which they pay for each update, for multiple updates within a specified time period (e.g., 3 months, 6 months, or 1 year), or for unlimited updates. At another level of registration, rather than the individual selecting when to update their phenotype profile or risk profile, the registered user's phenotype profile or risk profile is updated automatically whenever new rules are generated based on new correlations.

In another aspect of registration, a registered user may refer a non-registered user to a service that generates rules for correlations between phenotypes and genotypes, determines a genomic profile of the individual, applies the rules to the genomic profile, and generates a phenotype profile of the individual. In return for the referral, the registered user may receive a preferential subscription price or an upgrade of their existing registration. Referred individuals may have free access for a limited period of time or may enjoy a discounted registration fee.

Phenotype profiles and reports and risk profiles and reports may be generated for both human and non-human individuals. For example, the individual may be another mammal, such as a cow, horse, sheep, dog, or cat. As used herein, a registered user is a human individual who subscribes to a service by purchasing or paying for one or more services. Services may include, but are not limited to, one or more of the following: determining a genomic profile of themselves or another individual (e.g., a registered user's child or pet); obtaining a phenotype profile; updating phenotype profiles; and obtaining reports based on their genomic and phenotype profiles.

In another aspect of the invention, information gathered from individuals through a "field-deployed" mechanism can be used to generate phenotype profiles for the individuals. In a preferred embodiment, the individual may have an initial phenotype profile generated based on genetic information. For example, an initial phenotype profile is generated that includes risk factors for different phenotypes and suggested therapeutic or preventive measures. For example, the phenotype profile may include information about available medications for a condition and/or recommendations for dietary changes or exercise regimens. Individuals may choose to see or contact a doctor or genetic counselor through a web portal or by telephone to discuss their phenotype profile. The individual may decide to take a certain course of action, e.g., take a specific medication or change their diet.

The individual may subsequently submit a biological sample to assess changes in their physical state and possible changes in risk factors. The individual may have the change determined by submitting the biological sample directly to the entity that generates the genomic profile and the phenotype profile (or to a related entity, such as an entity contracted by the entity that generates the genomic profile and the phenotype profile). Alternatively, the individual may use a "field-deployed" mechanism, in which the individual submits their saliva, blood, or other biological sample to a detection device at their home, the sample is analyzed by a third party, and the data are transmitted for inclusion in another phenotype profile. For example, an individual may receive an initial phenotype report, based on their genetic data, reporting that the individual has an increased lifetime risk of myocardial infarction (MI). The report may also include recommendations for preventive measures to reduce the risk of MI, such as cholesterol-lowering drugs and dietary changes. The individual may choose to contact a genetic counselor or physician to discuss the report and preventive measures, and decide to change their diet. After following the new diet for a period of time, the individual may visit their physician to have their cholesterol level measured. The new information (cholesterol level) may be transmitted (e.g., via the Internet) to the entity holding the genomic information and used to generate a new phenotype profile for the individual, including new risk factors for myocardial infarction and/or other conditions.

Individuals may also use "field-deployed" mechanisms or direct mechanisms to determine their individual response to a particular drug treatment. For example, an individual may have their response to a drug measured, and this information may be used to determine a more effective treatment. Information that can be measured includes, but is not limited to, metabolite levels, glucose levels, ion levels (e.g., calcium, sodium, potassium, iron), vitamins, blood cell counts, body mass index (BMI), protein levels, transcript levels, heart rate, etc. These can be determined by readily available methods and can be included in algorithms that combine them with the initial genomic profile to determine a revised overall risk assessment score.

The term "biological sample" refers to any biological sample that can be isolated from an individual, including samples from which genetic material can be isolated. As used herein, "genetic sample" refers to DNA and/or RNA obtained from or derived from an individual.

As used herein, the term "genome" is intended to mean the entire set of chromosomal DNA found in the nucleus of a human cell. The term "genomic DNA" refers to one or more chromosomal DNA molecules, or a portion of a chromosomal DNA molecule, that naturally occurs in the nucleus of a human cell.

The term "genomic profile" refers to a set of information about an individual's genes, such as the presence or absence of a particular SNP or mutation. The genomic profile includes the genotype of the individual. The genomic profile may also be a substantially complete genomic sequence of an individual. In some embodiments, the genomic profile may be at least 60%, 80%, or 95% of the entire genomic sequence of the individual. The genomic profile may be about 100% of the entire genomic sequence of an individual. When referring to a genomic profile, "a portion thereof" refers to the genomic profile of a subset of the whole genome.

The term "genotype" refers to the specific genetic composition of an individual's DNA. The genotype may include genetic variants and genetic markers of the individual. Genetic markers and genetic variants may include nucleotide repeats, nucleotide insertions, nucleotide deletions, chromosomal translocations, chromosomal duplications, or copy number variations. Copy number variations may include microsatellite repeats, nucleotide repeats, centromeric repeats or telomeric repeats. The genotype may also be a SNP, haplotype or diplotype. A haplotype may refer to a locus or an allele. A haplotype can also refer to a set of single nucleotide polymorphisms (SNPs) on a single chromatid that are statistically associated. A diplotype is a set of haplotypes.

The term single nucleotide polymorphism, or "SNP," refers to a particular locus on a chromosome that exhibits variability (e.g., of at least one percent (1%)) in the identity of the nitrogenous base present at that locus within the human population. For example, where one individual may have adenosine (A) at a particular nucleotide position of a given gene, another individual may have cytosine (C), guanine (G) or thymine (T) at that position, such that a SNP is present at that particular position.
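To make the 1% variability example concrete, the following Python sketch checks whether a locus varies in at least a given fraction of observations. It assumes that "variability" is measured as the share of observations that differ from the most common base, which is one possible reading; the function name and the sample data are made up for illustration.

```python
from collections import Counter


def is_snp(bases, threshold=0.01):
    """Return True if the locus varies in at least `threshold` of observations.

    bases: the base observed at one locus on each sampled chromosome.
    A locus counts as polymorphic here when the bases other than the most
    common one make up at least `threshold` of the sample.
    """
    if not bases:
        raise ValueError("no observations")
    counts = Counter(bases)
    most_common_count = counts.most_common(1)[0][1]
    variant_fraction = 1 - most_common_count / len(bases)
    return variant_fraction >= threshold


print(is_snp(["A"] * 95 + ["G"] * 5))    # 5% carry the alternative base
print(is_snp(["A"] * 999 + ["G"] * 1))   # 0.1% carry the alternative base
```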

As used herein, the term "SNP genomic profile" refers to the base content of a given individual's DNA at the SNP locations throughout the individual's entire genomic DNA sequence. "SNP profile" refers to a complete genomic profile, or to a portion thereof, such as a more localized SNP profile that may be associated with a particular gene or a particular set of genes.

The term "phenotype" is used to describe a quantitative trait or characteristic of an individual. Phenotypes include, but are not limited to, medical and non-medical conditions. Medical conditions include diseases and disorders. Phenotypes may also include physical traits such as hair color, physiological traits such as lung capacity, mental traits such as memory retention, emotional traits such as the ability to control anger, ethnic characteristics such as ethnic background, ancestral characteristics such as an individual's place of origin, and age characteristics such as life expectancy or the age of onset of different phenotypes. Phenotypes may also be monogenic, where it is believed that one gene is associated with the phenotype, or polygenic, where more than one gene is associated with the phenotype.

"rules" are used to define the correlation between genotype and phenotype. The rules may define the relevance by a numerical value, such as by a percentage, risk factor, or confidence score. The rules may include correlations of multiple genotypes with phenotypes. A "rule set" includes more than one rule. A "new rule" may be a rule that indicates a correlation between a genotype and a phenotype for which the rule does not currently exist. The new rules may associate unassociated genotypes with phenotypes. The new rules may also associate genotypes that have been associated with a phenotype with a previously unassociated phenotype. The "new rule" may also be an existing rule that is modified by other factors, including another rule. Existing rules may be modified due to known characteristics of the individual, such as race, family, geography, gender, age, family history, or other previously determined phenotype.

As used herein, "genotype correlation" refers to the statistical correlation between an individual's genotype (e.g., the presence of a certain mutation or mutations) and the likelihood of being predisposed to a phenotype (e.g., a particular disease, condition, physical state, and/or mental state). The frequency with which a particular phenotype is observed in the presence of a particular genotype determines the degree of genotype correlation, or the likelihood that the phenotype will occur. For example, as detailed herein, SNPs that result in the apolipoprotein E4 isoform are correlated with a predisposition to early onset Alzheimer's disease. Genotype correlations may also refer to correlations in which there is no predisposition to a phenotype, or to negative correlations. Genotype correlations may also indicate an estimate that an individual has a phenotype or is predisposed to developing a phenotype. Genotype correlations can be represented by numerical values, such as percentages, relative risk factors, effect estimates, or confidence scores.

The term "phenotype profile" refers to a collection of multiple phenotypes associated with a genotype or genotypes of an individual. The phenotype profile may include information generated by applying one or more rules to the genomic profile or information about genotype correlations applied to the genomic profile. A phenotype profile may be generated by applying rules that relate multiple genotypes to a phenotype. The probability or the evaluation can be expressed as a numerical value, for example as a percentage, as a numerical risk factor or as a numerical confidence interval. The probability may also be expressed as high, medium, or low. The phenotype profile may also indicate the presence or risk of a phenotype. For example, the phenotype profile may indicate the presence of blue eyes or a high risk of developing diabetes. The phenotypic profile may also indicate a predicted prognosis, therapeutic effect, or response to treatment of the medical condition.

The term "risk profile" refers to a collection of GCI scores for more than one disease or condition. GCI scores are based on analysis of associations between an individual's genotype and one or more diseases or conditions. The risk profile may display GCI scores grouped by disease category. Further, the risk profile may show how GCI scores are predicted to change as the individual ages or as various risk factors are adjusted. For example, the GCI score for a particular disease may take into account changes in diet or the effects of preventive measures taken (e.g., smoking cessation, medication, bilateral radical mastectomy, hysterectomy). The GCI score may be displayed as a numerical metric, a graphical display, auditory feedback, or a combination of any of the foregoing.

As used herein, the term "online portal" refers to a source of information that an individual conveniently accesses through a computer and Internet website, telephone, or other means that allows similar access to the information. The online portal may be an encrypted website. The website may provide links to other encrypted and unencrypted websites, such as links to encrypted websites having a phenotype profile of the individual or links to unencrypted websites (e.g., message boards of individuals sharing a particular phenotype).

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology, cell biology, biochemistry and immunology, which are within the skill of the art. These conventional techniques include nucleic acid isolation, polymer array synthesis, hybridization, ligation, and detection of hybridization using a label. Specific illustrations of suitable techniques can be had by reference to the examples herein. However, other equivalent conventional methods may also be used. Other conventional techniques and instructions can be found in standard laboratory manuals and literature, for example: Genome Analysis: A Laboratory Manual Series (Vols. I-IV), PCR Primer: A Laboratory Manual, and Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press); Stryer, L. (1995) Biochemistry (4th Ed.), Freeman, New York; Gait, "Oligonucleotide Synthesis: A Practical Approach," 1984, IRL Press, London; Nelson and Cox (2000), Lehninger, Principles of Biochemistry, 3rd Ed., W.H. Freeman Pub., New York, N.Y.; and Berg et al. (2002) Biochemistry, 5th Ed., W.H. Freeman Pub., New York, N.Y., all of which are incorporated herein by reference in their entirety.

The methods of the invention include analyzing the genomic profile of an individual to provide the individual with molecular information about phenotypes. As detailed herein, an individual provides a genetic sample from which a personal genomic profile is generated. The individual's genomic profile is queried for genotype correlations by comparing it to a database of established and validated human genotype correlations. The database of established and validated genotype correlations may be drawn from peer-reviewed literature and further reviewed and validated by a committee of one or more experts in the field, such as geneticists, epidemiologists or statisticians. In a preferred embodiment, rules are formulated based on the validated genotype correlations and applied to the genomic profile of the individual to generate a phenotype profile. The results of the analysis of the individual's genomic profile (the phenotype profile) are provided to the individual or the individual's healthcare manager along with explanatory and supportive information, thereby enabling personalized choices for the individual's healthcare.

The method of the invention is described in detail in FIG. 1, wherein a genomic profile of an individual is first generated. The individual's genomic profile will include information about the individual's genes based on genetic variations and genetic markers. Genetic variations are genotypes, which make up the genomic profile. Such genetic variations or genetic markers include, but are not limited to, single nucleotide polymorphisms, single and/or multiple nucleotide repeats, single and/or multiple nucleotide deletions, microsatellite repeats (short nucleotide repeats, typically with 5 to 1,000 repeat units), dinucleotide repeats, trinucleotide repeats, sequence rearrangements (including translocations and duplications), copy number variations (both deletions and additions at specific loci), and the like. Other genetic variations include chromosomal duplications and translocations as well as centromeric and telomeric repeats.

Genotypes may also include haplotypes and diplotypes. In some embodiments, the genomic profile may have at least 100,000, 300,000, 500,000, or 1,000,000 genotypes. In some embodiments, the genomic profile may be substantially the entire genomic sequence of an individual. In other embodiments, the genomic profile is at least 60%, 80%, or 95% of the entire genomic sequence of the individual. The genomic profile may be about 100% of the entire genomic sequence of an individual. Genetic samples containing the target substance include, but are not limited to, unamplified genomic DNA or RNA samples or amplified DNA (or cDNA). The target substance may be a specific region of genomic DNA comprising a genetic marker of particular interest.

In step 102 of FIG. 1, a genetic sample of an individual is isolated from a biological sample of the individual. These biological samples include, but are not limited to, blood, hair, skin, saliva, semen, urine, fecal material, sweat, buccal samples, and various body tissues. In some embodiments, the tissue sample may be collected directly from the individual, e.g., a buccal sample may be obtained by swabbing the inside of the individual's cheek. Other samples such as saliva, semen, urine, fecal material, or sweat may also be provided by the individuals themselves. Other biological samples may be taken by a health care professional (e.g., a phlebotomist, nurse, or doctor). For example, a blood sample may be drawn from an individual by a nurse. Tissue biopsies can be performed by health care professionals, and health care professionals can also use the kit to obtain a sample efficiently. Small cylindrical skin samples may be removed, or small tissue or fluid samples may be removed using a needle.

In some embodiments, a kit is provided to an individual having a sample collection container for a biological sample of the individual. The kit may also provide instructions for the individual to directly collect their own sample, such as how much hair, urine, sweat or saliva to provide. The kit may also include instructions for the individual to request that a tissue sample be extracted by a health care professional. The kit may include a location where the sample may be collected by a third party, for example, the kit may be provided to a healthcare facility where the sample is subsequently collected from the individual. The kit may also provide a return package for delivering the sample to a sample processing facility where the genetic material is isolated from the biological sample (step 104).

Genetic samples of DNA or RNA can be isolated from biological samples according to any of several known biochemical and molecular biological methods; see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, N.Y.) (1989). There are also several commercially available kits and reagents for isolating DNA or RNA from biological samples, such as those available from DNA Genotek, Gentra Systems, Qiagen, Ambion, and other suppliers. Buccal sample kits are readily commercially available, e.g., the MasterAmp™ Buccal Swab DNA extraction kit from Epicentre Biotechnologies, as are kits for extracting DNA from blood samples, e.g., Extract-N-Amp™ from Sigma Aldrich. DNA from other tissues can be obtained by digesting the tissue with proteases and heat treating it, centrifuging the sample, and extracting unwanted material with phenol-chloroform, leaving the DNA in the aqueous phase. The DNA may then be further isolated by ethanol precipitation.

In a preferred embodiment, genomic DNA is isolated from saliva. For example, using DNA self-collection kit technology available from DNA Genotek, individuals collect saliva samples for clinical processing. The samples can be conveniently stored and transported at room temperature. After the sample is delivered to the appropriate laboratory for processing, the DNA is isolated by heat denaturation and protease digestion of the sample (typically for at least 1 hour at 50 °C, using reagents supplied by the collection kit supplier). The sample is then centrifuged and the supernatant is ethanol precipitated. The DNA pellet is suspended in a buffer suitable for subsequent analysis.

In another embodiment, RNA can be used as the genetic sample. In particular, expressed genetic variations can be identified from mRNA. The term "messenger RNA" or "mRNA" includes, but is not limited to, pre-mRNA transcripts, transcript processing intermediates, mature mRNA ready for translation, transcripts of a gene or genes, and nucleic acids derived from mRNA transcripts. Transcript processing may include splicing, editing, and degradation. As used herein, a nucleic acid derived from an mRNA transcript refers to a nucleic acid for whose synthesis the mRNA transcript, or a subsequence thereof, ultimately served as a template. Thus, cDNA reverse transcribed from mRNA, DNA amplified from the cDNA, RNA transcribed from the amplified DNA, and the like are all derived from the mRNA transcript. RNA can be isolated from any of several body tissues using methods known in the art, e.g., RNA can be isolated from unfractionated whole blood using the PAXgene™ Blood RNA System available from PreAnalytiX. Typically, mRNA will be used to reverse transcribe cDNA, which is then used or amplified for gene variation analysis.

Prior to genomic profiling, the genetic sample is typically amplified, either from DNA or from cDNA reverse transcribed from RNA. DNA can be amplified by a variety of methods, many of which use PCR. See, for example, PCR Technology: Principles and Applications for DNA Amplification (Ed. H.A. Erlich, Freeman Press, NY, N.Y., 1992); PCR Protocols: A Guide to Methods and Applications (Eds. Innis et al., Academic Press, San Diego, Calif., 1990); Mattila et al., Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR (Eds. McPherson et al., IRL Press, Oxford); and U.S. Pat. Nos. 4,683,202, 4,683,195, 4,800,159, 4,965,188, and 5,333,675, each of which is incorporated herein by reference in its entirety.

Other suitable amplification methods include the ligase chain reaction (LCR) (e.g., Wu and Wallace, Genomics, 4, 560 (1989), Landegren et al., Science, 241, 1077 (1988) and Barringer et al., Gene, 89: 117 (1990)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86: 1173-1177 (1989) and WO88/10315), self-sustained sequence replication (Guatelli et al., Proc. Nat. Acad. Sci. USA, 87: 1874-1878 (1990) and WO90/06995), selective amplification of target polynucleotide sequences (U.S. Pat. No. 6,410,276), consensus sequence primed polymerase chain reaction (CP-PCR) (U.S. Pat. No. 4,437,975), arbitrarily primed polymerase chain reaction (AP-PCR) (U.S. Pat. No. 5,413,909), nucleic acid sequence based amplification, rolling circle amplification (RCA), multiple displacement amplification, and circle-to-circle amplification (C2CA) (Dahl et al., Proc. Natl. Acad. Sci. 101: 4548-4553 (2004)) (see U.S. Pat. Nos. 5,409,818, 5,554,517, and 6,063,603, each of which is incorporated herein by reference). Other amplification methods that may be used are described in U.S. Pat. Nos. 5,242,794, 5,494,810, 5,409,818, 4,988,617, 6,063,603, and 5,554,517, and U.S. patent application No. 09/854,317, each of which is incorporated herein by reference.

The generation of the genomic profile in step 106 is accomplished using any of several methods. Several methods are known in the art for identifying genetic variations, and these include, but are not limited to, DNA sequencing by any of several methods, PCR-based methods, fragment length polymorphism assays (restriction fragment length polymorphism (RFLP), cleavage fragment length polymorphism (CFLP)), hybridization methods using allele-specific oligonucleotides as templates (e.g., the TaqMan PCR method, the Invader method, the DNA chip method), methods using primer extension reactions, mass spectrometry (the MALDI-TOF/MS method), and the like.

In one embodiment, high density DNA arrays are used for SNP identification and profiling. These arrays are commercially available from Affymetrix and Illumina (see Affymetrix GeneChip 500K Assay Manual, Affymetrix, Santa Clara, CA (incorporated by reference); Sentrix humanHap650Y genotyping beadchip, Illumina, San Diego, CA).

For example, SNP profiles can be generated by genotyping more than 900,000 SNPs using the Affymetrix Genome Wide Human SNP Array 6.0. Alternatively, more than 500,000 SNPs can be determined through whole-genome sampling analysis using the Affymetrix GeneChip Human Mapping 500K Array Set. In these assays, a subset of the human genome is amplified by a single primer amplification reaction using restriction enzyme digested, adaptor ligated human genomic DNA. As shown in FIG. 2, the concentration of the ligated DNA can then be determined. The amplified DNA is then fragmented and the quality of the sample is determined before proceeding with step 106. If the sample meets the PCR and fragmentation criteria, the sample is denatured, labeled and then hybridized to a microarray consisting of small DNA probes at specific locations on a coated quartz surface. The amount of label that hybridizes to each probe as a function of the amplified DNA sequence is monitored to generate sequence information and, ultimately, SNP genotypes.

The Affymetrix GeneChip 500K Assay is performed according to the manufacturer's instructions. Briefly, isolated genomic DNA is first digested with NspI or StyI restriction endonuclease. The digested DNA is then ligated to NspI or StyI adaptor oligonucleotides that anneal to NspI- or StyI-restricted DNA, respectively. The adaptor-containing ligated DNA is then amplified by PCR to produce amplified DNA fragments of between about 200 and 1100 base pairs, as confirmed by gel electrophoresis. PCR products that meet the amplification criteria are purified and quantified for fragmentation. The PCR products are fragmented with DNase I for optimal DNA chip hybridization. After fragmentation, the DNA fragments should be less than 250 base pairs, and on average about 180 base pairs, as confirmed by gel electrophoresis. Samples meeting the fragmentation criteria are then labeled with a biotin compound using terminal deoxynucleotidyl transferase. The labeled fragments are then denatured and hybridized to a GeneChip 250K array. After hybridization, the array is stained prior to scanning in a three-step process: streptavidin phycoerythrin (SAPE) staining, followed by an antibody amplification step with biotinylated anti-streptavidin antibody (goat), and a final staining with streptavidin phycoerythrin (SAPE). After labeling, the array is covered with array holding buffer and then scanned with a scanner such as the Affymetrix GeneChip Scanner 3000.

After the Affymetrix GeneChip Human Mapping 500K Array Set is scanned, data analysis is performed according to the manufacturer's instructions, as shown in FIG. 3. Briefly, raw data is acquired using GeneChip Operating Software (GCOS). Data may also be acquired using Affymetrix GeneChip Command Console™. The acquired raw data is then analyzed using GeneChip Genotyping Analysis Software (GTYPE). For the purposes of the present invention, samples with a GTYPE call rate of less than 80% are excluded. The samples are then examined using BRLMM and/or SNiPer algorithm analyses. Samples with a BRLMM call rate of less than 95% or a SNiPer call rate of less than 98% are excluded. Finally, an association analysis is performed, and samples with a SNiPer quality index of less than 0.45 and/or a Hardy-Weinberg p-value of less than 0.00001 are excluded.
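The exclusion criteria above can be sketched as a short filter. This is a minimal illustration in Python, assuming the call rates and quality metrics have already been exported from the analysis software as fractions between 0 and 1. The `SampleQC` class, its field names, and `passes_qc` are hypothetical and are not part of GCOS, GTYPE, BRLMM or SNiPer; the example values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleQC:
    """Hypothetical container for per-sample quality metrics (fractions in 0..1)."""
    sample_id: str
    gtype_call_rate: float
    brlmm_call_rate: Optional[float] = None       # None if BRLMM was not run
    sniper_call_rate: Optional[float] = None      # None if SNiPer was not run
    sniper_quality_index: Optional[float] = None
    hwe_p_value: Optional[float] = None


def passes_qc(qc: SampleQC) -> bool:
    """Return True if the sample survives every exclusion threshold in the text."""
    if qc.gtype_call_rate < 0.80:
        return False
    if qc.brlmm_call_rate is not None and qc.brlmm_call_rate < 0.95:
        return False
    if qc.sniper_call_rate is not None and qc.sniper_call_rate < 0.98:
        return False
    if qc.sniper_quality_index is not None and qc.sniper_quality_index < 0.45:
        return False
    if qc.hwe_p_value is not None and qc.hwe_p_value < 0.00001:
        return False
    return True


# Made-up example samples.
samples = [
    SampleQC("S1", 0.97, brlmm_call_rate=0.99, sniper_call_rate=0.99,
             sniper_quality_index=0.60, hwe_p_value=0.20),
    SampleQC("S2", 0.75),                        # fails the GTYPE call rate
    SampleQC("S3", 0.90, brlmm_call_rate=0.93),  # fails the BRLMM call rate
]
kept = [s.sample_id for s in samples if passes_qc(s)]
print(kept)
```

Metrics are optional here because the text allows BRLMM and/or SNiPer to be run; a metric that was not computed is simply not used to exclude a sample.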

Alternatively or in addition to DNA microarray analysis, genetic variations, such as SNPs and mutations, can be detected by DNA sequencing. DNA sequencing may also be used to sequence a substantial portion or all of an individual's genomic sequence. Traditionally, DNA sequencing has been based on polyacrylamide gel fractionation to resolve a population of chain-terminated fragments (Sanger et al., Proc. Natl. Acad. Sci. USA 74: 5463-5467 (1977)). Alternative methods have been, and continue to be, developed to increase the speed and ease of DNA sequencing. For example, high throughput and single molecule sequencing platforms are commercially available from, or are being developed by, 454 Life Sciences (Branford, CT) (Margulies et al, Nature, (2005) 437: 376-.

After the genomic profile of the individual is generated in step 106, the profile is stored digitally in step 108, and may be stored in an encrypted manner. The genomic profile is encoded in a computer-readable format for storage as part of a data set, and may be stored in a database, where the genomic profile may be "banked" and accessed again at a later time. The data set includes a plurality of data points, where each data point relates to an individual. Each data point may have a plurality of data elements. One data element is a unique identifier that is used to identify the genomic profile of an individual. It may also be a bar code. Another data element is genotype information, such as the SNPs or the nucleotide sequence of the individual's genome. Data elements corresponding to the genotype information can also be included in the data point. For example, if the genotype information includes SNPs identified by microarray analysis, other data elements may include the microarray SNP identification number, the SNP rs number, and the polymorphic nucleotide. Other data elements may be the chromosomal location of the genotype information, quality measures of the data, raw data files, data images, and extracted intensity scores.
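One way to picture such a data point is as a record keyed by a unique identifier and holding per-SNP data elements. The following is a minimal sketch in Python under that assumption; the class names, field names, and example identifiers (other than rs3817198, which appears later in this description) are hypothetical, and encryption and database storage are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SnpCall:
    """Data elements for one SNP identified by microarray analysis."""
    array_snp_id: str                   # microarray SNP identification number
    rs_number: str                      # SNP rs number
    nucleotides: str                    # polymorphic nucleotides, e.g. "CT"
    chromosome: Optional[str] = None    # chromosomal location, if recorded
    position: Optional[int] = None
    quality: Optional[float] = None     # quality measure of the call


@dataclass
class GenomicProfileRecord:
    """One data point of the data set: a single individual's banked profile."""
    unique_id: str                      # unique identifier (could be a bar code)
    calls: Dict[str, SnpCall] = field(default_factory=dict)

    def add_call(self, call: SnpCall) -> None:
        self.calls[call.rs_number] = call

    def genotype(self, rs_number: str) -> Optional[str]:
        """Return the stored nucleotides for an rs number, or None if absent."""
        call = self.calls.get(rs_number)
        return call.nucleotides if call else None


# Hypothetical identifier, array ID and genotype, for illustration only.
record = GenomicProfileRecord(unique_id="ID-000123")
record.add_call(SnpCall(array_snp_id="ARRAY-0001", rs_number="rs3817198",
                        nucleotides="CT", quality=0.98))
print(record.genotype("rs3817198"))
```

Keying the calls by rs number makes it simple to retrieve all or only part of a banked profile in a later analysis.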

Individual-specific factors, such as physical data, medical data, ethnicity, ancestry, geography, gender, age, family history, known phenotypes, demographic data, exposure data, lifestyle data, and behavioral data, may also be included as data elements. For example, these factors may include, but are not limited to, the individual's: place of birth, parents and/or grandparents, relatives' ancestry, place of residence, ancestors' place of residence, environmental conditions, known health conditions, known drug interactions, family health conditions, lifestyle conditions, diet, exercise habits, marital status, and physical measurements (e.g., weight, height, cholesterol level, heart rate, blood pressure, glucose level, and other measurements known in the art). The above factors for the individual's relatives or ancestors (e.g., parents and grandparents) may also be incorporated as data elements and used to determine the individual's risk for a phenotype or condition.

The specific factors may be obtained from a questionnaire or from the individual's healthcare manager. Information from the "banked" profile can then be accessed and used as needed. For example, in the initial assessment of an individual's genotype correlations, the individual's entire information (typically SNPs or other genomic sequences across, or taken from, the entire genome) will be analyzed to determine genotype correlations. In subsequent analyses, all or a portion of the information from the stored or banked genomic profile may be accessed as needed or appropriate.

Comparison of genomic profiles with genotype-associated databases

In step 110, genotype correlations are obtained from the scientific literature. Genotype correlations for genetic variations are determined from analysis of a population of individuals who have been tested for the presence or absence of one or more phenotypic traits of interest and whose genotype profiles have been determined. The alleles of each genetic variation or polymorphism in the genotype profile are then examined to determine whether the presence of a particular allele is associated with the trait of interest. Correlation analysis can be performed by standard statistical methods, and statistically significant correlations between genetic variations and phenotypic characteristics are recorded. For example, it may be determined that the presence of allele A1 at polymorphism A correlates with heart disease. As a further example, it may be found that the combined presence of allele A1 at polymorphism A and allele B1 at polymorphism B correlates with an increased risk of cancer. The results of the analysis may be published in peer-reviewed literature, confirmed by other research groups, and/or analyzed by a committee of experts (e.g., geneticists, statisticians, epidemiologists, and physicians), and may also be validated.

Examples of correlations between genotypes and phenotypes are shown in FIGS. 4, 5 and 6; the rules applied to the genomic profile are based on these correlations. For example, in FIGS. 4A and B, each row corresponds to a phenotype/locus/ethnicity, with FIGS. 4C through I including further information on the correlation for each of these rows. By way of example, the "phenotype short name" BC in FIG. 4A is an abbreviation for breast cancer, as noted in the index of phenotype short names in FIG. 4M. In row BC_4 (the short name of the locus), gene LSP1 is correlated with breast cancer. As shown in FIG. 4C, the published or functional SNP identified for this correlation is rs3817198, with the published risk allele being C and the non-risk allele being T. The published SNP and alleles are identified through publications (e.g., the seminal publications in FIGS. 4E-G). In the LSP1 example of FIG. 4E, the seminal publication is Easton et al., Nature 447: 713-720 (2007). FIGS. 22 and 25 list further correlations. The correlations in FIGS. 22 and 25 can be used to calculate an individual's risk for a condition or phenotype, e.g., by calculating a GCI or GCI Plus score. The GCI or GCI Plus score may also incorporate information such as the prevalence of the condition, as in FIG. 23.

Alternatively, correlations may be formed from stored genomic profiles. For example, individuals with stored genomic profiles may also have stored known phenotype information. Analysis of the stored genomic profiles and known phenotypes can yield genotype correlations. As an example, 250 individuals with stored genomic profiles also have stored information indicating a previous diagnosis of diabetes. Their genomic profiles are analyzed and compared to a control group of non-diabetic individuals. If the individuals previously diagnosed with diabetes are found to carry a particular genetic variant at a higher rate than the control group, a genotype correlation can be made between that genetic variant and diabetes.
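The comparison described above can be illustrated with a standard 2x2 case-control test. The sketch below, in Python, computes an odds ratio and an uncorrected Pearson chi-square statistic (1 degree of freedom) for carriers of a variant among cases and controls. Only the 250 cases come from the text; the control group size and carrier counts are made-up numbers, the function name is hypothetical, and this particular test is one common choice of "standard statistical method," not necessarily the one used in the invention.

```python
import math
from typing import NamedTuple


class AssociationResult(NamedTuple):
    odds_ratio: float
    chi_square: float
    p_value: float


def carrier_association(case_carriers: int, case_total: int,
                        control_carriers: int, control_total: int) -> AssociationResult:
    """Compare variant carrier rates between cases and controls (2x2 table)."""
    a = case_carriers                      # cases carrying the variant
    b = case_total - case_carriers         # cases not carrying it
    c = control_carriers                   # controls carrying the variant
    d = control_total - control_carriers   # controls not carrying it
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell of the 2x2 table must be positive")
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return AssociationResult(odds_ratio, chi_square, p_value)


# 250 diabetic individuals (from the text) versus 250 controls (assumed);
# the carrier counts of 120 and 80 are made up for illustration.
result = carrier_association(120, 250, 80, 250)
print(f"OR={result.odds_ratio:.2f}  chi2={result.chi_square:.2f}  p={result.p_value:.5f}")
```

With these made-up counts, the odds ratio is about 1.96 and the chi-square statistic about 13.3, which would be recorded as a statistically significant correlation under conventional thresholds.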

In step 112, rules are formed based on the validated correlations between genetic variants and particular phenotypes. Rules may be generated, for example, based on the correlated genotypes and phenotypes listed in Table 1. Rules based on correlations may incorporate other factors, such as gender (e.g., FIG. 4) or ethnicity (FIGS. 4 and 5), to produce effect estimates as in FIGS. 4 and 5. Other measures produced by the rules may be estimated relative risk increases, as in FIG. 6. The effect estimates and estimated relative risk increases may be taken from, or calculated from, the published literature. Alternatively, the rules may be based on correlations generated from stored genomic profiles and previously known phenotypes. In some embodiments, the rules may be based on the correlations in FIGS. 22 and 25.
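As an illustration of how a rule of this kind might be applied to a genomic profile, the following minimal Python sketch encodes the BC_4 correlation described above (gene LSP1, SNP rs3817198, risk allele C, non-risk allele T, breast cancer) and counts the risk alleles an individual carries. The `Rule` class, the `apply_rules` function, and the example genotype are hypothetical; no effect estimates are included because none are stated in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    """A hypothetical rule linking a test SNP to a phenotype."""
    short_name: str        # e.g. "BC_4"
    phenotype: str
    gene: str
    test_snp: str          # rs number of the SNP that is genotyped
    risk_allele: str
    non_risk_allele: str


def apply_rules(profile: Dict[str, str], rules: List[Rule]) -> List[dict]:
    """Apply each rule to a profile mapping rs number -> genotype (e.g. 'CT')."""
    results = []
    for rule in rules:
        genotype = profile.get(rule.test_snp)
        if genotype is None:
            continue  # SNP not present in this profile; the rule cannot be applied
        risk_count = sum(1 for allele in genotype if allele == rule.risk_allele)
        results.append({
            "rule": rule.short_name,
            "phenotype": rule.phenotype,
            "genotype": genotype,
            "risk_allele_count": risk_count,  # 0, 1 or 2 copies
        })
    return results


# BC_4 as described in the text; the individual's genotype "CT" is made up.
bc_4 = Rule("BC_4", "breast cancer", "LSP1", "rs3817198", "C", "T")
phenotype_entries = apply_rules({"rs3817198": "CT"}, [bc_4])
print(phenotype_entries)
```

A fuller rule would map the risk allele count, together with factors such as gender or ethnicity, to an effect estimate or relative risk taken from the literature.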

In a preferred embodiment, the genetic variant is a SNP. Although SNPs occur at single sites, individuals carrying a particular SNP allele at one site often predictably carry specific SNP alleles at other sites. The correlation between a SNP and an allele that predisposes an individual to a disease or condition arises through linkage disequilibrium, in which the non-random association of alleles at two or more loci occurs in a population more or less frequently than would be expected from random formation through recombination.

Other genetic markers or variants (e.g., nucleotide repeats or insertions) may also be in linkage disequilibrium with genetic markers that have been shown to be associated with a particular phenotype. For example, nucleotide insertions are associated with a phenotype, and SNPs are in linkage disequilibrium with nucleotide insertions. Rules are formed based on the association between SNPs and phenotypes. Rules based on the correlation between nucleotide insertions and phenotypes can also be developed. Either rule or both rules may be applied to the genomic map, as the presence of one SNP may give a certain risk factor and the other rule may give another risk factor, and when they are combined, the risk may be increased.

Through linkage disequilibrium, a disease-predisposing allele cosegregates with a particular allele of a SNP or with a combination of particular alleles of SNPs. A particular combination of SNP alleles along a chromosome is called a haplotype, and the region of DNA in which they occur in combination may be called a haplotype block. Although a haplotype block may consist of one SNP, a typical haplotype block represents a series of 2 or more contiguous SNPs that exhibit low haplotype diversity between individuals and generally have a low recombination frequency. A haplotype can be identified by identifying one or more SNPs located in the haplotype block. Thus, in general, SNP profiling can be used to identify a haplotype block without having to identify every SNP in that block.

Genotype correlations between SNP haplotype patterns and diseases, conditions or physical states are becoming increasingly known. For a given disease, the haplotype patterns of a group of people known to have the disease are compared to those of a group of people without the disease. By analyzing many individuals, the frequencies of polymorphisms in a population can be determined, and these frequencies or genotypes can then be correlated with a particular phenotype (e.g., a disease or condition). Examples of known SNP-disease associations include polymorphisms in complement factor H in age-related macular degeneration (Klein et al., Science, 308: 385-) and a variant of the INSIG2 gene (Herbert et al., Science, 312: 279-283 (2006)). Other known SNP associations include, for example, polymorphisms in the 9p21 region that includes CDKN2A and B (e.g., rs10757274, rs2383206, rs13333040, rs2383207, and rs10116277, associated with myocardial infarction (Helgadottir et al., Science, 316: 1491-.

SNPs may be functional or non-functional. For example, a functional SNP has an effect on cellular function, resulting in a phenotype, whereas a non-functional SNP is functionally silent but may be in linkage disequilibrium with a functional SNP. SNPs may also be synonymous or non-synonymous. Synonymous SNPs are SNPs in which the different forms lead to the same polypeptide sequence, and are non-functional SNPs. If a SNP leads to different polypeptides, the SNP is non-synonymous and may be functional or non-functional. SNPs or other genetic markers used to identify the haplotypes in a diplotype (i.e., 2 or more haplotypes) may also be used to correlate phenotypes associated with the diplotype. Information about an individual's haplotypes, diplotypes, and SNP profile may be included in the individual's genomic profile.

In a preferred embodiment, for a rule based on a genetic marker that is in linkage disequilibrium with another genetic marker correlated with a phenotype, the genetic marker may have an r² or D' score, the measures commonly used in the art to assess linkage disequilibrium, of greater than 0.5. In preferred embodiments, the score is greater than 0.6, 0.7, 0.8, 0.90, 0.95, or 0.99. As a result, in the present invention, the genetic marker used to correlate a phenotype to an individual's genomic profile may be the same as, or different from, the functional or published SNP associated with the phenotype. For example, for BC_4, the test SNP and the published SNP are the same, as are the test risk and non-risk alleles and the published risk and non-risk alleles (FIGS. 4A and C). However, for BC_5 (CASP8 and its correlation with breast cancer), the test SNP differs from the functional or published SNP, as do the test risk and non-risk alleles from the published risk and non-risk alleles. The test and published alleles are oriented relative to the plus strand of the genome, and from these columns the homozygous risk or non-risk genotype can be inferred, from which rules can be generated for application to the genomic profile of an individual, e.g., a registered user. In some embodiments, instead of identifying a test SNP, the allelic difference or SNP may be identified by another assay (e.g., TaqMan) using the published SNP information. For example, for AMD_5 in FIG. 25A, the published SNP is rs1061170, but no test SNP is identified. A test SNP can be identified by LD analysis with the published SNP. Alternatively, rather than relying on a test SNP, the individual's genome can be assessed using TaqMan or another comparable assay.
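The r² and D' measures referred to above can be computed from two-locus haplotype counts using their standard population-genetics definitions. The sketch below, in Python, does so for two biallelic loci and checks the result against the 0.5 threshold; the function names and the haplotype counts are illustrative assumptions, and phased haplotype counts are assumed to be available.

```python
from typing import Dict, Tuple


def linkage_disequilibrium(hap_counts: Dict[str, int]) -> Tuple[float, float]:
    """Return (r_squared, abs_d_prime) for two biallelic loci.

    hap_counts holds counts of the four haplotypes 'AB', 'Ab', 'aB', 'ab',
    where A/a are the alleles at locus 1 and B/b the alleles at locus 2.
    """
    total = sum(hap_counts[k] for k in ("AB", "Ab", "aB", "ab"))
    if total <= 0:
        raise ValueError("no haplotypes counted")
    p_ab = hap_counts["AB"] / total
    p_a = (hap_counts["AB"] + hap_counts["Ab"]) / total   # frequency of allele A
    p_b = (hap_counts["AB"] + hap_counts["aB"]) / total   # frequency of allele B
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b                                   # raw disequilibrium D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d * d / denom, abs(d) / d_max


def meets_ld_threshold(r_squared: float, d_prime: float, threshold: float = 0.5) -> bool:
    """True if either score exceeds the threshold (the text says 'r² or D'')."""
    return r_squared > threshold or d_prime > threshold


# Made-up haplotype counts for a candidate test SNP and a published SNP.
r2, d_prime = linkage_disequilibrium({"AB": 40, "Ab": 10, "aB": 10, "ab": 40})
print(round(r2, 3), round(d_prime, 3), meets_ld_threshold(r2, d_prime))
```

In this made-up example the pair has a D' of about 0.6 but an r² of about 0.36, so it passes only on D'; a stricter embodiment could pass a higher `threshold` or require both scores to exceed it.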

The test SNPs may be "DIRECT" or "TAG" SNPs (fig. 4E-G, fig. 5). A direct SNP is a test SNP that is the same as the published or functional SNP, e.g., for BC_4. A direct SNP can also be used for the FGFR2 association with breast cancer, using SNP rs1073640 in Europeans and Asians, with the minor allele being A and the other allele being G (Easton et al, Nature 447: 1087-. Another published or functional SNP for the FGFR2 association with breast cancer in Europeans and Asians is rs1219648 (Hunter et al, Nat. Genet. 39: 870-874 (2007)). A tag SNP is a test SNP that is different from the functional or published SNP, such as for BC_5. Tag SNPs may also be used for other genetic variants, e.g., for CAMTA1 (rs4908449), 9p21 (rs10757274, rs2383206, rs13333040, rs2383207, rs10116277), COL1A1 (rs1800012), FVL (rs6025), HLA-DQA1 (rs 498888889, rs2588331), eNOS (rs1799983), MTHFR (rs1801133) and APC (rs 28933380).

Databases of SNPs are publicly available from, for example, the International HapMap Project (see www.hapmap.org, The International HapMap Consortium, Nature, 426: 789-. These databases provide, or enable the determination of, SNP haplotype patterns. Thus, these SNP databases enable the detection of genetic risk factors underlying a wide range of diseases and conditions (e.g., cancer, inflammatory diseases, cardiovascular diseases, neurodegenerative diseases, and infectious diseases). These diseases or conditions may be actionable, meaning that methods of treatment and therapy currently exist. Treatment may include prophylactic treatment as well as treatment that ameliorates symptoms and conditions, including lifestyle changes.

Many other phenotypes can also be detected, such as physical traits, physiological traits, mental traits, emotional traits, ethnicity, ancestry, and age. Physical traits may include height, hair color, eye color, and body type, or traits such as stamina, endurance, and agility. Mental traits may include intelligence, memory, or learning ability. Ethnicity and ancestry may include the identification of ancestors or ethnicity, or where an individual's ancestors originated. Age may be a determination of an individual's real age, or the age that the individual's genetics places them at relative to the general population. For example, an individual's real age may be 38, but their genetics may indicate that their memory or physical well-being is that of an average 28-year-old. Another age trait may be the projected longevity of the individual.

Other phenotypes may also include non-medical conditions, such as "entertainment" phenotypes. These phenotypes may include comparisons with well-known individuals, e.g., foreign dignitaries, politicians, celebrities, inventors, athletes, musicians, artists, business people, and notorious individuals (e.g., criminals). Other "entertainment" phenotypes may include comparisons with other organisms, such as bacteria, insects, plants, or non-human animals. For example, an individual may be interested in seeing how their genomic profile compares with that of their pet dog or of a president.

In step 114, the rules are applied to the stored genomic profile to generate the phenotype profile of step 116. For example, the information in fig. 4, 5 or 6 may form the basis of rules or tests to be applied to the genomic profile of an individual. The rules may include the information in fig. 4 on the test SNPs and alleles and on the effect estimates, where UNITS gives the units of the effect estimate, e.g., OR for odds ratio (with 95% confidence interval) or mean. In a preferred embodiment, the effect estimate may be a genotypic risk (fig. 4C-G), such as the risk for risk homozygotes (homoz or RR), risk heterozygotes (heteroz or RN), and non-risk homozygotes (homoz or NN). In other embodiments, the effect estimate may be a carrier risk, which is RR or RN versus NN. In still further embodiments, the effect estimate may be based on the allele, i.e., an allelic risk, e.g., R versus N. There may also be two-locus (fig. 4J) or three-locus (fig. 4K) genotypic effect estimates (e.g., 9 possible genotype combinations for a two-locus effect estimate: RRRR, RRNN, etc.). The frequency of the test SNPs in the public HapMap is also recorded in fig. 4H and I.
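
As a non-limiting sketch of how a rule of this kind might be represented and applied in step 114, the following example classifies a genotype as RR, RN, or NN and looks up the corresponding effect estimate. The dictionary layout, the field names, the placeholder SNP identifier, and the numeric effect values are all hypothetical and chosen only for illustration; they are not taken from fig. 4.

```python
from itertools import product


def classify_genotype(genotype: str, risk_allele: str) -> str:
    """Map a diploid genotype such as 'AG' to 'RR', 'RN', or 'NN'."""
    if len(genotype) != 2:
        raise ValueError("expected a two-allele genotype such as 'AG'")
    n_risk = sum(1 for allele in genotype if allele == risk_allele)
    return {2: "RR", 1: "RN", 0: "NN"}[n_risk]


def apply_rule(rule: dict, genomic_profile: dict):
    """Apply one rule to a genomic profile; return None if the SNP was not typed."""
    genotype = genomic_profile.get(rule["test_snp"])
    if genotype is None:
        return None
    genotype_class = classify_genotype(genotype, rule["risk_allele"])
    return {
        "phenotype": rule["phenotype"],
        "genotype_class": genotype_class,
        "carrier": genotype_class in ("RR", "RN"),  # carrier risk: RR or RN vs. NN
        "effect": rule["effects"][genotype_class],
        "units": rule["units"],
    }


# Hypothetical rule: placeholder identifier and made-up odds ratios.
example_rule = {
    "phenotype": "example_condition",
    "test_snp": "rs_placeholder_1",
    "risk_allele": "A",
    "units": "OR",
    "effects": {"RR": 1.6, "RN": 1.2, "NN": 1.0},
}

# The 9 genotype combinations for a two-locus effect estimate (RRRR, RRRN, ...).
two_locus_classes = ["".join(pair) for pair in product(["RR", "RN", "NN"], repeat=2)]
```

For instance, `apply_rule(example_rule, {"rs_placeholder_1": "AG"})` classifies the individual as RN and returns the illustrative heterozygote effect.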

In other embodiments, the information in fig. 21, 22, 23, and/or 25 can be used to generate information to be applied to the genomic profile of an individual. For example, the information can be used to generate a GCI or GCI Plus score for the individual (e.g., fig. 19). The score can be used to generate genetic risk information (e.g., estimated lifetime risk) for one or more conditions in the individual's phenotype profile (e.g., fig. 15). The method allows calculation of an estimated lifetime risk or relative risk for one or more of the phenotypes or conditions listed in fig. 22 or 25. The risk for a single condition may be based on one or more SNPs. For example, the estimated risk for a phenotype or condition may be based on at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 SNPs, where the SNPs used to estimate the risk may be published SNPs, test SNPs, or both (e.g., fig. 25).

The estimated risk for a condition may be based on the SNPs listed in fig. 22 or 25. In some embodiments, the risk for a condition may be based on at least one SNP. For example, an assessment of an individual's risk for Alzheimer's disease (AD), colorectal cancer (CRC), osteoarthritis (OA), or exfoliative glaucoma (XFG) may be based on 1 SNP (e.g., rs4420638 for AD, rs 69883267 for CRC, rs4911178 for OA, and rs2165241 for XFG). For other conditions, such as obesity (BMIOB), Graves' disease (GD), or hemochromatosis (HEM), the estimated risk of an individual may be based on at least 1 or 2 SNPs (e.g., rs9939609 and/or rs9291171 for BMIOB; DRB1 0301DQA1 and/or rs3087243 for GD; rs 0501800562 and/or rs129128 for HEM). For conditions such as, but not limited to, myocardial infarction (MI), multiple sclerosis (MS), or psoriasis (PS), 1, 2, or 3 SNPs may be used to assess an individual's risk (e.g., rs1866389, rs1333049 and/or rs6922269 for MI; rs6897932, rs12722489 and/or DRB1*1501 for MS; rs6859018, rs11209026 and/or HLAC 0602 for PS). To assess an individual's risk for restless legs syndrome (RLS) or celiac disease (CelD), 1, 2, 3, or 4 SNPs may be used (e.g., rs6904723, rs2300478, rs1026732 and/or rs9296249 for RLS; rs6840978, rs11571315, rs2187668 and/or DQA1*0301 DQB1*0302 for CelD). For prostate cancer (PC) or lupus (SLE), 1, 2, 3, 4, or 5 SNPs may be used to assess an individual's risk (e.g., rs4242384, rs 69883267, rs 169901979, rs17765344 and/or rs4430796 for PC; rs12531711, rs10954213, rs2004640, DRB1 0301 and/or DRB1 1501 for SLE). To assess an individual's lifetime risk for macular degeneration (AMD) or rheumatoid arthritis (RA), 1, 2, 3, 4, 5, or 6 SNPs may be used (e.g., rs10737680, rs10490924, rs541862, rs2230199, rs1061170, and/or rs9332739 for AMD; rs6679677, rs11203367, rs6457617, DRB 0101, DRB1 × 1, and/or DRB 04084 × 0404 for RA).
To assess an individual's lifetime risk for breast cancer (BC), 1, 2, 3, 4, 5, 6, or 7 SNPs may be used (e.g., rs3803662, rs2981582, rs4700485, rs3817198, rs17468277, rs6721996 and/or rs 3803662). To assess an individual's lifetime risk for Crohn's disease (CD) or type 2 diabetes (T2D), 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11 SNPs may be used (e.g., rs2066845, rs5743293, rs10883365, rs17234657, rs10210302, rs9858542, rs11805303, rs1000113, rs17221417, rs2542151 and/or rs10761659 for CD; rs13266634, rs4506565, rs10012946, rs 10006992, rs10811661, rs 77512288738, rs8050136, rs1111875, rs4402960, rs5215 and/or rs1801282 for T2D). In some embodiments, the SNP used as a basis for risk determination may be in linkage disequilibrium with a SNP described above or listed in fig. 22 or 25.

The phenotype profile of an individual may include a number of phenotypes. In particular, assessing a patient's risk of disease or other conditions (e.g., likely drug response, including metabolism, efficacy, and/or safety) by the methods of the invention enables prognostic or diagnostic analysis of susceptibility to multiple unrelated diseases and conditions, whether in symptomatic, presymptomatic, or asymptomatic individuals, including carriers of one or more disease- or condition-predisposing alleles. Thus, these methods provide an overall assessment of an individual's susceptibility to diseases or conditions without any preconceived notion of testing for a particular disease or condition. For example, the methods of the invention enable the assessment of an individual's susceptibility to any of the various conditions listed in table 1 or fig. 4, 5 or 6 based on the individual's genomic profile. Moreover, these methods allow an estimated lifetime risk or relative risk to be assessed for an individual for one or more phenotypes or conditions, such as those in fig. 22 or 25.

The assessment preferably provides information on 2 or more of these conditions, and more preferably 3, 4, 5, 10, 20, 50, 100 or even more of these conditions. In a preferred embodiment, at least 20 rules are applied to the genomic profile of an individual to obtain the phenotype profile. In other embodiments, at least 50 rules are applied to the genomic profile of the individual. A single rule may be applied for a phenotype that is monogenic. More than one rule may also be applied for a single phenotype, such as a polygenic phenotype or a monogenic phenotype in which multiple genetic variants within a single gene affect the probability of having the phenotype.

After the initial screening of an individual patient's genomic profile, the individual's genotype correlations are updated (or can be updated) by comparison with additional nucleotide variants (e.g., SNPs) as these additional nucleotide variants become known. For example, step 110 can be performed periodically, e.g., daily, weekly, or monthly, by one or more persons of ordinary skill in the field of genetics who search the scientific literature for new genotype correlations. The new genotype correlations may then be further validated by a committee of one or more experts in the field. Step 112 may then also be periodically updated with new rules based on the newly validated correlations.

A new rule may cover a genotype or phenotype not covered by an existing rule. For example, a genotype not previously associated with any phenotype is found to be associated with a new or existing phenotype. A new rule may also cover a correlation for a phenotype that previously had no genotype associated with it. New rules may also be determined for genotypes and phenotypes that already have existing rules. For example, a rule exists based on the correlation between genotype A and phenotype A. New research reveals that genotype B is also associated with phenotype A, and a new rule is created based on this correlation. Another example is the discovery that phenotype B is associated with genotype A, and a new rule is developed accordingly.

Rules may also be formulated upon the discovery of correlations that are based on existing knowledge but were not initially identified in the published scientific literature. For example, it may be reported that genotype C is associated with phenotype C. Another publication reports that genotype D is associated with phenotype D. Phenotypes C and D are related symptoms, e.g., phenotype C may be tachypnea and phenotype D a small lung capacity. A correlation between genotype C and phenotype D, or between genotype D and phenotype C, may be discovered and validated by statistical methods using the existing stored genomic profiles of individuals with genotypes C and D and phenotypes C and D, or by further research. A new rule may then be generated based on the newly discovered and validated correlation. In another embodiment, the stored genomic profiles of multiple individuals with a specific or related phenotype may be studied to determine a genotype common to these individuals, and a correlation may be determined. A new rule may be generated based on this correlation.
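
The text above does not specify which statistical methods are used to validate a newly discovered correlation. As one possible sketch, assuming a simple 2x2 count of stored profiles (carriers and non-carriers of a genotype, with and without a phenotype), an odds ratio with a 95% confidence interval can be computed by the standard Woolf (log) method. The function name, the argument names, and the choice of this particular method are assumptions for illustration.

```python
import math


def odds_ratio_with_ci(carriers_with: int, carriers_without: int,
                       noncarriers_with: int, noncarriers_without: int,
                       z: float = 1.96) -> tuple[float, float, float]:
    """Return (odds_ratio, ci_lower, ci_upper) for a 2x2 genotype/phenotype table.

    z = 1.96 gives an approximate 95% confidence interval.
    """
    a, b, c, d = carriers_with, carriers_without, noncarriers_with, noncarriers_without
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

Under this sketch, a candidate correlation between genotype C and phenotype D might be passed on for expert validation only if the lower confidence bound exceeds 1; that decision criterion is likewise an assumption of the example.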

Rules may also be formulated to modify existing rules. For example, the correlation between a genotype and a phenotype may be determined in part by a known individual characteristic, such as ethnicity, ancestry, geography, gender, age, family history, or any other known phenotype of the individual. Rules based on these known individual characteristics may be formulated and incorporated into existing rules to provide modified rules. Which modified rule is applied will depend on the specific characteristics of the individual. For example, a rule may be based on a 35% probability that an individual has phenotype E when the individual has genotype E. However, if the individual is of a particular ethnicity, the probability is 5%. A new rule may be formulated based on this result and applied to individuals of that ethnicity. Alternatively, the existing rule with the determined value of 35% may be applied, followed by another rule based on ethnicity for that phenotype. Rules based on known individual characteristics may be determined from the scientific literature or from studies of stored genomic profiles. As new rules are generated, they may be added and applied to the genomic profiles in step 114, or they may be applied periodically, for example at least once a year.
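
The genotype E example above (35% in general, 5% for a particular ethnicity) can be sketched as a base rule carrying optional overrides keyed on known individual characteristics. The data layout, the field names, and the label "group_x" are hypothetical and serve only to illustrate one way a modified rule might be selected.

```python
def phenotype_probability(rule: dict, individual: dict) -> float:
    """Return the probability from the first matching override, else the base value."""
    for override in rule.get("overrides", []):
        conditions = override["when"]
        if all(individual.get(key) == value for key, value in conditions.items()):
            return override["probability"]
    return rule["base_probability"]


# Hypothetical rule for genotype E and phenotype E, using the figures from the text.
rule_e = {
    "genotype": "E",
    "phenotype": "E",
    "base_probability": 0.35,
    "overrides": [
        {"when": {"ethnicity": "group_x"}, "probability": 0.05},
    ],
}
```

With this layout, `phenotype_probability(rule_e, {"ethnicity": "group_x"})` yields the modified value, while an individual without that characteristic receives the base value.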

Information on an individual's risk of disease can also be expanded as technology advances allow higher-resolution SNP genomic profiles. As described above, an initial SNP genomic profile can readily be generated using microarray technology for scanning 500,000 SNPs. Given the nature of haplotype blocks, this number allows a representative profile of all SNPs in an individual's genome. Nonetheless, it is estimated that approximately 10 million SNPs commonly occur in the human genome (the International HapMap Project; www.hapmap.org). As technological advances allow SNPs to be resolved practically and cost-effectively at a finer level of detail (e.g., microarrays of 1,000,000, 1,500,000, 2,000,000, 3,000,000 or more SNPs, or whole genome sequencing), more detailed SNP genomic profiles can be generated. Likewise, advances in computational analysis methods will enable finer, cost-effective analysis of SNP genomic profiles and updating of the master database of SNP-disease correlations.

After generating the phenotype profile at step 116, the registered users or their healthcare managers may access their genomic or phenotype profiles through an online portal or website as in step 118. Reports including phenotype profiles and other information about phenotype profiles and genomic profiles may also be provided to registered users or their healthcare managers, as described in steps 120 and 122. The report may be printed out, stored in a registered user's computer, or viewed online.

FIG. 7 illustrates an example online report. The registered user may choose to display a single phenotype or more than one phenotype. The registered user may also have different viewing options, for example, the "Quick View" option shown in FIG. 7. The phenotype may be a medical condition, and the different treatments and symptoms in the quick report may link to other web pages containing further information about the treatment. For example, clicking on a drug may lead to a website that includes information about dosage, cost, side effects, and efficacy. The drug may also be compared with other treatments. The website may also include a link to the website of the drug manufacturer. Another link may give the registered user the option of generating a pharmacogenomic profile, which would include information on their likely response to the drug based on their genomic profile. Links to alternatives to drugs may also be provided, such as preventive actions (e.g., fitness and weight loss), as well as links to dietary supplements, diet plans, and nearby health clubs, health clinics, health and wellness providers, day spas, and the like. Educational and informational videos, summaries of available treatments, possible therapies, and general advice may also be provided.

The online report may also provide links for scheduling an appointment with a personal physician or a genetic counselor, or for accessing an online genetic counselor or physician, thereby giving the registered user the opportunity to ask for more information about their phenotype profile. Links to online genetic counseling and physician questions may also be provided on the online report.

Reports may also be viewed in other forms, such as a comprehensive view of a single phenotype, in which more detail is provided for each category. For example, there may be more detailed statistics on the likelihood of the registered user developing the phenotype; more information about typical symptoms or phenotypes, such as a sample of symptoms for a medical condition or the range of a physical non-medical trait (e.g., height); or more information about the gene and genetic variant, such as population prevalence, e.g., worldwide or in different countries, or in different age ranges or genders. For example, fig. 15 shows a summary of estimated lifetime risks for a number of conditions. The individual may view more information about a particular condition, such as prostate cancer (fig. 16) or Crohn's disease (fig. 17).

In another embodiment, the report may be for an "entertainment" phenotype, e.g., the similarity of an individual's genomic profile to that of a well-known individual (e.g., Albert Einstein). The report may show the percent similarity between the individual's genomic profile and that of Einstein, and may further show Einstein's predicted IQ and the individual's predicted IQ. Further information may include how the genomic profile and IQ of the general population compare with those of the individual and of Einstein.
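
The description does not define how percent similarity between two genomic profiles is computed. One simple possibility, shown here purely as an assumption, is the fraction of SNPs typed in both profiles at which the unphased genotypes are identical. The function name and the representation of a profile as a mapping from SNP identifier to a two-letter genotype are hypothetical.

```python
def percent_identity(profile_a: dict, profile_b: dict) -> float:
    """Percent of shared SNPs at which two profiles have the same unphased genotype."""
    shared = profile_a.keys() & profile_b.keys()
    if not shared:
        raise ValueError("the profiles have no SNPs in common")
    # Sorting the alleles treats 'AG' and 'GA' as the same genotype.
    matches = sum(
        1 for snp in shared
        if sorted(profile_a[snp]) == sorted(profile_b[snp])
    )
    return 100.0 * matches / len(shared)
```

SNPs present in only one of the two profiles are ignored in this sketch, so profiles generated on different arrays can still be compared over their common markers.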

In another embodiment, the report may display all phenotypes that have been correlated with the registered user's genomic profile. In other embodiments, the report may show only the phenotypes determined to be positively correlated with the individual's genomic profile. In other forms, individuals may choose to display particular subgroups of phenotypes, such as only medical phenotypes or only actionable medical phenotypes. For example, the actionable phenotypes and their associated genotypes may include Crohn's disease (associated with IL23R and CARD15), type 1 diabetes (associated with HLA-DR/DQ), lupus (associated with HLA-DRB1), psoriasis (HLA-C), multiple sclerosis (HLA-DQA1), Graves' disease (HLA-DRB1), rheumatoid arthritis (HLA-DRB1), type 2 diabetes (TCF7L2), breast cancer (BRCA2), colon cancer (APC), episodic memory (KIBRA), and osteoporosis (COL1A1). Individuals may also choose to display subcategories of phenotypes in the report, e.g., only inflammatory diseases among medical conditions or only physical traits among non-medical conditions. In some embodiments, an individual may choose to highlight all conditions for which an estimated risk was calculated (e.g., fig. 15A, D), only the conditions with a higher risk (fig. 15B), or only the conditions with a lower risk (fig. 15C).
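
The display options described above can be sketched as a simple filter over the entries of a phenotype profile. The record fields, the option names, the comparison against a population average risk, and the sample values are hypothetical; "actionable" is used here to mean that a treatment or preventive measure exists.

```python
from dataclasses import dataclass


@dataclass
class PhenotypeResult:
    name: str
    medical: bool          # medical condition rather than non-medical trait
    actionable: bool       # a treatment or preventive measure exists
    estimated_risk: float  # individual's estimated lifetime risk, 0..1
    average_risk: float    # population average lifetime risk, 0..1


def filter_report(results, medical_only=False, actionable_only=False, risk_view="all"):
    """Select report entries; risk_view is 'all', 'higher', or 'lower'."""
    if risk_view not in ("all", "higher", "lower"):
        raise ValueError("risk_view must be 'all', 'higher', or 'lower'")
    selected = []
    for result in results:
        if medical_only and not result.medical:
            continue
        if actionable_only and not result.actionable:
            continue
        if risk_view == "higher" and not result.estimated_risk > result.average_risk:
            continue
        if risk_view == "lower" and not result.estimated_risk < result.average_risk:
            continue
        selected.append(result)
    return selected


# Made-up entries for illustration only.
sample_results = [
    PhenotypeResult("condition_a", True, True, 0.30, 0.20),
    PhenotypeResult("condition_b", True, False, 0.05, 0.10),
    PhenotypeResult("trait_c", False, False, 0.50, 0.50),
]
```

Calling `filter_report(sample_results, risk_view="higher")` corresponds to a view showing only conditions with a higher risk, and `risk_view="lower"` to a view showing only conditions with a lower risk.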

The information delivered and communicated to the individual may be encrypted and confidential, and the individual's access to the information may be controlled. Information derived from the complex genomic profile can be provided to the individual as regulatory-agency-approved, understandable, medically relevant, and/or high-impact data. Information may also be of general interest and not medically relevant. Information may be securely delivered to the individual in several ways, including, but not limited to, a portal interface and/or mail. More preferably, the information is securely provided (if the individual so chooses) through a portal interface to which the individual has secure and confidential access. This interface is preferably provided through an online Internet web portal or, alternatively, by telephone or other means that allow private, secure, and easy-to-use access. Genomic profiles, phenotype profiles, and reports are provided to the individual or their healthcare manager by data transmission over a network.

Accordingly, FIG. 8 is a block diagram showing a representative example logic device through which phenotype profiles and reports may be generated. FIG. 8 shows a computer system (or digital device) 800 for receiving and storing genomic profiles, analyzing genotype correlations, generating rules based on the analysis of genotype correlations, applying the rules to the genomic profiles, and generating phenotype profiles and reports. The computer system 800 may be understood as a logical apparatus that can read instructions from media 811 and/or network port 805, which can optionally be connected to a server 809 having fixed media 812. The system shown in FIG. 8 includes a CPU 801, disk drives 803, optional input devices (e.g., keyboard 815 and/or mouse 816), and an optional monitor 807. Data communication with the server 809 at a local or remote location can be achieved through the indicated communication medium. The communication medium can include any means of transmitting and/or receiving data. For example, the communication medium can be a network connection, a wireless connection, or an Internet connection. Such a connection can provide for communication over the World Wide Web. It is envisioned that data relating to the present invention can be transmitted over such networks or connections for reception and/or review by a party 822. The receiving party 822 can be, but is not limited to, an individual, a registered user, a healthcare provider, or a healthcare manager. In one embodiment, a computer-readable medium includes a medium suitable for transmission of the results of an analysis of a biological sample or of a genotype correlation. The medium can include results regarding the phenotype profile of an individual subject, where such results are derived using the methods described herein.

The personal portal will preferably serve as the primary interface through which the individual receives and evaluates genomic data. The portal will enable individuals to track the progress of their samples from collection through testing and to follow the results. Through the portal, individuals are presented with their relative risks for common genetic diseases based on their genomic profiles. The registered user can choose through the portal which rules to apply to their genomic profile.

In one embodiment, one or more web pages will have a list of phenotypes and, next to each phenotype, a box that registered users can select to include that phenotype in their phenotype profiles. Phenotypes can be linked to information about the phenotype to help registered users make informed choices about which phenotypes to include in their phenotype profiles. The web page may also have phenotypes organized by disease group (e.g., treatable or non-treatable diseases). For example, the registered user may select only actionable phenotypes, such as HLA-DQA1 and celiac disease. Registered users may also choose to display presymptomatic or postsymptomatic treatments for the phenotypes. For example, an individual may select actionable phenotypes that have presymptomatic treatments (beyond further screening); for celiac disease, the presymptomatic treatment is a gluten-free diet. Another example is Alzheimer's disease, for which the presymptomatic treatments are statins, exercise, vitamins, and mental activity. Thrombosis is another example, for which the presymptomatic treatment is to avoid oral contraceptives and prolonged sitting. An example of a phenotype with an approved postsymptomatic treatment is wet AMD associated with CFH, for which an individual may undergo laser treatment.

Phenotypes may also be organized by type or class of disease or condition, such as neurological, cardiovascular, endocrine, immunological, and the like. Phenotypes can also be grouped into medical and non-medical phenotypes. Other groupings of phenotypes on the web page may be by physical traits, physiological traits, mental traits, or emotional traits. The web page may further provide a section in which a group of phenotypes is selected by checking a single box, for example, all phenotypes, only medically relevant phenotypes, only non-medically relevant phenotypes, only actionable phenotypes, only non-actionable phenotypes, different disease groups, or "entertainment" phenotypes. "Entertainment" phenotypes may include comparisons with celebrities or other well-known individuals, or with other animals or even other organisms. A list of genomic profiles available for comparison may also be provided on the web page for the registered user to select for comparison with the registered user's own genomic profile.

The online portal may also provide a search engine to help registered users navigate the portal, search for a particular phenotype, or search for particular terms or information revealed by their phenotype profile or report. Links to partner services and product offerings may also be provided through the portal. Additional links to support groups, message boards, and chat rooms for individuals with common or similar phenotypes may also be provided. The online portal may also provide links to other sites with more information about the phenotypes in the registered user's phenotype profile. The online portal may also provide a service that allows registered users to share their phenotype profiles and reports with friends, family, or healthcare managers. Registered users may choose which phenotypes in the phenotype profile they wish to share with their friends, family, or healthcare managers.

The phenotype profiles and reports provide individuals with personalized genotype correlations. The genotype correlations provided to an individual can be used in determining personal health care and lifestyle choices. If a strong correlation is found between a genetic variant and a disease for which treatment is available, detection of the genetic variant can help in deciding to begin treatment of the disease and/or monitoring of the individual. Where a statistically significant correlation exists but is not regarded as a strong correlation, the individual can discuss the information with a personal physician and decide on an appropriate, beneficial course of action. Potential courses of action that could benefit an individual in view of a particular genotype correlation include administering therapeutic treatment, monitoring for potential need of treatment or effects of treatment, or making lifestyle changes in diet, exercise, and other personal habits or activities. For example, an actionable phenotype such as celiac disease may have a presymptomatic treatment of a gluten-free diet. Likewise, through pharmacogenomics, genotype correlation information can be applied to predict the likely response of an individual to treatment with a particular drug or drug regimen, such as the likely efficacy or safety of a particular drug treatment.

The registered user may choose to provide the genomic and phenotype profiles to their healthcare manager, such as a physician or genetic counselor. The genomic and phenotype profiles may be accessed directly by the healthcare manager, printed out by the registered user for delivery to the healthcare manager, or sent directly to the healthcare manager through the online portal (e.g., via a link on the online report).

The delivery of this relevant information will enable patients to act in concert with their physicians. In particular, discussions between patients and their physicians can be facilitated by the personal portal and its links to medical information, and by the ability to tie the patient's genomic information into their medical records. The medical information may include prevention and wellness information. The information provided to an individual patient by the present invention will enable patients to make informed choices about their health care. In this way, patients will be able to make choices that may help them avoid and/or delay diseases that their individual genomic profile (inherited DNA) makes more likely. In addition, patients will be able to adopt treatment regimens tailored to their specific medical needs. Individuals will also be able to access their genotype data should they develop a disease and need this information to help their physician form a treatment strategy.

Genotype correlation information can also be used in conjunction with genetic counseling to advise couples considering having children about potential genetic concerns for the mother, father, and/or child. Genetic counselors can provide information and support to registered users whose phenotype profiles show an increased risk for a particular condition or disease. They can explain information about the condition, analyze inheritance patterns and risks of recurrence, and discuss the available options with the registered user. Genetic counselors can also provide supportive counseling and refer registered users to community or national support services. Genetic counseling may be included with specific subscription plans. In some embodiments, genetic counseling may be scheduled within 24 hours of a request and be available at times such as evenings, Saturdays, Sundays, and/or holidays.

The individual's portal will also facilitate the delivery of additional information beyond the initial screening. Individuals will be informed of new scientific discoveries relating to their personal genetic profile, such as information on new treatments or prevention strategies for their current or potential conditions. New discoveries may also be communicated to their healthcare managers. In a preferred embodiment, registered users or their healthcare providers are notified electronically of new genotype correlations and new research on the phenotypes in the registered user's phenotype profile. In other embodiments, e-mails about "entertainment" phenotypes are sent to registered users; for example, an e-mail may inform them that their genomic profile is 77% identical to that of Abraham Lincoln and that further information is available through the online portal.

The present invention also provides a system of computer code for generating new rules, modifying rules, combining rules, periodically updating the rule set with new rules, securely maintaining the database of genomic profiles, applying the rules to genomic profiles to determine phenotype profiles, and generating reports. The computer code also notifies registered users of new or revised correlations and of new or revised reports, such as reports with new prevention and wellness information, information about new therapies in development, or newly available treatments.

Business method

The present invention provides a business method of assessing an individual's genotype correlations based on a comparison of the patient's genomic profile against a clinically derived database of established, medically relevant nucleotide variants. The present invention further provides a business method of using an individual's stored genomic profile to assess new correlations that were not initially known, generating an updated phenotype profile for the individual without requiring the individual to submit another biological sample. Fig. 9 is a flow chart illustrating the business method.

A revenue stream for the business method of the present invention is generated in part at step 101, when an individual initially requests and purchases a personalized genomic profile of genotype correlations for a variety of common human diseases, conditions, and physical states. The request and purchase may be made through any number of sources, including but not limited to an online web portal, an online health service, and the individual's personal physician or a similar source of personal medical attention. In alternative embodiments, the genomic profile may be provided free of charge and the revenue stream generated at a later step (e.g., step 103).

A registered user or consumer makes a request to purchase a phenotype profile. In response to the request and purchase, a collection kit is provided to the consumer in step 103 for collecting the biological sample from which the genetic sample is isolated. When the request is made online, by telephone, or through another source such that the consumer cannot readily obtain the collection kit in person, the collection kit is provided by expedited delivery, such as a same-day or overnight courier service. The collection kit includes a container for the sample and packaging materials for rapid delivery of the sample to the laboratory where the genomic profile is generated. The kit may also include instructions for sending the sample to a sample processing facility or laboratory and instructions for accessing the individual's genomic and phenotypic profiles, which may be done through an online portal.

As explained in detail above, genomic DNA can be obtained from any of a variety of types of biological samples. Preferably, genomic DNA is isolated from saliva using a commercially available collection kit (e.g., a kit available from DNA Genotek). The use of saliva and such a kit enables non-invasive sample collection, since it is convenient for the consumer to provide a saliva sample in a container from the collection kit, and then seal the container. Additionally, saliva samples can be stored and transported at room temperature.

After the biological sample is deposited in the collection or specimen container, the consumer delivers the sample to the laboratory for processing in step 105. Typically, the consumer may use the packaging material provided in the collection kit to send the sample to the laboratory by expedited delivery, such as a same-day or overnight courier service.

Laboratories that process samples and generate genomic profiles may follow appropriate government agency guidelines and regulations. For example, in the United States, a processing laboratory may be regulated by one or more federal agencies and/or one or more state agencies, such as the Food and Drug Administration (FDA) or the Centers for Medicare and Medicaid Services (CMS). Clinical laboratories in the United States may be licensed or approved under the Clinical Laboratory Improvement Amendments (CLIA) of 1988.

In step 107, the sample is processed by the laboratory as previously described to isolate a genetic sample of DNA or RNA. The isolated genetic sample is then analyzed and a genomic profile is generated in step 109. Preferably, a genomic SNP profile is generated. As described above, several methods can be used to generate SNP profiles. Preferably, high density arrays (e.g., commercially available platforms from Affymetrix or Illumina) are used for SNP identification and profiling. For example, as described in more detail above, SNP profiles can be generated using the Affymetrix GeneChip assay. As technology evolves, other technology vendors may be able to generate high density SNP profiles. In another embodiment, the genomic profile of the registered user will be the genomic sequence of the registered user.

After the genomic profile of the individual is generated, the genotype data is preferably encrypted, entered in step 111, and deposited in an encrypted database or vault where the information is stored for future use in step 113. The genomic profile and related information may be confidential, with access to this private information and genomic profile being restricted according to the instructions of the individual and/or his or her personal physician. Others (e.g., the individual's family and a genetic counselor) may also be granted access by the registered user.
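
The access restriction described above can be sketched as follows. This is an illustrative in-memory model only, with hypothetical class and method names; it shows access grants controlled by the registered user and does not implement encryption, which a real vault would apply to the stored data.

```python
class ProfileVault:
    """Hypothetical in-memory stand-in for the database or vault.

    Encryption at rest is deliberately omitted from this sketch.
    """

    def __init__(self):
        self._records = {}  # owner -> stored genomic profile
        self._grants = {}   # owner -> set of people allowed to read

    def deposit(self, owner, profile):
        """Store a copy of the owner's profile."""
        self._records[owner] = dict(profile)
        self._grants.setdefault(owner, set())

    def grant(self, owner, requester, grantee):
        """Only the registered user may extend access to others."""
        if requester != owner:
            raise PermissionError("only the registered user can grant access")
        self._grants[owner].add(grantee)

    def read(self, owner, requester):
        """Return a copy of the profile if the requester is authorized."""
        if requester != owner and requester not in self._grants.get(owner, set()):
            raise PermissionError(f"{requester} may not read this profile")
        return dict(self._records[owner])
```

For example, after `vault.grant("alice", "alice", "genetic_counselor")`, a call to `vault.read("alice", "genetic_counselor")` succeeds, while a read by any other party raises `PermissionError`.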

The database or vault may be located locally at the processing laboratory. Alternatively, the database may be located at a separate location. In this case, the genomic map data generated by the processing laboratory may be transported to a separate facility comprising a database in step 111.

After the genomic profile of the individual is generated, the individual's genetic variations are compared to a clinical database of established medically relevant genetic variants in step 115. Alternatively, the genotype correlations may not be medically relevant but may still be included in the genotype correlation database, for example, physical traits such as eye color, or "entertainment" phenotypes such as similarity to a celebrity's genomic profile.
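
The comparison in step 115 can be pictured as a lookup of the individual's genotypes against database entries. The sketch below assumes a simple list-of-dictionaries database; the entry fields, category labels, and marker names are hypothetical and not taken from the invention.

```python
def match_correlations(profile, database, categories=None):
    """Return database entries whose marker and genotype occur in the profile.

    `categories`, if given, restricts matching to entries of those kinds
    (e.g. medical versus non-medical correlations).
    """
    matches = []
    for entry in database:
        if categories is not None and entry["category"] not in categories:
            continue
        if profile.get(entry["marker"]) == entry["genotype"]:
            matches.append(entry)
    return matches


# Placeholder data for illustration only.
EXAMPLE_DATABASE = [
    {"marker": "snp_A", "genotype": "AG", "phenotype": "condition X",
     "category": "medical"},
    {"marker": "snp_B", "genotype": "CC", "phenotype": "eye color",
     "category": "physical_trait"},
    {"marker": "snp_C", "genotype": "TT", "phenotype": "celebrity similarity",
     "category": "entertainment"},
]
EXAMPLE_PROFILE = {"snp_A": "AG", "snp_B": "CT", "snp_C": "TT"}
```

Calling `match_correlations(EXAMPLE_PROFILE, EXAMPLE_DATABASE, categories={"medical"})` returns only the medically relevant match, while omitting `categories` also returns the "entertainment" match.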

Medically relevant SNPs can be established through scientific literature and related sources. Non-SNP genetic variants can also be established to associate with a phenotype. Typically, the association of SNPs with a given disease is established by comparing the haplotype patterns of a group of people known to have the disease to those of a group of people without the disease. By analyzing many individuals, the frequency of polymorphisms in a population can be determined, and in turn these genotype frequencies can be correlated with a particular phenotype (e.g., disease or condition). Alternatively, the phenotype may be a non-medical condition.
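
As a worked illustration of comparing a group with the disease to a group without it, the sketch below computes carrier frequencies and an odds ratio from counts. The text does not name a particular statistic; the odds ratio is used here as one common choice and is an assumption, as are the function names and the example counts.

```python
def carrier_frequency(carriers, total):
    """Fraction of a group carrying the variant."""
    if total <= 0:
        raise ValueError("group must contain at least one individual")
    return carriers / total


def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Odds of carrying the variant among cases relative to controls."""
    a = case_carriers                    # cases with the variant
    b = case_total - case_carriers       # cases without the variant
    c = control_carriers                 # controls with the variant
    d = control_total - control_carriers # controls without the variant
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when a cell is zero")
    return (a * d) / (b * c)


# Made-up counts: 60 of 100 cases and 40 of 100 controls carry the variant.
example_or = odds_ratio(60, 100, 40, 100)  # (60 * 60) / (40 * 40) = 2.25
```

An odds ratio above 1 suggests the variant is more frequent among individuals with the phenotype; in practice a significance test and replication would be needed before a correlation is accepted.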

Relevant SNPs and non-SNP genetic variants can also be determined by analyzing the stored genomic profiles of individuals, rather than from the published literature. Individuals with stored genomic profiles may disclose phenotypes that have been previously determined. The genotypes and disclosed phenotypes of these individuals can be compared to those of individuals without the phenotype to determine correlations that can then be applied to other genomic profiles. Individuals whose genomic profile is determined may fill out a questionnaire regarding phenotypes that have been previously determined. The questionnaire may include questions about medical and non-medical conditions, such as previously diagnosed diseases, family history of medical conditions, lifestyle, physical traits, mental traits, age, social life, environment, and the like.

In one embodiment, if an individual fills out a questionnaire, their genomic profile can be determined free of charge. In some embodiments, individuals fill out questionnaires periodically to gain free access to their profile and reports. In other embodiments, individuals who have filled out a questionnaire may be given an upgraded registration so that they have a higher level of access than under their previous registration, or they may purchase or renew a registration at a lower price.

To ensure scientific accuracy and importance, all information deposited in the database of medically relevant genetic variants in step 121 is first approved by a research/clinical advisor group and, if warranted, reviewed and supervised by appropriate governmental agencies in step 119. For example, in the United States, the FDA may exercise oversight by approving algorithms for validating data relating to genetic variants (typically SNPs, transcript levels, or mutations). In step 123, the scientific literature and other relevant sources are monitored for additional correlations between genetic variants and diseases or conditions, and after their accuracy and importance are confirmed, and upon review and approval by governmental agencies, these additional genotype correlations are added to the master database in step 125.
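
The approval gate described above can be summarized in a short sketch. The function and parameter names are hypothetical, and the logic is a simplified reading of the text: advisor approval is always required, and agency approval is required only when agency review applies.

```python
def may_enter_master_database(advisor_approved, agency_review_required,
                              agency_approved=False):
    """Decide whether a candidate correlation may be deposited."""
    if not advisor_approved:
        return False
    if agency_review_required and not agency_approved:
        return False
    return True


def submit_correlation(master_db, correlation, *, advisor_approved,
                       agency_review_required, agency_approved=False):
    """Append the correlation to the master database only if it passes review."""
    if may_enter_master_database(advisor_approved, agency_review_required,
                                 agency_approved):
        master_db.append(correlation)
        return True
    return False
```

A candidate found by monitoring the literature would be passed to `submit_correlation` once the review outcomes are known; rejected candidates leave the master database unchanged.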

The combination of a database of approved and validated medically relevant genetic variants with a genome-wide individual profile advantageously allows genetic risk assessment for a large number of diseases or conditions. After an individual's genomic profile is compiled, the individual's genotype correlations may be determined by comparing the individual's nucleotide (genetic) variants or genetic markers to a database of human nucleotide variants that have been associated with a particular phenotype (e.g., a disease, condition, or physical state). By comparing the individual's genomic profile to a master database of genotype correlations, individuals can be informed whether, and to what extent, they test positive or negative for genetic risk factors. Individuals will receive relative risk and/or disease predisposition data for a wide range of scientifically validated disease states (e.g., Alzheimer's disease, cardiovascular disease, blood coagulation). For example, the genotype correlations in Table 1 may be included. In addition, SNP disease correlations in the database may include, but are not limited to, those shown in Fig. 4. Other correlations shown in Figs. 5 and 6 may also be included. The business method of the present invention thus provides risk analysis for a large number of diseases and conditions without any need to know in advance what those diseases and conditions might be.

In other embodiments, the genotypic correlation associated with the genome-wide individual profile is a non-medically relevant phenotype, such as a "recreational" phenotype or a physical trait such as hair color. In a preferred embodiment, the rule or rule set is applied to a genomic map or SNP map of the individual, as described above. Applying the rules to the genomic profile generates a phenotypic profile for the individual.

Thus, when new correlations are discovered and validated, the master database of human genotype correlations is expanded with the additional genotype correlations. Updates may be made by accessing relevant information from individual genomic profiles stored in a database, as needed or appropriate. For example, a newly established genotype correlation may be based on a specific gene variant. Whether an individual is likely to be affected by the new genotype correlation can then be determined by retrieving and comparing only the portion of the individual's complete genomic profile that corresponds to that gene.

The results of the genomic query are preferably analyzed and interpreted for presentation to the individual in an understandable format. In step 117, the results of the initial screening are then provided to the patient in a secure, confidential manner, either by mail or through an online portal interface as described in detail above.

The report may include a phenotype profile as well as genomic information about the phenotypes in the phenotype profile, e.g., basic genetic information about the genes involved or statistical information about the genetic variants in different populations. Other information based on the phenotype profile that may be included in the report includes preventive strategies, health information, treatment methods, symptom recognition, early detection protocols, intervention protocols, and further identification and classification of phenotypes. After the initial screening of the individual's genomic profile, controlled, moderated updates are or may be made available.

When new genotype correlations arise and are verified and approved, the individual's genomic profile is updated, or is made available for updating, in conjunction with updates to the master database. New rules based on new genotype correlations may be applied to the initial genomic profile to provide an updated phenotypic profile. An updated genotype correlation profile may be generated by comparing the relevant portion of the individual's genomic profile to the new genotype correlations in step 127. For example, if a new genotype correlation is found based on variations in a particular gene, the portion of the individual's genomic profile corresponding to that gene can be analyzed for the new genotype correlation. In this case, one or more new rules may be applied to generate an updated phenotype profile, rather than regenerating the phenotype profile with the entire rule set, including rules that have already been applied. In step 129, the results of the individual's updated genotype correlations are provided in an encrypted manner.
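
The partial update in step 127 can be sketched as follows: only markers in the genes named by the new rules are retrieved from the stored profile, and only the new rules are applied. The data layout, the `marker_to_gene` mapping, and all names and values are hypothetical and serve only to illustrate the idea.

```python
def incremental_update(phenotype_profile, genomic_profile, marker_to_gene,
                       new_rules):
    """Apply only new rules to the relevant portion of a stored profile.

    Returns (updated_phenotype_profile, portion_examined).
    """
    genes = {rule["gene"] for rule in new_rules}
    # Relevant portion: markers that fall within genes named by the new rules.
    portion = {
        marker: genotype
        for marker, genotype in genomic_profile.items()
        if marker_to_gene.get(marker) in genes
    }
    updated = dict(phenotype_profile)  # earlier results are kept as they are
    for rule in new_rules:
        if portion.get(rule["marker"]) == rule["genotype"]:
            updated[rule["phenotype"]] = rule["effect"]
    return updated, portion
```

Because previously applied rules are not re-run, the cost of an update depends on the number of new rules and the size of the relevant portion rather than on the full rule set, and no new biological sample is needed.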

The initial and updated phenotype profiles may be services provided to registered users or consumers. Different levels of registration for genomic profiling and combinations thereof may be provided. Likewise, the registration level may be varied to provide individuals with a choice of the amount of service they wish to receive with their genotype correlations. In this way, the level of service provided will vary with the registration level of the service purchased by the individual.

Entry-level registration may include a genomic profile and an initial phenotype profile. This may be the base registration level. There may be different levels of service within the base registration level. For example, a particular registration level may provide referrals to genetic counseling, to physicians with special expertise in treating or preventing a particular disease, and other service options. Genetic counseling can be obtained online or by telephone. In another embodiment, the price of the registration may depend on the number of phenotypes the individual selects for their phenotype profile. Another option might be whether the registered user chooses to access online genetic counseling.

In another case, registration may provide an initial genotypic correlation of the whole genome while maintaining the genomic profile of the individual in the database; this database may be encrypted if the individual so chooses. After this initial analysis, subsequent analyses and additional results may be completed upon request and additional payment by the individual. This may be a high level registration.

In one embodiment of the business method of the present invention, an update of the risk of the individual is made and the individual may be provided with corresponding information on a registered basis. Registered users who purchase advanced registrations may obtain updates. Registration for genotype correlation analysis can provide an update of a particular type or subclass of new genotype correlations according to individual preferences. For example, an individual may only wish to learn about the existence of genotype correlations for known therapeutic or prophylactic processes. To assist the individual in deciding whether to perform additional analyses, the individual may be provided with information regarding additional genotype correlations that have been made available. This information can be conveniently mailed or emailed to registered users.

In advanced registrations, there may be additional service levels, such as those described for the basic registration. Other registration options may be offered within the advanced level. For example, the highest tier may provide unlimited updates and reports to registered users. The profiles of registered users may be updated whenever new correlations and rules are determined. At this level, registered users may also grant access to an unlimited number of individuals, such as family members and healthcare managers. Registered users may also have unlimited access to online genetic counselors and physicians.

The next registration level within the advanced tier may be more limited, for example providing a limited number of updates. Registered users may receive a limited number of updates to their genomic profile during the registration period, e.g., 4 times a year. In another registration level, registered users may have their stored genomic profile updated once a week, once a month, or once a year. In another embodiment, registered users may choose only a limited number of phenotypes for which their genomic profile is updated.
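
The update limits attached to the different registration levels can be modeled with a simple quota check. This is a sketch under assumed names: the `Registration` class, the level labels, and the use of `None` for unlimited updates are illustrative choices, and only the figure of 4 updates per year comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Registration:
    """A registered user's plan for one registration period."""
    level: str
    updates_per_period: Optional[int]  # None means unlimited updates
    used: int = 0

    def request_update(self):
        """Consume one update if the plan still allows it."""
        limit = self.updates_per_period
        if limit is not None and self.used >= limit:
            return False
        self.used += 1
        return True


limited = Registration(level="limited", updates_per_period=4)
unlimited = Registration(level="unlimited", updates_per_period=None)
```

With these definitions, a fifth call to `limited.request_update()` within the same period returns `False`, while `unlimited.request_update()` always succeeds.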

The personal portal will also conveniently enable individuals to maintain a registry of risk or relevance updates and/or information updates, or request updated risk assessments and information. As described above, different levels of registration may be provided to enable an individual to select various levels of genotype correlation results and updates, and registered users may select different levels of registration through their personal portals.

Any of these registration options will contribute to the revenue stream for the business method of the present invention. The revenue stream for the commercial process of the present invention is also increased by adding new consumers and registered users, wherein new genomic profiles are added to the database.

Table 1: Representative genes having phenotype-associated genetic variants.

Gene	Phenotype
A2M	Alzheimer's disease
ABCA1	Cholesterol, HDL
ABCB1	HIV
ABCB1	Epilepsy
ABCB1	Complications of renal transplantation
ABCB1	Digoxin, serum concentration
ABCB1	Crohn's disease; ulcerative colitis
ABCB1	Parkinson's disease
ABCC8	Type 2 diabetes mellitus
ABCC8	Diabetes mellitus, type 2
ABO	Myocardial infarction
ACADM	Medium chain acyl-CoA dehydrogenase deficiency
ACDC	Type 2 diabetes mellitus
ACE	Type 2 diabetes mellitus
ACE	Hypertension
ACE	Alzheimer's disease
ACE	Myocardial infarction
ACE	Cardiovascular
ACE	Left ventricular hypertrophy
ACE	Coronary artery disease
ACE	Atherosclerosis, coronary
ACE	Retinopathy, diabetic
ACE	Systemic Lupus Erythematosus (SLE)
ACE	Blood pressure, arterial
ACE	Erectile dysfunction
ACE	Lupus
ACE	Polycystic kidney disease
ACE	Stroke
ACP1	Diabetes mellitus, type 1
ACSM1(LIP)c	Cholesterol levels
ADAM33	Asthma
ADD1	Hypertension
ADD1	Blood pressure, arterial
ADH1B	Alcohol abuse
ADH1C	Alcohol abuse
ADIPOQ	Diabetes mellitus, type 2
ADIPOQ	Obesity
ADORA2A	Panic disorder
ADRB1	Hypertension
ADRB1	Heart failure
ADRB2	Asthma
ADRB2	Hypertension
ADRB2	Obesity
ADRB2	Blood pressure, arterial
ADRB2	Type 2 diabetes mellitus
ADRB3	Obesity
ADRB3	Type 2 diabetes mellitus
ADRB3	Hypertension
AGT	Hypertension
AGT	Type 2 diabetes mellitus
AGT	Essential hypertension
AGT	Myocardial infarction
AGTR1	Hypertension
AGTR2	Hypertension
AHR	Breast cancer
ALAD	Lead toxicity
ALDH2	Alcoholism
ALDH2	Alcohol abuse
ALDH2	Colorectal cancer
ALDRL2	Type 2 diabetes mellitus
ALOX5	Asthma
ALOX5AP	Asthma
APBB1	Alzheimer's disease
APC	Colorectal cancer
APEX1	Lung cancer
APOA1	Atherosclerosis, coronary
APOA1	Cholesterol, HDL
APOA1	Coronary artery disease
APOA1	Type 2 diabetes mellitus
APOA4	Type 2 diabetes mellitus
APOA5	Triglycerides
APOA5	Atherosclerosis, coronary
APOB	Hypercholesterolemia
APOB	Obesity
APOB	Cardiovascular
APOB	Coronary artery disease
APOB	Coronary heart disease
APOB	Type 2 diabetes mellitus
APOC1	Alzheimer's disease
APOC3	Triglycerides
APOC3	Type 2 diabetes mellitus
APOE	Alzheimer's disease
APOE	Type 2 diabetes mellitus
APOE	Multiple sclerosis
APOE	Atherosclerosis, coronary
APOE	Parkinson's disease
APOE	Coronary heart disease
APOE	Myocardial infarction
APOE	Stroke
APOE	Alzheimer's disease
APOE	Coronary artery disease
APP	Alzheimer's disease
AR	Prostate cancer
AR	Breast cancer
ATM	Breast cancer
ATP7B	Wilson's disease
ATXN8OS	Spinocerebellar ataxia
BACE1	Alzheimer's disease
BCHE	Alzheimer's disease
BDKRB2	Hypertension
BDNF	Alzheimer's disease
BDNF	Bipolar disorder
BDNF	Parkinson's disease
BDNF	Schizophrenia
BDNF	Memory
BGLAP	Bone mineral density
BRAF	Thyroid cancer
BRCA1	Breast cancer
BRCA1	Breast cancer; ovarian cancer
BRCA1	Ovarian cancer
BRCA2	Breast cancer
BRCA2	Breast cancer; ovarian cancer
BRCA2	Ovarian cancer
BRIP1	Breast cancer
C4A	Systemic Lupus Erythematosus (SLE)
CALCR	Bone mineral density
CAMTA1	Episodic memory
CAPN10	Diabetes mellitus, type 2
CAPN10	Type 2 diabetes mellitus
CAPN3	Muscular dystrophy
CARD15	Crohn's disease
CARD15	Crohn's disease; ulcerative colitis
CARD15	Inflammatory bowel disease
CART	Obesity
CASR	Bone mineral density
CCKAR	Schizophrenia
CCL2	Systemic Lupus Erythematosus (SLE)
CCL5	HIV
CCL5	Asthma
CCND1	Colorectal cancer
CCR2	HIV
CCR2	HIV infection
CCR2	Hepatitis C
CCR2	Myocardial infarction
CCR3	Asthma
CCR5	HIV
CCR5	HIV infection
CCR5	Hepatitis C
CCR5	Asthma
CCR5	Multiple sclerosis
CD14	Atopy
CD14	Asthma
CD14	Crohn's disease
CD14	Crohn's disease; ulcerative colitis
CD14	Periodontitis
CD14	Total IgE
CDH1	Prostate cancer
CDH1	Colorectal cancer
CDKN2A	Melanoma
CDSN	Psoriasis vulgaris
CEBPA	Leukemia, myeloid
CETP	Atherosclerosis, coronary
CETP	Coronary heart disease
CETP	Hypercholesterolemia
CFH	Macular degeneration
CFTR	Cystic fibrosis
CFTR	Pancreatitis
CFTR	Cystic fibrosis
CHAT	Alzheimer's disease
CHEK2	Breast cancer
CHRNA7	Schizophrenia
CMA1	Atopic dermatitis
CNR1	Schizophrenia
COL1A1	Bone mineral density
COL1A1	Osteoporosis
COL1A2	Bone mineral density
COL2A1	Osteoarthritis
COMT	Schizophrenia
COMT	Breast cancer
COMT	Parkinson's disease
COMT	Bipolar disorder
COMT	Obsessive compulsive disorder
COMT	Alcoholism
CR1	Systemic Lupus Erythematosus (SLE)
CRP	C-reactive protein
CST3	Alzheimer's disease
CTLA4	Type 1 diabetes mellitus
CTLA4	Graves' disease
CTLA4	Multiple sclerosis
CTLA4	Rheumatoid arthritis
CTLA4	Systemic Lupus Erythematosus (SLE)
CTLA4	Lupus erythematosus
CTLA4	Celiac disease
CTSD	Alzheimer's disease
CX3CR1	HIV
CXCL12	HIV
CXCL12	HIV infection
CYBA	Atherosclerosis, coronary
CYBA	Hypertension
CYP11B2	Hypertension
CYP11B2	Left ventricular hypertrophy
CYP17A1	Breast cancer
CYP17A1	Prostate cancer
CYP17A1	Endometriosis
CYP17A1	Endometrial cancer
CYP19A1	Breast cancer
CYP19A1	Prostate cancer
CYP19A1	Endometriosis
CYP1A1	Lung cancer
CYP1A1	Breast cancer
CYP1A1	Colorectal cancer
CYP1A1	Prostate cancer
CYP1A1	Esophageal cancer
CYP1A1	Endometriosis
CYP1A1	Cytogenetic studies
CYP1A2	Schizophrenia
CYP1A2	Colorectal cancer
CYP1B1	Breast cancer
CYP1B1	Glaucoma
CYP1B1	Prostate cancer
CYP21A2	21-hydroxylase deficiency
CYP21A2	Congenital adrenal hyperplasia
CYP21A2	Adrenal hyperplasia, congenital
CYP2A6	Smoking behaviour
CYP2A6	Nicotine
CYP2A6	Lung cancer
CYP2C19	Helicobacter pylori infection
CYP2C19	Phenytoin
CYP2C19	Stomach disease
CYP2C8	Malaria, Plasmodium falciparum
CYP2C9	Anticoagulant complications
CYP2C9	Warfarin sensitivity
CYP2C9	Warfarin therapy, response to
CYP2C9	Colorectal cancer
CYP2C9	Phenytoin
CYP2C9	Acenocoumarol response
CYP2C9	Blood coagulation disorders
CYP2C9	Hypertension
CYP2D6	Colorectal cancer
CYP2D6	Parkinson's disease
CYP2D6	CYP2D6 poor metabolizer phenotype
CYP2E1	Lung cancer
CYP2E1	Colorectal cancer
CYP3A4	Prostate cancer
CYP3A5	Prostate cancer
CYP3A5	Esophageal cancer
CYP46A1	Alzheimer's disease
DBH	Schizophrenia
DHCR7	Smith-Lemli-Opitz syndrome
DISC1	Schizophrenia
DLST	Alzheimer's disease
DMD	Muscular dystrophy
DRD2	Alcoholism
DRD2	Schizophrenia
DRD2	Smoking behaviour
DRD2	Parkinson's disease
DRD2	Tardive dyskinesia
DRD3	Schizophrenia
DRD3	Tardive dyskinesia
DRD3	Bipolar disorder
DRD4	Attention deficit disorder with hyperactivity
DRD4	Schizophrenia
DRD4	Novelty seeking
DRD4	ADHD
DRD4	Personality traits
DRD4	Heroin abuse
DRD4	Alcohol abuse
DRD4	Alcoholism
DRD4	Personality disorder
DTNBP1	Schizophrenia
EDN1	Hypertension
EGFR	Lung cancer
ELAC2	Prostate cancer
ENPP1	Type 2 diabetes mellitus
EPHB2	Prostate cancer
EPHX1	Lung cancer
EPHX1	Colorectal cancer
EPHX1	Cytogenetic studies
EPHX1	Chronic obstructive pulmonary disease/COPD
ERBB2	Breast cancer
ERCC1	Lung cancer
ERCC1	Colorectal cancer
ERCC2	Lung cancer
ERCC2	Cytogenetic studies
ERCC2	Bladder cancer
ERCC2	Colorectal cancer
ESR1	Bone mineral density
ESR1	Bone mineral density
ESR1	Breast cancer
ESR1	Endometriosis
ESR1	Osteoporosis
ESR2	Bone mineral density
ESR2	Breast cancer
Estrogen receptors	Bone mineral density
F2	Coronary heart disease
F2	Stroke
F2	Thromboembolism, venous
F2	Pre-eclampsia
F2	Thrombosis
F5	Thromboembolism, venous
F5	Pre-eclampsia
F5	Myocardial infarction
F5	Stroke
F5	Stroke, ischemic
F7	Atherosclerosis, coronary
F7	Myocardial infarction
F8	Hemophilia
F9	Hemophilia
FABP2	Type 2 diabetes mellitus
FAS	Alzheimer's disease
FASLG	Multiple sclerosis
FCGR2A	Systemic Lupus Erythematosus (SLE)
FCGR2A	Lupus erythematosus
FCGR2A	Periodontitis
FCGR2A	Rheumatoid arthritis
FCGR2B	Lupus erythematosus
FCGR2B	Systemic Lupus Erythematosus (SLE)
FCGR3A	Systemic Lupus Erythematosus (SLE)
FCGR3A	Lupus erythematosus
FCGR3A	Periodontitis
FCGR3A	Arthritis
FCGR3A	Rheumatoid arthritis
FCGR3B	Periodontitis
FCGR3B	Periodontal disease
FCGR3B	Lupus erythematosus
FGB	Fibrinogen
FGB	Myocardial infarction
FGB	Coronary heart disease
FLT3	Leukemia, myeloid
FLT3	Leukemia
FMR1	Fragile X syndrome
FRAXA	Fragile X syndrome
FUT2	Helicobacter pylori infection
FVL	Factor V Leiden
G6PD	G6PD deficiency
G6PD	Hyperbilirubinemia
GABRA5	Bipolar disorder
GBA	Gaucher disease
GBA	Parkinson's disease
GCGR (FAAH, ML4R, UCP2)	Body weight/obesity
GCK	Type 2 diabetes mellitus
GCLM (F12, TLR4)	Atherosclerosis, myocardial infarction
GDNF	Schizophrenia
GHRL	Obesity
GJB1	Charcot-Marie-Tooth disease
GJB2	Deafness
GJB2	Hearing loss, sensorineural nonsyndromic
GJB2	Hearing loss, sensorineural
GJB2	Hearing loss/deafness
GJB6	Hearing loss, sensorineural nonsyndromic
GJB6	Hearing loss/deafness
GNAS	Hypertension
GNB3	Hypertension
GPX1	Lung cancer
GRIN1	Schizophrenia
GRIN2B	Schizophrenia
GSK3B	Bipolar disorder
GSTM1	Lung cancer
GSTM1	Colorectal cancer
GSTM1	Breast cancer
GSTM1	Prostate cancer
GSTM1	Cytogenetic studies
GSTM1	Bladder cancer
GSTM1	Esophageal cancer
GSTM1	Head and neck cancer
GSTM1	Leukemia
GSTM1	Parkinson's disease
GSTM1	Stomach cancer
GSTP1	Lung cancer
GSTP1	Colorectal cancer
GSTP1	Breast cancer
GSTP1	Cytogenetic studies
GSTP1	Prostate cancer
GSTT1	Lung cancer
GSTT1	Colorectal cancer
GSTT1	Breast cancer
GSTT1	Prostate cancer
GSTT1	Bladder cancer
GSTT1	Cytogenetic studies
GSTT1	Asthma
GSTT1	Benzene toxicity
GSTT1	Esophageal cancer
GSTT1	Head and neck cancer
GYS1	Type 2 diabetes mellitus
HBB	Thalassemia
HBB	Thalassemia, beta-
HD	Huntington's chorea
HFE	Hemochromatosis
HFE	Iron level
HFE	Colorectal cancer
HK2	Type 2 diabetes mellitus
HLA	Rheumatoid arthritis
HLA	Type 1 diabetes mellitus
HLA	Behcet's disease
HLA	Celiac disease
HLA	Psoriasis vulgaris
HLA	Graves' disease
HLA	Multiple sclerosis
HLA	Schizophrenia
HLA	Asthma
HLA	Diabetes mellitus
HLA	Lupus
HLA-A	Leukemia
HLA-A	HIV
HLA-A	Diabetes mellitus, type 1
HLA-A	Graft versus host disease
HLA-A	Multiple sclerosis
HLA-B	Leukemia
HLA-B	Behcet's disease
HLA-B	Celiac disease
HLA-B	Diabetes mellitus, type 1
HLA-B	Graft versus host disease
HLA-B	Sarcoidosis
HLA-C	Psoriasis vulgaris
HLA-DPA1	Measles
HLA-DPB1	Diabetes mellitus, type 1
HLA-DPB1	Asthma
HLA-DQA1	Diabetes mellitus, type 1
HLA-DQA1	Celiac disease
HLA-DQA1	Cervical cancer
HLA-DQA1	Asthma
HLA-DQA1	Multiple sclerosis
HLA-DQA1	Diabetes, type 2; diabetes mellitus, type 1
HLA-DQA1	Lupus erythematosus
HLA-DQA1	Pregnancy loss, recurrent
HLA-DQA1	Psoriasis vulgaris
HLA-DQB1	Diabetes mellitus, type 1
HLA-DQB1	Celiac disease
HLA-DQB1	Multiple sclerosis
HLA-DQB1	Cervical cancer
HLA-DQB1	Lupus erythematosus
HLA-DQB1	Pregnancy loss, recurrent
HLA-DQB1	Arthritis
HLA-DQB1	Asthma
HLA-DQB1	HIV
HLA-DQB1	Lymphoma
HLA-DQB1	Tuberculosis
HLA-DQB1	Rheumatoid arthritis
HLA-DQB1	Diabetes mellitus, type 2
HLA-DQB1	Graft versus host disease
HLA-DQB1	Narcolepsy
HLA-DQB1	Arthritis, rheumatic
HLA-DQB1	Cholangitis, sclerosing
HLA-DQB1	Diabetes, type 2; diabetes mellitus, type 1
HLA-DQB1	Graves' disease
HLA-DQB1	Hepatitis C
HLA-DQB1	Hepatitis C, chronic
HLA-DQB1	Malaria
HLA-DQB1	Malaria, Plasmodium falciparum
HLA-DQB1	Melanoma
HLA-DQB1	Psoriasis vulgaris
HLA-DQB1	Sjogren's syndrome
HLA-DQB1	Systemic Lupus Erythematosus (SLE)
HLA-DRB1	Diabetes mellitus, type 1
HLA-DRB1	Multiple sclerosis
HLA-DRB1	Systemic Lupus Erythematosus (SLE)
HLA-DRB1	Rheumatoid arthritis
HLA-DRB1	Cervical cancer
HLA-DRB1	Arthritis
HLA-DRB1	Celiac disease
HLA-DRB1	Lupus erythematosus
HLA-DRB1	Sarcoidosis
HLA-DRB1	HIV
HLA-DRB1	Tuberculosis
HLA-DRB1	Graves' disease
HLA-DRB1	Lymphoma
HLA-DRB1	Psoriasis vulgaris
HLA-DRB1	Asthma
HLA-DRB1	Crohn's disease
HLA-DRB1	Graft versus host disease
HLA-DRB1	Hepatitis C, chronic
HLA-DRB1	Narcolepsy
HLA-DRB1	Sclerosis, systemic
HLA-DRB1	Sjogren's syndrome
HLA-DRB1	Type 1 diabetes mellitus
HLA-DRB1	Arthritis, rheumatic
HLA-DRB1	Cholangitis, sclerosing
HLA-DRB1	Diabetes, type 2; diabetes mellitus, type 1
HLA-DRB1	Helicobacter pylori infection
HLA-DRB1	Hepatitis C
HLA-DRB1	Arthritis, juvenile
HLA-DRB1	Leukemia
HLA-DRB1	Malaria
HLA-DRB1	Melanoma
HLA-DRB1	Pregnancy loss, recurrent
HLA-DRB3	Psoriasis vulgaris
HLA-G	Pregnancy loss, recurrent
HMOX1	Atherosclerosis, coronary
HNF4A	Diabetes mellitus, type 2
HNF4A	Type 2 diabetes mellitus
HSD11B2	Hypertension
HSD17B1	Breast cancer
HTR1A	Depressive disorder, major
HTR1B	Alcohol dependence
HTR1B	Alcoholism
HTR2A	Memory
HTR2A	Schizophrenia
HTR2A	Bipolar disorder
HTR2A	Depression
HTR2A	Depressive disorder, major
HTR2A	Suicide
HTR2A	Alzheimer's disease
HTR2A	Anorexia nervosa
HTR2A	Hypertension
HTR2A	Obsessive compulsive disorder
HTR2C	Schizophrenia
HTR6	Alzheimer's disease
HTR6	Schizophrenia
HTRA1	Wet age-related macular degeneration
IAPP	Type 2 diabetes mellitus
IDE	Alzheimer's disease
IFNG	Tuberculosis
IFNG	Type 1 diabetes mellitus
IFNG	Graft versus host disease
IFNG	Hepatitis B
IFNG	Multiple sclerosis
IFNG	Asthma
IFNG	Breast cancer
IFNG	Kidney transplantation
IFNG	Complications of renal transplantation
IFNG	Longevity
IFNG	Pregnancy loss, recurrent
IGFBP3	Breast cancer
IGFBP3	Prostate cancer
IL10	Systemic Lupus Erythematosus (SLE)
IL10	Asthma
IL10	Graft versus host disease
IL10	HIV
IL10	Kidney transplantation
IL10	Complications of renal transplantation
IL10	Hepatitis B
IL10	Arthritis, juvenile
IL10	Longevity
IL10	Multiple sclerosis
IL10	Pregnancy loss, recurrent
IL10	Rheumatoid arthritis
IL10	Tuberculosis
IL12B	Type 1 diabetes mellitus
IL12B	Asthma
IL13	Asthma
IL13	Atopy
IL13	Chronic obstructive pulmonary disease/COPD
IL13	Graves' disease
IL1A	Periodontitis
IL1A	Alzheimer's disease
IL1B	Periodontitis
IL1B	Alzheimer's disease
IL1B	Stomach cancer
IL1R1	Type 1 diabetes mellitus
		IL1RN	Stomach cancer
IL2	Asthma; eczema; allergic diseases
		IL4	Asthma (asthma)
IL4	Specific reactivity
		IL4	HIV
IL4R	Asthma (asthma)
		IL4R	Specific reactivity
IL4R	Total serum IgE
		IL6	Bone mineralization
IL6	Kidney transplantation
		IL6	Complications of renal transplantation
IL6	Long service life
		IL6	Multiple sclerosis
IL6	Bone mineral density
		IL6	Bone mineral density
IL6	Colorectal cancer
		IL6	Juvenile arthritis
IL6	Rheumatoid arthritis
		IL9	Asthma (asthma)
INHA	Premature ovarian failure
		INS	Type 1 diabetes mellitus
INS	Type 2 diabetes mellitus
		INS	Diabetes mellitus, type 1
INS	Obesity
		INS	Prostate cancer
INSIG2	Obesity
		INSR	Type 2 diabetes mellitus

Gene	Phenotype
		INSR	Hypertension (hypertension)
INSR	Polycystic ovarian syndrome
		IPF1	Diabetes mellitus, type 2
IRS1	Type 2 diabetes mellitus
		IRS1	Diabetes mellitus, type 2
IRS2	Diabetes mellitus, type 2
		ITGB3	Myocardial infarction
ITGB3	Atherosclerosis, coronary
		ITGB3	Coronary heart disease
ITGB3	Myocardial infarction
		KCNE1	EKG, abnormal
KCNE2	EKG, abnormal
		KCNH2	EKG, abnormal
KCNH2	QT interval prolongation syndrome
		KCNJ11	Diabetes mellitus, type 2
KCNJ11	Type 2 diabetes mellitus
		KCNN3	Schizophrenia
KCNQ1	EKG, abnormal
		KCNQ1	QT interval prolongation syndrome
KIBRA	Episodic memory
		KLK1	Hypertension (hypertension)
KLK3	Prostate cancer
		KRAS	Colorectal cancer
LDLR	Hypercholesterolemia
		LDLR	Hypertension (hypertension)
LEP	Obesity
		LEPR	Obesity

Gene	Phenotype
		LIG4	Breast cancer
LIPC	Atherosclerosis, coronary
		LPL	Coronary artery disease
LPL	Hyperlipidemia
		LPL	Triglycerides
LRP1	Alzheimer's disease
		LRP5	Bone mineral density
LRRK2	Parkinson's disease
		LRRK2	Parkinson's disease
LTA	Type 1 diabetes mellitus
		LTA	Asthma (asthma)
LTA	Systemic Lupus Erythematosus (SLE)
		LTA	Septicemia
LTC4S	Asthma (asthma)
		MAOA	Alcoholism
MAOA	Schizophrenia
		MAOA	Bipolar disorder
MAOA	Smoking behaviour
		MAOA	Personality disorder
MAOB	Parkinson's disease
		MAOB	Smoking behaviour
MAPT	Parkinson's disease
		MAPT	Alzheimer's disease
MAPT	Dementia
		MAPT	Frontotemporal dementia
MAPT	Progressive supranuclear palsy
		MC1R	Melanoma

Gene	Phenotype
		MC3R	Obesity
MC4R	Obesity
		MECP2	Rett syndrome
MEFV	Familial mediterranean fever
		MEFV	Amyloidosis
MICA	Type 1 diabetes mellitus
		MICA	Behcet's disease
MICA	Celiac disease
		MICA	Rheumatoid arthritis
MICA	Systemic Lupus Erythematosus (SLE)
		MLH1	Colorectal cancer
MME	Alzheimer's disease
		MMP1	Lung cancer
MMP1	Ovarian cancer
		MMP1	Periodontitis
MMP3	Myocardial infarction
		MMP3	Ovarian cancer
MMP3	Rheumatoid arthritis
		MPO	Lung cancer
MPO	Alzheimer's disease
		MPO	Breast cancer
MPZ	Charcot-Marie-Tooth disease
		MS4A2	Asthma (asthma)
MS4A2	Atopy
		MSH2	Colorectal cancer
MSH6	Colorectal cancer
		MSR1	Prostate cancer

Gene	Phenotype
		MTHFR	Colorectal cancer
MTHFR	Type 2 diabetes mellitus
		MTHFR	Neural tube defect
MTHFR	Homocysteine
		MTHFR	Thromboembolism, venous
MTHFR	Atherosclerosis, coronary
		MTHFR	Alzheimer's disease
MTHFR	Esophageal cancer
		MTHFR	Pre-eclampsia
MTHFR	Pregnancy loss, recurrent
		MTHFR	Stroke
MTHFR	Thrombosis, deep vein
		MT-ND1	Diabetes mellitus, type 2
MTR	Colorectal cancer
		MT-RNR1	Hearing loss, sensorineural nonsyndromic
MTRR	Neural tube defect
		MTRR	Homocysteine
MT-TL1	Diabetes mellitus, type 2
		MUTYH	Colorectal cancer
MYBPC3	Cardiomyopathy
		MYH7	Cardiomyopathy
MYOC	Glaucoma, primary open angle
		MYOC	Glaucoma
NAT1	Colorectal cancer
		NAT1	Breast cancer
NAT1	Cancer of the bladder
		NAT2	Colorectal cancer

Gene	Phenotype
		NAT2	Cancer of the bladder
NAT2	Breast cancer
		NAT2	Lung cancer
NBN	Breast cancer
		NCOA3	Breast cancer
NCSTN	Alzheimer's disease
		NEUROD1	Type 1 diabetes mellitus
NF1	Neurofibromatosis 1
		NOS1	Asthma (asthma)
NOS2A	Multiple sclerosis
		NOS3	Hypertension (hypertension)
NOS3	Coronary heart disease
		NOS3	Atherosclerosis, coronary
NOS3	Coronary artery disease
		NOS3	Myocardial infarction
NOS3	Acute coronary syndrome
		NOS3	Blood pressure, arterial
NOS3	Pre-eclampsia
		NOS3	Nitric oxide
NOS3	Alzheimer's disease
		NOS3	Asthma (asthma)
NOS3	Type 2 diabetes mellitus
		NOS3	Cardiovascular diseases
NOS3	Behcet's disease
		NOS3	Erectile dysfunction
NOS3	Renal failure, chronic
		NOS3	Lead toxicity

Gene	Phenotype
		NOS3	Left ventricular hypertrophy
NOS3	Pregnancy loss, recurrent
		NOS3	Retinopathy, diabetic
NOS3	Stroke
		NOTCH4	Schizophrenia
NPY	Abuse of alcohol
		NQO1	Lung cancer
NQO1	Colorectal cancer
		NQO1	Benzene toxicity
NQO1	Cancer of the bladder
		NQO1	Parkinson's disease
NR3C2	Hypertension (hypertension)
		NR4A2	Parkinson's disease
NRG1	Schizophrenia
		NTF3	Schizophrenia
OGG1	Lung cancer
		OGG1	Colorectal cancer
OLR1	Alzheimer's disease
		OPA1	Glaucoma
OPRM1	Abuse of alcohol
		OPRM1	Dependence on drugs
OPTN	Glaucoma, primary open angle
		P450	Metabolism of drugs
PADI4	Rheumatoid arthritis
		PAH	phenylketonuria/PKU
PAI1	Coronary heart disease
		PAI1	Asthma (asthma)

Gene	Phenotype
		PALB2	Breast cancer
PARK2	Parkinson's disease
		PARK7	Parkinson's disease
PDCD1	Lupus erythematosus (lupus erythematosus)
		PINK1	Parkinson's disease
PKA	Memory
		PKC	Memory
PLA2G4A	Schizophrenia
		PNOC	Schizophrenia
POMC	Obesity
		PON1	Atherosclerosis, coronary
PON1	Parkinson's disease
		PON1	Type 2 diabetes mellitus
PON1	Atherosclerosis of arteries
		PON1	Coronary artery disease
PON1	Coronary heart disease
		PON1	Alzheimer's disease
PON1	Longevity
		PON2	Atherosclerosis, coronary
PON2	Premature delivery
		PPARG	Type 2 diabetes mellitus
PPARG	Obesity
		PPARG	Diabetes mellitus, type 2
PPARG	Colorectal cancer
		PPARG	Hypertension (hypertension)
PPARGC1A	Diabetes mellitus, type 2
		PRKCZ	Type 2 diabetes mellitus

Gene	Phenotype
		PRL	Systemic Lupus Erythematosus (SLE)
PRNP	Alzheimer's disease
		PRNP	Creutzfeldt-Jakob disease
PRNP	Jakob-Creutzfeldt disease
		PRODH	Schizophrenia
PRSS1	Pancreatitis
		PSEN1	Alzheimer's disease
PSEN2	Alzheimer's disease
		PSMB8	Type 1 diabetes mellitus
PSMB9	Type 1 diabetes mellitus
		PTCH	Skin cancer, non-melanoma
PTGIS	Hypertension (hypertension)
		PTGS2	Colorectal cancer
PTH	Bone mineral density
		PTPN11	Noonan syndrome
PTPN22	Rheumatoid arthritis
		PTPRC	Multiple sclerosis
PVT1	End stage renal disease
		RAD51	Breast cancer
RAGE	Retinopathy, diabetic
		RB1	Retinoblastoma
RELN	Schizophrenia
		REN	Hypertension (hypertension)
RET	Thyroid cancer
		RET	Hirschsprung's disease
RFC1	Neural tube defect
		RGS4	Schizophrenia

Gene	Phenotype
		RHO	Retinitis pigmentosa
RNASEL	Prostate cancer
		RYR1	Malignant hyperthermia
SAA1	Amyloidosis
		SCG2	Hypertension (hypertension)
SCG3	Obesity
		SCGB1A1	Asthma (asthma)
SCN5A	Brugada syndrome
		SCN5A	EKG, Exception
SCN5A	QT interval prolongation syndrome
		SCNN1B	Hypertension (hypertension)
SCNN1G	Hypertension (hypertension)
		SERPINA1	COPD
SERPINA3	Alzheimer's disease
		SERPINA3	COPD
SERPINA3	Parkinson's disease
		SERPINE1	Myocardial infarction
SERPINE1	Type 2 diabetes mellitus
		SERPINE1	Atherosclerosis, coronary
SERPINE1	Obesity
		SERPINE1	Pre-eclampsia
SERPINE1	Stroke
		SERPINE1	Hypertension
SERPINE1	Pregnancy loss, recurrent
		SERPINE1	Thromboembolism, venous
SLC11A1	Tuberculosis (tuberculosis)
		SLC22A4	Crohn's disease; ulcerative colitis

Gene	Phenotype
		SLC22A5	Crohn's disease; ulcerative colitis
SLC2A1	Type 2 diabetes mellitus
		SLC2A2	Type 2 diabetes mellitus
SLC2A4	Type 2 diabetes mellitus
		SLC3A1	Cystinuria
SLC6A3	Attention deficit disorder with hyperactivity
		SLC6A3	Parkinson's disease
SLC6A3	Smoking behaviour
		SLC6A3	Alcoholism
SLC6A3	Schizophrenia
		SLC6A4	Depression (depression)
SLC6A4	Depression, major type
		SLC6A4	Schizophrenia
SLC6A4	Suicide
		SLC6A4	Alcoholism
SLC6A4	Bipolar disorder
		SLC6A4	Personality traits
SLC6A4	Attention deficit disorder with hyperactivity
		SLC6A4	Alzheimer's disease
SLC6A4	Personality disorder
		SLC6A4	Panic disorder
SLC6A4	Abuse of alcohol
		SLC6A4	Affective disorders
SLC6A4	Anxiety disorders
		SLC6A4	Smoking behaviour
SLC6A4	Depression, major; bipolar disorder
		SLC6A4	Abuse of heroin

Gene	Phenotype
		SLC6A4	Irritable bowel syndrome
SLC6A4	Migraine headache
		SLC6A4	Obsessive-compulsive disorder
SLC6A4	Suicidal behavior
		SLC7A9	Cystinuria
SNAP25	ADHD
		SNCA	Parkinson's disease
SOD1	ALS/amyotrophic lateral sclerosis
		SOD2	Breast cancer
SOD2	Lung cancer
		SOD2	Prostate cancer
SPINK1	Pancreatitis
		SPP1	Multiple sclerosis
SRD5A2	Prostate cancer
		STAT6	Asthma (asthma)
STAT6	Total IgE
		SULT1A1	Breast cancer
SULT1A1	Colorectal cancer
		TAP1	Type 1 diabetes mellitus
TAP1	Lupus erythematosus (lupus erythematosus)
		TAP2	Type 1 diabetes mellitus
TAP2	Diabetes mellitus, type 1
		TBX21	Asthma (asthma)
TBXA2R	Asthma (asthma)
		TCF1	Diabetes mellitus, type 2
TCF1	Type 2 diabetes mellitus
		TF	Alzheimer's disease

Gene	Phenotype
		TGFB1	Breast cancer
TGFB1	Kidney transplantation
		TGFB1	Complications of renal transplantation
TH	Schizophrenia
		THBD	Myocardial infarction
TLR4	Asthma (asthma)
		TLR4	Crohn's disease; ulcerative colitis
TLR4	Septicemia
		TNF	Asthma (asthma)
TNFA	Cerebrovascular disease
		TNF	Type 1 diabetes mellitus
TNF	Rheumatoid arthritis
		TNF	Systemic Lupus Erythematosus (SLE)
TNF	Kidney transplantation
		TNF	Psoriasis vulgaris
TNF	Septicemia
		TNF	Type 2 diabetes mellitus
TNF	Alzheimer's disease
		TNF	Crohn's disease
TNF	Diabetes mellitus, type 1
		TNF	Hepatitis B
TNF	Complications of renal transplantation
		TNF	Multiple sclerosis
TNF	Schizophrenia
		TNF	Celiac disease
TNF	Obesity
		TNF	Pregnancy loss, recurrent

Gene	Phenotype
		TNFRSF11B	Bone mineral density
TNFRSF1A	Rheumatoid arthritis
		TNFRSF1B	Rheumatoid arthritis
TNFRSF1B	Systemic Lupus Erythematosus (SLE)
		TNFRSF1B	Arthritis (arthritis)
TNNT2	Cardiomyopathy
		TP53	Lung cancer
TP53	Breast cancer
		TP53	Colorectal cancer
TP53	Prostate cancer
		TP53	Cervical cancer
TP53	Ovarian cancer
		TP53	Smoking
TP53	Esophageal cancer
		TP73	Lung cancer
TPH1	Suicide
		TPH1	Depression, major type
TPH1	Suicidal behavior
		TPH1	Schizophrenia
TPMT	Thiopurine methyltransferase Activity
		TPMT	Leukemia (leukemia)
TPMT	Inflammatory bowel disease
		TPMT	Thiopurine S-methyltransferase phenotype
TSC1	Tuberous sclerosis
		TSC2	Tuberous sclerosis
TSHR	Graves' disease
		TYMS	Colorectal cancer

Gene	Phenotype
		TYMS	Stomach cancer
TYMS	Esophageal cancer
		UCHL1	Parkinson's disease
UCP1	Obesity
		UCP2	Obesity
UCP3	Obesity
		UGT1A1	Hyperbilirubinemia
UGT1A1	Gilbert syndrome
		UGT1A6	Colorectal cancer
UGT1A7	Colorectal cancer
		UTS2	Diabetes mellitus, type 2
VDR	Bone mineral density
		VDR	Prostate cancer
VDR	Bone mineral density
		VDR	Type 1 diabetes mellitus
VDR	Osteoporosis
		VDR	Bone mass
VDR	Breast cancer
		VDR	Lead toxicity
VDR	Tuberculosis (tuberculosis)
		VDR	Type 2 diabetes mellitus
VEGF	Breast cancer
		Vit D rec	Idiopathic short stature
VKORC1	Warfarin therapy, response to
		WNK4	Hypertension (hypertension)
XPA	Lung cancer
		XPC	Lung cancer

Gene	Phenotype
		XPC	Cytogenetic studies
XRCC1	Lung cancer
		XRCC1	Cytogenetic studies
XRCC1	Breast cancer
		XRCC1	Cancer of the bladder
XRCC2	Breast cancer
		XRCC3	Breast cancer
XRCC3	Cytogenetic studies
		XRCC3	Lung cancer
XRCC3	Cancer of the bladder
		ZDHHC8	Schizophrenia

Genetic Composite Index (GCI)

The etiology of many conditions or diseases is attributed to both genetic and environmental factors. Recent advances in genotyping technology have provided opportunities to identify new associations between disease and genetic markers throughout the genome. Indeed, many recent studies have found these associations, where a particular allele or genotype is associated with an increased risk of disease. Some of these studies include collecting a set of test cases and a set of controls and comparing the allelic distribution of genetic markers between the two populations. In some of these studies, the association between a particular genetic marker and a disease was determined in isolation from other genetic markers that were handled as background and did not play a role in statistical analysis.

Genetic markers and variants may include SNPs, nucleotide repeats, nucleotide insertions, nucleotide deletions, chromosomal translocations, chromosomal duplications, or copy number variations. Copy number variations may include microsatellite repeats, nucleotide repeats, centromere repeats or telomere repeats.

In one aspect of the invention, information about the association of multiple genetic markers with one or more diseases or conditions is combined and analyzed to derive a GCI score. GCI scoring can be used to provide people without genetic training with reliable (i.e., robust), understandable, and/or intuitive knowledge of their individual risk of disease compared to a relevant population based on current scientific research. In one embodiment, the method of generating a reliable GCI score for the combined effect of different loci is based on the reported individual risk for each locus studied. For example, a disease or condition of interest is identified and then sources of information (including, but not limited to, databases, patent publications, and scientific literature) are queried for information regarding the association of the disease or condition with one or more genetic loci. These sources of information are validated and evaluated using quality criteria. In some embodiments, the evaluation process includes multiple steps. In other embodiments, the information sources are evaluated against a plurality of quality criteria. Information derived from information resources is used to identify odds ratios or relative risks of one or more genetic loci for each disease or condition of interest.

In alternative embodiments, the odds ratio (OR) or relative risk (RR) for at least one genetic locus is not available from the available sources of information. The RRs are then calculated using (1) the reported ORs of multiple alleles of the same locus, (2) allele frequencies from datasets (e.g., HapMap datasets), and/or (3) disease/condition prevalence from available resources (e.g., CDC, National Center for Health Statistics, etc.) to derive the RRs for all alleles of interest. In one embodiment, the ORs of multiple alleles of the same locus are assessed separately or independently. In a preferred embodiment, the ORs of multiple alleles of the same locus are combined to account for the dependency between the ORs of different alleles. In some embodiments, established disease models (including, but not limited to, models such as the multiplicative, additive, Harvard-modified, and dominant-effect models) are used to generate intermediate scores representing individual risk according to the selected model.

In another embodiment, a method is used that analyzes multiple models of a disease or condition of interest and correlates the results obtained from these different models; this minimizes the possible errors that may be introduced by selecting a particular disease model. This approach minimizes the impact of reasonable errors in the prevalence, allele frequency, and OR estimates derived from the information sources on the calculation of relative risk. Because the effect of the prevalence estimate on the RRs is "linear" or monotonic, an incorrect prevalence estimate has little or no effect on the final score, provided that the same model is applied consistently to all individuals for whom a report is generated.

In another embodiment, a method is used that treats environmental/behavioral/demographic data as an additional "locus". In related embodiments, such data may be obtained from information sources such as medical or scientific literature or databases (e.g., the association of smoking with lung cancer, or data from insurance health risk assessments). In one embodiment, a GCI score is generated for one or more complex diseases. Complex diseases can be influenced by multiple genes, environmental factors, and their interactions. When studying complex diseases, a large number of possible interactions need to be analyzed. In one embodiment, a procedure such as the Bonferroni correction is used to correct for multiple comparisons. In an alternative embodiment, when the tests are independent or show a particular type of dependency, the overall level of significance (also known as the "family-wise error rate") is controlled using the Simes test (Sarkar S. (1998) Some probability inequalities for ordered MTP2 random variables: a proof of the Simes conjecture. Ann Stat 26: 494-504). If p(k) ≦ αk/K for any k in 1, ..., K, where p(1) ≦ ... ≦ p(K) are the ordered p-values, then the Simes test rejects the global null hypothesis that all K test-specific null hypotheses are true (Simes RJ (1986) An improved Bonferroni procedure for multiple tests of significance. Biometrika 73: 751-754).

Other embodiments that may be used in the context of multi-gene and multi-environmental-factor analysis control the false discovery rate, i.e., the expected proportion of rejected null hypotheses that are rejected falsely. This approach is particularly beneficial when, as in microarray studies, a fraction of the null hypotheses can be assumed to be false. Devlin et al. (2003, Analysis of multilocus models of association. Genet Epidemiol 25: 36-47) proposed a variant of the Benjamini and Hochberg (1995, Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 57: 289-300) step-up procedure that controls the false discovery rate when testing a large number of possible gene-gene interactions in a multilocus association study. The Benjamini and Hochberg procedure is related to the Simes test; setting k* = max k such that p(k) ≦ αk/K, it rejects all k* null hypotheses corresponding to p(1), ..., p(k*). In fact, when all null hypotheses are true, the Benjamini and Hochberg procedure reduces to the Simes test (Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Ann Stat 29: 1165-1188).
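The three multiple-testing rules named above can be illustrated with a minimal Python sketch. The function names and the default significance level `alpha = 0.05` are assumptions made for illustration only; they do not appear in the source, and the sketch implements the plain Benjamini and Hochberg step-up rule rather than the Devlin et al. variant.

```python
from typing import List, Sequence


def bonferroni_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Bonferroni correction: reject test j when p_j <= alpha / K."""
    k_total = len(p_values)
    return [p <= alpha / k_total for p in p_values]


def simes_global_reject(p_values: Sequence[float], alpha: float = 0.05) -> bool:
    """Simes test: reject the global null if p_(k) <= alpha * k / K for any k."""
    ordered = sorted(p_values)
    k_total = len(ordered)
    return any(p <= alpha * k / k_total for k, p in enumerate(ordered, start=1))


def benjamini_hochberg_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Step-up rule: k* = max k with p_(k) <= alpha * k / K; reject the k* smallest."""
    k_total = len(p_values)
    order = sorted(range(k_total), key=lambda j: p_values[j])
    k_star = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= alpha * rank / k_total:
            k_star = rank  # keep the largest rank that satisfies the bound
    rejected = [False] * k_total
    for j in order[:k_star]:
        rejected[j] = True
    return rejected
```

When every null hypothesis is true, the step-up rule rejects something exactly when the Simes condition holds, which is the sense in which the two procedures coincide.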

In some embodiments, the individual is ranked against a population of individuals based on his or her intermediate score to generate a final score, which may be expressed as a rank in the population, such as the 99th percentile, or the 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 10, 5, or 0th percentile. In another embodiment, the score may be displayed as a range, such as the 100th to 95th percentile, the 95th to 85th percentile, the 85th to 60th percentile, or any subrange between the 100th and 0th percentile. In yet another embodiment, the individual is ranked in quartiles, such as the top 75th quartile or the lowest 25th quartile. In further embodiments, the individual is ranked in comparison to the mean or median score of the population.

In one embodiment, the population compared to the individual includes a large number of people from different geographic and ethnic backgrounds, such as a global population. In other embodiments, the population compared to the individual is limited to a particular geography, family, race, gender, age (fetal, neonatal, child, juvenile, adolescent, adult, elderly individual), disease state (e.g., symptomatic, asymptomatic, carrier, early onset, late onset). In some embodiments, the population compared to the individual is derived from information reported from public and/or private information sources.

In one embodiment, the GCI score or GCI Plus score of the individual is visualized using a display device. In some embodiments, a display screen (e.g., a computer monitor or television screen) is used for the visual display, such as a personal portal with associated information. In another embodiment, the display is a static display, such as a printed page. In one embodiment, the display may include, but is not limited to, one or more of the following: bins (e.g., 1-5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35, 36-40, 41-45, 46-50, 51-55, 56-60, 61-65, 66-70, 71-75, 76-80, 81-85, 86-90, 91-95, 96-100), a color or gray-scale gradient, a thermometer, a scale, a pie chart, a histogram, or a bar chart. For example, FIGS. 18 and 19 are different displays for MS and FIG. 20 is for Crohn's disease. In another embodiment, a thermometer is used to display the GCI score and the disease/condition prevalence. In another embodiment, the thermometer displays a level that changes with the reported GCI score, e.g., FIGS. 15-17, with the color corresponding to risk. The thermometer may display a colorimetric change with increasing GCI score (e.g., gradually changing from blue at a lower GCI score to red at a higher GCI score). In a related embodiment, the thermometer displays both a level that varies with the reported GCI score and a colorimetric change as the level of risk increases.

In an alternative embodiment, the individual's GCI score is communicated to the individual using auditory feedback. In one embodiment, the auditory feedback is a verbal statement that the risk level is high or low. In another embodiment, the auditory feedback is a recitation of a particular GCI score, such as a number, percentile, range, quartile, or comparison to the population mean or median GCI score. In one embodiment, a live person delivers the auditory feedback either in person or through a communication device, such as a telephone (landline, cellular, or satellite), or through a personal portal. In another embodiment, the auditory feedback is delivered by an automated system (e.g., a computer). In one embodiment, the auditory feedback is delivered as part of an Interactive Voice Response (IVR) system, a technology that allows a computer to detect voice and touch tones during a normal telephone call. In another embodiment, the individual may interact with a central server through an IVR system. The IVR system can respond with pre-recorded or dynamically generated audio to interact with individuals and provide them with auditory feedback on their risk level. In one embodiment, the individual may call a number that is answered by the IVR system. After optional entry of an authentication code or security code, or after passing a voice recognition procedure, the IVR system prompts the individual to select an option from a menu, such as a touch-tone or voice menu. One of these options may provide the individual with his or her risk level.

In another embodiment, the GCI score of an individual is visualized using a display device and communicated using auditory feedback, for example through a personal portal. This combination may include a visual display of the GCI score and an audible feedback that discusses the relevance of the GCI score to the overall health of the individual and possible precautions that may be proposed.

In one embodiment, the GCI score is generated using a multi-step method. Initially, for each condition to be studied, the relative risks are calculated from the odds ratios of each genetic marker. For each prevalence value p = 0.01, 0.02, ..., 0.5, the GCI scores of the HapMap CEU population are calculated based on the prevalence and the HapMap allele frequencies. If the GCI scores do not change under varying prevalence, the only assumption taken into account is that the model is multiplicative. Otherwise, it is determined that the model is sensitive to the prevalence. The relative risks and the distribution of scores in the HapMap population can be obtained for any combination of no-call values. For each new individual, the individual's score is compared to the HapMap distribution, and the resulting score is the rank of the individual in this population. The resolution of the reported score may be low because of the assumptions made in the process. The population is divided into quantiles (3-6 bins), and the reported bin is the one in which the individual's rank falls. The number of bins may differ between diseases based on considerations such as the resolution of the scores for each disease. In the case of ties between the scores of different HapMap individuals, the average rank is used.
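The ranking and binning step can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, ties are handled with a mid-rank convention (each tied reference score counts as half below the individual), and equal-width rank bins stand in for the quantile bins described above.

```python
from typing import Sequence


def mid_rank_fraction(reference_scores: Sequence[float], score: float) -> float:
    """Fraction of the reference population ranked below `score`.

    Tied reference scores contribute one half each, which averages the
    ranks the individual could take among the ties.
    """
    n = len(reference_scores)
    below = sum(1 for s in reference_scores if s < score)
    tied = sum(1 for s in reference_scores if s == score)
    return (below + 0.5 * tied) / n


def quantile_bin(reference_scores: Sequence[float], score: float, n_bins: int = 5) -> int:
    """Return the 1-based bin (1 = lowest risk) in which the individual's rank falls."""
    fraction = mid_rank_fraction(reference_scores, score)
    # A score above every reference score has fraction 1.0 and is clamped
    # into the top bin.
    return min(n_bins, int(fraction * n_bins) + 1)
```

The number of bins is left as a parameter because the text allows it to vary by disease (3-6 bins).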

In one embodiment, a higher GCI score is interpreted as indicating an increased risk of acquiring or being diagnosed with a condition or disease. In another embodiment, a mathematical model is used to derive the GCI score. In some embodiments, the GCI score is based on a mathematical model that accounts for the incomplete nature of the underlying information about the population and/or the disease or condition. In some embodiments, the mathematical model includes at least one assumption as part of the basis for calculating the GCI score, wherein the assumption includes, but is not limited to: the assumption that the odds ratios are given; the assumption that the prevalence of the condition is known; the assumption that the genotype frequencies in the population are known; the assumption that the customers are from the same ethnic background as the populations used in the studies and in the HapMap; and the assumption that the combined risk is the product of the different risk factors of the individual genetic markers. In some embodiments, the GCI may also include the assumption that the multi-genotype frequency of a genotype is the product of the frequencies of the alleles of each SNP or individual genetic marker (e.g., the different SNPs or genetic markers are independent across the population).

Multiplicative model

In one embodiment, the GCI score is calculated under the assumption that the risk attributed to the set of genetic markers is the product of the risks attributed to the individual genetic markers. This means that each genetic marker contributes to the risk of disease independently of the other genetic markers. Formally, there are $k$ genetic markers with risk alleles $r_1, \ldots, r_k$ and non-risk alleles $n_1, \ldots, n_k$. At SNP $i$, we denote the three possible genotype values as $r_i r_i$, $n_i r_i$ and $n_i n_i$. The genotype information of an individual can be described by a vector $(g_1, \ldots, g_k)$, where $g_i$ may be 0, 1 or 2 according to the number of risk alleles at position $i$. We denote by $\lambda_1^i$ the relative risk of the heterozygous genotype at position $i$ compared to the homozygous non-risk genotype at the same position. In other words, we define

$$\lambda_1^i = \frac{P(D \mid n_i r_i)}{P(D \mid n_i n_i)}.$$

Similarly, we denote the relative risk of the $r_i r_i$ genotype as

$$\lambda_2^i = \frac{P(D \mid r_i r_i)}{P(D \mid n_i n_i)}.$$

Under the multiplicative model, we assume that the risk of an individual with genotype $(g_1, \ldots, g_k)$ is

$$\mathrm{GCI}(g_1, \ldots, g_k) = \prod_{i=1}^{k} \lambda_{g_i}^i, \qquad \lambda_0^i = 1.$$

The multiplicative model has previously been used in the literature to simulate case-control studies or for visualization purposes.
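As a minimal sketch of the multiplicative score, the Python function below multiplies the per-locus relative risks selected by the genotype vector. The function name, the `(lambda_1, lambda_2)` tuple layout and the numeric values in the usage note are illustrative assumptions, not data from the source; a genotype of 0 contributes a factor of 1 because the homozygous non-risk genotype is the baseline.

```python
from typing import Sequence, Tuple


def multiplicative_score(
    genotype: Sequence[int],
    relative_risks: Sequence[Tuple[float, float]],
) -> float:
    """Product over loci of the relative risk selected by g_i.

    relative_risks[i] holds (lambda_1, lambda_2) for locus i, i.e. the
    relative risk of the heterozygous and homozygous-risk genotypes.
    """
    if len(genotype) != len(relative_risks):
        raise ValueError("genotype and relative_risks must have the same length")
    score = 1.0
    for g, (lam1, lam2) in zip(genotype, relative_risks):
        if g not in (0, 1, 2):
            raise ValueError("each g_i must be 0, 1 or 2 risk alleles")
        score *= (1.0, lam1, lam2)[g]
    return score
```

For example, with hypothetical relative risks `[(1.5, 2.0), (1.2, 1.8), (1.1, 1.3)]`, the genotype `(1, 2, 0)` combines the factors 1.5, 1.8 and 1.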

Assessing relative risk

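In another embodiment, the relative risks for the different genetic markers are known and the multiplicative model can be used for risk assessment. However, in some embodiments that involve association studies, the study design prevents relative risks from being reported. In some case-control studies, the relative risk cannot be calculated directly from the data without further assumptions. Instead of reporting relative risks, it is customary to report the odds ratio (OR) of a genotype, which is the ratio of the odds of disease for individuals carrying a given risk genotype ($r_i r_i$ or $n_i r_i$) to the odds of disease for individuals not carrying it (genotype $n_i n_i$). Formally,

$$OR_1^i = \frac{P(D \mid n_i r_i)}{1 - P(D \mid n_i r_i)} \cdot \frac{1 - P(D \mid n_i n_i)}{P(D \mid n_i n_i)}, \qquad OR_2^i = \frac{P(D \mid r_i r_i)}{1 - P(D \mid r_i r_i)} \cdot \frac{1 - P(D \mid n_i n_i)}{P(D \mid n_i n_i)}.$$

Finding the relative risks from the odds ratios may require additional assumptions. For example, assume that the frequencies $a$, $b$ and $c$ of the genotypes $n_i n_i$, $n_i r_i$ and $r_i r_i$ in the entire population are known or estimated (these may be estimated from existing datasets, e.g., the HapMap dataset, which comprises 120 chromosomes), and/or assume that the prevalence of the disease, $p = P(D)$, is known. Together with the two odds-ratio definitions above, this gives three equations:

$$p = a \cdot P(D \mid n_i n_i) + b \cdot P(D \mid n_i r_i) + c \cdot P(D \mid r_i r_i)$$

$$OR_1 = \frac{P(D \mid n_i r_i)}{1 - P(D \mid n_i r_i)} \cdot \frac{1 - P(D \mid n_i n_i)}{P(D \mid n_i n_i)}$$

$$OR_2 = \frac{P(D \mid r_i r_i)}{1 - P(D \mid r_i r_i)} \cdot \frac{1 - P(D \mid n_i n_i)}{P(D \mid n_i n_i)}$$

By the definition of relative risk, after dividing by the term $p \, P(D \mid n_i n_i)$, the first equation can be rewritten as:

$$\frac{1}{P(D \mid n_i n_i)} = \frac{a + b\lambda_1 + c\lambda_2}{p}$$

and therefore the latter two equations can be rewritten as:

$$OR_1 = \lambda_1 \, \frac{a + b\lambda_1 + c\lambda_2 - p}{a + b\lambda_1 + c\lambda_2 - p\lambda_1}, \qquad OR_2 = \lambda_2 \, \frac{a + b\lambda_1 + c\lambda_2 - p}{a + b\lambda_1 + c\lambda_2 - p\lambda_2} \qquad (1)$$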

It should be noted that when a = 1 (the non-risk allele frequency is 1), equation system 1 is equivalent to the Zhang and Yu formula in Zhang J and Yu K (What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA, 280: 1690-1, 1998, the entire contents of which are incorporated by reference). In contrast to the Zhang and Yu formula, some embodiments of the invention take into account the allele frequencies in the population, which may affect the relative risk. Still other embodiments take into account the interdependence of the relative risks. This is in contrast to calculating each relative risk independently.
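The a = 1 special case can be checked with a short derivation from the definition of the odds ratio. The symbol $P_0$ for the baseline risk is introduced here for illustration; with a = 1 every individual carries the non-risk genotype, so the prevalence $p$ equals $P_0 = P(D \mid n_i n_i)$, and the risk in carriers is $\lambda P_0$.

```latex
\begin{align*}
OR &= \frac{\lambda P_0}{1 - \lambda P_0} \cdot \frac{1 - P_0}{P_0}
    = \frac{\lambda\,(1 - p)}{1 - \lambda p}
    && \text{(substituting } P_0 = p\text{)} \\
OR\,(1 - \lambda p) &= \lambda\,(1 - p) \\
\lambda &= \frac{OR}{(1 - p) + p \cdot OR}
\end{align*}
```

The last line has the same form as the Zhang and Yu correction, in which the relative risk equals the odds ratio divided by $(1 - P_0) + P_0 \cdot OR$.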

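Equation system 1 can be rewritten as two quadratic equations with at most four possible solutions. A gradient descent algorithm can be used to solve these equations, where the starting point is set to the odds ratios, e.g., $\lambda_1 = OR_1$ and $\lambda_2 = OR_2$.

For example:

$$f_1(\lambda_1, \lambda_2) = OR_1 (a + b\lambda_1 + c\lambda_2 - p\lambda_1) - \lambda_1 (a + b\lambda_1 + c\lambda_2 - p)$$

$$f_2(\lambda_1, \lambda_2) = OR_2 (a + b\lambda_1 + c\lambda_2 - p\lambda_2) - \lambda_2 (a + b\lambda_1 + c\lambda_2 - p)$$

Finding a solution to these equations is equivalent to finding the minimum of the function $g(\lambda_1, \lambda_2) = f_1(\lambda_1, \lambda_2)^2 + f_2(\lambda_1, \lambda_2)^2$.

Thus,

$$\frac{\partial g}{\partial \lambda_1} = 2 f_1 \left[ OR_1 (b - p) - a - 2b\lambda_1 - c\lambda_2 + p \right] + 2 f_2 \, b \, (OR_2 - \lambda_2)$$

$$\frac{\partial g}{\partial \lambda_2} = 2 f_1 \, c \, (OR_1 - \lambda_1) + 2 f_2 \left[ OR_2 (c - p) - a - b\lambda_1 - 2c\lambda_2 + p \right]$$

In this example, we start by setting $x_0 = OR_1$, $y_0 = OR_2$. We set the value $\varepsilon = 10^{-10}$ as a tolerance constant for the whole algorithm. In iteration $i$, we define a step size $\gamma$. Then, we set

$$x_i = x_{i-1} - \gamma \, \frac{\partial g}{\partial \lambda_1}(x_{i-1}, y_{i-1}), \qquad y_i = y_{i-1} - \gamma \, \frac{\partial g}{\partial \lambda_2}(x_{i-1}, y_{i-1}).$$

These iterations are repeated until $g(x_i, y_i) <$ tolerance, where tolerance is set to $10^{-7}$ in the provided code.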

In this embodiment, these equations give the correct solution for different values of a, b, c, p, OR₁ and OR₂ (FIG. 10).
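The gradient descent idea can be sketched in Python as below. This is not the code referred to in the text: it assumes that equation system 1 has the form $OR_j = \lambda_j (S - p)/(S - p\lambda_j)$ with $S = a + b\lambda_1 + c\lambda_2$, it uses a fixed step size (`step`, an assumption standing in for the step-size rule of the original), and the function names and `max_iter` guard are hypothetical. Only the starting point at the odds ratios and the stopping tolerance of 10^-7 are taken from the description.

```python
from typing import Tuple


def odds_ratios_from_relative_risks(
    lam1: float, lam2: float, a: float, b: float, c: float, p: float
) -> Tuple[float, float]:
    """Forward map of the assumed system 1: relative risks -> odds ratios."""
    s = a + b * lam1 + c * lam2
    return lam1 * (s - p) / (s - p * lam1), lam2 * (s - p) / (s - p * lam2)


def relative_risks_from_odds_ratios(
    or1: float, or2: float, a: float, b: float, c: float, p: float,
    step: float = 1e-3, tolerance: float = 1e-7, max_iter: int = 200_000,
) -> Tuple[float, float]:
    """Minimise g = f1^2 + f2^2 by fixed-step gradient descent from (OR1, OR2)."""
    x, y = or1, or2
    for _ in range(max_iter):
        s = a + b * x + c * y
        f1 = or1 * (s - p * x) - x * (s - p)
        f2 = or2 * (s - p * y) - y * (s - p)
        if f1 * f1 + f2 * f2 < tolerance:
            return x, y
        # Partial derivatives of f1 and f2 with respect to x and y.
        df1_dx = or1 * (b - p) - (s - p) - b * x
        df1_dy = c * (or1 - x)
        df2_dx = b * (or2 - y)
        df2_dy = or2 * (c - p) - (s - p) - c * y
        # Gradient of g by the chain rule.
        grad_x = 2.0 * (f1 * df1_dx + f2 * df2_dx)
        grad_y = 2.0 * (f1 * df1_dy + f2 * df2_dy)
        x -= step * grad_x
        y -= step * grad_y
    raise RuntimeError("gradient descent did not reach the tolerance")
```

A simple way to exercise the sketch is a round trip: choose relative risks, compute the implied odds ratios with the forward map, and check that the solver recovers values close to the originals. Because the stopping rule is on $g$ rather than on the parameters, the recovered values are approximate.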

Robustness of relative risk assessment

In some embodiments, the effect of different parameters (prevalence, allele frequencies, and odds ratio errors) on the estimates of relative risk is determined. To determine the impact of the allele frequency and prevalence estimates on the relative risk values, the relative risks were calculated (under HWE) for a set of values of different odds ratios and different allele frequencies, and the results of these calculations were plotted for prevalence values in the range of 0 to 1 (FIG. 10). In addition, for fixed prevalence values, the resulting relative risks can be plotted as a function of the risk allele frequency (FIG. 11). When $p = 0$, $\lambda_1 = OR_1$ and $\lambda_2 = OR_2$, and when $p = 1$, $\lambda_1 = \lambda_2 = 0$. This can be calculated directly from the equations. In addition, in some embodiments, when the risk allele frequency is high, $\lambda_1$ is closer to a linear function and $\lambda_2$ is closer to a concave function with a bounded second derivative. In the limiting case, when $c = 1$, $\lambda_2 = OR_2 + p(1 - OR_2)$ and

$$\lambda_1 = \frac{OR_1 \left( OR_2 + p(1 - OR_2) \right)}{OR_2 (1 - p) + p \cdot OR_1}.$$

If $OR_1 \approx OR_2$, the latter is also close to a linear function. When the risk allele frequency is low, $\lambda_1$ and $\lambda_2$ approach the behavior of the function $1/p$. In the limiting case, when $c = 0$,

$$\lambda_1 = \frac{OR_1}{1 - p + p \cdot OR_1}, \qquad \lambda_2 = \frac{OR_2}{1 - p + p \cdot OR_2}.$$

This indicates that for high risk allele frequencies, an incorrect prevalence estimate will not significantly affect the resulting relative risk. In addition, for low risk allele frequencies, if the correct prevalence $p$ is replaced by a prevalence value $p' = \alpha p$, then the resulting relative risks will be off by a factor of at most $1/\alpha$. This is illustrated in panels (c) and (d) of FIG. 11. It should be noted that for high risk allele frequencies the two plots are quite similar, whereas for low allele frequencies there is a larger deviation between the relative risk values, although the deviation is less than a factor of 2.
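The $c = 1$ limiting case can be worked through explicitly. The derivation below assumes that equation system 1 has the form $OR_j = \lambda_j (S - p)/(S - p\lambda_j)$ with $S = a + b\lambda_1 + c\lambda_2$; the shorthand $S$ is introduced here for illustration. With $c = 1$ and $a = b = 0$, $S = \lambda_2$.

```latex
\begin{align*}
OR_2 &= \lambda_2\,\frac{\lambda_2 - p}{\lambda_2 - p\lambda_2}
      = \frac{\lambda_2 - p}{1 - p}
  &&\Longrightarrow\quad \lambda_2 = OR_2 + p\,(1 - OR_2) \\
OR_1 &= \lambda_1\,\frac{\lambda_2 - p}{\lambda_2 - p\lambda_1}
  &&\Longrightarrow\quad
  \lambda_1 = \frac{OR_1\,\lambda_2}{\lambda_2 - p + p\,OR_1}
            = \frac{OR_1\,\bigl(OR_2 + p\,(1 - OR_2)\bigr)}{OR_2\,(1 - p) + p\,OR_1}
\end{align*}
```

When $OR_1 = OR_2$ the denominator reduces to $OR_2$, so $\lambda_1 = \lambda_2$ and both are linear in $p$, which matches the remark that $\lambda_1$ is close to linear when the two odds ratios are similar.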

Calculating GCI score

In one embodiment, the genetic composite index is calculated using a reference set representing the relevant population. This reference set may be one of the populations in the HapMap or another genotype dataset.

In this embodiment, the GCI is calculated as follows. For each of the k risk loci, the relative risk is calculated from the odds ratio using equation system 1. Then, the multiplicative score is calculated for each individual in the reference set. The GCI of an individual with a multiplicative score of s is the fraction of all individuals in the reference dataset with a score s' ≦ s. For example, if 50% of the individuals in the reference set have a multiplicative score smaller than s, then the individual's final GCI score is 0.5.
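The calculation can be sketched end to end in Python, assuming the per-locus relative risks have already been obtained. The function names, the data layout and the toy reference set in the usage note are illustrative assumptions; a real reference set would be a HapMap population or another genotype dataset.

```python
from typing import Sequence, Tuple

RelativeRisks = Sequence[Tuple[float, float]]  # (lambda_1, lambda_2) per locus


def multiplicative_score(genotype: Sequence[int], relative_risks: RelativeRisks) -> float:
    """Product over loci of the relative risk selected by the risk-allele count."""
    score = 1.0
    for g, (lam1, lam2) in zip(genotype, relative_risks):
        score *= (1.0, lam1, lam2)[g]
    return score


def gci(
    genotype: Sequence[int],
    reference_genotypes: Sequence[Sequence[int]],
    relative_risks: RelativeRisks,
) -> float:
    """Fraction of reference individuals whose score s' satisfies s' <= s."""
    s = multiplicative_score(genotype, relative_risks)
    reference_scores = [
        multiplicative_score(ref, relative_risks) for ref in reference_genotypes
    ]
    return sum(1 for ref_s in reference_scores if ref_s <= s) / len(reference_scores)
```

With two hypothetical loci `[(1.5, 2.0), (1.2, 1.8)]` and a four-person reference set `[(0, 0), (1, 0), (0, 1), (2, 2)]`, an individual with genotype `(1, 0)` scores at or above three of the four reference individuals.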

Other models

In one embodiment, a multiplicative model is used. In alternative embodiments, other models may be used for the purpose of determining the GCI score. Other suitable models include, but are not limited to:

An additive model. In an additive model, the risk of an individual with genotype $(g_1, \ldots, g_k)$ is assumed to be $\mathrm{GCI}(g_1, \ldots, g_k) = \sum_{i=1}^{k} \lambda_{g_i}^i$.

A generalized additive model. In the generalized additive model, it is assumed that the function f exists so as to have a genotype (g)₁，...g_k) The risk of the individual of (a) is

Harvard improvement score (Het). This score was derived by G.A Colditz et al, whereby the score was applied to a genetic marker (Harvard report on cancer preventionvolume 4: Harvard cancer risk index. cancer Causes and Controls, 11: 477-. Although the function f operates on odds ratios rather than relative risk, the Het score is essentially a generalized additive score. This is useful in situations where the relative risk is difficult to assess. To define the function f, the intermediate function g is defined as:

then calculateIn which p is_het ⁱThe frequency of individuals heterozygous for SNP i in the entire reference population. The function f is then defined as f (x) g (x)/Het, and Harvard improvement score (Het) is simply defined as

Harvard modified score (Hom). Except that the value het is valuedInstead, this score is similar to the Het score, where p is_hom ⁱThe frequency of individuals with homozygous risk alleles.

The maximum advantage ratio. In this model, it is assumed that one of the genetic markers (the one with the greatest odds ratio) gives a lower bound to the combined risk for the entire group of subjects. Formally, having a genotype (g)₁，...g_k) Is scored as
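For illustration, the additive, generalized additive, and maximum odds ratio scores can be sketched as follows. The sketch assumes that the additive score sums the per-locus relative risks and that the generalized additive score sums a caller-supplied function f of them; the function names and numeric values are hypothetical, and the Harvard Modified Scores are omitted because their function g is not reproduced here.

```python
import math

def additive_score(genotypes, relative_risks):
    """Sum of per-locus relative risks (assumed form of the additive model)."""
    return sum(rr[g] for g, rr in zip(genotypes, relative_risks))

def generalized_additive_score(genotypes, relative_risks, f):
    """Sum of f(relative risk) over loci, for a caller-supplied function f."""
    return sum(f(rr[g]) for g, rr in zip(genotypes, relative_risks))

def max_odds_ratio_score(genotypes, odds_ratios):
    """Largest per-marker odds ratio among the individual's genotypes."""
    return max(o[g] for g, o in zip(genotypes, odds_ratios))

# Hypothetical values for two loci, for illustration only.
relative_risks = [
    {"NN": 1.0, "RN": 1.25, "RR": 1.5},
    {"NN": 1.0, "RN": 1.5, "RR": 2.0},
]
odds_ratios = [
    {"NN": 1.0, "RN": 1.3, "RR": 1.7},
    {"NN": 1.0, "RN": 1.6, "RR": 2.4},
]
individual = ("RN", "RR")

print(additive_score(individual, relative_risks))                        # 3.25
print(generalized_additive_score(individual, relative_risks, math.log))
print(max_odds_ratio_score(individual, odds_ratios))                     # 2.4
```

With f = log, the generalized additive score is the logarithm of the multiplicative score, so the two rank individuals identically.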

Comparison between scores

In one embodiment, GCI scores were calculated based on multiple models across the entire HapMap CEU population for 10 SNPs associated with T2D. The relevant SNPs are rs7754840, rs4506565, rs7756992, rs10811661, rs12804210, rs8050136, rs1111875, rs4402960, rs5215 and rs1801282. Odds ratios for the three possible genotypes of each of these SNPs are reported in the literature. The CEU population consists of thirty mother-father-child trios. To avoid dependencies, the sixty parents from these trios were used. One individual with a no-call at one of the 10 SNPs was excluded, resulting in a set of 59 individuals. The GCI rank of each individual was then calculated using several different models.

It can be observed that for this dataset, the different models produce highly correlated results (Figs. 12 and 13). The Spearman correlation was calculated between each pair of models (Table 2), which showed that the additive and multiplicative models have a correlation coefficient of 0.97, so the GCI score is robust to the choice between the additive and multiplicative models. Similarly, the correlation between the Harvard Modified Score and the multiplicative model was 0.83, and the correlation coefficient between the Harvard Modified Score and the additive model was 0.7. However, using the maximum odds ratio as the genetic score results in a dichotomous score defined by a single SNP. Overall, these results show that score ranking provides a robust framework that minimizes model dependence.

Table 2: Spearman correlation of the score distributions on the CEU data between pairs of models.
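A Spearman correlation between the scores produced by two models can be computed as the Pearson correlation of their ranks. The sketch below is a plain-Python illustration with hypothetical function names and scores; tied values receive the average of their ranks.

```python
def average_ranks(values):
    """1-based ranks of the values, with ties given the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical scores of four individuals under two models.
multiplicative = [1.0, 1.2, 1.56, 3.0]
additive = [2.0, 2.2, 2.5, 3.5]
print(spearman(multiplicative, additive))
```

Two models that order the individuals identically have a Spearman correlation of 1 even when their raw scores differ, which is why rank-based comparison suits a rank-based score such as the GCI.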

The effect of variation in T2D prevalence on the resulting distribution was determined. The prevalence values were varied between 0.001 and 0.512 (Fig. 14). For T2D, it can be seen that different prevalence values result in the same ordering of individuals (Spearman correlation > 0.99), so an artificially fixed prevalence value of 0.001 can be assumed.

Extending a model to an arbitrary number of variants

In another embodiment, the model may be extended to the case where an arbitrary number of possible variants occurs. The previous considerations relate to the case where there are three possible variants (nn, nr, rr). In general, when multi-SNP associations are known, an arbitrary number of variants can be found in a population. For example, when the interaction between two genetic markers is associated with a condition, there are nine possible variants. This results in eight different odds ratios.
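The count in the two-marker example can be checked by enumeration. The short sketch below uses hypothetical names and assumes that each marker has the three genotypes nn, nr and rr, and that the all-nn combination serves as the baseline against which odds ratios are measured.

```python
from itertools import product

SINGLE_MARKER_GENOTYPES = ("nn", "nr", "rr")

def joint_variants(n_markers):
    """All genotype combinations across n_markers interacting markers."""
    return list(product(SINGLE_MARKER_GENOTYPES, repeat=n_markers))

variants = joint_variants(2)
baseline = variants[0]              # ("nn", "nn"), assumed reference variant
n_odds_ratios = len(variants) - 1   # one odds ratio per non-baseline variant

print(len(variants), n_odds_ratios)  # 9 8
```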

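To generalize the original formulation, it can be assumed that there are $k+1$ possible variants $a_0, \ldots, a_k$, with frequencies $f_0, f_1, \ldots, f_k$, measured odds ratios $1, OR_1, \ldots, OR_k$, and unknown relative risk values $1, \lambda_1, \ldots, \lambda_k$. It can further be assumed that all relative risks and odds ratios are measured with respect to $a_0$, and therefore

$$\lambda_i = \frac{P(D \mid a_i)}{P(D \mid a_0)} \quad \text{and} \quad OR_i = \frac{P(D \mid a_i)\,\bigl(1 - P(D \mid a_0)\bigr)}{P(D \mid a_0)\,\bigl(1 - P(D \mid a_i)\bigr)}.$$

Based on:

$$p = \sum_{i=0}^{k} f_i\, P(D \mid a_i),$$

it can be determined that

$$OR_i = \lambda_i\, \frac{\sum_{j=0}^{k} f_j \lambda_j - p}{\sum_{j=0}^{k} f_j \lambda_j - \lambda_i\, p}.$$

If we set $C = \sum_{i=0}^{k} f_i \lambda_i$ (with $\lambda_0 = OR_0 = 1$), this results in the following equation:

$$\lambda_i = \frac{C \cdot OR_i}{C - p + p\, OR_i},$$

and therefore,

$$C = \sum_{i=0}^{k} \frac{f_i\, C \cdot OR_i}{C - p + p\, OR_i},$$

or

$$1 = \sum_{i=0}^{k} \frac{f_i\, OR_i}{C - p + p\, OR_i}.$$

The latter is an equation in one variable, $C$. This equation can yield many different solutions (essentially, up to $k+1$ different solutions). Standard optimization tools (e.g., gradient descent) may be used to find the solution closest to $C_0 = \sum_i f_i t_i$.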

The present invention uses a stable scoring framework for risk factor quantification. Although different genetic models may result in different scores, the results are often correlated. Thus, the quantification of risk factors is generally independent of the model used.

Relative risk estimation for case-control studies

Methods for assessing relative risk from odds ratios of multiallelic genes in case-control studies are also provided in the present invention. In contrast to previous approaches, this approach takes into account allele frequency, prevalence of disease, and dependence between the relative risk of different alleles. The performance of this method on a simulated case control study was determined and found to be extremely accurate.

Methods

Consider testing a particular SNP for association with a disease D, and let R and N denote the risk and non-risk alleles of this SNP. $P(D \mid RR)$, $P(D \mid RN)$ and $P(D \mid NN)$ denote the probability of being affected by the disease given that the individual is homozygous for the risk allele, heterozygous, or homozygous for the non-risk allele, respectively. $f_{RR}$, $f_{RN}$ and $f_{NN}$ denote the frequencies of the three genotypes in the population. Using these definitions, the relative risks are defined as

$$\lambda_{RR} = \frac{P(D \mid RR)}{P(D \mid NN)}, \qquad \lambda_{RN} = \frac{P(D \mid RN)}{P(D \mid NN)}.$$

In a case-control study, the values $P(RR \mid D)$ and $P(RR \mid \sim D)$ (i.e., the frequency of RR in cases and in controls) can be estimated, as can $P(RN \mid D)$, $P(RN \mid \sim D)$, $P(NN \mid D)$ and $P(NN \mid \sim D)$ (i.e., the frequencies of RN and NN in cases and in controls). To estimate the relative risks, Bayes' law can be used to derive:

$$\lambda_{RR} = \frac{P(RR \mid D)\, f_{NN}}{P(NN \mid D)\, f_{RR}}, \qquad \lambda_{RN} = \frac{P(RN \mid D)\, f_{NN}}{P(NN \mid D)\, f_{RN}}.$$

Thus, if the genotype frequencies in the population are known, they can be used to calculate the relative risks. The genotype frequencies in the population cannot be calculated from the case-control study itself, because they depend on the prevalence of the disease in the population. In particular, if the prevalence of the disease is $p(D)$:

$$f_{RR} = P(RR \mid D)\, p(D) + P(RR \mid \sim D)\,(1 - p(D))$$

$$f_{RN} = P(RN \mid D)\, p(D) + P(RN \mid \sim D)\,(1 - p(D))$$

$$f_{NN} = P(NN \mid D)\, p(D) + P(NN \mid \sim D)\,(1 - p(D)).$$

When $p(D)$ is sufficiently small, the genotype frequencies can be approximated by those in the control population, but this is not an accurate estimate when the prevalence is high. However, if a reference dataset (e.g., HapMap [cite]) is given, the genotype frequencies can be estimated from the reference dataset.
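As an illustration of the two steps above, the following sketch mixes case and control genotype frequencies by prevalence to obtain population frequencies, and then applies Bayes' law, P(D|g) = P(g|D) p(D) / f_g, to obtain relative risks with respect to NN. The function names and all numeric values are hypothetical.

```python
def population_genotype_freqs(case_freqs, control_freqs, prevalence):
    """f_g = P(g|D) p(D) + P(g|~D) (1 - p(D)) for each genotype g."""
    return {
        g: case_freqs[g] * prevalence + control_freqs[g] * (1.0 - prevalence)
        for g in case_freqs
    }

def relative_risks_from_cases(case_freqs, pop_freqs):
    """Relative risk of RN and RR versus NN.

    By Bayes' law P(D|g) is proportional to P(g|D) / f_g, and the common
    factor p(D) cancels in the ratio against NN.
    """
    base = case_freqs["NN"] / pop_freqs["NN"]
    return {g: (case_freqs[g] / pop_freqs[g]) / base for g in ("RN", "RR")}

# Hypothetical genotype frequencies in cases and controls.
case_freqs = {"RR": 0.2, "RN": 0.5, "NN": 0.3}
control_freqs = {"RR": 0.1, "RN": 0.4, "NN": 0.5}

pop_freqs = population_genotype_freqs(case_freqs, control_freqs, prevalence=0.1)
print(pop_freqs)
print(relative_risks_from_cases(case_freqs, pop_freqs))
```

Setting the prevalence close to 0 makes the population frequencies coincide with the control frequencies, which is the small-prevalence approximation mentioned above.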

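Most recent studies do not use a reference dataset to estimate relative risk and report only odds ratios. The odds ratios can be written as

$$OR_{RR} = \frac{P(RR \mid D)\, P(NN \mid \sim D)}{P(NN \mid D)\, P(RR \mid \sim D)}, \qquad OR_{RN} = \frac{P(RN \mid D)\, P(NN \mid \sim D)}{P(NN \mid D)\, P(RN \mid \sim D)}.$$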

The odds ratio is often advantageous since it is usually not necessary to have an estimate of allele frequency in the population; to calculate odds ratios, what is generally required is genotype frequency in cases and controls.

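In some cases, the genotype data itself is not available, but summary data (e.g., odds ratios) is available. This is the case when a meta-analysis is performed based on results from previous case-control studies. For this case, we show how to find the relative risks from the odds ratios, using the fact that:

$$p(D) = f_{RR}\, P(D \mid RR) + f_{RN}\, P(D \mid RN) + f_{NN}\, P(D \mid NN).$$

If this equation is divided by $P(D \mid NN)$, we get

$$\frac{p(D)}{P(D \mid NN)} = f_{RR}\lambda_{RR} + f_{RN}\lambda_{RN} + f_{NN}.$$

This enables the odds ratio to be written in the form:

$$OR_{RR} = \frac{P(D \mid RR)\,\bigl(1 - P(D \mid NN)\bigr)}{P(D \mid NN)\,\bigl(1 - P(D \mid RR)\bigr)} = \lambda_{RR}\, \frac{f_{RR}\lambda_{RR} + f_{RN}\lambda_{RN} + f_{NN} - p(D)}{f_{RR}\lambda_{RR} + f_{RN}\lambda_{RN} + f_{NN} - \lambda_{RR}\, p(D)}.$$

By a similar calculation, the following system of equations is obtained:

$$OR_{RR} = \lambda_{RR}\, \frac{f_{RR}\lambda_{RR} + f_{RN}\lambda_{RN} + f_{NN} - p(D)}{f_{RR}\lambda_{RR} + f_{RN}\lambda_{RN} + f_{NN} - \lambda_{RR}\, p(D)}$$

$$OR_{RN} = \lambda_{RN}\, \frac{f_{RR}\lambda_{RR} + f_{RN}\lambda_{RN} + f_{NN} - p(D)}{f_{RR}\lambda_{RR} + f_{RN}\lambda_{RN} + f_{NN} - \lambda_{RN}\, p(D)}$$

Equation 1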

If odds ratios, genotype frequencies in the population, and prevalence of disease are known, then relative risk can be obtained by solving this system of equations.

It should be noted that these are two quadratic equations, so they have a maximum of four solutions. However, as shown below, there is typically only one feasible solution to this system.

It should be noted that when $f_{NN} = 1$, equation system 1 is equivalent to the Zhang and Yu formula; here, however, the allele frequencies in the population are taken into account. Moreover, our method takes into account the fact that the two relative risks depend on each other, whereas previous methods propose calculating each relative risk independently.
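The equivalence in the case $f_{NN} = 1$ can be checked with a short calculation. The step below is a sketch of the algebra: it assumes the relation $OR = \lambda\,(S - p(D))/(S - \lambda\, p(D))$ with $S = f_{RR}\lambda_{RR} + f_{RN}\lambda_{RN} + f_{NN}$, which follows from dividing the prevalence identity by $P(D \mid NN)$. With $f_{NN} = 1$ and $f_{RR} = f_{RN} = 0$, we have $S = 1$, so

$$OR = \lambda\, \frac{1 - p(D)}{1 - \lambda\, p(D)} \quad \Longrightarrow \quad \lambda = \frac{OR}{1 - p(D) + p(D)\, OR},$$

which has the form of the Zhang and Yu correction, with $p(D)$ playing the role of the incidence in the unexposed group.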

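Relative risk of multiallelic loci. The calculations are somewhat more complex if multiple markers or other multiallelic variants are considered. Let $a_0, a_1, \ldots, a_k$ denote the $k+1$ possible alleles, where $a_0$ is the non-risk allele, and assume allele frequencies $f_0, f_1, f_2, \ldots, f_k$ in the population for the $k+1$ possible alleles. For allele $i$, the relative risk and odds ratio are defined as

$$\lambda_i = \frac{P(D \mid a_i)}{P(D \mid a_0)}, \qquad OR_i = \frac{P(D \mid a_i)\,\bigl(1 - P(D \mid a_0)\bigr)}{P(D \mid a_0)\,\bigl(1 - P(D \mid a_i)\bigr)}.$$

The following equation holds for the prevalence of the disease:

$$p(D) = \sum_{i=0}^{k} f_i\, P(D \mid a_i).$$

Thus, by dividing both sides of the equation by $P(D \mid a_0)$, we get:

$$\frac{p(D)}{P(D \mid a_0)} = \sum_{i=0}^{k} f_i \lambda_i.$$

This yields:

$$OR_i = \lambda_i\, \frac{\sum_{j=0}^{k} f_j \lambda_j - p(D)}{\sum_{j=0}^{k} f_j \lambda_j - \lambda_i\, p(D)}.$$

By setting $C = \sum_{i=0}^{k} f_i \lambda_i$, we obtain $\lambda_i = \dfrac{C \cdot OR_i}{C - p(D) + p(D)\, OR_i}$. Thus, by the definition of $C$, we derive:

$$1 = \sum_{i=0}^{k} \frac{f_i\, OR_i}{C - p(D) + p(D)\, OR_i}.$$

This is a polynomial equation in one variable, $C$. Once $C$ is determined, the relative risks are determined. The polynomial has degree $k+1$, so at most $k+1$ solutions are expected. However, since the right side of the equation is strictly decreasing as a function of $C$, there can typically be only one solution to this equation. This solution is easily found using a binary search, since the solution is bounded between $C = 1$ and $C = \sum_{i=0}^{k} f_i\, OR_i$.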
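The binary search described above can be sketched as follows. The sketch assumes the one-variable equation 1 = Σ f_i OR_i / (C − p(D) + p(D) OR_i) with OR_0 = 1 for the non-risk allele, and it assumes that every odds ratio is at least 1 so that the root lies between C = 1 and C = Σ f_i OR_i. The function name and the numeric values are hypothetical.

```python
def solve_relative_risks(freqs, odds_ratios, prevalence, iterations=200):
    """Recover relative risks from odds ratios by bisection on C.

    freqs[0] and odds_ratios[0] belong to the reference allele a_0, so
    odds_ratios[0] must be 1. Assumes all odds ratios are >= 1.
    Returns (C, [lambda_0, ..., lambda_k]).
    """
    def rhs(c):
        # Right side of the equation; strictly decreasing in c.
        return sum(
            f * o / (c - prevalence + prevalence * o)
            for f, o in zip(freqs, odds_ratios)
        )

    lo = 1.0
    hi = sum(f * o for f, o in zip(freqs, odds_ratios))
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if rhs(mid) > 1.0:
            lo = mid  # right side still too large: the root is above mid
        else:
            hi = mid
    c = (lo + hi) / 2.0
    lambdas = [c * o / (c - prevalence + prevalence * o) for o in odds_ratios]
    return c, lambdas

# Hypothetical example: three variants with frequencies 0.49, 0.42, 0.09.
freqs = [0.49, 0.42, 0.09]
odds_ratios = [1.0, 1.5, 2.5]
c, lambdas = solve_relative_risks(freqs, odds_ratios, prevalence=0.1)
print(c, lambdas)
```

A fixed number of halvings is used instead of a tolerance test so that the loop always terminates, whatever the magnitude of the odds ratios.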

Stability of relative risk assessment. The effect of various parameters (prevalence, allele frequency, and odds ratio error) on the estimate of relative risk was determined. To determine the impact of allele frequency and prevalence estimates on relative risk values, the relative risk was calculated for a set of different odds ratios and different allele frequencies (under HWE), and the results of these calculations were plotted for prevalence values in the range of 0 to 1.

In addition, for a fixed prevalence value, the resulting relative risk is plotted as a function of risk allele frequency. It is clear that in all cases, when $p(D) = 0$, $\lambda_{RR} = OR_{RR}$ and $\lambda_{RN} = OR_{RN}$, and when $p(D) = 1$, $\lambda_{RR} = \lambda_{RN} = 0$. This can be calculated directly from Equation 1. In addition, when the risk allele frequency is high, $\lambda_{RR}$ approaches linear behavior and $\lambda_{RN}$ approaches a concave function with a bounded second derivative. When the risk allele frequency is low, $\lambda_{RR}$ and $\lambda_{RN}$ approach the behavior of the function $1/p(D)$. This means that for high risk allele frequencies, an incorrect estimate of prevalence will not greatly affect the resulting relative risk.

The following examples illustrate and explain the present invention. The scope of the invention is not limited to these examples.

Example I

Generation and analysis of SNP profiles

The individual is provided with a sample tube in a kit (e.g., purchased from DNA Genotek) in which the individual deposits a saliva sample (approximately 4ml) from which genomic DNA will be extracted. Saliva samples were delivered to CLIA certified laboratories for processing and analysis. Typically, the sample is delivered to the testing facility by overnight mailing in a shipping container that is conveniently provided to the individual within the collection kit.

In a preferred embodiment, the genomic DNA is isolated from saliva. For example, using the DNA self-collection kit technology provided by DNA Genotek, an individual collects approximately 4ml of saliva samples for clinical processing. After delivery of the sample to an appropriate laboratory for processing, the DNA is isolated by heat denaturation and protease digestion of the sample (typically for at least one hour at 50 ℃ using reagents provided by the collection kit supplier). Subsequently, the sample was centrifuged, and the supernatant was subjected to ethanol precipitation. The DNA pellet is suspended in a buffer suitable for subsequent analysis.

Genomic DNA of an individual is isolated from a saliva sample according to well-known procedures and/or procedures provided by the manufacturer of the collection kit. Typically, the sample is first heat denatured and protease digested. Next, the sample is centrifuged and the supernatant is retained. The supernatant is then subjected to ethanol precipitation to obtain a pellet containing approximately 5-16 µg of genomic DNA. The DNA pellet is suspended in 10 mM Tris (pH 7.6), 1 mM EDTA (TE). A SNP profile is generated by hybridizing the genomic DNA to a commercially available high-density SNP array (e.g., those provided by Affymetrix or Illumina) using the instrumentation and instructions provided by the array manufacturer. The individual's SNP profile is stored in an encrypted database or vault.

The data structure of the patient is queried for risk-conferring SNPs by comparison with a clinical database of established, medically relevant SNPs whose presence in the genome is associated with a given disease or condition. The database includes information on the statistical association of specific SNPs and SNP haplotypes with a particular disease or condition. For example, as shown in Example III, polymorphisms in the apolipoprotein E gene give rise to distinct isoforms of the protein, which in turn are associated with a statistical likelihood of developing Alzheimer's disease. As another example, individuals with a variant of the coagulation protein factor V, known as factor V Leiden, have an increased tendency to clot. Many genes in which SNPs are associated with disease or condition phenotypes are shown in Table 1. The scientific accuracy and importance of the information in the database is approved by a research/clinical advisory committee and can be reviewed by a supervising governmental agency. The database can be continuously updated as more SNP-disease associations emerge from the scientific community.

The results of the analysis of the individual's SNP profile are securely provided to the patient through an online portal or email. The patient is provided with explanatory and supportive information, such as the information on factor V Leiden shown in Example IV. Secure access to the individual's SNP profile information (e.g., through an online portal) facilitates discussion with the patient's physician and supports individual choices in personalized medicine.

Example II

Updating of genotype correlations

In response to a request to initially determine the genotype correlations of an individual, a genomic profile is generated, the genotype correlations are obtained, and the results are provided to the individual as described in Example I. After the initial determination of the individual's genotype correlations, updated correlations are or can be determined later, when additional genotype correlations become known. The registered user has an advanced registration, and the user's genotype profile is stored in an encrypted database. The updated correlations are determined from the stored genotype profile.

For example, as described in Example I above, the initial genotype correlations have determined that a particular individual does not have ApoE4, and is therefore not predisposed to early-onset Alzheimer's disease, and that this individual does not have factor V Leiden. After this initial determination, a new correlation becomes known and validated, such that polymorphisms in a given gene (say, gene XYZ) are correlated with a given condition (say, condition 321). This new genotype correlation is added to the master database of human genotype correlations. An update is then provided to the particular individual by first obtaining the data for the relevant gene XYZ from the particular individual's genomic profile stored in the encrypted database. The individual's gene XYZ data is compared with the gene XYZ information in the updated master database. From this comparison, the susceptibility or predisposition of the particular individual to condition 321 is determined. The result of this determination is added to the genotype correlations of the particular individual. The updated result, indicating whether the particular individual is susceptible or genetically predisposed to condition 321, is provided to the individual along with explanatory and supportive information.
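The update flow in this example can be sketched as a small data-structure exercise. The classes, field names, and genotype labels below are hypothetical stand-ins for the master database and the stored genomic profiles; the sketch only shows how a newly added correlation might be applied to profiles that are already stored.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Rule:
    """A genotype correlation: a genotype of a gene linked to a condition."""
    gene: str
    risk_genotype: str
    condition: str

@dataclass
class Individual:
    name: str
    profile: dict                                    # gene -> genotype (stored profile)
    correlations: set = field(default_factory=set)   # conditions found so far

def apply_rule(rule, individual):
    """Record the rule's condition if the stored profile matches the rule."""
    if individual.profile.get(rule.gene) == rule.risk_genotype:
        individual.correlations.add(rule.condition)

def update_master_database(master_rules, new_rule, individuals):
    """Add a new rule and re-evaluate it against every stored profile."""
    master_rules.append(new_rule)
    for person in individuals:
        apply_rule(new_rule, person)

# Hypothetical data mirroring the gene XYZ example.
master_rules = []
individual_a = Individual("A", {"XYZ": "variant"})
individual_b = Individual("B", {"XYZ": "reference"})
update_master_database(
    master_rules, Rule("XYZ", "variant", "321"), [individual_a, individual_b]
)
print(individual_a.correlations, individual_b.correlations)
```

Only the new rule is evaluated during an update, so the stored profile is reused and no new sample or genotyping run is needed.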

Example III

Association of the ApoE4 locus with Alzheimer's disease

The risk of Alzheimer's disease (AD) has been shown to be associated with polymorphisms in the apolipoprotein E (APOE) gene, which give rise to three isoforms of ApoE known as ApoE2, ApoE3 and ApoE4. These isoforms differ from each other by one or two amino acids at residues 112 and 158 of the ApoE protein. ApoE2 contains cysteine/cysteine at positions 112/158; ApoE3 contains cysteine/arginine at positions 112/158; and ApoE4 contains arginine/arginine at positions 112/158. As shown in Table 3, the risk of Alzheimer's disease, and of onset at a younger age, increases with the number of APOE ε4 gene copies. Likewise, as shown in Table 4, the relative risk of AD increases with the number of APOE ε4 gene copies.

Table 3: Prevalence of AD risk alleles (Corder et al., Science 261: 921-3, 1993)

APOE ε4 copies	Prevalence	Risk of Alzheimer's disease	Age of onset
0	73%	20%	84
1	24%	47%	75
2	3%	91%	68

Table 4: Relative risk of AD with ApoE4 (Farrer et al., JAMA 278: 1349-56, 1997)

APOE genotype	Odds ratio
ε2ε2	0.6
ε2ε3	0.6
ε3ε3	1.0
ε2ε4	2.6
ε3ε4	3.2
ε4ε4	14.9

Example IV

Information on factor V Leiden-positive patients

The following information is an example of information that may be provided to individuals with genomic SNP profiles showing the presence of the factor V Leiden gene. The individual may have a basic registration that may provide information in an initial report.

What is factor V Leiden?

Factor V Leiden is not a disease; it is the presence of a specific gene inherited from one's parents. Factor V Leiden is a variant of the protein factor V (5), which is required for blood clotting. People with factor V deficiency are more likely to bleed severely, while people with factor V Leiden have blood with an increased tendency to clot.

People carrying the factor V Leiden gene have a 5-fold higher risk of developing a blood clot (thrombosis) than the rest of the population. However, many people with this gene never develop blood clots. In the UK and the USA, 5% of the population carries one or more factor V Leiden genes, which is far greater than the number of people who will actually suffer from thrombosis.

How do you get factor V Leiden?

The factor V gene is inherited from one's parents. As with all genetic traits, one gene is inherited from the mother and one from the father. Thus, it is possible to inherit two normal genes, one factor V Leiden gene and one normal gene, or two factor V Leiden genes. Having one factor V Leiden gene results in a slightly higher risk of developing thrombosis, but having two genes results in a much greater risk.

What are the symptoms of factor V Leiden?

There are no signs unless you have a blood clot (thrombosis).

What are the danger signals?

The most common problem is a blood clot in the leg. Swelling, pain and redness of the leg indicate this problem. In rarer cases, a blood clot may develop in the lungs (pulmonary thrombosis), which leads to breathing difficulties. Depending on the size of the blood clot, this can range from barely noticeable to severe breathing difficulty. In even rarer cases, blood clots may occur in the arm or other parts of the body. Since these clots form in the veins that carry blood to the heart rather than in the arteries that carry blood from the heart, factor V Leiden does not increase the risk of coronary thrombosis.

What can be done to avoid blood clots?

Factor V Leiden only slightly increases the risk of blood clots, and many people with this condition never develop thrombosis. There are many things one can do to avoid blood clots. Avoid standing or sitting in the same position for long periods of time. When traveling long distances, it is important to move regularly, because the blood must not be allowed to "stand still". Being overweight or smoking greatly increases the risk of blood clots. Women carrying the factor V Leiden gene should not take the contraceptive pill, because this significantly increases the chance of thrombosis. Women carrying the factor V Leiden gene should also consult their physician before becoming pregnant, because pregnancy also increases the risk of thrombosis.

How do doctors find out if you have factor V Leiden?

The gene for factor V Leiden can be found in blood samples.

Blood clots in the leg or arm are typically determined by ultrasound examination.

After a substance is injected into the blood to visualize the clot, the clot may also be detected by X-ray. Blood clots in the lungs are more difficult to find, but often physicians will use radioactive materials to test the distribution of blood flow in the lungs and the distribution of air flow into the lungs. The two distribution patterns should match-a mismatch indicates the presence of a blood clot.

How is factor V Leiden treated?

Persons with factor V Leiden do not require treatment unless their blood begins to clot, in which case the physician will prescribe a blood-thinning (anticoagulant) drug, such as warfarin (e.g., warfarin sodium) or heparin, to prevent further blood clotting. Treatment will typically last three to six months, but may take longer if there are several blood clots. In severe cases, the course of medication may continue indefinitely; in extremely rare cases, blood clots may require surgical removal.

How is factor V Leiden treated during pregnancy?

Women carrying two factor V Leiden genes will need to receive treatment with the anticoagulant drug heparin during pregnancy. The same treatment applies to women carrying only one factor V Leiden gene who have previously had a blood clot themselves or who have a family history of blood clots.

All women carrying the factor V Leiden gene may need to wear special stockings to prevent blood clots in the latter half of pregnancy. After the birth of children, they may be prescribed the anticoagulant drug heparin.

Prognosis

The risk of developing blood clots increases with age, but in an age-based survey of 100 people carrying the gene, only a few were found to have suffered from thrombosis. The National Society of Genetic Counselors (NSGC) can provide a list of genetic counselors in your area and information about creating a family history. Their online database can be searched at www.nsgc.org/consumer.

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that many alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that the invention cover methods and structures within the scope of these claims and their equivalents.

Claims

1. A method of assessing genotype correlations of an individual, the method comprising:

a) obtaining a genetic sample of the individual, wherein the genetic sample is DNA;

b) generating a genomic profile of the individual from the genetic sample;

c) determining a genotype correlation for the individual by comparing the genomic profile of the individual to a database of correlations of current human genotypes and phenotypes to determine, for each phenotype of interest, a plurality of relative risk or odds ratios for a plurality of alleles, including risk alleles or non-risk alleles, of the individual;

d) updating the human genotype correlations database with additional human genotype correlations, when the additional human genotype correlations are known; and

e) updating the genotype correlation of the individual by comparing the genomic profile of the individual of step c), or a portion thereof, to the additional human genotype correlation and determining additional genotype correlations for the individual.

2. The method of claim 1, wherein the genetic sample is obtained by a third party.

3. The method of claim 1, wherein the generating a genomic profile is performed by a third party.

4. The method of claim 1, further comprising calculating a GCI score, wherein the GCI is calculated from a plurality of relative risks or odds ratios.

5. The method of claim 1, wherein the genomic profile comprises single nucleotide polymorphisms, nucleotide insertions, nucleotide deletions, chromosomal translocations, chromosomal duplications, or copy number variations.

6. The method of claim 1, wherein the genomic profile is the entire genome of the individual.

7. The method of claim 1, wherein the method comprises assessing 2 or more genotype correlations.

8. The method of claim 1, wherein the method comprises assessing 10 or more genotype correlations.

9. The method of claim 1, wherein said human genotype correlation database comprises genetic variants in one or more genes listed in Table 1 or in Figure 4, 5, 6, 22, or 25, and phenotypes associated with the genetic variants.

10. The method of claim 1, wherein said human genotype correlation database comprises genetic variants determined from said genomic profile of said individual and predetermined phenotypes revealed by said individual.

11. The method of claim 9 or 10, wherein the genetic variant is a single nucleotide polymorphism, a nucleotide insertion, a nucleotide deletion, a chromosomal translocation, a chromosomal duplication, or a copy number variation.

12. The method of claim 1, wherein the genetic sample is blood, hair, skin, saliva, semen, urine, fecal matter, sweat, or an oral sample.

13. The method of claim 1, wherein the genomic profile is generated using a high density DNA microarray, DNA sequencing, or PCR-based methods.

14. The method of claim 4, wherein at least one of physical data, medical data, ethnicity, family, geography, gender, age, family history, known phenotype, demographic data, exposure data, lifestyle data, or behavioral data of the individual is incorporated into the calculation of the GCI.

15. The method of claim 1, wherein the genomic profile of the individual is compared to a correlation between a SNP and a phenotype, wherein the SNP:

rs 69883267 when the phenotype is colorectal cancer, rs2165241 when the phenotype is exfoliative glaucoma, rs9939609 when the phenotype is obesity, rs3087243 or DRB1 0301 when the phenotype is Graves' disease, rs1800562 when the phenotype is hemochromatosis, rs 6969 when the phenotype is myocardial infarction, rs6897932, rs12722489 or DRB1 1501 when the phenotype is multiple sclerosis, rs11209026 when the phenotype is Psoriasis (PS), rs2300478, rs1026732 or rs9296249 when the phenotype is restless leg syndrome, rs6840978 or rs2187668 when the phenotype is celiac disease, rs 69883267, rs 30909 or rs 30906 when the phenotype is prostate cancer, dr5744798 when the phenotype is lupus erythematosus, dr5431711, DRB 3809, DRB 125355639 or DRB 36933 9 when the phenotype is rheumatoid arthritis, dr6446579 or DRB 4705, DRB 125357243 when the phenotype is lupus erythematosus, DRB 125357263, DRB 12535729, or DRB 3564049 when the phenotype is lupus erythematosus, rs2981582, rs3817198 or rs3803662, rs 2066846845, rs10883365, rs17234657, rs10210302, rs9858542, rs11805303, rs1000113, rs2542151 or rs10761659 when the phenotype is crohn's disease, rs13266634, rs4506565, rs7756992, rs10811661, rs8050136, rs1111875, rs4402960, rs5215 or rs1801282 when the phenotype is type 2 diabetes.

16. The method of claim 15, further comprising:

f) calculating at least one GCI score for said phenotype in combination with said relative risk or odds ratio.

17. A method of assessing genotype correlations of an individual, the method comprising:

a) obtaining a plurality of genetic samples from a plurality of individuals;

b) providing a set of rules comprising rules, each rule indicating a correlation between at least one genotype and at least one phenotype;

c) providing a data set comprising a genomic profile of each individual of the plurality of individuals, wherein each genomic profile comprises a plurality of genotypes;

d) determining a genotype correlation for the individual by comparing the genomic profile of the individual to a database of correlations of current human genotypes and phenotypes to determine, for each phenotype of interest, a plurality of relative risk or odds ratios for a plurality of alleles, including risk alleles or non-risk alleles, of the individual;

e) periodically updating the rule set with at least one new rule, wherein the at least one new rule indicates a correlation between genotypes and phenotypes that were previously unrelated to each other in the rule set; and

f) applying each new rule to the genomic profile of at least one of the individuals, thereby correlating at least one genotype with at least one phenotype for the individual.

18. The method of claim 17, further comprising:

f) generating a report comprising the phenotype profile of the individual.

19. The method of claim 17, further comprising: after step b)

i) Applying the rules of the rule set to the genomic profile of the individual to determine a set of phenotypic profiles of the individual; and

ii) generating a report comprising the initial phenotype profile of the individual.

20. The method of claim 18 or 19, wherein providing the report comprises transmitting the report over a network.

21. The method of claim 18 or 19, wherein the report is provided in an encrypted manner.

22. The method of claim 18 or 19, wherein the report is provided in an unencrypted manner.

23. The method of claim 18 or 19, wherein the report is provided through an online portal.

24. The method of claim 18 or 19, wherein the report is provided as a paper or email.

25. The method of claim 17, wherein the new rule associates an unassociated genotype with a phenotype.

26. The method of claim 17, wherein the new rule associates an associated genotype with a phenotype not previously associated therewith in the rule set.

27. The method of claim 17, wherein the new rule changes a rule in the rule set.

28. The method of claim 17, wherein the new rule is generated by correlation of a genotype of the genomic profile from the individual and a predetermined phenotype of the individual.

29. The method of claim 17, wherein the rule associates a plurality of genotypes with a phenotype.

30. The method of claim 17, wherein applying the new rule further comprises determining the phenotype profile based at least in part on a characteristic of the individual selected from the group consisting of ethnicity, pedigree, geography, gender, age, family history, and a predetermined phenotype.

31. The method of claim 17, wherein the genotype comprises a nucleotide repeat, a nucleotide insertion, a nucleotide deletion, a chromosomal translocation, a chromosomal repeat, or a copy number variation.

32. The method of claim 31, wherein the copy number variation is a microsatellite repeat, a nucleotide repeat, a centromeric repeat or a telomeric repeat.

33. The method of claim 17, wherein the genotype comprises a single nucleotide polymorphism.

34. The method of claim 17, wherein the genotypes comprise a haplotype and a diplotype.

35. The method of claim 17, wherein the genotype comprises a genetic marker in linkage disequilibrium with a phenotype-associated single nucleotide polymorphism.

36. The method of claim 17, wherein the phenotype profile indicates the presence or risk of the quantitative trait.

37. The method of claim 17, wherein the phenotype profile indicates a probability that an individual having a genotype has or will have a phenotype.

38. The method of claim 37, wherein the probability is based on a GCI or GCI Plus score.

39. The method of claim 37, wherein the probability is an estimated lifetime risk.

40. The method of claim 17, wherein the correlation is validated.

41. The method of claim 17, wherein the rule set includes at least 20 rules.

42. The method of claim 17, wherein the rule set includes at least 50 rules.

43. The method of claim 17, wherein the rule set comprises rules based on the genotype correlations in Table 1.

44. The method of claim 17, wherein the rule set comprises rules based on the genotype correlations in Figures 4, 5, 6, 22, or 25.

45. The method of claim 17, wherein the phenotype comprises a quantitative trait.

46. The method of claim 45, wherein the quantitative trait comprises a medical condition.

47. The method of claim 46, wherein said phenotype profile indicates the presence or absence of said medical condition, the risk of developing said medical condition, the prognosis of said medical condition, the effect of treatment of said medical condition, or the response to treatment of said medical condition.

48. The method of claim 45, wherein the quantitative trait comprises a phenotype of a non-medical condition.

49. The method of claim 45, wherein the quantitative trait is selected from the group consisting of a physical trait, a physiological trait, a mental trait, an emotional trait, ethnicity, pedigree, or age.

50. The method of claim 17, wherein the individual is a human.

51. The method of claim 17, wherein the individual is a non-human.

52. The method of claim 17, wherein the individual is a registered user.

53. The method of claim 17, wherein the individual is a non-registered user.

54. The method of claim 17, wherein the genomic profile comprises at least 100,000 genotypes.

55. The method of claim 17, wherein the genomic profile comprises at least 400,000 genotypes.

56. The method of claim 17, wherein the genomic profile comprises at least 900,000 genotypes.

57. The method of claim 17, wherein the genomic profile comprises at least 1,000,000 genotypes.

58. The method of claim 17, wherein the genomic profile comprises a substantially complete whole genome sequence.

59. The method of claim 17, wherein the data set comprises a plurality of data points, wherein each data point relates to an individual and comprises a plurality of data elements, wherein the data elements comprise at least one element selected from the group consisting of a unique identifier of the individual, genotype information, microarray SNP identification number, SNP rs number, chromosomal location, polymorphic nucleotides, quality metrics, raw data files, images, extracted intensity scores, physical data, medical data, ethnicity, pedigree, geography, gender, age, family history, known phenotype, demographic data, exposure data, lifestyle data, and behavioral data.
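One possible in-memory representation of a data point as recited in claim 59 is sketched below. Only a subset of the listed data elements is shown; the class name `DataPoint`, the field names, the field types, and the placeholder identifiers are assumptions made for illustration and are not specified by the claims.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataPoint:
    """One data point of the data set, relating to a single individual."""
    unique_id: str                                                 # unique identifier of the individual
    genotypes: Dict[str, str] = field(default_factory=dict)        # SNP rs number -> polymorphic nucleotides
    chromosomal_locations: Dict[str, str] = field(default_factory=dict)  # SNP rs number -> location
    ethnicity: Optional[str] = None
    gender: Optional[str] = None
    age: Optional[int] = None
    family_history: Optional[str] = None
    known_phenotypes: List[str] = field(default_factory=list)


def genotype_count(data_point: DataPoint) -> int:
    """Number of genotypes recorded in the genomic profile of this data point."""
    return len(data_point.genotypes)


if __name__ == "__main__":
    # Placeholder identifiers and values, not real data.
    point = DataPoint(
        unique_id="individual-0001",
        genotypes={"rs0000001": "AG", "rs0000002": "TT"},
        chromosomal_locations={"rs0000001": "chr1", "rs0000002": "chr2"},
        age=40,
    )
    print(genotype_count(point))
```

Optional fields default to empty values, reflecting that each data point need comprise only at least one of the listed elements.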

60. The method of claim 17, wherein the periodic updating and applying occur at least once a year.

61. The method of claim 17, wherein providing the data set comprises obtaining a genomic profile for each individual of a plurality of individuals by:

i) performing genetic analysis on a genetic sample obtained from said individual, and

ii) encoding the analysis in a computer readable form.

62. The method of claim 17, wherein said phenotype profile comprises a monogenic phenotype.

63. The method of claim 17, wherein said phenotype profile comprises a multigenic phenotype.

64. The method of claim 17, wherein the report includes an initial phenotype profile.

65. The method of claim 17, wherein the report comprises an updated phenotype profile.

66. The method of claim 17, wherein said report further comprises information about said phenotype of said phenotype profile selected from one or more of the following: preventive countermeasures, health information, therapy, symptom recognition, early detection protocols, intervention protocols, and precise identification and subclassification of the phenotypes in the phenotype profile.

67. The method of claim 17, further comprising:

e) adding a new genomic profile of a new individual to the data set;

f) applying the rule set to the genomic profile of the new individual; and

g) generating an initial report of the phenotype profile of the new individual.

68. The method of claim 17, further comprising:

e) adding a new genomic profile of the individual;

f) applying the rule set to the new genomic profile of the individual; and

g) generating a new report of the phenotype profile of the individual.
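Steps e) to g) of claims 67 and 68 can be sketched as follows, assuming for illustration that the data set is a mapping from an individual identifier to a genomic profile and that each rule is a (SNP identifier, genotype call, phenotype) triple. The function names, the report layout, and the placeholder identifiers are hypothetical and are not drawn from the specification.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, str]            # SNP identifier -> genotype call
RuleTriple = Tuple[str, str, str]   # (SNP identifier, genotype call, phenotype)


def apply_rule_set(rule_set: List[RuleTriple], profile: Profile) -> List[str]:
    """Step f): collect every phenotype whose rule matches a call in the profile."""
    matched = {
        phenotype
        for snp_id, call, phenotype in rule_set
        if profile.get(snp_id) == call
    }
    return sorted(matched)


def add_profile_and_report(
    data_set: Dict[str, Profile],
    individual_id: str,
    profile: Profile,
    rule_set: List[RuleTriple],
) -> dict:
    """Steps e) to g): store the profile, apply the rule set, build a report."""
    data_set[individual_id] = profile                   # step e)
    phenotype_profile = apply_rule_set(rule_set, profile)  # step f)
    return {                                            # step g)
        "individual": individual_id,
        "phenotype_profile": phenotype_profile,
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real SNPs or conditions.
    rules = [
        ("rs0000001", "AG", "condition X"),
        ("rs0000002", "CC", "condition Y"),
    ]
    data_set: Dict[str, Profile] = {}
    report = add_profile_and_report(
        data_set, "individual-0001", {"rs0000001": "AG", "rs0000002": "TT"}, rules
    )
    print(report)
```

The same function covers both claims in this sketch: for a new individual the identifier is not yet in the data set, whereas for an existing individual the new genomic profile replaces the stored one before the rule set is applied.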

69. A system for assessing genotype correlations of an individual, the system comprising:

a) means for storing a rule set comprising rules, each rule indicating a correlation between at least one genotype and at least one phenotype, wherein the genotype correlation is determined by comparing the genomic profile of the individual to a database of correlations of current human genotypes and phenotypes to determine a plurality of relative risks or odds ratios for a plurality of alleles, including risk alleles or non-risk alleles, of the individual for each phenotype of interest;

b) means for periodically updating said rule set with at least one new rule, wherein said at least one new rule indicates a correlation between genotypes and phenotypes not previously correlated with each other in said rule set;

c) means for generating a genomic profile of an individual, thereby obtaining a database comprising genomic profiles of a plurality of individuals;

d) means for applying the rule set to the genomic profile of an individual to determine a phenotypic profile of the individual; and

e) means for generating a report for each individual.
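The relative risks and odds ratios referred to in element a) of claim 69 follow standard epidemiological definitions. The sketch below computes both from a 2x2 table of carriers and non-carriers of an allele. The function names, argument names, and counts are illustrative assumptions; the claim does not specify how the ratios stored in the database are derived.

```python
def odds_ratio(carrier_cases: int, carrier_noncases: int,
               noncarrier_cases: int, noncarrier_noncases: int) -> float:
    """Odds of the phenotype among allele carriers divided by the odds among non-carriers."""
    if carrier_noncases == 0 or noncarrier_cases == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (carrier_cases * noncarrier_noncases) / (carrier_noncases * noncarrier_cases)


def relative_risk(carrier_cases: int, carrier_noncases: int,
                  noncarrier_cases: int, noncarrier_noncases: int) -> float:
    """Risk of the phenotype among allele carriers divided by the risk among non-carriers."""
    carriers = carrier_cases + carrier_noncases
    noncarriers = noncarrier_cases + noncarrier_noncases
    if carriers == 0 or noncarriers == 0 or noncarrier_cases == 0:
        raise ValueError("relative risk is undefined for these counts")
    risk_in_carriers = carrier_cases / carriers
    risk_in_noncarriers = noncarrier_cases / noncarriers
    return risk_in_carriers / risk_in_noncarriers


if __name__ == "__main__":
    # Illustrative counts only, not data from the specification.
    counts = dict(carrier_cases=20, carrier_noncases=80,
                  noncarrier_cases=10, noncarrier_noncases=90)
    print(odds_ratio(**counts))
    print(relative_risk(**counts))
```

The two measures are close to each other when the phenotype is rare and diverge as it becomes more common, which is one reason a system might store either depending on the design of the underlying study.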

70. The system of claim 69, wherein the report is transmitted over a network.

71. The system of claim 69, wherein the report is provided in an encrypted manner.

72. The system of claim 69, wherein said report is provided in an unencrypted manner.

73. The system of claim 69, wherein said report is provided through an online portal.

74. The system of claim 69, wherein the report is provided via paper or email.

75. The system of claim 69, further comprising means for notifying said individual of a new or revised association.

76. The system of claim 69, further comprising code for notifying said individual of new or revised rules applicable to said genomic profile of said individual.

77. The system of claim 69, further comprising means for notifying said individual of new or revised prevention and health information regarding said phenotype of said phenotype profile of said individual.

78. A kit for performing the method of claim 1, the kit comprising:

a) at least one sample collection container;

b) instructions for obtaining a sample from an individual;

c) instructions for accessing a genomic profile of the individual obtained from the sample through an online portal;

d) instructions for accessing, via an online portal, a phenotype profile of the individual obtained from the sample; and

e) a package for delivering the sample collection container to a sample processing facility.