CN103642902B - Genetic analysis systems and method

Info

Publication number: CN103642902B
Application number: CN201310565723.1A
Authority: CN
Inventors: D·A·斯坦芬; M·F·菲利普庞; J·韦塞尔; M·卡吉尔; E·哈尔佩里恩
Original assignee: Navigenics Inc
Current assignee: Navigenics Inc
Priority date: 2006-11-30
Filing date: 2007-11-30
Publication date: 2016-01-20
Anticipated expiration: 2027-11-30
Also published as: GB2444410A; EP2102651A4; GB2444410B; WO2008067551A3; TWI363309B; AU2007325021A1; EP2102651A2; CA2671267A1; TW200847056A; CN103642902A; KR20090105921A; JP2010522537A; WO2008067551A2; GB0723512D0; JP2014140387A; HK1139737A1; AU2007325021B2

Abstract

The invention provides methods for determining a Genetic Composite Index (GCI) score by assessing the correlation between an individual's genotype and at least one disease or condition. The assessment comprises comparing the individual's genomic profile with a database of medically relevant genetic variants that have been established as being associated with at least one disease or condition.

Description

Genetic Analysis Systems and Methods

This application is a divisional application of invention patent application No. 200780050019.5, filed on November 30, 2007 and entitled "Genetic Analysis Systems and Methods".

Background

Sequencing of the human genome and other recent advances in human genomics have revealed that the genomic makeup of any two people is more than 99.9% similar. The relatively small amount of variation in DNA between individuals gives rise to differences in phenotypic traits and is related to many human diseases, susceptibility to various diseases, and response to treatment of disease. Variation in DNA between individuals occurs in both coding and non-coding regions and includes changes in bases at particular positions in the genomic DNA sequence, as well as insertions and deletions of DNA. Changes that occur at a single base position in the genome are referred to as single nucleotide polymorphisms, or "SNPs".

Although SNPs are relatively rare in the human genome, they account for the majority of DNA sequence variation between individuals, occurring approximately once every 1,200 base pairs in the human genome (see the International HapMap Project, www.hapmap.org). As more human genetic information becomes available, the complexity of SNPs is beginning to be understood. In turn, the occurrence of SNPs in the genome is being correlated with the presence of, and/or susceptibility to, various diseases and conditions.

As these correlations and other advances in human genetics are made, medical and personal health care in general are moving toward a personalized approach in which a patient makes appropriate medical and other choices in consideration of his or her genomic information, among other factors. Thus, there is a need to provide individuals and their health care providers with information specific to the individual's personal genome, in order to support personalized medical and other decisions.

Summary of the Invention

The present invention provides a method of assessing genotype correlations of an individual, the method comprising: a) obtaining a genetic sample of the individual, b) generating a genomic profile for the individual, c) determining the individual's genotype correlations with phenotypes by comparing the individual's genomic profile to a current database of human genotype correlations with phenotypes, d) reporting the results from step c) to the individual or a health care manager of the individual, e) updating the database of human genotype correlations with an additional human genotype correlation as the additional human genotype correlation becomes known, f) updating the individual's genotype correlations by comparing the individual's genomic profile from step c), or a portion thereof, to the additional human genotype correlation and determining an additional genotype correlation of the individual, and g) reporting the results from step f) to the individual or the health care manager of the individual.
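Steps c), e), and f) above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Correlation` and `CorrelationDatabase` names, the representation of a genomic profile as a mapping from marker ID to genotype, and the placeholder marker IDs and phenotype labels. The patent text does not prescribe a data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Correlation:
    """Hypothetical record: one genotype at one marker correlated with one phenotype."""
    marker: str      # placeholder marker ID, not a real SNP
    genotype: str
    phenotype: str


@dataclass
class CorrelationDatabase:
    """Hypothetical stand-in for the database of human genotype correlations."""
    correlations: list = field(default_factory=list)

    def add(self, candidates):
        """Step e): store additional correlations and return only those that were new."""
        added = []
        for c in candidates:
            if c not in self.correlations:
                self.correlations.append(c)
                added.append(c)
        return added


def match(profile, correlations):
    """Steps c) and f): keep the correlations whose genotype the individual carries."""
    return [c for c in correlations if profile.get(c.marker) == c.genotype]


# Illustrative data only.
profile = {"rs0001": "AG", "rs0002": "TT"}
db = CorrelationDatabase([Correlation("rs0001", "AG", "phenotype A")])

initial_results = match(profile, db.correlations)                # step c)
added = db.add([Correlation("rs0002", "TT", "phenotype B")])     # step e)
updated_results = match(profile, added)                          # step f)
```

In this sketch, step f) compares the stored profile only against the newly added correlations, so an update does not require re-running the full comparison of step c).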

The present invention further provides a business method of assessing genotype correlations of an individual, the method comprising: a) obtaining a genetic sample of the individual; b) generating a genomic profile for the individual; c) determining the individual's genotype correlations by comparing the individual's genomic profile to a database of human genotype correlations; d) providing the results of determining the individual's genotype correlations to the individual in an encrypted manner; e) updating the database of human genotype correlations with an additional human genotype correlation as the additional human genotype correlation becomes known; f) updating the individual's genotype correlations by comparing the individual's genomic profile, or a portion thereof, to the additional human genotype correlation and determining an additional genotype correlation of the individual; and g) providing the results of updating the individual's genotype correlations to the individual or a health care manager of the individual.

Another aspect of the invention is a method of generating a phenotype profile for an individual, the method comprising: a) providing a rule set comprising rules, each rule indicating a correlation between at least one genotype and at least one phenotype; b) providing a data set comprising the genomic profile of each of a plurality of individuals, wherein each genomic profile comprises a plurality of genotypes; c) periodically updating the rule set with at least one new rule, wherein the at least one new rule indicates a correlation between a genotype and a phenotype not previously correlated with each other in the rule set; d) applying each new rule to the genomic profile of at least one of the individuals, thereby correlating at least one genotype with at least one phenotype for the individual; and, optionally, e) generating a report comprising the phenotype profile of the individual.

The present invention also provides a system comprising: a) a rule set comprising rules, each rule indicating a correlation between at least one genotype and at least one phenotype; b) code for periodically updating the rule set with at least one new rule, wherein the at least one new rule indicates a correlation between a genotype and a phenotype not previously correlated with each other in the rule set; c) a database comprising the genomic profiles of a plurality of individuals; d) code for applying the rule set to the genomic profile of an individual to determine a phenotype profile for the individual; and e) code for generating a report for each individual.
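The five system elements above could be organized roughly as in the following Python sketch. It is an illustration under stated assumptions, not the patented implementation: the `Rule` and `PhenotypeSystem` classes, the in-memory dictionaries standing in for the database, the plain-text report format, and all marker IDs and phenotype labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: all listed (marker, genotype) pairs must be present to imply the phenotype."""
    genotypes: frozenset   # frozenset of (marker, genotype) tuples
    phenotype: str

    def applies_to(self, profile):
        return all(profile.get(marker) == gt for marker, gt in self.genotypes)


@dataclass
class PhenotypeSystem:
    rule_set: list = field(default_factory=list)             # a) rule set
    profiles: dict = field(default_factory=dict)             # c) individual ID -> genomic profile
    phenotype_profiles: dict = field(default_factory=dict)   # individual ID -> list of phenotypes

    def apply_rules(self, individual_id, rules):
        """d) Apply rules to one genomic profile; return the phenotypes newly added."""
        profile = self.profiles[individual_id]
        known = self.phenotype_profiles.setdefault(individual_id, [])
        newly_found = []
        for rule in rules:
            if rule.applies_to(profile) and rule.phenotype not in known:
                known.append(rule.phenotype)
                newly_found.append(rule.phenotype)
        return newly_found

    def update_rule_set(self, new_rules):
        """b) Add rules not yet in the rule set, then apply only those to every stored profile."""
        fresh = []
        for rule in new_rules:
            if rule not in self.rule_set:
                self.rule_set.append(rule)
                fresh.append(rule)
        return {iid: self.apply_rules(iid, fresh) for iid in self.profiles}

    def report(self, individual_id):
        """e) Build a plain-text report of the individual's current phenotype profile."""
        phenotypes = self.phenotype_profiles.get(individual_id, [])
        lines = [f"Phenotype profile for {individual_id}:"]
        lines += [f"- {p}" for p in phenotypes] or ["- (no correlated phenotypes)"]
        return "\n".join(lines)


# Illustrative data only.
system = PhenotypeSystem()
system.profiles["user-1"] = {"rs0001": "AG", "rs0002": "TT"}
first_update = system.update_rule_set([Rule(frozenset({("rs0001", "AG")}), "phenotype A")])
second_update = system.update_rule_set(
    [Rule(frozenset({("rs0001", "AG"), ("rs0002", "TT")}), "phenotype B")]
)
report_text = system.report("user-1")
```

The second rule shows one rule correlating multiple genotypes with a single phenotype. The return value of `update_rule_set` identifies which individuals gained phenotypes from an update, which is the information a notification step would need.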

Another aspect of the invention is transmission over a network, in an encrypted or unencrypted manner, in the methods and systems described above.

Incorporation by Reference

All publications and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.

Specifically, the present invention relates to the following items:

1. A method of assessing genotype correlations of an individual, the method comprising:

a) obtaining a genetic sample of the individual;

b) generating a genomic profile for the individual;

c) determining the individual's genotype correlations with phenotypes by comparing the individual's genomic profile to a current database of human genotype correlations with phenotypes;

d) reporting the results from step c) to the individual or a health care manager of the individual;

e) updating the database of human genotype correlations with an additional human genotype correlation as the additional human genotype correlation becomes known;

f) updating the individual's genotype correlations by comparing the individual's genomic profile from step c), or a portion thereof, to the additional human genotype correlation and determining an additional genotype correlation of the individual; and

g) reporting the results from step f) to the individual or the health care manager of the individual.

2. The method of item 1, wherein a third party obtains the genetic sample.

3. The method of item 1, wherein the generating of the genomic profile is performed by a third party.

4. The method of item 1, wherein the results are based on a GCI or GCI Plus score.

5. The method of item 1, wherein the reporting comprises transmitting the results over a network.

6. The method of item 1, wherein the reporting of the results is through an online portal.

7. The method of item 1, wherein the reporting of the results is by paper or by e-mail.

8. The method of item 1, wherein the reporting comprises reporting the results in an encrypted manner.

9. The method of item 1, wherein the reporting comprises reporting the results in an unencrypted manner.

10. The method of item 1, wherein the individual's genomic profile is stored in an encrypted database or vault.

11. The method of item 1, wherein the individual is a registered user.

12. The method of item 1, wherein the individual is a non-registered user.

13. The method of item 1, wherein the genetic sample is DNA.

14. The method of item 1, wherein the genetic sample is RNA.

15. The method of item 1, wherein the genomic profile is a single nucleotide polymorphism genomic profile, the database of human genotype correlations is of human single nucleotide polymorphism correlations, and the additional human genotype correlation is a single nucleotide polymorphism correlation.

16. The method of item 1, wherein the genomic profile comprises truncations, insertions, deletions, or repeats, the database of human genotype correlations is of human truncation, insertion, deletion, or repeat correlations, and the additional human genotype correlation is a truncation, insertion, deletion, or repeat correlation.

17. The method of item 1, wherein the genomic profile is the entire genome of the individual.

18. The method of item 1, wherein the method comprises assessing two or more genotype correlations.

19. The method of item 1, wherein the method comprises assessing ten or more genotype correlations.

20. The method of item 1, wherein the database of human genotype correlations comprises genetic variants in one or more of the genes listed in Table 1 and phenotypes correlated with the genetic variants.

21. The method of item 1, wherein the database of human genotype correlations comprises genetic variants in one or more of the genes listed in Figures 4, 5, 6, 22, or 25 and phenotypes correlated with the genetic variants.

22. The method of item 1, wherein the database of human genotype correlations comprises genetic variants determined from the individual's genomic profile and predetermined phenotypes exhibited by the individual.

23. The method of item 1, wherein the database of human genotype correlations comprises single nucleotide polymorphisms in the genes listed in Table 1 or Figures 4, 5, 6, 22, or 25 and phenotypes correlated with the single nucleotide polymorphisms.

24. The method of item 1, wherein the genetic sample is from a biological sample selected from the group consisting of blood, hair, skin, saliva, semen, urine, fecal material, sweat, and buccal samples.

25. The method of item 15, wherein the genotype correlations are correlations of single nucleotide polymorphisms with diseases and conditions.

26. The method of item 15, wherein the genotype correlations are correlations of single nucleotide polymorphisms with phenotypes that are not medical conditions.

27. The method of item 1, wherein the genomic profile is generated using a high-density DNA microarray.

28. The method of item 1, wherein the genomic profile is generated using genomic DNA sequencing.

29. The method of item 24, wherein the genetic sample is genomic DNA and the biological sample is saliva.

30. A method comprising:

a) providing a rule set comprising rules, each rule indicating a correlation between at least one genotype and at least one phenotype;

b) providing a data set comprising the genomic profile of each of a plurality of individuals, wherein each genomic profile comprises a plurality of genotypes;

c) periodically updating the rule set with at least one new rule, wherein the at least one new rule indicates a correlation between a genotype and a phenotype not previously correlated with each other in the rule set; and

d) applying each new rule to the genomic profile of at least one of the individuals, thereby correlating at least one genotype with at least one phenotype for the individual.

31. The method of item 30, further comprising:

e) generating a report comprising the phenotype profile of the individual.

32. The method of item 30, further comprising, after step b):

i) applying the rules of the rule set to the genomic profile of the individual to determine a set of phenotype profiles for the individual; and

ii) generating a report comprising the initial phenotype profile of the individual.

33. The method of item 31 or 32, wherein providing the report comprises transmitting the report over a network.

34. The method of item 31 or 32, wherein the report is provided in an encrypted manner.

35. The method of item 31 or 32, wherein the report is provided in an unencrypted manner.

36. The method of item 31 or 32, wherein the report is provided through an online portal.

37. The method of item 31 or 32, wherein the report is provided by paper or by e-mail.

38. The method of item 30, wherein the new rule correlates an uncorrelated genotype with a phenotype.

39. The method of item 30, wherein the new rule correlates an already correlated genotype with a phenotype with which it was not previously correlated in the rule set.

40. The method of item 30, wherein the new rule modifies a rule in the rule set.

41. The method of item 30, wherein the new rule is generated by correlating a genotype from the genomic profile of the individual with a predetermined phenotype of the individual.

42. The method of item 30, wherein the rule correlates a plurality of genotypes with one phenotype.

43. The method of item 30, wherein applying the new rule further comprises determining the phenotype profile based at least in part on a characteristic of the individual selected from ethnicity, ancestry, geography, gender, age, family history, and a predetermined phenotype.

44. The method of item 30, wherein the genotype comprises a nucleotide repeat, nucleotide insertion, nucleotide deletion, chromosomal translocation, chromosomal duplication, or copy number variation.

45. The method of item 44, wherein the copy number variation is a microsatellite repeat, nucleotide repeat, centromeric repeat, or telomeric repeat.

46. The method of item 30, wherein the genotype comprises a single nucleotide polymorphism.

47. The method of item 30, wherein the genotype comprises haplotypes and diplotypes.

48. The method of item 30, wherein the genotype comprises a genetic marker in linkage disequilibrium with a single nucleotide polymorphism that is correlated with a phenotype.

49. The method of item 30, wherein the phenotype profile indicates the presence or absence of the quantitative trait, or the risk of developing the quantitative trait.

50. The method of item 30, wherein the phenotype profile indicates the probability that an individual with a genotype has or will have a phenotype.

51. The method of item 50, wherein the probability is based on a GCI or GCI Plus score.

52. The method of item 50, wherein the probability is an estimated lifetime risk.

53. The method of item 30, wherein the correlations are validated.

54. The method of item 30, wherein the rule set comprises at least 20 rules.

55. The method of item 30, wherein the rule set comprises at least 50 rules.

56. The method of item 30, wherein the rule set comprises rules based on the genotype correlations in Table 1.

57. The method of item 30, wherein the rule set comprises rules based on the genotype correlations in Figures 4, 5, 6, 22, or 25.

58. The method of item 30, wherein the phenotype comprises a quantitative trait.

59. The method of item 58, wherein the quantitative trait comprises a medical condition.

60. The method of item 59, wherein the phenotype profile indicates the presence or absence of the medical condition, the risk of developing the medical condition, the prognosis of the medical condition, the effectiveness of a treatment for the medical condition, or the response to a treatment for the medical condition.

61. The method of item 58, wherein the quantitative trait comprises a phenotype that is not a medical condition.

62. The method of item 58, wherein the quantitative trait is selected from a physical trait, a physiological trait, a mental trait, an emotional trait, ethnicity, ancestry, or age.

63. The method of item 30, wherein the individual is human.

64. The method of item 30, wherein the individual is non-human.

65. The method of item 30, wherein the individual is a registered user.

66. The method of item 30, wherein the individual is a non-registered user.

67. The method of item 30, wherein the genomic profile comprises at least 100,000 genotypes.

68. The method of item 30, wherein the genomic profile comprises at least 400,000 genotypes.

69. The method of item 30, wherein the genomic profile comprises at least 900,000 genotypes.

70. The method of item 30, wherein the genomic profile comprises at least 1,000,000 genotypes.

71. The method of item 30, wherein the genomic profile comprises a substantially complete whole-genome sequence.

72. The method of item 30, wherein the data set comprises a plurality of data points, wherein each data point relates to an individual and comprises a plurality of data elements, wherein the data elements comprise at least one element selected from the individual's unique identifier, genotype information, microarray SNP identification number, SNP rs number, chromosomal position, polymorphic nucleotide, quality metrics, raw data files, images, extracted intensity scores, physical data, medical data, ethnicity, ancestry, geography, gender, age, family history, known phenotypes, demographic data, exposure data, lifestyle data, and behavioral data.

73. The method of item 30, wherein the periodic updating and applying occur at least once a year.

74. The method of item 30, wherein providing the data set comprises obtaining the genomic profile of each of the plurality of individuals by:

i) performing a genetic analysis on a genetic sample obtained from the individual, and

ii) encoding the analysis in computer-readable form.

75. The method of item 30, wherein the phenotype profile comprises a monogenic phenotype.

76. The method of item 30, wherein the phenotype comprises a polygenic phenotype.

77. The method of item 30, wherein the report comprises an initial phenotype profile.

78. The method of item 30, wherein the report comprises an updated phenotype profile.

79. The method of item 30, wherein the report further comprises information about the phenotypes in the phenotype profile, the information being selected from one or more of the following: prevention strategies, wellness information, therapies, symptom awareness, early detection schemes, intervention schemes, and refined identification and sub-classification of the phenotypes in the phenotype profile.

80. The method of item 30, further comprising:

e) adding a new genomic profile of a new individual to the data set of individuals;

f) applying the rule set to the genomic profile of the new individual; and

g) generating an initial report of the phenotype profile of the new individual.

81. The method of item 30, comprising:

e) adding a new genomic profile of the individual;

f) applying the rule set to the new genomic profile of the individual; and

g) generating a new report of the phenotype profile of the individual.

82. A system comprising:

a) a rule set comprising rules, each rule indicating a correlation between at least one genotype and at least one phenotype;

b) code for periodically updating the rule set with at least one new rule, wherein the at least one new rule indicates a correlation between a genotype and a phenotype not previously correlated with each other in the rule set;

c) a database comprising the genomic profiles of a plurality of individuals;

d) code for applying the rule set to the genomic profile of an individual to determine a phenotype profile for the individual; and

e) code for generating a report for each individual.

83. The system of item 82, wherein the report is transmitted over a network.

84. The system of item 82, wherein the report is provided in an encrypted manner.

85. The system of item 82, wherein the report is provided in an unencrypted manner.

86. The system of item 82, wherein the report is provided through an online portal.

87. The system of item 82, wherein the report is provided by paper or by e-mail.

88. The system of item 82, further comprising code for notifying the individual of new or revised correlations.

89. The system of item 82, further comprising code for notifying the individual of new or revised rules that can be applied to the individual's genomic profile.

90. The system of item 82, further comprising code for notifying the individual of new or revised prevention and wellness information regarding the phenotypes in the individual's phenotype profile.

91. A kit comprising:

a) at least one sample collection container;

b) instructions for obtaining a sample from an individual;

c) instructions for accessing, through an online portal, the genomic profile of the individual obtained from the sample;

d) instructions for accessing, through an online portal, the phenotype profile of the individual obtained from the sample; and

e) packaging for delivery of the sample collection container to a sample processing facility.

92. An online portal comprising a website through which an individual can access the phenotype profile, wherein the website allows the individual to perform at least one of the following:

a) select the rules to be applied to the individual's genomic profile;

b) view initial and updated reports on the website;

c) print initial and updated reports from the website;

d) save initial and updated reports from the website to the individual's computer;

e) obtain prevention and wellness information regarding the individual's phenotype profile;

f) obtain genetic counseling online or by telephone;

g) extract information to share with a physician or genetic counselor; and/or

h) access partner services and product offerings.

93. The online portal of item 92, wherein the information is transmitted over a network.

94. The online portal of item 92, wherein the website is encrypted.

95. The online portal of item 92, wherein the website is unencrypted.

96. The online portal of item 92, wherein the individual has one or more options regarding the level of confidentiality of the individual's information, or of one or more portions thereof.

97. The online portal of item 92, wherein the phenotype profile comprises treatable medical conditions.

98. The online portal of item 92, wherein the phenotype profile comprises medical conditions for which there is no existing preventive measure or therapy.

99. The online portal of item 92, wherein the phenotype profile comprises non-medical conditions.

100.一种评估个体获得一种状态的风险的方法，该方法包括：100. A method of assessing an individual's risk of acquiring a condition, the method comprising:

a)获得个体的基因型；a) obtaining the genotype of the individual;

b)由所述基因型确定GCI或者GCIPlus评分；b) determining the GCI or GCIPlus score from the genotype;

c)由所述GCI或者GCIPlus评分生成报告；和c) generate a report from the GCI or GCIPlus score; and

d)将所述报告提供给所述个体或者所述个体的保健管理者。d) providing said report to said individual or said individual's healthcare manager.

101.一种评估个体获得一种状态的风险的方法，该方法包括：101. A method of assessing an individual's risk of acquiring a state, the method comprising:

a)获得个体的基因型；a) obtaining the genotype of the individual;

c)由所述基因组图谱和基因型相关性数据库确定个体获得状态的风险；c) determining the individual's risk of acquiring status from said genome profile and genotype correlation database;

d)由c)生成报告；d) generate a report from c);

e)从所述个体获得新的信息；e) obtain new information from said individual;

f)通过引入所述新的信息确定获得状态的新的风险；f) determining new risks to the state of acquisition by introducing said new information;

g)由f)生成报告；和g) generate a report from f); and

h)将所述报告提供给所述个体或者所述个体的保健管理者。h) providing said report to said individual or said individual's healthcare manager.

102. A method of assessing an individual's risk of acquiring a condition, the method comprising:

a) obtaining the genotype of the individual;

c) determining the individual's risk of acquiring the condition from the genomic profile and the genotype correlation database, wherein the risk is based on more than one SNP;

d) generating a report from c);

e) providing the report to the individual or the individual's healthcare manager.

103. The method of item 100, 101 or 102, wherein the genotype of the individual is obtained directly from the individual.

104. The method of item 100, 101 or 102, wherein the genotype of the individual is obtained from a third party.

105. The method of item 100, 101 or 102, wherein the providing is by transmission over a network.

106. The method of item 101, wherein the new information is obtained from a biological sample of the individual.

107. The method of item 101, wherein the new information is obtained from physical measurements of the individual.

108. The method of item 101 or 102, wherein the risk is derived from a GCI or GCIPlus score.

109. The method of item 100 or 108, wherein the GCI or GCIPlus score includes the ancestry of the individual.

110. The method of item 100 or 108, wherein the GCI or GCIPlus score includes the sex of the individual.

111. The method of item 100 or 108, wherein the GCI or GCIPlus score includes a factor specific to the individual, wherein the factor is not derived from the genotype.

112. The method of item 111, wherein the factor is selected from: the individual's place of birth, parents and/or grandparents, relatives' ancestry, location of residence, ancestors' location of residence, environmental conditions, known health conditions, known drug interactions, family health conditions, lifestyle conditions, diet, exercise habits, marital status, and physical measurements.

113. The method of item 107 or 112, wherein the physical measurements of the individual are selected from: blood pressure, heart rate, glucose level, metabolite level, ion level, weight, height, cholesterol level, vitamin level, blood cell count, body mass index (BMI), protein level, and transcript level.

114. A method of assessing an individual's risk of acquiring a condition, the method comprising:

a) obtaining the genotype of the individual;

c) determining the individual's risk of acquiring Alzheimer's disease (AD), colorectal cancer (CRC), osteoarthritis (OA), or exfoliation glaucoma (XFG), wherein the risk is based on rs4420638 for AD, rs6983267 for CRC, rs4911178 for OA, and rs2165241 for XFG;

d) generating a report from c);

115. The method of item 102, wherein the risk is determined from at least 3, 4, 5, 6, 7, 8, 9, 10, or 11 SNPs.

116. The method of item 102, wherein the risk is determined from at least 2 SNPs.

117. The method of item 116, wherein the risk is for obesity (BMIOB) and at least one of the at least 2 SNPs is rs9939609 or rs9291171.

118. The method of item 116, wherein the risk is for Graves' disease (GD) and at least one of the at least 2 SNPs is rs3087243, DRB1*0301DQA1*0501, or a SNP in linkage disequilibrium with DRB1*0301DQA1*0501.

119. The method of item 116, wherein the risk is for hemochromatosis (HEM) and at least one of the at least 2 SNPs is rs1800562 or rs129128.

120. The method of item 116, wherein the risk is for myocardial infarction (MI) and at least one of the at least 2 SNPs is rs1866389, rs1333049, or rs6922269.

121. The method of item 116, wherein the risk is for multiple sclerosis (MS) and at least one of the at least 2 SNPs is rs6897932, rs12722489, or DRB1*1501.

122. The method of item 116, wherein the risk is for psoriasis (PS) and at least one of the at least 2 SNPs is rs6859018, rs11209026, or HLAC*0602.

123. The method of item 116, wherein the risk is for restless legs syndrome (RLS) and at least one of the at least 2 SNPs is rs6904723, rs2300478, rs1026732, or rs9296249.

124. The method of item 116, wherein the risk is for celiac disease (CelD) and at least one of the at least 2 SNPs is rs6840978, rs11571315, rs2187668, or DQA1*0301DQB1*0302.

125. The method of item 116, wherein the risk is for prostate cancer (PC) and at least one of the at least 2 SNPs is rs4242384, rs6983267, rs16901979, rs17765344, or rs4430796.

126. The method of item 116, wherein the risk is for lupus (SLE) and at least one of the at least 2 SNPs is rs12531711, rs10954213, rs2004640, DRB1*0301, or DRB1*1501.

127. The method of item 116, wherein the risk is for macular degeneration (AMD) and at least one of the at least 2 SNPs is rs10737680, rs10490924, rs541862, rs2230199, rs1061170, or rs9332739.

128. The method of item 116, wherein the risk is for rheumatoid arthritis (RA) and at least one of the at least 2 SNPs is rs6679677, rs11203367, rs6457617, DRB*0101, DRB1*0401, or DRB1*0404.

129. The method of item 116, wherein the risk is for breast cancer (BC) and at least one of the at least 2 SNPs is rs3803662, rs2981582, rs4700485, rs3817198, rs17468277, rs6721996, or rs3803662.

130. The method of item 116, wherein the risk is for Crohn's disease (CD) and at least one of the at least 2 SNPs is rs2066845, rs5743293, rs10883365, rs17234657, rs10210302, rs9858542, rs11805303, rs1000113, rs17221417, rs2542151, or rs10761659.

131. The method of item 116, wherein the risk is for type 2 diabetes (T2D) and at least one of the at least 2 SNPs is rs13266634, rs4506565, rs10012946, rs7756992, rs10811661, rs12288738, rs8050136, rs1111875, rs4402960, rs5215, or rs1801282.

Description of the Drawings

Figure 1 is a flow chart illustrating aspects of the method of the invention.

Figure 2 is an example of a genomic DNA quality control measure.

Figure 3 is an example of a hybridization quality control measure.

Figure 4 is a set of tables of representative genotype correlations from the published literature, with the SNPs tested and effect estimates. A-I) show single-locus genotype correlations; J) shows two-locus genotype correlations; K) shows three-locus genotype correlations; L) is an index of the ethnicity and country abbreviations used in A-K; M) is an index of the Short Phenotype Name abbreviations used in A-K, with heritability values and the references for those values.

Figures 5A-J are tables of representative genotype correlations with effect estimates.

Figures 6A-F are tables of representative genotype correlations and estimated relative risks.

Figure 7 is a sample report.

Figure 8 is a schematic of a system for analyzing genomic profiles and phenotype profiles and transmitting them over a network.

Figure 9 is a flow chart illustrating aspects of the business method of the invention.

Figure 10: Effect of the prevalence estimate on the relative risk estimate. Assuming Hardy-Weinberg equilibrium, each curve corresponds to a different value of the allele frequency in the population. The two black lines correspond to odds ratios of 9 and 6, the two red lines to odds ratios of 6 and 4, and the two blue lines to odds ratios of 3 and 2.

Figure 11: Effect of the allele frequency estimate on the relative risk estimate. Each curve corresponds to a different value of the prevalence in the population. The two black lines correspond to odds ratios of 9 and 6, the two red lines to odds ratios of 6 and 4, and the two blue lines to odds ratios of 3 and 2.

Figure 12: Pairwise comparison of the absolute values of the different models.

Figure 13: Pairwise comparison of the rank values (GCI scores) based on the different models. The Spearman correlations between the different pairs are given in Table 2.

Figure 14: Effect of the reported prevalence on the GCI score. The Spearman correlation between any two prevalence values is at least 0.99.

Figure 15: Illustration of a sample web page from a personal portal.

Figure 16: Illustration of a sample web page from a personal portal showing an individual's risk of prostate cancer.

Figure 17: Illustration of a sample web page from a personal portal showing an individual's risk of Crohn's disease.

Figure 18: Histogram of HapMap-based GCI scores for multiple sclerosis using 2 SNPs.

Figure 19: Individual lifetime risk of multiple sclerosis using GCIPlus.

Figure 20: Histogram of GCI scores for Crohn's disease.

Figure 21: Table of multi-locus correlations.

Figure 22: Table of SNP and phenotype correlations.

Figure 23: Table of phenotypes and prevalences.

Figure 24: Glossary of the abbreviations used in Figures 21, 22 and 25.

Figure 25: Table of SNP and phenotype correlations.

Detailed Description

The present invention provides methods and systems for generating phenotype profiles from the stored genomic profiles of individuals or groups of individuals, and for readily generating original and updated phenotype profiles from those stored genomic profiles. A genomic profile is generated by determining genotypes from a biological sample obtained from an individual. The biological sample can be any sample from which a genetic sample can be derived, such as a buccal swab, saliva, blood, hair, or any other type of tissue sample. Genotypes are then determined from the biological sample. A genotype can be any genetic variant or biomarker, for example a single nucleotide polymorphism (SNP), a haplotype, or a sequence of the genome. The genotype may be the entire genomic sequence of the individual. Genotypes can be obtained from high-throughput analyses that generate thousands or millions of data points, for example microarray analysis covering most or all known SNPs. In other embodiments, genotypes can also be determined by high-throughput sequencing.

The genotypes form the individual's genomic profile. The genomic profile is stored digitally and can be readily accessed at any time to generate a phenotype profile. A phenotype profile is generated by applying rules that correlate or associate genotypes with phenotypes. Rules can be made on the basis of scientific research demonstrating a correlation between a genotype and a phenotype. The correlations may be curated or validated by a committee of one or more experts. By applying the rules to an individual's genomic profile, the association between the individual's genotype and a phenotype can be determined, and the individual's phenotype profile will contain this determination. The determination may be a positive association between the individual's genotype and a given phenotype, such that the individual has, or will develop, the given phenotype. Alternatively, it may be determined that the individual does not have, or will not develop, the given phenotype. In other embodiments, the determination may be a risk factor, an estimate, or a probability that the individual has or will develop the phenotype.

The determination can be based on multiple rules; for example, several rules may be applied to a genomic profile to determine the association of the individual's genotype with a specific phenotype. The determination can also incorporate factors specific to the individual, such as ethnicity, sex, lifestyle (for example, diet and exercise habits), age, environment (for example, location of residence), family medical history, personal medical history, and other known phenotypes. Such factors can be incorporated by modifying existing rules to include them. Alternatively, separate rules can be generated from these factors and applied to the individual's phenotype determination after the existing rules have been applied.
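The two-pass determination described above (genotype rules first, then separate rules for individual-specific factors) can be sketched as follows. This is a minimal illustration and not the method of the invention: the `Rule` type, the function names, the risk genotypes, the effect sizes, and the choice to combine matching rules by multiplying relative risks are all assumptions made for the example. Only the SNP identifiers and the MI abbreviation come from the text.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule linking one marker to one phenotype."""
    phenotype: str             # phenotype the rule reports on
    snp: str                   # marker the rule inspects
    genotypes: FrozenSet[str]  # genotypes that trigger the rule (illustrative)
    relative_risk: float       # illustrative effect size, not real data


def apply_rules(genomic_profile: Dict[str, str], rules: List[Rule]) -> Dict[str, float]:
    """First pass: per phenotype, multiply the relative risks of matching rules.

    Multiplying is an assumption of this sketch, not a requirement of the text.
    """
    phenotype_profile: Dict[str, float] = {}
    for rule in rules:
        # Every phenotype covered by a rule starts at a baseline of 1.0.
        phenotype_profile.setdefault(rule.phenotype, 1.0)
        if genomic_profile.get(rule.snp) in rule.genotypes:
            phenotype_profile[rule.phenotype] *= rule.relative_risk
    return phenotype_profile


def apply_factor_rules(
    phenotype_profile: Dict[str, float],
    factors: Dict[str, object],
    factor_rules: List[Tuple[str, str, object, float]],
) -> Dict[str, float]:
    """Second pass: scale risks by non-genetic factors such as lifestyle."""
    adjusted = dict(phenotype_profile)
    for phenotype, factor, value, multiplier in factor_rules:
        if phenotype in adjusted and factors.get(factor) == value:
            adjusted[phenotype] *= multiplier
    return adjusted


rules = [
    Rule("MI", "rs1333049", frozenset({"CC", "CG"}), 1.5),
    Rule("MI", "rs6922269", frozenset({"AA"}), 1.2),
]
profile = apply_rules({"rs1333049": "CG", "rs6922269": "GG"}, rules)
adjusted = apply_factor_rules(profile, {"smoker": True}, [("MI", "smoker", True, 2.0)])
```

Keeping the factor rules in a separate pass mirrors the alternative described in the text, in which existing genotype rules stay unchanged and factor-based rules are applied afterwards.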

A phenotype can include any measurable trait or characteristic, such as susceptibility to a certain disease or response to a drug treatment. Other phenotypes that may be included are physical and mental traits, for example height, weight, hair color, eye color, sensitivity to sunburn, size, memory, intelligence, level of optimism, and general disposition. Phenotypes can also include genetic comparisons with other individuals or organisms. For example, an individual may be interested in the similarity between their genomic profile and that of a celebrity. They may also have their genomic profile compared with those of other organisms, such as bacteria, plants, or other animals.

Collectively, the set of relevant phenotypes determined for an individual constitutes that individual's phenotype profile. The phenotype profile can be accessed through an online portal. Alternatively, the phenotype profile can be provided in paper form as it exists at a particular time, with subsequent updates also provided in paper form. The phenotype profile may also be provided through an online portal, which may optionally be an encrypted online portal. Access to the phenotype profile can be provided to a registered user, that is, an individual who subscribes to a service that generates rules for correlations between phenotypes and genotypes, determines an individual's genomic profile, applies the rules to the genomic profile, and generates the individual's phenotype profile. Access can also be provided to non-registered users, who may have limited access to their phenotype profiles and/or reports, or who may be allowed to have an initial report or phenotype profile generated, with updated reports generated only through a paid subscription. Healthcare managers and providers, such as caregivers, physicians, and genetic counselors, may also have access to the phenotype profile.

In another aspect of the invention, genomic profiles can be generated and digitally stored for both registered and non-registered users, while access to phenotype profiles and reports is limited to registered users. In another variation, both registered and non-registered users can access their genotype and phenotype profiles, but non-registered users have restricted access or may generate only limited reports, whereas registered users have full access and may generate full reports. In another embodiment, both registered and non-registered users initially have full access or a full initial report, but only registered users can access reports updated from their stored genomic profiles.

In another aspect of the invention, information on the association of multiple genetic markers with one or more diseases or conditions is combined and analyzed to produce a Genetic Composite Index (GCI) score. The score incorporates known risk factors as well as other information and assumptions, such as allele frequencies and the prevalence of the disease. The GCI can be used to quantitatively estimate the association of a disease or condition with the combined effect of a set of genetic markers. The GCI score can be used to give people untrained in genetics a reliable (for example, robust), understandable, and/or intuitive sense of their individual risk of disease relative to a relevant population, based on current scientific research. The GCI score can be used to generate a GCIPlus score. The GCIPlus score may contain all of the GCI assumptions, including the risk of the condition (for example, lifetime risk), age-defined prevalence, and/or age-defined incidence. The individual's lifetime risk can then be calculated as a GCIPlus score that is proportional to the individual's GCI score divided by the average GCI score. The average GCI score can be determined from a group of individuals of similar ancestral background, for example a group of Caucasians, Asians, East Indians, or another group with a common ancestral background. The group may comprise at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 individuals. In some embodiments, the average GCI score is determined from at least 75, 80, 95, or 100 individuals. The GCIPlus score can be determined by computing the individual's GCI score, dividing it by the average relative risk, and multiplying by the lifetime risk of the condition or phenotype. For example, GCIPlus scores such as those in Figure 19 are calculated using the data of Figure 22 and/or Figure 25 together with the information in Figure 24.
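The GCIPlus arithmetic described above can be written out as a short sketch. It assumes that the "average relative risk" in the denominator is the mean GCI score of a reference group of similar ancestral background, which is one reading of the paragraph. The function name `gci_plus`, its parameters, and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean
from typing import Sequence


def gci_plus(individual_gci: float,
             reference_gcis: Sequence[float],
             lifetime_risk: float) -> float:
    """Scale the population lifetime risk by the individual's GCI relative
    to the average GCI of a reference group (assumed interpretation)."""
    if len(reference_gcis) == 0:
        raise ValueError("reference group must contain at least one GCI score")
    average_gci = mean(reference_gcis)
    if average_gci <= 0:
        raise ValueError("average GCI score must be positive")
    return lifetime_risk * individual_gci / average_gci


# Hypothetical inputs: a reference group of five individuals whose GCI scores
# average 1.0, an individual GCI score of 2.0, and a lifetime risk of 25%.
reference_group = [0.5, 1.0, 1.5, 1.0, 1.0]
example = gci_plus(2.0, reference_group, 0.25)  # 0.25 * 2.0 / 1.0 = 0.5
```

Under this reading, an individual whose GCI score equals the group average receives the population lifetime risk unchanged, and scores above or below the average scale it proportionally.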

The invention encompasses the use of the GCI scores described herein, and a person skilled in the art will readily recognize that GCIPlus scores, or variants thereof, can be used in place of the GCI scores described herein.

In one embodiment, a GCI score is generated for each disease or condition of interest. These GCI scores can be collected to form the individual's risk profile. The GCI scores can be stored digitally so that they can be readily accessed at any time to generate a risk profile. The risk profile can be broken down by broad disease class, for example cancer, heart disease, metabolic disorders, psychiatric disorders, bone disease, or age-onset disorders. Broad disease classes can be further broken down into subcategories. For a broad class such as cancer, for example, the subcategories can be listed by type (sarcoma, carcinoma, leukemia, and so on) or by tissue specificity (neural, breast, ovarian, testicular, prostate, bone, lymph node, pancreas, esophagus, stomach, liver, brain, lung, kidney, and so on).
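One way to picture a risk profile broken down by broad class and subcategory is a nested mapping. The sketch below is illustrative only: the function name `build_risk_profile`, the category table, and the score values are assumptions, and the text does not prescribe any particular data structure.

```python
from typing import Dict, Tuple

RiskProfile = Dict[str, Dict[str, Dict[str, float]]]


def build_risk_profile(
    gci_scores: Dict[str, float],
    category_of: Dict[str, Tuple[str, str]],
) -> RiskProfile:
    """Group per-condition GCI scores as class -> subcategory -> condition."""
    profile: RiskProfile = {}
    for condition, score in gci_scores.items():
        # Conditions missing from the lookup table go into a catch-all bucket.
        category, subcategory = category_of.get(condition, ("uncategorized", "other"))
        profile.setdefault(category, {}).setdefault(subcategory, {})[condition] = score
    return profile


# Hypothetical scores and a hypothetical classification table.
scores = {"PC": 1.4, "BC": 0.9, "MI": 1.1}
categories = {
    "PC": ("cancer", "prostate"),
    "BC": ("cancer", "breast"),
    "MI": ("heart disease", "myocardial infarction"),
}
risk_profile = build_risk_profile(scores, categories)
```

A report generator could then walk the outer keys to produce one section per broad disease class.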

In another embodiment, a GCI score is generated for an individual that provides readily understood information about the individual's risk of acquiring, or susceptibility to, at least one disease or condition. In one embodiment, multiple GCI scores are generated for different diseases or conditions. In another embodiment, at least one GCI score can be accessed through an online portal. Alternatively, at least one GCI score can be provided in paper form, with subsequent updates also provided in paper form. In one embodiment, access to at least one GCI score is provided to a registered user, who is an individual subscribing to the service. In an alternative embodiment, access is provided to non-registered users, who may have limited access to at least one of their GCI scores, or who may be allowed to have an initial report of at least one of their GCI scores generated, with updated reports generated only through a paid subscription. In another embodiment, healthcare managers and providers, such as caregivers, physicians, and genetic counselors, may also have access to at least one of the individual's GCI scores.

There may also be a basic registration mode. A basic registration can provide a phenotype profile in which the registered user chooses to apply all existing rules to their genomic profile, or only a subset of the existing rules. For example, they may choose to apply only the rules for actionable disease phenotypes. The basic registration can have different levels within the registration class. For example, the levels may depend on the number of phenotypes the registered user wants correlated with their genomic profile, or on the number of people who can access their phenotype profile. Another level of basic registration may incorporate factors specific to the individual, for example already-known phenotypes such as age, sex, or medical history, into their phenotype profile. Yet another level of basic registration may allow the individual to generate at least one GCI score for a disease or condition. A variation of this level may further allow the individual to specify that an automatic update of the at least one GCI score be generated whenever that score changes because of a change in the analysis used to generate it. In some embodiments, the individual is notified of the automatic update by email, voice message, text message, post, or fax.

Registered users can also generate reports containing their phenotype profile together with information about the phenotypes, such as genetic and medical information. For example, a report may include the prevalence of the phenotype in the population, the genetic variants used for the correlation, the molecular mechanism that causes the phenotype, therapies for the phenotype, treatment options for the phenotype, and preventive actions. In other embodiments, the report may also include information such as the similarity between the individual's genotype and the genotypes of other individuals, such as celebrities or other well-known people. The similarity information can be, but is not limited to, percentage homology, the number of identical variants, and phenotypes that may be similar. These reports may further include at least one GCI score.

If the report is accessed online, it may also provide links to other sites with further information about the phenotypes, links to online support groups and message boards for people with the same phenotype or one or more similar phenotypes, links for contacting an online genetic counselor or physician, or links for scheduling a telephone or in-person appointment with a genetic counselor or physician. If the report is in paper form, the information may be the website locations of the above links, or the telephone number and address of the genetic counselor or physician. Registered users can also choose which phenotypes to include in their phenotype profile and what information to include in their reports. The phenotype profile and reports may also be accessible to the individual's healthcare manager or provider, such as a caregiver, physician, psychiatrist, psychologist, therapist, or genetic counselor. Registered users can choose whether the phenotype profile and reports, or portions thereof, are accessible to their healthcare manager or provider.

The invention may also include a premium level of registration. At the premium level, the registered user's genomic profile is maintained digitally after the initial phenotype profile and report are generated, and the registered user can generate phenotype profiles and reports using updated correlations from the latest research. In another embodiment, registered users can generate risk profiles and reports using updated correlations from the latest research. As research reveals new correlations between genotypes and phenotypes, diseases, or conditions, new rules are developed from these correlations and can be applied to the genomic profiles already stored and maintained. A new rule may correlate a genotype not previously correlated with any phenotype, correlate a genotype with a new phenotype, modify an existing correlation, or provide the basis for adjusting a GCI score in light of a newly discovered association between a genotype and a disease or condition. Registered users can be informed of new correlations by email or other electronic means and, if the phenotype is of interest, can choose to update their phenotype profile with the new correlation. Registered users can choose a registration in which they pay for each update, for a number of updates, or for unlimited updates within a specified period (for example, 3 months, 6 months, or 1 year). At another level of registration, the registered user's phenotype profile or risk profile is updated automatically whenever a new rule is generated from a new correlation, rather than the individual choosing when to update it.
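The difference between opt-in updates and automatic updates can be illustrated with a small sketch. Everything here is hypothetical: the `RegisteredUser` type, the `publish_rule` function, the risk genotype, the effect size, and the multiplicative update are assumptions made for illustration, and a real service would notify users by email or other electronic means rather than by appending to a list.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class RegisteredUser:
    """Hypothetical record for a user whose genomic profile is stored."""
    name: str
    genomic_profile: Dict[str, str]
    auto_update: bool  # True for the automatic-update registration level
    phenotype_profile: Dict[str, float] = field(default_factory=dict)
    notifications: List[str] = field(default_factory=list)


def publish_rule(users: List[RegisteredUser], phenotype: str, snp: str,
                 genotypes: FrozenSet[str], relative_risk: float) -> None:
    """Apply a newly curated rule to stored profiles, or notify the user."""
    for user in users:
        if user.auto_update:
            # Re-evaluate the stored genomic profile without user action.
            if user.genomic_profile.get(snp) in genotypes:
                current = user.phenotype_profile.get(phenotype, 1.0)
                user.phenotype_profile[phenotype] = current * relative_risk
        else:
            # Opt-in level: tell the user a new correlation is available.
            user.notifications.append(f"New correlation available for {phenotype}")


alice = RegisteredUser("alice", {"rs13266634": "CC"}, auto_update=True)
bob = RegisteredUser("bob", {"rs13266634": "CC"}, auto_update=False)
carol = RegisteredUser("carol", {"rs13266634": "TT"}, auto_update=True)
publish_rule([alice, bob, carol], "T2D", "rs13266634", frozenset({"CC"}), 1.3)
```

Because the genomic profile is stored once and the rules are applied on demand, adding a rule never requires a new sample from the individual.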

In another aspect of registration, registered users can refer non-registered users to the service that generates rules for correlations between phenotypes and genotypes, determines an individual's genomic profile, applies the rules to the genomic profile, and generates the individual's phenotype profile. A referral may earn the registered user a discounted price on the service or an upgrade of their existing registration. The referred individual may receive free access for a limited time or a discounted registration price.

Phenotype profiles and reports, as well as risk profiles and reports, can be generated for human and non-human individuals. For example, individuals may include other mammals, such as cattle, horses, sheep, dogs, or cats. As used herein, a registered user is a human individual who subscribes to a service by purchasing or paying for one or more services. The services may include, but are not limited to, one or more of the following: having the genomic profile of themselves or of another individual (such as the registered user's child or pet) determined; obtaining a phenotype profile; having the phenotype profile updated; and obtaining reports based on their genomic profile and phenotype profile.

In another aspect of the invention, a "field-deployed" mechanism can be used to gather information from individuals in order to generate their phenotype profiles. In a preferred embodiment, an individual has an initial phenotype profile generated from genetic information. For example, an initial phenotype profile is generated that includes risk factors for different phenotypes together with suggested treatments or preventive measures. The phenotype profile may, for instance, include information on available medications for a given condition and/or suggestions for dietary changes or exercise regimens. The individual may choose to see a physician or genetic counselor, or to contact one through a web portal or by telephone, to discuss their phenotype profile. The individual may then decide on a course of action, for example taking a specific medication or changing their diet.

The individual may subsequently submit biological samples to assess changes in their physical condition and possible changes in risk factors. The individual may have these changes determined by submitting a biological sample directly to the facility that generates the genomic profile and phenotype profile (or to an associated facility, such as one contracted by the entity that generates the genetic profile and phenotype profile). Alternatively, the individual may use a "field-deployed" mechanism, in which the individual submits saliva, blood, or another biological sample to a detection device at home, the sample is analyzed by a third party, and the data are transmitted for inclusion in another phenotype profile. For example, an individual may receive an initial phenotype report, based on their genetic data, reporting that the individual has an increased lifetime risk of myocardial infarction (MI). The report may also suggest preventive measures to reduce the risk of MI, such as cholesterol-lowering drugs and dietary changes. The individual may choose to contact a genetic counselor or physician to discuss the report and the preventive measures, and may decide to change their diet. After a period on the new diet, the individual may visit their personal physician to have their cholesterol level measured. The new information (the cholesterol level) can be transmitted (for example, via the Internet) to the entity holding the genomic information, where it is used to generate a new phenotype profile for the individual, with new risk factors for myocardial infarction and/or other conditions.

Individuals may also use a "field-deployed" mechanism, or a direct mechanism, to determine their individual response to a specific drug treatment. For example, an individual may have their response to a drug measured, and that information may be used to determine a more effective treatment. Measurable information includes, but is not limited to, metabolite levels, glucose levels, ion levels (e.g., calcium, sodium, potassium, iron), vitamins, blood cell counts, body mass index (BMI), protein levels, transcript levels, and heart rate. This information can be determined by readily available methods and can be included in an algorithm that combines it with the initial genomic profile to determine a revised overall risk assessment score.

The term "biological sample" refers to any biological sample that can be isolated from an individual, including samples from which genetic material can be isolated. As used herein, "genetic sample" refers to DNA and/or RNA obtained or derived from an individual.

As used herein, the term "genome" denotes the full complement of chromosomal DNA found in the nucleus of a human cell. The term "genomic DNA" refers to one or more chromosomal DNA molecules, or a portion of a chromosomal DNA molecule, naturally present in the nucleus of a human cell.

The term "genomic profile" refers to a set of information about an individual's genes, such as the presence or absence of specific SNPs or mutations. A genomic profile includes the genotypes of an individual. A genomic profile may also be the substantially complete genomic sequence of an individual. In some embodiments, the genomic profile may be at least 60%, 80%, or 95% of the complete genomic sequence of an individual. The genomic profile may be approximately 100% of an individual's complete genomic sequence. In reference to a genomic profile, "a portion thereof" refers to the genomic profile of a subset of the genomic profile of an entire genome.

The term "genotype" refers to the specific genetic makeup of an individual's DNA. A genotype may include the genetic variants and genetic markers of an individual. Genetic markers and genetic variants may include nucleotide repeats, nucleotide insertions, nucleotide deletions, chromosomal translocations, chromosomal duplications, or copy number variations. Copy number variations may include microsatellite repeats, nucleotide repeats, centromeric repeats, or telomeric repeats. A genotype may also be a SNP, a haplotype, or a diplotype. A haplotype may refer to a locus or an allele. A haplotype may also refer to a set of single nucleotide polymorphisms (SNPs) on a single chromatid that are statistically associated. A diplotype is a set of haplotypes.

The term single nucleotide polymorphism, or "SNP", refers to a particular locus on a chromosome that exhibits variability, for example of at least one percent (1%), with respect to the identity of the nitrogenous base present at that locus within the human population. For example, where one individual might have adenine (A) at a particular nucleotide position of a given gene, another individual might have cytosine (C), guanine (G), or thymine (T) at that position, such that a SNP exists at that particular position.
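The variability criterion in the SNP definition above can be illustrated with a minimal Python sketch. The function name `is_snp`, its input format (one observed base per sampled chromosome), and the reading of "variability" as the fraction of chromosomes that do not carry the most common base are assumptions made for illustration; the specification gives only the 1% figure as an example.

```python
from collections import Counter

def is_snp(bases, min_variant_fraction=0.01):
    """Return True if a locus varies at or above the given fraction.

    `bases` holds the base observed at one locus, one entry per sampled
    chromosome (hypothetical input format). The 1% default mirrors the
    example threshold in the text.
    """
    counts = Counter(bases)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return False  # no data, or every chromosome carries the same base
    most_common = max(counts.values())
    # Fraction of chromosomes carrying anything other than the most common base.
    return (total - most_common) / total >= min_variant_fraction
```

A locus observed as 98 copies of A and 2 copies of C would satisfy this criterion, while one observed as 999 copies of A and a single C would not.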

As used herein, the term "SNP genomic profile" refers to the base content of a given individual's DNA at SNP sites throughout the individual's entire genomic DNA sequence. "SNP profile" may refer to an entire genomic profile, or to a portion thereof, such as a more localized SNP profile associated with a particular gene or set of genes.

The term "phenotype" is used to describe a quantitative trait or characteristic of an individual. Phenotypes include, but are not limited to, medical and non-medical conditions. Medical conditions include diseases and disorders. Phenotypes may also include physical traits such as hair color, physiological traits such as lung capacity, mental traits such as memory retention, emotional traits such as the ability to control anger, ethnicity traits such as ethnic background, ancestral traits such as an individual's place of origin, and age traits such as life expectancy or the age of onset of different phenotypes. Phenotypes may also be monogenic, where one gene is thought to be correlated with the phenotype, or multigenic, where more than one gene is correlated with the phenotype.

A "rule" defines a correlation between a genotype and a phenotype. A rule may define the correlation by a numerical value, for example a percentage, a risk factor, or a confidence score. A rule may incorporate the correlations of a plurality of genotypes with a phenotype. A "rule set" comprises more than one rule. A "new rule" may be a rule that indicates a correlation between a genotype and a phenotype for which no rule currently exists. A new rule may correlate a previously uncorrelated genotype with a phenotype. A new rule may also correlate a genotype that is already correlated with one phenotype to a phenotype with which it had not previously been correlated. A "new rule" may also be an existing rule that is modified by other factors, including another rule. An existing rule may be modified on the basis of an individual's known characteristics, such as ethnicity, ancestry, geography, gender, age, family history, or other previously determined phenotypes.

As used herein, "genotype correlation" refers to the statistical correlation between an individual's genotype, such as the presence of a certain mutation or mutations, and the likelihood of being predisposed to a phenotype, such as a particular disease, condition, physical state, and/or mental state. The frequency with which a certain phenotype is observed in the presence of a specific genotype determines the degree of genotype correlation, or the likelihood that the phenotype will occur. For example, as detailed herein, SNPs giving rise to the apolipoprotein E4 isoform are correlated with a predisposition to early-onset Alzheimer's disease. Genotype correlation may also refer to a correlation in which there is no predisposition to a phenotype, or to a negative correlation. A genotype correlation may also represent an estimate that an individual has a phenotype or is predisposed to develop one. A genotype correlation may be expressed as a numerical value, such as a percentage, a relative risk factor, an effect estimate, or a confidence score.

The term "phenotype profile" refers to a collection of phenotypes correlated with one or more genotypes of an individual. A phenotype profile may include information generated by applying one or more rules to a genomic profile, or information about the genotype correlations applied to a genomic profile. A phenotype profile may be generated by applying rules that correlate a plurality of genotypes with a phenotype. A probability or estimate may be expressed as a numerical value, such as a percentage, a numerical risk factor, or a numerical confidence interval. A probability may also be expressed as high, moderate, or low. A phenotype profile may also indicate the presence or absence of a phenotype, or the risk of developing one. For example, a phenotype profile may indicate the presence of blue eyes or a high risk of developing diabetes. A phenotype profile may also indicate a predicted prognosis, the effectiveness of a treatment, or the response to a treatment of a medical condition.
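The relationship between rules, a genomic profile, and a phenotype profile can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Rule` fields, the marker and phenotype names, and the numerical values are hypothetical placeholders, and the specification does not prescribe this data layout.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    marker: str      # hypothetical identifier of a genetic marker
    genotype: str    # genotype at that marker to which the rule applies
    phenotype: str   # phenotype correlated with the genotype
    value: float     # numerical correlation, e.g. a relative risk factor

def apply_rules(genomic_profile, rules):
    """Collect, per phenotype, the values of all rules matching the profile."""
    profile = {}
    for rule in rules:
        if genomic_profile.get(rule.marker) == rule.genotype:
            profile.setdefault(rule.phenotype, []).append(rule.value)
    return profile

# Placeholder rule set and genomic profile (not real markers or risks).
rules = [
    Rule("marker_1", "CT", "phenotype_A", 1.4),
    Rule("marker_2", "GG", "phenotype_A", 0.8),
    Rule("marker_3", "AA", "phenotype_B", 2.0),
]
genomic_profile = {"marker_1": "CT", "marker_2": "GG", "marker_3": "AG"}
phenotype_profile = apply_rules(genomic_profile, rules)
```

Here two rules match and both concern `phenotype_A`, so the resulting profile lists two values for that phenotype and none for `phenotype_B`. How several values for one phenotype are combined into a single estimate is left open in this sketch.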

The term "risk profile" refers to a collection of GCI scores for more than one disease or condition. GCI scores are based on an analysis of the association between an individual's genotype and one or more diseases or conditions. A risk profile may display GCI scores grouped by disease category. Further, a risk profile may display information on how GCI scores are predicted to change as the individual ages or as various risk factors are adjusted. For example, the GCI score for a particular disease may take into account the effect of dietary changes or of preventive measures taken (smoking cessation, medication, bilateral radical mastectomy, hysterectomy). GCI scores may be presented as a numerical measure, a graphical display, auditory feedback, or any combination of the foregoing.

As used herein, the term "online portal" refers to a source of information that an individual can readily access through a computer and an Internet website, a telephone, or other means that allow similar access to information. The online portal may be a secure website. The website may provide links to other secure and non-secure websites, for example a link to a secure website holding the individual's phenotype profile, or a link to a non-secure website such as a message board for individuals who share a specific phenotype.

Unless otherwise indicated, the practice of the present invention may employ conventional techniques and descriptions of molecular biology, cell biology, biochemistry, and immunology that are within the skill of the art. Such conventional techniques include nucleic acid isolation, polymer array synthesis, hybridization, ligation, and detection of hybridization using a label. Specific illustrations of suitable techniques are exemplified and referenced herein. However, other equivalent conventional procedures can also be used. Other conventional techniques and descriptions can be found in standard laboratory manuals and literature such as Genome Analysis: A Laboratory Manual Series (Vols. I-IV), PCR Primer: A Laboratory Manual, and Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press); Stryer, L. (1995) Biochemistry (4th Ed.), Freeman, New York; Gait, "Oligonucleotide Synthesis: A Practical Approach", 1984, IRL Press, London; Nelson and Cox (2000), Lehninger, Principles of Biochemistry, 3rd Ed., W.H. Freeman Pub., New York, N.Y.; and Berg et al. (2002) Biochemistry, 5th Ed., W.H. Freeman Pub., New York, N.Y., all of which are hereby incorporated by reference in their entirety.

The methods of the present invention involve analyzing an individual's genomic profile to provide the individual with molecular information relating to phenotypes. As detailed herein, the individual provides a genetic sample from which a personal genomic profile is generated. The individual's genomic profile is queried for genotype correlations by comparing it against a database of established and validated human genotype correlations. The database of established and validated genotype correlations may be drawn from peer-reviewed literature and further judged and validated by a committee of one or more experts in the field, such as geneticists, epidemiologists, or statisticians. In preferred embodiments, rules are made on the basis of the validated genotype correlations and are applied to the individual's genomic profile to generate a phenotype profile. The results of the analysis of the individual's genomic profile (the phenotype profile), together with interpretation and supporting information, are provided to the individual or to the individual's health care manager, making it possible to personalize choices about the individual's health care.

The method of the present invention is detailed in Figure 1, in which the individual's genomic profile is generated first. The individual's genomic profile contains information about the individual's genes based on genetic variations and genetic markers. Genetic variations are genotypes, which make up the genomic profile. Such genetic variations or genetic markers include, but are not limited to, single nucleotide polymorphisms, single and/or multiple nucleotide repeats, single and/or multiple nucleotide deletions, microsatellite repeats (repeats of a small number of nucleotides, typically with 5 to 1,000 repeat units), dinucleotide repeats, trinucleotide repeats, sequence rearrangements (including translocations and duplications), and copy number variations (both losses and gains at specific loci). Other genetic variations include chromosomal duplications and translocations as well as centromeric and telomeric repeats.

Genotypes may also include haplotypes and diplotypes. In some embodiments, a genomic profile may have at least 100,000, 300,000, 500,000, or 1,000,000 genotypes. In some embodiments, the genomic profile may be substantially the complete genomic sequence of an individual. In other embodiments, the genomic profile is at least 60%, 80%, or 95% of the complete genomic sequence of an individual. The genomic profile may be approximately 100% of an individual's complete genomic sequence. Genetic samples that contain the target material include, but are not limited to, unamplified genomic DNA or RNA samples and amplified DNA (or cDNA). The target material may be a particular region of genomic DNA that contains a genetic marker of particular interest.

In step 102 of Figure 1, a genetic sample of an individual is isolated from a biological sample of the individual. Such biological samples include, but are not limited to, blood, hair, skin, saliva, semen, urine, fecal material, sweat, buccal samples, and various bodily tissues. In some embodiments, a tissue sample may be collected directly by the individual; for example, a buccal sample may be obtained by the individual swabbing the inside of their cheek. Other samples, such as saliva, semen, urine, fecal material, or sweat, may also be supplied by the individual. Other biological samples may be taken by a health care professional, such as a phlebotomist, nurse, or physician. For example, a blood sample may be drawn from the individual by a nurse. Tissue biopsies may be performed by a health care professional, who may also use a kit to obtain samples efficiently. A small cylinder of skin may be removed, or a needle may be used to remove a small sample of tissue or fluid.

In some embodiments, the individual is provided with a kit containing a sample collection container for the individual's biological sample. The kit may also provide instructions for the individual to collect their own sample directly, such as how much hair, urine, sweat, or saliva to provide. The kit may also contain instructions for the individual to request that a tissue sample be taken by a health care professional. The kit may list locations where samples can be taken by a third party; for example, the kit may be provided to a health care facility that in turn collects the sample from the individual. The kit may also provide return packaging for delivering the sample to a sample processing facility, where the genetic material is isolated from the biological sample (step 104).

A genetic sample of DNA or RNA may be isolated from a biological sample according to any of several well-known biochemical and molecular biological methods; see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, New York) (1989). There are also several commercially available kits and reagents for isolating DNA or RNA from biological samples, such as those available from DNA Genotek, Gentra Systems, Qiagen, Ambion, and other suppliers. Buccal sample kits are readily available commercially, such as the MasterAmp™ Buccal Swab DNA extraction kit from Epicentre Biotechnologies, as are kits for extracting DNA from blood samples, such as Extract-N-Amp™ from Sigma-Aldrich. DNA from other tissues may be obtained by digesting the tissue with proteases and heat, centrifuging the sample, and using phenol-chloroform to extract the unwanted materials, leaving the DNA in the aqueous phase. The DNA can then be further isolated by ethanol precipitation.

In a preferred embodiment, genomic DNA is isolated from saliva. For example, using the DNA self-collection kit technology available from DNA Genotek, an individual collects a saliva specimen for clinical processing. The sample can conveniently be stored and shipped at room temperature. After the sample is delivered to an appropriate laboratory for processing, DNA is isolated by heat denaturation and protease digestion of the sample, typically using reagents supplied by the collection kit supplier, at 50°C for at least one hour. The sample is then centrifuged, and the supernatant is ethanol precipitated. The DNA pellet is suspended in a buffer appropriate for subsequent analysis.

In another embodiment, RNA may be used as the genetic sample. In particular, expressed genetic variations can be identified from mRNA. The term "messenger RNA" or "mRNA" includes, but is not limited to, pre-mRNA transcripts, transcript processing intermediates, mature mRNA ready for translation, transcripts of the gene or genes, and nucleic acids derived from mRNA transcripts. Transcript processing may include splicing, editing, and degradation. As used herein, a nucleic acid derived from an mRNA transcript is a nucleic acid for whose synthesis the mRNA transcript, or a subsequence thereof, has ultimately served as a template. Thus, a cDNA reverse transcribed from an mRNA, a DNA amplified from the cDNA, an RNA transcribed from the amplified DNA, and so on are all derived from the mRNA transcript. RNA can be isolated from any of several bodily tissues using methods known in the art, such as isolation of RNA from unfractionated whole blood using the PAXgene™ Blood RNA System available from PreAnalytiX. Typically, the mRNA is used to reverse transcribe cDNA, which is then used directly or amplified for analysis of genetic variation.

Prior to genomic profile analysis, the genetic sample is typically amplified, either from DNA or from cDNA reverse transcribed from RNA. DNA can be amplified by a number of methods, many of which employ PCR. See, e.g., PCR Technology: Principles and Applications for DNA Amplification (Ed. H.A. Erlich, Freeman Press, NY, N.Y., 1992); PCR Protocols: A Guide to Methods and Applications (Eds. Innis et al., Academic Press, San Diego, Calif., 1990); Mattila et al., Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR (Eds. McPherson et al., IRL Press, Oxford); and U.S. Patent Nos. 4,683,202, 4,683,195, 4,800,159, 4,965,188, and 5,333,675, each of which is incorporated herein by reference in its entirety.

Other suitable amplification methods include the ligase chain reaction (LCR) (e.g., Wu and Wallace, Genomics 4, 560 (1989); Landegren et al., Science 241, 1077 (1988); and Barringer et al., Gene 89:117 (1990)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86:1173-1177 (1989) and WO 88/10315), self-sustained sequence replication (Guatelli et al., Proc. Nat. Acad. Sci. USA 87:1874-1878 (1990) and WO 90/06995), selective amplification of target polynucleotide sequences (U.S. Patent No. 6,410,276), consensus sequence primed polymerase chain reaction (CP-PCR) (U.S. Patent No. 4,437,975), arbitrarily primed polymerase chain reaction (AP-PCR) (U.S. Patent Nos. 5,413,909 and 5,861,245), nucleic acid based sequence amplification (NABSA), rolling circle amplification (RCA), multiple displacement amplification (MDA) (U.S. Patent Nos. 6,124,120 and 6,323,009), and circle-to-circle amplification (C2CA) (Dahl et al., Proc. Natl. Acad. Sci. 101:4548-4553 (2004)). (See U.S. Patent Nos. 5,409,818, 5,554,517, and 6,063,603, each of which is incorporated herein by reference.) Other amplification methods that may be used are described in U.S. Patent Nos. 5,242,794, 5,494,810, 5,409,818, 4,988,617, 6,063,603, and 5,554,517 and in U.S. Patent Application No. 09/854,317, each of which is incorporated herein by reference.

The generation of the genomic profile in step 106 is performed using any of several methods. Several methods for identifying genetic variations are known in the art and include, but are not limited to, DNA sequencing by any of several methodologies, PCR-based methods, fragment length polymorphism assays (restriction fragment length polymorphism (RFLP), cleavage fragment length polymorphism (CFLP)), hybridization methods using an allele-specific oligonucleotide as a template (e.g., the TaqMan PCR method, the Invader method, the DNA chip method), methods using a primer extension reaction, and mass spectrometry (the MALDI-TOF/MS method).

In one embodiment, a high-density DNA array is used for SNP identification and profile generation. Such arrays are commercially available from Affymetrix and Illumina (see the Affymetrix 500K Assay Manual, Affymetrix, Santa Clara, CA (incorporated by reference); humanHap650Y genotyping beadchip, Illumina, San Diego, CA).

For example, a SNP profile can be generated by genotyping more than 900,000 SNPs using the Affymetrix Genome-Wide Human SNP Array 6.0. Alternatively, more than 500,000 SNPs can be determined through whole-genome sampling analysis using the Affymetrix GeneChip Human Mapping 500K Array Set. In these assays, a subset of the human genome is amplified through a single-primer amplification reaction using restriction enzyme digested, adaptor-ligated human genomic DNA. As shown in Figure 2, the concentration of the ligated DNA can then be determined. The amplified DNA is then fragmented, and the quality of the sample is determined before continuing with step 106. If the sample meets the PCR and fragmentation standards, it is denatured, labeled, and then hybridized to a microarray consisting of small DNA probes at specific locations on a coated quartz surface. The amount of label that hybridizes to each probe, as a function of the amplified DNA sequence, is monitored, thereby yielding sequence information and the resulting SNP genotypes.

The Affymetrix GeneChip 500K Assay is used according to the manufacturer's directions. Briefly, the isolated genomic DNA is first digested with either the NspI or the StyI restriction endonuclease. The digested DNA is then ligated with an NspI or StyI adaptor oligonucleotide that anneals to the NspI- or StyI-restricted DNA, respectively. The adaptor-containing DNA is then amplified by PCR to yield amplified DNA fragments of between about 200 and 1100 base pairs, as confirmed by gel electrophoresis. PCR products that meet the amplification standard are purified and quantified for fragmentation. The PCR products are fragmented with DNase I for optimal DNA chip hybridization. After fragmentation, the DNA fragments should be less than 250 base pairs, with an average of about 180 base pairs, as confirmed by gel electrophoresis. Samples that meet the fragmentation standard are then labeled with a biotin compound using terminal deoxynucleotidyl transferase. The labeled fragments are next denatured and then hybridized to a GeneChip 250K array. Following hybridization, the array is stained prior to scanning in a three-step process consisting of a streptavidin phycoerythrin (SAPE) stain, followed by an antibody amplification step using a biotinylated anti-streptavidin antibody (goat), and a final stain with streptavidin phycoerythrin (SAPE). After labeling, the array is covered with array holding buffer and then scanned with a scanner such as the Affymetrix GeneChip Scanner 3000.

After the Affymetrix GeneChip Human Mapping 500K Array Set has been scanned, data analysis is performed according to the manufacturer's instructions, as shown in Figure 3. Briefly, raw data are acquired using the GeneChip Operating Software (GCOS). Data can also be acquired using the Affymetrix GeneChip Command Console™. The raw data are then analyzed with the GeneChip Genotyping Analysis Software (GTYPE). For the purposes of the present invention, samples with a GTYPE call rate of less than 80% are excluded. The samples are then examined by BRLMM and/or SNiPer algorithm analysis. Samples with a BRLMM call rate of less than 95% or a SNiPer call rate of less than 98% are excluded. Finally, association analysis is performed, and samples with a SNiPer quality index of less than 0.45 and/or a Hardy-Weinberg p-value of less than 0.00001 are excluded.
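The exclusion thresholds above can be read as a simple filter. The following is a minimal sketch only: the `SampleQC` record, its field names, and the convention that a metric left as `None` was not computed (for the "and/or" cases) are assumptions made for illustration, not part of GCOS, GTYPE, BRLMM or SNiPer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleQC:
    """Hypothetical per-sample QC metrics; rates are fractions in [0, 1]."""
    gtype_call_rate: float
    brlmm_call_rate: Optional[float] = None       # None = BRLMM not run
    sniper_call_rate: Optional[float] = None      # None = SNiPer not run
    sniper_quality_index: Optional[float] = None
    hardy_weinberg_p: Optional[float] = None


def passes_qc(s: SampleQC) -> bool:
    """Apply the exclusion thresholds quoted in the text, in order."""
    if s.gtype_call_rate < 0.80:
        return False
    if s.brlmm_call_rate is not None and s.brlmm_call_rate < 0.95:
        return False
    if s.sniper_call_rate is not None and s.sniper_call_rate < 0.98:
        return False
    if s.sniper_quality_index is not None and s.sniper_quality_index < 0.45:
        return False
    if s.hardy_weinberg_p is not None and s.hardy_weinberg_p < 0.00001:
        return False
    return True


if __name__ == "__main__":
    good = SampleQC(0.93, brlmm_call_rate=0.97, sniper_call_rate=0.99,
                    sniper_quality_index=0.60, hardy_weinberg_p=0.2)
    print(passes_qc(good))
```

Treating a missing metric as "not evaluated" is a design choice of this sketch; a stricter pipeline might instead reject any sample for which a required metric is absent.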

As an alternative or in addition to DNA microarray analysis, genetic variations such as SNPs and mutations can be detected by DNA sequencing. DNA sequencing can also be used to sequence a substantial portion, or all, of an individual's genome. Conventional DNA sequencing is typically based on polyacrylamide gel fractionation to resolve a population of chain-terminated fragments (Sanger et al., Proc. Natl. Acad. Sci. USA 74:5463-5467 (1977)). Alternative methods that have been and continue to be developed increase the speed and ease of DNA sequencing. For example, high-throughput and single-molecule sequencing platforms are commercially available from, or under development by, 454 Life Sciences (Branford, CT) (Margulies et al., Nature 437:376-380 (2005)), Solexa (Hayward, CA), Helicos BioSciences Corporation (Cambridge, MA) (U.S. Application No. 11/167046, filed June 23, 2005) and Li-Cor Biosciences (Lincoln, NE) (U.S. Application No. 11/118031, filed April 29, 2005).

After the individual's genomic profile is generated in step 106, the profile is stored digitally in step 108; it can be stored digitally in an encrypted manner. The genomic profile is encoded in a computer-readable format for storage as part of a data set, and can be stored as a database in which the genomic profile is "banked" and can be accessed again later. The data set comprises a plurality of data points, each data point relating to one individual. Each data point can have a plurality of data elements. One data element is a unique identifier used to identify the individual's genomic profile; it can also be a barcode. Another data element is genotype information, such as the SNPs or nucleotide sequence of the individual's genome. Data elements corresponding to the genotype information can also be included in the data point. For example, if the genotype information includes SNPs identified by microarray analysis, other data elements can include the microarray SNP identification number, the SNP rs number and the polymorphic nucleotide. Other data elements can be the chromosomal position of the genotype information, quality metrics of the data, raw data files, images of the data and extracted intensity scores.

Factors specific to the individual, such as physical data, medical data, ethnicity, ancestry, geography, gender, age, family history, known phenotypes, demographic data, exposure data, lifestyle data, behavioral data and other known phenotypes, can also be included as data elements. For example, these factors can include, but are not limited to, the individual's birthplace, parents and/or grandparents, relatives' ancestry, location of residence, ancestors' location of residence, environmental conditions, known health conditions, known drug interactions, family health conditions, lifestyle conditions, diet, exercise habits, marital status and physical measurements (for example, weight, height, cholesterol level, heart rate, blood pressure, glucose level and other measurements known in the art). The same factors for the individual's relatives or ancestors (for example, parents and grandparents) can also be incorporated as data elements and used to determine the individual's risk for a phenotype or condition.

The specific factors can be obtained from a questionnaire or from the individual's health care manager. Information from the "banked" profile can then be accessed and used as needed. For example, in the initial assessment of an individual's genotype correlations, the individual's entire information (typically SNPs or other genomic sequences across, or taken from, the whole genome) is analyzed to determine the genotype correlations. In subsequent analyses, all of the information from the stored or banked genomic profile, or a portion of it, can be accessed as needed or appropriate.
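As an illustration of the data elements described above, a banked data point might be modeled as follows. This is a sketch under stated assumptions: the class names, field names and the example identifier `ID-0001` are hypothetical, and encryption and persistent storage are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class SnpCall:
    """One genotype data element plus its descriptive data elements."""
    rs_number: str                       # SNP rs number
    genotype: str                        # polymorphic nucleotides, e.g. "CT"
    array_snp_id: Optional[str] = None   # microarray SNP identification number
    chromosome: Optional[str] = None
    position: Optional[int] = None       # chromosomal position
    quality: Optional[float] = None      # quality metric of the call


@dataclass
class GenomicProfile:
    """One data point of the data set: a single individual's banked profile."""
    profile_id: str                                          # unique identifier or barcode
    calls: Dict[str, SnpCall] = field(default_factory=dict)
    factors: Dict[str, str] = field(default_factory=dict)    # individual-specific factors

    def add_call(self, call: SnpCall) -> None:
        self.calls[call.rs_number] = call

    def subset(self, rs_numbers: Iterable[str]) -> Dict[str, str]:
        """Return only the requested part of the banked genotype information."""
        return {rs: self.calls[rs].genotype for rs in rs_numbers if rs in self.calls}


if __name__ == "__main__":
    profile = GenomicProfile("ID-0001", factors={"gender": "female"})
    profile.add_call(SnpCall("rs3817198", "CT"))
    print(profile.subset(["rs3817198", "rs6025"]))
```

The `subset` method mirrors the idea that later analyses may access all of the banked information or only a portion of it.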

Comparison of Genomic Profiles with Genotype Correlation Databases

In step 110, genotype correlations are obtained from the scientific literature. Genotype correlations for genetic variations are determined from analysis of a population of individuals who have been tested for the presence or absence of one or more phenotypic traits of interest and for their genotype profiles. The alleles of each genetic variation or polymorphism in the genotype profile are then examined to determine whether the presence of a particular allele is associated with the trait of interest. The correlation analysis can be performed by standard statistical methods, and statistically significant correlations between genetic variations and phenotypic characteristics are recorded. For example, it may be determined that the presence of allele A1 of polymorphism A correlates with heart disease. As a further example, it may be found that the combined presence of allele A1 of polymorphism A and allele B1 of polymorphism B correlates with an increased risk of cancer. The results of the analysis can be published in the peer-reviewed literature, confirmed by other research groups, and/or analyzed by a committee of experts (for example, geneticists, statisticians, epidemiologists and physicians), and can also be validated.

Figures 4, 5 and 6 show examples of correlations between genotypes and phenotypes; the rules applied to genomic profiles are based on these correlations. For example, in Figures 4A and B each row corresponds to a phenotype/locus/ethnicity, and Figures 4C to I contain further information on the correlation for each of these rows. As an example, the "phenotype short name" BC in Figure 4A is the abbreviation for breast cancer, as noted in the index of phenotype short names in Figure 4M. In row BC_4 (the generic name of the locus), the gene LSP1 is correlated with breast cancer. As shown in Figure 4C, the published or functional SNP identified for this correlation is rs3817198, the published risk allele is C and the non-risk allele is T. The published SNP and alleles are identified through publications (for example, the seminal publications in Figures 4E-G). In the LSP1 example of Figure 4E, the seminal publication is Easton et al., Nature 447:713-720 (2007). Figures 22 and 25 list further correlations. The correlations in Figures 22 and 25 can be used to calculate an individual's risk for a condition or phenotype, for example by calculating a GCI or GCI Plus score. The GCI or GCI Plus score can also incorporate information such as the prevalence of the condition, as in Figure 23.

Alternatively, correlations can be generated from stored genomic profiles. For example, individuals with stored genomic profiles may also have known phenotype information stored. Analysis of the stored genomic profiles and known phenotypes can generate genotype correlations. As an example, 250 individuals with stored genomic profiles also have stored information that they were previously diagnosed with diabetes. Their genomic profiles are analyzed and compared with a control group of individuals without diabetes. If the individuals previously diagnosed with diabetes are found to carry a particular genetic variant at a higher rate than the control group, a genotype correlation can be drawn between that genetic variant and diabetes.
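The case-control comparison in this example can be sketched with a 2x2 table. The text does not name a statistic, so the odds ratio and Pearson chi-square used here are assumptions chosen as standard options, and the carrier counts in the usage example are invented purely for illustration; only the 250 diagnosed individuals come from the text.

```python
def odds_ratio(case_carriers: int, case_total: int,
               control_carriers: int, control_total: int) -> float:
    """Odds of carrying the variant among cases divided by the odds among controls."""
    a, b = case_carriers, case_total - case_carriers
    c, d = control_carriers, control_total - control_carriers
    return (a * d) / (b * c)


def chi_square_2x2(case_carriers: int, case_total: int,
                   control_carriers: int, control_total: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom, no continuity correction)."""
    a, b = case_carriers, case_total - case_carriers
    c, d = control_carriers, control_total - control_carriers
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    # Hypothetical counts: 100 of 250 cases and 60 of 250 controls carry the variant.
    print(odds_ratio(100, 250, 60, 250))
    print(chi_square_2x2(100, 250, 60, 250))
```

A chi-square value above 3.841 corresponds to p < 0.05 at one degree of freedom; a genome-wide analysis would need a far stricter threshold to account for the number of variants tested.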

In step 112, rules are generated based on the validated correlations between genetic variants and particular phenotypes. For example, rules can be generated from the correlated genotypes and phenotypes listed in Table 1. Rules based on correlations can incorporate other factors, such as gender (for example, Figure 4) or ethnicity (Figures 4 and 5), to produce effect estimates as in Figures 4 and 5. Other measures produced by the rules can estimate the increase in relative risk, as in Figure 6. The effect estimates and the estimated increase in relative risk can be taken from the published literature or calculated from it. Alternatively, rules can be based on correlations generated from stored genomic profiles and previously known phenotypes. In some embodiments, the rules can be based on the correlations in Figures 22 and 25.

In a preferred embodiment, the genetic variant is a SNP. Although SNPs occur at single sites, an individual who carries a particular SNP allele at one site can often be predicted to carry specific SNP alleles at other sites. The correlation of a SNP with an allele that predisposes an individual to a disease or condition arises through linkage disequilibrium, in which alleles at two or more loci are non-randomly associated in a population at a frequency greater or less than would be expected from random formation through recombination.

Other genetic markers or variants, such as nucleotide repeats or insertions, can also be in linkage disequilibrium with genetic markers that have been shown to be associated with particular phenotypes. For example, a nucleotide insertion is correlated with a phenotype, and a SNP is in linkage disequilibrium with the nucleotide insertion. A rule is generated based on the correlation between the SNP and the phenotype. A rule based on the correlation between the nucleotide insertion and the phenotype can also be generated. Either rule or both rules can be applied to a genomic profile, because the presence of the SNP can give a certain risk factor, the other rule can give another risk factor, and when combined they can increase the risk.

Through linkage disequilibrium, a disease-predisposing allele cosegregates with a particular allele of a SNP or with a particular combination of SNP alleles. A particular combination of SNP alleles along a chromosome is called a haplotype, and the DNA region in which they occur in combination can be called a haplotype block. Although a haplotype block can consist of a single SNP, a typical haplotype block represents a contiguous series of 2 or more SNPs that exhibit low haplotype diversity across individuals and generally have low recombination frequencies. A haplotype can be identified by identifying one or more of the SNPs that lie in the haplotype block. Thus, a SNP profile can typically be used to identify haplotype blocks without necessarily identifying all of the SNPs in a given haplotype block.

Genotype correlations between SNP haplotype patterns and diseases, conditions or physical states are increasingly becoming known. For a given disease, the haplotype patterns of a group of people known to have the disease are compared with those of a group of people without the disease. By analyzing many individuals, the frequencies of polymorphisms in a population can be determined, and these frequencies or genotypes can then be associated with a particular phenotype, such as a disease or condition. Examples of known SNP-disease correlations include polymorphisms in complement factor H in age-related macular degeneration (Klein et al., Science 308:385-389 (2005)) and a variant near the INSIG2 gene associated with obesity (Herbert et al., Science 312:279-283 (2006)). Other known SNP correlations include, for example, polymorphisms in the 9p21 region that includes CDKN2A and B, such as rs10757274, rs2383206, rs13333040, rs2383207 and rs10116277, which are correlated with myocardial infarction (Helgadottir et al., Science 316:1491-1493 (2007); McPherson et al., Science 316:1488-1491 (2007)).

SNPs can be functional or non-functional. For example, a functional SNP has an effect on cellular function and thereby results in a phenotype, whereas a non-functional SNP is functionally silent but can be in linkage disequilibrium with a functional SNP. SNPs can also be synonymous or non-synonymous. A synonymous SNP is one in which the different forms lead to the same polypeptide sequence, and it is a non-functional SNP. If a SNP leads to different polypeptides, the SNP is non-synonymous and may be functional or non-functional. SNPs or other genetic markers used to identify the haplotypes in a diplotype (which is 2 or more haplotypes) can also be used to correlate phenotypes associated with the diplotype. Information about an individual's haplotypes, diplotypes and SNP profile can be included in the individual's genomic profile.
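The synonymous/non-synonymous distinction can be checked mechanically for a coding SNP by translating the affected codon before and after the substitution. The sketch below assumes the standard genetic code and a codon given on the coding strand; the function name and arguments are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snp(codon: str, offset: int, alt_base: str) -> str:
    """Classify a single-base substitution at 0-based `offset` within `codon`."""
    variant = codon[:offset] + alt_base + codon[offset + 1:]
    same_residue = CODON_TABLE[codon] == CODON_TABLE[variant]
    return "synonymous" if same_residue else "non-synonymous"


if __name__ == "__main__":
    print(classify_coding_snp("GAA", 2, "G"))  # GAA and GAG both encode Glu
    print(classify_coding_snp("GAA", 1, "T"))  # GAA encodes Glu, GTA encodes Val
```

This only determines whether the polypeptide sequence changes; as the text notes, a non-synonymous SNP may still be functional or non-functional.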

In a preferred embodiment, for a rule generated from a genetic marker that is in linkage disequilibrium with another genetic marker correlated with a phenotype, the genetic marker can have an r² or D' score, the scores commonly used in the art to determine linkage disequilibrium, of greater than 0.5. In preferred embodiments, the score is greater than 0.6, 0.7, 0.8, 0.90, 0.95 or 0.99. As a result, in the present invention, the genetic marker used to correlate a phenotype with an individual's genomic profile can be the same as, or different from, the functional or published SNP correlated with the phenotype. For example, for BC_4 the test SNP and the published SNP are the same, as are the test risk and non-risk alleles and the published risk and non-risk alleles (Figures 4A and C). However, for BC_5, CASP8 and its correlation with breast cancer, the test SNP differs from the functional or published SNP, as do the test risk and non-risk alleles from the published risk and non-risk alleles. The test and published alleles are oriented relative to the plus strand of the genome, and from these columns the homozygous risk or non-risk genotypes can be inferred, which can be used to generate rules to apply to the genomic profile of an individual such as a subscriber. In some embodiments, a test SNP may not be identified; instead, using the published SNP information, allelic differences or SNPs can be identified with another assay, such as TaqMan. For example, for AMD_5 in Figure 25A the published SNP is rs1061170, but no test SNP has been identified. The test SNP can be identified by LD analysis of the published SNP. Alternatively, instead of using a test SNP, TaqMan or another comparable assay can be used to evaluate the genome of the individual having the test SNP.
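For reference, the r² and D' scores mentioned above can be computed from two-locus haplotype counts using the standard definitions. This is a sketch, not the procedure used to build the figures: the function name, the count-based interface and the example counts are assumptions.

```python
from typing import Tuple


def ld_stats(n11: int, n10: int, n01: int, n00: int) -> Tuple[float, float]:
    """Return (r2, d_prime) for two biallelic loci.

    n11 counts haplotypes carrying allele 1 at both loci, n10 allele 1 at the
    first locus only, n01 allele 1 at the second locus only, n00 neither.
    """
    total = n11 + n10 + n01 + n00
    p_a = (n11 + n10) / total          # allele-1 frequency at the first locus
    p_b = (n11 + n01) / total          # allele-1 frequency at the second locus
    d = n11 / total - p_a * p_b        # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return r2, d_prime


def usable_as_test_marker(r2: float, threshold: float = 0.5) -> bool:
    """True if a candidate marker exceeds the chosen LD threshold."""
    return r2 > threshold


if __name__ == "__main__":
    r2, d_prime = ld_stats(45, 5, 5, 45)   # hypothetical haplotype counts
    print(r2, d_prime, usable_as_test_marker(r2))
```

With the hypothetical counts shown, r² is 0.64 and D' is 0.8, so the candidate marker would clear the 0.5 threshold but not the stricter 0.7 or 0.8 cut-offs.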

A test SNP can be a "DIRECT" or a "TAG" SNP (Figures 4E-G, Figure 5). A direct SNP is a test SNP that is the same as the published or functional SNP, as for BC_4. A direct SNP can also be used for the FGFR2 correlation with breast cancer, using SNP rs1073640 in Europeans and Asians, where the minor allele is A and the other allele is G (Easton et al., Nature 447:1087-1093 (2007)). Another published or functional SNP for the FGFR2 correlation with breast cancer, also in Europeans and Asians, is rs1219648 (Hunter et al., Nat. Genet. 39:870-874 (2007)). A tag SNP is a test SNP that differs from the functional or published SNP, as in the case of BC_5. Tag SNPs can also be used for other genetic variants, for example for the SNPs of CAMTA1 (rs4908449), 9p21 (rs10757274, rs2383206, rs13333040, rs2383207, rs10116277), COL1A1 (rs1800012), FVL (rs6025), HLA-DQA1 (rs4988889, rs2588331), eNOS (rs1799983), MTHFR (rs1801133) and APC (rs28933380).

Databases of SNPs are publicly available from, for example, the International HapMap Project (see www.hapmap.org; The International HapMap Consortium, Nature 426:789-796 (2003); and The International HapMap Consortium, Nature 437:1299-1320 (2005)), the Human Gene Mutation Database (HGMD) public database (see www.hgmd.org) and the Single Nucleotide Polymorphism database (dbSNP) (see www.ncbi.nlm.nih.gov/SNP/). These databases provide SNP haplotypes or enable SNP haplotype patterns to be determined. Accordingly, these SNP databases enable the detection of the genetic risk factors underlying a wide range of diseases and conditions, such as cancer, inflammatory diseases, cardiovascular diseases, neurodegenerative diseases and infectious diseases. The diseases or conditions can be actionable, meaning that treatments and therapies for them currently exist. Treatments can include prophylactic treatments as well as treatments that ameliorate symptoms and conditions, including lifestyle changes.

Many other phenotypes can also be examined, such as physical traits, physiological traits, mental traits, emotional traits, ethnicity, ancestry and age. Physical traits can include height, hair color, eye color, body build, or traits such as stamina, endurance and agility. Mental traits can include intelligence, memory performance or learning performance. Ethnicity and ancestry can include identification of ancestry or ethnicity, or of where an individual's ancestors originated. Age can be a determination of the individual's actual age, or the age at which the individual's genetics place him or her relative to the general population. For example, an individual's actual age may be 38, but his or her genetics may indicate memory capacity or physical well-being typical of an average 28-year-old. Another age trait can be the individual's projected longevity.

Other phenotypes can also include non-medical conditions, such as "fun" phenotypes. These phenotypes can include comparisons to well-known individuals, for example foreign dignitaries, politicians, celebrities, inventors, athletes, musicians, artists, business people and infamous individuals (for example, criminals). Other "fun" phenotypes can include comparisons to other organisms, for example bacteria, insects, plants or non-human animals. For example, an individual may be interested to see how his or her genomic profile compares with that of a pet dog or of a former president.

In step 114, the rules are applied to the stored genomic profile to generate the phenotype profile of step 116. For example, the information in Figure 4, 5 or 6 can form the basis of the rules or tests applied to an individual's genomic profile. The rules can encompass the information in Figure 4 on the test SNP and alleles and on the effect estimates, where UNITS gives the units of the effect estimate, for example OR (odds ratio, with 95% confidence interval) or mean. In a preferred embodiment, the effect estimate can be a genotypic risk (Figures 4C-G), for example the risk for the risk homozygote (homoz or RR), the risk heterozygote (heteroz or RN) and the non-risk homozygote (homoz or NN). In other embodiments, the effect estimate can be a carrier risk, which is RR or RN versus NN. In still other embodiments, the effect estimate can be allele-based, that is, an allelic risk, for example R versus N. There are also genotypic effect estimates for two loci (Figure 4J) or three loci (Figure 4K) (for example, 9 possible genotype combinations for a two-locus effect estimate: RRRR, RRNN and so on). The test SNP frequencies in the public HapMap are also recorded in Figures 4H and I.
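Applying a single-locus genotypic rule in step 114 can be sketched as a lookup from an individual's genotype to the RR, RN or NN class and then to that class's effect estimate. The rule below uses the BC_4 SNP and alleles quoted earlier in the text (rs3817198, risk allele C, non-risk allele T); the numeric effect values are placeholders invented for illustration and are not taken from Figure 4, and the class and function names are assumptions.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Optional


@dataclass
class GenotypeRule:
    phenotype: str
    rs_number: str
    risk_allele: str
    non_risk_allele: str
    effect: Dict[str, float]   # effect estimate per class: "RR", "RN", "NN"


def genotype_class(genotype: str, rule: GenotypeRule) -> str:
    """Map a two-allele genotype such as 'CT' to 'RR', 'RN' or 'NN'."""
    allowed = {rule.risk_allele, rule.non_risk_allele}
    if len(genotype) != 2 or not set(genotype) <= allowed:
        raise ValueError(f"unexpected genotype {genotype!r} for {rule.rs_number}")
    return {2: "RR", 1: "RN", 0: "NN"}[genotype.count(rule.risk_allele)]


def apply_rule(profile: Dict[str, str], rule: GenotypeRule) -> Optional[float]:
    """Return the effect estimate for this profile, or None if the SNP is absent."""
    genotype = profile.get(rule.rs_number)
    if genotype is None:
        return None
    return rule.effect[genotype_class(genotype, rule)]


# The 9 two-locus genotype combinations (RRRR, RRRN, RRNN, ...).
TWO_LOCUS_CLASSES = ["".join(pair) for pair in product(("RR", "RN", "NN"), repeat=2)]

if __name__ == "__main__":
    bc_4 = GenotypeRule("BC", "rs3817198", "C", "T",
                        effect={"RR": 1.3, "RN": 1.1, "NN": 1.0})  # placeholder values
    print(apply_rule({"rs3817198": "CT"}, bc_4))
    print(TWO_LOCUS_CLASSES)
```

Carrier-based or allele-based effect estimates could be handled the same way by collapsing the classes (RR or RN versus NN, or R versus N) before the lookup.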

In other embodiments, the information from Figures 21, 22, 23 and/or 25 can be used to generate the information applied to an individual's genomic profile. For example, the information can be used to generate a GCI or GCI Plus score for an individual (for example, Figure 19). The score can be used to generate information on the genetic risk, such as the estimated lifetime risk, for one or more conditions in the individual's phenotype profile (for example, Figure 15). The method allows the estimated lifetime risk or relative risk to be calculated for one or more of the phenotypes or conditions listed in Figure 22 or 25. The risk for a single condition can be based on one or more SNPs. For example, the estimated risk for a phenotype or condition can be based on at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12 SNPs, where the SNPs used to estimate the risk can be published SNPs, test SNPs or both (for example, Figure 25).

The estimated risk for a condition can be based on the SNPs listed in Figure 22 or 25. In some embodiments, the risk for a condition can be based on at least one SNP. For example, the assessment of an individual's risk for Alzheimer's disease (AD), colorectal cancer (CRC), osteoarthritis (OA) or exfoliation glaucoma (XFG) can be based on 1 SNP (for example, rs4420638 for AD, rs6983267 for CRC, rs4911178 for OA and rs2165241 for XFG). For other conditions, such as obesity (BMIOB), Graves' disease (GD) or hemochromatosis (HEM), the individual's estimated risk can be based on at least 1 or 2 SNPs (for example, rs9939609 and/or rs9291171 for BMIOB; DRB1*0301DQA1*0501 and/or rs3087243 for GD; rs1800562 and/or rs129128 for HEM). For conditions such as, but not limited to, myocardial infarction (MI), multiple sclerosis (MS) or psoriasis (PS), 1, 2 or 3 SNPs can be used to assess the individual's risk (for example, rs1866389, rs1333049 and/or rs6922269 for MI; rs6897932, rs12722489 and/or DRB1*1501 for MS; rs6859018, rs11209026 and/or HLAC*0602 for PS). To assess an individual's risk for restless legs syndrome (RLS) or celiac disease (CelD), 1, 2, 3 or 4 SNPs can be used (for example, rs6904723, rs2300478, rs1026732 and/or rs9296249 for RLS; rs6840978, rs11571315, rs2187668 and/or DQA1*0301DQB1*0302 for CelD). For prostate cancer (PC) or lupus (SLE), 1, 2, 3, 4 or 5 SNPs can be used to assess the individual's risk (for example, rs4242384, rs6983267, rs16901979, rs17765344 and/or rs4430796 for PC; rs12531711, rs10954213, rs2004640, DRB1*0301 and/or DRB1*1501 for SLE). To assess an individual's lifetime risk for macular degeneration (AMD) or rheumatoid arthritis (RA), 1, 2, 3, 4, 5 or 6 SNPs can be used (for example, rs10737680, rs10490924, rs541862, rs2230199, rs1061170 and/or rs9332739 for AMD; rs6679677, rs11203367, rs6457617, DRB*0101, DRB1*0401 and/or DRB1*0404 for RA). To assess an individual's lifetime risk for breast cancer (BC), 1, 2, 3, 4, 5, 6 or 7 SNPs can be used (for example, rs3803662, rs2981582, rs4700485, rs3817198, rs17468277, rs6721996 and/or rs3803662). To assess an individual's lifetime risk for Crohn's disease (CD) or type 2 diabetes (T2D), 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or 11 SNPs can be used (for example, rs2066845, rs5743293, rs10883365, rs17234657, rs10210302, rs9858542, rs11805303, rs1000113, rs17221417, rs2542151 and/or rs10761659 for CD; rs13266634, rs4506565, rs10012946, rs7756992, rs10811661, rs12288738, rs8050136, rs1111875, rs4402960, rs5215 and/or rs1801282 for T2D). In some embodiments, the SNPs used as the basis for determining risk can be in linkage disequilibrium with the SNPs mentioned above or listed in Figure 22 or 25.
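Because a condition's risk can be based on one, some or all of its listed SNPs, it can be useful to check which of a condition's markers are actually present in a given genomic profile. The sketch below copies a few of the panels named above into a dictionary; the dictionary layout and function name are assumptions, and no risk score is computed here.

```python
from typing import Dict, List

# A few of the condition-to-SNP panels quoted in the text (not the full list).
CONDITION_SNPS: Dict[str, List[str]] = {
    "AD": ["rs4420638"],
    "CRC": ["rs6983267"],
    "OA": ["rs4911178"],
    "XFG": ["rs2165241"],
    "MI": ["rs1866389", "rs1333049", "rs6922269"],
    "RLS": ["rs6904723", "rs2300478", "rs1026732", "rs9296249"],
}


def markers_available(condition: str, profile: Dict[str, str]) -> List[str]:
    """Return the condition's SNPs that have a genotype in the profile."""
    return [rs for rs in CONDITION_SNPS[condition] if rs in profile]


if __name__ == "__main__":
    profile = {"rs1333049": "CG", "rs6922269": "AG", "rs4420638": "AA"}  # hypothetical calls
    for condition in ("AD", "MI", "RLS"):
        found = markers_available(condition, profile)
        print(condition, f"{len(found)}/{len(CONDITION_SNPS[condition])}", found)
```

A fuller version might also accept markers in linkage disequilibrium with the listed SNPs, as the last sentence of the paragraph allows.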

An individual's phenotype profile can comprise a number of phenotypes. In particular, assessing a patient's risk of disease or other conditions (for example, likely drug response, including metabolism, efficacy and/or safety) by the methods of the present invention, whether in symptomatic, presymptomatic or asymptomatic individuals (including carriers of susceptibility alleles for one or more diseases or conditions), allows prognostic or diagnostic analysis of susceptibility to multiple, unrelated diseases and conditions. Accordingly, these methods provide a general assessment of an individual's susceptibility to disease or conditions without any preconceived notion of testing for a particular disease or condition. For example, the methods of the present invention allow an individual's susceptibility to any of the conditions listed in Table 1 or Figure 4, 5 or 6 to be assessed on the basis of the individual's genomic profile. Furthermore, these methods allow assessment of an individual's estimated lifetime risk or relative risk for one or more phenotypes or conditions, such as those in Figure 22 or 25.

The assessment preferably provides information on 2 or more of these conditions, and more preferably on 3, 4, 5, 10, 20, 50, 100 or even more of them. In a preferred embodiment, the phenotype profile is obtained by applying at least 20 rules to the individual's genomic profile. In other embodiments, at least 50 rules are applied to the individual's genomic profile. A single rule may be applied for a monogenic phenotype. More than one rule may also be applied for a single phenotype, such as a polygenic phenotype, or a monogenic phenotype in which multiple genetic variants within a single gene affect the probability of having the phenotype.
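The application of rules to a genomic profile described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the `Rule` fields, the dictionary layout of the genomic profile, and the marker and phenotype names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a genotype at one marker is correlated with a phenotype."""
    phenotype: str
    marker: str
    genotype: str
    effect: str  # free-text description, e.g. "increased risk"


def apply_rules(genomic_profile, rules):
    """Return a phenotype profile: phenotype -> list of rules that matched.

    genomic_profile is assumed to map marker identifiers to genotype strings.
    Several rules may match the same phenotype (e.g. a polygenic phenotype).
    """
    phenotype_profile = {}
    for rule in rules:
        if genomic_profile.get(rule.marker) == rule.genotype:
            phenotype_profile.setdefault(rule.phenotype, []).append(rule)
    return phenotype_profile
```

Because a phenotype maps to a list of matched rules, the same structure covers both a monogenic phenotype with one rule and a phenotype to which more than one rule applies.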

After the initial scan of an individual patient's genomic profile, the individual's genotype correlations are updated (or made available for updating) by comparison with additional nucleotide variants (e.g., SNPs) as those variants become known. For example, step 110 may be performed periodically, e.g., daily, weekly or monthly, by one or more persons of ordinary skill in the field of genetics who search the scientific literature for new genotype correlations. The new genotype correlations can then be further validated by a committee of one or more experts in the field. Step 112 may then be periodically updated with new rules based on the newly validated correlations.

A new rule may involve a genotype or phenotype not covered by the existing rules. For example, a genotype not previously associated with any phenotype may be found to correlate with a new or existing phenotype. A new rule may also cover a correlation involving a phenotype with which no genotype was previously associated. New rules may also be determined for genotypes and phenotypes that already have existing rules. For example, suppose a rule exists based on the correlation between genotype A and phenotype A. New research reveals that genotype B correlates with phenotype A, and a new rule is made based on this correlation. As another example, phenotype B is found to correlate with genotype A, and a new rule is made accordingly.

Rules may also be made for correlations that are discovered on the basis of existing knowledge but were not initially identified in the published scientific literature. For example, it may be reported that genotype C correlates with phenotype C, and another publication may report that genotype D correlates with phenotype D. Phenotypes C and D are related symptoms; for example, phenotype C may be shortness of breath and phenotype D a small lung capacity. A correlation between genotype C and phenotype D, or between genotype D and phenotype C, may be discovered and validated by statistical methods using the stored genomic profiles of individuals with genotypes C and D and phenotypes C and D, or by further research. A new rule can then be generated based on the newly discovered and validated correlation. In another embodiment, the stored genomic profiles of a number of individuals with a specific or related phenotype may be studied to determine the genotypes those individuals have in common, and a correlation may be determined. A new rule can be generated based on this correlation.
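One way such a candidate correlation might be screened in stored profiles is with a 2x2 contingency table and an odds ratio. The text does not name a particular statistical method, so the choice below (an odds ratio with the Haldane-Anscombe 0.5 correction) is an assumption made for illustration, as is the record layout of `(genotypes, phenotypes)` pairs.

```python
def contingency(records, genotype, phenotype):
    """Count stored individuals by presence of a genotype and a phenotype.

    records is assumed to be an iterable of (genotypes, phenotypes) pairs,
    each a set of labels. Returns (a, b, c, d):
      a = genotype and phenotype, b = genotype only,
      c = phenotype only,         d = neither.
    """
    a = b = c = d = 0
    for genotypes, phenotypes in records:
        has_g = genotype in genotypes
        has_p = phenotype in phenotypes
        if has_g and has_p:
            a += 1
        elif has_g:
            b += 1
        elif has_p:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Odds ratio with 0.5 added to every cell (Haldane-Anscombe correction),
    so that an empty cell does not cause a division by zero."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
```

An odds ratio well above 1 would only flag the pair for the further validation the text describes; it is not by itself a validated correlation.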

Rules may also be made to modify existing rules. For example, the correlation between a genotype and a phenotype may be determined in part by known characteristics of the individual, such as ethnicity, ancestry, geography, gender, age, family history, or any other known phenotype of the individual. Rules based on these known individual characteristics can be made and incorporated into an existing rule to provide a modified rule. The choice of which modified rule to apply will depend on the specific factors of the individual. For example, a rule may be based on a 35% probability that an individual has phenotype E when the individual has genotype E. If the individual is of a particular ethnicity, however, the probability is 5%. A new rule can be made based on this result and applied to individuals of that particular ethnicity. Alternatively, the existing rule with the determined value of 35% may be applied first, followed by another rule based on ethnicity for that phenotype. Rules based on known individual characteristics may be determined from the scientific literature or from studies of the stored genomic profiles. As new rules are developed, they can be added in step 114 and applied to the genomic profiles, or they can be applied periodically, for example at least once a year.
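The 35% and 5% example above can be sketched as a base rule that is overridden when a modified rule exists for the individual's known characteristic. The function name, the dictionary layout, and the label `group_1` are hypothetical; only the two probabilities come from the text.

```python
def phenotype_probability(genotype, ethnicity, base_rules, modified_rules):
    """Return the probability of the phenotype for this individual.

    base_rules:     genotype -> probability (the existing rule)
    modified_rules: (genotype, ethnicity) -> probability (the modified rule)
    Returns None when no rule covers the genotype.
    """
    base = base_rules.get(genotype)
    if base is None:
        return None
    # The modified rule, when one exists for this characteristic, takes precedence.
    return modified_rules.get((genotype, ethnicity), base)


# Values taken from the example in the text; "group_1" is a placeholder label.
BASE_RULES = {"genotype_E": 0.35}
MODIFIED_RULES = {("genotype_E", "group_1"): 0.05}
```

The alternative described in the text, applying the 35% rule first and then a second ethnicity-based rule, would instead chain two rule applications rather than replace the value in one lookup.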

Information on an individual's risk of disease can also be expanded as technology advances allow SNP genomic profiles of higher resolution. As noted above, an initial SNP genomic profile can readily be generated using microarray technology that scans 500,000 SNPs. Given the nature of haplotype blocks, this number allows for a representative profile of all SNPs in an individual's genome. Nonetheless, approximately 10 million SNPs are estimated to occur commonly in the human genome (the International HapMap Project; www.hapmap.org). As technological advances allow practical and cost-efficient resolution of SNPs at a finer level of detail, for example with microarrays of 1,000,000, 1,500,000, 2,000,000, 3,000,000 or more SNPs, or with whole-genome sequencing, more detailed SNP genomic profiles can be generated. Likewise, advances in computational analysis methods will make possible the cost-efficient analysis of finer SNP genomic profiles and the updating of the master database of SNP-disease correlations.

After the phenotype profile is generated in step 116, the registered user or the user's health care manager can access the genomic profile or phenotype profile through an online portal or website, as in step 118. A report containing the phenotype profile and other information about the phenotype profile and genomic profile may also be provided to the registered user or the user's health care manager, as in steps 120 and 122. The report can be printed, saved on the registered user's computer, or viewed online.

Figure 7 shows an example online report. The registered user can choose to display a single phenotype or more than one phenotype. The registered user may also have different viewing options, for example the "Quick View" option shown in Figure 7. The phenotype may be a medical condition, and the different treatments and symptoms in the quick report may link to other web pages containing further information about the treatment. For example, clicking on a drug leads to a website with information about dosage, cost, side effects, and efficacy. The drug may also be compared with other treatments. The website may also include a link to the drug manufacturer's website. Another link may give the registered user the option to generate a pharmacogenomic profile, which would include information on the user's likely response to the drug based on the user's genomic profile. Links to alternatives to the drug may also be provided, such as preventive actions (e.g., fitness and weight loss), as well as links to dietary supplements, diet plans, and nearby health clubs, health clinics, health and wellness providers, day spas, and the like. Educational and informational videos, summaries of available treatments, possible remedies, and general recommendations may also be provided.

The online report may also provide links for scheduling an in-person appointment with a physician or genetic counselor, or for accessing an online genetic counselor or physician, giving the registered user the opportunity to ask for more information about the phenotype profile. Links to online genetic counseling and physician questions may also be provided on the online report.

The report may also be viewed in other formats, such as a comprehensive view of a single phenotype in which more detail is provided for each category. For example, there may be more detailed statistics on the likelihood that the registered user will develop the phenotype; more information about typical symptoms or phenotypes, such as representative symptoms of a medical condition or the range of a non-medical physical trait such as height; or more information about the gene and genetic variant, such as its population prevalence worldwide, in different countries, or in different age ranges or genders. For example, Figure 15 shows a summary of estimated lifetime risks for a number of conditions. The individual can view more information for a particular condition, such as prostate cancer (Figure 16) or Crohn's disease (Figure 17).

In another embodiment, the report may be of a "fun" phenotype, for example the similarity of an individual's genomic profile to that of a well-known individual such as Albert Einstein. The report may show the percentage similarity between the individual's genomic profile and Einstein's, and may further show Einstein's predicted IQ and the individual's predicted IQ. Further information may include how the genomic profile and IQ of the general population compare with those of the individual and of Einstein.
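A percentage similarity between two genomic profiles could be computed in many ways; the text does not specify one. The sketch below assumes, purely for illustration, that each profile maps marker identifiers to unphased two-letter genotypes and that similarity is the share of markers typed in both profiles whose genotypes match.

```python
def percent_similarity(profile_a, profile_b):
    """Percentage of shared markers at which two profiles carry the same genotype.

    Genotypes are treated as unphased, so "AG" and "GA" count as identical.
    Returns None when the profiles have no marker in common.
    """
    shared = profile_a.keys() & profile_b.keys()
    if not shared:
        return None
    matches = sum(
        1 for marker in shared
        if sorted(profile_a[marker]) == sorted(profile_b[marker])
    )
    return 100.0 * matches / len(shared)
```

Markers present in only one profile are ignored rather than counted as mismatches, which matters when the two profiles come from arrays with different marker content.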

In another embodiment, the report may display all phenotypes that have been correlated with the registered user's genomic profile. In other embodiments, the report may display only the phenotypes determined to be positively correlated with the individual's genomic profile. The individual may choose to display particular subgroups of phenotypes, such as only medical phenotypes or only actionable medical phenotypes. For example, actionable phenotypes and their associated genotypes may include Crohn's disease (associated with IL23R and CARD15), type 1 diabetes (associated with HLA-DR/DQ), lupus (associated with HLA-DRB1), psoriasis (HLA-C), multiple sclerosis (HLA-DQA1), Graves' disease (HLA-DRB1), rheumatoid arthritis (HLA-DRB1), type 2 diabetes (TCF7L2), breast cancer (BRCA2), colon cancer (APC), episodic memory (KIBRA), and osteoporosis (COL1A1). The individual may also choose to display subcategories of phenotypes in the report, for example only inflammatory diseases among the medical conditions or only physical traits among the non-medical conditions. In some embodiments, the individual may choose to display all conditions for which an estimated risk was calculated for that individual, with those conditions highlighted (e.g., Figures 15A and 15D), only the conditions with an elevated risk (Figure 15B), or only the conditions with a reduced risk (Figure 15C).
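The display options described above amount to filtering the entries of a report. The following sketch assumes a hypothetical `ReportEntry` record and treats "elevated" and "reduced" risk as a comparison of the individual's estimated risk against an average risk; neither the record layout nor that comparison is specified in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportEntry:
    """Hypothetical report row for one phenotype."""
    phenotype: str
    medical: bool
    actionable: bool
    estimated_risk: float  # individual's estimated lifetime risk, 0..1
    average_risk: float    # population average lifetime risk, 0..1


def select_entries(entries, medical_only=False, actionable_only=False, risk=None):
    """Filter report entries; risk may be None, "elevated", or "reduced"."""
    if risk not in (None, "elevated", "reduced"):
        raise ValueError("risk must be None, 'elevated' or 'reduced'")
    selected = []
    for entry in entries:
        if medical_only and not entry.medical:
            continue
        if actionable_only and not entry.actionable:
            continue
        if risk == "elevated" and not entry.estimated_risk > entry.average_risk:
            continue
        if risk == "reduced" and not entry.estimated_risk < entry.average_risk:
            continue
        selected.append(entry)
    return selected
```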

Information delivered and conveyed to the individual may be secure and confidential, and the individual's access to it may be controlled. Information derived from the complex genomic profile may be provided to the individual as regulatory-approved, understandable, medically relevant and/or high-impact data. Information may also be of general interest without being medically relevant. Information may be securely conveyed to the individual by several means, including but not limited to a portal interface and/or mailing. More preferably, the information is provided securely (if the individual so elects) through a portal interface to which the individual has secure and confidential access. Such an interface is preferably provided by online, Internet website access, or alternatively by telephone or other means that allow private, secure and easy-to-use access. The genomic profiles, phenotype profiles, and reports are provided to the individual or the individual's health care manager by transmission of the data over a network.

Accordingly, Figure 8 is a block diagram showing a representative example logic device through which the phenotype profiles and reports may be generated. Figure 8 shows a computer system (or digital device) 800 for receiving and storing genomic profiles, analyzing genotype correlations, generating rules based on the analysis of genotype correlations, applying the rules to the genomic profiles, and producing phenotype profiles and reports. The computer system 800 may be understood as a logical apparatus that can read instructions from media 811 and/or a network port 805, which can optionally be connected to a server 809 having fixed media 812. The system shown in Figure 8 includes a CPU 801, disk drives 803, optional input devices such as a keyboard 815 and/or mouse 816, and an optional monitor 807. Data communication with the server 809 at a local or remote location can be achieved through the indicated communication medium. The communication medium can include any means of transmitting and/or receiving data. For example, the communication medium can be a network connection, a wireless connection, or an Internet connection. Such a connection can provide for communication over the World Wide Web. It is envisioned that data relating to the present invention can be transmitted over such networks or connections for reception and/or review by a party 822. The receiving party 822 can be, but is not limited to, an individual, a registered user, a health care provider, or a health care manager.
In one embodiment, a computer-readable medium includes a medium suitable for transmission of a result of an analysis of a biological sample or of a genotype correlation. The medium can include a result regarding the phenotype profile of an individual subject, wherein the result is derived using the methods described herein.

A personal portal will preferably serve as the primary interface through which an individual receives and evaluates genomic data. The portal will enable individuals to track the progress of their sample from collection through testing and to track the results. Through portal access, individuals are introduced to their relative risks for common genetic disorders based on their genomic profile. The registered user may use the portal to choose which rules to apply to the genomic profile.

In one embodiment, one or more web pages will have a list of phenotypes with a box next to each phenotype, which the registered user may select to include that phenotype in the phenotype profile. The phenotypes may be linked to information about the phenotype, to help registered users make an informed choice about the phenotypes they wish to include in their phenotype profile. The web pages may also have phenotypes organized by disease group, for example as actionable or non-actionable diseases. For example, a registered user may select only actionable phenotypes, such as HLA-DQA1 and celiac disease. The registered user may also choose to display the pre-symptomatic or post-symptomatic treatments for a phenotype. For example, the individual may select actionable phenotypes that have pre-symptomatic treatments beyond further screening; for celiac disease, the pre-symptomatic treatment is a gluten-free diet. Another example is Alzheimer's disease, for which the pre-symptomatic treatments are statins, exercise, vitamins, and mental activity. Thrombosis is another example, for which the pre-symptomatic treatment is avoiding oral contraceptives and avoiding sitting still for long periods. An example of a phenotype with an approved post-symptomatic treatment is wet AMD, associated with CFH, for which the individual may undergo laser treatment.

The phenotypes may also be organized by type or class of disease or condition, for example neurological, cardiovascular, endocrine, immunological, and so forth. Phenotypes may also be grouped as medical and non-medical phenotypes. Other groupings of phenotypes on the web page may be by physical traits, physiological traits, mental traits, or emotional traits. The web page may further provide a section in which a group of phenotypes is chosen by selecting a single box, for example all phenotypes, only medically relevant phenotypes, only non-medically relevant phenotypes, only actionable phenotypes, only non-actionable phenotypes, different disease groups, or "fun" phenotypes. "Fun" phenotypes may include comparisons to celebrities or other famous individuals, or to other animals or even other organisms. A list of genomic profiles available for comparison may also be provided on the web page, from which the registered user selects those to compare with the registered user's own genomic profile.

The online portal may also provide a search engine to help the registered user navigate the portal, search for a specific phenotype, or search for specific terms or information revealed by the user's phenotype profile or report. The portal may also provide links to associated services and product offerings. Additional links to support groups, message boards, and chat rooms for individuals with a common or similar phenotype may also be provided. The online portal may also provide links to other sites with more information about the phenotypes in the registered user's phenotype profile. The online portal may also provide a service allowing registered users to share their phenotype profile and reports with friends, family, or health care managers. Registered users may choose which phenotypes in the phenotype profile they wish to show to their friends, family, or health care managers.

The phenotype profiles and reports provide a personalized genotype correlation to the individual. The genotype correlations provided to an individual can be used in making personal health care and lifestyle choices. If a strong correlation is found between a genetic variant and a disease for which treatment is available, detection of the genetic variant may assist in deciding to begin treatment of the disease and/or monitoring of the individual. Where a statistically significant correlation exists but is not regarded as strong, the individual can review the information with a personal physician and decide on an appropriate, beneficial course of action. Potential courses of action that could benefit an individual in view of a particular genotype correlation include administration of therapeutic treatment, monitoring for a potential need for treatment or for the effects of treatment, or making lifestyle changes in diet, exercise, and other personal habits and activities. For example, an actionable phenotype such as celiac disease may have a symptomatic treatment of a gluten-free diet. Likewise, through pharmacogenomics, genotype correlation information can be applied to predict the likely response of an individual who is to be treated with a particular drug or drug regimen, such as the likely efficacy or safety of that drug treatment.

Registered users may choose to provide their genomic and phenotype profiles to their health care managers, such as a physician or genetic counselor. The genomic and phenotype profiles may be accessed directly by the health care manager, printed by the registered user to give to the health care manager, or sent directly to the health care manager through the online portal, for example through a link on the online report.

Delivery of this pertinent information will enable patients to act in concert with their physicians. In particular, discussion between patients and their physicians can be made possible through the personal portal and links to medical information, and through the ability to tie the patient's genomic information into the patient's medical records. Medical information may include prevention and wellness information. The information provided to individual patients by the present invention will enable patients to make informed choices about their health care. In this manner, patients will be able to make choices that may help them avoid and/or delay diseases that their individual genomic profile (inherited DNA) makes more likely. In addition, patients will be able to adopt a treatment regimen suited to their own specific medical needs. Individuals will also be able to access their genotype data should they develop an illness and need this information to help their physician form a therapeutic strategy.

Genotype correlation information can also be used together with genetic counseling to advise couples considering reproduction and to address potential genetic concerns for the mother, father and/or child. Genetic counselors may provide information and support to registered users whose phenotype profiles show an increased risk for a specific condition or disease. They may interpret information about the disorder, analyze inheritance patterns and risks of recurrence, and review available options with the registered user. Genetic counselors may also provide supportive counseling and refer registered users to community or national support services. Genetic counseling may be included with specific registration plans. In some embodiments, genetic counseling may be scheduled within 24 hours of a request and may be available at times such as evenings, Saturdays, Sundays, and/or holidays.

The individual's portal will also facilitate delivery of additional information beyond the initial screening. Individuals will be informed of new scientific discoveries that relate to their personal genetic profile, such as information on new treatments or prevention strategies for their current or potential conditions. The new discoveries may also be delivered to their health care managers. In a preferred embodiment, registered users or their health care providers are notified by e-mail of new genotype correlations and new research about the phenotypes in the registered user's phenotype profile. In other embodiments, e-mails about "fun" phenotypes are sent to registered users; for example, an e-mail may inform them that their genomic profile is 77% identical to that of Abraham Lincoln and that further information is available through the online portal.

The present invention also provides a system of computer code for generating new rules, modifying rules, combining rules, periodically updating the rule set with new rules, securely maintaining a database of genomic profiles, applying the rules to the genomic profiles to determine phenotype profiles, and generating reports. The computer code notifies registered users of new or revised correlations and of new or revised reports, for example reports with new prevention and wellness information, information about new therapies in development, or new treatments that have become available.

Business Method

The present invention provides a business method of assessing an individual's genotype correlations based on comparison of the patient's genomic profile against a clinically derived database of established, medically relevant nucleotide variants. The present invention further provides a business method that uses the individual's stored genomic profile to assess new correlations that were not initially known, generating an updated phenotype profile for the individual without requiring the individual to submit another biological sample. Figure 9 is a flow chart illustrating the business method.

A revenue stream for the business method of the present invention is generated in part at step 101, when an individual initially requests and purchases a personalized genomic profile for genotype correlations with a multitude of common human diseases, conditions, and physical states. The request and purchase can be made through any number of sources, including but not limited to an online web portal, an online health service, and the individual's personal physician or a similar source of personal medical attention. In an alternative embodiment, the genomic profile may be provided free of charge and the revenue stream generated at a later step, such as step 103.

A registered user, or customer, makes a request to purchase a phenotype profile. In response to the request and purchase, the customer is provided with a collection kit for collecting the biological sample from which the genetic sample is isolated at step 103. When the request is made online, by telephone, or through another source where the collection kit cannot readily be obtained by the customer in person, the collection kit is provided by expedited delivery, such as a courier service offering same-day or overnight delivery. The collection kit includes a container for the sample and packaging materials for expedited delivery of the sample to the laboratory that generates the genomic profile. The kit may also include instructions for sending the sample to the sample processing facility or laboratory, and instructions for accessing the genomic profile and phenotype profile, which may be done through an online portal.

As detailed above, genomic DNA can be obtained from any of a number of types of biological samples. Preferably, genomic DNA is isolated from saliva using a commercially available collection kit, such as that available from DNA Genotek. Using saliva and such a kit allows non-invasive sample collection, since the customer can conveniently provide a saliva sample in a container from the collection kit and then seal the container. In addition, a saliva sample can be stored and shipped at room temperature.

After depositing the biological sample in the collection or specimen container, the customer delivers the sample to a laboratory for processing at step 105. Typically, the customer may use the packaging materials provided in the collection kit to deliver or send the sample to the laboratory by expedited delivery, such as a same-day or overnight courier service.

The laboratory that processes the sample and generates the genomic profile may adhere to appropriate government agency guidelines and requirements. For example, in the United States, a processing laboratory may be regulated by one or more federal agencies, such as the Food and Drug Administration (FDA) or the Centers for Medicare and Medicaid Services (CMS), and/or by one or more state agencies. In the United States, a clinical laboratory may be accredited or approved under the Clinical Laboratory Improvement Amendments of 1988 (CLIA).

In step 107, the laboratory processes the sample as previously described to isolate a genetic sample of DNA or RNA. Then, in step 109, the isolated genetic sample is analyzed and a genomic profile is generated. Preferably, a genome-wide SNP profile is generated. As described above, several methods may be used to generate a SNP profile. Preferably, a high-density array, such as the commercially available platforms from Affymetrix or Illumina, is used for SNP identification and profile generation. For example, a SNP profile may be generated using an Affymetrix GeneChip assay, as described in more detail above. As technology evolves, other vendors may be able to generate high-density SNP profiles. In another embodiment, the subscriber's genomic profile is the subscriber's genomic sequence.

After the individual's genomic profile is generated, the genotype data is preferably encrypted and imported in step 111, and deposited in step 113 into an encrypted database or vault, where the information is stored for future reference. The genomic profile and related information may be confidential, with access to this proprietary information and to the genomic profile restricted as directed by the individual and/or his or her personal physician. Others, such as the individual's family members and genetic counselor, may also be permitted access by the subscriber.

The database or vault may be located on-site at the processing laboratory. Alternatively, the database may be located at a separate site. In that case, the genomic profile data generated by the processing laboratory may be transferred in step 111 to a separate facility that houses the database.

After the individual's genomic profile is generated, the individual's genetic variations are compared in step 115 against a clinical database of established, medically relevant genetic variants. Alternatively, a genotype correlation need not be medically relevant to be included in the genotype correlation database; examples are physical traits such as eye color, or "fun" phenotypes such as similarity to a celebrity's genomic profile.

Medically relevant SNPs may be established from the scientific literature and related sources. Non-SNP genetic variants may also be established as correlated with phenotypes. Generally, the correlation of SNPs with a given disease is established by comparing the haplotype patterns of a group of people known to have the disease with those of a group of people without the disease. By analyzing many individuals, the frequencies of polymorphisms in a population can be determined, and these genotype frequencies can in turn be associated with a particular phenotype, such as a disease or condition. Alternatively, the phenotype may be a non-medical condition.
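The case/control comparison described above can be illustrated with a small calculation. The following is only a sketch under simplifying assumptions: it compares the frequency of a single risk allele between the two groups (rather than full haplotype patterns) and summarizes the association as an allelic odds ratio. The class name, function names, and counts are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Observed allele counts for one group (hypothetical structure)."""
    risk: int   # copies of the candidate risk allele
    other: int  # copies of any other allele at the same position


def allele_frequency(counts: AlleleCounts) -> float:
    """Fraction of observed alleles in the group that are the risk allele."""
    total = counts.risk + counts.other
    if total == 0:
        raise ValueError("group has no observed alleles")
    return counts.risk / total


def odds_ratio(cases: AlleleCounts, controls: AlleleCounts) -> float:
    """Allelic odds ratio: odds of the risk allele in cases vs. controls."""
    if cases.other == 0 or controls.risk == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (cases.risk * controls.other) / (cases.other * controls.risk)


if __name__ == "__main__":
    # Made-up counts: people with the disease vs. people without it.
    cases = AlleleCounts(risk=300, other=700)
    controls = AlleleCounts(risk=200, other=800)
    print(allele_frequency(cases), allele_frequency(controls))
    print(odds_ratio(cases, controls))
```

An odds ratio above 1 suggests the allele is more common among people with the condition. In practice such a figure would be accompanied by a significance test and replication before being treated as a validated correlation.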

Relevant SNP and non-SNP genetic variants may also be determined by analyzing the stored genomic profiles of individuals rather than from the available published literature. Individuals with stored genomic profiles may disclose phenotypes that have previously been determined. The genotypes and disclosed phenotypes of those individuals can be compared with those of individuals without the phenotype to determine correlations that can then be applied to other genomic profiles. Individuals who have their genomic profiles determined may fill out surveys about previously determined phenotypes. The surveys may contain questions about medical and non-medical conditions, such as previously diagnosed diseases, family history of medical conditions, lifestyle, physical traits, mental traits, age, social life, and environment.

In one embodiment, an individual may have his or her genomic profile determined free of charge if he or she fills out a survey. In some embodiments, individuals fill out surveys periodically in return for free access to their phenotype profiles and reports. In other embodiments, individuals who fill out surveys may be given a subscription upgrade, so that they have more access than at their previous subscription level, or they may purchase or renew a subscription at a reduced price.

To ensure scientific accuracy and significance, all information deposited in step 121 into the database of medically relevant genetic variants is first approved by a research/clinical advisory board and, if mandated in step 119, reviewed and overseen by an appropriate government agency. In the United States, for example, the FDA may provide oversight by approving the algorithms used to validate data on genetic variants (typically SNPs, transcript levels, or mutations). In step 123, the scientific literature and other relevant sources are monitored for additional correlations between genetic variants and diseases or conditions. After their accuracy and significance are validated, and after government agency review and approval, these additional genotype correlations are added to the master database in step 125.

The database of approved, validated, medically relevant genetic variants, combined with genome-wide individual profiles, advantageously allows genetic risk assessment for a large number of diseases or conditions. After an individual's genomic profile is compiled, the individual's genotype correlations can be determined by comparing the individual's nucleotide (genetic) variants or genetic markers with a database of human nucleotide variants that have been correlated with a particular phenotype, such as a disease, condition, or physical state. By comparing the individual's genomic profile with the master database of genotype correlations, the individual can be informed whether he or she is positive or negative for a genetic risk factor, and to what degree. The individual receives relative risk and/or predisposition data for a wide range of scientifically validated disease states (for example, Alzheimer's disease, cardiovascular disease, blood clotting). For example, the genotype correlations in Table 1 may be included. In addition, SNP-disease correlations in the database may include, but are not limited to, those shown in FIG. 4. Other correlations from FIGS. 5 and 6 may also be included. The business method of the invention thus provides risk analysis for a large number of diseases and conditions without requiring prior knowledge of what risk those diseases and conditions might pose.

In other embodiments, the genotype correlations combined with the genome-wide individual profile are for non-medically relevant phenotypes, such as "fun" phenotypes or physical traits such as hair color. In a preferred embodiment, a rule or rule set is applied to the individual's genomic profile or SNP profile, as described above. Applying the rules to the genomic profile generates a phenotype profile for the individual.
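As a rough illustration of how applying rules to a genomic profile could yield a phenotype profile, the sketch below treats a genomic profile as a mapping from marker identifier to genotype and a rule as a (marker, genotype, phenotype) association. All names, marker identifiers, and notes are hypothetical; the specification does not prescribe this data model, and real rules would typically also carry effect sizes and combine several markers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    """One genotype correlation (hypothetical structure)."""
    marker: str     # marker identifier, e.g. a SNP ID
    genotype: str   # genotype that triggers the rule, assumed normalized ("AG", not "GA")
    phenotype: str  # phenotype the genotype is correlated with
    note: str       # text to show in the report


def build_phenotype_profile(
    genomic_profile: Dict[str, str], rules: List[Rule]
) -> Dict[str, List[str]]:
    """Return phenotype -> report notes for every rule the profile matches."""
    profile: Dict[str, List[str]] = {}
    for rule in rules:
        # Markers missing from the genomic profile simply do not match.
        if genomic_profile.get(rule.marker) == rule.genotype:
            profile.setdefault(rule.phenotype, []).append(rule.note)
    return profile


if __name__ == "__main__":
    rules = [
        Rule("snp_0001", "AG", "eye color", "variant associated with eye color"),
        Rule("snp_0002", "TT", "hair color", "variant associated with hair color"),
    ]
    genomic_profile = {"snp_0001": "AG", "snp_0002": "CT"}
    print(build_phenotype_profile(genomic_profile, rules))
```

Keeping the rules separate from the stored genomic profile means the same profile can be re-evaluated whenever the rule set changes, without re-genotyping the individual.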

Accordingly, as new correlations are discovered and validated, the master database of human genotype correlations is expanded with additional genotype correlations. Updates can be made, when desired or appropriate, by accessing the relevant information in the individual's genomic profile stored in the database. For example, a newly identified genotype correlation may be based on a particular genetic variant. Whether the individual may be affected by the new genotype correlation can then be determined by retrieving and comparing only the portion of the individual's entire genomic profile that covers that gene.

The results of the genomic query are preferably analyzed and interpreted so that they can be presented to the individual in an understandable form. Then, in step 117, the results of the initial screening are provided to the patient in a secure, confidential manner, either by mail or through an online portal interface, as described in detail above.

The report may contain the phenotype profile as well as genomic information about the phenotypes in the phenotype profile, for example basic genetics of the genes involved or statistics on the genetic variants in different populations. Other information based on the phenotype profile that may be included in the report is preventive strategies, health information, therapies, symptom awareness, early detection schemes, intervention schemes, and further identification and sub-classification of the phenotypes. Following the initial screening of the individual's genomic profile, controlled, moderated updates are or can be made.

As new genotype correlations emerge and are validated and approved, updates to the individual's genomic profile are made, or become available, in conjunction with updates to the master database. New rules based on the new genotype correlations can be applied to the initial genomic profile to provide an updated phenotype profile. In step 127, an updated genotype correlation profile may be generated by comparing the relevant portion of the individual's genomic profile with the new genotype correlations. For example, if a new genotype correlation is found based on variation in a particular gene, the portion of the individual's genomic profile covering that gene can be analyzed for the new genotype correlation. In that case, only the new rule or rules need be applied to generate the updated phenotype profile, rather than reapplying the entire rule set, including rules that have already been applied. In step 129, the results of the individual's updated genotype correlations are provided in a secure manner.
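The incremental update described here can be sketched as follows. This is an illustrative outline only, assuming that a genomic profile is a mapping from marker to genotype and that each rule is a (marker, genotype, phenotype) tuple; the function names and marker identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# (marker id, triggering genotype, correlated phenotype); a hypothetical rule shape
Rule = Tuple[str, str, str]


def apply_rules(
    genomic_profile: Dict[str, str], rules: Iterable[Rule]
) -> Dict[str, List[str]]:
    """Return phenotype -> supporting markers for the rules that match."""
    matched: Dict[str, List[str]] = {}
    for marker, genotype, phenotype in rules:
        if genomic_profile.get(marker) == genotype:
            matched.setdefault(phenotype, []).append(marker)
    return matched


def update_phenotype_profile(
    genomic_profile: Dict[str, str],
    current_profile: Dict[str, List[str]],
    new_rules: Iterable[Rule],
) -> Dict[str, List[str]]:
    """Apply only the new rules and merge the result into a copy of the profile."""
    updated = {phenotype: list(markers) for phenotype, markers in current_profile.items()}
    for phenotype, markers in apply_rules(genomic_profile, new_rules).items():
        existing = updated.setdefault(phenotype, [])
        for marker in markers:
            if marker not in existing:  # avoid duplicates if a rule is re-sent
                existing.append(marker)
    return updated


if __name__ == "__main__":
    stored_profile = {"m1": "AA", "m2": "CT"}
    initial = apply_rules(stored_profile, [("m1", "AA", "trait A")])
    newly_approved = [("m2", "CT", "trait B"), ("m1", "GG", "trait C")]
    print(update_phenotype_profile(stored_profile, initial, newly_approved))
```

Only the markers named in the new rules are read from the stored genomic profile, which mirrors the idea of analyzing just the relevant gene portion instead of re-running the whole rule set.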

The initial and updated phenotype profiles may be a service provided to subscribers or customers. Varying levels of subscription to genomic profile analysis, and combinations thereof, may be offered. Likewise, subscription levels may vary so that individuals can choose the amount of service they wish to receive with their genotype correlations. The level of service provided therefore varies with the subscription level purchased by the individual.

An entry-level subscription may include a genomic profile and an initial phenotype profile. This may be the basic subscription level. Within the basic subscription level there may be varying levels of service. For example, a particular subscription level may provide referrals to genetic counseling, to physicians with particular expertise in treating or preventing a particular disease, and to other service options. Genetic counseling may be obtained online or by telephone. In another embodiment, the price of a subscription may depend on the number of phenotypes the individual chooses for his or her phenotype profile. Another option may be whether the subscriber chooses to have access to online genetic counseling.

In another scenario, a subscription may provide an initial genome-wide genotype correlation while the individual's genomic profile is maintained in a database; the database may be encrypted if the individual so chooses. Following this initial analysis, subsequent analyses and additional results may be provided upon the individual's request and additional payment. This may be a premium subscription.

In one embodiment of the business method of the invention, updates of the individual's risks are performed and the corresponding information is made available to the individual on a subscription basis. Updates may be available to subscribers who purchase a premium subscription. A subscription to genotype correlation analysis may provide updates for particular categories or subsets of new genotype correlations according to the individual's preferences. For example, an individual may wish to learn only of genotype correlations for which there is a known course of treatment or prevention. To help the individual decide whether to have an additional analysis performed, the individual may be provided with information about additional genotype correlations that have become available. This information may conveniently be mailed or emailed to the subscriber.

Within the premium subscription there may be further levels of service, such as those mentioned for the basic subscription. Other subscription models may be offered within the premium tier. For example, the highest level may provide the subscriber with unlimited updates and reports. The subscriber's profile may be updated as new correlations and rules are determined. At this level, the subscriber may also grant access to an unlimited number of individuals, such as family members and health care managers. The subscriber may also have unlimited access to online genetic counselors and physicians.

The next subscription level within the premium tier may be more restricted, for example by allowing a limited number of updates. The subscriber may be allowed a limited number of updates to his or her genomic profile within the subscription period, for example four times a year. At another subscription level, the subscriber may have his or her stored genomic profile updated once a week, once a month, or once a year. In another embodiment, the subscriber may have only a limited number of phenotypes for which he or she may choose to update the genomic profile.
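One possible way to model the registration (subscription) levels and their update limits is sketched below. The tier names, the quota field, and the per-period counter are assumptions made for illustration; the specification describes the tiers only in general terms and does not define such a data structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subscription:
    """A subscriber's service level (hypothetical model)."""
    tier: str
    updates_per_period: Optional[int]  # None means unlimited updates
    updates_used: int = 0

    def request_update(self) -> bool:
        """Record one profile update if the quota allows it; return whether it was granted."""
        if self.updates_per_period is not None and self.updates_used >= self.updates_per_period:
            return False
        self.updates_used += 1
        return True

    def start_new_period(self) -> None:
        """Reset the counter at the start of a new subscription period."""
        self.updates_used = 0


if __name__ == "__main__":
    limited = Subscription(tier="premium-limited", updates_per_period=4)
    unlimited = Subscription(tier="premium-unlimited", updates_per_period=None)
    print([limited.request_update() for _ in range(5)])
    print([unlimited.request_update() for _ in range(5)])
```

A limit on the number of phenotypes eligible for updates, as in the last embodiment mentioned above, could be handled the same way with a second quota field.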

The personal portal also conveniently allows the individual to maintain a subscription to risk or correlation updates and/or information updates, or to request updated risk assessments and information. As described above, varying subscription levels may be offered so that individuals can choose among various levels of genotype correlation results and updates, and the subscriber may select a different subscription level through his or her personal portal.

Any of these subscription options contributes to the revenue stream of the business method of the invention. The revenue stream also grows as new customers and subscribers are added, whose new genomic profiles are added to the database.

Table 1: Representative genes having genetic variants correlated with phenotypes.

Gene | Phenotype
A2M | Alzheimer's disease
ABCA1 | cholesterol, HDL
ABCB1 | HIV
ABCB1 | epilepsy
ABCB1 | kidney transplant complications
ABCB1 | digoxin, serum concentration
ABCB1 | Crohn's disease; ulcerative colitis
ABCB1 | Parkinson's disease
ABCC8 | type 2 diabetes
ABCC8 | diabetes, type 2
ABO | myocardial infarction
ACADM | medium-chain acyl-CoA dehydrogenase deficiency
ACDC | diabetes, type 2
ACE | type 2 diabetes
ACE | hypertension
ACE | Alzheimer's disease
ACE | myocardial infarction
ACE | cardiovascular
ACE | left ventricular hypertrophy

Gene | Phenotype
ACE | coronary artery disease
ACE | atherosclerosis, coronary
ACE | retinopathy, diabetic
ACE | systemic lupus erythematosus
ACE | blood pressure, arterial
ACE | erectile dysfunction
ACE | lupus
ACE | polycystic kidney disease
ACE | stroke
ACP1 | diabetes, type 1
ACSM1(LIP)c | cholesterol levels
ADAM33 | asthma
ADD1 | hypertension
ADD1 | blood pressure, arterial
ADH1B | alcohol abuse
ADH1C | alcohol abuse
ADIPOQ | diabetes, type 2
ADIPOQ | obesity
ADORA2A | panic disorder
ADRB1 | hypertension
ADRB1 | heart failure
ADRB2 | asthma
ADRB2 | hypertension
ADRB2 | obesity
ADRB2 | blood pressure, arterial
ADRB2 | type 2 diabetes
ADRB3 | obesity

Gene | Phenotype
ADRB3 | type 2 diabetes
ADRB3 | hypertension
AGT | hypertension
AGT | type 2 diabetes
AGT | essential hypertension
AGT | myocardial infarction
AGTR1 | hypertension
AGTR2 | hypertension
AHR | breast cancer
ALAD | lead toxicity
ALDH2 | alcoholism
ALDH2 | alcohol abuse
ALDH2 | colorectal cancer
ALDRL2 | type 2 diabetes
ALOX5 | asthma
ALOX5AP | asthma
APBB1 | Alzheimer's disease
APC | colorectal cancer
APEX1 | lung cancer
APOA1 | atherosclerosis, coronary
APOA1 | cholesterol, HDL
APOA1 | coronary artery disease
APOA1 | type 2 diabetes
APOA4 | type 2 diabetes
APOA5 | triglycerides
APOA5 | atherosclerosis, coronary
APOB | hypercholesterolemia

Gene | Phenotype
APOB | obesity
APOB | cardiovascular
APOB | coronary artery disease
APOB | coronary heart disease
APOB | type 2 diabetes
APOC1 | Alzheimer's disease
APOC3 | triglycerides
APOC3 | type 2 diabetes
APOE | Alzheimer's disease
APOE | type 2 diabetes
APOE | multiple sclerosis
APOE | atherosclerosis, coronary
APOE | Parkinson's disease
APOE | coronary heart disease
APOE | myocardial infarction
APOE | stroke
APOE | Alzheimer's disease
APOE | coronary artery disease
APP | Alzheimer's disease
AR | prostate cancer
AR | breast cancer
ATM | breast cancer
ATP7B | Wilson disease
ATXN8OS | spinocerebellar ataxia
BACE1 | Alzheimer's disease
BCHE | Alzheimer's disease
BDKRB2 | hypertension

Gene | Phenotype
BDNF | Alzheimer's disease
BDNF | bipolar disorder
BDNF | Parkinson's disease
BDNF | schizophrenia
BDNF | memory
BGLAP | bone density
BRAF | thyroid cancer
BRCA1 | breast cancer
BRCA1 | breast cancer; ovarian cancer
BRCA1 | ovarian cancer
BRCA2 | breast cancer
BRCA2 | breast cancer; ovarian cancer
BRCA2 | ovarian cancer
BRIP1 | breast cancer
C4A | systemic lupus erythematosus
CALCR | bone density
CAMTA1 | episodic memory
CAPN10 | diabetes, type 2
CAPN10 | type 2 diabetes
CAPN3 | muscular dystrophy
CARD15 | Crohn's disease
CARD15 | Crohn's disease; ulcerative colitis
CARD15 | inflammatory bowel disease
CART | obesity
CASR | bone density
CCKAR | schizophrenia
CCL2 | systemic lupus erythematosus

Gene | Phenotype
CCL5 | HIV
CCL5 | asthma
CCND1 | colorectal cancer
CCR2 | HIV
CCR2 | HIV infection
CCR2 | hepatitis C
CCR2 | myocardial infarction
CCR3 | asthma
CCR5 | HIV
CCR5 | HIV infection
CCR5 | hepatitis C
CCR5 | asthma
CCR5 | multiple sclerosis
CD14 | atopy
CD14 | asthma
CD14 | Crohn's disease
CD14 | Crohn's disease; ulcerative colitis
CD14 | periodontitis
CD14 | total IgE
CDH1 | prostate cancer
CDH1 | colorectal cancer
CDKN2A | melanoma
CDSN | psoriasis
CEBPA | leukemia, myeloid
CETP | atherosclerosis, coronary
CETP | coronary heart disease
CETP | hypercholesterolemia

Gene | Phenotype
CFH | macular degeneration
CFTR | cystic fibrosis
CFTR | pancreatitis
CFTR | cystic fibrosis
CHAT | Alzheimer's disease
CHEK2 | breast cancer
CHRNA7 | schizophrenia
CMA1 | atopic dermatitis
CNR1 | schizophrenia
COL1A1 | bone density
COL1A1 | osteoporosis
COL1A2 | bone density
COL2A1 | osteoarthritis
COMT | schizophrenia
COMT | breast cancer
COMT | Parkinson's disease
COMT | bipolar disorder
COMT | obsessive-compulsive disorder
COMT | alcoholism
CR1 | systemic lupus erythematosus
CRP | C-reactive protein
CST3 | Alzheimer's disease
CTLA4 | type 1 diabetes
CTLA4 | Graves' disease
CTLA4 | multiple sclerosis
CTLA4 | rheumatoid arthritis
CTLA4 | systemic lupus erythematosus

Gene | Phenotype
CTLA4 | lupus erythematosus
CTLA4 | celiac disease
CTSD | Alzheimer's disease
CX3CR1 | HIV
CXCL12 | HIV
CXCL12 | HIV infection
CYBA | atherosclerosis, coronary
CYBA | hypertension
CYP11B2 | hypertension
CYP11B2 | left ventricular hypertrophy
CYP17A1 | breast cancer
CYP17A1 | prostate cancer
CYP17A1 | endometriosis
CYP17A1 | endometrial cancer
CYP19A1 | breast cancer
CYP19A1 | prostate cancer
CYP19A1 | endometriosis
CYP1A1 | lung cancer
CYP1A1 | breast cancer
CYP1A1 | colorectal cancer
CYP1A1 | prostate cancer
CYP1A1 | esophageal cancer
CYP1A1 | endometriosis
CYP1A1 | cytogenetic studies
CYP1A2 | schizophrenia
CYP1A2 | colorectal cancer
CYP1B1 | breast cancer

Gene | Phenotype
CYP1B1 | glaucoma
CYP1B1 | prostate cancer
CYP21A2 | 21-hydroxylase deficiency
CYP21A2 | congenital adrenal hyperplasia
CYP21A2 | adrenal hyperplasia, congenital
CYP2A6 | smoking behavior
CYP2A6 | nicotine
CYP2A6 | lung cancer
CYP2C19 | Helicobacter pylori infection
CYP2C19 | phenytoin
CYP2C19 | gastric disease
CYP2C8 | malaria, Plasmodium falciparum
CYP2C9 | anticoagulant complications
CYP2C9 | warfarin sensitivity
CYP2C9 | warfarin therapy, response to
CYP2C9 | colorectal cancer
CYP2C9 | phenytoin
CYP2C9 | acenocoumarol response
CYP2C9 | coagulation disorder
CYP2C9 | hypertension
CYP2D6 | colorectal cancer
CYP2D6 | Parkinson's disease
CYP2D6 | CYP2D6 poor metabolizer phenotype
CYP2E1 | lung cancer
CYP2E1 | colorectal cancer
CYP3A4 | prostate cancer
CYP3A5 | prostate cancer

Gene | Phenotype
CYP3A5 | esophageal cancer
CYP46A1 | Alzheimer's disease
DBH | schizophrenia
DHCR7 | Smith-Lemli-Opitz syndrome
DISC1 | schizophrenia
DLST | Alzheimer's disease
DMD | muscular dystrophy
DRD2 | alcoholism
DRD2 | schizophrenia
DRD2 | smoking behavior
DRD2 | Parkinson's disease
DRD2 | tardive dyskinesia
DRD3 | schizophrenia
DRD3 | tardive dyskinesia
DRD3 | bipolar disorder
DRD4 | attention deficit hyperactivity disorder
DRD4 | schizophrenia
DRD4 | novelty seeking
DRD4 | ADHD
DRD4 | personality traits
DRD4 | heroin abuse
DRD4 | alcohol abuse
DRD4 | alcoholism
DRD4 | personality disorders
DTNBP1 | schizophrenia
EDN1 | hypertension
EGFR | lung cancer

Gene | Phenotype
ELAC2 | prostate cancer
ENPP1 | type 2 diabetes
EPHB2 | prostate cancer
EPHX1 | lung cancer
EPHX1 | colorectal cancer
EPHX1 | cytogenetic studies
EPHX1 | chronic obstructive pulmonary disease/COPD
ERBB2 | breast cancer
ERCC1 | lung cancer
ERCC1 | colorectal cancer
ERCC2 | lung cancer
ERCC2 | cytogenetic studies
ERCC2 | bladder cancer
ERCC2 | colorectal cancer
ESR1 | bone density
ESR1 | bone mineral density
ESR1 | breast cancer
ESR1 | endometriosis
ESR1 | osteoporosis
ESR2 | bone density
ESR2 | breast cancer
estrogen receptor | bone mineral density
F2 | coronary heart disease
F2 | stroke
F2 | thromboembolism, venous
F2 | preeclampsia
F2 | thrombosis

Gene | Phenotype
F5 | thromboembolism, venous
F5 | preeclampsia
F5 | myocardial infarction
F5 | stroke
F5 | stroke, ischemic
F7 | atherosclerosis, coronary
F7 | myocardial infarction
F8 | hemophilia
F9 | hemophilia
FABP2 | type 2 diabetes
FAS | Alzheimer's disease
FASLG | multiple sclerosis
FCGR2A | systemic lupus erythematosus
FCGR2A | lupus erythematosus
FCGR2A | periodontitis
FCGR2A | rheumatoid arthritis
FCGR2B | lupus erythematosus
FCGR2B | systemic lupus erythematosus
FCGR3A | systemic lupus erythematosus
FCGR3A | lupus erythematosus
FCGR3A | periodontitis
FCGR3A | arthritis
FCGR3A | rheumatoid arthritis
FCGR3B | periodontitis
FCGR3B | periodontal disease
FCGR3B | lupus erythematosus
FGB | fibrinogen

Gene | Phenotype
FGB | myocardial infarction
FGB | coronary heart disease
FLT3 | leukemia, myeloid
FLT3 | leukemia
FMR1 | fragile X syndrome
FRAXA | fragile X syndrome
FUT2 | Helicobacter pylori infection
FVL | factor V Leiden
G6PD | G6PD deficiency
G6PD | hyperbilirubinemia
GABRA5 | bipolar disorder
GBA | Gaucher disease
GBA | Parkinson's disease
GCGR (FAAH, ML4R, UCP2) | weight/obesity
GCK | type 2 diabetes
GCLM (F12, TLR4) | atherosclerosis, myocardial infarction
GDNF | schizophrenia
GHRL | obesity
GJB1 | Charcot-Marie-Tooth disease
GJB2 | deafness
GJB2 | hearing loss, sensorineural nonsyndromic
GJB2 | hearing loss, sensorineural
GJB2 | hearing loss/deafness
GJB6 | hearing loss, sensorineural nonsyndromic
GJB6 | hearing loss/deafness
GNAS | hypertension
GNB3 | hypertension

Gene | Phenotype
GPX1 | lung cancer
GRIN1 | schizophrenia
GRIN2B | schizophrenia
GSK3B | bipolar disorder
GSTM1 | lung cancer
GSTM1 | colorectal cancer
GSTM1 | breast cancer
GSTM1 | prostate cancer
GSTM1 | cytogenetic studies
GSTM1 | bladder cancer
GSTM1 | esophageal cancer
GSTM1 | head and neck cancer
GSTM1 | leukemia
GSTM1 | Parkinson's disease
GSTM1 | stomach cancer
GSTP1 | lung cancer
GSTP1 | colorectal cancer
GSTP1 | breast cancer
GSTP1 | cytogenetic studies
GSTP1 | prostate cancer
GSTT1 | lung cancer
GSTT1 | colorectal cancer
GSTT1 | breast cancer
GSTT1 | prostate cancer
GSTT1 | bladder cancer
GSTT1 | cytogenetic studies
GSTT1 | asthma

Gene | Phenotype
GSTT1 | benzene toxicity
GSTT1 | esophageal cancer
GSTT1 | head and neck cancer
GYS1 | type 2 diabetes
HBB | thalassemia
HBB | thalassemia, beta-
HD | Huntington's disease
HFE | hemochromatosis
HFE | iron levels
HFE | colorectal cancer
HK2 | type 2 diabetes
HLA | rheumatoid arthritis
HLA | type 1 diabetes
HLA | Behcet's disease
HLA | celiac disease
HLA | psoriasis
HLA | Graves' disease
HLA | multiple sclerosis
HLA | schizophrenia
HLA | asthma
HLA | diabetes
HLA | lupus
HLA-A | leukemia
HLA-A | HIV
HLA-A | diabetes, type 1
HLA-A | graft-versus-host disease
HLA-A | multiple sclerosis

Gene | Phenotype
HLA-B | leukemia
HLA-B | Behcet's disease
HLA-B | celiac disease
HLA-B | diabetes, type 1
HLA-B | graft-versus-host disease
HLA-B | sarcoidosis
HLA-C | psoriasis
HLA-DPA1 | measles
HLA-DPB1 | diabetes, type 1
HLA-DPB1 | asthma
HLA-DQA1 | diabetes, type 1
HLA-DQA1 | celiac disease
HLA-DQA1 | cervical cancer
HLA-DQA1 | asthma
HLA-DQA1 | multiple sclerosis
HLA-DQA1 | diabetes, type 2; diabetes, type 1
HLA-DQA1 | lupus erythematosus
HLA-DQA1 | pregnancy loss, recurrent
HLA-DQA1 | psoriasis
HLA-DQB1 | diabetes, type 1
HLA-DQB1 | celiac disease
HLA-DQB1 | multiple sclerosis
HLA-DQB1 | cervical cancer
HLA-DQB1 | lupus erythematosus
HLA-DQB1 | pregnancy loss, recurrent
HLA-DQB1 | arthritis
HLA-DQB1 | asthma

Gene | Phenotype
HLA-DQB1 | HIV
HLA-DQB1 | lymphoma
HLA-DQB1 | tuberculosis
HLA-DQB1 | rheumatoid arthritis
HLA-DQB1 | diabetes, type 2
HLA-DQB1 | graft versus host disease
HLA-DQB1 | narcolepsy
HLA-DQB1 | arthritis, rheumatoid
HLA-DQB1 | cholangitis, sclerosing
HLA-DQB1 | diabetes, type 2; diabetes, type 1
HLA-DQB1 | Graves' disease
HLA-DQB1 | hepatitis C
HLA-DQB1 | hepatitis C, chronic
HLA-DQB1 | malaria
HLA-DQB1 | malaria, Plasmodium falciparum
HLA-DQB1 | melanoma
HLA-DQB1 | psoriasis
HLA-DQB1 | Sjögren syndrome
HLA-DQB1 | systemic lupus erythematosus
HLA-DRB1 | diabetes, type 1
HLA-DRB1 | multiple sclerosis
HLA-DRB1 | systemic lupus erythematosus
HLA-DRB1 | rheumatoid arthritis
HLA-DRB1 | cervical cancer
HLA-DRB1 | arthritis
HLA-DRB1 | celiac disease
HLA-DRB1 | lupus erythematosus

Gene | Phenotype
HLA-DRB1 | sarcoidosis
HLA-DRB1 | HIV
HLA-DRB1 | tuberculosis
HLA-DRB1 | Graves' disease
HLA-DRB1 | lymphoma
HLA-DRB1 | psoriasis
HLA-DRB1 | asthma
HLA-DRB1 | Crohn's disease
HLA-DRB1 | graft versus host disease
HLA-DRB1 | hepatitis C, chronic
HLA-DRB1 | narcolepsy
HLA-DRB1 | sclerosis, systemic
HLA-DRB1 | Sjögren syndrome
HLA-DRB1 | type 1 diabetes
HLA-DRB1 | arthritis, rheumatoid
HLA-DRB1 | cholangitis, sclerosing
HLA-DRB1 | diabetes, type 2; diabetes, type 1
HLA-DRB1 | Helicobacter pylori infection
HLA-DRB1 | hepatitis C
HLA-DRB1 | juvenile arthritis
HLA-DRB1 | leukemia
HLA-DRB1 | malaria
HLA-DRB1 | melanoma
HLA-DRB1 | pregnancy loss, recurrent
HLA-DRB3 | psoriasis
HLA-G | pregnancy loss, recurrent
HMOX1 | atherosclerosis, coronary

Gene | Phenotype
HNF4A | diabetes, type 2
HNF4A | type 2 diabetes
HSD11B2 | hypertension
HSD17B1 | breast cancer
HTR1A | depressive disorder, major
HTR1B | alcohol dependence
HTR1B | alcoholism
HTR2A | memory
HTR2A | schizophrenia
HTR2A | bipolar disorder
HTR2A | depression
HTR2A | depressive disorder, major
HTR2A | suicide
HTR2A | Alzheimer's disease
HTR2A | anorexia nervosa
HTR2A | hypertension
HTR2A | obsessive-compulsive disorder
HTR2C | schizophrenia
HTR6 | Alzheimer's disease
HTR6 | schizophrenia
HTRA1 | wet age-related macular degeneration
IAPP | type 2 diabetes
IDE | Alzheimer's disease
IFNG | tuberculosis
IFNG | type 1 diabetes
IFNG | graft versus host disease
IFNG | hepatitis B

Gene | Phenotype
IFNG | multiple sclerosis
IFNG | asthma
IFNG | breast cancer
IFNG | kidney transplant
IFNG | kidney transplant complications
IFNG | longevity
IFNG | pregnancy loss, recurrent
IGFBP3 | breast cancer
IGFBP3 | prostate cancer
IL10 | systemic lupus erythematosus
IL10 | asthma
IL10 | graft versus host disease
IL10 | HIV
IL10 | kidney transplant
IL10 | kidney transplant complications
IL10 | hepatitis B
IL10 | juvenile arthritis
IL10 | longevity
IL10 | multiple sclerosis
IL10 | pregnancy loss, recurrent
IL10 | rheumatoid arthritis
IL10 | tuberculosis
IL12B | type 1 diabetes
IL12B | asthma
IL13 | asthma
IL13 | atopy
IL13 | chronic obstructive pulmonary disease/COPD

Gene | Phenotype
IL13 | Graves' disease
IL1A | periodontitis
IL1A | Alzheimer's disease
IL1B | periodontitis
IL1B | Alzheimer's disease
IL1B | stomach cancer
IL1R1 | type 1 diabetes
IL1RN | stomach cancer
IL2 | asthma; eczema; allergic disease
IL4 | asthma
IL4 | atopy
IL4 | HIV
IL4R | asthma
IL4R | atopy
IL4R | total serum IgE
IL6 | bone mineralization
IL6 | kidney transplant
IL6 | kidney transplant complications
IL6 | longevity
IL6 | multiple sclerosis
IL6 | bone density
IL6 | bone mineral density
IL6 | colorectal cancer
IL6 | juvenile arthritis
IL6 | rheumatoid arthritis
IL9 | asthma
INHA | premature ovarian failure

Gene | Phenotype
INS | type 1 diabetes
INS | type 2 diabetes
INS | diabetes, type 1
INS | obesity
INS | prostate cancer
INSIG2 | obesity
INSR | type 2 diabetes
INSR | hypertension
INSR | polycystic ovary syndrome
IPF1 | diabetes, type 2
IRS1 | type 2 diabetes
IRS1 | diabetes, type 2
IRS2 | diabetes, type 2
ITGB3 | myocardial infarction
ITGB3 | atherosclerosis, coronary
ITGB3 | coronary heart disease
ITGB3 | myocardial infarction
KCNE1 | EKG, abnormal
KCNE2 | EKG, abnormal
KCNH2 | EKG, abnormal
KCNH2 | long QT syndrome
KCNJ11 | diabetes, type 2
KCNJ11 | type 2 diabetes
KCNN3 | schizophrenia
KCNQ1 | EKG, abnormal
KCNQ1 | long QT syndrome
KIBRA | episodic memory

Gene | Phenotype
KLK1 | hypertension
KLK3 | prostate cancer
KRAS | colorectal cancer
LDLR | hypercholesterolemia
LDLR | hypertension
LEP | obesity
LEPR | obesity
LIG4 | breast cancer
LIPC | atherosclerosis, coronary
LPL | coronary artery disease
LPL | hyperlipidemia
LPL | triglycerides
LRP1 | Alzheimer's disease
LRP5 | bone density
LRRK2 | Parkinson's disease
LRRK2 | Parkinson's disease
LTA | type 1 diabetes
LTA | asthma
LTA | systemic lupus erythematosus
LTA | septicemia
LTC4S | asthma
MAOA | alcoholism
MAOA | schizophrenia
MAOA | bipolar disorder
MAOA | smoking behavior
MAOA | personality disorder
MAOB | Parkinson's disease

Gene | Phenotype
MAOB | smoking behavior
MAPT | Parkinson's disease
MAPT | Alzheimer's disease
MAPT | dementia
MAPT | frontotemporal dementia
MAPT | progressive supranuclear palsy
MC1R | melanoma
MC3R | obesity
MC4R | obesity
MECP2 | Rett syndrome
MEFV | familial Mediterranean fever
MEFV | amyloidosis
MICA | type 1 diabetes
MICA | Behcet's disease
MICA | celiac disease
MICA | rheumatoid arthritis
MICA | systemic lupus erythematosus
MLH1 | colorectal cancer
MME | Alzheimer's disease
MMP1 | lung cancer
MMP1 | ovarian cancer
MMP1 | periodontitis
MMP3 | myocardial infarction
MMP3 | ovarian cancer
MMP3 | rheumatoid arthritis
MPO | lung cancer
MPO | Alzheimer's disease

Gene | Phenotype
MPO | breast cancer
MPZ | Charcot-Marie-Tooth disease
MS4A2 | asthma
MS4A2 | atopy
MSH2 | colorectal cancer
MSH6 | colorectal cancer
MSR1 | prostate cancer
MTHFR | colorectal cancer
MTHFR | type 2 diabetes
MTHFR | neural tube defect
MTHFR | homocysteine
MTHFR | thromboembolism, venous
MTHFR | atherosclerosis, coronary
MTHFR | Alzheimer's disease
MTHFR | esophageal cancer
MTHFR | preeclampsia
MTHFR | pregnancy loss, recurrent
MTHFR | stroke
MTHFR | thrombosis, deep vein
MT-ND1 | diabetes, type 2
MTR | colorectal cancer
MT-RNR1 | hearing loss, sensorineural nonsyndromic
MTRR | neural tube defect
MTRR | homocysteine
MT-TL1 | diabetes, type 2
MUTYH | colorectal cancer
MYBPC3 | cardiomyopathy

Gene | Phenotype
MYH7 | cardiomyopathy
MYOC | glaucoma, primary open-angle
MYOC | glaucoma
NAT1 | colorectal cancer
NAT1 | breast cancer
NAT1 | bladder cancer
NAT2 | colorectal cancer
NAT2 | bladder cancer
NAT2 | breast cancer
NAT2 | lung cancer
NBN | breast cancer
NCOA3 | breast cancer
NCSTN | Alzheimer's disease
NEUROD1 | type 1 diabetes
NF1 | neurofibromatosis 1
NOS1 | asthma
NOS2A | multiple sclerosis
NOS3 | hypertension
NOS3 | coronary heart disease
NOS3 | atherosclerosis, coronary
NOS3 | coronary artery disease
NOS3 | myocardial infarction
NOS3 | acute coronary syndrome
NOS3 | blood pressure, arterial
NOS3 | preeclampsia
NOS3 | nitric oxide
NOS3 | Alzheimer's disease

Gene | Phenotype
NOS3 | asthma
NOS3 | type 2 diabetes
NOS3 | cardiovascular disease
NOS3 | Behcet's disease
NOS3 | erectile dysfunction
NOS3 | kidney failure, chronic
NOS3 | lead toxicity
NOS3 | left ventricular hypertrophy
NOS3 | pregnancy loss, recurrent
NOS3 | retinopathy, diabetic
NOS3 | stroke
NOTCH4 | schizophrenia
NPY | alcohol abuse
NQO1 | lung cancer
NQO1 | colorectal cancer
NQO1 | benzene toxicity
NQO1 | bladder cancer
NQO1 | Parkinson's disease
NR3C2 | hypertension
NR4A2 | Parkinson's disease
NRG1 | schizophrenia
NTF3 | schizophrenia
OGG1 | lung cancer
OGG1 | colorectal cancer
OLR1 | Alzheimer's disease
OPA1 | glaucoma
OPRM1 | alcohol abuse

Gene | Phenotype
OPRM1 | drug dependence
OPTN | glaucoma, primary open-angle
P450 | drug metabolism
PADI4 | rheumatoid arthritis
PAH | phenylketonuria/PKU
PAI1 | coronary heart disease
PAI1 | asthma
PALB2 | breast cancer
PARK2 | Parkinson's disease
PARK7 | Parkinson's disease
PDCD1 | lupus erythematosus
PINK1 | Parkinson's disease
PKA | memory
PKC | memory
PLA2G4A | schizophrenia
PNOC | schizophrenia
POMC | obesity
PON1 | atherosclerosis, coronary
PON1 | Parkinson's disease
PON1 | type 2 diabetes
PON1 | atherosclerosis
PON1 | coronary artery disease
PON1 | coronary heart disease
PON1 | Alzheimer's disease
PON1 | longevity
PON2 | atherosclerosis, coronary
PON2 | premature birth

Gene | Phenotype
PPARG | type 2 diabetes
PPARG | obesity
PPARG | diabetes, type 2
PPARG | colorectal cancer
PPARG | hypertension
PPARGC1A | diabetes, type 2
PRKCZ | type 2 diabetes
PRL | systemic lupus erythematosus
PRNP | Alzheimer's disease
PRNP | Creutzfeldt-Jakob disease
PRNP | Jakob-Creutzfeldt disease
PRODH | schizophrenia
PRSS1 | pancreatitis
PSEN1 | Alzheimer's disease
PSEN2 | Alzheimer's disease
PSMB8 | type 1 diabetes
PSMB9 | type 1 diabetes
PTCH | skin cancer, non-melanoma
PTGIS | hypertension
PTGS2 | colorectal cancer
PTH | bone density
PTPN11 | Noonan syndrome
PTPN22 | rheumatoid arthritis
PTPRC | multiple sclerosis
PVT1 | end-stage renal disease
RAD51 | breast cancer
RAGE | retinopathy, diabetic

Gene | Phenotype
RB1 | retinoblastoma
RELN | schizophrenia
REN | hypertension
RET | thyroid cancer
RET | Hirschsprung's disease
RFC1 | neural tube defect
RGS4 | schizophrenia
RHO | retinitis pigmentosa
RNASEL | prostate cancer
RYR1 | malignant hyperthermia
SAA1 | amyloidosis
SCG2 | hypertension
SCG3 | obesity
SCGB1A1 | asthma
SCN5A | Brugada syndrome
SCN5A | EKG, abnormal
SCN5A | long QT syndrome
SCNN1B | hypertension
SCNN1G | hypertension
SERPINA1 | COPD
SERPINA3 | Alzheimer's disease
SERPINA3 | COPD
SERPINA3 | Parkinson's disease
SERPINE1 | myocardial infarction
SERPINE1 | type 2 diabetes
SERPINE1 | atherosclerosis, coronary
SERPINE1 | obesity

Gene | Phenotype
SERPINE1 | preeclampsia
SERPINE1 | stroke
SERPINE1 | hypertension
SERPINE1 | pregnancy loss, recurrent
SERPINE1 | thromboembolism, venous
SLC11A1 | tuberculosis
SLC22A4 | Crohn's disease; ulcerative colitis
SLC22A5 | Crohn's disease; ulcerative colitis
SLC2A1 | type 2 diabetes
SLC2A2 | type 2 diabetes
SLC2A4 | type 2 diabetes
SLC3A1 | cystinuria
SLC6A3 | attention deficit disorder [with hyperactivity]
SLC6A3 | Parkinson's disease
SLC6A3 | smoking behavior
SLC6A3 | alcoholism
SLC6A3 | schizophrenia
SLC6A4 | depression
SLC6A4 | depressive disorder, major
SLC6A4 | schizophrenia
SLC6A4 | suicide
SLC6A4 | alcoholism
SLC6A4 | bipolar disorder
SLC6A4 | personality traits
SLC6A4 | attention deficit disorder [with hyperactivity]
SLC6A4 | Alzheimer's disease
SLC6A4 | personality disorder

Gene | Phenotype
SLC6A4 | panic disorder
SLC6A4 | alcohol abuse
SLC6A4 | affective disorder
SLC6A4 | anxiety disorder
SLC6A4 | smoking behavior
SLC6A4 | depressive disorder, major; bipolar disorder
SLC6A4 | heroin abuse
SLC6A4 | irritable bowel syndrome
SLC6A4 | migraine
SLC6A4 | obsessive-compulsive disorder
SLC6A4 | suicidal behavior
SLC7A9 | cystinuria
SNAP25 | ADHD
SNCA | Parkinson's disease
SOD1 | ALS/amyotrophic lateral sclerosis
SOD2 | breast cancer
SOD2 | lung cancer
SOD2 | prostate cancer
SPINK1 | pancreatitis
SPP1 | multiple sclerosis
SRD5A2 | prostate cancer
STAT6 | asthma
STAT6 | total IgE
SULT1A1 | breast cancer
SULT1A1 | colorectal cancer
TAP1 | type 1 diabetes
TAP1 | lupus erythematosus

Gene | Phenotype
TAP2 | type 1 diabetes
TAP2 | diabetes, type 1
TBX21 | asthma
TBXA2R | asthma
TCF1 | diabetes, type 2
TCF1 | type 2 diabetes
TF | Alzheimer's disease
TGFB1 | breast cancer
TGFB1 | kidney transplant
TGFB1 | kidney transplant complications
TH | schizophrenia
THBD | myocardial infarction
TLR4 | asthma
TLR4 | Crohn's disease; ulcerative colitis
TLR4 | septicemia
TNF | asthma
TNFA | cerebrovascular disease
TNF | type 1 diabetes
TNF | rheumatoid arthritis
TNF | systemic lupus erythematosus
TNF | kidney transplant
TNF | psoriasis
TNF | septicemia
TNF | type 2 diabetes
TNF | Alzheimer's disease
TNF | Crohn's disease
TNF | diabetes, type 1

Gene | Phenotype
TNF | hepatitis B
TNF | kidney transplant complications
TNF | multiple sclerosis
TNF | schizophrenia
TNF | celiac disease
TNF | obesity
TNF | pregnancy loss, recurrent
TNFRSF11B | bone density
TNFRSF1A | rheumatoid arthritis
TNFRSF1B | rheumatoid arthritis
TNFRSF1B | systemic lupus erythematosus
TNFRSF1B | arthritis
TNNT2 | cardiomyopathy
TP53 | lung cancer
TP53 | breast cancer
TP53 | colorectal cancer
TP53 | prostate cancer
TP53 | cervical cancer
TP53 | ovarian cancer
TP53 | smoking
TP53 | esophageal cancer
TP73 | lung cancer
TPH1 | suicide
TPH1 | depressive disorder, major
TPH1 | suicidal behavior
TPH1 | schizophrenia
TPMT | thiopurine methyltransferase activity

Gene | Phenotype
TPMT | leukemia
TPMT | inflammatory bowel disease
TPMT | thiopurine S-methyltransferase phenotype
TSC1 | tuberous sclerosis
TSC2 | tuberous sclerosis
TSHR | Graves' disease
TYMS | colorectal cancer
TYMS | stomach cancer
TYMS | esophageal cancer
UCHL1 | Parkinson's disease
UCP1 | obesity
UCP2 | obesity
UCP3 | obesity
UGT1A1 | hyperbilirubinemia
UGT1A1 | Gilbert syndrome
UGT1A6 | colorectal cancer
UGT1A7 | colorectal cancer
UTS2 | diabetes, type 2
VDR | bone density
VDR | prostate cancer
VDR | bone mineral density
VDR | type 1 diabetes
VDR | osteoporosis
VDR | bone mass
VDR | breast cancer
VDR | lead toxicity
VDR | tuberculosis

Gene | Phenotype
VDR | type 2 diabetes
VEGF | breast cancer
Vit D rec | idiopathic short stature
VKORC1 | warfarin therapy, response to
WNK4 | hypertension
XPA | lung cancer
XPC | lung cancer
XPC | cytogenetic studies
XRCC1 | lung cancer
XRCC1 | cytogenetic studies
XRCC1 | breast cancer
XRCC1 | bladder cancer
XRCC2 | breast cancer
XRCC3 | breast cancer
XRCC3 | cytogenetic studies
XRCC3 | lung cancer
XRCC3 | bladder cancer
ZDHHC8 | schizophrenia

Genetic Composite Index (GCI)

The etiology of many conditions or diseases is attributed to both genetic and environmental factors. Recent advances in genotyping technology have made it possible to identify new associations between diseases and genetic markers across the whole genome. Indeed, many recent studies have found such associations, in which a specific allele or genotype is correlated with an increased risk of disease. Some of these studies collect a set of cases and a set of controls and compare the allele distributions of genetic markers between the two groups. In some of these studies, the association between a specific genetic marker and a disease is measured in isolation from other genetic markers, which are treated as background and play no role in the statistical analysis.

Genetic markers and variants may include SNPs, nucleotide repeats, nucleotide insertions, nucleotide deletions, chromosomal translocations, chromosomal duplications, or copy number variations. Copy number variations may include microsatellite repeats, nucleotide repeats, centromeric repeats, or telomeric repeats.

In one aspect of the invention, information about the association of multiple genetic markers with one or more diseases or conditions is combined and analyzed to produce a GCI score. The GCI score can be used to give people who are not trained in genetics a reliable (i.e., robust), understandable, and/or intuitive sense of their individual risk of disease compared with a relevant population, based on current scientific research. In one embodiment, a method for generating a robust GCI score for the combined effect of different loci is based on the reported individual risk for each locus studied. For example, a disease or condition of interest is identified, and information sources (including, but not limited to, databases, patent publications, and scientific literature) are then queried for information on the association of the disease or condition with one or more genetic loci. These information sources are curated and assessed using quality criteria. In some embodiments, the assessment process involves multiple steps. In other embodiments, the information sources are assessed against multiple quality criteria. The information derived from the information sources is used to identify the odds ratio or relative risk for one or more genetic loci for each disease or condition of interest.

In an alternative embodiment, the odds ratio (OR) or relative risk (RR) for at least one genetic locus is not available from the available information sources. The RR is then calculated using (1) the reported ORs of multiple alleles of the same locus, (2) allele frequencies from a data set such as the HapMap data set, and/or (3) the disease/condition prevalence from available sources (e.g., CDC, National Center for Health Statistics, etc.), to derive the RRs for all alleles of interest. In one embodiment, the ORs of multiple alleles of the same locus are estimated separately or independently. In a preferred embodiment, the ORs of multiple alleles of the same locus are combined to account for dependencies between the ORs of the different alleles. In some embodiments, established disease models (including, but not limited to, models such as the multiplicative, additive, Harvard-modified, and dominant-effect models) are used to generate an intermediate score that represents the risk of an individual according to the model chosen.
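One way to picture the RR calculation described above is the following minimal sketch. It is an illustrative assumption, not the procedure the patent specifies: it takes per-genotype ORs relative to a reference genotype, genotype frequencies (which could be derived from HapMap allele frequencies), and a prevalence estimate, then searches for the reference-genotype risk that makes the frequency-weighted risks add up to the prevalence. The function name and the example numbers are hypothetical.

```python
def relative_risks_from_odds_ratios(odds_ratios, genotype_freqs, prevalence, tol=1e-12):
    """Return (baseline_risk, relative_risks) for the genotypes at one locus.

    odds_ratios[i]    OR of genotype i versus the reference genotype (1.0 for the reference)
    genotype_freqs[i] population frequency of genotype i (the frequencies sum to 1)
    prevalence        overall disease prevalence in the population, strictly between 0 and 1
    """
    def risk(baseline, odds_ratio):
        # Scale the baseline odds by the OR, then convert back to a probability.
        odds = odds_ratio * baseline / (1.0 - baseline)
        return odds / (1.0 + odds)

    def implied_prevalence(baseline):
        # Frequency-weighted average of the genotype-specific risks.
        return sum(f * risk(baseline, o) for f, o in zip(genotype_freqs, odds_ratios))

    # implied_prevalence increases with the baseline risk, so bisection is enough.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if implied_prevalence(mid) < prevalence:
            lo = mid
        else:
            hi = mid
    baseline = (lo + hi) / 2.0
    return baseline, [risk(baseline, o) / baseline for o in odds_ratios]


# Hypothetical inputs: three genotypes at one locus, 5% prevalence.
baseline, rrs = relative_risks_from_odds_ratios(
    odds_ratios=[1.0, 1.5, 2.25],
    genotype_freqs=[0.49, 0.42, 0.09],
    prevalence=0.05,
)
print(baseline, rrs)
```

For a risk-increasing genotype the resulting RR is always somewhat smaller than its OR, and the gap grows with prevalence, which is why the prevalence estimate enters the calculation at all.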

In another embodiment, a method is used that analyzes multiple models for a disease or condition of interest and correlates the results obtained from these different models; this minimizes the possible errors that may be introduced by the choice of a particular disease model. This approach minimizes the influence of reasonable errors in the estimates of prevalence, allele frequencies, and ORs obtained from information sources on the calculation of the relative risk. Because of the "linear" or monotonic nature of the effect of the prevalence estimate on the RR, an incorrect prevalence estimate has little or no effect on the final score, provided that the same model is applied consistently to all individuals for whom a report is generated.

In another embodiment, a method is used that treats environmental/behavioral/demographic data as additional "loci". In a related embodiment, such data may be obtained from information sources such as medical or scientific literature or databases (e.g., the association of smoking with lung cancer, or insurance industry health risk assessments). In one embodiment, GCI scores are produced for one or more complex diseases. Complex diseases may be influenced by multiple genes, environmental factors, and their interactions. A large number of possible interactions needs to be analyzed when studying complex diseases. In one embodiment, a procedure such as the Bonferroni correction is used to correct for multiple comparisons. In an alternative embodiment, when the tests are independent or exhibit a special type of dependency, the Simes test is used to control the overall significance level (also known as the "family-wise error rate") (Sarkar S. (1998) Some probability inequalities for ordered MTP2 random variables: a proof of the Simes conjecture. Ann Stat 26:494-504). The Simes test rejects the global null hypothesis that all K test-specific null hypotheses are true if p(k) ≤ αk/K for any k in 1, ..., K, where p(1) ≤ ... ≤ p(K) are the ordered p-values (Simes RJ (1986) An improved Bonferroni procedure for multiple tests of significance. Biometrika 73:751-754).
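The two corrections mentioned above can be sketched in a few lines. This is only an illustration of the standard Bonferroni and Simes rules; the function names and the example p-values are assumptions made here, not taken from the source.

```python
def bonferroni_rejects(p_values, alpha=0.05):
    """Per-test decisions: reject test i when p_i <= alpha / K."""
    k_total = len(p_values)
    return [p <= alpha / k_total for p in p_values]


def simes_rejects_global_null(p_values, alpha=0.05):
    """Reject the global null if any ordered p-value p_(k) <= alpha * k / K."""
    k_total = len(p_values)
    ordered = sorted(p_values)
    return any(p <= alpha * k / k_total for k, p in enumerate(ordered, start=1))


# Hypothetical p-values from four interaction tests.
p_values = [0.02, 0.03, 0.04, 0.045]
print(bonferroni_rejects(p_values))         # no single test passes alpha / 4
print(simes_rejects_global_null(p_values))  # the largest p-value passes alpha * 4 / 4
```

The example shows why the Simes test is less conservative: none of the p-values clears the Bonferroni threshold, yet the set as a whole is enough to reject the global null.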

Other embodiments that can be used in the context of multi-gene and multi-environmental-factor analysis control the false discovery rate, that is, the expected proportion of rejected null hypotheses that are falsely rejected. This approach is particularly useful when a portion of the null hypotheses can be assumed to be false, as in microarray studies. Devlin et al. (2003, Analysis of multilocus models of association. Genet Epidemiol 25:36-47) proposed a variant of the Benjamini and Hochberg (1995, Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 57:289-300) step-up procedure that controls the false discovery rate when testing a large number of possible gene × gene interactions in multilocus association studies. The Benjamini and Hochberg procedure is related to the Simes test: setting k* = max k such that p(k) ≤ αk/K, it rejects all k* null hypotheses corresponding to p(1), ..., p(k*). In fact, the Benjamini and Hochberg procedure reduces to the Simes test when all null hypotheses are true (Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Ann Stat 29:1165-1188).
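As a rough illustration of the step-up rule just described, the sketch below implements the plain Benjamini and Hochberg procedure (not the Devlin et al. variant, whose details are not given here). The function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per test, True where the null hypothesis is rejected."""
    k_total = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(k_total), key=lambda i: p_values[i])
    # k* is the largest rank k with p_(k) <= alpha * k / K (0 if there is none).
    k_star = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / k_total:
            k_star = rank
    # Reject every hypothesis ranked at or below k*, even if it missed its own threshold.
    rejected = [False] * k_total
    for idx in order[:k_star]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values; the second misses its own threshold but is still rejected.
print(benjamini_hochberg([0.005, 0.025, 0.029, 0.039, 0.5]))
```

At least one hypothesis is rejected exactly when some ordered p-value meets its threshold, which is the Simes condition; that is the sense in which the two procedures coincide as a test of the global null.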

In some embodiments, an individual is ranked by comparing the individual's intermediate score with those of a population of individuals to produce a final score, which can be expressed as a rank within the population, such as the 99th percentile, or the 99th, 98th, 97th, 96th, 95th, 94th, 93rd, 92nd, 91st, 90th, 89th, 88th, 87th, 86th, 85th, 84th, 83rd, 82nd, 81st, 80th, 79th, 78th, 77th, 76th, 75th, 74th, 73rd, 72nd, 71st, 70th, 69th, 65th, 60th, 55th, 50th, 45th, 40th, 35th, 30th, 25th, 20th, 15th, 10th, 5th, or 0th percentile. In another embodiment, the score may be displayed as a range, such as the 100th to 95th percentile, the 95th to 85th percentile, the 85th to 60th percentile, or any sub-range between the 100th and 0th percentile. In yet another embodiment, the individual is ranked by quartile, such as the highest 75th quartile or the lowest 25th quartile. In a further embodiment, the individual is ranked in comparison with the mean or median score of the population.
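The ranking step can be illustrated with a small sketch. The midpoint handling of ties and the quartile numbering (1 = lowest, 4 = highest) are conventions assumed here for illustration; the source does not fix them.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, population_scores):
    """Percentile of `score` within the population, counting tied scores as half below."""
    ordered = sorted(population_scores)
    below = bisect_left(ordered, score)         # scores strictly below
    at_or_below = bisect_right(ordered, score)  # scores below or equal
    return 100.0 * (below + at_or_below) / (2 * len(ordered))


def quartile(percentile):
    """Quartile index from 1 (lowest) to 4 (highest) for a percentile in 0..100."""
    return min(int(percentile // 25) + 1, 4)


# Hypothetical reference population with intermediate scores 1, 2, ..., 100.
population = list(range(1, 101))
pct = percentile_rank(99.5, population)
print(pct, quartile(pct))
```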

In one embodiment, the population to which the individual is compared includes a large number of people from different geographic and ethnic backgrounds, such as a global population. In other embodiments, the population to which the individual is compared is limited to a particular geography, ancestry, ethnicity, sex, age (fetus, newborn, child, adolescent, young adult, adult, or elderly individual), or disease state (e.g., symptomatic, asymptomatic, carrier, early-onset, late-onset). In some embodiments, the population to which the individual is compared is derived from information reported by public and/or private information sources.

In one embodiment, an individual's GCI score or GCI Plus score is visualized using a display. In some embodiments, a screen (e.g., a computer monitor or television screen) is used for the visual display, for example a personal portal with relevant information. In another embodiment, the display is a static display, such as a printed page. In one embodiment, the display may include, but is not limited to, one or more of the following: bins (e.g., 1-5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35, 36-40, 41-45, 46-50, 51-55, 56-60, 61-65, 66-70, 71-75, 76-80, 81-85, 86-90, 91-95, 96-100), a color or grayscale gradient, a thermometer, a gauge, a pie chart, a histogram, or a bar graph. For example, Figures 18 and 19 are different displays for MS, and Figure 20 is a display for Crohn's disease. In another embodiment, a thermometer is used to display the GCI score and the disease/condition prevalence. In another embodiment, a thermometer displays a level that changes with the reported GCI score, for example as in Figures 15 to 17, where the color corresponds to the risk. The thermometer may display a color change as the GCI score increases (e.g., changing gradually from blue for lower GCI scores to red for higher GCI scores). In a related embodiment, a thermometer displays both a level that changes with the reported GCI score and a color change as the risk level increases.
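A display layer of this kind needs two small mappings: from a percentile to one of the five-point bins listed above, and from a percentile to a color on a blue-to-red gradient. The sketch below shows one possible way to do both; the linear RGB blend and the function names are assumptions for illustration and are not taken from the figures.

```python
import math


def five_point_bin(percentile):
    """Label of the five-point bin ('1-5', '6-10', ..., '96-100') containing a percentile."""
    p = min(100, max(1, math.ceil(percentile)))
    upper = ((p - 1) // 5 + 1) * 5
    return f"{upper - 4}-{upper}"


def gradient_color(percentile):
    """Hex color blending linearly from blue (low score) to red (high score)."""
    t = min(1.0, max(0.0, percentile / 100.0))
    red = round(255 * t)
    blue = round(255 * (1.0 - t))
    return f"#{red:02x}00{blue:02x}"


# Hypothetical score at the 97th percentile.
print(five_point_bin(97), gradient_color(97))
```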

In an alternative embodiment, an individual's GCI score is delivered to the individual using auditory feedback. In one embodiment, the auditory feedback is a verbal statement that the risk level is high or low. In another embodiment, the auditory feedback is a recitation of the specific GCI score, such as a number, a percentile, a range, a quartile, or a comparison with the mean or median GCI score of the population. In one embodiment, a live person delivers the auditory feedback in person or over a telecommunications device, such as a telephone (landline, cellular phone, or satellite phone), or via a personal portal. In another embodiment, the auditory feedback is delivered by an automated system, such as a computer. In one embodiment, the auditory feedback is delivered as part of an interactive voice response (IVR) system, a technology that allows a computer to detect voice and touch tones during a normal phone call. In another embodiment, the individual may interact with a central server via an IVR system. The IVR system may respond with pre-recorded or dynamically generated audio to interact with individuals and provide them with auditory feedback on their risk level. In one example, an individual may call a number that is answered by an IVR system. After optionally entering an identification code or security code, or undergoing voice recognition, the individual is prompted by the IVR system to select options from a menu, such as a touch-tone or voice menu. One of these options may provide the individual with his or her risk level.

In another embodiment, an individual's GCI score is both visualized on a display and communicated by auditory feedback, for example through a personal portal. This combination may include a visual display of the GCI score together with auditory feedback that discusses the relevance of the GCI score to the individual's overall health and the preventive measures that may be suggested.

In one example, the GCI score is generated by a multi-step process. First, for each condition under study, the relative risks are computed from the odds ratios of each genetic marker. For each prevalence value p = 0.01, 0.02, ..., 0.5, the GCI scores of the HapMap CEU population are computed from the prevalence and the HapMap allele frequencies. If the GCI scores are invariant under the varying prevalence, the only assumption taken into account is that the model is multiplicative; otherwise, the model is determined to be sensitive to the prevalence. For any combination of no-call values, the distribution of relative risks and scores in the HapMap population is obtained. For each new individual, the individual's score is compared with the HapMap distribution, and the resulting score is the individual's rank in that population. Because of the assumptions made along the way, the resolution of the reported score may be low. The population is therefore divided into quantiles (3-6 bins), and the reported bin is the one into which the individual's rank falls. The number of bins may differ between diseases, based on considerations such as the resolution of the score for each disease. When the scores of different HapMap individuals are tied, the average rank is used.
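The ranking and binning steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the method: the function names `average_ranks` and `quantile_bin` are hypothetical, and splitting the ranks into bins of nearly equal size is one possible reading of "quantiles".

```python
def average_ranks(scores):
    """Return 1-based ranks; tied scores share the mean of the ranks they span."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        # Extend `end` over the run of scores equal to the one at `start`.
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def quantile_bin(rank, population_size, n_bins):
    """Map a 1-based rank to a bin number in 1..n_bins (assumed equal-size bins)."""
    return min(n_bins, int((rank - 1) * n_bins / population_size) + 1)


if __name__ == "__main__":
    reference = [1.0, 2.0, 2.0, 3.0]
    ranks = average_ranks(reference)
    print(ranks)
    print([quantile_bin(r, len(reference), 4) for r in ranks])
```

Reporting only the bin, rather than the exact rank, reflects the low resolution that the text attributes to the underlying assumptions.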

In one embodiment, a higher GCI score is interpreted as indicating an increased risk of acquiring, or being diagnosed with, a condition or disease. In another embodiment, a mathematical model is used to derive the GCI score. In some embodiments, the GCI score is based on a mathematical model that accounts for the incomplete nature of the underlying information about the population and/or the disease or condition. In some embodiments, the mathematical model includes at least one specific assumption as part of the basis for computing the GCI score, where the assumptions include, but are not limited to: that the odds-ratio values are given; that the prevalence of the condition is known; that the genotype frequencies in the population are known; that the customer comes from the same ancestry background as the population used in the studies and as the HapMap; and that the combined risk is the product of the risk factors of the individual genetic markers. In some embodiments, the GCI may also include the assumption that the multi-genotype frequency of a genotype is the product of the frequencies of the alleles of each SNP or individual genetic marker (i.e., the different SNPs or genetic markers are independent across the population).

Multiplicative model

In one embodiment, the GCI score is computed under the assumption that the risk attributed to a set of genetic markers is the product of the risks attributed to the individual markers. That is, each genetic marker contributes to the risk of the disease independently of the other markers. Formally, there are $k$ genetic markers with risk alleles $r_1, \ldots, r_k$ and non-risk alleles $n_1, \ldots, n_k$. At SNP $i$, the three possible genotype values are denoted $r_i r_i$, $n_i r_i$ and $n_i n_i$. The genotype information of an individual can be described by a vector $(g_1, \ldots, g_k)$, where $g_i$ is 0, 1 or 2 according to the number of risk alleles at position $i$. We denote by $\lambda_1^i$ the relative risk of the heterozygous genotype at position $i$ compared with the homozygous non-risk genotype at the same position. In other words, we define $\lambda_1^i = \frac{P(D | n_i r_i)}{P(D | n_i n_i)}$. Similarly, we denote the relative risk of the $r_i r_i$ genotype by $\lambda_2^i = \frac{P(D | r_i r_i)}{P(D | n_i n_i)}$. Under the multiplicative model, the risk of an individual with genotype $(g_1, \ldots, g_k)$ is assumed to be $GCI(g_1, \ldots, g_k) = \prod_{i=1}^{k} \lambda_{g_i}^i$, with $\lambda_0^i = 1$. The multiplicative model has been used previously in the literature to simulate case-control studies and for visualization purposes.
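As a small worked illustration of the multiplicative model, the sketch below multiplies the per-marker relative risks selected by an individual's genotype vector. The function name `multiplicative_score` and the input layout (one `(lambda_1, lambda_2)` pair per marker) are assumptions made for this example, and the relative risk values are made up.

```python
from math import prod


def multiplicative_score(genotype, relative_risks):
    """Product of per-marker relative risks under the multiplicative model.

    genotype       -- list of g_i in {0, 1, 2} (number of risk alleles at marker i)
    relative_risks -- list of (lambda_1, lambda_2) pairs, one per marker;
                      the homozygous non-risk genotype has relative risk 1
    """
    if len(genotype) != len(relative_risks):
        raise ValueError("one relative-risk pair is required per marker")
    factors = []
    for g, (lam1, lam2) in zip(genotype, relative_risks):
        if g not in (0, 1, 2):
            raise ValueError("genotype entries must be 0, 1 or 2")
        factors.append((1.0, lam1, lam2)[g])
    return prod(factors)


if __name__ == "__main__":
    # Three hypothetical markers; the individual carries 0, 1 and 2 risk alleles.
    print(multiplicative_score([0, 1, 2], [(1.2, 1.5), (1.5, 2.0), (1.1, 1.3)]))
```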

Estimating relative risk

In another embodiment, the relative risks of the different genetic markers are known, and the multiplicative model can be used directly for risk assessment. In some embodiments involving association studies, however, the study design prevents relative risks from being reported. In some case-control studies, the relative risk cannot be computed directly from the data without further assumptions. Instead of relative risks, it is customary to report the odds ratio (OR) of a genotype: the odds of having the disease given a risk genotype ($r_i r_i$ or $n_i r_i$) divided by the odds of having the disease given the non-risk genotype $n_i n_i$. Formally,

$OR_i^1 = \frac{P(D | n_i r_i)}{P(D | n_i n_i)} \cdot \frac{1 - P(D | n_i n_i)}{1 - P(D | n_i r_i)}$

$OR_i^2 = \frac{P(D | r_i r_i)}{P(D | n_i n_i)} \cdot \frac{1 - P(D | n_i n_i)}{1 - P(D | r_i r_i)}$

Finding the relative risks from the odds ratios may require additional assumptions. For example, assume that the frequencies $a = f_{n_i n_i}$, $b = f_{n_i r_i}$ and $c = f_{r_i r_i}$ of the three genotypes in the whole population are known or estimated (they can be estimated from existing data sets, such as the HapMap data set, which includes 120 chromosomes), and/or that the prevalence of the disease, $p = p(D)$, is known. The prevalence then satisfies

$p = a \cdot P(D | n_i n_i) + b \cdot P(D | n_i r_i) + c \cdot P(D | r_i r_i),$

which, together with the two odds-ratio definitions, gives three equations.

By the definition of relative risk, dividing the first of these equations by $p \cdot P(D | n_i n_i)$ gives:

$\frac{1}{P(D | n_i n_i)} = \frac{a + b \lambda_1^i + c \lambda_2^i}{p},$

and therefore the latter two equations can be rewritten as:

$OR_i^1 = \lambda_1^i \cdot \frac{(a - p) + b \lambda_1^i + c \lambda_2^i}{a + (b - p) \lambda_1^i + c \lambda_2^i}$

$OR_i^2 = \lambda_2^i \cdot \frac{(a - p) + b \lambda_1^i + c \lambda_2^i}{a + b \lambda_1^i + (c - p) \lambda_2^i}$

(1)

Note that when $a = 1$ (the non-risk allele frequency is 1), equation system 1 is equivalent to the Zhang and Yu formula (Zhang J and Yu K, What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes, JAMA, 280:1690-1, 1998, incorporated herein by reference in its entirety). In contrast to the Zhang and Yu formula, some embodiments of the invention take into account the allele frequencies in the population, which may affect the relative risk. Other embodiments also take into account the interdependence of the relative risks, as opposed to computing each relative risk independently.

Equation system 1 can be rewritten as two quadratic equations, which have at most four possible solutions. A gradient descent algorithm can be used to solve these equations, with the starting point set to the odds ratios, e.g., $\lambda_1^i = OR_i^1$ and $\lambda_2^i = OR_i^2$.

For example:

$f_1(\lambda_1, \lambda_2) = OR_i^1 (a + (b - p) \lambda_1^i + c \lambda_2^i) - \lambda_1^i \cdot ((a - p) + b \lambda_1^i + c \lambda_2^i)$

$f_2(\lambda_1, \lambda_2) = OR_i^2 (a + b \lambda_1^i + (c - p) \lambda_2^i) - \lambda_2^i \cdot ((a - p) + b \lambda_1^i + c \lambda_2^i)$

Finding a solution of these equations, that is, a point where $f_1 = f_2 = 0$, is equivalent to finding the minimum of the function $g(\lambda_1, \lambda_2) = f_1(\lambda_1, \lambda_2)^2 + f_2(\lambda_1, \lambda_2)^2$.

Therefore, differentiating $g$ with $f_1$ and $f_2$ as defined above,

$\frac{dg}{d\lambda_1} = -2 f_1(\lambda_1, \lambda_2) (2 b \lambda_1 + c \lambda_2 + a - OR_1 b - p + OR_1 p) - 2 f_2(\lambda_1, \lambda_2) \cdot b \cdot (\lambda_2 - OR_2)$

$\frac{dg}{d\lambda_2} = -2 f_2(\lambda_1, \lambda_2) (2 c \lambda_2 + b \lambda_1 + a - OR_2 c - p + OR_2 p) - 2 f_1(\lambda_1, \lambda_2) \cdot c \cdot (\lambda_1 - OR_1)$

In this example, we start by setting $x_0 = OR_1$, $y_0 = OR_2$. The value $\epsilon = 10^{-10}$ is used as a tolerance constant throughout the algorithm. In iteration $i$, we define

$\gamma = \min \{ 0.001, \frac{x_{i-1}}{\epsilon + 10 | \frac{dg}{d\lambda_1} (x_{i-1}, y_{i-1}) |}, \frac{y_{i-1}}{\epsilon + 10 | \frac{dg}{d\lambda_2} (x_{i-1}, y_{i-1}) |} \} .$

We then set

$x_i = x_{i-1} - \gamma \frac{dg}{d\lambda_1} (x_{i-1}, y_{i-1})$

$y_i = y_{i-1} - \gamma \frac{dg}{d\lambda_2} (x_{i-1}, y_{i-1})$

These iterations are repeated until $g(x_i, y_i) <$ tolerance, where the tolerance is set to $10^{-7}$ in the code provided.
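The iteration just described can be sketched in a few lines. This is an illustrative sketch, not the code referred to in the text: the function name `solve_relative_risks`, the `max_iter` guard and the test values are assumptions, and the gradient is obtained by differentiating $f_1$ and $f_2$ directly rather than copied from a closed-form expression.

```python
def solve_relative_risks(or1, or2, a, b, c, p,
                         tol=1e-7, eps=1e-10, max_iter=1_000_000):
    """Gradient descent on g = f1^2 + f2^2 for equation system 1.

    or1, or2 -- odds ratios of the heterozygous and homozygous-risk genotypes
    a, b, c  -- population frequencies of the nn, nr and rr genotypes
    p        -- disease prevalence
    Returns (lambda_1, lambda_2).
    """
    def residuals(l1, l2):
        s = (a - p) + b * l1 + c * l2
        f1 = or1 * (a + (b - p) * l1 + c * l2) - l1 * s
        f2 = or2 * (a + b * l1 + (c - p) * l2) - l2 * s
        return f1, f2

    def gradient(l1, l2):
        f1, f2 = residuals(l1, l2)
        s = (a - p) + b * l1 + c * l2
        # Partial derivatives of f1 and f2 with respect to lambda_1, lambda_2.
        df1_d1 = or1 * (b - p) - s - b * l1
        df1_d2 = c * (or1 - l1)
        df2_d1 = b * (or2 - l2)
        df2_d2 = or2 * (c - p) - s - c * l2
        return (2 * f1 * df1_d1 + 2 * f2 * df2_d1,
                2 * f1 * df1_d2 + 2 * f2 * df2_d2)

    x, y = or1, or2  # start at the odds ratios
    for _ in range(max_iter):
        f1, f2 = residuals(x, y)
        if f1 * f1 + f2 * f2 < tol:
            return x, y
        gx, gy = gradient(x, y)
        # Step size capped so that neither coordinate can become negative.
        gamma = min(0.001, x / (eps + 10 * abs(gx)), y / (eps + 10 * abs(gy)))
        x, y = x - gamma * gx, y - gamma * gy
    raise RuntimeError("gradient descent did not converge")


if __name__ == "__main__":
    # Hypothetical inputs built from known relative risks 1.5 and 2.0.
    base, l1, l2 = 0.05, 1.5, 2.0
    a, b, c = 0.49, 0.42, 0.09
    p = base * (a + b * l1 + c * l2)
    or1 = l1 * (1 - base) / (1 - l1 * base)
    or2 = l2 * (1 - base) / (1 - l2 * base)
    print(solve_relative_risks(or1, or2, a, b, c, p))
```

Because the stopping rule bounds $g$ rather than the distance to the solution, the returned values are approximate; a tighter tolerance gives a closer answer at the cost of more iterations.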

In this example, these equations give the correct solutions for different values of $a$, $b$, $c$, $p$, $OR_1$ and $OR_2$ (Figure 10).

Robustness of relative risk estimation

In some embodiments, the effect of different parameters (prevalence, allele frequencies, and errors in the odds ratios) on the estimates of the relative risks is measured. To measure the effect of the allele-frequency and prevalence estimates on the relative risk values, the relative risks are computed for a set of different odds ratios and different allele frequencies (under HWE), and the results are plotted against prevalence values ranging from 0 to 1 (Figure 10). In addition, for fixed prevalence values, the resulting relative risks can be plotted as a function of the risk allele frequency (Figure 11). When $p = 0$, $\lambda_1 = OR_1$ and $\lambda_2 = OR_2$, and when $p = 1$, $\lambda_1 = \lambda_2 = 0$. This can be computed directly from the equations. Furthermore, in some embodiments, when the risk allele frequency is high, $\lambda_1$ is closer to a linear function and $\lambda_2$ is closer to a concave function with a bounded second derivative. In the limit, when $c = 1$, $\lambda_2 = OR_2 + p (1 - OR_2)$ and $\lambda_1 = \frac{OR_1 (OR_2 + p (1 - OR_2))}{OR_2 + p (OR_1 - OR_2)}$; if $OR_1 \approx OR_2$, the latter is also close to a linear function. When the risk allele frequency is low, $\lambda_1$ and $\lambda_2$ approach the behavior of the function $1/p$. In the limit, when $c = 0$, $\lambda_1 = \frac{OR_1}{1 - p + p \, OR_1}$ and $\lambda_2 = \frac{OR_2}{1 - p + p \, OR_2}$. This indicates that for high risk allele frequencies, an incorrect estimate of the prevalence will not significantly affect the resulting relative risks. Furthermore, for low risk allele frequencies, if a prevalence value $p' = \alpha p$ is used in place of the correct prevalence $p$, the resulting relative risks are off by a factor of at most $1/\alpha$. This is illustrated in panels (c) and (d) of Figure 11. Note that for high risk allele frequencies the two panels are quite similar, whereas for low allele frequencies there is a higher deviation in the relative risk values, which is less than a factor of 2.

Computing the GCI score

In one embodiment, the genetic composite index is computed using a reference set that represents the relevant population. This reference set may be one of the populations in the HapMap or another genotype data set.

In this embodiment, the GCI is computed as follows. For each of the $k$ risk loci, the relative risks are computed from the odds ratios using equation system 1. The multiplicative score is then computed for each individual in the reference set. The GCI of an individual with multiplicative score $s$ is the fraction of all individuals in the reference data set whose score $s'$ satisfies $s' \leq s$. For example, if 50% of the individuals in the reference set have a multiplicative score no greater than $s$, the final GCI score of the individual is 0.5.
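The final step, turning a multiplicative score into a GCI, amounts to an empirical cumulative fraction over the reference set. A minimal sketch follows; the function name `gci_score` and the reference values are hypothetical.

```python
def gci_score(individual_score, reference_scores):
    """Fraction of reference individuals whose score s' satisfies s' <= s."""
    if not reference_scores:
        raise ValueError("the reference set must not be empty")
    at_or_below = sum(1 for s in reference_scores if s <= individual_score)
    return at_or_below / len(reference_scores)


if __name__ == "__main__":
    reference = [1.0, 1.2, 1.5, 2.0]  # made-up multiplicative scores
    print(gci_score(1.5, reference))
```

Because only the ordering of scores matters, any monotone transformation of the multiplicative score leaves the GCI unchanged.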

Other models

In one embodiment, the multiplicative model is used. In alternative embodiments, other models may be used to determine the GCI score. Other suitable models include, but are not limited to:

Additive model. Under the additive model, the risk of an individual with genotype $(g_1, \ldots, g_k)$ is assumed to be $GCI(g_1, \ldots, g_k) = \sum_{i=1}^{k} \lambda_{g_i}^i .$

Generalized additive model. In the generalized additive model, it is assumed that there is a function $f$ such that the risk of an individual with genotype $(g_1, \ldots, g_k)$ is $GCI(g_1, \ldots, g_k) = \sum_{i=1}^{k} f(\lambda_{g_i}^i) .$

Harvard modified score (Het). This score is derived from G. A. Colditz et al., adapted so that it applies to genetic markers (Harvard report on cancer prevention volume 4: Harvard cancer risk index. Cancer Causes and Controls, 11:477-488, 2000, incorporated herein in its entirety). The Het score is essentially a generalized additive score, although the function $f$ operates on odds-ratio values rather than relative risks. This is useful when the relative risks are difficult to estimate. To define the function $f$, an intermediate function $g$ is defined as:

$g(x) = \begin{cases} 0 & 1 < x \leq 1.09 \\ 5 & 1.09 < x \leq 1.49 \\ 10 & 1.49 < x \leq 2.99 \\ 25 & 2.99 < x \leq 6.99 \\ 50 & 6.99 < x \end{cases}$

Next, the quantity $het$ is computed from the frequency of individuals heterozygous at each SNP $i$ across the reference population. The function $f$ is then defined as $f(x) = g(x) / het$, and the Harvard modified score (Het) is defined simply as $\sum_{i=1}^{k} f(OR_i^{g_i})$, where $OR_i^{g_i}$ is the odds ratio of the genotype carried at SNP $i$.

Harvard modified score (Hom). This score is similar to the Het score, except that the value $het$ is replaced by a corresponding value based on the frequency of individuals homozygous for the risk allele.

Maximum odds ratio. In this model, it is assumed that one of the genetic markers (the one with the largest odds ratio) gives a lower bound on the combined risk of the whole panel. Formally, the score of an individual with genotype $(g_1, \ldots, g_k)$ is $GCI(g_1, \ldots, g_k) = \max_{1 \leq i \leq k} OR_i^{g_i} .$

Comparison between scores

In one example, GCI scores were computed under several models over the HapMap CEU population for 10 SNPs associated with T2D. The SNPs are rs7754840, rs4506565, rs7756992, rs10811661, rs12804210, rs8050136, rs1111875, rs4402960, rs5215 and rs1801282. For each of these SNPs, the odds ratios of the three possible genotypes have been reported in the literature. The CEU population consists of thirty mother-father-child trios. To avoid dependencies, the sixty parents from this population were used. One individual with a no-call in one of the 10 SNPs was excluded, leaving a set of 59 individuals. The GCI rank of each individual was then computed using several different models.

For this data set, the different models were observed to produce highly correlated results (Figures 12 and 13). The Spearman correlation was computed between each pair of models (Table 2), showing that the multiplicative and additive models have a correlation coefficient of 0.97, so the GCI score is robust to the choice between the additive and multiplicative models. Similarly, the correlation between the Harvard modified score and the multiplicative model is 0.83, and the correlation coefficient between the Harvard score and the additive model is 0.7. Using the maximum odds ratio as the genetic score, however, yields a dichotomous score defined by a single SNP. Overall, these results indicate that ranking the scores provides a robust framework that minimizes model dependency.

Table 2: Spearman correlations of the score distributions on the CEU data between pairs of models.

The effect of variation in the prevalence of T2D on the resulting distribution was also measured. The prevalence values were varied between 0.001 and 0.512 (Figure 14). In the case of T2D, different prevalence values result in the same ordering of the individuals (Spearman correlation > 0.99), so an artificial fixed prevalence value of 0.001 can be assumed.

Extending the model to an arbitrary number of variants

In another embodiment, the model can be extended to cases in which an arbitrary number of possible variants occur. The previous considerations dealt with the case of three possible variants (nn, nr, rr). In general, when a multi-SNP association is known, any number of variants may be found in the population. For example, when an interaction between two genetic markers is associated with a condition, there are nine possible variants. This results in eight different odds-ratio values.

To generalize the original formulation, assume there are $k + 1$ possible variants $a_0, \ldots, a_k$, with frequencies $f_0, f_1, \ldots, f_k$, measured odds ratios $1, OR_1, \ldots, OR_k$, and unknown relative risk values $1, \lambda_1, \ldots, \lambda_k$. Assume further that all relative risks and odds ratios are measured relative to $a_0$, so that $\lambda_i = \frac{P(D | a_i)}{P(D | a_0)}$ and $OR_i = \frac{P(D | a_i)}{P(D | a_0)} \cdot \frac{1 - P(D | a_0)}{1 - P(D | a_i)} .$ Based on:

$p = \sum_{i=0}^{k} f_i P(D | a_i),$

it can be determined that

$OR_i = \lambda_i \frac{\sum_{j=0}^{k} f_j \lambda_j - p}{\sum_{j=0}^{k} f_j \lambda_j - \lambda_i p} .$

Moreover, setting $C = \sum_{j=0}^{k} f_j \lambda_j$ leads to the following equation:

$\lambda_i = \frac{C \cdot OR_i}{C - p + OR_i p},$

and therefore,

$C = \sum_{i=0}^{k} f_i \lambda_i = \sum_{i=0}^{k} \frac{C \cdot OR_i f_i}{C - p + OR_i p},$

or

$1 = \sum_{i=0}^{k} \frac{OR_i f_i}{C - p + OR_i p} .$

The latter is an equation in a single variable ($C$). It may have many different solutions (in principle, up to $k + 1$). Standard optimization tools, such as gradient descent, can be used to find the solution closest to $C_0 = \sum_i f_i t_i$.

The present invention uses a robust scoring framework for the quantification of risk factors. Although different genetic models may yield different scores, the results are generally correlated. The quantification of risk factors is therefore generally independent of the model used.

Estimating relative risk in case-control studies

The present invention also provides a method for estimating relative risks from the odds ratios of multiple alleles in a case-control study. In contrast to previous approaches, the method takes into account the allele frequencies, the prevalence of the disease, and the dependencies between the relative risks of the different alleles. The performance of the method was measured on simulated case-control studies and found to be extremely accurate.

Methods

When testing a particular SNP for association with a disease $D$, let $R$ and $N$ denote the risk and non-risk alleles of that SNP. $P(D | RR)$, $P(D | RN)$ and $P(D | NN)$ denote the probability of being affected by the disease given that a person is homozygous for the risk allele, heterozygous, or homozygous for the non-risk allele, respectively. $f_{RR}$, $f_{RN}$ and $f_{NN}$ denote the frequencies of the three genotypes in the population. With these definitions, the relative risks are defined as

$\lambda_{RR} = \frac{P(D | RR)}{P(D | NN)}$

$\lambda_{RN} = \frac{P(D | RN)}{P(D | NN)}$

By Bayes' rule, $P(D | G) = P(G | D) \, p(D) / f_G$ for each genotype $G$, so the relative risks can equivalently be written as

$\lambda_{RR} = \frac{P(RR | D) f_{NN}}{P(NN | D) f_{RR}}$

$\lambda_{RN} = \frac{P(RN | D) f_{NN}}{P(NN | D) f_{RN}}$

Thus, if the genotype frequencies are known, they can be used to compute the relative risks. The genotype frequencies in the population cannot be computed from the case-control study itself, because they depend on the prevalence of the disease in the population. In particular, if the prevalence of the disease is $p(D)$, then:

$f_{RR} = P(RR | D) \, p(D) + P(RR | {\sim}D) (1 - p(D))$

$f_{RN} = P(RN | D) \, p(D) + P(RN | {\sim}D) (1 - p(D))$

$f_{NN} = P(NN | D) \, p(D) + P(NN | {\sim}D) (1 - p(D)) .$
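The mixture formula above, combined with the case-frequency form of the relative risks, can be illustrated as follows. This is a sketch under assumptions: the function names, the dictionary layout keyed by genotype, and the numbers in the usage example are all hypothetical.

```python
def population_genotype_frequencies(case_freqs, control_freqs, prevalence):
    """f_G = P(G|D) p(D) + P(G|~D) (1 - p(D)) for each genotype G."""
    return {g: case_freqs[g] * prevalence + control_freqs[g] * (1 - prevalence)
            for g in case_freqs}


def relative_risks_from_case_frequencies(case_freqs, population_freqs):
    """lambda_G = P(G|D) f_NN / (P(NN|D) f_G) for G in {RN, RR}."""
    base = case_freqs["NN"] / population_freqs["NN"]
    return {g: (case_freqs[g] / population_freqs[g]) / base for g in ("RN", "RR")}


if __name__ == "__main__":
    # Made-up penetrances and genotype frequencies, used to build consistent inputs.
    penetrance = {"NN": 0.05, "RN": 0.075, "RR": 0.10}
    freqs = {"NN": 0.49, "RN": 0.42, "RR": 0.09}
    p = sum(freqs[g] * penetrance[g] for g in freqs)
    cases = {g: freqs[g] * penetrance[g] / p for g in freqs}
    controls = {g: freqs[g] * (1 - penetrance[g]) / (1 - p) for g in freqs}

    estimated = population_genotype_frequencies(cases, controls, p)
    print(estimated)
    print(relative_risks_from_case_frequencies(cases, estimated))
```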

When $p(D)$ is small enough, the genotype frequencies can be approximated by the genotype frequencies in the control population, but this will not be an accurate estimate when the prevalence is high. If a reference data set (e.g., the HapMap) is available, however, the genotype frequencies can be estimated from the reference data set.

Most recent studies do not use a reference data set to estimate relative risks and report only odds ratios. The odds ratios can be written as

$OR_{RR} = \frac{P(RR | D) \, P(NN | {\sim}D)}{P(NN | D) \, P(RR | {\sim}D)}$

$OR_{RN} = \frac{P(RN | D) \, P(NN | {\sim}D)}{P(NN | D) \, P(RN | {\sim}D)}$

Odds ratios are usually preferred because estimates of the allele frequencies in the population are not needed; computing an odds ratio generally requires only the genotype frequencies in the cases and in the controls.

In some cases the genotype data themselves are not available, but summary data, such as odds ratios, are. This is the situation when a meta-analysis is performed on the results of previous case-control studies. The following shows how to find the relative risks from the odds ratios in this case, using the fact that

$p(D) = f_{RR} P(D | RR) + f_{RN} P(D | RN) + f_{NN} P(D | NN) .$

Dividing this equation by $P(D | NN)$ gives

$\frac{p(D)}{P(D | NN)} = f_{RR} \lambda_{RR} + f_{RN} \lambda_{RN} + f_{NN}$

This allows the odds ratio to be written as:

$\begin{matrix} OR_{RR} = \frac{P(D | RR) (1 - P(D | NN))}{P(D | NN) (1 - P(D | RR))} = \lambda_{RR} \frac{\frac{p(D)}{P(D | NN)} - p(D)}{\frac{p(D)}{P(D | NN)} - p(D) \lambda_{RR}} = \\ \lambda_{RR} \frac{f_{RR} \lambda_{RR} + f_{RN} \lambda_{RN} + f_{NN} - p(D)}{f_{RR} \lambda_{RR} + f_{RN} \lambda_{RN} + f_{NN} - p(D) \lambda_{RR}} \end{matrix}$

By a similar calculation, the following system of equations is obtained:

$OR_{RR} = \lambda_{RR} \frac{f_{RR} \lambda_{RR} + f_{RN} \lambda_{RN} + f_{NN} - p(D)}{f_{RR} \lambda_{RR} + f_{RN} \lambda_{RN} + f_{NN} - p(D) \lambda_{RR}}$

$OR_{RN} = \lambda_{RN} \frac{f_{RR} \lambda_{RR} + f_{RN} \lambda_{RN} + f_{NN} - p(D)}{f_{RR} \lambda_{RR} + f_{RN} \lambda_{RN} + f_{NN} - p(D) \lambda_{RN}} .$

Equation 1

If the odds ratios, the genotype frequencies in the population, and the prevalence of the disease are known, the relative risks can be obtained by solving this system of equations.
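Equation 1 can also be evaluated in the forward direction, which gives a simple way to check a candidate pair of relative risks against the reported odds ratios. The sketch below does only that; the function name `odds_ratios_from_relative_risks` and the numeric inputs are assumptions made for illustration.

```python
def odds_ratios_from_relative_risks(lam_rr, lam_rn, f_rr, f_rn, f_nn, prevalence):
    """Evaluate Equation 1: odds ratios implied by the given relative risks."""
    # s equals p(D) / P(D|NN), the weighted sum of relative risks.
    s = f_rr * lam_rr + f_rn * lam_rn + f_nn
    or_rr = lam_rr * (s - prevalence) / (s - prevalence * lam_rr)
    or_rn = lam_rn * (s - prevalence) / (s - prevalence * lam_rn)
    return or_rr, or_rn


if __name__ == "__main__":
    # Hypothetical values: relative risks 2.0 (RR) and 1.5 (RN), prevalence 6.5%.
    print(odds_ratios_from_relative_risks(2.0, 1.5, 0.09, 0.42, 0.49, 0.065))
```

A solver for the relative risks can use this function as its residual: the solution is the pair of relative risks for which the returned values match the observed odds ratios.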

Note that these are two quadratic equations, so they have at most four solutions. As shown below, however, there is usually a single feasible solution to this system.

Note that when $f_{NN} = 1$, Equation 1 is equivalent to the Zhang and Yu formula; here, however, the allele frequencies in the population are taken into account. Moreover, the present method takes into account the fact that the two relative risks depend on each other, whereas previous methods compute each relative risk independently.

Relative risk of multi-allelic loci. The calculation is slightly more involved when multiple markers or other multi-allelic variants are considered. Let $a_0, a_1, \ldots, a_k$ denote the $k + 1$ possible alleles, where $a_0$ is the non-risk allele. The allele frequencies $f_0, f_1, f_2, \ldots, f_k$ of the $k + 1$ possible alleles in the population are assumed to be known. For allele $i$, the relative risk and the odds ratio are defined as

$\lambda_i = \frac{P(D | a_i)}{P(D | a_0)}$

$OR_i = \frac{P(D | a_i) (1 - P(D | a_0))}{P(D | a_0) (1 - P(D | a_i))} = \lambda_i \frac{1 - P(D | a_0)}{1 - P(D | a_i)}$

The following equation holds for the prevalence of the disease:

$p(D) = \sum_{i=0}^{k} f_i P(D | a_i)$

Therefore, dividing both sides of the equation by $P(D | a_0)$, we get:

$\frac{p(D)}{P(D | a_0)} = \sum_{i=0}^{k} f_i \lambda_i$

which gives:

$OR_i = \lambda_i \frac{\sum_{j=0}^{k} f_j \lambda_j - p(D)}{\sum_{j=0}^{k} f_j \lambda_j - \lambda_i p(D)},$

Setting $C = \sum_{i=0}^{k} f_i \lambda_i$ gives $\lambda_i = C \cdot \frac{OR_i}{p(D) \, OR_i + C - p(D)} .$ Therefore, by the definition of $C$:

$1 = \sum_{i=0}^{k} f_i \frac{\lambda_i}{C} = \sum_{i=0}^{k} \frac{f_i \, OR_i}{p(D) \, OR_i + C - p(D)} .$

This is a polynomial equation in a single variable, $C$. Once $C$ is determined, the relative risks are determined. The polynomial has degree $k + 1$, so at most $k + 1$ solutions are expected. However, because the right-hand side of the equation is strictly decreasing as a function of $C$, there is usually only one solution. This solution is easy to find by binary search, since it is bounded between $C = 1$ and $C = \sum_{i=0}^{k} OR_i$.
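The binary search for $C$ can be sketched as follows. This is an illustrative sketch, not the original implementation: the function name `solve_multiallelic_relative_risks` and the tolerance are assumptions, and the search bracket relies on every odds ratio being at least 1, with $OR_0 = 1$ for the non-risk allele.

```python
def solve_multiallelic_relative_risks(odds_ratios, freqs, prevalence, tol=1e-12):
    """Solve 1 = sum_i f_i OR_i / (p OR_i + C - p) for C by bisection.

    odds_ratios -- [1, OR_1, ..., OR_k], each assumed to be >= 1
    freqs       -- allele frequencies [f_0, ..., f_k], summing to 1
    prevalence  -- disease prevalence p(D)
    Returns (C, [lambda_0, ..., lambda_k]).
    """
    def rhs(c):
        return sum(f * o / (prevalence * o + c - prevalence)
                   for f, o in zip(freqs, odds_ratios))

    lo, hi = 1.0, float(sum(odds_ratios))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rhs(mid) > 1.0:
            lo = mid  # rhs is decreasing in C, so the root lies above mid
        else:
            hi = mid
    c = (lo + hi) / 2
    lambdas = [c * o / (prevalence * o + c - prevalence) for o in odds_ratios]
    return c, lambdas


if __name__ == "__main__":
    # Hypothetical inputs built from known relative risks 1, 1.5 and 2.
    base, lam, f = 0.05, [1.0, 1.5, 2.0], [0.49, 0.42, 0.09]
    p = base * sum(fi * li for fi, li in zip(f, lam))
    ors = [l * (1 - base) / (1 - l * base) for l in lam]
    print(solve_multiallelic_relative_risks(ors, f, p))
```

Bisection is used here instead of gradient descent because the monotonicity of the right-hand side guarantees a single root inside the bracket.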

Stability of relative risk estimates. The effect of each of the different parameters (prevalence, allele frequencies, and odds ratio errors) on the estimates of relative risk was determined. To determine the effect of the allele frequency and prevalence estimates on the relative risk values, the relative risks were calculated from a set of values of different odds ratios and different allele frequencies (under HWE), and the results of these calculations were plotted for prevalence values ranging from 0 to 1.

In addition, for fixed prevalence values, the resulting relative risks were plotted as a function of the risk allele frequency. It is clear that in all cases when p(D) = 0, $\lambda_{RR} = OR_{RR}$ and $\lambda_{RN} = OR_{RN}$, and when p(D) = 1, $\lambda_{RR} = \lambda_{RN} = 0$. This can be computed directly from Equation 1. In addition, when the risk allele frequency is high, $\lambda_{RR}$ approaches linear behavior, and $\lambda_{RN}$ approaches a concave function with a bounded second derivative. When the risk allele frequency is low, $\lambda_{RR}$ and $\lambda_{RN}$ approach the behavior of the function 1/p(D). This means that for high risk allele frequencies, an erroneous estimate of the prevalence will not greatly affect the resulting relative risk.
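As a short check of the p(D) = 0 case, using only the symbols already defined, substituting p(D) = 0 into the expression for $\lambda_{i}$ in terms of C gives, for each genotype i (including RR and RN):

$\lambda_{i}\big|_{p(D)=0} = C \cdot \frac{OR_{i}}{0 \cdot OR_{i} + C - 0} = OR_{i}, \qquad C = \sum_{i=0}^{k} f_{i} \lambda_{i} = \sum_{i=0}^{k} f_{i}\, OR_{i}.$

That is, when the disease is very rare, the relative risk and the odds ratio coincide.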

The following examples illustrate and explain the invention. The scope of the invention is not limited to these examples.

Example I

Generation and analysis of SNP profiles

The individual is provided with a sample tube in a kit (for example, one available from DNA Genotek), into which the individual deposits a saliva sample (approximately 4 ml) from which genomic DNA will be extracted. The saliva sample is delivered to a CLIA-certified laboratory for processing and analysis. Typically, the sample is delivered to the testing facility by overnight mail in a shipping container conveniently provided to the individual in the collection kit.

In a preferred embodiment, genomic DNA is isolated from saliva. For example, using the DNA self-collection kit technology provided by DNA Genotek, the individual collects a saliva sample of approximately 4 ml for clinical processing. After the sample is delivered to an appropriate laboratory for processing, DNA is isolated by heat denaturation and protease digestion of the sample (typically at 50°C for at least one hour, using reagents provided by the collection kit supplier). The sample is then centrifuged, and the supernatant is ethanol precipitated. The DNA pellet is suspended in a buffer suitable for subsequent analysis.

The individual's genomic DNA is isolated from the saliva sample according to well-known procedures and/or procedures provided by the manufacturer of the collection kit. Typically, the sample is first heat denatured and protease digested. Next, the sample is centrifuged and the supernatant is retained. The supernatant is then ethanol precipitated to yield a pellet containing approximately 5 to 16 µg of genomic DNA. The DNA pellet is suspended in 10 mM Tris (pH 7.6), 1 mM EDTA (TE). A SNP profile is generated by hybridizing the genomic DNA to a commercially available high-density SNP array, such as those provided by Affymetrix or Illumina, using the instrumentation and instructions provided by the array manufacturer. The individual's SNP profile is stored in an encrypted database or vault.

The patient's data structure is queried for risk-conferring SNPs by comparison with a clinical database of established, medically relevant SNPs whose presence in a genome correlates with a given disease or condition. The database contains information on the statistical correlation of particular SNPs and SNP haplotypes with particular diseases or conditions. For example, as shown in Example III, polymorphisms in the apolipoprotein E gene give rise to different isoforms of the protein, which in turn correlate with the statistical likelihood of developing Alzheimer's disease. As another example, individuals with a variant of the blood-clotting protein Factor V known as Factor V Leiden have an increased tendency to clot. A number of genes in which SNPs have been associated with a disease or condition phenotype are shown in Table 1. The information in the database is approved by a research/clinical advisory board for its scientific accuracy and significance, and may be reviewed by an overseeing government agency. The database can be updated continually as more SNP-disease correlations emerge from the scientific community.
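The query described above can be pictured with a minimal sketch. Everything here is hypothetical and for illustration only: the `Correlation` record, the `find_risk_snps` function, the dictionary layout of the profile, and the risk alleles are assumptions, not the structure of the actual clinical database. The two rsIDs are taken from the claims of this document, where they are listed for Alzheimer's disease (AD) and colorectal cancer (CRC).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correlation:
    rsid: str         # SNP identifier in the curated database
    condition: str    # disease or condition correlated with the SNP
    risk_allele: str  # placeholder risk allele, for illustration only


def find_risk_snps(profile, database):
    """Return (correlation, copies) for each curated SNP whose risk allele
    appears in the individual's profile (rsid -> two-letter genotype)."""
    hits = []
    for corr in database:
        genotype = profile.get(corr.rsid)
        if genotype is None:
            continue  # SNP not covered by this individual's profile
        copies = genotype.count(corr.risk_allele)
        if copies > 0:
            hits.append((corr, copies))
    return hits


# Hypothetical curated database and individual SNP profile.
database = [
    Correlation("rs4420638", "AD", "G"),
    Correlation("rs6983267", "CRC", "G"),
]
profile = {"rs4420638": "AG", "rs6983267": "TT"}

for corr, copies in find_risk_snps(profile, database):
    print(corr.condition, corr.rsid, copies)
```

Keeping the curated correlations separate from the stored profile means the database can grow without the individual's data being regenerated.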

The results of the analysis of the individual's SNP profile are provided securely to the patient through an online portal or by mail. The patient is provided with explanatory and supporting information, such as the information on Factor V Leiden shown in Example IV. Secure access to the individual's SNP profile information (for example, through an online portal) will facilitate discussions with the patient's physician and enable choices about personalized medicine.

Example II

Updating of genotype correlations

In response to a request for an initial determination of an individual's genotype correlations, a genomic profile is generated, genotype correlations are obtained, and the results are provided to the individual as described in Example I. After the initial determination of the individual's genotype correlations, updated correlations are, or can be, determined subsequently as additional genotype correlations become known. The subscriber has a premium-level subscription, and the subscriber's genotype profile is kept in an encrypted database. The updated correlations are performed on the stored genotype profile.

For example, as described in Example I above, the initial genotype correlations may have determined that a particular individual does not have ApoE4, and is therefore not predisposed to early-onset Alzheimer's disease, and that this individual does not have Factor V Leiden. After this initial determination, a new correlation becomes known and is validated, such that a polymorphism in a given gene (hypothetically, gene XYZ) is correlated with a given condition (hypothetically, condition 321). This new genotype correlation is added to the master database of human genotype correlations. An update is then provided to the particular individual by first retrieving the relevant gene XYZ data from that individual's genomic profile stored in the encrypted database. The individual's gene XYZ data is compared with the gene XYZ information in the updated master database. From this comparison, the individual's susceptibility or genetic predisposition to condition 321 is determined. The result of this determination is added to the individual's genotype correlations. The updated result, indicating whether the individual is susceptible or genetically predisposed to condition 321, is provided to the individual together with explanatory and supporting information.
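The update flow described above can be sketched as follows. This is a hypothetical illustration: the function `apply_updates`, the dictionary layouts, and the variant labels are assumptions introduced here, while "XYZ" and "condition 321" are the placeholder names used in the example itself.

```python
def apply_updates(stored_profile, existing_results, new_correlations):
    """Evaluate only newly added correlations against a stored profile.

    stored_profile:    gene -> set of variants found in the individual
    existing_results:  condition -> bool, from earlier determinations
    new_correlations:  gene -> (condition, risk_variant), newly validated
    Returns a new results dict; the inputs are left unchanged.
    """
    updated = dict(existing_results)
    for gene, (condition, risk_variant) in new_correlations.items():
        variants = stored_profile.get(gene)
        if variants is None:
            continue  # the stored profile has no data for this gene
        updated[condition] = risk_variant in variants
    return updated


# Hypothetical stored profile and earlier results for one individual.
stored_profile = {
    "APOE": {"e3"},
    "F5": {"wild-type"},
    "XYZ": {"variant-1"},
}
existing_results = {"ApoE4": False, "Factor V Leiden": False}

# A newly validated correlation added to the master database.
new_correlations = {"XYZ": ("condition 321", "variant-1")}

updated_results = apply_updates(stored_profile, existing_results, new_correlations)
print(updated_results)
```

Because the stored profile is reused, no new sample is needed from the individual when a correlation is added.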

Example III

Correlation of the ApoE4 locus with Alzheimer's disease

The risk of Alzheimer's disease (AD) has been shown to correlate with polymorphisms in the apolipoprotein E (APOE) gene, which give rise to three isoforms of APOE known as ApoE2, ApoE3 and ApoE4. These isoforms differ from one another by one or two amino acids at residues 112 and 158 of the APOE protein. ApoE2 contains cysteine/cysteine at positions 112/158; ApoE3 contains cysteine/arginine at positions 112/158; and ApoE4 contains arginine/arginine at positions 112/158. As shown in Table 3, the risk of onset of Alzheimer's disease at a younger age increases with the number of copies of the APOE ε4 allele. Likewise, as shown in Table 4, the relative risk of AD increases with the number of copies of the APOE ε4 allele.

Table 3: Prevalence of AD risk alleles (Corder et al., Science 261:921-3, 1993)

Table 4: Relative risk of AD with ApoE4 (Farrer et al., JAMA 278:1349-56, 1997)

| APOE genotype | Odds ratio |
| --- | --- |
| ε2ε2 | 0.6 |
| ε2ε3 | 0.6 |
| ε3ε3 | 1.0 |
| ε2ε4 | 2.6 |
| ε3ε4 | 3.2 |
| ε4ε4 | 14.9 |

Example IV

Information for Factor V Leiden positive patients

The following is an example of the information that might be provided to an individual whose genomic SNP profile shows the presence of the Factor V Leiden gene. The individual may have a basic subscription under which the information can be provided in the initial report.

What is Factor V Leiden?

Factor V Leiden is not a disease; it refers to the presence of a particular gene inherited from a person's parents. Factor V Leiden is a variant of the protein Factor V (5), which is needed for blood clotting. People who have a Factor V deficiency are more likely to bleed badly, while people with Factor V Leiden have blood with an increased tendency to clot.

People who carry the Factor V Leiden gene have a five times greater risk of developing a blood clot (thrombosis) than the rest of the population. However, many people with the gene never develop blood clots. In the UK and the US, 5% of the population carry one or more Factor V Leiden genes, which is far more than the number of people who will actually develop thrombosis.

How do you get Factor V Leiden?

The Factor V gene is inherited from a person's parents. As with all inherited characteristics, one gene is inherited from the mother and one from the father. It is therefore possible to inherit two normal genes, one Factor V Leiden gene and one normal gene, or two Factor V Leiden genes. Having one Factor V Leiden gene results in a slightly higher risk of developing thrombosis, but having two genes results in a much greater risk.

What are the symptoms of Factor V Leiden?

There are no symptoms unless you have a blood clot (thrombosis).

What are the warning signs?

The most common problem is a blood clot in the leg. Swelling, pain and redness of the leg indicate this problem. In rarer cases, a blood clot in the lungs (pulmonary thrombosis) may develop, which makes breathing difficult. Depending on the size of the clot, the severity of this condition ranges from being barely noticeable to the patient experiencing severe breathing difficulty. In even rarer cases, a clot may occur in an arm or another part of the body. Because these clots form in the veins that carry blood to the heart rather than in the arteries (which carry blood away from the heart), Factor V Leiden does not increase the risk of coronary thrombosis.

What can be done to avoid blood clots?

Factor V Leiden only slightly increases the risk of developing a blood clot, and many people with this condition will never develop thrombosis. There are many things a person can do to avoid developing a blood clot. Avoid standing or sitting in the same position for long periods of time. When traveling long distances, it is important to exercise regularly: the blood must not "stand still". Staying up late or smoking will greatly increase the risk of blood clots. Women who carry the Factor V Leiden gene should not take the contraceptive pill, as this significantly increases the chance of developing thrombosis. Women who carry the Factor V Leiden gene should also consult their doctor before becoming pregnant, as pregnancy also increases the risk of thrombosis.

How do doctors find out whether you have Factor V Leiden?

The gene for Factor V Leiden can be found in a blood sample.

A blood clot in the leg or arm is usually identified by an ultrasound examination.

Blood clots can also be detected by X-ray after a substance is injected into the blood to make the clot visible. A blood clot in the lung is more difficult to find, but a doctor will usually use a radioactive substance to test the distribution of blood flow in the lung and the distribution of air reaching the lung. The two distribution patterns should match; a mismatch indicates the presence of a clot.

How is Factor V Leiden treated?

People with Factor V Leiden do not need treatment unless their blood begins to clot, in which case a doctor will prescribe blood-thinning (anticoagulant) drugs such as warfarin (for example, warfarin sodium) or heparin to prevent further clots. Treatment usually lasts three to six months, but may take longer if several clots are present. In severe cases, the course of drug treatment may continue indefinitely; in very rare cases, the blood clot may need to be removed surgically.

How is Factor V Leiden treated during pregnancy?

Women who carry two Factor V Leiden genes will need to be treated with the anticoagulant drug heparin during pregnancy. The same treatment applies to women who carry only one Factor V Leiden gene but who have previously had a blood clot themselves or have a family history of blood clots.

All women who carry the Factor V Leiden gene may need to wear special stockings during the second half of pregnancy to prevent blood clots. After the baby is born, they may be prescribed the anticoagulant drug heparin.

Prognosis

The risk of developing a blood clot increases with age, but in a survey across ages of 100 people who carried the gene, only a few were found to have ever suffered from thrombosis. The National Society for Genetic Counselors (NSGC) can provide a list of genetic counselors in your area, as well as information about establishing a family history. Search their online database at www.nsgc.org/consumer.

While preferred embodiments of the present invention have been shown and described herein, it will be clear to those skilled in the art that these embodiments are provided by way of example only. Numerous variations, changes and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention, and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims

1. A method of generating a Genetic Composite Index (GCI) score for an individual, the method comprising:

a) obtaining a genetic sample of said individual;

b) generating a genomic profile from said genetic sample;

c) determining a GCI score for the individual's phenotype by comparing the individual's genomic profile to a current database of human genotype correlations, wherein human genotype correlations are the correlation of genetic variants to phenotypes, wherein said GCI score is obtained by identifying an odds ratio or relative risk for one or more genetic loci, and

wherein said phenotype is selected from Alzheimer's disease (AD), colorectal cancer (CRC), osteoarthritis (OA), exfoliation glaucoma (XFG), obesity (BMIOB), Graves' disease (GD), hemochromatosis (HEM), myocardial infarction (MI), multiple sclerosis (MS), psoriasis (PS), restless legs syndrome (RLS), celiac disease (CelD), prostate cancer (PC), lupus (SLE), macular degeneration (AMD), rheumatoid arthritis (RA), breast cancer (BC), Crohn's disease (CD), type 2 diabetes (T2D), and combinations thereof;

and wherein said genetic variant is selected from the SNPs: rs4420638 when said genotype is AD; rs6983267 when said genotype is CRC; rs4911178 when said genotype is OA; rs2165241 when said genotype is XFG; when the genotype is BMIOB, it is rs9939609 or rs9291171; when the genotype is GD, it is rs3087243, DRB1*0301DQA1*0501; when the genotype is HEM, it is rs1800562 or rs129128; when the genotype is MI, it is rs1866389, rs1333049 or rs6922269; when the genotype is MS, it is rs6897932, rs12722489 or DRB1*1501; when the genotype is PS, it is rs6859018, rs11209026 or HLAC*0602; when the genotype is RLS, it is rs6904723, rs2300478, rs1026732 or rs9296249; when the genotype is CelD, it is rs6840978, rs11571315, rs2187668 or DQA1*0301DQB1*0302; when the genotype is PC, it is rs4242384, rs6983267, rs16901979, rs17765344 or rs4430796; when the genotype is 17rs15LE, rs10954213, rs2004640, DRB1*0301 or DRB1*1501; when the genotype is AMD, it is rs10737680, rs10490924, rs541862, rs2230199, rs1061170 or rs9332739; when the genotype is RA, it is rs6679677, rs112033DR0, rs112033DR06, DRB1*0401 or DRB1*0404; when the genotype is BC, it is rs3803662, rs2981582, rs4700485, rs3817198, rs17468277, rs6721996 or rs3803662; when the genotype is CD, it is rs2066845, rs5743293, rs10883365, rs17234657, rs10210302, rs9858542, rs11805303, rs1000113, rs17221417, rs2542151 or rs10761659; when the genotype is T2D, it is rs13266634, rs4506565, rs10012946, rs7756992, rs10811661, rs12288738, rs80501136, rs 875, rs4402960, rs5215 or rs1801282;

and

d) reporting the GCI score.

2. The method of claim 1, wherein a third party obtains the genetic sample.

3. The method of claim 1, wherein said generating a genomic profile is performed by a third party.

4. The method of claim 1, wherein said reporting comprises transmitting said results over a network.

5. The method of claim 1, wherein the genomic profile is the entire genome of the individual.

6. The method of claim 1, wherein the method comprises determining the plurality of relative risks or odds ratios from 10 or more genotype correlations.

7. The method of claim 1, further comprising generating a GCIplus score.

8. The method of claim 1, wherein the genetic sample is from a biological sample selected from the group consisting of blood, hair, skin, saliva, semen, urine, fecal matter, sweat, and oral samples.

9. The method of claim 1, wherein the genotype correlation is a correlation of a single nucleotide polymorphism with a phenotype of a non-medical condition.

10. The method of claim 1, wherein the genomic profile is generated using a high-density DNA microarray, genomic DNA sequencing, or a PCR-based method.

11. The method of claim 1, wherein the results further comprise incorporating a characterization of the individual selected from the group consisting of physical data, medical data, demographic data, exposure data, lifestyle data, behavioral data, race, pedigree, geography, gender, age, family history and predetermined phenotype.

12. The method of claim 1, wherein the genomic profile includes genetic markers in linkage disequilibrium with genetic variants associated with a phenotype.

13. The method of claim 1, wherein the GCI score is an estimated lifetime risk.

14. The method of claim 1, wherein the genomic profile comprises at least 100,000 genetic variants.

15. The method of claim 1, wherein the genomic profile comprises at least 400,000 genetic variants.

16. The method of claim 1, further comprising reporting information about said phenotype, wherein the information is selected from the group consisting of preventive strategies, health information, therapy, symptom awareness, early detection protocols, intervention protocols, and accurate identification and classification.

17. The method of claim 11, wherein the physical data is selected from the group consisting of blood pressure, heart rate, glucose level, metabolite level, ion level, weight, height, cholesterol level, vitamin level, blood count, body mass index (BMI), protein and transcript levels.

18. The method of claim 1, further comprising:

f) updating said database with at least one human genotype correlation; and

g) generating at least one additional relative risk or odds ratio of said phenotype by comparing said individual's genomic profile with said at least one human genotype correlation of step f);

h) calculating at least one updated genetic composite index (GCI) from said at least one additional relative risk or odds ratio determined in step g); and

i) reporting said results from step h) to said individual or said individual's healthcare manager.

19. The method of claim 1, wherein the reporting of the at least one GCI score comprises a network transmission.

20. The method of claim 19, wherein the reporting includes transmitting the at least one GCI score via an online portal.