Patent application title: Esophageal Cancer Markers

Inventors: Florin M. Selaru (Baltimore, MD, US); Stephen J. Meltzer (Lutherville, MD, US)
Assignees: University of Maryland, Baltimore; THE JOHNS HOPKINS UNIVERSITY
IPC8 Class: AC12Q168FI
USPC Class: 506 9
Class name: Combinatorial chemistry technology: method, library, apparatus method of screening a library by measuring the ability to specifically bind a target molecule (e.g., antibody-antigen binding, receptor-ligand binding, etc.)
Publication date: 2014-07-24
Patent application number: 20140206565

Abstract:

The present invention is directed to methods for diagnosing cancer in a subject. Morphologically normal epithelial cells of the esophagus are assayed for marker expression. Characteristic expression of the markers indicates the presence of cancer or the predisposition to cancer. A panel of eleven markers is particularly good at identifying cancer and the predisposition to cancer.

Claims:

1. A method of determining presence or predisposition to esophageal cancer in a human subject, comprising: determining in a sample of morphologically normal esophageal epithelial cells of a human subject expression of one or more genes; determining a composite score of expression of the one or more genes; comparing the composite score to predetermined values for esophageal cancer or predisposition to esophageal cancer; identifying presence or predisposition to esophageal cancer based on the composite score.

2. A method of determining presence or predisposition to esophageal cancer in a human subject, comprising: determining in a sample of esophageal epithelial cells of a human subject expression of one or more genes selected from the group consisting of gravin; H1 histone family, member 2 (H1F2); H2A histone family, member L (H2AFL); H2B histone family, member C (H2BFC); keratin 8 (KRT8); progestin induced protein (DD5); H2A histone family, member O (H2AFO); transglutaminase 3 (TGM3); major histocompatibility complex, class II, DR alpha (HLA-DRA); mitotic checkpoint protein kinase BUB1B (BUB1B); and glutathione peroxidase 2 (gastrointestinal) (GPX2); determining a composite score of expression of the one or more genes; comparing the composite score to predetermined values for esophageal cancer or predisposition to esophageal cancer; identifying presence or predisposition to esophageal cancer based on the composite score.

3. The method of claim 2 wherein the sample comprises morphologically normal esophageal epithelial cells.

4. The method of claim 1 wherein mRNA expression of the one or more genes is determined.

5. The method of claim 1 wherein protein expression of the one or more genes is determined.

6. The method of claim 2 wherein a composite score of expression of eleven genes is determined.

7. The method of claim 2 wherein a composite score of expression of ten genes is determined.

8. The method of claim 2 wherein a composite score of expression of nine genes is determined.

9. The method of claim 2 wherein a composite score of expression of eight genes is determined.

10. The method of claim 2 wherein a composite score of expression of seven genes is determined.

11. The method of claim 2 wherein a composite score of expression of six genes is determined.

12. The method of claim 2 wherein a composite score of expression of five genes is determined.

13. The method of claim 1 wherein a chemopreventive diet is recommended when presence or predisposition to esophageal cancer is identified.

14. The method of claim 1 wherein the sample is obtained by endoscopy.

15. The method of claim 1 wherein the sample is obtained by an inflatable balloon.

16. The method of claim 1 wherein the sample is obtained by a sponge.

17. The method of claim 1 wherein the human subject has diagnosed Barrett's esophagus.

18. A solid support for determining esophageal epithelium expression, comprising: antibodies or oligonucleotide probes for interrogating expression of at least six genes selected from the group consisting of gravin; H1 histone family, member 2 (H1F2); H2A histone family, member L (H2AFL); H2B histone family, member C (H2BFC); keratin 8 (KRT8); progestin induced protein (DD5); H2A histone family, member O (H2AFO); transglutaminase 3 (TGM3); major histocompatibility complex, class II, DR alpha (HLA-DRA); mitotic checkpoint protein kinase BUB1B (BUB1B); and glutathione peroxidase 2 (gastrointestinal) (GPX2); wherein the antibodies or oligonucleotide probes for the at least six genes from the group comprise at least 50% of the antibodies or oligonucleotide probes on the solid support.

19. The solid support of claim 18 wherein the antibodies or oligonucleotide probes for the at least six genes from the group comprise at least 75% of the antibodies or oligonucleotide probes on the solid support.

20. The solid support of claim 18 wherein the antibodies or oligonucleotide probes for the at least six genes from the group comprise at least 90% of the antibodies or oligonucleotide probes on the solid support.

Description:

BACKGROUND OF THE INVENTION

[0002] One of the greatest challenges in the management of Barrett's esophagus (BA), the precursor lesion of esophageal adenocarcinoma (EAC), is to expeditiously identify patients who have early EAC and to predict those who will develop EAC. The rate of progression to cancer (0.4-0.5% per year in some studies, 0.5 to 1% per year in other studies) is very low, making this challenge particularly difficult (Reynolds et al., Gastroenterol Clin North Am 28(4):917-45 (1999); Cameron, Gastroenterol Clin North Am 26(3):487-94 (1997)). Moreover, in the surveillance of BA, a meticulous endoscopic search is often performed to identify grossly normal-appearing dysplastic or cancerous lesions. However, the value of this type of systematic surveillance has been questioned, due to its low sensitivity and specificity (Conio et al., Am J Gastroenterol 98(9):1931-9 (2003)). Thus, from a purely practical standpoint, it would be advantageous to be able to identify patients with malignant esophageal lesions simply by biopsying their normal squamous esophagus.

[0003] The presence and degree of dysplasia constitute the most widely accepted measure of neoplastic risk in Barrett's esophagus. However, significant problems have emerged demonstrating the need for improved progression risk biomarkers. These problems include poor interobserver reproducibility of dysplasia interpretation and inconsistent rates of progression as well as regression of dysplasia, both of which have made it difficult to develop national surveillance guidelines (Conio et al., Am J Gastroenterol 98(9):1931-9 (2003); Rana et al., Dis Esophagus 13(1):28-31 (2000); Reid et al., Am J Gastroenterol 95(7):1669-76 (2000)). Flow cytometry has shown promise in detecting a subset of patients who do not have high-grade dysplasia (HGD) but do have an increased risk of progression (Reid et al., Am J Gastroenterol 95(7):1669-76 (2000)).

[0004] The human genome project has yielded high-throughput methodologies for the computer analysis of data, which provide volume and quality control required to select clinically useful biomarkers (Taramelli et al., Eur J Cancer 40(17):2537-43 (2004); Varmus et al., Science 310(5754):1615 (2005); Yoshida, Jpn J Clin Oncol 29(10):457-9 (1999)). 17p (p53)-loss of heterozygosity (LOH) has also shown potential as a molecular biomarker (Reid et al., Gastrointest Endosc Clin N Am 13(2):369-97 (2003)). In addition, methylation of p16 and HPP1 has been shown to predict progression to HGD and EAC (Hardie et al., Cancer Lett 217(2):221-30 (2005); Geddert et al., Int J Cancer 110(2):208-11 (2004); Schulmann et al., Oncogene 24(25):4138-48 (2005)). Molecular alterations have been found in Barrett's metaplasia which reveal a field effect in premalignant metaplastic mucosa, but not in normal epithelium. For example, aneuploidy and loss of heterozygosity have been observed in metaplastic mucosa from Barrett's patients with dysplasia or adenocarcinoma (Blount et al., Proc Natl Acad Sci USA 90(8):3221-5 (1993); Boynton et al., Cancer Res 51(20):5766-9 (1991); Raskind et al., Cancer Res 52(10):2946-50 (1992); Reid et al., Gastroenterology 93(1):1-11 (1987)). Similarly, p53 tumor suppressor gene point mutations have been reported in Barrett's metaplasia (Casson et al., Am J Surg 167(1):52-7 (1994); Huang et al., Cancer Res 53(8):1889-94 (1993); Meltzer et al., Proc Natl Acad Sci USA 88(11):4976-80 (1991)), and altered promoter DNA methylation has also been described for some tumor suppressor genes in Barrett's esophagus (Eads et al., Cancer Res 60(18):5021-6 (2000); Kawakami et al., J Natl Cancer Inst 92(22):1805-11 (2000); Klump et al., Gastroenterology 115:1381-6 (1998); Wong et al., Cancer Res 57(13):2619-22 (1997)).

[0005] In contrast, most published studies to date report no DNA alterations (e.g., point mutations, methylation, or loss of heterozygosity) in normal squamous esophageal epithelium from patients with esophageal cancer. Corn et al. (Clinical Cancer Research 7(9):2765-9 (2001)) reported E-cadherin methylation in Barrett's esophagus specimens and esophageal adenocarcinoma, but not in normal esophageal epithelium. Another study showed that the expression of a panel of 23 genes capable of differentiating between Barrett's esophagus and esophageal adenocarcinoma was unable to distinguish between the normal epithelia of Barrett's metaplasia and adenocarcinoma patients (Brabender et al., Oncogene 23(27):4780-8 (2004)). One notable exception was the study by Eads et al., which found methylation of the CALCA, MGMT, and TIMP3 genes in the normal esophagus of a subset of patients with Barrett's-associated esophageal dysplasia and adenocarcinoma (Eads et al., Cancer Res 61(8):3410-8 (2001)).

[0006] cDNA microarrays promise more accurate prediction than do classical clinical diagnostic tools (such as histologic categorization). However, the main challenge posed by microarrays is to construct meaningful classifiers based on gene expression profiles, using appropriate bioinformatics tools. A number of bioinformatics tools have been proposed, including artificial neural networks (Selaru et al., Gastroenterology 122(3):606-13 (2002)), hierarchical clustering (Selaru et al., Oncogene 21(3):475-8 (2002); Zou et al., Oncogene 21(31):4855-62 (2002)) and principal components analysis (Mori et al., Cancer Res 63(15):4577-82 (2003); Selaru et al., Cancer Res 64:1584-88 (2004)). Shrunken nearest centroid predictors (SNCPs) were adapted from classical nearest centroid predictors to gene microarray analysis (Tibshirani et al., Proc Natl Acad Sci USA 99(10):6567-72 (2002)). Among large numbers of genes, it is difficult to distinguish biologic changes in expression from variations due to chance. However, chance variations tend to be of small amplitude. Thus, if small variations are ignored and only consistently large changes in expression are accepted, biologic changes prevail over variations due to chance. Among the mathematical means used to ignore small variations, one method, SNCPs, is particularly valuable. Prediction Analysis of Microarrays (PAM) is a software package developed at Stanford University that utilizes SNCPs and performs internal validation simultaneously. Samples are divided up at random into K roughly equal-sized parts. For each part in turn, the classifier is built on the remaining K-1 parts, then tested on the one part that was held out. This procedure is performed over a range of threshold values, and the cross-validated misclassification error rate is reported for each threshold value. Typically, the user chooses the threshold value giving the minimum cross-validated misclassification error rate. This method has been utilized successfully by investigators studying leukemia and breast cancer to find subsets of genes that accurately predicted classifications of these diseases (Tibshirani et al., Proc Natl Acad Sci USA 99(10):6567-72 (2002); Korkola et al., Cancer Res 63(21):7167-75 (2003); Sorlie et al., Proc Natl Acad Sci USA 100(14):8418-23 (2003)).
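The shrinkage and K-fold cross-validation procedure described in paragraph [0006] can be sketched roughly as follows. This is an illustrative approximation only and is not the PAM software. The function names, the synthetic data, the class labels, the threshold grid, and the rule of taking the largest threshold among ties are assumptions introduced here. The standardization uses the commonly cited form m_k = sqrt(1/n_k - 1/n) and adds the median per-gene standard deviation as an offset, which is also an assumption about the exact variant.

```python
import numpy as np


def fit_shrunken_centroids(X, y, delta):
    """Fit shrunken class centroids (samples in rows, genes in columns)."""
    classes = np.unique(y)
    n = X.shape[0]
    overall = X.mean(axis=0)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    counts = np.array([(y == c).sum() for c in classes])
    # Pooled within-class standard deviation for each gene.
    resid = X - centroids[np.searchsorted(classes, y)]
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - len(classes)))
    s_adj = s + np.median(s)  # offset guards against tiny variances
    m = np.sqrt(1.0 / counts - 1.0 / n)
    # Standardized difference between each class centroid and the overall centroid.
    d = (centroids - overall) / (m[:, None] * s_adj)
    # Soft thresholding: small differences are set to zero, large ones shrink by delta.
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)
    shrunk = overall + m[:, None] * s_adj * d_shrunk
    return classes, shrunk, s_adj, counts / n, d_shrunk


def predict(model, X):
    """Assign each sample to the class with the nearest shrunken centroid."""
    classes, shrunk, s_adj, priors, _ = model
    diff = (X[:, None, :] - shrunk[None, :, :]) / s_adj
    score = (diff ** 2).sum(axis=2) - 2.0 * np.log(priors)
    return classes[np.argmin(score, axis=1)]


def cross_validate(X, y, deltas, k=5, seed=0):
    """Return the cross-validated misclassification rate for each threshold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # K roughly equal parts
    errors = np.zeros(len(deltas))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        for i, delta in enumerate(deltas):
            model = fit_shrunken_centroids(X[train], y[train], delta)
            errors[i] += (predict(model, X[held_out]) != y[held_out]).sum()
    return errors / len(y)


def demo():
    """Run the sketch on synthetic data (not patient data)."""
    rng = np.random.default_rng(1)
    n_per_class, n_genes, n_informative = 20, 200, 10
    X = rng.normal(size=(2 * n_per_class, n_genes))
    y = np.repeat(np.array(["EAC", "control"]), n_per_class)
    X[:n_per_class, :n_informative] += 2.0  # only these genes carry signal
    deltas = np.linspace(0.0, 4.0, 9)
    errors = cross_validate(X, y, deltas)
    # Assumed tie-break: largest threshold among those with minimum error.
    best = deltas[np.flatnonzero(errors == errors.min())[-1]]
    model = fit_shrunken_centroids(X, y, best)
    surviving = np.flatnonzero(np.any(model[4] != 0.0, axis=0))
    return deltas, errors, best, surviving


if __name__ == "__main__":
    deltas, errors, best, surviving = demo()
    print("CV error by threshold:", dict(zip(deltas.round(2), errors.round(3))))
    print("chosen threshold:", best, "genes retained:", surviving)
```

Genes whose shrunken difference is zero in every class no longer influence the classification, which is how a threshold chosen by cross-validation also yields a reduced gene list.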

SUMMARY OF THE INVENTION

[0007] Diagnosing Cancer--Different Subjects

[0008] The present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a biological sample from a subject that does not have cancer using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.

[0009] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of esophageal epithelium from a subject that does not have cancer, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.

[0010] The present invention is further directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of esophageal epithelium from a subject that does not have cancer using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.

[0011] Diagnosing Cancer--Same Subject

[0012] The present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a first biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second biological sample from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.

[0013] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.

[0014] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.

[0015] The present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a first biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second biological sample from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer, and wherein the locations from which the first and second biological samples are obtained are separated by a distance of at least 3 cm in said subject.

[0016] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the locations from which the first and second biological samples are obtained from the esophageal epithelium are separated by a distance of at least 3 cm in said subject.

[0017] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the locations from which the first and second biological samples are obtained from the esophageal epithelium are separated by a distance of at least 3 cm in said subject.

[0018] The present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a first biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second biological sample from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer, and wherein the locations from which the first and second biological samples are obtained have a grossly different appearance.

[0019] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the locations from which the first and second biological samples are obtained from the esophageal epithelium have a grossly different appearance.

[0020] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the locations from which the first and second biological samples are obtained from the esophageal epithelium have a grossly different appearance.

[0021] Detecting Differential Gene Expression

[0022] The present invention is also directed to a method for detecting differential gene expression in a subject comprising (a) determining a gene expression pattern of a biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a pre-determined gene expression pattern using shrunken nearest centroid predictors, wherein when the gene expression pattern of (a) is different from the pre-determined gene expression pattern, differential gene expression is detected.

[0023] The present invention is also directed to a method for detecting differential gene expression in a subject comprising (a) determining a gene expression pattern of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a pre-determined gene expression pattern of esophageal epithelium, wherein when the gene expression pattern of (a) is different from the pre-determined gene expression pattern, differential gene expression is detected.

[0024] The present invention is also directed to a method for detecting differential gene expression in a subject comprising (a) determining a gene expression pattern of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a pre-determined gene expression pattern of esophageal epithelium using shrunken nearest centroid predictors, wherein when the gene expression pattern of (a) is different from the pre-determined gene expression pattern, differential gene expression is detected.

[0025] Diagnosing Cancer Using Markers

[0026] The present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining an expression pattern of one or more genes in a biological sample from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in a biological sample from a subject that does not have cancer using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.

[0027] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.

[0028] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.

[0029] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the one or more genes are genes selected from Table 1.

[0030] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the one or more genes are genes selected from Table 1.

TABLE 1
Name | Gene ID
gravin, complete cds. | AB003476
protease, serine, 22 (P11) | XM_006625
H1 histone family, member 2 (H1F2) | NM_005319
fucosyltransferase 1 (galactoside 2-alpha-L-fucosyltransferase, Bombay phenotype included) (FUT1) | NM_000148
H2A histone family, member L (H2AFL) | XM_004416
serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 2 (SERPINB2) | NM_002575
H2B histone family, member C (H2BFC) | NM_003519
membrane associated guanylate kinase 2 (MAGI-2) | AF038563
(RG) heterogeneous nuclear ribonucleoprotein H1 (H) | R11019
keratin 8 (KRT8) | NM_002273
RAD51 (S. cerevisiae) homolog (E coli RecA homolog) (RAD51) | XM_031515
plasminogen activator, urokinase (PLAU) | NM_002658
H3 histone family, member B (H3FB) | NM_003530
aldehyde dehydrogenase 1 family, member A3 (ALDH1A3) | XM_017971
(RG) ankyrin 1, erythrocytic | AA464755
wild-type p53 activated fragment-1 (WAF1) | U03106
like mouse brain protein E46 (E46L) | NM_013236
progestin induced protein (DD5) | NM_015902
H2A histone family, member O (H2AFO) | NM_003516
transglutaminase 3 (TGM3) | XM_009572
major histocompatibility complex, class II, DR alpha (HLA-DRA) | NM_019111
mitotic checkpoint protein kinase BUB1B (BUB1B) | AF107297
glutathione peroxidase 2 (gastrointestinal) (GPX2) | NM_002083

[0031] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the one or more genes are genes selected from Table 2.

[0032] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the one or more genes are genes selected from Table 2.

TABLE 2
Name | Gene ID
gravin, complete cds. | AB003476
protease, serine, 22 (P11) | XM_006625
H1 histone family, member 2 (H1F2) | NM_005319
fucosyltransferase 1 (galactoside 2-alpha-L-fucosyltransferase, Bombay phenotype included) (FUT1) | NM_000148
H2A histone family, member L (H2AFL) | XM_004416
serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 2 (SERPINB2) | NM_002575
H2B histone family, member C (H2BFC) | NM_003519
membrane associated guanylate kinase 2 (MAGI-2) | AF038563
(RG) heterogeneous nuclear ribonucleoprotein H1 (H) | R11019
keratin 8 (KRT8) | NM_002273
RAD51 (S. cerevisiae) homolog (E coli RecA homolog) (RAD51) | XM_031515
plasminogen activator, urokinase (PLAU) | NM_002658
H3 histone family, member B (H3FB) | NM_003530
aldehyde dehydrogenase 1 family, member A3 (ALDH1A3) | XM_017971
(RG) ankyrin 1, erythrocytic | AA464755
wild-type p53 activated fragment-1 (WAF1) | U03106
like mouse brain protein E46 (E46L) | NM_013236

[0033] In one embodiment a method is provided for determining presence or predisposition to esophageal cancer in a human subject. Expression of one or more genes is determined in a sample of morphologically normal esophageal epithelial cells of a human subject. A composite score of expression of the one or more genes is calculated. The composite score is compared to predetermined values for esophageal cancer or predisposition to esophageal cancer which were obtained using appropriate populations of subjects with esophageal cancer or with predisposition to esophageal cancer. The presence or predisposition to esophageal cancer is identified based on the composite score.

[0034] In another embodiment a method is provided for determining presence or predisposition to esophageal cancer in a human subject. Expression of one or more genes is determined in a sample of esophageal epithelial cells of a human subject. The one or more genes are selected from the group consisting of gravin; H1 histone family, member 2 (H1F2); H2A histone family, member L (H2AFL); H2B histone family, member C (H2BFC); keratin 8 (KRT8); progestin induced protein (DD5); H2A histone family, member O (H2AFO); transglutaminase 3 (TGM3); major histocompatibility complex, class II, DR alpha (HLA-DRA); mitotic checkpoint protein kinase BUB1B (BUB1B); and glutathione peroxidase 2 (gastrointestinal) (GPX2). A composite score of expression of the one or more genes is calculated. The composite score is compared to predetermined values for esophageal cancer or predisposition to esophageal cancer which were obtained using appropriate populations of subjects with esophageal cancer or with predisposition to esophageal cancer. The presence or predisposition to esophageal cancer is identified based on the composite score.

[0035] Still another embodiment provides a method for determining presence or predisposition to esophageal cancer in a human subject. Expression of one or more genes is determined in a sample of morphologically normal esophageal epithelial cells of a human subject. The one or more genes are selected from the group consisting of gravin; H1 histone family, member 2 (H1F2); H2A histone family, member L (H2AFL); H2B histone family, member C (H2BFC); keratin 8 (KRT8); progestin induced protein (DD5); H2A histone family, member O (H2AFO); transglutaminase 3 (TGM3); major histocompatibility complex, class II, DR alpha (HLA-DRA); mitotic checkpoint protein kinase BUB1B (BUB1B); and glutathione peroxidase 2 (gastrointestinal) (GPX2). A composite score of expression of the one or more genes is calculated. The composite score is compared to predetermined values for esophageal cancer or predisposition to esophageal cancer which were obtained using appropriate populations of subjects with esophageal cancer or with predisposition to esophageal cancer. Presence or predisposition to esophageal cancer is identified based on the composite score.
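The composite-score embodiments above do not state how the score is computed. Purely as an illustration of the general workflow (measure expression, combine into one score, compare with a predetermined value), one possible form is sketched below. The scoring rule (mean of direction-signed z-scores), the gene placeholders, the reference statistics, and the cutoff are all hypothetical and are not taken from the specification.

```python
from statistics import mean


def composite_score(expression, reference, direction):
    """Mean of signed z-scores across the measured genes.

    expression: gene -> measured value in the subject's sample
    reference:  gene -> (mean, standard deviation) in a reference population
    direction:  gene -> +1 or -1, the assumed sign of change that counts toward the score
    All three inputs are hypothetical placeholders.
    """
    z_scores = [
        direction[gene] * (value - reference[gene][0]) / reference[gene][1]
        for gene, value in expression.items()
    ]
    return mean(z_scores)


def classify(score, cutoff):
    """Compare the composite score with a predetermined (here invented) cutoff."""
    if score >= cutoff:
        return "presence or predisposition identified"
    return "not identified"


if __name__ == "__main__":
    # Invented numbers for two placeholder genes; not measured values.
    expression = {"gene_a": 3.0, "gene_b": 1.0}
    reference = {"gene_a": (1.0, 1.0), "gene_b": (2.0, 0.5)}
    direction = {"gene_a": 1, "gene_b": -1}
    score = composite_score(expression, reference, direction)
    print(score, classify(score, cutoff=1.5))
```

In practice the predetermined values would come from the appropriate subject populations referred to in the embodiments, and a different scoring rule could be substituted without changing the overall comparison step.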

[0036] Unless otherwise stated, the cancer may be any cancer. The cancer may be esophageal adenocarcinoma or squamous cell cancer of the esophagus.

[0037] Unless otherwise stated, in each of the embodiments of the present invention the subject may be a mammal. In some embodiments the subject is a human.

[0038] Unless otherwise stated, in each of the embodiments of the present invention the gene expression pattern may be determined by any method known in the art. Either protein or mRNA expression can be analyzed. Any biochemical technique for assaying particular proteins or mRNA species can be used. Gene expression patterns may be determined using a polynucleotide microarray.

[0039] Unless otherwise stated, the gene expression pattern may be compared and/or analyzed by shrunken nearest centroid predictors (SNCP). Permutation analysis may be used in addition to SNCP analysis.

[0040] Unless otherwise stated, in each of the embodiments of the present invention the biological sample may be any biological sample from which polynucleotides may be obtained, such as mucosa. In some embodiments, the biological sample has a morphologically-normal appearance. In some embodiments the biological sample is esophageal epithelium, or squamous esophageal epithelium. In additional embodiments the biological sample is morphologically-normal appearing esophageal epithelium, or morphologically-normal appearing squamous esophageal epithelium.

[0041] Unless otherwise stated, in each of the embodiments of the present invention directed to methods for detecting differential gene expression, the method may further comprise predicting an increased risk for developing cancer in the subject. In one embodiment, the increased risk for developing cancer is an increased risk for developing esophageal adenocarcinoma.

[0042] Unless otherwise stated, in each of the embodiments of the present invention the one or more genes used in the determination of an expression pattern may be any of those identified by shrunken nearest centroid predictors. In some embodiments the one or more genes used in the determination of an expression pattern are selected from those genes set forth in Table 1, or selected from those genes set forth in Table 2.

BRIEF DESCRIPTION OF THE DRAWINGS

[0043] FIG. 1 shows the predicted diagnoses in comparison of NE from EAC patients vs. patients with either BA only or no BA. Patients with EAC, to the left of and on the vertical line, were diagnosed correctly in every case, as were all control patients without any lesion (to the right of the vertical line). Top left and bottom right, likelihood of being an EAC patient; Bottom left and top right, likelihood of being a non-cancer patient.

[0044] FIG. 2 shows the predicted diagnoses in comparison of NE from EAC patients vs. control subjects (patients with neither BA nor EAC). Patients with EAC, to the left of the vertical line, were diagnosed correctly in every case, as were all control subjects without any lesion (to the right of the vertical line). Bottom left and top right, likelihood of being a control subject; top left and bottom right, likelihood of being an EAC patient.

[0045] FIG. 3 shows over-expressed genes designated by rightward-extending bars; those that are under-expressed protrude to the left. Left centroid, NE specimens from subjects with EAC; right centroid, NE from patients without EAC. This plot demonstrates that genes under-expressed in non-cancer patients are over-expressed in EAC patients, and vice versa. SNCP threshold=2.7.

[0046] The sequence listing includes the National Center for Biotechnology Information--Entrez Nucleotide database sequences for each of the genes set forth in Table 3.

DETAILED DESCRIPTION

[0047] The present invention is directed to methods for diagnosing esophageal cancer or predisposition to esophageal cancer in a subject based on gene expression patterns. Interestingly, the gene expression patterns of cancer patients and pre-cancer patients differ from normal even in the esophageal epithelial cells which appear morphologically normal. Thus selection of particular locations for biopsy is not necessary. Even if a lesion is not detected visually with an endoscope, an abnormality or a predisposition can be detected.

[0048] Much of the analysis of the present methods can be automated and calculated by computer. Identifying presence or predisposition to esophageal cancer can be accomplished, for example, by recording a result in a patient's chart or on a computer printout, or by delivering the result via telephone or email, whether by machine or human.

[0049] Diagnosing Cancer--Different Subjects

[0050] In particular, the present invention is directed to methods for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a biological sample from a subject that does not have cancer, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.

[0051] In these embodiments, the skilled artisan will understand that the methods can be used to diagnose cancer by comparing the gene expression pattern of a biological sample from a subject, such as a patient suspected of having a cancer, with the gene expression pattern from a second subject previously screened and determined not to have the particular cancer for which the first subject is being screened. Where there is a difference in the two gene expression patterns, a diagnosis of cancer may be made. The comparison may be conducted using shrunken nearest centroid predictors analysis.

[0052] While the present invention includes methods of diagnosis where the gene expression patterns are determined from biological samples from the same source material from two different subjects, the present invention includes methods of diagnosis where the biological samples may be from different source materials from the different subjects.

[0053] The methods may employ additional steps to confirm the diagnosis of cancer or predisposition, where such steps are any of those known by the skilled artisan to allow a diagnosis of cancer or confirmation of a diagnosis of cancer, including morphological and histological examinations, and screening for a particular cancer marker associated with the cancer for which the subject is being screened.

[0054] The present invention is thus directed to a method for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a biological sample from a subject that does not have cancer using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.

[0055] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of esophageal epithelium from a subject that does not have cancer, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.

[0056] The present invention is further directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of esophageal epithelium from a subject that does not have cancer using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.

[0057] Comparisons to other subjects can also be done by ascertaining expression values in populations of relevant individuals and determining the range of values of expression that occur in those populations. Thus, after such data have been collected and validated, absolute values can be determined in subjects and the absolute values can be compared to the data collected for relevant populations.
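
By way of non-limiting illustration, the comparison of a subject's absolute expression value with ranges observed in relevant populations may be sketched as follows. The function names, the population labels, and the numeric values are hypothetical and are not data from this application; a simple minimum-to-maximum range is assumed, although other summaries of the population data could equally be used.

```python
from typing import Dict, List, Sequence, Tuple


def expression_range(values: Sequence[float]) -> Tuple[float, float]:
    """Return the (lowest, highest) expression value seen in one population."""
    if not values:
        raise ValueError("at least one expression value is required")
    return min(values), max(values)


def matching_populations(
    value: float, population_values: Dict[str, Sequence[float]]
) -> List[str]:
    """List the populations whose observed range contains the subject's value."""
    matches = []
    for name, values in population_values.items():
        low, high = expression_range(values)
        if low <= value <= high:
            matches.append(name)
    return matches


# Hypothetical, made-up expression values for a single gene.
reference = {
    "non-cancer": [0.8, 1.0, 1.1, 1.3],
    "cancer": [2.0, 2.4, 2.9],
}
print(matching_populations(2.2, reference))
```

A value that falls in no range, or in more than one, is returned as such; how to resolve those cases is left open here.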

[0058] Diagnosing Cancer--Same Subject

[0059] In a variation on the methods of the present invention discussed above, the present invention is also directed to methods for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a first biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second biological sample from the subject, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.

[0060] In these embodiments, the skilled artisan will understand that the methods can be used to diagnose cancer by comparing the gene expression pattern of two different biological samples obtained from the subject. For example, and as discussed further herein, cancer may be diagnosed in a subject that exhibits no symptoms of disease by comparing gene expression patterns in biological samples obtained from different regions of the same tissue or obtained from different regions of the body, or by comparing gene expression patterns in biological samples obtained from different source material from the same subject. While the two samples may be selected from regions of the same tissue that have no gross morphological differences, the two samples may also be selected from regions of the same tissue that exhibit some morphological differences. For example, morphological differences may include swelling, differences in color, such as redness, differences in surface architecture, differences in mucosal layers, and differences in moisture content.

[0061] The comparison may be conducted using shrunken nearest centroid predictors analysis. Where there is a difference in the two gene expression patterns, a diagnosis of cancer may be made. The method may include additional steps to confirm the diagnosis of cancer, where such steps are any of those known by the skilled artisan to allow a diagnosis of cancer or confirmation of a diagnosis of cancer, including morphological and histological examinations, and screening for a particular cancer marker associated with the cancer for which the subject is being screened.

[0062] Accordingly, the present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a first biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second biological sample from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.

[0063] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.

[0064] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.

[0065] The present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a first biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second biological sample from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer, and wherein the locations from which the first and second biological samples are obtained are separated by a distance of at least 3 cm in said subject.

[0066] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the locations from which the first and second biological samples are obtained from the esophageal epithelium are separated by a distance of at least 3 cm in said subject.

[0067] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the locations from which the first and second biological samples are obtained from the esophageal epithelium are separated by a distance of at least 3 cm in said subject.

[0068] The present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a first biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second biological sample from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer, and wherein the locations from which the first and second biological samples are obtained have a grossly different appearance.

[0069] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the locations from which the first and second biological samples are obtained from the esophageal epithelium have a grossly different appearance.

[0070] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the locations from which the first and second biological samples are obtained from the esophageal epithelium have a grossly different appearance.

[0071] Detecting Differential Gene Expression

[0072] The present invention also includes methods for detecting differential gene expression in a subject comprising (a) determining a gene expression pattern of a biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a pre-determined gene expression pattern, wherein when the gene expression pattern of (a) is different from the pre-determined gene expression pattern, differential gene expression is detected.

[0073] These methods can be used to screen for differences in gene expression patterns between individuals, differences in gene expression patterns between different biological samples obtained from the same subject, or differences in gene expression patterns over time found in biological samples obtained from the same source material within a subject. These methods can be used to identify one specific gene, or more than one gene. The comparison may be conducted using shrunken nearest centroid predictors analysis. Where there is a difference in the two gene expression patterns, a diagnosis of cancer may be made. The method may include additional steps to confirm the diagnosis of cancer, where such steps are any of those known by the skilled artisan to allow a diagnosis of cancer or confirmation of a diagnosis of cancer, including morphological and histological examinations, and screening for a particular cancer marker associated with the cancer for which the subject is being screened.

[0074] Where there is a difference in the two gene expression patterns, the methods may include additional steps to confirm that the gene or genes identified can be used as cancer markers. Further steps may also be included to identify the gene or genes found using these methods.

[0075] Similarly, where there is a difference in the two gene expression patterns, a prediction can be made that the subject will develop cancer or that the subject has an increased risk for developing cancer. As used herein, an increased risk for developing cancer means that the subject has a risk for developing a particular cancer that is greater than the risk for developing that particular cancer in the population as a whole. As used herein, the population as a whole may mean individuals sharing the same sex, age range, physical health, medical condition, or geographic location. For example, the population as a whole may mean adult humans residing in the United States.

[0076] Accordingly, the present invention is directed to a method for detecting differential gene expression in a subject comprising (a) determining a gene expression pattern of a biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a pre-determined gene expression pattern using shrunken nearest centroid predictors, wherein when the gene expression pattern of (a) is different from the pre-determined gene expression pattern, differential gene expression is detected.

[0077] The present invention is also directed to a method for detecting differential gene expression in a subject comprising (a) determining a gene expression pattern of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a pre-determined gene expression pattern of esophageal epithelium, wherein when the gene expression pattern of (a) is different from the pre-determined gene expression pattern, differential gene expression is detected.

[0078] The present invention is also directed to a method for detecting differential gene expression in a subject comprising (a) determining a gene expression pattern of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a pre-determined gene expression pattern of esophageal epithelium using shrunken nearest centroid predictors, wherein when the gene expression pattern of (a) is different from the pre-determined gene expression pattern, differential gene expression is detected.

[0079] Diagnosing Cancer Using Markers

[0080] In a further variation, the present invention is directed to methods for diagnosing cancer in a subject comprising (a) determining an expression pattern of one or more genes in a biological sample from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in a biological sample from a subject that does not have cancer, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.

[0081] As discussed herein, differential gene expression can be used to diagnose cancer in a subject by comparing the expression level of one or more genes in a subject, such as a patient suspected of having cancer, with the expression level of one or more genes from a subject that is known not to have cancer. One or more specific genes may be used that have previously been shown to be correlated with a specific cancer.

[0082] These methods can be used to screen for differences in gene expression patterns between individuals, differences in gene expression patterns between different biological samples obtained from the same subject, or differences in gene expression patterns over time found in biological samples obtained from the same source material within a subject.

[0083] The comparison may be conducted using shrunken nearest centroid predictors analysis. Where there is a difference in the two gene expression patterns, a diagnosis of cancer may be made. The method may include additional steps to confirm the diagnosis of cancer, where such steps are any of those known by the skilled artisan to allow a diagnosis of cancer or confirmation of a diagnosis of cancer, including morphological and histological examinations, and screening for a particular cancer marker associated with the cancer for which the subject is being screened.

[0084] Similarly, where there is a difference in the two gene expression patterns, a prediction can be made that the subject will develop cancer or that the subject has an increased risk for developing cancer. As used herein, an increased risk for developing cancer means that the subject has a risk for developing a particular cancer that is greater than the risk for developing that particular cancer in the population as a whole. As used herein, the population as a whole may mean individuals sharing the same sex, age range, physical health, medical condition, or geographic location. For example, the population as a whole may mean adult humans residing in the United States.

[0085] The one or more genes used in the determination of an expression pattern may be any of those set forth in Table 1, or the subset shown in Table 2. The National Center for Biotechnology Information--Entrez Nucleotide sequences for each of the genes set forth in Table 3 are included in the Sequence Listing. Other genes in Tables 1 and 2 can be used with the sequence data that are present in the NCBI database, which are expressly incorporated herein with the sequences present as of the priority date of this application.

[0086] Accordingly, the present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining an expression pattern of one or more genes in a biological sample from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in a biological sample from a subject that does not have cancer using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.

[0087] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.

[0088] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.

[0089] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the one or more genes are genes selected from Table 1.

[0090] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the one or more genes are genes selected from Table 1.

[0091] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the one or more genes are genes selected from Table 2.

[0092] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the one or more genes are genes selected from Table 2.

[0093] In an embodiment, an expression pattern of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 or all 23 of the genes of Table 1 is determined. In a further embodiment, the expression pattern of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 or all 17 of the genes of Table 2 is determined. The number to be determined is that number which gives sufficient sensitivity and specificity, that is, an acceptable number of false positives and an acceptable number of false negatives.
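
By way of non-limiting illustration, the selection of a number of genes that gives sufficient sensitivity and specificity may be sketched as follows. Sensitivity is taken as true positives divided by all subjects with disease, and specificity as true negatives divided by all subjects without disease. The function names, the panel sizes, the counts, and the thresholds are hypothetical and are not results from this application.

```python
from typing import Dict, Optional, Tuple

# (true positives, false negatives, true negatives, false positives)
Counts = Tuple[int, int, int, int]


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of diseased subjects correctly identified."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-diseased subjects correctly identified."""
    return tn / (tn + fp)


def smallest_adequate_panel(
    results: Dict[int, Counts], min_sensitivity: float, min_specificity: float
) -> Optional[int]:
    """Return the smallest panel size meeting both thresholds, or None."""
    for size in sorted(results):
        tp, fn, tn, fp = results[size]
        if (sensitivity(tp, fn) >= min_sensitivity
                and specificity(tn, fp) >= min_specificity):
            return size
    return None


# Hypothetical validation counts keyed by number of genes assayed.
validation = {3: (14, 6, 15, 5), 7: (18, 2, 17, 3), 11: (20, 0, 19, 1)}
print(smallest_adequate_panel(validation, 0.85, 0.80))
```

What counts as an acceptable number of false positives and false negatives remains a clinical judgment; the thresholds above are placeholders only.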

[0094] Diagnoses and prognoses determined using the subject methods can be confirmed and combined with other means of assessment. These include physical findings, radiological findings, pH determinations, endoscopic determinations, pathological determinations, patient reports of symptoms, and the like.

[0095] In relevant embodiments of the present invention the skilled artisan will understand the identity of the particular cancer being diagnosed need not be limited, and may include adenocarcinoma, squamous cell cancer, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and cancers of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus. In some embodiments the cancer is esophageal cancer, adenocarcinoma or squamous cell cancer.

[0096] Similarly, the identity of the subject to which the methods of the present invention are applied is not limited. However, in some embodiments, the subject is a bird or a mammal. For example, the subject may be a dog, cat, horse, simian or human.

[0097] The gene expression pattern may be determined by any method known in the art, although the pattern is typically determined using a polynucleotide microarray as described herein. In some embodiments, the gene expression patterns are analyzed and/or compared by shrunken nearest centroid predictors (SNCP). Detailed means for such analysis and/or comparison using shrunken nearest centroid predictors are provided herein, and are based on an adaptation of classical nearest centroids prediction analysis, tailored specifically to microarray data (Tibshirani et al., Proc Natl Acad Sci USA 99(10):6567-72 (2002)). Permutation analysis, as also described herein, may be used in conjunction with the SNCP analysis.
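
By way of non-limiting illustration, the shrinkage step that distinguishes the method of Tibshirani et al. from classical nearest centroid prediction may be sketched as follows. In that method, each gene's standardized difference between a class centroid and the overall centroid is reduced in magnitude by a threshold and set to zero if it does not exceed the threshold, so that genes whose shrunken differences are zero for every class drop out of the predictor. The sketch is simplified: computation of the standardized differences is omitted, and the gene labels, scores, and function names are hypothetical. The threshold of 2.7 is the value stated for FIG. 3.

```python
from typing import Dict, List, Tuple


def soft_threshold(d: float, delta: float) -> float:
    """Shrink d toward zero by delta; return 0.0 if |d| does not exceed delta."""
    shrunk = abs(d) - delta
    if shrunk <= 0:
        return 0.0
    return shrunk if d > 0 else -shrunk


def retained_genes(
    scores: Dict[str, Tuple[float, float]], delta: float
) -> List[str]:
    """Keep genes whose shrunken score is non-zero for at least one class."""
    kept = []
    for gene, class_scores in scores.items():
        if any(soft_threshold(d, delta) != 0.0 for d in class_scores):
            kept.append(gene)
    return kept


# Hypothetical standardized differences: (cancer class, non-cancer class).
scores = {
    "gene_A": (3.4, -3.4),
    "gene_B": (0.9, -0.9),
    "gene_C": (-2.9, 2.9),
}
print(retained_genes(scores, delta=2.7))
```

Raising the threshold retains fewer genes; lowering it retains more.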

[0098] For each new specimen, we calculate the squared distance to the normal centroid and the squared distance to the cancer centroid. The centroid (normal or cancer) to which the specimen is closest, in squared distance, defines the predicted class for that new sample.

[0099] As an example, if we have a new specimen, we would determine the level of the 11 genes in the centroid. Let these levels be $G_1, G_2, \ldots, G_{11}$. We already have the centroid values for each of the 11 genes for the normal class and for the cancer class. Let these be $\mathrm{CN}_1, \mathrm{CN}_2, \ldots, \mathrm{CN}_{11}$ and $\mathrm{CC}_1, \mathrm{CC}_2, \ldots, \mathrm{CC}_{11}$. The score of the new specimen for the normal class, $S_N$, would be calculated as:

$$S_N = (G_1 - \mathrm{CN}_1)^2 + (G_2 - \mathrm{CN}_2)^2 + \cdots + (G_{11} - \mathrm{CN}_{11})^2$$

and the score of the new specimen for the cancer class, $S_C$, would be calculated as:

$$S_C = (G_1 - \mathrm{CC}_1)^2 + (G_2 - \mathrm{CC}_2)^2 + \cdots + (G_{11} - \mathrm{CC}_{11})^2$$

If $S_N > S_C$, then the specimen is classified as cancer. If $S_N < S_C$, then the specimen is classified as normal.
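
By way of non-limiting illustration, the scoring and classification just described may be sketched as follows. Each score is the sum of squared differences between the specimen's gene levels and the corresponding class centroid, and the specimen is assigned to the class with the smaller score. The function names and all numeric values are hypothetical placeholders, not centroid values from this application. The case of equal scores is not addressed in the text, and returning "undetermined" for it is an assumption made here.

```python
from typing import Sequence


def squared_distance(levels: Sequence[float], centroid: Sequence[float]) -> float:
    """Sum of squared differences between gene levels and a class centroid."""
    if len(levels) != len(centroid):
        raise ValueError("levels and centroid must cover the same genes")
    return sum((g - c) ** 2 for g, c in zip(levels, centroid))


def classify_specimen(
    levels: Sequence[float],
    normal_centroid: Sequence[float],
    cancer_centroid: Sequence[float],
) -> str:
    """Assign the specimen to the class whose centroid is nearer."""
    score_normal = squared_distance(levels, normal_centroid)
    score_cancer = squared_distance(levels, cancer_centroid)
    if score_normal > score_cancer:
        return "cancer"
    if score_normal < score_cancer:
        return "normal"
    return "undetermined"  # assumption: ties are not covered by the text


# Hypothetical levels and centroids for an 11-gene panel.
normal_centroid = [0.0] * 11
cancer_centroid = [1.0] * 11
specimen = [0.9, 1.1, 0.8, 1.0, 1.2, 0.7, 1.0, 0.9, 1.1, 1.0, 0.8]
print(classify_specimen(specimen, normal_centroid, cancer_centroid))
```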

[0100] The biological sample tested in the methods of the present invention may be any biological sample from which polynucleotides or proteins may be obtained. In particular, the biological sample is one that contains cells or cellular material, proteins or polynucleotides. For example, the biological sample may be a biological fluid, such as lymph, serum, plasma, whole blood, urine, synovial fluid and spinal fluid; a cell type, such as bone marrow, immune, keratinocytes, epithelial cells, hepatocytes, renal cells, breast tissue cells, bladder cells, prostate cells, pancreatic cells; a tissue, such as skin, muscle, liver, kidney, pancreas, heart, lung, breast, male or female reproductive organs, lymphatic system, nervous system, digestive system, bladder, colon, connective tissue, where the tissue may be normal, cancerous or wounded tissue; or biopsies. In some embodiments, the biological sample is mucosa, esophageal epithelium, or squamous esophageal epithelium. In one embodiment, the biological sample is tissue diagnosed as Barrett's esophagus. Samples can be collected by endoscopy, or by other collecting means, including surgical spatulas, sponges, balloons, and esophageal brush-capsules. See Cancer Cytopathol. 2000; 90:10-6; see also Cancer. 1997; 80(11):2047-59. Some of these may be used in conjunction with endoscopy and some may be used independently.

[0101] As indicated further throughout this application, one of the advantages of the methods disclosed herein pertains to the ability to analyze morphologically-normal tissue, and to thereby diagnose cancer early based on the gene pattern. This is important for at least two reasons: (1) the assumption that disease progression is minimal in morphologically-normal tissue and treatment can thus begin prior to major damage to biological tissues and systems; and (2) diagnosticians do not need to first locate morphologically-abnormal tissue and analyze such tissue for gene expression. With regard to esophageal adenocarcinoma and squamous cell cancer, cancerous lesions can be especially difficult to find. The methods herein thus allow for an accurate analysis and diagnosis based on pre-cancerous tissue or tissue that may be some distance from cancerous tissue.

[0102] Therefore, in some embodiments, the biological sample has a morphologically-normal appearance. In some embodiments, the biological sample is morphologically-normal appearing esophageal epithelium, or morphologically-normal appearing squamous esophageal epithelium.

[0103] The methods of the present invention may be practiced using polynucleotides or proteins obtained directly from biological samples, or using polynucleotides or proteins produced from or amplified from polynucleotides obtained directly from biological samples. For example, cDNA may be isolated from a biological sample, and then PCR conducted to amplify the cDNA to obtain a sufficient amount of a polynucleotide for use in the methods. mRNA may also be amplified as detailed below. Protein can be made using in vitro transcription/translation systems, which are well known in the art.

[0104] While the location from which the biological samples are taken is not critical, they should be at a sufficient distance apart to be separate samples. When there is a morphological difference, the samples can be taken from those regions of the source material, such as a tissue, that have different morphological appearances. In general, the leading edges of these samples should be at a distance of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25 or 30 cm apart. When there is no morphological difference, the leading edges of the samples should be at a distance apart of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25 or 30 cm, such as when taken from the same tissue. The sample may also be collected from different regions of the body, such as where the sample is a bodily fluid. In an embodiment, the locations from which the first and second samples are obtained from a biological tissue are separated by a distance of at least 3 cm.

[0105] The gene expression patterns that are determined and compared in the methods of the present invention can be quantitative and/or qualitative patterns. For example, differential gene expression patterns can be based on the level at which the one or more genes are being expressed, and/or based on whether the one or more genes are being expressed at all. When the level of gene expression is determined and compared, a statistically significant difference may be used to demonstrate a difference in gene expression.

[0106] Solid supports according to the invention can be any substrate to which antibodies or oligonucleotide probes can be attached, either directly or indirectly through linker groups. Typically each species of antibody or species of oligonucleotide probe is located at a discrete geographic location on the substrate. Alternatively, each species of antibody or probe can be otherwise distinguishable, for example, based on a detectable label or other physical property. In one embodiment each species can be bound to a bead and the beads can be separated on the basis of size, magnetic characteristic, fluorescence spectrum, etc. Beads can also be used in discrete geographical locations, such as in wells of a microtiter plate. Solid supports are typically inert materials, such as glass, plastic, polymer, etc. They may be fabricated into sheets, strips, plates, multi-well plates, beads, fabrics, etc. Solid supports typically have probes/antibodies for only a small subset of the entire genome or proteome. Aside from controls and standards, only probes or antibodies for genes which are found to be relevant to esophageal disease need be present. Thus the probes or antibodies for assessing expression of the genes in Table 3 may comprise at least 5%, at least 10%, at least 25%, at least 50%, at least 75%, or at least 90% of the probes or antibodies on the solid support. Expression of any number of relevant genes can be tested, including 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11 of the genes listed in Table 3.

EXAMPLES

[0107] The following embodiments are merely exemplary and are not intended to be limiting.

[0108] Patients and Tissues

[0109] Six patients with BA alone, nine with BA and concomitant EAC, and eight with neither BA nor EAC were included in this study. The eight patients without BA or EAC had had endoscopy for unrelated indications, such as peptic ulcer disease, but had undergone endoscopic biopsy of the gastroesophageal junction that was histologically normal. In all cases, biopsies from grossly normal-appearing squamous esophageal epithelium at least 7 cm proximal to the upper limit of the Barrett's mucosa were included in the study. None of the patients with BA alone had concomitant dysplasia. Fresh NE (normal esophagus) biopsy specimens were immediately frozen and stored in liquid nitrogen until further use. Matching morphologic controls were obtained from the same sites as the research specimens and were examined by hematoxylin and eosin staining by an expert gastrointestinal pathologist at the University of Maryland. Informed consent was obtained from all patients under an institutionally approved research protocol.

[0110] Location of the Normal Squamous Esophageal Biopsies

[0111] The normal squamous esophagus (NE) areas biopsied were grossly normal, without any endoscopic evidence of esophagitis or reflux changes. In patients with obvious mass lesions in their esophagus, biopsies were obtained at least 7 cm proximal to these lesions. Similarly, biopsies from BA patients were performed at least 7 cm away from the area that showed endoscopic evidence of Barrett's esophagus. In patients lacking BA or EAC, biopsies were performed from areas that did not show any gross endoscopic abnormalities. Finally, all NE specimens were analyzed histologically, and there was no evidence of any metaplasia or other changes found in any of these NE samples.

[0112] cDNA Microarray Production and Hybridization

[0113] Glass slide coating, cDNA clone preparation and verification, microarray printing, post-printing slide processing, RNA extraction, RNA amplification, labeling, and hybridization were performed following detailed protocols described previously (Selaru et al., Oncogene 21(3):475-8 (2002); Zou et al., Oncogene 21(31):4855-62 (2002); Xu et al., Cancer Res 62(12):3493-7 (2002)).

[0114] RNA Extraction, Amplification, and Labeling of the aRNA Probe

[0115] Total RNA (3-20 μg) was extracted from freshly frozen tissue using an RNeasy kit (Qiagen, Valencia, Calif.) and amplified using the AmpliScribe T7-Flash transcription kit (Epicentre, Madison, Wis.). Labeling was performed on 6 μg of aRNA by incorporating Cy3- or Cy5-labeled dCTP using random primers and Superscript reverse transcriptase (Xu et al., Cancer Res 62(12):3493-7 (2002)). The resulting probes were purified with a Microcon microcentrifuge filter device and recovered in a volume of 25 μl. The reference probe was prepared from an equimolar mixture containing aRNAs from eight human malignant cell lines, as described previously. Microarray preparation was performed as described (Selaru et al., Oncogene 21(3):475-8 (2002); Mori et al., Cancer Res 63(15):4577-82 (2003); Xu et al., Cancer Res 62(12):3493-7 (2002)).

[0116] Microarray Normalization

[0117] An algorithm for normalizing microarray data was adapted that improves its accuracy and dynamic range (Yang et al., Nucleic Acids Res 30(4):e15 (2002)). Both within-slide and inter-slide normalization were accomplished. In this fashion, local distortions in signal and background intensity within different regions of a slide, as well as overall differences in hybridization or labeling efficiencies between slides, were overcome. It was determined that the within-slide normalization performed optimally when 4 blocks were used as the normalization unit (each block being produced by a different microarray pin). It was assumed that each group of 4 blocks was equivalent in average signal intensity and range to the next group of 4 blocks on the array. Thus, 8 normalization units per slide were utilized. This assumption was based on an optimization strategy in which groups of 1, 2, 4, 8, and 16 blocks were tested as the normalization unit, which showed that the 4-block unit performed with the least inaccuracy when a random number generator was used to produce the 8,064 values on a microarray slide (data not shown). Thus, this normalization method (Yang et al., Nucleic Acids Res 30(4):e15 (2002)) consisted of three steps: intensity-dependent normalization within each slide, scale normalization within each slide, and inter-slide normalization.
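The scale-normalization step can be illustrated with a minimal sketch. It assumes that log-ratios have already been grouped by normalization unit, and it rescales each unit by its median absolute deviation (MAD) relative to the geometric mean of all MADs, a robust estimator of the kind described by Yang et al. Whether the study used exactly this estimator is an assumption. Median centring stands in here for the intensity-dependent fit, which is not reproduced, and the function names are hypothetical.

```python
import math
import statistics


def mad(values):
    """Median absolute deviation of a list of numbers."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def scale_normalize(units):
    """Centre each unit on its median and give all units a common spread.

    units: list of lists of log-ratios, one inner list per normalization unit.
    Assumes every unit has a non-zero MAD.
    """
    centred = []
    for unit in units:
        med = statistics.median(unit)
        # Median centring is a simplified stand-in for the intensity-dependent step.
        centred.append([v - med for v in unit])
    mads = [mad(unit) for unit in centred]
    # The geometric mean of the MADs serves as the common target spread.
    target = math.exp(sum(math.log(m) for m in mads) / len(mads))
    return [[v * target / m for v in unit] for unit, m in zip(centred, mads)]
```

Applied to two units with spreads of 1 and 4, the sketch returns two units that both have a spread of 2, so that neither unit dominates later comparisons.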

[0118] Shrunken Nearest Centroid Predictor (SNCP) Model

[0119] The Shrunken nearest centroid predictor (SNCP) model was analyzed to determine if it could be used to identify gene expression patterns or individual genes as biomarkers to distinguish between the normal esophagus of patients with, vs. without, accompanying EAC. SNCPs discovered both broad patterns and individual genes that were highly accurate in their ability to identify whether or not a patient had accompanying remotely located cancer.

[0120] The SNCP method is an adaptation of classical nearest centroids prediction analysis, tailored specifically to microarray data (Tibshirani et al., Proc Natl Acad Sci USA 99(10):6567-72 (2002)). Each centroid is composed of weighted averages of genes (elements) on the microarray for a particular diagnostic category, or "class." Thus, the centroids each contain 8,064 elements, since there are 8,064 genes on each microarray. Gene weighting is directly proportional to the raw average expression value, but inversely proportional to the standard deviation (i.e., the variability) of expression value within a given class. Centroids are then shrunken by adjusting the threshold value, which removes genes with lower weighted averages (thus yielding a smaller set of relevant genes). Gene expression variations below a certain threshold value are made equal to zero and ignored. Thus, shrinkage consists of moving each centroid component towards zero by the threshold amount, and setting it equal to zero once it reaches zero (Tibshirani et al., Proc Natl Acad Sci USA 99(10):6567-72 (2002)).
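The shrinkage operation described above is commonly called soft thresholding. The following minimal sketch shows it for a set of per-gene standardized scores; the gene labels and score values are hypothetical, and the standardization that produces the scores is not shown.

```python
def soft_threshold(score, delta):
    """Move a score toward zero by delta; scores within delta of zero become 0."""
    magnitude = abs(score) - delta
    if magnitude <= 0:
        return 0.0
    return magnitude if score > 0 else -magnitude


def shrink_scores(scores, delta):
    """Soft-threshold every gene score and keep only genes that remain non-zero.

    scores: dict mapping a gene label to its standardized score for one class.
    """
    shrunken = {gene: soft_threshold(s, delta) for gene, s in scores.items()}
    return {gene: s for gene, s in shrunken.items() if s != 0.0}


# Hypothetical scores: only genes whose absolute score exceeds delta survive.
surviving = shrink_scores({"gene_a": 3.0, "gene_b": -3.5, "gene_c": 1.0}, delta=2.7)
```

In this example gene_c is removed because its score lies within the threshold of zero, while gene_a and gene_b remain with reduced magnitudes.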

[0121] The choice of Δ (amount of shrinkage) is dependent upon two variables: 1. prediction error minimization; 2. the number of genes that are left in the model. More specifically, when all the genes on the microarrays are used, the prediction error is significant. During the process of data fitting, the SNCP model excludes outliers, i.e., genes that are not usable for the prediction. It is, however, possible to achieve the minimum prediction error for a range of Δs. In this particular case, the model can predict the predefined categories using a variable number of genes. Under these conditions, the smaller the Δ, the higher the number of genes left in the model, and vice versa.
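When several values of Δ reach the same minimum prediction error, some rule is needed to pick among them. The sketch below assumes one plausible rule, which is to prefer the candidate that leaves the fewest genes in the model. The specification does not state that this rule was used, and the candidate values shown are hypothetical.

```python
def choose_delta(candidates):
    """Pick the candidate with the lowest error, breaking ties by fewest genes.

    candidates: list of (delta, cross_validation_error, genes_remaining) tuples.
    """
    return min(candidates, key=lambda c: (c[1], c[2]))


# Hypothetical candidates: two shrinkage amounts share the minimum error.
best = choose_delta([
    (0.5, 0.12, 900),
    (2.0, 0.00, 150),
    (3.0, 0.00, 12),
    (4.0, 0.06, 3),
])
```

Here the two middle candidates tie on error, and the larger Δ is selected because it retains fewer genes.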

[0122] Internal validation of results was performed using cross-validation. The value of K (the number of cross-validation folds) was set by default at 10; therefore, a 10-fold cross-validation was performed. In this 10-fold cross-validation, the specimens were randomly assigned to 10 groups. Nine of the ten groups were used for training, while the prediction was made on the 10th group. This procedure was repeated 10 times. For example, in the Normal-Normal versus Normal-Cancer comparison, training was done on 16 specimens, and then the model predicted the 17th specimen.
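The fold assignment can be sketched as follows. This is a minimal illustration that assumes specimens are identified by integer index and assigned to folds by shuffling and interleaving. The function names and the seed parameter are hypothetical, and the software actually used may assign folds differently.

```python
import random


def k_fold_indices(n_specimens, k=10, seed=0):
    """Randomly assign specimen indices to k folds of near-equal size."""
    indices = list(range(n_specimens))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_specimens, k=10, seed=0):
    """Yield (training indices, held-out indices) for each non-empty fold."""
    for held_out in k_fold_indices(n_specimens, k, seed):
        if not held_out:
            continue  # more folds than specimens
        excluded = set(held_out)
        training = [i for i in range(n_specimens) if i not in excluded]
        yield training, held_out
```

Under this interleaved assignment, 17 specimens split into 10 folds give held-out groups of one or two specimens, so each training set contains 15 or 16 specimens and every specimen is predicted exactly once.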

[0123] Permutation Analysis

[0124] SNCPs are mathematical models that learn by example. In other words, SNCPs identify a centroid for every group in the comparison. New specimens are classified by calculating the distance between the new specimen and each of the centroids. The specimen is classified into the class whose centroid is closest to the specimen. Ideally, the SNCPs should be tested on a test set composed of specimens that were not used during training. This, however, may prove difficult when only a small number of specimens are available for the study. One method that circumvents the need for a test set, while still ensuring statistical significance, is permutation analysis.
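The classification rule can be illustrated with a minimal nearest-centroid sketch. It uses plain squared Euclidean distance on unshrunken centroids, whereas the published SNCP method standardizes each gene and shrinks the centroids first. The expression values and function names below are hypothetical.

```python
def centroid(profiles):
    """Average expression profile of a class (one value per gene)."""
    count = len(profiles)
    return [sum(column) / count for column in zip(*profiles)]


def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(specimen, centroids):
    """Return the label of the class whose centroid is closest to the specimen."""
    return min(centroids, key=lambda label: squared_distance(specimen, centroids[label]))


# Hypothetical two-gene training profiles for the two categories.
training = {
    "N-N": [[0.0, 1.0], [0.2, 0.8]],
    "BA-CA": [[1.0, 0.0], [0.8, 0.2]],
}
centroids = {label: centroid(profiles) for label, profiles in training.items()}
predicted = classify([0.1, 0.9], centroids)
```

The new specimen lies closest to the N-N centroid and is therefore assigned to that class.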

[0125] Permutation analysis is a statistical technique used to calculate the chances of obtaining classification results purely by chance. The analysis consists of randomly permuting the specimen labels and constructing classifiers (SNCPs) to categorize the specimens. In the current study, permutation resulted in randomly assigning specimens to one of two categories: N-N (NE specimens from patients lacking EAC or BA) and BA-CA (NE specimens from patients with Barrett's esophagus and concomitant esophageal adenocarcinoma). The SNCP model with the lowest prediction error was subsequently chosen. This procedure was repeated 100 times. In all 100 random permutations, SNCPs were unable to learn the two categories correctly (with an error=0). The mean group error for the 100 permutations was 0.36. This finding demonstrated that the possibility that the SNCP learned the two categories (N-N and BA-CA) correctly by chance alone was less than 1 in 100.
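The permutation procedure can be sketched as follows. This is an illustrative approximation, not the study's code: it substitutes a plain nearest-centroid classifier with leave-one-out error for the SNCP and its cross-validation, and all function names, data, and the seed parameter are hypothetical.

```python
import random


def loo_error(profiles, labels):
    """Leave-one-out error rate of a plain nearest-centroid classifier."""
    errors = 0
    for i, (specimen, true_label) in enumerate(zip(profiles, labels)):
        groups = {}
        for j, (profile, label) in enumerate(zip(profiles, labels)):
            if j != i:
                groups.setdefault(label, []).append(profile)
        centroids = {
            label: [sum(col) / len(members) for col in zip(*members)]
            for label, members in groups.items()
        }
        predicted = min(
            centroids,
            key=lambda label: sum(
                (a - b) ** 2 for a, b in zip(specimen, centroids[label])
            ),
        )
        errors += predicted != true_label
    return errors / len(labels)


def permutation_test(profiles, labels, n_permutations=100, seed=0):
    """Return the observed error and the fraction of label permutations
    whose error is at least as low as the observed error."""
    rng = random.Random(seed)
    observed = loo_error(profiles, labels)
    hits = 0
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # break any real link between profile and label
        if loo_error(profiles, shuffled) <= observed:
            hits += 1
    return observed, hits / n_permutations
```

If no permutation matches the observed error, the chance of obtaining that result from random labels is estimated at below one in the number of permutations, which mirrors the reasoning applied to the 100 permutations described above.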

Experiment 1

[0126] In the initial application of the methodology described above, NE biopsy specimens of both subjects with completely normal esophagi (Normal-Normal, or N-N) and patients with BA but without EAC (BA-alone) were considered together as a single group, which was compared to NE biopsy specimens of patients with Barrett's esophagus with concomitant EAC (BA-CA). Centroids were 100% accurate in predicting which subject or patient was in which group in this comparison, as shown in FIG. 1. A list containing 195 genes was generated, based on their differential expression between normal esophagi from normal patients and normal esophagi from patients with EAC. Table 3 contains a few of these genes, with already known links to cancer.

Experiment 2

[0127] In an effort to further narrow the number of variables involved in the difference between NE biopsies from patients with concomitant EAC vs. subjects without EAC, NE from subjects without esophageal disease was also compared with NE from EAC patients only (i.e., excluding noncancer subjects with BA). This comparison revealed the accuracy of centroids in distinguishing these two subgroups, as shown in FIG. 2.

[0128] Experiment 3

[0129] The SNCPs also generated visual displays of centroids, showing which genes were over-expressed and which were under-expressed in NE from patients with vs. those without accompanying EAC. The genes in these displays are arrayed in order of decreasing differential expression, with the most differentially expressed genes at the top and the least differentially expressed genes at the bottom. One such typical centroid is displayed in FIG. 3.

[0130] Genes represented in a shrunken centroid derived by comparing NE tissues between cancer and non-cancer patients are shown in Table 3. Among them are many genes with previous links to esophageal cancer or to cancers in general: histone biomarkers, gravin, HLA-DRA, keratin 8 (KRT8), glutathione peroxidase 2 (GPX2), the mitotic checkpoint protein kinase BUB1B, the progestin-induced protein DD5 and transglutaminase 3.

TABLE 3
Selected genes identified by comparison of NE from patients with EAC (N with T) vs. without EAC (N without T).

Gene ID    Gene Name                                                       N with T    N without T
AB003476   gravin                                                          -0.7322      0.4707
NM_005319  H1 histone family, member 2 (H1F2)                               0.5907     -0.3797
XM_004416  H2A histone family, member L (H2AFL)                             0.5384     -0.3461
NM_003519  H2B histone family, member C (H2BFC)                             0.5062     -0.3254
NM_002273  keratin 8 (KRT8)                                                 0.3834     -0.2465
NM_015902  progestin induced protein (DD5)                                 -0.3112      0.2001
NM_003516  H2A histone family, member O (H2AFO)                             0.2322     -0.1493
XM_009572  transglutaminase 3 (TGM3)                                       -0.2078      0.1336
NM_019111  major histocompatibility complex, class II, DR alpha (HLA-DRA)   0.1695     -0.109
AF107297   mitotic checkpoint protein kinase BUB1B (BUB1B)                 -0.0626      0.0402
NM_002083  glutathione peroxidase 2 (gastrointestinal) (GPX2)               0.0614     -0.0395

Threshold value set at 2.7; N with T: gene score in the group of patients with esophageal adenocarcinoma; N without T: gene score in the group of patients without esophageal adenocarcinoma. Gene identifiers and gene names are shown in the two leftmost columns.

[0131] Previous studies have compared gene expression patterns among normal, metaplastic, and cancerous esophageal epithelia (Selaru et al., Oncogene 21(3):475-8 (2002); Xu et al., Cancer Res 62(12):3493-7 (2002); Guillem et al., Int J Cancer 88(6):856-61 (2000); Lu et al., Int J Cancer 91(3):288-94 (2001)). Moreover, a recent study by Wang et al. suggests that gene expression patterns in Barrett's esophagus are significantly closer to gene expression patterns in esophageal adenocarcinoma than to expression patterns in normal esophagus. This finding alarmingly implies that Barrett's esophagus is biologically closer to cancer than to normal esophagus (Wang et al., Oncogene 25(23):3346-56 (2006)). However, these studies have consisted of direct comparisons of these different types of esophageal epithelia to each other. In the current study, a different approach was undertaken: i.e., a comparison of the normal esophageal epithelia from patients at differing stages of esophageal neoplastic progression. This study found unique molecular signatures in normal esophageal epithelium that reflected concomitant neoplasia elsewhere in the esophagus.

[0132] The potential biologic ramifications of these results are far-reaching. The field effect found near esophageal tumors in surrounding normal epithelium has been well-described (Eads et al., Cancer Res 61(8):3410-8 (2001); Eads et al., Cancer Res 60(18):5021-6 (2000)). A recent study by Brabender et al. (Cancer Epidemiol Biomarkers Prev 14(9):2113-7 (2005)) identified a field effect by using a gene expression panel. In the current study, biopsies of normal esophagus were obtained at least 7 cm away from the tumor or Barrett's esophagus. The current findings suggest that esophageal cancer exerts a greater influence on the normal esophageal epithelium than previously known or suspected. While molecular alterations in histologically normal squamous esophageal epithelium have previously been described adjacent to cancers, the current findings suggest that alterations in gene expression and gene expression pattern accompanying cancer can affect large portions of the normal squamous esophagus. It was postulated that the development of esophageal adenocarcinoma is accompanied by widespread molecular phenotypic alterations that involve the entire normal squamous esophageal epithelium.

[0133] The present SNCP-based approach offers a number of advantages over other analytic techniques. These include the ability to differentiate among multiple specimen groups; the potential for rapid translation to the clinical setting; a low likelihood of over-fitting, yielding a low probability of erroneous diagnoses in new, independent datasets; and the capacity to yield a reduced number of diagnostic genes, which can themselves be developed as individual biomarkers as well as the basis for further molecular genetic studies (Tibshirani et al., Proc Natl Acad Sci USA 99(10):6567-72 (2002)).

[0134] In the current study, genes positioned the highest in centroids discriminating normal tissues from non-cancer vs. cancer patients were both interesting and relevant. For example, among the most highly ranked genes were members of the histone families (Table 3).

[0135] As single-gene predictors, histone biomarkers were accurate in distinguishing between accompanying cancer and its absence. Histones are basic nuclear proteins responsible for the nucleosome structure of chromosomal fibers in eukaryotes. Apart from promoter hypermethylation, modification of histone proteins is the second major component of epigenetic transcriptional control. DNA methylation and histone acetylation are integrally linked. Methylation is catalyzed by a family of DNA methyltransferases. DNA methyltransferases recruit histone deacetylases, leading to histone deacetylation and transcriptional repression. Methylated DNA is also recognized by a family of methylated DNA-binding proteins, which recruit histone deacetylases and ATP-dependent chromatin remodeling proteins, resulting in a tightly condensed chromatin structure and gene inactivation. Additional links between the "histone code" and the "cytosine methylation code" are increasingly evident (Johnstone et al., Nat Rev Drug Discov 1(4):287-99 (2002); Kouraklis et al., Curr Med Chem Anti-Canc Agents 2(4):477-84 (2002); Marks et al., Nat Rev Cancer 1(3):194-202 (2001)).

[0136] In addition, alterations of proteins in the histone acetyltransferase family (e.g., CREB-binding protein and p300) are associated with cancers of the breast, colon, liver, and hematopoietic system. Of particular relevance to the current findings, histone H4 is hyperacetylated in early stages of esophageal cancer cell invasion, and thereafter changes to a hypoacetylated state according to the degree of cancer progression (Toh et al., Oncol Rep 10(2):333-8 (2003)). These results suggest that a dynamic equilibrium between histone acetylase and deacetylase activities is disrupted in esophageal carcinogenesis, implying that an interaction may exist between hyperacetylation of histone H4 and histone deacetylase 1 expression (Toh et al., Oncol Rep 10(2):333-8 (2003)).

[0137] Similarly, by applying differential display to esophageal tumor and matched normal esophageal samples, histone H3.3 was identified among 49 cDNA ddPCR clones from esophageal cancers (ECs) (Graber et al., Ann Surg Oncol 3(2):192-7 (1996)). Histone H3.3 was overexpressed in 4/6 ECs, but not in paired normal mucosa. Only 5/13 normal human cell lines from various organs, but 11/12 human cancer cell lines (including 9 of 9 adenocarcinoma lines) overexpressed H3.3 (Graber et al., Ann Surg Oncol 3(2):192-7 (1996)). Histones H3 and H4 were deacetylated in gastric cancer cell lines showing aberrant methylation of CHFR, a mitotic checkpoint gene, suggesting a role for histone deacetylation in methylation-dependent gene silencing (Satoh et al., Cancer Res 63(24):8606-13 (2003)).

[0138] Another gene identified in the current study was HLA-DRA. Major histocompatibility complex (MHC) molecules are of central importance in regulating the immune response against tumors. Loss of expression of HLA class II molecules on tumor cells affects the onset and modulation of the immune response through lack of activation of CD4+ T lymphocytes. In part, loss of expression is caused by mutations as shown for large B-cell lymphoma (Jordanova et al., Immunogenetics 55(4):203-9 (2003)). A recent study found downregulation of HLA-DRA in invasive cancers compared to dysplastic cervical lesions (Chil et al., Acta Obstet Gynecol Scand 82(12):1146-52 (2003)).

[0139] A strong predictive value of keratin 8 (KRT8) was also observed. KRT8 belongs to the intermediate filament family and associates with keratin 18 to form a heterotetramer of two type I and two type II keratins. Its phosphorylation on serine residues is enhanced during EGF stimulation and mitosis. Dysregulation of keratin 8 is associated with esophageal carcinogenesis (Boch et al., Gastroenterology 112(3):760-5 (1997); Glickman et al., Am J Surg Pathol 25(5):569-78 (2001); Glickman et al., Am J Surg Pathol 25(1):87-94 (2001); Salo et al., Ann Med 28(4):305-9 (1996)).

[0140] Additional genes with known relevance to human cancer identified by this SNCP model included glutathione peroxidase 2 (GPX2), the mitotic checkpoint protein kinase BUB1B, and the progestin-induced protein DD5. As expected, BUB1B was expressed at high levels in the normal esophageal tissues of patients without cancer and underexpressed in patients with cancer. BUB1B is a component of the mitotic checkpoint that delays anaphase until all chromosomes are properly attached to the mitotic spindle. In BRCA2-deficient murine cells, BUB1 mutants potentiate growth and cellular transformation (Davenport et al., Genomics 55(1):113-7 (1999)). In addition, mutations in human BUB1B have demonstrated a dominant negative effect by disrupting the mitotic checkpoint when transfected into euploid colon cancer cell lines (Davenport et al., Genomics 55(1):113-7 (1999)). Thus, BUB1B is a candidate tumor suppressor gene in the esophagus whose downregulation in normal esophageal tissue is associated with cancer development.

[0141] Transglutaminase 3, which was under-expressed in the normal tissue of tumor patients in this study, was recently found to be down-regulated in esophageal squamous cell carcinoma and head and neck squamous cell carcinoma by cDNA microarray studies comparing cancer and matching normal tissue (Luo et al., Oncogene 23(6):1291-9 (2004)).

[0142] In conclusion, the current study diagnosed patients with remote esophageal neoplasia based on biopsies of their remote normal epithelium alone, and provided a minimal list of genes necessary to do so. This proof-of-principle study establishes a theoretical basis to identify cancers in other organs by studying gene expression patterns or other molecular signatures in their matching normal epithelia. In addition, by shrinking the number of genes needed to arrive at a correct diagnosis, the current work showcases an approach to identify smaller numbers of genes worthy of further research from microarray data, both as biomarkers and for biologic or functional studies.

[0143] While the foregoing specification teaches the principles of the present invention, with examples provided for the purpose of illustration, it will be appreciated by one skilled in the art from reading this disclosure that various changes in form and detail can be made without departing from the true scope of the invention.

[0144] Each of the publications recited herein, including journal articles, books, manuals, abstracts, posters, patents, and published patent applications, is hereby incorporated herein in its entirety.

Sequence CWU 1

1

2216287DNAHomo sapiens 1ggcagctccg agggcacctc cggttctccc ccatcctccg ggagtgtctg ggcgctcagt 60ccgctctgat cccgccgaaa ccacctgcgg ttggcaggca ggagactagg cgtctgccgg 120ggagggcagg gacccgctaa gctgatctcc tgtacagtag tgctacttaa aatatgctgg 180ggaccatcac catcacagtt ggacagagag actctgaaga tgtgagcaaa agagactccg 240ataaagagat ggctactaag tcagcggttg ttcacgacat cacagatgat gggcaggagg 300agacacccga aataatcgaa cagattcctt cttcagaaag caatttagaa gagctaacac 360aacccactga gtcccaggct aatgatattg gatttaagaa ggtgtttaag tttgttggct 420ttaaattcac tgtgaaaaag gataagacag agaagcctga cactgtccag ctactcactg 480tgaagaaaga tgaaggggag ggagcagcag gggctggcga ccacaaggac cccagccttg 540gggctggaga agcagcatcc aaagaaagcg aacccaaaca atctacagag aaacccgaag 600agaccctgaa gcgtgagcaa agccacgcag aaatttctcc cccagccgaa tctggccaag 660cagtggagga atgcaaagag gaaggagaag agaaacaaga aaaagaacct agcaagtctg 720cagaatctcc gactagtccc gtgaccagtg aaacaggatc aaccttcaaa aaattcttca 780ctcaaggttg ggccggctgg cgcaaaaaga ccagtttcag gaagccgaag gaggatgaag 840tggaagcttc agagaagaaa aaggaacaag agccagaaaa agtagacaca gaagaagacg 900gaaaggcaga ggttgcctcc gagaaactga ccgcctccga gcaagcccac ccacaggagc 960cggcagaaag tgcccacgag ccccggttat cagctgaata tgagaaagtt gagctgccct 1020cagaggagca agtcagtggc tcgcagggac cttctgaaga gaaacctgct ccgttggcga 1080cagaagtgtt tgatgagaaa atagaagtcc accaagaaga ggttgtggcc gaagtccacg 1140tcagcaccgt ggaggagaga accgaagagc agaaaacgga ggtggaagaa acagcagggt 1200ctgtgccagc tgaagaattg gttgaaatgg atgcagaacc tcaggaagct gaacctgcca 1260aggagctggt gaagctcaaa gaaacgtgtg tttccggaga ggaccctaca cagggagctg 1320acctcagtcc tgatgagaag gtgctgtcca aaccccccga aggcgttgtg agtgaggtgg 1380aaatgctgtc atcacaggag agaatgaagg tgcagggaag tccactaaag aagcttttta 1440ccagcactgg cttaaaaaag ctttctggaa agaaacagaa agggaaaaga ggaggaggag 1500acgaggaatc aggggagcac actcaggttc cagccgattc tccggacagc caggaggagc 1560aaaagggcga gagctctgcc tcatcccctg aggagcccga ggagatcacg tgtctggaaa 1620agggcttagc cgaggtgcag caggatgggg aagctgaaga aggagctact tccgatggag 1680agaaaaaaag agaaggtgtc actccctggg 
catcattcaa aaagatggtg acgcccaaga 1740agcgtgttag acggccttcg gaaagtgata aagaagatga gctggacaag gtcaagagcg 1800ctaccttgtc ttccaccgag agcacagcct ctgaaatgca agaagaaatg aaagggagcg 1860tggaagagcc aaagccggaa gaaccaaagc gcaaggtgga tacctcagta tcttgggaag 1920ctttaatttg tgtgggatca tccaagaaaa gagcaaggag agggtcctct tctgatgagg 1980aagggggacc aaaagcaatg ggaggagacc accagaaagc tgatgaggcc ggaaaagaca 2040aagagacggg gacagacggg atccttgctg gttcccaaga acatgatcca gggcagggaa 2100gttcctcccc ggagcaagct ggaagcccta ccgaagggga gggcgtttcc acctgggagt 2160catttaaaag gttagtcacg ccaagaaaaa aatcaaagtc caagctggaa gagaaaagcg 2220aagactccat agctgggtct ggtgtagaac attccactcc agacactgaa cccggtaaag 2280aagaatcctg ggtctcaatc aagaagttta ttcctggacg aaggaagaaa aggccagatg 2340ggaaacaaga acaagcccct gttgaagacg cagggccaac aggggccaac gaagatgact 2400ctgatgtccc ggccgtggtc cctctgtctg agtatgatgc tgtagaaagg gagaaaatgg 2460aggcacagca agcccaaaaa agcgcagagc agcccgagca gaaggcagcc actgaggtgt 2520ccaaggagct cagcgagagt caggttcata tgatggcagc agctgtcgct gacgggacga 2580gggcagctac cattattgaa gaaaggtctc cttcttggat atctgcttca gtgacagaac 2640ctcttgaaca agtagaagct gaagccgcac tgttaactga ggaggtattg gaaagagaag 2700taattgcaga agaagaaccc cccacggtta ctgaacctct gccagagaac agagaggccc 2760ggggcgacac ggtcgttagt gaggcggaat tgacccccga agctgtgaca gctgcagaaa 2820ctgcagggcc attgggtgcc gaagaaggaa ccgaagcatc tgctgctgaa gagaccacag 2880aaatggtgtc agcagtctcc cagttaaccg actccccaga caccacagag gaggccactc 2940cggtgcagga ggtggaaggt ggcgtacctg acatagaaga gcaagagagg cggactcaag 3000aggtcctcca ggcagtggca gaaaaagtga aagaggaatc ccagctgcct ggcaccggtg 3060ggccagaaga tgtgcttcag cctgtgcaga gagcagaggc agaaagacca gaagagcagg 3120ctgaagcgtc gggtctgaag aaagagacgg atgtagtgtt gaaagtagat gctcaggagg 3180caaaaactga gccttttaca caagggaagg tggtggggca gaccacccca gaaagctttg 3240aaaaagctcc tcaagtcaca gagagcatag agtccagtga gcttgtaacc acttgtcaag 3300ccgaaacctt agctggggta aaatcacagg agatggtgat ggaacaggct atcccccctg 3360actcggtgga aacccctaca gacagtgaga ctgatggaag cacccccgta gccgactttg 
3420acgcaccagg cacaacccag aaagacgaga ttgtggaaat ccatgaggag aatgaggtcg 3480catctggtac ccagtcaggg ggcacagaag cagaggcagt tcctgcacag aaagagaggc 3540ctccagcacc ttccagtttt gtgttccagg aagaaactaa agaacaatca aagatggaag 3600acactctaga gcatacagat aaagaggtgt cagtggaaac tgtatccatt ctgtcaaaga 3660ctgaggggac tcaagaggct gaccagtatg ctgatgagaa aaccaaagac gtaccatttt 3720tcgaaggact tgaggggtct atagacacag gcataacagt cagtcgggaa aaggtcactg 3780aagttgccct taaaggtgaa gggacagaag aagctgaatg taaaaaggat gatgctcttg 3840aactgcagag tcacgctaag tctcctccat cccccgtgga gagagagatg gtagttcaag 3900tcgaaaggga gaaaacagaa gcagagccaa cccatgtgaa tgaagagaag cttgagcacg 3960aaacagctgt taccgtatct gaagaggtca gtaagcagct cctccagaca gtgaatgtgc 4020ccatcataga tggagcaaag gaagtcagca gtttggaagg aagccctcct ccctgcctag 4080gtcaagagga ggcagtatgc accaaaattc aagttcagag ctctgaggca tcattcactc 4140taacagcggc tgcagaggag gaaaaggtct taggagaaac tgccaacatt ttagaaacag 4200gtgaaacgtt ggagcctgca ggtgcacatt tagttctgga agagaaatcc tctgaaaaaa 4260atgaagactt tgccgctcat ccaggggaag atgctgtgcc cacagggccc gactgtcagg 4320caaaatcgac accagtgata gtatctgcta ctaccaagaa aggcttaagt tccgacctgg 4380aaggagagaa aaccacatca ctgaagtgga agtcagatga agtcgatgag caggttgctt 4440gccaggaggt caaagtgagt gtagcaattg aggatttaga gcctgaaaat gggattttgg 4500aacttgagac caaaagcagt aaacttgtcc aaaacatcat ccagacagcc gttgaccagt 4560ttgtacgtac agaagaaaca gccaccgaaa tgttgacgtc tgagttacag acacaagctc 4620acgtgataaa agctgacagc caggacgctg gacaggaaac ggagaaagaa ggagaggaac 4680ctctggcctc tgcacaggat gaaacaccaa ttacttcagc caaagaggag tcagagtcaa 4740ccgcagtggg acaagcacat tctgatattt ccaaagacat gagtgaagcc tcagaaaaga 4800ccatgactgt tgaggtagaa ggttccactg taaatgatca gcagctggaa gaggtcgtcc 4860tcccatctga ggaagaggga ggtggagctg gaacaaagtc tgtgccagaa gatgatggtc 4920atgccttgtt agcagaaaga atagagaagt cactagttga accgaaagaa gatgaaaaag 4980gtgatgatgt tgatgaccct gaaaaccaga actcagccct ggctgatact gatgcctcag 5040gaggcttaac caaagagtcc ccagatacaa atggaccaaa acaaaaagag aaggaggatg 5100cccaggaagt agaattgcag gaaggaaaag 
tgcacagtga atcagataaa gcgatcacac 5160cccaagcaca ggaggagtta cagaaacaag agagagaatc tgcaaagtca gaacttacag 5220aatcttaaaa catcatgcag ttaaactcat tgtctgtttg gaagaccaga atgtgaagac 5280aagtagtaga agaaaatgaa tgctgctgct gagactgaag accagtattt cagaactttg 5340agaattggag agcaggcaca tcaactgatc tcatttctag agagcccctg acaatcctga 5400ggcttcatca ggagctagag ccatttaaca tttcctcttt ccaagaccaa cctacaattt 5460tcccttgata accatataaa ttctgattta aggtcctaaa ttcttaacct ggaactggag 5520ttggcaatac ctagttctgc ttctgaaact ggagtatcat tctttacata tttatatgta 5580tgttttaagt agtcctcctg tatctattgt atattttttt cttaatgttt aaggaaatgt 5640gcaggatact acatgctttt tgtatcacac agtatatgat ggggcatgtg ccatagtgca 5700ggcttgggga gctttaagcc tcagttatat aacccacgaa aaacagagcc tcctagatgt 5760aacattcctg atcaaggtac aattctttaa aattcactaa tgattgaggt ccatatttag 5820tggtactctg aaattggtca ctttcctatt acacggagtg tgctaaaact aaaaagcatt 5880ttgaaacata cagaatgttc tattgtcatt gggaaatttt tctttctaac ccagtggagg 5940ttagaaagaa gttatattct ggtagcaaat taactttaca tcctttttcc tacttgttat 6000ggttgtttgg accgataagt gtgcttaatc ctgaggcaaa gtagtgaata tgttttatat 6060gttatgaaga aaagaattgt tgtaagtttt tgattctact cttatatgct ggactgcatt 6120cacacatggc atgaaataag tcaggttctt tacaaatggt attttgatag atactggatt 6180gtgtttgtgc catatttgtg ccattctttt aagaacaatg ttgcaacaca ttcatttgga 6240taagttgtga tttgacgact gatttaaata aaatatttgc ttcactt 62872732DNAHomo sapiens 2catcggcgct ttgccacttg tacccgagtt tttgattctc aacatgtccg agactgctcc 60tgccgctccc gctgccgcgc ctcctgcgga gaaggcccct gtaaagaaga aggcggccaa 120aaaggctggg ggtacgcctc gtaaggcgtc tggtcccccg gtgtcagagc tcatcaccaa 180ggctgtggcc gcctctaaag agcgtagcgg agtttctctg gctgctctga aaaaagcgtt 240ggctgccgcc ggctatgatg tggagaaaaa caacagccgt atcaaacttg gtctcaagag 300cctggtgagc aagggcactc tggtgcaaac gaaaggcacc ggtgcttctg gctcctttaa 360actcaacaag aaggcagcct ccggggaagc caagcccaag gttaaaaagg cgggcggaac 420caaacctaag aagccagttg gggcagccaa gaagcccaag aaggcggctg gcggcgcaac 480tccgaagaag agcgctaaga aaacaccgaa gaaagcgaag aagccggccg cggccactgt 
540aaccaagaaa gtggctaaga gcccaaagaa ggccaaggtt gcgaagccca agaaagctgc 600caaaagtgct gctaaggctg tgaagcccaa ggccgctaag cccaaggttg tcaagcctaa 660gaaggcggcg cccaagaaga aataggcgaa cgcctacttc taaaacccaa aaggctcttt 720tcagagccac ca 73231657DNAHomo sapiens 3aaagcggcca tgttttacat atttcttgat tttgtttgtt ttctcgtgag cttaggccgc 60tggttttggt gatttttgtc tgattgcaat gtctggacgt ggtaagcaag gaggcaaagc 120tcgcgccaaa gcgaaatccc gctcttctcg cgctggtctc cagttcccgg tgggccgagt 180gcaccgcctg ctccgtaaag gcaactacgc agagcgggtt ggggcaggcg cgccggtgta 240cctggcggcg gtgttagagt acctgaccgc cgagatcctg gagctggccg gcaacgcggc 300tcgcgacaac aagaagactc gcatcatccc gcgccacttg cagctggcca tccgcaacga 360cgaggagctc aacaaactgc taggccgggt gaccattgct cagggcggcg tccttcctaa 420catccaggcc gtgcttctgc ctaagaagac cgagagtcac cacaaggcca agggcaagtg 480atttgacagg tatctgagct cccggaaacg ctatcaaacc caaaggctct tttcagagcc 540cccctaccgt ttcaaaggaa gagctaacct cactgcttgt aggtagaagg aaaaaaggca 600ctaaggttgc aaaagcttct catttcagag agatgccagg atcctaagtg cctgccaaac 660ttaccaattc taaggaataa gtggatggat ggcattactg attcctacat tactgattga 720ttctgcatcc gcaaattgtt ttattaaaaa cattctacat catgtgtggg gagataagga 780ggataaaatg aagagaaaga atattattga ggggaagttc ttctgaatac aaaatgtgtt 840taatttttta aataagtatt acattcacag ggttcaaact atttgaagta aagagattat 900atataaagaa tccatccctc aacttaccca ggtggtcact tttctttttc ttgtgtatct 960gcccagtatt cattcctgct gatatcagtc aataatgaat gatacgtgtt ttcttcactt 1020ttttcattct tgtcaggtag cagactgtgt agacttttct gcacttgccc ttttcataac 1080aatctatctt ggagaacttt ccctatgaga acatacagag cttcctgtac acagttgcat 1140gtactgcatt atgcaaatgc attatatttt atgtaacctg tccactgttg gtaggcactt 1200gagttgtttt agtcttttgc tatcaaacag ttctgggatg attaaccctg atttactgca 1260aaattgaaat tgctctgcta ttctgctgga atggtggtaa gtgaactgaa aattccagtc 1320actcttgggc tagactcaac gttcttaaaa actatgtggc catcaccaaa ttagttattt 1380tgaaccttaa tttcttcacc tctaaaatgg aggtaatact taccttaagt ggctatgaga 1440atgaagatca tgtgtatgaa ttgttggtgc tctaaagaac agcacaaata aaattatttt 1500caaatttaat 
tttaattgaa ctatgtgtaa tttcttaatt ttgaaataat tttatttgta 1560atgtgcataa tcttatttaa tgtataatgt atacattgta atagaaacag atttcccaaa 1620ttccagcctg gcatgaggta ataaaaggta atgcaaa 16574453DNAHomo sapiens 4ttggttttgc cactattgtt tcattatgcc cgagctggcc aagtctgctc ccgccccgaa 60gaagggctcc aagaaggcgg tgaccaaggc ccagaagaag gatggcaaga agcgcaagcg 120cagccgcaag gagagctact ccgtgtacgt gtacaaggtg ctgaagcagg tccaccccga 180caccggcatc tcttctaagg ccatgggaat catgaactcc ttcgtcaacg acatcttcga 240gcgcatcgca agcgaggctt cccgcctggc gcactacaac aagcgctcga ccatcacctc 300cagggagatc cagaccgccg tgcgcctgct gcttccgggg gagctggcca agcacgcggt 360gtcggagggc accaaggccg tcaccaagta caccagctcc aagtaaattc tcaagctctt 420gtccaaccca aaggctcttt tcagagccac tca 45351788DNAHomo sapiens 5attcctgaga gctctcctca ccaagaagca gcttctccgc tccttctagg atctccgcct 60ggttcggccc gcctgcctcc actcctgcct ctaccatgtc catcagggtg acccagaagt 120cctacaaggt gtccacctct ggcccccggg ccttcagcag ccgctcctac acgagtgggc 180ccggttcccg catcagctcc tcgagcttct cccgagtggg cagcagcaac tttcgcggtg 240gcctgggcgg cggctatggt ggggccagcg gcatgggagg catcaccgca gttacggtca 300accagagcct gctgagcccc cttgtcctgg aggtggaccc caacatccag gccgtgcgca 360cccaggagaa ggagcagatc aagaccctca acaacaagtt tgcctccttc atagacaagg 420tacggttcct ggagcagcag aacaagatgc tggagaccaa gtggagcctc ctgcagcagc 480agaagacggc tcgaagcaac atggacaaca tgttcgagag ctacatcaac aaccttaggc 540ggcagctgga gactctgggc caggagaagc tgaagctgga ggcggagctt ggcaacatgc 600aggggctggt ggaggacttc aagaacaagt atgaggatga gatcaataag cgtacagaga 660tggagaacga atttgtcctc atcaagaagg atgtggatga agcttacatg aacaaggtag 720agctggagtc tcgcctggaa gggctgaccg acgagatcaa cttcctcagg cagctatatg 780aagaggagat ccgggagctg cagtcccaga tctcggacac atctgtggtg ctgtccatgg 840acaacagccg ctccctggac atggacagca tcattgctga ggtcaaggca cagtacgagg 900atattgccaa ccgcagccgg gctgaggctg agagcatgta ccagatcaag tatgaggagc 960tgcagagcct ggctgggaag cacggggatg acctgcggcg cacaaagact gagatctctg 1020agatgaaccg gaacatcagc cggctccagg ctgagattga gggcctcaaa ggccagaggg 1080cttccctgga 
ggccgccatt gcagatgccg agcagcgtgg agagctggcc attaaggatg 1140ccaacgccaa gttgtccgag ctggaggccg ccctgcagcg ggccaagcag gacatggcgc 1200ggcagctgcg tgagtaccag gagctgatga acgtcaagct ggccctggac atcgagatcg 1260ccacctacag gaagctgctg gagggcgagg agagccggct ggagtctggg atgcagaaca 1320tgagtattca tacgaagacc accagcggct atgcaggtgg tctgagctcg gcctatgggg 1380gcctcacaag ccccggcctc agctacagcc tgggctccag ctttggctct ggcgcgggct 1440ccagctcctt cagccgcacc agctcctcca gggccgtggt tgtgaagaag atcgagacac 1500gtgatgggaa gctggtgtct gagtcctctg acgtcctgcc caagtgaaca gctgcggcag 1560cccctcccag cctacccctc ctgcgctgcc ccagagcctg ggaaggaggc cgctatgcag 1620ggtagcactg ggaacaggag acccacctga ggctcagccc tagccctcag cccacctggg 1680gagtttacta cctggggacc ccccttgccc atgcctccag ctacaaaaca attcaattgc 1740tttttttttt tggtccaaaa taaaacctca gctagctctg ccaatgtc 178869410DNAHomo sapiens 6cgccctcgag tggaggacga gaaggaaagc accatgacgt ccatccattt cgtggttcac 60ccgctgccgg gcaccgagga ccagctcaat gacaggttac gagaagtttc tgagaagctg 120aacaaatata atttaaacag ccacccccct ttgaatgtat tggaacaggc tactattaaa 180cagtgtgtgg tgggaccaaa tcatgctgcc tttcttcttg aggatggtag agtttgcagg 240attggttttt cagtacagcc agacagattg gaattgggta aacctgataa taatgatggg 300tcaaagttga acagcaactc gggggcaggg aggacgtcaa ggcctggtag gacaagcgac 360tctccatggt ttctctcagg ttctgagact ctaggcaggc tggcaggcaa caccttagga 420agccgctgga gttctggagt gggtggaagt ggtggaggat cctctggtag gtcatcagct 480ggagctcgag attcccgccg gcagactcga gttattcgga caggacggga tcgagggtct 540gggcttttgg gcagtcagcc ccagccagtt attccagcat ctgtcattcc agaggagctg 600atttcacagg cccaagttgt tttacaaggc aaatccagaa gtgtcattat tcgagaactt 660cagagaacaa atcttgatgt gaaccttgct gtaaataatt tacttagccg ggatgatgaa 720gatggagatg atggggatga tacagccagc gaatcttatt tgcctggaga ggatcttatg 780tctctccttg atgccgacat tcattctgcc cacccaagtg tcattattga tgcagatgcc 840atgttttctg aagacattag ctattttggt tacccttctt ttcgtcgttc atcactttcc 900aggctaggct catctcgagt tctccttctt cccttagaga gagactctga gctgttgcgt 960gaacgtgaat ccgttttacg tttacgtgaa cgaaggtggc ttgatggagc 
ctcatttgat 1020aatgaaaggg gttctaccag caaggaagga gagccaaact tggataagaa gaatacacct 1080gttcaaagtc cagtatctct aggagaagat ttgcagtggt ggcctgataa ggatggaaca 1140aaattcatct gtattggggc tctgtattct gaacttctgg ctgtcagcag taaaggagaa 1200ctttatcagt ggaaatggag tgaatctgag ccttacagaa atgcccagaa tccttcatta 1260catcatccac gagcaacatt tttggggtta accaatgaaa agatagtcct cctgtctgca 1320aatagcataa gagcaactgt agctacagaa aataacaagg ttgctacatg ggtggatgaa 1380actttaagtt ctgtggcttc taaattagag cacactgctc agacttactc tgaacttcaa 1440ggagagcgga tagtttcttt acattgctgt gccctttaca cctgcgctca gctggaaaac 1500agtttatatt ggtggggtgt agttcctttt agtcaaagga agaaaatgtt agagaaagct 1560agagcaaaaa ataaaaagcc taaatccagt gctggtattt cttcaatgcc gaacatcact 1620gttggtaccc aggtatgctt gagaaataat cctctttatc atgctggagc agttgcattt 1680tcaattagtg ctgggattcc taaagttggt gtcttaatgg agtcagtttg gaatatgaat 1740gacagctgta gatttcaact tagatctcct gaaagcttga aaaacatgga aaaagctagc 1800aaaactactg aagctaagcc tgaaagtaag caggagccag tgaaaacaga aatgggtcct 1860ccaccatctc cagcatccac gtgtagtgat gcatcctcaa ttgccagcag tgcatcaatg 1920ccatacaaac gacgacggtc aacccctgca ccaaaagaag aggaaaaggt gaatgaagag 1980cagtggtctc ttcgggaagt ggtttttgtg gaagatgtca agaatgttcc tgttggcaag 2040gtgctaaaag tagatggtgc ctatgttgct gtaaaatttc caggaacctc cagtaatact 2100aactgtcaga acagctctgg tccagatgct gacccttctt ctctcctgca ggattgtagg 2160ttacttagaa ttgatgaatt gcaggttgtc aaaactggtg gaacaccgaa ggttcccgac 2220tgtttccaaa ggactcctaa aaagctttgt atacctgaaa aaacagaaat attagcagtg 2280aatgtagatt ccaaaggtgt tcatgctgtt ctgaagactg gaaattgggt acgatactgt 2340atctttgatc ttgctacagg aaaagcagaa caggaaaata attttcctac aagcagcatt 2400gctttccttg gtcagaatga gaggaatgta gccattttca ctgctggaca ggaatctccc 2460attattcttc gagatggaaa tggtaccatc tacccaatgg ccaaagattg catgggagga 2520ataagggatc ccgattggct ggatcttcca cctattagta gtcttggaat gggtgtgcat 2580tctttaataa atcttcctgc caattcaaca atcaaaaaga aagctgctgt tatcatcatg 2640gctgtagaga aacaaacctt aatgcaacac attctgcgct gtgactatga ggcctgtcga 2700caatatctaa tgaatcttga 
gcaagcggtt gttttagagc agaatctaca gatgctgcag 2760acattcatca gccacagatg tgatggaaat cgaaatattt tgcatgcttg tgtatcagtt 2820tgctttccaa ccagcaataa agaaactaaa gaagaagagg aagcggagcg ttctgaaaga 2880aatacatttg cagaaaggct ttctgctgtt gaggccattg caaatgcaat atcagttgtt 2940tcaagtaatg gcccaggtaa tcgggctgga tcatcaagta gccgaagttt gagattacgg 3000gaaatgatga gacgttcgtt gagagcagct ggtttgggta gacatgaagc tggagcttca 3060tccagtgacc accaggatcc agtttcaccc cccatagctc cccctagttg ggttcctgac 3120cctcctgcga tggatcctga tggtgacatt gattttatcc tggcccccgc tgtgggatct 3180cttaccacag cagcaaccgg tactggtcaa ggaccaagca cctccactat tccaggtcct 3240tccacagagc catctgtagt agaatccaag gatcgaaagg cgaatgctca ttttatattg 3300aaattgttat gtgacagtgt ggttctccag ccctatctac gagaacttct ttctgccaag 3360gatgcaagag ggatgacccc atttatgtca gctgtaagtg gccgagctta tcctgctgca 3420attaccatct tagaaactgc tcagaaaatt gcaaaagctg aaatatcctc aagtgaaaaa 3480gaggaagatg tattcatggg aatggtttgc ccatcaggta ccaaccctga tgactctcct 3540ttatatgttt tatgttgtaa tgacacttgc agttttacat ggactggagc agagcacatt 3600aaccaggata tttttgagtg tcgaacttgt ggcttgctgg agtcactgtg ttgttgtacg 3660gaatgtgcaa gggtttgtca taaaggtcat gattgcaaac tcaaacggac atcaccaaca 3720gcctactgtg attgttggga gaaatgtaaa tgtaaaactc ttattgctgg acagaaatct 3780gctcgtcttg atctacttta tcgcctgctc actgctacta atctggttac tctgccaaac 3840agcaggggag agcacctctt actattctta gtacagacag tcgcaaggca gacggtggag
3900cattgtcaat acaggccacc tcgaatcagg gaagatcgta accgaaaaac agccagtcct 3960gaagattcag atatgccaga tcatgattta gagcctccaa gatttgccca gcttgcattg 4020gagcgtgttc tacaggactg gaatgccttg aaatctatga ttatgtttgg gtcgcaggag 4080aataaagacc ctcttagtgc cagcagtaga ataggccatc ttttgccaga agagcaagta 4140tacctcaatc agcaaagtgg cacaattcgg ctggactgtt tcactcattg ccttatagtt 4200aagtgtacag cagatatttt gcttttagat actctactag gtacactagt gaaagaactc 4260caaaacaaat atacacctgg acgtagagaa gaagctattg ctgtgacaat gaggtttcta 4320cgttcagtgg caagagtttt tgttattctg agtgtggaaa tggcttcatc caaaaagaaa 4380aacaacttta ttccacagcc aattggaaaa tgcaagcgtg tattccaagc attgctacct 4440tacgctgtgg aagaattgtg caacgtagca gagtcactga ttgttcctgt cagaatgggg 4500attgctcgtc caactgcacc atttaccctg gctagtacta gcatagatgc catgcagggc 4560agtgaagaat tattttcagt ggaaccacta ccaccacgac catcatctga tcagtctagc 4620agctccagtc agtctcagtc atcctacatc atcaggaatc cacagcagag gcgcatcagc 4680cagtcacagc ccgttcgggg cagagatgaa gaacaggatg atattgtttc agcagatgtg 4740gaagaggttg aggtggtgga gggtgtggct ggagaagagg atcatcatga tgaacaggaa 4800gaacacgggg aagaaaatgc tgaggcagag ggacaacatg atgagcatga tgaagacggg 4860agtgatatgg agctggactt gttagcagca gctgaaacag aaagtgatag tgaaagtaac 4920cacagcaacc aagataatgc tagtgggcgc agaagcgttg tcactgcagc aactgctggt 4980tcagaagcag gagcaagcag tgttcctgcc ttcttttctg aagatgattc tcaatcgaat 5040gactcaagtg attctgatag cagtagtagt cagagtgacg acatagaaca ggagaccttt 5100atgcttgatg agccattaga aagaaccaca aatagctccc atgccaatgg tgctgcccaa 5160gctccccgtt caatgcagtg ggctgtccgc aacacccagc atcagcgagc agccagtaca 5220gccccttcca gtacatctac accagcagca agttcagcgg gtttgattta tattgatcct 5280tcaaacttac gccggagtgg taccatcagt acaagtgctg cagctgcagc agctgctttg 5340gaagctagca acgccagcag ttacctaaca tctgcaagca gtttagccag ggcttacagc 5400attgtcatta gacaaatctc ggacttgatg ggccttattc ctaagtataa tcacctagta 5460tactctcaga ttccagcagc tgtgaaattg acttaccaag atgcagtaaa cttacagaac 5520tatgtagaag aaaagcttat tcccacttgg aactggatgg tcagtattat ggattctact 5580gaagctcaat tacgttatgg ttctgcatta 
gcatctgctg gtgatcctgg acatccaaat 5640catcctcttc acgcttctca gaattcagcg agaagagaga ggatgactgc gcgagaagaa 5700gctagcttac gaacacttga aggcagacga cgtgccacct tgcttagcgc ccgtcaagga 5760atgatgtctg cacgaggaga cttcctaaat tatgctctgt ctctaatgcg gtctcataat 5820gatgagcatt ctgatgttct tccagttttg gatgtttgct cattgaagca tgtggcatat 5880gtttttcaag cacttatata ctggattaag gcaatgaatc agcagacaac attggataca 5940cctcaactag aacgcaaaag gacgcgagaa ctcttggaac tgggtattga taatgaagat 6000tcagaacatg aaaatgatga tgacaccaat caaagtgcta ctttgaatga taaggatgat 6060gactctcttc ctgcagaaac tggccaaaac catccatttt tccgacgttc agactccatg 6120acattccttg ggtgtatacc cccaaatcca tttgaagtgc ctctggctga agccatcccc 6180ttggctgatc agccacatct gttgcagcca aatgctagaa aggaggatct ttttggccgt 6240ccaagtcagg gtctttattc ttcatctgcc agtagtggga aatgtttaat ggaggttaca 6300gtggatagaa actgcctaga ggttcttcca acaaaaatgt cttatgctgc caatctgaaa 6360aatgtaatga acatgcaaaa ccggcaaaaa aaagaagggg aagaacagcc cgtgctgcca 6420gaagaaactg agagttcaaa accagggcca tctgctcatg atcttgctgc acaattaaaa 6480agtagcttac tagcagaaat aggacttact gaaagtgaag ggccacctct cacatctttc 6540aggccacagt gtagctttat gggaatggtt atttcccatg atatgctgct aggacgttgg 6600cgcctttctt tagaactgtt cggcagggta ttcatggaag atgttggagc agaacctgga 6660tcaatcctaa ctgaattggg tggttttgag gtaaaagaat caaaattccg cagagaaatg 6720gaaaaactga gaaaccagca gtcaagagat ttgtcactag aggttgatcg ggatcgagat 6780cttctcattc agcagactat gaggcagctt aacaatcact ttggtcgaag atgtgctact 6840acaccaatgg ctgtacacag agtaaaagtc acatttaagg atgagccagg agagggcagt 6900ggtgtagcac gaagttttta tacagccatt gcacaagcat ttttatcaaa tgaaaaattg 6960ccaaatctag agtgtatcca aaatgccaac aaaggcaccc acacaagttt aatgcagaga 7020ttaaggaacc gaggagagag agaccgggaa agggagagag aaagggaaat gaggaggagt 7080agtggtttgc gagcaggttc tcggagggac cgggatagag actttagaag acagctttcc 7140atcgacacta ggccctttag accagcctct gaagggaatc ctagcgatga tcctgagcct 7200ttgccagcac atcggcaggc acttggagag aggctttatc ctcgtgtaca agcaatgcaa 7260ccagcatttg caagtaaaat cactggcatg ttgttggaat tatccccagc tcagctgctt 
7320ctccttctag caagtgagga ttctctgaga gcaagagtgg atgaggccat ggaactcatt 7380attgcacatg gacgggaaaa tggagctgat agtatcctgg atcttggatt agtagactcc 7440tcagaaaagg tacagcagga aaaccgaaag cgccatggct ctagtcgaag tgtagtagat 7500atggatttag atgatacaga tgatggtgat gacaatgccc ctttgtttta ccaacctggg 7560aaaagaggat tttatactcc aaggcctggc aagaacacag aagcaaggtt gaattgtttc 7620agaaacattg gcaggattct tggactatgt ctgttacaga atgaactatg tcctatcaca 7680ttgaatagac atgtaattaa agtattgctt ggtagaaaag tcaattggca tgattttgct 7740ttttttgatc ctgtaatgta tgagagtttg cggcaactaa tcctcgcgtc tcagagttca 7800gatgctgatg ctgttttctc agcaatggat ttggcatttg caattgacct gtgtaaagaa 7860gaaggtggag gacaggttga actcattcct aatggtgtaa atataccagt cactccacag 7920aatgtatatg agtatgtgcg gaaatacgca gaacacagaa tgttggtagt tgcagaacag 7980cccttacatg caatgaggaa aggtctacta gatgtgcttc caaaaaattc attagaagat 8040ttaacggcag aagattttag gcttttggta aatggctgcg gtgaagtcaa tgtgcaaatg 8100ctgatcagtt ttacctcttt caatgatgaa tcaggagaaa atgctgagaa gcttctgcag 8160ttcaagcgtt ggttctggtc aatagtagag aagatgagca tgacagaacg acaagatctt 8220gtttactttt ggacatcaag cccatcactg ccagccagtg aagaaggatt ccagcctatg 8280ccctcaatca caataagacc accagatgac caacatcttc ctactgcaaa tacttgcatt 8340tctcgacttt acgtcccact ctattcctct aaacagattc tcaaacagaa attgttactc 8400gccattaaga ccaagaattt tggttttgtg tagagtataa aaagtgtgta ttgctgtgta 8460atattactag caaattttgt agattttttt ccatttgtct ataaaagttt atggaagtta 8520atgctgtcat acccccctgg tggtacctta aagagataaa atgcagacat tccttgctga 8580gtttatagct taaaggccta aggagcacta gcaacatttg gctatattgg tttgctagtc 8640accaacttct gggtctaacc ccagccaaag atgacagcag aacaacataa tttacactgt 8700gatttatctt tttgctgagg gggaaaaaat gtaaatgttc tgaaaattca ctgctgcctt 8760tgtggaaact gtttcagcaa aggttcttgt atagagggaa tagggaattt caaaataaaa 8820aattaagtat gttctgtgtt ttcattttaa ctttttttat ggtgtttaat ttgtggttgg 8880ctgcaactgt gtatcatgta tatggaactt gtaaaaaagt tctcgacatt cagatcttaa 8940gagatgaaat cacttttacc tataaaaacc acttttattg cggtttgact gcattgagct 9000ctaggatatt aaatgatatc actaatattt 
tgcatgtaat ttgctcattt gagtgagggc 9060actttttttg tacatatgat ggggccaatg cacaatactt ttatcacaat caactttttc 9120tttgtatccc tatttcaatg agcagtcagt ctcaagaggt tactgcactt cagttctaac 9180tagacatttg tactaaggta tttcagttat gtaaactcag cctgggcact ttctgataac 9240tgtaaaatgt tttataagat catgattatt gaagatacat tttggaaaat tttaaatgtt 9300cgtgagcagc ttaactactt ttgtatctag ccttttttaa gtatcttgtt acatttactt 9360ttttaaataa agaaattaca gaagaaatgt caagaaaaaa aaaaaaaaaa 94107534DNAHomo sapiens 7cgactttccc gatcgccagg caggagtttc tctcggtgac tactatcgct gtcatgtctg 60gtcgtggcaa gcaaggaggc aaggcccgcg ccaaggccaa gtcgcgctcg tcccgcgctg 120gccttcagtt cccggtaggg cgagtgcatc gcttgctgcg caaaggcaac tacgcggagc 180gagtgggggc cggcgcgccc gtctacatgg ctgcggtcct cgagtatctg accgccgaga 240tcctggagct ggcgggcaac gcggctcggg acaacaagaa gacgcgcatc atccctcgtc 300acctccagct ggccatccgc aacgacgagg aactgaacaa gctgctgggc aaagtcacca 360tcgcccaggg cggcgtcttg cctaacatcc aggccgtact gctccctaag aagacggaga 420gtcaccacaa ggcaaagggc aagtgaggct gacgtccggc ccaagtgggc ccagcccggc 480ccgcgtctcg aaggggcacc tgtgaactca aaaggctctt ttcagagcca ccca 53482642DNAHomo sapiens 8gtctgtcagc actgtccgtg ccattcccag aggagcctga gaagaggcag aggaaggcga 60aacatggctg ctctaggagt ccagagtatc aactggcaga cggccttcaa ccgacaagcg 120catcacacag acaagttctc cagccaggag ctcatcttgc ggagaggcca aaacttccag 180gtcttaatga tcatgaacaa aggccttggc tctaacgaaa gactggagtt cattgtctcc 240acagggcctt acccctcaga gtcggccatg acgaaggctg tgtttccact ctccaatggc 300agtagtggtg gctggagtgc ggtgcttcag gccagcaatg gcaatactct gactatcagc 360atctccagtc ctgccagcgc acccatagga cggtacacaa tggccctcca gatcttctcc 420cagggcggca tctcctctgt gaaacttggg acgttcatac tgctttttaa cccctggctg 480aatgtggata gcgtctttat gggtaaccac gctgagagag aagagtatgt tcaggaagat 540gccggcatca tctttgtggg aagcacaaac cgaattggca tgattggctg gaactttgga 600cagtttgaag aagacattct cagcatctgc ctctcaatct tggataggag tctgaatttc 660cgccgtgacg ctgctactga tgtggccagc agaaatgacc ccaaatacgt tggccgggtg 720ctgagtgcca tgatcaatag caatgatgac aatggtgtgc ttgctgggaa ttggagcggc 
780acttacaccg gtggccggga cccaaggagc tggaacggca gcgtggagat cctcaaaaat 840tggaaaaaat ctggcttcag cccagtccga tatggccagt gctgggtctt tgctgggacc 900ctcaacacag cgctgcggtc tttggggatt ccttcccggg tgatcaccaa cttcaactca 960gctcatgaca cagaccgaaa tctcagtgtg gatgtgtact acgaccccat gggaaacccc 1020ctggacaagg gtagtgatag cgtatggaat ttccatgtct ggaatgaagg ctggtttgtg 1080aggtctgacc tgggcccctc gtacggtgga tggcaggtgt tggatgctac cccgcaggaa 1140agaagccaag gggtgttcca gtgcggcccc gcttcggtca ttggtgttcg agagggtgat 1200gtgcagctga acttcgacat gccctttatc ttcgcggagg ttaatgccga ccgcatcacc 1260tggctgtacg acaacaccac tggcaaacag tggaagaatt ccgtgaacag tcacaccatt 1320ggcaggtaca tcagcaccaa ggcggtgggc agcaatgctc gcatggacgt cacggacaag 1380tacaagtacc cagaaggctc tgaccaggaa agacaagtgt tccaaaaggc tttggggaaa 1440cttaaaccca acacgccatt tgccgcgacg tcttcaatgg gtttggaaac agaggaacag 1500gagcccagca tcatcgggaa gctgaaggtc gctggcatgc tggcagtagg caaagaagtc 1560aacctggtcc tactgctcaa aaacctgagc agggatacga agacagtgac agtgaacatg 1620acagcctgga ccatcatcta caacggcacg cttgtacatg aagtgtggaa ggactctgcc 1680acaatgtccc tggaccctga ggaagaggca gaacatccca taaagatctc gtacgctcag 1740tatgagaagt acctgaagtc agacaacatg atccggatca cagcggtgtg caaggtccca 1800gatgagtctg aggtggtggt ggagcgggac atcatcctgg acaaccccac cttgaccctg 1860gaggtgctga acgaggctcg tgtgcggaag cctgtgaacg tgcagatgct cttctccaat 1920ccactggatg agccggtgag ggactgcgtg ctgatggtgg agggaagcgg cctgctgttg 1980ggtaacctga agatcgacgt gccgacccta gggcccaagg aggggtcccg ggtccgtttt 2040gatatcctgc cctcccggag tggcaccaag caactgctcg ccgacttctc ctgcaacaag 2100ttccctgcaa tcaaggccat gttgtccatc gatgtagccg aatgaagggc gctggtggcc 2160tcccgtacaa acttggacaa cacggagcag ggagagctca ccatggaatg aaccccccgc 2220ccatgctgtc cggcctggga aaccctctcc atctcccaag gctgccagac atggacctcc 2280aggctccagc acatccccct ctcctctccc ccaggttggg gctgggtcca ccctgtccta 2340tgacttgatc acttttgcac attccctggc cgcttctccc cagagctgcc tgctctgtga 2400gccccacagc cctgctcatt cctcacgccc ttcaatgctg caggatggac tggcccctga 2460cccagggact ctccaaacgg gatacaggag 
agaagctggt ctagactgtt tgctgatccc 2520caacctgcac ggggcattcc tgcttctctc tcaggccacc acagagggca ggggatggtt 2580agtcacctgc cccagcactc acaccctaac tcaaaataaa tgttaaataa gtgcgatcac 2640ac 264291267DNAHomo sapiens 9acattctctt ttcttttatt cttgtctgtt ctgcctcact cccgagctct actgactccc 60aacagagcgc ccaagaagaa aatggccata agtggagtcc ctgtgctagg atttttcatc 120atagctgtgc tgatgagcgc tcaggaatca tgggctatca aagaagaaca tgtgatcatc 180caggccgagt tctatctgaa tcctgaccaa tcaggcgagt ttatgtttga ctttgatggt 240gatgagattt tccatgtgga tatggcaaag aaggagacgg tctggcggct tgaagaattt 300ggacgatttg ccagctttga ggctcaaggt gcattggcca acatagctgt ggacaaagcc 360aacctggaaa tcatgacaaa gcgctccaac tatactccga tcaccaatgt acctccagag 420gtaactgtgc tcacgaacag ccctgtggaa ctgagagagc ccaacgtcct catctgtttc 480atagacaagt tcaccccacc agtggtcaat gtcacgtggc ttcgaaatgg aaaacctgtc 540accacaggag tgtcagagac agtcttcctg cccagggaag accacctttt ccgcaagttc 600cactatctcc ccttcctgcc ctcaactgag gacgtttacg actgcagggt ggagcactgg 660ggcttggatg agcctcttct caagcactgg gagtttgatg ctccaagccc tctcccagag 720actacagaga acgtggtgtg tgccctgggc ctgactgtgg gtctggtggg catcattatt 780gggaccatct tcatcatcaa gggattgcgc aaaagcaatg cagcagaacg cagggggcct 840ctgtaaggca catggaggtg atggtgtttc ttagagagaa gatcactgaa gaaacttctg 900ctttaatggc tttacaaagc tggcaatatt acaatccttg acctcagtga aagcagtcat 960cttcagcatt ttccagccct atagccaccc caagtgtgga tatgcctctt cgattgctcc 1020gtactctaac atctagctgg cttccctgtc tattgccttt tcctgtatct attttcctct 1080atttcctatc attttattat caccatgcaa tgcctctgga ataaaacata caggagtctg 1140tctctgctat ggaatgcccc atggggcatc tcttgtgtac ttattgttta aggtttcctc 1200aaactgtgat ttttctgaac acaataaact attttgatga tcttgggtgg aaaaaaaaaa 1260aaaaaaa 1267103670DNAHomo sapiens 10ccggtttgtt agggagtcgt gtacgtgcct tggtcgcttc tgtagctccg agggcaggtt 60gcggaagaaa gcccaggcgg tctgtggccc agaggaaagg cctgcagcag gacgaggacc 120tgagccagga atgcaggatg gcggcggtga agaaggaagg gggtgctctg agtgaagcca 180tgtccctgga gggagatgaa tgggaactga gtaaagaaaa tgtacaacct ttaaggcaag 240ggcggatcat gtccacgctt 
cagggagcac tggcacaaga atctgcctgt aacaatactc 300ttcagcagca gaaacgggca tttgaatatg aaattcgatt ttacactgga aatgaccctc 360tggatgtttg ggataggtat atcagctgga cagagcagaa ctatcctcaa ggtgggaagg 420agagtaatat gtcaacgtta ttagaaagag ctgtagaagc actacaagga gaaaaacgat 480attatagtga tcctcgattt ctcaatctct ggcttaaatt agggcgttta tgcaatgagc 540ctttggatat gtacagttac ttgcacaacc aagggattgg tgtttcactt gctcagttct 600atatctcatg ggcagaagaa tatgaagcta gagaaaactt taggaaagca gatgcgatat 660ttcaggaagg gattcaacag aaggctgaac cactagaaag actacagtcc cagcaccgac 720aattccaagc tcgagtgtct cggcaaactc tgttggcact tgagaaagaa gaagaggagg 780aagtttttga gtcttctgta ccacaacgaa gcacactagc tgaactaaag agcaaaggga 840aaaagacagc aagagctcca atcatccgtg taggaggtgc tctcaaggct ccaagccaga 900acagaggact ccaaaatcca tttcctcaac agatgcaaaa taatagtaga attactgttt 960ttgatgaaaa tgctgatgag gcttctacag cagagttgtc taagcctaca gtccagccat 1020ggatagcacc ccccatgccc agggccaaag agaatgagct gcaagcaggc ccttggaaca 1080caggcaggtc cttggaacac aggcctcgtg gcaatacagc ttcactgata gctgtacccg 1140ctgtgcttcc cagtttcact ccatatgtgg aagagactgc acgacagcca gttatgacac 1200catgtaaaat tgaacctagt ataaaccaca tcctaagcac cagaaagcct ggaaaggaag 1260aaggagatcc tctacaaagg gttcagagcc atcagcaagc gtctgaggag aagaaagaga 1320agatgatgta ttgtaaggag aagatttatg caggagtagg ggaattctcc tttgaagaaa 1380ttcgggctga agttttccgg aagaaattaa aagagcaaag ggaagccgag ctattgacca 1440gtgcagagaa gagagcagaa atgcagaaac agattgaaga gatggagaag aagctaaaag 1500aaatccaaac tactcagcaa gaaagaacag gtgatcagca agaagagacg atgcctacaa 1560aggagacaac taaactgcaa attgcttccg agtctcagaa aataccagga atgactctat 1620ccagttctgt ttgtcaagta aactgttgtg ccagagaaac ttcacttgcg gagaacattt 1680ggcaggaaca acctcattct aaaggtccca gtgtaccttt ctccattttt gatgagtttc 1740ttctttcaga aaagaagaat aaaagtcctc ctgcagatcc cccacgagtt ttagctcaac 1800gaagacccct tgcagttctc aaaacctcag aaagcatcac ctcaaatgaa gatgtgtctc 1860cagatgtttg tgatgaattt acaggaattg aacccttgag cgaggatgcc attatcacag 1920gcttcagaaa tgtaacaatt tgtcctaacc cagaagacac ttgtgacttt gccagagcag 
1980ctcgttttgt atccactcct tttcatgaga taatgtcctt gaaggatctc ccttctgatc 2040ctgagagact gttaccggaa gaagatctag atgtaaagac ctctgaggac cagcagacag 2100cttgtggcac tatctacagt cagactctca gcatcaagaa gctgagccca attattgaag 2160acagtcgtga agccacacac tcctctggct tctctggttc ttctgcctcg gttgcaagca 2220cctcctccat caaatgtctt caaattcctg agaaactaga acttactaat gagacttcag 2280aaaaccctac tcagtcacca tggtgttcac agtatcgcag acagctactg aagtccctac 2340cagagttaag tgcctctgca gagttgtgta tagaagacag accaatgcct aagttggaaa 2400ttgagaagga aattgaatta ggtaatgagg attactgcat taaacgagaa tacctaatat 2460gtgaagatta caagttattc tgggtggcgc caagaaactc tgcagaatta acagtaataa 2520aggtatcttc tcaacctgtc ccatgggact tttatatcaa cctcaagtta aaggaacgtt 2580taaatgaaga ttttgatcat ttttgcagct gttatcaata tcaagatggc tgtattgttt 2640ggcaccaata tataaactgc ttcacccttc aggatcttct ccaacacagt gaatatatta 2700cccatgaaat aacagtgttg attatttata accttttgac aatagtggag atgctacaca 2760aagcagaaat agtccatggt gacttgagtc caaggtgtct gattctcaga aacagaatcc 2820acgatcccta tgattgtaac aagaacaatc aagctttgaa gatagtggac ttttcctaca 2880gtgttgacct tagggtgcag ctggatgttt ttaccctcag cggctttcgg actgtacaga 2940tcctggaagg acaaaagatc ctggctaact gttcttctcc ctaccaggta gacctgtttg 3000gtatagcaga tttagcacat ttactattgt tcaaggaaca cctacaggtc ttctgggatg 3060ggtccttctg gaaacttagc caaaatattt ctgagctaaa agatggtgaa ttgtggaata 3120aattctttgt gcggattctg aatgccaatg atgaggccac agtgtctgtt cttggggagc 3180ttgcagcaga aatgaatggg gtttttgaca ctacattcca aagtcacctg aacaaagcct 3240tatggaaggt agggaagtta actagtcctg gggctttgct ctttcagtga gctaggcaat 3300caagtctcac agattgctgc ctcagagcaa tggttgtatt gtggaacact gaaactgtat 3360gtgctgtaat ttaatttagg acacatttag atgcactacc attgctgttc tactttttgg 3420tacaggtata ttttgacgtc actgatattt tttatacagt gatatactta ctcatggcct 3480tgtctaactt ttgtgaagaa ctattttatt ctaaacagac tcattacaaa tggttacctt 3540gttatttaac ccatttgtct ctacttttcc ctgtactttt cccatttgta atttgtaaaa 3600tgttctctta tgatcaccat gtattttgta aataataaaa tagtatctgt taaaaaaaaa 3660aaaaaaaaaa 3670111024DNAHomo 
sapiens 11cttcctggct cctccttcct ccccacccct ctaataggct cataagtggg ctcaggcctc 60tctgcggggc tcactctgcg cttcaccatg gctttcattg ccaagtcctt ctatgacctc 120agtgccatca gcctggatgg ggagaaggta gatttcaata cgttccgggg cagggccgtg 180ctgattgaga atgtggcttc gctctgaggc acaaccaccc gggacttcac ccagctcaac 240gagctgcaat gccgctttcc caggcgcctg gtggtccttg gcttcccttg caaccaattt 300ggacatcagg agaactgtca gaatgaggag atcctgaaca gtctcaagta tgtccgtcct 360gggggtggat accagcccac cttcaccctt gtccaaaaat gtgaggtgaa tgggcagaac 420gagcatcctg tcttcgccta cctgaaggac aagctcccct acccttatga tgacccattt 480tccctcatga ccgatcccaa gctcatcatt tggagccctg tgcgccgctc agatgtggcc 540tggaactttg agaagttcct catagggccg gagggagagc ccttccgacg ctacagccgc 600accttcccaa ccatcaacat tgagcctgac atcaagcgcc tccttaaagt tgccatatag 660atgtgaactg ctcaacacac agatctccta ctccatccag tcctgaggag ccttaggatg 720cagcatgcct tcaggagaca ctgctggacc tcagcattcc cttgatatca gtccccttca 780ctgcagagcc ttgcctttcc cctctgcctg tttccttttc ctctcccaac cctctggttg 840gtgattcaac ttgggctcca agacttgggt aagctctggg ccttcacaga atgatggcac 900cttcctaaac cctcatgggt ggtgtctgag aggcgtgaag ggcctggagc cactctgcta 960gaagagacca ataaagggca ggtgtggaaa cggcaaaaaa aaaaaaaaaa aaaaaaaaaa 1020aaaa 1024121684PRTHomo sapiens 12Met Leu Gly Thr Ile Thr Ile Thr Val Gly Gln Arg Asp Ser Glu Asp1
5 10 15Val Ser Lys Arg Asp Ser Asp Lys Glu Met Ala Thr Lys Ser Ala Val 20 25 30Val His Asp Ile Thr Asp Asp Gly Gln Glu Glu Thr Pro Glu Ile Ile 35 40 45Glu Gln Ile Pro Ser Ser Glu Ser Asn Leu Glu Glu Leu Thr Gln Pro 50 55 60Thr Glu Ser Gln Ala Asn Asp Ile Gly Phe Lys Lys Val Phe Lys Phe65 70 75 80Val Gly Phe Lys Phe Thr Val Lys Lys Asp Lys Thr Glu Lys Pro Asp 85 90 95Thr Val Gln Leu Leu Thr Val Lys Lys Asp Glu Gly Glu Gly Ala Ala 100 105 110Gly Ala Gly Asp His Lys Asp Pro Ser Leu Gly Ala Gly Glu Ala Ala 115 120 125Ser Lys Glu Ser Glu Pro Lys Gln Ser Thr Glu Lys Pro Glu Glu Thr 130 135 140Leu Lys Arg Glu Gln Ser His Ala Glu Ile Ser Pro Pro Ala Glu Ser145 150 155 160Gly Gln Ala Val Glu Glu Cys Lys Glu Glu Gly Glu Glu Lys Gln Glu 165 170 175Lys Glu Pro Ser Lys Ser Ala Glu Ser Pro Thr Ser Pro Val Thr Ser 180 185 190Glu Thr Gly Ser Thr Phe Lys Lys Phe Phe Thr Gln Gly Trp Ala Gly 195 200 205Trp Arg Lys Lys Thr Ser Phe Arg Lys Pro Lys Glu Asp Glu Val Glu 210 215 220Ala Ser Glu Lys Lys Lys Glu Gln Glu Pro Glu Lys Val Asp Thr Glu225 230 235 240Glu Asp Gly Lys Ala Glu Val Ala Ser Glu Lys Leu Thr Ala Ser Glu 245 250 255Gln Ala His Pro Gln Glu Pro Ala Glu Ser Ala His Glu Pro Arg Leu 260 265 270Ser Ala Glu Tyr Glu Lys Val Glu Leu Pro Ser Glu Glu Gln Val Ser 275 280 285Gly Ser Gln Gly Pro Ser Glu Glu Lys Pro Ala Pro Leu Ala Thr Glu 290 295 300Val Phe Asp Glu Lys Ile Glu Val His Gln Glu Glu Val Val Ala Glu305 310 315 320Val His Val Ser Thr Val Glu Glu Arg Thr Glu Glu Gln Lys Thr Glu 325 330 335Val Glu Glu Thr Ala Gly Ser Val Pro Ala Glu Glu Leu Val Glu Met 340 345 350Asp Ala Glu Pro Gln Glu Ala Glu Pro Ala Lys Glu Leu Val Lys Leu 355 360 365Lys Glu Thr Cys Val Ser Gly Glu Asp Pro Thr Gln Gly Ala Asp Leu 370 375 380Ser Pro Asp Glu Lys Val Leu Ser Lys Pro Pro Glu Gly Val Val Ser385 390 395 400Glu Val Glu Met Leu Ser Ser Gln Glu Arg Met Lys Val Gln Gly Ser 405 410 415Pro Leu Lys Lys Leu Phe Thr Ser Thr Gly Leu Lys Lys Leu Ser Gly 420 425 430Lys Lys Gln Lys Gly Lys Arg Gly Gly Gly Asp 
Glu Glu Ser Gly Glu 435 440 445His Thr Gln Val Pro Ala Asp Ser Pro Asp Ser Gln Glu Glu Gln Lys 450 455 460Gly Glu Ser Ser Ala Ser Ser Pro Glu Glu Pro Glu Glu Ile Thr Cys465 470 475 480Leu Glu Lys Gly Leu Ala Glu Val Gln Gln Asp Gly Glu Ala Glu Glu 485 490 495Gly Ala Thr Ser Asp Gly Glu Lys Lys Arg Glu Gly Val Thr Pro Trp 500 505 510Ala Ser Phe Lys Lys Met Val Thr Pro Lys Lys Arg Val Arg Arg Pro 515 520 525Ser Glu Ser Asp Lys Glu Asp Glu Leu Asp Lys Val Lys Ser Ala Thr 530 535 540Leu Ser Ser Thr Glu Ser Thr Ala Ser Glu Met Gln Glu Glu Met Lys545 550 555 560Gly Ser Val Glu Glu Pro Lys Pro Glu Glu Pro Lys Arg Lys Val Asp 565 570 575Thr Ser Val Ser Trp Glu Ala Leu Ile Cys Val Gly Ser Ser Lys Lys 580 585 590Arg Ala Arg Arg Gly Ser Ser Ser Asp Glu Glu Gly Gly Pro Lys Ala 595 600 605Met Gly Gly Asp His Gln Lys Ala Asp Glu Ala Gly Lys Asp Lys Glu 610 615 620Thr Gly Thr Asp Gly Ile Leu Ala Gly Ser Gln Glu His Asp Pro Gly625 630 635 640Gln Gly Ser Ser Ser Pro Glu Gln Ala Gly Ser Pro Thr Glu Gly Glu 645 650 655Gly Val Ser Thr Trp Glu Ser Phe Lys Arg Leu Val Thr Pro Arg Lys 660 665 670Lys Ser Lys Ser Lys Leu Glu Glu Lys Ser Glu Asp Ser Ile Ala Gly 675 680 685Ser Gly Val Glu His Ser Thr Pro Asp Thr Glu Pro Gly Lys Glu Glu 690 695 700Ser Trp Val Ser Ile Lys Lys Phe Ile Pro Gly Arg Arg Lys Lys Arg705 710 715 720Pro Asp Gly Lys Gln Glu Gln Ala Pro Val Glu Asp Ala Gly Pro Thr 725 730 735Gly Ala Asn Glu Asp Asp Ser Asp Val Pro Ala Val Val Pro Leu Ser 740 745 750Glu Tyr Asp Ala Val Glu Arg Glu Lys Met Glu Ala Gln Gln Ala Gln 755 760 765Lys Ser Ala Glu Gln Pro Glu Gln Lys Ala Ala Thr Glu Val Ser Lys 770 775 780Glu Leu Ser Glu Ser Gln Val His Met Met Ala Ala Ala Val Ala Asp785 790 795 800Gly Thr Arg Ala Ala Thr Ile Ile Glu Glu Arg Ser Pro Ser Trp Ile 805 810 815Ser Ala Ser Val Thr Glu Pro Leu Glu Gln Val Glu Ala Glu Ala Ala 820 825 830Leu Leu Thr Glu Glu Val Leu Glu Arg Glu Val Ile Ala Glu Glu Glu 835 840 845Pro Pro Thr Val Thr Glu Pro Leu Pro Glu Asn Arg Glu Ala Arg Gly 850 855 860Asp 
Thr Val Val Ser Glu Ala Glu Leu Thr Pro Glu Ala Val Thr Ala865 870 875 880Ala Glu Thr Ala Gly Pro Leu Gly Ala Glu Glu Gly Thr Glu Ala Ser 885 890 895Ala Ala Glu Glu Thr Thr Glu Met Val Ser Ala Val Ser Gln Leu Thr 900 905 910Asp Ser Pro Asp Thr Thr Glu Glu Ala Thr Pro Val Gln Glu Val Glu 915 920 925Gly Gly Val Pro Asp Ile Glu Glu Gln Glu Arg Arg Thr Gln Glu Val 930 935 940Leu Gln Ala Val Ala Glu Lys Val Lys Glu Glu Ser Gln Leu Pro Gly945 950 955 960Thr Gly Gly Pro Glu Asp Val Leu Gln Pro Val Gln Arg Ala Glu Ala 965 970 975Glu Arg Pro Glu Glu Gln Ala Glu Ala Ser Gly Leu Lys Lys Glu Thr 980 985 990Asp Val Val Leu Lys Val Asp Ala Gln Glu Ala Lys Thr Glu Pro Phe 995 1000 1005Thr Gln Gly Lys Val Val Gly Gln Thr Thr Pro Glu Ser Phe Glu Lys 1010 1015 1020Ala Pro Gln Val Thr Glu Ser Ile Glu Ser Ser Glu Leu Val Thr Thr1025 1030 1035 1040Cys Gln Ala Glu Thr Leu Ala Gly Val Lys Ser Gln Glu Met Val Met 1045 1050 1055Glu Gln Ala Ile Pro Pro Asp Ser Val Glu Thr Pro Thr Asp Ser Glu 1060 1065 1070Thr Asp Gly Ser Thr Pro Val Ala Asp Phe Asp Ala Pro Gly Thr Thr 1075 1080 1085Gln Lys Asp Glu Ile Val Glu Ile His Glu Glu Asn Glu Val Ala Ser 1090 1095 1100Gly Thr Gln Ser Gly Gly Thr Glu Ala Glu Ala Val Pro Ala Gln Lys1105 1110 1115 1120Glu Arg Pro Pro Ala Pro Ser Ser Phe Val Phe Gln Glu Glu Thr Lys 1125 1130 1135Glu Gln Ser Lys Met Glu Asp Thr Leu Glu His Thr Asp Lys Glu Val 1140 1145 1150Ser Val Glu Thr Val Ser Ile Leu Ser Lys Thr Glu Gly Thr Gln Glu 1155 1160 1165Ala Asp Gln Tyr Ala Asp Glu Lys Thr Lys Asp Val Pro Phe Phe Glu 1170 1175 1180Gly Leu Glu Gly Ser Ile Asp Thr Gly Ile Thr Val Ser Arg Glu Lys1185 1190 1195 1200Val Thr Glu Val Ala Leu Lys Gly Glu Gly Thr Glu Glu Ala Glu Cys 1205 1210 1215Lys Lys Asp Asp Ala Leu Glu Leu Gln Ser His Ala Lys Ser Pro Pro 1220 1225 1230Ser Pro Val Glu Arg Glu Met Val Val Gln Val Glu Arg Glu Lys Thr 1235 1240 1245Glu Ala Glu Pro Thr His Val Asn Glu Glu Lys Leu Glu His Glu Thr 1250 1255 1260Ala Val Thr Val Ser Glu Glu Val Ser Lys Gln Leu Leu Gln Thr 
Val1265 1270 1275 1280Asn Val Pro Ile Ile Asp Gly Ala Lys Glu Val Ser Ser Leu Glu Gly 1285 1290 1295Ser Pro Pro Pro Cys Leu Gly Gln Glu Glu Ala Val Cys Thr Lys Ile 1300 1305 1310Gln Val Gln Ser Ser Glu Ala Ser Phe Thr Leu Thr Ala Ala Ala Glu 1315 1320 1325Glu Glu Lys Val Leu Gly Glu Thr Ala Asn Ile Leu Glu Thr Gly Glu 1330 1335 1340Thr Leu Glu Pro Ala Gly Ala His Leu Val Leu Glu Glu Lys Ser Ser1345 1350 1355 1360Glu Lys Asn Glu Asp Phe Ala Ala His Pro Gly Glu Asp Ala Val Pro 1365 1370 1375Thr Gly Pro Asp Cys Gln Ala Lys Ser Thr Pro Val Ile Val Ser Ala 1380 1385 1390Thr Thr Lys Lys Gly Leu Ser Ser Asp Leu Glu Gly Glu Lys Thr Thr 1395 1400 1405Ser Leu Lys Trp Lys Ser Asp Glu Val Asp Glu Gln Val Ala Cys Gln 1410 1415 1420Glu Val Lys Val Ser Val Ala Ile Glu Asp Leu Glu Pro Glu Asn Gly1425 1430 1435 1440Ile Leu Glu Leu Glu Thr Lys Ser Ser Lys Leu Val Gln Asn Ile Ile 1445 1450 1455Gln Thr Ala Val Asp Gln Phe Val Arg Thr Glu Glu Thr Ala Thr Glu 1460 1465 1470Met Leu Thr Ser Glu Leu Gln Thr Gln Ala His Val Ile Lys Ala Asp 1475 1480 1485Ser Gln Asp Ala Gly Gln Glu Thr Glu Lys Glu Gly Glu Glu Pro Leu 1490 1495 1500Ala Ser Ala Gln Asp Glu Thr Pro Ile Thr Ser Ala Lys Glu Glu Ser1505 1510 1515 1520Glu Ser Thr Ala Val Gly Gln Ala His Ser Asp Ile Ser Lys Asp Met 1525 1530 1535Ser Glu Ala Ser Glu Lys Thr Met Thr Val Glu Val Glu Gly Ser Thr 1540 1545 1550Val Asn Asp Gln Gln Leu Glu Glu Val Val Leu Pro Ser Glu Glu Glu 1555 1560 1565Gly Gly Gly Ala Gly Thr Lys Ser Val Pro Glu Asp Asp Gly His Ala 1570 1575 1580Leu Leu Ala Glu Arg Ile Glu Lys Ser Leu Val Glu Pro Lys Glu Asp1585 1590 1595 1600Glu Lys Gly Asp Asp Val Asp Asp Pro Glu Asn Gln Asn Ser Ala Leu 1605 1610 1615Ala Asp Thr Asp Ala Ser Gly Gly Leu Thr Lys Glu Ser Pro Asp Thr 1620 1625 1630Asn Gly Pro Lys Gln Lys Glu Lys Glu Asp Ala Gln Glu Val Glu Leu 1635 1640 1645Gln Glu Gly Lys Val His Ser Glu Ser Asp Lys Ala Ile Thr Pro Gln 1650 1655 1660Ala Gln Glu Glu Leu Gln Lys Gln Glu Arg Glu Ser Ala Lys Ser Glu1665 1670 1675 1680Leu Thr 
Glu Ser13213PRTHomo sapiens 13Met Ser Glu Thr Ala Pro Ala Ala Pro Ala Ala Ala Pro Pro Ala Glu1 5 10 15Lys Ala Pro Val Lys Lys Lys Ala Ala Lys Lys Ala Gly Gly Thr Pro 20 25 30Arg Lys Ala Ser Gly Pro Pro Val Ser Glu Leu Ile Thr Lys Ala Val 35 40 45Ala Ala Ser Lys Glu Arg Ser Gly Val Ser Leu Ala Ala Leu Lys Lys 50 55 60Ala Leu Ala Ala Ala Gly Tyr Asp Val Glu Lys Asn Asn Ser Arg Ile65 70 75 80Lys Leu Gly Leu Lys Ser Leu Val Ser Lys Gly Thr Leu Val Gln Thr 85 90 95Lys Gly Thr Gly Ala Ser Gly Ser Phe Lys Leu Asn Lys Lys Ala Ala 100 105 110Ser Gly Glu Ala Lys Pro Lys Val Lys Lys Ala Gly Gly Thr Lys Pro 115 120 125Lys Lys Pro Val Gly Ala Ala Lys Lys Pro Lys Lys Ala Ala Gly Gly 130 135 140Ala Thr Pro Lys Lys Ser Ala Lys Lys Thr Pro Lys Lys Ala Lys Lys145 150 155 160Pro Ala Ala Ala Thr Val Thr Lys Lys Val Ala Lys Ser Pro Lys Lys 165 170 175Ala Lys Val Ala Lys Pro Lys Lys Ala Ala Lys Ser Ala Ala Lys Ala 180 185 190Val Lys Pro Lys Ala Ala Lys Pro Lys Val Val Lys Pro Lys Lys Ala 195 200 205Ala Pro Lys Lys Lys 21014130PRTHomo sapiens 14Met Ser Gly Arg Gly Lys Gln Gly Gly Lys Ala Arg Ala Lys Ala Lys1 5 10 15Ser Arg Ser Ser Arg Ala Gly Leu Gln Phe Pro Val Gly Arg Val His 20 25 30Arg Leu Leu Arg Lys Gly Asn Tyr Ala Glu Arg Val Gly Ala Gly Ala 35 40 45Pro Val Tyr Leu Ala Ala Val Leu Glu Tyr Leu Thr Ala Glu Ile Leu 50 55 60Glu Leu Ala Gly Asn Ala Ala Arg Asp Asn Lys Lys Thr Arg Ile Ile65 70 75 80Pro Arg His Leu Gln Leu Ala Ile Arg Asn Asp Glu Glu Leu Asn Lys 85 90 95Leu Leu Gly Arg Val Thr Ile Ala Gln Gly Gly Val Leu Pro Asn Ile 100 105 110Gln Ala Val Leu Leu Pro Lys Lys Thr Glu Ser His His Lys Ala Lys 115 120 125Gly Lys 13015126PRTHomo sapiens 15Met Pro Glu Leu Ala Lys Ser Ala Pro Ala Pro Lys Lys Gly Ser Lys1 5 10 15Lys Ala Val Thr Lys Ala Gln Lys Lys Asp Gly Lys Lys Arg Lys Arg 20 25 30Ser Arg Lys Glu Ser Tyr Ser Val Tyr Val Tyr Lys Val Leu Lys Gln 35 40 45Val His Pro Asp Thr Gly Ile Ser Ser Lys Ala Met Gly Ile Met Asn 50 55 60Ser Phe Val Asn Asp Ile Phe Glu Arg Ile Ala Ser Glu Ala 
Ser Arg65 70 75 80Leu Ala His Tyr Asn Lys Arg Ser Thr Ile Thr Ser Arg Glu Ile Gln 85 90 95Thr Ala Val Arg Leu Leu Leu Pro Gly Glu Leu Ala Lys His Ala Val 100 105 110Ser Glu Gly Thr Lys Ala Val Thr Lys Tyr Thr Ser Ser Lys 115 120 12516483PRTHomo sapiens 16Met Ser Ile Arg Val Thr Gln Lys Ser Tyr Lys Val Ser Thr Ser Gly1 5 10 15Pro Arg Ala Phe Ser Ser Arg Ser Tyr Thr Ser Gly Pro Gly Ser Arg 20 25 30Ile Ser Ser Ser Ser Phe Ser Arg Val Gly Ser Ser Asn Phe Arg Gly 35 40 45Gly Leu Gly Gly Gly Tyr Gly Gly Ala Ser Gly Met Gly Gly Ile Thr 50 55 60Ala Val Thr Val Asn Gln Ser Leu Leu Ser Pro Leu Val Leu Glu Val65 70 75 80Asp Pro Asn Ile Gln Ala Val Arg Thr Gln Glu Lys Glu Gln Ile Lys 85 90 95Thr Leu Asn Asn Lys Phe Ala Ser Phe Ile Asp Lys Val Arg Phe Leu 100 105 110Glu Gln Gln Asn Lys Met Leu Glu Thr Lys Trp Ser Leu Leu Gln Gln 115 120 125Gln Lys Thr Ala Arg Ser Asn Met Asp Asn Met Phe Glu Ser Tyr Ile 130 135 140Asn Asn Leu Arg Arg Gln Leu Glu Thr Leu Gly Gln Glu Lys Leu Lys145 150 155 160Leu Glu Ala Glu Leu Gly Asn Met Gln Gly Leu Val Glu Asp Phe Lys 165 170 175Asn Lys Tyr Glu Asp Glu Ile Asn Lys Arg Thr Glu Met Glu Asn Glu 180 185 190Phe Val Leu Ile Lys Lys Asp Val Asp Glu Ala Tyr Met Asn Lys Val 195 200 205Glu Leu Glu Ser Arg Leu Glu Gly Leu Thr Asp Glu Ile Asn Phe Leu 210 215 220Arg Gln Leu Tyr Glu Glu Glu Ile Arg Glu Leu Gln Ser Gln Ile Ser225 230 235 240Asp Thr Ser Val Val Leu Ser Met Asp Asn Ser Arg Ser Leu Asp Met 245 250 255Asp Ser Ile Ile Ala Glu Val Lys Ala Gln Tyr Glu Asp Ile Ala Asn 260 265 270Arg Ser Arg Ala Glu Ala Glu Ser Met Tyr Gln Ile Lys Tyr Glu Glu 275 280 285Leu Gln Ser Leu Ala Gly Lys His Gly Asp Asp Leu Arg Arg Thr Lys 290 295 300Thr Glu Ile Ser Glu Met Asn Arg Asn Ile Ser Arg Leu Gln Ala Glu305 310 315
320Ile Glu Gly Leu Lys Gly Gln Arg Ala Ser Leu Glu Ala Ala Ile Ala 325 330 335Asp Ala Glu Gln Arg Gly Glu Leu Ala Ile Lys Asp Ala Asn Ala Lys 340 345 350Leu Ser Glu Leu Glu Ala Ala Leu Gln Arg Ala Lys Gln Asp Met Ala 355 360 365Arg Gln Leu Arg Glu Tyr Gln Glu Leu Met Asn Val Lys Leu Ala Leu 370 375 380Asp Ile Glu Ile Ala Thr Tyr Arg Lys Leu Leu Glu Gly Glu Glu Ser385 390 395 400Arg Leu Glu Ser Gly Met Gln Asn Met Ser Ile His Thr Lys Thr Thr 405 410 415Ser Gly Tyr Ala Gly Gly Leu Ser Ser Ala Tyr Gly Gly Leu Thr Ser 420 425 430Pro Gly Leu Ser Tyr Ser Leu Gly Ser Ser Phe Gly Ser Gly Ala Gly 435 440 445Ser Ser Ser Phe Ser Arg Thr Ser Ser Ser Arg Ala Val Val Val Lys 450 455 460Lys Ile Glu Thr Arg Asp Gly Lys Leu Val Ser Glu Ser Ser Asp Val465 470 475 480Leu Pro Lys172799PRTHomo sapiens 17Met Thr Ser Ile His Phe Val Val His Pro Leu Pro Gly Thr Glu Asp1 5 10 15Gln Leu Asn Asp Arg Leu Arg Glu Val Ser Glu Lys Leu Asn Lys Tyr 20 25 30Asn Leu Asn Ser His Pro Pro Leu Asn Val Leu Glu Gln Ala Thr Ile 35 40 45Lys Gln Cys Val Val Gly Pro Asn His Ala Ala Phe Leu Leu Glu Asp 50 55 60Gly Arg Val Cys Arg Ile Gly Phe Ser Val Gln Pro Asp Arg Leu Glu65 70 75 80Leu Gly Lys Pro Asp Asn Asn Asp Gly Ser Lys Leu Asn Ser Asn Ser 85 90 95Gly Ala Gly Arg Thr Ser Arg Pro Gly Arg Thr Ser Asp Ser Pro Trp 100 105 110Phe Leu Ser Gly Ser Glu Thr Leu Gly Arg Leu Ala Gly Asn Thr Leu 115 120 125Gly Ser Arg Trp Ser Ser Gly Val Gly Gly Ser Gly Gly Gly Ser Ser 130 135 140Gly Arg Ser Ser Ala Gly Ala Arg Asp Ser Arg Arg Gln Thr Arg Val145 150 155 160Ile Arg Thr Gly Arg Asp Arg Gly Ser Gly Leu Leu Gly Ser Gln Pro 165 170 175Gln Pro Val Ile Pro Ala Ser Val Ile Pro Glu Glu Leu Ile Ser Gln 180 185 190Ala Gln Val Val Leu Gln Gly Lys Ser Arg Ser Val Ile Ile Arg Glu 195 200 205Leu Gln Arg Thr Asn Leu Asp Val Asn Leu Ala Val Asn Asn Leu Leu 210 215 220Ser Arg Asp Asp Glu Asp Gly Asp Asp Gly Asp Asp Thr Ala Ser Glu225 230 235 240Ser Tyr Leu Pro Gly Glu Asp Leu Met Ser Leu Leu Asp Ala Asp Ile 245 250 255His Ser Ala His 
Pro Ser Val Ile Ile Asp Ala Asp Ala Met Phe Ser 260 265 270Glu Asp Ile Ser Tyr Phe Gly Tyr Pro Ser Phe Arg Arg Ser Ser Leu 275 280 285Ser Arg Leu Gly Ser Ser Arg Val Leu Leu Leu Pro Leu Glu Arg Asp 290 295 300Ser Glu Leu Leu Arg Glu Arg Glu Ser Val Leu Arg Leu Arg Glu Arg305 310 315 320Arg Trp Leu Asp Gly Ala Ser Phe Asp Asn Glu Arg Gly Ser Thr Ser 325 330 335Lys Glu Gly Glu Pro Asn Leu Asp Lys Lys Asn Thr Pro Val Gln Ser 340 345 350Pro Val Ser Leu Gly Glu Asp Leu Gln Trp Trp Pro Asp Lys Asp Gly 355 360 365Thr Lys Phe Ile Cys Ile Gly Ala Leu Tyr Ser Glu Leu Leu Ala Val 370 375 380Ser Ser Lys Gly Glu Leu Tyr Gln Trp Lys Trp Ser Glu Ser Glu Pro385 390 395 400Tyr Arg Asn Ala Gln Asn Pro Ser Leu His His Pro Arg Ala Thr Phe 405 410 415Leu Gly Leu Thr Asn Glu Lys Ile Val Leu Leu Ser Ala Asn Ser Ile 420 425 430Arg Ala Thr Val Ala Thr Glu Asn Asn Lys Val Ala Thr Trp Val Asp 435 440 445Glu Thr Leu Ser Ser Val Ala Ser Lys Leu Glu His Thr Ala Gln Thr 450 455 460Tyr Ser Glu Leu Gln Gly Glu Arg Ile Val Ser Leu His Cys Cys Ala465 470 475 480Leu Tyr Thr Cys Ala Gln Leu Glu Asn Ser Leu Tyr Trp Trp Gly Val 485 490 495Val Pro Phe Ser Gln Arg Lys Lys Met Leu Glu Lys Ala Arg Ala Lys 500 505 510Asn Lys Lys Pro Lys Ser Ser Ala Gly Ile Ser Ser Met Pro Asn Ile 515 520 525Thr Val Gly Thr Gln Val Cys Leu Arg Asn Asn Pro Leu Tyr His Ala 530 535 540Gly Ala Val Ala Phe Ser Ile Ser Ala Gly Ile Pro Lys Val Gly Val545 550 555 560Leu Met Glu Ser Val Trp Asn Met Asn Asp Ser Cys Arg Phe Gln Leu 565 570 575Arg Ser Pro Glu Ser Leu Lys Asn Met Glu Lys Ala Ser Lys Thr Thr 580 585 590Glu Ala Lys Pro Glu Ser Lys Gln Glu Pro Val Lys Thr Glu Met Gly 595 600 605Pro Pro Pro Ser Pro Ala Ser Thr Cys Ser Asp Ala Ser Ser Ile Ala 610 615 620Ser Ser Ala Ser Met Pro Tyr Lys Arg Arg Arg Ser Thr Pro Ala Pro625 630 635 640Lys Glu Glu Glu Lys Val Asn Glu Glu Gln Trp Ser Leu Arg Glu Val 645 650 655Val Phe Val Glu Asp Val Lys Asn Val Pro Val Gly Lys Val Leu Lys 660 665 670Val Asp Gly Ala Tyr Val Ala Val Lys Phe Pro Gly 
Thr Ser Ser Asn 675 680 685Thr Asn Cys Gln Asn Ser Ser Gly Pro Asp Ala Asp Pro Ser Ser Leu 690 695 700Leu Gln Asp Cys Arg Leu Leu Arg Ile Asp Glu Leu Gln Val Val Lys705 710 715 720Thr Gly Gly Thr Pro Lys Val Pro Asp Cys Phe Gln Arg Thr Pro Lys 725 730 735Lys Leu Cys Ile Pro Glu Lys Thr Glu Ile Leu Ala Val Asn Val Asp 740 745 750Ser Lys Gly Val His Ala Val Leu Lys Thr Gly Asn Trp Val Arg Tyr 755 760 765Cys Ile Phe Asp Leu Ala Thr Gly Lys Ala Glu Gln Glu Asn Asn Phe 770 775 780Pro Thr Ser Ser Ile Ala Phe Leu Gly Gln Asn Glu Arg Asn Val Ala785 790 795 800Ile Phe Thr Ala Gly Gln Glu Ser Pro Ile Ile Leu Arg Asp Gly Asn 805 810 815Gly Thr Ile Tyr Pro Met Ala Lys Asp Cys Met Gly Gly Ile Arg Asp 820 825 830Pro Asp Trp Leu Asp Leu Pro Pro Ile Ser Ser Leu Gly Met Gly Val 835 840 845His Ser Leu Ile Asn Leu Pro Ala Asn Ser Thr Ile Lys Lys Lys Ala 850 855 860Ala Val Ile Ile Met Ala Val Glu Lys Gln Thr Leu Met Gln His Ile865 870 875 880Leu Arg Cys Asp Tyr Glu Ala Cys Arg Gln Tyr Leu Met Asn Leu Glu 885 890 895Gln Ala Val Val Leu Glu Gln Asn Leu Gln Met Leu Gln Thr Phe Ile 900 905 910Ser His Arg Cys Asp Gly Asn Arg Asn Ile Leu His Ala Cys Val Ser 915 920 925Val Cys Phe Pro Thr Ser Asn Lys Glu Thr Lys Glu Glu Glu Glu Ala 930 935 940Glu Arg Ser Glu Arg Asn Thr Phe Ala Glu Arg Leu Ser Ala Val Glu945 950 955 960Ala Ile Ala Asn Ala Ile Ser Val Val Ser Ser Asn Gly Pro Gly Asn 965 970 975Arg Ala Gly Ser Ser Ser Ser Arg Ser Leu Arg Leu Arg Glu Met Met 980 985 990Arg Arg Ser Leu Arg Ala Ala Gly Leu Gly Arg His Glu Ala Gly Ala 995 1000 1005Ser Ser Ser Asp His Gln Asp Pro Val Ser Pro Pro Ile Ala Pro Pro 1010 1015 1020Ser Trp Val Pro Asp Pro Pro Ala Met Asp Pro Asp Gly Asp Ile Asp1025 1030 1035 1040Phe Ile Leu Ala Pro Ala Val Gly Ser Leu Thr Thr Ala Ala Thr Gly 1045 1050 1055Thr Gly Gln Gly Pro Ser Thr Ser Thr Ile Pro Gly Pro Ser Thr Glu 1060 1065 1070Pro Ser Val Val Glu Ser Lys Asp Arg Lys Ala Asn Ala His Phe Ile 1075 1080 1085Leu Lys Leu Leu Cys Asp Ser Val Val Leu Gln Pro Tyr Leu Arg Glu 
1090 1095 1100Leu Leu Ser Ala Lys Asp Ala Arg Gly Met Thr Pro Phe Met Ser Ala1105 1110 1115 1120Val Ser Gly Arg Ala Tyr Pro Ala Ala Ile Thr Ile Leu Glu Thr Ala 1125 1130 1135Gln Lys Ile Ala Lys Ala Glu Ile Ser Ser Ser Glu Lys Glu Glu Asp 1140 1145 1150Val Phe Met Gly Met Val Cys Pro Ser Gly Thr Asn Pro Asp Asp Ser 1155 1160 1165Pro Leu Tyr Val Leu Cys Cys Asn Asp Thr Cys Ser Phe Thr Trp Thr 1170 1175 1180Gly Ala Glu His Ile Asn Gln Asp Ile Phe Glu Cys Arg Thr Cys Gly1185 1190 1195 1200Leu Leu Glu Ser Leu Cys Cys Cys Thr Glu Cys Ala Arg Val Cys His 1205 1210 1215Lys Gly His Asp Cys Lys Leu Lys Arg Thr Ser Pro Thr Ala Tyr Cys 1220 1225 1230Asp Cys Trp Glu Lys Cys Lys Cys Lys Thr Leu Ile Ala Gly Gln Lys 1235 1240 1245Ser Ala Arg Leu Asp Leu Leu Tyr Arg Leu Leu Thr Ala Thr Asn Leu 1250 1255 1260Val Thr Leu Pro Asn Ser Arg Gly Glu His Leu Leu Leu Phe Leu Val1265 1270 1275 1280Gln Thr Val Ala Arg Gln Thr Val Glu His Cys Gln Tyr Arg Pro Pro 1285 1290 1295Arg Ile Arg Glu Asp Arg Asn Arg Lys Thr Ala Ser Pro Glu Asp Ser 1300 1305 1310Asp Met Pro Asp His Asp Leu Glu Pro Pro Arg Phe Ala Gln Leu Ala 1315 1320 1325Leu Glu Arg Val Leu Gln Asp Trp Asn Ala Leu Lys Ser Met Ile Met 1330 1335 1340Phe Gly Ser Gln Glu Asn Lys Asp Pro Leu Ser Ala Ser Ser Arg Ile1345 1350 1355 1360Gly His Leu Leu Pro Glu Glu Gln Val Tyr Leu Asn Gln Gln Ser Gly 1365 1370 1375Thr Ile Arg Leu Asp Cys Phe Thr His Cys Leu Ile Val Lys Cys Thr 1380 1385 1390Ala Asp Ile Leu Leu Leu Asp Thr Leu Leu Gly Thr Leu Val Lys Glu 1395 1400 1405Leu Gln Asn Lys Tyr Thr Pro Gly Arg Arg Glu Glu Ala Ile Ala Val 1410 1415 1420Thr Met Arg Phe Leu Arg Ser Val Ala Arg Val Phe Val Ile Leu Ser1425 1430 1435 1440Val Glu Met Ala Ser Ser Lys Lys Lys Asn Asn Phe Ile Pro Gln Pro 1445 1450 1455Ile Gly Lys Cys Lys Arg Val Phe Gln Ala Leu Leu Pro Tyr Ala Val 1460 1465 1470Glu Glu Leu Cys Asn Val Ala Glu Ser Leu Ile Val Pro Val Arg Met 1475 1480 1485Gly Ile Ala Arg Pro Thr Ala Pro Phe Thr Leu Ala Ser Thr Ser Ile 1490 1495 1500Asp Ala Met Gln 
Gly Ser Glu Glu Leu Phe Ser Val Glu Pro Leu Pro1505 1510 1515 1520Pro Arg Pro Ser Ser Asp Gln Ser Ser Ser Ser Ser Gln Ser Gln Ser 1525 1530 1535Ser Tyr Ile Ile Arg Asn Pro Gln Gln Arg Arg Ile Ser Gln Ser Gln 1540 1545 1550Pro Val Arg Gly Arg Asp Glu Glu Gln Asp Asp Ile Val Ser Ala Asp 1555 1560 1565Val Glu Glu Val Glu Val Val Glu Gly Val Ala Gly Glu Glu Asp His 1570 1575 1580His Asp Glu Gln Glu Glu His Gly Glu Glu Asn Ala Glu Ala Glu Gly1585 1590 1595 1600Gln His Asp Glu His Asp Glu Asp Gly Ser Asp Met Glu Leu Asp Leu 1605 1610 1615Leu Ala Ala Ala Glu Thr Glu Ser Asp Ser Glu Ser Asn His Ser Asn 1620 1625 1630Gln Asp Asn Ala Ser Gly Arg Arg Ser Val Val Thr Ala Ala Thr Ala 1635 1640 1645Gly Ser Glu Ala Gly Ala Ser Ser Val Pro Ala Phe Phe Ser Glu Asp 1650 1655 1660Asp Ser Gln Ser Asn Asp Ser Ser Asp Ser Asp Ser Ser Ser Ser Gln1665 1670 1675 1680Ser Asp Asp Ile Glu Gln Glu Thr Phe Met Leu Asp Glu Pro Leu Glu 1685 1690 1695Arg Thr Thr Asn Ser Ser His Ala Asn Gly Ala Ala Gln Ala Pro Arg 1700 1705 1710Ser Met Gln Trp Ala Val Arg Asn Thr Gln His Gln Arg Ala Ala Ser 1715 1720 1725Thr Ala Pro Ser Ser Thr Ser Thr Pro Ala Ala Ser Ser Ala Gly Leu 1730 1735 1740Ile Tyr Ile Asp Pro Ser Asn Leu Arg Arg Ser Gly Thr Ile Ser Thr1745 1750 1755 1760Ser Ala Ala Ala Ala Ala Ala Ala Leu Glu Ala Ser Asn Ala Ser Ser 1765 1770 1775Tyr Leu Thr Ser Ala Ser Ser Leu Ala Arg Ala Tyr Ser Ile Val Ile 1780 1785 1790Arg Gln Ile Ser Asp Leu Met Gly Leu Ile Pro Lys Tyr Asn His Leu 1795 1800 1805Val Tyr Ser Gln Ile Pro Ala Ala Val Lys Leu Thr Tyr Gln Asp Ala 1810 1815 1820Val Asn Leu Gln Asn Tyr Val Glu Glu Lys Leu Ile Pro Thr Trp Asn1825 1830 1835 1840Trp Met Val Ser Ile Met Asp Ser Thr Glu Ala Gln Leu Arg Tyr Gly 1845 1850 1855Ser Ala Leu Ala Ser Ala Gly Asp Pro Gly His Pro Asn His Pro Leu 1860 1865 1870His Ala Ser Gln Asn Ser Ala Arg Arg Glu Arg Met Thr Ala Arg Glu 1875 1880 1885Glu Ala Ser Leu Arg Thr Leu Glu Gly Arg Arg Arg Ala Thr Leu Leu 1890 1895 1900Ser Ala Arg Gln Gly Met Met Ser Ala Arg Gly 
Asp Phe Leu Asn Tyr1905 1910 1915 1920Ala Leu Ser Leu Met Arg Ser His Asn Asp Glu His Ser Asp Val Leu 1925 1930 1935Pro Val Leu Asp Val Cys Ser Leu Lys His Val Ala Tyr Val Phe Gln 1940 1945 1950Ala Leu Ile Tyr Trp Ile Lys Ala Met Asn Gln Gln Thr Thr Leu Asp 1955 1960 1965Thr Pro Gln Leu Glu Arg Lys Arg Thr Arg Glu Leu Leu Glu Leu Gly 1970 1975 1980Ile Asp Asn Glu Asp Ser Glu His Glu Asn Asp Asp Asp Thr Asn Gln1985 1990 1995 2000Ser Ala Thr Leu Asn Asp Lys Asp Asp Asp Ser Leu Pro Ala Glu Thr 2005 2010 2015Gly Gln Asn His Pro Phe Phe Arg Arg Ser Asp Ser Met Thr Phe Leu 2020 2025 2030Gly Cys Ile Pro Pro Asn Pro Phe Glu Val Pro Leu Ala Glu Ala Ile 2035 2040 2045Pro Leu Ala Asp Gln Pro His Leu Leu Gln Pro Asn Ala Arg Lys Glu 2050 2055 2060Asp Leu Phe Gly Arg Pro Ser Gln Gly Leu Tyr Ser Ser Ser Ala Ser2065 2070 2075 2080Ser Gly Lys Cys Leu Met Glu Val Thr Val Asp Arg Asn Cys Leu Glu 2085 2090 2095Val Leu Pro Thr Lys Met Ser Tyr Ala Ala Asn Leu Lys Asn Val Met 2100 2105 2110Asn Met Gln Asn Arg Gln Lys Lys Glu Gly Glu Glu Gln Pro Val Leu 2115 2120 2125Pro Glu Glu Thr Glu Ser Ser Lys Pro Gly Pro Ser Ala His Asp Leu 2130 2135 2140Ala Ala Gln Leu Lys Ser Ser Leu Leu Ala Glu Ile Gly Leu Thr Glu2145 2150 2155 2160Ser Glu Gly Pro Pro Leu Thr Ser Phe Arg Pro Gln Cys Ser Phe Met 2165 2170 2175Gly Met Val Ile Ser His Asp Met Leu Leu Gly Arg Trp Arg Leu Ser 2180 2185 2190Leu Glu Leu Phe Gly Arg Val Phe Met Glu Asp Val Gly Ala Glu Pro 2195 2200 2205Gly Ser Ile Leu Thr Glu Leu Gly Gly Phe Glu Val Lys Glu Ser Lys 2210 2215 2220Phe Arg Arg Glu Met Glu Lys Leu Arg Asn Gln Gln Ser Arg Asp Leu2225 2230 2235 2240Ser Leu Glu Val Asp Arg Asp Arg Asp Leu Leu Ile Gln Gln Thr Met 2245 2250 2255Arg Gln Leu Asn Asn His Phe Gly Arg Arg Cys Ala Thr Thr Pro Met 2260 2265 2270Ala Val His Arg Val Lys Val Thr Phe Lys Asp Glu Pro Gly Glu Gly 2275 2280 2285Ser Gly Val Ala Arg Ser Phe Tyr Thr Ala Ile Ala Gln Ala Phe Leu 2290
2295 2300Ser Asn Glu Lys Leu Pro Asn Leu Glu Cys Ile Gln Asn Ala Asn Lys2305 2310 2315 2320Gly Thr His Thr Ser Leu Met Gln Arg Leu Arg Asn Arg Gly Glu Arg 2325 2330 2335Asp Arg Glu Arg Glu Arg Glu Arg Glu Met Arg Arg Ser Ser Gly Leu 2340 2345 2350Arg Ala Gly Ser Arg Arg Asp Arg Asp Arg Asp Phe Arg Arg Gln Leu 2355 2360 2365Ser Ile Asp Thr Arg Pro Phe Arg Pro Ala Ser Glu Gly Asn Pro Ser 2370 2375 2380Asp Asp Pro Glu Pro Leu Pro Ala His Arg Gln Ala Leu Gly Glu Arg2385 2390 2395 2400Leu Tyr Pro Arg Val Gln Ala Met Gln Pro Ala Phe Ala Ser Lys Ile 2405 2410 2415Thr Gly Met Leu Leu Glu Leu Ser Pro Ala Gln Leu Leu Leu Leu Leu 2420 2425 2430Ala Ser Glu Asp Ser Leu Arg Ala Arg Val Asp Glu Ala Met Glu Leu 2435 2440 2445Ile Ile Ala His Gly Arg Glu Asn Gly Ala Asp Ser Ile Leu Asp Leu 2450 2455 2460Gly Leu Val Asp Ser Ser Glu Lys Val Gln Gln Glu Asn Arg Lys Arg2465 2470 2475 2480His Gly Ser Ser Arg Ser Val Val Asp Met Asp Leu Asp Asp Thr Asp 2485 2490 2495Asp Gly Asp Asp Asn Ala Pro Leu Phe Tyr Gln Pro Gly Lys Arg Gly 2500 2505 2510Phe Tyr Thr Pro Arg Pro Gly Lys Asn Thr Glu Ala Arg Leu Asn Cys 2515 2520 2525Phe Arg Asn Ile Gly Arg Ile Leu Gly Leu Cys Leu Leu Gln Asn Glu 2530 2535 2540Leu Cys Pro Ile Thr Leu Asn Arg His Val Ile Lys Val Leu Leu Gly2545 2550 2555 2560Arg Lys Val Asn Trp His Asp Phe Ala Phe Phe Asp Pro Val Met Tyr 2565 2570 2575Glu Ser Leu Arg Gln Leu Ile Leu Ala Ser Gln Ser Ser Asp Ala Asp 2580 2585 2590Ala Val Phe Ser Ala Met Asp Leu Ala Phe Ala Ile Asp Leu Cys Lys 2595 2600 2605Glu Glu Gly Gly Gly Gln Val Glu Leu Ile Pro Asn Gly Val Asn Ile 2610 2615 2620Pro Val Thr Pro Gln Asn Val Tyr Glu Tyr Val Arg Lys Tyr Ala Glu2625 2630 2635 2640His Arg Met Leu Val Val Ala Glu Gln Pro Leu His Ala Met Arg Lys 2645 2650 2655Gly Leu Leu Asp Val Leu Pro Lys Asn Ser Leu Glu Asp Leu Thr Ala 2660 2665 2670Glu Asp Phe Arg Leu Leu Val Asn Gly Cys Gly Glu Val Asn Val Gln 2675 2680 2685Met Leu Ile Ser Phe Thr Ser Phe Asn Asp Glu Ser Gly Glu Asn Ala 2690 2695 2700Glu Lys Leu Leu Gln 
Phe Lys Arg Trp Phe Trp Ser Ile Val Glu Lys2705 2710 2715 2720Met Ser Met Thr Glu Arg Gln Asp Leu Val Tyr Phe Trp Thr Ser Ser 2725 2730 2735Pro Ser Leu Pro Ala Ser Glu Glu Gly Phe Gln Pro Met Pro Ser Ile 2740 2745 2750Thr Ile Arg Pro Pro Asp Asp Gln His Leu Pro Thr Ala Asn Thr Cys 2755 2760 2765Ile Ser Arg Leu Tyr Val Pro Leu Tyr Ser Ser Lys Gln Ile Leu Lys 2770 2775 2780Gln Lys Leu Leu Leu Ala Ile Lys Thr Lys Asn Phe Gly Phe Val2785 2790 279518130PRTHomo sapiens 18Met Ser Gly Arg Gly Lys Gln Gly Gly Lys Ala Arg Ala Lys Ala Lys1 5 10 15Ser Arg Ser Ser Arg Ala Gly Leu Gln Phe Pro Val Gly Arg Val His 20 25 30Arg Leu Leu Arg Lys Gly Asn Tyr Ala Glu Arg Val Gly Ala Gly Ala 35 40 45Pro Val Tyr Met Ala Ala Val Leu Glu Tyr Leu Thr Ala Glu Ile Leu 50 55 60Glu Leu Ala Gly Asn Ala Ala Arg Asp Asn Lys Lys Thr Arg Ile Ile65 70 75 80Pro Arg His Leu Gln Leu Ala Ile Arg Asn Asp Glu Glu Leu Asn Lys 85 90 95Leu Leu Gly Lys Val Thr Ile Ala Gln Gly Gly Val Leu Pro Asn Ile 100 105 110Gln Ala Val Leu Leu Pro Lys Lys Thr Glu Ser His His Lys Ala Lys 115 120 125Gly Lys 13019693PRTHomo sapiens 19Met Ala Ala Leu Gly Val Gln Ser Ile Asn Trp Gln Thr Ala Phe Asn1 5 10 15Arg Gln Ala His His Thr Asp Lys Phe Ser Ser Gln Glu Leu Ile Leu 20 25 30Arg Arg Gly Gln Asn Phe Gln Val Leu Met Ile Met Asn Lys Gly Leu 35 40 45Gly Ser Asn Glu Arg Leu Glu Phe Ile Val Ser Thr Gly Pro Tyr Pro 50 55 60Ser Glu Ser Ala Met Thr Lys Ala Val Phe Pro Leu Ser Asn Gly Ser65 70 75 80Ser Gly Gly Trp Ser Ala Val Leu Gln Ala Ser Asn Gly Asn Thr Leu 85 90 95Thr Ile Ser Ile Ser Ser Pro Ala Ser Ala Pro Ile Gly Arg Tyr Thr 100 105 110Met Ala Leu Gln Ile Phe Ser Gln Gly Gly Ile Ser Ser Val Lys Leu 115 120 125Gly Thr Phe Ile Leu Leu Phe Asn Pro Trp Leu Asn Val Asp Ser Val 130 135 140Phe Met Gly Asn His Ala Glu Arg Glu Glu Tyr Val Gln Glu Asp Ala145 150 155 160Gly Ile Ile Phe Val Gly Ser Thr Asn Arg Ile Gly Met Ile Gly Trp 165 170 175Asn Phe Gly Gln Phe Glu Glu Asp Ile Leu Ser Ile Cys Leu Ser Ile 180 185 190Leu Asp Arg Ser Leu Asn 
Phe Arg Arg Asp Ala Ala Thr Asp Val Ala 195 200 205Ser Arg Asn Asp Pro Lys Tyr Val Gly Arg Val Leu Ser Ala Met Ile 210 215 220Asn Ser Asn Asp Asp Asn Gly Val Leu Ala Gly Asn Trp Ser Gly Thr225 230 235 240Tyr Thr Gly Gly Arg Asp Pro Arg Ser Trp Asn Gly Ser Val Glu Ile 245 250 255Leu Lys Asn Trp Lys Lys Ser Gly Phe Ser Pro Val Arg Tyr Gly Gln 260 265 270Cys Trp Val Phe Ala Gly Thr Leu Asn Thr Ala Leu Arg Ser Leu Gly 275 280 285Ile Pro Ser Arg Val Ile Thr Asn Phe Asn Ser Ala His Asp Thr Asp 290 295 300Arg Asn Leu Ser Val Asp Val Tyr Tyr Asp Pro Met Gly Asn Pro Leu305 310 315 320Asp Lys Gly Ser Asp Ser Val Trp Asn Phe His Val Trp Asn Glu Gly 325 330 335Trp Phe Val Arg Ser Asp Leu Gly Pro Ser Tyr Gly Gly Trp Gln Val 340 345 350Leu Asp Ala Thr Pro Gln Glu Arg Ser Gln Gly Val Phe Gln Cys Gly 355 360 365Pro Ala Ser Val Ile Gly Val Arg Glu Gly Asp Val Gln Leu Asn Phe 370 375 380Asp Met Pro Phe Ile Phe Ala Glu Val Asn Ala Asp Arg Ile Thr Trp385 390 395 400Leu Tyr Asp Asn Thr Thr Gly Lys Gln Trp Lys Asn Ser Val Asn Ser 405 410 415His Thr Ile Gly Arg Tyr Ile Ser Thr Lys Ala Val Gly Ser Asn Ala 420 425 430Arg Met Asp Val Thr Asp Lys Tyr Lys Tyr Pro Glu Gly Ser Asp Gln 435 440 445Glu Arg Gln Val Phe Gln Lys Ala Leu Gly Lys Leu Lys Pro Asn Thr 450 455 460Pro Phe Ala Ala Thr Ser Ser Met Gly Leu Glu Thr Glu Glu Gln Glu465 470 475 480Pro Ser Ile Ile Gly Lys Leu Lys Val Ala Gly Met Leu Ala Val Gly 485 490 495Lys Glu Val Asn Leu Val Leu Leu Leu Lys Asn Leu Ser Arg Asp Thr 500 505 510Lys Thr Val Thr Val Asn Met Thr Ala Trp Thr Ile Ile Tyr Asn Gly 515 520 525Thr Leu Val His Glu Val Trp Lys Asp Ser Ala Thr Met Ser Leu Asp 530 535 540Pro Glu Glu Glu Ala Glu His Pro Ile Lys Ile Ser Tyr Ala Gln Tyr545 550 555 560Glu Lys Tyr Leu Lys Ser Asp Asn Met Ile Arg Ile Thr Ala Val Cys 565 570 575Lys Val Pro Asp Glu Ser Glu Val Val Val Glu Arg Asp Ile Ile Leu 580 585 590Asp Asn Pro Thr Leu Thr Leu Glu Val Leu Asn Glu Ala Arg Val Arg 595 600 605Lys Pro Val Asn Val Gln Met Leu Phe Ser Asn Pro Leu Asp 
Glu Pro 610 615 620Val Arg Asp Cys Val Leu Met Val Glu Gly Ser Gly Leu Leu Leu Gly625 630 635 640Asn Leu Lys Ile Asp Val Pro Thr Leu Gly Pro Lys Glu Gly Ser Arg 645 650 655Val Arg Phe Asp Ile Leu Pro Ser Arg Ser Gly Thr Lys Gln Leu Leu 660 665 670Ala Asp Phe Ser Cys Asn Lys Phe Pro Ala Ile Lys Ala Met Leu Ser 675 680 685Ile Asp Val Ala Glu 69020254PRTHomo sapiens 20Met Ala Ile Ser Gly Val Pro Val Leu Gly Phe Phe Ile Ile Ala Val1 5 10 15Leu Met Ser Ala Gln Glu Ser Trp Ala Ile Lys Glu Glu His Val Ile 20 25 30Ile Gln Ala Glu Phe Tyr Leu Asn Pro Asp Gln Ser Gly Glu Phe Met 35 40 45Phe Asp Phe Asp Gly Asp Glu Ile Phe His Val Asp Met Ala Lys Lys 50 55 60Glu Thr Val Trp Arg Leu Glu Glu Phe Gly Arg Phe Ala Ser Phe Glu65 70 75 80Ala Gln Gly Ala Leu Ala Asn Ile Ala Val Asp Lys Ala Asn Leu Glu 85 90 95Ile Met Thr Lys Arg Ser Asn Tyr Thr Pro Ile Thr Asn Val Pro Pro 100 105 110Glu Val Thr Val Leu Thr Asn Ser Pro Val Glu Leu Arg Glu Pro Asn 115 120 125Val Leu Ile Cys Phe Ile Asp Lys Phe Thr Pro Pro Val Val Asn Val 130 135 140Thr Trp Leu Arg Asn Gly Lys Pro Val Thr Thr Gly Val Ser Glu Thr145 150 155 160Val Phe Leu Pro Arg Glu Asp His Leu Phe Arg Lys Phe His Tyr Leu 165 170 175Pro Phe Leu Pro Ser Thr Glu Asp Val Tyr Asp Cys Arg Val Glu His 180 185 190Trp Gly Leu Asp Glu Pro Leu Leu Lys His Trp Glu Phe Asp Ala Pro 195 200 205Ser Pro Leu Pro Glu Thr Thr Glu Asn Val Val Cys Ala Leu Gly Leu 210 215 220Thr Val Gly Leu Val Gly Ile Ile Ile Gly Thr Ile Phe Ile Ile Lys225 230 235 240Gly Leu Arg Lys Ser Asn Ala Ala Glu Arg Arg Gly Pro Leu 245 250211050PRTHomo sapiens 21Met Ala Ala Val Lys Lys Glu Gly Gly Ala Leu Ser Glu Ala Met Ser1 5 10 15Leu Glu Gly Asp Glu Trp Glu Leu Ser Lys Glu Asn Val Gln Pro Leu 20 25 30Arg Gln Gly Arg Ile Met Ser Thr Leu Gln Gly Ala Leu Ala Gln Glu 35 40 45Ser Ala Cys Asn Asn Thr Leu Gln Gln Gln Lys Arg Ala Phe Glu Tyr 50 55 60Glu Ile Arg Phe Tyr Thr Gly Asn Asp Pro Leu Asp Val Trp Asp Arg65 70 75 80Tyr Ile Ser Trp Thr Glu Gln Asn Tyr Pro Gln Gly Gly Lys Glu Ser 85 
90 95Asn Met Ser Thr Leu Leu Glu Arg Ala Val Glu Ala Leu Gln Gly Glu 100 105 110Lys Arg Tyr Tyr Ser Asp Pro Arg Phe Leu Asn Leu Trp Leu Lys Leu 115 120 125Gly Arg Leu Cys Asn Glu Pro Leu Asp Met Tyr Ser Tyr Leu His Asn 130 135 140Gln Gly Ile Gly Val Ser Leu Ala Gln Phe Tyr Ile Ser Trp Ala Glu145 150 155 160Glu Tyr Glu Ala Arg Glu Asn Phe Arg Lys Ala Asp Ala Ile Phe Gln 165 170 175Glu Gly Ile Gln Gln Lys Ala Glu Pro Leu Glu Arg Leu Gln Ser Gln 180 185 190His Arg Gln Phe Gln Ala Arg Val Ser Arg Gln Thr Leu Leu Ala Leu 195 200 205Glu Lys Glu Glu Glu Glu Glu Val Phe Glu Ser Ser Val Pro Gln Arg 210 215 220Ser Thr Leu Ala Glu Leu Lys Ser Lys Gly Lys Lys Thr Ala Arg Ala225 230 235 240Pro Ile Ile Arg Val Gly Gly Ala Leu Lys Ala Pro Ser Gln Asn Arg 245 250 255Gly Leu Gln Asn Pro Phe Pro Gln Gln Met Gln Asn Asn Ser Arg Ile 260 265 270Thr Val Phe Asp Glu Asn Ala Asp Glu Ala Ser Thr Ala Glu Leu Ser 275 280 285Lys Pro Thr Val Gln Pro Trp Ile Ala Pro Pro Met Pro Arg Ala Lys 290 295 300Glu Asn Glu Leu Gln Ala Gly Pro Trp Asn Thr Gly Arg Ser Leu Glu305 310 315 320His Arg Pro Arg Gly Asn Thr Ala Ser Leu Ile Ala Val Pro Ala Val 325 330 335Leu Pro Ser Phe Thr Pro Tyr Val Glu Glu Thr Ala Arg Gln Pro Val 340 345 350Met Thr Pro Cys Lys Ile Glu Pro Ser Ile Asn His Ile Leu Ser Thr 355 360 365Arg Lys Pro Gly Lys Glu Glu Gly Asp Pro Leu Gln Arg Val Gln Ser 370 375 380His Gln Gln Ala Ser Glu Glu Lys Lys Glu Lys Met Met Tyr Cys Lys385 390 395 400Glu Lys Ile Tyr Ala Gly Val Gly Glu Phe Ser Phe Glu Glu Ile Arg 405 410 415Ala Glu Val Phe Arg Lys Lys Leu Lys Glu Gln Arg Glu Ala Glu Leu 420 425 430Leu Thr Ser Ala Glu Lys Arg Ala Glu Met Gln Lys Gln Ile Glu Glu 435 440 445Met Glu Lys Lys Leu Lys Glu Ile Gln Thr Thr Gln Gln Glu Arg Thr 450 455 460Gly Asp Gln Gln Glu Glu Thr Met Pro Thr Lys Glu Thr Thr Lys Leu465 470 475 480Gln Ile Ala Ser Glu Ser Gln Lys Ile Pro Gly Met Thr Leu Ser Ser 485 490 495Ser Val Cys Gln Val Asn Cys Cys Ala Arg Glu Thr Ser Leu Ala Glu 500 505 510Asn Ile Trp Gln Glu Gln Pro 
His Ser Lys Gly Pro Ser Val Pro Phe 515 520 525Ser Ile Phe Asp Glu Phe Leu Leu Ser Glu Lys Lys Asn Lys Ser Pro 530 535 540Pro Ala Asp Pro Pro Arg Val Leu Ala Gln Arg Arg Pro Leu Ala Val545 550 555 560Leu Lys Thr Ser Glu Ser Ile Thr Ser Asn Glu Asp Val Ser Pro Asp 565 570 575Val Cys Asp Glu Phe Thr Gly Ile Glu Pro Leu Ser Glu Asp Ala Ile 580 585 590Ile Thr Gly Phe Arg Asn Val Thr Ile Cys Pro Asn Pro Glu Asp Thr 595 600 605Cys Asp Phe Ala Arg Ala Ala Arg Phe Val Ser Thr Pro Phe His Glu 610 615 620Ile Met Ser Leu Lys Asp Leu Pro Ser Asp Pro Glu Arg Leu Leu Pro625 630 635 640Glu Glu Asp Leu Asp Val Lys Thr Ser Glu Asp Gln Gln Thr Ala Cys 645 650 655Gly Thr Ile Tyr Ser Gln Thr Leu Ser Ile Lys Lys Leu Ser Pro Ile 660 665 670Ile Glu Asp Ser Arg Glu Ala Thr His Ser Ser Gly Phe Ser Gly Ser 675 680 685Ser Ala Ser Val Ala Ser Thr Ser Ser Ile Lys Cys Leu Gln Ile Pro 690 695 700Glu Lys Leu Glu Leu Thr Asn Glu Thr Ser Glu Asn Pro Thr Gln Ser705 710 715 720Pro Trp Cys Ser Gln Tyr Arg Arg Gln Leu Leu Lys Ser Leu Pro Glu 725 730 735Leu Ser Ala Ser Ala Glu Leu Cys Ile Glu Asp Arg Pro Met Pro Lys 740 745 750Leu Glu Ile Glu Lys Glu Ile Glu Leu Gly Asn Glu Asp Tyr Cys Ile 755 760 765Lys Arg Glu Tyr Leu Ile Cys Glu Asp Tyr Lys Leu Phe Trp Val Ala 770 775 780Pro Arg Asn Ser Ala Glu Leu Thr Val Ile Lys Val Ser Ser Gln Pro785 790 795 800Val Pro Trp Asp Phe Tyr Ile Asn Leu Lys Leu Lys Glu Arg Leu Asn 805 810 815Glu Asp Phe Asp His Phe Cys Ser Cys Tyr Gln Tyr Gln Asp Gly Cys 820 825 830Ile Val Trp His Gln Tyr Ile Asn Cys Phe Thr Leu Gln Asp Leu Leu 835 840 845Gln His Ser Glu Tyr Ile Thr His Glu Ile Thr Val Leu Ile Ile Tyr 850 855 860Asn Leu Leu Thr Ile Val Glu Met Leu His Lys Ala Glu Ile Val His865 870 875 880Gly Asp Leu Ser Pro Arg Cys Leu Ile Leu Arg Asn Arg Ile His Asp
885 890 895Pro Tyr Asp Cys Asn Lys Asn Asn Gln Ala Leu Lys Ile Val Asp Phe 900 905 910Ser Tyr Ser Val Asp Leu Arg Val Gln Leu Asp Val Phe Thr Leu Ser 915 920 925Gly Phe Arg Thr Val Gln Ile Leu Glu Gly Gln Lys Ile Leu Ala Asn 930 935 940Cys Ser Ser Pro Tyr Gln Val Asp Leu Phe Gly Ile Ala Asp Leu Ala945 950 955 960His Leu Leu Leu Phe Lys Glu His Leu Gln Val Phe Trp Asp Gly Ser 965 970 975Phe Trp Lys Leu Ser Gln Asn Ile Ser Glu Leu Lys Asp Gly Glu Leu 980 985 990Trp Asn Lys Phe Phe Val Arg Ile Leu Asn Ala Asn Asp Glu Ala Thr 995 1000 1005Val Ser Val Leu Gly Glu Leu Ala Ala Glu Met Asn Gly Val Phe Asp 1010 1015 1020Thr Thr Phe Gln Ser His Leu Asn Lys Ala Leu Trp Lys Val Gly Lys1025 1030 1035 1040Leu Thr Ser Pro Gly Ala Leu Leu Phe Gln 1045 105022190PRTHomo sapiensVARIANT(40)...(40)Selenocysteine 22Met Ala Phe Ile Ala Lys Ser Phe Tyr Asp Leu Ser Ala Ile Ser Leu1 5 10 15Asp Gly Glu Lys Val Asp Phe Asn Thr Phe Arg Gly Arg Ala Val Leu 20 25 30Ile Glu Asn Val Ala Ser Leu Cys Gly Thr Thr Thr Arg Asp Phe Thr 35 40 45Gln Leu Asn Glu Leu Gln Cys Arg Phe Pro Arg Arg Leu Val Val Leu 50 55 60Gly Phe Pro Cys Asn Gln Phe Gly His Gln Glu Asn Cys Gln Asn Glu65 70 75 80Glu Ile Leu Asn Ser Leu Lys Tyr Val Arg Pro Gly Gly Gly Tyr Gln 85 90 95Pro Thr Phe Thr Leu Val Gln Lys Cys Glu Val Asn Gly Gln Asn Glu 100 105 110His Pro Val Phe Ala Tyr Leu Lys Asp Lys Leu Pro Tyr Pro Tyr Asp 115 120 125Asp Pro Phe Ser Leu Met Thr Asp Pro Lys Leu Ile Ile Trp Ser Pro 130 135 140Val Arg Arg Ser Asp Val Ala Trp Asn Phe Glu Lys Phe Leu Ile Gly145 150 155 160Pro Glu Gly Glu Pro Phe Arg Arg Tyr Ser Arg Thr Phe Pro Thr Ile 165 170 175Asn Ile Glu Pro Asp Ile Lys Arg Leu Leu Lys Val Ala Ile 180 185 190


