Patent application title: Esophageal Cancer Markers
Inventors:
Florin M. Selaru (Baltimore, MD, US)
Stephen J. Meltzer (Lutherville, MD, US)
Assignees:
University of Maryland, Baltimore
THE JOHNS HOPKINS UNIVERSITY
IPC8 Class: AC12Q168FI
USPC Class:
506 9
Class name: Combinatorial chemistry technology: method, library, apparatus method of screening a library by measuring the ability to specifically bind a target molecule (e.g., antibody-antigen binding, receptor-ligand binding, etc.)
Publication date: 2014-07-24
Patent application number: 20140206565
Abstract:
The present invention is directed to methods for diagnosing cancer in a
subject. Morphologically normal epithelial cells of the esophagus are
assayed for marker expression. Characteristic expression of the markers
indicates the presence of cancer or the predisposition to cancer. A panel
of eleven markers is particularly good at identifying cancer and the
predisposition to cancer.
Claims:
1. A method of determining presence or predisposition to esophageal
cancer in a human subject, comprising: determining in a sample of
morphologically normal esophageal epithelial cells of a human subject
expression of one or more genes; determining a composite score of
expression of the one or more genes; comparing the composite score to
predetermined values for esophageal cancer or predisposition to
esophageal cancer; identifying presence or predisposition to esophageal
cancer based on the composite score.
2. A method of determining presence or predisposition to esophageal cancer in a human subject, comprising: determining in a sample of esophageal epithelial cells of a human subject expression of one or more genes selected from the group consisting of gravin; H1 histone family, member 2 (H1F2); H2A histone family, member L (H2AFL); H2B histone family, member C (H2BFC); keratin 8 (KRT8); progestin induced protein (DD5); H2A histone family, member O (H2AFO); transglutaminase 3 (TGM3); major histocompatibility complex, class II, DR alpha (HLA-DRA); mitotic checkpoint protein kinase BUB1B (BUB1B); and glutathione peroxidase 2 (gastrointestinal) (GPX2); determining a composite score of expression of the one or more genes; comparing the composite score to predetermined values for esophageal cancer or predisposition to esophageal cancer; identifying presence or predisposition to esophageal cancer based on the composite score.
3. The method of claim 2 wherein the sample comprises morphologically normal esophageal epithelial cells.
4. The method of claim 1 wherein mRNA expression of the one or more genes is determined.
5. The method of claim 1 wherein protein expression of the one or more genes is determined.
6. The method of claim 2 wherein a composite score of expression of eleven genes is determined.
7. The method of claim 2 wherein a composite score of expression of ten genes is determined.
8. The method of claim 2 wherein a composite score of expression of nine genes is determined.
9. The method of claim 2 wherein a composite score of expression of eight genes is determined.
10. The method of claim 2 wherein a composite score of expression of seven genes is determined.
11. The method of claim 2 wherein a composite score of expression of six genes is determined.
12. The method of claim 2 wherein a composite score of expression of five genes is determined.
13. The method of claim 1 wherein a chemopreventive diet is recommended when presence or predisposition to esophageal cancer is identified.
14. The method of claim 1 wherein the sample is obtained by endoscopy.
15. The method of claim 1 wherein the sample is obtained by an inflatable balloon.
16. The method of claim 1 wherein the sample is obtained by a sponge.
17. The method of claim 1 wherein the human subject has diagnosed Barrett's esophagus.
18. A solid support for determining esophageal epithelium expression, comprising: antibodies or oligonucleotide probes for interrogating expression of at least six genes selected from the group consisting of gravin; H1 histone family, member 2 (H1F2); H2A histone family, member L (H2AFL); H2B histone family, member C (H2BFC); keratin 8 (KRT8); progestin induced protein (DD5); H2A histone family, member O (H2AFO); transglutaminase 3 (TGM3); major histocompatibility complex, class II, DR alpha (HLA-DRA); mitotic checkpoint protein kinase BUB1B (BUB1B); and glutathione peroxidase 2 (gastrointestinal) (GPX2); wherein the antibodies or oligonucleotide probes for the at least six genes from the group comprise at least 50% of the antibodies or oligonucleotide probes on the solid support.
19. The solid support of claim 18 wherein the antibodies or oligonucleotide probes for the at least six genes from the group comprise at least 75% of the antibodies or oligonucleotide probes on the solid support.
20. The solid support of claim 18 wherein the antibodies or oligonucleotide probes for the at least six genes from the group comprise at least 90% of the antibodies or oligonucleotide probes on the solid support.
Description:
BACKGROUND OF THE INVENTION
[0002] One of the greatest challenges in the management of Barrett's esophagus (BA), the precursor lesion of esophageal adenocarcinoma (EAC), is to expeditiously identify patients who have early EAC and to predict those who will develop EAC. The rate of progression to cancer (0.4-0.5% per year in some studies, 0.5 to 1% per year in other studies) is very low, making this challenge particularly difficult (Reynolds et al., Gastroenterol Clin North Am 28(4):917-45 (1999); Cameron, Gastroenterol Clin North Am 26(3):487-94 (1997)). Moreover, in the surveillance of BA, a meticulous endoscopic search is often performed to identify grossly normal-appearing dysplastic or cancerous lesions. However, the value of this type of systematic surveillance has been questioned, due to its low sensitivity and specificity (Conio et al., Am J Gastroenterol 98(9):1931-9 (2003)). Thus, from a purely practical standpoint, it would be advantageous to be able to identify patients with malignant esophageal lesions simply by biopsying their normal squamous esophagus.
[0003] The presence and degree of dysplasia constitute the most widely accepted measure of neoplastic risk in Barrett's esophagus. However, significant problems have emerged demonstrating the need for improved progression risk biomarkers. These problems include poor interobserver reproducibility of dysplasia interpretation and inconsistent rates of progression as well as regression of dysplasia, both of which have made it difficult to develop national surveillance guidelines (Conio et al., Am J Gastroenterol 98(9):1931-9 (2003); Rana et al., Dis Esophagus 13(1):28-31 (2000); Reid et al., Am J Gastroenterol 95(7):1669-76 (2000)). Flow cytometry has shown promise in detecting a subset of patients who do not have high-grade dysplasia (HGD) but do have an increased risk of progression (Reid et al., Am J Gastroenterol 95(7):1669-76 (2000)).
[0004] The human genome project has yielded high-throughput methodologies for the computer analysis of data, which provide volume and quality control required to select clinically useful biomarkers (Taramelli et al., Eur J Cancer 40(17):2537-43 (2004); Varmus et al., Science 310(5754):1615 (2005); Yoshida, Jpn J Clin Oncol 29(10):457-9 (1999)). 17p (p53)-loss of heterozygosity (LOH) has also shown potential as a molecular biomarker (Reid et al., Gastrointest Endosc Clin N Am 13(2):369-97 (2003)). In addition, methylation of p16 and HPP1 have been shown to predict progression to HGD and EAC (Hardie et al., Cancer Lett 217(2):221-30 (2005); Geddert et al., Int J Cancer 110(2):208-11 (2004); Schulmann et al., Oncogene 24(25):4138-48 (2005)). Molecular alterations have been found in Barrett's metaplasia which reveal a field effect in premalignant metaplastic mucosa, but not in normal epithelium. For example, aneuploidy and loss of heterozygosity have been observed in metaplastic mucosa from Barrett's patients with dysplasia or adenocarcinoma (Blount et al., Proc Natl Acad Sci USA 90(8):3221-5 (1993); Boynton et al., Cancer Res 51(20):5766-9 (1991); Raskind et al., Cancer Res 52(10):2946-50 (1992); Reid et al., Gastroenterology 93(1):1-11 (1987)). Similarly, p53 tumor suppressor gene point mutations have been reported in Barrett's metaplasia (Casson et al., Am J Surg 167(1):52-7 (1994); Huang et al., Cancer Res 53(8):1889-94 (1993); Meltzer et al., Proc Natl Acad Sci USA 88(11):4976-80 (1991)), and altered promoter DNA methylation has also been described for some tumor suppressor genes in Barrett's esophagus (Eads et al., Cancer Res 60(18):5021-6 (2000); Kawakami et al., J Natl Cancer Inst 92(22):1805-11 (2000); Klump et al., Gastroenterology 115:1381-6 (1998); Wong et al., Cancer Res 57(13):2619-22 (1997)).
[0005] In contrast, most published studies to date report no DNA alterations (e.g., point mutations, methylation, or loss of heterozygosity) in normal squamous esophageal epithelium from patients with esophageal cancer. Corn et al. (Clinical Cancer Research 7(9):2765-9 (2001)) reported E-cadherin methylation in Barrett's esophagus specimens and esophageal adenocarcinoma, but not in normal esophageal epithelium. Another study showed that the expression of a panel of 23 genes capable of differentiating between Barrett's esophagus and esophageal adenocarcinoma was unable to distinguish between the normal epithelia of Barrett's metaplasia and adenocarcinoma patients (Brabender et al., Oncogene 23(27):4780-8 (2004)). One notable exception was the study by Eads et al., which found methylation of the CALCA, MGMT, and TIMP3 genes in the normal esophagus of a subset of patients with Barrett's-associated esophageal dysplasia and adenocarcinoma (Eads et al., Cancer Res 61(8):3410-8 (2001)).
[0006] cDNA microarrays promise more accurate prediction than do classical clinical diagnostic tools (such as histologic categorization). However, the main challenge posed by microarrays is to construct meaningful classifiers based on gene expression profiles, using appropriate bioinformatics tools. A number of bioinformatics tools have been proposed, including artificial neural networks (Selaru et al., Gastroenterology 122(3):606-13 (2002)), hierarchical clustering (Selaru et al., Oncogene 21(3):475-8 (2002); Zou et al., Oncogene 21(31):4855-62 (2002)) and principal components analysis (Mori et al., Cancer Res 63(15):4577-82 (2003); Selaru et al., Cancer Res 64:1584-88 (2004)). Shrunken nearest centroid predictors (SNCPs) were adapted from classical nearest centroids predictors to gene microarray analysis (Tibshirani et al., Proc Natl Acad Sci USA 99(10):6567-72 (2002)). From among large numbers of genes, it is difficult to distinguish expression variations due to chance. However, these variations tend to be of small amplitude. Thus, if small variations are ignored and only consistently relatively high changes in expression are accepted, biologic changes prevail over variations due to chance. Among the mathematical means used to ignore small variations, one method, SNCPs, is particularly valuable. Prediction Analysis of Microarrays (PAM) is a software package developed at Stanford University that utilizes SNCPs and performs internal validation simultaneously. Samples are divided up at random into K roughly equal-sized parts. For each part in turn, the classifier is built on the remaining K-1 parts, then tested on the last 1 part. This procedure is performed over a range of threshold values, and the cross-validated misclassification error rate is reported for each threshold value. Typically, the user chooses the threshold value giving the minimum cross-validated misclassification error rate. This method has been utilized successfully by investigators studying leukemia and breast cancer to find subsets of genes that accurately predicted classifications of these diseases (Tibshirani et al., Proc Natl Acad Sci USA 99(10):6567-72 (2002); Korkola et al., Cancer Res 63(21):7167-75 (2003); Sorlie et al., Proc Natl Acad Sci USA 100(14):8418-23 (2003)).
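The K-fold procedure described in paragraph [0006] can be sketched as follows. This is a simplified, pure-Python illustration of shrunken nearest centroid classification with cross-validated threshold selection. It is not the PAM package, and the function names, the toy expression values, and the particular standardization (pooled within-class standard deviation plus its median) are assumptions made only for illustration.

```python
import math
import random
from statistics import median


def fit_shrunken_centroids(X, y, delta):
    """Fit class centroids shrunk toward the overall centroid by threshold delta."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    members = {k: [row for row, lab in zip(X, y) if lab == k] for k in classes}
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    cent = {k: [sum(r[i] for r in rows) / len(rows) for i in range(p)]
            for k, rows in members.items()}
    # Pooled within-class standard deviation for each gene.
    s = [math.sqrt(sum((row[i] - cent[lab][i]) ** 2 for row, lab in zip(X, y))
                   / max(n - len(classes), 1)) for i in range(p)]
    s0 = median(s)
    scale = [(si + s0) or 1.0 for si in s]  # fall back to 1.0 if variance is zero
    shrunk = {}
    for k in classes:
        nk = len(members[k])
        mk = math.sqrt(1.0 / nk - 1.0 / n)
        shrunk[k] = []
        for i in range(p):
            denom = mk * scale[i]
            d = (cent[k][i] - overall[i]) / denom if denom > 0 else 0.0
            # Soft thresholding: small standardized differences become zero.
            d = math.copysign(max(abs(d) - delta, 0.0), d)
            shrunk[k].append(overall[i] + denom * d)
    priors = {k: len(members[k]) / n for k in classes}
    return {"centroids": shrunk, "scale": scale, "priors": priors}


def predict(model, x):
    """Assign x to the class with the nearest shrunken centroid."""
    def score(k):
        dist = sum(((xi - ci) / sc) ** 2 for xi, ci, sc
                   in zip(x, model["centroids"][k], model["scale"]))
        return dist - 2.0 * math.log(model["priors"][k])
    return min(model["centroids"], key=score)


def cross_validated_errors(X, y, thresholds, k_folds=5, seed=0):
    """Return the K-fold misclassification rate for each threshold value."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k_folds] for f in range(k_folds)]  # K roughly equal parts
    errors = {}
    for delta in thresholds:
        wrong = 0
        for held_out in folds:
            held = set(held_out)
            train = [j for j in idx if j not in held]
            model = fit_shrunken_centroids([X[j] for j in train],
                                           [y[j] for j in train], delta)
            wrong += sum(predict(model, X[j]) != y[j] for j in held_out)
        errors[delta] = wrong / len(X)
    return errors


# Toy values invented for illustration; gene 0 separates the two classes.
X = [[5.0, 1.0, 2.0], [5.2, 1.1, 1.9], [4.9, 0.9, 2.1], [5.1, 1.0, 2.0],
     [5.3, 1.2, 2.2], [1.0, 1.0, 2.0], [1.2, 0.9, 2.1], [0.9, 1.1, 1.9],
     [1.1, 1.2, 2.0], [0.8, 1.0, 2.1]]
y = ["EAC"] * 5 + ["control"] * 5
errors = cross_validated_errors(X, y, thresholds=[0.0, 1.0, 2.0])
best_threshold = min(errors, key=errors.get)
```

In this sketch the threshold with the lowest cross-validated error is chosen, which mirrors the typical choice described above; ties are broken in favor of the first threshold listed.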
SUMMARY OF THE INVENTION
[0007] Diagnosing Cancer--Different Subjects
[0008] The present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a biological sample from a subject that does not have cancer using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.
[0009] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of esophageal epithelium from a subject that does not have cancer, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.
[0010] The present invention is further directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of esophageal epithelium from a subject that does not have cancer using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.
[0011] Diagnosing Cancer--Same Subject
[0012] The present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a first biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second biological sample from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.
[0013] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.
[0014] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.
[0015] The present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a first biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second biological sample from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer, and wherein the locations from which the first and second biological samples are obtained are separated by a distance of at least 3 cm in said subject.
[0016] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the locations from which the first and second biological samples are obtained from the esophageal epithelium are separated by a distance of at least 3 cm in said subject.
[0017] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the locations from which the first and second biological samples are obtained from the esophageal epithelium are separated by a distance of at least 3 cm in said subject.
[0018] The present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a first biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second biological sample from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer, and wherein the locations from which the first and second biological samples are obtained have a grossly different appearance.
[0019] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the locations from which the first and second biological samples are obtained from the esophageal epithelium have a grossly different appearance.
[0020] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the locations from which the first and second biological samples are obtained from the esophageal epithelium have a grossly different appearance.
[0021] Detecting Differential Gene Expression
[0022] The present invention is also directed to a method for detecting differential gene expression in a subject comprising (a) determining a gene expression pattern of a biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a pre-determined gene expression pattern using shrunken nearest centroid predictors, wherein when the gene expression pattern of (a) is different from the pre-determined gene expression pattern, differential gene expression is detected.
[0023] The present invention is also directed to a method for detecting differential gene expression in a subject comprising (a) determining a gene expression pattern of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a pre-determined gene expression pattern of esophageal epithelium, wherein when the gene expression pattern of (a) is different from the pre-determined gene expression pattern, differential gene expression is detected.
[0024] The present invention is also directed to a method for detecting differential gene expression in a subject comprising (a) determining a gene expression pattern of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a pre-determined gene expression pattern of esophageal epithelium using shrunken nearest centroid predictors, wherein when the gene expression pattern of (a) is different from the pre-determined gene expression pattern, differential gene expression is detected.
[0025] Diagnosing Cancer Using Markers
[0026] The present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining an expression pattern of one or more genes in a biological sample from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in a biological sample from a subject that does not have cancer using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.
[0027] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.
[0028] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.
[0029] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the one or more genes are genes selected from Table 1.
[0030] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the one or more genes are genes selected from Table 1.
TABLE 1
Name | Gene ID
gravin, complete cds. | AB003476
protease, serine, 22 (P11) | XM_006625
H1 histone family, member 2 (H1F2) | NM_005319
fucosyltransferase 1 (galactoside 2-alpha-L-fucosyltransferase, Bombay phenotype included) (FUT1) | NM_000148
H2A histone family, member L (H2AFL) | XM_004416
serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 2 (SERPINB2) | NM_002575
H2B histone family, member C (H2BFC) | NM_003519
membrane associated guanylate kinase 2 (MAGI-2) | AF038563
(RG) heterogeneous nuclear ribonucleoprotein H1 (H) | R11019
keratin 8 (KRT8) | NM_002273
RAD51 (S. cerevisiae) homolog (E coli RecA homolog) (RAD51) | XM_031515
plasminogen activator, urokinase (PLAU) | NM_002658
H3 histone family, member B (H3FB) | NM_003530
aldehyde dehydrogenase 1 family, member A3 (ALDH1A3) | XM_017971
(RG) ankyrin 1, erythrocytic | AA464755
wild-type p53 activated fragment-1 (WAF1) | U03106
like mouse brain protein E46 (E46L) | NM_013236
progestin induced protein (DD5) | NM_015902
H2A histone family, member O (H2AFO) | NM_003516
transglutaminase 3 (TGM3) | XM_009572
major histocompatibility complex, class II, DR alpha (HLA-DRA) | NM_019111
mitotic checkpoint protein kinase BUB1B (BUB1B) | AF107297
glutathione peroxidase 2 (gastrointestinal) (GPX2) | NM_002083
[0031] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the one or more genes are genes selected from Table 2.
[0032] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the one or more genes are genes selected from Table 2.
TABLE 2
Name | Gene ID
gravin, complete cds. | AB003476
protease, serine, 22 (P11) | XM_006625
H1 histone family, member 2 (H1F2) | NM_005319
fucosyltransferase 1 (galactoside 2-alpha-L-fucosyltransferase, Bombay phenotype included) (FUT1) | NM_000148
H2A histone family, member L (H2AFL) | XM_004416
serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 2 (SERPINB2) | NM_002575
H2B histone family, member C (H2BFC) | NM_003519
membrane associated guanylate kinase 2 (MAGI-2) | AF038563
(RG) heterogeneous nuclear ribonucleoprotein H1 (H) | R11019
keratin 8 (KRT8) | NM_002273
RAD51 (S. cerevisiae) homolog (E coli RecA homolog) (RAD51) | XM_031515
plasminogen activator, urokinase (PLAU) | NM_002658
H3 histone family, member B (H3FB) | NM_003530
aldehyde dehydrogenase 1 family, member A3 (ALDH1A3) | XM_017971
(RG) ankyrin 1, erythrocytic | AA464755
wild-type p53 activated fragment-1 (WAF1) | U03106
like mouse brain protein E46 (E46L) | NM_013236
[0033] In one embodiment a method is provided for determining presence or predisposition to esophageal cancer in a human subject. Expression of one or more genes is determined in a sample of morphologically normal esophageal epithelial cells of a human subject. A composite score of expression of the one or more genes is calculated. The composite score is compared to predetermined values for esophageal cancer or predisposition to esophageal cancer which were obtained using appropriate populations of subjects with esophageal cancer or with predisposition to esophageal cancer. The presence or predisposition to esophageal cancer is identified based on the composite score.
[0034] In another embodiment a method is provided for determining presence or predisposition to esophageal cancer in a human subject. Expression of one or more genes is determined in a sample of esophageal epithelial cells of a human subject. The one or more genes are selected from the group consisting of gravin; H1 histone family, member 2 (H1F2); H2A histone family, member L (H2AFL); H2B histone family, member C (H2BFC); keratin 8 (KRT8); progestin induced protein (DD5); H2A histone family, member O (H2AFO); transglutaminase 3 (TGM3); major histocompatibility complex, class II, DR alpha (HLA-DRA); mitotic checkpoint protein kinase BUB1B (BUB1B); and glutathione peroxidase 2 (gastrointestinal) (GPX2). A composite score of expression of the one or more genes is calculated. The composite score is compared to predetermined values for esophageal cancer or predisposition to esophageal cancer which were obtained using appropriate populations of subjects with esophageal cancer or with predisposition to esophageal cancer. The presence or predisposition to esophageal cancer is identified based on the composite score.
[0035] Still another embodiment provides a method for determining presence or predisposition to esophageal cancer in a human subject. Expression of one or more genes is determined in a sample of morphologically normal esophageal epithelial cells of a human subject. The one or more genes are selected from the group consisting of gravin; H1 histone family, member 2 (H1F2); H2A histone family, member L (H2AFL); H2B histone family, member C (H2BFC); keratin 8 (KRT8); progestin induced protein (DD5); H2A histone family, member O (H2AFO); transglutaminase 3 (TGM3); major histocompatibility complex, class II, DR alpha (HLA-DRA); mitotic checkpoint protein kinase BUB1B (BUB1B); and glutathione peroxidase 2 (gastrointestinal) (GPX2). A composite score of expression of the one or more genes is calculated. The composite score is compared to predetermined values for esophageal cancer or predisposition to esophageal cancer which were obtained using appropriate populations of subjects with esophageal cancer or with predisposition to esophageal cancer. Presence or predisposition to esophageal cancer is identified based on the composite score.
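The composite-score comparison in the embodiments above might be sketched as follows. The text does not specify how the composite score is calculated, so the weighted sum of z-scores used here is purely an assumption, as are the names MarkerReference, composite_score and classify, the cutoff, and every numeric value. The numbers are placeholders chosen to exercise the code and are not measured expression data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerReference:
    mean: float    # reference-population mean expression (placeholder)
    sd: float      # reference-population standard deviation (placeholder)
    weight: float  # signed contribution to the score (placeholder)


def composite_score(expression, references):
    """Weighted sum of z-scores over the panel markers measured in the sample."""
    shared = [gene for gene in references if gene in expression]
    if not shared:
        raise ValueError("no panel markers were measured in this sample")
    return sum(
        references[g].weight * (expression[g] - references[g].mean) / references[g].sd
        for g in shared
    )


def classify(score, cutoff):
    """Compare the composite score with a predetermined cutoff (hypothetical)."""
    return "cancer or predisposition" if score >= cutoff else "no finding"


# Hypothetical reference values for three of the named markers.
REFERENCES = {
    "KRT8": MarkerReference(mean=1.0, sd=0.5, weight=1.0),
    "GPX2": MarkerReference(mean=2.0, sd=1.0, weight=1.0),
    "BUB1B": MarkerReference(mean=3.0, sd=1.0, weight=-1.0),
}
sample = {"KRT8": 2.0, "GPX2": 3.0, "BUB1B": 2.0}
score = composite_score(sample, REFERENCES)
label = classify(score, cutoff=2.0)
```

A signed weight lets a marker that is under-expressed in affected subjects raise the score; which markers would take which sign is not stated in the text and is arbitrary here.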
[0036] Unless otherwise stated, the cancer may be any cancer. The cancer may be esophageal adenocarcinoma or squamous cell cancer of the esophagus.
[0037] Unless otherwise stated, in each of the embodiments of the present invention the subject may be a mammal. In some embodiments the subject is a human.
[0038] Unless otherwise stated, in each of the embodiments of the present invention the gene expression pattern may be determined by any method known in the art. Either protein or mRNA expression can be analyzed. Any biochemical technique for assaying particular proteins or mRNA species can be used. Gene expression patterns may be determined using a polynucleotide microarray.
[0039] Unless otherwise stated, the gene expression pattern may be compared and/or analyzed by shrunken nearest centroid predictors (SNCP). Permutation analysis may be used in addition to SNCP analysis.
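One common form of permutation analysis is sketched below as an illustration. The text does not say which permutation scheme is intended, so the choice of a label-shuffling test on the difference in mean expression of a single gene, the function name permutation_p_value, and the example values are all assumptions.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # tolerance for floating-point rounding
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Invented expression values for one gene in two groups of samples.
cancer_group = [5.0, 5.2, 4.9, 5.1, 5.3]
control_group = [1.0, 1.2, 0.9, 1.1, 0.8]
p_value = permutation_p_value(cancer_group, control_group)
```

A small p-value suggests that the observed difference is unlikely to arise from chance label assignment, which complements the thresholding performed by SNCP analysis.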
[0040] Unless otherwise stated, in each of the embodiments of the present invention the biological sample may be any biological sample from which polynucleotides may be obtained, such as mucosa. In some embodiments, the biological sample has a morphologically-normal appearance. In some embodiments the biological sample is esophageal epithelium, or squamous esophageal epithelium. In additional embodiments the biological sample is morphologically-normal appearing esophageal epithelium, or morphologically-normal appearing squamous esophageal epithelium.
[0041] Unless otherwise stated, in each of the embodiments of the present invention directed to methods for detecting differential gene expression, the method may further comprise predicting an increased risk for developing cancer in the subject. In one embodiment, the increased risk for developing cancer is an increased risk for developing esophageal adenocarcinoma.
[0042] Unless otherwise stated, in each of the embodiments of the present invention the one or more genes used in the determination of an expression pattern may be any of those identified by shrunken nearest centroid predictors. In some embodiments the one or more genes used in the determination of an expression pattern are selected from those genes set forth in Table 1, or selected from those genes set forth in Table 2.
BRIEF DESCRIPTION OF THE DRAWINGS
[0043] FIG. 1 shows the predicted diagnoses in comparison of NE from EAC patients vs. patients with either BA only or no BA. Patients with EAC, to the left of and on the vertical line, were diagnosed correctly in every case, as were all control patients without any lesion (to the right of the vertical line). Top left and bottom right, likelihood of being an EAC patient; Bottom left and top right, likelihood of being a non-cancer patient.
[0044] FIG. 2 shows the predicted diagnoses in comparison of NE from EAC patients vs. control subjects (patients with neither BA nor EAC). Patients with EAC, to the left of the vertical line, were diagnosed correctly in every case, as were all control subjects without any lesion (to the right of the vertical line). Bottom left and top right, likelihood of being a control subject; top left and bottom right, likelihood of being an EAC patient.
[0045] FIG. 3 shows over-expressed genes designated by rightward-extending bars; those that are under-expressed protrude to the left. Left centroid, NE specimens from subjects with EAC; right centroid, NE from patients without EAC. This plot demonstrates that genes under-expressed in non-cancer patients are over-expressed in EAC patients, and vice versa. SNCP threshold=2.7.
[0046] The sequence listing includes the National Center for Biotechnology Information--Entrez Nucleotide database sequences for each of the genes set forth in Table 3.
DETAILED DESCRIPTION
[0047] The present invention is directed to methods for diagnosing esophageal cancer or predisposition to esophageal cancer in a subject based on gene expression patterns. Interestingly, the gene expression patterns of cancer patients and pre-cancer patients differ from normal even in the esophageal epithelial cells which appear morphologically normal. Thus selection of particular locations for biopsy is not necessary. Even if a lesion is not detected visually with an endoscope, an abnormality or a predisposition can be detected.
[0048] Much of the analysis of the present methods can be automated and calculated by computer. Identifying presence or predisposition to esophageal cancer can be accomplished, for example, by recording a result in a patient's chart or on a computer printout, or by delivering the result via telephone or email, whether by machine or human.
[0049] Diagnosing Cancer--Different Subjects
[0050] In particular, the present invention is directed to methods for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a biological sample from a subject that does not have cancer, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.
[0051] In these embodiments, the skilled artisan will understand that the methods can be used to diagnose cancer by comparing the gene expression pattern of a biological sample from a subject, such as a patient suspected of having a cancer, with the gene expression pattern from a second subject previously screened and determined not to have the particular cancer for which the first subject is being screened. Where there is a difference in the two gene expression patterns, a diagnosis of cancer may be made. The comparison may be conducted using shrunken nearest centroid predictors analysis.
[0052] While the present invention includes methods of diagnosis where the gene expression patterns are determined from biological samples from the same source material from two different subjects, the present invention includes methods of diagnosis where the biological samples may be from different source materials from the different subjects.
[0053] The methods may employ additional steps to confirm the diagnosis of cancer or predisposition, where such steps are any of those known by the skilled artisan to allow a diagnosis of cancer or confirmation of a diagnosis of cancer, including morphological and histological examinations, and screening for a particular cancer marker associated with the cancer for which the subject is being screened.
[0054] The present invention is thus directed to a method for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a biological sample from a subject that does not have cancer using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.
[0055] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of esophageal epithelium from a subject that does not have cancer, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.
[0056] The present invention is further directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of esophageal epithelium from a subject that does not have cancer using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.
[0057] Comparisons to other subjects can also be done by ascertaining expression values in populations of relevant individuals and determining the range of values of expression that occur in those populations. Thus, after such data has been collected and validated, absolute values can be determined in subjects and the absolute values can be compared to the data collected for relevant populations.
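The population comparison described above can be illustrated with a minimal sketch. It assumes that the "range of values" is simply the minimum and maximum observed in each population; the population labels and expression values are hypothetical and are not data from this application.

```python
from typing import Dict, List, Sequence, Tuple


def population_range(values: Sequence[float]) -> Tuple[float, float]:
    """Return the (minimum, maximum) expression value seen in a population."""
    if not values:
        raise ValueError("a population needs at least one value")
    return min(values), max(values)


def matching_populations(value: float,
                         populations: Dict[str, Sequence[float]]) -> List[str]:
    """List the populations whose observed range contains the subject's value."""
    matches = []
    for label, values in populations.items():
        low, high = population_range(values)
        if low <= value <= high:
            matches.append(label)
    return matches


# Hypothetical expression values for one gene in two reference populations.
reference = {
    "no_cancer": [1.0, 1.5, 2.0],
    "cancer": [3.0, 3.5, 4.2],
}
print(matching_populations(3.2, reference))
```

A validated implementation would likely use a statistically derived interval rather than the raw minimum and maximum, which are sensitive to outliers.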
[0058] Diagnosing Cancer--Same Subject
[0059] In a variation on the methods of the present invention discussed above, the present invention is also directed to methods for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a first biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second biological sample from the subject, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.
[0060] In these embodiments, the skilled artisan will understand that the methods can be used to diagnose cancer by comparing the gene expression pattern of two different biological samples obtained from the subject. For example, and as discussed further herein, cancer may be diagnosed in a subject that exhibits no symptoms of disease by comparing gene expression patterns in biological samples obtained from different regions of the same tissue or obtained from different regions of the body, or by comparing gene expression patterns in biological samples obtained from different source material from the same subject. While the two samples may be selected from regions of the same tissue that have no gross morphological differences, the two samples may also be selected from regions of the same tissue that exhibit some morphological differences. For example, morphological differences may include swelling, differences in color, such as redness, differences in surface architecture, differences in mucosal layers, and differences in moisture content.
[0061] The comparison may be conducted using shrunken nearest centroid predictors analysis. Where there is a difference in the two gene expression patterns, a diagnosis of cancer may be made. The method may include additional steps to confirm the diagnosis of cancer, where such steps are any of those known by the skilled artisan to allow a diagnosis of cancer or confirmation of a diagnosis of cancer, including morphological and histological examinations, and screening for a particular cancer marker associated with the cancer for which the subject is being screened.
[0062] Accordingly, the present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a first biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second biological sample from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.
[0063] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.
[0064] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.
[0065] The present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a first biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second biological sample from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer, and wherein the locations from which the first and second biological samples are obtained are separated by a distance of at least 3 cm in said subject.
[0066] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the locations from which the first and second biological samples are obtained from the esophageal epithelium are separated by a distance of at least 3 cm in said subject.
[0067] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the locations from which the first and second biological samples are obtained from the esophageal epithelium are separated by a distance of at least 3 cm in said subject.
[0068] The present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining a gene expression pattern of a first biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second biological sample from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer, and wherein the locations from which the first and second biological samples are obtained have a grossly different appearance.
[0069] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the locations from which the first and second biological samples are obtained from the esophageal epithelium have a grossly different appearance.
[0070] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining a gene expression pattern of a first sample of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a gene expression pattern of a second sample of esophageal epithelium from the subject using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the locations from which the first and second biological samples are obtained from the esophageal epithelium have a grossly different appearance.
[0071] Detecting Differential Gene Expression
[0072] The present invention also includes methods for detecting differential gene expression in a subject comprising (a) determining a gene expression pattern of a biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a pre-determined gene expression pattern, wherein when the gene expression pattern of (a) is different from the pre-determined gene expression pattern, differential gene expression is detected.
[0073] These methods can be used to screen for differences in gene expression patterns between individuals, differences in gene expression patterns between different biological samples obtained from the same subject, or differences in gene expression patterns over time found in biological samples obtained from the same source material within a subject. These methods can be used to identify one specific gene, or more than one gene. The comparison may be conducted using shrunken nearest centroid predictors analysis. Where there is a difference in the two gene expression patterns, a diagnosis of cancer may be made. The method may include additional steps to confirm the diagnosis of cancer, where such steps are any of those known by the skilled artisan to allow a diagnosis of cancer or confirmation of a diagnosis of cancer, including morphological and histological examinations, and screening for a particular cancer marker associated with the cancer for which the subject is being screened.
[0074] Where there is a difference in the two gene expression patterns, the methods may include additional steps to confirm that the gene or genes identified can be used as cancer markers. Further steps may also be included to identify the gene or genes found using these methods.
[0075] Similarly, where there is a difference in the two gene expression patterns, a prediction can be made that the subject will develop cancer or that the subject has an increased risk for developing cancer. As used herein, an increased risk for developing cancer means that the subject has a risk for developing a particular cancer that is greater than the risk for developing that particular cancer in the population as a whole. As used herein, the population as a whole may mean individuals sharing the same sex, age range, physical health, medical condition, or geographic location. For example, the population as a whole may mean adult humans residing in the United States.
[0076] Accordingly, the present invention is directed to a method for detecting differential gene expression in a subject comprising (a) determining a gene expression pattern of a biological sample from a subject, and (b) comparing the gene expression pattern of (a) with a pre-determined gene expression pattern using shrunken nearest centroid predictors, wherein when the gene expression pattern of (a) is different from the pre-determined gene expression pattern, differential gene expression is detected.
[0077] The present invention is also directed to a method for detecting differential gene expression in a subject comprising (a) determining a gene expression pattern of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a pre-determined gene expression pattern of esophageal epithelium, wherein when the gene expression pattern of (a) is different from the pre-determined gene expression pattern, differential gene expression is detected.
[0078] The present invention is also directed to a method for detecting differential gene expression in a subject comprising (a) determining a gene expression pattern of esophageal epithelium from a subject, and (b) comparing the gene expression pattern of (a) with a pre-determined gene expression pattern of esophageal epithelium using shrunken nearest centroid predictors, wherein when the gene expression pattern of (a) is different from the pre-determined gene expression pattern, differential gene expression is detected.
[0079] Diagnosing Cancer Using Markers
[0080] In a further variation, the present invention is directed to methods for diagnosing cancer in a subject comprising (a) determining an expression pattern of one or more genes in a biological sample from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in a biological sample from a subject that does not have cancer, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.
[0081] As discussed herein, differential gene expression can be used to diagnose cancer in a subject by comparing the expression level of one or more genes in a subject, such as a patient suspected of having cancer, with the expression level of one or more genes from a subject that is known not to have cancer. One or more specific genes may be used that have previously been shown to be correlated with a specific cancer.
[0082] These methods can be used to screen for differences in gene expression patterns between individuals, differences in gene expression patterns between different biological samples obtained from the same subject, or differences in gene expression patterns over time found in biological samples obtained from the same source material within a subject.
[0083] The comparison may be conducted using shrunken nearest centroid predictors analysis. Where there is a difference in the two gene expression patterns, a diagnosis of cancer may be made. The method may include additional steps to confirm the diagnosis of cancer, where such steps are any of those known by the skilled artisan to allow a diagnosis of cancer or confirmation of a diagnosis of cancer, including morphological and histological examinations, and screening for a particular cancer marker associated with the cancer for which the subject is being screened.
[0084] Similarly, where there is a difference in the two gene expression patterns, a prediction can be made that the subject will develop cancer or that the subject has an increased risk for developing cancer. As used herein, an increased risk for developing cancer means that the subject has a risk for developing a particular cancer that is greater than the risk for developing that particular cancer in the population as a whole. As used herein, the population as a whole may mean individuals sharing the same sex, age range, physical health, medical condition, or geographic location. For example, the population as a whole may mean adult humans residing in the United States.
[0085] The one or more genes used in the determination of an expression pattern may be any of those set forth in Table 1, or the subset shown in Table 2. The National Center for Biotechnology Information--Entrez Nucleotide sequences for each of the genes set forth in Table 3 are included in the Sequence Listing. Other genes in Tables 1 and 2 can be used with the sequence data that are present in the NCBI database, which are expressly incorporated herein with the sequences present as of the priority date of this application.
[0086] Accordingly, the present invention is directed to a method for diagnosing cancer in a subject comprising (a) determining an expression pattern of one or more genes in a biological sample from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in a biological sample from a subject that does not have cancer using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having cancer.
[0087] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.
[0088] The present invention is directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma.
[0089] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the one or more genes are genes selected from Table 1.
[0090] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the one or more genes are genes selected from Table 1.
[0091] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the one or more genes are genes selected from Table 2.
[0092] The present invention is also directed to a method for diagnosing esophageal adenocarcinoma in a subject comprising (a) determining an expression pattern of one or more genes in esophageal epithelium from a subject, and (b) comparing the expression pattern of (a) with an expression pattern of the one or more genes in esophageal epithelium from a subject that does not have esophageal adenocarcinoma using shrunken nearest centroid predictors, wherein when differential gene expression is detected, the subject is diagnosed as having esophageal adenocarcinoma, and wherein the one or more genes are genes selected from Table 2.
[0093] In an embodiment, an expression pattern of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 or all 23 of the genes of Table 1 is determined. In a further embodiment, the expression pattern of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 or all 17 of the genes of Table 2 is determined. The number to be determined is that number which gives sufficient sensitivity and specificity. The number to be determined is that number which gives an acceptable number of false positives and acceptable number of false negatives.
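As a hedged illustration of how sensitivity and specificity might be tallied when deciding how many genes to include, the following sketch computes both from known and predicted diagnoses. The function name and the example diagnoses are hypothetical; True denotes cancer and False denotes no cancer.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(actual: Sequence[bool],
                            predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for paired known/predicted diagnoses."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)
    true_neg = sum(1 for a, p in pairs if not a and not p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one cancer and one non-cancer case")
    sensitivity = true_pos / (true_pos + false_neg)  # fraction of cancers detected
    specificity = true_neg / (true_neg + false_pos)  # fraction of non-cancers cleared
    return sensitivity, specificity


# Hypothetical outcome for four specimens: one cancer case is missed.
known = [True, True, False, False]
called = [True, False, False, False]
print(sensitivity_specificity(known, called))
```

Repeating such a tally for gene panels of different sizes is one way to see where adding genes stops reducing false positives and false negatives.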
[0094] Diagnoses and prognoses determined using the subject methods can be confirmed and combined with other means of assessment. These include physical findings, radiological findings, pH determinations, endoscopic determinations, pathological determinations, patient reports of symptoms, and the like.
[0095] In relevant embodiments of the present invention the skilled artisan will understand the identity of the particular cancer being diagnosed need not be limited, and may include adenocarcinoma, squamous cell cancer, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and cancers of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus. In some embodiments the cancer is esophageal cancer, adenocarcinoma or squamous cell cancer.
[0096] Similarly, the identity of the subject to which the methods of the present invention are applied is not limited. However, in some embodiments, the subject is a bird or a mammal. For example, the subject may be a dog, cat, horse, simian or human.
[0097] The gene expression pattern may be determined by any method known in the art, although the pattern is typically determined using a polynucleotide microarray as described herein. In some embodiments, the gene expression patterns are analyzed and/or compared by shrunken nearest centroid predictors (SNCP). Detailed means for such analysis and/or comparison using shrunken nearest centroid predictors is provided herein, and is based on an adaptation of classical nearest centroids prediction analysis, tailored specifically to microarray data (Tibshirani et al., Proc Natl Acad Sci USA 99(10):6567-72 (2002)). Permutation analysis, as also described herein, may be used in conjunction with the SNCP analysis.
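The shrinkage step that gives the cited method its name is soft-thresholding: each gene's standardized difference between a class centroid and the overall centroid is moved toward zero by a threshold, and genes whose differences reach zero in every class drop out of the predictor. The sketch below illustrates only that step. It assumes the standardized differences have already been computed, and the gene labels and values are hypothetical; the threshold of 2.7 is the one quoted in the description of FIG. 3.

```python
from typing import Dict, List


def soft_threshold(d: float, delta: float) -> float:
    """Shrink d toward zero by delta; values within delta of zero become 0."""
    magnitude = abs(d) - delta
    if magnitude <= 0:
        return 0.0
    return magnitude if d > 0 else -magnitude


def surviving_genes(differences: Dict[str, List[float]],
                    delta: float) -> Dict[str, List[float]]:
    """Keep genes with a non-zero shrunken difference in at least one class."""
    shrunken = {
        gene: [soft_threshold(d, delta) for d in per_class]
        for gene, per_class in differences.items()
    }
    return {gene: ds for gene, ds in shrunken.items() if any(x != 0.0 for x in ds)}


# Hypothetical standardized differences, one entry per class (cancer, normal).
standardized = {
    "gene_a": [3.5, -3.5],
    "gene_b": [1.0, -1.0],
}
print(list(surviving_genes(standardized, delta=2.7)))
```

Raising the threshold removes more genes, which is how a small panel can be selected from a full microarray.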
[0098] For each new specimen, we calculate the square distance to the normal centroid and the square distance to the cancer centroid. The centroid (normal or cancer) to which the specimen is closest, in squared distance, defines the predicted class for that new sample.
[0099] As an example, if we have a new specimen, we would determine the level of the 11 genes in the centroid. Let these levels be G1, G2, . . . G11. We already have the centroid values for each of the 11 genes for the normal class and for the cancer class. Let these be CN1, CN2, . . . CN11 and CC1, CC2, . . . , CC11. The score of the new specimen for the normal class would be calculated as:
Score for normal SN = (G1-CN1)^2 + (G2-CN2)^2 + . . . + (G11-CN11)^2
and, the score of the new specimen for the cancer class would be calculated as:
Score for cancer SC = (G1-CC1)^2 + (G2-CC2)^2 + . . . + (G11-CC11)^2.
If SN>SC, then the specimen is classified as cancer. If SN<SC then the specimen is classified as normal.
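The scoring rule above can be sketched in a few lines of code. This is a minimal illustration and not a definitive implementation: the function names and the three-gene example values are hypothetical, and the handling of a tie (SN equal to SC) is an assumption, since the text does not address it.

```python
from typing import Sequence


def squared_distance(levels: Sequence[float], centroid: Sequence[float]) -> float:
    """Sum of squared differences between specimen levels and a class centroid."""
    if len(levels) != len(centroid):
        raise ValueError("specimen and centroid must cover the same genes")
    return sum((g - c) ** 2 for g, c in zip(levels, centroid))


def classify(levels: Sequence[float],
             normal_centroid: Sequence[float],
             cancer_centroid: Sequence[float]) -> str:
    """Assign the specimen to the class whose centroid is closest."""
    sn = squared_distance(levels, normal_centroid)  # score for normal, SN
    sc = squared_distance(levels, cancer_centroid)  # score for cancer, SC
    if sn > sc:
        return "cancer"
    if sn < sc:
        return "normal"
    return "undetermined"  # assumption: the text does not define the SN == SC case


# Hypothetical three-gene example; the panel described above uses 11 genes.
specimen = [1.0, 2.0, 3.0]
normal = [0.0, 0.0, 0.0]
cancer = [1.0, 2.0, 3.0]
print(classify(specimen, normal, cancer))
```

Here SN is 14.0 and SC is 0.0, so the specimen lies closer to the cancer centroid.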
[0100] The biological sample tested in the methods of the present invention may be any biological sample from which polynucleotides or proteins may be obtained. In particular, the biological sample is one that contains cells or cellular material, proteins or polynucleotides. For example, the biological sample may be a biological fluid, such as lymph, serum, plasma, whole blood, urine, synovial fluid and spinal fluid; a cell type, such as bone marrow, immune, keratinocytes, epithelial cells, hepatocytes, renal cells, breast tissue cells, bladder cells, prostate cells, pancreatic cells; a tissue, such as skin, muscle, liver, kidney, pancreas, heart, lung, breast, male or female reproductive organs, lymphatic system, nervous system, digestive system, bladder, colon, connective tissue, where the tissue may be normal, cancerous or wounded tissue; or biopsies. In some embodiments, the biological sample is mucosa, esophageal epithelium, or squamous esophageal epithelium. In one embodiment, the biological sample is tissue diagnosed as Barrett's esophagus. Samples can be collected by endoscopy, or other collecting means, including surgical spatulas, sponges, balloons, and esophageal brush-capsules. See Cancer Cytopathol. 2000; 90:10-6; see also Cancer. 1997; 80(11):2047-59. Some of these may be used in conjunction with endoscopy and some may be used independently.
[0101] As indicated further throughout this application, one of the advantages of the methods disclosed herein pertains to the ability to analyze morphologically-normal tissue, and to thereby diagnose cancer early based on the gene pattern. This is important for at least two reasons: (1) the assumption that disease progression is minimal in morphologically-normal tissue and treatment can thus begin prior to major damage to biological tissues and systems; and (2) diagnosticians do not need to first locate morphologically-abnormal tissue and analyze such tissue for gene expression. With regard to esophageal adenocarcinoma and squamous cell cancer, cancerous lesions can be especially difficult to find. The methods herein thus allow for an accurate analysis and diagnosis based on pre-cancerous tissue or tissue that may be some distance from cancerous tissue.
[0102] Therefore, in some embodiments, the biological sample has a morphologically-normal appearance. In some embodiments, the biological sample is morphologically-normal appearing esophageal epithelium, or morphologically-normal appearing squamous esophageal epithelium.
[0103] The methods of the present invention may be practiced using polynucleotides or proteins obtained directly from biological samples, or using polynucleotides or proteins produced from or amplified from polynucleotides obtained directly from biological samples. For example, cDNA may be isolated from a biological sample, and then PCR conducted to amplify the cDNA to obtain a sufficient amount of a polynucleotide for use in the methods. mRNA may also be amplified as detailed below. Protein can be made using in vitro transcription/translation systems, which are well known in the art.
[0104] While the location from which the biological samples are taken is not critical, they should be at a sufficient distance apart to be separate samples. When there is a morphological difference, the samples can be taken from those regions of the source material, such as a tissue, that have different morphological appearances. In general, the leading edges of these samples should be at a distance of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25 or 30 cm apart. When there is no morphological difference, the leading edges of the samples should be at a distance apart of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25 or 30 cm, such as when taken from the same tissue. The sample may also be collected from different regions of the body, such as where the sample is a bodily fluid. In an embodiment, the locations from which the first and second samples are obtained from a biological tissue are separated by a distance of at least 3 cm.
[0105] The gene expression patterns that are determined and compared in the methods of the present invention can be quantitative and/or qualitative patterns. For example, differential gene expression patterns can be based on the level at which the one or more genes are being expressed, and/or based on whether the one or more genes are being expressed at all. When the level of gene expression is determined and compared, a statistically significant difference may be used to demonstrate a difference in gene expression.
[0106] Solid supports according to the invention can be any substrate to which antibodies or oligonucleotide probes can be attached, either directly or indirectly through linker groups. Typically each species of antibody or species of oligonucleotide probe is located at a discrete geographic location on the substrate. Alternatively, each species of antibody or probe can be otherwise distinguishable, for example, based on a detectable label or other physical property. In one embodiment each species can be bound to a bead and the beads can be separated on the basis of size, magnetic characteristic, fluorescence spectrum, etc. Beads can also be used in discrete geographical locations, such as in wells of a microtiter plate. Solid supports are typically inert materials, such as glass, plastic, polymer, etc. They may be fabricated into sheets, strips, plates, multi-well plates, beads, fabrics, etc. Solid supports typically have probes/antibodies for only a small subset of the entire genome or proteome. Aside from controls and standards, only probes or antibodies for genes which are found to be relevant to esophageal disease need be present. Thus the probes or antibodies for assessing expression of the genes in Table 3 may comprise at least 5%, at least 10%, at least 25%, at least 50%, at least 75%, or at least 90% of the probes or antibodies on the solid support. Expression of any number of relevant genes can be tested, including 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11 of the genes listed in Table 3.
EXAMPLES
[0107] The following embodiments are merely exemplary and are not intended to be limiting.
[0108] Patients and Tissues
[0109] Six patients with BA alone, nine with BA and concomitant EAC, and eight with neither BA nor EAC were included in this study. The eight patients without BA or EAC had had endoscopy for unrelated indications, such as peptic ulcer disease, but had undergone endoscopic biopsy of the gastroesophageal junction that was histologically normal. In all cases, biopsies from grossly normal-appearing squamous esophageal epithelium at least 7 cm proximal to the upper limit of the Barrett's mucosa were included in the study. None of the patients with BA alone had concomitant dysplasia. Fresh NE (normal esophagus) biopsy specimens were immediately frozen and stored in liquid nitrogen until further use. Matching morphologic controls were obtained from the same sites as the research specimens and were examined by hematoxylin and eosin staining by an expert gastrointestinal pathologist at the University of Maryland. Informed consent was obtained from all patients under an institutionally approved research protocol.
[0110] Location of the Normal Squamous Esophageal Biopsies
[0111] The normal squamous esophagus (NE) areas biopsied were grossly normal, without any endoscopic evidence of esophagitis or reflux changes. In patients with obvious mass lesions in their esophagus, biopsies were obtained at least 7 cm proximal to these lesions. Similarly, biopsies from BA patients were performed at least 7 cm away from the area that showed endoscopic evidence of Barrett's esophagus. In patients lacking BA or EAC, biopsies were performed from areas that did not show any gross endoscopic abnormalities. Finally, all NE specimens were analyzed histologically, and there was no evidence of any metaplasia or other changes found in any of these NE samples.
[0112] cDNA Microarray Production and Hybridization
[0113] Glass slide coating, cDNA clone preparation and verification, microarray printing, post-printing slide processing, RNA extraction, RNA amplification, labeling and hybridization were performed according to detailed protocols described previously (Selaru et al., Oncogene 21(3):475-8 (2002); Zou et al., Oncogene 21(31):4855-62 (2002); Xu et al., Cancer Res 62(12):3493-7 (2002)).
[0114] RNA Extraction, Amplification, and Labeling of the aRNA Probe
[0115] Total RNA (3-20 μg) was extracted from freshly frozen tissue using an RNeasy kit (Qiagen, Valencia, Calif.) and amplified using the AmpliScribe T7-Flash transcription kit (Epicentre, Madison, Wis.). Labeling was performed on 6 μg of aRNA by incorporating Cy3- or Cy5-labeled dCTP using random primers and Superscript reverse transcriptase (Xu et al., Cancer Res 62(12):3493-7 (2002)). The resulting probes were purified with a Microcon microcentrifuge filter device and recovered in a volume of 25 μl. The reference probe was prepared from an equimolar mixture containing aRNAs from eight human malignant cell lines, as described previously. Microarray preparation was performed as described (Selaru et al., Oncogene 21(3):475-8 (2002); Mori et al., Cancer Res 63(15):4577-82 (2003); Xu et al., Cancer Res 62(12):3493-7 (2002)).
[0116] Microarray Normalization
[0117] An algorithm for normalizing microarray data was adapted that improves its accuracy and dynamic range (Yang et al., Nucleic Acids Res 30(4):e15 (2002)). Both within-slide and inter-slide normalization were accomplished. In this fashion, local distortions in signal and background intensity within different regions of a slide, as well as overall differences in hybridization or labeling efficiencies between slides, were overcome. It was determined that the within-slide normalization performed optimally when 4 blocks were used as the normalization unit (each block being produced by a different microarray pin). It was assumed that each group of 4 blocks was equivalent in average signal intensity and range to the next group of 4 blocks on the array. Thus, 8 normalization units per slide were utilized. This assumption was based on an optimization strategy in which groups of 1, 2, 4, 8, and 16 blocks were tested as the normalization unit, which showed that the 4-block unit performed with the least inaccuracy when a random number generator was used to produce the 8,064 values on a microarray slide (data not shown). Thus, this normalization method (Yang et al., Nucleic Acids Res 30(4):e15 (2002)) consisted of three steps: intensity-dependent normalization within each slide, scale normalization within each slide, and inter-slide normalization.
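By way of non-limiting illustration, the scale-normalization step may be sketched as follows. This is a minimal sketch only, assuming that log-ratios are grouped by normalization unit, that median centering stands in for the intensity-dependent step, and that spread is measured by the median absolute deviation (MAD) in the spirit of Yang et al.; the function names and the toy values are hypothetical and are not taken from the study.

```python
import math
from statistics import median

def mad(values):
    """Median absolute deviation of a list of numbers."""
    center = median(values)
    return median(abs(v - center) for v in values)

def scale_normalize(units):
    """Center each unit at its median, then rescale all units to a common spread.

    units: list of lists of log-ratios, one inner list per normalization unit.
    Assumes every unit has a nonzero MAD.
    """
    centered = [[v - median(u) for v in u] for u in units]
    mads = [mad(u) for u in centered]
    # Geometric mean of the unit MADs serves as the common reference scale.
    ref = math.exp(sum(math.log(m) for m in mads) / len(mads))
    return [[v * ref / m for v in u] for u, m in zip(centered, mads)]

# Two hypothetical units with different location and spread.
toy_units = [[-1.0, 0.0, 1.0, 2.0, 3.0], [10.0, 12.0, 14.0, 16.0, 18.0]]
normalized = scale_normalize(toy_units)
```

After this step the two toy units share the same median (zero) and the same MAD, so differences in overall signal level or range between units no longer dominate gene-level comparisons.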
[0118] Shrunken Nearest Centroid Predictor (SNCP) Model
[0119] The Shrunken nearest centroid predictor (SNCP) model was analyzed to determine if it could be used to identify gene expression patterns or individual genes as biomarkers to distinguish between the normal esophagus of patients with, vs. without, accompanying EAC. SNCPs discovered both broad patterns and individual genes that were highly accurate in their ability to identify whether or not a patient had accompanying remotely located cancer.
[0120] The SNCP method is an adaptation of classical nearest centroids prediction analysis, tailored specifically to microarray data (Tibshirani et al., Proc Natl Acad Sci USA 99(10):6567-72 (2002)). Each centroid comprises weighted averages of genes (elements) on the microarray for a particular diagnostic category, or "class." Thus, the centroids each contain 8,064 elements, since there are 8,064 genes on each microarray. Gene weighting is directly proportional to the raw average expression value, but inversely proportional to the standard deviation (i.e., the variability) of expression value within a given class. Centroids are then shrunken by adjusting the threshold value, which removes genes with lower weighted averages (thus yielding a smaller set of relevant genes). Gene expression variations below a certain threshold value are made equal to zero and ignored. Thus, shrinkage consists of moving each centroid component towards zero by the threshold amount, and setting it equal to zero when it reaches or crosses zero (Tibshirani et al., Proc Natl Acad Sci USA 99(10):6567-72 (2002)).
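By way of non-limiting illustration, the shrinkage operation described above corresponds to soft thresholding and may be sketched as follows. The gene names and score values are hypothetical; the threshold of 2.7 is used only because it is the value reported with Table 3.

```python
def soft_threshold(d, delta):
    """Move a score toward zero by delta; return 0.0 if it would reach or cross zero."""
    magnitude = abs(d) - delta
    if magnitude <= 0:
        return 0.0
    return magnitude if d > 0 else -magnitude

def shrink_scores(scores, delta):
    """Soft-threshold every gene's score and keep only the genes that survive."""
    shrunken = {gene: soft_threshold(d, delta) for gene, d in scores.items()}
    return {gene: d for gene, d in shrunken.items() if d != 0.0}

# Hypothetical per-gene scores for one class before shrinkage.
toy_scores = {"geneA": 3.4, "geneB": -2.9, "geneC": 1.1}
surviving = shrink_scores(toy_scores, delta=2.7)
```

In this toy case geneC falls below the threshold and is removed from the centroid, while geneA and geneB are retained with reduced magnitude and unchanged sign.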
[0121] The choice of Δ (amount of shrinkage) is dependent upon two variables: 1. prediction error minimization; 2. the number of genes that are left in the model. More specifically, when all the genes on the microarrays are used, the prediction error is significant. During the process of data fitting, the SNCP model excludes outliers, i.e., genes that are not usable for the prediction. It is, however, possible to achieve the minimum prediction error for a range of Δs. In this particular case, the model can predict the predefined categories using a variable number of genes. Under these conditions, the smaller the Δ, the higher the number of genes left in the model, and vice versa.
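By way of non-limiting illustration, the trade-off between prediction error and gene count may be sketched as a scan over candidate values of Δ. The tie-breaking rule shown here (prefer the largest Δ, and hence the fewest genes, among equally accurate thresholds) is an assumption made for the sketch and is not stated in the study; all numbers are hypothetical.

```python
def choose_delta(results):
    """Pick a threshold from a scan.

    results: list of (delta, cv_error, n_genes) tuples.
    """
    best_error = min(err for _, err, _ in results)
    candidates = [r for r in results if r[1] == best_error]
    # Assumption: among equally accurate thresholds, prefer the largest delta.
    return max(candidates, key=lambda r: r[0])

# Hypothetical scan: a larger delta leaves fewer genes in the model.
toy_scan = [(0.0, 0.25, 1000), (1.0, 0.0, 200), (2.0, 0.0, 50), (3.0, 0.125, 5)]
chosen = choose_delta(toy_scan)
```

Here two thresholds reach the minimum error, and the sketch selects the one that leaves the smaller gene set.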
[0122] Internal validation of results was performed using cross-validation. The value of K (fold cross-validation number) is set by default at 10; therefore, a 10-fold cross-validation was performed. In this 10-fold cross-validation, the specimens were randomly assigned to 10 groups. Nine of the ten groups were used for training, while the prediction was made on the 10th group. This procedure was repeated 10 times. For example, in the Normal-Normal versus Normal-Cancer comparison, training is done on 16 specimens, and then the model predicts the 17th specimen.
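By way of non-limiting illustration, the random assignment of specimens to K groups may be sketched as follows. The function names and the fixed random seed are hypothetical choices for the sketch; the classifier that would be trained and evaluated on each split is omitted.

```python
import random

def k_fold_indices(n_specimens, k=10, seed=0):
    """Randomly assign specimen indices to k folds of near-equal size."""
    indices = list(range(n_specimens))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def train_test_splits(n_specimens, k=10, seed=0):
    """Yield (training indices, held-out indices) for each of the k folds."""
    folds = k_fold_indices(n_specimens, k, seed)
    for held_out in folds:
        train = [i for fold in folds if fold is not held_out for i in fold]
        yield train, held_out

# 17 specimens, as in the Normal-Normal versus Normal-Cancer comparison.
for train, held_out in train_test_splits(17, k=10):
    print(len(train), "training specimens;", len(held_out), "held out")
```

Under this particular assignment scheme, 17 specimens and K=10 give folds of one or two specimens, so each training set contains 15 or 16 specimens and every specimen is held out exactly once.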
[0123] Permutation Analysis
[0124] SNCPs are mathematical models that learn by example. In other words, SNCPs identify a centroid for every group in the comparison. New specimens are classified by calculating the distance between the new specimen and each of the centroids. The specimen is classified into the class whose centroid is closest to the specimen. Ideally, the SNCPs should be tested on a test set composed of specimens that were not used during training. This, however, may prove difficult when only a small number of specimens is available for the study. One method to circumvent the need for a test set, while ensuring statistical significance, is permutation analysis.
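By way of non-limiting illustration, classification by nearest centroid may be sketched as follows. This sketch is a simplification: it uses plain Euclidean distance on unshrunken centroids, whereas the SNCP method of Tibshirani et al. uses shrunken centroids and a standardized distance. The two-gene expression vectors are hypothetical.

```python
import math

def centroid(samples):
    """Gene-wise mean of a list of equal-length expression vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def classify(specimen, centroids):
    """Return the label of the centroid closest (Euclidean distance) to the specimen."""
    return min(centroids, key=lambda label: math.dist(specimen, centroids[label]))

# Hypothetical training data: two genes measured in two specimens per class.
training = {
    "N-N": [[0.1, 1.0], [0.3, 0.8]],
    "BA-CA": [[1.0, 0.2], [1.2, 0.0]],
}
centroids = {label: centroid(samples) for label, samples in training.items()}
predicted = classify([0.9, 0.3], centroids)
```

In this toy case the new specimen lies nearer the BA-CA centroid than the N-N centroid and is therefore assigned to the BA-CA class.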
[0125] Permutation analysis is a statistical technique used to estimate the probability of obtaining the observed classification results purely by chance. The analysis consists of randomly permuting the specimen labels and constructing classifiers (SNCPs) to categorize the specimens. In the current study, permutation resulted in randomly assigning specimens to one of two categories: N-N (NE specimens from patients lacking EAC or BA) and BA-CA (NE specimens from patients with Barrett's esophagus and concomitant esophageal adenocarcinoma). The SNCP model with the lowest prediction error was subsequently chosen. This procedure was repeated 100 times. In none of the 100 random permutations were the SNCPs able to learn the two categories correctly (i.e., with an error of 0). The mean group error for the 100 permutations was 0.36. This finding demonstrated that the probability that the SNCP learned the two categories (N-N and BA-CA) correctly by chance alone was less than 1 in 100.
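By way of non-limiting illustration, the permutation procedure may be sketched as follows. The callable error_fn is hypothetical and stands for training a classifier on a given labeling and returning its prediction error. The add-one correction in the returned estimate is an assumption of the sketch and is not described in the study.

```python
import random

def permutation_p_value(labels, error_fn, observed_error, n_permutations=100, seed=0):
    """Estimate how often shuffled labels are learned at least as well as the real ones.

    error_fn: hypothetical callable that trains a classifier on the given labels
              and returns its prediction error.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # permute specimen labels, preserving class sizes
        if error_fn(shuffled) <= observed_error:
            hits += 1
    # Add-one correction (an assumption here) keeps the estimate above zero.
    return (hits + 1) / (n_permutations + 1)
```

If none of 100 permutations matches the observed error, this estimate equals 1/101, which is below 1 in 100.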
Experiment 1
[0126] In the initial application of the methodology described above, NE biopsy specimens of both subjects with completely normal esophagi (Normal-Normal, or N-N) and patients with BA but without EAC (BA-alone) were considered together as a single group, which was compared to NE biopsy specimens of patients with Barrett's esophagus with concomitant EAC (BA-CA). Centroids were 100% accurate in predicting which subject or patient was in which group in this comparison, as shown in FIG. 1. A list containing 195 genes was generated, based on their differential expression between normal esophagi from normal patients and normal esophagi from patients with EAC. Table 3 contains a few of these genes, with already known links to cancer.
Experiment 2
[0127] In an effort to further narrow the number of variables involved in the difference between NE biopsies from patients with concomitant EAC vs. subjects without EAC, NE from subjects without esophageal disease vs. NE from EAC patients only (i.e., excluding noncancer subjects with BA) was also compared. This comparison revealed the accuracy of centroids in distinguishing these two subgroups, as shown in FIG. 2.
[0128] Experiment 3
[0129] The SNCPs also generated visual displays of centroids, showing which genes were over-expressed and which were under-expressed in NE from patients with vs. those without accompanying EAC. The genes in these displays are arrayed in order of decreasing differential expression, with the most differentially expressed genes at the top and the least differentially expressed genes at the bottom. One such typical centroid is displayed in FIG. 3.
[0130] Genes represented in a shrunken centroid derived by comparing NE tissues between cancer and non-cancer patients are shown in Table 3. Among them are many genes with previous links to esophageal cancer or to cancers in general: histone biomarkers, gravin, HLA-DRA, keratin 8 (KRT8), glutathione peroxidase 2 (GPX2), the mitotic checkpoint protein kinase BUB1B, the progestin-induced protein DD5 and transglutaminase 3.
TABLE-US-00003 TABLE 3 Selected genes identified by comparison of NE from patients with EAC (N with T) vs. without EAC (N without T).
Gene ID    Gene Name                                                       N with T    N without T
AB003476   gravin                                                          -0.7322      0.4707
NM_005319  H1 histone family, member 2 (H1F2)                               0.5907     -0.3797
XM_004416  H2A histone family, member L (H2AFL)                             0.5384     -0.3461
NM_003519  H2B histone family, member C (H2BFC)                             0.5062     -0.3254
NM_002273  keratin 8 (KRT8)                                                 0.3834     -0.2465
NM_015902  progestin induced protein (DD5)                                 -0.3112      0.2001
NM_003516  H2A histone family, member O (H2AFO)                             0.2322     -0.1493
XM_009572  transglutaminase 3 (TGM3)                                       -0.2078      0.1336
NM_019111  major histocompatibility complex, class II, DR alpha (HLA-DRA)   0.1695     -0.109
AF107297   mitotic checkpoint protein kinase BUB1B (BUB1B)                 -0.0626      0.0402
NM_002083  glutathione peroxidase 2 (gastrointestinal) (GPX2)               0.0614     -0.0395
Threshold value set at 2.7; N with T: gene score in the group of patients with esophageal adenocarcinoma; N without T: gene score in the group of patients without esophageal adenocarcinoma. Gene identifiers and gene names are shown in the two leftmost columns.
[0131] Previous studies have compared gene expression patterns among normal, metaplastic, and cancerous esophageal epithelia (Selaru et al., Oncogene 21(3):475-8 (2002); Xu et al., Cancer Res 62(12):3493-7 (2002); Guillem et al., Int J Cancer 88(6):856-61 (2000); Lu et al., Int J Cancer 91(3):288-94 (2001)). Moreover, a recent study by Wang et al. suggests that gene expression patterns in Barrett's esophagus are significantly closer to gene expression patterns in esophageal adenocarcinoma than to expression patterns in normal esophagus. This finding alarmingly implies that Barrett's esophagus is biologically closer to cancer than to normal esophagus (Wang et al., Oncogene 25(23):3346-56 (2006)). However, these studies have consisted of direct comparisons of these different types of esophageal epithelia to each other. In the current study, a different approach was undertaken: a comparison of the normal esophageal epithelia from patients at differing stages of esophageal neoplastic progression. This study found unique molecular signatures in normal esophageal epithelium that reflected concomitant neoplasia elsewhere in the esophagus.
[0132] The potential biologic ramifications of these results are far-reaching. The field effect found near esophageal tumors in surrounding normal epithelium has been well-described (Eads et al., Cancer Res 61(8):3410-8 (2001); Eads et al., Cancer Res 60(18):5021-6 (2000)). A recent study by Brabender et al. (Cancer Epidemiol Biomarkers Prev 14(9):2113-7 (2005)) identified a field effect by using a gene expression panel. In the current study, biopsies of normal esophagus were obtained at least 7 cm away from the tumor or Barrett's esophagus. The current findings suggest that esophageal cancer exerts a greater influence on the normal esophageal epithelium than previously known or suspected. While molecular alterations in histologically normal squamous esophageal epithelium have previously been described adjacent to cancers, the current findings suggest that alterations in gene expression and gene expression pattern accompanying cancer can affect large portions of the normal squamous esophagus. It was postulated that the development of esophageal adenocarcinoma is accompanied by widespread molecular phenotypic alterations that involve the entire normal squamous esophageal epithelium.
[0133] The present SNCP-based approach offers a number of advantages over other analytic techniques. These include the ability to differentiate among multiple specimen groups; the potential for rapid translation to the clinical setting; a low likelihood of over-fitting, yielding a low probability of erroneous diagnoses in new, independent datasets; and the capacity to yield a reduced number of diagnostic genes, which can themselves be developed as individual biomarkers as well as the basis for further molecular genetic studies (Tibshirani et al., Proc Natl Acad Sci USA 99(10):6567-72 (2002)).
[0134] In the current study, genes positioned the highest in centroids discriminating normal tissues from non-cancer vs. cancer patients were both interesting and relevant. For example, among the most highly ranked genes were members of the histone families (Table 3).
[0135] As single-gene predictors, histone biomarkers were accurate in distinguishing between accompanying cancer and its absence. Histones are basic nuclear proteins responsible for the nucleosome structure of chromosomal fibers in eukaryotes. Apart from promoter hypermethylation, modification of histone proteins is the second major component of epigenetic transcriptional control. DNA methylation and histone acetylation are integrally linked. Methylation is catalyzed by a family of DNA methyltransferases. DNA methyltransferases recruit histone deacetylases, leading to histone deacetylation and transcriptional repression. Methylated DNA is also recognized by a family of methylated DNA-binding proteins, which recruit histone deacetylases and ATP-dependent chromatin remodeling proteins, resulting in a tightly condensed chromatin structure and gene inactivation. Additional links between the "histone code" and the "cytosine methylation code" are increasingly evident (Johnstone et al., Nat Rev Drug Discov 1(4):287-99 (2002); Kouraklis et al., Curr Med Chem Anti-Canc Agents 2(4):477-84 (2002); Marks et al., Nat Rev Cancer 1(3):194-202 (2001)).
[0136] In addition, alterations of proteins in the histone acetyltransferase family (e.g., CREB-binding protein and p300) are associated with cancers of the breast, colon, liver, and hematopoietic system. Of particular relevance to the current findings, histone H4 is hyperacetylated in early stages of esophageal cancer cell invasion, and thereafter changes to a hypoacetylated state according to the degree of cancer progression (Toh et al., Oncol Rep 10(2):333-8 (2003)). These results suggest that a dynamic equilibrium between histone acetylase and deacetylase activities is disrupted in esophageal carcinogenesis, implying that an interaction may exist between hyperacetylation of histone H4 and histone deacetylase 1 expression (Toh et al., Oncol Rep 10(2):333-8 (2003)).
[0137] Similarly, by applying differential display to esophageal tumor and matched normal esophageal samples, histone H3.3 was identified among 49 cDNA ddPCR clones from esophageal cancers (ECs) (Graber et al., Ann Surg Oncol 3(2):192-7 (1996)). Histone H3.3 was overexpressed in 4/6 ECs, but not in paired normal mucosa. Only 5/13 normal human cell lines from various organs, but 11/12 human cancer cell lines (including 9 of 9 adenocarcinoma lines) overexpressed H3.3 (Graber et al., Ann Surg Oncol 3(2):192-7 (1996)). Histones H3 and H4 were deacetylated in gastric cancer cell lines showing aberrant methylation of CHFR, a mitotic checkpoint gene, suggesting a role for histone deacetylation in methylation-dependent gene silencing (Satoh et al., Cancer Res 63(24):8606-13 (2003)).
[0138] Another gene identified in the current study was HLA-DRA. Major histocompatibility complex (MHC) molecules are of central importance in regulating the immune response against tumors. Loss of expression of HLA class II molecules on tumor cells affects the onset and modulation of the immune response through lack of activation of CD4+ T lymphocytes. In part, loss of expression is caused by mutations as shown for large B-cell lymphoma (Jordanova et al., Immunogenetics 55(4):203-9 (2003)). A recent study found downregulation of HLA-DRA in invasive cancers compared to dysplastic cervical lesions (Chil et al., Acta Obstet Gynecol Scand 82(12):1146-52 (2003)).
[0139] A strong predictive value of keratin 8 (KRT8) was also observed. KRT8 belongs to the intermediate filament family and associates with keratin 18 to form a heterotetramer of two type I and two type II keratins. Its phosphorylation on serine residues is enhanced during EGF stimulation and mitosis. Dysregulation of keratin 8 is associated with esophageal carcinogenesis (Boch et al., Gastroenterology 112(3):760-5 (1997); Glickman et al., Am J Surg Pathol 25(5):569-78 (2001); Glickman et al., Am J Surg Pathol 25(1):87-94 (2001); Salo et al., Ann Med 28(4):305-9 (1996)).
[0140] Additional genes with known relevance to human cancer identified by this SNCP model included glutathione peroxidase 2 (GPX2), the mitotic checkpoint protein kinase BUB1B, and the progestin-induced protein DD5. As expected, BUB1B was expressed at high levels in the normal esophageal tissues of patients without cancer and underexpressed in patients with cancer. BUB1B is a component of the mitotic checkpoint that delays anaphase until all chromosomes are properly attached to the mitotic spindle. In BRCA2-deficient murine cells, BUB1 mutants potentiate growth and cellular transformation (Davenport et al., Genomics 55(1):113-7 (1999)). In addition, mutations in human BUB1B have demonstrated a dominant negative effect by disrupting the mitotic checkpoint when transfected into euploid colon cancer cell lines (Davenport et al., Genomics 55(1):113-7 (1999)). Thus, BUB1B is a candidate tumor suppressor gene in the esophagus whose downregulation in normal esophageal tissue is associated with cancer development.
[0141] Transglutaminase 3, which was under-expressed in the normal tissue of tumor patients in this study, was recently found to be down-regulated in esophageal squamous cell carcinoma and head and neck squamous cell carcinoma by cDNA microarray studies comparing cancer and matching normal tissue (Luo et al., Oncogene 23(6):1291-9 (2004)).
[0142] In conclusion, the current study diagnosed patients with remote esophageal neoplasia based on biopsies of their remote normal epithelium alone, and provided a minimal list of genes necessary to do so. This proof-of-principle study establishes a theoretical basis to identify cancers in other organs by studying gene expression patterns or other molecular signatures in their matching normal epithelia. In addition, by shrinking the number of genes needed to arrive at a correct diagnosis, the current work showcases an approach to identify smaller numbers of genes worthy of further research from microarray data, both as biomarkers and for biologic or functional studies.
[0143] While the foregoing specification teaches the principles of the present invention, with examples provided for the purpose of illustration, it will be appreciated by one skilled in the art from reading this disclosure that various changes in form and detail can be made without departing from the true scope of the invention.
[0144] Each of the publications recited herein, including journal articles, books, manuals, abstracts, posters, patents, and published patent applications, is hereby incorporated herein in its entirety.
Sequence CWU
1
1
2216287DNAHomo sapiens 1ggcagctccg agggcacctc cggttctccc ccatcctccg
ggagtgtctg ggcgctcagt 60ccgctctgat cccgccgaaa ccacctgcgg ttggcaggca
ggagactagg cgtctgccgg 120ggagggcagg gacccgctaa gctgatctcc tgtacagtag
tgctacttaa aatatgctgg 180ggaccatcac catcacagtt ggacagagag actctgaaga
tgtgagcaaa agagactccg 240ataaagagat ggctactaag tcagcggttg ttcacgacat
cacagatgat gggcaggagg 300agacacccga aataatcgaa cagattcctt cttcagaaag
caatttagaa gagctaacac 360aacccactga gtcccaggct aatgatattg gatttaagaa
ggtgtttaag tttgttggct 420ttaaattcac tgtgaaaaag gataagacag agaagcctga
cactgtccag ctactcactg 480tgaagaaaga tgaaggggag ggagcagcag gggctggcga
ccacaaggac cccagccttg 540gggctggaga agcagcatcc aaagaaagcg aacccaaaca
atctacagag aaacccgaag 600agaccctgaa gcgtgagcaa agccacgcag aaatttctcc
cccagccgaa tctggccaag 660cagtggagga atgcaaagag gaaggagaag agaaacaaga
aaaagaacct agcaagtctg 720cagaatctcc gactagtccc gtgaccagtg aaacaggatc
aaccttcaaa aaattcttca 780ctcaaggttg ggccggctgg cgcaaaaaga ccagtttcag
gaagccgaag gaggatgaag 840tggaagcttc agagaagaaa aaggaacaag agccagaaaa
agtagacaca gaagaagacg 900gaaaggcaga ggttgcctcc gagaaactga ccgcctccga
gcaagcccac ccacaggagc 960cggcagaaag tgcccacgag ccccggttat cagctgaata
tgagaaagtt gagctgccct 1020cagaggagca agtcagtggc tcgcagggac cttctgaaga
gaaacctgct ccgttggcga 1080cagaagtgtt tgatgagaaa atagaagtcc accaagaaga
ggttgtggcc gaagtccacg 1140tcagcaccgt ggaggagaga accgaagagc agaaaacgga
ggtggaagaa acagcagggt 1200ctgtgccagc tgaagaattg gttgaaatgg atgcagaacc
tcaggaagct gaacctgcca 1260aggagctggt gaagctcaaa gaaacgtgtg tttccggaga
ggaccctaca cagggagctg 1320acctcagtcc tgatgagaag gtgctgtcca aaccccccga
aggcgttgtg agtgaggtgg 1380aaatgctgtc atcacaggag agaatgaagg tgcagggaag
tccactaaag aagcttttta 1440ccagcactgg cttaaaaaag ctttctggaa agaaacagaa
agggaaaaga ggaggaggag 1500acgaggaatc aggggagcac actcaggttc cagccgattc
tccggacagc caggaggagc 1560aaaagggcga gagctctgcc tcatcccctg aggagcccga
ggagatcacg tgtctggaaa 1620agggcttagc cgaggtgcag caggatgggg aagctgaaga
aggagctact tccgatggag 1680agaaaaaaag agaaggtgtc actccctggg catcattcaa
aaagatggtg acgcccaaga 1740agcgtgttag acggccttcg gaaagtgata aagaagatga
gctggacaag gtcaagagcg 1800ctaccttgtc ttccaccgag agcacagcct ctgaaatgca
agaagaaatg aaagggagcg 1860tggaagagcc aaagccggaa gaaccaaagc gcaaggtgga
tacctcagta tcttgggaag 1920ctttaatttg tgtgggatca tccaagaaaa gagcaaggag
agggtcctct tctgatgagg 1980aagggggacc aaaagcaatg ggaggagacc accagaaagc
tgatgaggcc ggaaaagaca 2040aagagacggg gacagacggg atccttgctg gttcccaaga
acatgatcca gggcagggaa 2100gttcctcccc ggagcaagct ggaagcccta ccgaagggga
gggcgtttcc acctgggagt 2160catttaaaag gttagtcacg ccaagaaaaa aatcaaagtc
caagctggaa gagaaaagcg 2220aagactccat agctgggtct ggtgtagaac attccactcc
agacactgaa cccggtaaag 2280aagaatcctg ggtctcaatc aagaagttta ttcctggacg
aaggaagaaa aggccagatg 2340ggaaacaaga acaagcccct gttgaagacg cagggccaac
aggggccaac gaagatgact 2400ctgatgtccc ggccgtggtc cctctgtctg agtatgatgc
tgtagaaagg gagaaaatgg 2460aggcacagca agcccaaaaa agcgcagagc agcccgagca
gaaggcagcc actgaggtgt 2520ccaaggagct cagcgagagt caggttcata tgatggcagc
agctgtcgct gacgggacga 2580gggcagctac cattattgaa gaaaggtctc cttcttggat
atctgcttca gtgacagaac 2640ctcttgaaca agtagaagct gaagccgcac tgttaactga
ggaggtattg gaaagagaag 2700taattgcaga agaagaaccc cccacggtta ctgaacctct
gccagagaac agagaggccc 2760ggggcgacac ggtcgttagt gaggcggaat tgacccccga
agctgtgaca gctgcagaaa 2820ctgcagggcc attgggtgcc gaagaaggaa ccgaagcatc
tgctgctgaa gagaccacag 2880aaatggtgtc agcagtctcc cagttaaccg actccccaga
caccacagag gaggccactc 2940cggtgcagga ggtggaaggt ggcgtacctg acatagaaga
gcaagagagg cggactcaag 3000aggtcctcca ggcagtggca gaaaaagtga aagaggaatc
ccagctgcct ggcaccggtg 3060ggccagaaga tgtgcttcag cctgtgcaga gagcagaggc
agaaagacca gaagagcagg 3120ctgaagcgtc gggtctgaag aaagagacgg atgtagtgtt
gaaagtagat gctcaggagg 3180caaaaactga gccttttaca caagggaagg tggtggggca
gaccacccca gaaagctttg 3240aaaaagctcc tcaagtcaca gagagcatag agtccagtga
gcttgtaacc acttgtcaag 3300ccgaaacctt agctggggta aaatcacagg agatggtgat
ggaacaggct atcccccctg 3360actcggtgga aacccctaca gacagtgaga ctgatggaag
cacccccgta gccgactttg 3420acgcaccagg cacaacccag aaagacgaga ttgtggaaat
ccatgaggag aatgaggtcg 3480catctggtac ccagtcaggg ggcacagaag cagaggcagt
tcctgcacag aaagagaggc 3540ctccagcacc ttccagtttt gtgttccagg aagaaactaa
agaacaatca aagatggaag 3600acactctaga gcatacagat aaagaggtgt cagtggaaac
tgtatccatt ctgtcaaaga 3660ctgaggggac tcaagaggct gaccagtatg ctgatgagaa
aaccaaagac gtaccatttt 3720tcgaaggact tgaggggtct atagacacag gcataacagt
cagtcgggaa aaggtcactg 3780aagttgccct taaaggtgaa gggacagaag aagctgaatg
taaaaaggat gatgctcttg 3840aactgcagag tcacgctaag tctcctccat cccccgtgga
gagagagatg gtagttcaag 3900tcgaaaggga gaaaacagaa gcagagccaa cccatgtgaa
tgaagagaag cttgagcacg 3960aaacagctgt taccgtatct gaagaggtca gtaagcagct
cctccagaca gtgaatgtgc 4020ccatcataga tggagcaaag gaagtcagca gtttggaagg
aagccctcct ccctgcctag 4080gtcaagagga ggcagtatgc accaaaattc aagttcagag
ctctgaggca tcattcactc 4140taacagcggc tgcagaggag gaaaaggtct taggagaaac
tgccaacatt ttagaaacag 4200gtgaaacgtt ggagcctgca ggtgcacatt tagttctgga
agagaaatcc tctgaaaaaa 4260atgaagactt tgccgctcat ccaggggaag atgctgtgcc
cacagggccc gactgtcagg 4320caaaatcgac accagtgata gtatctgcta ctaccaagaa
aggcttaagt tccgacctgg 4380aaggagagaa aaccacatca ctgaagtgga agtcagatga
agtcgatgag caggttgctt 4440gccaggaggt caaagtgagt gtagcaattg aggatttaga
gcctgaaaat gggattttgg 4500aacttgagac caaaagcagt aaacttgtcc aaaacatcat
ccagacagcc gttgaccagt 4560ttgtacgtac agaagaaaca gccaccgaaa tgttgacgtc
tgagttacag acacaagctc 4620acgtgataaa agctgacagc caggacgctg gacaggaaac
ggagaaagaa ggagaggaac 4680ctctggcctc tgcacaggat gaaacaccaa ttacttcagc
caaagaggag tcagagtcaa 4740ccgcagtggg acaagcacat tctgatattt ccaaagacat
gagtgaagcc tcagaaaaga 4800ccatgactgt tgaggtagaa ggttccactg taaatgatca
gcagctggaa gaggtcgtcc 4860tcccatctga ggaagaggga ggtggagctg gaacaaagtc
tgtgccagaa gatgatggtc 4920atgccttgtt agcagaaaga atagagaagt cactagttga
accgaaagaa gatgaaaaag 4980gtgatgatgt tgatgaccct gaaaaccaga actcagccct
ggctgatact gatgcctcag 5040gaggcttaac caaagagtcc ccagatacaa atggaccaaa
acaaaaagag aaggaggatg 5100cccaggaagt agaattgcag gaaggaaaag tgcacagtga
atcagataaa gcgatcacac 5160cccaagcaca ggaggagtta cagaaacaag agagagaatc
tgcaaagtca gaacttacag 5220aatcttaaaa catcatgcag ttaaactcat tgtctgtttg
gaagaccaga atgtgaagac 5280aagtagtaga agaaaatgaa tgctgctgct gagactgaag
accagtattt cagaactttg 5340agaattggag agcaggcaca tcaactgatc tcatttctag
agagcccctg acaatcctga 5400ggcttcatca ggagctagag ccatttaaca tttcctcttt
ccaagaccaa cctacaattt 5460tcccttgata accatataaa ttctgattta aggtcctaaa
ttcttaacct ggaactggag 5520ttggcaatac ctagttctgc ttctgaaact ggagtatcat
tctttacata tttatatgta 5580tgttttaagt agtcctcctg tatctattgt atattttttt
cttaatgttt aaggaaatgt 5640gcaggatact acatgctttt tgtatcacac agtatatgat
ggggcatgtg ccatagtgca 5700ggcttgggga gctttaagcc tcagttatat aacccacgaa
aaacagagcc tcctagatgt 5760aacattcctg atcaaggtac aattctttaa aattcactaa
tgattgaggt ccatatttag 5820tggtactctg aaattggtca ctttcctatt acacggagtg
tgctaaaact aaaaagcatt 5880ttgaaacata cagaatgttc tattgtcatt gggaaatttt
tctttctaac ccagtggagg 5940ttagaaagaa gttatattct ggtagcaaat taactttaca
tcctttttcc tacttgttat 6000ggttgtttgg accgataagt gtgcttaatc ctgaggcaaa
gtagtgaata tgttttatat 6060gttatgaaga aaagaattgt tgtaagtttt tgattctact
cttatatgct ggactgcatt 6120cacacatggc atgaaataag tcaggttctt tacaaatggt
attttgatag atactggatt 6180gtgtttgtgc catatttgtg ccattctttt aagaacaatg
ttgcaacaca ttcatttgga 6240taagttgtga tttgacgact gatttaaata aaatatttgc
ttcactt 62872732DNAHomo sapiens 2catcggcgct ttgccacttg
tacccgagtt tttgattctc aacatgtccg agactgctcc 60tgccgctccc gctgccgcgc
ctcctgcgga gaaggcccct gtaaagaaga aggcggccaa 120aaaggctggg ggtacgcctc
gtaaggcgtc tggtcccccg gtgtcagagc tcatcaccaa 180ggctgtggcc gcctctaaag
agcgtagcgg agtttctctg gctgctctga aaaaagcgtt 240ggctgccgcc ggctatgatg
tggagaaaaa caacagccgt atcaaacttg gtctcaagag 300cctggtgagc aagggcactc
tggtgcaaac gaaaggcacc ggtgcttctg gctcctttaa 360actcaacaag aaggcagcct
ccggggaagc caagcccaag gttaaaaagg cgggcggaac 420caaacctaag aagccagttg
gggcagccaa gaagcccaag aaggcggctg gcggcgcaac 480tccgaagaag agcgctaaga
aaacaccgaa gaaagcgaag aagccggccg cggccactgt 540aaccaagaaa gtggctaaga
gcccaaagaa ggccaaggtt gcgaagccca agaaagctgc 600caaaagtgct gctaaggctg
tgaagcccaa ggccgctaag cccaaggttg tcaagcctaa 660gaaggcggcg cccaagaaga
aataggcgaa cgcctacttc taaaacccaa aaggctcttt 720tcagagccac ca
732
SEQ ID NO: 3; length: 1657; type: DNA; organism: Homo sapiens
aaagcggcca tgttttacat atttcttgat tttgtttgtt ttctcgtgag cttaggccgc
60tggttttggt gatttttgtc tgattgcaat gtctggacgt ggtaagcaag gaggcaaagc
120tcgcgccaaa gcgaaatccc gctcttctcg cgctggtctc cagttcccgg tgggccgagt
180gcaccgcctg ctccgtaaag gcaactacgc agagcgggtt ggggcaggcg cgccggtgta
240cctggcggcg gtgttagagt acctgaccgc cgagatcctg gagctggccg gcaacgcggc
300tcgcgacaac aagaagactc gcatcatccc gcgccacttg cagctggcca tccgcaacga
360cgaggagctc aacaaactgc taggccgggt gaccattgct cagggcggcg tccttcctaa
420catccaggcc gtgcttctgc ctaagaagac cgagagtcac cacaaggcca agggcaagtg
480atttgacagg tatctgagct cccggaaacg ctatcaaacc caaaggctct tttcagagcc
540cccctaccgt ttcaaaggaa gagctaacct cactgcttgt aggtagaagg aaaaaaggca
600ctaaggttgc aaaagcttct catttcagag agatgccagg atcctaagtg cctgccaaac
660ttaccaattc taaggaataa gtggatggat ggcattactg attcctacat tactgattga
720ttctgcatcc gcaaattgtt ttattaaaaa cattctacat catgtgtggg gagataagga
780ggataaaatg aagagaaaga atattattga ggggaagttc ttctgaatac aaaatgtgtt
840taatttttta aataagtatt acattcacag ggttcaaact atttgaagta aagagattat
900atataaagaa tccatccctc aacttaccca ggtggtcact tttctttttc ttgtgtatct
960gcccagtatt cattcctgct gatatcagtc aataatgaat gatacgtgtt ttcttcactt
1020ttttcattct tgtcaggtag cagactgtgt agacttttct gcacttgccc ttttcataac
1080aatctatctt ggagaacttt ccctatgaga acatacagag cttcctgtac acagttgcat
1140gtactgcatt atgcaaatgc attatatttt atgtaacctg tccactgttg gtaggcactt
1200gagttgtttt agtcttttgc tatcaaacag ttctgggatg attaaccctg atttactgca
1260aaattgaaat tgctctgcta ttctgctgga atggtggtaa gtgaactgaa aattccagtc
1320actcttgggc tagactcaac gttcttaaaa actatgtggc catcaccaaa ttagttattt
1380tgaaccttaa tttcttcacc tctaaaatgg aggtaatact taccttaagt ggctatgaga
1440atgaagatca tgtgtatgaa ttgttggtgc tctaaagaac agcacaaata aaattatttt
1500caaatttaat tttaattgaa ctatgtgtaa tttcttaatt ttgaaataat tttatttgta
1560atgtgcataa tcttatttaa tgtataatgt atacattgta atagaaacag atttcccaaa
1620ttccagcctg gcatgaggta ataaaaggta atgcaaa
1657
SEQ ID NO: 4; length: 453; type: DNA; organism: Homo sapiens
ttggttttgc cactattgtt tcattatgcc cgagctggcc
aagtctgctc ccgccccgaa 60gaagggctcc aagaaggcgg tgaccaaggc ccagaagaag
gatggcaaga agcgcaagcg 120cagccgcaag gagagctact ccgtgtacgt gtacaaggtg
ctgaagcagg tccaccccga 180caccggcatc tcttctaagg ccatgggaat catgaactcc
ttcgtcaacg acatcttcga 240gcgcatcgca agcgaggctt cccgcctggc gcactacaac
aagcgctcga ccatcacctc 300cagggagatc cagaccgccg tgcgcctgct gcttccgggg
gagctggcca agcacgcggt 360gtcggagggc accaaggccg tcaccaagta caccagctcc
aagtaaattc tcaagctctt 420gtccaaccca aaggctcttt tcagagccac tca
453
SEQ ID NO: 5; length: 1788; type: DNA; organism: Homo sapiens
attcctgaga gctctcctca
ccaagaagca gcttctccgc tccttctagg atctccgcct 60ggttcggccc gcctgcctcc
actcctgcct ctaccatgtc catcagggtg acccagaagt 120cctacaaggt gtccacctct
ggcccccggg ccttcagcag ccgctcctac acgagtgggc 180ccggttcccg catcagctcc
tcgagcttct cccgagtggg cagcagcaac tttcgcggtg 240gcctgggcgg cggctatggt
ggggccagcg gcatgggagg catcaccgca gttacggtca 300accagagcct gctgagcccc
cttgtcctgg aggtggaccc caacatccag gccgtgcgca 360cccaggagaa ggagcagatc
aagaccctca acaacaagtt tgcctccttc atagacaagg 420tacggttcct ggagcagcag
aacaagatgc tggagaccaa gtggagcctc ctgcagcagc 480agaagacggc tcgaagcaac
atggacaaca tgttcgagag ctacatcaac aaccttaggc 540ggcagctgga gactctgggc
caggagaagc tgaagctgga ggcggagctt ggcaacatgc 600aggggctggt ggaggacttc
aagaacaagt atgaggatga gatcaataag cgtacagaga 660tggagaacga atttgtcctc
atcaagaagg atgtggatga agcttacatg aacaaggtag 720agctggagtc tcgcctggaa
gggctgaccg acgagatcaa cttcctcagg cagctatatg 780aagaggagat ccgggagctg
cagtcccaga tctcggacac atctgtggtg ctgtccatgg 840acaacagccg ctccctggac
atggacagca tcattgctga ggtcaaggca cagtacgagg 900atattgccaa ccgcagccgg
gctgaggctg agagcatgta ccagatcaag tatgaggagc 960tgcagagcct ggctgggaag
cacggggatg acctgcggcg cacaaagact gagatctctg 1020agatgaaccg gaacatcagc
cggctccagg ctgagattga gggcctcaaa ggccagaggg 1080cttccctgga ggccgccatt
gcagatgccg agcagcgtgg agagctggcc attaaggatg 1140ccaacgccaa gttgtccgag
ctggaggccg ccctgcagcg ggccaagcag gacatggcgc 1200ggcagctgcg tgagtaccag
gagctgatga acgtcaagct ggccctggac atcgagatcg 1260ccacctacag gaagctgctg
gagggcgagg agagccggct ggagtctggg atgcagaaca 1320tgagtattca tacgaagacc
accagcggct atgcaggtgg tctgagctcg gcctatgggg 1380gcctcacaag ccccggcctc
agctacagcc tgggctccag ctttggctct ggcgcgggct 1440ccagctcctt cagccgcacc
agctcctcca gggccgtggt tgtgaagaag atcgagacac 1500gtgatgggaa gctggtgtct
gagtcctctg acgtcctgcc caagtgaaca gctgcggcag 1560cccctcccag cctacccctc
ctgcgctgcc ccagagcctg ggaaggaggc cgctatgcag 1620ggtagcactg ggaacaggag
acccacctga ggctcagccc tagccctcag cccacctggg 1680gagtttacta cctggggacc
ccccttgccc atgcctccag ctacaaaaca attcaattgc 1740tttttttttt tggtccaaaa
taaaacctca gctagctctg ccaatgtc 1788
SEQ ID NO: 6; length: 9410; type: DNA; organism: Homo sapiens
cgccctcgag tggaggacga gaaggaaagc accatgacgt ccatccattt cgtggttcac
60ccgctgccgg gcaccgagga ccagctcaat gacaggttac gagaagtttc tgagaagctg
120aacaaatata atttaaacag ccacccccct ttgaatgtat tggaacaggc tactattaaa
180cagtgtgtgg tgggaccaaa tcatgctgcc tttcttcttg aggatggtag agtttgcagg
240attggttttt cagtacagcc agacagattg gaattgggta aacctgataa taatgatggg
300tcaaagttga acagcaactc gggggcaggg aggacgtcaa ggcctggtag gacaagcgac
360tctccatggt ttctctcagg ttctgagact ctaggcaggc tggcaggcaa caccttagga
420agccgctgga gttctggagt gggtggaagt ggtggaggat cctctggtag gtcatcagct
480ggagctcgag attcccgccg gcagactcga gttattcgga caggacggga tcgagggtct
540gggcttttgg gcagtcagcc ccagccagtt attccagcat ctgtcattcc agaggagctg
600atttcacagg cccaagttgt tttacaaggc aaatccagaa gtgtcattat tcgagaactt
660cagagaacaa atcttgatgt gaaccttgct gtaaataatt tacttagccg ggatgatgaa
720gatggagatg atggggatga tacagccagc gaatcttatt tgcctggaga ggatcttatg
780tctctccttg atgccgacat tcattctgcc cacccaagtg tcattattga tgcagatgcc
840atgttttctg aagacattag ctattttggt tacccttctt ttcgtcgttc atcactttcc
900aggctaggct catctcgagt tctccttctt cccttagaga gagactctga gctgttgcgt
960gaacgtgaat ccgttttacg tttacgtgaa cgaaggtggc ttgatggagc ctcatttgat
1020aatgaaaggg gttctaccag caaggaagga gagccaaact tggataagaa gaatacacct
1080gttcaaagtc cagtatctct aggagaagat ttgcagtggt ggcctgataa ggatggaaca
1140aaattcatct gtattggggc tctgtattct gaacttctgg ctgtcagcag taaaggagaa
1200ctttatcagt ggaaatggag tgaatctgag ccttacagaa atgcccagaa tccttcatta
1260catcatccac gagcaacatt tttggggtta accaatgaaa agatagtcct cctgtctgca
1320aatagcataa gagcaactgt agctacagaa aataacaagg ttgctacatg ggtggatgaa
1380actttaagtt ctgtggcttc taaattagag cacactgctc agacttactc tgaacttcaa
1440ggagagcgga tagtttcttt acattgctgt gccctttaca cctgcgctca gctggaaaac
1500agtttatatt ggtggggtgt agttcctttt agtcaaagga agaaaatgtt agagaaagct
1560agagcaaaaa ataaaaagcc taaatccagt gctggtattt cttcaatgcc gaacatcact
1620gttggtaccc aggtatgctt gagaaataat cctctttatc atgctggagc agttgcattt
1680tcaattagtg ctgggattcc taaagttggt gtcttaatgg agtcagtttg gaatatgaat
1740gacagctgta gatttcaact tagatctcct gaaagcttga aaaacatgga aaaagctagc
1800aaaactactg aagctaagcc tgaaagtaag caggagccag tgaaaacaga aatgggtcct
1860ccaccatctc cagcatccac gtgtagtgat gcatcctcaa ttgccagcag tgcatcaatg
1920ccatacaaac gacgacggtc aacccctgca ccaaaagaag aggaaaaggt gaatgaagag
1980cagtggtctc ttcgggaagt ggtttttgtg gaagatgtca agaatgttcc tgttggcaag
2040gtgctaaaag tagatggtgc ctatgttgct gtaaaatttc caggaacctc cagtaatact
2100aactgtcaga acagctctgg tccagatgct gacccttctt ctctcctgca ggattgtagg
2160ttacttagaa ttgatgaatt gcaggttgtc aaaactggtg gaacaccgaa ggttcccgac
2220tgtttccaaa ggactcctaa aaagctttgt atacctgaaa aaacagaaat attagcagtg
2280aatgtagatt ccaaaggtgt tcatgctgtt ctgaagactg gaaattgggt acgatactgt
2340atctttgatc ttgctacagg aaaagcagaa caggaaaata attttcctac aagcagcatt
2400gctttccttg gtcagaatga gaggaatgta gccattttca ctgctggaca ggaatctccc
2460attattcttc gagatggaaa tggtaccatc tacccaatgg ccaaagattg catgggagga
2520ataagggatc ccgattggct ggatcttcca cctattagta gtcttggaat gggtgtgcat
2580tctttaataa atcttcctgc caattcaaca atcaaaaaga aagctgctgt tatcatcatg
2640gctgtagaga aacaaacctt aatgcaacac attctgcgct gtgactatga ggcctgtcga
2700caatatctaa tgaatcttga gcaagcggtt gttttagagc agaatctaca gatgctgcag
2760acattcatca gccacagatg tgatggaaat cgaaatattt tgcatgcttg tgtatcagtt
2820tgctttccaa ccagcaataa agaaactaaa gaagaagagg aagcggagcg ttctgaaaga
2880aatacatttg cagaaaggct ttctgctgtt gaggccattg caaatgcaat atcagttgtt
2940tcaagtaatg gcccaggtaa tcgggctgga tcatcaagta gccgaagttt gagattacgg
3000gaaatgatga gacgttcgtt gagagcagct ggtttgggta gacatgaagc tggagcttca
3060tccagtgacc accaggatcc agtttcaccc cccatagctc cccctagttg ggttcctgac
3120cctcctgcga tggatcctga tggtgacatt gattttatcc tggcccccgc tgtgggatct
3180cttaccacag cagcaaccgg tactggtcaa ggaccaagca cctccactat tccaggtcct
3240tccacagagc catctgtagt agaatccaag gatcgaaagg cgaatgctca ttttatattg
3300aaattgttat gtgacagtgt ggttctccag ccctatctac gagaacttct ttctgccaag
3360gatgcaagag ggatgacccc atttatgtca gctgtaagtg gccgagctta tcctgctgca
3420attaccatct tagaaactgc tcagaaaatt gcaaaagctg aaatatcctc aagtgaaaaa
3480gaggaagatg tattcatggg aatggtttgc ccatcaggta ccaaccctga tgactctcct
3540ttatatgttt tatgttgtaa tgacacttgc agttttacat ggactggagc agagcacatt
3600aaccaggata tttttgagtg tcgaacttgt ggcttgctgg agtcactgtg ttgttgtacg
3660gaatgtgcaa gggtttgtca taaaggtcat gattgcaaac tcaaacggac atcaccaaca
3720gcctactgtg attgttggga gaaatgtaaa tgtaaaactc ttattgctgg acagaaatct
3780gctcgtcttg atctacttta tcgcctgctc actgctacta atctggttac tctgccaaac
3840agcaggggag agcacctctt actattctta gtacagacag tcgcaaggca gacggtggag
3900cattgtcaat acaggccacc tcgaatcagg gaagatcgta accgaaaaac agccagtcct
3960gaagattcag atatgccaga tcatgattta gagcctccaa gatttgccca gcttgcattg
4020gagcgtgttc tacaggactg gaatgccttg aaatctatga ttatgtttgg gtcgcaggag
4080aataaagacc ctcttagtgc cagcagtaga ataggccatc ttttgccaga agagcaagta
4140tacctcaatc agcaaagtgg cacaattcgg ctggactgtt tcactcattg ccttatagtt
4200aagtgtacag cagatatttt gcttttagat actctactag gtacactagt gaaagaactc
4260caaaacaaat atacacctgg acgtagagaa gaagctattg ctgtgacaat gaggtttcta
4320cgttcagtgg caagagtttt tgttattctg agtgtggaaa tggcttcatc caaaaagaaa
4380aacaacttta ttccacagcc aattggaaaa tgcaagcgtg tattccaagc attgctacct
4440tacgctgtgg aagaattgtg caacgtagca gagtcactga ttgttcctgt cagaatgggg
4500attgctcgtc caactgcacc atttaccctg gctagtacta gcatagatgc catgcagggc
4560agtgaagaat tattttcagt ggaaccacta ccaccacgac catcatctga tcagtctagc
4620agctccagtc agtctcagtc atcctacatc atcaggaatc cacagcagag gcgcatcagc
4680cagtcacagc ccgttcgggg cagagatgaa gaacaggatg atattgtttc agcagatgtg
4740gaagaggttg aggtggtgga gggtgtggct ggagaagagg atcatcatga tgaacaggaa
4800gaacacgggg aagaaaatgc tgaggcagag ggacaacatg atgagcatga tgaagacggg
4860agtgatatgg agctggactt gttagcagca gctgaaacag aaagtgatag tgaaagtaac
4920cacagcaacc aagataatgc tagtgggcgc agaagcgttg tcactgcagc aactgctggt
4980tcagaagcag gagcaagcag tgttcctgcc ttcttttctg aagatgattc tcaatcgaat
5040gactcaagtg attctgatag cagtagtagt cagagtgacg acatagaaca ggagaccttt
5100atgcttgatg agccattaga aagaaccaca aatagctccc atgccaatgg tgctgcccaa
5160gctccccgtt caatgcagtg ggctgtccgc aacacccagc atcagcgagc agccagtaca
5220gccccttcca gtacatctac accagcagca agttcagcgg gtttgattta tattgatcct
5280tcaaacttac gccggagtgg taccatcagt acaagtgctg cagctgcagc agctgctttg
5340gaagctagca acgccagcag ttacctaaca tctgcaagca gtttagccag ggcttacagc
5400attgtcatta gacaaatctc ggacttgatg ggccttattc ctaagtataa tcacctagta
5460tactctcaga ttccagcagc tgtgaaattg acttaccaag atgcagtaaa cttacagaac
5520tatgtagaag aaaagcttat tcccacttgg aactggatgg tcagtattat ggattctact
5580gaagctcaat tacgttatgg ttctgcatta gcatctgctg gtgatcctgg acatccaaat
5640catcctcttc acgcttctca gaattcagcg agaagagaga ggatgactgc gcgagaagaa
5700gctagcttac gaacacttga aggcagacga cgtgccacct tgcttagcgc ccgtcaagga
5760atgatgtctg cacgaggaga cttcctaaat tatgctctgt ctctaatgcg gtctcataat
5820gatgagcatt ctgatgttct tccagttttg gatgtttgct cattgaagca tgtggcatat
5880gtttttcaag cacttatata ctggattaag gcaatgaatc agcagacaac attggataca
5940cctcaactag aacgcaaaag gacgcgagaa ctcttggaac tgggtattga taatgaagat
6000tcagaacatg aaaatgatga tgacaccaat caaagtgcta ctttgaatga taaggatgat
6060gactctcttc ctgcagaaac tggccaaaac catccatttt tccgacgttc agactccatg
6120acattccttg ggtgtatacc cccaaatcca tttgaagtgc ctctggctga agccatcccc
6180ttggctgatc agccacatct gttgcagcca aatgctagaa aggaggatct ttttggccgt
6240ccaagtcagg gtctttattc ttcatctgcc agtagtggga aatgtttaat ggaggttaca
6300gtggatagaa actgcctaga ggttcttcca acaaaaatgt cttatgctgc caatctgaaa
6360aatgtaatga acatgcaaaa ccggcaaaaa aaagaagggg aagaacagcc cgtgctgcca
6420gaagaaactg agagttcaaa accagggcca tctgctcatg atcttgctgc acaattaaaa
6480agtagcttac tagcagaaat aggacttact gaaagtgaag ggccacctct cacatctttc
6540aggccacagt gtagctttat gggaatggtt atttcccatg atatgctgct aggacgttgg
6600cgcctttctt tagaactgtt cggcagggta ttcatggaag atgttggagc agaacctgga
6660tcaatcctaa ctgaattggg tggttttgag gtaaaagaat caaaattccg cagagaaatg
6720gaaaaactga gaaaccagca gtcaagagat ttgtcactag aggttgatcg ggatcgagat
6780cttctcattc agcagactat gaggcagctt aacaatcact ttggtcgaag atgtgctact
6840acaccaatgg ctgtacacag agtaaaagtc acatttaagg atgagccagg agagggcagt
6900ggtgtagcac gaagttttta tacagccatt gcacaagcat ttttatcaaa tgaaaaattg
6960ccaaatctag agtgtatcca aaatgccaac aaaggcaccc acacaagttt aatgcagaga
7020ttaaggaacc gaggagagag agaccgggaa agggagagag aaagggaaat gaggaggagt
7080agtggtttgc gagcaggttc tcggagggac cgggatagag actttagaag acagctttcc
7140atcgacacta ggccctttag accagcctct gaagggaatc ctagcgatga tcctgagcct
7200ttgccagcac atcggcaggc acttggagag aggctttatc ctcgtgtaca agcaatgcaa
7260ccagcatttg caagtaaaat cactggcatg ttgttggaat tatccccagc tcagctgctt
7320ctccttctag caagtgagga ttctctgaga gcaagagtgg atgaggccat ggaactcatt
7380attgcacatg gacgggaaaa tggagctgat agtatcctgg atcttggatt agtagactcc
7440tcagaaaagg tacagcagga aaaccgaaag cgccatggct ctagtcgaag tgtagtagat
7500atggatttag atgatacaga tgatggtgat gacaatgccc ctttgtttta ccaacctggg
7560aaaagaggat tttatactcc aaggcctggc aagaacacag aagcaaggtt gaattgtttc
7620agaaacattg gcaggattct tggactatgt ctgttacaga atgaactatg tcctatcaca
7680ttgaatagac atgtaattaa agtattgctt ggtagaaaag tcaattggca tgattttgct
7740ttttttgatc ctgtaatgta tgagagtttg cggcaactaa tcctcgcgtc tcagagttca
7800gatgctgatg ctgttttctc agcaatggat ttggcatttg caattgacct gtgtaaagaa
7860gaaggtggag gacaggttga actcattcct aatggtgtaa atataccagt cactccacag
7920aatgtatatg agtatgtgcg gaaatacgca gaacacagaa tgttggtagt tgcagaacag
7980cccttacatg caatgaggaa aggtctacta gatgtgcttc caaaaaattc attagaagat
8040ttaacggcag aagattttag gcttttggta aatggctgcg gtgaagtcaa tgtgcaaatg
8100ctgatcagtt ttacctcttt caatgatgaa tcaggagaaa atgctgagaa gcttctgcag
8160ttcaagcgtt ggttctggtc aatagtagag aagatgagca tgacagaacg acaagatctt
8220gtttactttt ggacatcaag cccatcactg ccagccagtg aagaaggatt ccagcctatg
8280ccctcaatca caataagacc accagatgac caacatcttc ctactgcaaa tacttgcatt
8340tctcgacttt acgtcccact ctattcctct aaacagattc tcaaacagaa attgttactc
8400gccattaaga ccaagaattt tggttttgtg tagagtataa aaagtgtgta ttgctgtgta
8460atattactag caaattttgt agattttttt ccatttgtct ataaaagttt atggaagtta
8520atgctgtcat acccccctgg tggtacctta aagagataaa atgcagacat tccttgctga
8580gtttatagct taaaggccta aggagcacta gcaacatttg gctatattgg tttgctagtc
8640accaacttct gggtctaacc ccagccaaag atgacagcag aacaacataa tttacactgt
8700gatttatctt tttgctgagg gggaaaaaat gtaaatgttc tgaaaattca ctgctgcctt
8760tgtggaaact gtttcagcaa aggttcttgt atagagggaa tagggaattt caaaataaaa
8820aattaagtat gttctgtgtt ttcattttaa ctttttttat ggtgtttaat ttgtggttgg
8880ctgcaactgt gtatcatgta tatggaactt gtaaaaaagt tctcgacatt cagatcttaa
8940gagatgaaat cacttttacc tataaaaacc acttttattg cggtttgact gcattgagct
9000ctaggatatt aaatgatatc actaatattt tgcatgtaat ttgctcattt gagtgagggc
9060actttttttg tacatatgat ggggccaatg cacaatactt ttatcacaat caactttttc
9120tttgtatccc tatttcaatg agcagtcagt ctcaagaggt tactgcactt cagttctaac
9180tagacatttg tactaaggta tttcagttat gtaaactcag cctgggcact ttctgataac
9240tgtaaaatgt tttataagat catgattatt gaagatacat tttggaaaat tttaaatgtt
9300cgtgagcagc ttaactactt ttgtatctag ccttttttaa gtatcttgtt acatttactt
9360ttttaaataa agaaattaca gaagaaatgt caagaaaaaa aaaaaaaaaa
9410
SEQ ID NO: 7; length: 534; type: DNA; organism: Homo sapiens
cgactttccc gatcgccagg caggagtttc tctcggtgac
tactatcgct gtcatgtctg 60gtcgtggcaa gcaaggaggc aaggcccgcg ccaaggccaa
gtcgcgctcg tcccgcgctg 120gccttcagtt cccggtaggg cgagtgcatc gcttgctgcg
caaaggcaac tacgcggagc 180gagtgggggc cggcgcgccc gtctacatgg ctgcggtcct
cgagtatctg accgccgaga 240tcctggagct ggcgggcaac gcggctcggg acaacaagaa
gacgcgcatc atccctcgtc 300acctccagct ggccatccgc aacgacgagg aactgaacaa
gctgctgggc aaagtcacca 360tcgcccaggg cggcgtcttg cctaacatcc aggccgtact
gctccctaag aagacggaga 420gtcaccacaa ggcaaagggc aagtgaggct gacgtccggc
ccaagtgggc ccagcccggc 480ccgcgtctcg aaggggcacc tgtgaactca aaaggctctt
ttcagagcca ccca 534
SEQ ID NO: 8; length: 2642; type: DNA; organism: Homo sapiens
gtctgtcagc actgtccgtg
ccattcccag aggagcctga gaagaggcag aggaaggcga 60aacatggctg ctctaggagt
ccagagtatc aactggcaga cggccttcaa ccgacaagcg 120catcacacag acaagttctc
cagccaggag ctcatcttgc ggagaggcca aaacttccag 180gtcttaatga tcatgaacaa
aggccttggc tctaacgaaa gactggagtt cattgtctcc 240acagggcctt acccctcaga
gtcggccatg acgaaggctg tgtttccact ctccaatggc 300agtagtggtg gctggagtgc
ggtgcttcag gccagcaatg gcaatactct gactatcagc 360atctccagtc ctgccagcgc
acccatagga cggtacacaa tggccctcca gatcttctcc 420cagggcggca tctcctctgt
gaaacttggg acgttcatac tgctttttaa cccctggctg 480aatgtggata gcgtctttat
gggtaaccac gctgagagag aagagtatgt tcaggaagat 540gccggcatca tctttgtggg
aagcacaaac cgaattggca tgattggctg gaactttgga 600cagtttgaag aagacattct
cagcatctgc ctctcaatct tggataggag tctgaatttc 660cgccgtgacg ctgctactga
tgtggccagc agaaatgacc ccaaatacgt tggccgggtg 720ctgagtgcca tgatcaatag
caatgatgac aatggtgtgc ttgctgggaa ttggagcggc 780acttacaccg gtggccggga
cccaaggagc tggaacggca gcgtggagat cctcaaaaat 840tggaaaaaat ctggcttcag
cccagtccga tatggccagt gctgggtctt tgctgggacc 900ctcaacacag cgctgcggtc
tttggggatt ccttcccggg tgatcaccaa cttcaactca 960gctcatgaca cagaccgaaa
tctcagtgtg gatgtgtact acgaccccat gggaaacccc 1020ctggacaagg gtagtgatag
cgtatggaat ttccatgtct ggaatgaagg ctggtttgtg 1080aggtctgacc tgggcccctc
gtacggtgga tggcaggtgt tggatgctac cccgcaggaa 1140agaagccaag gggtgttcca
gtgcggcccc gcttcggtca ttggtgttcg agagggtgat 1200gtgcagctga acttcgacat
gccctttatc ttcgcggagg ttaatgccga ccgcatcacc 1260tggctgtacg acaacaccac
tggcaaacag tggaagaatt ccgtgaacag tcacaccatt 1320ggcaggtaca tcagcaccaa
ggcggtgggc agcaatgctc gcatggacgt cacggacaag 1380tacaagtacc cagaaggctc
tgaccaggaa agacaagtgt tccaaaaggc tttggggaaa 1440cttaaaccca acacgccatt
tgccgcgacg tcttcaatgg gtttggaaac agaggaacag 1500gagcccagca tcatcgggaa
gctgaaggtc gctggcatgc tggcagtagg caaagaagtc 1560aacctggtcc tactgctcaa
aaacctgagc agggatacga agacagtgac agtgaacatg 1620acagcctgga ccatcatcta
caacggcacg cttgtacatg aagtgtggaa ggactctgcc 1680acaatgtccc tggaccctga
ggaagaggca gaacatccca taaagatctc gtacgctcag 1740tatgagaagt acctgaagtc
agacaacatg atccggatca cagcggtgtg caaggtccca 1800gatgagtctg aggtggtggt
ggagcgggac atcatcctgg acaaccccac cttgaccctg 1860gaggtgctga acgaggctcg
tgtgcggaag cctgtgaacg tgcagatgct cttctccaat 1920ccactggatg agccggtgag
ggactgcgtg ctgatggtgg agggaagcgg cctgctgttg 1980ggtaacctga agatcgacgt
gccgacccta gggcccaagg aggggtcccg ggtccgtttt 2040gatatcctgc cctcccggag
tggcaccaag caactgctcg ccgacttctc ctgcaacaag 2100ttccctgcaa tcaaggccat
gttgtccatc gatgtagccg aatgaagggc gctggtggcc 2160tcccgtacaa acttggacaa
cacggagcag ggagagctca ccatggaatg aaccccccgc 2220ccatgctgtc cggcctggga
aaccctctcc atctcccaag gctgccagac atggacctcc 2280aggctccagc acatccccct
ctcctctccc ccaggttggg gctgggtcca ccctgtccta 2340tgacttgatc acttttgcac
attccctggc cgcttctccc cagagctgcc tgctctgtga 2400gccccacagc cctgctcatt
cctcacgccc ttcaatgctg caggatggac tggcccctga 2460cccagggact ctccaaacgg
gatacaggag agaagctggt ctagactgtt tgctgatccc 2520caacctgcac ggggcattcc
tgcttctctc tcaggccacc acagagggca ggggatggtt 2580agtcacctgc cccagcactc
acaccctaac tcaaaataaa tgttaaataa gtgcgatcac 2640ac
2642
SEQ ID NO: 9; length: 1267; type: DNA; organism: Homo sapiens
acattctctt ttcttttatt cttgtctgtt ctgcctcact cccgagctct actgactccc
60aacagagcgc ccaagaagaa aatggccata agtggagtcc ctgtgctagg atttttcatc
120atagctgtgc tgatgagcgc tcaggaatca tgggctatca aagaagaaca tgtgatcatc
180caggccgagt tctatctgaa tcctgaccaa tcaggcgagt ttatgtttga ctttgatggt
240gatgagattt tccatgtgga tatggcaaag aaggagacgg tctggcggct tgaagaattt
300ggacgatttg ccagctttga ggctcaaggt gcattggcca acatagctgt ggacaaagcc
360aacctggaaa tcatgacaaa gcgctccaac tatactccga tcaccaatgt acctccagag
420gtaactgtgc tcacgaacag ccctgtggaa ctgagagagc ccaacgtcct catctgtttc
480atagacaagt tcaccccacc agtggtcaat gtcacgtggc ttcgaaatgg aaaacctgtc
540accacaggag tgtcagagac agtcttcctg cccagggaag accacctttt ccgcaagttc
600cactatctcc ccttcctgcc ctcaactgag gacgtttacg actgcagggt ggagcactgg
660ggcttggatg agcctcttct caagcactgg gagtttgatg ctccaagccc tctcccagag
720actacagaga acgtggtgtg tgccctgggc ctgactgtgg gtctggtggg catcattatt
780gggaccatct tcatcatcaa gggattgcgc aaaagcaatg cagcagaacg cagggggcct
840ctgtaaggca catggaggtg atggtgtttc ttagagagaa gatcactgaa gaaacttctg
900ctttaatggc tttacaaagc tggcaatatt acaatccttg acctcagtga aagcagtcat
960cttcagcatt ttccagccct atagccaccc caagtgtgga tatgcctctt cgattgctcc
1020gtactctaac atctagctgg cttccctgtc tattgccttt tcctgtatct attttcctct
1080atttcctatc attttattat caccatgcaa tgcctctgga ataaaacata caggagtctg
1140tctctgctat ggaatgcccc atggggcatc tcttgtgtac ttattgttta aggtttcctc
1200aaactgtgat ttttctgaac acaataaact attttgatga tcttgggtgg aaaaaaaaaa
1260aaaaaaa
1267
SEQ ID NO: 10; length: 3670; type: DNA; organism: Homo sapiens
ccggtttgtt agggagtcgt gtacgtgcct tggtcgcttc
tgtagctccg agggcaggtt 60gcggaagaaa gcccaggcgg tctgtggccc agaggaaagg
cctgcagcag gacgaggacc 120tgagccagga atgcaggatg gcggcggtga agaaggaagg
gggtgctctg agtgaagcca 180tgtccctgga gggagatgaa tgggaactga gtaaagaaaa
tgtacaacct ttaaggcaag 240ggcggatcat gtccacgctt cagggagcac tggcacaaga
atctgcctgt aacaatactc 300ttcagcagca gaaacgggca tttgaatatg aaattcgatt
ttacactgga aatgaccctc 360tggatgtttg ggataggtat atcagctgga cagagcagaa
ctatcctcaa ggtgggaagg 420agagtaatat gtcaacgtta ttagaaagag ctgtagaagc
actacaagga gaaaaacgat 480attatagtga tcctcgattt ctcaatctct ggcttaaatt
agggcgttta tgcaatgagc 540ctttggatat gtacagttac ttgcacaacc aagggattgg
tgtttcactt gctcagttct 600atatctcatg ggcagaagaa tatgaagcta gagaaaactt
taggaaagca gatgcgatat 660ttcaggaagg gattcaacag aaggctgaac cactagaaag
actacagtcc cagcaccgac 720aattccaagc tcgagtgtct cggcaaactc tgttggcact
tgagaaagaa gaagaggagg 780aagtttttga gtcttctgta ccacaacgaa gcacactagc
tgaactaaag agcaaaggga 840aaaagacagc aagagctcca atcatccgtg taggaggtgc
tctcaaggct ccaagccaga 900acagaggact ccaaaatcca tttcctcaac agatgcaaaa
taatagtaga attactgttt 960ttgatgaaaa tgctgatgag gcttctacag cagagttgtc
taagcctaca gtccagccat 1020ggatagcacc ccccatgccc agggccaaag agaatgagct
gcaagcaggc ccttggaaca 1080caggcaggtc cttggaacac aggcctcgtg gcaatacagc
ttcactgata gctgtacccg 1140ctgtgcttcc cagtttcact ccatatgtgg aagagactgc
acgacagcca gttatgacac 1200catgtaaaat tgaacctagt ataaaccaca tcctaagcac
cagaaagcct ggaaaggaag 1260aaggagatcc tctacaaagg gttcagagcc atcagcaagc
gtctgaggag aagaaagaga 1320agatgatgta ttgtaaggag aagatttatg caggagtagg
ggaattctcc tttgaagaaa 1380ttcgggctga agttttccgg aagaaattaa aagagcaaag
ggaagccgag ctattgacca 1440gtgcagagaa gagagcagaa atgcagaaac agattgaaga
gatggagaag aagctaaaag 1500aaatccaaac tactcagcaa gaaagaacag gtgatcagca
agaagagacg atgcctacaa 1560aggagacaac taaactgcaa attgcttccg agtctcagaa
aataccagga atgactctat 1620ccagttctgt ttgtcaagta aactgttgtg ccagagaaac
ttcacttgcg gagaacattt 1680ggcaggaaca acctcattct aaaggtccca gtgtaccttt
ctccattttt gatgagtttc 1740ttctttcaga aaagaagaat aaaagtcctc ctgcagatcc
cccacgagtt ttagctcaac 1800gaagacccct tgcagttctc aaaacctcag aaagcatcac
ctcaaatgaa gatgtgtctc 1860cagatgtttg tgatgaattt acaggaattg aacccttgag
cgaggatgcc attatcacag 1920gcttcagaaa tgtaacaatt tgtcctaacc cagaagacac
ttgtgacttt gccagagcag 1980ctcgttttgt atccactcct tttcatgaga taatgtcctt
gaaggatctc ccttctgatc 2040ctgagagact gttaccggaa gaagatctag atgtaaagac
ctctgaggac cagcagacag 2100cttgtggcac tatctacagt cagactctca gcatcaagaa
gctgagccca attattgaag 2160acagtcgtga agccacacac tcctctggct tctctggttc
ttctgcctcg gttgcaagca 2220cctcctccat caaatgtctt caaattcctg agaaactaga
acttactaat gagacttcag 2280aaaaccctac tcagtcacca tggtgttcac agtatcgcag
acagctactg aagtccctac 2340cagagttaag tgcctctgca gagttgtgta tagaagacag
accaatgcct aagttggaaa 2400ttgagaagga aattgaatta ggtaatgagg attactgcat
taaacgagaa tacctaatat 2460gtgaagatta caagttattc tgggtggcgc caagaaactc
tgcagaatta acagtaataa 2520aggtatcttc tcaacctgtc ccatgggact tttatatcaa
cctcaagtta aaggaacgtt 2580taaatgaaga ttttgatcat ttttgcagct gttatcaata
tcaagatggc tgtattgttt 2640ggcaccaata tataaactgc ttcacccttc aggatcttct
ccaacacagt gaatatatta 2700cccatgaaat aacagtgttg attatttata accttttgac
aatagtggag atgctacaca 2760aagcagaaat agtccatggt gacttgagtc caaggtgtct
gattctcaga aacagaatcc 2820acgatcccta tgattgtaac aagaacaatc aagctttgaa
gatagtggac ttttcctaca 2880gtgttgacct tagggtgcag ctggatgttt ttaccctcag
cggctttcgg actgtacaga 2940tcctggaagg acaaaagatc ctggctaact gttcttctcc
ctaccaggta gacctgtttg 3000gtatagcaga tttagcacat ttactattgt tcaaggaaca
cctacaggtc ttctgggatg 3060ggtccttctg gaaacttagc caaaatattt ctgagctaaa
agatggtgaa ttgtggaata 3120aattctttgt gcggattctg aatgccaatg atgaggccac
agtgtctgtt cttggggagc 3180ttgcagcaga aatgaatggg gtttttgaca ctacattcca
aagtcacctg aacaaagcct 3240tatggaaggt agggaagtta actagtcctg gggctttgct
ctttcagtga gctaggcaat 3300caagtctcac agattgctgc ctcagagcaa tggttgtatt
gtggaacact gaaactgtat 3360gtgctgtaat ttaatttagg acacatttag atgcactacc
attgctgttc tactttttgg 3420tacaggtata ttttgacgtc actgatattt tttatacagt
gatatactta ctcatggcct 3480tgtctaactt ttgtgaagaa ctattttatt ctaaacagac
tcattacaaa tggttacctt 3540gttatttaac ccatttgtct ctacttttcc ctgtactttt
cccatttgta atttgtaaaa 3600tgttctctta tgatcaccat gtattttgta aataataaaa
tagtatctgt taaaaaaaaa 3660aaaaaaaaaa
3670
SEQ ID NO: 11; length: 1024; type: DNA; organism: Homo sapiens
cttcctggct cctccttcct
ccccacccct ctaataggct cataagtggg ctcaggcctc 60tctgcggggc tcactctgcg
cttcaccatg gctttcattg ccaagtcctt ctatgacctc 120agtgccatca gcctggatgg
ggagaaggta gatttcaata cgttccgggg cagggccgtg 180ctgattgaga atgtggcttc
gctctgaggc acaaccaccc gggacttcac ccagctcaac 240gagctgcaat gccgctttcc
caggcgcctg gtggtccttg gcttcccttg caaccaattt 300ggacatcagg agaactgtca
gaatgaggag atcctgaaca gtctcaagta tgtccgtcct 360gggggtggat accagcccac
cttcaccctt gtccaaaaat gtgaggtgaa tgggcagaac 420gagcatcctg tcttcgccta
cctgaaggac aagctcccct acccttatga tgacccattt 480tccctcatga ccgatcccaa
gctcatcatt tggagccctg tgcgccgctc agatgtggcc 540tggaactttg agaagttcct
catagggccg gagggagagc ccttccgacg ctacagccgc 600accttcccaa ccatcaacat
tgagcctgac atcaagcgcc tccttaaagt tgccatatag 660atgtgaactg ctcaacacac
agatctccta ctccatccag tcctgaggag ccttaggatg 720cagcatgcct tcaggagaca
ctgctggacc tcagcattcc cttgatatca gtccccttca 780ctgcagagcc ttgcctttcc
cctctgcctg tttccttttc ctctcccaac cctctggttg 840gtgattcaac ttgggctcca
agacttgggt aagctctggg ccttcacaga atgatggcac 900cttcctaaac cctcatgggt
ggtgtctgag aggcgtgaag ggcctggagc cactctgcta 960gaagagacca ataaagggca
ggtgtggaaa cggcaaaaaa aaaaaaaaaa aaaaaaaaaa 1020aaaa
1024
SEQ ID NO: 12; length: 1684; type: PRT; organism: Homo sapiens
Met Leu Gly Thr Ile Thr Ile Thr Val Gly Gln Arg Asp Ser Glu Asp1
5 10 15Val Ser Lys Arg Asp Ser
Asp Lys Glu Met Ala Thr Lys Ser Ala Val 20 25
30Val His Asp Ile Thr Asp Asp Gly Gln Glu Glu Thr Pro
Glu Ile Ile 35 40 45Glu Gln Ile
Pro Ser Ser Glu Ser Asn Leu Glu Glu Leu Thr Gln Pro 50
55 60Thr Glu Ser Gln Ala Asn Asp Ile Gly Phe Lys Lys
Val Phe Lys Phe65 70 75
80Val Gly Phe Lys Phe Thr Val Lys Lys Asp Lys Thr Glu Lys Pro Asp
85 90 95Thr Val Gln Leu Leu Thr
Val Lys Lys Asp Glu Gly Glu Gly Ala Ala 100
105 110Gly Ala Gly Asp His Lys Asp Pro Ser Leu Gly Ala
Gly Glu Ala Ala 115 120 125Ser Lys
Glu Ser Glu Pro Lys Gln Ser Thr Glu Lys Pro Glu Glu Thr 130
135 140Leu Lys Arg Glu Gln Ser His Ala Glu Ile Ser
Pro Pro Ala Glu Ser145 150 155
160Gly Gln Ala Val Glu Glu Cys Lys Glu Glu Gly Glu Glu Lys Gln Glu
165 170 175Lys Glu Pro Ser
Lys Ser Ala Glu Ser Pro Thr Ser Pro Val Thr Ser 180
185 190Glu Thr Gly Ser Thr Phe Lys Lys Phe Phe Thr
Gln Gly Trp Ala Gly 195 200 205Trp
Arg Lys Lys Thr Ser Phe Arg Lys Pro Lys Glu Asp Glu Val Glu 210
215 220Ala Ser Glu Lys Lys Lys Glu Gln Glu Pro
Glu Lys Val Asp Thr Glu225 230 235
240Glu Asp Gly Lys Ala Glu Val Ala Ser Glu Lys Leu Thr Ala Ser
Glu 245 250 255Gln Ala His
Pro Gln Glu Pro Ala Glu Ser Ala His Glu Pro Arg Leu 260
265 270Ser Ala Glu Tyr Glu Lys Val Glu Leu Pro
Ser Glu Glu Gln Val Ser 275 280
285Gly Ser Gln Gly Pro Ser Glu Glu Lys Pro Ala Pro Leu Ala Thr Glu 290
295 300Val Phe Asp Glu Lys Ile Glu Val
His Gln Glu Glu Val Val Ala Glu305 310
315 320Val His Val Ser Thr Val Glu Glu Arg Thr Glu Glu
Gln Lys Thr Glu 325 330
335Val Glu Glu Thr Ala Gly Ser Val Pro Ala Glu Glu Leu Val Glu Met
340 345 350Asp Ala Glu Pro Gln Glu
Ala Glu Pro Ala Lys Glu Leu Val Lys Leu 355 360
365Lys Glu Thr Cys Val Ser Gly Glu Asp Pro Thr Gln Gly Ala
Asp Leu 370 375 380Ser Pro Asp Glu Lys
Val Leu Ser Lys Pro Pro Glu Gly Val Val Ser385 390
395 400Glu Val Glu Met Leu Ser Ser Gln Glu Arg
Met Lys Val Gln Gly Ser 405 410
415Pro Leu Lys Lys Leu Phe Thr Ser Thr Gly Leu Lys Lys Leu Ser Gly
420 425 430Lys Lys Gln Lys Gly
Lys Arg Gly Gly Gly Asp Glu Glu Ser Gly Glu 435
440 445His Thr Gln Val Pro Ala Asp Ser Pro Asp Ser Gln
Glu Glu Gln Lys 450 455 460Gly Glu Ser
Ser Ala Ser Ser Pro Glu Glu Pro Glu Glu Ile Thr Cys465
470 475 480Leu Glu Lys Gly Leu Ala Glu
Val Gln Gln Asp Gly Glu Ala Glu Glu 485
490 495Gly Ala Thr Ser Asp Gly Glu Lys Lys Arg Glu Gly
Val Thr Pro Trp 500 505 510Ala
Ser Phe Lys Lys Met Val Thr Pro Lys Lys Arg Val Arg Arg Pro 515
520 525Ser Glu Ser Asp Lys Glu Asp Glu Leu
Asp Lys Val Lys Ser Ala Thr 530 535
540Leu Ser Ser Thr Glu Ser Thr Ala Ser Glu Met Gln Glu Glu Met Lys545
550 555 560Gly Ser Val Glu
Glu Pro Lys Pro Glu Glu Pro Lys Arg Lys Val Asp 565
570 575Thr Ser Val Ser Trp Glu Ala Leu Ile Cys
Val Gly Ser Ser Lys Lys 580 585
590Arg Ala Arg Arg Gly Ser Ser Ser Asp Glu Glu Gly Gly Pro Lys Ala
595 600 605Met Gly Gly Asp His Gln Lys
Ala Asp Glu Ala Gly Lys Asp Lys Glu 610 615
620Thr Gly Thr Asp Gly Ile Leu Ala Gly Ser Gln Glu His Asp Pro
Gly625 630 635 640Gln Gly
Ser Ser Ser Pro Glu Gln Ala Gly Ser Pro Thr Glu Gly Glu
645 650 655Gly Val Ser Thr Trp Glu Ser
Phe Lys Arg Leu Val Thr Pro Arg Lys 660 665
670Lys Ser Lys Ser Lys Leu Glu Glu Lys Ser Glu Asp Ser Ile
Ala Gly 675 680 685Ser Gly Val Glu
His Ser Thr Pro Asp Thr Glu Pro Gly Lys Glu Glu 690
695 700Ser Trp Val Ser Ile Lys Lys Phe Ile Pro Gly Arg
Arg Lys Lys Arg705 710 715
720Pro Asp Gly Lys Gln Glu Gln Ala Pro Val Glu Asp Ala Gly Pro Thr
725 730 735Gly Ala Asn Glu Asp
Asp Ser Asp Val Pro Ala Val Val Pro Leu Ser 740
745 750Glu Tyr Asp Ala Val Glu Arg Glu Lys Met Glu Ala
Gln Gln Ala Gln 755 760 765Lys Ser
Ala Glu Gln Pro Glu Gln Lys Ala Ala Thr Glu Val Ser Lys 770
775 780Glu Leu Ser Glu Ser Gln Val His Met Met Ala
Ala Ala Val Ala Asp785 790 795
800Gly Thr Arg Ala Ala Thr Ile Ile Glu Glu Arg Ser Pro Ser Trp Ile
805 810 815Ser Ala Ser Val
Thr Glu Pro Leu Glu Gln Val Glu Ala Glu Ala Ala 820
825 830Leu Leu Thr Glu Glu Val Leu Glu Arg Glu Val
Ile Ala Glu Glu Glu 835 840 845Pro
Pro Thr Val Thr Glu Pro Leu Pro Glu Asn Arg Glu Ala Arg Gly 850
855 860Asp Thr Val Val Ser Glu Ala Glu Leu Thr
Pro Glu Ala Val Thr Ala865 870 875
880Ala Glu Thr Ala Gly Pro Leu Gly Ala Glu Glu Gly Thr Glu Ala
Ser 885 890 895Ala Ala Glu
Glu Thr Thr Glu Met Val Ser Ala Val Ser Gln Leu Thr 900
905 910Asp Ser Pro Asp Thr Thr Glu Glu Ala Thr
Pro Val Gln Glu Val Glu 915 920
925Gly Gly Val Pro Asp Ile Glu Glu Gln Glu Arg Arg Thr Gln Glu Val 930
935 940Leu Gln Ala Val Ala Glu Lys Val
Lys Glu Glu Ser Gln Leu Pro Gly945 950
955 960Thr Gly Gly Pro Glu Asp Val Leu Gln Pro Val Gln
Arg Ala Glu Ala 965 970
975Glu Arg Pro Glu Glu Gln Ala Glu Ala Ser Gly Leu Lys Lys Glu Thr
980 985 990Asp Val Val Leu Lys Val
Asp Ala Gln Glu Ala Lys Thr Glu Pro Phe 995 1000
1005Thr Gln Gly Lys Val Val Gly Gln Thr Thr Pro Glu Ser Phe
Glu Lys 1010 1015 1020Ala Pro Gln Val
Thr Glu Ser Ile Glu Ser Ser Glu Leu Val Thr Thr1025 1030
1035 1040Cys Gln Ala Glu Thr Leu Ala Gly Val
Lys Ser Gln Glu Met Val Met 1045 1050
1055Glu Gln Ala Ile Pro Pro Asp Ser Val Glu Thr Pro Thr Asp Ser
Glu 1060 1065 1070Thr Asp Gly
Ser Thr Pro Val Ala Asp Phe Asp Ala Pro Gly Thr Thr 1075
1080 1085Gln Lys Asp Glu Ile Val Glu Ile His Glu Glu
Asn Glu Val Ala Ser 1090 1095 1100Gly
Thr Gln Ser Gly Gly Thr Glu Ala Glu Ala Val Pro Ala Gln Lys1105
1110 1115 1120Glu Arg Pro Pro Ala Pro
Ser Ser Phe Val Phe Gln Glu Glu Thr Lys 1125
1130 1135Glu Gln Ser Lys Met Glu Asp Thr Leu Glu His Thr
Asp Lys Glu Val 1140 1145
1150Ser Val Glu Thr Val Ser Ile Leu Ser Lys Thr Glu Gly Thr Gln Glu
1155 1160 1165Ala Asp Gln Tyr Ala Asp Glu
Lys Thr Lys Asp Val Pro Phe Phe Glu 1170 1175
1180Gly Leu Glu Gly Ser Ile Asp Thr Gly Ile Thr Val Ser Arg Glu
Lys1185 1190 1195 1200Val
Thr Glu Val Ala Leu Lys Gly Glu Gly Thr Glu Glu Ala Glu Cys
1205 1210 1215Lys Lys Asp Asp Ala Leu Glu
Leu Gln Ser His Ala Lys Ser Pro Pro 1220 1225
1230Ser Pro Val Glu Arg Glu Met Val Val Gln Val Glu Arg Glu
Lys Thr 1235 1240 1245Glu Ala Glu
Pro Thr His Val Asn Glu Glu Lys Leu Glu His Glu Thr 1250
1255 1260Ala Val Thr Val Ser Glu Glu Val Ser Lys Gln Leu
Leu Gln Thr Val1265 1270 1275
1280Asn Val Pro Ile Ile Asp Gly Ala Lys Glu Val Ser Ser Leu Glu Gly
1285 1290 1295Ser Pro Pro Pro Cys
Leu Gly Gln Glu Glu Ala Val Cys Thr Lys Ile 1300
1305 1310Gln Val Gln Ser Ser Glu Ala Ser Phe Thr Leu Thr
Ala Ala Ala Glu 1315 1320 1325Glu
Glu Lys Val Leu Gly Glu Thr Ala Asn Ile Leu Glu Thr Gly Glu 1330
1335 1340Thr Leu Glu Pro Ala Gly Ala His Leu Val
Leu Glu Glu Lys Ser Ser1345 1350 1355
1360Glu Lys Asn Glu Asp Phe Ala Ala His Pro Gly Glu Asp Ala Val
Pro 1365 1370 1375Thr Gly
Pro Asp Cys Gln Ala Lys Ser Thr Pro Val Ile Val Ser Ala 1380
1385 1390Thr Thr Lys Lys Gly Leu Ser Ser Asp
Leu Glu Gly Glu Lys Thr Thr 1395 1400
1405Ser Leu Lys Trp Lys Ser Asp Glu Val Asp Glu Gln Val Ala Cys Gln
1410 1415 1420Glu Val Lys Val Ser Val Ala
Ile Glu Asp Leu Glu Pro Glu Asn Gly1425 1430
1435 1440Ile Leu Glu Leu Glu Thr Lys Ser Ser Lys Leu Val
Gln Asn Ile Ile 1445 1450
1455Gln Thr Ala Val Asp Gln Phe Val Arg Thr Glu Glu Thr Ala Thr Glu
1460 1465 1470Met Leu Thr Ser Glu Leu
Gln Thr Gln Ala His Val Ile Lys Ala Asp 1475 1480
1485Ser Gln Asp Ala Gly Gln Glu Thr Glu Lys Glu Gly Glu Glu
Pro Leu 1490 1495 1500Ala Ser Ala Gln
Asp Glu Thr Pro Ile Thr Ser Ala Lys Glu Glu Ser1505 1510
1515 1520Glu Ser Thr Ala Val Gly Gln Ala His
Ser Asp Ile Ser Lys Asp Met 1525 1530
1535Ser Glu Ala Ser Glu Lys Thr Met Thr Val Glu Val Glu Gly Ser
Thr 1540 1545 1550Val Asn Asp
Gln Gln Leu Glu Glu Val Val Leu Pro Ser Glu Glu Glu 1555
1560 1565Gly Gly Gly Ala Gly Thr Lys Ser Val Pro Glu
Asp Asp Gly His Ala 1570 1575 1580Leu
Leu Ala Glu Arg Ile Glu Lys Ser Leu Val Glu Pro Lys Glu Asp1585
1590 1595 1600Glu Lys Gly Asp Asp Val
Asp Asp Pro Glu Asn Gln Asn Ser Ala Leu 1605
1610 1615Ala Asp Thr Asp Ala Ser Gly Gly Leu Thr Lys Glu
Ser Pro Asp Thr 1620 1625
1630Asn Gly Pro Lys Gln Lys Glu Lys Glu Asp Ala Gln Glu Val Glu Leu
1635 1640 1645Gln Glu Gly Lys Val His Ser
Glu Ser Asp Lys Ala Ile Thr Pro Gln 1650 1655
1660Ala Gln Glu Glu Leu Gln Lys Gln Glu Arg Glu Ser Ala Lys Ser
Glu1665 1670 1675 1680Leu
Thr Glu Ser
<210> 13 <211> 213 <212> PRT <213> Homo sapiens <400> 13
Met Ser Glu Thr Ala Pro Ala Ala Pro Ala
Ala Ala Pro Pro Ala Glu1 5 10
15Lys Ala Pro Val Lys Lys Lys Ala Ala Lys Lys Ala Gly Gly Thr Pro
20 25 30Arg Lys Ala Ser Gly Pro
Pro Val Ser Glu Leu Ile Thr Lys Ala Val 35 40
45Ala Ala Ser Lys Glu Arg Ser Gly Val Ser Leu Ala Ala Leu
Lys Lys 50 55 60Ala Leu Ala Ala Ala
Gly Tyr Asp Val Glu Lys Asn Asn Ser Arg Ile65 70
75 80Lys Leu Gly Leu Lys Ser Leu Val Ser Lys
Gly Thr Leu Val Gln Thr 85 90
95Lys Gly Thr Gly Ala Ser Gly Ser Phe Lys Leu Asn Lys Lys Ala Ala
100 105 110Ser Gly Glu Ala Lys
Pro Lys Val Lys Lys Ala Gly Gly Thr Lys Pro 115
120 125Lys Lys Pro Val Gly Ala Ala Lys Lys Pro Lys Lys
Ala Ala Gly Gly 130 135 140Ala Thr Pro
Lys Lys Ser Ala Lys Lys Thr Pro Lys Lys Ala Lys Lys145
150 155 160Pro Ala Ala Ala Thr Val Thr
Lys Lys Val Ala Lys Ser Pro Lys Lys 165
170 175Ala Lys Val Ala Lys Pro Lys Lys Ala Ala Lys Ser
Ala Ala Lys Ala 180 185 190Val
Lys Pro Lys Ala Ala Lys Pro Lys Val Val Lys Pro Lys Lys Ala 195
200 205Ala Pro Lys Lys Lys
210
<210> 14 <211> 130 <212> PRT <213> Homo sapiens <400> 14
Met Ser Gly Arg Gly Lys Gln Gly Gly Lys Ala Arg
Ala Lys Ala Lys1 5 10
15Ser Arg Ser Ser Arg Ala Gly Leu Gln Phe Pro Val Gly Arg Val His
20 25 30Arg Leu Leu Arg Lys Gly Asn
Tyr Ala Glu Arg Val Gly Ala Gly Ala 35 40
45Pro Val Tyr Leu Ala Ala Val Leu Glu Tyr Leu Thr Ala Glu Ile
Leu 50 55 60Glu Leu Ala Gly Asn Ala
Ala Arg Asp Asn Lys Lys Thr Arg Ile Ile65 70
75 80Pro Arg His Leu Gln Leu Ala Ile Arg Asn Asp
Glu Glu Leu Asn Lys 85 90
95Leu Leu Gly Arg Val Thr Ile Ala Gln Gly Gly Val Leu Pro Asn Ile
100 105 110Gln Ala Val Leu Leu Pro
Lys Lys Thr Glu Ser His His Lys Ala Lys 115 120
125Gly Lys 130
<210> 15 <211> 126 <212> PRT <213> Homo sapiens <400> 15
Met Pro Glu Leu Ala
Lys Ser Ala Pro Ala Pro Lys Lys Gly Ser Lys1 5
10 15Lys Ala Val Thr Lys Ala Gln Lys Lys Asp Gly
Lys Lys Arg Lys Arg 20 25
30Ser Arg Lys Glu Ser Tyr Ser Val Tyr Val Tyr Lys Val Leu Lys Gln
35 40 45Val His Pro Asp Thr Gly Ile Ser
Ser Lys Ala Met Gly Ile Met Asn 50 55
60Ser Phe Val Asn Asp Ile Phe Glu Arg Ile Ala Ser Glu Ala Ser Arg65
70 75 80Leu Ala His Tyr Asn
Lys Arg Ser Thr Ile Thr Ser Arg Glu Ile Gln 85
90 95Thr Ala Val Arg Leu Leu Leu Pro Gly Glu Leu
Ala Lys His Ala Val 100 105
110Ser Glu Gly Thr Lys Ala Val Thr Lys Tyr Thr Ser Ser Lys 115
120 125
<210> 16 <211> 483 <212> PRT <213> Homo sapiens <400> 16
Met Ser Ile Arg
Val Thr Gln Lys Ser Tyr Lys Val Ser Thr Ser Gly1 5
10 15Pro Arg Ala Phe Ser Ser Arg Ser Tyr Thr
Ser Gly Pro Gly Ser Arg 20 25
30Ile Ser Ser Ser Ser Phe Ser Arg Val Gly Ser Ser Asn Phe Arg Gly
35 40 45Gly Leu Gly Gly Gly Tyr Gly Gly
Ala Ser Gly Met Gly Gly Ile Thr 50 55
60Ala Val Thr Val Asn Gln Ser Leu Leu Ser Pro Leu Val Leu Glu Val65
70 75 80Asp Pro Asn Ile Gln
Ala Val Arg Thr Gln Glu Lys Glu Gln Ile Lys 85
90 95Thr Leu Asn Asn Lys Phe Ala Ser Phe Ile Asp
Lys Val Arg Phe Leu 100 105
110Glu Gln Gln Asn Lys Met Leu Glu Thr Lys Trp Ser Leu Leu Gln Gln
115 120 125Gln Lys Thr Ala Arg Ser Asn
Met Asp Asn Met Phe Glu Ser Tyr Ile 130 135
140Asn Asn Leu Arg Arg Gln Leu Glu Thr Leu Gly Gln Glu Lys Leu
Lys145 150 155 160Leu Glu
Ala Glu Leu Gly Asn Met Gln Gly Leu Val Glu Asp Phe Lys
165 170 175Asn Lys Tyr Glu Asp Glu Ile
Asn Lys Arg Thr Glu Met Glu Asn Glu 180 185
190Phe Val Leu Ile Lys Lys Asp Val Asp Glu Ala Tyr Met Asn
Lys Val 195 200 205Glu Leu Glu Ser
Arg Leu Glu Gly Leu Thr Asp Glu Ile Asn Phe Leu 210
215 220Arg Gln Leu Tyr Glu Glu Glu Ile Arg Glu Leu Gln
Ser Gln Ile Ser225 230 235
240Asp Thr Ser Val Val Leu Ser Met Asp Asn Ser Arg Ser Leu Asp Met
245 250 255Asp Ser Ile Ile Ala
Glu Val Lys Ala Gln Tyr Glu Asp Ile Ala Asn 260
265 270Arg Ser Arg Ala Glu Ala Glu Ser Met Tyr Gln Ile
Lys Tyr Glu Glu 275 280 285Leu Gln
Ser Leu Ala Gly Lys His Gly Asp Asp Leu Arg Arg Thr Lys 290
295 300Thr Glu Ile Ser Glu Met Asn Arg Asn Ile Ser
Arg Leu Gln Ala Glu305 310 315
320Ile Glu Gly Leu Lys Gly Gln Arg Ala Ser Leu Glu Ala Ala Ile Ala
325 330 335Asp Ala Glu Gln
Arg Gly Glu Leu Ala Ile Lys Asp Ala Asn Ala Lys 340
345 350Leu Ser Glu Leu Glu Ala Ala Leu Gln Arg Ala
Lys Gln Asp Met Ala 355 360 365Arg
Gln Leu Arg Glu Tyr Gln Glu Leu Met Asn Val Lys Leu Ala Leu 370
375 380Asp Ile Glu Ile Ala Thr Tyr Arg Lys Leu
Leu Glu Gly Glu Glu Ser385 390 395
400Arg Leu Glu Ser Gly Met Gln Asn Met Ser Ile His Thr Lys Thr
Thr 405 410 415Ser Gly Tyr
Ala Gly Gly Leu Ser Ser Ala Tyr Gly Gly Leu Thr Ser 420
425 430Pro Gly Leu Ser Tyr Ser Leu Gly Ser Ser
Phe Gly Ser Gly Ala Gly 435 440
445Ser Ser Ser Phe Ser Arg Thr Ser Ser Ser Arg Ala Val Val Val Lys 450
455 460Lys Ile Glu Thr Arg Asp Gly Lys
Leu Val Ser Glu Ser Ser Asp Val465 470
475 480Leu Pro Lys
<210> 17 <211> 2799 <212> PRT <213> Homo sapiens <400> 17
Met Thr Ser Ile
His Phe Val Val His Pro Leu Pro Gly Thr Glu Asp1 5
10 15Gln Leu Asn Asp Arg Leu Arg Glu Val Ser
Glu Lys Leu Asn Lys Tyr 20 25
30Asn Leu Asn Ser His Pro Pro Leu Asn Val Leu Glu Gln Ala Thr Ile
35 40 45Lys Gln Cys Val Val Gly Pro Asn
His Ala Ala Phe Leu Leu Glu Asp 50 55
60Gly Arg Val Cys Arg Ile Gly Phe Ser Val Gln Pro Asp Arg Leu Glu65
70 75 80Leu Gly Lys Pro Asp
Asn Asn Asp Gly Ser Lys Leu Asn Ser Asn Ser 85
90 95Gly Ala Gly Arg Thr Ser Arg Pro Gly Arg Thr
Ser Asp Ser Pro Trp 100 105
110Phe Leu Ser Gly Ser Glu Thr Leu Gly Arg Leu Ala Gly Asn Thr Leu
115 120 125Gly Ser Arg Trp Ser Ser Gly
Val Gly Gly Ser Gly Gly Gly Ser Ser 130 135
140Gly Arg Ser Ser Ala Gly Ala Arg Asp Ser Arg Arg Gln Thr Arg
Val145 150 155 160Ile Arg
Thr Gly Arg Asp Arg Gly Ser Gly Leu Leu Gly Ser Gln Pro
165 170 175Gln Pro Val Ile Pro Ala Ser
Val Ile Pro Glu Glu Leu Ile Ser Gln 180 185
190Ala Gln Val Val Leu Gln Gly Lys Ser Arg Ser Val Ile Ile
Arg Glu 195 200 205Leu Gln Arg Thr
Asn Leu Asp Val Asn Leu Ala Val Asn Asn Leu Leu 210
215 220Ser Arg Asp Asp Glu Asp Gly Asp Asp Gly Asp Asp
Thr Ala Ser Glu225 230 235
240Ser Tyr Leu Pro Gly Glu Asp Leu Met Ser Leu Leu Asp Ala Asp Ile
245 250 255His Ser Ala His Pro
Ser Val Ile Ile Asp Ala Asp Ala Met Phe Ser 260
265 270Glu Asp Ile Ser Tyr Phe Gly Tyr Pro Ser Phe Arg
Arg Ser Ser Leu 275 280 285Ser Arg
Leu Gly Ser Ser Arg Val Leu Leu Leu Pro Leu Glu Arg Asp 290
295 300Ser Glu Leu Leu Arg Glu Arg Glu Ser Val Leu
Arg Leu Arg Glu Arg305 310 315
320Arg Trp Leu Asp Gly Ala Ser Phe Asp Asn Glu Arg Gly Ser Thr Ser
325 330 335Lys Glu Gly Glu
Pro Asn Leu Asp Lys Lys Asn Thr Pro Val Gln Ser 340
345 350Pro Val Ser Leu Gly Glu Asp Leu Gln Trp Trp
Pro Asp Lys Asp Gly 355 360 365Thr
Lys Phe Ile Cys Ile Gly Ala Leu Tyr Ser Glu Leu Leu Ala Val 370
375 380Ser Ser Lys Gly Glu Leu Tyr Gln Trp Lys
Trp Ser Glu Ser Glu Pro385 390 395
400Tyr Arg Asn Ala Gln Asn Pro Ser Leu His His Pro Arg Ala Thr
Phe 405 410 415Leu Gly Leu
Thr Asn Glu Lys Ile Val Leu Leu Ser Ala Asn Ser Ile 420
425 430Arg Ala Thr Val Ala Thr Glu Asn Asn Lys
Val Ala Thr Trp Val Asp 435 440
445Glu Thr Leu Ser Ser Val Ala Ser Lys Leu Glu His Thr Ala Gln Thr 450
455 460Tyr Ser Glu Leu Gln Gly Glu Arg
Ile Val Ser Leu His Cys Cys Ala465 470
475 480Leu Tyr Thr Cys Ala Gln Leu Glu Asn Ser Leu Tyr
Trp Trp Gly Val 485 490
495Val Pro Phe Ser Gln Arg Lys Lys Met Leu Glu Lys Ala Arg Ala Lys
500 505 510Asn Lys Lys Pro Lys Ser
Ser Ala Gly Ile Ser Ser Met Pro Asn Ile 515 520
525Thr Val Gly Thr Gln Val Cys Leu Arg Asn Asn Pro Leu Tyr
His Ala 530 535 540Gly Ala Val Ala Phe
Ser Ile Ser Ala Gly Ile Pro Lys Val Gly Val545 550
555 560Leu Met Glu Ser Val Trp Asn Met Asn Asp
Ser Cys Arg Phe Gln Leu 565 570
575Arg Ser Pro Glu Ser Leu Lys Asn Met Glu Lys Ala Ser Lys Thr Thr
580 585 590Glu Ala Lys Pro Glu
Ser Lys Gln Glu Pro Val Lys Thr Glu Met Gly 595
600 605Pro Pro Pro Ser Pro Ala Ser Thr Cys Ser Asp Ala
Ser Ser Ile Ala 610 615 620Ser Ser Ala
Ser Met Pro Tyr Lys Arg Arg Arg Ser Thr Pro Ala Pro625
630 635 640Lys Glu Glu Glu Lys Val Asn
Glu Glu Gln Trp Ser Leu Arg Glu Val 645
650 655Val Phe Val Glu Asp Val Lys Asn Val Pro Val Gly
Lys Val Leu Lys 660 665 670Val
Asp Gly Ala Tyr Val Ala Val Lys Phe Pro Gly Thr Ser Ser Asn 675
680 685Thr Asn Cys Gln Asn Ser Ser Gly Pro
Asp Ala Asp Pro Ser Ser Leu 690 695
700Leu Gln Asp Cys Arg Leu Leu Arg Ile Asp Glu Leu Gln Val Val Lys705
710 715 720Thr Gly Gly Thr
Pro Lys Val Pro Asp Cys Phe Gln Arg Thr Pro Lys 725
730 735Lys Leu Cys Ile Pro Glu Lys Thr Glu Ile
Leu Ala Val Asn Val Asp 740 745
750Ser Lys Gly Val His Ala Val Leu Lys Thr Gly Asn Trp Val Arg Tyr
755 760 765Cys Ile Phe Asp Leu Ala Thr
Gly Lys Ala Glu Gln Glu Asn Asn Phe 770 775
780Pro Thr Ser Ser Ile Ala Phe Leu Gly Gln Asn Glu Arg Asn Val
Ala785 790 795 800Ile Phe
Thr Ala Gly Gln Glu Ser Pro Ile Ile Leu Arg Asp Gly Asn
805 810 815Gly Thr Ile Tyr Pro Met Ala
Lys Asp Cys Met Gly Gly Ile Arg Asp 820 825
830Pro Asp Trp Leu Asp Leu Pro Pro Ile Ser Ser Leu Gly Met
Gly Val 835 840 845His Ser Leu Ile
Asn Leu Pro Ala Asn Ser Thr Ile Lys Lys Lys Ala 850
855 860Ala Val Ile Ile Met Ala Val Glu Lys Gln Thr Leu
Met Gln His Ile865 870 875
880Leu Arg Cys Asp Tyr Glu Ala Cys Arg Gln Tyr Leu Met Asn Leu Glu
885 890 895Gln Ala Val Val Leu
Glu Gln Asn Leu Gln Met Leu Gln Thr Phe Ile 900
905 910Ser His Arg Cys Asp Gly Asn Arg Asn Ile Leu His
Ala Cys Val Ser 915 920 925Val Cys
Phe Pro Thr Ser Asn Lys Glu Thr Lys Glu Glu Glu Glu Ala 930
935 940Glu Arg Ser Glu Arg Asn Thr Phe Ala Glu Arg
Leu Ser Ala Val Glu945 950 955
960Ala Ile Ala Asn Ala Ile Ser Val Val Ser Ser Asn Gly Pro Gly Asn
965 970 975Arg Ala Gly Ser
Ser Ser Ser Arg Ser Leu Arg Leu Arg Glu Met Met 980
985 990Arg Arg Ser Leu Arg Ala Ala Gly Leu Gly Arg
His Glu Ala Gly Ala 995 1000
1005Ser Ser Ser Asp His Gln Asp Pro Val Ser Pro Pro Ile Ala Pro Pro
1010 1015 1020Ser Trp Val Pro Asp Pro Pro
Ala Met Asp Pro Asp Gly Asp Ile Asp1025 1030
1035 1040Phe Ile Leu Ala Pro Ala Val Gly Ser Leu Thr Thr
Ala Ala Thr Gly 1045 1050
1055Thr Gly Gln Gly Pro Ser Thr Ser Thr Ile Pro Gly Pro Ser Thr Glu
1060 1065 1070Pro Ser Val Val Glu Ser
Lys Asp Arg Lys Ala Asn Ala His Phe Ile 1075 1080
1085Leu Lys Leu Leu Cys Asp Ser Val Val Leu Gln Pro Tyr Leu
Arg Glu 1090 1095 1100Leu Leu Ser Ala
Lys Asp Ala Arg Gly Met Thr Pro Phe Met Ser Ala1105 1110
1115 1120Val Ser Gly Arg Ala Tyr Pro Ala Ala
Ile Thr Ile Leu Glu Thr Ala 1125 1130
1135Gln Lys Ile Ala Lys Ala Glu Ile Ser Ser Ser Glu Lys Glu Glu
Asp 1140 1145 1150Val Phe Met
Gly Met Val Cys Pro Ser Gly Thr Asn Pro Asp Asp Ser 1155
1160 1165Pro Leu Tyr Val Leu Cys Cys Asn Asp Thr Cys
Ser Phe Thr Trp Thr 1170 1175 1180Gly
Ala Glu His Ile Asn Gln Asp Ile Phe Glu Cys Arg Thr Cys Gly1185
1190 1195 1200Leu Leu Glu Ser Leu Cys
Cys Cys Thr Glu Cys Ala Arg Val Cys His 1205
1210 1215Lys Gly His Asp Cys Lys Leu Lys Arg Thr Ser Pro
Thr Ala Tyr Cys 1220 1225
1230Asp Cys Trp Glu Lys Cys Lys Cys Lys Thr Leu Ile Ala Gly Gln Lys
1235 1240 1245Ser Ala Arg Leu Asp Leu Leu
Tyr Arg Leu Leu Thr Ala Thr Asn Leu 1250 1255
1260Val Thr Leu Pro Asn Ser Arg Gly Glu His Leu Leu Leu Phe Leu
Val1265 1270 1275 1280Gln
Thr Val Ala Arg Gln Thr Val Glu His Cys Gln Tyr Arg Pro Pro
1285 1290 1295Arg Ile Arg Glu Asp Arg Asn
Arg Lys Thr Ala Ser Pro Glu Asp Ser 1300 1305
1310Asp Met Pro Asp His Asp Leu Glu Pro Pro Arg Phe Ala Gln
Leu Ala 1315 1320 1325Leu Glu Arg
Val Leu Gln Asp Trp Asn Ala Leu Lys Ser Met Ile Met 1330
1335 1340Phe Gly Ser Gln Glu Asn Lys Asp Pro Leu Ser Ala
Ser Ser Arg Ile1345 1350 1355
1360Gly His Leu Leu Pro Glu Glu Gln Val Tyr Leu Asn Gln Gln Ser Gly
1365 1370 1375Thr Ile Arg Leu Asp
Cys Phe Thr His Cys Leu Ile Val Lys Cys Thr 1380
1385 1390Ala Asp Ile Leu Leu Leu Asp Thr Leu Leu Gly Thr
Leu Val Lys Glu 1395 1400 1405Leu
Gln Asn Lys Tyr Thr Pro Gly Arg Arg Glu Glu Ala Ile Ala Val 1410
1415 1420Thr Met Arg Phe Leu Arg Ser Val Ala Arg
Val Phe Val Ile Leu Ser1425 1430 1435
1440Val Glu Met Ala Ser Ser Lys Lys Lys Asn Asn Phe Ile Pro Gln
Pro 1445 1450 1455Ile Gly
Lys Cys Lys Arg Val Phe Gln Ala Leu Leu Pro Tyr Ala Val 1460
1465 1470Glu Glu Leu Cys Asn Val Ala Glu Ser
Leu Ile Val Pro Val Arg Met 1475 1480
1485Gly Ile Ala Arg Pro Thr Ala Pro Phe Thr Leu Ala Ser Thr Ser Ile
1490 1495 1500Asp Ala Met Gln Gly Ser Glu
Glu Leu Phe Ser Val Glu Pro Leu Pro1505 1510
1515 1520Pro Arg Pro Ser Ser Asp Gln Ser Ser Ser Ser Ser
Gln Ser Gln Ser 1525 1530
1535Ser Tyr Ile Ile Arg Asn Pro Gln Gln Arg Arg Ile Ser Gln Ser Gln
1540 1545 1550Pro Val Arg Gly Arg Asp
Glu Glu Gln Asp Asp Ile Val Ser Ala Asp 1555 1560
1565Val Glu Glu Val Glu Val Val Glu Gly Val Ala Gly Glu Glu
Asp His 1570 1575 1580His Asp Glu Gln
Glu Glu His Gly Glu Glu Asn Ala Glu Ala Glu Gly1585 1590
1595 1600Gln His Asp Glu His Asp Glu Asp Gly
Ser Asp Met Glu Leu Asp Leu 1605 1610
1615Leu Ala Ala Ala Glu Thr Glu Ser Asp Ser Glu Ser Asn His Ser
Asn 1620 1625 1630Gln Asp Asn
Ala Ser Gly Arg Arg Ser Val Val Thr Ala Ala Thr Ala 1635
1640 1645Gly Ser Glu Ala Gly Ala Ser Ser Val Pro Ala
Phe Phe Ser Glu Asp 1650 1655 1660Asp
Ser Gln Ser Asn Asp Ser Ser Asp Ser Asp Ser Ser Ser Ser Gln1665
1670 1675 1680Ser Asp Asp Ile Glu Gln
Glu Thr Phe Met Leu Asp Glu Pro Leu Glu 1685
1690 1695Arg Thr Thr Asn Ser Ser His Ala Asn Gly Ala Ala
Gln Ala Pro Arg 1700 1705
1710Ser Met Gln Trp Ala Val Arg Asn Thr Gln His Gln Arg Ala Ala Ser
1715 1720 1725Thr Ala Pro Ser Ser Thr Ser
Thr Pro Ala Ala Ser Ser Ala Gly Leu 1730 1735
1740Ile Tyr Ile Asp Pro Ser Asn Leu Arg Arg Ser Gly Thr Ile Ser
Thr1745 1750 1755 1760Ser
Ala Ala Ala Ala Ala Ala Ala Leu Glu Ala Ser Asn Ala Ser Ser
1765 1770 1775Tyr Leu Thr Ser Ala Ser Ser
Leu Ala Arg Ala Tyr Ser Ile Val Ile 1780 1785
1790Arg Gln Ile Ser Asp Leu Met Gly Leu Ile Pro Lys Tyr Asn
His Leu 1795 1800 1805Val Tyr Ser
Gln Ile Pro Ala Ala Val Lys Leu Thr Tyr Gln Asp Ala 1810
1815 1820Val Asn Leu Gln Asn Tyr Val Glu Glu Lys Leu Ile
Pro Thr Trp Asn1825 1830 1835
1840Trp Met Val Ser Ile Met Asp Ser Thr Glu Ala Gln Leu Arg Tyr Gly
1845 1850 1855Ser Ala Leu Ala Ser
Ala Gly Asp Pro Gly His Pro Asn His Pro Leu 1860
1865 1870His Ala Ser Gln Asn Ser Ala Arg Arg Glu Arg Met
Thr Ala Arg Glu 1875 1880 1885Glu
Ala Ser Leu Arg Thr Leu Glu Gly Arg Arg Arg Ala Thr Leu Leu 1890
1895 1900Ser Ala Arg Gln Gly Met Met Ser Ala Arg
Gly Asp Phe Leu Asn Tyr1905 1910 1915
1920Ala Leu Ser Leu Met Arg Ser His Asn Asp Glu His Ser Asp Val
Leu 1925 1930 1935Pro Val
Leu Asp Val Cys Ser Leu Lys His Val Ala Tyr Val Phe Gln 1940
1945 1950Ala Leu Ile Tyr Trp Ile Lys Ala Met
Asn Gln Gln Thr Thr Leu Asp 1955 1960
1965Thr Pro Gln Leu Glu Arg Lys Arg Thr Arg Glu Leu Leu Glu Leu Gly
1970 1975 1980Ile Asp Asn Glu Asp Ser Glu
His Glu Asn Asp Asp Asp Thr Asn Gln1985 1990
1995 2000Ser Ala Thr Leu Asn Asp Lys Asp Asp Asp Ser Leu
Pro Ala Glu Thr 2005 2010
2015Gly Gln Asn His Pro Phe Phe Arg Arg Ser Asp Ser Met Thr Phe Leu
2020 2025 2030Gly Cys Ile Pro Pro Asn
Pro Phe Glu Val Pro Leu Ala Glu Ala Ile 2035 2040
2045Pro Leu Ala Asp Gln Pro His Leu Leu Gln Pro Asn Ala Arg
Lys Glu 2050 2055 2060Asp Leu Phe Gly
Arg Pro Ser Gln Gly Leu Tyr Ser Ser Ser Ala Ser2065 2070
2075 2080Ser Gly Lys Cys Leu Met Glu Val Thr
Val Asp Arg Asn Cys Leu Glu 2085 2090
2095Val Leu Pro Thr Lys Met Ser Tyr Ala Ala Asn Leu Lys Asn Val
Met 2100 2105 2110Asn Met Gln
Asn Arg Gln Lys Lys Glu Gly Glu Glu Gln Pro Val Leu 2115
2120 2125Pro Glu Glu Thr Glu Ser Ser Lys Pro Gly Pro
Ser Ala His Asp Leu 2130 2135 2140Ala
Ala Gln Leu Lys Ser Ser Leu Leu Ala Glu Ile Gly Leu Thr Glu2145
2150 2155 2160Ser Glu Gly Pro Pro Leu
Thr Ser Phe Arg Pro Gln Cys Ser Phe Met 2165
2170 2175Gly Met Val Ile Ser His Asp Met Leu Leu Gly Arg
Trp Arg Leu Ser 2180 2185
2190Leu Glu Leu Phe Gly Arg Val Phe Met Glu Asp Val Gly Ala Glu Pro
2195 2200 2205Gly Ser Ile Leu Thr Glu Leu
Gly Gly Phe Glu Val Lys Glu Ser Lys 2210 2215
2220Phe Arg Arg Glu Met Glu Lys Leu Arg Asn Gln Gln Ser Arg Asp
Leu2225 2230 2235 2240Ser
Leu Glu Val Asp Arg Asp Arg Asp Leu Leu Ile Gln Gln Thr Met
2245 2250 2255Arg Gln Leu Asn Asn His Phe
Gly Arg Arg Cys Ala Thr Thr Pro Met 2260 2265
2270Ala Val His Arg Val Lys Val Thr Phe Lys Asp Glu Pro Gly
Glu Gly 2275 2280 2285Ser Gly Val
Ala Arg Ser Phe Tyr Thr Ala Ile Ala Gln Ala Phe Leu 2290
2295 2300Ser Asn Glu Lys Leu Pro Asn Leu Glu Cys Ile Gln
Asn Ala Asn Lys2305 2310 2315
2320Gly Thr His Thr Ser Leu Met Gln Arg Leu Arg Asn Arg Gly Glu Arg
2325 2330 2335Asp Arg Glu Arg Glu
Arg Glu Arg Glu Met Arg Arg Ser Ser Gly Leu 2340
2345 2350Arg Ala Gly Ser Arg Arg Asp Arg Asp Arg Asp Phe
Arg Arg Gln Leu 2355 2360 2365Ser
Ile Asp Thr Arg Pro Phe Arg Pro Ala Ser Glu Gly Asn Pro Ser 2370
2375 2380Asp Asp Pro Glu Pro Leu Pro Ala His Arg
Gln Ala Leu Gly Glu Arg2385 2390 2395
2400Leu Tyr Pro Arg Val Gln Ala Met Gln Pro Ala Phe Ala Ser Lys
Ile 2405 2410 2415Thr Gly
Met Leu Leu Glu Leu Ser Pro Ala Gln Leu Leu Leu Leu Leu 2420
2425 2430Ala Ser Glu Asp Ser Leu Arg Ala Arg
Val Asp Glu Ala Met Glu Leu 2435 2440
2445Ile Ile Ala His Gly Arg Glu Asn Gly Ala Asp Ser Ile Leu Asp Leu
2450 2455 2460Gly Leu Val Asp Ser Ser Glu
Lys Val Gln Gln Glu Asn Arg Lys Arg2465 2470
2475 2480His Gly Ser Ser Arg Ser Val Val Asp Met Asp Leu
Asp Asp Thr Asp 2485 2490
2495Asp Gly Asp Asp Asn Ala Pro Leu Phe Tyr Gln Pro Gly Lys Arg Gly
2500 2505 2510Phe Tyr Thr Pro Arg Pro
Gly Lys Asn Thr Glu Ala Arg Leu Asn Cys 2515 2520
2525Phe Arg Asn Ile Gly Arg Ile Leu Gly Leu Cys Leu Leu Gln
Asn Glu 2530 2535 2540Leu Cys Pro Ile
Thr Leu Asn Arg His Val Ile Lys Val Leu Leu Gly2545 2550
2555 2560Arg Lys Val Asn Trp His Asp Phe Ala
Phe Phe Asp Pro Val Met Tyr 2565 2570
2575Glu Ser Leu Arg Gln Leu Ile Leu Ala Ser Gln Ser Ser Asp Ala
Asp 2580 2585 2590Ala Val Phe
Ser Ala Met Asp Leu Ala Phe Ala Ile Asp Leu Cys Lys 2595
2600 2605Glu Glu Gly Gly Gly Gln Val Glu Leu Ile Pro
Asn Gly Val Asn Ile 2610 2615 2620Pro
Val Thr Pro Gln Asn Val Tyr Glu Tyr Val Arg Lys Tyr Ala Glu2625
2630 2635 2640His Arg Met Leu Val Val
Ala Glu Gln Pro Leu His Ala Met Arg Lys 2645
2650 2655Gly Leu Leu Asp Val Leu Pro Lys Asn Ser Leu Glu
Asp Leu Thr Ala 2660 2665
2670Glu Asp Phe Arg Leu Leu Val Asn Gly Cys Gly Glu Val Asn Val Gln
2675 2680 2685Met Leu Ile Ser Phe Thr Ser
Phe Asn Asp Glu Ser Gly Glu Asn Ala 2690 2695
2700Glu Lys Leu Leu Gln Phe Lys Arg Trp Phe Trp Ser Ile Val Glu
Lys2705 2710 2715 2720Met
Ser Met Thr Glu Arg Gln Asp Leu Val Tyr Phe Trp Thr Ser Ser
2725 2730 2735Pro Ser Leu Pro Ala Ser Glu
Glu Gly Phe Gln Pro Met Pro Ser Ile 2740 2745
2750Thr Ile Arg Pro Pro Asp Asp Gln His Leu Pro Thr Ala Asn
Thr Cys 2755 2760 2765Ile Ser Arg
Leu Tyr Val Pro Leu Tyr Ser Ser Lys Gln Ile Leu Lys 2770
2775 2780Gln Lys Leu Leu Leu Ala Ile Lys Thr Lys Asn Phe
Gly Phe Val2785 2790 2795
<210> 18 <211> 130 <212> PRT <213> Homo sapiens <400> 18
Met Ser Gly Arg Gly Lys Gln Gly Gly Lys Ala Arg Ala Lys Ala
Lys1 5 10 15Ser Arg Ser
Ser Arg Ala Gly Leu Gln Phe Pro Val Gly Arg Val His 20
25 30Arg Leu Leu Arg Lys Gly Asn Tyr Ala Glu
Arg Val Gly Ala Gly Ala 35 40
45Pro Val Tyr Met Ala Ala Val Leu Glu Tyr Leu Thr Ala Glu Ile Leu 50
55 60Glu Leu Ala Gly Asn Ala Ala Arg Asp
Asn Lys Lys Thr Arg Ile Ile65 70 75
80Pro Arg His Leu Gln Leu Ala Ile Arg Asn Asp Glu Glu Leu
Asn Lys 85 90 95Leu Leu
Gly Lys Val Thr Ile Ala Gln Gly Gly Val Leu Pro Asn Ile 100
105 110Gln Ala Val Leu Leu Pro Lys Lys Thr
Glu Ser His His Lys Ala Lys 115 120
125Gly Lys 130
<210> 19 <211> 693 <212> PRT <213> Homo sapiens <400> 19
Met Ala Ala Leu Gly Val Gln Ser
Ile Asn Trp Gln Thr Ala Phe Asn1 5 10
15Arg Gln Ala His His Thr Asp Lys Phe Ser Ser Gln Glu Leu
Ile Leu 20 25 30Arg Arg Gly
Gln Asn Phe Gln Val Leu Met Ile Met Asn Lys Gly Leu 35
40 45Gly Ser Asn Glu Arg Leu Glu Phe Ile Val Ser
Thr Gly Pro Tyr Pro 50 55 60Ser Glu
Ser Ala Met Thr Lys Ala Val Phe Pro Leu Ser Asn Gly Ser65
70 75 80Ser Gly Gly Trp Ser Ala Val
Leu Gln Ala Ser Asn Gly Asn Thr Leu 85 90
95Thr Ile Ser Ile Ser Ser Pro Ala Ser Ala Pro Ile Gly
Arg Tyr Thr 100 105 110Met Ala
Leu Gln Ile Phe Ser Gln Gly Gly Ile Ser Ser Val Lys Leu 115
120 125Gly Thr Phe Ile Leu Leu Phe Asn Pro Trp
Leu Asn Val Asp Ser Val 130 135 140Phe
Met Gly Asn His Ala Glu Arg Glu Glu Tyr Val Gln Glu Asp Ala145
150 155 160Gly Ile Ile Phe Val Gly
Ser Thr Asn Arg Ile Gly Met Ile Gly Trp 165
170 175Asn Phe Gly Gln Phe Glu Glu Asp Ile Leu Ser Ile
Cys Leu Ser Ile 180 185 190Leu
Asp Arg Ser Leu Asn Phe Arg Arg Asp Ala Ala Thr Asp Val Ala 195
200 205Ser Arg Asn Asp Pro Lys Tyr Val Gly
Arg Val Leu Ser Ala Met Ile 210 215
220Asn Ser Asn Asp Asp Asn Gly Val Leu Ala Gly Asn Trp Ser Gly Thr225
230 235 240Tyr Thr Gly Gly
Arg Asp Pro Arg Ser Trp Asn Gly Ser Val Glu Ile 245
250 255Leu Lys Asn Trp Lys Lys Ser Gly Phe Ser
Pro Val Arg Tyr Gly Gln 260 265
270Cys Trp Val Phe Ala Gly Thr Leu Asn Thr Ala Leu Arg Ser Leu Gly
275 280 285Ile Pro Ser Arg Val Ile Thr
Asn Phe Asn Ser Ala His Asp Thr Asp 290 295
300Arg Asn Leu Ser Val Asp Val Tyr Tyr Asp Pro Met Gly Asn Pro
Leu305 310 315 320Asp Lys
Gly Ser Asp Ser Val Trp Asn Phe His Val Trp Asn Glu Gly
325 330 335Trp Phe Val Arg Ser Asp Leu
Gly Pro Ser Tyr Gly Gly Trp Gln Val 340 345
350Leu Asp Ala Thr Pro Gln Glu Arg Ser Gln Gly Val Phe Gln
Cys Gly 355 360 365Pro Ala Ser Val
Ile Gly Val Arg Glu Gly Asp Val Gln Leu Asn Phe 370
375 380Asp Met Pro Phe Ile Phe Ala Glu Val Asn Ala Asp
Arg Ile Thr Trp385 390 395
400Leu Tyr Asp Asn Thr Thr Gly Lys Gln Trp Lys Asn Ser Val Asn Ser
405 410 415His Thr Ile Gly Arg
Tyr Ile Ser Thr Lys Ala Val Gly Ser Asn Ala 420
425 430Arg Met Asp Val Thr Asp Lys Tyr Lys Tyr Pro Glu
Gly Ser Asp Gln 435 440 445Glu Arg
Gln Val Phe Gln Lys Ala Leu Gly Lys Leu Lys Pro Asn Thr 450
455 460Pro Phe Ala Ala Thr Ser Ser Met Gly Leu Glu
Thr Glu Glu Gln Glu465 470 475
480Pro Ser Ile Ile Gly Lys Leu Lys Val Ala Gly Met Leu Ala Val Gly
485 490 495Lys Glu Val Asn
Leu Val Leu Leu Leu Lys Asn Leu Ser Arg Asp Thr 500
505 510Lys Thr Val Thr Val Asn Met Thr Ala Trp Thr
Ile Ile Tyr Asn Gly 515 520 525Thr
Leu Val His Glu Val Trp Lys Asp Ser Ala Thr Met Ser Leu Asp 530
535 540Pro Glu Glu Glu Ala Glu His Pro Ile Lys
Ile Ser Tyr Ala Gln Tyr545 550 555
560Glu Lys Tyr Leu Lys Ser Asp Asn Met Ile Arg Ile Thr Ala Val
Cys 565 570 575Lys Val Pro
Asp Glu Ser Glu Val Val Val Glu Arg Asp Ile Ile Leu 580
585 590Asp Asn Pro Thr Leu Thr Leu Glu Val Leu
Asn Glu Ala Arg Val Arg 595 600
605Lys Pro Val Asn Val Gln Met Leu Phe Ser Asn Pro Leu Asp Glu Pro 610
615 620Val Arg Asp Cys Val Leu Met Val
Glu Gly Ser Gly Leu Leu Leu Gly625 630
635 640Asn Leu Lys Ile Asp Val Pro Thr Leu Gly Pro Lys
Glu Gly Ser Arg 645 650
655Val Arg Phe Asp Ile Leu Pro Ser Arg Ser Gly Thr Lys Gln Leu Leu
660 665 670Ala Asp Phe Ser Cys Asn
Lys Phe Pro Ala Ile Lys Ala Met Leu Ser 675 680
685Ile Asp Val Ala Glu 690
<210> 20 <211> 254 <212> PRT <213> Homo sapiens <400> 20
Met Ala
Ile Ser Gly Val Pro Val Leu Gly Phe Phe Ile Ile Ala Val1 5
10 15Leu Met Ser Ala Gln Glu Ser Trp
Ala Ile Lys Glu Glu His Val Ile 20 25
30Ile Gln Ala Glu Phe Tyr Leu Asn Pro Asp Gln Ser Gly Glu Phe
Met 35 40 45Phe Asp Phe Asp Gly
Asp Glu Ile Phe His Val Asp Met Ala Lys Lys 50 55
60Glu Thr Val Trp Arg Leu Glu Glu Phe Gly Arg Phe Ala Ser
Phe Glu65 70 75 80Ala
Gln Gly Ala Leu Ala Asn Ile Ala Val Asp Lys Ala Asn Leu Glu
85 90 95Ile Met Thr Lys Arg Ser Asn
Tyr Thr Pro Ile Thr Asn Val Pro Pro 100 105
110Glu Val Thr Val Leu Thr Asn Ser Pro Val Glu Leu Arg Glu
Pro Asn 115 120 125Val Leu Ile Cys
Phe Ile Asp Lys Phe Thr Pro Pro Val Val Asn Val 130
135 140Thr Trp Leu Arg Asn Gly Lys Pro Val Thr Thr Gly
Val Ser Glu Thr145 150 155
160Val Phe Leu Pro Arg Glu Asp His Leu Phe Arg Lys Phe His Tyr Leu
165 170 175Pro Phe Leu Pro Ser
Thr Glu Asp Val Tyr Asp Cys Arg Val Glu His 180
185 190Trp Gly Leu Asp Glu Pro Leu Leu Lys His Trp Glu
Phe Asp Ala Pro 195 200 205Ser Pro
Leu Pro Glu Thr Thr Glu Asn Val Val Cys Ala Leu Gly Leu 210
215 220Thr Val Gly Leu Val Gly Ile Ile Ile Gly Thr
Ile Phe Ile Ile Lys225 230 235
240Gly Leu Arg Lys Ser Asn Ala Ala Glu Arg Arg Gly Pro Leu
245 250
<210> 21 <211> 1050 <212> PRT <213> Homo sapiens <400> 21
Met Ala Ala Val Lys
Lys Glu Gly Gly Ala Leu Ser Glu Ala Met Ser1 5
10 15Leu Glu Gly Asp Glu Trp Glu Leu Ser Lys Glu
Asn Val Gln Pro Leu 20 25
30Arg Gln Gly Arg Ile Met Ser Thr Leu Gln Gly Ala Leu Ala Gln Glu
35 40 45Ser Ala Cys Asn Asn Thr Leu Gln
Gln Gln Lys Arg Ala Phe Glu Tyr 50 55
60Glu Ile Arg Phe Tyr Thr Gly Asn Asp Pro Leu Asp Val Trp Asp Arg65
70 75 80Tyr Ile Ser Trp Thr
Glu Gln Asn Tyr Pro Gln Gly Gly Lys Glu Ser 85
90 95Asn Met Ser Thr Leu Leu Glu Arg Ala Val Glu
Ala Leu Gln Gly Glu 100 105
110Lys Arg Tyr Tyr Ser Asp Pro Arg Phe Leu Asn Leu Trp Leu Lys Leu
115 120 125Gly Arg Leu Cys Asn Glu Pro
Leu Asp Met Tyr Ser Tyr Leu His Asn 130 135
140Gln Gly Ile Gly Val Ser Leu Ala Gln Phe Tyr Ile Ser Trp Ala
Glu145 150 155 160Glu Tyr
Glu Ala Arg Glu Asn Phe Arg Lys Ala Asp Ala Ile Phe Gln
165 170 175Glu Gly Ile Gln Gln Lys Ala
Glu Pro Leu Glu Arg Leu Gln Ser Gln 180 185
190His Arg Gln Phe Gln Ala Arg Val Ser Arg Gln Thr Leu Leu
Ala Leu 195 200 205Glu Lys Glu Glu
Glu Glu Glu Val Phe Glu Ser Ser Val Pro Gln Arg 210
215 220Ser Thr Leu Ala Glu Leu Lys Ser Lys Gly Lys Lys
Thr Ala Arg Ala225 230 235
240Pro Ile Ile Arg Val Gly Gly Ala Leu Lys Ala Pro Ser Gln Asn Arg
245 250 255Gly Leu Gln Asn Pro
Phe Pro Gln Gln Met Gln Asn Asn Ser Arg Ile 260
265 270Thr Val Phe Asp Glu Asn Ala Asp Glu Ala Ser Thr
Ala Glu Leu Ser 275 280 285Lys Pro
Thr Val Gln Pro Trp Ile Ala Pro Pro Met Pro Arg Ala Lys 290
295 300Glu Asn Glu Leu Gln Ala Gly Pro Trp Asn Thr
Gly Arg Ser Leu Glu305 310 315
320His Arg Pro Arg Gly Asn Thr Ala Ser Leu Ile Ala Val Pro Ala Val
325 330 335Leu Pro Ser Phe
Thr Pro Tyr Val Glu Glu Thr Ala Arg Gln Pro Val 340
345 350Met Thr Pro Cys Lys Ile Glu Pro Ser Ile Asn
His Ile Leu Ser Thr 355 360 365Arg
Lys Pro Gly Lys Glu Glu Gly Asp Pro Leu Gln Arg Val Gln Ser 370
375 380His Gln Gln Ala Ser Glu Glu Lys Lys Glu
Lys Met Met Tyr Cys Lys385 390 395
400Glu Lys Ile Tyr Ala Gly Val Gly Glu Phe Ser Phe Glu Glu Ile
Arg 405 410 415Ala Glu Val
Phe Arg Lys Lys Leu Lys Glu Gln Arg Glu Ala Glu Leu 420
425 430Leu Thr Ser Ala Glu Lys Arg Ala Glu Met
Gln Lys Gln Ile Glu Glu 435 440
445Met Glu Lys Lys Leu Lys Glu Ile Gln Thr Thr Gln Gln Glu Arg Thr 450
455 460Gly Asp Gln Gln Glu Glu Thr Met
Pro Thr Lys Glu Thr Thr Lys Leu465 470
475 480Gln Ile Ala Ser Glu Ser Gln Lys Ile Pro Gly Met
Thr Leu Ser Ser 485 490
495Ser Val Cys Gln Val Asn Cys Cys Ala Arg Glu Thr Ser Leu Ala Glu
500 505 510Asn Ile Trp Gln Glu Gln
Pro His Ser Lys Gly Pro Ser Val Pro Phe 515 520
525Ser Ile Phe Asp Glu Phe Leu Leu Ser Glu Lys Lys Asn Lys
Ser Pro 530 535 540Pro Ala Asp Pro Pro
Arg Val Leu Ala Gln Arg Arg Pro Leu Ala Val545 550
555 560Leu Lys Thr Ser Glu Ser Ile Thr Ser Asn
Glu Asp Val Ser Pro Asp 565 570
575Val Cys Asp Glu Phe Thr Gly Ile Glu Pro Leu Ser Glu Asp Ala Ile
580 585 590Ile Thr Gly Phe Arg
Asn Val Thr Ile Cys Pro Asn Pro Glu Asp Thr 595
600 605Cys Asp Phe Ala Arg Ala Ala Arg Phe Val Ser Thr
Pro Phe His Glu 610 615 620Ile Met Ser
Leu Lys Asp Leu Pro Ser Asp Pro Glu Arg Leu Leu Pro625
630 635 640Glu Glu Asp Leu Asp Val Lys
Thr Ser Glu Asp Gln Gln Thr Ala Cys 645
650 655Gly Thr Ile Tyr Ser Gln Thr Leu Ser Ile Lys Lys
Leu Ser Pro Ile 660 665 670Ile
Glu Asp Ser Arg Glu Ala Thr His Ser Ser Gly Phe Ser Gly Ser 675
680 685Ser Ala Ser Val Ala Ser Thr Ser Ser
Ile Lys Cys Leu Gln Ile Pro 690 695
700Glu Lys Leu Glu Leu Thr Asn Glu Thr Ser Glu Asn Pro Thr Gln Ser705
710 715 720Pro Trp Cys Ser
Gln Tyr Arg Arg Gln Leu Leu Lys Ser Leu Pro Glu 725
730 735Leu Ser Ala Ser Ala Glu Leu Cys Ile Glu
Asp Arg Pro Met Pro Lys 740 745
750Leu Glu Ile Glu Lys Glu Ile Glu Leu Gly Asn Glu Asp Tyr Cys Ile
755 760 765Lys Arg Glu Tyr Leu Ile Cys
Glu Asp Tyr Lys Leu Phe Trp Val Ala 770 775
780Pro Arg Asn Ser Ala Glu Leu Thr Val Ile Lys Val Ser Ser Gln
Pro785 790 795 800Val Pro
Trp Asp Phe Tyr Ile Asn Leu Lys Leu Lys Glu Arg Leu Asn
805 810 815Glu Asp Phe Asp His Phe Cys
Ser Cys Tyr Gln Tyr Gln Asp Gly Cys 820 825
830Ile Val Trp His Gln Tyr Ile Asn Cys Phe Thr Leu Gln Asp
Leu Leu 835 840 845Gln His Ser Glu
Tyr Ile Thr His Glu Ile Thr Val Leu Ile Ile Tyr 850
855 860Asn Leu Leu Thr Ile Val Glu Met Leu His Lys Ala
Glu Ile Val His865 870 875
880Gly Asp Leu Ser Pro Arg Cys Leu Ile Leu Arg Asn Arg Ile His Asp
885 890 895Pro Tyr Asp Cys Asn
Lys Asn Asn Gln Ala Leu Lys Ile Val Asp Phe 900
905 910Ser Tyr Ser Val Asp Leu Arg Val Gln Leu Asp Val
Phe Thr Leu Ser 915 920 925Gly Phe
Arg Thr Val Gln Ile Leu Glu Gly Gln Lys Ile Leu Ala Asn 930
935 940Cys Ser Ser Pro Tyr Gln Val Asp Leu Phe Gly
Ile Ala Asp Leu Ala945 950 955
960His Leu Leu Leu Phe Lys Glu His Leu Gln Val Phe Trp Asp Gly Ser
965 970 975Phe Trp Lys Leu
Ser Gln Asn Ile Ser Glu Leu Lys Asp Gly Glu Leu 980
985 990Trp Asn Lys Phe Phe Val Arg Ile Leu Asn Ala
Asn Asp Glu Ala Thr 995 1000
1005Val Ser Val Leu Gly Glu Leu Ala Ala Glu Met Asn Gly Val Phe Asp
1010 1015 1020Thr Thr Phe Gln Ser His Leu
Asn Lys Ala Leu Trp Lys Val Gly Lys1025 1030
1035 1040Leu Thr Ser Pro Gly Ala Leu Leu Phe Gln
1045 1050
<210> 22 <211> 190 <212> PRT <213> Homo sapiens
<220> <221> VARIANT <222> (40)...(40) <223> Selenocysteine
<400> 22
Met Ala Phe Ile Ala Lys Ser Phe
Tyr Asp Leu Ser Ala Ile Ser Leu1 5 10
15Asp Gly Glu Lys Val Asp Phe Asn Thr Phe Arg Gly Arg Ala
Val Leu 20 25 30Ile Glu Asn
Val Ala Ser Leu Cys Gly Thr Thr Thr Arg Asp Phe Thr 35
40 45Gln Leu Asn Glu Leu Gln Cys Arg Phe Pro Arg
Arg Leu Val Val Leu 50 55 60Gly Phe
Pro Cys Asn Gln Phe Gly His Gln Glu Asn Cys Gln Asn Glu65
70 75 80Glu Ile Leu Asn Ser Leu Lys
Tyr Val Arg Pro Gly Gly Gly Tyr Gln 85 90
95Pro Thr Phe Thr Leu Val Gln Lys Cys Glu Val Asn Gly
Gln Asn Glu 100 105 110His Pro
Val Phe Ala Tyr Leu Lys Asp Lys Leu Pro Tyr Pro Tyr Asp 115
120 125Asp Pro Phe Ser Leu Met Thr Asp Pro Lys
Leu Ile Ile Trp Ser Pro 130 135 140Val
Arg Arg Ser Asp Val Ala Trp Asn Phe Glu Lys Phe Leu Ile Gly145
150 155 160Pro Glu Gly Glu Pro Phe
Arg Arg Tyr Ser Arg Thr Phe Pro Thr Ile 165
170 175Asn Ile Glu Pro Asp Ile Lys Arg Leu Leu Lys Val
Ala Ile 180 185 190