Patent application title: BIOLOGICAL ENTITIES AND THE USE THEREOF
Inventors:
Ulrich Haupts (Nattermannallee, DE)
Andre Koltermann (Nattermannallee, DE)
Andreas Scheidig (Nattermannallee, DE)
Christian Votsmeier (Nattermannallee, DE)
Ulrich Ketting (Nattermannallee, DE)
IPC8 Class: AA61K3843FI
USPC Class:
424 943
Class name: Drug, bio-affecting and body treating compositions enzyme or coenzyme containing stabilized enzymes or enzymes complexed with nonenzyme (e.g., liposomes, etc.)
Publication date: 2009-08-20
Patent application number: 20090208474
Agents:
Ballard Spahr Andrews & Ingersoll, LLP
Assignees:
Origin: ATLANTA, GA US
Abstract:
The present invention provides engineered enzymes generated from protein
scaffolds combined with Specificity Determining Regions, the production
thereof and the use of said engineered enzymes for research, nutritional
care, personal care and industrial purposes.
Claims:
1. A recombinant engineered enzyme with catalytic activity of defined
specificity, characterized by a combination of the following
components:
(a) a protein scaffold capable of catalyzing at least one
protein cleavage reaction on at least one target substrate and being a
serine protease of the structural class S1, and
(b) one or more
specificity determining regions (SDRs), wherein the SDRs are peptide
sequences that are inserted into the protein scaffold at one or more
positions that correspond structurally or by amino acid sequence homology
to the regions 18-25, 54-63, 73-86, 148-156, 165-171 and 194-204 in human
trypsin I having the amino acid sequence shown in SEQ ID NO: 1, wherein
the inserted SDRs enable the resulting engineered protein to discriminate
between at least one target substrate and one or more different
substrates.
2. The recombinant engineered enzyme of claim 1, wherein the SDRs (b) have a length of less than 50 amino acid residues.
3. The recombinant engineered enzyme of claim 2, wherein the SDRs (b) have a length between two and 20 amino acid residues.
4. The recombinant engineered enzyme of claim 3, wherein the SDRs (b) have a length between two and ten amino acid residues.
5. The recombinant engineered enzyme of claim 4, wherein the SDRs (b) have a length between three and eight amino acid residues.
6. The recombinant engineered enzyme of claim 2, wherein the number of SDRs is at least one.
7. The recombinant engineered enzyme of claim 6, wherein the number of SDRs is more than one.
8. The recombinant engineered enzyme of claim 6, wherein the number of SDRs is between two and eleven.
9. The recombinant engineered enzyme of claim 6, wherein the number of SDRs is between two and six.
10. The recombinant engineered enzyme of claim 1, wherein the protein scaffold (a) is encoded by a gene of viral origin.
11. The recombinant engineered enzyme of claim 1, wherein the protein scaffold (a) is encoded by a gene of prokaryotic origin.
12. The recombinant engineered enzyme of claim 1, wherein the protein scaffold (a) is encoded by a gene of eukaryotic origin.
13. The recombinant engineered enzyme of claim 1, wherein the protein scaffold (a) is comprised of one or more polypeptides being derived from the same or different native enzymes.
14. The recombinant engineered enzyme of claim 1, wherein the protein scaffold (a) is comprised of one or more polypeptides being derived from the same or different native mammalian enzymes.
15. The recombinant engineered enzyme of claim 14, wherein the mammalian enzymes are human enzymes.
16-28. (canceled)
29. The recombinant engineered enzyme of claim 1, further comprising SDRs located at one or more positions selected from the group of positions that correspond structurally or by amino acid sequence homology to the regions 38-48, and 122-130 in human trypsin I having the amino acid sequence shown in SEQ ID NO: 1.
30. The recombinant engineered enzyme of claim 1, wherein the SDRs are located at one or more positions selected from the group of positions that correspond structurally or by amino acid sequence homology to the regions 20-23, 57-60, 76-83, 150-153, 167-169 and 197-201 in human trypsin I having the amino acid sequence shown in SEQ ID NO: 1.
31. The recombinant engineered enzyme of claim 1, wherein the protein scaffold (a) is derived from the serine protease trypsin.
32. The recombinant engineered enzyme of claim 31, wherein the serine protease trypsin is human trypsin I having the amino acid sequence shown in SEQ ID NO:1 or a derivative thereof.
33. The recombinant engineered enzyme of claim 31, wherein the serine protease trypsin has amino acid sequence SEQ ID NO: 1 and comprises one or more of the amino acid substitutions selected from the group consisting of E56G, R78W, Y131F, A146T and C183R.
34. The recombinant engineered enzyme of claim 28, which has at least one of two SDRs located in the scaffold, a first SDR having a length of up to 6 amino acids and being inserted between residues 42 and 43, and a second SDR having a length of up to 5 amino acids and being inserted between residues 123 and 124, the numbering being relative to human trypsin I having the amino acid sequence shown in SEQ ID NO: 1.
35. The recombinant engineered enzyme of claim 34, which comprises one of the peptide sequences of the following group: SEQ ID NO: 72, 78, 79, 80, 84, 85, 86, 87, 88, and 89 inserted as the first SDR between residues 42 and 43.
36. The recombinant engineered enzyme of claim 34, which comprises one of the peptide sequences of the following group: SEQ ID NO: 73, 81, 82, 83, 90, 91, 92, 93, 94, and 95 inserted as the second SDR between residues 123 and 124.
37. The recombinant engineered enzyme of claim 31, which comprises an amino acid sequence selected from the group consisting of SEQ ID NO:74 and SEQ ID NO:75.
38-44. (canceled)
45. A fusion protein which is comprised of at least one engineered enzyme of claim 1 and at least one further proteinaceous component.
46. The fusion protein of claim 45, wherein the further proteinaceous component is selected from the group consisting of binding domains, receptors, antibodies, regulation domains, pro-sequences, and fragments thereof.
47. A fusion protein which is comprised of at least one engineered enzyme of claim 1 and at least one further functional component.
48. The fusion protein of claim 47, wherein the functional component is selected from the group consisting of polyethylene glycols, carbohydrates, lipids, fatty acids, nucleic acids, metals, metal chelates, and fragments or derivatives thereof.
49-71. (canceled)
72. A composition comprising one or more engineered enzymes of claim 1.
73. A composition comprising a fusion protein of claim 47.
74. A composition comprising a fusion protein of claim 45.
75. The composition of claim 72, which is a composition selected from the group consisting of research composition, nutritional composition, cleaning composition, food additive composition, disinfection composition, cosmetic composition and composition for personal care.
76. The composition of claim 73, which is a composition selected from the group consisting of research composition, nutritional composition, cleaning composition, food additive composition, disinfection composition, cosmetic composition and composition for personal care.
77. The composition of claim 74, which is a composition selected from the group consisting of research composition, nutritional composition, cleaning composition, food additive composition, disinfection composition, cosmetic composition and composition for personal care.
78. The composition of claim 72, which further comprises optional components selected from the group consisting of a pharmaceutically acceptable carrier(s) and auxiliary agent(s).
79. The composition of claim 73, which further comprises optional components selected from the group consisting of a pharmaceutically acceptable carrier(s) and auxiliary agent(s).
80. The composition of claim 74, which further comprises optional components selected from the group consisting of a pharmaceutically acceptable carrier(s) and auxiliary agent(s).
81. The recombinant engineered enzyme of claim 29, wherein the SDRs are located at one or more positions selected from the group of positions that correspond structurally or by amino acid sequence homology to the regions 41-45 and 125-128 in human trypsin I having the amino acid sequence shown in SEQ ID NO: 1.
Description:
[0001]This application claims the priority benefit of European Application
No. 03013819, filed Jun. 18, 2003; European Application No. 03025851,
filed Nov. 10, 2003; European Application No. 03025871, filed Nov. 11,
2003; and U.S. Provisional Application No. 60/524,960, filed Nov. 25,
2003, which applications are incorporated herein fully by this reference.
[0002]The present invention provides engineered enzymes comprised of a protein scaffold and Specificity Determining Regions, the production of such enzymes and the use thereof for therapeutic, research, diagnostic, nutritional care, personal care and industrial purposes.
BACKGROUND
[0003]Academic and industrial research continuously searches for functional proteins to be used as therapeutic, research, diagnostic, nutritional, personal care or industrial agents. Today, such functional proteins can be classified mainly into two categories: natural proteins and engineered proteins. Natural proteins, on the one hand, are discovered from nature, e.g. by screening natural isolates or by sequencing genomes from diverse species. Engineered proteins, on the other hand, are typically based on known proteins and are altered in order to acquire modified functionalities. The present invention discloses engineered proteins with novel functions as compared to the starting components. Such proteins are called NBEs (New Biologic Entities). The NBEs disclosed in the present invention are engineered enzymes with novel substrate specificities or fusion proteins of such engineered enzymes with other functional components.
[0004]Specificity is an essential element of enzyme function. A cell consists of thousands of different, highly reactive catalysts. Yet the cell is able to maintain a coordinated metabolism and a highly organized three-dimensional structure. This is due in part to the specificity of enzymes, i.e. the selective conversion of their respective substrates. Specificity is a qualitative and a quantitative property: the specificity of a particular enzyme can vary widely, ranging from just one particular type of target molecules to all molecular types with certain chemical substructures. In nature, the specificity of an organism's enzymes has been evolved to the particular needs of the organism. Arbitrary specificities with high value for therapeutic, research, diagnostic, nutritional or industrial applications are unlikely to be found in any organism's enzymatic repertoire due to the large space of possible specificities. The only realistic way of obtaining such specificities is their generation de novo.
[0005]When comparing enzymes with binders, a paradigm of specificity is given by antibodies recognizing individual epitopes as small distinct structures within large molecules. The naturally occurring vast range of antibody specificities is attributed to the diversity generated by the immune system combined with natural selection. Several mechanisms contribute to the vast repertoire of antibody specificity and occur at different stages of immune response generation and antibody maturation (Janeway, C et al. (1999) Immunobiology, Elsevier Science Ltd., Garland Publishing, New York). Specifically, antibodies contain complementarity determining regions (CDRs) which interact with the antigen in a highly specific manner and allow discrimination even between very similar epitopes. The light as well as the heavy chain of the antibody each contribute three CDRs to the binding domain. Nature uses recombination of various gene segments combined with further mutagenesis in the generation of CDRs. As a result, the sequences of the six CDR loops are highly variable in composition and length and this forms the basis for the diversity of binding specificities in antibodies. A similar principle for the generation of a diversity of catalytic specificities is not known from nature.
[0006]Catalysis, i.e. the increase of the rate of a specific chemical reaction, is besides binding the most important protein function. Catalytic proteins, i.e. enzymes, are classified according to the chemical reaction they catalyze.
[0007]Transferases are enzymes transferring a group, for example, the methyl group or a glycosyl group, from one compound (generally regarded as donor) to another compound (generally regarded as acceptor). For example, glycosyltransferases (EC 2.4) transfer glycosyl residues from a donor to an acceptor molecule. Some of the glycosyltransferases also catalyze hydrolysis, which can be regarded as transfer of a glycosyl group from the donor to water. The subclass is further subdivided into hexosyltransferases (EC 2.4.1), pentosyltransferases (EC 2.4.2) and those transferring other glycosyl groups (EC 2.4.99, Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB)).
[0008]Oxidoreductases catalyze oxido-reductions. The substrate that is oxidized is regarded as the hydrogen or electron donor. Oxidoreductases are classified as dehydrogenases, oxidases, mono- and dioxygenases. Dehydrogenases transfer hydrogen from a hydrogen donor to a hydrogen acceptor molecule. Oxidases react with molecular oxygen as the hydrogen acceptor and produce oxidized products as well as either hydrogen peroxide or water. Monooxygenases transfer one oxygen atom from molecular oxygen to the substrate, while the other is reduced to water. In contrast, dioxygenases catalyze the insertion of both oxygen atoms from molecular oxygen into the substrate.
[0009]Lyases catalyze elimination reactions and thereby generate double bonds or, in the reverse direction, catalyze additions at double bonds. Isomerases catalyze intramolecular rearrangements. Ligases catalyze the formation of chemical bonds at the expense of ATP consumption.
[0010]Finally, hydrolases are enzymes that catalyze the hydrolysis of chemical bonds like C-O or C-N. The E.C. classification for these enzymes generally classifies them by the nature of the bond hydrolysed and by the nature of the substrate. Hydrolases such as lipases and proteases play an important role in nature as well as in technical applications of biocatalysts. Proteases hydrolyse a peptide bond within the context of an oligo- or polypeptide. Depending on the catalytic mechanism, proteases are grouped into aspartic, serine, cysteine, metallo- and threonine proteases (Handbook of proteolytic enzymes. (1998) Eds: Barrett, A.; Rawlings, N.; Woessner, J.; Academic Press, London). This classification is based on the amino acid side chains that are responsible for catalysis and which are typically presented in the active site in very similar orientation to each other. The scissile bond of the substrate is brought into register with the catalytic residues due to specific interactions between the amino acid side chains of the substrate and complementary regions of the protease (Perona, J. & Craik, C. (1995) Protein Science, 4, 337-360). The residues on the N- and C-terminal side of the scissile bond are usually called P1, P2, P3 etc. and P1', P2', P3' etc., respectively, and the binding pockets complementary to the substrate S1, S2, S3 and S1', S2', S3', respectively (nomenclature according to Schechter & Berger, Biochem. Biophys. Res. Commun. 27 (1967) 157-162). The selectivity of proteases can vary widely, from being virtually nonselective (e.g. the subtilisins), through a strict preference at the P1 position (e.g. trypsin, which selectively cuts on the C-terminal side of arginine or lysine residues), to highly specific proteases (e.g. human tissue-type plasminogen activator (t-PA), which cleaves at the C-terminal side of the arginine in the sequence CPGRVVG) (Ding, L. et al. (1995) Proc. Natl. Acad. Sci. USA 92, 7627-7631; Coombs, G. et al. (1996) J. Biol. Chem. 271, 4461-4467).
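The subsite nomenclature above can be made concrete with a short Python sketch. It is an illustration only and not part of the disclosed method: the function names are hypothetical, and the cleavage rule is the simplified one stated in the text (a cut on the C-terminal side of every arginine or lysine), ignoring any further sequence-context effects of real trypsin.

```python
def trypsin_like_sites(peptide):
    """Return 0-based indices i such that the bond between peptide[i] and
    peptide[i + 1] lies on the C-terminal side of a lysine (K) or arginine (R).

    Simplified rule taken from the text; real enzymes show further
    sequence-context effects that are not modelled here.
    """
    # The last residue has no bond on its C-terminal side, so it is skipped.
    return [i for i, residue in enumerate(peptide[:-1]) if residue in "KR"]


def label_subsites(peptide, i, width=3):
    """Label the residues around the scissile bond after index i.

    P1, P2, P3 run towards the N-terminus starting at peptide[i];
    P1', P2', P3' run towards the C-terminus starting at peptide[i + 1].
    Positions that fall outside the peptide are simply omitted.
    """
    labels = {}
    for k in range(1, width + 1):
        n_index = i - (k - 1)  # N-terminal side of the bond
        c_index = i + k        # C-terminal side of the bond
        if n_index >= 0:
            labels[f"P{k}"] = peptide[n_index]
        if c_index < len(peptide):
            labels[f"P{k}'"] = peptide[c_index]
    return labels


if __name__ == "__main__":
    sequence = "CPGRVVG"  # t-PA target sequence quoted in the text
    for site in trypsin_like_sites(sequence):
        print(site, label_subsites(sequence, site))
```

For the sequence CPGRVVG the only candidate bond follows the arginine, so R occupies P1 and the first valine occupies P1'. A highly specific protease such as t-PA would additionally require suitable residues at the remaining positions, which this sketch does not attempt to model.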
[0011]The specificity of proteases, i.e. their ability to recognize and hydrolyze preferentially certain peptide substrates, can be expressed qualitatively and quantitatively. Qualitative specificity refers to the kind of amino acid residues that are accepted by a protease at certain positions of the peptide substrate. For example, trypsin and t-PA are related with respect to their qualitative specificity, since both of them require at the P1 position an arginine or a similar residue. On the other hand, quantitative specificity refers to the relative number of peptide substrates that are accepted as substrates by the protease, or more precisely, to the relative kcat/kM ratios of the protease for the different peptides that are accepted by the protease. Proteases that accept only a small portion of all possible peptides have a high specificity, whereas the specificity of proteases that, as an extreme, cleave any peptide substrate would theoretically be zero.
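As an illustration of the quantitative notion of specificity described above, the following minimal Python sketch computes kcat/KM for a set of substrates and expresses each value relative to the best substrate. The function names, substrate names and kinetic values are hypothetical and chosen only to make the arithmetic easy to follow; they are not measurements from this application.

```python
def specificity_constants(kinetics):
    """Map each substrate name to its specificity constant kcat / KM.

    kinetics: dict of substrate name -> (kcat, KM). Units are whatever the
    caller uses consistently (e.g. kcat in 1/s and KM in mM).
    """
    return {name: kcat / km for name, (kcat, km) in kinetics.items()}


def relative_specificity(kinetics):
    """Express each kcat / KM as a fraction of the largest value in the set."""
    constants = specificity_constants(kinetics)
    best = max(constants.values())
    return {name: value / best for name, value in constants.items()}


if __name__ == "__main__":
    # Hypothetical example values: (kcat in 1/s, KM in mM).
    example = {
        "target_peptide": (100.0, 0.5),
        "off_target_peptide": (2.0, 1.0),
    }
    for name, ratio in relative_specificity(example).items():
        print(f"{name}: {ratio:.3f}")
```

In this hypothetical case the off-target peptide is converted with one hundredth of the catalytic efficiency of the target peptide. A protease whose relative ratios are close to zero for all but a few peptides would count as highly specific in the sense used above.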
[0012]Comparison of the primary, secondary as well as the tertiary structure of proteases (Fersht, A., Enzyme Structure and Mechanism, W. H. Freeman and Company, New York, 1995) allows identification of classes showing a high degree of conservation (Rawlings, N. D. & Barrett, A. J. (1997) In: Proteolysis in Cell Functions Eds. Hopsu-Havu, V. K.; Jarvinen, M.; Kirschke, H, pp. 13-21, IOS Press, Amsterdam). A widely accepted scheme for protease classification has been proposed by Rawlings & Barrett (Handbook of proteolytic enzymes. (1998) Eds: Barrett, A.; Rawlings, N.; Woessner, J.; Academic Press, London). For example, the serine protease family can be subdivided into structural classes with chymotrypsin (class S1), subtilisin (class S8) and carboxypeptidase (class SC) folds, each of which includes nonspecific as well as specific proteases (Rawlings, N. D. & Barrett, A. J. (1994) Methods Enzymol. 244, 19-61). This applies to other protease families analogously. An additional distinction can be made according to the relative location of the cleaved bond in the substrate. Carboxy- and aminopeptidases cleave amino acids from the C- and N-terminus, respectively, while endopeptidases cut anywhere along the oligopeptide.
[0013]Many applications would be conceivable if enzymes with a basically unlimited spectrum of specificities were available. However, the use of such enzymes with high, low or any defined specificity is currently limited to those which can be isolated from natural sources. The field of application for these enzymes varies from therapeutic, research, diagnostic, nutritional to personal care and industrial purposes.
[0014]Enzyme additives in detergents have come to constitute nearly a third of the whole industrial enzyme market. Detergent enzymes include proteinases for removing organic stains, lipases for removing greasy stains, amylases for removing residues of starchy foods and cellulases for restoring of smooth surface of the fiber. The best known detergent enzyme is probably the nonspecific proteinase subtilisin, isolated from various Bacillus species.
[0015]Starch enzymes, such as amylases, account for the majority of the enzymes used in food processing. While starch enzymes include products that are important for textile desizing, alcohol fermentation, paper and pulp processing, and laundry detergent additives, the largest application is the production of high fructose corn syrup. The production of corn syrup from starch by means of industrial enzymes was a successful alternative to acid hydrolysis.
[0016]Apart from starch processing, enzymes are used for an increasing range of applications in food. Enzymes in food can improve texture, appearance and nutritional value or may generate desirable flavours and aromas. Currently used food enzymes in bakery are amylase, amyloglycosidases, pentosanases for breakdown of pentosan and reduced gluten production or glucose oxidases to increase the stability of dough. Common enzymes for dairy are rennet (protease) as coagulant in cheese production, lactase for hydrolysis of lactose, protease for hydrolysis of whey proteins or catalase for the removal of hydrogen peroxides. Enzymes used in brewing process are the above named amylases, but also cellulases or proteases to clarify the beer from suspended proteins. In wines and fruit juices, cloudiness is more commonly caused by starch and pectins so that amylases and pectinases increase yield and clarification. Papain and other proteinases are used for meat tenderizing.
[0017]Enzymes have also been developed to aid animals in the digestion of feed. In the western hemisphere, corn is a major source of food for cattle, swine, and poultry. In order to improve the bioavailability of phosphate from corn, phytase is commonly added (Wyss, M. et al. Biochemical characterization of fungal phytases (myo-inositol hexakisphosphate phosphohydrolases): Catalytic properties. Applied & Environmental Microbiology 65, 367-373 (1999)). Moreover, phytate hydrolysis has been shown to bring about improvements in digestibility of protein and absorption of minerals such as calcium (Bedford, M. R. & Schulze, H. Exogenous enzymes for pigs and poultry [Review]. Nutrition Research Reviews 11, 91-114 (1998)). Another major feed enzyme is xylanase. This enzyme is particularly useful as a supplement for feeding stuff comprising more than about 10% of wheat, barley or rye, because of their relatively high soluble fiber content. Xylanases have two important effects: reduction of the viscosity of the intestinal contents by hydrolyzing the gel-like high molecular weight arabinoxylans in feed (Murphy, T. C., Bedford, M. R. & McCracken, K. J. Effect of a range of new xylanases on in vitro viscosity and on performance of broiler diets. British Poultry Science 44, S16-S18 (2003)) and breakdown of polymers in cell walls, which improves the bioavailability of protein and starch.
[0018]Biotech research and development laboratories routinely use special enzymes in small quantities along with many other reagents. These enzymes create a significant market for various enzymes. Enzymes like alkaline phosphatase, horseradish peroxidase and luciferase are only some examples. Thermostable DNA polymerases like Taq polymerase or restriction endonucleases revolutionized laboratory work. Therapeutic enzymes are a particular class of drugs, categorized by the FDA as biologicals, with a lot of advantages compared to other, especially non-biological pharmaceuticals. Examples for successful therapeutic enzymes are human clotting factors like factor VIII and factor IX for human treatment. In addition, digestive enzymes are used for various deficiencies in human digestive processes. Other examples are t-PA and streptokinase for the treatment of cardiovascular disease, beta-glucocerebrosidase for the treatment of Type I Gaucher disease, L-asparaginase for the treatment of acute lymphoblastic leukemia and DNAse for the treatment of cystic fibrosis. An important issue in the application of proteins as therapeutics is their potential immunogenicity. To reduce this risk, one would prefer enzymes of human origin, which narrows down the set of available enzymes. The provision of designed enzymes, preferably of human origin, with novel, tailor-made specificities would allow the specific modification of target substrates at will, while minimizing the risk of immunogenicity. A further advantage of highly specific enzymes as therapeutics would be their lower risk of side effects. Due to the limited possibility of specific interactions between a small molecule and a protein, binding to non-target proteins and therefore side effects are quite common and often cause termination of an otherwise promising lead compound. 
Specific enzymes, on the other hand, provide many more contact sites and mechanisms for substrate discrimination and therefore enable a higher specificity and thereby less side activities.
[0019]Proteases represent an important class of therapeutic agents (Drugs of Today, 33, 641-648 (1997)). However, currently the therapeutic protease is usually a substitute for insufficient activity of the body's own proteases. For example, factor VII can be administered in certain cases of coagulation deficiencies of bleeders or during surgery (Heuer L.; Blumenberg D. (2002) Anaesthesist 51:388). Tissue-type plasminogen activator (t-PA) is applied in acute cardiac infarction, initiating the dissolution of fibrin clots through specific cleavage and activation of plasminogen (Verstraete, M. et al. (1995) Drugs, 50, 29-41). So far, no protease with tailor-made specificity has been generated to provide a therapeutic agent that specifically activates or inactivates a disease-related target protein.
[0020]Monoclonal antibodies represent another important biological class of substances with therapeutic capabilities. One of the main antibody targets is the group of tumor necrosis factors (TNFs), which belong to the family of cytokines. TNFs play a major role in the inflammation process. As homotrimers they can bind to receptors of nearly every cell. They activate a multiplicity of cellular genes, multiple signal transduction mechanisms, kinases and transcription factors. The most important TNFs are TNF-alpha and TNF-beta. TNF-alpha is produced by macrophages, monocytes and other cells. TNF-alpha is an inflammation mediator. Therefore, research of the last decade has focused on TNF-alpha inhibitors like monoclonal antibodies as possible therapeutics for different indications like rheumatoid arthritis, Crohn's disease or psoriasis (Hamilton et al. (2000) Expert Opin Pharmacother, 1 (5): 1041-1052). One of the major disadvantages of monoclonal antibodies is their high cost, so that new biological alternatives are of great importance.
[0021]There are many examples of engineered enzymes in the literature. Fulani et al. (Fulani F. et al. (2003) Protein Engineering 16, 515-519) describe a rhodanese (thiosulfate:cyanide sulfurtransferase) from Azotobacter vinelandii which has a catalytic domain structurally related to the catalytic subunit of Cdc25 phosphatase enzymes. The difference in catalytic mechanism depends on the different size of the active site. Both rhodanese and phosphatase are highly specific for different substrates (sulfate vs. phosphate). The catalytic mechanism of the rhodanese could be shifted towards that of a serine/threonine phosphatase by single-residue insertion. Therefore, Fulani et al. give a single example of the change of a catalytic mechanism by structural comparison and sequence alignment of naturally known enzymes from different enzyme classes, but give no indication of how to generate a user-definable substrate specificity while keeping the same catalytic mechanism.
[0022]The thioredoxin reductase described by Briggs et al. (WO 02/090300 A2) has an altered cofactor specificity which preferably binds NADPH compared to NADH. Thus, both enzymes, the starting point as well as the resulting engineered enzyme are highly specific towards different substrates. The methods to achieve such an altered substrate specificity are either computational processing methods or sequence alignments of related proteins to define variable and conserved residues. They all have in common that they are based on the comparison of structures and sequences of proteins with known specificities followed by the transfer of the same to another backbone.
[0023]There are other examples of specificity-engineered enzymes and, in particular, of proteases which have been published in the literature. None of these examples, however, provides a means for generating novel specificities compared to the specificity of the starting material used within the described methods. The methods range from structure-directed single point mutations (Kurth, T. et al. (1998) Biochemistry 37, 11434-11440; Ballinger, M. et al. (1996) Biochemistry, 35:13579-13585), through exchange of surface loops between two specific proteases (Horrevoets et al. (1993) J. Biol. Chem. 268, 779-782), to random mutagenesis either regio-selectively or across the whole gene combined with in-vitro or in-vivo selection (Sices, H. & Kristie, T. (1998) Proc. Natl. Acad. Sci. USA, 95, 2828-2833).
[0024]The rational design of protease specificity is limited to very few examples. This approach is severely limited by the insufficient understanding of the complexities that govern folding and dynamics as well as structure-function relationships in proteins (Corey, M. J. & Corey, E. (1996) Proc. Natl. Acad. Sci. USA, 93:11428-11434). It is therefore difficult to alter the primary amino acid sequence of a protease in order to change its activity or specificity in a predictive way. In a successful example, Kurth et al. engineered trypsin to show a preference for a dibasic motif (Kurth, T. et al. (1998) Biochemistry, 37:11434-11440). In another example, Hedstrom et al. converted the S1 substrate specificity of trypsin to that of chymotrypsin (Hedstrom, L. et al. (1992) Science, 255:1249-1253). This is an example where a known property was transferred from one backbone to another.
[0025]Ballinger et al. (WO 96/27671) describe subtilisin variants with combination mutations (N62D/G166D, and optionally Y104D) having a shift of substrate specificity towards peptide or polypeptide substrates with basic amino acids at the P1, P2 and P4 positions of the substrate. Suitable substrates of the variant subtilisin were revealed by sorting a library of phage particles (substrate phage) containing five contiguous randomized residues. These subtilisin variants are useful for cleaving fusion proteins with basic substrate linkers and processing hormones or other proteins (in vitro or in vivo) that contain basic cleavage sites.
[0026]The problems associated with rational redesign of enzymes can partially be overcome by directed evolution (as disclosed in PCT/EP03/04864). These studies can be classified by their expression and selection systems. Genetic selection means to produce inside an organism an enzyme, e.g. a protease, which is able to cleave a precursor protein which in turn results in an alteration of the growth behavior of the producing organism. From a population of organisms with different proteases those can be selected which have an altered growth behavior. This principle was for example reported by Davis et al. (U.S. Pat. No. 5,258,289, WO 96/21009). The production of a phage system is dependent on the cleavage of a phage protein which only can be activated in the presence of a proteolytic enzyme which is able to cleave the phage protein. Other approaches use a reporter system which allows a selection by screening instead of a genetic selection, but also cannot overcome the intrinsic insufficiency of the intracellular characterization of enzymes.
[0027]Systems that use self-secreting enzymes to generate enzymes with altered sequence specificities have also been reported. Duff et al. (WO 98/11237) describe an expression system for a self-secreting protease. An essential element of the experimental design is that the catalytic reaction acts on the protease itself by an autoproteolytic processing of the membrane-bound precursor molecule to release the matured protease from the cellular membrane into the extracellular environment. Therefore, a fusion protein must be constructed where the target peptide sequence replaces the natural cleavage site for autoproteolysis. A limitation of such a system is that positively identified proteases will have the ability to cleave a certain amino acid sequence but may also cleave many other peptide sequences. Therefore, high substrate specificity cannot be achieved. Additionally, such a system cannot ensure that selected proteases cleave at a specific position in a defined amino acid sequence, and it does not allow a precise characterization of the kinetic constants of the selected proteases (kcat, KM).
[0028]A method has been described that aims at the generation of new catalytic activities and specificities within the α/β-barrel proteins (WO 01/42432; Fersht et al, Methods of producing novel enzymes; Altamirano et al. (2000) Nature 403, 617-622). The α/β-barrel proteins comprise a large superfamily of proteins accounting for a large fraction of all known enzymes. The structure of the proteins is made from α/β-barrel surrounded by α-helices. The loops connecting 13-strands and helices comprise the so-called lid-structure including the active site residues. The method is based on the classification of α/β-barrel proteins into two classes based on the catalytic lid structure. An extensive comparison of α/β-barrel protein structures led the authors to the conclusion that the substrate binding and specificity is primarily defined by the barrel structure while the specificity of the chemical reaction resides within the loops. It is suggested that barrels and lid structures from different enzymes can be combined to generate new enzymatic activities and to provide a starting point to fine tune the properties by targeted or randomized mutagenesis and selection. The method does not provide for the generation of user-defined specificity.
[0029]In summary, it is clear that there are many possible applications in the fields of therapeutics, research and diagnostics, industrial enzymes, food and feed processing, cosmetics and other areas that would become possible by the availability of enzymes with a novel substrate specificity. However, only a limited number of specific enzymes has been identified from natural sources so far. Methods of rational design to modify, alter, convert or transfer sequence specificity as well as random approaches described above did not enable the generation of a novel and user-definable specificity that was not present in the employed starting material.
[0030]Therefore, none of the currently available methods can provide enzymes with a novel and user-defined sequence specificity. In contrast, the current invention provides such enzymes as well as methods for generating them.
SUMMARY OF THE INVENTION
[0031]The objective of the present invention is to provide engineered proteins with novel functions that do not exist in the components used for the engineering of such proteins. In particular, the invention provides enzymes with user-definable specificities. User-definable specificity means that enzymes are provided with specificities that do not exist in the components used for the engineering of such enzymes. The specificities can be chosen by the user so that one or more intended target substrates are preferentially recognised and converted by the enzymes. Furthermore, the invention provides enzymes that possess essentially identical sequences to human proteins but have different specificities. In a particular embodiment, the invention provides proteases with user-definable specificities.
[0032]Furthermore, the present invention is directed to engineered enzymes which are fused to one or more further functional components. These further components can be proteinaceous components which preferably have binding properties and are of the group consisting of substrate binding domains, antibodies, receptors or fragments thereof. Furthermore, these further components can be further functional components, preferably being selected from the group consisting of polyethylene glycols, carbohydrates, lipids, fatty acids, nucleic acids, metals, metal chelates, and fragments or derivatives thereof. The resulting fusion proteins are understood as enzymes with user-definable specificities within the present invention.
[0033]Besides, the invention is directed to the application of such enzymes with novel, user-definable specificities for therapeutic, research, diagnostic, nutritional, personal care or industrial purposes. Moreover, the invention is directed to a method for generating engineered enzymes with user-definable specificities. In particular, the invention is directed to generate enzymes that possess essentially identical sequences to human enzymes but have different specificities.
[0034]This problem has been solved by the embodiments of the invention specified in the description below and in the claims. The present invention is thus directed to
(1) an engineered enzyme with defined specificity characterized by the combination of the following components:
(a) a protein scaffold which catalyzes at least one chemical reaction on at least one substrate, and
(b) one or more specificity determining regions (SDRs) located at sites in the protein scaffold that enable the resulting engineered protein to discriminate between at least one target substrate and one or more different substrates, and wherein the SDRs are essentially synthetic peptide sequences;
(2) the use of an engineered enzyme as defined in (1) above for therapeutic, research, diagnostic, nutritional, personal care or industrial purposes;
(3) a method for generating engineered enzymes as defined in (1) above having specificities towards target substrates, such specificities not being present in the individual starting components, comprising at least the following steps:
(a) providing a protein scaffold which catalyzes at least one chemical reaction on at least one substrate,
(b) generating a library of engineered enzymes by combining the protein scaffold from step (a) with fully or partially random peptide sequences at sites in the protein scaffold that enable the resulting engineered enzyme to discriminate between at least one target substrate and one or more different substrates, and
(c) selecting out of the library of engineered enzymes generated in step (b) one or more enzymes that have specificities towards at least one target substrate;
(4) a fusion protein which is comprised of at least one engineered enzyme as defined in (1) above and at least one further component, preferably the at least one further component having binding properties and more preferably being selected from the group consisting of antibodies, binding domains, receptors, and fragments thereof;
(5) a composition or pharmaceutical composition comprising one or more engineered enzymes as defined in (1) above or a fusion protein as defined in (4) above, said pharmaceutical composition may optionally comprise an acceptable carrier, excipient and/or auxiliary agent;
(6) a DNA encoding the engineered enzyme as defined in (1) above;
(7) a vector comprising the DNA as defined in (6) above;
(8) a host cell or transgenic organism being transformed/transfected with a vector as defined in (7) above and/or containing the DNA as defined in (6) above; and
(9) a method for producing the engineered enzyme comprising culturing a cell or organism as defined in (8) above and isolating the enzyme from the culture broth.
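Steps (a) to (c) of the method in (3) can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the procedure of the invention: the scaffold string, the insertion site, and the scoring function are hypothetical placeholders, whereas in practice the library is built at the DNA level and the selection is an experimental assay.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter code


def build_library(scaffold, site, sdr_length, size, seed=0):
    """Step (b): combine the scaffold with fully random peptide sequences.

    A random peptide of `sdr_length` residues is inserted after residue
    `site` (1-based) of `scaffold`. Returns `size` distinct variants.
    """
    if size > len(AMINO_ACIDS) ** sdr_length:
        raise ValueError("requested more variants than distinct peptides exist")
    rng = random.Random(seed)
    variants = set()
    while len(variants) < size:
        sdr = "".join(rng.choice(AMINO_ACIDS) for _ in range(sdr_length))
        variants.add(scaffold[:site] + sdr + scaffold[site:])
    return sorted(variants)


def select_variants(library, score, threshold):
    """Step (c): keep variants whose (hypothetical) score meets the threshold."""
    return [variant for variant in library if score(variant) >= threshold]


# Step (a): a placeholder scaffold sequence (hypothetical, not a real protease).
scaffold = "MKTAYIAKQR"
library = build_library(scaffold, site=4, sdr_length=3, size=50)

# Stand-in score: count arginines in the inserted peptide (illustration only).
hits = select_variants(library, score=lambda v: v[4:7].count("R"), threshold=1)
```

The fixed seed only makes the toy library reproducible; it has no counterpart in the method itself.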
BRIEF DESCRIPTION OF THE FIGURES
[0035]The following figures are provided in order to explain further the present invention in supplement to the detailed description:
[0036]FIG. 1 illustrates the three-dimensional structure of human trypsin I with the active site residues shown in "ball-and-stick" representation and with the marked regions indicating potential SDR insertion sites.
[0037]FIG. 2 shows the alignment of the primary amino acid sequence of three members of the serine protease class S1 family: human trypsin I, human alpha-thrombin and human enteropeptidase (see also SEQ ID NOs: 1, 5 and 6).
[0038]FIG. 3 illustrates the three-dimensional structure of subtilisin with the active site residues being shown in "ball-and-stick" representation and with the numbered regions indicating potential SDR insertion sites.
[0039]FIG. 4 shows the alignment of the primary amino acid sequences of four members of the serine protease class S8 family: subtilisin E, furin, PC1 and PC5 (see also SEQ ID NOs: 7-10).
[0040]FIG. 5 illustrates the three-dimensional structure of pepsin with the active site residues being shown in "ball-and-stick" representation and with the numbered regions indicating potential SDR insertion sites.
[0041]FIG. 6 shows the alignment of the primary amino acid sequences of three members of the A1 aspartic acid protease family: pepsin, β-secretase and cathepsin D (see also SEQ ID NOs: 11-13).
[0042]FIG. 7 illustrates the three-dimensional structure of caspase 7 with the active site residues being shown in "ball-and-stick" representation and with the numbered regions indicating potential SDR insertion sites.
[0043]FIG. 8 shows the primary amino acid sequence of caspase 7 as a member of the cysteine protease class C14 family (see also SEQ ID NO: 14).
[0044]FIG. 9 depicts schematically the third aspect of the invention.
[0045]FIG. 10 shows a Western blot analysis of a culture supernatant of cells expressing variants of human trypsin I with SDR1 and SDR2, compared to negative controls.
[0046]FIG. 11 shows the time course of the proteolytic cleavage of a target substrate by human trypsin I.
[0047]FIG. 12 shows the relative activities of three variants of inventive engineered proteolytic enzymes in comparison with human trypsin I on two different peptide substrates.
[0048]FIG. 13 shows the relative specificities of human trypsin I and variants of inventive engineered proteolytic enzymes with one or two SDRs, respectively.
[0049]FIG. 14 shows the relative specificities of human trypsin I and of variants of inventive engineered proteolytic enzymes being specific for human TNF-alpha with this scaffold on peptides with a target sequence of human TNF-alpha.
[0050]FIG. 15 shows the reduction of cytotoxicity induced by TNF-alpha when incubating the TNF-alpha with concentrated supernatant from cultures expressing the inventive engineered proteolytic enzymes being specific for human TNF-alpha.
[0051]FIG. 16 shows the reduction of cytotoxicity induced by TNF-alpha when incubating the TNF-alpha with purified inventive engineered proteolytic enzyme being specific for human TNF-alpha.
[0052]FIG. 17 compares the activity of inventive engineered proteolytic enzymes being specific for human TNF-alpha with the activity of human trypsin I on two protein substrates: (a) human TNF-alpha; (b) mixture of human serum proteins.
[0053]FIG. 18 shows the specific activity of an inventive engineered proteolytic enzyme with specificity for human VEGF.
DEFINITIONS
[0054]In the framework of the present invention the following terms and definitions are used.
[0055]The term "protease" means any protein molecule that is capable of hydrolysing peptide bonds. This includes naturally-occurring or artificial proteolytic enzymes, as well as variants thereof obtained by site-directed or random mutagenesis or any other protein engineering method, any active fragment of a proteolytic enzyme, or any molecular complex or fusion protein comprising one of the aforementioned proteins. A "chimera of proteases" means a fusion protein of two or more fragments derived from different parent proteases.
[0056]The term "substrate" means any molecule that can be converted catalytically by an enzyme. The term "peptide substrate" means any peptide, oligopeptide, or protein molecule of any amino acid composition, sequence or length, that contains a peptide bond that can be hydrolyzed catalytically by a protease. The peptide bond that is hydrolyzed is referred to as the "cleavage site". Numbering of positions in the substrate is done according to the system introduced by Schechter & Berger (Biochem. Biophys. Res. Commun. 27 (1967) 157-162). Amino acid residues adjacent N-terminal to the cleavage site are numbered P1, P2, P3, etc., whereas residues adjacent C-terminal to the cleavage site are numbered P1', P2', P3', etc.
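The numbering convention can be illustrated with a short Python sketch. The function name, the example peptide and the choice of describing the cleavage site by the count of residues on its N-terminal side are assumptions made for illustration only.

```python
def label_positions(peptide, cleavage_after):
    """Label each residue of `peptide` relative to the cleavage site.

    `cleavage_after` is the number of residues on the N-terminal side of
    the hydrolyzed bond, i.e. the 1-based index of the P1 residue.
    """
    if not 0 < cleavage_after < len(peptide):
        raise ValueError("the cleavage site must lie between two residues")
    labels = []
    for index, residue in enumerate(peptide, start=1):
        if index <= cleavage_after:
            # N-terminal side: P1 is adjacent to the bond, then P2, P3, ...
            labels.append((f"P{cleavage_after - index + 1}", residue))
        else:
            # C-terminal side: P1' is adjacent to the bond, then P2', P3', ...
            labels.append((f"P{index - cleavage_after}'", residue))
    return labels


# Hypothetical pentapeptide cleaved between the third and fourth residue.
print(label_positions("AGRSL", 3))
# [('P3', 'A'), ('P2', 'G'), ('P1', 'R'), ("P1'", 'S'), ("P2'", 'L')]
```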
[0057]The term "target substrate" describes a user-defined substrate which is specifically recognized and converted by an enzyme according to the invention. The term "target peptide substrate" describes a user-defined peptide substrate. The term "target specificity" describes the qualitative and quantitative specificity of an enzyme that is capable of recognizing and converting a target substrate. Catalytic properties of enzymes are expressed using the kinetic parameters "KM" or "Michaelis Menten constant", "kcat" or "catalytic rate constant", and "kcat/KM" or "catalytic efficiency", according to the definitions of Michaelis and Menten (Fersht, A., Enzyme Structure and Mechanism, W. H. Freeman and Company, New York, 1995). The term "catalytic activity" describes quantitatively the conversion of a given substrate under defined reaction conditions.
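As a minimal sketch of how these kinetic parameters relate, the following Python functions evaluate the standard Michaelis-Menten rate law, v = kcat [E] [S] / (KM + [S]), and the catalytic efficiency kcat/KM. The function names and all numerical values are hypothetical; the ratio of catalytic efficiencies is used here as one common measure of how strongly an enzyme prefers one competing substrate over another.

```python
def michaelis_menten_rate(kcat, km, enzyme_conc, substrate_conc):
    """Initial rate v = kcat * [E] * [S] / (KM + [S])."""
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat/KM."""
    return kcat / km


def discrimination(kcat_target, km_target, kcat_other, km_other):
    """Ratio of catalytic efficiencies for a target and a competing substrate."""
    return catalytic_efficiency(kcat_target, km_target) / catalytic_efficiency(kcat_other, km_other)


# Hypothetical values: at [S] = KM the rate is half of kcat * [E].
rate = michaelis_menten_rate(kcat=10.0, km=2.0, enzyme_conc=1.0, substrate_conc=2.0)
preference = discrimination(kcat_target=10.0, km_target=2.0, kcat_other=1.0, km_other=20.0)
```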
[0058]The term "specificity" means the ability of an enzyme to recognize and convert preferentially certain substrates. Specificity can be expressed qualitatively and quantitatively. "Qualitative specificity" refers to the chemical nature of the substrate residues that are recognized by an enzyme. "Quantitative specificity" refers to the number of substrates that are accepted as substrates. Quantitative specificity can be expressed by the term s, which is defined as the negative logarithm of the number of all accepted substrates divided by the number of all possible substrates. Proteases, for example, that accept preferentially a small portion of all possible peptide substrates have a "high specificity". Proteases that accept almost any peptide substrate have a "low specificity". Definitions are made in accordance with WO 03/095670 which is therefore incorporated by reference. Proteases with very low specificity are also referred to as "unspecific proteases". The term "defined specificity" refers to a certain type of specificity, i.e. to a certain target substrate or a set of certain target substrates that are preferentially converted versus other substrates.
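The term s can be computed directly from this definition. The sketch below assumes a base-10 logarithm, which the text does not specify, and uses an idealized, hypothetical example: of all 20^4 tetrapeptides, an enzyme accepts exactly those carrying K or R at one fixed position.

```python
import math


def quantitative_specificity(accepted, possible):
    """s = -log10(accepted / possible); the base 10 is an assumption."""
    if not 0 < accepted <= possible:
        raise ValueError("need 0 < accepted <= possible")
    return -math.log10(accepted / possible)


possible = 20 ** 4       # all tetrapeptides built from 20 amino acids
accepted = 2 * 20 ** 3   # K or R at one fixed position, any residue elsewhere

s_idealized = quantitative_specificity(accepted, possible)   # one tenth accepted
s_unspecific = quantitative_specificity(possible, possible)  # everything accepted
```

Under this convention an enzyme that accepts every substrate has s = 0, and s grows as the accepted fraction shrinks.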
[0059]The term "engineered" in combination with the term "enzyme" describes an enzyme that is comprised of different components and that has features not being conferred by the individual components alone.
[0060]The term "protein scaffold" or "scaffold protein" refers to a variety of primary, secondary and tertiary polypeptide structures.
[0061]The term "peptide sequence" indicates any peptide sequence used for insertion or substitution into or combination with a protein scaffold. Peptide sequences are usually obtained by expression from DNA sequences which can be synthesized according to well-established techniques or can be obtained from natural sources. Insertion, substitution or combination of peptide sequences with the protein scaffold are generated by insertion, substitution or combination of oligonucleotides into or with a polynucleotide encoding the protein scaffold. The term "synthetic" in combination with the term "peptide sequence" refers to peptide sequences that are not present in the protein scaffold in which the peptide sequences are inserted or substituted or with which they are combined.
[0062]The term "components" in combination with the term "engineered enzyme" refers to peptide or polypeptide sequences that are combined in the engineering of such enzymes. Such components may among others comprise one or more protein scaffolds and one or more synthetic peptide sequences. The term "library of engineered enzymes" describes a mixture of engineered enzymes, whereby every single engineered enzyme is encoded by a different polynucleotide sequence. The term "gene library" indicates a library of polynucleotides that encodes the library of engineered enzymes. The term "SDR" or "Specificity determining region" refers to a synthetic peptide sequence that provides the defined specificity when combined with the protein scaffold at sites that enable the resulting enzymes to discriminate between the target substrate and one or more other substrates. Such sites are termed "SDR sites".
[0063]The terms "tertiary structure similar to the structure of" and "similar tertiary structure" in combination with the terms "enzyme" or "protein" refer to proteins in which the type, sequence, connectivity and relative orientation of the typical secondary structural elements of a protein, e.g. alpha-helices, beta-sheets, beta-turns and loops, are similar and the proteins are therefore grouped into the same structural or topological class or fold. This includes proteins that have altered, additional or deleted structural elements of any type but otherwise unchanged topology. Examples of such structural classes are the TNF superfamily, the S1 fold or the S8 fold within the serine proteases, the GPCRs, or the α/β-barrel fold.
[0064]The term "positions that correspond structurally" indicates amino acids in proteins of similar tertiary structure that correspond structurally to each other, i.e. they are usually located within the same structural or topological element of the structure. Within the structural element they possess the same relative positions with respect to beginning and end of the structural element. If, e.g. the topological comparison of two proteins reveals two structurally corresponding sequences of different length, then amino acids within, e.g. 20% and 40% of the respective region lengths, correspond to each other structurally.
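One possible reading of this rule is a mapping by relative position within the structural element. The Python sketch below is an assumption-laden illustration, not the method of the invention: the function name, the rounding to the nearest residue and the second region (30-44) are hypothetical choices.

```python
def corresponding_position(position, region_a, region_b):
    """Map a residue in region_a to region_b by its relative position.

    Regions are inclusive (start, end) residue numbers. The residue's
    fractional offset between start and end of region_a is carried over
    to region_b and rounded to the nearest residue (an assumed convention).
    """
    start_a, end_a = region_a
    start_b, end_b = region_b
    if not start_a <= position <= end_a:
        raise ValueError("position lies outside region_a")
    span_a = end_a - start_a
    fraction = (position - start_a) / span_a if span_a else 0.0
    return start_b + round(fraction * (end_b - start_b))


# Region 18-25 of one protein versus a hypothetical longer region 30-44 of another.
first = corresponding_position(18, (18, 25), (30, 44))
inner = corresponding_position(20, (18, 25), (30, 44))
last = corresponding_position(25, (18, 25), (30, 44))
```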
[0065]The term "library of engineered enzymes" of the present invention refers to a multiplicity of enzymes or enzyme variants, which may exist as a mixture or in isolated form.
[0066]Amino acids residues are abbreviated according to the following Table 1 either in one- or in three-letter code.
TABLE 1 Amino acid abbreviations

One-letter code   Three-letter code   Amino acid
A                 Ala                 Alanine
C                 Cys                 Cysteine
D                 Asp                 Aspartic acid
E                 Glu                 Glutamic acid
F                 Phe                 Phenylalanine
G                 Gly                 Glycine
H                 His                 Histidine
I                 Ile                 Isoleucine
K                 Lys                 Lysine
L                 Leu                 Leucine
M                 Met                 Methionine
N                 Asn                 Asparagine
P                 Pro                 Proline
Q                 Gln                 Glutamine
R                 Arg                 Arginine
S                 Ser                 Serine
T                 Thr                 Threonine
V                 Val                 Valine
W                 Trp                 Tryptophan
Y                 Tyr                 Tyrosine
DETAILED DESCRIPTION OF THE INVENTION
[0067]The present invention provides engineered proteins with novel functions. In particular, the invention provides enzymes with user-definable specificities. In a particular embodiment, the invention provides proteases with user-definable specificities. Besides, the invention provides applications of such enzymes with novel, user-definable specificities for therapeutic, research, diagnostic, nutritional, personal care or industrial purposes. Moreover, the invention provides a method for generating enzymes with specificities that are not present in the components used for the engineering of such enzymes. In particular, the invention is directed to the generation of enzymes that have sequences that are essentially identical to mammalian, especially human enzymes but have different specificities. Moreover, the invention provides libraries of specific engineered enzymes with corresponding specificities encoded genetically, a method for the generation of libraries of specific engineered enzymes with corresponding specificities encoded genetically, and the application of such libraries for technical, diagnostic, nutritional, personal care or research purposes.
[0068]A first aspect of the invention discloses engineered enzymes with defined specificities. These engineered enzymes are characterized by the following components:
(a) a protein scaffold capable of catalyzing at least one chemical reaction on a substrate, and(b) one or more specificity determining regions (SDRs) located at sites in the protein scaffold that enable the resulting engineered protein to discriminate between at least one target substrate and one or more different substrates, wherein the SDRs are essentially synthetic peptide sequences.
[0069]Preferably, such defined specificity of the engineered enzymes is not conferred by the protein scaffold.
[0070]In principle, the protein scaffold can have a variety of primary, secondary and tertiary structures. The primary structure, i.e. the amino acid sequence, can be an engineered sequence or can be derived from any viral, prokaryotic or eukaryotic origin. For human therapeutic use, however, the protein scaffold is preferably of mammalian origin, and more preferably, of human origin. Furthermore, the protein scaffold is capable of catalyzing one or more chemical reactions and preferably has only a low specificity.
[0071]Preferably, derivatives of the protein scaffold are used that have modified amino acid sequences that confer improved characteristics for the applicability as protein scaffolds. Such improved characteristics comprise, but are not limited to, stability; expression or secretion yield; folding, in particular after combination of the protein scaffold with SDRs; increased or decreased sensitivity to regulators such as activators or inhibitors; immunogenicity; catalytic rate; KM or substrate affinity.
[0072]The engineered enzymes derive their quantitative specificity from the synthetic peptide sequences that are combined with the protein scaffold. Therefore, the engineered peptide sequences are acting as Specificity Determining Regions or SDRs. The number, the length and the positions of such SDRs can vary over a wide range. The number of SDRs within the scaffold is at least one, preferably more than one, more preferably between two and eleven, most preferably between two and six. The SDRs have a length between one and 50 amino acid residues, preferably a length between one and 15 amino acid residues, more preferably a length between one and six amino acid residues. Alternatively, the SDRs have a length between two and 20 amino acid residues, preferably a length between two and ten amino acid residues, more preferably a length between three and eight amino acid residues.
[0073]The inventive engineered enzymes can further be described as antibody-like protein molecules comprising constant and variable regions, but having a non-immunoglobulin backbone and having an active site (catalytic activity) in the constant region, whereby the substrate specificity of the active site is modulated by the variable region. Preferably, as in the immunoglobulin structure, the variable regions are loops of variable length and composition that interact with a target molecule.
[0074]In a particular variant of the invention, the engineered enzymes have hydrolase activity. In a preferred variant, the engineered enzymes have proteolytic activity. Particularly preferred protein scaffolds for this variant are unspecific proteases or are parts from unspecific proteases or are otherwise derived from unspecific proteases. The expressions "derived from" or "a derivative thereof" in this respect and in the following variants and embodiments refer to derivatives of proteins that are mutated at one or more amino acid positions and/or have a homology of at least 70%, preferably 90%, more preferably 95% and most preferably 99% to the original protein, and/or that are proteolytically processed, and/or that have an altered glycosylation pattern, and/or that are covalently linked to non-protein substances, and/or that are fused with further protein domains, and/or that have C-terminal and/or N-terminal truncations, and/or that have specific insertions, substitutions and/or deletions. Alternatively, "derived from" may refer to derivatives that are combinations or chimeras of two or more fragments from two or more proteins, each of which optionally comprises any or all of the aforementioned modifications. The tertiary structure of the protein scaffold can be of any type. Preferably, however, the tertiary structure belongs to one of the following structural classes: class S1 (chymotrypsin fold of the serine proteases family), class S8 (subtilisin fold of the serine proteases family), class SC (carboxypeptidase fold of the serine proteases family), class A1 (pepsin A fold of the aspartic proteases), or class C14 (caspase-1 fold of the cysteine proteases). Examples of proteases that can serve as the protein scaffold of engineered proteolytic enzymes for the use as human therapeutics are or are derived from human trypsin, human thrombin, human chymotrypsin, human pepsin, human endothiapepsin, human caspases 1 to 14, and/or human furin.
[0075]The defined specificity of the engineered proteolytic enzymes is a measure of their ability to discriminate between at least one target peptide or protein substrate and one or more further peptide or protein substrates. Preferably, the defined specificity refers to the ability to discriminate peptide or protein substrates that differ in other positions than the P1 site, more preferably, the defined specificity refers to the ability to discriminate peptide or protein substrates that differ in other positions than the P1 site and the P1' site. Most preferably, the engineered proteolytic enzymes distinguish target peptide or protein substrates at as many sites as is necessary to preferentially hydrolyse the target substrate versus other proteins. As an example, a therapeutically useful engineered proteolytic enzyme applied intravenously in the human body should be sufficiently specific to discriminate between the target substrate and any other protein in the human serum. Preferably, such an engineered proteolytic enzyme recognizes and discriminates peptide substrates at three or more amino acid positions, more preferably at four or more positions, and even more preferably at five or more amino acid positions. These positions may either be adjacent or non-adjacent.
[0076]In a first embodiment, the protein scaffold has a tertiary structure or fold equal or similar to the tertiary structure or fold of the S1 structural subclass of serine proteases, i.e. the chymotrypsin fold, and/or has at least 70% identity on the amino acid level to a protein of the S1 structural subclass of serine proteases. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 18-25, 38-48, 54-63, 73-86, 122-130, 148-156, 165-171 and 194-204 in human trypsin I, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 20-23, 41-45, 57-60, 76-83, 125-128, 150-153, 167-169 and 197-201 (numbering of amino acids according to SEQ ID NO: 1). The number of SDRs to be combined with this type of protein scaffold is preferably between 1 and 10, and more preferably between 2 and 4. Preferably, the protein scaffold is equal to or is a derivative or homologue of one or more of the following proteins: chymotrypsin, granzyme, kallikrein, trypsin, mesotrypsin, neutrophil elastase, pancreatic elastase, enteropeptidase, cathepsin, thrombin, ancrod, coagulation factor IXa, coagulation factor VIIa, coagulation factor Xa, activated protein C, urokinase, tissue-type plasminogen activator, plasmin, Desmodus-type plasminogen activator. More preferably, the protein scaffold is trypsin or thrombin or is a derivative or homologue from trypsin or thrombin. For the use as a human therapeutic, the trypsin or thrombin scaffold is most preferably of human origin in order to minimize the risk of an immune response or an allergenic reaction.
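The identity criterion can be illustrated with a small Python sketch. It assumes that the two sequences have already been aligned by some external method and that identity is counted over aligned, gap-free columns; the text does not specify how identity is calculated, so both the convention and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # columns containing a gap are skipped (assumed convention)
        compared += 1
        if residue_a == residue_b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


def meets_identity_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if the aligned pair reaches the stated identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned fragments differing at one of ten positions.
identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
```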
[0077]Preferably, derivatives with improved characteristics derived from human trypsin I or from proteins with similar tertiary structure are used. Preferred examples of such derivatives are derived from human trypsin I (SEQ ID NO: 1) and comprise one or more of the following amino acid substitutions: E56G; R78W; Y131F; A146T; C183R. It is preferred that at least one of two SDRs is inserted into human trypsin I, or a derivative thereof, between residues 42 and 43 (SDR 1) and between 123 and 124 (SDR 2), respectively (numbering of amino acids according to SEQ ID NO: 1). In addition the SDR 1 has a preferred length of 6 and the SDR 2 has a preferred length of 5 amino acids, respectively. In a preferred variant of this embodiment, the SDR 1 and SDR 2 sequences comprise one of the amino acid sequences listed in Table 2. Such engineered proteolytic enzymes have specificity for the target substrate B as exemplified in example IV.
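The insertion of two SDRs at positions given in scaffold numbering can be sketched at the sequence level as follows. This is an illustration only: the scaffold is a placeholder string rather than SEQ ID NO: 1, the two peptides are hypothetical and not taken from Table 2, and in practice the insertion is made in the encoding polynucleotide.

```python
def insert_sdrs(scaffold, insertions):
    """Insert peptides into `scaffold` using the scaffold's own numbering.

    `insertions` maps a 1-based residue number n to the peptide placed
    between residues n and n + 1. Sites are processed from the highest
    number down so that earlier insertions do not shift later ones.
    """
    result = scaffold
    for position in sorted(insertions, reverse=True):
        if not 0 < position < len(scaffold):
            raise ValueError(f"position {position} is not between two residues")
        result = result[:position] + insertions[position] + result[position:]
    return result


scaffold = "G" * 130   # placeholder of arbitrary length, NOT SEQ ID NO: 1
sdr1 = "DEFHIK"        # hypothetical 6-residue SDR 1
sdr2 = "LMNPQ"         # hypothetical 5-residue SDR 2

# SDR 1 between residues 42 and 43, SDR 2 between residues 123 and 124.
variant = insert_sdrs(scaffold, {42: sdr1, 123: sdr2})
```

Processing the sites in descending order keeps both positions expressed in the numbering of the unmodified scaffold, which matches how the insertion sites are stated above.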
[0078]In a further embodiment the protein scaffold belongs to the S8 structural subclass of serine proteases and/or has a tertiary structure similar to subtilisin E from Bacillus subtilis and/or has at least 70% identity on the amino acid level to a protein of the S8 structural subclass of serine proteases. Preferably, the scaffold belongs to the subtilisin family or the human pro-protein convertases. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 6-17, 25-29, 47-55, 59-69, 101-111, 117-125, 129-137, 139-154, 158-169, 185-195 and 204-225 in subtilisin E from Bacillus subtilis, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 59-69, 101-111, 129-137, 158-169 and 204-225 (numbering of amino acids according to SEQ ID NO:7). It is preferred that the protein scaffold is equal to or is a derivative or homologue of one or more of the following proteins: subtilisin Carlsberg; B. subtilis subtilisin E; subtilisin BPN'; B. licheniformis subtilisin; B. lentus subtilisin; Bacillus alcalophilus alkaline protease; proteinase K; kexin; human pro-protein convertase; human furin. In a preferred variant, subtilisin BPN' or one of the proteins SPC 1 to 7 is used as the protein scaffold.
[0079]In a further embodiment the protein scaffold belongs to the family of aspartic proteases and/or has a tertiary structure similar to human pepsin. Preferably, the scaffold belongs to the A1 class of proteases and/or has at least 70% identity on the amino acid level to a protein of the A1 class of proteases. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 6-18, 49-55, 74-83, 91-97, 112-120, 126-137, 159-164, 184-194, 242-247, 262-267 and 277-300 in human pepsin, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 10-15, 75-80, 114-118, 130-134, 186-191 and 280-296 (numbering of amino acids according to SEQ ID NO:11). It is preferred that the protein scaffold is equal to or is a derivative or homologue of one or more of the following proteins: pepsin, chymosin, renin, cathepsin, yapsin. Preferably, pepsin or endothiapepsin or a derivative or homologue thereof is used as the protein scaffold.
[0080]In a further embodiment the protein scaffold belongs to the cysteine protease family and/or has a tertiary structure similar to human caspase 7. Preferably the scaffold belongs to the C14 class of cysteine proteases or has at least 70% identity on the amino acid level to a protein of the C14 class of cysteine proteases. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 78-91, 144-160, 186-198, 226-243 and 271-291 in human caspase 7, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 80-86, 149-157, 190-194 and 233-238 (numbering of amino acids according to SEQ ID NO: 14). It is preferred that the protein scaffold is equal to or is a derivative or homologue of one of the caspases 1 to 9.
[0081]In a further embodiment the protein scaffold belongs to the S11 class of serine proteases or has at least 70% identity on the amino acid level to a protein of the S11 class of serine proteases and/or has a tertiary structure similar to D-alanyl-D-alanine transpeptidase from Streptomyces species K15. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 67-79, 137-150, 191-206, 212-222 and 241-251 in D-alanyl-D-alanine transpeptidase from Streptomyces species K15, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 70-75, 141-147, 195-202 and 216-220 (numbering of amino acids according to SEQ ID NO: 15). It is preferred that the D-alanyl-D-alanine transpeptidase from Streptomyces species K15 or a derivative or homologue thereof is used as the scaffold.
[0082]In a further embodiment the protein scaffold belongs to the S21 class of serine proteases or has at least 70% identity on the amino acid level to a protein of the S21 class of serine proteases and/or has a tertiary structure similar to assemblin from human cytomegalovirus. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 25-33, 64-69, 134-155, 162-169 and 217-244 in assemblin from human cytomegalovirus, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 27-31, 164-168 and 222-239 (numbering of amino acids according to SEQ ID NO:16). It is preferred that the assemblin from human cytomegalovirus or a derivative or homologue thereof is used as the scaffold.
[0083]In a further embodiment the protein scaffold belongs to the S26 class of serine proteases or has at least 70% identity on the amino acid level to a protein of the S26 class of serine proteases and/or has a tertiary structure similar to the signal peptidase from Escherichia coli. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 8-14, 57-68, 125-134, 239-254, 200-211 and 228-239 in signal peptidase from Escherichia coli, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 9-13, 60-67, 127-132 and 203-209 (numbering of amino acids according to SEQ ID NO: 17). It is preferred that the signal peptidase from Escherichia coli or a derivative or homologue thereof is used as the scaffold.
[0084]In a further embodiment the protein scaffold belongs to the S33 class of serine proteases or has at least 70% identity on the amino acid level to a protein of the S33 class of serine proteases and/or has a tertiary structure similar to the prolyl aminopeptidase from Serratia marcescens. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 47-54, 152-160, 203-212 and 297-302 in prolyl aminopeptidase from Serratia marcescens, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 50-53, 154-158 and 206-210 (numbering of amino acids according to SEQ ID NO: 18). It is preferred that the prolyl aminopeptidase from Serratia marcescens or a derivative or homologue thereof is used as the scaffold.
[0085]In a further embodiment the protein scaffold belongs to the S51 class of serine proteases or has at least 70% identity on the amino acid level to a protein of the S51 class of serine proteases and/or has a tertiary structure similar to aspartyl dipeptidase from Escherichia coli. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 8-16, 38-46, 85-92, 132-140, 159-170 and 205-211 in aspartyl dipeptidase from Escherichia coli, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 10-14, 87-90, 134-138 and 160-165 (numbering of amino acids according to SEQ ID NO: 19). It is preferred that the aspartyl dipeptidase from Escherichia coli or a derivative or homologue thereof is used as the scaffold.
[0086]In a further embodiment the protein scaffold belongs to the A2 class of aspartic proteases or has at least 70% identity on the amino acid level to a protein of the A2 class of aspartic proteases and/or has a tertiary structure similar to the protease from human immunodeficiency virus. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 5-12, 17-23, 27-30, 33-38 and 77-83 in protease from human immunodeficiency virus, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 7-10, 18-21, 34-37 and 79-82 (numbering of amino acids according to SEQ ID NO:20). It is preferred that the protease from human immunodeficiency virus, preferably HIV-1 protease, or a derivative or homologue thereof is used as the scaffold.
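The relationship between the preferred and the more preferred regions can be made concrete with a small data structure. The sketch below is a hypothetical illustration, not part of the application: it stores the two region lists given above for the HIV protease scaffold as inclusive residue ranges and looks up which preferred region encloses each more preferred region. The names `enclosing_region` and `position_in_regions` are assumptions introduced for this example.

```python
from typing import Dict, List, Optional, Tuple

Region = Tuple[int, int]  # inclusive (start, end) residue numbers

# Region lists as stated above for the HIV protease scaffold
# (numbering according to SEQ ID NO:20).
PREFERRED: List[Region] = [(5, 12), (17, 23), (27, 30), (33, 38), (77, 83)]
MORE_PREFERRED: List[Region] = [(7, 10), (18, 21), (34, 37), (79, 82)]


def enclosing_region(inner: Region, outer_regions: List[Region]) -> Optional[Region]:
    """Return the first outer region that fully contains `inner`, or None."""
    for start, end in outer_regions:
        if start <= inner[0] and inner[1] <= end:
            return (start, end)
    return None


def position_in_regions(position: int, regions: List[Region]) -> bool:
    """True if a residue number lies inside any of the inclusive regions."""
    return any(start <= position <= end for start, end in regions)


# Map each more preferred region to the preferred region that contains it.
nesting: Dict[Region, Optional[Region]] = {
    region: enclosing_region(region, PREFERRED) for region in MORE_PREFERRED
}
```

For these lists every more preferred region lies inside one of the preferred regions, for example 7-10 inside 5-12.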
[0087]In a further embodiment the protein scaffold belongs to the A26 class of aspartic proteases or has at least 70% identity on the amino acid level to a protein of the A26 class of aspartic proteases and/or has a tertiary structure similar to the omptin from Escherichia coli. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 28-40, 86-98, 150-168, 213-219 and 267-278 in omptin from Escherichia coli, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 33-38, 161-168 and 273-277 (numbering of amino acids according to SEQ ID NO:21). It is preferred that the omptin from Escherichia coli or a derivative or homologue thereof is used as the scaffold.
[0088]In a further embodiment the protein scaffold belongs to the C1 class of cysteine proteases or has at least 70% identity on the amino acid level to a protein of the C1 class of cysteine proteases and/or has a tertiary structure similar to the papain from Carica papaya. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 17-24, 61-68, 88-95, 135-142, 153-158 and 176-184 in papain from Carica papaya, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 63-66, 136-139 and 177-181 (numbering of amino acids according to SEQ ID NO: 22). It is preferred that the papain from Carica papaya or a derivative or homologue thereof is used as the scaffold.
[0089]In a further embodiment the protein scaffold belongs to the C2 class of cysteine proteases or has at least 70% identity on the amino acid level to a protein of the C2 class of cysteine proteases and/or has a tertiary structure similar to human calpain-2. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 90-103, 160-172, 193-199, 243-260, 286-294 and 316-322 in human calpain-2, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 92-101, 245-250 and 287-291 (numbering of amino acids according to SEQ ID NO:23). It is preferred that the human calpain-2 or a derivative or homologue thereof is used as the scaffold.
[0090]In a further embodiment the protein scaffold belongs to the C4 class of cysteine proteases or has at least 70% identity on the amino acid level to a protein of the C4 class of cysteine proteases and/or has a tertiary structure similar to NIa protease from tobacco etch virus. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 23-31, 112-120, 144-150, 168-176 and 205-218 in NIa protease from tobacco etch virus, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 145-149, 169-174 and 212-218 (numbering of amino acids according to SEQ ID NO:24). It is preferred that the NIa protease from tobacco etch virus (TEV protease) or a derivative or homologue thereof is used as the scaffold.
[0091]In a further embodiment the protein scaffold belongs to the C10 class of cysteine proteases or has at least 70% identity on the amino acid level to a protein of the C10 class of cysteine proteases and/or has a tertiary structure similar to the streptopain from Streptococcus pyogenes. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 81-90, 133-140, 150-164, 191-199, 219-229, 246-256, 306-312 and 330-337 in streptopain from Streptococcus pyogenes, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 82-87, 134-138, 250-254 and 331-335 (numbering of amino acids according to SEQ ID NO:25). It is preferred that the streptopain from Streptococcus pyogenes or a derivative or homologue thereof is used as the scaffold.
[0092]In a further embodiment the protein scaffold belongs to the C19 class of cysteine proteases or has at least 70% identity on the amino acid level to a protein of the C19 class of cysteine proteases and/or has a tertiary structure similar to human ubiquitin specific protease 7. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 3-15, 63-70, 80-86, 248-256, 272-283 and 292-304 in human ubiquitin specific protease 7, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 10-15, 251-255, 277-281 and 298-304 (numbering of amino acids according to SEQ ID NO:26). It is preferred that the human ubiquitin specific protease 7 or a derivative or homologue thereof is used as the scaffold.
[0093]In a further embodiment the protein scaffold belongs to the C47 class of cysteine proteases or has at least 70% identity on the amino acid level to a protein of the C47 class of cysteine proteases and/or has a tertiary structure similar to the staphopain from Staphylococcus aureus. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 15-23, 57-66, 108-119, 142-149 and 157-164 in staphopain from Staphylococcus aureus, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 17-22, 111-117, 143-147 and 159-163 (numbering of amino acids according to SEQ ID NO:27). It is preferred that the staphopain from Staphylococcus aureus or a derivative or homologue thereof is used as the scaffold.
[0094]In a further embodiment the protein scaffold belongs to the C48 class of cysteine proteases or has at least 70% identity on the amino acid level to a protein of the C48 class of cysteine proteases and/or has a tertiary structure similar to the Ulp1 endopeptidase from Saccharomyces cerevisiae. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 40-51, 108-115, 132-141, 173-179 and 597-605 in Ulp1 endopeptidase from Saccharomyces cerevisiae, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 43-49, 110-113, 133-137 and 175-178 (numbering of amino acids according to SEQ ID NO:28). It is preferred that the Ulp1 endopeptidase from Saccharomyces cerevisiae or a derivative or homologue thereof is used as the scaffold.
[0095]In a further embodiment the protein scaffold belongs to the C56 class of cysteine proteases or has at least 70% identity on the amino acid level to a protein of the C56 class of cysteine proteases and/or has a tertiary structure similar to the Pfp1 endopeptidase from Pyrococcus horikoshii. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 8-16, 40-47, 66-73, 118-125 and 147-153 in Pfp1 endopeptidase from Pyrococcus horikoshii, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 9-14, 68-71, 120-123 and 148-151 (numbering of amino acids according to SEQ ID NO:29). It is preferred that the Pfp1 endopeptidase from Pyrococcus horikoshii or a derivative or homologue thereof is used as the scaffold.
[0096]In a further embodiment the protein scaffold belongs to the M4 class of metallo proteases or has at least 70% identity on the amino acid level to a protein of the M4 class of metallo proteases and/or has a tertiary structure similar to thermolysin from Bacillus thermoproteolyticus. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 106-118, 125-130, 152-160, 197-204, 210-213 and 221-229 in thermolysin from Bacillus thermoproteolyticus, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 108-115, 126-129, 199-203 and 223-227 (numbering of amino acids according to SEQ ID NO:30). It is preferred that the thermolysin from Bacillus thermoproteolyticus or a derivative or homologue thereof is used as the scaffold.
[0097]In a further embodiment the protein scaffold belongs to the M10 class of metallo proteases or has at least 70% identity on the amino acid level to a protein of the M10 class of metallo proteases and/or has a tertiary structure similar to human collagenase. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 2-7, 68-79, 85-90, 107-111 and 135-141 in human collagenase, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 3-6, 71-78 and 136-140 (numbering of amino acids according to SEQ ID NO:31). It is preferred that human collagenase or a derivative or homologue thereof is used as the scaffold.
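The insertion of SDR peptides at one or more scaffold positions, as described in the embodiments above, can be sketched at the sequence level. The following Python example is a minimal illustration under stated assumptions: the scaffold is treated as a plain one-letter amino acid string, each SDR is added after a given 1-based residue position without removing scaffold residues, and the sequences are invented toy data. Whether an SDR replaces residues or is added between them is not specified in this section, and the function name `insert_sdrs` is hypothetical.

```python
from typing import Dict


def insert_sdrs(scaffold: str, insertions: Dict[int, str]) -> str:
    """Insert each peptide after the given 1-based scaffold position.

    Positions refer to the numbering of the unmodified scaffold. Insertions
    are applied from the highest position to the lowest so that an earlier
    insertion never shifts the numbering of a later one.
    """
    result = scaffold
    for position in sorted(insertions, reverse=True):
        if not 0 <= position <= len(scaffold):
            raise ValueError(f"position {position} is outside the scaffold")
        result = result[:position] + insertions[position] + result[position:]
    return result


# Hypothetical toy scaffold and SDR peptides, not taken from any SEQ ID NO.
toy_scaffold = "MKTAYIAKQR"
engineered = insert_sdrs(toy_scaffold, {3: "GSG", 7: "PP"})
```

Applying the insertions in descending order keeps all positions expressed in the original scaffold numbering, which matches how the regions are quoted in the text.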
[0098]It is further preferred that the engineered enzymes have glycosidase activity. A particularly suited protein scaffold for this variant is a glycosylase or is derived from a glycosylase. Preferably, the tertiary structure belongs to one of the following structural classes: class GH13, GH7, GH12, GH11, GH10, GH28, GH26, and GH18 (beta/alpha)8 barrel.
[0099]In a first embodiment the protein scaffold belongs to the GH13 class of glycosylases or has at least 70% identity on the amino acid level to a protein of the GH13 class of glycosylases and/or has a tertiary structure similar to human pancreatic alpha-amylase. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 50-60, 100-110, 148-167, 235-244, 302-310 and 346-359 in human pancreatic alpha-amylase, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 51-58, 148-155 and 303-309 (numbering of amino acids according to SEQ ID NO:32). It is preferred that human pancreatic alpha-amylase or a derivative or homologue thereof is used as the scaffold.
[0100]In a further embodiment the protein scaffold belongs to the GH7 class of glycosylases or has at least 70% identity on the amino acid level to a protein of the GH7 class of glycosylases and/or has a tertiary structure similar to cellulase from Trichoderma reesei. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 47-56, 93-104, 173-182, 215-223, 229-236 and 322-334 in cellulase from Trichoderma reesei, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 175-180, 218-222 and 324-332 (numbering of amino acids according to SEQ ID NO:33). It is preferred that cellulase from Trichoderma reesei or a derivative or homologue thereof is used as the scaffold.
[0101]In a further embodiment the protein scaffold belongs to the GH12 class of glycosylases or has at least 70% identity on the amino acid level to a protein of the GH12 class of glycosylases and/or has a tertiary structure similar to cellulase from Aspergillus niger. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 18-28, 55-60, 106-113, 126-132 and 149-159 in cellulase from Aspergillus niger, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 20-26, 56-59, 108-112 and 151-156 (numbering of amino acids according to SEQ ID NO:34). It is preferred that cellulase from Aspergillus niger or a derivative or homologue thereof is used as the scaffold.
[0102]In a further embodiment the protein scaffold belongs to the GH11 class of glycosylases or has at least 70% identity on the amino acid level to a protein of the GH11 class of glycosylases and/or has a tertiary structure similar to xylanase from Aspergillus niger. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 7-14, 33-39, 88-97, 114-126 and 158-167 in xylanase from Aspergillus niger, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 20-26, 56-59, 108-112 and 151-156 (numbering of amino acids according to SEQ ID NO:35). It is preferred that xylanase from Aspergillus niger or a derivative or homologue thereof is used as the scaffold.
[0103]In a further embodiment the protein scaffold belongs to the GH10 class of glycosylases or has at least 70% identity on the amino acid level to a protein of the GH10 class of glycosylases and/or has a tertiary structure similar to xylanase from Streptomyces lividans. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 21-29, 42-50, 84-92, 130-136, 206-217 and 269-278 in xylanase from Streptomyces lividans, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 43-49, 86-90, 208-213 and 271-276 (numbering of amino acids according to SEQ ID NO:36). It is preferred that xylanase from Streptomyces lividans or a derivative or homologue thereof is used as the scaffold.
[0104]In a further embodiment the protein scaffold belongs to the GH28 class of glycosylases or has at least 70% identity on the amino acid level to a protein of the GH28 class of glycosylases and/or has a tertiary structure similar to pectinase from Aspergillus niger. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 82-88, 118-126, 171-178, 228-236, 256-264 and 289-299 in pectinase from Aspergillus niger, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 116-124, 174-178 and 291-296 (numbering of amino acids according to SEQ ID NO:37). It is preferred that pectinase from Aspergillus niger or a derivative or homologue thereof is used as the scaffold.
[0105]In a further embodiment the protein scaffold belongs to the GH26 class of glycosylases or has at least 70% identity on the amino acid level to a protein of the GH26 class of glycosylases and/or has a tertiary structure similar to mannanase from Pseudomonas cellulosa. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 75-83, 113-125, 174-182, 217-224, 247-254, 324-332 and 325-340 in mannanase from Pseudomonas cellulosa, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 115-123, 176-180, 286-291 and 328-337 (numbering of amino acids according to SEQ ID NO:38). It is preferred that mannanase from Pseudomonas cellulosa or a derivative or homologue thereof is used as the scaffold.
[0106]In a further embodiment the protein scaffold belongs to the GH18 (beta/alpha)8 barrel class of glycosylases or has at least 70% identity on the amino acid level to a protein of the GH18 class of glycosylases and/or has a tertiary structure similar to chitinase from Bacillus circulans. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 21-29, 57-65, 130-136, 176-183, 221-229, 249-257 and 327-337 in chitinase from Bacillus circulans, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 59-63, 178-181, 250-254 and 330-336 (numbering of amino acids according to SEQ ID NO:39). It is preferred that chitinase from Bacillus circulans or a derivative or homologue thereof is used as the scaffold.
[0107]It is further preferred that the engineered enzymes have esterhydrolase activity. Preferably, the protein scaffold for this variant has lipase, phosphatase, phytase, or phosphodiesterase activity.
[0108]In a first embodiment the protein scaffold belongs to the GX class of esterases or has at least 70% identity on the amino acid level to a protein of the GX class of esterases and/or has a tertiary structure similar to the structure of the lipase B from Candida antarctica. Preferably, the scaffold has lipase activity. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 139-148, 188-195, 216-224, 256-266, 272-287 in lipase B from Candida antarctica, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 141-146, 218-222, 259-263 and 275-283 (numbering of amino acids according to SEQ ID NO:40). It is preferred that lipase B from Candida antarctica or a derivative or homologue thereof is used as the scaffold.
[0109]In a further embodiment the protein scaffold belongs to the GX class of esterases or has at least 70% identity on the amino acid level to a protein of the GX class of esterases and/or has a tertiary structure similar to the pancreatic lipase from guinea pig. Preferably, the scaffold has lipase activity. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 78-90, 91-100, 112-120, 179-186, 207-218, 238-247 and 248-260 in pancreatic lipase from guinea pig, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 80-87, 114-118, 209-215 and 239-246 (numbering of amino acids according to SEQ ID NO:41). It is preferred that pancreatic lipase from guinea pig or a derivative or homologue thereof is used as the scaffold.
[0110]In a further embodiment the protein scaffold has a tertiary structure similar to the structure of the alkaline phosphatase from Escherichia coli or has at least 70% identity on the amino acid level to a protein that has a tertiary structure similar to the structure of the alkaline phosphatase from Escherichia coli. Preferably, the scaffold has phosphatase activity. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 110-122, 187-142, 170-175, 186-193, 280-287 and 425-435 in alkaline phosphatase from Escherichia coli, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 171-174, 187-191, 282-286 and 426-433 (numbering of amino acids according to SEQ ID NO:42). It is preferred that alkaline phosphatase from Escherichia coli or a derivative or homologue thereof is used as the scaffold.
[0111]In a further embodiment the protein scaffold has a tertiary structure similar to the structure of the bovine pancreatic deoxyribonuclease I or has at least 70% identity on the amino acid level to a protein that has a tertiary structure similar to the structure of the bovine pancreatic deoxyribonuclease I. Preferably, the scaffold has phosphodiesterase activity. More preferably, a nuclease, and most preferably, an unspecific endonuclease or a derivative thereof is used as the scaffold. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 14-21, 41-47, 72-77, 97-111, 135-143, 171-178, 202-209 and 242-251 in bovine pancreatic deoxyribonuclease I, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 16-19, 42-46, 136-141 and 172-176 (numbering of amino acids according to SEQ ID NO:43). It is preferred that bovine pancreatic deoxyribonuclease I or human deoxyribonuclease I or a derivative or homologue thereof is used as the scaffold.
[0112]It is further preferred that the engineered enzyme has transferase activity. A particularly suited protein scaffold for this variant is a glycosyl-, a phospho- or a methyltransferase, or is a derivative thereof. Particularly preferred protein scaffolds for this variant are glycosyltransferases or are derived from glycosyltransferases. The tertiary structure of the protein scaffold can be of any type. Preferably, however, the tertiary structure belongs to one of the following structural classes: GH13 and GT1.
[0113]In a first embodiment the protein scaffold belongs to the GH13 class of transferases or has at least 70% identity on the amino acid level to a protein of the GH13 class of transferases and/or has a tertiary structure similar to the structure of the cyclomaltodextrin glucanotransferase from Bacillus circulans. Preferably, the scaffold has transferase activity, and more preferably a glycosyltransferase is used as the scaffold. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 38-48, 85-94, 142-154, 178-186, 259-266, 331-340 and 367-377 in cyclomaltodextrin glucanotransferase from Bacillus circulans, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 87-92, 180-185, 261-264 and 269-275 (numbering of amino acids according to SEQ ID NO:44). It is preferred that cyclomaltodextrin glucanotransferase from Bacillus circulans or a derivative or homologue thereof is used as the scaffold.
[0114]In a further embodiment the protein scaffold belongs to the GT1 class of transferases or has at least 70% identity on the amino acid level to a protein of the GT1 class of transferases and/or has a tertiary structure similar to the structure of the glycosyltransferase from Amycolatopsis orientalis A82846. Preferably the scaffold has transferase activity, and more preferably glycosyltransferase activity. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 58-74, 130-138, 185-193, 228-236 and 314-323 in glycosyltransferase from Amycolatopsis orientalis A82846, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 61-71, 230-234 and 316-321 (numbering of amino acids according to SEQ ID NO:45). It is preferred that the glycosyltransferase from Amycolatopsis orientalis A82846 or a derivative or homologue thereof is used as the scaffold.
[0115]It is further preferred that the engineered enzymes have oxidoreductase activity. A particularly suited protein scaffold for this variant is a monooxygenase, a dioxygenase or an alcohol dehydrogenase, or a derivative thereof. The tertiary structure of the protein scaffold can be of any type.
[0116]In a first embodiment the protein scaffold has a tertiary structure similar to the structure of the 2,3-dihydroxybiphenyl dioxygenase from Pseudomonas sp. or has at least 70% identity on the amino acid level to a protein that has a tertiary structure similar to the structure of the 2,3-dihydroxybiphenyl dioxygenase from Pseudomonas sp. Preferably, the scaffold has dioxygenase activity. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 172-185, 198-206, 231-237, 250-259 and 282-287 in 2,3-dihydroxybiphenyl dioxygenase from Pseudomonas sp., and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 175-182, 200-204, 252-257 and 284-287 (numbering of amino acids according to SEQ ID NO:46). It is preferred that the 2,3-dihydroxybiphenyl dioxygenase from Pseudomonas sp. or a derivative or homologue thereof is used as the scaffold.
[0117]In a further embodiment the protein scaffold has a tertiary structure similar to the structure of the catechol dioxygenase from Acinetobacter sp. or has at least 70% identity on the amino acid level to a protein that has a tertiary structure similar to the structure of the catechol dioxygenase from Acinetobacter sp. Preferably, the scaffold has dioxygenase activity, and more preferably catechol dioxygenase activity. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 66-72, 105-112, 156-171 and 198-207 in catechol dioxygenase from Acinetobacter sp., and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 107-110, 161-171 and 201-205 (numbering of amino acids according to SEQ ID NO:47). It is preferred that the catechol dioxygenase from Acinetobacter sp. or a derivative or homologue thereof is used as the scaffold.
[0118]In a further embodiment the protein scaffold has a tertiary structure similar to the structure of the camphor-5-monooxygenase from Pseudomonas putida or has at least 70% identity on the amino acid level to a protein that has a tertiary structure similar to the structure of the camphor-5-monooxygenase from Pseudomonas putida. Preferably, the scaffold has monooxygenase activity, and more preferably camphor monooxygenase activity. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 26-31, 57-63, 84-98, 182-191, 242-256, 292-299 and 392-399 in camphor-5-monooxygenase from Pseudomonas putida, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 85-96, 183-188, 244-253, 293-298 and 393-398 (numbering of amino acids according to SEQ ID NO:48). It is preferred that the camphor-5-monooxygenase from Pseudomonas putida or a derivative or homologue thereof is used as the scaffold.
[0119]In a further embodiment the protein scaffold has a tertiary structure similar to the structure of the alcohol dehydrogenase from Equus caballus or has at least 70% identity on the amino acid level to a protein that has a tertiary structure similar to the structure of the alcohol dehydrogenase from Equus caballus. Preferably, the scaffold has alcohol dehydrogenase activity. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 49-63, 111-112, 294-301 and 361-369 in alcohol dehydrogenase from Equus caballus, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 51-61 and 295-299 (numbering of amino acids according to SEQ ID NO:49). It is preferred that the alcohol dehydrogenase from Equus caballus or a derivative or homologue thereof is used as the scaffold.
[0120]It is further preferred that the engineered enzymes have lyase activity. A particularly suited protein scaffold for this variant is an oxoacid lyase or a derivative thereof. Particularly preferred protein scaffolds for this variant are aldolases or synthases, or derivatives thereof. The tertiary structure of the protein scaffold can be of any type, but a (beta/alpha)8 barrel structure is preferred.
[0121]In a first embodiment the protein scaffold has a tertiary structure similar to the structure of the N-acetyl-D-neuraminic acid aldolase from Escherichia coli or has at least 70% identity on the amino acid level to a protein that has a tertiary structure similar to the structure of the N-acetyl-D-neuraminic acid aldolase from Escherichia coli. Preferably, the scaffold has aldolase activity. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 45-55, 78-87, 105-113, 137-146, 164-171, 187-193, 205-210, 244-255 and 269-276 in N-acetyl-D-neuraminic acid aldolase from Escherichia coli, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 45-52, 138-144, 189-192, 247-253 and 271-275 (numbering of amino acids according to SEQ ID NO:50). It is preferred that the N-acetyl-D-neuraminic acid aldolase from Escherichia coli or a derivative or homologue thereof is used as the scaffold.
[0122]In a further embodiment the protein scaffold has a tertiary structure similar to the structure of the tryptophan synthase from Salmonella typhimurium or has at least 70% identity on the amino acid level to a protein that has a tertiary structure similar to the structure of the tryptophan synthase from Salmonella typhimurium. Preferably, the scaffold has synthase activity. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 56-63, 127-134, 154-161, 175-193, 209-216 and 230-240 in tryptophan synthase from Salmonella typhimurium, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 57-62, 155-160, 178-190 and 210-215 (numbering of amino acids according to SEQ ID NO:51). It is preferred that the tryptophan synthase from Salmonella typhimurium or a derivative or homologue thereof is used as the scaffold.
[0123]It is further preferred that the engineered enzymes have isomerase activity. A particularly suited protein scaffold for this variant is an isomerase converting aldoses or ketoses, or a derivative thereof.
[0124]In a first embodiment, the protein scaffold has a tertiary structure similar to the structure of the xylose isomerase from Actinoplanes missouriensis or has at least 70% identity on the amino acid level to a protein that has a tertiary structure similar to the structure of the xylose isomerase from Actinoplanes missouriensis. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 18-31, 92-103, 136-147, 178-188 and 250-257 in xylose isomerase from Actinoplanes missouriensis, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 20-27, 92-99 and 180-186 (numbering of amino acids according to SEQ ID NO:52). It is preferred that the xylose isomerase from Actinoplanes missouriensis or a derivative or homologue thereof is used as the scaffold.
[0125]It is further preferred that the engineered enzymes have ligase activity. A particularly suited protein scaffold for this variant is a DNA ligase, or is a derivative thereof.
[0126]In a first embodiment, the protein scaffold has a tertiary structure similar to the structure of the DNA ligase from Bacteriophage T7 or has at least 70% identity on the amino acid level to a protein that has a tertiary structure similar to the structure of the DNA-ligase from Bacteriophage T7. It is preferred that SDRs are inserted into the protein scaffold at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 52-60, 94-108, 119-131, 241-248, 255-263 and 302-318 in DNA ligase from Bacteriophage T7, and more preferably at one or more positions from the group of positions that correspond structurally or by amino acid sequence homology to the regions 96-106, 121-129, 256-262 and 304-316 (numbering of amino acids according to SEQ ID NO:53). It is preferred that the DNA ligase from Bacteriophage T7 or a derivative or homologue thereof is used as the scaffold.
[0127]A second aspect of the invention is directed to the application of engineered enzymes with specificities for therapeutic, research, diagnostic, nutritional, personal care or industrial purposes. The application comprises at least the following steps:
[0128](a) identification of a target peptide substrate whose hydrolysis has a positive effect in connection with the intended purpose, such as curing a disease, diagnosing a disease, processing of ingredients for human or animal nutrition, or other technical processes;
[0129](b) provision of an engineered enzyme, the enzyme being specific for the target peptide identified in step (a); and
[0130](c) use of the enzyme as provided in step (b) for the intended purpose.
[0131]In a first variant of this aspect of the invention, the engineered enzyme is used as a therapeutic means to inactivate a disease-related target substrate. This application comprises at least the following steps:
[0132](a) identification of a target substrate whose function is connected to a disease and whose inactivation has a positive effect in connection with the disease, and determination of a target site within the target substrate characterized by the fact that modification at the target site leads to the inactivation of the target substrate;
[0133](b) provision of an engineered enzyme, the enzyme being specific for the target site identified in step (a); and
[0134](c) use of the enzyme for the inactivation of the target substrate inside or outside the human body.
[0135]In a preferred embodiment the scaffold of the engineered enzyme provided in step (b) is of human origin in order to avoid or reduce immunogenicity or allergenic effects associated with the application of the enzyme in the human body. In a more preferred embodiment of this variant, the scaffold is that of a human protease and the modification is hydrolysis of a target site in a protein target. Preferably, the hydrolysis leads to the activation or inactivation of the peptide or protein target. Potential peptide or protein targets include: cytokines, growth factors, peptide hormones, interleukins, interferons, enzymes from the coagulation cascade, serpins, immunoglobulins, soluble or membrane-bound receptors, cellular or viral surface proteins, peptide drugs, protein drugs.
[0136]A particularly preferred embodiment is based on the finding that the engineered enzyme is capable of cleaving human tumor necrosis factor-alpha (TNF-α). The engineered enzymes or the fusion protein can thus be used for preparing medicaments for the treatment of inflammatory diseases (as well as other diseases connected with TNF-α). Preferably, said engineered enzyme or said fusion protein is capable of specifically inactivating human tumor necrosis factor-alpha (hTNF-α), more preferably said engineered enzyme or said fusion protein is capable of hydrolysing the peptide bond between positions 31/32, 32/33, 44/45, 87/88, 128/129 and/or 141/142 (most preferred between positions 31/32 and 32/33) in hTNF-α (SEQ ID NO:96).
[0137]In a further embodiment, the target substrate is a pro-drug which is activated by the engineered enzyme. In a particular embodiment of this variant, the engineered enzyme has proteolytic activity and the target substrate is a protein target which is proteolytically activated. Examples of such pro-drugs are pro-proteins such as the inactive forms of coagulation factors. In another particular variant, the engineered enzyme is an oxidoreductase and the target substrate is a chemical that can be activated by oxidation.
[0138]In a second variant of this aspect of the invention, the engineered enzyme is used as a technical means in order to catalyze an industrially or nutritionally relevant reaction with defined specificity. In a particular embodiment of this variant the engineered enzyme has proteolytic activity, the catalyzed reaction is a proteolytic processing, and the engineered enzyme specifically hydrolyses one or more industrially or nutritionally relevant protein substrates. In a preferred embodiment of this variant the engineered enzyme hydrolyses one or more industrially or nutritionally relevant protein substrates at specific sites, thereby leading to industrially or nutritionally desired product properties such as texture, taste or precipitation characteristics. In a further particular embodiment of this variant, the engineered enzyme catalyzes the hydrolysis of glycosidic bonds (glycosidase or glycosylase activity). Then, preferably, the catalyzed reaction is a polysaccharide processing, and the engineered enzyme specifically hydrolyses one or more industrially, technically or nutritionally relevant polysaccharide substrates. In a further particular embodiment of this variant, the engineered enzyme catalyzes the hydrolysis of triglyceride esters or lipids (lipase activity). Then, preferably, the catalyzed reaction is a lipid processing step, and the engineered enzyme specifically hydrolyses one or more industrially, technically or nutritionally relevant lipid substrates. In a further particular variant of this embodiment, the engineered enzyme catalyzes the oxidation or reduction of substrates (oxidoreductase activity). Then, preferably, the engineered enzyme specifically oxidizes or reduces one or more industrially, technically or nutritionally relevant chemical substrates.
[0139]A third aspect of the invention is directed to a method for generating engineered enzymes with specificities that are qualitatively and/or quantitatively novel in combination with the protein scaffold. The inventive method comprises at least the following steps:
[0140](a) providing a protein scaffold capable of catalyzing at least one chemical reaction on at least one target substrate,
[0141](b) generating a library of engineered enzymes or isolated engineered enzymes by combining the protein scaffold from step (a) with one or more fully or partially random peptide sequences at sites in the protein scaffold that enable the resulting engineered enzyme to discriminate between at least one target substrate and one or more different substrates, and
[0142](c) selecting out of the library of engineered enzymes generated in step (b) one or more enzymes that have defined specificities towards at least one target substrate.
[0143]In a first variant of this aspect of the invention, the inventive method comprises at least the following steps:
[0144](a) providing a protein scaffold capable of catalyzing at least one chemical reaction on at least one target substrate,
[0145](b) generating a library of engineered enzymes or isolated engineered enzymes by inserting into the protein scaffold from step (a) one or more fully or partially random peptide sequences at sites in the protein scaffold that enable the resulting engineered enzyme to discriminate between at least one target substrate and one or more different substrates, and
[0146](c) selecting out of the library of engineered enzymes generated in step (b) one or more enzymes that have defined specificities towards at least one target substrate.
[0147]Preferably, the positions at which the one or more fully or partially random peptide sequences are combined with or inserted into the protein scaffold are identified prior to the combination or insertion.
[0148]The number of insertions or other combinations of fully or partially random peptide sequences as well as their length may vary over a wide range. The number is at least one, preferably more than one, more preferably between two and eleven, most preferably between two and six. The length of such fully or partially random peptide sequences is usually less than 50 amino acid residues. Preferably, the length is between one and 15 amino acid residues, more preferably between one and six amino acid residues. Alternatively, the length is between two and 20 amino acid residues, preferably between two and ten amino acid residues, more preferably between three and eight amino acid residues.
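The effect of the number and length of random peptide sequences on the theoretical library size can be illustrated with a short calculation. The following is a minimal sketch only, assuming fully random positions drawn from the 20 canonical amino acids; the function name `library_diversity` is hypothetical and not part of the disclosure.

```python
def library_diversity(insert_lengths, alphabet_size=20):
    """Count the distinct peptide combinations for fully random inserts.

    insert_lengths: one length (in residues) per inserted random sequence.
    alphabet_size: assumed number of amino acids allowed per position.
    """
    total = 1
    for length in insert_lengths:
        total *= alphabet_size ** length
    return total


# One fully random insert of six residues
print(library_diversity([6]))
# Two fully random inserts of six residues each
print(library_diversity([6, 6]))
```

Under these assumptions the theoretical diversity grows exponentially with the total number of randomized residues, which is one reason partially random sequences and short inserts may be easier to sample in practice.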
[0149]Preferably such insertions or other combinations are performed on the DNA level, using polynucleotides encoding such protein scaffolds and polynucleotides or oligonucleotides encoding such fully or partially random peptide sequences.
[0150]Optionally, steps (a) to (c) are repeated cyclically, whereby enzymes selected in step (c) serve as the protein scaffold in step (a) of a further cycle, and randomized peptide sequences are either inserted or, alternatively, substituted for peptide sequences that have been inserted in former cycles. Thereby, the number of inserted peptide sequences is either constant or increases over the cycles. The cycles are repeated until one or more enzymes with the intended specificities are generated.
[0151]Moreover, during or after one or more rounds of steps (a) to (c), the scaffold may be mutated at one or more positions in order to make the scaffold more acceptable for the combination with SDR sequences, and/or to increase catalytic activity at a specific pH and temperature, and/or to change the glycosylation pattern, and/or to decrease sensitivity towards enzyme inhibitors, and/or to change enzyme stability.
[0152]In a second variant of this aspect of the invention, the inventive method comprises at least the following steps:
(a) providing a first protein scaffold fragment,
(b) connecting said protein scaffold fragment via a peptide linkage with a first SDR, and optionally
(c) connecting the product of step (b) via a peptide linkage with a further SDR peptide or with a further protein scaffold fragment, and optionally
(d) repeating step (c) for as many cycles as necessary in order to generate a sufficiently specific enzyme, and
(e) selecting out of the population generated in steps (a)-(d) one or more enzymes that have the desired specificities toward the one or more target substrates.
Protein scaffold fragment means a part of the sequence of a protein scaffold. A protein scaffold is comprised of at least two protein scaffold fragments.
[0153]In a third variant of this aspect of the invention, the protein scaffold, the SDRs and the engineered enzyme are encoded by a DNA sequence and an expression system is used in order to produce the protein. In an alternative variant, the protein scaffold, the SDRs and/or the engineered enzyme are chemically synthesized from peptide building blocks.
[0154]In a fourth variant of this aspect of the invention, the inventive method comprises at least the following steps:
(a) providing a polynucleotide encoding a protein scaffold capable of catalyzing one or more chemical reactions on one or more target substrates;
(b) combining one or more fully or partially random oligonucleotide sequences with the polynucleotide encoding the protein scaffold, the fully or partially random oligonucleotide sequences being located at sites in the polynucleotide that enable the encoded engineered enzyme to discriminate between the one or more target substrates and one or more other substrates; and
(c) selecting out of the population generated in step (b) one or more polynucleotides that encode enzymes that have the defined specificities toward the one or more target substrates.
[0155]Any enzyme can serve as the protein scaffold in step (a). It can be a naturally occurring enzyme, a variant or a truncated derivative thereof, or an engineered enzyme. For human therapeutic use, the protein scaffold is preferably a mammalian enzyme, and more preferably a human enzyme. In that aspect, the invention is directed to a method for the generation of essentially mammalian, especially of essentially human enzymes with specificities that are different from specificities of any enzyme encoded in mammalian genomes or in the human genome, respectively.
[0156]According to the invention, the protein scaffold provided in step (a) of this aspect must be capable of catalyzing one or more chemical reactions on a target substrate. Therefore, a protein scaffold is selected from the group of potential protein scaffolds by its activity on the target substrate.
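Step (b) of this variant can be sketched in silico as follows. This is an illustrative sketch only: the helper names `insert_random_codons` and `make_library`, the toy scaffold sequence and the choice of fully random nucleotides at every position are assumptions made for the example and are not taken from the disclosure. Fully random codons can also encode stop codons, so a practical design might restrict the randomization.

```python
import random


def insert_random_codons(scaffold_dna, codon_index, n_codons, rng):
    """Insert n_codons fully random codons after the first codon_index codons."""
    if len(scaffold_dna) % 3 != 0:
        raise ValueError("scaffold length must be a multiple of three")
    cut = codon_index * 3  # nucleotide position of the insertion site
    insert = "".join(rng.choice("ACGT") for _ in range(n_codons * 3))
    return scaffold_dna[:cut] + insert + scaffold_dna[cut:]


def make_library(scaffold_dna, codon_index, n_codons, size, seed=0):
    """Generate `size` variants sharing one insertion site (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so that the sketch is reproducible
    return [
        insert_random_codons(scaffold_dna, codon_index, n_codons, rng)
        for _ in range(size)
    ]


# Toy scaffold of three codons; two random codons inserted after the first codon
for variant in make_library("ATGGCTAAA", codon_index=1, n_codons=2, size=5):
    print(variant)
```

Each variant keeps the reading frame of the scaffold because the insert length is a multiple of three nucleotides.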
[0157]In a preferred variant of this aspect of the invention, a protein scaffold with hydrolase activity is used. Preferably, a protein scaffold with proteolytic activity is used, and more preferably, a protease with very low specificity having basic activity on the target substrate is used as the protein scaffold. Examples of proteases from different structural classes with low substrate specificity are Papain, Trypsin, Chymotrypsin, Subtilisin, SET (trypsin-like serine protease from Streptomyces erythraeus), Elastase, Cathepsin G or Chymase. Before being employed as the protein scaffold, the amino acid sequence of the protease may be modified in order to change protein properties other than specificity, e.g. catalytic activity, stability, inhibitor sensitivity, or expression yield, essentially as described in WO 92/18645, or in order to change specificity, essentially as described in EP 02020576.3 and PCT/EP03/04864.
[0158]Another option for a feasible protein scaffold are lipases. Hepatic lipase, lipoprotein lipase and pancreatic lipase belong to the "lipoprotein lipase superfamily", which in turn is an example of the GX-class of lipases (M. Fischer, J. Pleiss (2003), Nucl. Acid. Res., 31, 319-321). The substrate specificity of lipases can be characterized by their relative activity towards triglycerol esters of fatty acids and phospholipids, bearing a charged head group. Alternatively, other hydrolases such as esterases, glycosylases, amidases, or nitrilases may be used as scaffolds.
[0159]Transferases are also feasible protein scaffolds. Glycosyltransferases are involved in many biological syntheses involving a variety of donors and acceptors. Alternatively, the protein scaffold may have ligase, lyase, oxidoreductase, or isomerase activity.
[0160]In a first embodiment, the one or more fully or partially random peptide sequences are inserted at specific sites in the protein scaffold. These insertion sites are characterized by the fact that the inserted peptide sequences can act as discriminators between different substrates, i.e. as Specificity Determining Regions or SDRs. Such insertion sites can be identified by several approaches. Preferably, insertion sites are identified by analysis of the three-dimensional structure of the protein scaffolds, by comparative analysis of the primary sequences of the protein scaffold with other enzymes having different quantitative specificities, or experimentally by techniques such as alanine scanning, random mutagenesis, or random deletion, or by any combination thereof.
[0161]A first approach to identify insertion sites for SDRs is based on the three-dimensional structure of the protein scaffold as it can be obtained by x-ray crystallography or by nuclear magnetic resonance studies. Structural alignment of the protein scaffold in comparison with other enzymes of the same structural class but having different quantitative specificities reveals regions of high structural similarity and regions with low structural similarity. Such an analysis can for example be done using public software such as Swiss PDB viewer (Guex, N. and Peitsch, M. C. (1997) Electrophoresis 18, 2714-2723). Regions of low structural similarity are preferred SDR insertion sites.
[0162]In a second approach to identify insertion sites for SDRs, three-dimensional structures of the scaffold protein in complex with competitive inhibitors or substrate analogs are analysed. It is assumed that the binding site of a competitive inhibitor significantly overlaps with the binding site of the substrate. In that case, atoms of the protein that are within a certain distance of atoms of the inhibitor are likely to be in a similar distance to the substrate as well. Choosing a short distance, e.g. <5 Å, will result in an ensemble of protein atoms that are in close contact with the substrate. These residues would constitute the first shell contacts and are therefore preferred insertion sites for SDRs. Once first shell contacts have been identified, second shell contacts can be found by repeating the distance analysis starting from first shell atoms. In yet another alternative of the invention the distance analysis described above is performed starting from the active site residues.
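The distance analysis described above can be sketched as follows. This is a minimal illustration, assuming atom coordinates in Å have already been extracted from a structure file; the function names, the residue numbers and the toy coordinates are hypothetical and chosen only for the example.

```python
import math


def atoms_near(query_coords, protein_atoms, cutoff=5.0):
    """Return protein atoms lying closer than `cutoff` (in Å) to any query atom.

    protein_atoms: list of (residue_number, (x, y, z)) tuples.
    """
    return [
        (res, xyz)
        for res, xyz in protein_atoms
        if any(math.dist(xyz, q) < cutoff for q in query_coords)
    ]


def contact_shells(inhibitor_coords, protein_atoms, cutoff=5.0):
    """Return (first_shell, second_shell) as sets of residue numbers."""
    first_atoms = atoms_near(inhibitor_coords, protein_atoms, cutoff)
    first = {res for res, _ in first_atoms}
    # Repeat the analysis starting from the first shell atoms, ignoring
    # residues that already belong to the first shell.
    remaining = [(res, xyz) for res, xyz in protein_atoms if res not in first]
    second_atoms = atoms_near([xyz for _, xyz in first_atoms], remaining, cutoff)
    return first, {res for res, _ in second_atoms}


# Toy data: one inhibitor atom at the origin and four protein atoms
inhibitor = [(0.0, 0.0, 0.0)]
protein = [
    (57, (3.0, 0.0, 0.0)),
    (102, (7.0, 0.0, 0.0)),
    (195, (0.0, 4.0, 0.0)),
    (214, (20.0, 0.0, 0.0)),
]
first_shell, second_shell = contact_shells(inhibitor, protein)
print(sorted(first_shell), sorted(second_shell))
```

The same function could be started from the coordinates of active site residues instead of inhibitor atoms, corresponding to the alternative mentioned above.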
[0163]In a third approach to identify insertion sites for SDRs, the primary sequence of the scaffold protein is aligned with other enzymes of the same structural class but having different quantitative specificities using an alignment algorithm. Examples of such alignment algorithms are published (Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 215:403-410; "Statistical methods in Bioinformatics: an introduction" by Ewens, W. & Grant, G. R. 2001, Springer, New York). Such an alignment may reveal conserved and non-conserved regions with varying sequence homology, and, in particular, additional sequence elements in one or more enzymes compared to the scaffold protein. Conserved regions are more likely to contribute to phenotypes shared among the different proteins, e.g. stabilizing the three-dimensional fold. Non-conserved regions and, in particular, additional sequences in enzymes with quantitatively higher specificity (Turner, R. et al. (2002) J. Biol. Chem., 277, 33068-33074) are preferred insertion sites for SDRs.
[0164]For proteases currently five families are known, namely aspartic-, cysteine-, serine-, metallo- and threonine proteases. Each family includes groups of proteases that share a similar fold. Crystallographic structures of members of these groups have been solved and are accessible through public databases, e.g. the Brookhaven protein database (H. M. Berman et al. Nucleic Acids Research, 28 pp. 235-242 (2000)). Such databases also include structural homologs in other enzyme classes and nonenzymatically active proteins of each class. Several tools are available to search public databases for structural homologues: SCOP--a structural classification of proteins database for the investigation of sequences and structures. (Murzin A. G. et al. (1995) J. Mol. Biol. 247, 536-540); CATH--Class, Architecture, Topology and Homologous superfamily: a hierarchical classification of protein domain structures (Orengo et al. (1997) Structure 5(8) 1093-1108); FSSP--Fold classification based on structure-structure alignment of proteins (Holm and Sander (1998) Nucl. Acids Res. 26 316-319); or VAST--Vector alignment search tool (Gibrat, Madej and Bryant (1996) Current Opinion in Structural Biology 6, 377-385).
In the above described approaches, members of structural classes are compared in order to identify insertion sites for SDRs.
[0165]In a preferred variant of these approaches serine proteases of the structural class S1 are compared with each other. Trypsin represents a member with low substrate specificity, as it requires only an arginine or lysine residue at the P1 position. On the other hand, thrombin, tissue-type plasminogen activator or enterokinase all have a high specificity towards their substrate sequences, i.e. (L/I/V/F)XPR↓NA, CPGR↓VVGG and DDDK↓, respectively (Perona, J. & Craik, C. (1997) J. Biol. Chem., 272, 29987-29990; Perona, J. & Craik, C. (1995) Protein Science, 4, 337-360). An alignment of the amino acid sequences of these proteases is described in example 1 (FIG. 2) along with the identification of SDRs.
[0166]A further example within the family of serine proteases is given by members of the structural class S8 (subtilisin fold). Subtilisin is the type protease for this class and represents an unspecific protease (Ottesen, M. & Svendsen, A. (1998) Methods Enzymol. 19, 199-215). Furin, PC1 and PC5 are proteases of the same structural class involved in the processing of propeptides and have a high substrate specificity (Seidah, N. & Chretien, M. (1997) Curr. Opin. Biotech., 8: 602-607; Bergeron, F. et al. (2000) J. Mol. Endocrin., 24:1-22). In a preferred variant of the approach, alignments of the primary amino acid sequences (FIG. 4) are used to identify eleven sequence stretches longer than three amino acids which the specific proteases have in addition compared to subtilisin and which are therefore potential specificity determining regions. In a further variant of the approach, information from the three-dimensional structure of subtilisin can be used in order to further narrow down the selection (FIG. 3). Out of the eleven inserted sequence stretches, three are especially close to the active site residues, namely stretches number 7, 8 and 11, which are insertions in PC5, PC1 and all three specific proteases, respectively (FIG. 3). In a preferred variant, one or several amino acid stretches of variable length and composition can be inserted into the subtilisin sequence at one or several of the eleven positions. In a more preferred variant of the approach the insertion is performed at regions 7, 8 or 11 or any combination thereof. In another preferred variant of the approach protease scaffolds other than subtilisin from the structural class S8 are used.
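The search for additional sequence stretches in an alignment can be sketched as follows. This is an illustrative sketch only, assuming a gapped pairwise alignment (gap character "-") between the unspecific scaffold and a specific protease is already available; the function name and the toy sequences are hypothetical and do not reproduce the alignment of FIG. 4.

```python
def inserted_stretches(scaffold_aln, other_aln, min_len=4):
    """Find stretches present in `other_aln` but gapped in `scaffold_aln`.

    Returns (scaffold_position, inserted_sequence) tuples, where
    scaffold_position is the number of scaffold residues preceding the
    stretch. min_len=4 corresponds to "longer than three amino acids".
    """
    if len(scaffold_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    stretches, run, pos = [], [], 0
    for s, o in zip(scaffold_aln, other_aln):
        if s == "-" and o == "-":
            continue  # column gapped in both sequences carries no information
        if s == "-":
            run.append(o)  # residue present only in the specific protease
            continue
        if len(run) >= min_len:
            stretches.append((pos, "".join(run)))
        run = []
        pos += 1
    if len(run) >= min_len:  # stretch at the very end of the alignment
        stretches.append((pos, "".join(run)))
    return stretches


# Toy alignment: a four-residue insertion after scaffold residue 4,
# and a single-residue insertion that is too short to be reported
scaffold = "ACDE----FGHIK-LM"
specific = "ACDEWWYYFGH-KQLM"
print(inserted_stretches(scaffold, specific))
```

Stretches found in this way would still need to be checked against the three-dimensional structure, as described above, to select those close to the active site residues.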
[0167]In a further preferred variant of this approach, aspartic acid proteases of the structural class A1 are analyzed (Rawlings, N. D. & Barrett, A. J. (1995). Methods Enzymol. 248, 105-120; Chitpinityol, S. & Crabbe, M J. (1998), Food Chemistry, 61, 395-418). Examples of the A1 structural class of aspartic proteases are pepsin, with a low substrate specificity, as well as beta-secretase (Gruninger-Leitch, F., et al. (2002) J. Biol. Chem. 277, 4687-4693) and renin (Wang, W. & Liang, T C. (1994) Biochemistry, 33, 14636-14641), with relatively high substrate specificities. Retroviral proteases also belong to this class, although the active enzyme is a dimer of two identical subunits. The viral proteases are essential for the correct processing of the polyprotein precursor to generate functional proteins, which requires a high substrate specificity in each case (Wu, J. et al. (1998) Biochemistry, 37, 4518-4526; Pettit, S. et al. (1991) J. Biol. Chem., 266, 14539-14547). Pepsin is the type protease for this class and represents an unspecific protease (Kageyama, T. (2002) Cell. Mol. Life. Sci. 59, 288-306). Beta-secretase and cathepsin D (Aguilar, C. F. et al. (1995) Adv. Exp. Med. Biol. 362, 155-166) are proteases of the same structural class and have a high substrate specificity. In a preferred variant of the approach, alignments of the primary amino acid sequences (FIG. 6) are used to identify six sequence stretches longer than three amino acids which are inserted in the specific proteases compared to pepsin and are therefore potential specificity determining regions. In a further variant of the approach, information from the three-dimensional structure of beta-secretase can be used in order to further narrow down the selection. Out of the six inserted sequence stretches, three are especially close to the active site residues, namely stretches number 1, 3 and 4, which are insertions in cathepsin D and beta-secretase, respectively (FIG. 5).
In a preferred variant of the approach, one or several amino acid stretches of variable length and composition can be inserted into the pepsin sequence at one or several of the six positions. In a more preferred embodiment of the invention the insertion is performed at the positions 1, 3 or 4 or any combination thereof. In another preferred embodiment of the invention protease scaffolds other than pepsin are used.
[0168]There are cases where a certain structural class does not include known members of low and high specificity. This is exemplified by the C14 class of caspases which belong to the cysteine protease family (Rawlings, N. D. & Barrett, A. J. (1994) Methods Enzymol. 244, 461-486) and which all show high specificity for P4 to P1 positions. For example, caspase-1, caspase-3 and caspase-9 recognize the sequences YVAD↓, DEVD↓ or LEHD↓, respectively. Identification of the regions that differ between the caspases will include the regions responsible for the differences in substrate specificity (FIGS. 7 and 8).
[0169]Finally, non-enzymatic proteins of the same fold as the enzyme scaffold may also contribute to the identification of insertion sites for SDRs. For example, haptoglobin (Arcoleo, J. & Greer, J.; (1982) J. Biol. Chem. 257, 10063-10068) and azurocidin (Almeida, R. et al. (1991) Biochem. Biophys. Res. Commun. 177, 688-695) share the same chymotrypsin-like fold with all S1 proteases. Due to substitutions in the active site residues these proteins do not posses any proteolytic function, yet they show high homology with active proteases. Differences between these proteins and specific proteases include regions that can serve as insertion sites for SDRs. In a fourth approach, insertion sites for SDRs are identified experimentally by techniques such as alanine scanning, random mutagenesis, random insertion or random deletion. In contrast to the approach disclosed above, this approach does not require detailed knowledge about the three-dimensional structure of the scaffold protein. In one preferred variant of this approach, random mutagenesis of enzymes with relatively high specificity from the same structural class as the protein scaffold and screening for loss or change of specificity can be used to identify insertion sites for SDRs in the protein scaffold.
[0170]Random mutagenesis, alanine scanning, random insertion or random deletion are all done on the level of the polynucleotides encoding the enzymes. There are a variety of protocols known in the literature (e.g. Sambrook, J. F; Fritsch, E. F.; Maniatis, T.; Cold Spring Harbor Laboratory Press, Second Edition, 1989, New York). For example, random mutagenesis can be achieved by the use of a polymerase as described in patent WO 9218645. According to this patent, the one or more genes encoding the one or more proteases are amplified by use of a DNA polymerase with a high error rate or under conditions that increase the rate of misincorporations. For example, the method of Cadwell and Joyce can be employed (Cadwell, R. C. and Joyce, G. F., PCR Methods Appl. 2 (1992) 28-33). Other methods of random mutagenesis such as, but not limited to, the use of mutator strains, chemical mutagens or UV-radiation can be employed as well.
[0171]Alternatively, oligonucleotides can be used for mutagenesis that substitute randomly distributed amino acid residues with an alanine. This method is generally referred to as alanine scanning mutagenesis (Fersht, A. R. Biochemistry (1989) 8031-8036). As a further alternative, modifications of the alanine scanning mutagenesis such as binomial mutagenesis (Gregoret, L. M. and Sauer, R. T. PNAS (1993) 4246-4250) or combinatorial alanine scanning (Weiss et al., PNAS (2000) 8950-8954) can be employed.
[0172]In order to express engineered enzymes, the DNA encoding such engineered proteins is ligated into a suitable expression vector by standard molecular cloning techniques (e.g. Sambrook, J. F; Fritsch, E. F.; Maniatis, T.; Cold Spring Harbor Laboratory Press, Second Edition, 1989, New York). The vector is introduced into a suitable expression host cell, which expresses the corresponding engineered enzyme variant. Particularly suitable expression hosts are bacterial expression hosts such as Escherichia coli or Bacillus subtilis, or yeast expression hosts such as Saccharomyces cerevisiae or Pichia pastoris, or mammalian expression hosts such as Chinese Hamster Ovary (CHO) or Baby Hamster Kidney (BHK) cell lines, or viral expression systems such as bacteriophages like M13 or Lambda, or viruses such as the Baculovirus expression system. As a further alternative, systems for in vitro protein expression can be used. Typically, the DNA is ligated into an expression vector behind a suitable signal sequence that leads to secretion of the enzyme variants into the extracellular space, thereby allowing direct detection of protease activity in the cell supernatant. Particularly suitable signal sequences for Escherichia coli are HlyA, for Bacillus subtilis AprE, NprB, Mpr, AmyA, AmyE, Blac, SacB, and for S. cerevisiae Bar1, Suc2, Matα, Inu1A, Ggplp. Alternatively, the enzyme variants are expressed intracellularly and the substrates are also expressed intracellularly. Preferably, this is done essentially as described in patent application WO 0212543, using a fusion peptide substrate comprising two auto-fluorescent proteins linked by the substrate amino-acid sequence.
[0173]As a further alternative, after intracellular expression of the enzyme variants, or secretion into the periplasmic space using signal sequences such as DsbA, PhoA, PelB, OmpA, OmpT or gIII for Escherichia coli, a permeabilisation or lysis step releases the enzyme variants into the supernatant. The destruction of the membrane barrier can be forced by the use of mechanical means such as ultrasonication or a French press, or by the use of membrane-digesting enzymes such as lysozyme. As yet another alternative, the genes encoding the enzyme variants are expressed cell-free by the use of a suitable cell-free expression system. For example, the S30 extract from Escherichia coli cells is used for this purpose as described by Lesley et al. (Methods in Molecular Biology 37 (1995) 265-278).
[0174]The ensemble of gene variants generated and expressed by any of the above methods is analyzed with respect to affinity, substrate specificity or activity by appropriate assay and screening methods as described in detail for example in patent application PCT/EP03/04864. Genes from catalytically active variants having reduced specificity in comparison to the original enzyme are analyzed by sequencing. Sites at which mutations and/or insertions and/or deletions occurred are preferred insertion sites at which SDRs can be inserted site-specifically.
[0175]In a second embodiment, the one or more fully or partially random peptide sequences are inserted at random sites in the protein scaffold. This modification is usually done on the polynucleotide level, i.e. by inserting nucleotide sequences into the gene that encodes the protein scaffold. Several methods are available that enable the random insertion of nucleotide sequences. Systems that can be used for random insertion are for example ligation based systems (Murakami et al. Nature Biotechnology 20 (2002) 76-81), systems based on DNA polymerisation and transposon based systems (e.g. GPS-M® mutagenesis system, New England Biolabs; MGS® mutation generation system, Finnzymes). The transposon-based methods employ a transposase-mediated insertion of a selectable marker gene that contains at its termini recognition sequences for the transposase as well as two sites for a rare-cutting restriction endonuclease. Using the latter endonuclease one usually releases the selection marker and after religation obtains an insertion. Instead of performing the religation one can alternatively insert a fragment that has terminal recognition sequences for one or two outside-cutting restriction endonucleases as well as a selectable marker. After ligation, one releases this fragment using the one or two outside-cutting endonucleases. After creating blunt ends by standard methods one inserts blunt-ended random fragments at random positions into the gene.
[0176]In a further preferred embodiment, methods for homologous in-vitro recombination are used to combine the mutations introduced by the above mentioned methods to generate enzyme populations. Examples of methods that can be applied are the Recombination Chain Reaction (RCR) according to patent application WO 0134835, the DNA-Shuffling method according to the patent application WO 9522625, the Staggered Extension method according to patent WO 9842728, or the Random Priming recombination according to patent application WO 9842728. Furthermore, methods for non-homologous recombination such as the ITCHY method can also be applied (Ostermeier, M. et al. Nature Biotechnology 17 (1999) 1205-1209).
[0177]Upon random insertion of a nucleotide sequence into the protein scaffold one obtains a library of different genes encoding enzyme variants. The polynucleotide library is subsequently transferred to an appropriate expression vector. Upon expression in a suitable host or by use of an in vitro expression system, a library of enzymes containing randomly inserted stretches of amino acids is obtained.
[0178]According to step (b) of this third aspect of the invention, one or more fully or partially random peptide sequences are inserted into the protein scaffold. The actual number of such inserted SDRs is determined by the intended quantitative specificity following the relation: the higher the intended specificity, the more SDRs are inserted. Whereas a single SDR enables the generation of moderately specific enzymes, two SDRs already enable the generation of significantly specific enzymes. However, up to six or more SDRs can be inserted into a protein scaffold. A similar relation is valid for the length of the SDRs: the higher the intended specificity, the longer the SDRs that are to be inserted. SDRs can be as short as one to four amino acid residues. They can, however, also be as long as 50 amino acid residues. Significant specificity can already be generated by the use of SDRs of a length of four to six amino acid residues.
[0179]The peptide sequences that are inserted can be fully or partially random. In this context, fully random means that a set of sequences is inserted in parallel that includes sequences that differ from each other in each and every position. Partially random means that a set of sequences is inserted in parallel that includes sequences that differ from each other in at least one position. This difference can be either pair-wise or with respect to a single sequence. For example, when regarding an insertion of the length of four amino acids, partially random could be a set (i) that includes AGGG, GVGG, GGLG, GGGI, or (ii) that includes AGGG, VGGG, LGGG and IGGG. Alternatively, random sequences also comprise sequences that differ from each other in length. Randomization of the peptide sequences is achieved by randomization of the nucleotide sequences that are inserted into the gene at the respective sites. Thereby, randomization can be achieved by employing mixtures of nucleobases as monomers during chemical synthesis of the oligonucleotides. A particularly preferred mixture of monomers for a fully random codon that in addition minimizes the probability of stop codons is NN(GTC). Alternatively, random oligonucleotides can be obtained by fragmentation of DNA into short fragments that are inserted into the gene at the respective sites. The source of the DNA to be fragmented may be a synthetic oligonucleotide but alternatively may originate from cloned genes, cDNAs, or genomic DNA. Preferably, the DNA is a gene encoding an enzyme. The fragmentation can, for example, be achieved by random endonucleolytic digestion of DNA. Preferably, an unspecific endonuclease such as DNAse I (e.g. from bovine pancreas) is employed for the endonucleolytic digestion.
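The effect of the NN(GTC) codon mixture on stop codons can be illustrated by enumerating the codons it permits under the standard genetic code. The following is a minimal sketch for illustration only and is not part of the disclosed method; the function name codon_stats is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# at each position (third position varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO_ACIDS[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}


def codon_stats(first, second, third):
    """Return (number of codons, number of stop codons, number of distinct
    amino acids) for a codon whose three positions are drawn from the
    given base mixtures."""
    codons = ["".join(c) for c in product(first, second, third)]
    translated = [CODON_TABLE[c] for c in codons]
    stops = translated.count("*")
    amino_acids = set(translated) - {"*"}
    return len(codons), stops, len(amino_acids)


# Fully degenerate codon NNN versus the NN(GTC) mixture named in the text.
nnn = codon_stats("ACGT", "ACGT", "ACGT")
nnb = codon_stats("ACGT", "ACGT", "GTC")

print("NNN     codons, stops, amino acids:", nnn)
print("NN(GTC) codons, stops, amino acids:", nnb)
```

Under the standard code, excluding A from the third position removes the stop codons TAA and TGA, leaving TAG as the only stop codon, while all twenty amino acids remain encoded.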
[0180]If steps (a)-(c) of the inventive method are repeated cyclically, there are different alternatives for obtaining random peptide sequences that are inserted in consecutive rounds. Preferably, SDRs that were identified in one round as leading to increased specificity of the enzyme are used as templates for the random peptide sequences that are inserted in the following round.
[0181]In a preferred alternative, the sequences selected in one round are analysed and randomized oligonucleotides are generated based on these sequences. This can, for example, be achieved by using, at each position in the oligonucleotide synthesis, the original nucleotide together with a certain percentage of a mixture of the other three nucleotide monomers. If, for example, in a first round an SDR is identified that has the amino acid sequence ARLT, e.g. encoded by the nucleotide sequence GCG CGC CTT ACC, a random peptide sequence inserted in this SDR site could be encoded by an oligonucleotide with 70% G, 10% A, 10% T and 10% C at the first position, 70% C, 10% G, 10% T and 10% A at the second position, etc. This leads at each position approximately in 1 of 3 cases to the template amino acid and in 2 of 3 cases to another amino acid.
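The "1 of 3" figure corresponds to the probability that all three nucleotides of a codon remain unchanged at 70% template nucleotide per position, i.e. 0.7 × 0.7 × 0.7 ≈ 0.34. The following sketch, given for illustration only, computes this value and, as an additional assumption not stated in the text, also counts synonymous codons of the standard genetic code, which raises the chance of retaining the template amino acid somewhat. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO_ACIDS[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}


def doped_codon_probability(template, codon, p_template=0.7):
    """Probability of synthesizing `codon` when each position keeps the
    template nucleotide with probability p_template and the other three
    nucleotides share the remainder equally."""
    p_other = (1.0 - p_template) / 3.0
    prob = 1.0
    for t, c in zip(template, codon):
        prob *= p_template if t == c else p_other
    return prob


def retention_probability(template, p_template=0.7):
    """Probability that the doped codon still encodes the template amino
    acid, summing over the template codon and its synonymous codons."""
    target = CODON_TABLE[template]
    return sum(
        doped_codon_probability(template, codon, p_template)
        for codon, aa in CODON_TABLE.items()
        if aa == target
    )


# Template SDR from the text: ARLT encoded by GCG CGC CTT ACC.
template_codons = ["GCG", "CGC", "CTT", "ACC"]
p_unchanged = 0.7 ** 3
retention = {c: retention_probability(c) for c in template_codons}

print("P(codon unchanged) =", round(p_unchanged, 3))
for codon, p in retention.items():
    print(codon, CODON_TABLE[codon], "P(amino acid retained) =", round(p, 3))
```

The doping level (70% in the example) is the parameter that trades diversity against similarity to the template: raising it keeps more of the selected sequence in each round.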
[0182]In another preferred alternative, the sequences selected in one round are analyzed and a consensus library is generated based on these sequences. This can, for example, be achieved by using defined mixtures of nucleotides at each position in the oligonucleotide synthesis in a way that leads to mixtures of the amino acid residues that were identified at each position of the SDR selected in the previous round. If, for example, in a first round two SDRs are identified that have the amino acid sequences ARLT and VPGS, a consensus library inserted in this SDR site in the following round could be encoded by an oligonucleotide with the sequence G(C/T)G C(G/C)C (G/T)(G/T)G (A/T)CC. This would correspond to the random peptide sequence (A/V)(R/P)(L/G/V/W)(T/S), thereby allowing all combinations of the amino acid residues identified in the first round, and, due to the degeneracy of the genetic code, allowing in addition to a lower degree alternative amino acid residues at some positions.
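The consensus oligonucleotide given above can be expanded codon by codon to confirm which amino acid residues it admits. The following sketch is given for illustration only, assumes the standard genetic code, and uses hypothetical names.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO_ACIDS[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}

# G(C/T)G C(G/C)C (G/T)(G/T)G (A/T)CC, one string of allowed bases per position.
DEGENERATE_OLIGO = [
    "G", "CT", "G",
    "C", "GC", "C",
    "GT", "GT", "G",
    "AT", "C", "C",
]


def residues_per_codon(positions):
    """Translate every codon permitted at each triplet of positions and
    return one set of amino acids per codon."""
    result = []
    for i in range(0, len(positions), 3):
        codons = ("".join(c) for c in product(*positions[i:i + 3]))
        result.append({CODON_TABLE[c] for c in codons})
    return result


residue_sets = residues_per_codon(DEGENERATE_OLIGO)
peptides = {"".join(p) for p in product(*residue_sets)}

for index, residues in enumerate(residue_sets, start=1):
    print("position", index, sorted(residues))
print("library size:", len(peptides))
```

The expansion reproduces (A/V)(R/P)(L/G/V/W)(T/S): the third codon admits V and W in addition to the L and G found in the first round, which is the additional diversity attributed above to the degeneracy of the genetic code.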
[0183]In another preferred alternative, the sequences selected in one round are, without previous analysis, recombined using methods for the in vitro recombination of polynucleotides, such as the methods described in WO 01/34835 (the following also provides details of the eighth and ninth aspect of the invention).
[0184]After insertion of the partially or fully random sequences into the gene encoding the scaffold protein, and eventually ligation of the resulting gene into a suitable expression vector using standard molecular cloning techniques (Sambrook, J. F; Fritsch, E. F.; Maniatis, T.; Cold Spring Harbor Laboratory Press, Second Edition, 1989, New York), the vector is introduced into a suitable expression host cell which expresses the corresponding enzyme variant. Particularly suitable expression hosts are bacterial expression hosts such as Escherichia coli or Bacillus subtilis, or yeast expression hosts such as Saccharomyces cerevisiae or Pichia pastoris, or mammalian expression hosts such as Chinese Hamster Ovary (CHO) or Baby Hamster Kidney (BHK) cell lines, or viral expression systems such as bacteriophages like M13, T7 or Lambda, or viruses such as the Baculovirus expression system. As a further alternative, systems for in vitro protein expression can be used. Typically, the DNA is ligated into an expression vector behind a suitable signal sequence that leads to secretion of the enzyme variants into the extracellular space, thereby allowing direct detection of enzyme activity in the cell supernatant. Particularly suitable signal sequences for Escherichia coli are ompA, pelB, HlyA, for Bacillus subtilis AprE, NprB, Mpr, AmyA, AmyE, Blac, SacB, and for S. cerevisiae Bar1, Suc2, Matα, Inu1A, Ggplp.
[0185]Alternatively, the enzyme variants are expressed intracellularly and the substrates are also expressed intracellularly. In the case of protease variants this is done essentially as described in patent application WO 0212543, using a fusion peptide substrate comprising two auto-fluorescent proteins linked by the substrate amino-acid sequence. As a further alternative, after intracellular expression of the enzyme variants, or secretion into the periplasmic space using signal sequences such as DsbA, PhoA, PelB, OmpA, OmpT or gIII for Escherichia coli, a permeabilisation or lysis step releases the enzyme variants into the supernatant. The destruction of the membrane barrier can be forced by the use of mechanical means such as ultrasonication or a French press, or by the use of membrane-digesting enzymes such as lysozyme. As yet another alternative, the genes encoding the enzyme variants are expressed cell-free by the use of a suitable cell-free expression system. For example, the S30 extract from Escherichia coli cells is used for this purpose as described by Lesley et al. (Methods in Molecular Biology 37 (1995) 265-278).
[0186]After introduction of the vector into host cells, these cells are screened for the expression of enzymes with specificity for the intended target substrate. Such screening is typically done by separating the cells from each other, in order to enable the correlation of genotype and phenotype, and assaying the activity of each cell clone after a growth and expression period. Such separation can for example be done by distribution of the cells into the compartments of sample carriers, e.g. as described in WO 01/24933. Alternatively, the cells are separated by streaking on agar plates, by enclosing in a polymer such as agarose, by filling into capillaries, or by similar methods.
[0187]Identification of variants with the intended specificity can be done by different approaches. In the case of proteases, preferably assays using peptide substrates essentially as described in PCT/EP03/04864 are employed.
[0188]Regardless of the expression format, selection of enzyme variants is done under conditions that allow identification of enzymes that preferentially recognize and convert the target sequence. As a first alternative, enzymes that preferentially recognize and convert the target sequence are identified by screening for enzymes with a high affinity for the target substrate sequence. High affinity corresponds to a low KM, which is selected for by screening at target substrate concentrations substantially below the KM of the first enzyme. Preferably, the substrates that are used are linked to one or more fluorophores that enable the detection of the modification of the substrate at concentrations below 10 μM, preferably below 1 μM, more preferably below 100 nM, and most preferably below 10 nM.
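The reason why screening at substrate concentrations far below KM favours variants with a low KM can be illustrated with the Michaelis-Menten rate law, v = kcat·[E]·[S] / (KM + [S]). The following sketch is given for illustration only; all kinetic constants and concentrations are hypothetical values chosen for the example and are not taken from the experiments described herein.

```python
def michaelis_menten_rate(kcat, km, enzyme, substrate):
    """Initial rate v = kcat * [E] * [S] / (KM + [S]).
    Concentrations and KM share one unit (here micromolar)."""
    return kcat * enzyme * substrate / (km + substrate)


# Two hypothetical variants with identical kcat but different KM.
KCAT = 10.0          # per second (assumed)
ENZYME = 0.001       # micromolar (assumed)
KM_LOW = 1.0         # micromolar, high-affinity variant (assumed)
KM_HIGH = 100.0      # micromolar, low-affinity variant (assumed)


def rate_ratio(substrate):
    """Rate of the low-KM variant divided by the rate of the high-KM variant."""
    low = michaelis_menten_rate(KCAT, KM_LOW, ENZYME, substrate)
    high = michaelis_menten_rate(KCAT, KM_HIGH, ENZYME, substrate)
    return low / high


ratio_at_10_nm = rate_ratio(0.01)      # 10 nM, far below both KM values
ratio_at_1_mm = rate_ratio(1000.0)     # 1 mM, saturating for both variants

print("rate ratio at 10 nM substrate:", round(ratio_at_10_nm, 1))
print("rate ratio at 1 mM substrate: ", round(ratio_at_1_mm, 2))
```

At low substrate concentration the rate approaches (kcat/KM)·[E]·[S], so the two variants differ by roughly the ratio of their KM values, whereas at saturating substrate the difference largely disappears.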
[0189]As a second alternative, enzymes that preferentially recognize and convert the target substrate are identified by employing two or more substrates in the assay and screening for activity on these two or more substrates in comparison. Preferably, the two or more substrates employed are linked to different marker molecules, thereby enabling the detection of the modification of the two or more substrates consecutively or in parallel. In the case of proteases, particularly preferably two peptide substrates are employed, one peptide substrate having an arbitrarily chosen or even partially or fully random amino-acid sequence, thereby making it possible to monitor the activity on an arbitrary substrate, and the other peptide substrate having an amino-acid sequence identical to or resembling the intended target substrate sequence, thereby making it possible to monitor the activity on the target substrate. Especially preferably, these two peptide substrates are linked to fluorescent marker molecules, and the fluorescent properties of the two peptide substrates are sufficiently different to distinguish both activities when measured consecutively or in parallel. For example, a fusion protein comprising a first autofluorescent protein, a peptide, and a second autofluorescent protein according to patent application WO 0212543 can be used for this purpose. Alternatively, fluorophores such as rhodamines are linked chemically to the peptide substrates.
[0190]As a third alternative, enzymes that preferentially recognize and convert the target substrate are identified by employing one or more substrates resembling the target substrate together with competing substrates in high excess. Screening with respect to activity on the substrates resembling the target substrate is then done in the presence of the competing substrates. Enzymes having a specificity which corresponds qualitatively to the target specificity, but having only a low quantitative specificity, are identified as negative samples in such a screen, whereas enzymes having a specificity which corresponds both qualitatively and quantitatively to the target specificity are identified as positive. Preferably, the one or more substrates resembling the target substrate are linked to marker molecules, thereby enabling the detection of their modifications, whereas the competing substrates do not carry marker molecules. The competing substrates have arbitrarily chosen or random amino-acid sequences, thereby acting as competitive inhibitors for the hydrolysis of the marker-carrying substrates. For example, protein hydrolysates such as tryptone can serve as competing substrates for engineered proteolytic enzymes according to the invention.
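The discrimination achieved by a competitor in high excess can be illustrated with the standard rate law for competitive inhibition, v = kcat·[E]·[S] / (KM·(1 + [C]/KC) + [S]), where [C] is the competitor concentration and KC the constant describing how well the enzyme binds the competitor. The following sketch is given for illustration only. Treating a complex protein hydrolysate as a single competitor with one constant is a simplification, and all numerical values are hypothetical.

```python
def rate_with_competitor(kcat, km, enzyme, substrate, competitor, k_competitor):
    """Rate on the labelled substrate in the presence of an unlabelled
    competing substrate acting as a competitive inhibitor."""
    apparent_km = km * (1.0 + competitor / k_competitor)
    return kcat * enzyme * substrate / (apparent_km + substrate)


def specificity_ratio(k_competitor, competitor=1000.0):
    """Rate with competitor divided by rate without competitor.
    Values near 1 mean the competitor barely interferes."""
    kcat, km, enzyme, substrate = 10.0, 1.0, 0.001, 0.1   # assumed values
    with_c = rate_with_competitor(kcat, km, enzyme, substrate,
                                  competitor, k_competitor)
    without_c = rate_with_competitor(kcat, km, enzyme, substrate,
                                     0.0, k_competitor)
    return with_c / without_c


# Hypothetical variants: one binds arbitrary sequences well (small constant),
# the other binds them poorly (large constant). Units are micromolar.
ratio_low_specificity = specificity_ratio(k_competitor=10.0)
ratio_high_specificity = specificity_ratio(k_competitor=10000.0)

print("low quantitative specificity: ", round(ratio_low_specificity, 3))
print("high quantitative specificity:", round(ratio_high_specificity, 3))
```

A variant that readily accepts arbitrary sequences loses most of its activity on the labelled substrate and scores as negative, while a variant that discriminates against the competitor retains most of its activity.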
[0191]As a fourth alternative, enzymes that preferentially recognize and convert the target substrate are identified and selected by an amplification-coupled or growth-coupled selection step. Furthermore, the activity can be measured intracellularly and the selection can be done by a cell sorter, such as a fluorescence-activated cell sorter.
[0192]As a further alternative, enzymes that recognize and convert the target substrate are identified by first selecting enzymes that preferentially bind to the target substrate, and secondly selecting out of this subgroup of enzyme variants those enzymes that convert the target substrate. Selection for enzymes that preferentially bind the target substrate can be done either by selection of binders to the target substrate or by counter-selection of enzymes that bind to other substrates. Methods for the selection of binders or for the counter-selection of non-binders are known in the art. Such methods typically require phenotype-genotype coupling, which can be achieved by using surface display expression methods. Such methods include, for example, phage or viral display, cell surface display and in vitro display. Phage or viral display typically involves fusion of the protein of interest to a viral/phage protein. Cell surface display, i.e. either bacterial or eukaryotic cell display, typically involves fusion of the protein of interest to a peptide or protein that is located at the cell surface. In in-vitro display, the protein is typically made in vitro and linked directly or indirectly to the mRNA encoding the protein (DE 19646372).
[0193]The invention also provides for a composition or pharmaceutical composition comprising one or more engineered enzymes according to the first aspect of the invention as defined herein before. The composition may optionally comprise an acceptable carrier, excipient and/or auxiliary agent. Non-pharmaceutical compositions as defined herein are research compositions, nutritional compositions, cleaning compositions, disinfection compositions, cosmetic compositions or compositions for personal care. Moreover, DNA sequences coding for the engineered enzyme as defined herein before and vectors containing said DNA sequences are also provided. Finally, transformed host cells (prokaryotic or eukaryotic) or transgenic organisms containing such DNA sequences and/or vectors, as well as a method utilizing such host cells or transgenic animals for producing the engineered enzyme of the first aspect of the invention are also contemplated.
DETAILED DESCRIPTION OF THE FIGURES
[0194]FIG. 1: Three-dimensional structure of human trypsin I with the active site residues shown in "ball-and-stick" representation and with the marked regions indicating potential SDR insertion sites.
[0195]FIG. 2: Alignment of the primary amino acid sequences of the human proteases trypsin I, alpha-thrombin and enteropeptidase, all of which belong to the structural class S1 of the serine protease family. Trypsin represents an unspecific protease of this structural class, while alpha-thrombin and enteropeptidase are proteases with high substrate specificity. Compared to trypsin, several regions of insertions of three or more amino acids into the primary sequence of alpha-thrombin and enterokinase are seen. The region marked with (-1-) and the region marked with (-3-) are preferred SDR insertion sites. In the tertiary structure of alpha-thrombin both regions are in the vicinity of the substrate binding site. These regions therefore fulfil two criteria to be selected as candidates for SDRs: firstly, they represent insertions in the specific proteases compared to the unspecific one and, secondly, they are close to the substrate binding site. A representation of the three-dimensional structure is given in FIG. 3.
[0196]FIG. 3: Three-dimensional structure of subtilisin with the active site residues being shown in "ball-and-stick" representation and with the numbered regions indicating potential SDR insertion sites.
[0197]FIG. 4: Alignment of the primary amino acid sequences of subtilisin E, furin, PC1 and PC5, all of which belong to the structural class S8 of the serine protease family. Subtilisin E represents an unspecific protease of this structural class, while furin, PC1 and PC5 are proteases with high substrate specificity. Compared to subtilisin, several regions of insertions of three or more amino acids into the primary sequence of furin, PC1 and PC5 are seen. The regions marked with (-4-), (-5-), (-7-), (-9-) and (-11-) are preferred SDR insertion sites. These regions fulfil two criteria to be selected as candidates for SDRs: firstly, they represent insertions in the specific proteases compared to the unspecific one and, secondly, they are close to the active site residues.
[0198]FIG. 5: Three-dimensional structure of beta-secretase with the active site residues being shown in "ball-and-stick" representation and with the numbered regions indicating potential SDR insertion sites.
[0199]FIG. 6: Alignment of the primary amino acid sequences of pepsin, beta-secretase and cathepsin D, all of which belong to the structural class A1 of the aspartic protease family. Pepsin represents an unspecific protease of this structural class, while beta-secretase and cathepsin D are proteases with high substrate specificity. Compared to pepsin, several regions of insertions of three or more amino acids into the primary sequence of beta-secretase and cathepsin D are seen. The regions marked with -1- to -11- correspond to possible SDR combining sites and are also marked in FIG. 5.
[0200]FIG. 7: illustrates the three-dimensional structure of caspase 7 with the active site residues being shown in "ball-and-stick" representation and with the numbered regions indicating potential SDR insertion sites.
[0201]FIG. 8: shows the primary amino acid sequence of caspase 7 as a member of the cysteine protease class C14 family (see also SEQ ID NO: 14).
[0202]FIG. 9: Schematic representation of the method according to the third aspect of the invention.
[0203]FIG. 10: Western blot analysis of trypsin expression. Supernatants of cell cultures expressing variants of trypsin are compared to negative controls. Lane 1: molecular weight standard; lane 2: negative control; lane 3: supernatant of variant a; lane 4: negative control; lane 5: supernatant of variant b. A primary antibody specific to the expressed protein and a secondary antibody for generation of the signal were used.
[0204]FIG. 11: Time course of the proteolytic cleavage of a target substrate. Supernatant of cells containing the vector with the gene for human trypsin and that of cells containing the vector without the gene was incubated with the peptide substrate described in the text. Cleavage of the peptide results in a decreased read out value. Proteolytic activity is confirmed for the positive clone.
[0205]FIG. 12: Relative activity of three engineered proteolytic enzymes in comparison with human trypsin I on two different peptide substrates. A time course of the proteolytic digestion of the two substrates was performed and evaluated. Substrate B was used for screening and substrate A is a closely related sequence. Relative activity of the three variants was normalized to the activity of human trypsin I. Variants 1 and 2 clearly show increased specificity towards the target substrate. Variant 3, on the other hand, serves as a negative control with activities similar to those of human trypsin I.
[0206]FIG. 13: Relative specificities of trypsin and variants of engineered proteolytic enzymes with one or two SDRs, respectively. Activity of the proteases was determined in the presence and absence of competitor substrate, i.e. peptone at a concentration of 10 mg/ml. Time courses for the proteolytic cleavage were recorded and the time constants k determined. The ratios between the time constants with and without competitor were formed and represent a quantitative measure for the specificity of the protease. The ratios were normalized to trypsin. The specificity of the variant containing two SDRs is 2.5 fold higher than that of the variant with SDR2 alone.
[0207]FIG. 14: Shows the relative specificities of protease variants in absence and presence of competitor substrate. The protease variants containing two inserts with different sequences and the non-modified scaffold human trypsin I were expressed in a suitable host. Activity of the protease variants was determined as the cleavage rate of a peptide with the desired target sequence of TNF-alpha in the absence and presence of competitor substrate. Specificity is expressed as the ratio of cleavage rates in the presence and absence of competitor.
[0208]FIG. 15: The figure shows the reduction of cytotoxicity induced by human TNF-alpha when incubating the human TNF-alpha with concentrated supernatant from cultures expressing the inventive engineered proteolytic enzymes being specific for human TNF-alpha. This indicates the efficacy of the inventive engineered proteolytic enzymes.
[0209]FIG. 16: The figure shows the reduction of cytotoxicity induced by human TNF-alpha when incubating the human TNF-alpha with different concentrations of purified inventive engineered proteolytic enzyme being specific for human TNF-alpha. Variant g comprises Seq ID No:72 as SDR1 and Seq ID No:73 as SDR2. This indicates the efficacy of the inventive engineered proteolytic enzymes.
[0210]FIG. 17: The figure compares the activity of inventive engineered proteolytic enzymes being specific for human TNF-alpha with the activity of human trypsin I on two protein substrates: (a) human TNF-alpha; (b) mixture of human serum proteins. This indicates the safety of the inventive engineered proteolytic enzymes. Variant x corresponds to Seq ID No: 75 comprising the SDRs according to Seq ID No. 89 (SDR1) and 95 (SDR2). Variants xi and xii correspond to derivatives thereof comprising the same SDR sequences.
[0211]FIG. 18: Specific hydrolysis of human VEGF by an engineered proteolytic enzyme derived from human trypsin.
EXAMPLES
[0212]In the following examples, materials and methods of the present invention are provided including the determination of catalytic properties of enzymes obtained by the method. It should be understood that these examples are for illustrative purpose only and are not to be construed as limiting this invention in any manner. All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.
[0213]In the experimental examples described below, standard techniques of recombinant DNA technology were used that are described in various publications, e.g. Sambrook et al. (1989), Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, or Ausubel et al. (1987), Current Protocols in Molecular Biology 1987-1988, Wiley Interscience. Unless otherwise indicated, restriction enzymes, polymerases and other enzymes as well as DNA purification kits were used according to the manufacturers' specifications.
Example I
Identification of SDR Sites in Human Trypsin
[0214]Insertion sites for SDRs have been identified in the serine protease human trypsin I (structural class S1) by comparison with members of the same structural class having a higher sequence specificity. Trypsin represents a member with low substrate specificity, as it requires only an arginine or lysine residue at the P1 position. On the other hand, thrombin, tissue-type plasminogen activator or enterokinase all have a high specificity towards their substrate sequences, i.e. (L/I/V/F)XPR NA, CPGR VVGG and DDDK , respectively. The primary sequences and tertiary structures of these and further S1 serine proteases have been aligned in order to determine regions of low and high sequence and structure homology and especially regions that correspond to insertions in the sequences of the more specific proteases (FIG. 2). Several regions of insertions equal to or longer than 3 amino acids, representing potential SDR sites, have been identified as indicated in FIG. 1. These regions were chosen as target sites for the insertion of SDRs in the examples below, e.g. SDR1 (region one in FIG. 2, after amino acid 42 according to SEQ ID NO: 1) with a length of six and SDR2 (region three in FIG. 2, after amino acid 123 according to SEQ ID NO: 1) with a length of five amino acids, respectively.
Example II
Molecular Cloning of the Human Trypsin I Gene to be Used as Scaffold Protein and Expression of the Mature Protease in B. subtilis
[0215]The gene encoding the unspecific protease human trypsinogen I was cloned into the vector pUC18. Cloning was done as follows: the coding sequence of the protein was amplified by PCR using primers that introduced a KpnI site at the 5' end and a BamHI site at the 3' end. This PCR fragment was cloned into the appropriate sites of the vector pUC18. Identity was confirmed by sequencing. After sequencing, the coding sequence of the mature protein was amplified by PCR using primers that introduced different BglI sites at the 5' end and the 3' end.
[0216]This PCR fragment was cloned into the appropriate sites of an E. coli-B. subtilis shuttle vector. The vector contains a pMB1 origin for amplification in E. coli, a neomycin resistance marker for selection in E. coli, as well as a P43 promoter for the constitutive expression in B. subtilis. A 87 bp fragment that contains the leader sequence encoding the signal peptide from the sacB gene of B. subtilis was introduced behind the P43 promoter. Different BglI restriction sites serve as insertion sites for heterologous genes to be expressed.
[0217]Expression of human trypsin I was confirmed by measurement of the proteolytic activity in supernatant of cells containing the vector with the gene in comparison to a negative control. A peptide including an arginine cleavage site was chosen as a substrate. The peptide was N-terminally biotinylated and labeled with a fluorophore at the C-terminus. After incubation of the peptide with culture supernatant streptavidin was added. Uncleaved peptide associate with streptavidin and lead to a high read out value while cleavage results in low read out values. FIG. 11 shows the time course of a proteolytic digestion of B. subtilis cells containing the vector with the trypsin I gene in comparison to B. subtilis cells containing the vector without the trypsin I gene (negative control).
[0218]As a further confirmation of expression of the protease, supernatants of cells containing the vector with the gene and control cells were analyzed by polyacrylamid gel electrophoreses and subsequent western blot using an antibody specific to the target protease. The procedure was performed according to standard methods (Sambrook, J. F; Fritsch, E. F.; Maniatis, T.; Cold Spring Harbor Laboratory Press, Second Edition, 1989, New York). FIG. 8 confirms expression of the protein only in the cells harbouring the vector with the gene for trypsin.
Example III
Providing a Scaffold Protein
[0219]In this example, human trypsin I was used as the scaffold protein. The gene was either used in its natural form, or, alternatively, was modified to result in a scaffold protein with increased catalytic activity or further improved characteristics. The modification was done by random modification of the gene, followed by expression of the enzyme and subsequent selection for increased activity. First, the gene was PCR amplified under error-prone conditions, essentially as described by Cadwell, R. C and Joyce, G. F. (PCR Methods Appl. 2 (1992) 28-33). Error-prone PCR was done using 30 pmol of each primer, 20 nmol dGTP and dATP, 100 nmol dCTP and dTTP, 20 fmol template, and 5 U Taq DNA polymerase in 10 mM Tris HCl pH 7.6, 50 mM KCl, 7 mM MgCl2, 0.5 mM MnCl2, 0.01% gelatin for 20 cycles of 1 min at 94° C., 1 min at 65° C. and 1 min at 72° C. The resulting DNA library was purified using the Qiaquick PCR Purification Kit following the suppliers' instructions. The PCR product was digested with the restriction enzyme BglI and purified. Afterwards, the PCR product was ligated into the E. coli-B. subtilis shuttle vector described above which was digested with BglI and dephosphorylated. The ligation products were transformed into E. coli, amplified in LB, and the plasmids were purified using the Qiagen Plasmid Purification Kit following the suppliers' instructions. Resulting plasmids were transformed into B. subtilis cells.
[0220]Alternatively, or in addition to random mutagenesis, variants of the gene were statistically recombined at homologous positions by use of the Recombination Chain Reaction, essentially as described in WO 0134835. PCR products of the genes encoding the protease variants were purified using the QIAquick PCR Purification Kit following the suppliers' instructions, checked for correct size by agarose gel electrophoresis and mixed together in equimolar amounts. 80 μg of this PCR mix in 150 mM TrisHCl pH 7.6, 6.6 mM MgCl2 were heated for 5 min at 94° C. and subsequently cooled down to 37° C. at 0.05° C./s in order to re-anneal strands and thereby produce heteroduplices in a stochastic manner. Then, 2.5 U Exonuclease III per μg DNA were added and incubated for 20, 40 or 60 min at 37° C. in order to digest different lengths from both 3' ends of the heteroduplices. The partly digested PCR products were refilled with 0.6 U Pfu polymerase per μg DNA by incubating for 15 min at 72° C. in 0.17 mM dNTPs and Pfu polymerase buffer according to the suppliers' instructions. After performing a single PCR cycle, the resulting DNA was purified using the QIAquick PCR Purification Kit following the suppliers' instructions, digested with BglI and ligated into the linearized vector. The ligation products were transformed into E. coli, amplified in LB containing ampicillin as marker, and the plasmids were purified using the Qiagen Plasmid Purification Kit following the suppliers' instructions. Resulting plasmids were transformed into B. subtilis cells.
Example IV
Insertion of SDRs into the Protein Scaffold of Human Trypsin I and Generation of an Engineered Proteolytic Enzyme with Specificity for a Peptide Substrate Having the Sequence KKWLGRVPGGPV
[0221]In order to create insertion sites for SDRs in human trypsin I, two pairs of different restriction sites were introduced into the gene at sites that were identified as potential SDR sites (see Example I above) without changing the amino acid sequence. The insertion of the restriction sites was done by overlap extension PCR. Primers restr1 and restr2 were used for the introduction of SacII and BamHI restriction sites, restr3 and restr4 were used for the introduction of KpnI and NheI restriction sites. The sequences of the primers were as follows:
TABLE-US-00002
Binding site for restr1 and restr2 and the corresponding amino acid sequence (SEQ ID NO: 54):
5'-GGTGGTATCAGCAGGCCACTGCTACAAGTCCCGCATCCAGGT-3'
V V S A G H C Y K S R I Q
Forward primer restr1 (SEQ ID NO: 56):
5'-GGTGGTATCCGCGGGCCACTGCTACAAGTCCCGGATCCAGGT-3'
Reverse primer restr2 (SEQ ID NO: 57):
5'-ACCTGGATCCGGGACTTGTAGCAGTGGCCCGCGGATACCACC-3'
Binding site for restr3 and restr4 and the corresponding amino acid sequence (SEQ ID NO: 58):
5'-CCACTGGCACGAAGTGCCTCATCTCTGGCTGGGGCAACACTGCGAGCTCT-3'
T G T K C L I S G W G N T A S S
Forward primer restr3 (SEQ ID NO: 60):
5'-CCACTGGCACGAAGTGCCTCATCTCTGGCTGGGGCAACACTGCGACCTCT-3'
Reverse primer restr4 (SEQ ID NO: 61):
5'-AGAGCTAGCAGTGTTGCCCCAGCCAGAGATGAGGCACTTGGTACCAGTGG-3'
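The design goal stated for restr1 and restr2 (new restriction sites, unchanged amino acid sequence) can be checked directly on the sequences in the table above. The following is a minimal sketch, not part of the original procedure: the recognition sequences for SacII (CCGCGG) and BamHI (GGATCC) are standard values that the text does not state, the reading-frame offset of 1 is inferred from the printed amino acid line, and the function and variable names are illustrative.

```python
# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, offset=0):
    """Translate all complete codons of dna, starting at the given offset."""
    end = len(dna) - (len(dna) - offset) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(offset, end, 3))


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Sequences copied from the table above (SEQ ID NOs: 54, 56, 57).
BINDING_SITE = "GGTGGTATCAGCAGGCCACTGCTACAAGTCCCGCATCCAGGT"
RESTR1 = "GGTGGTATCCGCGGGCCACTGCTACAAGTCCCGGATCCAGGT"
RESTR2 = "ACCTGGATCCGGGACTTGTAGCAGTGGCCCGCGGATACCACC"

# Assumption: standard recognition sequences, not given in the text.
SITES = {"SacII": "CCGCGG", "BamHI": "GGATCC"}


def check_restr1():
    """Summarize site content and coding effect of primer restr1."""
    return {
        "sites_in_primer": {n: s in RESTR1 for n, s in SITES.items()},
        "sites_in_wild_type": {n: s in BINDING_SITE for n, s in SITES.items()},
        "wild_type_peptide": translate(BINDING_SITE, offset=1),
        "primer_peptide": translate(RESTR1, offset=1),
        "restr2_is_reverse_complement": reverse_complement(RESTR1) == RESTR2,
    }


if __name__ == "__main__":
    for key, value in check_restr1().items():
        print(key, value)
```

With these inputs, both sites are present in restr1 and absent from the wild-type binding site, and the two translations agree, which is the sense in which the changes are silent.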
[0222]In a first overlap extension PCR, the SacII/BamHI sites were introduced, enabling the insertion of SDR1, and in a second overlap extension PCR the KpnI/NheI sites were introduced, enabling the insertion of SDR2. The product of the overlap extension PCR was amplified using primers pUC-forward and pUC-reverse. The sequences of pUC-forward and pUC-reverse are as follows:
TABLE-US-00003
pUC-forward (SEQ ID NO: 62):
5'-GGGGTACCCCACCACCATGAATCCACTCCT-3'
pUC-reverse (SEQ ID NO: 63):
5'-CGGGATCCGGTATAGAGACTGAAGAGATAC-3'
[0223]The restriction sites generated thereby were subsequently used to insert defined or random oligonucleotides into the SDR1 and SDR2 insertion sites by standard restriction and ligation methods. Typically, two complementary synthetic 5'-phosphorylated oligonucleotides were annealed and ligated into a vector carrying the modified human trypsin I gene that was cleaved with the respective restriction enzymes. Oligonucleotides encoding SDR1 were inserted via the SacII/BamHI sites whereas oligonucleotides encoding SDR2 were inserted via the KpnI/NheI sites. For each insertion an oligonucleotide pair according to the following general sequences was used ([P] indicating 5'-phosphorylation, N and X indicating any nucleotide or amino acid residue, respectively):
TABLE-US-00004
oligox-SDR1f (SEQ ID NO: 64):
5'-[P]-GGGCCACTGCTACNNNNNNNNNNNNNNNNNNAAGTCCCG-3'
oligox-SDR1r (SEQ ID NO: 66):
3'-CGCCCGGTGACGATGNNNNNNNNNNNNNNNNNNTTCAGGGCCTAG-[P]-5'
G H C Y X X X X X X K S
oligox-SDR2f (SEQ ID NO: 67):
CAAGTGCCTCATCTCTGGCTGGGGCAACNNNNNNNNNNNNNNNACTG-3'
oligox-SDR2r (SEQ ID NO: 69):
3'-CATGGTTCACGGAGTAGAGACCGACCCCGTTGNNNNNNNNNNNNNNNTGACGATC-[P]-5'
K C L I S G W G N X X X X X T
[0224]As an alternative to the above method, a PCR-based method was used for the integration of random sequences into the SDR1 and SDR2 insertion sites in the modified human trypsin I. For each SDR, one primer was used in which the SDR region is fully randomized. Sequences of the primers were as follows (N=A/C/G/T, B=C/G/T, V=A/C/G):
TABLE-US-00005
Primer SDR1-mutnnb-forward (SEQ ID NO: 70):
5'-TGGTATCCGCGGGCCACTGCTACNNBNNBNNBNNBNNBNNBAAGTCCCGGATCCAGGTG-3'
Primer SDR2-mutnnb-reverse (SEQ ID NO: 71):
5'-GGCGCCAGAGCTAGCAGTVNNVNNVNNVNNVNNGTTGCCCCAGCCAGAGATG-3'
The codon NNB, or VNN in the reverse strand, allows all 20 amino acids to be made, but reduces the probability of encoding a stop codon from 0.047 to 0.021.
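The two stop-codon probabilities quoted above follow from counting codons. The sketch below enumerates every codon allowed by a degenerate scheme and reports the fraction that are stop codons and the set of amino acids encoded. It assumes the standard genetic code and equal representation of each allowed base at each position. The function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Degenerate base symbols as defined in the text.
DEGENERATE = {"N": "ACGT", "B": "CGT", "V": "ACG"}


def expand(scheme):
    """List every concrete codon allowed by a three-letter degenerate scheme."""
    return ["".join(c) for c in product(*(DEGENERATE[s] for s in scheme))]


def stop_fraction(scheme):
    """Fraction of allowed codons that are stop codons (equal base mixing)."""
    codons = expand(scheme)
    return sum(CODON_TABLE[c] == "*" for c in codons) / len(codons)


def encoded_amino_acids(scheme):
    """Set of amino acids (stop excluded) reachable with the scheme."""
    return {CODON_TABLE[c] for c in expand(scheme)} - {"*"}


if __name__ == "__main__":
    for scheme in ("NNN", "NNB"):
        print(scheme, round(stop_fraction(scheme), 3),
              len(encoded_amino_acids(scheme)))
```

NNN allows 64 codons, three of which are stops (3/64). NNB allows 48 codons and retains only TAG (1/48), while still covering all 20 amino acids.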
[0225]As a further alternative, after identification of SDRs that lead to increased specificity, these SDRs were used as templates for further randomization. Thereby, random peptide sequences were inserted that were partially randomized at each position and partially identical at each position to the original sequence.
[0226]As an example, random peptide sequences that have in approximately 1 of 3 cases the template amino acid residue and in approximately 2 of 3 cases any other amino acid residue at each position were inserted into the two SDR insertion sites of the modified human trypsin I. For this purpose, primers that contain at each nucleotide position of the SDR approximately 70% of the template bases and 30% of a mixture of the three other bases were used.
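The relation between 70% template bases per nucleotide position and roughly one template residue in three can be made explicit. The sketch below assumes that the three codon positions are doped independently, that the remaining 30% is split equally among the other three bases, and that the standard genetic code applies. The example codons are hypothetical and are not taken from the primers used.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_probability(template, codon, p_template):
    """Probability of obtaining codon from a doped template codon."""
    prob = 1.0
    for t, c in zip(template, codon):
        prob *= p_template if t == c else (1.0 - p_template) / 3.0
    return prob


def residue_retention(template, p_template=0.7):
    """Probability that the doped codon still encodes the template residue."""
    residue = CODON_TABLE[template]
    return sum(
        codon_probability(template, "".join(c), p_template)
        for c in product(BASES, repeat=3)
        if CODON_TABLE["".join(c)] == residue
    )


if __name__ == "__main__":
    # ATG (Met) has no synonymous codon; GAT (Asp) has one (GAC).
    for codon in ("ATG", "GAT"):
        print(codon, round(residue_retention(codon), 3))
```

An unchanged codon has probability 0.7 × 0.7 × 0.7 = 0.343. Synonymous codons raise the retention slightly for most residues, so "approximately 1 of 3" is a fair summary.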
[0227]With each primer pair a PCR was performed under standard conditions using the human trypsin I gene as template. The resulting DNA was purified using the QIAquick PCR Purification Kit following the suppliers' instructions and digested with SacII and NheI. After digestion the DNA was purified and ligated into the SacII and NheI digested and dephosphorylated vector. The ligation products were transformed into E. coli, amplified in LB containing the respective marker, and the plasmids were purified using the Qiagen Plasmid Purification Kit following the suppliers' instructions. Resulting plasmids were transformed into B. subtilis cells. These cells were then separated to single cells, grown to clones, and after expression of the protease gene screened for proteolytic activity.
[0228]The following substrates were employed for screening for proteolytic activity (SEQ ID. NOs:76 and 77):
##STR00001##
[0229]Protease variants were screened on substrate B at complexities of 10^6 variants by confocal fluorescence spectroscopy. The substrate was a peptide biotinylated at the N-terminus and fluorescently labeled at the C-terminus. After incubation of the peptide with supernatant of cells expressing different variants of the protease, streptavidin was added and the samples were analysed by confocal fluorimetry. The low concentration of the peptide (20 nM) leads to a preferential cleavage by proteases with a high kcat/KM value, i.e. proteases with high specificity towards the target sequence.
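Why a low peptide concentration selects for kcat/KM can be illustrated with the Michaelis-Menten rate law, which the text does not spell out. When the substrate concentration is far below KM, the rate v = kcat[E][S]/(KM + [S]) reduces to approximately (kcat/KM)[E][S]. In the sketch below only the 20 nM substrate concentration comes from the text. The values of kcat, KM and the enzyme concentration are hypothetical placeholders.

```python
def michaelis_menten_rate(kcat, km, enzyme, substrate):
    """Initial rate from the full Michaelis-Menten equation (M/s)."""
    return kcat * enzyme * substrate / (km + substrate)


def low_substrate_rate(kcat, km, enzyme, substrate):
    """Approximation valid when substrate << km: rate = (kcat/km)[E][S]."""
    return (kcat / km) * enzyme * substrate


SUBSTRATE = 20e-9  # 20 nM, as stated in the text
ENZYME = 1e-9      # hypothetical enzyme concentration (M)
KCAT = 10.0        # hypothetical turnover number (1/s)
KM = 20e-6         # hypothetical Michaelis constant (M)


def approximation_error():
    """Relative error of the low-substrate approximation for these values."""
    exact = michaelis_menten_rate(KCAT, KM, ENZYME, SUBSTRATE)
    approx = low_substrate_rate(KCAT, KM, ENZYME, SUBSTRATE)
    return (approx - exact) / exact


if __name__ == "__main__":
    print(approximation_error())
```

For these placeholder values the approximation deviates from the full equation by about 0.1%, so under such conditions the observed rate scales with kcat/KM rather than with kcat alone.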
[0230]Variants selected in the screening procedure were further evaluated for their specificity towards substrate B and the closely related substrate A by measuring time courses of the proteolytic digestion and determining the rate constants, which are proportional to the kcat/KM values. Compared to the human trypsin that was used as scaffold protein, the specific activity of variants 1 and 2 (SEQ ID NOs: 2 and 3, respectively) is clearly shifted towards substrate B. Variant 3 (SEQ ID NO: 4), on the other hand, serves as a negative control with activities similar to those of human trypsin I. Sequencing of the genes of the three variants revealed the following amino acid sequences in the SDRs.
TABLE-US-00006
TABLE 2 Sequences of the two SDRs in three different variants selected for specific hydrolysis of substrate B (SEQ ID NOs: 78-83)
            SDR 1               SDR 2
Trypsin     -- -- -- -- -- --   -- -- -- -- --
Variant 1   D A V G R D         T I T N S
Variant 2   N G R D L E         V R G T W
Variant 3   G F V M F N         R S P L T
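The way the SDR sequences of Table 2 sit inside the scaffold can be sketched by inserting them after positions 42 and 123 of SEQ ID NO: 1, the insertion points named in Example I. The scaffold string below is a one-letter transcription of SEQ ID NO: 1 from the sequence listing. The helper `insert_sdrs` is a hypothetical name used only for this illustration.

```python
# Mature human trypsin I (SEQ ID NO: 1) in one-letter code, 16 residues per row.
SCAFFOLD = (
    "IVGGYNCEENSVPYQV"
    "SLNSGYHFCGGSLINE"
    "QWVVSAGHCYKSRIQV"
    "RLGEHNIEVLEGNEQF"
    "INAAKIIRHPQYDRKT"
    "LNNDIMLIKLSSRAVI"
    "NARVSTISLPTAPPAT"
    "GTKCLISGWGNTASSG"
    "ADYPDELQCLDAPVLS"
    "QAKCEASYPGKITSNM"
    "FCVGFLEGGKDSCQGD"
    "SGGPVVCNGQLQGVVS"
    "WGDGCAQKNKPGVYTK"
    "VYNYVKWIKNTIAANS"
)


def insert_sdrs(scaffold, insertions):
    """Insert peptides into scaffold.

    insertions maps a 1-based scaffold position to the peptide that is
    placed directly after that position.
    """
    parts = []
    previous = 0
    for position in sorted(insertions):
        parts.append(scaffold[previous:position])
        parts.append(insertions[position])
        previous = position
    parts.append(scaffold[previous:])
    return "".join(parts)


# Variant 1 of Table 2: SDR1 after residue 42, SDR2 after residue 123.
VARIANT_1 = insert_sdrs(SCAFFOLD, {42: "DAVGRD", 123: "TITNS"})

if __name__ == "__main__":
    print(len(SCAFFOLD), len(VARIANT_1))
    print(VARIANT_1[39:50])
```

The result has 235 residues, and the regions around both insertions read as in SEQ ID NO: 2 (His Cys Tyr Asp Ala Val Gly Arg Asp Lys Ser, and Gly Trp Gly Asn Thr Ile Thr Asn Ser Thr Ala Ser Ser Gly).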
[0231]In a further experiment a pool of variants containing different numbers of SDRs per gene was screened for increased specificity using a mixture of the defined substrate and peptone as a competing substrate. Variants containing one or two SDRs per gene have been analyzed further. As a measure for the specificity, the activity in the peptide cleavage assay was compared with and without the competing substrate. The concentration of the competing substrate was 10 mg/ml. Under these conditions, unspecific proteases show, compared to specific proteases, a stronger decrease in activity with increasing competitor concentrations (range between 0 and 100 mg/ml). The ratio of proteolytic activity with and without competing substrate is a quantitative measure for the specificity of the proteases. FIG. 9 shows the relative activities with and without competing substrate. Human trypsin I that was used as the scaffold protein and two variants, one containing only SDR2 and one containing both SDRs, were compared. The specificity of the variant with both SDRs is higher by a factor of 2.5 than that of the variant with SDR2 only, confirming that there is a direct relation between the number of SDRs and the quantitative specificity of resulting engineered proteolytic enzymes.
Example V
Generation of an Engineered Proteolytic Enzyme that Specifically Inactivates Human TNF-Alpha
[0232]Human trypsin alpha I or a derivative comprising one or more of the following amino acid substitutions E56G; R78W; Y131F; A146T; C183R was used as protein scaffold for the generation of an engineered proteolytic enzyme with high specificity towards human TNF-alpha. The identification of SDR sites in human trypsin I or derivatives thereof was done as described above. Two insertion sites within the scaffold were chosen for SDRs. The protease variants containing two inserts with different sequences and also the human trypsin I itself with no inserts were expressed in Bacillus subtilis cells. The variant protease cells were separated to single cell clones and the protease expressing variants were screened for proteolytic activity on peptides with the desired target sequence of TNF-alpha. The activity of the protease variants was determined as the cleavage rate of a peptide with the desired target sequence of TNF-alpha in the absence and presence of competitor substrate. The specificity is expressed as the ratio of cleavage rates in the presence and absence of competitor (FIG. 14).
TABLE-US-00007
TABLE 3 Relative specificity of variants of engineered proteolytic enzymes with different SDR sequences in absence and presence of competitor substrate (SEQ ID NOs: 84-95).
                     k with comp./     Seq. of   Seq. of
                     k without comp.   SDR 1     SDR 2
scaffold (no SDRs)   0.092             --        --
variant a            0.130             RPWDPS    VHPTS
variant b            0.187             GFVMFN    RSPLT
variant c            0.235             EIANRE    RGART
variant d            0.310             KAVVGT    RTPIS
variant e            0.374             VNIMAA    TTARK
variant f            0.487             AAFNGD    RKDFW
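The ratios in Table 3 can be put on a common scale by dividing each by the value for the scaffold. The short sketch below does only that arithmetic. The fold values it prints are derived here from the tabulated ratios for illustration and are not reported in the original text.

```python
# Ratio k(with competitor) / k(without competitor), copied from Table 3.
TABLE_3 = {
    "scaffold (no SDRs)": 0.092,
    "variant a": 0.130,
    "variant b": 0.187,
    "variant c": 0.235,
    "variant d": 0.310,
    "variant e": 0.374,
    "variant f": 0.487,
}


def fold_over_scaffold(table, reference="scaffold (no SDRs)"):
    """Express each specificity ratio relative to the reference entry."""
    base = table[reference]
    return {name: ratio / base for name, ratio in table.items()}


if __name__ == "__main__":
    for name, fold in fold_over_scaffold(TABLE_3).items():
        print(f"{name}: {fold:.2f}")
```

A larger ratio means that activity is less affected by the competitor, so on this scale variant f retains roughly five times more of its activity than the scaffold does.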
[0233]The antagonistic effect of three inventive protease variants on human TNF-alpha is shown in FIG. 15. By the use of the variants, the induction of apoptosis is almost completely eliminated indicating the anti-inflammatory efficacy of the inventive proteases to initiate TNF-alpha break down. TNF-alpha has been incubated with concentrated supernatant from cultures expressing the variants i to iii for 2 hours. The resulting TNF-alpha has been incubated with non-modified cells for 4 hours. The effect of the remaining TNF-alpha activity was determined as the extent of apoptosis induction by detection of activated caspase-3 as marker for apoptotic cells. For the controls either no protease was added with the human TNF-alpha (dead cells) or buffer instead of human TNF-alpha (live cells) was used, respectively. An analogous experiment is shown in FIG. 16 using purified variant xiii. TNF-alpha was incubated with different concentrations of the purified inventive protease variant.
[0234]To demonstrate the specificity of the inventive protease variants, proteins from human blood serum or purified human TNF-alpha have been incubated with human trypsin I or the inventive engineered proteolytic enzyme variants, respectively. Here, variant x corresponds to Seq ID No: 75 comprising the same SDRs as variant f, i.e. SDRs according to Seq ID No. 89 (SDR1) and 95 (SDR2). Variants xi and xii correspond to derivatives thereof comprising the same SDR sequences. Remaining intact protein was determined as a function of time. While the variants as well as human trypsin I digest human TNF-alpha, only trypsin shows activity on serum protein (FIG. 17 a and b). This demonstrates the high TNF-alpha specificity of the inventive proteolytic enzymes and indicates their safety and accordingly their low side effects for therapeutic use.
Example VI
Generation of an Engineered Proteolytic Enzyme that Specifically Hydrolyzes Human VEGF
[0235]Human trypsin I was used as protein scaffold for the generation of an engineered proteolytic enzyme with high specificity towards human VEGF. The identification of SDR sites in human trypsin I was done as described above. Two insertion sites within the scaffold were chosen for SDRs. The protease variants containing two inserts with different sequences were expressed in Bacillus subtilis cells. The variant protease cells were separated to single cell clones and the protease expressing variants were screened as described above. The activity of the protease variants was determined as the rate of VEGF cleavage. 4 μg of recombinant human VEGF165 was incubated with 0.18 μg of purified protease in PBS/pH 7.4 at room temperature. Aliquots were taken at the indicated time points and analysed on a polyacrylamide gel. The extent of cleavage was quantified by densitometric analysis of the bands. The activity is plotted over incubation time in FIG. 18. Specific cleavage was controlled by further SDS polyacrylamide gel analyses.
SEQUENCE LISTING
<160> NUMBER OF SEQ ID NOS: 113
<210> SEQ ID NO 1
<211> LENGTH: 224
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 1
Ile Val Gly Gly Tyr Asn Cys Glu Glu Asn Ser Val Pro Tyr Gln Val
1 5 10 15
Ser Leu Asn Ser Gly Tyr His Phe Cys Gly Gly Ser Leu Ile Asn Glu
20 25 30
Gln Trp Val Val Ser Ala Gly His Cys Tyr Lys Ser Arg Ile Gln Val
35 40 45
Arg Leu Gly Glu His Asn Ile Glu Val Leu Glu Gly Asn Glu Gln Phe
50 55 60
Ile Asn Ala Ala Lys Ile Ile Arg His Pro Gln Tyr Asp Arg Lys Thr
65 70 75 80
Leu Asn Asn Asp Ile Met Leu Ile Lys Leu Ser Ser Arg Ala Val Ile
85 90 95
Asn Ala Arg Val Ser Thr Ile Ser Leu Pro Thr Ala Pro Pro Ala Thr
100 105 110
Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly Asn Thr Ala Ser Ser Gly
115 120 125
Ala Asp Tyr Pro Asp Glu Leu Gln Cys Leu Asp Ala Pro Val Leu Ser
130 135 140
Gln Ala Lys Cys Glu Ala Ser Tyr Pro Gly Lys Ile Thr Ser Asn Met
145 150 155 160
Phe Cys Val Gly Phe Leu Glu Gly Gly Lys Asp Ser Cys Gln Gly Asp
165 170 175
Ser Gly Gly Pro Val Val Cys Asn Gly Gln Leu Gln Gly Val Val Ser
180 185 190
Trp Gly Asp Gly Cys Ala Gln Lys Asn Lys Pro Gly Val Tyr Thr Lys
195 200 205
Val Tyr Asn Tyr Val Lys Trp Ile Lys Asn Thr Ile Ala Ala Asn Ser
210 215 220
<210> SEQ ID NO 2
<211> LENGTH: 235
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence = Synthetic Construct
<400> SEQUENCE: 2
Ile Val Gly Gly Tyr Asn Cys Glu Glu Asn Ser Val Pro Tyr Gln Val
1 5 10 15
Ser Leu Asn Ser Gly Tyr His Phe Cys Gly Gly Ser Leu Ile Asn Glu
20 25 30
Gln Trp Val Val Ser Ala Gly His Cys Tyr Asp Ala Val Gly Arg Asp
35 40 45
Lys Ser Arg Ile Gln Val Arg Leu Gly Glu His Asn Ile Glu Val Leu
50 55 60
Glu Gly Asn Glu Gln Phe Ile Asn Ala Ala Lys Ile Ile Arg His Pro
65 70 75 80
Gln Tyr Asp Arg Lys Thr Leu Asn Asn Asp Ile Met Leu Ile Lys Leu
85 90 95
Ser Ser Arg Ala Val Ile Asn Ala Arg Val Ser Thr Ile Ser Leu Pro
100 105 110
Thr Ala Pro Pro Ala Thr Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly
115 120 125
Asn Thr Ile Thr Asn Ser Thr Ala Ser Ser Gly Ala Asp Tyr Pro Asp
130 135 140
Glu Leu Gln Cys Leu Asp Ala Pro Val Leu Ser Gln Ala Lys Cys Glu
145 150 155 160
Ala Ser Tyr Pro Gly Lys Ile Thr Ser Asn Met Phe Cys Val Gly Phe
165 170 175
Leu Glu Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Val
180 185 190
Val Cys Asn Gly Gln Leu Gln Gly Val Val Ser Trp Gly Asp Gly Cys
195 200 205
Ala Gln Lys Asn Lys Pro Gly Val Tyr Thr Lys Val Tyr Asn Tyr Val
210 215 220
Lys Trp Ile Lys Asn Thr Ile Ala Ala Asn Ser
225 230 235
<210> SEQ ID NO 3
<211> LENGTH: 235
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence = Synthetic Construct
<400> SEQUENCE: 3
Ile Val Gly Gly Tyr Asn Cys Glu Glu Asn Ser Val Pro Tyr Gln Val
1 5 10 15
Ser Leu Asn Ser Gly Tyr His Phe Cys Gly Gly Ser Leu Ile Asn Glu
20 25 30
Gln Trp Val Val Ser Ala Gly His Cys Tyr Asn Gly Arg Asp Leu Glu
35 40 45
Lys Ser Arg Ile Gln Val Arg Leu Gly Glu His Asn Ile Glu Val Leu
50 55 60
Glu Gly Asn Glu Gln Phe Ile Asn Ala Ala Lys Ile Ile Arg His Pro
65 70 75 80
Gln Tyr Asp Arg Lys Thr Leu Asn Asn Asp Ile Met Leu Ile Lys Leu
85 90 95
Ser Ser Arg Ala Val Ile Asn Ala Arg Val Ser Thr Ile Ser Leu Pro
100 105 110
Thr Ala Pro Pro Ala Thr Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly
115 120 125
Asn Val Arg Gly Thr Trp Thr Ala Ser Ser Gly Ala Asp Tyr Pro Asp
130 135 140
Glu Leu Gln Cys Leu Asp Ala Pro Val Leu Ser Gln Ala Lys Cys Glu
145 150 155 160
Ala Ser Tyr Pro Gly Lys Ile Thr Ser Asn Met Phe Cys Val Gly Phe
165 170 175
Leu Glu Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Val
180 185 190
Val Cys Asn Gly Gln Leu Gln Gly Val Val Ser Trp Gly Asp Gly Cys
195 200 205
Ala Gln Lys Asn Lys Pro Gly Val Tyr Thr Lys Val Tyr Asn Tyr Val
210 215 220
Lys Trp Ile Lys Asn Thr Ile Ala Ala Asn Ser
225 230 235
<210> SEQ ID NO 4
<211> LENGTH: 235
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence = Synthetic Construct
<400> SEQUENCE: 4
Ile Val Gly Gly Tyr Asn Cys Glu Glu Asn Ser Val Pro Tyr Gln Val
1 5 10 15
Ser Leu Asn Ser Gly Tyr His Phe Cys Gly Gly Ser Leu Ile Asn Glu
20 25 30
Gln Trp Val Val Ser Ala Gly His Cys Tyr Ala Ala Thr Asn Gly Asp
35 40 45
Lys Ser Arg Ile Gln Val Arg Leu Gly Glu His Asn Ile Glu Val Leu
50 55 60
Glu Gly Asn Glu Gln Phe Ile Asn Ala Ala Lys Ile Ile Arg His Pro
65 70 75 80
Gln Tyr Asp Arg Lys Thr Leu Asn Asn Asp Ile Met Leu Ile Lys Leu
85 90 95
Ser Ser Arg Ala Val Ile Asn Ala Arg Val Ser Thr Ile Ser Leu Pro
100 105 110
Thr Ala Pro Pro Ala Thr Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly
115 120 125
Asn Arg Lys Asp Phe Trp Thr Ala Ser Ser Gly Ala Asp Tyr Pro Asp
130 135 140
Glu Leu Gln Cys Leu Asp Ala Pro Val Leu Ser Gln Ala Lys Cys Glu
145 150 155 160
Ala Ser Tyr Pro Gly Lys Ile Thr Ser Asn Met Phe Cys Val Gly Phe
165 170 175
Leu Glu Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Val
180 185 190
Val Cys Asn Gly Gln Leu Gln Gly Val Val Ser Trp Gly Asp Gly Cys
195 200 205
Ala Gln Lys Asn Lys Pro Gly Val Tyr Thr Lys Val Tyr Asn Tyr Val
210 215 220
Lys Trp Ile Lys Asn Thr Ile Ala Ala Asn Ser
225 230 235
<210> SEQ ID NO 5
<211> LENGTH: 259
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 5
Ile Val Glu Gly Ser Asp Ala Glu Ile Gly Met Ser Pro Trp Gln Val
1 5 10 15
Met Leu Phe Arg Lys Ser Pro Gln Glu Leu Leu Cys Gly Ala Ser Leu
20 25 30
Ile Ser Asp Arg Trp Val Leu Thr Ala Ala His Cys Leu Leu Tyr Pro
35 40 45
Pro Trp Asp Lys Asn Phe Thr Glu Asn Asp Leu Leu Val Arg Ile Gly
50 55 60
Lys His Ser Arg Thr Arg Tyr Glu Arg Asn Ile Glu Lys Ile Ser Met
65 70 75 80
Leu Glu Lys Ile Tyr Ile His Pro Arg Tyr Asn Trp Arg Glu Asn Leu
85 90 95
Asp Arg Asp Ile Ala Leu Met Lys Leu Lys Lys Pro Val Ala Phe Ser
100 105 110
Asp Tyr Ile His Pro Val Cys Leu Pro Asp Arg Glu Thr Ala Ala Ser
115 120 125
Leu Leu Gln Ala Gly Tyr Lys Gly Arg Val Thr Gly Trp Gly Asn Leu
130 135 140
Lys Glu Thr Trp Thr Ala Asn Val Gly Lys Gly Gln Pro Ser Val Leu
145 150 155 160
Gln Val Val Asn Leu Pro Ile Val Glu Arg Pro Val Cys Lys Asp Ser
165 170 175
Thr Arg Ile Arg Ile Thr Asp Asn Met Phe Cys Ala Gly Tyr Lys Pro
180 185 190
Asp Glu Gly Lys Arg Gly Asp Ala Cys Glu Gly Asp Ser Gly Gly Pro
195 200 205
Phe Val Met Lys Ser Pro Phe Asn Asn Arg Trp Tyr Gln Met Gly Ile
210 215 220
Val Ser Trp Gly Glu Gly Cys Asp Arg Asp Gly Lys Tyr Gly Phe Tyr
225 230 235 240
Thr His Val Phe Arg Leu Lys Lys Trp Ile Gln Lys Val Ile Asp Gln
245 250 255
Phe Gly Glu
<210> SEQ ID NO 6
<211> LENGTH: 235
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 6
Ile Val Gly Gly Ser Asn Ala Lys Glu Gly Ala Trp Pro Trp Val Val
1 5 10 15
Gly Leu Tyr Tyr Gly Gly Arg Leu Leu Cys Gly Ala Ser Leu Val Ser
20 25 30
Ser Asp Trp Leu Val Ser Ala Ala His Cys Val Tyr Gly Arg Asn Leu
35 40 45
Glu Pro Ser Lys Trp Thr Ala Ile Leu Gly Leu His Met Lys Ser Asn
50 55 60
Leu Thr Ser Pro Gln Thr Val Pro Arg Leu Ile Asp Glu Ile Val Ile
65 70 75 80
Asn Pro His Tyr Asn Arg Arg Arg Lys Asp Asn Asp Ile Ala Met Met
85 90 95
His Leu Glu Phe Lys Val Asn Tyr Thr Asp Tyr Ile Gln Pro Ile Cys
100 105 110
Leu Pro Glu Glu Asn Gln Val Phe Pro Pro Gly Arg Asn Cys Ser Ile
115 120 125
Ala Gly Trp Gly Thr Val Val Tyr Gln Gly Thr Thr Ala Asn Ile Leu
130 135 140
Gln Glu Ala Asp Val Pro Leu Leu Ser Asn Glu Arg Cys Gln Gln Gln
145 150 155 160
Met Pro Glu Tyr Asn Ile Thr Glu Asn Met Ile Cys Ala Gly Tyr Glu
165 170 175
Glu Gly Gly Ile Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Leu Met
180 185 190
Cys Gln Glu Asn Asn Arg Trp Phe Leu Ala Gly Val Thr Ser Phe Gly
195 200 205
Tyr Lys Cys Ala Leu Pro Asn Arg Pro Gly Val Tyr Ala Arg Val Ser
210 215 220
Arg Phe Thr Glu Trp Ile Gln Ser Phe Leu His
225 230 235
<210> SEQ ID NO 7
<211> LENGTH: 275
<212> TYPE: PRT
<213> ORGANISM: Bacillus subtilis
<400> SEQUENCE: 7
Ile Ala His Glu Tyr Ala Gln Ser Val Pro Tyr Gly Ile Ser Gln Ile
1 5 10 15
Lys Ala Pro Ala Leu His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys
20 25 30
Val Ala Val Ile Asp Ser Gly Ile Asp Ser Ser His Pro Asp Leu Asn
35 40 45
Val Arg Gly Gly Ala Ser Phe Val Pro Ser Glu Thr Asn Pro Tyr Gln
50 55 60
Asp Gly Ser Ser His Gly Thr His Val Ala Gly Thr Ile Ala Ala Leu
65 70 75 80
Asn Asn Ser Ile Gly Val Leu Gly Val Ser Pro Ser Ala Ser Leu Tyr
85 90 95
Ala Val Lys Val Leu Asp Ser Thr Gly Ser Gly Gln Tyr Ser Trp Ile
100 105 110
Ile Asn Gly Ile Glu Trp Ala Ile Ser Asn Asn Met Asp Val Ile Asn
115 120 125
Met Ser Leu Gly Gly Pro Thr Gly Ser Thr Ala Leu Lys Thr Val Val
130 135 140
Asp Lys Ala Val Ser Ser Gly Ile Val Val Ala Ala Ala Ala Gly Asn
145 150 155 160
Glu Gly Ser Ser Gly Ser Thr Ser Thr Val Gly Tyr Pro Ala Lys Tyr
165 170 175
Pro Ser Thr Ile Ala Val Gly Ala Val Asn Ser Ser Asn Gln Arg Ala
180 185 190
Ser Phe Ser Ser Ala Gly Ser Glu Leu Asp Val Met Ala Pro Gly Val
195 200 205
Ser Ile Gln Ser Thr Leu Pro Gly Gly Thr Tyr Gly Ala Tyr Asn Gly
210 215 220
Thr Ser Met Ala Thr Pro His Val Ala Gly Ala Ala Ala Leu Ile Leu
225 230 235 240
Ser Lys His Pro Thr Trp Thr Asn Ala Gln Val Arg Asp Arg Leu Glu
245 250 255
Ser Thr Ala Thr Tyr Leu Gly Asn Ser Phe Tyr Tyr Gly Lys Gly Leu
260 265 270
Ile Asn Val
275
<210> SEQ ID NO 8
<211> LENGTH: 320
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence = Synthetic Construct
<400> SEQUENCE: 8
Val Ala Lys Arg Arg Ala Lys Arg Asp Val Tyr Gln Glu Pro Thr Asp
1 5 10 15
Pro Lys Phe Pro Gln Gln Trp Tyr Leu Ser Gly Val Thr Gln Arg Asp
20 25 30
Leu Asn Val Lys Glu Ala Trp Ala Gln Gly Phe Thr Gly His Gly Ile
35 40 45
Val Val Ser Ile Leu Asp Asp Gly Ile Glu Lys Asn His Pro Asp Leu
50 55 60
Ala Gly Asn Tyr Asp Pro Gly Ala Ser Phe Asp Val Asn Asp Gln Asp
65 70 75 80
Pro Asp Pro Gln Pro Arg Tyr Thr Gln Met Asn Asp Asn Arg His Gly
85 90 95
Thr Arg Cys Ala Gly Glu Val Ala Ala Val Ala Asn Asn Gly Val Cys
100 105 110
Gly Val Gly Val Ala Tyr Asn Ala Arg Ile Gly Gly Val Arg Met Leu
115 120 125
Asp Gly Glu Val Thr Asp Ala Val Glu Ala Arg Ser Leu Gly Leu Asn
130 135 140
Pro Asn His Ile His Ile Tyr Ser Ala Ser Trp Gly Pro Glu Asp Asp
145 150 155 160
Gly Lys Thr Val Asp Gly Pro Ala Arg Leu Ala Glu Glu Ala Phe Phe
165 170 175
Arg Gly Val Ser Gln Gly Arg Gly Gly Leu Gly Ser Ile Phe Val Trp
180 185 190
Ala Ser Gly Asn Gly Gly Arg Glu His Asp Ser Cys Asn Cys Asp Gly
195 200 205
Tyr Thr Asn Ser Ile Tyr Thr Leu Ser Ile Ser Ser Ala Thr Gln Phe
210 215 220
Gly Asn Val Pro Trp Tyr Ser Glu Ala Cys Ser Ser Thr Leu Ala Thr
225 230 235 240
Thr Tyr Ser Ser Gly Asn Gln Asn Glu Lys Gln Ile Val Thr Thr Asp
245 250 255
Leu Arg Gln Lys Cys Thr Glu Ser His Thr Gly Thr Ser Ala Ser Ala
260 265 270
Pro Leu Ala Ala Gly Ile Ile Ala Leu Thr Leu Glu Ala Asn Lys Asn
275 280 285
Leu Thr Trp Arg Asp Met Gln His Leu Val Val Gln Thr Ser Lys Pro
290 295 300
Ala His Leu Asn Ala Asp Asp Trp Ala Thr Asn Gly Val Gly Arg Lys
305 310 315 320
<210> SEQ ID NO 9
<211> LENGTH: 330
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 9
Glu Lys Glu Arg Ser Lys Arg Ser Ala Leu Arg Asp Ser Ala Leu Asn
1 5 10 15
Leu Phe Asn Asp Pro Met Trp Asn Gln Gln Trp Tyr Leu Gln Asp Thr
20 25 30
Arg Met Thr Ala Ala Leu Pro Lys Leu Asp Leu His Val Ile Pro Val
35 40 45
Trp Gln Lys Gly Ile Thr Gly Lys Gly Val Val Ile Thr Val Leu Asp
50 55 60
Asp Gly Leu Glu Trp Asn His Thr Asp Ile Tyr Ala Asn Tyr Asp Pro
65 70 75 80
Glu Ala Ser Tyr Asp Phe Asn Asp Asn Asp His Asp Pro Phe Pro Arg
85 90 95
Tyr Asp Pro Thr Asn Glu Asn Lys His Gly Thr Arg Cys Ala Gly Glu
100 105 110
Ile Ala Met Gln Ala Asn Asn His Lys Cys Gly Val Gly Val Ala Tyr
115 120 125
Asn Ser Lys Val Gly Gly Ile Arg Met Leu Asp Gly Ile Val Thr Asp
130 135 140
Ala Ile Glu Ala Ser Ser Ile Gly Phe Asn Pro Gly His Val Asp Ile
145 150 155 160
Tyr Ser Ala Ser Trp Gly Pro Asn Asp Asp Gly Lys Thr Val Glu Gly
165 170 175
Pro Gly Arg Leu Ala Gln Lys Ala Phe Glu Tyr Gly Val Lys Gln Gly
180 185 190
Arg Gln Gly Lys Gly Ser Ile Phe Val Trp Ala Ser Gly Asn Gly Gly
195 200 205
Arg Gln Gly Asp Asn Cys Asp Cys Asp Gly Tyr Thr Asp Ser Ile Tyr
210 215 220
Thr Ile Ser Ile Ser Ser Ala Ser Gln Gln Gly Leu Ser Pro Trp Tyr
225 230 235 240
Ala Glu Lys Cys Ser Ser Thr Leu Ala Thr Ser Tyr Ser Ser Gly Asp
245 250 255
Tyr Thr Asp Gln Arg Ile Thr Ser Ala Asp Leu His Asn Asp Cys Thr
260 265 270
Glu Thr His Thr Gly Thr Ser Ala Ser Ala Pro Leu Ala Ala Gly Ile
275 280 285
Phe Ala Leu Ala Leu Glu Ala Asn Pro Asn Leu Thr Trp Arg Asp Met
290 295 300
Gln His Leu Val Val Trp Thr Ser Glu Tyr Asp Pro Leu Ala Asn Asn
305 310 315 320
Pro Gly Trp Lys Lys Asn Gly Ala Gly Leu
325 330
<210> SEQ ID NO 10
<211> LENGTH: 297
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 10
Asn Thr His Pro Cys Gln Ser Asp Met Asn Ile Glu Gly Ala Trp Lys
1 5 10 15
Arg Gly Tyr Thr Gly Lys Asn Ile Val Val Thr Ile Leu Asp Asp Gly
20 25 30
Ile Glu Arg Thr His Pro Asp Leu Met Gln Asn Tyr Asp Ala Leu Ala
35 40 45
Ser Cys Asp Val Asn Gly Asn Asp Leu Asp Pro Met Pro Arg Tyr Asp
50 55 60
Ala Ser Asn Glu Asn Lys His Gly Thr Arg Cys Ala Gly Glu Val Ala
65 70 75 80
Ala Ala Ala Asn Asn Ser His Cys Thr Val Gly Ile Ala Phe Asn Ala
85 90 95
Lys Ile Gly Gly Val Arg Met Leu Asp Gly Asp Val Thr Asp Met Val
100 105 110
Glu Ala Lys Ser Val Ser Phe Asn Pro Gln His Val His Ile Tyr Ser
115 120 125
Ala Ser Trp Gly Pro Asp Asp Asp Gly Lys Thr Val Asp Gly Pro Ala
130 135 140
Pro Leu Thr Arg Gln Ala Phe Glu Asn Gly Val Arg Met Gly Arg Arg
145 150 155 160
Gly Leu Gly Ser Val Phe Val Trp Ala Ser Gly Asn Gly Gly Arg Ser
165 170 175
Lys Asp His Cys Ser Cys Asp Gly Tyr Thr Asn Ser Ile Tyr Thr Ile
180 185 190
Ser Ile Ser Ser Thr Ala Glu Ser Gly Lys Lys Pro Trp Tyr Leu Glu
195 200 205
Glu Cys Ser Ser Thr Leu Ala Thr Thr Tyr Ser Ser Gly Glu Ser Tyr
210 215 220
Asp Lys Lys Ile Ile Thr Thr Asp Leu Arg Gln Arg Cys Thr Asp Asn
225 230 235 240
His Thr Gly Thr Ser Ala Ser Ala Pro Met Ala Ala Gly Ile Ile Ala
245 250 255
Leu Ala Leu Glu Ala Asn Pro Phe Leu Thr Trp Arg Asp Val Gln His
260 265 270
Val Ile Val Arg Thr Ser Arg Ala Gly His Leu Asn Ala Asn Asp Trp
275 280 285
Lys Thr Asn Ala Ala Gly Phe Lys Val
290 295
<210> SEQ ID NO 11
<211> LENGTH: 328
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 11
Thr Leu Val Asp Glu Gln Pro Leu Glu Asn Tyr Leu Asp Met Glu Tyr
1 5 10 15
Phe Gly Thr Ile Gly Ile Gly Thr Pro Ala Gln Asp Phe Thr Val Val
20 25 30
Phe Asp Thr Gly Ser Ser Asn Leu Trp Val Pro Ser Val Tyr Cys Ser
35 40 45
Ser Leu Ala Cys Thr Asn His Asn Arg Phe Asn Pro Glu Asp Ser Ser
50 55 60
Thr Tyr Gln Ser Thr Ser Glu Thr Val Ser Ile Thr Tyr Gly Thr Gly
65 70 75 80
Ser Met Thr Gly Ile Leu Gly Tyr Asp Thr Val Gln Val Gly Gly Ile
85 90 95
Ser Asp Thr Asn Gln Ile Phe Gly Leu Ser Glu Thr Glu Pro Gly Ser
100 105 110
Phe Leu Tyr Tyr Ala Pro Phe Asp Gly Ile Leu Gly Leu Ala Tyr Pro
115 120 125
Ser Ile Ser Ser Ser Gly Ala Thr Pro Val Phe Asp Asn Ile Trp Asn
130 135 140
Gln Gly Leu Val Ser Gln Asp Leu Phe Ser Val Tyr Leu Ser Ala Asp
145 150 155 160
Asp Lys Ser Gly Ser Val Val Ile Phe Gly Gly Ile Asp Ser Ser Tyr
165 170 175
Tyr Thr Gly Ser Leu Asn Trp Val Pro Val Thr Val Glu Gly Tyr Trp
180 185 190
Gln Ile Thr Val Asp Ser Ile Thr Met Asn Gly Glu Thr Ile Ala Cys
195 200 205
Ala Glu Gly Cys Gln Ala Ile Val Asp Thr Gly Thr Ser Leu Leu Thr
210 215 220
Gly Pro Thr Ser Pro Ile Ala Asn Ile Gln Ser Asp Ile Gly Ala Ser
225 230 235 240
Glu Asn Ser Asp Gly Asp Met Val Val Ser Cys Ser Ala Ile Ser Ser
245 250 255
Leu Pro Asp Ile Val Phe Thr Ile Asn Gly Val Gln Tyr Pro Val Pro
260 265 270
Pro Ser Ala Tyr Ile Leu Gln Ser Glu Gly Ser Cys Ile Ser Gly Phe
275 280 285
Gln Gly Met Asn Val Pro Thr Glu Ser Gly Glu Leu Trp Ile Leu Gly
290 295 300
Asp Val Phe Ile Arg Gln Tyr Phe Thr Val Phe Asp Arg Ala Asn Asn
305 310 315 320
Gln Val Gly Leu Ala Pro Val Ala
325
<210> SEQ ID NO 12
<211> LENGTH: 358
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 12
Glu Met Val Asp Asn Leu Arg Gly Lys Ser Gly Gln Gly Tyr Tyr Val
1 5 10 15
Glu Met Thr Val Gly Ser Pro Pro Gln Thr Leu Asn Ile Leu Val Asp
20 25 30
Thr Gly Ser Ser Asn Phe Ala Val Gly Ala Ala Pro His Pro Phe Leu
35 40 45
His Arg Tyr Tyr Gln Arg Gln Leu Ser Ser Thr Tyr Arg Asp Leu Arg
50 55 60
Lys Gly Val Tyr Val Pro Tyr Thr Gln Gly Lys Trp Glu Gly Glu Leu
65 70 75 80
Gly Thr Asp Leu Val Ser Ile Pro His Gly Pro Asn Val Thr Val Arg
85 90 95
Ala Asn Ile Ala Ala Ile Thr Glu Ser Asp Lys Phe Phe Ile Asn Gly
100 105 110
Ser Asn Trp Glu Gly Ile Leu Gly Leu Ala Tyr Ala Glu Ile Ala Arg
115 120 125
Pro Asp Asp Ser Leu Glu Pro Phe Phe Asp Ser Leu Val Lys Gln Thr
130 135 140
His Val Pro Asn Leu Phe Ser Leu Gln Leu Cys Gly Ala Gly Phe Pro
145 150 155 160
Leu Asn Gln Ser Glu Val Leu Ala Ser Val Gly Gly Ser Met Ile Ile
165 170 175
Gly Gly Ile Asp His Ser Leu Tyr Thr Gly Ser Leu Trp Tyr Thr Pro
180 185 190
Ile Arg Arg Glu Trp Tyr Tyr Glu Val Ile Ile Val Arg Val Glu Ile
195 200 205
Asn Gly Gln Asp Leu Lys Met Asp Cys Lys Glu Tyr Asn Tyr Asp Lys
210 215 220
Ser Ile Val Asp Ser Gly Thr Thr Asn Leu Arg Leu Pro Lys Lys Val
225 230 235 240
Phe Glu Ala Ala Val Lys Ser Ile Lys Ala Ala Ser Ser Thr Glu Lys
245 250 255
Phe Pro Asp Gly Phe Trp Leu Gly Glu Gln Leu Val Cys Trp Gln Ala
260 265 270
Gly Thr Thr Pro Trp Asn Ile Phe Pro Val Ile Ser Leu Tyr Leu Met
275 280 285
Gly Glu Val Thr Asn Gln Ser Phe Arg Ile Thr Ile Leu Pro Gln Gln
290 295 300
Tyr Leu Arg Pro Val Glu Asp Val Ala Thr Ser Gln Asp Asp Cys Tyr
305 310 315 320
Lys Phe Ala Ile Ser Gln Ser Ser Thr Gly Thr Val Met Gly Ala Val
325 330 335
Ile Met Glu Gly Phe Tyr Val Val Phe Asp Arg Ala Arg Lys Arg Ile
340 345 350
Gly Phe Ala Val Ser Ala
355
<210> SEQ ID NO 13
<211> LENGTH: 351
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 13
Pro Ala Val Thr Glu Gly Pro Ile Pro Glu Val Leu Lys Asn Tyr Met
1 5 10 15
Asp Ala Gln Tyr Tyr Gly Glu Ile Gly Ile Gly Thr Pro Pro Gln Cys
20 25 30
Phe Thr Val Val Phe Asp Thr Gly Ser Ser Asn Leu Trp Val Pro Ser
35 40 45
Ile His Cys Lys Leu Leu Asp Ile Ala Cys Trp Ile His His Lys Tyr
50 55 60
Asn Ser Asp Lys Ser Ser Thr Tyr Val Lys Asn Gly Thr Ser Phe Asp
65 70 75 80
Ile His Tyr Gly Ser Gly Ser Leu Ser Gly Tyr Leu Ser Gln Asp Thr
85 90 95
Val Ser Val Pro Cys Gln Ser Ala Ser Ser Ala Ser Ala Leu Gly Gly
100 105 110
Val Lys Val Glu Arg Gln Val Phe Gly Glu Ala Thr Lys Gln Pro Gly
115 120 125
Ile Thr Phe Ile Ala Ala Lys Phe Asp Gly Ile Leu Gly Met Ala Tyr
130 135 140
Pro Arg Ile Ser Val Asn Asn Val Leu Pro Val Phe Asp Asn Leu Met
145 150 155 160
Gln Gln Lys Leu Val Asp Gln Asn Ile Phe Ser Phe Tyr Leu Ser Arg
165 170 175
Asp Pro Asp Ala Gln Pro Gly Gly Glu Leu Met Leu Gly Gly Thr Asp
180 185 190
Ser Lys Tyr Tyr Lys Gly Ser Leu Ser Tyr Leu Asn Val Thr Arg Lys
195 200 205
Ala Tyr Trp Gln Val His Leu Asp Gln Val Glu Val Ala Ser Gly Leu
210 215 220
Thr Leu Cys Lys Glu Gly Cys Glu Ala Ile Val Asp Thr Gly Thr Ser
225 230 235 240
Leu Met Val Gly Pro Val Asp Glu Val Arg Glu Leu Gln Lys Ala Ile
245 250 255
Gly Ala Val Pro Leu Ile Gln Gly Glu Tyr Met Ile Pro Cys Glu Lys
260 265 270
Val Ser Thr Leu Pro Ala Ile Thr Leu Lys Leu Gly Gly Lys Gly Tyr
275 280 285
Lys Leu Ser Pro Glu Asp Tyr Thr Leu Lys Val Ser Gln Ala Gly Lys
290 295 300
Thr Leu Cys Leu Ser Gly Phe Met Gly Met Asp Ile Pro Pro Pro Ser
305 310 315 320
Gly Pro Leu Trp Ile Leu Gly Asp Val Phe Ile Gly Arg Tyr Tyr Thr
325 330 335
Val Phe Asp Arg Asp Asn Asn Arg Val Gly Phe Ala Glu Ala Ala
340 345 350
<210> SEQ ID NO 14
<211> LENGTH: 305
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 14
Met Leu Glu Ala Asp Asp Gln Gly Cys Ile Glu Glu Gln Gly Val Glu
1 5 10 15
Asp Ser Ala Asn Glu Asp Ser Val Asp Ala Lys Pro Asp Arg Ser Ser
20 25 30
Phe Val Pro Ser Leu Phe Ser Lys Lys Lys Lys Asn Val Thr Met Arg
35 40 45
Ser Ile Lys Thr Thr Arg Asp Arg Val Pro Thr Tyr Gln Tyr Asn Met
50 55 60
Asn Phe Glu Lys Leu Gly Lys Cys Ile Ile Ile Asn Asn Lys Asn Phe
65 70 75 80
Asp Lys Val Thr Gly Met Gly Val Arg Asn Gly Thr Asp Lys Asp Ala
85 90 95
Glu Ala Leu Phe Lys Cys Phe Arg Ser Leu Gly Phe Asp Val Ile Val
100 105 110
Tyr Asn Asp Cys Ser Cys Ala Lys Met Gln Asp Leu Leu Lys Lys Ala
115 120 125
Ser Glu Glu Asp His Thr Asn Ala Ala Cys Phe Ala Cys Ile Leu Leu
130 135 140
Ser His Gly Glu Glu Asn Val Ile Tyr Gly Lys Asp Gly Val Thr Pro
145 150 155 160
Ile Lys Asp Leu Thr Ala His Phe Arg Gly Asp Arg Ser Lys Thr Leu
165 170 175
Leu Glu Lys Pro Lys Leu Phe Phe Ile Gln Ala Cys Arg Gly Thr Glu
180 185 190
Leu Asp Asp Gly Ile Gln Ala Asp Ser Gly Pro Ile Asn Asp Thr Asp
195 200 205
Ala Asn Pro Arg Tyr Lys Ile Pro Val Glu Ala Asp Phe Leu Phe Ala
210 215 220
Tyr Ser Thr Val Pro Gly Tyr Tyr Ser Trp Arg Ser Pro Gly Arg Gly
225 230 235 240
Ser Trp Phe Val Gln Ala Leu Cys Ser Ile Leu Glu Glu His Gly Lys
245 250 255
Asp Leu Glu Ile Met Gln Ile Leu Thr Arg Val Asn Asp Arg Val Ala
260 265 270
Arg His Phe Glu Ser Gln Ser Asp Asp Pro His Phe His Glu Lys Lys
275 280 285
Gln Ile Pro Cys Val Val Ser Met Leu Thr Lys Glu Leu Tyr Phe Ser
290 295 300
Gln
305
<210> SEQ ID NO 15
<211> LENGTH: 262
<212> TYPE: PRT
<213> ORGANISM: Streptomyces sp. K15
<400> SEQUENCE: 15
Val Thr Lys Pro Thr Ile Ala Ala Val Gly Gly Tyr Ala Met Asn Asn
1 5 10 15
Gly Thr Gly Thr Thr Leu Tyr Thr Lys Ala Ala Asp Thr Arg Arg Ser
20 25 30
Thr Gly Ser Thr Thr Lys Ile Met Thr Ala Lys Val Val Leu Ala Gln
35 40 45
Ser Asn Leu Asn Leu Asp Ala Lys Val Thr Ile Gln Lys Ala Tyr Ser
50 55 60
Asp Tyr Val Val Ala Asn Asn Ala Ser Gln Ala His Leu Ile Val Gly
65 70 75 80
Asp Lys Val Thr Val Arg Gln Leu Leu Tyr Gly Leu Met Leu Pro Ser
85 90 95
Gly Cys Asp Ala Ala Tyr Ala Leu Ala Asp Lys Tyr Gly Ser Gly Ser
100 105 110
Thr Arg Ala Ala Arg Val Lys Ser Phe Ile Gly Lys Met Asn Thr Ala
115 120 125
Ala Thr Asn Leu Gly Leu His Asn Thr His Phe Asp Ser Phe Asp Gly
130 135 140
Ile Gly Asn Gly Ala Asn Tyr Ser Thr Pro Arg Asp Leu Thr Lys Ile
145 150 155 160
Ala Ser Ser Ala Met Lys Asn Ser Thr Phe Arg Thr Val Val Lys Thr
165 170 175
Lys Ala Tyr Thr Ala Lys Thr Val Thr Lys Thr Gly Ser Ile Arg Thr
180 185 190
Met Asp Thr Trp Lys Asn Thr Asn Gly Leu Leu Ser Ser Tyr Ser Gly
195 200 205
Ala Ile Gly Val Lys Thr Gly Ser Gly Pro Glu Ala Lys Tyr Cys Leu
210 215 220
Val Phe Ala Ala Thr Arg Gly Gly Lys Thr Val Ile Gly Thr Val Leu
225 230 235 240
Ala Ser Thr Ser Ile Pro Ala Arg Glu Ser Asp Ala Thr Lys Ile Met
245 250 255
Asn Tyr Gly Phe Ala Leu
260
<210> SEQ ID NO 16
<211> LENGTH: 256
<212> TYPE: PRT
<213> ORGANISM: Human cytomegalovirus
<400> SEQUENCE: 16
Met Thr Met Asp Glu Gln Gln Ser Gln Ala Val Ala Pro Val Tyr Val
1 5 10 15
Gly Gly Phe Leu Ala Arg Tyr Asp Gln Ser Pro Asp Glu Ala Glu Leu
20 25 30
Leu Leu Pro Arg Asp Val Val Glu His Trp Leu His Ala Gln Gly Gln
35 40 45
Gly Gln Pro Ser Leu Ser Val Ala Leu Pro Leu Asn Ile Asn His Asp
50 55 60
Asp Thr Ala Val Val Gly His Val Ala Ala Met Gln Ser Val Arg Asp
65 70 75 80
Gly Leu Phe Cys Leu Gly Cys Val Thr Ser Pro Arg Phe Leu Glu Ile
85 90 95
Val Arg Arg Ala Ser Glu Lys Ser Glu Leu Val Ser Arg Gly Pro Val
100 105 110
Ser Pro Leu Gln Pro Asp Lys Val Val Glu Phe Leu Ser Gly Ser Tyr
115 120 125
Ala Gly Leu Ser Leu Ser Ser Arg Arg Cys Asp Asp Val Glu Gln Ala
130 135 140
Thr Ser Leu Ser Gly Ser Glu Thr Thr Pro Phe Lys His Val Ala Leu
145 150 155 160
Cys Ser Val Gly Arg Arg Arg Gly Thr Leu Ala Val Tyr Gly Arg Asp
165 170 175
Pro Glu Trp Val Thr Gln Arg Phe Pro Asp Leu Thr Ala Ala Asp Arg
180 185 190
Asp Gly Leu Arg Ala Gln Trp Gln Arg Cys Gly Ser Thr Ala Val Asp
195 200 205
Ala Ser Gly Asp Pro Phe Arg Ser Asp Ser Tyr Gly Leu Leu Gly Asn
210 215 220
Ser Val Asp Ala Leu Tyr Ile Arg Glu Arg Leu Pro Lys Leu Arg Tyr
225 230 235 240
Asp Lys Gln Leu Val Gly Val Thr Glu Arg Glu Ser Tyr Val Lys Ala
245 250 255
<210> SEQ ID NO 17
<211> LENGTH: 248
<212> TYPE: PRT
<213> ORGANISM: Escherichia coli
<400> SEQUENCE: 17
Val Arg Ser Phe Ile Tyr Glu Pro Phe Gln Ile Pro Ser Gly Ser Met
1 5 10 15
Met Pro Thr Leu Leu Ile Gly Asp Phe Ile Leu Val Glu Lys Phe Ala
20 25 30
Tyr Gly Ile Lys Asp Pro Ile Tyr Gln Lys Thr Leu Ile Glu Thr Gly
35 40 45
His Pro Lys Arg Gly Asp Ile Val Val Phe Lys Tyr Pro Glu Asp Pro
50 55 60
Lys Leu Asp Tyr Ile Lys Arg Ala Val Gly Leu Pro Gly Asp Lys Val
65 70 75 80
Thr Tyr Asp Pro Val Ser Lys Glu Leu Thr Ile Gln Pro Gly Cys Ser
85 90 95
Ser Gly Gln Ala Cys Glu Asn Ala Leu Pro Val Thr Tyr Ser Asn Val
100 105 110
Glu Pro Ser Asp Phe Val Gln Thr Phe Ser Arg Arg Asn Gly Gly Glu
115 120 125
Ala Thr Ser Gly Phe Phe Glu Val Pro Lys Asn Glu Thr Lys Glu Asn
130 135 140
Gly Ile Arg Leu Ser Glu Arg Lys Glu Thr Leu Gly Asp Val Thr His
145 150 155 160
Arg Ile Leu Thr Val Pro Ile Ala Gln Asp Gln Val Gly Met Tyr Tyr
165 170 175
Gln Gln Pro Gly Gln Gln Leu Ala Thr Trp Ile Val Pro Pro Gly Gln
180 185 190
Tyr Phe Met Met Gly Asp Asn Arg Asp Asn Ser Ala Asp Ser Arg Tyr
195 200 205
Trp Gly Phe Val Pro Glu Ala Asn Leu Val Gly Arg Ala Thr Ala Ile
210 215 220
Trp Met Ser Phe Asp Lys Gln Glu Gly Glu Trp Pro Thr Gly Leu Arg
225 230 235 240
Leu Ser Arg Ile Gly Gly Ile His
245
<210> SEQ ID NO 18
<211> LENGTH: 317
<212> TYPE: PRT
<213> ORGANISM: Serratia marcescens
<400> SEQUENCE: 18
Met Glu Gln Leu Arg Gly Leu Tyr Pro Pro Leu Ala Ala Tyr Asp Ser
1 5 10 15
Gly Trp Leu Asp Thr Gly Asp Gly His Arg Ile Tyr Trp Glu Leu Ser
20 25 30
Gly Asn Pro Asn Gly Lys Pro Ala Val Phe Ile His Gly Gly Pro Gly
35 40 45
Gly Gly Ile Ser Pro His His Arg Gln Leu Phe Asp Pro Glu Arg Tyr
50 55 60
Lys Val Leu Leu Phe Asp Gln Arg Gly Cys Gly Arg Ser Arg Pro His
65 70 75 80
Ala Ser Leu Asp Asn Asn Thr Thr Trp His Leu Val Ala Asp Ile Glu
85 90 95
Arg Leu Arg Glu Met Ala Gly Val Glu Gln Trp Leu Val Phe Gly Gly
100 105 110
Ser Trp Gly Ser Thr Leu Ala Leu Ala Tyr Ala Gln Thr His Pro Glu
115 120 125
Arg Val Ser Glu Met Val Leu Arg Gly Ile Phe Thr Leu Arg Lys Gln
130 135 140
Arg Leu His Trp Tyr Tyr Gln Asp Gly Ala Ser Arg Phe Phe Pro Glu
145 150 155 160
Lys Trp Glu Arg Val Leu Ser Ile Leu Ser Asp Asp Glu Arg Lys Asp
165 170 175
Val Ile Ala Ala Tyr Arg Gln Arg Leu Thr Ser Ala Asp Pro Gln Val
180 185 190
Gln Leu Glu Ala Ala Lys Leu Trp Ser Val Trp Glu Gly Glu Thr Val
195 200 205
Thr Leu Leu Pro Ser Arg Glu Ser Ala Ser Phe Gly Glu Asp Asp Phe
210 215 220
Ala Leu Ala Phe Ala Arg Ile Glu Asn His Tyr Phe Thr His Leu Gly
225 230 235 240
Phe Leu Glu Ser Asp Asp Gln Leu Leu Arg Asn Val Pro Leu Ile Arg
245 250 255
His Ile Pro Ala Val Ile Val His Gly Arg Tyr Asp Met Ala Cys Gln
260 265 270
Val Gln Asn Ala Trp Asp Leu Ala Lys Ala Trp Pro Glu Ala Glu Leu
275 280 285
His Ile Val Glu Gly Ala Gly His Ser Tyr Asp Glu Pro Gly Ile Leu
290 295 300
His Gln Leu Met Ile Ala Thr Asp Arg Phe Ala Gly Lys
305 310 315
<210> SEQ ID NO 19
<211> LENGTH: 229
<212> TYPE: PRT
<213> ORGANISM: Escherichia coli
<400> SEQUENCE: 19
Met Glu Leu Leu Leu Leu Ser Asn Ser Thr Leu Pro Gly Lys Ala Trp
1 5 10 15
Leu Glu His Ala Leu Pro Leu Ile Ala Asn Gln Leu Asn Gly Arg Arg
20 25 30
Ser Ala Val Phe Ile Pro Phe Ala Gly Val Thr Gln Thr Trp Asp Glu
35 40 45
Tyr Thr Asp Lys Thr Ala Glu Val Leu Ala Pro Leu Gly Val Asn Val
50 55 60
Thr Gly Ile His Arg Val Ala Asp Pro Leu Ala Ala Ile Glu Lys Ala
65 70 75 80
Glu Ile Ile Ile Val Gly Gly Gly Asn Thr Phe Gln Leu Leu Lys Glu
85 90 95
Ser Arg Glu Arg Gly Leu Leu Ala Pro Met Ala Asp Arg Val Lys Arg
100 105 110
Gly Ala Leu Tyr Ile Gly Trp Ser Ala Gly Ala Asn Leu Ala Cys Pro
115 120 125
Thr Ile Arg Thr Thr Asn Asp Met Pro Ile Val Asp Pro Asn Gly Phe
130 135 140
Asp Ala Leu Asp Leu Phe Pro Leu Gln Ile Asn Pro His Phe Thr Asn
145 150 155 160
Ala Leu Pro Glu Gly His Lys Gly Glu Thr Arg Glu Gln Arg Ile Arg
165 170 175
Glu Leu Leu Val Val Ala Pro Glu Leu Thr Val Ile Gly Leu Pro Glu
180 185 190
Gly Asn Trp Ile Gln Val Ser Asn Gly Gln Ala Val Leu Gly Gly Pro
195 200 205
Asn Thr Thr Trp Val Phe Lys Ala Gly Glu Glu Ala Val Ala Leu Glu
210 215 220
Ala Gly His Arg Phe
225
<210> SEQ ID NO 20
<211> LENGTH: 99
<212> TYPE: PRT
<213> ORGANISM: Human immunodeficiency virus
<400> SEQUENCE: 20
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Val Lys Ile Gly
1 5 10 15
Gly Gln Leu Arg Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr Val
20 25 30
Leu Glu Asp Ile Asn Leu Pro Gly Lys Trp Lys Pro Lys Met Ile Gly
35 40 45
Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Leu Ile
50 55 60
Glu Ile Cys Gly Lys Lys Ala Ile Gly Thr Val Leu Val Gly Pro Thr
65 70 75 80
Pro Val Asn Ile Ile Gly Arg Asn Met Leu Thr Gln Ile Gly Cys Thr
85 90 95
Leu Asn Phe
<210> SEQ ID NO 21
<211> LENGTH: 297
<212> TYPE: PRT
<213> ORGANISM: Escherichia coli
<400> SEQUENCE: 21
Ser Thr Glu Thr Leu Ser Phe Thr Pro Asp Asn Ile Asn Ala Asp Ile
1 5 10 15
Ser Leu Gly Thr Leu Ser Gly Lys Thr Lys Glu Arg Val Tyr Leu Ala
20 25 30
Glu Glu Gly Gly Arg Lys Val Ser Gln Leu Asp Trp Lys Phe Asn Asn
35 40 45
Ala Ala Ile Ile Lys Gly Ala Ile Asn Trp Asp Leu Met Pro Gln Ile
50 55 60
Ser Ile Gly Ala Ala Gly Trp Thr Thr Leu Gly Ser Arg Gly Gly Asn
65 70 75 80
Met Val Asp Gln Asp Trp Met Asp Ser Ser Asn Pro Gly Thr Trp Thr
85 90 95
Asp Glu Ala Arg His Pro Asp Thr Gln Leu Asn Tyr Ala Asn Glu Phe
100 105 110
Asp Leu Asn Ile Lys Gly Trp Leu Leu Asn Glu Pro Asn Tyr Arg Leu
115 120 125
Gly Leu Met Ala Gly Tyr Gln Glu Ser Arg Tyr Ser Phe Thr Ala Arg
130 135 140
Gly Gly Ser Tyr Ile Tyr Ser Ser Glu Glu Gly Phe Arg Asp Asp Ile
145 150 155 160
Gly Ser Phe Pro Asn Gly Glu Arg Ala Ile Gly Tyr Lys Gln Arg Phe
165 170 175
Lys Met Pro Tyr Ile Gly Leu Thr Gly Ser Tyr Arg Tyr Glu Asp Phe
180 185 190
Glu Leu Gly Gly Thr Phe Lys Tyr Ser Gly Trp Val Glu Ser Ser Asp
195 200 205
Asn Asp Glu His Tyr Asp Pro Lys Gly Arg Ile Thr Tyr Arg Ser Lys
210 215 220
Val Lys Asp Gln Asn Tyr Tyr Ser Val Ala Val Asn Ala Gly Tyr Tyr
225 230 235 240
Val Thr Pro Asn Ala Lys Val Tyr Val Glu Gly Ala Trp Asn Arg Val
245 250 255
Thr Asn Lys Lys Gly Asn Thr Ser Leu Tyr Asp His Asn Asn Asn Thr
260 265 270
Ser Asp Tyr Ser Lys Asn Gly Ala Gly Ile Glu Asn Tyr Asn Phe Ile
275 280 285
Thr Thr Ala Gly Leu Lys Tyr Thr Phe
290 295
<210> SEQ ID NO 22
<211> LENGTH: 212
<212> TYPE: PRT
<213> ORGANISM: Carica papaya
<400> SEQUENCE: 22
Ile Pro Glu Tyr Val Asp Trp Arg Gln Lys Gly Ala Val Thr Pro Val
1 5 10 15
Lys Asn Gln Gly Ser Cys Gly Ser Cys Trp Ala Phe Ser Ala Val Val
20 25 30
Thr Ile Glu Gly Ile Ile Lys Ile Arg Thr Gly Asn Leu Asn Gln Tyr
35 40 45
Ser Glu Gln Glu Leu Leu Asp Cys Asp Arg Arg Ser Tyr Gly Cys Asn
50 55 60
Gly Gly Tyr Pro Trp Ser Ala Leu Gln Leu Val Ala Gln Tyr Gly Ile
65 70 75 80
His Tyr Arg Asn Thr Tyr Pro Tyr Glu Gly Val Gln Arg Tyr Cys Arg
85 90 95
Ser Arg Glu Lys Gly Pro Tyr Ala Ala Lys Thr Asp Gly Val Arg Gln
100 105 110
Val Gln Pro Tyr Asn Gln Gly Ala Leu Leu Tyr Ser Ile Ala Asn Gln
115 120 125
Pro Val Ser Val Val Leu Gln Ala Ala Gly Lys Asp Phe Gln Leu Tyr
130 135 140
Arg Gly Gly Ile Phe Val Gly Pro Cys Gly Asn Lys Val Asp His Ala
145 150 155 160
Val Ala Ala Val Gly Tyr Gly Pro Asn Tyr Ile Leu Ile Lys Asn Ser
165 170 175
Trp Gly Thr Gly Trp Gly Glu Asn Gly Tyr Ile Arg Ile Lys Arg Gly
180 185 190
Thr Gly Asn Ser Tyr Gly Val Cys Gly Leu Tyr Thr Ser Ser Phe Tyr
195 200 205
Pro Val Lys Asn
210
<210> SEQ ID NO 23
<211> LENGTH: 699
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 23
Ala Gly Ile Ala Ala Lys Leu Ala Lys Asp Arg Glu Ala Ala Glu Gly
1 5 10 15
Leu Gly Ser His Glu Arg Ala Ile Lys Tyr Leu Asn Gln Asp Tyr Glu
20 25 30
Ala Leu Arg Asn Glu Cys Leu Glu Ala Gly Thr Leu Phe Gln Asp Pro
35 40 45
Ser Phe Pro Ala Ile Pro Ser Ala Leu Gly Phe Lys Glu Leu Gly Pro
50 55 60
Tyr Ser Ser Lys Thr Arg Gly Met Arg Trp Lys Arg Pro Thr Glu Ile
65 70 75 80
Cys Ala Asp Pro Gln Phe Ile Ile Gly Gly Ala Thr Arg Thr Asp Ile
85 90 95
Cys Gln Gly Ala Leu Gly Asp Cys Trp Leu Leu Ala Ala Ile Ala Ser
100 105 110
Leu Thr Leu Asn Glu Glu Ile Leu Ala Arg Val Val Pro Leu Asn Gln
115 120 125
Ser Phe Gln Glu Asn Tyr Ala Gly Ile Phe His Phe Gln Phe Trp Gln
130 135 140
Tyr Gly Glu Trp Val Glu Val Val Val Asp Asp Arg Leu Pro Thr Lys
145 150 155 160
Asp Gly Glu Leu Leu Phe Val His Ser Ala Glu Gly Ser Glu Phe Trp
165 170 175
Ser Ala Leu Leu Glu Lys Ala Tyr Ala Lys Ile Asn Gly Cys Tyr Glu
180 185 190
Ala Leu Ser Gly Gly Ala Thr Thr Glu Gly Phe Glu Asp Phe Thr Gly
195 200 205
Gly Ile Ala Glu Trp Tyr Glu Leu Lys Lys Pro Pro Pro Asn Leu Phe
210 215 220
Lys Ile Ile Gln Lys Ala Leu Gln Lys Gly Ser Leu Leu Gly Cys Ser
225 230 235 240
Ile Asp Ile Thr Ser Ala Ala Asp Ser Glu Ala Ile Thr Phe Gln Lys
245 250 255
Leu Val Lys Gly His Ala Tyr Ser Val Thr Gly Ala Glu Glu Val Glu
260 265 270
Ser Asn Gly Ser Leu Gln Lys Leu Ile Arg Ile Arg Asn Pro Trp Gly
275 280 285
Glu Val Glu Trp Thr Gly Arg Trp Asn Asp Asn Cys Pro Ser Trp Asn
290 295 300
Thr Ile Asp Pro Glu Glu Arg Glu Arg Leu Thr Arg Arg His Glu Asp
305 310 315 320
Gly Glu Phe Trp Met Ser Phe Ser Asp Phe Leu Arg His Tyr Ser Arg
325 330 335
Leu Glu Ile Cys Asn Leu Thr Pro Asp Thr Leu Thr Ser Asp Thr Tyr
340 345 350
Lys Lys Trp Lys Leu Thr Lys Met Asp Gly Asn Trp Arg Arg Gly Ser
355 360 365
Thr Ala Gly Gly Cys Arg Asn Tyr Pro Asn Thr Phe Trp Met Asn Pro
370 375 380
Gln Tyr Leu Ile Lys Leu Glu Glu Glu Asp Glu Asp Glu Glu Asp Gly
385 390 395 400
Glu Ser Gly Cys Thr Phe Leu Val Gly Leu Ile Gln Lys His Arg Arg
405 410 415
Arg Gln Arg Lys Met Gly Glu Asp Met His Thr Ile Gly Phe Gly Ile
420 425 430
Tyr Glu Val Pro Glu Glu Leu Ser Gly Gln Thr Asn Ile His Leu Ser
435 440 445
Lys Asn Phe Phe Leu Thr Asn Arg Ala Arg Glu Arg Ser Asp Thr Phe
450 455 460
Ile Asn Leu Arg Glu Val Leu Asn Arg Phe Lys Leu Pro Pro Gly Glu
465 470 475 480
Tyr Ile Leu Val Pro Ser Thr Phe Glu Pro Asn Lys Asp Gly Asp Phe
485 490 495
Cys Ile Arg Val Phe Ser Glu Lys Lys Ala Asp Tyr Gln Ala Val Asp
500 505 510
Asp Glu Ile Glu Ala Asn Leu Glu Glu Phe Asp Ile Ser Glu Asp Asp
515 520 525
Ile Asp Asp Gly Val Arg Arg Leu Phe Ala Gln Leu Ala Gly Glu Asp
530 535 540
Ala Glu Ile Ser Ala Phe Glu Leu Gln Thr Ile Leu Arg Arg Val Leu
545 550 555 560
Ala Lys Arg Gln Asp Ile Lys Ser Asp Gly Phe Ser Ile Glu Thr Cys
565 570 575
Lys Ile Met Val Asp Met Leu Asp Ser Asp Gly Ser Gly Lys Leu Gly
580 585 590
Leu Lys Glu Phe Tyr Ile Leu Trp Thr Lys Ile Gln Lys Tyr Gln Lys
595 600 605
Ile Tyr Arg Glu Ile Asp Val Asp Arg Ser Gly Thr Met Asn Ser Tyr
610 615 620
Glu Met Arg Lys Ala Leu Glu Glu Ala Gly Phe Lys Met Pro Cys Gln
625 630 635 640
Leu His Gln Val Ile Val Ala Arg Phe Ala Asp Asp Gln Leu Ile Ile
645 650 655
Asp Phe Asp Asn Phe Val Arg Cys Leu Val Arg Leu Glu Thr Leu Phe
660 665 670
Lys Ile Phe Lys Gln Leu Asp Pro Glu Asn Thr Gly Thr Ile Glu Leu
675 680 685
Asp Leu Ile Ser Trp Leu Cys Phe Ser Val Leu
690 695
<210> SEQ ID NO 24
<211> LENGTH: 221
<212> TYPE: PRT
<213> ORGANISM: Tobacco etch virus
<400> SEQUENCE: 24
Gly Glu Ser Leu Phe Lys Gly Pro Arg Asp Tyr Asn Pro Ile Ser Ser
1 5 10 15
Thr Ile Cys His Leu Thr Asn Glu Ser Asp Gly His Thr Thr Ser Leu
20 25 30
Tyr Gly Ile Gly Phe Gly Pro Phe Ile Ile Thr Asn Lys His Leu Phe
35 40 45
Arg Arg Asn Asn Gly Thr Leu Leu Val Gln Ser Leu His Gly Val Phe
50 55 60
Lys Val Lys Asn Thr Thr Thr Leu Gln Gln His Leu Ile Asp Gly Arg
65 70 75 80
Asp Met Ile Ile Ile Arg Met Pro Lys Asp Phe Pro Pro Phe Pro Gln
85 90 95
Lys Leu Lys Phe Arg Glu Pro Gln Arg Glu Glu Arg Ile Cys Leu Val
100 105 110
Thr Thr Asn Phe Gln Thr Lys Ser Met Ser Ser Met Val Ser Asp Thr
115 120 125
Ser Cys Thr Phe Pro Ser Ser Asp Gly Ile Phe Trp Lys His Trp Ile
130 135 140
Gln Thr Lys Asp Gly Gln Cys Gly Ser Pro Leu Val Ser Thr Arg Asp
145 150 155 160
Gly Phe Ile Val Gly Ile His Ser Ala Ser Asn Phe Thr Asn Thr Asn
165 170 175
Asn Tyr Phe Thr Ser Val Pro Lys Asn Phe Met Glu Leu Leu Thr Asn
180 185 190
Gln Glu Ala Gln Gln Trp Val Ser Gly Trp Arg Leu Asn Ala Asp Ser
195 200 205
Val Leu Trp Gly Gly His Lys Val Phe Met Asp Lys Pro
210 215 220
<210> SEQ ID NO 25
<211> LENGTH: 371
<212> TYPE: PRT
<213> ORGANISM: Streptococcus pyogenes
<400> SEQUENCE: 25
Asp Gln Asn Phe Ala Arg Asn Glu Lys Glu Ala Lys Asp Ser Ala Ile
1 5 10 15
Thr Phe Ile Gln Lys Ser Ala Ala Ile Lys Ala Gly Ala Arg Ser Ala
20 25 30
Glu Asp Ile Lys Leu Asp Lys Val Asn Leu Gly Gly Glu Leu Ser Gly
35 40 45
Ser Asn Met Tyr Val Tyr Asn Ile Ser Thr Gly Gly Phe Val Ile Val
50 55 60
Ser Gly Asp Lys Arg Ser Pro Glu Ile Leu Gly Tyr Ser Thr Ser Gly
65 70 75 80
Ser Phe Asp Val Asn Gly Lys Glu Asn Ile Ala Ser Phe Met Glu Ser
85 90 95
Tyr Val Glu Gln Ile Lys Glu Asn Lys Lys Leu Asp Ser Thr Tyr Ala
100 105 110
Gly Thr Ala Glu Ile Lys Gln Pro Val Val Lys Ser Leu Leu Asp Ser
115 120 125
Lys Gly Ile His Tyr Asn Gln Gly Asn Pro Tyr Asn Leu Leu Thr Pro
130 135 140
Val Ile Glu Lys Val Lys Pro Gly Glu Gln Ser Phe Val Gly Gln His
145 150 155 160
Ala Ala Thr Gly Ser Val Ala Thr Ala Thr Ala Gln Ile Met Lys Tyr
165 170 175
His Asn Tyr Pro Asn Lys Gly Leu Lys Asp Tyr Thr Tyr Thr Leu Ser
180 185 190
Ser Asn Asn Pro Tyr Phe Asn His Pro Lys Asn Leu Phe Ala Ala Ile
195 200 205
Ser Thr Arg Gln Tyr Asn Trp Asn Asn Ile Leu Pro Thr Tyr Ser Gly
210 215 220
Arg Glu Ser Asn Val Gln Lys Met Ala Ile Ser Glu Leu Met Ala Asp
225 230 235 240
Val Gly Ile Ser Val Asp Met Asp Tyr Gly Pro Ser Ser Gly Ser Ala
245 250 255
Gly Ser Ser Arg Val Gln Arg Ala Leu Lys Glu Asn Phe Gly Tyr Asn
260 265 270
Gln Ser Val His Gln Ile Asn Arg Gly Asp Phe Ser Lys Gln Asp Trp
275 280 285
Glu Ala Gln Ile Asp Lys Glu Leu Ser Gln Asn Gln Pro Val Tyr Tyr
290 295 300
Gln Gly Val Gly Lys Val Gly Gly His Ala Phe Val Ile Asp Gly Ala
305 310 315 320
Asp Gly Arg Asn Phe Tyr His Val Asn Trp Gly Trp Gly Gly Val Ser
325 330 335
Asp Gly Phe Phe Arg Leu Asp Ala Leu Asn Pro Ser Ala Leu Gly Thr
340 345 350
Gly Gly Gly Ala Gly Gly Phe Asn Gly Tyr Gln Ser Ala Val Val Gly
355 360 365
Ile Lys Pro
370
<210> SEQ ID NO 26
<211> LENGTH: 353
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 26
Lys Lys His Thr Gly Tyr Val Gly Leu Lys Asn Gln Gly Ala Thr Cys
1 5 10 15
Tyr Met Asn Ser Leu Leu Gln Thr Leu Phe Phe Thr Asn Gln Leu Arg
20 25 30
Lys Ala Val Tyr Met Met Pro Thr Glu Gly Asp Asp Ser Ser Lys Ser
35 40 45
Val Pro Leu Ala Leu Gln Arg Val Phe Tyr Glu Leu Gln His Ser Asp
50 55 60
Lys Pro Val Gly Thr Lys Lys Leu Thr Lys Ser Phe Gly Trp Glu Thr
65 70 75 80
Leu Asp Ser Phe Met Gln His Asp Val Gln Glu Leu Cys Arg Val Leu
85 90 95
Leu Asp Asn Val Glu Asn Lys Met Lys Gly Thr Cys Val Glu Gly Thr
100 105 110
Ile Pro Lys Leu Phe Arg Gly Lys Met Val Ser Tyr Ile Gln Cys Lys
115 120 125
Glu Val Asp Tyr Arg Ser Asp Arg Arg Glu Asp Tyr Tyr Asp Ile Gln
130 135 140
Leu Ser Ile Lys Gly Lys Lys Asn Ile Phe Glu Ser Phe Val Asp Tyr
145 150 155 160
Val Ala Val Glu Gln Leu Asp Gly Asp Asn Lys Tyr Asp Ala Gly Glu
165 170 175
His Gly Leu Gln Glu Ala Glu Lys Gly Val Lys Phe Leu Thr Leu Pro
180 185 190
Pro Val Leu His Leu Gln Leu Met Arg Phe Met Tyr Asp Pro Gln Thr
195 200 205
Asp Gln Asn Ile Lys Ile Asn Asp Arg Phe Glu Phe Pro Glu Gln Leu
210 215 220
Pro Leu Asp Glu Phe Leu Gln Lys Thr Asp Pro Lys Asp Pro Ala Asn
225 230 235 240
Tyr Ile Leu His Ala Val Leu Val His Ser Gly Asp Asn His Gly Gly
245 250 255
His Tyr Val Val Tyr Leu Asn Pro Lys Gly Asp Gly Lys Trp Cys Lys
260 265 270
Phe Asp Asp Asp Val Val Ser Arg Cys Thr Lys Glu Glu Ala Ile Glu
275 280 285
His Asn Tyr Gly Gly His Asp Asp Asp Leu Ser Val Arg His Cys Thr
290 295 300
Asn Ala Tyr Met Leu Val Tyr Ile Arg Glu Ser Lys Leu Ser Glu Val
305 310 315 320
Leu Gln Ala Val Thr Asp His Asp Ile Pro Gln Gln Leu Val Glu Arg
325 330 335
Leu Gln Glu Glu Lys Arg Ile Glu Ala Gln Lys Arg Lys Glu Arg Gln
340 345 350
Glu
<210> SEQ ID NO 27
<211> LENGTH: 174
<212> TYPE: PRT
<213> ORGANISM: Staphylococcus aureus
<400> SEQUENCE: 27
Tyr Asn Glu Gln Tyr Val Asn Lys Leu Glu Asn Phe Lys Ile Arg Glu
1 5 10 15
Thr Gln Gly Asn Asn Gly Trp Cys Ala Gly Tyr Thr Met Ser Ala Leu
20 25 30
Leu Asn Ala Thr Tyr Asn Thr Asn Lys Tyr His Ala Glu Ala Val Met
35 40 45
Arg Phe Leu His Pro Asn Leu Gln Gly Gln Gln Phe Gln Phe Thr Gly
50 55 60
Leu Thr Pro Arg Glu Met Ile Tyr Phe Gly Gln Thr Gln Gly Arg Ser
65 70 75 80
Pro Gln Leu Leu Asn Arg Met Thr Thr Tyr Asn Glu Val Asp Asn Leu
85 90 95
Thr Lys Asn Asn Lys Gly Ile Ala Ile Leu Gly Ser Arg Val Glu Ser
100 105 110
Arg Asn Gly Met His Ala Gly His Ala Met Ala Val Val Gly Asn Ala
115 120 125
Lys Leu Asn Asn Gly Gln Glu Val Ile Ile Ile Trp Asn Pro Trp Asp
130 135 140
Asn Gly Phe Met Thr Gln Asp Ala Lys Asn Asn Val Ile Pro Val Ser
145 150 155 160
Asn Gly Asp His Tyr Gln Trp Tyr Ser Ser Ile Tyr Gly Tyr
165 170
<210> SEQ ID NO 28
<211> LENGTH: 221
<212> TYPE: PRT
<213> ORGANISM: Saccharomyces cerevisiae
<400> SEQUENCE: 28
Gly Ser Leu Val Pro Glu Leu Asn Glu Lys Asp Asp Asp Gln Val Gln
1 5 10 15
Lys Ala Leu Ala Ser Arg Glu Asn Thr Gln Leu Met Asn Arg Asp Asn
20 25 30
Ile Glu Ile Thr Val Arg Asp Phe Lys Thr Leu Ala Pro Arg Arg Trp
35 40 45
Leu Asn Asp Thr Ile Ile Glu Phe Phe Met Lys Tyr Ile Glu Lys Ser
50 55 60
Thr Pro Asn Thr Val Ala Phe Asn Ser Phe Phe Tyr Thr Asn Leu Ser
65 70 75 80
Glu Arg Gly Tyr Gln Gly Val Arg Arg Trp Met Lys Arg Lys Lys Thr
85 90 95
Gln Ile Asp Lys Leu Asp Lys Ile Phe Thr Pro Ile Asn Leu Asn Gln
100 105 110
Ser His Trp Ala Leu Gly Ile Ile Asp Leu Lys Lys Lys Thr Ile Gly
115 120 125
Tyr Val Asp Ser Leu Ser Asn Gly Pro Asn Ala Met Ser Phe Ala Ile
130 135 140
Leu Thr Asp Leu Gln Lys Tyr Val Met Glu Glu Ser Lys His Thr Ile
145 150 155 160
Gly Glu Asp Phe Asp Leu Ile His Leu Asp Cys Pro Gln Gln Pro Asn
165 170 175
Gly Tyr Asp Cys Gly Ile Tyr Val Cys Met Asn Thr Leu Tyr Gly Ser
180 185 190
Ala Asp Ala Pro Leu Asp Phe Asp Tyr Lys Asp Ala Ile Arg Met Arg
195 200 205
Arg Phe Ile Ala His Leu Ile Leu Thr Asp Ala Leu Lys
210 215 220
<210> SEQ ID NO 29
<211> LENGTH: 166
<212> TYPE: PRT
<213> ORGANISM: Pyrococcus horikoshii
<400> SEQUENCE: 29
Met Lys Val Leu Phe Leu Thr Ala Asn Glu Phe Glu Asp Val Glu Leu
1 5 10 15
Ile Tyr Pro Tyr His Arg Leu Lys Glu Glu Gly His Glu Val Tyr Ile
20 25 30
Ala Ser Phe Glu Arg Gly Thr Ile Thr Gly Lys His Gly Tyr Ser Val
35 40 45
Lys Val Asp Leu Thr Phe Asp Lys Val Asn Pro Glu Glu Phe Asp Ala
50 55 60
Leu Val Leu Pro Gly Gly Arg Ala Pro Glu Arg Val Arg Leu Asn Glu
65 70 75 80
Lys Ala Val Ser Ile Ala Arg Lys Met Phe Ser Glu Gly Lys Pro Val
85 90 95
Ala Ser Ile Cys His Gly Pro Gln Ile Leu Ile Ser Ala Gly Val Leu
100 105 110
Arg Gly Arg Lys Gly Thr Ser Tyr Pro Gly Ile Lys Asp Asp Met Ile
115 120 125
Asn Ala Gly Val Glu Trp Val Asp Ala Glu Val Val Val Asp Gly Asn
130 135 140
Trp Val Ser Ser Arg Val Pro Ala Asp Leu Tyr Ala Trp Met Arg Glu
145 150 155 160
Phe Val Lys Leu Leu Lys
165
<210> SEQ ID NO 30
<211> LENGTH: 316
<212> TYPE: PRT
<213> ORGANISM: Bacillus thermoproteolyticus
<400> SEQUENCE: 30
Ile Thr Gly Thr Ser Thr Val Gly Val Gly Arg Gly Val Leu Gly Asp
1 5 10 15
Gln Lys Asn Ile Asn Thr Thr Tyr Ser Thr Tyr Tyr Tyr Leu Gln Asp
20 25 30
Asn Thr Arg Gly Asp Gly Ile Phe Thr Tyr Asp Ala Lys Tyr Arg Thr
35 40 45
Thr Leu Pro Gly Ser Leu Trp Ala Asp Ala Asp Asn Gln Phe Phe Ala
50 55 60
Ser Tyr Asp Ala Pro Ala Val Asp Ala His Tyr Tyr Ala Gly Val Thr
65 70 75 80
Tyr Asp Tyr Tyr Lys Asn Val His Asn Arg Leu Ser Tyr Asp Gly Asn
85 90 95
Asn Ala Ala Ile Arg Ser Ser Val His Tyr Ser Gln Gly Tyr Asn Asn
100 105 110
Ala Phe Trp Asn Gly Ser Glu Met Val Tyr Gly Asp Gly Asp Gly Gln
115 120 125
Thr Phe Ile Pro Leu Ser Gly Gly Ile Asp Val Val Ala His Glu Leu
130 135 140
Thr His Ala Val Thr Asp Tyr Thr Ala Gly Leu Ile Tyr Gln Asn Glu
145 150 155 160
Ser Gly Ala Ile Asn Glu Ala Ile Ser Asp Ile Phe Gly Thr Leu Val
165 170 175
Glu Phe Tyr Ala Asn Lys Asn Pro Asp Trp Glu Ile Gly Glu Asp Val
180 185 190
Tyr Thr Pro Gly Ile Ser Gly Asp Ser Leu Arg Ser Met Ser Asp Pro
195 200 205
Ala Lys Tyr Gly Asp Pro Asp His Tyr Ser Lys Arg Tyr Thr Gly Thr
210 215 220
Gln Asp Asn Gly Gly Val His Ile Asn Ser Gly Ile Ile Asn Lys Ala
225 230 235 240
Ala Tyr Leu Ile Ser Gln Gly Gly Thr His Tyr Gly Val Ser Val Val
245 250 255
Gly Ile Gly Arg Asp Lys Leu Gly Lys Ile Phe Tyr Arg Ala Leu Thr
260 265 270
Gln Tyr Leu Thr Pro Thr Ser Asn Phe Ser Gln Leu Arg Ala Ala Ala
275 280 285
Val Gln Ser Ala Thr Asp Leu Tyr Gly Ser Thr Ser Gln Glu Val Ala
290 295 300
Ser Val Lys Gln Ala Phe Asp Ala Val Gly Val Lys
305 310 315
<210> SEQ ID NO 31
<211> LENGTH: 169
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 31
Val Leu Thr Glu Gly Asn Pro Arg Trp Glu Gln Thr His Leu Thr Tyr
1 5 10 15
Arg Ile Glu Asn Tyr Thr Pro Asp Leu Pro Arg Ala Asp Val Asp His
20 25 30
Ala Ile Glu Lys Ala Phe Gln Leu Trp Ser Asn Val Thr Pro Leu Thr
35 40 45
Phe Thr Lys Val Ser Glu Gly Gln Ala Asp Ile Met Ile Ser Phe Val
50 55 60
Arg Gly Asp His Arg Asp Asn Ser Pro Phe Asp Gly Pro Gly Gly Asn
65 70 75 80
Leu Ala His Ala Phe Gln Pro Gly Pro Gly Ile Gly Gly Asp Ala His
85 90 95
Phe Asp Glu Asp Glu Arg Trp Thr Asn Asn Phe Arg Glu Tyr Asn Leu
100 105 110
His Arg Val Ala Ala His Glu Leu Gly His Ser Leu Gly Leu Ser His
115 120 125
Ser Thr Asp Ile Gly Ala Leu Met Tyr Pro Ser Tyr Thr Phe Ser Gly
130 135 140
Asp Val Gln Leu Ala Gln Asp Asp Ile Asp Gly Ile Gln Ala Ile Tyr
145 150 155 160
Gly Arg Ser Gln Asn Pro Val Gln Pro
165
<210> SEQ ID NO 32
<211> LENGTH: 496
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 32
Gln Tyr Ser Pro Asn Thr Gln Gln Gly Arg Thr Ser Ile Val His Leu
1 5 10 15
Phe Glu Trp Arg Trp Val Asp Ile Ala Leu Glu Cys Glu Arg Tyr Leu
20 25 30
Ala Pro Lys Gly Phe Gly Gly Val Gln Val Ser Pro Pro Asn Glu Asn
35 40 45
Val Ala Ile Tyr Asn Pro Phe Arg Pro Trp Trp Glu Arg Tyr Gln Pro
50 55 60
Val Ser Tyr Lys Leu Cys Thr Arg Ser Gly Asn Glu Asp Glu Phe Arg
65 70 75 80
Asn Met Val Thr Arg Cys Asn Asn Val Gly Val Arg Ile Tyr Val Asp
85 90 95
Ala Val Ile Asn His Met Cys Gly Asn Ala Val Ser Ala Gly Thr Ser
100 105 110
Ser Thr Cys Gly Ser Tyr Phe Asn Pro Gly Ser Arg Asp Phe Pro Ala
115 120 125
Val Pro Tyr Ser Gly Trp Asp Phe Asn Asp Gly Lys Cys Lys Thr Gly
130 135 140
Ser Gly Asp Ile Glu Asn Tyr Asn Asp Ala Thr Gln Val Arg Asp Cys
145 150 155 160
Arg Leu Thr Gly Leu Leu Asp Leu Ala Leu Glu Lys Asp Tyr Val Arg
165 170 175
Ser Lys Ile Ala Glu Tyr Met Asn His Leu Ile Asp Ile Gly Val Ala
180 185 190
Gly Phe Arg Leu Asp Ala Ser Lys His Met Trp Pro Gly Asp Ile Lys
195 200 205
Ala Ile Leu Asp Lys Leu His Asn Leu Asn Ser Asn Trp Phe Pro Ala
210 215 220
Gly Ser Lys Pro Phe Ile Tyr Gln Glu Val Ile Asp Leu Gly Gly Glu
225 230 235 240
Pro Ile Lys Ser Ser Asp Tyr Phe Gly Asn Gly Arg Val Thr Glu Phe
245 250 255
Lys Tyr Gly Ala Lys Leu Gly Thr Val Ile Arg Lys Trp Asn Gly Glu
260 265 270
Lys Met Ser Tyr Leu Lys Asn Trp Gly Glu Gly Trp Gly Phe Val Pro
275 280 285
Ser Asp Arg Ala Leu Val Phe Val Asp Asn His Asp Asn Gln Arg Gly
290 295 300
His Gly Ala Gly Gly Ala Ser Ile Leu Thr Phe Trp Asp Ala Arg Leu
305 310 315 320
Tyr Lys Met Ala Val Gly Phe Met Leu Ala His Pro Tyr Gly Phe Thr
325 330 335
Arg Val Met Ser Ser Tyr Arg Trp Pro Arg Gln Phe Gln Asn Gly Asn
340 345 350
Asp Val Asn Asp Trp Val Gly Pro Pro Asn Asn Asn Gly Val Ile Lys
355 360 365
Glu Val Thr Ile Asn Pro Asp Thr Thr Cys Gly Asn Asp Trp Val Cys
370 375 380
Glu His Arg Trp Arg Gln Ile Arg Asn Met Val Ile Phe Arg Asn Val
385 390 395 400
Val Asp Gly Gln Pro Phe Thr Asn Trp Tyr Asp Asn Gly Ser Asn Gln
405 410 415
Val Ala Phe Gly Arg Gly Asn Arg Gly Phe Ile Val Phe Asn Asn Asp
420 425 430
Asp Trp Ser Phe Ser Leu Thr Leu Gln Thr Gly Leu Pro Ala Gly Thr
435 440 445
Tyr Cys Asp Val Ile Ser Gly Asp Lys Ile Asn Gly Asn Cys Thr Gly
450 455 460
Ile Lys Ile Tyr Val Ser Asp Asp Gly Lys Ala His Phe Ser Ile Ser
465 470 475 480
Asn Ser Ala Glu Asp Pro Phe Ile Ala Ile His Ala Glu Ser Lys Leu
485 490 495
<210> SEQ ID NO 33
<211> LENGTH: 370
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence = Synthetic Construct
<400> SEQUENCE: 33
Gln Pro Gly Thr Ser Thr Pro Glu Val His Pro Lys Leu Thr Thr Tyr
1 5 10 15
Lys Cys Thr Lys Ser Gly Gly Cys Val Ala Gln Asp Thr Ser Val Val
20 25 30
Leu Asp Trp Asn Tyr Arg Trp Met His Asp Ala Asn Tyr Asn Ser Cys
35 40 45
Thr Val Asn Gly Gly Val Asn Thr Thr Leu Cys Pro Asp Glu Ala Thr
50 55 60
Cys Gly Lys Asn Cys Phe Ile Glu Gly Val Asp Tyr Ala Ala Ser Gly
65 70 75 80
Val Thr Thr Ser Gly Ser Ser Leu Thr Met Asn Gln Tyr Met Pro Ser
85 90 95
Ser Ser Gly Gly Tyr Ser Ser Val Ser Pro Arg Leu Tyr Leu Leu Asp
100 105 110
Ser Asp Gly Glu Tyr Val Met Leu Lys Leu Asn Gly Gln Glu Leu Ser
115 120 125
Phe Asp Val Asp Leu Ser Ala Leu Pro Cys Gly Glu Asn Gly Ser Leu
130 135 140
Tyr Leu Ser Gln Met Asp Glu Asn Gly Gly Ala Asn Gln Tyr Asn Thr
145 150 155 160
Ala Gly Ala Asn Tyr Gly Ser Gly Tyr Cys Asp Ala Gln Cys Pro Val
165 170 175
Gln Thr Trp Arg Asn Gly Thr Leu Asn Thr Ser His Gln Gly Phe Cys
180 185 190
Cys Asn Glu Met Asp Ile Leu Glu Gly Asn Ser Arg Ala Asn Ala Leu
195 200 205
Thr Pro His Ser Cys Thr Ala Thr Ala Cys Asp Ser Ala Gly Cys Gly
210 215 220
Phe Asn Pro Tyr Gly Ser Gly Tyr Lys Ser Tyr Tyr Gly Pro Gly Asp
225 230 235 240
Thr Val Asp Thr Ser Lys Thr Phe Thr Ile Ile Thr Gln Phe Asn Thr
245 250 255
Asp Asn Gly Ser Pro Ser Gly Asn Leu Val Ser Ile Thr Arg Lys Tyr
260 265 270
Gln Gln Asn Gly Val Asp Ile Pro Ser Ala Gln Pro Gly Gly Asp Thr
275 280 285
Ile Ser Ser Cys Pro Ser Ala Ser Ala Tyr Gly Gly Leu Ala Thr Met
290 295 300
Gly Lys Ala Leu Ser Ser Gly Met Val Leu Val Phe Ser Ile Trp Asn
305 310 315 320
Asp Asn Ser Gln Tyr Met Asn Trp Leu Asp Ser Gly Asn Ala Gly Pro
325 330 335
Cys Ser Ser Thr Glu Gly Asn Pro Ser Asn Ile Leu Ala Asn Asn Pro
340 345 350
Asn Thr His Val Val Phe Ser Asn Ile Arg Trp Gly Asp Ile Gly Ser
355 360 365
Thr Thr
370
<210> SEQ ID NO 34
<211> LENGTH: 223
<212> TYPE: PRT
<213> ORGANISM: Aspergillus niger
<400> SEQUENCE: 34
Gln Thr Met Cys Ser Gln Tyr Asp Ser Ala Ser Ser Pro Pro Tyr Ser
1 5 10 15
Val Asn Gln Asn Leu Trp Gly Glu Tyr Gln Gly Thr Gly Ser Gln Cys
20 25 30
Val Tyr Val Asp Lys Leu Ser Ser Ser Gly Ala Ser Trp His Thr Glu
35 40 45
Trp Thr Trp Ser Gly Gly Glu Gly Thr Val Lys Ser Tyr Ser Asn Ser
50 55 60
Gly Val Thr Phe Asn Lys Lys Leu Val Ser Asp Val Ser Ser Ile Pro
65 70 75 80
Thr Ser Val Glu Trp Lys Gln Asp Asn Thr Asn Val Asn Ala Asp Val
85 90 95
Ala Tyr Asp Leu Phe Thr Ala Ala Asn Val Asp His Ala Thr Ser Ser
100 105 110
Gly Asp Tyr Glu Leu Met Ile Trp Leu Ala Arg Tyr Gly Asn Ile Gln
115 120 125
Pro Ile Gly Lys Gln Ile Ala Thr Ala Thr Val Gly Gly Lys Ser Trp
130 135 140
Glu Val Trp Tyr Gly Ser Thr Thr Gln Ala Gly Ala Glu Gln Arg Thr
145 150 155 160
Tyr Ser Phe Val Ser Glu Ser Pro Ile Asn Ser Tyr Ser Gly Asp Ile
165 170 175
Asn Ala Phe Phe Ser Tyr Leu Thr Gln Asn Gln Gly Phe Pro Ala Ser
180 185 190
Ser Gln Tyr Leu Ile Asn Leu Gln Phe Gly Thr Glu Ala Phe Thr Gly
195 200 205
Gly Pro Ala Thr Phe Thr Val Asp Asn Trp Thr Ala Ser Val Asn
210 215 220
<210> SEQ ID NO 35
<211> LENGTH: 184
<212> TYPE: PRT
<213> ORGANISM: Aspergillus niger
<400> SEQUENCE: 35
Ser Ala Gly Ile Asn Tyr Val Gln Asn Tyr Asn Gly Asn Leu Gly Asp
1 5 10 15
Phe Thr Tyr Asp Glu Ser Ala Gly Thr Phe Ser Met Tyr Trp Glu Asp
20 25 30
Gly Val Ser Ser Asp Phe Val Val Gly Leu Gly Trp Thr Thr Gly Ser
35 40 45
Ser Asn Ala Ile Thr Tyr Ser Ala Glu Tyr Ser Ala Ser Gly Ser Ala
50 55 60
Ser Tyr Leu Ala Val Tyr Gly Trp Val Asn Tyr Pro Gln Ala Glu Tyr
65 70 75 80
Tyr Ile Val Glu Asp Tyr Gly Asp Tyr Asn Pro Cys Ser Ser Ala Thr
85 90 95
Ser Leu Gly Thr Val Tyr Ser Asp Gly Ser Thr Tyr Gln Val Cys Thr
100 105 110
Asp Thr Arg Thr Asn Glu Pro Ser Ile Thr Gly Thr Ser Thr Phe Thr
115 120 125
Gln Tyr Phe Ser Val Arg Glu Ser Thr Arg Thr Ser Gly Thr Val Thr
130 135 140
Val Ala Asn His Phe Asn Phe Trp Ala His His Gly Phe Gly Asn Ser
145 150 155 160
Asp Phe Asn Tyr Gln Val Val Ala Val Glu Ala Trp Ser Gly Ala Gly
165 170 175
Ser Ala Ser Val Thr Ile Ser Ser
180
<210> SEQ ID NO 36
<211> LENGTH: 313
<212> TYPE: PRT
<213> ORGANISM: Streptomyces lividans
<400> SEQUENCE: 36
Ala Glu Ser Thr Leu Gly Ala Ala Ala Ala Gln Ser Gly Arg Tyr Phe
1 5 10 15
Gly Thr Ala Ile Ala Ser Gly Arg Leu Ser Asp Ser Thr Tyr Thr Ser
20 25 30
Ile Ala Gly Arg Glu Phe Asn Met Val Thr Ala Glu Asn Glu Met Lys
35 40 45
Ile Asp Ala Thr Glu Pro Gln Arg Gly Gln Phe Asn Phe Ser Ser Ala
50 55 60
Asp Arg Val Tyr Asn Trp Ala Val Gln Asn Gly Lys Gln Val Arg Gly
65 70 75 80
His Thr Leu Ala Trp His Ser Gln Gln Pro Gly Trp Met Gln Ser Leu
85 90 95
Ser Gly Ser Ala Leu Arg Gln Ala Met Ile Asp His Ile Asn Gly Val
100 105 110
Met Ala His Tyr Lys Gly Lys Ile Val Gln Trp Asp Val Val Asn Glu
115 120 125
Ala Phe Ala Asp Gly Ser Ser Gly Ala Arg Arg Asp Ser Asn Leu Gln
130 135 140
Arg Ser Gly Asn Asp Trp Ile Glu Val Ala Phe Arg Thr Ala Arg Ala
145 150 155 160
Ala Asp Pro Ser Ala Lys Leu Cys Tyr Asn Asp Tyr Asn Val Glu Asn
165 170 175
Trp Thr Trp Ala Lys Thr Gln Ala Met Tyr Asn Met Val Arg Asp Phe
180 185 190
Lys Gln Arg Gly Val Pro Ile Asp Cys Val Gly Phe Gln Ser His Phe
195 200 205
Asn Ser Gly Ser Pro Tyr Asn Ser Asn Phe Arg Thr Thr Leu Gln Asn
210 215 220
Phe Ala Ala Leu Gly Val Asp Val Ala Ile Thr Glu Leu Asp Ile Gln
225 230 235 240
Gly Ala Pro Ala Ser Thr Tyr Ala Asn Val Thr Asn Asp Cys Leu Ala
245 250 255
Val Ser Arg Cys Leu Gly Ile Thr Val Trp Gly Val Arg Asp Ser Asp
260 265 270
Ser Trp Arg Ser Glu Gln Thr Pro Leu Leu Phe Asn Asn Asp Gly Ser
275 280 285
Lys Lys Ala Ala Tyr Thr Ala Val Leu Asp Ala Leu Asn Gly Gly Ala
290 295 300
Ser Ser Glu Pro Pro Ala Asp Gly Gly
305 310
<210> SEQ ID NO 37
<211> LENGTH: 362
<212> TYPE: PRT
<213> ORGANISM: Aspergillus niger
<400> SEQUENCE: 37
Met His Ser Phe Ala Ser Leu Leu Ala Tyr Gly Leu Val Ala Gly Ala
1 5 10 15
Thr Phe Ala Ser Ala Ser Pro Ile Glu Ala Arg Asp Ser Cys Thr Phe
20 25 30
Thr Thr Ala Ala Ala Ala Lys Ala Gly Lys Ala Lys Cys Ser Thr Ile
35 40 45
Thr Leu Asn Asn Ile Glu Val Pro Ala Gly Thr Thr Leu Asp Leu Thr
50 55 60
Gly Leu Thr Ser Gly Thr Lys Val Ile Phe Glu Gly Thr Thr Thr Phe
65 70 75 80
Gln Tyr Glu Glu Trp Ala Gly Pro Leu Ile Ser Met Ser Gly Glu His
85 90 95
Ile Thr Val Thr Gly Ala Ser Gly His Leu Ile Asn Cys Asp Gly Ala
100 105 110
Arg Trp Trp Asp Gly Lys Gly Thr Ser Gly Lys Lys Lys Pro Lys Phe
115 120 125
Phe Tyr Ala His Gly Leu Asp Ser Ser Ser Ile Thr Gly Leu Asn Ile
130 135 140
Lys Asn Thr Pro Leu Met Ala Phe Ser Val Gln Ala Asn Asp Ile Thr
145 150 155 160
Phe Thr Asp Val Thr Ile Asn Asn Ala Asp Gly Asp Thr Gln Gly Gly
165 170 175
His Asn Thr Asp Ala Phe Asp Val Gly Asn Ser Val Gly Val Asn Ile
180 185 190
Ile Lys Pro Trp Val His Asn Gln Asp Asp Cys Leu Ala Val Asn Ser
195 200 205
Gly Glu Asn Ile Trp Phe Thr Gly Gly Thr Cys Ile Gly Gly His Gly
210 215 220
Leu Ser Ile Gly Ser Val Gly Asp Arg Ser Asn Asn Val Val Lys Asn
225 230 235 240
Val Thr Ile Glu His Ser Thr Val Ser Asn Ser Glu Asn Ala Val Arg
245 250 255
Ile Lys Thr Ile Ser Gly Ala Thr Gly Ser Val Ser Glu Ile Thr Tyr
260 265 270
Ser Asn Ile Val Met Ser Gly Ile Ser Asp Tyr Gly Val Val Ile Gln
275 280 285
Gln Asp Tyr Glu Asp Gly Lys Pro Thr Gly Lys Pro Thr Asn Gly Val
290 295 300
Thr Ile Gln Asp Val Lys Leu Glu Ser Val Thr Gly Ser Val Asp Ser
305 310 315 320
Gly Ala Thr Glu Ile Tyr Leu Leu Cys Gly Ser Gly Ser Cys Ser Asp
325 330 335
Trp Thr Trp Asp Asp Val Lys Val Thr Gly Gly Lys Lys Ser Thr Ala
340 345 350
Cys Lys Asn Phe Pro Ser Val Ala Ser Cys
355 360
<210> SEQ ID NO 38
<211> LENGTH: 383
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence = Synthetic Construct
<400> SEQUENCE: 38
Arg Ala Asp Val Lys Pro Val Thr Val Lys Leu Val Asp Ser Gln Ala
1 5 10 15
Thr Met Glu Thr Arg Ser Leu Phe Ala Phe Met Gln Glu Gln Arg Arg
20 25 30
His Ser Ile Met Phe Gly His Gln His Glu Thr Thr Gln Gly Leu Thr
35 40 45
Ile Thr Arg Thr Asp Gly Thr Gln Ser Asp Thr Phe Asn Ala Val Gly
50 55 60
Asp Phe Ala Ala Val Tyr Gly Trp Asp Thr Leu Ser Ile Val Ala Pro
65 70 75 80
Lys Ala Glu Gly Asp Ile Val Ala Gln Val Lys Lys Ala Tyr Ala Arg
85 90 95
Gly Gly Ile Ile Thr Val Ser Ser His Phe Asp Asn Pro Lys Thr Asp
100 105 110
Thr Gln Lys Gly Val Trp Pro Val Gly Thr Ser Trp Asp Gln Thr Pro
115 120 125
Ala Val Val Asp Ser Leu Pro Gly Gly Ala Tyr Asn Pro Val Leu Asn
130 135 140
Gly Tyr Leu Asp Gln Val Ala Glu Trp Ala Asn Asn Leu Lys Asp Glu
145 150 155 160
Gln Gly Arg Leu Ile Pro Val Ile Phe Arg Leu Tyr His Ala Asn Thr
165 170 175
Gly Ser Trp Phe Trp Trp Gly Asp Lys Gln Ser Thr Pro Glu Gln Tyr
180 185 190
Lys Gln Leu Phe Arg Tyr Ser Val Glu Tyr Leu Arg Asp Val Lys Gly
195 200 205
Val Arg Asn Phe Leu Tyr Ala Tyr Ser Pro Asn Asn Phe Trp Asp Val
210 215 220
Thr Glu Ala Asn Tyr Leu Glu Arg Tyr Pro Gly Asp Glu Trp Val Asp
225 230 235 240
Val Leu Gly Phe Asp Thr Tyr Gly Pro Val Ala Asp Asn Ala Asp Trp
245 250 255
Phe Arg Asn Val Val Ala Asn Ala Ala Leu Val Ala Arg Met Ala Glu
260 265 270
Ala Arg Gly Lys Ile Pro Val Ile Ser Glu Ile Gly Ile Arg Ala Pro
275 280 285
Asp Ile Glu Ala Gly Leu Tyr Asp Asn Gln Trp Tyr Arg Lys Leu Ile
290 295 300
Ser Gly Leu Lys Ala Asp Pro Asp Ala Arg Glu Ile Ala Phe Leu Leu
305 310 315 320
Val Trp Arg Asn Ala Pro Gln Gly Val Pro Gly Pro Asn Gly Thr Gln
325 330 335
Val Pro His Tyr Trp Val Pro Ala Asn Arg Pro Glu Asn Ile Asn Asn
340 345 350
Gly Thr Leu Glu Asp Phe Gln Ala Phe Tyr Ala Asp Glu Phe Thr Ala
355 360 365
Phe Asn Arg Asp Ile Glu Gln Val Tyr Gln Arg Pro Thr Leu Ile
370 375 380
<210> SEQ ID NO 39
<211> LENGTH: 419
<212> TYPE: PRT
<213> ORGANISM: Bacillus circulans
<400> SEQUENCE: 39
Leu Gln Pro Ala Thr Ala Glu Ala Ala Asp Ser Tyr Lys Ile Val Gly
1 5 10 15
Tyr Tyr Pro Ser Trp Ala Ala Tyr Gly Arg Asn Tyr Asn Val Ala Asp
20 25 30
Ile Asp Pro Thr Lys Val Thr His Ile Asn Tyr Ala Phe Ala Asp Ile
35 40 45
Cys Trp Asn Gly Ile His Gly Asn Pro Asp Pro Ser Gly Pro Asn Pro
50 55 60
Val Thr Trp Thr Cys Gln Asn Glu Lys Ser Gln Thr Ile Asn Val Pro
65 70 75 80
Asn Gly Thr Ile Val Leu Gly Asp Pro Trp Ile Asp Thr Gly Lys Thr
85 90 95
Phe Ala Gly Asp Thr Trp Asp Gln Pro Ile Ala Gly Asn Ile Asn Gln
100 105 110
Leu Asn Lys Leu Lys Gln Thr Asn Pro Asn Leu Lys Thr Ile Ile Ser
115 120 125
Val Gly Gly Trp Thr Trp Ser Asn Arg Phe Ser Asp Val Ala Ala Thr
130 135 140
Ala Ala Thr Arg Glu Val Phe Ala Asn Ser Ala Val Asp Phe Leu Arg
145 150 155 160
Lys Tyr Asn Phe Asp Gly Val Asp Leu Asp Trp Glu Tyr Pro Val Ser
165 170 175
Gly Gly Leu Asp Gly Asn Ser Lys Arg Pro Glu Asp Lys Gln Asn Tyr
180 185 190
Thr Leu Leu Leu Ser Lys Ile Arg Glu Lys Leu Asp Ala Ala Gly Ala
195 200 205
Val Asp Gly Lys Lys Tyr Leu Leu Thr Ile Ala Ser Gly Ala Ser Ala
210 215 220
Thr Tyr Ala Ala Asn Thr Glu Leu Ala Lys Ile Ala Ala Ile Val Asp
225 230 235 240
Trp Ile Asn Ile Met Thr Tyr Asp Phe Asn Gly Ala Trp Gln Lys Ile
245 250 255
Ser Ala His Asn Ala Pro Leu Asn Tyr Asp Pro Ala Ala Ser Ala Ala
260 265 270
Gly Val Pro Asp Ala Asn Thr Phe Asn Val Ala Ala Gly Ala Gln Gly
275 280 285
His Leu Asp Ala Gly Val Pro Ala Ala Lys Leu Val Leu Gly Val Pro
290 295 300
Phe Tyr Gly Arg Gly Trp Asp Gly Cys Ala Gln Ala Gly Asn Gly Gln
305 310 315 320
Tyr Gln Thr Cys Thr Gly Gly Ser Ser Val Gly Thr Trp Glu Ala Gly
325 330 335
Ser Phe Asp Phe Tyr Asp Leu Glu Ala Asn Tyr Ile Asn Lys Asn Gly
340 345 350
Tyr Thr Arg Tyr Trp Asn Asp Thr Ala Lys Val Pro Tyr Leu Tyr Asn
355 360 365
Ala Ser Asn Lys Arg Phe Ile Ser Tyr Asp Asp Ala Glu Ser Val Gly
370 375 380
Tyr Lys Thr Ala Tyr Ile Lys Ser Lys Gly Leu Gly Gly Ala Met Phe
385 390 395 400
Trp Glu Leu Ser Gly Asp Arg Asn Lys Thr Leu Gln Asn Lys Leu Lys
405 410 415
Ala Asp Leu
<210> SEQ ID NO 40
<211> LENGTH: 317
<212> TYPE: PRT
<213> ORGANISM: Candida antarctica
<400> SEQUENCE: 40
Leu Pro Ser Gly Ser Asp Pro Ala Phe Ser Gln Pro Lys Ser Val Leu
1 5 10 15
Asp Ala Gly Leu Thr Cys Gln Gly Ala Ser Pro Ser Ser Val Ser Lys
20 25 30
Pro Ile Leu Leu Val Pro Gly Thr Gly Thr Thr Gly Pro Gln Ser Phe
35 40 45
Asp Ser Asn Trp Ile Pro Leu Ser Thr Gln Leu Gly Tyr Thr Pro Cys
50 55 60
Trp Ile Ser Pro Pro Pro Phe Met Leu Asn Asp Thr Gln Val Asn Thr
65 70 75 80
Glu Tyr Met Val Asn Ala Ile Thr Ala Leu Tyr Ala Gly Ser Gly Asn
85 90 95
Asn Lys Leu Pro Val Leu Thr Trp Ser Gln Gly Gly Leu Val Ala Gln
100 105 110
Trp Gly Leu Thr Phe Phe Pro Ser Ile Arg Ser Lys Val Asp Arg Leu
115 120 125
Met Ala Phe Ala Pro Asp Tyr Lys Gly Thr Val Leu Ala Gly Pro Leu
130 135 140
Asp Ala Leu Ala Val Ser Ala Pro Ser Val Trp Gln Gln Thr Thr Gly
145 150 155 160
Ser Ala Leu Thr Thr Ala Leu Arg Asn Ala Gly Gly Leu Thr Gln Ile
165 170 175
Val Pro Thr Thr Asn Leu Tyr Ser Ala Thr Asp Glu Ile Val Gln Pro
180 185 190
Gln Val Ser Asn Ser Pro Leu Asp Ser Ser Tyr Leu Phe Asn Gly Lys
195 200 205
Asn Val Gln Ala Gln Ala Val Cys Gly Pro Leu Phe Val Ile Asp His
210 215 220
Ala Gly Ser Leu Thr Ser Gln Phe Ser Tyr Val Val Gly Arg Ser Ala
225 230 235 240
Leu Arg Ser Thr Thr Gly Gln Ala Arg Ser Ala Asp Tyr Gly Ile Thr
245 250 255
Asp Cys Asn Pro Leu Pro Ala Asn Asp Leu Thr Pro Glu Gln Lys Val
260 265 270
Ala Ala Ala Ala Leu Leu Ala Pro Ala Ala Ala Ala Ile Val Ala Gly
275 280 285
Pro Lys Gln Asn Cys Glu Pro Asp Leu Met Pro Tyr Ala Arg Pro Phe
290 295 300
Ala Val Gly Lys Arg Thr Cys Ser Gly Ile Val Thr Pro
305 310 315
<210> SEQ ID NO 41
<211> LENGTH: 434
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence = Chimera of pig and Homo sapiens
<400> SEQUENCE: 41
Ala Glu Val Cys Tyr Ser His Leu Gly Cys Phe Ser Asp Glu Lys Pro
1 5 10 15
Trp Ala Gly Thr Ser Gln Arg Pro Ile Lys Ser Leu Pro Ser Asp Pro
20 25 30
Lys Lys Ile Asn Thr Arg Phe Leu Leu Tyr Thr Asn Glu Asn Gln Asn
35 40 45
Ser Tyr Gln Leu Ile Thr Ala Thr Asp Ile Ala Thr Ile Lys Ala Ser
50 55 60
Asn Phe Asn Leu Asn Arg Lys Thr Arg Phe Ile Ile His Gly Phe Thr
65 70 75 80
Asp Ser Gly Glu Asn Ser Trp Leu Ser Asp Met Cys Lys Asn Met Phe
85 90 95
Gln Val Glu Lys Val Asn Cys Ile Cys Val Asp Trp Lys Gly Gly Ser
100 105 110
Lys Ala Gln Tyr Ser Gln Ala Ser Gln Asn Ile Arg Val Val Gly Ala
115 120 125
Glu Val Ala Tyr Leu Val Gln Val Leu Ser Thr Ser Leu Asn Tyr Ala
130 135 140
Pro Glu Asn Val His Ile Ile Gly His Ser Leu Gly Ala His Thr Ala
145 150 155 160
Gly Glu Ala Gly Lys Arg Leu Asn Gly Leu Val Gly Arg Ile Thr Gly
165 170 175
Leu Asp Pro Ala Glu Pro Tyr Phe Gln Asp Thr Pro Glu Glu Val Arg
180 185 190
Leu Asp Pro Ser Asp Ala Lys Phe Val Asp Val Ile His Thr Asp Ile
195 200 205
Ser Pro Ile Leu Pro Ser Leu Gly Phe Gly Met Ser Gln Lys Val Gly
210 215 220
His Met Asp Phe Phe Pro Asn Gly Gly Lys Asp Met Pro Gly Cys Lys
225 230 235 240
Thr Gly Ile Ser Cys Asn His His Arg Ser Ile Glu Tyr Tyr His Ser
245 250 255
Ser Ile Leu Asn Pro Glu Gly Phe Leu Gly Tyr Pro Cys Ala Ser Tyr
260 265 270
Asp Glu Phe Gln Glu Ser Gly Cys Phe Pro Cys Pro Ala Lys Gly Cys
275 280 285
Pro Lys Met Gly His Phe Ala Asp Gln Tyr Pro Gly Lys Thr Asn Ala
290 295 300
Val Glu Gln Thr Phe Phe Leu Asn Thr Gly Ala Ser Asp Asn Phe Thr
305 310 315 320
Arg Trp Arg Tyr Lys Val Thr Val Thr Leu Ser Gly Glu Lys Asp Pro
325 330 335
Ser Gly Asn Ile Asn Val Ala Leu Leu Gly Lys Asn Gly Asn Ser Ala
340 345 350
Gln Tyr Gln Val Phe Lys Gly Thr Leu Lys Pro Asp Ala Ser Tyr Thr
355 360 365
Asn Ser Ile Asp Val Glu Leu Asn Val Gly Thr Ile Gln Lys Val Thr
370 375 380
Phe Leu Trp Lys Arg Ser Gly Ile Ser Val Ser Lys Pro Lys Met Gly
385 390 395 400
Ala Ser Arg Ile Thr Val Gln Ser Gly Lys Asp Gly Thr Lys Tyr Asn
405 410 415
Phe Cys Ser Ser Asp Ile Val Gln Glu Asn Val Glu Gln Thr Leu Ser
420 425 430
Pro Cys
<210> SEQ ID NO 42
<211> LENGTH: 471
<212> TYPE: PRT
<213> ORGANISM: Escherichia coli
<400> SEQUENCE: 42
Met Lys Gln Ser Thr Ile Ala Leu Ala Leu Leu Pro Leu Leu Phe Thr
1 5 10 15
Pro Val Thr Lys Ala Arg Thr Pro Glu Met Pro Val Leu Glu Asn Arg
20 25 30
Ala Ala Gln Gly Asp Ile Thr Ala Pro Gly Gly Ala Arg Arg Leu Thr
35 40 45
Gly Asp Gln Thr Ala Ala Leu Arg Asp Ser Leu Ser Asp Lys Pro Ala
50 55 60
Lys Asn Ile Ile Leu Leu Ile Gly Asp Gly Met Gly Asp Ser Glu Ile
65 70 75 80
Thr Ala Ala Arg Asn Tyr Ala Glu Gly Ala Gly Gly Phe Phe Lys Gly
85 90 95
Ile Asp Ala Leu Pro Leu Thr Gly Gln Tyr Thr His Tyr Ala Leu Asn
100 105 110
Lys Lys Thr Gly Lys Pro Asp Tyr Val Thr Asp Ser Ala Ala Ser Ala
115 120 125
Thr Ala Trp Ser Thr Gly Val Lys Thr Tyr Asn Gly Ala Leu Gly Val
130 135 140
Asp Ile His Glu Lys Asp His Pro Thr Ile Leu Glu Met Ala Lys Ala
145 150 155 160
Ala Gly Leu Ala Thr Gly Asn Val Ser Thr Ala Glu Leu Gln Asp Ala
165 170 175
Thr Pro Ala Ala Leu Val Ala His Val Thr Ser Arg Lys Cys Tyr Gly
180 185 190
Pro Ser Ala Thr Ser Glu Lys Cys Pro Gly Asn Ala Leu Glu Lys Gly
195 200 205
Gly Lys Gly Ser Ile Thr Glu Gln Leu Leu Asn Ala Arg Ala Asp Val
210 215 220
Thr Leu Gly Gly Gly Ala Lys Thr Phe Ala Glu Thr Ala Thr Ala Gly
225 230 235 240
Glu Trp Gln Gly Lys Thr Leu Arg Glu Gln Ala Gln Ala Arg Gly Tyr
245 250 255
Gln Leu Val Ser Asp Ala Ala Ser Leu Asn Ser Val Thr Glu Ala Asn
260 265 270
Gln Gln Lys Pro Leu Leu Gly Leu Phe Ala Asp Gly Asn Met Pro Val
275 280 285
Arg Trp Leu Gly Pro Lys Ala Thr Tyr His Gly Asn Ile Asp Lys Pro
290 295 300
Ala Val Thr Cys Thr Pro Asn Pro Gln Arg Asn Asp Ser Val Pro Thr
305 310 315 320
Leu Ala Gln Met Thr Asp Lys Ala Ile Glu Leu Leu Ser Lys Asn Glu
325 330 335
Lys Gly Phe Phe Leu Gln Val Glu Gly Ala Ser Ile Asp Lys Gln Asp
340 345 350
His Ala Ala Asn Pro Cys Gly Gln Ile Gly Glu Thr Val Asp Leu Asp
355 360 365
Glu Ala Val Gln Arg Ala Leu Glu Phe Ala Lys Lys Glu Gly Asn Thr
370 375 380
Leu Val Ile Val Thr Ala Asp His Ala His Ala Ser Gln Ile Val Ala
385 390 395 400
Pro Asp Thr Lys Ala Pro Gly Leu Thr Gln Ala Leu Asn Thr Lys Asp
405 410 415
Gly Ala Val Met Val Met Ser Tyr Gly Asn Ser Glu Glu Asp Ser Gln
420 425 430
Glu His Thr Gly Ser Gln Leu Arg Ile Ala Ala Tyr Gly Pro His Ala
435 440 445
Ala Asn Val Val Gly Leu Thr Asp Gln Thr Asp Leu Phe Tyr Thr Met
450 455 460
Lys Ala Ala Leu Gly Leu Lys
465 470
<210> SEQ ID NO 43
<211> LENGTH: 260
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence = Synthetic Construct
<400> SEQUENCE: 43
Leu Lys Ile Ala Ala Phe Asn Ile Arg Thr Phe Gly Glu Thr Lys Met
1 5 10 15
Ser Asn Ala Thr Leu Ala Ser Tyr Ile Val Arg Ile Val Arg Arg Tyr
20 25 30
Asp Ile Val Leu Ile Gln Glu Val Arg Asp Ser His Leu Val Ala Val
35 40 45
Gly Lys Leu Leu Asp Tyr Leu Asn Gln Asp Asp Pro Asn Thr Tyr His
50 55 60
Tyr Val Val Ser Glu Pro Leu Gly Arg Asn Ser Tyr Lys Glu Arg Tyr
65 70 75 80
Leu Phe Leu Phe Arg Pro Asn Lys Val Ser Val Leu Asp Thr Tyr Gln
85 90 95
Tyr Asp Asp Gly Cys Glu Ser Cys Gly Asn Asp Ser Phe Ser Arg Glu
100 105 110
Pro Ala Val Val Lys Phe Ser Ser His Ser Thr Lys Val Lys Glu Phe
115 120 125
Ala Ile Val Ala Leu His Ser Ala Pro Ser Asp Ala Val Ala Glu Ile
130 135 140
Asn Ser Leu Tyr Asp Val Tyr Leu Asp Val Gln Gln Lys Trp His Leu
145 150 155 160
Asn Asp Val Met Leu Met Gly Asp Phe Asn Ala Asp Cys Ser Tyr Val
165 170 175
Thr Ser Ser Gln Trp Ser Ser Ile Arg Leu Arg Thr Ser Ser Thr Phe
180 185 190
Gln Trp Leu Ile Pro Asp Ser Ala Asp Thr Thr Ala Thr Ser Thr Asn
195 200 205
Cys Ala Tyr Asp Arg Ile Val Val Ala Gly Ser Leu Leu Gln Ser Ser
210 215 220
Val Val Pro Gly Ser Ala Ala Pro Phe Asp Phe Gln Ala Ala Tyr Gly
225 230 235 240
Leu Ser Asn Glu Met Ala Leu Ala Ile Ser Asp His Tyr Pro Val Glu
245 250 255
Val Thr Leu Thr
260
<210> SEQ ID NO 44
<211> LENGTH: 686
<212> TYPE: PRT
<213> ORGANISM: Bacillus circulans
<400> SEQUENCE: 44
Ala Pro Asp Thr Ser Val Ser Asn Lys Gln Asn Phe Ser Thr Asp Val
1 5 10 15
Ile Tyr Gln Ile Phe Thr Asp Arg Phe Ser Asp Gly Asn Pro Ala Asn
20 25 30
Asn Pro Thr Gly Ala Ala Phe Asp Gly Thr Cys Thr Asn Leu Arg Leu
35 40 45
Tyr Cys Gly Gly Asp Trp Gln Gly Ile Ile Asn Lys Ile Asn Asp Gly
50 55 60
Tyr Leu Thr Gly Met Gly Val Thr Ala Ile Trp Ile Ser Gln Pro Val
65 70 75 80
Glu Asn Ile Tyr Ser Ile Ile Asn Tyr Ser Gly Val Asn Asn Thr Ala
85 90 95
Tyr His Gly Tyr Trp Ala Arg Asp Phe Lys Lys Thr Asn Pro Ala Tyr
100 105 110
Gly Thr Ile Ala Asp Phe Gln Asn Leu Ile Ala Ala Ala His Ala Lys
115 120 125
Asn Ile Lys Val Ile Ile Asp Phe Ala Pro Asn His Thr Ser Pro Ala
130 135 140
Ser Ser Asp Gln Pro Ser Phe Ala Glu Asn Gly Arg Leu Tyr Asp Asn
145 150 155 160
Gly Thr Leu Leu Gly Gly Tyr Thr Asn Asp Thr Gln Asn Leu Phe His
165 170 175
His Asn Gly Gly Thr Asp Phe Ser Thr Thr Glu Asn Gly Ile Tyr Lys
180 185 190
Asn Leu Tyr Asp Leu Ala Asp Leu Asn His Asn Asn Ser Thr Val Asp
195 200 205
Val Tyr Leu Lys Asp Ala Ile Lys Met Trp Leu Asp Leu Gly Ile Asp
210 215 220
Gly Ile Arg Met Asp Ala Val Lys His Met Pro Phe Gly Trp Gln Lys
225 230 235 240
Ser Phe Met Ala Ala Val Asn Asn Tyr Lys Pro Val Phe Thr Phe Gly
245 250 255
Glu Trp Phe Leu Gly Val Asn Glu Val Ser Pro Glu Asn His Lys Phe
260 265 270
Ala Asn Glu Ser Gly Met Ser Leu Leu Asp Phe Arg Phe Ala Gln Lys
275 280 285
Val Arg Gln Val Phe Arg Asp Asn Thr Asp Asn Met Tyr Gly Leu Lys
290 295 300
Ala Met Leu Glu Gly Ser Ala Ala Asp Tyr Ala Gln Val Asp Asp Gln
305 310 315 320
Val Thr Phe Ile Asp Asn His Asp Met Glu Arg Phe His Ala Ser Asn
325 330 335
Ala Asn Arg Arg Lys Leu Glu Gln Ala Leu Ala Phe Thr Leu Thr Ser
340 345 350
Arg Gly Val Pro Ala Ile Tyr Tyr Gly Thr Glu Gln Tyr Met Ser Gly
355 360 365
Gly Thr Asp Pro Asp Asn Arg Ala Arg Ile Pro Ser Phe Ser Thr Ser
370 375 380
Thr Thr Ala Tyr Gln Val Ile Gln Lys Leu Ala Pro Leu Arg Lys Cys
385 390 395 400
Asn Pro Ala Ile Ala Tyr Gly Ser Thr Gln Glu Arg Trp Ile Asn Asn
405 410 415
Asp Val Leu Ile Tyr Glu Arg Lys Phe Gly Ser Asn Val Ala Val Val
420 425 430
Ala Val Asn Arg Asn Leu Asn Ala Pro Ala Ser Ile Ser Gly Leu Val
435 440 445
Thr Ser Leu Pro Gln Gly Ser Tyr Asn Asp Val Leu Gly Gly Leu Leu
450 455 460
Asn Gly Asn Thr Leu Ser Val Gly Ser Gly Gly Ala Ala Ser Asn Phe
465 470 475 480
Thr Leu Ala Ala Gly Gly Thr Ala Val Trp Gln Tyr Thr Ala Ala Thr
485 490 495
Ala Thr Pro Thr Ile Gly His Val Gly Pro Met Met Ala Lys Pro Gly
500 505 510
Val Thr Ile Thr Ile Asp Gly Arg Gly Phe Gly Ser Ser Lys Gly Thr
515 520 525
Val Tyr Phe Gly Thr Thr Ala Val Ser Gly Ala Asp Ile Thr Ser Trp
530 535 540
Glu Asp Thr Gln Ile Lys Val Lys Ile Pro Ala Val Ala Gly Gly Asn
545 550 555 560
Tyr Asn Ile Lys Val Ala Asn Ala Ala Gly Thr Ala Ser Asn Val Tyr
565 570 575
Asp Asn Phe Glu Val Leu Ser Gly Asp Gln Val Ser Val Arg Phe Val
580 585 590
Val Asn Asn Ala Thr Thr Ala Leu Gly Gln Asn Val Tyr Leu Thr Gly
595 600 605
Ser Val Ser Glu Leu Gly Asn Trp Asp Pro Ala Lys Ala Ile Gly Pro
610 615 620
Met Tyr Asn Gln Val Val Tyr Gln Tyr Pro Asn Trp Tyr Tyr Asp Val
625 630 635 640
Ser Val Pro Ala Gly Lys Thr Ile Glu Phe Lys Phe Leu Lys Lys Gln
645 650 655
Gly Ser Thr Val Thr Trp Glu Gly Gly Ser Asn His Thr Phe Thr Ala
660 665 670
Pro Ser Ser Gly Thr Ala Thr Ile Asn Val Asn Trp Gln Pro
675 680 685
<210> SEQ ID NO 45
<211> LENGTH: 404
<212> TYPE: PRT
<213> ORGANISM: Amycolatopsis orientalis
<400> SEQUENCE: 45
Met Arg Val Leu Ile Thr Gly Cys Gly Ser Arg Gly Asp Thr Glu Pro
1 5 10 15
Leu Val Ala Leu Ala Ala Arg Leu Arg Glu Leu Gly Ala Asp Ala Arg
20 25 30
Met Cys Leu Pro Pro Asp Tyr Val Glu Arg Cys Ala Glu Val Gly Val
35 40 45
Pro Met Val Pro Val Gly Arg Ala Val Arg Ala Gly Ala Arg Glu Pro
50 55 60
Gly Glu Leu Pro Pro Gly Ala Ala Glu Val Val Thr Glu Val Val Ala
65 70 75 80
Glu Trp Phe Asp Lys Val Pro Ala Ala Ile Glu Gly Cys Asp Ala Val
85 90 95
Val Thr Thr Gly Leu Leu Pro Ala Ala Val Ala Val Arg Ser Met Ala
100 105 110
Glu Lys Leu Gly Ile Pro Tyr Arg Tyr Thr Val Leu Ser Pro Asp His
115 120 125
Leu Pro Ser Glu Gln Ser Gln Ala Glu Arg Asp Met Tyr Asn Gln Gly
130 135 140
Ala Asp Arg Leu Phe Gly Asp Ala Val Asn Ser His Arg Ala Ser Ile
145 150 155 160
Gly Leu Pro Pro Val Glu His Leu Tyr Asp Tyr Gly Tyr Thr Asp Gln
165 170 175
Pro Trp Leu Ala Ala Asp Pro Val Leu Ser Pro Leu Arg Pro Thr Asp
180 185 190
Leu Gly Thr Val Gln Thr Gly Ala Trp Ile Leu Pro Asp Glu Arg Pro
195 200 205
Leu Ser Ala Glu Leu Glu Ala Phe Leu Ala Ala Gly Ser Thr Pro Val
210 215 220
Tyr Val Gly Phe Gly Ser Ser Ser Arg Pro Ala Thr Ala Asp Ala Ala
225 230 235 240
Lys Met Ala Ile Lys Ala Val Arg Ala Ser Gly Arg Arg Ile Val Leu
245 250 255
Ser Arg Gly Trp Ala Asp Leu Val Leu Pro Asp Asp Gly Ala Asp Cys
260 265 270
Phe Val Val Gly Glu Val Asn Leu Gln Glu Leu Phe Gly Arg Val Ala
275 280 285
Ala Ala Ile His His Asp Ser Ala Gly Thr Thr Leu Leu Ala Met Arg
290 295 300
Ala Gly Ile Pro Gln Ile Val Val Arg Arg Val Val Asp Asn Val Val
305 310 315 320
Glu Gln Ala Tyr His Ala Asp Arg Val Ala Glu Leu Gly Val Gly Val
325 330 335
Ala Val Asp Gly Pro Val Pro Thr Ile Asp Ser Leu Ser Ala Ala Leu
340 345 350
Asp Thr Ala Leu Ala Pro Glu Ile Arg Ala Arg Ala Thr Thr Val Ala
355 360 365
Asp Thr Ile Arg Ala Asp Gly Thr Thr Val Ala Ala Gln Leu Leu Phe
370 375 380
Asp Ala Val Ser Leu Glu Lys Pro Thr Val Pro Ala Leu Glu His His
385 390 395 400
His His His His
<210> SEQ ID NO 46
<211> LENGTH: 292
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence = Synthetic Construct
<400> SEQUENCE: 46
Ser Ile Glu Arg Leu Gly Tyr Leu Gly Phe Ala Val Lys Asp Val Pro
1 5 10 15
Ala Trp Asp His Phe Leu Thr Lys Ser Val Gly Leu Met Ala Ala Gly
20 25 30
Ser Ala Gly Asp Ala Ala Leu Tyr Arg Ala Asp Gln Arg Ala Trp Arg
35 40 45
Ile Ala Val Gln Pro Gly Glu Leu Asp Asp Leu Ala Tyr Ala Gly Leu
50 55 60
Glu Val Asp Asp Ala Ala Ala Leu Glu Arg Met Ala Asp Lys Leu Arg
65 70 75 80
Gln Ala Gly Val Ala Phe Thr Arg Gly Asp Glu Ala Leu Met Gln Gln
85 90 95
Arg Lys Val Met Gly Leu Leu Cys Leu Gln Asp Pro Phe Gly Leu Pro
100 105 110
Leu Glu Ile Tyr Tyr Gly Pro Ala Glu Ile Phe His Glu Pro Phe Leu
115 120 125
Pro Ser Ala Pro Val Ser Gly Phe Val Thr Gly Asp Gln Gly Ile Gly
130 135 140
His Phe Val Arg Cys Val Pro Asp Thr Ala Lys Ala Met Ala Phe Tyr
145 150 155 160
Thr Glu Val Leu Gly Phe Val Leu Ser Asp Ile Ile Asp Ile Gln Met
165 170 175
Gly Pro Glu Thr Ser Val Pro Ala His Phe Leu His Cys Asn Gly Arg
180 185 190
His His Thr Ile Ala Leu Ala Ala Phe Pro Ile Pro Lys Arg Ile His
195 200 205
His Phe Met Leu Gln Ala Asn Thr Ile Asp Asp Val Gly Tyr Ala Phe
210 215 220
Asp Arg Leu Asp Ala Ala Gly Arg Ile Thr Ser Leu Leu Gly Arg His
225 230 235 240
Thr Asn Asp Gln Thr Leu Ser Phe Tyr Ala Asp Thr Pro Ser Pro Met
245 250 255
Ile Glu Val Glu Phe Gly Trp Gly Pro Arg Thr Val Asp Ser Ser Trp
260 265 270
Thr Val Ala Arg His Ser Arg Thr Ala Met Trp Gly His Lys Ser Val
275 280 285
Arg Gly Gln Arg
290
<210> SEQ ID NO 47
<211> LENGTH: 311
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence = Synthetic Construct
<400> SEQUENCE: 47
Met Glu Val Lys Ile Phe Asn Thr Gln Asp Val Gln Asp Phe Leu Arg
1 5 10 15
Val Ala Ser Gly Leu Glu Gln Glu Gly Gly Asn Pro Arg Val Lys Gln
20 25 30
Ile Ile His Arg Val Leu Ser Asp Leu Tyr Lys Ala Ile Glu Asp Leu
35 40 45
Asn Ile Thr Ser Asp Glu Tyr Trp Ala Gly Val Ala Tyr Leu Asn Gln
50 55 60
Leu Gly Ala Asn Gln Glu Ala Gly Leu Leu Ser Pro Gly Leu Gly Phe
65 70 75 80
Asp His Tyr Leu Asp Met Arg Met Asp Ala Glu Asp Ala Ala Leu Gly
85 90 95
Ile Glu Asn Ala Thr Pro Arg Thr Ile Glu Gly Pro Leu Tyr Val Ala
100 105 110
Gly Ala Pro Glu Ser Val Gly Tyr Ala Arg Met Asp Asp Gly Ser Asp
115 120 125
Pro Asn Gly His Thr Leu Ile Leu His Gly Thr Ile Phe Asp Ala Asp
130 135 140
Gly Lys Pro Leu Pro Asn Ala Lys Val Glu Ile Trp His Ala Asn Thr
145 150 155 160
Lys Gly Phe Tyr Ser His Phe Asp Pro Thr Gly Glu Gln Gln Ala Phe
165 170 175
Asn Met Arg Arg Ser Ile Ile Thr Asp Glu Asn Gly Gln Tyr Arg Val
180 185 190
Arg Thr Ile Leu Pro Ala Gly Tyr Gly Cys Pro Pro Glu Gly Pro Thr
195 200 205
Gln Gln Leu Leu Asn Gln Leu Gly Arg His Gly Asn Arg Pro Ala His
210 215 220
Ile His Tyr Phe Val Ser Ala Asp Gly His Arg Lys Leu Thr Thr Gln
225 230 235 240
Ile Asn Val Ala Gly Asp Pro Tyr Thr Tyr Asp Asp Phe Ala Tyr Ala
245 250 255
Thr Arg Glu Gly Leu Val Val Asp Ala Val Glu His Thr Asp Pro Glu
260 265 270
Ala Ile Lys Ala Asn Asp Val Glu Gly Pro Phe Ala Glu Met Val Phe
275 280 285
Asp Leu Lys Leu Thr Arg Leu Val Asp Gly Val Asp Asn Gln Val Val
290 295 300
Asp Arg Pro Arg Leu Ala Val
305 310
<210> SEQ ID NO 48
<211> LENGTH: 414
<212> TYPE: PRT
<213> ORGANISM: Pseudomonas putida
<400> SEQUENCE: 48
Thr Thr Glu Thr Ile Gln Ser Asn Ala Asn Leu Ala Pro Leu Pro Pro
1 5 10 15
His Val Pro Glu His Leu Val Phe Asp Phe Asp Met Tyr Asn Pro Ser
20 25 30
Asn Leu Ser Ala Gly Val Gln Glu Ala Trp Ala Val Leu Gln Glu Ser
35 40 45
Asn Val Pro Asp Leu Val Trp Thr Arg Cys Asn Gly Gly His Trp Ile
50 55 60
Ala Thr Arg Gly Gln Leu Ile Arg Glu Ala Tyr Glu Asp Tyr Arg His
65 70 75 80
Phe Ser Ser Glu Cys Pro Phe Ile Pro Arg Glu Ala Gly Glu Ala Tyr
85 90 95
Asp Phe Ile Pro Thr Ser Met Asp Pro Pro Glu Gln Arg Gln Phe Arg
100 105 110
Ala Leu Ala Asn Gln Val Val Gly Met Pro Val Val Asp Lys Leu Glu
115 120 125
Asn Arg Ile Gln Glu Leu Ala Cys Ser Leu Ile Glu Ser Leu Arg Pro
130 135 140
Gln Gly Gln Cys Asn Phe Thr Glu Asp Tyr Ala Glu Pro Phe Pro Ile
145 150 155 160
Arg Ile Phe Met Leu Leu Ala Gly Leu Pro Glu Glu Asp Ile Pro His
165 170 175
Leu Lys Tyr Leu Thr Asp Gln Met Thr Arg Pro Asp Gly Ser Met Thr
180 185 190
Phe Ala Glu Ala Lys Glu Ala Leu Tyr Asp Tyr Leu Ile Pro Ile Ile
195 200 205
Glu Gln Arg Arg Gln Lys Pro Gly Thr Asp Ala Ile Ser Ile Val Ala
210 215 220
Asn Gly Gln Val Asn Gly Arg Pro Ile Thr Ser Asp Glu Ala Lys Arg
225 230 235 240
Met Cys Gly Leu Leu Leu Val Gly Gly Leu Asp Thr Val Val Asn Phe
245 250 255
Leu Ser Phe Ser Met Glu Phe Leu Ala Lys Ser Pro Glu His Arg Gln
260 265 270
Glu Leu Ile Gln Arg Pro Glu Arg Ile Pro Ala Ala Cys Glu Glu Leu
275 280 285
Leu Arg Arg Phe Ser Leu Val Ala Asp Gly Arg Ile Leu Thr Ser Asp
290 295 300
Tyr Glu Phe His Gly Val Gln Leu Lys Lys Gly Asp Gln Ile Leu Leu
305 310 315 320
Pro Gln Met Leu Ser Gly Leu Asp Glu Arg Glu Asn Ala Cys Pro Met
325 330 335
His Val Asp Phe Ser Arg Gln Lys Val Ser His Thr Thr Phe Gly His
340 345 350
Gly Ser His Leu Cys Leu Gly Gln His Leu Ala Arg Arg Glu Ile Ile
355 360 365
Val Thr Leu Lys Glu Trp Leu Thr Arg Ile Pro Asp Phe Ser Ile Ala
370 375 380
Pro Gly Ala Gln Ile Gln His Lys Ser Gly Ile Val Ser Gly Val Gln
385 390 395 400
Ala Leu Pro Leu Val Trp Asp Pro Ala Thr Thr Lys Ala Val
405 410
<210> SEQ ID NO 49
<211> LENGTH: 374
<212> TYPE: PRT
<213> ORGANISM: Equus caballus
<400> SEQUENCE: 49
Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp Glu
1 5 10 15
Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro Lys
20 25 30
Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg Ser
35 40 45
Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val Ile
50 55 60
Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly Val
65 70 75 80
Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro Gln
85 90 95
Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys Leu
100 105 110
Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr Ser
115 120 125
Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr Ser
130 135 140
Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys Ile
145 150 155 160
Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly Phe
165 170 175
Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln Gly
180 185 190
Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val Ile
195 200 205
Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp Ile
210 215 220
Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu Cys
225 230 235 240
Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr Glu
245 250 255
Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg Leu
260 265 270
Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly Val
275 280 285
Ser Val Ile Val Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met Asn
290 295 300
Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe Gly
305 310 315 320
Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe Met
325 330 335
Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro Phe
340 345 350
Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser Ile
355 360 365
Arg Thr Ile Leu Thr Phe
370
<210> SEQ ID NO 50
<211> LENGTH: 297
<212> TYPE: PRT
<213> ORGANISM: Escherichia coli
<400> SEQUENCE: 50
Met Ala Thr Asn Leu Arg Gly Val Met Ala Ala Leu Leu Thr Pro Phe
1 5 10 15
Asp Gln Gln Gln Ala Leu Asp Lys Ala Ser Leu Arg Arg Leu Val Gln
20 25 30
Phe Asn Ile Gln Gln Gly Ile Asp Gly Leu Tyr Val Gly Gly Ser Thr
35 40 45
Gly Glu Ala Phe Val Gln Ser Leu Ser Glu Arg Glu Gln Val Leu Glu
50 55 60
Ile Val Ala Glu Glu Gly Lys Gly Lys Ile Lys Leu Ile Ala His Val
65 70 75 80
Gly Cys Val Thr Thr Ala Glu Ser Gln Gln Leu Ala Ala Ser Ala Lys
85 90 95
Arg Tyr Gly Phe Asp Ala Val Ser Ala Val Thr Pro Phe Tyr Tyr Pro
100 105 110
Phe Ser Phe Glu Glu His Cys Asp His Tyr Arg Ala Ile Ile Asp Ser
115 120 125
Ala Asp Gly Leu Pro Met Val Val Tyr Asn Ile Pro Ala Leu Ser Gly
130 135 140
Val Lys Leu Thr Leu Asp Gln Ile Asn Thr Leu Val Thr Leu Pro Gly
145 150 155 160
Val Gly Ala Leu Lys Gln Thr Ser Gly Asp Leu Tyr Gln Met Glu Gln
165 170 175
Ile Arg Arg Glu His Pro Asp Leu Val Leu Tyr Asn Gly Tyr Asp Glu
180 185 190
Ile Phe Ala Ser Gly Leu Leu Ala Gly Ala Asp Gly Gly Ile Gly Ser
195 200 205
Thr Tyr Asn Ile Met Gly Trp Arg Tyr Gln Gly Ile Val Lys Ala Leu
210 215 220
Lys Glu Gly Asp Ile Gln Thr Ala Gln Lys Leu Gln Thr Glu Cys Asn
225 230 235 240
Lys Val Ile Asp Leu Leu Ile Lys Thr Gly Val Phe Arg Gly Leu Lys
245 250 255
Thr Val Leu His Tyr Met Asp Val Val Ser Val Pro Leu Cys Arg Lys
260 265 270
Pro Phe Gly Pro Val Asp Glu Lys Tyr Leu Pro Glu Leu Lys Ala Leu
275 280 285
Ala Gln Gln Leu Met Gln Glu Arg Gly
290 295
<210> SEQ ID NO 51
<211> LENGTH: 268
<212> TYPE: PRT
<213> ORGANISM: Salmonella typhimurium
<400> SEQUENCE: 51
Met Glu Arg Tyr Glu Asn Leu Phe Ala Gln Leu Asn Asp Arg Arg Glu
1 5 10 15
Gly Ala Phe Val Pro Phe Val Thr Leu Gly Asp Pro Gly Ile Glu Gln
20 25 30
Ser Leu Lys Ile Ile Asp Thr Leu Ile Asp Ala Gly Ala Asp Ala Leu
35 40 45
Glu Leu Gly Val Pro Phe Ser Asp Pro Leu Ala Asp Gly Pro Thr Ile
50 55 60
Gln Asn Ala Asn Leu Arg Ala Phe Ala Ala Gly Val Thr Pro Ala Gln
65 70 75 80
Cys Phe Glu Met Leu Ala Leu Ile Arg Glu Lys His Pro Thr Ile Pro
85 90 95
Ile Gly Leu Leu Met Tyr Ala Asn Leu Val Phe Asn Asn Gly Ile Asp
100 105 110
Ala Phe Tyr Ala Arg Cys Glu Gln Val Gly Val Asp Ser Val Leu Val
115 120 125
Ala Asp Val Pro Val Glu Glu Ser Ala Pro Phe Arg Gln Ala Ala Leu
130 135 140
Arg His Asn Ile Ala Pro Ile Phe Ile Cys Pro Pro Asn Ala Asp Asp
145 150 155 160
Asp Leu Leu Arg Gln Val Ala Ser Tyr Gly Arg Gly Tyr Thr Tyr Leu
165 170 175
Leu Ser Arg Ser Gly Val Thr Gly Ala Glu Asn Arg Gly Ala Leu Pro
180 185 190
Leu His His Leu Ile Glu Lys Leu Lys Glu Tyr His Ala Ala Pro Ala
195 200 205
Leu Gln Gly Phe Gly Ile Ser Ser Pro Glu Gln Val Ser Ala Ala Val
210 215 220
Arg Ala Gly Ala Ala Gly Ala Ile Ser Gly Ser Ala Ile Val Lys Ile
225 230 235 240
Ile Glu Lys Asn Leu Ala Ser Pro Lys Gln Met Leu Ala Glu Leu Arg
245 250 255
Ser Phe Val Ser Ala Met Lys Ala Ala Ser Arg Ala
260 265
<210> SEQ ID NO 52
<211> LENGTH: 393
<212> TYPE: PRT
<213> ORGANISM: Actinoplanes missouriensis
<400> SEQUENCE: 52
Ser Val Gln Ala Thr Arg Glu Asp Lys Phe Ser Phe Gly Leu Trp Thr
1 5 10 15
Val Gly Trp Gln Ala Arg Asp Ala Phe Gly Asp Ala Thr Arg Thr Ala
20 25 30
Leu Asp Pro Val Glu Ala Val His Lys Leu Ala Glu Ile Gly Ala Tyr
35 40 45
Gly Ile Thr Phe His Asp Asp Asp Leu Val Pro Phe Gly Ser Asp Ala
50 55 60
Gln Thr Arg Asp Gly Ile Ile Ala Gly Phe Lys Lys Ala Leu Asp Glu
65 70 75 80
Thr Gly Leu Ile Val Pro Met Val Thr Thr Asn Leu Phe Thr His Pro
85 90 95
Val Phe Lys Asp Gly Gly Phe Thr Ser Asn Asp Arg Ser Val Arg Arg
100 105 110
Tyr Ala Ile Arg Lys Val Leu Arg Gln Met Asp Leu Gly Ala Glu Leu
115 120 125
Gly Ala Lys Thr Leu Val Leu Trp Gly Gly Arg Glu Gly Ala Glu Tyr
130 135 140
Asp Ser Ala Lys Asp Val Ser Ala Ala Leu Asp Arg Tyr Arg Glu Ala
145 150 155 160
Leu Asn Leu Leu Ala Gln Tyr Ser Glu Asp Arg Gly Tyr Gly Leu Arg
165 170 175
Phe Ala Ile Glu Pro Lys Pro Asn Glu Pro Arg Gly Asp Ile Leu Leu
180 185 190
Pro Thr Ala Gly His Ala Ile Ala Phe Val Gln Glu Leu Glu Arg Pro
195 200 205
Glu Leu Phe Gly Ile Asn Pro Glu Thr Gly Asn Glu Gln Met Ser Asn
210 215 220
Leu Asn Phe Thr Gln Gly Ile Ala Gln Ala Leu Trp His Lys Lys Leu
225 230 235 240
Phe His Ile Asp Leu Asn Gly Gln His Gly Pro Lys Phe Asp Gln Asp
245 250 255
Leu Val Phe Gly His Gly Asp Leu Leu Asn Ala Phe Ser Leu Val Asp
260 265 270
Leu Leu Glu Asn Gly Pro Asp Gly Ala Pro Ala Tyr Asp Gly Pro Arg
275 280 285
His Phe Asp Tyr Lys Pro Ser Arg Thr Glu Asp Tyr Asp Gly Val Trp
290 295 300
Glu Ser Ala Lys Ala Asn Ile Arg Met Tyr Leu Leu Leu Lys Glu Arg
305 310 315 320
Ala Lys Ala Phe Arg Ala Asp Pro Glu Val Gln Glu Ala Leu Ala Ala
325 330 335
Ser Lys Val Ala Glu Leu Lys Thr Pro Thr Leu Asn Pro Gly Glu Gly
340 345 350
Tyr Ala Glu Leu Leu Ala Asp Arg Ser Ala Phe Glu Asp Tyr Asp Ala
355 360 365
Asp Ala Val Gly Ala Lys Gly Phe Gly Phe Val Lys Leu Asn Gln Leu
370 375 380
Ala Ile Glu His Leu Leu Gly Ala Arg
385 390
<210> SEQ ID NO 53
<211> LENGTH: 348
<212> TYPE: PRT
<213> ORGANISM: Bacteriophage T7
<400> SEQUENCE: 53
Val Asn Ile Lys Thr Asn Pro Phe Lys Ala Val Ser Phe Val Glu Ser
1 5 10 15
Ala Ile Lys Lys Ala Leu Asp Asn Ala Gly Tyr Leu Ile Ala Glu Ile
20 25 30
Lys Tyr Asp Gly Val Arg Gly Asn Ile Cys Val Asp Asn Thr Ala Asn
35 40 45
Ser Tyr Trp Leu Ser Arg Val Ser Lys Thr Ile Pro Ala Leu Glu His
50 55 60
Leu Asn Gly Phe Asp Val Arg Trp Lys Arg Leu Leu Asn Asp Asp Arg
65 70 75 80
Cys Phe Tyr Lys Asp Gly Phe Met Leu Asp Gly Glu Leu Met Val Lys
85 90 95
Gly Val Asp Phe Asn Thr Gly Ser Gly Leu Leu Arg Thr Lys Trp Thr
100 105 110
Asp Thr Lys Asn Gln Glu Phe His Glu Glu Leu Phe Val Glu Pro Ile
115 120 125
Arg Lys Lys Asp Lys Val Pro Phe Lys Leu His Thr Gly His Leu His
130 135 140
Ile Lys Leu Tyr Ala Ile Leu Pro Leu His Ile Val Glu Ser Gly Glu
145 150 155 160
Asp Cys Asp Val Met Thr Leu Leu Met Gln Glu His Val Lys Asn Met
165 170 175
Leu Pro Leu Leu Gln Glu Tyr Phe Pro Glu Ile Glu Trp Gln Ala Ala
180 185 190
Glu Ser Tyr Glu Val Tyr Asp Met Val Glu Leu Gln Gln Leu Tyr Glu
195 200 205
Gln Lys Arg Ala Glu Gly His Glu Gly Leu Ile Val Lys Asp Pro Met
210 215 220
Cys Ile Tyr Lys Arg Gly Lys Lys Ser Gly Trp Trp Lys Met Lys Pro
225 230 235 240
Glu Asn Glu Ala Asp Gly Ile Ile Gln Gly Leu Val Trp Gly Thr Lys
245 250 255
Gly Leu Ala Asn Glu Gly Lys Val Ile Gly Phe Glu Val Leu Leu Glu
260 265 270
Ser Gly Arg Leu Val Asn Ala Thr Asn Ile Ser Arg Ala Leu Met Asp
275 280 285
Glu Phe Thr Glu Thr Val Lys Glu Ala Thr Leu Ser Gln Trp Gly Phe
290 295 300
Phe Ser Pro Tyr Gly Ile Gly Asp Asn Asp Ala Cys Thr Ile Asn Pro
305 310 315 320
Tyr Asp Gly Trp Ala Cys Gln Ile Ser Tyr Met Glu Glu Thr Pro Asp
325 330 335
Gly Ser Leu Arg His Pro Ser Phe Val Met Phe Arg
340 345
<210> SEQ ID NO 54
<211> LENGTH: 42
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (2)..(40)
<400> SEQUENCE: 54
g gtg gta tca gca ggc cac tgc tac aag tcc cgc atc cag gt 42
Val Val Ser Ala Gly His Cys Tyr Lys Ser Arg Ile Gln
1 5 10
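The CDS feature of SEQ ID NO 54 (positions 2 to 40) can be checked against the translation printed beneath it. The following is a minimal sketch, assuming the standard genetic code; the helper name translate_cds and the one-letter output are illustrative choices and are not part of the listing.

```python
# Standard genetic code, codons ordered with bases T, C, A, G (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(seq: str, start: int, end: int) -> str:
    """Translate the 1-based, inclusive range start..end of a DNA string."""
    dna = "".join(seq.split()).upper()  # drop the spacing used in the listing
    cds = dna[start - 1:end]
    usable = len(cds) - len(cds) % 3    # ignore any trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


# SEQ ID NO 54 as printed, with CDS location (2)..(40).
seq54 = "g gtg gta tca gca ggc cac tgc tac aag tcc cgc atc cag gt"
peptide = translate_cds(seq54, 2, 40)
print(peptide)
```

In one-letter code the listed translation (Val Val Ser Ala Gly His Cys Tyr Lys Ser Arg Ile Gln) reads VVSAGHCYKSRIQ, which is what the sketch is expected to print.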
<210> SEQ ID NO 55
<211> LENGTH: 13
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 55
Val Val Ser Ala Gly His Cys Tyr Lys Ser Arg Ile Gln
1 5 10
<210> SEQ ID NO 56
<211> LENGTH: 42
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 56
ggtggtatcc gcgggccact gctacaagtc ccggatccag gt 42
<210> SEQ ID NO 57
<211> LENGTH: 42
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 57
acctggatcc gggacttgta gcagtggccc gcggatacca cc 42
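SEQ ID NO 56 and SEQ ID NO 57, as printed, appear to be reverse complements of one another, as would be expected for a pair of complementary oligonucleotides. The listing does not state this relationship, so the following is only a minimal sketch for checking it; the function name reverse_complement is illustrative.

```python
# Complement map for unambiguous DNA bases (lower and upper case).
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, ignoring the listing's block spacing."""
    cleaned = "".join(seq.split())
    return cleaned.translate(COMPLEMENT)[::-1]


# Sequences copied from the entries for SEQ ID NO 56 and SEQ ID NO 57.
seq56 = "ggtggtatcc gcgggccact gctacaagtc ccggatccag gt"
seq57 = "acctggatcc gggacttgta gcagtggccc gcggatacca cc"

is_pair = reverse_complement(seq56) == "".join(seq57.split())
print(is_pair)
```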
<210> SEQ ID NO 58
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (3)..(50)
<400> SEQUENCE: 58
cc act ggc acg aag tgc ctc atc tct ggc tgg ggc aac act gcg agc 47
Thr Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly Asn Thr Ala Ser
1 5 10 15
tct 50
Ser
<210> SEQ ID NO 59
<211> LENGTH: 16
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 59
Thr Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly Asn Thr Ala Ser Ser
1 5 10 15
<210> SEQ ID NO 60
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Description of Artificial Sequence: note = Synthetic
Construct
<220> FEATURE:
<223> OTHER INFORMATION: forward primer restr3
<400> SEQUENCE: 60
ccactggcac gaagtgcctc atctctggct ggggcaacac tgcgagctct 50
<210> SEQ ID NO 61
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 61
agagctagca gtgttgcccc agccagagat gaggcacttg gtaccagtgg 50
<210> SEQ ID NO 62
<211> LENGTH: 30
<212> TYPE: DNA
<213> ORGANISM: Description of Artificial Sequence: note = Synthetic
Construct
<220> FEATURE:
<223> OTHER INFORMATION: primer puc-forward
<400> SEQUENCE: 62
ggggtacccc accaccatga atccactcct 30
<210> SEQ ID NO 63
<211> LENGTH: 30
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 63
cgggatccgg tatagagact gaagagatac 30
<210> SEQ ID NO 64
<211> LENGTH: 39
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: 1 - 39
<223> OTHER INFORMATION: any nucleotide
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (2)..(37)
<400> SEQUENCE: 64
g ggc cac tgc tac nnn nnn nnn nnn nnn nnn aag tcc cg 39
Gly His Cys Tyr Xaa Xaa Xaa Xaa Xaa Xaa Lys Ser
1 5 10
<210> SEQ ID NO 65
<211> LENGTH: 12
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<221> NAME/KEY: Variant
<222> LOCATION: 5-10
<223> OTHER INFORMATION: Xaa = any amino acid
<400> SEQUENCE: 65
Gly His Cys Tyr Xaa Xaa Xaa Xaa Xaa Xaa Lys Ser
1 5 10
<210> SEQ ID NO 66
<211> LENGTH: 45
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: 1 - 45
<223> OTHER INFORMATION: N=A, C, G, T
<400> SEQUENCE: 66
cgcccggtga cgatgnnnnn nnnnnnnnnn nnnttcaggg cctag 45
<210> SEQ ID NO 67
<211> LENGTH: 47
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: 2 - 46
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: 1 - 47
<223> OTHER INFORMATION: any nucleotide
<400> SEQUENCE: 67
c aag tgc ctc atc tct ggc tgg ggc aac nnn nnn nnn nnn nnn act g 47
Lys Cys Leu Ile Ser Gly Trp Gly Asn Xaa Xaa Xaa Xaa Xaa Thr
1 5 10 15
<210> SEQ ID NO 68
<211> LENGTH: 15
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<220> FEATURE:
<221> NAME/KEY: Variant
<222> LOCATION: 1 - 15
<223> OTHER INFORMATION: Xaa = any amino acid
<400> SEQUENCE: 68
Lys Cys Leu Ile Ser Gly Trp Gly Asn Xaa Xaa Xaa Xaa Xaa Thr
1 5 10 15
<210> SEQ ID NO 69
<211> LENGTH: 55
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence = Synthetic Construct
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: note =
Synthetic Construct
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: 1 - 55
<223> OTHER INFORMATION: N=A, C, G, T
<400> SEQUENCE: 69
catggttcac ggagtagaga ccgaccccgt tgnnnnnnnn nnnnnnntga cgatc 55
<210> SEQ ID NO 70
<211> LENGTH: 59
<212> TYPE: DNA
<213> ORGANISM: Description of Artificial Sequence: note = Synthetic
Construct
<220> FEATURE:
<223> OTHER INFORMATION: primer SDR1-mutnnb-forward
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: 1 - 59
<223> OTHER INFORMATION: N=A, C, G, T; B=C, G, T
<400> SEQUENCE: 70
tggtatccgc gggccactgc tacnnbnnbn nbnnbnnbnn baagtcccgg atccaggtg 59
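The primer of SEQ ID NO 70 contains six consecutive degenerate codons of the form nnb, with N = A, C, G, T and B = C, G, T as stated in its feature table. The following minimal sketch, assuming the standard genetic code, enumerates what a single nnb codon can encode; the names IUPAC, expand_codon and CODON_TABLE are illustrative and do not come from the listing.

```python
from itertools import product

# Only the ambiguity codes used in SEQ ID NO 70 plus the plain bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "B": "CGT"}

# Standard genetic code, codons ordered with bases T, C, A, G (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand_codon(degenerate: str) -> list[str]:
    """List every concrete codon matched by a three-letter degenerate codon."""
    choices = [IUPAC[base] for base in degenerate.upper()]
    return ["".join(combo) for combo in product(*choices)]


codons = expand_codon("nnb")
stop_codons = [c for c in codons if CODON_TABLE[c] == "*"]
amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}

print(len(codons), len(amino_acids), stop_codons)
```

Under these assumptions one nnb codon expands to 48 codons that cover all 20 amino acids and include a single stop codon (TAG), so six such codons give 48 to the power of 6 possible nucleotide combinations.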
<210> SEQ ID NO 71
<211> LENGTH: 52
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: 1 - 52
<223> OTHER INFORMATION: N=A, C, G, T; V=A, C, G
<400> SEQUENCE: 71
ggcgccagag ctagcagtvn nvnnvnnvnn vnngttgccc cagccagaga tg 52
<210> SEQ ID NO 72
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 72
Ala Phe Phe Asn Gly Asp
1 5
<210> SEQ ID NO 73
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 73
Arg Lys Asp Pro Trp
1 5
<210> SEQ ID NO 74
<211> LENGTH: 234
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 74
Ile Val Gly Gly Tyr Asn Cys Glu Glu Asn Ser Val Pro Tyr Gln Val
1 5 10 15
Ser Leu Asn Ser Gly Tyr His Phe Cys Gly Gly Ser Leu Ile Asn Glu
20 25 30
Gln Trp Val Val Ser Ala Gly His Cys Tyr Ala Ala Phe Asn Gly Lys
35 40 45
Ser Arg Ile Gln Val Arg Leu Gly Glu His Asn Ile Glu Val Leu Glu
50 55 60
Gly Asn Glu Gln Phe Ile Asn Ala Ala Lys Ile Ile Arg His Pro Gln
65 70 75 80
Tyr Asp Arg Lys Thr Leu Asn Asn Asp Ile Met Leu Ile Lys Leu Ser
85 90 95
Ser Arg Ala Val Ile Asn Ala Arg Val Ser Thr Ile Ser Leu Pro Thr
100 105 110
Ala Pro Pro Ala Thr Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly Asn
115 120 125
Arg Lys Asp Phe Trp Thr Ala Ser Ser Gly Ala Asp Tyr Pro Asp Glu
130 135 140
Leu Gln Cys Leu Asp Ala Pro Val Leu Ser Gln Ala Lys Cys Glu Ala
145 150 155 160
Ser Tyr Pro Gly Lys Ile Thr Ser Asn Met Phe Cys Val Gly Phe Leu
165 170 175
Glu Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Val Val
180 185 190
Cys Asn Gly Gln Leu Gln Gly Val Val Ser Trp Gly Asp Gly Cys Ala
195 200 205
Gln Lys Asn Lys Pro Gly Val Tyr Thr Lys Val Tyr Asn Tyr Val Lys
210 215 220
Trp Ile Lys Asn Thr Ile Ala Ala Asn Ser
225 230
<210> SEQ ID NO 75
<211> LENGTH: 234
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 75
Ile Val Gly Gly Tyr Asn Cys Glu Glu Asn Ser Val Pro Tyr Gln Val
1 5 10 15
Ser Leu Asn Ser Gly Tyr His Phe Cys Gly Gly Ser Leu Ile Asn Glu
20 25 30
Gln Trp Val Val Ser Ala Gly His Cys Tyr Ala Ala Phe Asn Gly Lys
35 40 45
Ser Arg Ile Gln Val Arg Leu Gly Glu His Asn Ile Gly Val Leu Glu
50 55 60
Gly Asn Glu Gln Phe Ile Asn Ala Ala Lys Ile Ile Arg His Pro Gln
65 70 75 80
Tyr Asp Trp Lys Thr Leu Asn Asn Asp Ile Met Leu Ile Lys Leu Ser
85 90 95
Ser Arg Ala Val Ile Asn Ala Arg Val Ser Thr Ile Ser Leu Pro Thr
100 105 110
Ala Pro Pro Ala Thr Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly Asn
115 120 125
Arg Lys Asp Phe Trp Thr Ala Ser Ser Gly Ala Asp Phe Pro Asp Glu
130 135 140
Leu Gln Cys Leu Asp Ala Pro Val Leu Ser Gln Thr Lys Cys Glu Ala
145 150 155 160
Ser Tyr Pro Gly Lys Ile Thr Ser Asn Met Phe Cys Val Gly Phe Leu
165 170 175
Glu Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Val Val
180 185 190
Arg Asn Gly Gln Leu Gln Gly Val Val Ser Trp Gly Asp Gly Cys Ala
195 200 205
Gln Lys Asn Lys Pro Gly Val Tyr Thr Lys Val Tyr Asn Tyr Val Lys
210 215 220
Trp Ile Lys Asn Thr Ile Ala Ala Asn Ser
225 230
<210> SEQ ID NO 76
<211> LENGTH: 12
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 76
Leu Leu Trp Leu Gly Arg Val Val Gly Gly Pro Val
1 5 10
<210> SEQ ID NO 77
<211> LENGTH: 12
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 77
Lys Lys Trp Leu Gly Arg Val Pro Gly Gly Pro Val
1 5 10
<210> SEQ ID NO 78
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 78
Asp Ala Val Gly Arg Asp
1 5
<210> SEQ ID NO 79
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence = Synthetic Construct
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: note =
Synthetic Construct
<400> SEQUENCE: 79
Asn Gly Arg Asp Leu Glu
1 5
<210> SEQ ID NO 80
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 80
Gly Phe Val Met Phe Asn
1 5
<210> SEQ ID NO 81
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 81
Arg Val His Pro Ser
1 5
<210> SEQ ID NO 82
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 82
Val Arg Gly Thr Trp
1 5
<210> SEQ ID NO 83
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 83
Arg Ser Pro Leu Thr
1 5
<210> SEQ ID NO 84
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 84
Arg Pro Trp Asp Pro Ser
1 5
<210> SEQ ID NO 85
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 85
Gly Phe Val Met Phe Asn
1 5
<210> SEQ ID NO 86
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 86
Glu Ile Ala Asn Arg Glu
1 5
<210> SEQ ID NO 87
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 87
Lys Ala Val Val Gly Thr
1 5
<210> SEQ ID NO 88
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 88
Val Asn Ile Met Ala Ala
1 5
<210> SEQ ID NO 89
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 89
Ala Ala Phe Asn Gly Asp
1 5
<210> SEQ ID NO 90
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 90
Val His Pro Thr Ser
1 5
<210> SEQ ID NO 91
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 91
Arg Ser Pro Leu Thr
1 5
<210> SEQ ID NO 92
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 92
Arg Gly Ala Arg Thr
1 5
<210> SEQ ID NO 93
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 93
Arg Thr Pro Ile Ser
1 5
<210> SEQ ID NO 94
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 94
Thr Thr Ala Arg Lys
1 5
<210> SEQ ID NO 95
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 95
Arg Lys Asp Phe Trp
1 5
<210> SEQ ID NO 96
<211> LENGTH: 157
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<400> SEQUENCE: 96
Val Arg Ser Ser Ser Arg Thr Pro Ser Asp Lys Pro Val Ala His Val
1 5 10 15
Val Ala Asn Pro Gln Ala Glu Gly Gln Leu Gln Trp Leu Asn Arg Arg
20 25 30
Ala Asn Ala Leu Leu Ala Asn Gly Val Glu Leu Arg Asp Asn Gln Leu
35 40 45
Val Val Pro Ser Glu Gly Leu Tyr Leu Ile Tyr Ser Gln Val Leu Phe
50 55 60
Lys Gly Gln Gly Cys Pro Ser Thr His Val Leu Leu Thr His Thr Ile
65 70 75 80
Ser Arg Ile Ala Val Ser Tyr Gln Thr Lys Val Asn Leu Leu Ser Ala
85 90 95
Ile Lys Ser Pro Cys Gln Arg Glu Thr Pro Glu Gly Ala Glu Ala Lys
100 105 110
Pro Trp Tyr Glu Pro Ile Tyr Leu Gly Gly Val Phe Gln Leu Glu Lys
115 120 125
Gly Asp Arg Leu Ser Ala Glu Ile Asn Arg Pro Asp Tyr Leu Leu Phe
130 135 140
Ala Glu Ser Gly Gln Val Tyr Phe Gly Ile Ile Ala Leu
145 150 155
<210> SEQ ID NO 97
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence = Synthetic Construct
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
Synthetic
Construct
<220> FEATURE:
<221> NAME/KEY: VARIANT
<222> LOCATION: 1
<223> OTHER INFORMATION: Xaa can be Leu, Ile, Val or Phe
<220> FEATURE:
<221> NAME/KEY: VARIANT
<222> LOCATION: 2
<223> OTHER INFORMATION: Xaa can be any amino acid
<400> SEQUENCE: 97
Xaa Xaa Pro Arg Asn Ala
1 5
<210> SEQ ID NO 98
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
synthetic
construct
<400> SEQUENCE: 98
Cys Pro Gly Arg Val Val Gly Gly
1 5
<210> SEQ ID NO 99
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
synthetic
construct
<400> SEQUENCE: 99
Asp Asp Asp Lys
1
<210> SEQ ID NO 100
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence; note =
synthetic construct
<400> SEQUENCE: 100
Ala Gly Gly Gly
1
<210> SEQ ID NO 101
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
synthetic
construct
<400> SEQUENCE: 101
Gly Val Gly Gly
1
<210> SEQ ID NO 102
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
synthetic
construct
<400> SEQUENCE: 102
Gly Gly Leu Gly
1
<210> SEQ ID NO 103
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
synthetic
construct
<400> SEQUENCE: 103
Gly Gly Gly Ile
1
<210> SEQ ID NO 104
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
synthetic
construct
<400> SEQUENCE: 104
Ala Gly Gly Gly
1
<210> SEQ ID NO 105
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
synthetic
construct
<400> SEQUENCE: 105
Val Gly Gly Gly
1
<210> SEQ ID NO 106
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence; note =
synthetic construct
<400> SEQUENCE: 106
Leu Gly Gly Gly
1
<210> SEQ ID NO 107
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
synthetic
construct
<400> SEQUENCE: 107
Ile Gly Gly Gly
1
<210> SEQ ID NO 108
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
synthetic
construct
<400> SEQUENCE: 108
Ala Arg Leu Thr
1
<210> SEQ ID NO 109
<211> LENGTH: 12
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
synthetic
construct
<400> SEQUENCE: 109
gcgcgcctta cc 12
<210> SEQ ID NO 110
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
synthetic
construct
<400> SEQUENCE: 110
Ala Arg Leu Thr
1
<210> SEQ ID NO 111
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
synthetic
construct
<400> SEQUENCE: 111
Val Pro Gly Ser
1
<210> SEQ ID NO 112
<211> LENGTH: 12
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
synthetic
construct
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: 2
<223> OTHER INFORMATION: n= c or t
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: 5
<223> OTHER INFORMATION: n = g or c
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: 7-8
<223> OTHER INFORMATION: n = g or t
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: 10
<223> OTHER INFORMATION: n = a or t
<400> SEQUENCE: 112
gngcncnngn cc 12
<210> SEQ ID NO 113
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence =
synthetic
construct
<220> FEATURE:
<221> NAME/KEY: VARIANT
<222> LOCATION: 1
<223> OTHER INFORMATION: Xaa can be Ala or Val
<220> FEATURE:
<221> NAME/KEY: VARIANT
<222> LOCATION: 2
<223> OTHER INFORMATION: Xaa can be Arg or Pro
<220> FEATURE:
<221> NAME/KEY: VARIANT
<222> LOCATION: 3
<223> OTHER INFORMATION: Xaa can be Leu, Gly, Val, or Trp
<220> FEATURE:
<221> NAME/KEY: VARIANT
<222> LOCATION: 4
<223> OTHER INFORMATION: Xaa can be Thr or Ser
<400> SEQUENCE: 113
Xaa Xaa Xaa Xaa
1