Patent application title: Systems and Methods for Designing RNA Nanostructures and Uses Thereof
Inventors:
Rhiju Das (Palo Alto, CA, US)
Joseph Yesselman (Stanford, CA, US)
Kalli Kappel (Stanford, CA, US)
Assignees:
The Board of Trustees of the Leland Stanford Junior University
IPC8 Class: AC12N1511FI
Publication date: 2022-08-18
Patent application number: 20220259590
Abstract:
Systems and methods for generating RNA nanostructures capable of linking
RNA structures and capable of securing aptamers in an active and stable
structure are disclosed. Generally, RNA possesses many structural
properties to create novel nanostructures and machines. RNA tertiary
structure is composed of discrete and recurring components known as
tertiary `motifs`. Along with the helices that they interconnect, many of
these structural motifs appear highly modular. Systems and methods herein
generate a motif library including canonical and noncanonical motifs to
design a candidate path to connect one or more RNA molecules. These paths
can also be used to secure RNA aptamers to improve aptamer stability and
activity.
Claims:
1. A method of designing an RNA nanostructure, comprising: generating a
motif library describing a plurality of structural motifs; and designing
a candidate path between two points of RNA using individual motifs from
the motif library.
2. The method of claim 1, wherein the motif library includes canonical motifs and noncanonical motifs, wherein the canonical motifs are double-stranded RNA helix motifs of variable length.
3.-4. (canceled)
5. The method of claim 2, wherein the noncanonical motifs include one or more of the group consisting of two-way junctions, higher-order junctions, variable-length hairpins, tertiary contacts, and multi-way junctions.
6. The method of claim 1, wherein the designing step includes integrating an aptamer into the candidate path.
7. The method of claim 1, wherein the designing step is performed in a depth-first manner.
8. The method of claim 1, wherein the candidate path is based on motif structure.
9. The method of claim 8, further comprising filling in the candidate path with sequences that best match a target secondary structure.
10. The method of claim 9, wherein the filling in step uses sequences that minimize alternative secondary structures.
11. The method of claim 1, wherein the designing step generates a plurality of candidate paths.
12. The method of claim 11, further comprising filtering the plurality of candidate paths based on at least one limitation.
13. The method of claim 12, wherein the at least one limitation is selected from the group consisting of minimum number of motifs, maximum number of motifs, minimum number of residues, maximum number of residues, minimum stability, and maximum stability.
14. The method of claim 1, further comprising synthesizing an oligonucleotide covering the design of the candidate path.
15. An RNA nanostructure comprising: a plurality of RNA motifs aligned end to end forming a chain, wherein the plurality of RNA motifs are selected from the group consisting of canonical RNA motifs and noncanonical RNA motifs.
16. The RNA nanostructure of claim 15, wherein the plurality of RNA motifs alternate between canonical RNA motifs and noncanonical RNA motifs.
17. (canceled)
18. The RNA nanostructure of claim 15, further comprising two anchor structures, wherein one anchor structure is connected to one end of the chain, and the other anchor structure is connected to the other end of the chain.
19. The RNA nanostructure of claim 18, wherein the two anchor structures are a tetraloop and a tetraloop receptor.
20. The RNA nanostructure of claim 15, further comprising an anchor structure, wherein the plurality of RNA motifs are connected to one end of the anchor structure, and at least one more RNA motif is connected to the other end of the anchor structure.
21. The RNA nanostructure of claim 20, wherein the anchor structure is an aptamer.
22. The RNA nanostructure of claim 15, wherein the canonical RNA motifs are double stranded RNA helix motifs.
23.-24. (canceled)
25. The RNA nanostructure of claim 15, wherein the noncanonical RNA motifs are selected from the group consisting of: two-way junctions, higher-order junctions, variable-length hairpins, tertiary contacts, and multi-way junctions.
Description:
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Application Ser. No. 62/894,098, entitled "Methods and Systems for Rational Design of RNA Aptamers and Uses Thereof" to Das et al., filed Aug. 30, 2019 and U.S. Provisional Application Ser. No. 62/835,699, entitled "Systems and Methods for Designing RNA Nanostructures and Uses Thereof" to Das et al., filed Apr. 18, 2019; the disclosures of which are herein incorporated by reference in their entireties.
FIELD OF THE DISCLOSURE
[0003] The present disclosure relates to ribonucleic acid (RNA) aptamers, and in particular methods and systems to design RNA aptamers for increased stability and/or function.
INCORPORATION OF SEQUENCE LISTING
[0004] A computer readable form of the sequence listing, "06060.PRO Construct Sequences_ST25.txt", submitted via EFS-WEB, is herein incorporated by reference in its entirety.
BACKGROUND OF THE DISCLOSURE
[0005] RNA-based nanotechnology is an emerging field that harnesses RNA's unique structural properties to create novel nanostructures and machines. Perhaps more so than for other biomolecules, RNA tertiary structure is composed of discrete and recurring components known as tertiary `motifs`. Along with the helices that they interconnect, many of these structural motifs appear highly modular; that is, each motif folds into a well-defined three-dimensional (3D) structure in a broad range of contexts. By exploiting symmetry, motif repetition, and expert modeling, these motifs have been assembled into novel polyhedra, sheets, and cargo-carrying nanoparticles for biomedical use. Despite these advances, current methods still rely on human intuition in conjunction with simple visualization tools and the field is far from generating RNAs as sophisticated as natural RNA machines, which are asymmetric, too large to be solved by 3D RNA structure prediction methods, and composed of vast repertoires of distinct interacting motifs, most of which are not yet well characterized. (See Guo, P. (2010) The emerging field of RNA nanotechnology. Nat. Nanotechnol. 5, 833-842; Grabow, W. W., and Jaeger, L. (2014) RNA self-assembly and RNA nanotechnology. Acc. Chem. Res. 47, 1871-1880; Leontis, N. B., et al. (2006) The building blocks and motifs of RNA architecture. Curr. Opin. Struct. Biol. 16, 279-287; Jaeger, L., and Chworos, A. (2006) The architectonics of programmable RNA and DNA nanostructures. Curr. Opin. Struct. Biol. 16, 531-543; Jaeger, L., and Leontis, N. B. (2000) Tecto-RNA: One-Dimensional Self-Assembly through Tertiary Interactions. Angew. Chem. Int. Ed. Engl. 39, 2521-2524; Zhang, H., et al. (2013) Crystal structure of 3WJ core revealing divalent ion-promoted thermostability and assembly of the Phi29 hexameric motor pRNA. RNA 19, 1226-1237; Weizmann, Y., and Andersen, E. S. (2017) RNA nanotechnology--The knots and folds of RNA nanoparticle engineering. MRS Bull. 42, 930-935; Jasinski, D., et al. (2017) Advancement of the emerging field of RNA nanotechnology. ACS Nano 11, 1142-1164; Bindewald, E., et al. (2008) Computational strategies for the automated design of RNA nanoscale structures from building blocks using NanoTiler. J Mol Graph Model 27, 299-308; Jossinet, F., et al. (2010) Assemble: an interactive graphical tool to analyze and build RNA architectures at the 2D and 3D levels. Bioinformatics 26, 2057-2059; Wimberly, B. T., et al. (2000) Structure of the 30S ribosomal subunit. Nature 407, 327-339; Nguyen, T. H. D., et al. (2015) The architecture of the spliceosomal U4/U6.U5 tri-snRNP. Nature 523, 47-52; and Miao, Z., et al. (2017) RNA-Puzzles Round III: 3D RNA structure prediction of five riboswitches and one ribozyme. RNA 23, 655-672; the disclosures of which are incorporated herein by reference in their entirety.)
[0006] Additionally, aptamer selection suffers from two critical limitations that prevent its use in engineering scaffolds that do not require target protein reengineering. First, selection experiments are limited by the number of sequences that can be tested, which results in many cases where high quality aptamers cannot be selected. (See e.g., Wang, J. P., et al., Influence of Target Concentration and Background Binding on In Vitro Selection of Affinity Reagents. Plos One, 2012. 7(8); and Gold, L., et al., Aptamer-Based Multiplexed Proteomic Technology for Biomarker Discovery. Plos One, 2010. 5(12); the disclosures of which are incorporated by reference herein in their entireties.) Second, the structure of the aptamer cannot be explicitly controlled, which is undesirable when the goal is to generate an aptamer that can be used to precisely orient proteins relative to each other.
SUMMARY OF THE DISCLOSURE
[0007] This summary is meant to provide examples and is not intended to be limiting of the scope of the invention in any way. For example, any feature included in an example of this summary is not required by the claims, unless the claims explicitly recite the feature. Also, the features described can be combined in a variety of ways. Various features and steps as described elsewhere in this disclosure can be included in the examples summarized here.
[0008] In one embodiment, a method of designing an RNA nanostructure, includes generating a motif library describing a plurality of structural motifs, and designing a candidate path between two points of RNA using individual motifs from the motif library.
[0009] In a further embodiment, the motif library includes canonical motifs and noncanonical motifs.
[0010] In another embodiment, the canonical motifs are double stranded RNA helix motifs of variable length.
[0011] In a still further embodiment, the canonical motifs range in size from 1-22 bp.
[0012] In still another embodiment, the noncanonical motifs include one or more of the group consisting of two-way junctions, higher-order junctions, variable-length hairpins, tertiary contacts, and multi-way junctions.
[0013] In a yet further embodiment, the designing step includes integrating an aptamer into the candidate path.
[0014] In yet another embodiment, the designing step is performed in a depth-first manner.
[0015] In a further embodiment again, the candidate path is based on motif structure.
[0016] In another embodiment again, the method further includes filling in the candidate path with sequences that best match a target secondary structure.
[0017] In a further additional embodiment, the filling in step uses sequences that minimize alternative secondary structures.
[0018] In another additional embodiment, the designing step generates a plurality of candidate paths.
[0019] In a still yet further embodiment, the method further includes filtering the plurality of candidate paths based on at least one limitation.
[0020] In still yet another embodiment, the at least one limitation is selected from the group consisting of minimum number of motifs, maximum number of motifs, minimum number of residues, maximum number of residues, minimum stability, and maximum stability.
[0021] In a still further embodiment again, the method further includes synthesizing an oligonucleotide covering the design of the candidate path.
[0022] In still another embodiment again, an RNA nanostructure comprises a plurality of RNA motifs aligned end to end forming a chain, where the plurality of RNA motifs are selected from the group consisting of canonical RNA motifs and noncanonical RNA motifs.
[0023] In a still further additional embodiment, the plurality of RNA motifs alternate between canonical RNA motifs and noncanonical RNA motifs.
[0024] In still another additional embodiment, the RNA nanostructure further includes an anchor structure connected to one end of the chain.
[0025] In a yet further embodiment again, the RNA nanostructure further includes two anchor structures, where one anchor structure is connected to one end of the chain, and the other anchor structure is connected to the other end of the chain.
[0026] In yet another embodiment again, the two anchor structures are a tetraloop and a tetraloop receptor.
[0027] In a yet further additional embodiment, the RNA nanostructure further includes an anchor structure, wherein the plurality of RNA motifs are connected to one end of the anchor structure, and at least one more RNA motif is connected to the other end of the anchor structure.
[0028] In yet another additional embodiment, the anchor structure is an aptamer.
[0029] In a further additional embodiment again, the canonical RNA motifs are double stranded RNA helix motifs.
[0030] In another additional embodiment again, the canonical RNA motifs range in size from 1 base pair to 100 base pairs.
[0031] In a still yet further embodiment again, the canonical RNA motifs range in size from 1 base pair to 22 base pairs.
[0032] In still yet another embodiment again, the noncanonical RNA motifs are selected from the group consisting of: two-way junctions, higher-order junctions, variable-length hairpins, tertiary contacts, and multi-way junctions.
[0033] The foregoing and other objects, features, and advantages of the disclosed technology will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0034] FIGS. 1A-1C illustrate problems in RNA nanostructure design in accordance with various embodiments.
[0035] FIG. 2 illustrates a method to design RNA nanostructures in accordance with various embodiments.
[0036] FIG. 3 illustrates a depth-first process for designing an RNA nanostructure in accordance with various embodiments.
[0037] FIGS. 4A-4B illustrate computer performance of various methods for designing an RNA nanostructure in accordance with various embodiments.
[0038] FIGS. 5A-5C illustrate RNA nanostructures to connect a tetraloop/tetraloop receptor (TTR) in accordance with various embodiments.
[0039] FIGS. 6A-6C illustrate RNA nanostructures to connect ribosomal subunits in accordance with various embodiments.
[0040] FIGS. 6D-6E illustrate RNA nanostructures including multi-way junctions in accordance with various embodiments.
[0041] FIG. 7 illustrates RNA nanostructures incorporating an aptamer in accordance with various embodiments.
[0042] FIGS. 8A-8D illustrate RNA nanostructures incorporating an aptamer in accordance with various embodiments.
[0043] FIG. 9A illustrates a method for designing RNA aptamers in accordance with various embodiments.
[0044] FIG. 9B illustrates strategies for increasing binding affinity between RNA aptamers and proteins in accordance with various embodiments.
[0045] FIG. 9C illustrates a schematic for designing RNA aptamers in accordance with various embodiments.
[0046] FIG. 9D illustrates an RNA scaffold designed to bind multiple proteins in accordance with various embodiments.
[0047] FIGS. 10A-10J illustrate exemplary RNA nanostructures in accordance with various embodiments.
[0048] FIGS. 11A-11E illustrate predicted and calculated structures of RNA motifs in accordance with various embodiments.
[0049] FIGS. 12A-12F illustrate RNA nanostructures to connect ribosomal subunits in accordance with various embodiments.
[0050] FIGS. 13A-13C illustrate RNA nanostructures to connect ribosomal subunits in accordance with various embodiments.
[0051] FIGS. 14A-14D illustrate data showing structure and function of an RNA nanostructure incorporating an aptamer in accordance with various embodiments.
[0052] FIG. 15 illustrates data showing function of an RNA nanostructure incorporating an aptamer in accordance with various embodiments.
[0053] FIG. 16 illustrates data showing function of an RNA nanostructure incorporating an aptamer in accordance with various embodiments.
[0054] FIGS. 17A-17B illustrate RNA anchor structures and RNA connecting structures in accordance with various embodiments.
DETAILED DESCRIPTION OF THE DISCLOSURE
[0055] Turning now to the drawings and data, embodiments herein represent a novel approach to 3D RNA design, based on the recognition that numerous recurring problems in the field can be cast into a `pathfinding` problem. (See FIGS. 1A-1C.) Embodiments described herein present a computer-implemented 3D RNA design program, which obviates one or more of the three problems highlighted above describing RNA motif pathfinding problems. Additional embodiments are directed to the RNA nanostructures and structural and functional measurements to test the ability of computationally generated RNA nanostructures, ribosomes, and aptamers to achieve the specific purpose of overcoming the problems described above, without requiring additional rounds of trial and error. Embodiments of the present disclosure describe methods that operate counter to prevailing, human strategies to design RNA nanostructures capable of tethering or linking various RNA sequences securely and over long distances. Additionally, various embodiments improve aptamer function and stability by integrating the aptamer into a linking structure that maintains aptamer conformation.
[0056] First, a founding problem of RNA nanotechnology involves designing a compact nanostructure that aligns the two parts of the tetraloop/tetraloop-receptor (TTR) so that they can form a tertiary contact upon RNA chain folding (FIG. 1A). This task requires finding RNA sequences that interconnect the 5' and 3' ends of the tetraloop (102) to the 3' and 5' ends of the tetraloop receptor, respectively (104, FIG. 1A). The problem has previously been solved through a combination of expert manual modeling and symmetric assembly of multiple chains. (See Jaeger, L., and Leontis, N. B. (2000) Tecto-RNA: One-Dimensional Self-Assembly through Tertiary Interactions. Angew. Chem. Int. Ed. Engl. 39, 2521-2524 and Nasalean, L., et al. (2006) Controlling RNA self-assembly to form filaments. Nucleic Acids Res. 34, 1381-1392; the disclosures of which are incorporated herein by reference in their entirety.) In all cases, an important guiding principle--sometimes called RNA architectonics--has been to design the intermediate RNA chains so that they form RNA modules previously seen in nature, including both canonical double-stranded helices and noncanonical RNA motifs that twist and translate between two desired helical endpoints at the tetraloop and the receptor. This design task is referred to as the `RNA motif pathfinding problem`. The general complexity of this pathfinding task has prevented design of asymmetric, single-chain solutions to the TTR stabilization problem.
[0057] A second problem is highly analogous to the TTR stabilization problem but is more difficult. Efforts to select engineered ribosomes with mRNA decoding, polypeptide synthesis, and protein excretion functions optimized for new substrates might be dramatically accelerated through the design of integrated ribosomes. An important step towards this goal involves tethering the two 23S and 16S rRNAs of the ribosome into a single RNA strand that supports E. coli growth. (See Fried, S. D., et al. (2015) Ribosome subunit stapling for orthogonal translation in E. coli. Angew. Chem. Int. Ed. Engl. 54, 12791-12794; Orelle, C., et al. (2015) Protein synthesis by ribosomes with tethered subunits. Nature 524, 119-124; Carlson, E. D. (2015) Creating Ribo-T: (Design, Build, Test)n. ACS Synth. Biol. 4, 1173-1175; and Schmied, W. H., et al. (2018) Controlling orthogonal ribosome subunit interactions enables evolution of new function. Nature 564, 444-448; the disclosures of which are incorporated herein by reference in their entirety.) Three-dimensional designs for a tether (106) would require solving the RNA motif pathfinding problem (108) over >100 Å distances and avoiding steric collisions with the ribosome's RNA and protein components (110, FIG. 1B). Even after identification of appropriate helix endpoints, this difficult design challenge previously took more than a year to solve using trial-and-error refinement based on in vivo assays or ad hoc combination of noncanonical motifs without explicit 3D modeling.
[0058] A third problem involves a more complex instance of two RNA motif pathfinding problems (112, FIG. 1C). A ubiquitous task in RNA nanotechnology is the selection of `aptamer` RNAs (114) that sense or carry target small molecules, such as adenosine 5'-triphosphate or fluorophores. (See Famulok, M. (1999) Oligonucleotide aptamers that recognize small molecules. Curr. Opin. Struct. Biol. 9, 324-329; the disclosure of which is incorporated herein by reference in its entirety.) Despite recent progress, improving aptamers requires numerous rounds of tedious selections, with few design tools available to guide consistent improvements. The desired stabilizations might be achieved by peripheral tertiary contacts that extend out of either end of an aptamer and encircle these aptamers, bracing them into their functional 3D arrangements (116, FIG. 1C)--analogous to the tertiary contacts that `lock` natural riboswitch aptamers. (See Porter, E. B., et al. (2017) Recurrent RNA motifs as scaffolds for genetically encodable small-molecule biosensors. Nat. Chem. Biol. 13, 295-301; Gotrik, M., et al. (2018) Direct Selection of Fluorescence-Enhancing RNA Aptamers. J. Am. Chem. Soc. 140, 3583-3591; and Montange, R. K., and Batey, R. T. (2008) Riboswitches: emerging themes in RNA structure and function. Annu. Rev. Biophys. 37, 117-133; the disclosures of which are incorporated herein by reference in their entirety.) However, such rational design has not been carried out due to the difficulty of finding the required four strands that interconnect a given aptamer structure and a tertiary contact.
[0059] Additional issues exist in protein scaffolding. Scaffold proteins physically link individual molecules to increase the efficiency of their interaction and have been found to be critical to many cellular signaling processes. (See e.g., Good, M. C., et al., Scaffold proteins: hubs for controlling the flow of cellular information. Science, 2011. 332(6030): p. 680-6; the disclosure of which is incorporated by reference herein in its entirety.) Engineers have realized the potential of these scaffold molecules to reshape cellular behavior and have redesigned scaffold proteins for several applications including altering MAP kinase pathway signaling dynamics and enhancing production of specific metabolites. (See e.g., Dueber, J. E., et al., Synthetic protein scaffolds provide modular control over metabolic flux. Nature Biotechnology, 2009. 27(8): p. 753-U107; and Bashor, C. J., et al., Using engineered scaffold interactions to reshape MAP kinase pathway signaling dynamics. Science, 2008. 319(5869): p. 1539-1543; the disclosures of which are incorporated by reference herein in their entirety.) Synthetic RNA molecules offer increased design flexibility over protein scaffolds and have also been used to spatially arrange proteins to increase metabolic pathway yields and control synthetic transcriptional programs. (See e.g., Delebecque, C. J., et al., Designing and using RNA scaffolds to assemble proteins in vivo. Nature Protocols, 2012. 7(10): p. 1797-1807; Delebecque, C. J., et al., Organization of Intracellular Reactions with Rationally Designed RNA Assemblies. Science, 2011. 333(6041): p. 470-474; Zalatan, J. G., et al., Engineering Complex Synthetic Transcriptional Programs with CRISPR RNA Scaffolds. Cell, 2015. 160(1-2): p. 339-350; and Sachdeva, G., et al., In vivo co-localization of enzymes on RNA scaffolds increases metabolic production in a geometrically dependent manner. Nucleic Acids Research, 2014. 42(14): p. 9493-9503; the disclosures of which are incorporated by reference herein in their entirety.) However, both engineered RNA and protein scaffolds rely on known protein-protein or protein-RNA interactions and thus require protein- or RNA-binding proteins to be fused to the proteins to be scaffolded. This requirement precludes the use of scaffolds for therapeutic applications and makes it much more difficult to control the precise three-dimensional arrangement of the scaffolded proteins.
[0060] Turning to FIG. 2, certain embodiments are directed to computational methods 200 of RNA nanostructure design. In this method, one or more motif libraries are generated at 202. Generated libraries include canonical and/or noncanonical RNA motifs. Canonical motifs are double stranded RNA (dsRNA) helix motifs that vary in sequence and/or length. These motifs possess canonical (e.g., Watson-Crick) base-pairing (e.g., adenosine with uridine and guanosine with cytosine). In some embodiments, the canonical motifs are double stranded RNA molecules with Watson-Crick base pairing. In many embodiments, canonical motifs are at least 1 base pair (bp) but can be up to 20 bp, 22 bp, 25 bp, 30 bp, 50 bp, 75 bp, 100 bp, or longer. Noncanonical motifs include other RNA structures, including two-way junctions, higher-order junctions, variable-length hairpins, tertiary contacts, multi-way junctions (e.g., Phi29 pRNA planar 3-way junction), other branched elements, and any other noncanonical motif. In many embodiments, the canonical and noncanonical motifs are empirically derived (e.g., motifs where structures are identified via X-ray crystallography or other known methods of elucidating RNA structure), while in some embodiments the canonical and noncanonical motifs are computationally derived (e.g., generating motifs based on known structures and/or base pair interactions). In certain embodiments, the canonical motifs are idealized and sequence invariant. Various embodiments maintain multiple libraries representing each of noncanonical and canonical motifs, while certain embodiments will maintain a single library for both canonical and noncanonical motifs. In certain embodiments, the motifs are entered based on sequence, while in many embodiments the motifs are entered based on structure (e.g., crystallographic structure), such as in pdb format. Many embodiments will utilize curated motif libraries of RNA components, such as the RNA 3D Motif Atlas (rna.bgsu.edu/rna3dhub/motifs). (See also Petrov, A. I., et al. (2013) Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas. RNA 19, 1327-1340; the disclosure of which is incorporated by reference in its entirety.)
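By way of a non-limiting illustration, a motif library of the kind generated at 202 could be represented in software as sketched below. The sketch is written in Python. The names `Motif`, `build_helix_library`, and `build_library`, the residue counts, and the two example noncanonical entries are hypothetical placeholders and are not taken from any disclosed implementation. A practical library would also store 3D coordinates for each motif (e.g., parsed from pdb files), which are omitted here for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Motif:
    name: str        # hypothetical identifier for the motif
    kind: str        # "canonical" or "noncanonical"
    n_residues: int  # total nucleotides contributed by the motif
    source: str      # e.g. "idealized" or "crystal"


def build_helix_library(min_bp=1, max_bp=22):
    """Idealized, sequence-invariant helices of variable length (assumed range)."""
    return [
        Motif(f"helix_{n}bp", "canonical", 2 * n, "idealized")
        for n in range(min_bp, max_bp + 1)
    ]


def build_library(noncanonical):
    """Keep a single library, indexed by motif kind."""
    library = defaultdict(list)
    for motif in build_helix_library() + list(noncanonical):
        library[motif.kind].append(motif)
    return dict(library)


# Placeholder entries; real entries would come from a curated source.
example_noncanonical = [
    Motif("two_way_junction_A", "noncanonical", 7, "crystal"),
    Motif("three_way_junction_A", "noncanonical", 15, "crystal"),
]
library = build_library(example_noncanonical)
```

Indexing by kind is one way to accommodate both the single-library and the multiple-library embodiments, since each kind can be retrieved separately from one container.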
[0061] At 204, certain embodiments design a candidate RNA structure, or candidate path, connecting two points of RNA, where the path is composed of one or more RNA motifs in the one or more motif libraries. At 204, the connection points to be linked are defined. These connection points can be on one or more RNA molecules, such as to link two RNA molecules together or to link two ends of a single RNA molecule.
[0062] Various embodiments perform the path designing step by step in a depth-first manner, where a first motif is joined to a first point to achieve the closest distance to a second point before a second motif is added, and a third motif is then added to achieve the closest distance to the terminating point. This process is repeated until a candidate path is designed between the first and second points. In various embodiments, the pathfinding will be performed in a bidirectional manner, such that candidate paths will be generated starting at the first point and terminating at the second point, in addition to candidate paths generated starting at the second point and terminating at the first point. Additional embodiments will further always begin with a canonical motif, and some embodiments will always end with a canonical motif. Some embodiments will further alternate canonical and noncanonical motifs until a candidate path is identified. Further embodiments will allow for specific settings, such that canonical motifs are selected for larger lengths, while noncanonical motifs are selected for smaller lengths. This pathfinding process is illustrated in FIG. 3, where a canonical motif ("helix") is added to a starting point before a noncanonical motif ("Motif 1") is added, which is subsequently followed by a canonical motif ("helix") and a noncanonical motif ("Motif 2") until the path meets the finishing point.
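The depth-first, alternating search described above can be sketched in simplified form as follows. This is a toy illustration under stated assumptions and is not the disclosed implementation: poses are two-dimensional (x, y, heading) rather than full 3D rigid-body frames, each helix simply advances in a straight line, each junction advances and then bends, and the names `Motif`, `step`, and `find_path`, the 2.8 angstrom step per base pair, and the tolerance and motif-count limits are all hypothetical. Steric checks and bidirectional search are omitted.

```python
import math
from typing import NamedTuple


class Motif(NamedTuple):
    name: str
    kind: str      # "helix" (canonical) or "junction" (noncanonical)
    length: float  # advance along the current heading, in angstroms
    bend: float    # change in heading after the motif, in radians


def step(pose, motif):
    """Append a motif to a pose (x, y, heading) and return the new pose."""
    x, y, heading = pose
    x += motif.length * math.cos(heading)
    y += motif.length * math.sin(heading)
    return (x, y, heading + motif.bend)


def find_path(start, target, helices, junctions, tol=3.0, max_motifs=8):
    """Depth-first search that starts and ends with a helix and alternates kinds."""

    def dist(pose):
        return math.hypot(pose[0] - target[0], pose[1] - target[1])

    def search(pose, path):
        if path and path[-1].kind == "helix" and dist(pose) <= tol:
            return path
        if len(path) >= max_motifs:
            return None  # abandon this branch and backtrack
        pool = helices if (not path or path[-1].kind == "junction") else junctions
        # Try first the motif whose end lands closest to the target.
        for motif in sorted(pool, key=lambda m: dist(step(pose, m))):
            found = search(step(pose, motif), path + [motif])
            if found is not None:
                return found
        return None

    return search(start, [])


helices = [Motif(f"helix_{n}bp", "helix", 2.8 * n, 0.0) for n in range(1, 11)]
junctions = [
    Motif("turn_left", "junction", 5.0, math.pi / 2),
    Motif("turn_right", "junction", 5.0, -math.pi / 2),
]
path = find_path((0.0, 0.0, 0.0), (28.0, 14.0), helices, junctions)
print([m.name for m in path])
```

Ordering the candidate motifs by distance to the target mirrors the "closest distance" criterion in the text, while the recursion with backtracking supplies the depth-first behavior. Which particular path is returned depends on the toy motif set.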
[0063] Further embodiments will design the path using structures of specific motifs rather than the RNA sequence of the specific motif to be included into the path. For example, some embodiments will allow a user to specify a specific RNA structure (e.g., an RNA aptamer) to be included in the path in lieu of a canonical or noncanonical motif. In embodiments incorporating a specific RNA structure, the method 200 builds a de novo scaffold around the existing structure, which will result in a structure that is more stable and active (in the case of functional structures). This approach runs counter to prevailing methodologies (discussed further below), which attempt to plug RNA structures into preconstructed scaffolds and require vast amounts of effort without much success in generating functional scaffolds.
[0064] In 206, if the candidate path was found based on structure, many embodiments will fill in the candidate path with sequences that best match the target secondary structure. Additional embodiments will fill in the candidate path with sequences that minimize alternative secondary structures.
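A minimal sketch of the filling-in step at 206 follows, assuming the target secondary structure is supplied in dot-bracket notation (an assumption made for illustration; the text does not specify a notation). The function names `pair_table` and `fill_sequence` are hypothetical. The sketch only guarantees that every paired position receives a Watson-Crick pair and that unpaired positions receive adenosine. It does not score or minimize alternative secondary structures, which would require a folding-energy model.

```python
import random

# Watson-Crick pairs, written as (5' partner, 3' partner).
WC_PAIRS = [("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")]


def pair_table(structure):
    """Map each paired index to its partner for a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' in structure")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def fill_sequence(structure, seed=0):
    """Assign a sequence whose paired positions are Watson-Crick complementary."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    pairs = pair_table(structure)
    seq = ["A"] * len(structure)  # unpaired positions default to adenosine
    for i, ch in enumerate(structure):
        if ch == "(":
            seq[i], seq[pairs[i]] = rng.choice(WC_PAIRS)
    return "".join(seq)


structure = "((((....))))"
print(fill_sequence(structure))
```

In practice, many candidate sequences could be generated in this way (e.g., with different seeds) and ranked by how well their predicted fold matches the target.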
[0065] Once the candidate path sequences have been identified, many embodiments filter the candidate paths at 208. In 208, factors or limitations are utilized to limit the total output of method 200. Such factors include minimum and/or maximum number of motifs (e.g., canonical motifs and noncanonical motifs), minimum and/or maximum number of residues (e.g., the number of bases in the entire RNA strand), and/or minimum and/or maximum stability (e.g., number of Watson-Crick base pairs).
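The filtering at 208 can be sketched as a simple range check on each candidate path. The class `CandidatePath`, the function `filter_paths`, and the example values below are hypothetical. The count of Watson-Crick base pairs is used as the stability measure only because the text offers it as an example.

```python
from dataclasses import dataclass


@dataclass
class CandidatePath:
    name: str
    n_motifs: int    # canonical plus noncanonical motifs
    n_residues: int  # bases in the entire RNA strand
    n_wc_pairs: int  # stability proxy: number of Watson-Crick base pairs


def within(value, low, high):
    """True if low <= value <= high, where high=None means no upper bound."""
    return value >= low and (high is None or value <= high)


def filter_paths(paths, min_motifs=0, max_motifs=None,
                 min_residues=0, max_residues=None,
                 min_pairs=0, max_pairs=None):
    """Keep only the candidate paths that satisfy every limitation."""
    return [
        p for p in paths
        if within(p.n_motifs, min_motifs, max_motifs)
        and within(p.n_residues, min_residues, max_residues)
        and within(p.n_wc_pairs, min_pairs, max_pairs)
    ]


# Made-up candidates for illustration only.
paths = [
    CandidatePath("a", n_motifs=3, n_residues=40, n_wc_pairs=14),
    CandidatePath("b", n_motifs=9, n_residues=120, n_wc_pairs=50),
    CandidatePath("c", n_motifs=5, n_residues=60, n_wc_pairs=8),
]
kept = filter_paths(paths, max_motifs=6, max_residues=100, min_pairs=10)
print([p.name for p in kept])
```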
[0066] At 210 of certain embodiments, oligonucleotides are synthesized representing the designed RNA nanostructure. Various embodiments synthesize the RNA nanostructure chemically via various known technologies, while additional embodiments synthesize the RNA nanostructure via biochemical methods. Example methods of synthesis include phosphoramidite, T7 polymerase, and any other known or applicable means of synthesizing an RNA nanostructure. In various embodiments, the oligonucleotides will include just the developed path from a starting point to an ending point, while in some embodiments, the oligonucleotide includes a portion (including the entirety) of the molecule at the starting point and/or a portion (including the entirety) of the molecule at the ending point. Certain embodiments will synthesize the oligonucleotide using RNA base pairs, while some embodiments will synthesize the oligonucleotide using DNA base pairs, and additional embodiments will synthesize the oligonucleotide using a combination of RNA and DNA base pairs. Further, embodiments synthesize the oligonucleotide as double stranded, single stranded, or a combination of double and single stranded.
[0067] At 212, the RNA nanostructure is put into use. Uses of an RNA nanostructure include use as a medicament or to enhance RNA function, as described in depth below.
[0068] It should be noted that in numerous embodiments, some components in method 200 will be performed in a different order, performed simultaneously with prior components, and/or omitted. For example, filtering 208 can be completed simultaneously with the pathfinding 204, such that once a path reaches a certain point (e.g., a maximum length and/or a maximum number of motifs) the path is eliminated, and another path is begun. Additionally, if the motif libraries are based on sequence, 206 will be omitted in some embodiments, as there will be no need to fill in the sequence.
[0069] Certain embodiments of method 200 are implemented on non-transitory machine readable media, where method 200 is encoded as processor instructions. In many of these embodiments, execution of the processor instructions by a processor causes the processor to perform one or more steps embodied in method 200. Additional embodiments are further directed to systems comprising a processor and memory, where the memory contains instructions that when read by the processor direct the processor to perform one or more steps embodied in method 200.
[0070] When implemented on a computer, certain embodiments of method 200 scale linearly with problem size (e.g., distance between starting and ending points). Some embodiments will be performed on a consumer-grade computer (e.g., laptop computer), and FIGS. 4A and 4B illustrate the performance of method 100. Specifically, FIG. 4A illustrates that the run time increases with distance, while FIG. 4B shows that the number of residues (e.g., base pairs) required to complete the distance also increases with the problem size. FIGS. 4A and 4B illustrate that certain embodiments of method 100 will discover exceptionally long dsRNA paths (e.g., long enough to encircle a ribosome) in less than three seconds.
[0071] The resulting products of method 200 possess a number of characteristics, including the ability to fold properly, traverse long distances, and/or hold aptamers into a functional conformation.
RNA Folding
[0072] Various embodiments possess the ability for the RNA nanostructure to properly fold upon synthesis. FIGS. 5A-5D show the ability of embodiments to fold appropriately. Specifically, FIG. 5A illustrates an embodiment of a novel RNA nanostructure designed to link tetraloops and tetraloop receptors ("TTRs"). In FIG. 5A, embodiments of the novel RNA nanostructures to link TTRs will possess a tetraloop 502, a tetraloop receptor 504, and a linking region 506. The structures of several embodiments are illustrated in FIG. 5B. Sequences for the embodiments illustrated in FIG. 5B can be found in the attached sequence listing as SEQ_ID NOs: 1-16. Additionally, some embodiments of the RNA nanostructures illustrated in FIG. 5B allow the TTRs to fold appropriately, as illustrated in FIG. 5C. FIG. 5C illustrates a native gel mobility assay of the embodiments illustrated in FIG. 5B. In FIG. 5C, the embodiments in FIG. 5B are labelled at the top of each image and are run in two lanes of the gel, where the left lane is a native tetraloop possessing the sequence GAAA, while the right lane has this sequence mutated to UUCG. Migration of the native-sequence tetraloop construct further through the gel than the mutant is an indicator that the linking RNA nanostructure does not disrupt the TTR tertiary fold. Quantification of this information is found below in Table 1.
TABLE 1 Quantification of properties of TTR linkages
Construct | SHAPE and DMS Support for Secondary Structure^a | TTR DMS Reactivity Fold Change^b | Native Gel Mobility Shift (cm)^c | Mg²⁺ Folding Midpoints^d
miniTTR 1 | 95.2% | 3.01 | 0.205 | 1.12 +0.34/-0.24
miniTTR 2 | 94.2% | 6.94 | 0.247 | 0.08 +0.00/-0.00
miniTTR 3 | 96.6% | 1.63* | 0.055* | >10*
miniTTR 4 | 96.6% | 1.74* | 0.204 | >10*
miniTTR 5 | 98.1% | 4.1 | 0.236 | 1.64 +0.32/-0.22
miniTTR 6 | 95.5% | 3.39 | 0.382 | 0.74 +0.01/-0.02
miniTTR 7 | 97.2% | 2.66 | 0.226 | 3.31 +0.79/-0.55
miniTTR 8 | 98.5% | 1.16* | -1.117* | >10*
miniTTR 9 | 98.5% | 6.18 | 0.348 | 0.84 +0.11/-0.11
miniTTR 10 | 98.5% | 6.59 | 0.405 | 0.74 +0.08/-0.06
miniTTR 11 | 96.7% | 4.79 | 0.282 | 0.87 +0.13/-0.10
miniTTR 12 | 96.4% | 5.3 | 0.406 | 0.50 +0.05/-0.03
miniTTR 13 | 94.2% | 1.72* | -0.066* | >10*
miniTTR 14 | 98.6% | 5.21 | 0.408 | 0.44 +0.02/-0.01
miniTTR 15 | 94.2% | 3.79 | -0.108* | 0.95 +0.14/-0.14
miniTTR 16 | 96.2% | 14.47 | 0.456 | 0.24 +0.08/-0.02
^a Percent of helical residues that have SHAPE and DMS reactivities < 0.5 reactivity units, suggesting they are in base pairs.
^b For DMS chemical mapping with and without 10 mM Mg²⁺, a 2-fold reduction in mean DMS reactivity at the four TTR adenines was considered to pass screen.
^c Distance traveled in gel of RNA compared to mutant with tetraloop GAAA changed to UUCG. Positive numbers correspond to faster gel mobility (more compact fold) with wild type tetraloop, as expected for correctly folded RNA.
^d RNA that was more than half folded with [Mg²⁺] < 10 mM was considered to pass screen.
* Considered to not pass screen.
Long Distance Tethering
[0073] Various embodiments have the ability to link molecules across long distances. FIGS. 6A-6C show the ability of embodiments to link ribosomal subunits. Specifically, FIG. 6A illustrates an embodiment of a novel RNA nanostructure designed to link ribosomal subunits. In FIG. 6A, embodiments of the novel RNA nanostructures to link ribosomal subunits will possess a linking structure 602 that connects the 23S ribosomal subunit 604 and 16S ribosomal subunit 606. The structures of several embodiments are illustrated in FIG. 6B. Sequences for the embodiments illustrated in FIG. 6B can be found in the attached sequence listing as SEQ_ID NOs: 17-25. Additionally, FIG. 6C illustrates how the tethering of some embodiments allows the growth of ribosome-deficient bacteria, which would otherwise be unable to grow.
Multi-Junction Linkages
[0074] Additional embodiments generate structures including multi-way junctions. An example of such embodiments is illustrated in FIG. 6D, where multi-way junctions 610 are incorporated into linking region 612 that connects the tetraloop-tetraloop receptor 614. Additionally, some embodiments generate multiple linkages off of such multi-way junctions, such as illustrated in FIG. 6E. FIG. 6E illustrates double-stranded RNA (dsRNA) helix 620 possessing four A-minor interactions 622. Certain embodiments include RNA nanostructures 624 to link the various A-minor interactions 622 using multi-way junctions, such as those illustrated in FIG. 6D. Additional embodiments build off of multi-way junctions to design paths 626 linking additional A-minor interactions 622 located on the dsRNA helix 620. Such embodiments generate an "RNA claw," or aptamer, to hold a dsRNA helix. In many embodiments (e.g., FIG. 2, method 200), designs including multi-way junctions still scale linearly (see also FIGS. 4A-4B). Some embodiments including multi-way junctions run faster than embodiments which only use two-way junctions, as multi-way junctions add motifs that have significantly different 6-dimensional orientations between base pair ends.
RNA Aptamer Function and Stability
[0075] RNA aptamers possess the ability to bind small molecules. Unfortunately, prior methods to improve RNA aptamer function have largely been unsuccessful, producing weakened binding affinity or instability in biological environments. Even after multiple rounds of improvement, many prior attempts resulted in diminishing returns. (See, e.g., Carothers, J. M., et al. (2006) Aptamers selected for higher-affinity binding are not more specific for the target ligand. J. Am. Chem. Soc. 128, 7929-7937; Paige, J. S., et al. (2011) RNA mimics of green fluorescent protein. Science 333, 642-646; and Ellington, A. D., and Szostak, J. W. (1990) In vitro selection of RNA molecules that bind specific ligands. Nature 346, 818-822; the disclosures of which are incorporated herein by reference in their entirety.) As such, various embodiments allow for the introduction of RNA aptamers into an RNA nanostructure. Examples of such activity are illustrated in FIGS. 7-8D. Specifically, FIG. 7 illustrates various embodiments of RNA nanostructures incorporating an aptamer 702 specific for adenosine 5'-triphosphate (ATP) and adenosine 5'-monophosphate (AMP). Sequences for the embodiments illustrated in FIG. 7 can be found in the attached sequence listing as SEQ_ID NOs: 26-35. Additionally, the dissociation constant of various embodiments is reduced by an order of magnitude relative to the ATP aptamer alone, showing a vast improvement of various embodiments, as shown in Table 2.
TABLE 2 Quantification of properties of ATP/AMP aptamers of some embodiments
Design | DMS Reactivity Change of A9 and A10 upon ATP binding^a | Mean DMS reactivity at TTR without ATP^b | Formed TTR with ATP (fold change in DMS reactivity)^c | K_d for ATP, μM^d
ATP-TTR 1^e | n.d. | n.d. | n.d. | n.d.
ATP-TTR 2^e | n.d. | n.d. | n.d. | n.d.
ATP-TTR 3 | -0.24 | 0.04 | 1.00 | 1.5 +0.51/-0.38
ATP-TTR 4 | -0.24 | 0.09 | 1.46 | 4.1 +1.30/-0.96
ATP-TTR 5 | -0.27 | 0.17 | 1.94 | 1.4 +0.46/-0.35
ATP-TTR 6* | 0.02 | 0.14 | 2.28 | n.d.
ATP-TTR 7* | 0.04 | 0.27 | 1.85 | n.d.
ATP-TTR 8 | -0.11 | 1.28 | 1.16 | n.d.
ATP-TTR 9 | -0.71 | 0.28 | 2.84 | n.d.
ATP-TTR 10 | -0.22 | 1.26 | 0.90 | n.d.
ATP aptamer | -0.41 | n.a. | n.a. | 16.2 +5.70/-4.00
^a Decrease in reactivity beyond 0.2 exceeds experimental error and is considered evidence for ATP binding at the ATP aptamer. Values normalized to DMS reactivity of single-stranded adenosines in reference GAGUA hairpins flanking design.
^b Mean DMS reactivity less than 0.5 taken as evidence for tetraloop/tetraloop-receptor (TTR) formation.
^c Fold change in DMS reactivity with and without ATP. If both the mean reactivity is under 0.5 and the fold change is under 2, it is considered a success.
^d K_d lower than reference ATP aptamer demonstrated successful stabilization of ATP aptamer.
^e Chemical mapping data for ATP-TTR 1 and 2 could not be processed due to strong stops on the capillary electrophoresis readout.
* Construct had strong stops in capillary electrophoresis, making data too weak to be reliable.
[0076] Additionally, the Spinach RNA aptamer binds an analog of the green fluorescent protein chromophore (Z)-4-(3,5-Difluoro-4-hydroxybenzylidene)-1,2-dimethyl-1H-imidazol-5(4H)-one (DFHBI) within a G-quadruplex. Binding to Spinach enhances the fluorescence of DFHBI by ~1,000-fold relative to unbound ligand, making this RNA useful for biological interrogations. (See Paige, J. S., et al. (2011) RNA mimics of green fluorescent protein. Science 333, 642-646 and Kellenberger, C. A., et al. (2015) RNA-Based Fluorescent Biosensors for Live Cell Imaging of Second Messenger Cyclic di-AMP. J. Am. Chem. Soc. 137, 6432-6435; the disclosures of which are incorporated herein by reference in their entirety.) However, the binding affinity, brightness, folding efficiency and biological stability remain poor even after extensive efforts to discover improvements such as the minimized Spinach and Broccoli aptamers. (See Strack, R. L., et al. (2013) A superfolding Spinach2 reveals the dynamic nature of trinucleotide repeat-containing RNA. Nat. Methods 10, 1219-1224; Filonov, G. S., et al. (2014) Broccoli: rapid selection of an RNA mimic of green fluorescent protein by fluorescence-based selection and directed evolution. J. Am. Chem. Soc. 136, 16299-16308; Ketterer, S., et al. (2015) Systematic reconstruction of binding and stability landscapes of the fluorogenic aptamer spinach. Nucleic Acids Res. 43, 9564-9572; and Song, W., et al. (2014) Plug-and-play fluorophores extend the spectral properties of Spinach. J. Am. Chem. Soc. 136, 1198-1201; the disclosures of which are incorporated herein by reference in their entirety.)
[0077] Turning to FIG. 8A, various embodiments of RNA nanostructures incorporating the Spinach aptamer are illustrated. Sequences for the embodiments illustrated in FIG. 8A can be found in the attached sequence listing as SEQ_ID NOs: 36-51. Additionally, FIGS. 8B and 8C illustrate improved fluorescence intensity of some embodiments of Spinach RNA nanostructures (SEQ_ID NOs: 36-51) over just the Spinach aptamer (SEQ_ID NO: 52) as both DFHBI and aptamer concentration are increased. Further, FIG. 8D illustrates improved stability of certain embodiments of Spinach RNA nanostructures (SEQ_ID NOs: 36-51) over both the Spinach (SEQ_ID NO: 52) and Broccoli (SEQ_ID NO: 54) aptamers when the reaction is challenged with cellular lysate, indicating that certain embodiments of RNA nanostructures (SEQ_ID NOs: 36-51) incorporating the Spinach aptamer are more stable than other versions (e.g., Spinach (SEQ_ID NO: 52) and Broccoli (SEQ_ID NO: 54)).
Protein Scaffolding
[0078] A number of embodiments are directed to RNA aptamers to scaffold proteins. In some embodiments, the methods are biased toward sequences that form favorable interactions with target proteins and adopt specific three-dimensional structures. Various embodiments design sequence libraries for in vitro selection experiments. Turning to FIG. 9A, a method 900 to design protein scaffolds is illustrated. At 902, many embodiments select a protein of interest or target protein. Numerous embodiments select the protein, along with sequence, structure, and other protein characteristics, from a database of this information, such as the Protein Data Bank (PDB). Further embodiments select protein complexes when one or more proteins interact or form a complex structure. At 904, many embodiments identify optimal RNA-binding regions on the surface of the target protein.
[0079] Many embodiments start with a target protein 902, then computationally identify optimal RNA-binding regions 904 on the surface of the target protein, then design small "anchor" RNA structures 906 that bind to these regions, likely with low affinity, and finally design RNA structures 908 that connect the anchors. In further embodiments, the affinity of the designed structures is improved by randomizing specific regions and performing selection experiments.
[0080] Many embodiments identify RNA/protein binding regions by predicting interaction sites between RNA structures and regions on proteins. Certain embodiments utilize a custom scoring function to discriminate between native and non-native structures, where different structures can be calculated as equation 1:
$$-kT \ln\left(P(\text{structure} \mid \text{sequence})\right)$$ (eq. 1)
[0081] The embodiments utilize an expression for the probability of a structure given its primary sequence (e.g., $P(\text{structure} \mid \text{sequence})$). In particular, the joint probability of each monomer structure and the overall complex structure is factored as given in equation 2:
$$P(M_1, M_2, C \mid \text{sequence}) = P(C \mid M_1, M_2, \text{sequence})\, P(M_1 \mid M_2, \text{sequence})\, P(M_2 \mid \text{sequence})$$ (eq. 2)
where $M_1$ is the structure of the RNA monomer 1, $M_2$ is the structure of the protein monomer 2, and $C$ is the structure of the complex.
[0082] Assuming that $P(M_1 \mid M_2, \text{sequence})$ is approximately equal to $P(M_1 \mid \text{sequence})$, the equation becomes equation 3:
$$P(M_1, M_2, C \mid \text{sequence}) = P(C \mid M_1, M_2, \text{sequence})\, P(\text{RNA structure} \mid \text{sequence})\, P(\text{protein structure} \mid \text{sequence})$$ (eq. 3)
[0083] The energy of the RNA/Protein complex is further given by equation 4:
$$E(M_1, M_2, C \mid \text{sequence}) = -kT \ln\left(P(C \mid M_1, M_2, \text{sequence})\right) + \text{Score}_{\text{RNA}} + \text{Score}_{\text{protein}}$$ (eq. 4)
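As a worked step, taking $-kT \ln$ of both sides of equation 3 turns the product of probabilities into a sum of energy-like terms. The derivation below is a sketch that assumes, as an interpretation not stated explicitly in the text, that the RNA and protein scores stand in for the negative log-probabilities of the two monomer structures.

```latex
\begin{align*}
-kT \ln P(M_1, M_2, C \mid \text{sequence})
  &= -kT \ln P(C \mid M_1, M_2, \text{sequence}) \\
  &\quad - kT \ln P(\text{RNA structure} \mid \text{sequence})
         - kT \ln P(\text{protein structure} \mid \text{sequence}) \\
  &\approx -kT \ln P(C \mid M_1, M_2, \text{sequence})
         + \text{Score}_{\text{RNA}} + \text{Score}_{\text{protein}}
\end{align*}
```

Under that assumption, only the complex term $P(C \mid M_1, M_2, \text{sequence})$ remains to be modeled, since the two monomer terms are supplied by existing score functions.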
[0084] Medium resolution potentials for both $\text{Score}_{\text{RNA}}$ and $\text{Score}_{\text{protein}}$ have been previously worked out and implemented within Rosetta. (See Das, R., et al., Atomic accuracy in predicting and designing noncanonical RNA structure. Nat Methods, 2010. 7(4): p. 291-4; Simons, K. T., et al., Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. Journal of Molecular Biology, 1997. 268(1): p. 209-225; Simons, K. T., et al., Improved recognition of native-like protein structures using a combination of sequence-dependent and sequence-independent features of proteins. Proteins-Structure Function and Genetics, 1999. 34(1): p. 82-95; and Das, R. and D. Baker, Automated de novo prediction of native-like RNA tertiary structures. Proceedings of the National Academy of Sciences of the United States of America, 2007. 104(37): p. 14664-14669; the disclosures of which are incorporated herein by reference in their entireties.) Additionally, the expression for $P(C \mid M_1, M_2, \text{sequence})$ can be decomposed, similarly to protein-protein docking, as in equation 5: (See Gray, J. J., et al., Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. Journal of Molecular Biology, 2003. 331(1): p. 281-299; the disclosure of which is incorporated herein by reference in its entirety.)
$$P(C \mid M_1, M_2, \text{sequence}) = \frac{P(\text{sequence} \mid C, M_1, M_2) \times P(C \mid M_1, M_2)}{P(\text{sequence} \mid M_1, M_2)}$$ (eq. 5)
where $P(\text{sequence} \mid M_1, M_2)$ is constant and can be neglected. Additionally, $P(\text{sequence} \mid C, M_1, M_2)$ can be expanded following the framework outlined for the knowledge-based protein score function in Rosetta, as in equation 6:
$$P(\text{sequence} \mid C, M_1, M_2) \approx \prod_{r_i \in \text{seq}_1, \text{seq}_2} P(r_i \mid E_i) \times \prod_{r_j \in \text{seq}_1,\ r_k \in \text{seq}_2} \frac{P(r_j, r_k \mid d_{jk}, E_j, E_k)}{P(r_j \mid d_{jk}, E_j, E_k) \times P(r_k \mid d_{jk}, E_j, E_k)}$$ (eq. 6)
[0085] The first term is the residue environment term ($S_{\text{env}}$) and the second term is the residue pair term ($S_{\text{pair}}$). The environments are defined as interface or non-interface and, for proteins, buried or exposed and, for RNA, base-paired or not base-paired. Many embodiments use a coarse-grained representation of both the protein and RNA residues in which the sidechains are represented as a single centroid atom. Accordingly, the distances in this potential are computed between these centroid atoms.
[0086] $P(C \mid M_1, M_2)$ is the sequence-independent part of the interaction and includes terms describing well-formed complexes. To start, this includes two terms approximating the attractive and repulsive parts of van der Waals interactions in equation 7:
$$P(C \mid M_1, M_2) \sim e^{-S_{\text{contact}}} + e^{-S_{\text{clash}}}$$ (eq. 7)
[0087] $S_{\text{contact}}$ is proportional to the number of residues between the two monomers that are within an optimal distance range to be determined from the training set of structures described below. $S_{\text{clash}}$ is calculated using atom-type-dependent distance cutoffs, $d_{ij}^{0}$, determined from the training set following the same method as for the protein potential in equation 8:
$$S_{\text{clash}} = (d_{ij}^{0})^2 - (d_{ij})^2$$ (eq. 8)
[0088] This leads to a final expression for the protein-RNA score function in equation 9:
$$E(M_1, M_2, C \mid \text{sequence}) = w_{\text{env}} S_{\text{env}} + w_{\text{pair}} S_{\text{pair}} + w_{\text{contact}} S_{\text{contact}} + w_{\text{clash}} S_{\text{clash}} + w_{\text{RNA}} \text{Score}_{\text{RNA}} + w_{\text{protein}} \text{Score}_{\text{protein}}$$ (eq. 9)
where $w_{\text{env}}$, $w_{\text{pair}}$, $w_{\text{contact}}$, $w_{\text{clash}}$, $w_{\text{RNA}}$, and $w_{\text{protein}}$ are weights that are fit to optimize prediction of native structures.
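The weighted sum in equation 9 and the per-pair clash term in equation 8 can be written out directly. The Python sketch below is illustrative only: the dictionary keys, the function names, and the numeric values are hypothetical, and the weights shown are placeholders rather than fitted values.

```python
def clash_score(d0, d):
    """Clash term for one atom pair, written literally as in eq. 8."""
    return d0 ** 2 - d ** 2


def total_score(terms, weights):
    """Weighted sum of score terms, following the form of eq. 9.

    Both arguments are dicts keyed by term name; a missing term raises
    KeyError so that no term is silently dropped.
    """
    return sum(weights[name] * terms[name] for name in weights)


# Placeholder term values and weights, chosen only to exercise the code.
terms = {"env": 1.0, "pair": -2.0, "contact": 3.0,
         "clash": 0.5, "rna": -4.0, "protein": 2.0}
weights = {"env": 1.0, "pair": 1.0, "contact": 1.0,
           "clash": 2.0, "rna": 1.0, "protein": 1.0}

print(clash_score(3.0, 2.0))
print(total_score(terms, weights))
```

Keeping the weights in a separate mapping makes it straightforward to refit them against native structures without touching the individual terms.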
[0089] The probabilities of protein/RNA interactions, used to derive $S_{\text{env}}$, $S_{\text{pair}}$, $S_{\text{contact}}$, and $S_{\text{clash}}$, are approximated from the frequencies of these interactions in the non-redundant set of protein/RNA structures found in the Protein Data Bank (PDB). As of June 2016, there were 1283 crystal structures containing both protein and RNA chains, with resolution better than 3.5 Å and less than 70% sequence identity. Additional embodiments further refine the set of structures to ensure it only contains non-redundant structures where the protein and RNA are in the same biological unit.
[0090] The proposed form of $P(C \mid M_1, M_2)$ described here may be insufficient for successful discrimination of native complexes. The protein/RNA complexes from the PDB are analyzed in certain embodiments to identify additional structural features of well-formed RNA/protein complexes such as possible orientation preferences of secondary structure elements. Some embodiments include systematically testing the inclusion of these additional terms to find the score function that best predicts correctly formed protein/RNA structures.
[0091] At 906 of many embodiments, small "anchor" RNA structures are designed. RNA binding proteins with high affinity for their RNA targets are often composed of many modules, each of which binds a short RNA sequence with relatively low affinity. (See e.g., Lunde, B. M., et al., RNA-binding proteins: modular design for efficient function. Nature Reviews Molecular Cell Biology, 2007. 8(6): p. 479-490; the disclosure of which is incorporated by reference herein in its entirety.) Various embodiments design high affinity protein binding RNA aptamers. De novo design of these structures can be accomplished through two different paths in accordance with various embodiments. Some embodiments design small "anchor" RNA structures that bind weakly to specific protein surfaces, while additional embodiments design connecting RNA structures. Certain embodiments combine these paths to incorporate small anchor RNA structures with connecting RNA structures. FIG. 9B illustrates a schematic of these paths, where 910 represents a protein bound to native RNA anchors. 912 illustrates modified anchors where certain contacts are removed from native anchors to reduce affinity between a protein and its native anchors. 914 illustrates an embodiment with a connecting RNA structure used on the native anchors to increase affinity between the protein and the native anchors. And 916 illustrates a design incorporating connecting RNA structures in accordance with some embodiments, where the connecting RNA structure causes the modified anchors to have improved affinity for the protein.
[0092] By choosing the sites of anchor structures and the paths of the RNA connections between them, embodiments design libraries of RNA aptamers de novo that are likely to have specific structural features. To do this, some embodiments first implement a method for determining specific patches of the protein surface that are most optimal for interacting with RNA, then certain embodiments design RNA structures at the protein surface. Several methods have been developed for predicting the RNA binding sites of RNA binding proteins using both structure and sequence-based approaches. (See e.g., Chen, Y. C., et al., Identifying RNA-binding residues based on evolutionary conserved structural and energetic features. Nucleic Acids Research, 2014. 42(3); Zhao, H. Y., et al., Structure-based prediction of RNA-binding domains and RNA-binding sites and application to structural genomics targets. Nucleic Acids Research, 2011. 39(8): p. 3017-3025; and Perez-Cano, L. and J. Fernandez-Recio, Optimal Protein-RNA Area, OPRA: A propensity-based method to identify RNA-binding sites on proteins. Proteins-Structure Function and Bioinformatics, 2010. 78(1): p. 25-35; the disclosures of which are incorporated by reference herein in their entireties.) Many embodiments adapt a structure-based method to predict patches of an arbitrary protein surface that are most optimal for interacting with RNA. Certain embodiments adapt Optimal protein-RNA area (OPRA) to predict patches of an arbitrary protein surface that are most optimal for interacting with RNA. (See e.g., Perez-Cano, L. and J. Fernandez-Recio, Optimal Protein-RNA Area, OPRA: A propensity-based method to identify RNA-binding sites on proteins. Proteins-Structure Function and Bioinformatics, cited above.) OPRA uses the probability of each amino acid being at an RNA/protein interface, calculated from a training set of RNA/protein complex structures, to assign an energy value to each amino acid. 
Then, for each amino acid on the surface of the protein, these energy values are summed over all of the neighboring residues within a certain distance cutoff, to give a set of patch scores. Some embodiments calculate updated probabilities for each amino acid using novel training sets as developed in research. Certain embodiments utilize Rosetta to output optimal patch centers as a list of amino acids. (See e.g., Leaver-Fay, A., et al., ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol, 2011. 487: p. 545-74; the disclosure of which is incorporated by reference herein in its entirety.) A number of embodiments utilize these amino acids to aid in designing connecting RNA structures.
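The patch-score summation described above can be sketched in a few lines. This Python sketch is not the OPRA implementation: the function name `patch_scores`, the use of one coordinate per residue, the inclusion of the central residue in its own patch, and the energy values are all assumptions made for illustration.

```python
import math


def patch_scores(coords, energies, cutoff):
    """Sum per-residue energies over all residues within `cutoff` of each residue.

    coords   -- one (x, y, z) tuple per surface residue (assumed representation)
    energies -- one energy value per residue, in the same order
    cutoff   -- neighbor distance cutoff, in the same units as coords
    """
    scores = []
    for center in coords:
        total = sum(
            energy
            for xyz, energy in zip(coords, energies)
            if math.dist(center, xyz) <= cutoff  # includes the center itself
        )
        scores.append(total)
    return scores


coords = [(0, 0, 0), (3, 0, 0), (20, 0, 0)]
energies = [-1.0, -0.5, 2.0]  # placeholder per-residue energy values
print(patch_scores(coords, energies, cutoff=5.0))
```

The all-pairs loop is quadratic in the number of residues, which is adequate for a single protein surface; a spatial index would be a natural replacement for larger inputs.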
[0093] At 908 of many embodiments, connecting structures are designed to connect the anchor RNA structures from 906. In many embodiments, the connecting RNA structures are designed using the structural modularity of RNA motifs to build new RNA structures by combining motifs found in the Protein Data Bank (PDB). Certain methods used in embodiments treat proteins as steric constraints by representing residues of an input structure as beads. However, further embodiments design the optimal connection structures by considering simple interactions with the protein. For example, some embodiments implement a representation for proteins that conserves information about residues and/or include a custom scorer object that rewards favorable interactions between the RNA and the protein for the design of RNA structures around proteins. In various embodiments, favorable interactions are defined as RNA structures that come within approximately 5 Å of positively charged protein residues. Further embodiments use a combination of methods described within this disclosure.
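A scorer that rewards favorable RNA-protein interactions can be sketched as a simple proximity count. The Python sketch below is hypothetical: the bead representation, the choice of lysine and arginine as the positively charged residues, and the function name are assumptions, not details of the custom scorer object referred to above.

```python
import math

# Assumption: which residue types count as positively charged.
POSITIVE_RESIDUES = {"LYS", "ARG"}


def count_favorable_contacts(rna_beads, protein_beads, cutoff=5.0):
    """Count RNA beads lying within `cutoff` angstroms of a positive residue.

    rna_beads     -- list of (x, y, z) tuples, one per RNA residue bead
    protein_beads -- list of (residue_name, (x, y, z)) tuples
    """
    count = 0
    for rna_xyz in rna_beads:
        near_positive = any(
            name in POSITIVE_RESIDUES and math.dist(rna_xyz, xyz) <= cutoff
            for name, xyz in protein_beads
        )
        if near_positive:
            count += 1
    return count


rna = [(0, 0, 0), (10, 0, 0)]
protein = [("LYS", (3, 0, 0)), ("ASP", (10, 1, 0)), ("ARG", (30, 0, 0))]
print(count_favorable_contacts(rna, protein))
```

A count of this kind could be added to, or subtracted from, a path-search cost so that paths passing near positively charged surface patches are preferred.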
[0094] A schematic of method 900 is illustrated in FIG. 9C, where a target protein 920 is selected, then the RNA-binding regions 922 on the surface of the target protein are identified. The small "anchor" RNA structures 924 are shown to interact with the RNA-binding regions 922. Finally, connecting RNA structures 926 join the anchor RNA structures 924. Additionally, as noted above, certain embodiments bind multiple proteins with a single RNA scaffold, such as illustrated in FIG. 9D. These embodiments design several different connections between two aptamers designed as above. However, additional RNA structures are added to connect the aptamers to form a single aptamer that binds to more than one protein.
Embodiments of RNA Nanostructures
[0095] Turning to FIGS. 10A-10J, some embodiments are directed to RNA nanostructures to link or join one or more RNA-containing molecules. Many of these embodiments comprise at least one RNA motif 102, while further embodiments include a plurality of RNA motifs 102 (FIG. 10A), where the RNA motifs are aligned end to end forming a chain. In a variety of embodiments, the RNA motifs are selected from canonical motifs (e.g., A-U and C-G base paired) and noncanonical motifs. FIG. 10B illustrates a number of embodiments where canonical motifs 104 and noncanonical motifs 106 are alternated throughout the RNA nanostructure.
[0096] Further embodiments of RNA nanostructures are connected to at least one anchor structure 108, where the anchor structures are selected from aptamers, tetraloops and/or tetraloop receptors (e.g., TTRs, including mini-TTRs), RNA-protein anchors, ribosomes, and other RNA structures. FIG. 10C illustrates an embodiment where one anchor structure 108 is located at one end of a plurality of RNA motifs 104, 106, while FIG. 10D illustrates an embodiment with two anchor structures, where anchor structures are located at each end of a plurality of RNA motifs 104, 106.
[0097] Certain embodiments of RNA nanostructures comprise an anchor structure located between RNA motifs 102, such as illustrated in FIG. 10E. Such embodiments are capable of holding one structure in a particular conformation (e.g., aptamers) to maintain aptamer function, while certain embodiments are capable of linking numerous anchor structures together. In some of the embodiments with a centrally located anchor structure 110 and with alternating canonical and noncanonical RNA motifs, the anchor structure 110 is flanked by canonical motifs 104 among alternating canonical 104 and noncanonical 106 motifs, effectively taking the place of a noncanonical RNA motif (FIG. 10F), while in other embodiments, anchor structure 110 is flanked by noncanonical motifs 106 among alternating canonical 104 and noncanonical 106 motifs, effectively taking the place of a canonical RNA motif (FIG. 10G).
[0098] Additional embodiments further comprise a combination of one or more centrally located anchor structures 110 flanked by one or more RNA motifs 102, with an anchor structure 108 located at at least one end, such as illustrated in FIG. 10H. FIG. 10I illustrates one such embodiment, where the RNA nanostructure comprises an aptamer 112 flanked by one or more RNA motifs 102 located on each side of the aptamer, with a tetraloop 114 located at one end and a tetraloop receptor 116 located at the other end. Additionally, certain embodiments comprise a plurality of centrally located anchor structures (e.g., FIG. 9D), where a plurality of RNA anchors are joined by RNA motifs forming an RNA scaffold.
[0099] It should also be noted that certain embodiments are circularized in structure, such that one "end" of the RNA nanostructure is connected to the distal end of the RNA nanostructure, such as illustrated in FIG. 10J, where dashed line 118 represents a connection between one RNA motif 102 and a second motif 102.
EXEMPLARY EMBODIMENTS
[0100] Although the following embodiments provide details on certain embodiments of the inventions, it should be understood that these are only exemplary in nature, and are not intended to limit the scope of the invention.
EXAMPLE 1
Building RNA Nanostructures
[0101] Methods: To build a curated motif library of all RNA structural components, a set of non-redundant RNA crystal structures managed by the Leontis and Zirbel groups (version 1.45: rna.bgsu.edu/rna3dhub/nrlist/release/1.45) was obtained. (See Petrov, A. I., et al. (2013) Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas. RNA 19, 1327-1340; the disclosure of which is incorporated herein by reference in its entirety.) This set specifically removes redundant RNA structures that are identical to previously solved structures, such as ribosomes crystallized with different antibiotics. Each RNA structure was processed with Dissecting the Spatial Structure of RNA (DSSR) to extract every motif (see Lu, X.-J., et al. (2015) DSSR: an integrated software tool for dissecting the spatial structure of RNA. Nucleic Acids Res. 43, e142; the disclosure of which is incorporated herein by reference in its entirety), using the following command:
x3dna-dssr -i file.pdb -o file_dssr.out
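Running the command above over a whole set of structures can be scripted. The Python sketch below is a hypothetical wrapper: the function names are illustrative, the flag spelling is copied from the command shown above and should be checked against the installed DSSR version, and `run_dssr` requires the x3dna-dssr executable to be available on the search path.

```python
import subprocess
from pathlib import Path


def dssr_command(pdb_path, exe="x3dna-dssr"):
    """Build the argument list for one structure, mirroring the command above."""
    pdb_path = Path(pdb_path)
    # file.pdb -> file_dssr.out, placed next to the input file
    out_path = pdb_path.with_name(pdb_path.stem + "_dssr.out")
    return [exe, "-i", str(pdb_path), "-o", str(out_path)]


def run_dssr(pdb_paths, exe="x3dna-dssr"):
    """Run DSSR on each structure in turn, stopping on the first failure."""
    for pdb_path in pdb_paths:
        subprocess.run(dssr_command(pdb_path, exe), check=True)


print(dssr_command("file.pdb"))
```

Separating command construction from execution allows the generated commands to be inspected or logged before any structure is processed.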
[0102] Each extracted motif was checked to confirm that it was the correct type, as DSSR sometimes classifies tertiary contacts as higher-order junctions and vice-versa. For each motif collected from DSSR, the X3DNA find_pair and analyze programs were run to determine the reference frame for the first and last base pair of each motif to allow for alignment between motifs.
[0103] The naming convention for each motif involves the motif classification, the originating PDB accession code, and a unique number to distinguish from other motifs of the same type, all separated by periods. For example, TWOWAY.1GID.2 is a two-way junction from the PDB 1GID and is the third two-way junction to be found in this structure. All motifs retain their original residue numbering, chain IDs and relative position compared to their originating structure.
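The naming convention above can be split back into its three parts. The following Python sketch is illustrative: the `MotifName` type and `parse_motif_name` function are hypothetical names, and the zero-based reading of the trailing number is inferred from the example, in which the suffix 2 denotes the third junction.

```python
from typing import NamedTuple


class MotifName(NamedTuple):
    """The three period-separated parts of a motif name."""
    motif_type: str   # motif classification, e.g. TWOWAY
    pdb_id: str       # originating PDB accession code
    index: int        # distinguishes motifs of the same type (zero-based)


def parse_motif_name(name):
    """Split a name such as 'TWOWAY.1GID.2' into its components."""
    motif_type, pdb_id, index = name.split(".")
    return MotifName(motif_type, pdb_id, int(index))


parsed = parse_motif_name("TWOWAY.1GID.2")
print(parsed)
print(parsed.index + 1)  # ordinal position among motifs of this type
```

A name with more or fewer than three period-separated fields raises `ValueError`, which surfaces malformed entries early.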
[0104] In addition to the motifs derived from the PDB, the make-na web server (structure.usc.edu/make-na/server.html) was utilized to generate idealized helices of between 2 and 22 base pairs in length. (See Montange, R. K., and Batey, R. T. (2008) Riboswitches: emerging themes in RNA structure and function. Annu. Rev. Biophys. 37, 117-133; the disclosure of which is incorporated herein by reference in its entirety.) All motifs in these generated libraries are bundled with some embodiments and are grouped together by type (junctions, hairpins, etc.) in sqlite3 databases in the directory RNAMake/RNAMake/resources/motif_libraries/ (github.com/RNAMake/RNAMake/tree/master/RNAMake/resources/motif_libraries_new).
[0105] To build new RNA nanostructures, certain embodiments seek a path of RNA helices and noncanonical motifs that can connect two base pairs separated by a target translation and rotation. A depth-first search algorithm to discover such RNA paths was developed. The algorithm is guided by a heuristic cost function f inspired by prior manual design efforts. (See Grabow, W. W., and Jaeger, L. (2014) RNA self-assembly and RNA nanotechnology. Acc. Chem. Res. 47, 1871-1880; and Dibrov, S. M., et al. (2011) Self-assembling RNA square. Proc. Natl. Acad. Sci. USA 108, 6405-6408; the disclosures of which are incorporated herein by reference in their entirety.) The cost function is composed of two terms:
f(path)=h(path)+g(path) (eq. 1)
[0106] The first term, h(path), describes how close the last base pair in the path is to the target base pair; h(path)=0 corresponds to a perfect overlap in translation and rotation. The functional form for h(path) depends on the spatial position of each base pair's centroid d and an orthonormal coordinate frame R defining the rotational orientation of each base pair:
$h(\mathrm{path}) = |\vec{d}_1 - \vec{d}_2| + W(|\vec{d}_1 - \vec{d}_2|) \sum_{i}^{3} \sum_{j}^{3} |R_{1ij} - R_{2ij}|$ (eq. 2)
(See Filonov, G. S., et al. (2014) Broccoli: rapid selection of an RNA mimic of green fluorescent protein by fluorescence-based selection and directed evolution. J. Am. Chem. Soc. 136, 16299-16308; the disclosure of which is incorporated herein by reference in its entirety.)
Here, W(d) is:
[0107] $W(d) = \begin{cases} 0, & \text{if } d > 150 \\ \log \frac{150}{d}, & \text{if } 1.5 < d < 150 \\ 2, & \text{if } 1.5 > d \end{cases}$ (eq. 3)
[0108] where d is measured in Angstroms. The weight W(d) reduces the importance of rotational alignment between the current base pair and the target base pair when they are spatially far apart. This term conveys the intuition that aligning the two coordinate frames becomes important only as the path of motifs and helices approaches the target base pair. Embodiments readily allow for the exploration of alternative forms of the cost function terms in (eq. 2) and (eq. 3), including more standard rotationally invariant metrics to define rotation matrix differences (see Huynh, D. Q. (2009) Metrics for 3D rotations: comparison and analysis. J. Math. Imaging Vis. 35, 155-164; the disclosure of which is incorporated herein by reference in its entirety) or base-pair-to-base-pair RMSDs based on quaternions (see Karney, C. F. F. (2007) Quaternions in molecular modeling. J Mol Graph Model 25, 595-604; the disclosure of which is incorporated herein by reference in its entirety), but these were not tested in the current study.
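The following is a minimal Python sketch of the h(path) term, assuming each base pair is given as a centroid (three coordinates in Angstroms) and a 3x3 orthonormal frame stored as nested lists. The function names `weight` and `h_term` are hypothetical. The logarithm is taken as base 10, which is an assumption: it is the base for which W(d) is continuous at d = 1.5 (log of 100 equals 2) and at d = 150 (log of 1 equals 0).

```python
import math


def weight(d):
    """W(d) of eq. 3, with d in Angstroms (base-10 logarithm assumed)."""
    if d > 150.0:
        return 0.0
    if d > 1.5:
        return math.log10(150.0 / d)
    return 2.0


def h_term(d1, r1, d2, r2):
    """h(path) of eq. 2 for the path's last base pair (d1, r1) and the target (d2, r2)."""
    dist = math.dist(d1, d2)
    # Sum of absolute element-wise differences between the two 3x3 frames
    rot_diff = sum(abs(r1[i][j] - r2[i][j]) for i in range(3) for j in range(3))
    return dist + weight(dist) * rot_diff
```

With identical centroids and frames the function returns 0, matching the statement that h(path)=0 corresponds to perfect overlap. For two base pairs 15 Angstroms apart whose frames differ by a half-turn about the x axis, the rotational sum is 4 and the weight is 1, so h is 19.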
[0109] The second term in the cost function (eq. 1) is g(path), which parameterizes the properties of the non-canonical RNA motifs and helices comprising the path at each stage of the calculation:
$g(\mathrm{path}) = \frac{S_{ss}(\mathrm{path})}{2} + 2 \times N_{motifs}$ (eq. 4)
where S_ss is a secondary structure score for all the motifs and helices in the path. This S_ss term favors longer canonical helices as well as motifs with frequently recurring base pairs, as follows. All base pairs found in the RNA motif are scored based on their relative occurrences in all high-resolution crystal structures; all unpaired residues receive a penalty, and Watson-Crick base pairs receive an additional bonus score (Table 3).
TABLE 3. Scoring penalties for each base pair type

X3DNA bp Type    Leontis-Westhof    Energetic Penalty
cm-              N/A                6.11
cM-M             tHH                6.11
tW+W             tWW                3.11
c.+M             N/A                5.69
.W+W             N/A                6.11
tW-M             tWH                2.42
tm-M             tSH                2.72
cW+M             cWH                3.33
.W-W             N/A                4.33
cM+.             N/A                6.11
c.-M             N/A                6.11
cM+W             cHW                4.40
tM+m             N/A                6.11
tM-W             tHW                3.02
cm-m             cSS                5.12
cM-W             tHW                6.11
cW-W             cWW                -2.00
c.-M             N/A                5.44
cm+M             cSH                2.71
cm-M             tSH                3.23
...              N/A                4.18
cm-W             cSW                4.37
tM-m             tSH                2.84
c.-W             N/A                6.11
cM+m             cHS                5.69
cM-m             tSH                3.12
Values were derived based on logarithms of the frequencies of these elements in the crystallographic database, i.e., the inverse Boltzmann approximation (see Finkelstein, A. V., et al. (1995) Why do protein architectures have Boltzmann-like statistics? Proteins 23, 142-150; the disclosure of which is incorporated herein by reference in its entirety), so that the frequency of the elements in some embodiment designs was similar to what is seen in natural RNA tertiary structures. In addition to the secondary structure score, N_motifs penalizes the total number of motifs in the path, here taken as the number of non-canonical motifs plus the number of canonical motifs (i.e., helices, counted independently of helix length).
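A minimal sketch of how the g(path) term might be evaluated follows. It rests on several assumptions that the text does not spell out: eq. 4 is read as S_ss(path)/2 plus 2 times N_motifs; S_ss is taken to be a plain sum of per-base-pair penalties from Table 3 (keys written without internal spaces); and the penalty per unpaired residue, whose value is not given in the text, is exposed as a hypothetical parameter `unpaired_penalty`. Only a few Table 3 entries are copied in.

```python
# A few entries from Table 3: X3DNA base pair type -> penalty
BP_PENALTY = {"cW-W": -2.00, "tW+W": 3.11, "tW-M": 2.42, "cm+M": 2.71}


def secondary_structure_score(bp_types, n_unpaired=0, unpaired_penalty=1.0):
    """S_ss as a simple sum (assumption); unpaired_penalty is a placeholder value."""
    return sum(BP_PENALTY[bp] for bp in bp_types) + unpaired_penalty * n_unpaired


def g_term(s_ss, n_motifs):
    """g(path), reading eq. 4 as S_ss(path) / 2 + 2 * N_motifs."""
    return s_ss / 2.0 + 2.0 * n_motifs
```

Under these assumptions, a single five-base-pair Watson-Crick helix scores S_ss = -10 and g = -3, which shows how the Watson-Crick bonus offsets the per-motif penalty for longer helices.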
[0110] The search adds motifs and helices to the path in a depth-first manner while the total cost function f(path) decreases, back-tracking if f(path) increases. Any solutions with h(path) less than 5, i.e., overlap at approximately nucleotide resolution between the path's last base pair and the target base pair, are accepted into a list of final designs. The balance between g(path) and h(path) allows some embodiments to reduce the number of motif combinations considered, finding most solutions in a few seconds. For each solution, EteRNAbot, a secondary structure optimization algorithm that has undergone extensive empirical tests, was used to fill in helix sequences. (See Lee, J., et al. (2014) RNA design rules from a massive open laboratory. Proc. Natl. Acad. Sci. USA 111, 2122-2127; the disclosure of which is incorporated herein by reference in its entirety.)
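The search rule described above can be illustrated with a deliberately simplified toy model. In the sketch below, each "motif" is only a fixed translation vector (rotations, the S_ss term, and real motif geometry are all omitted), h is the distance to the target, and g is 2 per motif. The library contents, function name, and `max_depth` limit are hypothetical and chosen only to make the example small.

```python
import math


def depth_first_paths(start, target, library, accept=5.0, max_depth=8):
    """Toy depth-first search: extend a path only while f = h + g decreases."""
    solutions = []

    def cost(pos, n_motifs):
        # h: distance to the target; g: 2 per motif (S_ss omitted in this toy)
        return math.dist(pos, target) + 2.0 * n_motifs

    def extend(pos, path, f_prev):
        if path and math.dist(pos, target) < accept:
            solutions.append(list(path))  # h below threshold: accept design
            return
        if len(path) >= max_depth:
            return
        for name, step in library.items():
            new_pos = tuple(p + s for p, s in zip(pos, step))
            f_new = cost(new_pos, len(path) + 1)
            if f_new >= f_prev:
                continue  # cost did not decrease: back-track
            path.append(name)
            extend(new_pos, path, f_new)
            path.pop()

    extend(tuple(start), [], cost(tuple(start), 0))
    return solutions


# Hypothetical two-element library: a straight step and a sideways step
library = {"helix": (14.0, 0.0, 0.0), "kink": (5.0, 10.0, 0.0)}
solutions = depth_first_paths((0.0, 0.0, 0.0), (33.0, 10.0, 0.0), library)
```

In this toy case the search returns the three orderings of two "helix" steps and one "kink" step, and prunes every other branch as soon as the cost stops decreasing.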
[0111] Proteins that are included in the coordinates supplied to embodiments are represented as steric beads centered at the Cα atom of each amino acid. This representation allows embodiments to avoid steric clashes with proteins, particularly for the ribosome tethering problems.
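A bead-based clash check of this kind can be sketched in a few lines. The bead radius is not stated in the text, so the `cutoff` value below is a purely hypothetical placeholder, as is the function name.

```python
import math


def clashes_with_protein(rna_atoms, ca_beads, cutoff=4.0):
    """True if any RNA atom lies within `cutoff` Angstroms of a C-alpha bead.

    The default cutoff is an illustrative assumption, not a disclosed value.
    """
    return any(math.dist(a, b) < cutoff for a in rna_atoms for b in ca_beads)
```

A candidate path would be rejected whenever this check returns True for any of its atoms.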
[0112] Results: The above method generated a multitude of RNA nanostructure designs, as seen in FIGS. 5B, 6B, 7, and 8A, in a relatively short amount of time, as illustrated in FIGS. 4A and 4B.
[0113] Conclusion: Embodiments reveal a novel approach to solving RNA pathfinding problems.
EXAMPLE 2
Design, Synthesis and Experimental Testing of TTR Linking Constructs
[0114] Background: The problem of creating a well-folded RNA nanostructure was first solved two decades ago by repurposing the well-characterized tetraloop/receptor (TTR) tertiary contact to bring together two separate RNA chains, analogous to the P4-P6 domain of the Tetrahymena group I self-splicing intron and other natural functional RNAs. While later RNA nanotechnology studies used the TTR module and other structural motifs to design different nanostructures, the RNAs resulting from the original and later designs have all been multi-chain assemblies. (See Bindewald, E., et al. (2008) Computational strategies for the automated design of RNA nanoscale structures from building blocks using NanoTiler. J Mol Graph Model 27, 299-308; Dibrov, S. M., et al. (2011) Self-assembling RNA square. Proc. Natl. Acad. Sci. USA 108, 6405-6408; Afonin, K. A., et al. (2014) Multifunctional RNA nanoparticles. Nano Lett. 14, 5662-5671; Khisamutdinov, E. F., et al. (2016) Fabrication of RNA 3D nanoprisms for loading and protection of small RNAs and model drugs. Adv. Mater. Weinheim 28, 10079-10087; and Huang, L., and Lilley, D. M. J. (2016) A quasi-cyclic RNA nano-scale molecular object constructed using kink turns. Nanoscale 8, 15189-15195; the disclosures of which are incorporated herein by reference in their entirety.) The TTR problem was chosen for testing embodiments due to the prospect of achieving the first de novo single-chain solutions to this fundamental problem, which we hypothesized might also help crystallization.
[0115] Methods: To generate TTR linking designs, the coordinates from the X-ray crystal structure of a TTR from the P4-P6 domain of the Tetrahymena ribozyme (residues 146-157, 221-246, and 228-252 from PDB 1GID) were first extracted. Second, embodiments were used to build structural segments composed of two-way junctions and helices spanning the last base pair of the hairpin (A146-U157) to base pair U221-A252 of the tetraloop-receptor, thus connecting the TTR into a single continuous strand (FIG. 3). Of 200,000 RNA segments generated, sixteen were selected based on two criteria: 1) the fewest motifs used in the solution (i.e., only three unique tertiary motifs); and 2) the tightest predicted atom-wise alignment of the TTR linking design to its target spatial and rotational orientations. These computational designs ranged from 75 to 102 nucleotides in size (for full sequences, see sequence list), significantly shorter than the 157 nucleotides of the natural P4-P6 domain RNA.
[0116] To probe the structures of the TTR linking designs generated by embodiments, quantitative chemical mapping with selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) and dimethyl sulfate (DMS) was performed. For all 16 designs illustrated in FIG. 5B, the SHAPE and DMS reactivity of each TTR linking RNA was compared to its respective secondary structure.
[0117] To evaluate the formation of tertiary structure, the change in DMS reactivity of both tetraloop and tetraloop-receptor adenines as a function of Mg²⁺ concentration was investigated. Previous studies have demonstrated that TTR formation in the P4-P6 domain is strongly stabilized by Mg²⁺. As a control for the unfolded state, the DMS reactivities of the tetraloop and tetraloop-receptor adenines of the TTR of the P4-P6 domain without Mg²⁺ (A248, A151, A152, and A153) were measured.
[0118] As an independent test of TTR linking construct folding, each RNA's GAAA tetraloop was replaced with a UUCG tetraloop, which does not form the sequence-specific TTR tertiary contact and is predicted to reduce the RNA's mobility in non-denaturing polyacrylamide gel electrophoresis, as observed for the P4-P6 domain.
[0119] After the gel-based and chemical mapping tests above, whether the embodiment designs might allow crystallization, and thereby enable high-resolution characterization of the structural accuracy of the designs, was tested. Crystals of miniTTR 6 that diffracted at 2.55 Å resolution (I/σ of 1.0) were obtained. Purified miniTTR 6 RNA diluted in buffer A (30 mM HEPES (pH 7.5), 20 mM MgCl2, and 100 mM KCl) was incubated at 65 °C for 2 min, centrifuged at 13,000 rpm for 2 min, and snap-cooled on ice for approximately 5 min before moving to 25 °C to set up crystallization trays. Within 2-4 weeks, miniTTR 6 crystallized at 25 °C as plates or clusters of plates via sitting-drop vapor diffusion by mixing 2 µL of miniTTR 6 at a concentration of 100 µM with 3 µL of crystallization solution containing 40 mM sodium cacodylate (pH 5.5), 20 mM MgCl2, 2 mM cobalt hexammine, and 40% 2-methyl-2,4-pentanediol (MPD). Crystals of miniTTR 6 grew to maximum dimensions of 700×700×20 µm and were stabilized and cryogenically protected by increasing the MPD to a final concentration of 44%. Crystals were flash-frozen by plunging into liquid nitrogen. Diffraction data were collected at 100 K using synchrotron X-ray radiation at beamline 4.2.2 of the Advanced Light Source, Lawrence Berkeley National Laboratory (Berkeley, Calif.). The data were processed and scaled using X-ray Detector Software (XDS). The scaled data were handled using Collaborative Computational Project programs.
[0120] The initial structural determination of miniTTR 6 in the C2 space group was carried out by molecular replacement (MR) in Phaser (CCP4), searching for one copy of a 31-nucleotide model of only the tetraloop and receptor with the identical sequence. The rotational and translational Z-scores were somewhat low, 4.6 and 5.9 respectively, but the maps were of sufficient quality to enable the iterative building of all the residues into the 2Fo-Fc and Fo-Fc maps. Composite omit maps in PHENIX were used to help confirm the model and reduce model bias from the initial MR solution. The models were built using COOT and refined using REFMAC5 and PHENIX. The final model was refined in REFMAC5 and ERRASER, and the overall Rwork and Rfree were refined to 22.9% and 27.4%, respectively. The structure derived from the miniTTR was refined to 2.55 Å against a data set scaled to an overall I/σ of 1.0 at the highest resolution shell with 98.5% completeness.
[0121] Results: Of the 1386 nucleotides in the sixteen TTR linking constructs, 1367 (98.7%) were either reactive at target unpaired regions or protected at target helical residues, supporting the predicted secondary structures. All 19 outliers occurred at helix edges (i.e., flanking base pairs of motifs). These data supported the formation of the expected secondary structures for all TTR linking designs (See Table 1).
[0122] Several TTR linking constructs required less than 1 mM Mg²⁺ to fold stably, similarly to or better than reported midpoints for natural TTR-containing RNA nanostructures. Indeed, miniTTR 2 and miniTTR 16 exhibited folding stabilities better than the P4-P6 RNA in side-by-side assays. Furthermore, miniTTR 6 has a much sharper Mg²⁺ dependence than P4-P6, with an apparent Hill coefficient of over 10. In the P4-P6 control without Mg²⁺, the adenines A248, A151, A152, and A153 exhibited reactivities of 1.27, 0.72, 0.70, and 0.90, respectively. The values are normalized to the reactivity of the reference hairpin loops that flank each design. Upon the addition of 10 mM Mg²⁺, the adenines involved in the TTR became protected from DMS modification in the P4-P6 control. As with this folding control, for 12 of the 16 designs (miniTTRs 1, 2, 5-7, 9-12 and 14-16), we observed a more than two-fold decrease in the reactivity of the TTR adenine residues. These results were consistent with Mg²⁺-dependent TTR formation. The remaining designs (miniTTRs 3, 4, 8 and 13) did not demonstrate significant changes in DMS reactivity upon addition of 10 mM Mg²⁺, indicating that the TTR interaction did not form.
[0123] Of the 16 TTR linking constructs tested, 12 designs displayed mobility shifts consistent with the formation of the TTR tertiary contact (See Table 1). Constructs 4 and 15 exhibited mobility shifts that were inconsistent with our chemical mapping results. The UUCG mutant of miniTTR design 4 displayed a mobility shift, but it did not demonstrate a full two-fold decrease in TTR DMS reactivity, suggesting partial folding. Compared to its UUCG mutant, miniTTR design 15 in the wild-type form (GAAA tetraloop) exhibited a wide, slow-mobility band. In all other cases, the electrophoretic mobility measurements were concordant with our quantitative SHAPE and DMS chemical mapping data, supporting the formation of the TTR and a compact tertiary fold.
[0124] The crystal structure and the embodiment model agreed with an all-heavy-atom RMSD of 4.2 Å, better than the nanometer-scale accuracy typically sought in RNA nanotechnology. The primary discrepancy between the modeled 3D structure and the crystal structure was a single motif, a triple mismatch drawn from the large ribosomal subunit. This motif formed multiple consecutive non-canonical base pairs with high B-factors in our miniTTR 6 crystal instead of the conformation found in the ribosomal structure, which involved flipped out adenosines (residues: O2360-O2363, O2424-O2426, PDB:1S72), as shown in FIGS. 11A and 11B, where FIG. 11A illustrates the modeled motif structure, while FIG. 11B illustrates the crystallographic structure. Other motifs in the design achieved near-atomic accuracy, including the TTR tertiary contact (RMSD 0.45 Å; FIG. 11C), a kink-turn variant drawn from the archaeal 50S ribosomal subunit (RMSD 2.0 Å; FIG. 11D), and a `right angle turn` drawn from a viral internal ribosomal entry site domain (RMSD 1.28 Å; FIG. 11E).
[0125] Conclusion: The stability of the TTR linking designs was particularly notable given that P4-P6 and other natural TTR-containing RNAs are larger than the miniTTR designs and have additional stabilizing tertiary contacts, and that other attempts to make artificial minimized TTR constructs have given significantly worse stabilities.
EXAMPLE 3
Automated 3D Design of Covalently Tethered Ribosomal Subunits
[0126] Background: The ribosome is a ribonucleoprotein machine dominated by two extensive RNA subunits, the 16S and 23S rRNAs. Previous work constructed a tethered ribosome called Ribo-T, in which the large and small subunit rRNAs were connected by an RNA tether to form a single subunit ribosome. In that work, the major bottleneck involved a year of numerous trial-and-error iterations to identify RNA tethers that were not cleaved by ribonucleases in vivo when wild type ribosomes were replaced in the Squires strain (SQ171fg) of E. coli. SQ171fg cells lack genetic rRNA alleles, surviving off plasmids that can be exchanged using positive and negative selections. Early failure rounds involving ribosomes from prior studies are shown in FIG. 12A-12B and success with Ribo-T in FIG. 12C. Nevertheless, the current tethers in Ribo-T are unstructured and unlikely to remain stable if other modules are incorporated (FIG. 12C). It is hypothesized that automated design by the embodiment might give structured, chemically stable tethers for this design problem.
[0127] Methods: For ribosome tether designs, PDB coordinates 3R8T and 4GD2 were used for the 50S and 30S ribosomal subunit structures respectively. From the 50S coordinates, we removed residues A2854-A2863 and, from the 30S, we removed residues A1445-A1457. These designs contained either four or five noncanonical structural motifs each to tether the H101 helix on a circularly permuted 23S rRNA to the h44 helix on the 16S rRNA (FIG. 6B). Of the nine diverse solutions we tested (RM-Tether 1 to 9), DNA templates for seven could be synthesized, and transformation of these DNA templates into SQ171fg allowed an assay as to whether the generated designs could replace wild type ribosomes deleted from growing bacteria.
[0128] The designed tethers were cloned into plasmid pRibo-T-A2058G. The backbone was generated for each design using forward (f) and reverse (r) primer pairs in separate PCR reactions using plasmid pRibo-T as a template, Phusion polymerase (NEB), and 3% DMSO. PCR cycling was as follows: 98 °C for 3 min; 25 cycles of 98 °C for 30 sec, 55 °C for 30 sec, 72 °C for 2 min; and 72 °C for 10 min. Circularly permuted 23S ribosomal RNA (rRNA) was generated with forward and reverse primer pairs, the pRibo-T template, and the same PCR conditions as described above. Each PCR reaction was purified by gel extraction from a 0.7% agarose gel with an E.Z.N.A. gel extraction kit (Omega). Each purified backbone (50 ng) was assembled with the respective 23S insert in 3-fold molar excess using Gibson assembly. Assembly reactions were transformed into POP2136 cells, and the cells were grown at 30 °C overnight. Colonies were picked and plasmids were isolated using an E.Z.N.A. miniprep kit (Omega) and confirmed with full plasmid sequencing by ACGT, Inc.
[0129] Each purified plasmid (100 ng) was separately transformed into electrocompetent SQ171fg cells containing pCSacB. Cells were recovered in 1 mL of SOC media at 37 °C with shaking for 1 hour. Fresh SOC (1.85 mL) supplemented with 50 µg/mL carbenicillin and 0.25% sucrose was inoculated with 250 µL of recovered cells and incubated overnight at 37 °C with shaking. Cultures (10% and 90%) were plated on LB agar plates supplemented with 50 µg/mL carbenicillin, 5% sucrose and 1 mg/mL erythromycin and incubated at 37 °C.
[0130] After 48 hours with no visible colonies, the plates were replica plated onto fresh LB agar plates supplemented with 50 µg/mL carbenicillin, 5% sucrose and 1 mg/mL erythromycin and incubated at 37 °C. After 72 additional hours, colonies appeared on the plate containing RM-Tether design 4. Eight colonies were streaked onto LB agar supplemented with 50 µg/mL carbenicillin and 1 mg/mL erythromycin and LB agar supplemented with 30 µg/mL kanamycin (to confirm loss of the pCSacB plasmid) and were also used to inoculate 5 mL of LB supplemented with 50 µg/mL carbenicillin and 1 mg/mL erythromycin. Plates were incubated at 37 °C, and cultures were incubated at 37 °C with shaking. The OD600 of the cultures was tracked to generate growth curves (Biochrom Libra S4 spectrophotometer). After 5 days at 37 °C, total RNA was extracted using an RNA extraction kit from Qiagen. Total RNA was analyzed by gel electrophoresis on a 1% agarose gel with GelRed. Total plasmid was extracted from saturated 5 mL cultures with an E.Z.N.A. miniprep kit (Omega) and sequenced to confirm the correct RM-Tether design 4 sequence.
[0131] For in vitro characterization of ribosomes, all constructs (wild type, Ribo-T v1.0, and RM-Tether 4) were cloned to be under control of a T7 promoter. The T7 promoter was introduced into primers and amplified using the wild type, Ribo-T v1.0, and RM-Tether 4 plasmids as PCR templates. PCR products were blunt end ligated, transformed into DH5α E. coli cells using electroporation, and plated onto LB-agar/ampicillin plates at 37 °C. Plasmid was recovered from resulting clones and sequence confirmed.
[0132] In vitro ribosome synthesis, assembly, and translation (iSAT) reactions were set up as previously described. Briefly, eight 15 µL reactions were prepared and incubated for 2 hours at 37 °C, then pooled together.
[0133] Sucrose gradients were prepared from buffer C (10 mM Tris-OAc (pH=7.5 at 4 °C), 60 mM NH4Cl, 7.5 mM Mg(OAc)2, 0.5 mM EDTA, 2 mM DTT) with 10 and 40% sucrose in SW41 polycarbonate tubes using a Biocomp Gradient Master. Gradients were placed in SW41 buckets and chilled to 4 °C. 120 µL of pooled iSAT reactions were loaded onto the gradients. The gradients were ultra-centrifuged at 22,500 rpm for 17 hours at 4 °C, using an Optima L-80 XP ultracentrifuge (Beckman-Coulter) at medium acceleration and braking (setting of 5 for each). Gradients were analyzed with a BR-188 density gradient fractionation system (Brandel) by pushing 60% sucrose into the gradient at 0.75 mL/min (at normal speed). Traces of A254 readings versus elution volumes were obtained for each gradient. Gradient fractions were collected and analyzed for rRNA content by gel electrophoresis in 1% agarose and imaged in a GelDoc Imager (Bio-Rad). Ribosome profile peaks were identified based on the rRNA content as representing 30S or 50S subunits, 70S ribosomes, or polysomes.
[0134] Fractions containing 70S ribosomes and polysomes were collected and pooled. These fractions were recovered as previously described, with pelleted iSAT ribosomes resuspended in iSAT buffer, aliquoted, and flash-frozen. These pelleted fractions were re-run on a 1% agarose gel and imaged in a GelDoc Imager to confirm tethering in monosome and polysome peaks.
[0135] For SHAPE-seq, in vitro ribosome synthesis, assembly, and translation reactions were set up as previously described. (See Jewett, M. C., et al. (2013) In vitro integration of ribosomal RNA synthesis, ribosome assembly, and translation. Mol. Syst. Biol. 9, 678; and Fritz, B. R., et al. (2015) Implications of macromolecular crowding and reducing conditions for in vitro ribosome construction. Nucleic Acids Res. 43, 4774-4784; the disclosures of which are incorporated herein by reference in their entirety.) Briefly, 15 µL iSAT reactions each possessing wild type, Ribo-T, or RM-40 were prepared in triplicate, incubated for 2 hours at 37 °C, and then placed on ice. To perform SHAPE modification, samples were warmed to 37 °C for 5 minutes, and 7.5 µL of each sample was added to 0.83 µL of 65 mM 1-methyl-7-nitroisatoic anhydride (1M7) or 0.83 µL DMSO (control solvent). Reactions were incubated for 2 minutes, then all samples were Trizol extracted, ethanol precipitated, washed twice with 70% ethanol, and resuspended in 10 µL water. Subsequent library preparation steps were performed as described previously with one exception: 2 custom reverse transcription primers were used to simultaneously probe the regions containing T1 (5'-GGTTAAGCCTCACGG-3') and T2 (5'-CCCTACGGTTACCTTGTTACGAC-3'). (See Watters, K. E., et al. (2016) Simultaneous characterization of cellular RNA structure and function with in-cell SHAPE-Seq. Nucleic Acids Res. 44, e12; the disclosure of which is incorporated herein by reference in its entirety.) Following 2×75 bp paired-end Illumina sequencing, SHAPE reactivities were calculated as described by Yu et al., mapping both modification-induced stops and mutations. (See Yu et al. (2018) Estimating RNA structure chemical probing reactivities from reverse transcriptase stops and mutations, BioRxiv; the disclosure of which is incorporated herein by reference in its entirety.)
Raw reactivities were calculated using Spats v1.9.8, and were then linearly re-scaled to account for estimated differences in SHAPE probe concentration between replicates. Specifically, one replicate was first selected as the reference. Reactivities for the other datasets were divided by the reference at each position, then the median value of this ratio was taken as the scale factor. Reactivities across each dataset were divided by their scale factor. The same experimental replicate was used to scale reactivities, and reactivities are presented as the average value over these re-scaled replicates.
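The median-ratio re-scaling described above can be sketched as follows. This is an illustrative reading of the text rather than the analysis code that was used: the function names are hypothetical, and skipping positions where the reference reactivity is not positive (to avoid division by zero) is an added assumption.

```python
from statistics import median


def rescale_to_reference(reference, other):
    """Divide `other` by the median per-position ratio other/reference."""
    # Positions with non-positive reference reactivity are skipped (assumption)
    ratios = [o / r for o, r in zip(other, reference) if r > 0]
    scale = median(ratios)
    return [o / scale for o in other], scale


def average_replicates(replicates):
    """Per-position mean over a list of equally long, re-scaled replicates."""
    return [sum(values) / len(values) for values in zip(*replicates)]
```

Using the median rather than the mean of the ratios keeps a few outlying positions from dominating the scale factor.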
[0136] Results: One of these seven constructs, RM-Tether 4 (FIG. 12D), led to viable growth of bacterial colonies. DNA sequencing confirmed that these colonies harbored the correct RM-Tether 4 plasmid, and RNA electrophoresis confirmed the presence of a single dominant RNA species with the same length as Ribo-T, with no detectable products corresponding to separate 16S or 23S rRNA lengths or other cleavage products. While the growth rate of this strain was low (FIG. 6C), it was independently confirmed that the ribosomes loaded on mRNA in vitro, using integrated synthesis, assembly, and translation (iSAT) in ribosome-free S150 extracts. Similar to Ribo-T, 70S monosomes and polysomes (and no 30S or 50S subunits) were detected by separation of iSAT-prepared RM-Tether 4 ribosomes on a sucrose gradient (FIG. 12E). Electrophoresis of the polysome fraction confirmed that it contained an uncleaved rRNA the same size as Ribo-T (FIG. 12F). In addition, SHAPE-Seq mapping on this rRNA confirmed that the RM-Tether 4 can be reverse transcribed from one ribosomal subunit to the other across both strands of the tether and highlighted chemical reactivity consistent with the design, with one region of flexibility around the middle junction, as seen in FIGS. 13A-13C, where FIG. 13A illustrates a wild-type ribosome, FIG. 13B illustrates a Ribo-T tethered ribosome, and FIG. 13C illustrates a ribosome tethered with RM-Tether 4.
[0137] Conclusion: Taken together, these data demonstrate that embodiment-designed ribosomes with structured, chemically stable tethers can replace wild type ribosomes in vivo, and that more than one such ribosome can be loaded onto a single message in vitro. Embodiments obviate the repeated rounds of trial and error that were previously required to achieve these design goals.
EXAMPLE 4
Automated Improvement of ATP-Binding RNA Aptamers
[0138] Background: Small molecules can be bound and sensed by artificially selected RNA aptamers. Unfortunately, these molecules often exhibit weakened binding affinities or instability in biological environments, and additional rounds of selection to improve aptamers typically give diminishing returns. (See Carothers, J. M., et al. (2006) Aptamers selected for higher-affinity binding are not more specific for the target ligand. J. Am. Chem. Soc. 128, 7929-7937; Paige, J. S., et al. (2011) RNA mimics of green fluorescent protein. Science 333, 642-646; and Ellington, A. D., and Szostak, J. W. (1990) In vitro selection of RNA molecules that bind specific ligands. Nature 346, 818-822; the disclosures of which are incorporated herein by reference in their entirety.)
[0139] Methods: Starting with PDB 1AM0 we removed residues A6-A18 and A33-A35 to achieve a minimal ATP aptamer flanked by single Watson-Crick base pairs. We moved these residues into a new PDB `ATP_min.pdb`.
[0140] Results: In all, 5210 designs were generated. As with previous construct designs, designs were selected that maximized motif usage and minimized the chain closure score (how close the optimized sequence is to the target base pair). In total, 10 ATP aptamers were embedded by an embodiment into scaffolds with tetraloop/receptor contacts, which we called ATP-TTR designs (FIG. 7). Chemical mapping confirmed that four of these RNAs formed the TTR and also retained their ability to bind to ATP, as assessed by DMS protection of aptamer nucleotides A13 and A14 (Table 2). Titrations of ATP read out through chemical mapping (Table 2; FIG. 14A) showed that three designs achieved better ATP dissociation constants (Kd of 1.5, 4.1, and 1.4 µM) than the isolated ATP aptamer under the same conditions (Kd=16.2 µM), improvements by up to an order of magnitude. Three of the ATP-TTRs gave ligand-free DMS reactivity profiles in the aptamer regions similar to the ligand-bound aptamer, suggesting that they pre-form the structure needed for ATP binding rather than requiring conformational rearrangements observed in the isolated ATP aptamer (FIGS. 14B-14C; Table 2).
[0141] Conclusion: These results demonstrate that the TTR peripheral contact efficiently couples to enhance binding of ATP in the aptameric region, as desired. As a further test of this coupling, we confirmed that the Mg²⁺ requirement for forming the TTR was reduced in the presence, compared to the absence, of the small molecule ligand in these constructs (FIG. 14D).
EXAMPLE 5
Automated Improvement of Spinach RNA Aptamers
[0142] Background: Binding to Spinach enhances the fluorescence of DFHBI by ~1,000-fold relative to unbound ligand, making this RNA useful for biological interrogations, although its binding affinity, brightness, folding efficiency and biological stability remain poor even after extensive efforts to discover improvements such as the minimized Spinach and Broccoli aptamers. (See Paige, J. S., et al. (2011) RNA mimics of green fluorescent protein. Science 333, 642-646; Kellenberger, C. A., et al. (2015) RNA-Based Fluorescent Biosensors for Live Cell Imaging of Second Messenger Cyclic di-AMP. J. Am. Chem. Soc. 137, 6432-6435; Strack, R. L., et al. (2013) A superfolding Spinach2 reveals the dynamic nature of trinucleotide repeat-containing RNA. Nat. Methods 10, 1219-1224; Filonov, G. S., et al. (2014) Broccoli: rapid selection of an RNA mimic of green fluorescent protein by fluorescence-based selection and directed evolution. J. Am. Chem. Soc. 136, 16299-16308; Ketterer, S., et al. (2015) Systematic reconstruction of binding and stability landscapes of the fluorogenic aptamer spinach. Nucleic Acids Res. 43, 9564-9572; and Song, W., et al. (2014) Plug-and-play fluorophores extend the spectral properties of Spinach. J. Am. Chem. Soc. 136, 1198-1201; the disclosures of which are incorporated herein by reference in their entirety.)
[0143] Methods: Starting with PDB 6614 we removed residues R19-R31 and R49-R66 to achieve the minimal DFHBI binding aptamer (Spinach_min.pdb).
[0144] A stock of DFHBI (Sigma) was prepared in PBSMKT (1× phosphate buffered saline, 5 mM MgCl2, 100 mM KCl, 0.01% Tween-20, pH 7.2) and its absorbance measured using a UV spectrophotometer (NanoDrop, Thermo Scientific). The DFHBI concentration was calculated using an extinction coefficient of 30,100 M⁻¹ cm⁻¹ at 423 nm as previously reported. (See Paige, J. S., et al. (2011) RNA mimics of green fluorescent protein. Science 333, 642-646; the disclosure of which is incorporated herein by reference in its entirety.) A DFHBI titration was performed in half area, flat-bottomed black 96-well plates (Corning) at a final RNA concentration of 200 nM with DFHBI concentration ranging from 10 µM to 10 nM prepared in a 1:2 dilution series. After mixing, the plates were covered with an adhesive film to prevent evaporation and temperature-cycled from room temperature to 4 °C twice over the course of 1 hour to allow aptamer-target equilibration while minimizing magnesium-dependent self-cleavage. Measurements were acquired at room temperature; wells were excited at 462±10 nm and emission was measured at 504±15 nm using a Tecan M1000 plate reader. A fluorescence background was obtained at each DFHBI concentration in the absence of RNA and subtracted from the corresponding wells. The corrected signal for each aptamer at every DFHBI concentration was then least-squares fit using a custom MATLAB script using a 1:1 complexation model according to the following equation:
$$F = \frac{B_{\max}\,[T]}{[T] + K_d} \qquad \text{(eq. 5)}$$
Here, [T] is the concentration of DFHBI, K_d is the dissociation constant of the given aptamer, and B_max is the maximum brightness obtained for the given concentration of aptamer.
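The fit of eq. 5 can be illustrated with a short Python sketch. This is not the custom MATLAB script referred to above; it is a minimal stand-in that assumes noiseless synthetic data, and the function names, search bounds, and the values B_max = 1000 and K_d = 300 nM are hypothetical. Because the model is linear in B_max for any fixed K_d, the sketch solves B_max in closed form and searches only over K_d.

```python
import math

def one_site_signal(t, b_max, k_d):
    """Eq. 5: F = B_max * [T] / ([T] + K_d)."""
    return b_max * t / (t + k_d)

def best_b_max(conc, signal, k_d):
    """For a fixed K_d the model is linear in B_max, so least squares is closed form."""
    x = [t / (t + k_d) for t in conc]
    return sum(xi * fi for xi, fi in zip(x, signal)) / sum(xi * xi for xi in x)

def sse(conc, signal, k_d):
    """Sum of squared residuals at the best B_max for this K_d."""
    b_max = best_b_max(conc, signal, k_d)
    return sum((f - one_site_signal(t, b_max, k_d)) ** 2 for t, f in zip(conc, signal))

def fit_one_site(conc, signal, lo=1e-3, hi=1e6, iters=200):
    """Golden-section search over log10(K_d); returns (B_max, K_d).

    The search bounds (in nM) are illustrative assumptions.
    """
    a, b = math.log10(lo), math.log10(hi)
    g = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if sse(conc, signal, 10 ** c) < sse(conc, signal, 10 ** d):
            b = d
        else:
            a = c
    k_d = 10 ** ((a + b) / 2)
    return best_b_max(conc, signal, k_d), k_d

# 1:2 dilution series starting at 10 uM (10,000 nM), eleven points
dfhbi_nM = [10000 / 2 ** i for i in range(11)]
# Synthetic, noiseless signal generated from hypothetical parameters
synthetic = [one_site_signal(t, 1000.0, 300.0) for t in dfhbi_nM]
b_max_fit, k_d_fit = fit_one_site(dfhbi_nM, synthetic)
```

With real, noisy plate-reader data a general nonlinear least-squares routine would likely be preferable; the one-dimensional search is used here only to keep the sketch dependency-free.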
[0145] Next, we prepared an RNA titration assay using identical measurement, equilibration, and buffer conditions, except with the amount of DFHBI held constant at 400 nM and RNA concentrations ranging from 5 μM down to 5 nM prepared in a 1:2 dilution series. A background fluorescence was obtained at 400 nM DFHBI in the absence of RNA and subtracted from each well. The corrected signal was then least-squares fit to a 1:1 complexation model with a custom MATLAB script according to the following equation:
$$F = F_{\max}\left(\frac{[A]\,f + DT + K_d - \sqrt{\left([A]\,f + DT + K_d\right)^2 - 4\,[A]\,f\,DT}}{2\,DT}\right) \qquad \text{(eq. 6)}$$
Here, [A] is the concentration of aptamer, f is the folding efficiency, DT is the DFHBI concentration (400 nM), K_d is the dissociation constant calculated for each sequence above, and F_max is the maximum fluorescence signal at dye-binding saturation. Quantum yields were obtained through direct comparison of F_max with the literature value for Broccoli (QY=0.72).
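Eq. 6 is the standard quadratic solution for 1:1 binding when the ligand is depleted by the binder, with the folded aptamer concentration taken as [A]·f. The Python sketch below evaluates it over the RNA dilution series; the parameter values (f = 0.6, K_d = 300 nM, F_max = 1000) and function names are hypothetical and chosen only for illustration.

```python
import math

def bound_fraction(a_total, f, dt, k_d):
    """Fraction of DFHBI bound, from the quadratic solution of 1:1 binding.

    a_total * f is the concentration of correctly folded aptamer.
    """
    s = a_total * f + dt + k_d
    return (s - math.sqrt(s * s - 4.0 * a_total * f * dt)) / (2.0 * dt)

def rna_titration_signal(a_total, f, dt, k_d, f_max):
    """Eq. 6: F = F_max * (fraction of DFHBI bound)."""
    return f_max * bound_fraction(a_total, f, dt, k_d)

DT_NM = 400.0                                 # constant DFHBI concentration
rna_nM = [5000 / 2 ** i for i in range(11)]   # 5 uM down to about 5 nM, 1:2 steps
# f, K_d and F_max below are illustrative assumptions, not measured values
curve = [rna_titration_signal(a, 0.6, DT_NM, 300.0, 1000.0) for a in rna_nM]

# Consistency check: the bound complex C must satisfy mass action,
# K_d = (A*f - C) * (DT - C) / C
c_top = DT_NM * bound_fraction(rna_nM[0], 0.6, DT_NM, 300.0)
k_d_back = (rna_nM[0] * 0.6 - c_top) * (DT_NM - c_top) / c_top
```

In an actual fit, K_d would be fixed at the value from the DFHBI titration and f and F_max would be the free parameters.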
[0146] Small molecules can be bound and sensed by artificially selected RNA aptamers. Unfortunately, these molecules often exhibit weakened binding affinities or instability in biological environments, and additional rounds of selection to improve aptamers typically give diminishing returns. (See Carothers, J. M., et al. (2006) Aptamers selected for higher-affinity binding are not more specific for the target ligand. J. Am. Chem. Soc. 128, 7929-7937; Paige, J. S., et al. (2011) RNA mimics of green fluorescent protein. Science 333, 642-646; and Ellington, A. D., and Szostak, J. W. (1990) In vitro selection of RNA molecules that bind specific ligands. Nature 346, 818-822; the disclosures of which are incorporated herein by reference in their entirety.)
[0147] Each TTR Spinach aptamer was prepared in 60 μL PBSMKT containing 1.66 μM total RNA, and 30 μL of this was added to 50 μL of 5 μM DFHBI in PBSMKT in two wells per aptamer. Next, 20 μL of PBSMKT was added to one well per aptamer to give a final concentration of 500 nM RNA and 2.5 μM DFHBI in order to provide a baseline fluorescence. Next, 20 μL of 100% frog egg lysate, prepared 4 hours earlier and stored at 4° C., was added to each remaining well and mixed by pipetting. (Higher lysate concentrations were too optically absorbent to allow fluorescence measurements.) Fluorescence measurements were then obtained for every well every 1 minute for 30 minutes, then every 3 minutes for 1 hour, and then every 5 minutes for an additional hour. For evaluation of times to half-fluorescence, the fluorescence of each aptamer in wells containing lysate was normalized to the same aptamer's fluorescence in PBSMKT at every time point in order to account for photobleaching.
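The normalization and half-time evaluation described above can be sketched in Python as follows. The text does not state how the half-time was extracted, so this sketch assumes it is the first time at which the normalized signal falls to half of its initial value, located by linear interpolation between time points; the data values and function names are hypothetical.

```python
def normalize(lysate, buffer):
    """Divide the lysate-well signal by the buffer-only well at each time point."""
    return [l / b for l, b in zip(lysate, buffer)]

def time_to_half(times, normalized):
    """First time the normalized signal reaches half its initial value.

    Uses linear interpolation between adjacent time points (an assumption).
    Returns None if the signal never drops to half within the window.
    """
    half = 0.5 * normalized[0]
    for i in range(1, len(times)):
        y0, y1 = normalized[i - 1], normalized[i]
        if y0 > half >= y1:
            frac = (y0 - half) / (y0 - y1)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None

# Illustrative numbers only, not measured data
times_min = [0, 10, 20, 30, 40]
lysate_well = [1000.0, 760.0, 540.0, 360.0, 240.0]
buffer_well = [1000.0, 950.0, 900.0, 900.0, 800.0]
norm = normalize(lysate_well, buffer_well)
t_half = time_to_half(times_min, norm)
```

Dividing by the buffer-only well removes decay common to both wells, such as photobleaching, so that the remaining loss of signal reflects the effect of the lysate.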
[0148] Each TTR Spinach aptamer was prepared in PBSMK (1× PBS pH 7.2, 5 mM MgCl2, 100 mM KCl) containing 1 μM RNA and 2.5 μM DFHBI. The RNA/DFHBI mixture was equilibrated on ice for 30 minutes before aliquoting 50 μL into 4 wells per RNA species. As control reactions, 50 μL of PBSMK containing 2.5 μM DFHBI was added to one of these wells per RNA. Immediately prior to use, PBSMLK (1× PBS pH 7.2, 5 mM MgCl2, 40% E. coli lysate, 100 mM KCl) containing 2.5 μM DFHBI was prepared and 50 μL of this mixture was added to each well to give final concentrations of 500 nM RNA, 2.5 μM DFHBI, and 20% E. coli lysate. Immediately upon addition of PBSMLK, fluorescence intensities were obtained for every well and repeated every 30 s for 8 hours using a Tecan M1000 plate reader.
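The final concentrations quoted for the two lysate assays follow from simple dilution arithmetic, which the short Python check below reproduces. The helper name is hypothetical; the volumes and stock concentrations are those given in the two protocols above.

```python
def final_conc(stock_conc, stock_vol, total_vol):
    """Concentration after diluting stock_vol of stock into total_vol."""
    return stock_conc * stock_vol / total_vol

# Frog egg lysate assay: 30 uL RNA (1.66 uM) + 50 uL DFHBI (5 uM) + 20 uL buffer or lysate
frog_total_uL = 30 + 50 + 20
frog_rna_nM = final_conc(1660.0, 30, frog_total_uL)
frog_dfhbi_uM = final_conc(5.0, 50, frog_total_uL)
frog_lysate_pct = final_conc(100.0, 20, frog_total_uL)

# E. coli lysate assay: 50 uL (1 uM RNA, 2.5 uM DFHBI) + 50 uL PBSMLK (40% lysate, 2.5 uM DFHBI)
ecoli_total_uL = 50 + 50
ecoli_rna_nM = final_conc(1000.0, 50, ecoli_total_uL)
ecoli_dfhbi_uM = final_conc(2.5, 50, ecoli_total_uL) + final_conc(2.5, 50, ecoli_total_uL)
ecoli_lysate_pct = final_conc(40.0, 50, ecoli_total_uL)
```

The frog egg lysate assay gives 498 nM RNA, which the text rounds to 500 nM, and both assays end at 2.5 μM DFHBI and 20% lysate.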
[0149] To test the in vivo fluorescence of Spinach-TTR variants, designed sequences were cloned between a T7 promoter and T7 terminator in a plasmid harboring carbenicillin resistance and a ColE1 origin of replication. Plasmids were transformed into chemically competent E. coli strain BL21*(DE3) (F⁻ ompT hsdSB (rB⁻ mB⁻) gal dcm rne131 [DE3]), plated on Difco LB+Agar plates containing 100 μg/mL carbenicillin, and grown overnight at 37° C. A cellular autofluorescence control containing a blank plasmid was also included. Individual colonies were grown overnight in LB containing 100 μg/mL carbenicillin, then diluted 1:50 into fresh LB. After 1 h, isopropyl-β-D-thiogalactoside (IPTG) was added at a final concentration of 100 μM to induce expression of T7 RNA polymerase. After 4.5 h of additional shaking, cells were diluted 1:200 into 1× Phosphate Buffered Saline (PBS) containing 2 mg/mL kanamycin and 200 μM (Z)-4-(3,5-Difluoro-4-hydroxybenzylidene)-1,2-dimethyl-1H-imidazol-5(4H)-one (DFHBI), then incubated at 37° C. for 5 minutes. A BD Accuri C6 Plus flow cytometer fitted with a high-throughput sampler was then used to measure fluorescence of at least 50,000 events for each sample. Measurements were taken for 4 biological replicates.
[0150] Flow cytometry data analysis was performed using FlowJo (v10.4.1). Cells were gated by FSC-A and SSC-A, and the same gate was used for all samples. The geometric mean fluorescence was calculated for each sample, then all fluorescence measurements were converted to Molecules of Equivalent Fluorescein (MEFL) using CS&T RUO Beads (BD). The average fluorescence (MEFL) of cells expressing blank plasmid (pJBL002) in the presence of DFHBI was then subtracted from each measured fluorescence value.
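The per-sample arithmetic described above (geometric mean, conversion to MEFL, blank subtraction) can be sketched in Python. The actual analysis was performed in FlowJo; this sketch is only an illustration, and it assumes that the bead calibration reduces to a single linear scale factor (`mefl_per_unit`, a hypothetical parameter), which may not match the calibration actually used.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive fluorescence values for gated events."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def to_mefl(value, mefl_per_unit):
    """Convert arbitrary units to MEFL with one linear factor (an assumption)."""
    return value * mefl_per_unit

def background_subtracted(sample_events, blank_events, mefl_per_unit):
    """Sample geometric mean in MEFL minus the blank-plasmid control in MEFL."""
    sample = to_mefl(geometric_mean(sample_events), mefl_per_unit)
    blank = to_mefl(geometric_mean(blank_events), mefl_per_unit)
    return sample - blank

# Illustrative event values only, not measured data
signal = background_subtracted([100.0, 400.0], [10.0, 40.0], 2.0)
```

The geometric mean is less sensitive than the arithmetic mean to the long upper tail typical of single-cell fluorescence distributions.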
[0151] Results: In all, 697 designs were generated, and a subset was again chosen to maximize the number of motifs tested and the chain closure score (how close the designed RNA sequence is to overlaying its target base pair). Out of these designs, 16 `Spinach-TTR` molecules designed by an embodiment to embed the Spinach aptamer into scaffolds with tetraloop/receptor contacts were characterized (FIG. 8A). By carrying out fluorescence assays titrating both RNA and DFHBI concentration, these designs' dissociation constants, brightness, and folding efficiency were evaluated (FIGS. 8B-8C). Seven of the 16 Spinach-TTR designs exhibited 2-fold brighter fluorescence than the original Spinach as well as the brighter Broccoli aptamer (FIG. 8B). Two of these constructs, Spinach-TTR 3 and 8, were not only brighter but also gave higher affinity and improved folding efficiency relative to Broccoli and a minimized Spinach construct, Spinach-min (FIG. 8C).
[0152] Additionally, six of the seven Spinach-TTR constructs fluoresced longer than the control Spinach and Broccoli sequences. Spinach-TTR 3 exhibited particularly high stability, giving a time to half fluorescence of 131 minutes, compared to <20 minutes for Spinach, Spinach-min, and Broccoli (FIG. 8D). This same robust fluorescence of the Spinach-TTRs was observed in 20% E. coli lysate, suggesting a general stabilization in biological environments (FIG. 15). Six Spinach-TTR designs were cloned into a plasmid for T7 RNA polymerase-driven expression. Each Spinach-TTR variant was able to significantly activate expression above background, and several designs exceeded the fluorescence observed for both Spinach and Broccoli in vivo (FIG. 16).
[0153] Conclusion: These results demonstrate that the TTR peripheral contact efficiently couples to enhance binding of DFHBI in the aptameric region, thus increasing fluorescence. As a further test, these aptameric designs were also shown to be more effective than other aptamers at increasing fluorescence, as well as more stable when challenged with cellular lysate, showing that embodiments herein are a vast improvement in the art at stabilizing and improving aptamer function.
EXAMPLE 6
Designing and Characterizing Novel RNAs Binding to Proteins
[0154] Background: Two well-studied RNA binding proteins, MS2 coat protein and PUF3, can be used as model systems for testing the design of RNA connections. MS2 coat protein specifically binds a 19-nucleotide RNA hairpin structure with nanomolar affinity. (See Carey, J., et al, Interaction of R17 coat protein with synthetic variants of its ribonucleic acid binding site. Biochemistry, 1983. 22(20): p. 4723-30; the disclosure of which is incorporated by reference herein in its entirety.) PUF3 binds an 8-nucleotide single-stranded RNA sequence with nanomolar affinity. (See Zhu, D. Y., et al., A 5' cytosine binding pocket in Puf3p specifies regulation of mitochondrial mRNAs. Proceedings of the National Academy of Sciences of the United States of America, 2009. 106(48): p. 20192-20197; the disclosure of which is incorporated by reference herein in its entirety.) Both systems have been extensively characterized and crystal structures of the complexes have been solved. (See e.g., Helgstrand, C., et al., Investigating the structural basis of purine specificity in the structures of MS2 coat protein RNA translational operator hairpins. Nucleic Acids Res, 2002. 30(12): p. 2678-85; the disclosure of which is incorporated by reference herein in its entirety.) Here, designing and testing a library of RNA structures addresses two main questions. First, if key binding residues are removed from the RNA targets (e.g., the tetraloop is removed from the MS2 hairpin structure), how can the remaining RNA target structure (e.g., the MS2 helix) be built on to create new RNA structures that recover the wildtype binding affinity? Second, can the wildtype RNA structures (e.g., the full MS2 hairpin structure) be built on to create new RNA structures that bind to their target proteins with higher affinity?
[0155] Methods: To address these questions, an embodiment designs a library of sequences which systematically varies the RNA anchor structures. Two examples are shown in FIGS. 17A-17B, which show proteins 1702 binding native RNA residues 1704, which are connected to designed RNA structures 1706. The embodiment varies the number of anchor structures, the strength of the anchor structures (by keeping varying numbers of RNA residues that interact with the protein), and the sites of the anchors. For each set of RNA anchor structures, the embodiment designs several thousand distinct RNA connection structures. Within the RNA structures, the embodiment varies the predicted number of contacts with the protein, the length of the connections, and the extent to which they wrap around the protein. The embodiment assesses the success of these designs by measuring the binding affinities to their target proteins using a high throughput RNA array. (See e.g., Buenrostro, J. D., et al., Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nat Biotechnol, 2014. 32(6): p. 562-8; the disclosure of which is incorporated herein by reference in its entirety.) Successful designs are characterized by high affinity binding to the target protein.
EXAMPLE 7
Developing Rules for Designing More Successful RNAs
[0156] Background: Predicting binding affinity increases the predictive capacity for embodiments to design successful RNAs for binding proteins. In particular, some embodiments identify predictive features of successful designs with the goal of increasing the percentage of successful designs in the future. Binding affinity is defined as the free energy difference between the complex and the unbound components.
[0157] Methods: An embodiment approximately estimates the free energy of the bound complex as a linear combination of various features such as the number of protein/RNA contacts, the extent to which the RNA wraps around the protein, the predicted free energy of the bound RNA secondary structure, and the number and strength of anchor structures. The unbound free energy of the protein is neglected for simplicity, and the unbound free energy of the RNA is estimated as the free energy of all possible secondary structures, i.e., as computed with Vienna. Weights are fit for each of these terms using a simple linear regression to a training subset. The correlation coefficient and the AUC of the resulting model are used to assess its utility.
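The regression and its two assessment metrics can be sketched in Python as follows. This is a minimal illustration, not the embodiment's implementation: the three feature columns (contact count, wrap fraction, secondary-structure free energy), the weights, the -9.5 "binder" threshold, and all function names are hypothetical, the data are synthetic and exactly linear, and for brevity the model is fit and evaluated on the same rows rather than on separate training and test subsets.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_weights(features, targets):
    """Ordinary least squares with an intercept, via (X^T X) w = X^T y."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(weights, feature_row):
    """Linear combination of features plus intercept weights[0]."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_row))

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical features per design: (contacts, wrap fraction, secondary-structure dG)
features = [(10, 0.2, -5.0), (14, 0.5, -7.5), (8, 0.1, -4.0),
            (20, 0.6, -9.0), (16, 0.3, -6.0), (12, 0.7, -8.0)]
true_w = [-1.0, -0.3, -2.0, 0.5]                  # illustrative weights
dg_obs = [predict(true_w, f) for f in features]   # synthetic binding free energies
w = fit_weights(features, dg_obs)
dg_pred = [predict(w, f) for f in features]
r = pearson(dg_pred, dg_obs)
binds = [dg < -9.5 for dg in dg_obs]              # hypothetical "binder" threshold
area = auc([-p for p in dg_pred], binds)          # more negative dG ranks higher
```

The correlation coefficient measures how well predicted free energies track measured ones, whereas the AUC measures only whether binders are ranked above non-binders, which is the weaker requirement relevant when absolute affinities cannot be predicted.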
[0158] In silico binding affinity prediction is a very difficult problem: previous work showed that even predicting the relative protein binding affinities of small, closely related RNA sequences is challenging and at best yields results accurate to within 1.5 kcal/mol. Because predicting absolute binding affinity is even more challenging, it is possible that the model described above is not predictive. If that is the case, an embodiment focuses on identifying features that increase the likelihood of a successful design, e.g., a design that detectably binds to the target protein. Again, these features are identified from a training subset of the binding affinity data. As an example, an embodiment may identify that designs that have more protein/RNA contacts are more likely to be successful.
[0159] Once the binding affinity model or the predictive features have been established, an embodiment implements a new scoring function to encourage solutions that are predicted to be more successful. The embodiment then designs and tests a new library of RNA structures for MS2 and PUF3, in the same manner as described in Example 1.
EXAMPLE 8
Verifying Structures from a Subset of Designs
[0160] Background: A need exists to assess designs to both measure binding affinity and to examine the structure of the complex. An embodiment verifies this assumption for a small subset of designs deemed successful in other embodiments.
[0161] Methods: The RNA/protein structures are examined by performing one-dimensional SHAPE chemical mapping on the bound complexes. A SHAPE profile consistent with the secondary structure of the design is expected, with reduced reactivity in regions predicted to be bound to the protein. Additionally, for a small subset of design failures, SHAPE chemical mapping in the presence and absence of the protein is performed. By identifying ways in which the designs are failing, design algorithms may be improved.
EXAMPLE 9
Testing Libraries of RNA Aptamers
[0162] Background: Once designed and constructed, aptamer embodiments can be tested for their efficacy in binding the particular proteins to which they were designed to bind.
[0163] Methods: The aptamers are designed by first identifying several possible RNA anchor structures/sequences using methods such as those described herein. Then, for each of these sets of anchor structures, many different connecting RNA structures are designed. Additionally, each of the libraries contains a subset of sequences with specific randomized portions, for a total of approximately 10^15 sequences in each library. The benchmark set of proteins contains proteins that range in size and for which previous selection attempts have been both successful and unsuccessful. Table 4 lists an initial set of five possible proteins for the benchmark set. Selections are performed for each of these proteins with the designed libraries. This initial benchmark set helps to identify the optimal way in which to incorporate randomized regions into the designed sequences. Success is assessed by the binding affinities of the selected aptamers.
TABLE-US-00004 TABLE 4 Benchmark proteins
Protein      | Size (No. of amino acids) | Previous selection yielded aptamers? | Protein PDB ID | Aptamer/protein complex PDB ID
Thrombin     | 288 | Yes | 5AFY | 3DD2
Human IgG1   | 211 | Yes | 4W4N | 3AGV
MAPK8 (JNK1) | 371 | No  | 2XRW | --
MEK1         | 393 | --  | 1S9J | --
MEK2         | 400 | --  | 1S9I | --
EXAMPLE 10
Investigating Structures of Successful Aptamers
[0164] Background: If or when successful aptamers are identified, the structures of these aptamers can be examined to identify the specific features that contribute to the success.
[0165] Methods: First, the structures of the RNA are verified by performing one-dimensional SHAPE chemical mapping. By examining the SHAPE profile in the presence and absence of the protein, the regions of the RNA that are likely to be interacting with the protein are identified. In addition to the chemical mapping experiments, experiments are performed to verify that the RNA binds to the protein at the predicted site on its surface. To do this, successful designs that were predicted to leave functional sites accessible are assessed. For these aptamer embodiments, the binding affinity of ligands known to bind to the functional site is assessed after incubating the protein with the RNA aptamer. If the binding affinity of the ligand remains the same when the protein is bound to the RNA aptamer, this would suggest that the functional site is indeed accessible. For example, there are several ligands known to bind to the different binding pockets on thrombin. Aptamers can be designed that should specifically leave one of these binding sites accessible. Then, thrombin is incubated with one of the successful aptamers, and the binding affinity of one of the known ligands to the thrombin/aptamer complex is measured.
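The comparison of SHAPE profiles with and without protein can be sketched in Python as a per-nucleotide difference. The text does not specify how protected regions are called, so the 0.3 reactivity-drop threshold, the function names, and the example profiles below are all hypothetical.

```python
def protected_positions(free, bound, min_drop=0.3):
    """1-based positions whose reactivity drops by at least min_drop when protein is added.

    min_drop is an illustrative threshold, not a value from the text.
    """
    return [i + 1 for i, (f, b) in enumerate(zip(free, bound)) if f - b >= min_drop]

def overlap_fraction(predicted, observed):
    """Fraction of predicted interface positions that show protection."""
    predicted = set(predicted)
    return len(predicted & set(observed)) / len(predicted)

# Illustrative normalized reactivities only, not measured data
free_rna = [0.9, 0.8, 0.1, 0.7, 0.2, 0.85]
with_protein = [0.85, 0.3, 0.1, 0.2, 0.2, 0.8]
footprint = protected_positions(free_rna, with_protein)
agreement = overlap_fraction([2, 4, 5], footprint)  # hypothetical predicted interface
```

Positions that are already unreactive in the free RNA, such as base-paired nucleotides, cannot show protection, so a low overlap at such positions does not by itself indicate a failed prediction.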
EXAMPLE 11
Redesigning Aptamers to Increase Affinity
[0166] Background: When selection experiments fail, they generally still yield many low-quality aptamers. This often means aptamers that have high nanomolar or low micromolar affinity to the target protein. Currently, there is no simple strategy for optimizing these aptamers to bind with higher affinity.
[0167] Methods: First, the structure of the RNA aptamer bound to the target protein will be predicted. Using many (.about.100) of the structures that score best, RNA extensions that should wrap around the protein will be designed. A small library of these designs will then be tested experimentally. It is expected that some of these designs will bind to the target protein with higher affinity than the original aptamer.
EXAMPLE 12
Implementing Sampling Schemes for RNA Fragment Assembly
[0168] Background: Certain embodiments will seek to predict a structure of an RNA/protein complex based on RNA sequence and protein structure.
[0169] Methods: An embodiment will extend the fragment assembly algorithm for RNA structure prediction within Rosetta. This method builds de novo RNA structures by sampling torsion angles from fragments of RNA structures from the PDB in a Monte Carlo simulation. Protein binding will be incorporated using two different strategies: 1) fold the RNA in the presence of the protein, and 2) fold the RNA without the protein and then dock it onto the protein surface and remodel interface residues. Both of these initial strategies will use a coarse-grained representation of the protein and RNA residues.
[0170] The first strategy, folding the RNA in the presence of the protein, will involve both fragment insertion and docking moves. Initially, we will implement a strategy similar to that described previously for the simultaneous folding and docking of symmetric protein complexes, in which every tenth move will be a docking attempt. (See Das, R., et al., Simultaneous prediction of protein folding and docking at high resolution. Proceedings of the National Academy of Sciences of the United States of America, 2009. 106(45): p. 18978-18983; the disclosure of which is incorporated by reference herein in its entirety.) Each move will be scored using the potential described herein.
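The interleaving of fragment insertion and docking moves can be sketched as a toy Metropolis Monte Carlo loop in Python. This is not Rosetta code: the score function, the random fragment library, the single rigid-body coordinate, and all names and parameter values are hypothetical stand-ins. The only features taken from the text are that fragments supply torsion angles and that every tenth move is a docking attempt.

```python
import math
import random

def toy_score(torsions, offset):
    """Stand-in for a coarse-grained RNA/protein potential (purely illustrative)."""
    return sum((t - 60.0) ** 2 for t in torsions) / 1000.0 + offset ** 2

def fold_and_dock(n_res=12, n_moves=2000, dock_every=10, kT=1.0, seed=0):
    """Toy simultaneous folding and docking; returns (start, final, best, n_dock)."""
    rng = random.Random(seed)
    # Hypothetical fragment library: 50 three-residue torsion fragments
    fragments = [[rng.uniform(-180, 180) for _ in range(3)] for _ in range(50)]
    torsions = [rng.uniform(-180, 180) for _ in range(n_res)]
    offset = 5.0  # toy rigid-body coordinate of the RNA relative to the protein
    score = start = best = toy_score(torsions, offset)
    n_dock = 0
    for move in range(1, n_moves + 1):
        new_t, new_o = torsions[:], offset
        if move % dock_every == 0:
            # Every tenth move is a docking attempt (rigid-body perturbation)
            new_o = offset + rng.gauss(0.0, 0.5)
            n_dock += 1
        else:
            # Otherwise insert a three-residue fragment at a random position
            i = rng.randrange(n_res - 2)
            new_t[i:i + 3] = rng.choice(fragments)
        new_score = toy_score(new_t, new_o)
        # Metropolis criterion: always accept downhill, sometimes accept uphill
        if new_score <= score or rng.random() < math.exp((score - new_score) / kT):
            torsions, offset, score = new_t, new_o, new_score
            best = min(best, score)
    return start, score, best, n_dock

start_score, final_score, best_score, dock_attempts = fold_and_dock()
```

Interleaving the two move types lets the RNA conformation and its placement on the protein adapt to each other, in contrast to the second strategy, in which the RNA is folded first and docked afterward.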
[0171] The novel aspect of the second strategy is essentially the flexible docking algorithm. Initially, the RNA structure will be built with the fragment assembly method. Because the protein will not be present at this stage, structures will be evaluated with the RNA-only potential. The resulting RNA structures will then be docked against the protein and interface residues will be resampled with fragment insertion moves. At this stage, structures will be scored with the RNA/protein potential described herein.
[0172] Finally, coarse-grained structures resulting from either of these two strategies will be converted into full-atom representation. The structures will be refined by sampling side chain rotamers in a Monte Carlo simulation and then performing energy minimization using the high-resolution RNA/protein potential described herein.
[0173] These methods will be tested on a benchmark set of RNA/protein complexes with known structures. Varying amounts of input information will be provided for each complex, ranging from just the protein structure and the RNA sequence, to the protein structure with one or more "anchor" RNA residues bound, to the protein structure and parts of the RNA structure. The results over this range of input information will help to evaluate the reliability of this method in various practical situations.
Doctrine of Equivalents
[0174] Having described several embodiments, it will be recognized by those skilled in the art that various modifications, alternative constructions, and equivalents may be used without departing from the spirit of the invention. Additionally, a number of well-known processes and elements have not been described in order to avoid unnecessarily obscuring the present invention. Accordingly, the above description should not be taken as limiting the scope of the invention.
[0175] Those skilled in the art will appreciate that the foregoing examples and descriptions of various preferred embodiments of the present invention are merely illustrative of the invention as a whole, and that variations in the components or steps of the present invention may be made within the spirit and scope of the invention. Accordingly, the present invention is not limited to the specific embodiments described herein, but, rather, is defined by the scope of the appended claims.
Sequence CWU
1
1
551148DNAArtificial SequenceDesigned sequence in accordance with
embodiments 1ggaacagctc gagtagagct gaaagttgat atggatagag taagagagat
ggaagtctca 60ggggaaactt tgagatggac ggtttacaag ttgtcctaag tcaacaaacg
catcgagtag 120atgcgaacaa agaaacaaca acaacaac
1482164DNAArtificial SequenceDesigned sequence in accordance
with embodiments 2ggaacagctc gagtagagct gaaagttgat atggataata
cgtcaagctt caccgaagaa 60caaatcaggg gaaactttga tttgggaggt gaagaactac
ttgacgttgt cctaagtcaa 120caaacgcatc gagtagatgc gaacaaagaa acaacaacaa
caac 1643145DNAArtificial SequenceDesigned Sequence
3ggaacagctc gagtagagct gaaagttgat atggatgatt aggacatgca ttgctgaggg
60gaaacttttt gcaatgcaac agccaaatcg tcctaagtca acaaacgcat cgagtagatg
120cgaacaaaga aacaacaaca acaac
1454145DNAArtificial SequenceDesigned sequence in accordance with
embodiments 4ggaacagctc gagtagagct gaaagttgat atggatacct aggacatgcc
aatctgtggg 60gaaacttatt gattggcaac agccaaggtg tcctaagtca acaaacgcat
cgagtagatg 120cgaacaaaga aacaacaaca acaac
1455145DNAArtificial SequenceDesigned sequence in accordance
with embodiments 5ggaacagctc gagtagagct gaaagttgat atggattagc
aaggacatgc agagcaaggg 60ggaaacttca cctctgcaac agccacctag tcctaagtca
acaaacgcat cgagtagatg 120cgaacaaaga aacaacaaca acaac
1456161DNAArtificial SequenceDesigned sequence in
accordance with embodiments 6ggaacagctc gagtagagct gaaagttgat
atggatttac tccgaggaga cgaactacca 60cgaacagggg aaactctacc cgtggcgtct
ccgtttgacg agtaagtcct aagtcaacaa 120acgcatcgag tagatgcgaa caaagaaaca
acaacaacaa c 1617169DNAArtificial SequenceDesigned
sequence in accordance with embodiments 7ggaacagctc gagtagagct
gaaagttgat atgggcaaga attgtgccag actttgaact 60actgcgtctc aggggaaact
ttgagatgca gcaaagtcgg taatacaatt cgacccctaa 120gtcaacaaac gcatcgagta
gatgcgaaca aagaaacaac aacaacaac 1698162DNAArtificial
SequenceDesigned sequence in accordance with embodiments 8ggaacagctc
gagtagagct gaaagttgat atggtgaccg caaggatgga agaccaatac 60tatctcaggg
gaaactttga gatagtatag gttggacctt gccagtaacc taagtcaaca 120aacgcatcga
gtagatgcga acaaagaaac aacaacaaca ac
1629165DNAArtificial SequenceDesigned sequence in accordance with
embodiments 9ggaacagctc gagtagagct gaaagttgat atggaaaccg agcccgagga
tatgcttgaa 60aaactcaggg gaaactttga gttttggcgc atatccgttt gacgggagtt
tcctaagtca 120acaaacgcat cgagtagatg cgaacaaaga aacaacaaca acaac
16510166DNAArtificial SequenceDesigned sequence in accordance
with embodiments 10ggaacagctc gagtagagct gaaagttgat atggatttga
aaccggatgg aaggtgagag 60caacgactgc aggggaaact ttgcagtcgg ctcactggac
cgactgacaa atcctaagtc 120aacaaacgca tcgagtagat gcgaacaaag aaacaacaac
aacaac 16611150DNAArtificial SequenceDesigned sequence
in accordance with embodiments 11ggaacagctc gagtagagct gaaagttgat
atggacattc aagttgtgga cgacacatgg 60gggaaacttc atgtagtcga tggaagcaga
gaatgtccta agtcaacaaa cgcatcgagt 120agatgcgaac aaagaaacaa caacaacaac
15012142DNAArtificial SequenceDesigned
sequence in accordance with embodiments 12ggaacagctc gagtagagct
gaaagttgat atggatgtgg tatgccaaca gccatagctg 60ggaaactagc atggacatgg
caccacatcc taagtcaaca aacgcatcga gtagatgcga 120acaaagaaac aacaacaaca
ac 14213143DNAArtificial
SequenceDesigned sequence in accordance with embodiments
13ggaacagctc gagtagagct gaaagttgat atggtggata gtgacatgaa ttctcagggg
60aaactttgag aattcaacag cacaagaagc ctaagtcaac aaacgcatcg agtagatgcg
120aacaaagaaa caacaacaac aac
14314168DNAArtificial SequenceDesigned sequence in accordance with
embodiments 14ggaacagctc gagtagagct gaaagttgat atggttaaca cccgatgatg
gaaggtagga 60gcaacgttgg caggggaaac tttgccaacg gcctactgga catcggcaag
ttaacctaag 120tcaacaaacg catcgagtag atgcgaacaa agaaacaaca acaacaac
16815142DNAArtificial SequenceDesigned sequence in accordance
with embodiments 15ggaacagctc gagtagagct gaaagttgat atggaacctg
tatggagtaa ccaatgggga 60aacttattgg cccatggaag tattggttcc taagtcaaca
aacgcatcga gtagatgcga 120acaaagaaac aacaacaaca ac
14216143DNAArtificial SequenceDesigned sequence in
accordance with embodiments 16ggaacagctc gagtagagct gaaagttgat
atgggagcta ggacatggga ctttagggaa 60acttaaagta tcccaacagc ctagccgagc
ctaagtcaac aaacgcatcg agtagatgcg 120aacaaagaaa caacaacaac aac
143177539DNAArtificial SequenceDesigned
sequence in accordance with embodiments 17gcggccgcga tctctcacct
accaaacaat gcccccctgc aaaaaataaa ttcatataaa 60aaacatacag ataaccatct
gcggtgataa attatctctg gcggtgttga cataaatacc 120actggcggtg atactgagca
cgggtaccgg ccgctgagaa aaagcgaagc ggcactgctc 180tttaacaatt tatcagacaa
tctgtgtggg cactcgaaga tacggattct taacgtcgca 240agacgaaaaa tgaataccaa
gtctcaagag tgaacacgta attcattacg aagtttaatt 300ctttgagcgt caaactttta
aattgaagag tttgatcatg gctcagattg aacgctggcg 360gcaggcctaa cacatgcaag
tcgaacggta acaggaagaa gcttgcttct ttgctgacga 420gtggcggacg ggtgagtaat
gtctgggaaa ctgcctgatg gagggggata actactggaa 480acggtagcta ataccgcata
acgtcgcaag accaaagagg gggaccttcg ggcctcttgc 540catcggatgt gcccagatgg
gattagctag taggtggggt aacggctcac ctaggcgacg 600atccctagct ggtctgagag
gatgaccagc cacactggaa ctgagacacg gtccagactc 660ctacgggagg cagcagtggg
gaatattgca caatgggcgc aagcctgatg cagccatgcc 720gcgtgtatga agaaggcctt
cgggttgtaa agtactttca gcggggagga agggagtaaa 780gttaatacct ttgctcattg
acgttacccg cagaagaagc accggctaac tccgtgccag 840cagccgcggt aatacggagg
gtgcaagcgt taatcggaat tactgggcgt aaagcgcacg 900caggcggttt gttaagtcag
atgtgaaatc cccgggctca acctgggaac tgcatctgat 960actggcaagc ttgagtctcg
tagagggggg tagaattcca ggtgtagcgg tgaaatgcgt 1020agagatctgg aggaataccg
gtggcgaagg cggccccctg gacgaagact gacgctcagg 1080tgcgaaagcg tggggagcaa
acaggattag ataccctggt agtccacgcc gtaaacgatg 1140tcgacttgga ggttgtgccc
ttgaggcgtg gcttccggag ctaacgcgtt aagtcgaccg 1200cctggggagt acggccgcaa
ggttaaaact caaatgaatt gacgggggcc cgcacaagcg 1260gtggagcatg tggtttaatt
cgatgcaacg cgaagaacct tacctggtct tgacatccac 1320ggaagttttc agagatgaga
atgtgccttc gggaaccgtg agacaggtgc tgcatggctg 1380tcgtcagctc gtgttgtgaa
atgttgggtt aagtcccgca acgagcgcaa cccttatcct 1440ttgttgccag cggtccggcc
gggaactcaa aggagactgc cagtgataaa ctggaggaag 1500gtggggatga cgtcaagtca
tcatggccct tacgaccagg gctacacacg tgctacaatg 1560gcgcatacaa agagaagcga
cctcgcgaga gcaagcggac ctcataaagt gcgtcgtagt 1620ccggattgga gtctgcaact
cgactccatg aagtcggaat cgctagtaat cgtggatcag 1680aatgccacgg tgaatacgtt
cccgggcctt gtacacaccg cccgtcacac catgggagtg 1740ggttgcaaaa gaagtaggta
gctgagcaac aggtccgtgc cgaggatttc gatctaagac 1800agtatggggc cccgttgagc
taaccggtac taatgaaccg tgaggcttaa ccgagaggtt 1860aagcgactaa gcgtacacgg
tggatgccct ggcagtcaga ggcgatgaag gacgtgctaa 1920tctgcgataa gcgtcggtaa
ggtgatatga accgttataa ccggcgattt ccgaatgggg 1980aaacccagtg tgtttcgaca
cactatcatt aactgaatcc ataggttaat gaggcgaacc 2040gggggaactg aaacatctaa
gtaccccgag gaaaagaaat caaccgagat tcccccagta 2100gcggcgagcg aacggggagc
agcccagagc ctgaatcagt gtgtgtgtta gtggaagcgt 2160ctggaaaggc gcgcgataca
gggtgacagc cccgtacaca aaaatgcaca tgctgtgagc 2220tcgatgagta gggcgggaca
cgtggtatcc tgtctgaata tggggggacc atcctccaag 2280gctaaatact cctgactgac
cgatagtgaa ccagtaccgt gagggaaagg cgaaaagaac 2340cccggcgagg ggagtgaaaa
agaacctgaa accgtgtacg tacaagcagt gggagcacgc 2400ttaggcgtgt gactgcgtac
cttttgtata atgggtcagc gacttatatt ctgtagcaag 2460gttaaccgaa taggggagcc
gaagggaaac cgagtcttaa ctgggcgtta agttgcaggg 2520tatagacccg aaacccggtg
atctagccat gggcaggttg aaggttgggt aacactaact 2580ggaggaccga accgactaat
gttgaaaaat tagcggatga cttgtggctg ggggtgaaag 2640gccaatcaaa ccgggagata
gctggttctc cccgaaagct atttaggtag cgcctcgtga 2700attcatctcc gggggtagag
cactgtttcg gcaagggggt catcccgact taccaacccg 2760atgcaaactg cgaataccgg
agaatgttat cacgggagac acacggcggg tgctaacgtc 2820cgtcgtgaag agggaaacaa
cccagaccgc cagctaaggt cccaaagtca tggttaagtg 2880ggaaacgatg tgggaaggcc
cagacagcca ggatgttggc ttagaagcag ccatcattta 2940aagaaagcgt aatagctcac
tggtcgagtc ggcctgcgcg gaagatgtaa cggggctaaa 3000ccatgcaccg aagctgcggc
agcgacgctt atgcgttgtt gggtagggga gcgttctgta 3060agcctgcgaa ggtgtgctgt
gaggcatgct ggaggtatca gaagtgcgaa tgctgacata 3120agtaacgata aagcgggtga
aaagcccgct cgccggaaga ccaagggttc ctgtccaacg 3180ttaatcgggg cagggtgagt
cgacccctaa ggcgaggccg aaaggcgtag tcgatgggaa 3240acaggttaat attcctgtac
ttggtgttac tgcgaagggg ggacggagaa ggctatgttg 3300gccgggcgac ggttgtcccg
gtttaagcgt gtaggctggt tttccaggca aatccggaaa 3360atcaaggctg aggcgtgatg
acgaggcact acggtgctga agcaacaaat gccctgcttc 3420caggaaaagc ctctaagcat
caggtaacat caaatcgtac cccaaaccga cacaggtggt 3480caggtagaga ataccaaggc
gcttgagaga actcgggtga aggaactagg caaaatggtg 3540ccgtaacttc gggagaaggc
acgctgatat gtaggtgagg tccctcgcgg atggagctga 3600aatcagtcga agataccagc
tggctgcaac tgtttattaa aaacacagca ctgtgcaaac 3660acgaaagtgg acgtatacgg
tgtgacgcct gcccggtgcc ggaaggttaa ttgatggggt 3720tagcgcaagc gaagctcttg
atcgaagccc cggtaaacgg cggccgtaac tataacggtc 3780ctaaggtagc gaaattcctt
gtcgggtaag ttccgacctg cacgaatggc gtaatgatgg 3840ccaggctgtc tccacccgag
actcagtgaa attgaactcg ctgtgaagat gcagtgtacc 3900cgcggcaaga cggaaagacc
ccgtgaacct ttactatagc ttgacactga acattgagcc 3960ttgatgtgta ggataggtgg
gaggctttga agtgtggacg ccagtctgca tggagccgac 4020cttgaaatac caccctttaa
tgtttgatgt tctaacgttg acccgtaatc cgggttgcgg 4080acagtgtctg gtgggtagtt
tgactggggc ggtctcctcc taaagagtaa cggaggagca 4140cgaaggttgg ctaatcctgg
tcggacatca ggaggttagt gcaatggcat aagccagctt 4200gactgcgagc gtgacggcgc
gagcaggtgc gaaagcaggt catagtgatc cggtggttct 4260gaatggaagg gccatcgctc
aacggataaa aggtactccg gggataacag gctgataccg 4320cccaagagtt catatcgacg
gcggtgtttg gcacctcgat gtcggctcat cacatcctgg 4380ggctgaagta ggtcccaagg
gtatggctgt tcgccattta aagtggtacg cgagctgggt 4440ttagaacgtc gtgagacagt
tcggtcccta tctgccgtgg gcgctggaga actgaggggg 4500gctgctccta gtacgagagg
accggagtgg acgcatcact ggtgttcggg ttgtcatgcc 4560aatggcactg cccggtagct
aaatgcggaa gagataagtg ctgaaagcat ctaagcacga 4620aacttgcccc gagatgagtt
ctccctgacc ctttaagggt cctgaaggaa cgttgaagac 4680gacgacgttg ataggccggg
tgtgtaagcg gggccccata caatgacgga tcgaaatccg 4740tttgacgcac ggcctggcgg
cgcttaccac tttgtgattc atgactgggg tgaagtcgta 4800acaaggtaac cgtaggggaa
cctgcggttg gatcacctcc ttaccttaaa gaagcgtact 4860ttgtagtgct cacacagatt
gtctgataga aagtgaaaag caaggcgttt acgcgttggg 4920agtgaggctg aagagaataa
ggccgttcgc tttctattaa tgaaagctca ccctacacga 4980aaatatcacg caacgcgtga
taagcaattt tcgtgtcccc ttcgtctaga ggcccaggac 5040accgcccttt cacggcggta
acaggggttc gaatccccta ggggacgcca cttgctggtt 5100tgtgagtgaa agtcgccgac
cttaatatct caaaactcat cttcgggtga tgtttgagat 5160atttgctctt taaaaatctg
gatcaagctg aaaattgaaa cactgaacaa cgagagttgt 5220tcgtgagtct ctcaaatttt
cgcaacacga tgatgaatcg aaagaaacat cttcgggttg 5280tgagcttaag cttacaacgc
cgaagctgtt ttggcggatg agagaagatt ttcagcctga 5340tacagattaa atcagaacgc
agaagcggtc tgataaaaca gaatttgcct ggcggcagta 5400gcgcggtggt cccacctgac
cccatgccga actcagaagt gaaacgccgt agcgccgatg 5460gtagtgtggg gtctccccat
gcgagagtag ggaactgcca ggcatcaaat aaaacgaaag 5520gctcagtcga aagactgggc
ctttcgtttt atctgttgtt tgtcggtgaa cgctctcctg 5580agtaggacaa atccgccggg
agcggatttg aacgttgcga agcaacggcc cggagggtgg 5640cgggcaggac gcccgccata
aactgccagg catcaaatta agcagaaggc catcctgacg 5700gatggccttt ttgcgtttct
acaaactctt cctgtcgtca tatctacaag ccggcgcgcc 5760gggaaatgtg cgcggaaccc
ctatttgttt atttttctaa atacattcaa atatgtatcc 5820gctcatgaga caataaccct
gataaatgct tcaataatat tgaaaaagga agagtatgag 5880tattcaacat ttccgtgtcg
cccttattcc cttttttgcg gcattttgcc ttcctgtttt 5940tgctcaccca gaaacgctgg
tgaaagtaaa agatgctgaa gatcagttgg gtgcacgagt 6000gggttacatc gaactggatc
tcaacagcgg taagatcctt gagagttttc gccccgaaga 6060acgttttcca atgatgagca
cttttaaagt tctgctatgt ggcgcggtat tatcccgtgt 6120tgacgccggg caagagcaac
tcggtcgccg catacactat tctcagaatg acttggttga 6180gtactcacca gtcacagaaa
agcatcttac ggatggcatg acagtaagag aattatgcag 6240tgctgcaata accatgagtg
ataacactgc ggccaactta cttctgacaa cgatcggagg 6300accgaaggag ctaaccgctt
ttttgcacaa catgggggat catgtaactc gccttgatcg 6360ttgggaaccg gagctgaatg
aagccatacc aaacgacgag cgtgacacca cgatgcctgc 6420agcaatggca acaacgttgc
gcaaactatt aactggcgaa ctacttactc tagcttcccg 6480gcaacaatta atagactgga
tggaggcgga taaagttgca ggaccacttc tgcgctcggc 6540ccttccggct agctggttta
ttgctgataa atctggagcc ggtgagcgtg ggtctcgcgg 6600tatcattgca gcactggggc
cagatggtaa gccctcccgt atcgtagtta tctacacgac 6660ggggagtcag gcaactatgg
atgaacgaaa tagacagatc gctgagatag gtgcctcact 6720gattaagcat tggtaactgc
agaccaagtt tactcatata tactttagat tgatttaaaa 6780cttcattttt aatttaaaag
gatctaggtg aagatccttt ttgataatct catgaccaaa 6840atcccttaac gtgagttttc
gttccactga gcgtcagacc ccgtagaaaa gatcaaagga 6900tcttcttgag atcctttttt
tctgcgcgta atctgctgct tgcaaacaaa aaaaccaccg 6960ctaccagcgg tggtttgttt
gccggatcaa gagctaccaa ctctttttcc gaaggtaact 7020ggcttcagca gagcgcagat
accaaatact gtccttctag tgtagccgta gttaggccac 7080cacttcaaga actctgtagc
accgcctaca tacctcgctc tgctaatcct gttaccagtg 7140gctgctgcca gtggcgataa
gtcgtgtctt accgggttgg actcaagacg atagttaccg 7200gataaggcgc agcggtcggg
ctgaacgggg ggttcgtgca cacagcccag cttggagcga 7260acgacctaca ccgaactgag
atacctacag cgtgagctat gagaaagcgc cacgcttccc 7320gaagggagaa aggcggacag
gtatccggta agcggcaggg tcggaacagg agagcgcacg 7380agggagcttc cagggggaaa
cgcctggtat ctttatagtc ctgtcgggtt tcgccacctc 7440tgacttgagc gtcgattttt
gtgatgctcg tcaggggggc ggagcctatg gaaaaacgcc 7500agcaacgcgg cctttttacg
gttcctggcc ttttgctgg 7539
SEQ ID NO: 18; Length: 7559; Type: DNA; Organism: Artificial Sequence
Designed sequence in accordance with embodiments
gcggccgcga tctctcacct accaaacaat gcccccctgc aaaaaataaa ttcatataaa
60aaacatacag ataaccatct gcggtgataa attatctctg gcggtgttga cataaatacc
120actggcggtg atactgagca cgggtaccgg ccgctgagaa aaagcgaagc ggcactgctc
180tttaacaatt tatcagacaa tctgtgtggg cactcgaaga tacggattct taacgtcgca
240agacgaaaaa tgaataccaa gtctcaagag tgaacacgta attcattacg aagtttaatt
300ctttgagcgt caaactttta aattgaagag tttgatcatg gctcagattg aacgctggcg
360gcaggcctaa cacatgcaag tcgaacggta acaggaagaa gcttgcttct ttgctgacga
420gtggcggacg ggtgagtaat gtctgggaaa ctgcctgatg gagggggata actactggaa
480acggtagcta ataccgcata acgtcgcaag accaaagagg gggaccttcg ggcctcttgc
540catcggatgt gcccagatgg gattagctag taggtggggt aacggctcac ctaggcgacg
600atccctagct ggtctgagag gatgaccagc cacactggaa ctgagacacg gtccagactc
660ctacgggagg cagcagtggg gaatattgca caatgggcgc aagcctgatg cagccatgcc
720gcgtgtatga agaaggcctt cgggttgtaa agtactttca gcggggagga agggagtaaa
780gttaatacct ttgctcattg acgttacccg cagaagaagc accggctaac tccgtgccag
840cagccgcggt aatacggagg gtgcaagcgt taatcggaat tactgggcgt aaagcgcacg
900caggcggttt gttaagtcag atgtgaaatc cccgggctca acctgggaac tgcatctgat
960actggcaagc ttgagtctcg tagagggggg tagaattcca ggtgtagcgg tgaaatgcgt
1020agagatctgg aggaataccg gtggcgaagg cggccccctg gacgaagact gacgctcagg
1080tgcgaaagcg tggggagcaa acaggattag ataccctggt agtccacgcc gtaaacgatg
1140tcgacttgga ggttgtgccc ttgaggcgtg gcttccggag ctaacgcgtt aagtcgaccg
1200cctggggagt acggccgcaa ggttaaaact caaatgaatt gacgggggcc cgcacaagcg
1260gtggagcatg tggtttaatt cgatgcaacg cgaagaacct tacctggtct tgacatccac
1320ggaagttttc agagatgaga atgtgccttc gggaaccgtg agacaggtgc tgcatggctg
1380tcgtcagctc gtgttgtgaa atgttgggtt aagtcccgca acgagcgcaa cccttatcct
1440ttgttgccag cggtccggcc gggaactcaa aggagactgc cagtgataaa ctggaggaag
1500gtggggatga cgtcaagtca tcatggccct tacgaccagg gctacacacg tgctacaatg
1560gcgcatacaa agagaagcga cctcgcgaga gcaagcggac ctcataaagt gcgtcgtagt
1620ccggattgga gtctgcaact cgactccatg aagtcggaat cgctagtaat cgtggatcag
1680aatgccacgg tgaatacgtt cccgggcctt gtacacaccg cccgtcacac catgggagtg
1740ggttgcaaaa gaagtaggta gctacagggc cguguggcug ugaacgggau ccacggccgg
1800ggccgaggug gggggccccg ttgagctaac cggtactaat gaaccgtgag gcttaaccga
1860gaggttaagc gactaagcgt acacggtgga tgccctggca gtcagaggcg atgaaggacg
1920tgctaatctg cgataagcgt cggtaaggtg atatgaaccg ttataaccgg cgatttccga
1980atggggaaac ccagtgtgtt tcgacacact atcattaact gaatccatag gttaatgagg
2040cgaaccgggg gaactgaaac atctaagtac cccgaggaaa agaaatcaac cgagattccc
2100ccagtagcgg cgagcgaacg gggagcagcc cagagcctga atcagtgtgt gtgttagtgg
2160aagcgtctgg aaaggcgcgc gatacagggt gacagccccg tacacaaaaa tgcacatgct
2220gtgagctcga tgagtagggc gggacacgtg gtatcctgtc tgaatatggg gggaccatcc
2280tccaaggcta aatactcctg actgaccgat agtgaaccag taccgtgagg gaaaggcgaa
2340aagaaccccg gcgaggggag tgaaaaagaa cctgaaaccg tgtacgtaca agcagtggga
2400gcacgcttag gcgtgtgact gcgtaccttt tgtataatgg gtcagcgact tatattctgt
2460agcaaggtta accgaatagg ggagccgaag ggaaaccgag tcttaactgg gcgttaagtt
2520gcagggtata gacccgaaac ccggtgatct agccatgggc aggttgaagg ttgggtaaca
2580ctaactggag gaccgaaccg actaatgttg aaaaattagc ggatgacttg tggctggggg
2640tgaaaggcca atcaaaccgg gagatagctg gttctccccg aaagctattt aggtagcgcc
2700tcgtgaattc atctccgggg gtagagcact gtttcggcaa gggggtcatc ccgacttacc
2760aacccgatgc aaactgcgaa taccggagaa tgttatcacg ggagacacac ggcgggtgct
2820aacgtccgtc gtgaagaggg aaacaaccca gaccgccagc taaggtccca aagtcatggt
2880taagtgggaa acgatgtggg aaggcccaga cagccaggat gttggcttag aagcagccat
2940catttaaaga aagcgtaata gctcactggt cgagtcggcc tgcgcggaag atgtaacggg
3000gctaaaccat gcaccgaagc tgcggcagcg acgcttatgc gttgttgggt aggggagcgt
3060tctgtaagcc tgcgaaggtg tgctgtgagg catgctggag gtatcagaag tgcgaatgct
3120gacataagta acgataaagc gggtgaaaag cccgctcgcc ggaagaccaa gggttcctgt
3180ccaacgttaa tcggggcagg gtgagtcgac ccctaaggcg aggccgaaag gcgtagtcga
3240tgggaaacag gttaatattc ctgtacttgg tgttactgcg aaggggggac ggagaaggct
3300atgttggccg ggcgacggtt gtcccggttt aagcgtgtag gctggttttc caggcaaatc
3360cggaaaatca aggctgaggc gtgatgacga ggcactacgg tgctgaagca acaaatgccc
3420tgcttccagg aaaagcctct aagcatcagg taacatcaaa tcgtacccca aaccgacaca
3480ggtggtcagg tagagaatac caaggcgctt gagagaactc gggtgaagga actaggcaaa
3540atggtgccgt aacttcggga gaaggcacgc tgatatgtag gtgaggtccc tcgcggatgg
3600agctgaaatc agtcgaagat accagctggc tgcaactgtt tattaaaaac acagcactgt
3660gcaaacacga aagtggacgt atacggtgtg acgcctgccc ggtgccggaa ggttaattga
3720tggggttagc gcaagcgaag ctcttgatcg aagccccggt aaacggcggc cgtaactata
3780acggtcctaa ggtagcgaaa ttccttgtcg ggtaagttcc gacctgcacg aatggcgtaa
3840tgatggccag gctgtctcca cccgagactc agtgaaattg aactcgctgt gaagatgcag
3900tgtacccgcg gcaagacgga aagaccccgt gaacctttac tatagcttga cactgaacat
3960tgagccttga tgtgtaggat aggtgggagg ctttgaagtg tggacgccag tctgcatgga
4020gccgaccttg aaataccacc ctttaatgtt tgatgttcta acgttgaccc gtaatccggg
4080ttgcggacag tgtctggtgg gtagtttgac tggggcggtc tcctcctaaa gagtaacgga
4140ggagcacgaa ggttggctaa tcctggtcgg acatcaggag gttagtgcaa tggcataagc
4200cagcttgact gcgagcgtga cggcgcgagc aggtgcgaaa gcaggtcata gtgatccggt
4260ggttctgaat ggaagggcca tcgctcaacg gataaaaggt actccgggga taacaggctg
4320ataccgccca agagttcata tcgacggcgg tgtttggcac ctcgatgtcg gctcatcaca
4380tcctggggct gaagtaggtc ccaagggtat ggctgttcgc catttaaagt ggtacgcgag
4440ctgggtttag aacgtcgtga gacagttcgg tccctatctg ccgtgggcgc tggagaactg
4500aggggggctg ctcctagtac gagaggaccg gagtggacgc atcactggtg ttcgggttgt
4560catgccaatg gcactgcccg gtagctaaat gcggaagaga taagtgctga aagcatctaa
4620gcacgaaact tgccccgaga tgagttctcc ctgacccttt aagggtcctg aaggaacgtt
4680gaagacgacg acgttgatag gccgggtgtg taagcggggc cccccaccgu uugacgcccc
4740gaagccgugg aucccgggag ccacguggcu ggccugucgg cgcttaccac tttgtgattc
4800atgactgggg tgaagtcgta acaaggtaac cgtaggggaa cctgcggttg gatcacctcc
4860ttaccttaaa gaagcgtact ttgtagtgct cacacagatt gtctgataga aagtgaaaag
4920caaggcgttt acgcgttggg agtgaggctg aagagaataa ggccgttcgc tttctattaa
4980tgaaagctca ccctacacga aaatatcacg caacgcgtga taagcaattt tcgtgtcccc
5040ttcgtctaga ggcccaggac accgcccttt cacggcggta acaggggttc gaatccccta
5100ggggacgcca cttgctggtt tgtgagtgaa agtcgccgac cttaatatct caaaactcat
5160cttcgggtga tgtttgagat atttgctctt taaaaatctg gatcaagctg aaaattgaaa
5220cactgaacaa cgagagttgt tcgtgagtct ctcaaatttt cgcaacacga tgatgaatcg
5280aaagaaacat cttcgggttg tgagcttaag cttacaacgc cgaagctgtt ttggcggatg
5340agagaagatt ttcagcctga tacagattaa atcagaacgc agaagcggtc tgataaaaca
5400gaatttgcct ggcggcagta gcgcggtggt cccacctgac cccatgccga actcagaagt
5460gaaacgccgt agcgccgatg gtagtgtggg gtctccccat gcgagagtag ggaactgcca
5520ggcatcaaat aaaacgaaag gctcagtcga aagactgggc ctttcgtttt atctgttgtt
5580tgtcggtgaa cgctctcctg agtaggacaa atccgccggg agcggatttg aacgttgcga
5640agcaacggcc cggagggtgg cgggcaggac gcccgccata aactgccagg catcaaatta
5700agcagaaggc catcctgacg gatggccttt ttgcgtttct acaaactctt cctgtcgtca
5760tatctacaag ccggcgcgcc gggaaatgtg cgcggaaccc ctatttgttt atttttctaa
5820atacattcaa atatgtatcc gctcatgaga caataaccct gataaatgct tcaataatat
5880tgaaaaagga agagtatgag tattcaacat ttccgtgtcg cccttattcc cttttttgcg
5940gcattttgcc ttcctgtttt tgctcaccca gaaacgctgg tgaaagtaaa agatgctgaa
6000gatcagttgg gtgcacgagt gggttacatc gaactggatc tcaacagcgg taagatcctt
6060gagagttttc gccccgaaga acgttttcca atgatgagca cttttaaagt tctgctatgt
6120ggcgcggtat tatcccgtgt tgacgccggg caagagcaac tcggtcgccg catacactat
6180tctcagaatg acttggttga gtactcacca gtcacagaaa agcatcttac ggatggcatg
6240acagtaagag aattatgcag tgctgcaata accatgagtg ataacactgc ggccaactta
6300cttctgacaa cgatcggagg accgaaggag ctaaccgctt ttttgcacaa catgggggat
6360catgtaactc gccttgatcg ttgggaaccg gagctgaatg aagccatacc aaacgacgag
6420cgtgacacca cgatgcctgc agcaatggca acaacgttgc gcaaactatt aactggcgaa
6480ctacttactc tagcttcccg gcaacaatta atagactgga tggaggcgga taaagttgca
6540ggaccacttc tgcgctcggc ccttccggct agctggttta ttgctgataa atctggagcc
6600ggtgagcgtg ggtctcgcgg tatcattgca gcactggggc cagatggtaa gccctcccgt
6660atcgtagtta tctacacgac ggggagtcag gcaactatgg atgaacgaaa tagacagatc
6720gctgagatag gtgcctcact gattaagcat tggtaactgc agaccaagtt tactcatata
6780tactttagat tgatttaaaa cttcattttt aatttaaaag gatctaggtg aagatccttt
6840ttgataatct catgaccaaa atcccttaac gtgagttttc gttccactga gcgtcagacc
6900ccgtagaaaa gatcaaagga tcttcttgag atcctttttt tctgcgcgta atctgctgct
6960tgcaaacaaa aaaaccaccg ctaccagcgg tggtttgttt gccggatcaa gagctaccaa
7020ctctttttcc gaaggtaact ggcttcagca gagcgcagat accaaatact gtccttctag
7080tgtagccgta gttaggccac cacttcaaga actctgtagc accgcctaca tacctcgctc
7140tgctaatcct gttaccagtg gctgctgcca gtggcgataa gtcgtgtctt accgggttgg
7200actcaagacg atagttaccg gataaggcgc agcggtcggg ctgaacgggg ggttcgtgca
7260cacagcccag cttggagcga acgacctaca ccgaactgag atacctacag cgtgagctat
7320gagaaagcgc cacgcttccc gaagggagaa aggcggacag gtatccggta agcggcaggg
7380tcggaacagg agagcgcacg agggagcttc cagggggaaa cgcctggtat ctttatagtc
7440ctgtcgggtt tcgccacctc tgacttgagc gtcgattttt gtgatgctcg tcaggggggc
7500ggagcctatg gaaaaacgcc agcaacgcgg cctttttacg gttcctggcc ttttgctgg
7559
SEQ ID NO: 19; Length: 7555; Type: DNA; Organism: Artificial Sequence
Designed sequence in accordance with embodiments
gcggccgcga tctctcacct accaaacaat gcccccctgc aaaaaataaa
ttcatataaa 60aaacatacag ataaccatct gcggtgataa attatctctg gcggtgttga
cataaatacc 120actggcggtg atactgagca cgggtaccgg ccgctgagaa aaagcgaagc
ggcactgctc 180tttaacaatt tatcagacaa tctgtgtggg cactcgaaga tacggattct
taacgtcgca 240agacgaaaaa tgaataccaa gtctcaagag tgaacacgta attcattacg
aagtttaatt 300ctttgagcgt caaactttta aattgaagag tttgatcatg gctcagattg
aacgctggcg 360gcaggcctaa cacatgcaag tcgaacggta acaggaagaa gcttgcttct
ttgctgacga 420gtggcggacg ggtgagtaat gtctgggaaa ctgcctgatg gagggggata
actactggaa 480acggtagcta ataccgcata acgtcgcaag accaaagagg gggaccttcg
ggcctcttgc 540catcggatgt gcccagatgg gattagctag taggtggggt aacggctcac
ctaggcgacg 600atccctagct ggtctgagag gatgaccagc cacactggaa ctgagacacg
gtccagactc 660ctacgggagg cagcagtggg gaatattgca caatgggcgc aagcctgatg
cagccatgcc 720gcgtgtatga agaaggcctt cgggttgtaa agtactttca gcggggagga
agggagtaaa 780gttaatacct ttgctcattg acgttacccg cagaagaagc accggctaac
tccgtgccag 840cagccgcggt aatacggagg gtgcaagcgt taatcggaat tactgggcgt
aaagcgcacg 900caggcggttt gttaagtcag atgtgaaatc cccgggctca acctgggaac
tgcatctgat 960actggcaagc ttgagtctcg tagagggggg tagaattcca ggtgtagcgg
tgaaatgcgt 1020agagatctgg aggaataccg gtggcgaagg cggccccctg gacgaagact
gacgctcagg 1080tgcgaaagcg tggggagcaa acaggattag ataccctggt agtccacgcc
gtaaacgatg 1140tcgacttgga ggttgtgccc ttgaggcgtg gcttccggag ctaacgcgtt
aagtcgaccg 1200cctggggagt acggccgcaa ggttaaaact caaatgaatt gacgggggcc
cgcacaagcg 1260gtggagcatg tggtttaatt cgatgcaacg cgaagaacct tacctggtct
tgacatccac 1320ggaagttttc agagatgaga atgtgccttc gggaaccgtg agacaggtgc
tgcatggctg 1380tcgtcagctc gtgttgtgaa atgttgggtt aagtcccgca acgagcgcaa
cccttatcct 1440ttgttgccag cggtccggcc gggaactcaa aggagactgc cagtgataaa
ctggaggaag 1500gtggggatga cgtcaagtca tcatggccct tacgaccagg gctacacacg
tgctacaatg 1560gcgcatacaa agagaagcga cctcgcgaga gcaagcggac ctcataaagt
gcgtcgtagt 1620ccggattgga gtctgcaact cgactccatg aagtcggaat cgctagtaat
cgtggatcag 1680aatgccacgg tgaatacgtt cccgggcctt gtacacaccg cccgtcacac
catgggagtg 1740ggttgcaaaa gaagtaggta gctccuagcg gguagggaga ucauggacgu
aacauucgau 1800ccgagggggg gggaccgttg agctaaccgg tactaatgaa ccgtgaggct
taaccgagag 1860gttaagcgac taagcgtaca cggtggatgc cctggcagtc agaggcgatg
aaggacgtgc 1920taatctgcga taagcgtcgg taaggtgata tgaaccgtta taaccggcga
tttccgaatg 1980gggaaaccca gtgtgtttcg acacactatc attaactgaa tccataggtt
aatgaggcga 2040accgggggaa ctgaaacatc taagtacccc gaggaaaaga aatcaaccga
gattccccca 2100gtagcggcga gcgaacgggg agcagcccag agcctgaatc agtgtgtgtg
ttagtggaag 2160cgtctggaaa ggcgcgcgat acagggtgac agccccgtac acaaaaatgc
acatgctgtg 2220agctcgatga gtagggcggg acacgtggta tcctgtctga atatgggggg
accatcctcc 2280aaggctaaat actcctgact gaccgatagt gaaccagtac cgtgagggaa
aggcgaaaag 2340aaccccggcg aggggagtga aaaagaacct gaaaccgtgt acgtacaagc
agtgggagca 2400cgcttaggcg tgtgactgcg taccttttgt ataatgggtc agcgacttat
attctgtagc 2460aaggttaacc gaatagggga gccgaaggga aaccgagtct taactgggcg
ttaagttgca 2520gggtatagac ccgaaacccg gtgatctagc catgggcagg ttgaaggttg
ggtaacacta 2580actggaggac cgaaccgact aatgttgaaa aattagcgga tgacttgtgg
ctgggggtga 2640aaggccaatc aaaccgggag atagctggtt ctccccgaaa gctatttagg
tagcgcctcg 2700tgaattcatc tccgggggta gagcactgtt tcggcaaggg ggtcatcccg
acttaccaac 2760ccgatgcaaa ctgcgaatac cggagaatgt tatcacggga gacacacggc
gggtgctaac 2820gtccgtcgtg aagagggaaa caacccagac cgccagctaa ggtcccaaag
tcatggttaa 2880gtgggaaacg atgtgggaag gcccagacag ccaggatgtt ggcttagaag
cagccatcat 2940ttaaagaaag cgtaatagct cactggtcga gtcggcctgc gcggaagatg
taacggggct 3000aaaccatgca ccgaagctgc ggcagcgacg cttatgcgtt gttgggtagg
ggagcgttct 3060gtaagcctgc gaaggtgtgc tgtgaggcat gctggaggta tcagaagtgc
gaatgctgac 3120ataagtaacg ataaagcggg tgaaaagccc gctcgccgga agaccaaggg
ttcctgtcca 3180acgttaatcg gggcagggtg agtcgacccc taaggcgagg ccgaaaggcg
tagtcgatgg 3240gaaacaggtt aatattcctg tacttggtgt tactgcgaag gggggacgga
gaaggctatg 3300ttggccgggc gacggttgtc ccggtttaag cgtgtaggct ggttttccag
gcaaatccgg 3360aaaatcaagg ctgaggcgtg atgacgaggc actacggtgc tgaagcaaca
aatgccctgc 3420ttccaggaaa agcctctaag catcaggtaa catcaaatcg taccccaaac
cgacacaggt 3480ggtcaggtag agaataccaa ggcgcttgag agaactcggg tgaaggaact
aggcaaaatg 3540gtgccgtaac ttcgggagaa ggcacgctga tatgtaggtg aggtccctcg
cggatggagc 3600tgaaatcagt cgaagatacc agctggctgc aactgtttat taaaaacaca
gcactgtgca 3660aacacgaaag tggacgtata cggtgtgacg cctgcccggt gccggaaggt
taattgatgg 3720ggttagcgca agcgaagctc ttgatcgaag ccccggtaaa cggcggccgt
aactataacg 3780gtcctaaggt agcgaaattc cttgtcgggt aagttccgac ctgcacgaat
ggcgtaatga 3840tggccaggct gtctccaccc gagactcagt gaaattgaac tcgctgtgaa
gatgcagtgt 3900acccgcggca agacggaaag accccgtgaa cctttactat agcttgacac
tgaacattga 3960gccttgatgt gtaggatagg tgggaggctt tgaagtgtgg acgccagtct
gcatggagcc 4020gaccttgaaa taccaccctt taatgtttga tgttctaacg ttgacccgta
atccgggttg 4080cggacagtgt ctggtgggta gtttgactgg ggcggtctcc tcctaaagag
taacggagga 4140gcacgaaggt tggctaatcc tggtcggaca tcaggaggtt agtgcaatgg
cataagccag 4200cttgactgcg agcgtgacgg cgcgagcagg tgcgaaagca ggtcatagtg
atccggtggt 4260tctgaatgga agggccatcg ctcaacggat aaaaggtact ccggggataa
caggctgata 4320ccgcccaaga gttcatatcg acggcggtgt ttggcacctc gatgtcggct
catcacatcc 4380tggggctgaa gtaggtccca agggtatggc tgttcgccat ttaaagtggt
acgcgagctg 4440ggtttagaac gtcgtgagac agttcggtcc ctatctgccg tgggcgctgg
agaactgagg 4500ggggctgctc ctagtacgag aggaccggag tggacgcatc actggtgttc
gggttgtcat 4560gccaatggca ctgcccggta gctaaatgcg gaagagataa gtgctgaaag
catctaagca 4620cgaaacttgc cccgagatga gttctccctg accctttaag ggtcctgaag
gaacgttgaa 4680gacgacgacg ttgataggcc gggtgtgtaa gcgguccccc ccccguuuga
cgaucgaaug 4740cccguccaug aucugugaac uacccgcuag aagcggcgct taccactttg
tgattcatga 4800ctggggtgaa gtcgtaacaa ggtaaccgta ggggaacctg cggttggatc
acctccttac 4860cttaaagaag cgtactttgt agtgctcaca cagattgtct gatagaaagt
gaaaagcaag 4920gcgtttacgc gttgggagtg aggctgaaga gaataaggcc gttcgctttc
tattaatgaa 4980agctcaccct acacgaaaat atcacgcaac gcgtgataag caattttcgt
gtccccttcg 5040tctagaggcc caggacaccg ccctttcacg gcggtaacag gggttcgaat
cccctagggg 5100acgccacttg ctggtttgtg agtgaaagtc gccgacctta atatctcaaa
actcatcttc 5160gggtgatgtt tgagatattt gctctttaaa aatctggatc aagctgaaaa
ttgaaacact 5220gaacaacgag agttgttcgt gagtctctca aattttcgca acacgatgat
gaatcgaaag 5280aaacatcttc gggttgtgag cttaagctta caacgccgaa gctgttttgg
cggatgagag 5340aagattttca gcctgataca gattaaatca gaacgcagaa gcggtctgat
aaaacagaat 5400ttgcctggcg gcagtagcgc ggtggtccca cctgacccca tgccgaactc
agaagtgaaa 5460cgccgtagcg ccgatggtag tgtggggtct ccccatgcga gagtagggaa
ctgccaggca 5520tcaaataaaa cgaaaggctc agtcgaaaga ctgggccttt cgttttatct
gttgtttgtc 5580ggtgaacgct ctcctgagta ggacaaatcc gccgggagcg gatttgaacg
ttgcgaagca 5640acggcccgga gggtggcggg caggacgccc gccataaact gccaggcatc
aaattaagca 5700gaaggccatc ctgacggatg gcctttttgc gtttctacaa actcttcctg
tcgtcatatc 5760tacaagccgg cgcgccggga aatgtgcgcg gaacccctat ttgtttattt
ttctaaatac 5820attcaaatat gtatccgctc atgagacaat aaccctgata aatgcttcaa
taatattgaa 5880aaaggaagag tatgagtatt caacatttcc gtgtcgccct tattcccttt
tttgcggcat 5940tttgccttcc tgtttttgct cacccagaaa cgctggtgaa agtaaaagat
gctgaagatc 6000agttgggtgc acgagtgggt tacatcgaac tggatctcaa cagcggtaag
atccttgaga 6060gttttcgccc cgaagaacgt tttccaatga tgagcacttt taaagttctg
ctatgtggcg 6120cggtattatc ccgtgttgac gccgggcaag agcaactcgg tcgccgcata
cactattctc 6180agaatgactt ggttgagtac tcaccagtca cagaaaagca tcttacggat
ggcatgacag 6240taagagaatt atgcagtgct gcaataacca tgagtgataa cactgcggcc
aacttacttc 6300tgacaacgat cggaggaccg aaggagctaa ccgctttttt gcacaacatg
ggggatcatg 6360taactcgcct tgatcgttgg gaaccggagc tgaatgaagc cataccaaac
gacgagcgtg 6420acaccacgat gcctgcagca atggcaacaa cgttgcgcaa actattaact
ggcgaactac 6480ttactctagc ttcccggcaa caattaatag actggatgga ggcggataaa
gttgcaggac 6540cacttctgcg ctcggccctt ccggctagct ggtttattgc tgataaatct
ggagccggtg 6600agcgtgggtc tcgcggtatc attgcagcac tggggccaga tggtaagccc
tcccgtatcg 6660tagttatcta cacgacgggg agtcaggcaa ctatggatga acgaaataga
cagatcgctg 6720agataggtgc ctcactgatt aagcattggt aactgcagac caagtttact
catatatact 6780ttagattgat ttaaaacttc atttttaatt taaaaggatc taggtgaaga
tcctttttga 6840taatctcatg accaaaatcc cttaacgtga gttttcgttc cactgagcgt
cagaccccgt 6900agaaaagatc aaaggatctt cttgagatcc tttttttctg cgcgtaatct
gctgcttgca 6960aacaaaaaaa ccaccgctac cagcggtggt ttgtttgccg gatcaagagc
taccaactct 7020ttttccgaag gtaactggct tcagcagagc gcagatacca aatactgtcc
ttctagtgta 7080gccgtagtta ggccaccact tcaagaactc tgtagcaccg cctacatacc
tcgctctgct 7140aatcctgtta ccagtggctg ctgccagtgg cgataagtcg tgtcttaccg
ggttggactc 7200aagacgatag ttaccggata aggcgcagcg gtcgggctga acggggggtt
cgtgcacaca 7260gcccagcttg gagcgaacga cctacaccga actgagatac ctacagcgtg
agctatgaga 7320aagcgccacg cttcccgaag ggagaaaggc ggacaggtat ccggtaagcg
gcagggtcgg 7380aacaggagag cgcacgaggg agcttccagg gggaaacgcc tggtatcttt
atagtcctgt 7440cgggtttcgc cacctctgac ttgagcgtcg atttttgtga tgctcgtcag
gggggcggag 7500cctatggaaa aacgccagca acgcggcctt tttacggttc ctggcctttt
gctgg 7555
SEQ ID NO: 20; Length: 7560; Type: DNA; Organism: Artificial Sequence
Designed sequence in accordance with embodiments
gcggccgcga tctctcacct accaaacaat
gcccccctgc aaaaaataaa ttcatataaa 60aaacatacag ataaccatct gcggtgataa
attatctctg gcggtgttga cataaatacc 120actggcggtg atactgagca cgggtaccgg
ccgctgagaa aaagcgaagc ggcactgctc 180tttaacaatt tatcagacaa tctgtgtggg
cactcgaaga tacggattct taacgtcgca 240agacgaaaaa tgaataccaa gtctcaagag
tgaacacgta attcattacg aagtttaatt 300ctttgagcgt caaactttta aattgaagag
tttgatcatg gctcagattg aacgctggcg 360gcaggcctaa cacatgcaag tcgaacggta
acaggaagaa gcttgcttct ttgctgacga 420gtggcggacg ggtgagtaat gtctgggaaa
ctgcctgatg gagggggata actactggaa 480acggtagcta ataccgcata acgtcgcaag
accaaagagg gggaccttcg ggcctcttgc 540catcggatgt gcccagatgg gattagctag
taggtggggt aacggctcac ctaggcgacg 600atccctagct ggtctgagag gatgaccagc
cacactggaa ctgagacacg gtccagactc 660ctacgggagg cagcagtggg gaatattgca
caatgggcgc aagcctgatg cagccatgcc 720gcgtgtatga agaaggcctt cgggttgtaa
agtactttca gcggggagga agggagtaaa 780gttaatacct ttgctcattg acgttacccg
cagaagaagc accggctaac tccgtgccag 840cagccgcggt aatacggagg gtgcaagcgt
taatcggaat tactgggcgt aaagcgcacg 900caggcggttt gttaagtcag atgtgaaatc
cccgggctca acctgggaac tgcatctgat 960actggcaagc ttgagtctcg tagagggggg
tagaattcca ggtgtagcgg tgaaatgcgt 1020agagatctgg aggaataccg gtggcgaagg
cggccccctg gacgaagact gacgctcagg 1080tgcgaaagcg tggggagcaa acaggattag
ataccctggt agtccacgcc gtaaacgatg 1140tcgacttgga ggttgtgccc ttgaggcgtg
gcttccggag ctaacgcgtt aagtcgaccg 1200cctggggagt acggccgcaa ggttaaaact
caaatgaatt gacgggggcc cgcacaagcg 1260gtggagcatg tggtttaatt cgatgcaacg
cgaagaacct tacctggtct tgacatccac 1320ggaagttttc agagatgaga atgtgccttc
gggaaccgtg agacaggtgc tgcatggctg 1380tcgtcagctc gtgttgtgaa atgttgggtt
aagtcccgca acgagcgcaa cccttatcct 1440ttgttgccag cggtccggcc gggaactcaa
aggagactgc cagtgataaa ctggaggaag 1500gtggggatga cgtcaagtca tcatggccct
tacgaccagg gctacacacg tgctacaatg 1560gcgcatacaa agagaagcga cctcgcgaga
gcaagcggac ctcataaagt gcgtcgtagt 1620ccggattgga gtctgcaact cgactccatg
aagtcggaat cgctagtaat cgtggatcag 1680aatgccacgg tgaatacgtt cccgggcctt
gtacacaccg cccgtcacac catgggagtg 1740ggttgcaaaa gaagtaggta gctuaacacc
uguuguggcg cgaaacgacc gauauuuccc 1800ccgauauggc gcgaaaccgg gacgcccgtt
gagctaaccg gtactaatga accgtgaggc 1860ttaaccgaga ggttaagcga ctaagcgtac
acggtggatg ccctggcagt cagaggcgat 1920gaaggacgtg ctaatctgcg ataagcgtcg
gtaaggtgat atgaaccgtt ataaccggcg 1980atttccgaat ggggaaaccc agtgtgtttc
gacacactat cattaactga atccataggt 2040taatgaggcg aaccggggga actgaaacat
ctaagtaccc cgaggaaaag aaatcaaccg 2100agattccccc agtagcggcg agcgaacggg
gagcagccca gagcctgaat cagtgtgtgt 2160gttagtggaa gcgtctggaa aggcgcgcga
tacagggtga cagccccgta cacaaaaatg 2220cacatgctgt gagctcgatg agtagggcgg
gacacgtggt atcctgtctg aatatggggg 2280gaccatcctc caaggctaaa tactcctgac
tgaccgatag tgaaccagta ccgtgaggga 2340aaggcgaaaa gaaccccggc gaggggagtg
aaaaagaacc tgaaaccgtg tacgtacaag 2400cagtgggagc acgcttaggc gtgtgactgc
gtaccttttg tataatgggt cagcgactta 2460tattctgtag caaggttaac cgaatagggg
agccgaaggg aaaccgagtc ttaactgggc 2520gttaagttgc agggtataga cccgaaaccc
ggtgatctag ccatgggcag gttgaaggtt 2580gggtaacact aactggagga ccgaaccgac
taatgttgaa aaattagcgg atgacttgtg 2640gctgggggtg aaaggccaat caaaccggga
gatagctggt tctccccgaa agctatttag 2700gtagcgcctc gtgaattcat ctccgggggt
agagcactgt ttcggcaagg gggtcatccc 2760gacttaccaa cccgatgcaa actgcgaata
ccggagaatg ttatcacggg agacacacgg 2820cgggtgctaa cgtccgtcgt gaagagggaa
acaacccaga ccgccagcta aggtcccaaa 2880gtcatggtta agtgggaaac gatgtgggaa
ggcccagaca gccaggatgt tggcttagaa 2940gcagccatca tttaaagaaa gcgtaatagc
tcactggtcg agtcggcctg cgcggaagat 3000gtaacggggc taaaccatgc accgaagctg
cggcagcgac gcttatgcgt tgttgggtag 3060gggagcgttc tgtaagcctg cgaaggtgtg
ctgtgaggca tgctggaggt atcagaagtg 3120cgaatgctga cataagtaac gataaagcgg
gtgaaaagcc cgctcgccgg aagaccaagg 3180gttcctgtcc aacgttaatc ggggcagggt
gagtcgaccc ctaaggcgag gccgaaaggc 3240gtagtcgatg ggaaacaggt taatattcct
gtacttggtg ttactgcgaa ggggggacgg 3300agaaggctat gttggccggg cgacggttgt
cccggtttaa gcgtgtaggc tggttttcca 3360ggcaaatccg gaaaatcaag gctgaggcgt
gatgacgagg cactacggtg ctgaagcaac 3420aaatgccctg cttccaggaa aagcctctaa
gcatcaggta acatcaaatc gtaccccaaa 3480ccgacacagg tggtcaggta gagaatacca
aggcgcttga gagaactcgg gtgaaggaac 3540taggcaaaat ggtgccgtaa cttcgggaga
aggcacgctg atatgtaggt gaggtccctc 3600gcggatggag ctgaaatcag tcgaagatac
cagctggctg caactgttta ttaaaaacac 3660agcactgtgc aaacacgaaa gtggacgtat
acggtgtgac gcctgcccgg tgccggaagg 3720ttaattgatg gggttagcgc aagcgaagct
cttgatcgaa gccccggtaa acggcggccg 3780taactataac ggtcctaagg tagcgaaatt
ccttgtcggg taagttccga cctgcacgaa 3840tggcgtaatg atggccaggc tgtctccacc
cgagactcag tgaaattgaa ctcgctgtga 3900agatgcagtg tacccgcggc aagacggaaa
gaccccgtga acctttacta tagcttgaca 3960ctgaacattg agccttgatg tgtaggatag
gtgggaggct ttgaagtgtg gacgccagtc 4020tgcatggagc cgaccttgaa ataccaccct
ttaatgtttg atgttctaac gttgacccgt 4080aatccgggtt gcggacagtg tctggtgggt
agtttgactg gggcggtctc ctcctaaaga 4140gtaacggagg agcacgaagg ttggctaatc
ctggtcggac atcaggaggt tagtgcaatg 4200gcataagcca gcttgactgc gagcgtgacg
gcgcgagcag gtgcgaaagc aggtcatagt 4260gatccggtgg ttctgaatgg aagggccatc
gctcaacgga taaaaggtac tccggggata 4320acaggctgat accgcccaag agttcatatc
gacggcggtg tttggcacct cgatgtcggc 4380tcatcacatc ctggggctga agtaggtccc
aagggtatgg ctgttcgcca tttaaagtgg 4440tacgcgagct gggtttagaa cgtcgtgaga
cagttcggtc cctatctgcc gtgggcgctg 4500gagaactgag gggggctgct cctagtacga
gaggaccgga gtggacgcat cactggtgtt 4560cgggttgtca tgccaatggc actgcccggt
agctaaatgc ggaagagata agtgctgaaa 4620gcatctaagc acgaaacttg ccccgagatg
agttctccct gaccctttaa gggtcctgaa 4680ggaacgttga agacgacgac gttgataggc
cgggtgtgta agcgggcguc ccgggagcca 4740uaucgggcaa uaucggucgg agccacaauu
ggugaaagcg gcgcttacca ctttgtgatt 4800catgactggg gtgaagtcgt aacaaggtaa
ccgtagggga acctgcggtt ggatcacctc 4860cttaccttaa agaagcgtac tttgtagtgc
tcacacagat tgtctgatag aaagtgaaaa 4920gcaaggcgtt tacgcgttgg gagtgaggct
gaagagaata aggccgttcg ctttctatta 4980atgaaagctc accctacacg aaaatatcac
gcaacgcgtg ataagcaatt ttcgtgtccc 5040cttcgtctag aggcccagga caccgccctt
tcacggcggt aacaggggtt cgaatcccct 5100aggggacgcc acttgctggt ttgtgagtga
aagtcgccga ccttaatatc tcaaaactca 5160tcttcgggtg atgtttgaga tatttgctct
ttaaaaatct ggatcaagct gaaaattgaa 5220acactgaaca acgagagttg ttcgtgagtc
tctcaaattt tcgcaacacg atgatgaatc 5280gaaagaaaca tcttcgggtt gtgagcttaa
gcttacaacg ccgaagctgt tttggcggat 5340gagagaagat tttcagcctg atacagatta
aatcagaacg cagaagcggt ctgataaaac 5400agaatttgcc tggcggcagt agcgcggtgg
tcccacctga ccccatgccg aactcagaag 5460tgaaacgccg tagcgccgat ggtagtgtgg
ggtctcccca tgcgagagta gggaactgcc 5520aggcatcaaa taaaacgaaa ggctcagtcg
aaagactggg cctttcgttt tatctgttgt 5580ttgtcggtga acgctctcct gagtaggaca
aatccgccgg gagcggattt gaacgttgcg 5640aagcaacggc ccggagggtg gcgggcagga
cgcccgccat aaactgccag gcatcaaatt 5700aagcagaagg ccatcctgac ggatggcctt
tttgcgtttc tacaaactct tcctgtcgtc 5760atatctacaa gccggcgcgc cgggaaatgt
gcgcggaacc cctatttgtt tatttttcta 5820aatacattca aatatgtatc cgctcatgag
acaataaccc tgataaatgc ttcaataata 5880ttgaaaaagg aagagtatga gtattcaaca
tttccgtgtc gcccttattc ccttttttgc 5940ggcattttgc cttcctgttt ttgctcaccc
agaaacgctg gtgaaagtaa aagatgctga 6000agatcagttg ggtgcacgag tgggttacat
cgaactggat ctcaacagcg gtaagatcct 6060tgagagtttt cgccccgaag aacgttttcc
aatgatgagc acttttaaag ttctgctatg 6120tggcgcggta ttatcccgtg ttgacgccgg
gcaagagcaa ctcggtcgcc gcatacacta 6180ttctcagaat gacttggttg agtactcacc
agtcacagaa aagcatctta cggatggcat 6240gacagtaaga gaattatgca gtgctgcaat
aaccatgagt gataacactg cggccaactt 6300acttctgaca acgatcggag gaccgaagga
gctaaccgct tttttgcaca acatggggga 6360tcatgtaact cgccttgatc gttgggaacc
ggagctgaat gaagccatac caaacgacga 6420gcgtgacacc acgatgcctg cagcaatggc
aacaacgttg cgcaaactat taactggcga 6480actacttact ctagcttccc ggcaacaatt
aatagactgg atggaggcgg ataaagttgc 6540aggaccactt ctgcgctcgg cccttccggc
tagctggttt attgctgata aatctggagc 6600cggtgagcgt gggtctcgcg gtatcattgc
agcactgggg ccagatggta agccctcccg 6660tatcgtagtt atctacacga cggggagtca
ggcaactatg gatgaacgaa atagacagat 6720cgctgagata ggtgcctcac tgattaagca
ttggtaactg cagaccaagt ttactcatat 6780atactttaga ttgatttaaa acttcatttt
taatttaaaa ggatctaggt gaagatcctt 6840tttgataatc tcatgaccaa aatcccttaa
cgtgagtttt cgttccactg agcgtcagac 6900cccgtagaaa agatcaaagg atcttcttga
gatccttttt ttctgcgcgt aatctgctgc 6960ttgcaaacaa aaaaaccacc gctaccagcg
gtggtttgtt tgccggatca agagctacca 7020actctttttc cgaaggtaac tggcttcagc
agagcgcaga taccaaatac tgtccttcta 7080gtgtagccgt agttaggcca ccacttcaag
aactctgtag caccgcctac atacctcgct 7140ctgctaatcc tgttaccagt ggctgctgcc
agtggcgata agtcgtgtct taccgggttg 7200gactcaagac gatagttacc ggataaggcg
cagcggtcgg gctgaacggg gggttcgtgc 7260acacagccca gcttggagcg aacgacctac
accgaactga gatacctaca gcgtgagcta 7320tgagaaagcg ccacgcttcc cgaagggaga
aaggcggaca ggtatccggt aagcggcagg 7380gtcggaacag gagagcgcac gagggagctt
ccagggggaa acgcctggta tctttatagt 7440cctgtcgggt ttcgccacct ctgacttgag
cgtcgatttt tgtgatgctc gtcagggggg 7500cggagcctat ggaaaaacgc cagcaacgcg
gcctttttac ggttcctggc cttttgctgg 7560
SEQ ID NO: 21; Length: 7565; Type: DNA; Organism: Artificial Sequence
Designed sequence in accordance with embodiments
gcggccgcga tctctcacct accaaacaat gcccccctgc aaaaaataaa ttcatataaa
60aaacatacag ataaccatct gcggtgataa attatctctg gcggtgttga cataaatacc
120actggcggtg atactgagca cgggtaccgg ccgctgagaa aaagcgaagc ggcactgctc
180tttaacaatt tatcagacaa tctgtgtggg cactcgaaga tacggattct taacgtcgca
240agacgaaaaa tgaataccaa gtctcaagag tgaacacgta attcattacg aagtttaatt
300ctttgagcgt caaactttta aattgaagag tttgatcatg gctcagattg aacgctggcg
360gcaggcctaa cacatgcaag tcgaacggta acaggaagaa gcttgcttct ttgctgacga
420gtggcggacg ggtgagtaat gtctgggaaa ctgcctgatg gagggggata actactggaa
480acggtagcta ataccgcata acgtcgcaag accaaagagg gggaccttcg ggcctcttgc
540catcggatgt gcccagatgg gattagctag taggtggggt aacggctcac ctaggcgacg
600atccctagct ggtctgagag gatgaccagc cacactggaa ctgagacacg gtccagactc
660ctacgggagg cagcagtggg gaatattgca caatgggcgc aagcctgatg cagccatgcc
720gcgtgtatga agaaggcctt cgggttgtaa agtactttca gcggggagga agggagtaaa
780gttaatacct ttgctcattg acgttacccg cagaagaagc accggctaac tccgtgccag
840cagccgcggt aatacggagg gtgcaagcgt taatcggaat tactgggcgt aaagcgcacg
900caggcggttt gttaagtcag atgtgaaatc cccgggctca acctgggaac tgcatctgat
960actggcaagc ttgagtctcg tagagggggg tagaattcca ggtgtagcgg tgaaatgcgt
1020agagatctgg aggaataccg gtggcgaagg cggccccctg gacgaagact gacgctcagg
1080tgcgaaagcg tggggagcaa acaggattag ataccctggt agtccacgcc gtaaacgatg
1140tcgacttgga ggttgtgccc ttgaggcgtg gcttccggag ctaacgcgtt aagtcgaccg
1200cctggggagt acggccgcaa ggttaaaact caaatgaatt gacgggggcc cgcacaagcg
1260gtggagcatg tggtttaatt cgatgcaacg cgaagaacct tacctggtct tgacatccac
1320ggaagttttc agagatgaga atgtgccttc gggaaccgtg agacaggtgc tgcatggctg
1380tcgtcagctc gtgttgtgaa atgttgggtt aagtcccgca acgagcgcaa cccttatcct
1440ttgttgccag cggtccggcc gggaactcaa aggagactgc cagtgataaa ctggaggaag
1500gtggggatga cgtcaagtca tcatggccct tacgaccagg gctacacacg tgctacaatg
1560gcgcatacaa agagaagcga cctcgcgaga gcaagcggac ctcataaagt gcgtcgtagt
1620ccggattgga gtctgcaact cgactccatg aagtcggaat cgctagtaat cgtggatcag
1680aatgccacgg tgaatacgtt cccgggcctt gtacacaccg cccgtcacac catgggagtg
1740ggttgcaaaa gaagtaggta gctcuguagg cgaacuacua acucguacgu augggauuau
1800auucgauaua uggaccuaug gacccccgtt gagctaaccg gtactaatga accgtgaggc
1860ttaaccgaga ggttaagcga ctaagcgtac acggtggatg ccctggcagt cagaggcgat
1920gaaggacgtg ctaatctgcg ataagcgtcg gtaaggtgat atgaaccgtt ataaccggcg
1980atttccgaat ggggaaaccc agtgtgtttc gacacactat cattaactga atccataggt
2040taatgaggcg aaccggggga actgaaacat ctaagtaccc cgaggaaaag aaatcaaccg
2100agattccccc agtagcggcg agcgaacggg gagcagccca gagcctgaat cagtgtgtgt
2160gttagtggaa gcgtctggaa aggcgcgcga tacagggtga cagccccgta cacaaaaatg
2220cacatgctgt gagctcgatg agtagggcgg gacacgtggt atcctgtctg aatatggggg
2280gaccatcctc caaggctaaa tactcctgac tgaccgatag tgaaccagta ccgtgaggga
2340aaggcgaaaa gaaccccggc gaggggagtg aaaaagaacc tgaaaccgtg tacgtacaag
2400cagtgggagc acgcttaggc gtgtgactgc gtaccttttg tataatgggt cagcgactta
2460tattctgtag caaggttaac cgaatagggg agccgaaggg aaaccgagtc ttaactgggc
2520gttaagttgc agggtataga cccgaaaccc ggtgatctag ccatgggcag gttgaaggtt
2580gggtaacact aactggagga ccgaaccgac taatgttgaa aaattagcgg atgacttgtg
2640gctgggggtg aaaggccaat caaaccggga gatagctggt tctccccgaa agctatttag
2700gtagcgcctc gtgaattcat ctccgggggt agagcactgt ttcggcaagg gggtcatccc
2760gacttaccaa cccgatgcaa actgcgaata ccggagaatg ttatcacggg agacacacgg
2820cgggtgctaa cgtccgtcgt gaagagggaa acaacccaga ccgccagcta aggtcccaaa
2880gtcatggtta agtgggaaac gatgtgggaa ggcccagaca gccaggatgt tggcttagaa
2940gcagccatca tttaaagaaa gcgtaatagc tcactggtcg agtcggcctg cgcggaagat
3000gtaacggggc taaaccatgc accgaagctg cggcagcgac gcttatgcgt tgttgggtag
3060gggagcgttc tgtaagcctg cgaaggtgtg ctgtgaggca tgctggaggt atcagaagtg
3120cgaatgctga cataagtaac gataaagcgg gtgaaaagcc cgctcgccgg aagaccaagg
3180gttcctgtcc aacgttaatc ggggcagggt gagtcgaccc ctaaggcgag gccgaaaggc
3240gtagtcgatg ggaaacaggt taatattcct gtacttggtg ttactgcgaa ggggggacgg
3300agaaggctat gttggccggg cgacggttgt cccggtttaa gcgtgtaggc tggttttcca
3360ggcaaatccg gaaaatcaag gctgaggcgt gatgacgagg cactacggtg ctgaagcaac
3420aaatgccctg cttccaggaa aagcctctaa gcatcaggta acatcaaatc gtaccccaaa
3480ccgacacagg tggtcaggta gagaatacca aggcgcttga gagaactcgg gtgaaggaac
3540taggcaaaat ggtgccgtaa cttcgggaga aggcacgctg atatgtaggt gaggtccctc
3600gcggatggag ctgaaatcag tcgaagatac cagctggctg caactgttta ttaaaaacac
3660agcactgtgc aaacacgaaa gtggacgtat acggtgtgac gcctgcccgg tgccggaagg
3720ttaattgatg gggttagcgc aagcgaagct cttgatcgaa gccccggtaa acggcggccg
3780taactataac ggtcctaagg tagcgaaatt ccttgtcggg taagttccga cctgcacgaa
3840tggcgtaatg atggccaggc tgtctccacc cgagactcag tgaaattgaa ctcgctgtga
3900agatgcagtg tacccgcggc aagacggaaa gaccccgtga acctttacta tagcttgaca
3960ctgaacattg agccttgatg tgtaggatag gtgggaggct ttgaagtgtg gacgccagtc
4020tgcatggagc cgaccttgaa ataccaccct ttaatgtttg atgttctaac gttgacccgt
4080aatccgggtt gcggacagtg tctggtgggt agtttgactg gggcggtctc ctcctaaaga
4140gtaacggagg agcacgaagg ttggctaatc ctggtcggac atcaggaggt tagtgcaatg
4200gcataagcca gcttgactgc gagcgtgacg gcgcgagcag gtgcgaaagc aggtcatagt
4260gatccggtgg ttctgaatgg aagggccatc gctcaacgga taaaaggtac tccggggata
4320acaggctgat accgcccaag agttcatatc gacggcggtg tttggcacct cgatgtcggc
4380tcatcacatc ctggggctga agtaggtccc aagggtatgg ctgttcgcca tttaaagtgg
4440tacgcgagct gggtttagaa cgtcgtgaga cagttcggtc cctatctgcc gtgggcgctg
4500gagaactgag gggggctgct cctagtacga gaggaccgga gtggacgcat cactggtgtt
4560cgggttgtca tgccaatggc actgcccggt agctaaatgc ggaagagata agtgctgaaa
4620gcatctaagc acgaaacttg ccccgagatg agttctccct gaccctttaa gggtcctgaa
4680ggaacgttga agacgacgac gttgataggc cgggtgtgta agcggggguc cauaggaugg
4740aaguauaucg aauauaaucc cauacgcaaa uuagcgccua uugcggcgct taccactttg
4800tgattcatga ctggggtgaa gtcgtaacaa ggtaaccgta ggggaacctg cggttggatc
4860acctccttac cttaaagaag cgtactttgt agtgctcaca cagattgtct gatagaaagt
4920gaaaagcaag gcgtttacgc gttgggagtg aggctgaaga gaataaggcc gttcgctttc
4980tattaatgaa agctcaccct acacgaaaat atcacgcaac gcgtgataag caattttcgt
5040gtccccttcg tctagaggcc caggacaccg ccctttcacg gcggtaacag gggttcgaat
5100cccctagggg acgccacttg ctggtttgtg agtgaaagtc gccgacctta atatctcaaa
5160actcatcttc gggtgatgtt tgagatattt gctctttaaa aatctggatc aagctgaaaa
5220ttgaaacact gaacaacgag agttgttcgt gagtctctca aattttcgca acacgatgat
5280gaatcgaaag aaacatcttc gggttgtgag cttaagctta caacgccgaa gctgttttgg
5340cggatgagag aagattttca gcctgataca gattaaatca gaacgcagaa gcggtctgat
5400aaaacagaat ttgcctggcg gcagtagcgc ggtggtccca cctgacccca tgccgaactc
5460agaagtgaaa cgccgtagcg ccgatggtag tgtggggtct ccccatgcga gagtagggaa
5520ctgccaggca tcaaataaaa cgaaaggctc agtcgaaaga ctgggccttt cgttttatct
5580gttgtttgtc ggtgaacgct ctcctgagta ggacaaatcc gccgggagcg gatttgaacg
5640ttgcgaagca acggcccgga gggtggcggg caggacgccc gccataaact gccaggcatc
5700aaattaagca gaaggccatc ctgacggatg gcctttttgc gtttctacaa actcttcctg
5760tcgtcatatc tacaagccgg cgcgccggga aatgtgcgcg gaacccctat ttgtttattt
5820ttctaaatac attcaaatat gtatccgctc atgagacaat aaccctgata aatgcttcaa
5880taatattgaa aaaggaagag tatgagtatt caacatttcc gtgtcgccct tattcccttt
5940tttgcggcat tttgccttcc tgtttttgct cacccagaaa cgctggtgaa agtaaaagat
6000gctgaagatc agttgggtgc acgagtgggt tacatcgaac tggatctcaa cagcggtaag
6060atccttgaga gttttcgccc cgaagaacgt tttccaatga tgagcacttt taaagttctg
6120ctatgtggcg cggtattatc ccgtgttgac gccgggcaag agcaactcgg tcgccgcata
6180cactattctc agaatgactt ggttgagtac tcaccagtca cagaaaagca tcttacggat
6240ggcatgacag taagagaatt atgcagtgct gcaataacca tgagtgataa cactgcggcc
6300aacttacttc tgacaacgat cggaggaccg aaggagctaa ccgctttttt gcacaacatg
6360ggggatcatg taactcgcct tgatcgttgg gaaccggagc tgaatgaagc cataccaaac
6420gacgagcgtg acaccacgat gcctgcagca atggcaacaa cgttgcgcaa actattaact
6480ggcgaactac ttactctagc ttcccggcaa caattaatag actggatgga ggcggataaa
6540gttgcaggac cacttctgcg ctcggccctt ccggctagct ggtttattgc tgataaatct
6600ggagccggtg agcgtgggtc tcgcggtatc attgcagcac tggggccaga tggtaagccc
6660tcccgtatcg tagttatcta cacgacgggg agtcaggcaa ctatggatga acgaaataga
6720cagatcgctg agataggtgc ctcactgatt aagcattggt aactgcagac caagtttact
6780catatatact ttagattgat ttaaaacttc atttttaatt taaaaggatc taggtgaaga
6840tcctttttga taatctcatg accaaaatcc cttaacgtga gttttcgttc cactgagcgt
6900cagaccccgt agaaaagatc aaaggatctt cttgagatcc tttttttctg cgcgtaatct
6960gctgcttgca aacaaaaaaa ccaccgctac cagcggtggt ttgtttgccg gatcaagagc
7020taccaactct ttttccgaag gtaactggct tcagcagagc gcagatacca aatactgtcc
7080ttctagtgta gccgtagtta ggccaccact tcaagaactc tgtagcaccg cctacatacc
7140tcgctctgct aatcctgtta ccagtggctg ctgccagtgg cgataagtcg tgtcttaccg
7200ggttggactc aagacgatag ttaccggata aggcgcagcg gtcgggctga acggggggtt
7260cgtgcacaca gcccagcttg gagcgaacga cctacaccga actgagatac ctacagcgtg
7320agctatgaga aagcgccacg cttcccgaag ggagaaaggc ggacaggtat ccggtaagcg
7380gcagggtcgg aacaggagag cgcacgaggg agcttccagg gggaaacgcc tggtatcttt
7440atagtcctgt cgggtttcgc cacctctgac ttgagcgtcg atttttgtga tgctcgtcag
7500gggggcggag cctatggaaa aacgccagca acgcggcctt tttacggttc ctggcctttt
7560gctgg
7565
SEQ ID NO: 22
Length: 7565
Type: DNA
Organism: Artificial Sequence
Other information: Designed sequence in accordance with embodiments
Sequence: 22
gcggccgcga tctctcacct accaaacaat gcccccctgc aaaaaataaa
ttcatataaa 60aaacatacag ataaccatct gcggtgataa attatctctg gcggtgttga
cataaatacc 120actggcggtg atactgagca cgggtaccgg ccgctgagaa aaagcgaagc
ggcactgctc 180tttaacaatt tatcagacaa tctgtgtggg cactcgaaga tacggattct
taacgtcgca 240agacgaaaaa tgaataccaa gtctcaagag tgaacacgta attcattacg
aagtttaatt 300ctttgagcgt caaactttta aattgaagag tttgatcatg gctcagattg
aacgctggcg 360gcaggcctaa cacatgcaag tcgaacggta acaggaagaa gcttgcttct
ttgctgacga 420gtggcggacg ggtgagtaat gtctgggaaa ctgcctgatg gagggggata
actactggaa 480acggtagcta ataccgcata acgtcgcaag accaaagagg gggaccttcg
ggcctcttgc 540catcggatgt gcccagatgg gattagctag taggtggggt aacggctcac
ctaggcgacg 600atccctagct ggtctgagag gatgaccagc cacactggaa ctgagacacg
gtccagactc 660ctacgggagg cagcagtggg gaatattgca caatgggcgc aagcctgatg
cagccatgcc 720gcgtgtatga agaaggcctt cgggttgtaa agtactttca gcggggagga
agggagtaaa 780gttaatacct ttgctcattg acgttacccg cagaagaagc accggctaac
tccgtgccag 840cagccgcggt aatacggagg gtgcaagcgt taatcggaat tactgggcgt
aaagcgcacg 900caggcggttt gttaagtcag atgtgaaatc cccgggctca acctgggaac
tgcatctgat 960actggcaagc ttgagtctcg tagagggggg tagaattcca ggtgtagcgg
tgaaatgcgt 1020agagatctgg aggaataccg gtggcgaagg cggccccctg gacgaagact
gacgctcagg 1080tgcgaaagcg tggggagcaa acaggattag ataccctggt agtccacgcc
gtaaacgatg 1140tcgacttgga ggttgtgccc ttgaggcgtg gcttccggag ctaacgcgtt
aagtcgaccg 1200cctggggagt acggccgcaa ggttaaaact caaatgaatt gacgggggcc
cgcacaagcg 1260gtggagcatg tggtttaatt cgatgcaacg cgaagaacct tacctggtct
tgacatccac 1320ggaagttttc agagatgaga atgtgccttc gggaaccgtg agacaggtgc
tgcatggctg 1380tcgtcagctc gtgttgtgaa atgttgggtt aagtcccgca acgagcgcaa
cccttatcct 1440ttgttgccag cggtccggcc gggaactcaa aggagactgc cagtgataaa
ctggaggaag 1500gtggggatga cgtcaagtca tcatggccct tacgaccagg gctacacacg
tgctacaatg 1560gcgcatacaa agagaagcga cctcgcgaga gcaagcggac ctcataaagt
gcgtcgtagt 1620ccggattgga gtctgcaact cgactccatg aagtcggaat cgctagtaat
cgtggatcag 1680aatgccacgg tgaatacgtt cccgggcctt gtacacaccg cccgtcacac
catgggagtg 1740ggttgcaaaa gaagtaggta gctcuguagg cgaacuacua acucguacgu
augggauuau 1800auucgauaua uggaccuaug gacccccgtt gagctaaccg gtactaatga
accgtgaggc 1860ttaaccgaga ggttaagcga ctaagcgtac acggtggatg ccctggcagt
cagaggcgat 1920gaaggacgtg ctaatctgcg ataagcgtcg gtaaggtgat atgaaccgtt
ataaccggcg 1980atttccgaat ggggaaaccc agtgtgtttc gacacactat cattaactga
atccataggt 2040taatgaggcg aaccggggga actgaaacat ctaagtaccc cgaggaaaag
aaatcaaccg 2100agattccccc agtagcggcg agcgaacggg gagcagccca gagcctgaat
cagtgtgtgt 2160gttagtggaa gcgtctggaa aggcgcgcga tacagggtga cagccccgta
cacaaaaatg 2220cacatgctgt gagctcgatg agtagggcgg gacacgtggt atcctgtctg
aatatggggg 2280gaccatcctc caaggctaaa tactcctgac tgaccgatag tgaaccagta
ccgtgaggga 2340aaggcgaaaa gaaccccggc gaggggagtg aaaaagaacc tgaaaccgtg
tacgtacaag 2400cagtgggagc acgcttaggc gtgtgactgc gtaccttttg tataatgggt
cagcgactta 2460tattctgtag caaggttaac cgaatagggg agccgaaggg aaaccgagtc
ttaactgggc 2520gttaagttgc agggtataga cccgaaaccc ggtgatctag ccatgggcag
gttgaaggtt 2580gggtaacact aactggagga ccgaaccgac taatgttgaa aaattagcgg
atgacttgtg 2640gctgggggtg aaaggccaat caaaccggga gatagctggt tctccccgaa
agctatttag 2700gtagcgcctc gtgaattcat ctccgggggt agagcactgt ttcggcaagg
gggtcatccc 2760gacttaccaa cccgatgcaa actgcgaata ccggagaatg ttatcacggg
agacacacgg 2820cgggtgctaa cgtccgtcgt gaagagggaa acaacccaga ccgccagcta
aggtcccaaa 2880gtcatggtta agtgggaaac gatgtgggaa ggcccagaca gccaggatgt
tggcttagaa 2940gcagccatca tttaaagaaa gcgtaatagc tcactggtcg agtcggcctg
cgcggaagat 3000gtaacggggc taaaccatgc accgaagctg cggcagcgac gcttatgcgt
tgttgggtag 3060gggagcgttc tgtaagcctg cgaaggtgtg ctgtgaggca tgctggaggt
atcagaagtg 3120cgaatgctga cataagtaac gataaagcgg gtgaaaagcc cgctcgccgg
aagaccaagg 3180gttcctgtcc aacgttaatc ggggcagggt gagtcgaccc ctaaggcgag
gccgaaaggc 3240gtagtcgatg ggaaacaggt taatattcct gtacttggtg ttactgcgaa
ggggggacgg 3300agaaggctat gttggccggg cgacggttgt cccggtttaa gcgtgtaggc
tggttttcca 3360ggcaaatccg gaaaatcaag gctgaggcgt gatgacgagg cactacggtg
ctgaagcaac 3420aaatgccctg cttccaggaa aagcctctaa gcatcaggta acatcaaatc
gtaccccaaa 3480ccgacacagg tggtcaggta gagaatacca aggcgcttga gagaactcgg
gtgaaggaac 3540taggcaaaat ggtgccgtaa cttcgggaga aggcacgctg atatgtaggt
gaggtccctc 3600gcggatggag ctgaaatcag tcgaagatac cagctggctg caactgttta
ttaaaaacac 3660agcactgtgc aaacacgaaa gtggacgtat acggtgtgac gcctgcccgg
tgccggaagg 3720ttaattgatg gggttagcgc aagcgaagct cttgatcgaa gccccggtaa
acggcggccg 3780taactataac ggtcctaagg tagcgaaatt ccttgtcggg taagttccga
cctgcacgaa 3840tggcgtaatg atggccaggc tgtctccacc cgagactcag tgaaattgaa
ctcgctgtga 3900agatgcagtg tacccgcggc aagacggaaa gaccccgtga acctttacta
tagcttgaca 3960ctgaacattg agccttgatg tgtaggatag gtgggaggct ttgaagtgtg
gacgccagtc 4020tgcatggagc cgaccttgaa ataccaccct ttaatgtttg atgttctaac
gttgacccgt 4080aatccgggtt gcggacagtg tctggtgggt agtttgactg gggcggtctc
ctcctaaaga 4140gtaacggagg agcacgaagg ttggctaatc ctggtcggac atcaggaggt
tagtgcaatg 4200gcataagcca gcttgactgc gagcgtgacg gcgcgagcag gtgcgaaagc
aggtcatagt 4260gatccggtgg ttctgaatgg aagggccatc gctcaacgga taaaaggtac
tccggggata 4320acaggctgat accgcccaag agttcatatc gacggcggtg tttggcacct
cgatgtcggc 4380tcatcacatc ctggggctga agtaggtccc aagggtatgg ctgttcgcca
tttaaagtgg 4440tacgcgagct gggtttagaa cgtcgtgaga cagttcggtc cctatctgcc
gtgggcgctg 4500gagaactgag gggggctgct cctagtacga gaggaccgga gtggacgcat
cactggtgtt 4560cgggttgtca tgccaatggc actgcccggt agctaaatgc ggaagagata
agtgctgaaa 4620gcatctaagc acgaaacttg ccccgagatg agttctccct gaccctttaa
gggtcctgaa 4680ggaacgttga agacgacgac gttgataggc cgggtgtgta agcggggguc
cauaggaugg 4740aaguauaucg aauauaaucc cauacgcaaa uuagcgccua uugcggcgct
taccactttg 4800tgattcatga ctggggtgaa gtcgtaacaa ggtaaccgta ggggaacctg
cggttggatc 4860acctccttac cttaaagaag cgtactttgt agtgctcaca cagattgtct
gatagaaagt 4920gaaaagcaag gcgtttacgc gttgggagtg aggctgaaga gaataaggcc
gttcgctttc 4980tattaatgaa agctcaccct acacgaaaat atcacgcaac gcgtgataag
caattttcgt 5040gtccccttcg tctagaggcc caggacaccg ccctttcacg gcggtaacag
gggttcgaat 5100cccctagggg acgccacttg ctggtttgtg agtgaaagtc gccgacctta
atatctcaaa 5160actcatcttc gggtgatgtt tgagatattt gctctttaaa aatctggatc
aagctgaaaa 5220ttgaaacact gaacaacgag agttgttcgt gagtctctca aattttcgca
acacgatgat 5280gaatcgaaag aaacatcttc gggttgtgag cttaagctta caacgccgaa
gctgttttgg 5340cggatgagag aagattttca gcctgataca gattaaatca gaacgcagaa
gcggtctgat 5400aaaacagaat ttgcctggcg gcagtagcgc ggtggtccca cctgacccca
tgccgaactc 5460agaagtgaaa cgccgtagcg ccgatggtag tgtggggtct ccccatgcga
gagtagggaa 5520ctgccaggca tcaaataaaa cgaaaggctc agtcgaaaga ctgggccttt
cgttttatct 5580gttgtttgtc ggtgaacgct ctcctgagta ggacaaatcc gccgggagcg
gatttgaacg 5640ttgcgaagca acggcccgga gggtggcggg caggacgccc gccataaact
gccaggcatc 5700aaattaagca gaaggccatc ctgacggatg gcctttttgc gtttctacaa
actcttcctg 5760tcgtcatatc tacaagccgg cgcgccggga aatgtgcgcg gaacccctat
ttgtttattt 5820ttctaaatac attcaaatat gtatccgctc atgagacaat aaccctgata
aatgcttcaa 5880taatattgaa aaaggaagag tatgagtatt caacatttcc gtgtcgccct
tattcccttt 5940tttgcggcat tttgccttcc tgtttttgct cacccagaaa cgctggtgaa
agtaaaagat 6000gctgaagatc agttgggtgc acgagtgggt tacatcgaac tggatctcaa
cagcggtaag 6060atccttgaga gttttcgccc cgaagaacgt tttccaatga tgagcacttt
taaagttctg 6120ctatgtggcg cggtattatc ccgtgttgac gccgggcaag agcaactcgg
tcgccgcata 6180cactattctc agaatgactt ggttgagtac tcaccagtca cagaaaagca
tcttacggat 6240ggcatgacag taagagaatt atgcagtgct gcaataacca tgagtgataa
cactgcggcc 6300aacttacttc tgacaacgat cggaggaccg aaggagctaa ccgctttttt
gcacaacatg 6360ggggatcatg taactcgcct tgatcgttgg gaaccggagc tgaatgaagc
cataccaaac 6420gacgagcgtg acaccacgat gcctgcagca atggcaacaa cgttgcgcaa
actattaact 6480ggcgaactac ttactctagc ttcccggcaa caattaatag actggatgga
ggcggataaa 6540gttgcaggac cacttctgcg ctcggccctt ccggctagct ggtttattgc
tgataaatct 6600ggagccggtg agcgtgggtc tcgcggtatc attgcagcac tggggccaga
tggtaagccc 6660tcccgtatcg tagttatcta cacgacgggg agtcaggcaa ctatggatga
acgaaataga 6720cagatcgctg agataggtgc ctcactgatt aagcattggt aactgcagac
caagtttact 6780catatatact ttagattgat ttaaaacttc atttttaatt taaaaggatc
taggtgaaga 6840tcctttttga taatctcatg accaaaatcc cttaacgtga gttttcgttc
cactgagcgt 6900cagaccccgt agaaaagatc aaaggatctt cttgagatcc tttttttctg
cgcgtaatct 6960gctgcttgca aacaaaaaaa ccaccgctac cagcggtggt ttgtttgccg
gatcaagagc 7020taccaactct ttttccgaag gtaactggct tcagcagagc gcagatacca
aatactgtcc 7080ttctagtgta gccgtagtta ggccaccact tcaagaactc tgtagcaccg
cctacatacc 7140tcgctctgct aatcctgtta ccagtggctg ctgccagtgg cgataagtcg
tgtcttaccg 7200ggttggactc aagacgatag ttaccggata aggcgcagcg gtcgggctga
acggggggtt 7260cgtgcacaca gcccagcttg gagcgaacga cctacaccga actgagatac
ctacagcgtg 7320agctatgaga aagcgccacg cttcccgaag ggagaaaggc ggacaggtat
ccggtaagcg 7380gcagggtcgg aacaggagag cgcacgaggg agcttccagg gggaaacgcc
tggtatcttt 7440atagtcctgt cgggtttcgc cacctctgac ttgagcgtcg atttttgtga
tgctcgtcag 7500gggggcggag cctatggaaa aacgccagca acgcggcctt tttacggttc
ctggcctttt 7560gctgg
7565
SEQ ID NO: 23
Length: 7564
Type: DNA
Organism: Artificial Sequence
Other information: Designed sequence in accordance with embodiments
Sequence: 23
gcggccgcga tctctcacct accaaacaat
gcccccctgc aaaaaataaa ttcatataaa 60aaacatacag ataaccatct gcggtgataa
attatctctg gcggtgttga cataaatacc 120actggcggtg atactgagca cgggtaccgg
ccgctgagaa aaagcgaagc ggcactgctc 180tttaacaatt tatcagacaa tctgtgtggg
cactcgaaga tacggattct taacgtcgca 240agacgaaaaa tgaataccaa gtctcaagag
tgaacacgta attcattacg aagtttaatt 300ctttgagcgt caaactttta aattgaagag
tttgatcatg gctcagattg aacgctggcg 360gcaggcctaa cacatgcaag tcgaacggta
acaggaagaa gcttgcttct ttgctgacga 420gtggcggacg ggtgagtaat gtctgggaaa
ctgcctgatg gagggggata actactggaa 480acggtagcta ataccgcata acgtcgcaag
accaaagagg gggaccttcg ggcctcttgc 540catcggatgt gcccagatgg gattagctag
taggtggggt aacggctcac ctaggcgacg 600atccctagct ggtctgagag gatgaccagc
cacactggaa ctgagacacg gtccagactc 660ctacgggagg cagcagtggg gaatattgca
caatgggcgc aagcctgatg cagccatgcc 720gcgtgtatga agaaggcctt cgggttgtaa
agtactttca gcggggagga agggagtaaa 780gttaatacct ttgctcattg acgttacccg
cagaagaagc accggctaac tccgtgccag 840cagccgcggt aatacggagg gtgcaagcgt
taatcggaat tactgggcgt aaagcgcacg 900caggcggttt gttaagtcag atgtgaaatc
cccgggctca acctgggaac tgcatctgat 960actggcaagc ttgagtctcg tagagggggg
tagaattcca ggtgtagcgg tgaaatgcgt 1020agagatctgg aggaataccg gtggcgaagg
cggccccctg gacgaagact gacgctcagg 1080tgcgaaagcg tggggagcaa acaggattag
ataccctggt agtccacgcc gtaaacgatg 1140tcgacttgga ggttgtgccc ttgaggcgtg
gcttccggag ctaacgcgtt aagtcgaccg 1200cctggggagt acggccgcaa ggttaaaact
caaatgaatt gacgggggcc cgcacaagcg 1260gtggagcatg tggtttaatt cgatgcaacg
cgaagaacct tacctggtct tgacatccac 1320ggaagttttc agagatgaga atgtgccttc
gggaaccgtg agacaggtgc tgcatggctg 1380tcgtcagctc gtgttgtgaa atgttgggtt
aagtcccgca acgagcgcaa cccttatcct 1440ttgttgccag cggtccggcc gggaactcaa
aggagactgc cagtgataaa ctggaggaag 1500gtggggatga cgtcaagtca tcatggccct
tacgaccagg gctacacacg tgctacaatg 1560gcgcatacaa agagaagcga cctcgcgaga
gcaagcggac ctcataaagt gcgtcgtagt 1620ccggattgga gtctgcaact cgactccatg
aagtcggaat cgctagtaat cgtggatcag 1680aatgccacgg tgaatacgtt cccgggcctt
gtacacaccg cccgtcacac catgggagtg 1740ggttgcaaaa gaagtaggta gctgtctacc
gccccaggga gatatccccc cacaggccat 1800ataagacagc gtatacccgc cgttgagcta
accggtacta atgaaccgtg aggcttaacc 1860gagaggttaa gcgactaagc gtacacggtg
gatgccctgg cagtcagagg cgatgaagga 1920cgtgctaatc tgcgataagc gtcggtaagg
tgatatgaac cgttataacc ggcgatttcc 1980gaatggggaa acccagtgtg tttcgacaca
ctatcattaa ctgaatccat aggttaatga 2040ggcgaaccgg gggaactgaa acatctaagt
accccgagga aaagaaatca accgagattc 2100ccccagtagc ggcgagcgaa cggggagcag
cccagagcct gaatcagtgt gtgtgttagt 2160ggaagcgtct ggaaaggcgc gcgatacagg
gtgacagccc cgtacacaaa aatgcacatg 2220ctgtgagctc gatgagtagg gcgggacacg
tggtatcctg tctgaatatg gggggaccat 2280cctccaaggc taaatactcc tgactgaccg
atagtgaacc agtaccgtga gggaaaggcg 2340aaaagaaccc cggcgagggg agtgaaaaag
aacctgaaac cgtgtacgta caagcagtgg 2400gagcacgctt aggcgtgtga ctgcgtacct
tttgtataat gggtcagcga cttatattct 2460gtagcaaggt taaccgaata ggggagccga
agggaaaccg agtcttaact gggcgttaag 2520ttgcagggta tagacccgaa acccggtgat
ctagccatgg gcaggttgaa ggttgggtaa 2580cactaactgg aggaccgaac cgactaatgt
tgaaaaatta gcggatgact tgtggctggg 2640ggtgaaaggc caatcaaacc gggagatagc
tggttctccc cgaaagctat ttaggtagcg 2700cctcgtgaat tcatctccgg gggtagagca
ctgtttcggc aagggggtca tcccgactta 2760ccaacccgat gcaaactgcg aataccggag
aatgttatca cgggagacac acggcgggtg 2820ctaacgtccg tcgtgaagag ggaaacaacc
cagaccgcca gctaaggtcc caaagtcatg 2880gttaagtggg aaacgatgtg ggaaggccca
gacagccagg atgttggctt agaagcagcc 2940atcatttaaa gaaagcgtaa tagctcactg
gtcgagtcgg cctgcgcgga agatgtaacg 3000gggctaaacc atgcaccgaa gctgcggcag
cgacgcttat gcgttgttgg gtaggggagc 3060gttctgtaag cctgcgaagg tgtgctgtga
ggcatgctgg aggtatcaga agtgcgaatg 3120ctgacataag taacgataaa gcgggtgaaa
agcccgctcg ccggaagacc aagggttcct 3180gtccaacgtt aatcggggca gggtgagtcg
acccctaagg cgaggccgaa aggcgtagtc 3240gatgggaaac aggttaatat tcctgtactt
ggtgttactg cgaagggggg acggagaagg 3300ctatgttggc cgggcgacgg ttgtcccggt
ttaagcgtgt aggctggttt tccaggcaaa 3360tccggaaaat caaggctgag gcgtgatgac
gaggcactac ggtgctgaag caacaaatgc 3420cctgcttcca ggaaaagcct ctaagcatca
ggtaacatca aatcgtaccc caaaccgaca 3480caggtggtca ggtagagaat accaaggcgc
ttgagagaac tcgggtgaag gaactaggca 3540aaatggtgcc gtaacttcgg gagaaggcac
gctgatatgt aggtgaggtc cctcgcggat 3600ggagctgaaa tcagtcgaag ataccagctg
gctgcaactg tttattaaaa acacagcact 3660gtgcaaacac gaaagtggac gtatacggtg
tgacgcctgc ccggtgccgg aaggttaatt 3720gatggggtta gcgcaagcga agctcttgat
cgaagccccg gtaaacggcg gccgtaacta 3780taacggtcct aaggtagcga aattccttgt
cgggtaagtt ccgacctgca cgaatggcgt 3840aatgatggcc aggctgtctc cacccgagac
tcagtgaaat tgaactcgct gtgaagatgc 3900agtgtacccg cggcaagacg gaaagacccc
gtgaaccttt actatagctt gacactgaac 3960attgagcctt gatgtgtagg ataggtggga
ggctttgaag tgtggacgcc agtctgcatg 4020gagccgacct tgaaatacca ccctttaatg
tttgatgttc taacgttgac ccgtaatccg 4080ggttgcggac agtgtctggt gggtagtttg
actggggcgg tctcctccta aagagtaacg 4140gaggagcacg aaggttggct aatcctggtc
ggacatcagg aggttagtgc aatggcataa 4200gccagcttga ctgcgagcgt gacggcgcga
gcaggtgcga aagcaggtca tagtgatccg 4260gtggttctga atggaagggc catcgctcaa
cggataaaag gtactccggg gataacaggc 4320tgataccgcc caagagttca tatcgacggc
ggtgtttggc acctcgatgt cggctcatca 4380catcctgggg ctgaagtagg tcccaagggt
atggctgttc gccatttaaa gtggtacgcg 4440agctgggttt agaacgtcgt gagacagttc
ggtccctatc tgccgtgggc gctggagaac 4500tgaggggggc tgctcctagt acgagaggac
cggagtggac gcatcactgg tgttcgggtt 4560gtcatgccaa tggcactgcc cggtagctaa
atgcggaaga gataagtgct gaaagcatct 4620aagcacgaaa cttgccccga gatgagttct
ccctgaccct ttaagggtcc tgaaggaacg 4680ttgaagacga cgacgttgat aggccgggtg
tgtaagcggc gggtatacgc aatgacgtat 4740ggtggcctgt ggggggatat ctgtgaactg
gggcgaagta gccggcgctt accactttgt 4800gattcatgac tggggtgaag tcgtaacaag
gtaaccgtag gggaacctgc ggttggatca 4860cctccttacc ttaaagaagc gtactttgta
gtgctcacac agattgtctg atagaaagtg 4920aaaagcaagg cgtttacgcg ttgggagtga
ggctgaagag aataaggccg ttcgctttct 4980attaatgaaa gctcacccta cacgaaaata
tcacgcaacg cgtgataagc aattttcgtg 5040tccccttcgt ctagaggccc aggacaccgc
cctttcacgg cggtaacagg ggttcgaatc 5100ccctagggga cgccacttgc tggtttgtga
gtgaaagtcg ccgaccttaa tatctcaaaa 5160ctcatcttcg ggtgatgttt gagatatttg
ctctttaaaa atctggatca agctgaaaat 5220tgaaacactg aacaacgaga gttgttcgtg
agtctctcaa attttcgcaa cacgatgatg 5280aatcgaaaga aacatcttcg ggttgtgagc
ttaagcttac aacgccgaag ctgttttggc 5340ggatgagaga agattttcag cctgatacag
attaaatcag aacgcagaag cggtctgata 5400aaacagaatt tgcctggcgg cagtagcgcg
gtggtcccac ctgaccccat gccgaactca 5460gaagtgaaac gccgtagcgc cgatggtagt
gtggggtctc cccatgcgag agtagggaac 5520tgccaggcat caaataaaac gaaaggctca
gtcgaaagac tgggcctttc gttttatctg 5580ttgtttgtcg gtgaacgctc tcctgagtag
gacaaatccg ccgggagcgg atttgaacgt 5640tgcgaagcaa cggcccggag ggtggcgggc
aggacgcccg ccataaactg ccaggcatca 5700aattaagcag aaggccatcc tgacggatgg
cctttttgcg tttctacaaa ctcttcctgt 5760cgtcatatct acaagccggc gcgccgggaa
atgtgcgcgg aacccctatt tgtttatttt 5820tctaaataca ttcaaatatg tatccgctca
tgagacaata accctgataa atgcttcaat 5880aatattgaaa aaggaagagt atgagtattc
aacatttccg tgtcgccctt attccctttt 5940ttgcggcatt ttgccttcct gtttttgctc
acccagaaac gctggtgaaa gtaaaagatg 6000ctgaagatca gttgggtgca cgagtgggtt
acatcgaact ggatctcaac agcggtaaga 6060tccttgagag ttttcgcccc gaagaacgtt
ttccaatgat gagcactttt aaagttctgc 6120tatgtggcgc ggtattatcc cgtgttgacg
ccgggcaaga gcaactcggt cgccgcatac 6180actattctca gaatgacttg gttgagtact
caccagtcac agaaaagcat cttacggatg 6240gcatgacagt aagagaatta tgcagtgctg
caataaccat gagtgataac actgcggcca 6300acttacttct gacaacgatc ggaggaccga
aggagctaac cgcttttttg cacaacatgg 6360gggatcatgt aactcgcctt gatcgttggg
aaccggagct gaatgaagcc ataccaaacg 6420acgagcgtga caccacgatg cctgcagcaa
tggcaacaac gttgcgcaaa ctattaactg 6480gcgaactact tactctagct tcccggcaac
aattaataga ctggatggag gcggataaag 6540ttgcaggacc acttctgcgc tcggcccttc
cggctagctg gtttattgct gataaatctg 6600gagccggtga gcgtgggtct cgcggtatca
ttgcagcact ggggccagat ggtaagccct 6660cccgtatcgt agttatctac acgacgggga
gtcaggcaac tatggatgaa cgaaatagac 6720agatcgctga gataggtgcc tcactgatta
agcattggta actgcagacc aagtttactc 6780atatatactt tagattgatt taaaacttca
tttttaattt aaaaggatct aggtgaagat 6840cctttttgat aatctcatga ccaaaatccc
ttaacgtgag ttttcgttcc actgagcgtc 6900agaccccgta gaaaagatca aaggatcttc
ttgagatcct ttttttctgc gcgtaatctg 6960ctgcttgcaa acaaaaaaac caccgctacc
agcggtggtt tgtttgccgg atcaagagct 7020accaactctt tttccgaagg taactggctt
cagcagagcg cagataccaa atactgtcct 7080tctagtgtag ccgtagttag gccaccactt
caagaactct gtagcaccgc ctacatacct 7140cgctctgcta atcctgttac cagtggctgc
tgccagtggc gataagtcgt gtcttaccgg 7200gttggactca agacgatagt taccggataa
ggcgcagcgg tcgggctgaa cggggggttc 7260gtgcacacag cccagcttgg agcgaacgac
ctacaccgaa ctgagatacc tacagcgtga 7320gctatgagaa agcgccacgc ttcccgaagg
gagaaaggcg gacaggtatc cggtaagcgg 7380cagggtcgga acaggagagc gcacgaggga
gcttccaggg ggaaacgcct ggtatcttta 7440tagtcctgtc gggtttcgcc acctctgact
tgagcgtcga tttttgtgat gctcgtcagg 7500ggggcggagc ctatggaaaa acgccagcaa
cgcggccttt ttacggttcc tggccttttg 7560ctgg
7564
SEQ ID NO: 24
Length: 7551
Type: DNA
Organism: Artificial Sequence
Other information: Designed sequence in accordance with embodiments
Sequence: 24
gcggccgcga tctctcacct
accaaacaat gcccccctgc aaaaaataaa ttcatataaa 60aaacatacag ataaccatct
gcggtgataa attatctctg gcggtgttga cataaatacc 120actggcggtg atactgagca
cgggtaccgg ccgctgagaa aaagcgaagc ggcactgctc 180tttaacaatt tatcagacaa
tctgtgtggg cactcgaaga tacggattct taacgtcgca 240agacgaaaaa tgaataccaa
gtctcaagag tgaacacgta attcattacg aagtttaatt 300ctttgagcgt caaactttta
aattgaagag tttgatcatg gctcagattg aacgctggcg 360gcaggcctaa cacatgcaag
tcgaacggta acaggaagaa gcttgcttct ttgctgacga 420gtggcggacg ggtgagtaat
gtctgggaaa ctgcctgatg gagggggata actactggaa 480acggtagcta ataccgcata
acgtcgcaag accaaagagg gggaccttcg ggcctcttgc 540catcggatgt gcccagatgg
gattagctag taggtggggt aacggctcac ctaggcgacg 600atccctagct ggtctgagag
gatgaccagc cacactggaa ctgagacacg gtccagactc 660ctacgggagg cagcagtggg
gaatattgca caatgggcgc aagcctgatg cagccatgcc 720gcgtgtatga agaaggcctt
cgggttgtaa agtactttca gcggggagga agggagtaaa 780gttaatacct ttgctcattg
acgttacccg cagaagaagc accggctaac tccgtgccag 840cagccgcggt aatacggagg
gtgcaagcgt taatcggaat tactgggcgt aaagcgcacg 900caggcggttt gttaagtcag
atgtgaaatc cccgggctca acctgggaac tgcatctgat 960actggcaagc ttgagtctcg
tagagggggg tagaattcca ggtgtagcgg tgaaatgcgt 1020agagatctgg aggaataccg
gtggcgaagg cggccccctg gacgaagact gacgctcagg 1080tgcgaaagcg tggggagcaa
acaggattag ataccctggt agtccacgcc gtaaacgatg 1140tcgacttgga ggttgtgccc
ttgaggcgtg gcttccggag ctaacgcgtt aagtcgaccg 1200cctggggagt acggccgcaa
ggttaaaact caaatgaatt gacgggggcc cgcacaagcg 1260gtggagcatg tggtttaatt
cgatgcaacg cgaagaacct tacctggtct tgacatccac 1320ggaagttttc agagatgaga
atgtgccttc gggaaccgtg agacaggtgc tgcatggctg 1380tcgtcagctc gtgttgtgaa
atgttgggtt aagtcccgca acgagcgcaa cccttatcct 1440ttgttgccag cggtccggcc
gggaactcaa aggagactgc cagtgataaa ctggaggaag 1500gtggggatga cgtcaagtca
tcatggccct tacgaccagg gctacacacg tgctacaatg 1560gcgcatacaa agagaagcga
cctcgcgaga gcaagcggac ctcataaagt gcgtcgtagt 1620ccggattgga gtctgcaact
cgactccatg aagtcggaat cgctagtaat cgtggatcag 1680aatgccacgg tgaatacgtt
cccgggcctt gtacacaccg cccgtcacac catgggagtg 1740ggttgcaaaa gaagtaggta
gctgucugac ucguacguag ggcgucacag gagauacuaa 1800gacagaauau accccccgtt
gagctaaccg gtactaatga accgtgaggc ttaaccgaga 1860ggttaagcga ctaagcgtac
acggtggatg ccctggcagt cagaggcgat gaaggacgtg 1920ctaatctgcg ataagcgtcg
gtaaggtgat atgaaccgtt ataaccggcg atttccgaat 1980ggggaaaccc agtgtgtttc
gacacactat cattaactga atccataggt taatgaggcg 2040aaccggggga actgaaacat
ctaagtaccc cgaggaaaag aaatcaaccg agattccccc 2100agtagcggcg agcgaacggg
gagcagccca gagcctgaat cagtgtgtgt gttagtggaa 2160gcgtctggaa aggcgcgcga
tacagggtga cagccccgta cacaaaaatg cacatgctgt 2220gagctcgatg agtagggcgg
gacacgtggt atcctgtctg aatatggggg gaccatcctc 2280caaggctaaa tactcctgac
tgaccgatag tgaaccagta ccgtgaggga aaggcgaaaa 2340gaaccccggc gaggggagtg
aaaaagaacc tgaaaccgtg tacgtacaag cagtgggagc 2400acgcttaggc gtgtgactgc
gtaccttttg tataatgggt cagcgactta tattctgtag 2460caaggttaac cgaatagggg
agccgaaggg aaaccgagtc ttaactgggc gttaagttgc 2520agggtataga cccgaaaccc
ggtgatctag ccatgggcag gttgaaggtt gggtaacact 2580aactggagga ccgaaccgac
taatgttgaa aaattagcgg atgacttgtg gctgggggtg 2640aaaggccaat caaaccggga
gatagctggt tctccccgaa agctatttag gtagcgcctc 2700gtgaattcat ctccgggggt
agagcactgt ttcggcaagg gggtcatccc gacttaccaa 2760cccgatgcaa actgcgaata
ccggagaatg ttatcacggg agacacacgg cgggtgctaa 2820cgtccgtcgt gaagagggaa
acaacccaga ccgccagcta aggtcccaaa gtcatggtta 2880agtgggaaac gatgtgggaa
ggcccagaca gccaggatgt tggcttagaa gcagccatca 2940tttaaagaaa gcgtaatagc
tcactggtcg agtcggcctg cgcggaagat gtaacggggc 3000taaaccatgc accgaagctg
cggcagcgac gcttatgcgt tgttgggtag gggagcgttc 3060tgtaagcctg cgaaggtgtg
ctgtgaggca tgctggaggt atcagaagtg cgaatgctga 3120cataagtaac gataaagcgg
gtgaaaagcc cgctcgccgg aagaccaagg gttcctgtcc 3180aacgttaatc ggggcagggt
gagtcgaccc ctaaggcgag gccgaaaggc gtagtcgatg 3240ggaaacaggt taatattcct
gtacttggtg ttactgcgaa ggggggacgg agaaggctat 3300gttggccggg cgacggttgt
cccggtttaa gcgtgtaggc tggttttcca ggcaaatccg 3360gaaaatcaag gctgaggcgt
gatgacgagg cactacggtg ctgaagcaac aaatgccctg 3420cttccaggaa aagcctctaa
gcatcaggta acatcaaatc gtaccccaaa ccgacacagg 3480tggtcaggta gagaatacca
aggcgcttga gagaactcgg gtgaaggaac taggcaaaat 3540ggtgccgtaa cttcgggaga
aggcacgctg atatgtaggt gaggtccctc gcggatggag 3600ctgaaatcag tcgaagatac
cagctggctg caactgttta ttaaaaacac agcactgtgc 3660aaacacgaaa gtggacgtat
acggtgtgac gcctgcccgg tgccggaagg ttaattgatg 3720gggttagcgc aagcgaagct
cttgatcgaa gccccggtaa acggcggccg taactataac 3780ggtcctaagg tagcgaaatt
ccttgtcggg taagttccga cctgcacgaa tggcgtaatg 3840atggccaggc tgtctccacc
cgagactcag tgaaattgaa ctcgctgtga agatgcagtg 3900tacccgcggc aagacggaaa
gaccccgtga acctttacta tagcttgaca ctgaacattg 3960agccttgatg tgtaggatag
gtgggaggct ttgaagtgtg gacgccagtc tgcatggagc 4020cgaccttgaa ataccaccct
ttaatgtttg atgttctaac gttgacccgt aatccgggtt 4080gcggacagtg tctggtgggt
agtttgactg gggcggtctc ctcctaaaga gtaacggagg 4140agcacgaagg ttggctaatc
ctggtcggac atcaggaggt tagtgcaatg gcataagcca 4200gcttgactgc gagcgtgacg
gcgcgagcag gtgcgaaagc aggtcatagt gatccggtgg 4260ttctgaatgg aagggccatc
gctcaacgga taaaaggtac tccggggata acaggctgat 4320accgcccaag agttcatatc
gacggcggtg tttggcacct cgatgtcggc tcatcacatc 4380ctggggctga agtaggtccc
aagggtatgg ctgttcgcca tttaaagtgg tacgcgagct 4440gggtttagaa cgtcgtgaga
cagttcggtc cctatctgcc gtgggcgctg gagaactgag 4500gggggctgct cctagtacga
gaggaccgga gtggacgcat cactggtgtt cgggttgtca 4560tgccaatggc actgcccggt
agctaaatgc ggaagagata agtgctgaaa gcatctaagc 4620acgaaacttg ccccgagatg
agttctccct gaccctttaa gggtcctgaa ggaacgttga 4680agacgacgac gttgataggc
cgggtgtgta agcggggggu auauucaaug acgguauccg 4740agcugugacg cuggccuacg
caaaucagcc ggcgcttacc actttgtgat tcatgactgg 4800ggtgaagtcg taacaaggta
accgtagggg aacctgcggt tggatcacct ccttacctta 4860aagaagcgta ctttgtagtg
ctcacacaga ttgtctgata gaaagtgaaa agcaaggcgt 4920ttacgcgttg ggagtgaggc
tgaagagaat aaggccgttc gctttctatt aatgaaagct 4980caccctacac gaaaatatca
cgcaacgcgt gataagcaat tttcgtgtcc ccttcgtcta 5040gaggcccagg acaccgccct
ttcacggcgg taacaggggt tcgaatcccc taggggacgc 5100cacttgctgg tttgtgagtg
aaagtcgccg accttaatat ctcaaaactc atcttcgggt 5160gatgtttgag atatttgctc
tttaaaaatc tggatcaagc tgaaaattga aacactgaac 5220aacgagagtt gttcgtgagt
ctctcaaatt ttcgcaacac gatgatgaat cgaaagaaac 5280atcttcgggt tgtgagctta
agcttacaac gccgaagctg ttttggcgga tgagagaaga 5340ttttcagcct gatacagatt
aaatcagaac gcagaagcgg tctgataaaa cagaatttgc 5400ctggcggcag tagcgcggtg
gtcccacctg accccatgcc gaactcagaa gtgaaacgcc 5460gtagcgccga tggtagtgtg
gggtctcccc atgcgagagt agggaactgc caggcatcaa 5520ataaaacgaa aggctcagtc
gaaagactgg gcctttcgtt ttatctgttg tttgtcggtg 5580aacgctctcc tgagtaggac
aaatccgccg ggagcggatt tgaacgttgc gaagcaacgg 5640cccggagggt ggcgggcagg
acgcccgcca taaactgcca ggcatcaaat taagcagaag 5700gccatcctga cggatggcct
ttttgcgttt ctacaaactc ttcctgtcgt catatctaca 5760agccggcgcg ccgggaaatg
tgcgcggaac ccctatttgt ttatttttct aaatacattc 5820aaatatgtat ccgctcatga
gacaataacc ctgataaatg cttcaataat attgaaaaag 5880gaagagtatg agtattcaac
atttccgtgt cgcccttatt cccttttttg cggcattttg 5940ccttcctgtt tttgctcacc
cagaaacgct ggtgaaagta aaagatgctg aagatcagtt 6000gggtgcacga gtgggttaca
tcgaactgga tctcaacagc ggtaagatcc ttgagagttt 6060tcgccccgaa gaacgttttc
caatgatgag cacttttaaa gttctgctat gtggcgcggt 6120attatcccgt gttgacgccg
ggcaagagca actcggtcgc cgcatacact attctcagaa 6180tgacttggtt gagtactcac
cagtcacaga aaagcatctt acggatggca tgacagtaag 6240agaattatgc agtgctgcaa
taaccatgag tgataacact gcggccaact tacttctgac 6300aacgatcgga ggaccgaagg
agctaaccgc ttttttgcac aacatggggg atcatgtaac 6360tcgccttgat cgttgggaac
cggagctgaa tgaagccata ccaaacgacg agcgtgacac 6420cacgatgcct gcagcaatgg
caacaacgtt gcgcaaacta ttaactggcg aactacttac 6480tctagcttcc cggcaacaat
taatagactg gatggaggcg gataaagttg caggaccact 6540tctgcgctcg gcccttccgg
ctagctggtt tattgctgat aaatctggag ccggtgagcg 6600tgggtctcgc ggtatcattg
cagcactggg gccagatggt aagccctccc gtatcgtagt 6660tatctacacg acggggagtc
aggcaactat ggatgaacga aatagacaga tcgctgagat 6720aggtgcctca ctgattaagc
attggtaact gcagaccaag tttactcata tatactttag 6780attgatttaa aacttcattt
ttaatttaaa aggatctagg tgaagatcct ttttgataat 6840ctcatgacca aaatccctta
acgtgagttt tcgttccact gagcgtcaga ccccgtagaa 6900aagatcaaag gatcttcttg
agatcctttt tttctgcgcg taatctgctg cttgcaaaca 6960aaaaaaccac cgctaccagc
ggtggtttgt ttgccggatc aagagctacc aactcttttt 7020ccgaaggtaa ctggcttcag
cagagcgcag ataccaaata ctgtccttct agtgtagccg 7080tagttaggcc accacttcaa
gaactctgta gcaccgccta catacctcgc tctgctaatc 7140ctgttaccag tggctgctgc
cagtggcgat aagtcgtgtc ttaccgggtt ggactcaaga 7200cgatagttac cggataaggc
gcagcggtcg ggctgaacgg ggggttcgtg cacacagccc 7260agcttggagc gaacgaccta
caccgaactg agatacctac agcgtgagct atgagaaagc 7320gccacgcttc ccgaagggag
aaaggcggac aggtatccgg taagcggcag ggtcggaaca 7380ggagagcgca cgagggagct
tccaggggga aacgcctggt atctttatag tcctgtcggg 7440tttcgccacc tctgacttga
gcgtcgattt ttgtgatgct cgtcaggggg gcggagccta 7500tggaaaaacg ccagcaacgc
ggccttttta cggttcctgg ccttttgctg g 7551257561DNAArtificial
SequenceDesigned sequence in accordance with embodiments
25gcggccgcga tctctcacct accaaacaat gcccccctgc aaaaaataaa ttcatataaa
60aaacatacag ataaccatct gcggtgataa attatctctg gcggtgttga cataaatacc
120actggcggtg atactgagca cgggtaccgg ccgctgagaa aaagcgaagc ggcactgctc
180tttaacaatt tatcagacaa tctgtgtggg cactcgaaga tacggattct taacgtcgca
240agacgaaaaa tgaataccaa gtctcaagag tgaacacgta attcattacg aagtttaatt
300ctttgagcgt caaactttta aattgaagag tttgatcatg gctcagattg aacgctggcg
360gcaggcctaa cacatgcaag tcgaacggta acaggaagaa gcttgcttct ttgctgacga
420gtggcggacg ggtgagtaat gtctgggaaa ctgcctgatg gagggggata actactggaa
480acggtagcta ataccgcata acgtcgcaag accaaagagg gggaccttcg ggcctcttgc
540catcggatgt gcccagatgg gattagctag taggtggggt aacggctcac ctaggcgacg
600atccctagct ggtctgagag gatgaccagc cacactggaa ctgagacacg gtccagactc
660ctacgggagg cagcagtggg gaatattgca caatgggcgc aagcctgatg cagccatgcc
720gcgtgtatga agaaggcctt cgggttgtaa agtactttca gcggggagga agggagtaaa
780gttaatacct ttgctcattg acgttacccg cagaagaagc accggctaac tccgtgccag
840cagccgcggt aatacggagg gtgcaagcgt taatcggaat tactgggcgt aaagcgcacg
900caggcggttt gttaagtcag atgtgaaatc cccgggctca acctgggaac tgcatctgat
960actggcaagc ttgagtctcg tagagggggg tagaattcca ggtgtagcgg tgaaatgcgt
1020agagatctgg aggaataccg gtggcgaagg cggccccctg gacgaagact gacgctcagg
1080tgcgaaagcg tggggagcaa acaggattag ataccctggt agtccacgcc gtaaacgatg
1140tcgacttgga ggttgtgccc ttgaggcgtg gcttccggag ctaacgcgtt aagtcgaccg
1200cctggggagt acggccgcaa ggttaaaact caaatgaatt gacgggggcc cgcacaagcg
1260gtggagcatg tggtttaatt cgatgcaacg cgaagaacct tacctggtct tgacatccac
1320ggaagttttc agagatgaga atgtgccttc gggaaccgtg agacaggtgc tgcatggctg
1380tcgtcagctc gtgttgtgaa atgttgggtt aagtcccgca acgagcgcaa cccttatcct
1440ttgttgccag cggtccggcc gggaactcaa aggagactgc cagtgataaa ctggaggaag
1500gtggggatga cgtcaagtca tcatggccct tacgaccagg gctacacacg tgctacaatg
1560gcgcatacaa agagaagcga cctcgcgaga gcaagcggac ctcataaagt gcgtcgtagt
1620ccggattgga gtctgcaact cgactccatg aagtcggaat cgctagtaat cgtggatcag
1680aatgccacgg tgaatacgtt cccgggcctt gtacacaccg cccgtcacac catgggagtg
1740ggttgcaaaa gaagtaggta gctgggtaac tgtcgtatag atggaagtgg attcacaccc
1800gatataagac agaatatgga ccccgttgag ctaaccggta ctaatgaacc gtgaggctta
1860accgagaggt taagcgacta agcgtacacg gtggatgccc tggcagtcag aggcgatgaa
1920ggacgtgcta atctgcgata agcgtcggta aggtgatatg aaccgttata accggcgatt
1980tccgaatggg gaaacccagt gtgtttcgac acactatcat taactgaatc cataggttaa
2040tgaggcgaac cgggggaact gaaacatcta agtaccccga ggaaaagaaa tcaaccgaga
2100ttcccccagt agcggcgagc gaacggggag cagcccagag cctgaatcag tgtgtgtgtt
2160agtggaagcg tctggaaagg cgcgcgatac agggtgacag ccccgtacac aaaaatgcac
2220atgctgtgag ctcgatgagt agggcgggac acgtggtatc ctgtctgaat atggggggac
2280catcctccaa ggctaaatac tcctgactga ccgatagtga accagtaccg tgagggaaag
2340gcgaaaagaa ccccggcgag gggagtgaaa aagaacctga aaccgtgtac gtacaagcag
2400tgggagcacg cttaggcgtg tgactgcgta ccttttgtat aatgggtcag cgacttatat
2460tctgtagcaa ggttaaccga ataggggagc cgaagggaaa ccgagtctta actgggcgtt
2520aagttgcagg gtatagaccc gaaacccggt gatctagcca tgggcaggtt gaaggttggg
2580taacactaac tggaggaccg aaccgactaa tgttgaaaaa ttagcggatg acttgtggct
2640gggggtgaaa ggccaatcaa accgggagat agctggttct ccccgaaagc tatttaggta
2700gcgcctcgtg aattcatctc cgggggtaga gcactgtttc ggcaaggggg tcatcccgac
2760ttaccaaccc gatgcaaact gcgaataccg gagaatgtta tcacgggaga cacacggcgg
2820gtgctaacgt ccgtcgtgaa gagggaaaca acccagaccg ccagctaagg tcccaaagtc
2880atggttaagt gggaaacgat gtgggaaggc ccagacagcc aggatgttgg cttagaagca
2940gccatcattt aaagaaagcg taatagctca ctggtcgagt cggcctgcgc ggaagatgta
3000acggggctaa accatgcacc gaagctgcgg cagcgacgct tatgcgttgt tgggtagggg
3060agcgttctgt aagcctgcga aggtgtgctg tgaggcatgc tggaggtatc agaagtgcga
3120atgctgacat aagtaacgat aaagcgggtg aaaagcccgc tcgccggaag accaagggtt
3180cctgtccaac gttaatcggg gcagggtgag tcgaccccta aggcgaggcc gaaaggcgta
3240gtcgatggga aacaggttaa tattcctgta cttggtgtta ctgcgaaggg gggacggaga
3300aggctatgtt ggccgggcga cggttgtccc ggtttaagcg tgtaggctgg ttttccaggc
3360aaatccggaa aatcaaggct gaggcgtgat gacgaggcac tacggtgctg aagcaacaaa
3420tgccctgctt ccaggaaaag cctctaagca tcaggtaaca tcaaatcgta ccccaaaccg
3480acacaggtgg tcaggtagag aataccaagg cgcttgagag aactcgggtg aaggaactag
3540gcaaaatggt gccgtaactt cgggagaagg cacgctgata tgtaggtgag gtccctcgcg
3600gatggagctg aaatcagtcg aagataccag ctggctgcaa ctgtttatta aaaacacagc
3660actgtgcaaa cacgaaagtg gacgtatacg gtgtgacgcc tgcccggtgc cggaaggtta
3720attgatgggg ttagcgcaag cgaagctctt gatcgaagcc ccggtaaacg gcggccgtaa
3780ctataacggt cctaaggtag cgaaattcct tgtcgggtaa gttccgacct gcacgaatgg
3840cgtaatgatg gccaggctgt ctccacccga gactcagtga aattgaactc gctgtgaaga
3900tgcagtgtac ccgcggcaag acggaaagac cccgtgaacc tttactatag cttgacactg
3960aacattgagc cttgatgtgt aggataggtg ggaggctttg aagtgtggac gccagtctgc
4020atggagccga ccttgaaata ccacccttta atgtttgatg ttctaacgtt gacccgtaat
4080ccgggttgcg gacagtgtct ggtgggtagt ttgactgggg cggtctcctc ctaaagagta
4140acggaggagc acgaaggttg gctaatcctg gtcggacatc aggaggttag tgcaatggca
4200taagccagct tgactgcgag cgtgacggcg cgagcaggtg cgaaagcagg tcatagtgat
4260ccggtggttc tgaatggaag ggccatcgct caacggataa aaggtactcc ggggataaca
4320ggctgatacc gcccaagagt tcatatcgac ggcggtgttt ggcacctcga tgtcggctca
4380tcacatcctg gggctgaagt aggtcccaag ggtatggctg ttcgccattt aaagtggtac
4440gcgagctggg tttagaacgt cgtgagacag ttcggtccct atctgccgtg ggcgctggag
4500aactgagggg ggctgctcct agtacgagag gaccggagtg gacgcatcac tggtgttcgg
4560gttgtcatgc caatggcact gcccggtagc taaatgcgga agagataagt gctgaaagca
4620tctaagcacg aaacttgccc cgagatgagt tctccctgac cctttaaggg tcctgaagga
4680acgttgaaga cgacgacgtt gataggccgg gtgtgtaagg gggtccatat tcaatgacgt
4740atcgaaggtg tgaatccatg gactatacga ttgttacccc ggcgcttacc actttgtgat
4800tcatgactgg ggtgaagtcg taacaaggta accgtagggg aacctgcggt tggatcacct
4860ccttacctta aagaagcgta ctttgtagtg ctcacacaga ttgtctgata gaaagtgaaa
4920agcaaggcgt ttacgcgttg ggagtgaggc tgaagagaat aaggccgttc gctttctatt
4980aatgaaagct caccctacac gaaaatatca cgcaacgcgt gataagcaat tttcgtgtcc
5040ccttcgtcta gaggcccagg acaccgccct ttcacggcgg taacaggggt tcgaatcccc
5100taggggacgc cacttgctgg tttgtgagtg aaagtcgccg accttaatat ctcaaaactc
5160atcttcgggt gatgtttgag atatttgctc tttaaaaatc tggatcaagc tgaaaattga
5220aacactgaac aacgagagtt gttcgtgagt ctctcaaatt ttcgcaacac gatgatgaat
5280cgaaagaaac atcttcgggt tgtgagctta agcttacaac gccgaagctg ttttggcgga
5340tgagagaaga ttttcagcct gatacagatt aaatcagaac gcagaagcgg tctgataaaa
5400cagaatttgc ctggcggcag tagcgcggtg gtcccacctg accccatgcc gaactcagaa
5460gtgaaacgcc gtagcgccga tggtagtgtg gggtctcccc atgcgagagt agggaactgc
5520caggcatcaa ataaaacgaa aggctcagtc gaaagactgg gcctttcgtt ttatctgttg
5580tttgtcggtg aacgctctcc tgagtaggac aaatccgccg ggagcggatt tgaacgttgc
5640gaagcaacgg cccggagggt ggcgggcagg acgcccgcca taaactgcca ggcatcaaat
5700taagcagaag gccatcctga cggatggcct ttttgcgttt ctacaaactc ttcctgtcgt
5760catatctaca agccggcgcg ccgggaaatg tgcgcggaac ccctatttgt ttatttttct
5820aaatacattc aaatatgtat ccgctcatga gacaataacc ctgataaatg cttcaataat
5880attgaaaaag gaagagtatg agtattcaac atttccgtgt cgcccttatt cccttttttg
5940cggcattttg ccttcctgtt tttgctcacc cagaaacgct ggtgaaagta aaagatgctg
6000aagatcagtt gggtgcacga gtgggttaca tcgaactgga tctcaacagc ggtaagatcc
6060ttgagagttt tcgccccgaa gaacgttttc caatgatgag cacttttaaa gttctgctat
6120gtggcgcggt attatcccgt gttgacgccg ggcaagagca actcggtcgc cgcatacact
6180attctcagaa tgacttggtt gagtactcac cagtcacaga aaagcatctt acggatggca
6240tgacagtaag agaattatgc agtgctgcaa taaccatgag tgataacact gcggccaact
6300tacttctgac aacgatcgga ggaccgaagg agctaaccgc ttttttgcac aacatggggg
6360atcatgtaac tcgccttgat cgttgggaac cggagctgaa tgaagccata ccaaacgacg
6420agcgtgacac cacgatgcct gcagcaatgg caacaacgtt gcgcaaacta ttaactggcg
6480aactacttac tctagcttcc cggcaacaat taatagactg gatggaggcg gataaagttg
6540caggaccact tctgcgctcg gcccttccgg ctagctggtt tattgctgat aaatctggag
6600ccggtgagcg tgggtctcgc ggtatcattg cagcactggg gccagatggt aagccctccc
6660gtatcgtagt tatctacacg acggggagtc aggcaactat ggatgaacga aatagacaga
6720tcgctgagat aggtgcctca ctgattaagc attggtaact gcagaccaag tttactcata
6780tatactttag attgatttaa aacttcattt ttaatttaaa aggatctagg tgaagatcct
6840ttttgataat ctcatgacca aaatccctta acgtgagttt tcgttccact gagcgtcaga
6900ccccgtagaa aagatcaaag gatcttcttg agatcctttt tttctgcgcg taatctgctg
6960cttgcaaaca aaaaaaccac cgctaccagc ggtggtttgt ttgccggatc aagagctacc
7020aactcttttt ccgaaggtaa ctggcttcag cagagcgcag ataccaaata ctgtccttct
7080agtgtagccg tagttaggcc accacttcaa gaactctgta gcaccgccta catacctcgc
7140tctgctaatc ctgttaccag tggctgctgc cagtggcgat aagtcgtgtc ttaccgggtt
7200ggactcaaga cgatagttac cggataaggc gcagcggtcg ggctgaacgg ggggttcgtg
7260cacacagccc agcttggagc gaacgaccta caccgaactg agatacctac agcgtgagct
7320atgagaaagc gccacgcttc ccgaagggag aaaggcggac aggtatccgg taagcggcag
7380ggtcggaaca ggagagcgca cgagggagct tccaggggga aacgcctggt atctttatag
7440tcctgtcggg tttcgccacc tctgacttga gcgtcgattt ttgtgatgct cgtcaggggg
7500gcggagccta tggaaaaacg ccagcaacgc ggccttttta cggttcctgg ccttttgctg
7560g
756126187DNAArtificial SequenceDesigned sequence in accordance with
embodiments 26ggaacagctc gagtagagct gaaagcgata tggtacgacc caggagtccg
gcatatcacg 60acgaaacaga cctgaggaaa ctcaggtctg tggagtgata tgggaagaaa
ctgcggacga 120gaaactgggt cgtacctaag tcgcaaacgc atcgagtaga tgcgaacaaa
gaaacaacaa 180caacaac
18727192DNAArtificial SequenceDesigned sequence in accordance
with embodiments 27ggaacagctc gagtagagct gaaagcgata tggacaccgt
atgtgcgtat acacggctca 60tcactctcat cggaccctgg aaacagggtc cgtgcgtgat
gagggaagaa actgcgtgta 120tactctcatc atacggtgtc ctaagtcgca aacgcatcga
gtagatgcga acaaagaaac 180aacaacaaca ac
19228196DNAArtificial SequenceDesigned sequence in
accordance with embodiments 28ggaacagctc gagtagagct gaaagcgata
tggcatggaa tcagctcaag gaactgtgaa 60cgtatatcgg gcaacgacta ggaaactagt
cgttgggaag aaactgccga tatacgggag 120ttccttgagc gggagattcc atgcctaagt
cgcaaacgca tcgagtagat gcgaacaaag 180aaacaacaac aacaac
19629197DNAArtificial SequenceDesigned
sequence in accordance with embodiments 29ggaacagctc gagtagagct
gaaagcgata tggtatacat gtgaagcgtg agggacgacg 60aaatataggc tggctttcgc
tcggaaacga gcgaaaggga agaaactgca gcctatatgg 120agtccctcac gtggaacatg
tatacctaag tcgcaaacgc atcgagtaga tgcgaacaaa 180gaaacaacaa caacaac
19730169DNAArtificial
SequenceDesigned sequence in accordance with embodiments
30ggaacagctc gagtagagct gaaagcgata tggtacgtat agcaccgtga actactccgg
60catgggtcgg aaacgaccca tgggaagaaa ctgcggagca cggttcccta tacgtaccta
120agtcgcaaac gcatcgagta gatgcgaaca aagaaacaac aacaacaac
16931177DNAArtificial SequenceDesigned sequence in accordance with
embodiments 31ggaacagctc gagtagagct gaaagcgata tgggtccgtg agtctccgag
tatgaccggc 60gtgacgttgg aaacaacgtc acgggaagaa actgcggtca tacagtgaaa
ggagtctcac 120ggaccctaag tcgcaaacgc atcgagtaga tgcgaacaaa gaaacaacaa
caacaac 17732174DNAArtificial SequenceDesigned sequence in
accordance with embodiments 32ggaacagctc gagtagagct gaaagcgata
tggaactcgt cgggaggata tatagaccgg 60catatggtgg aaacaccata tgggaagaaa
ctgcggtcat atatccgaag aacgacgagt 120tcctaagtcg caaacgcatc gagtagatgc
gaacaaagaa acaacaacaa caac 17433192DNAArtificial
SequenceDesigned sequence in accordance with embodiments
33ggaacagctc gagtagagct gaaagcgata tggatgggtg tgtgcgaccc aatggcgtat
60ggagggctat gtccacggaa acgtggacat agggaagaaa ctgcctccat acttgaattg
120ggtctctcat cacacccatc ctaagtcgca aacgcatcga gtagatgcga acaaagaaac
180aacaacaaca ac
19234174DNAArtificial SequenceDesigned sequence in accordance with
embodiments 34ggaacagctc gagtagagct gaaagcgata tgggagcgtg tgggagcatc
cgaactacgt 60atggcattcg ggaaaccgaa tgggaagaaa ctgcatacgc ggatgcgaag
aacacacgct 120ccctaagtcg caaacgcatc gagtagatgc gaacaaagaa acaacaacaa
caac 17435180DNAArtificial SequenceDesigned sequence in
accordance with embodiments 35ggaacagctc gagtagagct gaaagcgata
tggatgtggg catgaccgaa gaacggtacc 60ttgaatccgt caaaggaaac tttgacggat
ggcggtaccg ggaggtcatg ggaagaaact 120gccacatcct aagtcgcaaa cgcatcgagt
agatgcgaac aaagaaacaa caacaacaac 18036107DNAArtificial
SequenceDesigned sequence in accordance with embodiments
36ggaacagctc gagtagagct gaaagggttg ggaagaaact gtggcacttc ggtgccagca
60acccaaacgc atcgagtaga tgcgaacaaa gaaacaacaa caacaac
10737203DNAArtificial SequenceDesigned sequence in accordance with
embodiments 37ggaacagctc gagtagagct gaaagcgata tggtgaccac gaaggacggg
tccacgattc 60aggaagtgta cgaagaaccg gagtagggaa acctactcct cagtattcgt
acatggaaga 120atcgtgttga gtagagtgtg agcgtggtca cctaagtcgc aaacgcatcg
agtagatgcg 180aacaaagaaa caacaacaac aac
20338196DNAArtificial SequenceDesigned sequence in accordance
with embodiments 38ggaacagctc gagtagagct gaaagcgata tggtttctga
aggacgggtc ccataagctg 60tgaactatgg agccaggcac gagaggaaac tctcgtgagc
aactccatag ggagcttatg 120gttgagtaga gtgtgagcag aaacctaagt cgcaaacgca
tcgagtagat gcgaacaaag 180aaacaacaac aacaac
19639196DNAArtificial SequenceDesigned sequence in
accordance with embodiments 39ggaacagctc gagtagagct gaaagcgata
tggatagaag gacgggtccc atattgcgag 60aaacgttata cggaaccgta cgaaggaaac
ttcgtactca gtacgtataa cggagcaata 120tggttgagta gagtgtgagc tatcctaagt
cgcaaacgca tcgagtagat gcgaacaaag 180aaacaacaac aacaac
19640190DNAArtificial SequenceDesigned
sequence in accordance with embodiments 40ggaacagctc gagtagagct
gaaagcgata tggctcgaag gacgggtccc atagggagct 60tgtacgagca acatacgagg
aaactcgtat gccaggcgta caagcgaaga actatggttg 120agtagagtgt gagcgagcct
aagtcgcaaa cgcatcgagt agatgcgaac aaagaaacaa 180caacaacaac
19041195DNAArtificial
SequenceDesigned sequence in accordance with embodiments
41ggaacagctc gagtagagct gaaagcgata tgggaggaag gacgggtccc acagagcgaa
60gaactattag aggaccgtat ttcggaaacg aaatactagt actctaatag ggagctctgt
120ggttgagtag agtgtgagcc tccctaagtc gcaaacgcat cgagtagatg cgaacaaaga
180aacaacaaca acaac
19542191DNAArtificial SequenceDesigned sequence in accordance with
embodiments 42ggaacagctc gagtagagct gaaagcgata tggcttgaag gacgggtccc
atacagcgac 60gaaagtgtct tcgagtacag aaggaaactt ctgtatagaa gacactggag
ctgtatggtt 120gagtagagtg tgagcaagcc taagtcgcaa acgcatcgag tagatgcgaa
caaagaaaca 180acaacaacaa c
19143179DNAArtificial SequenceDesigned sequence in accordance
with embodiments 43ggaacagctc gagtagagct gaaagcgata tggctgtgga
aggacgggtc ccatggaaga 60cgtcaccgaa gtcggaaacg acttcgcaag acgtcaggaa
gtggttgagt agagtgtgag 120ccacagccta agtcgcaaac gcatcgagta gatgcgaaca
aagaaacaac aacaacaac 17944196DNAArtificial SequenceDesigned sequence
in accordance with embodiments 44ggaacagctc gagtagagct gaaagcgata
tggattgaag gacgggtccc taggagcgcg 60aaacttatat aggaaccgtt cgtcggaaac
gacgaactca gtactatata aggagctcct 120aggttgagta gagtgtgagc aatcctaagt
cgcaaacgca tcgagtagat gcgaacaaag 180aaacaacaac aacaac
19645191DNAArtificial SequenceDesigned
sequence in accordance with embodiments 45ggaacagctc gagtagagct
gaaagcgata tggcagctaa cagacttata tggagagtcc 60tggaaggacg ggtccgaggg
aaacctcgtt gagtagagtg tgagccagga ctcgacgaaa 120tataagtcgg gttagctgcc
taagtcgcaa acgcatcgag tagatgcgaa caaagaaaca 180acaacaacaa c
19146197DNAArtificial
SequenceDesigned sequence in accordance with embodiments
46ggaacagctc gagtagagct gaaagcgata tgggatcctg gcaagctgta cataagacag
60ctccatcgaa ggacgggtcc gtcggaaacg acgttgagta gagtgtgagc gatggagcaa
120tgacgtgtac agcgacccag gatccctaag tcgcaaacgc atcgagtaga tgcgaacaaa
180gaaacaacaa caacaac
19747198DNAArtificial SequenceDesigned sequence in accordance with
embodiments 47ggaacagctc gagtagagct gaaagcgata tggtgctacc tcagtatcct
accacatcag 60ctataacgaa ggacgggtcc gaaggaaact tcgttgagta gagtgtgagc
gttatagcga 120cgacgtggta ggagaaccgg tagcacctaa gtcgcaaacg catcgagtag
atgcgaacaa 180agaaacaaca acaacaac
19848196DNAArtificial SequenceDesigned sequence in accordance
with embodiments 48ggaacagctc gagtagagct gaaagcgata tggcaagcac
aaaagctcta tataagacag 60cttgtaggaa ggacgggtcc gaaggaaact tcgttgagta
gagtgtgagc ctacaagcaa 120tgacgtatag agcgctgtgc ttgcctaagt cgcaaacgca
tcgagtagat gcgaacaaag 180aaacaacaac aacaac
19649194DNAArtificial SequenceDesigned sequence in
accordance with embodiments 49ggaacagctc gagtagagct gaaagcgata
tggatcttag gacctcctat ggagtagctt 60gtatgaagga cgggtccgag ggaaacctcg
ttgagtagag tgtgagcata caagcgcaga 120taccatagga gaccctaaga tcctaagtcg
caaacgcatc gagtagatgc gaacaaagaa 180acaacaacaa caac
19450195DNAArtificial SequenceDesigned
sequence in accordance with embodiments 50ggaacagctc gagtagagct
gaaagcgata tggagctgta gcaagttgtt tcgtgccagc 60tacttgaagg acgggtccag
aggaaactct gttgagtaga gtgtgagcaa gtagcggtaa 120tacgaaacaa cgacctacag
ctcctaagtc gcaaacgcat cgagtagatg cgaacaaaga 180aacaacaaca acaac
19551193DNAArtificial
SequenceDesigned sequence in accordance with embodiments
51ggaacagctc gagtagagct gaaagcgata tggagctcca ctaagctact gagggagctt
60gtacgaagga cgggtccgaa ggaaacttcg ttgagtagag tgtgagcgta caagctgtga
120actcagtagt atgtggagct cctaagtcgc aaacgcatcg agtagatgcg aacaaagaaa
180caacaacaac aac
19352179DNAArtificial SequenceDesigned sequence in accordance with
embodiments 52ggaacagctc gagtagagct gaaagcgata tgggaagtca ccttggtcag
gaagtatgga 60aggacgggtc cataggaaac tatgttgagt agagtgtgag ccatatggaa
gaccaagcaa 120gacttcccta agtcgcaaac gcatcgagta gatgcgaaca aagaaacaac
aacaacaac 1795384DNAArtificial SequenceDesigned sequence in
accordance with embodiments 53ggacgcgacc gaaatggtga aggacgggtc
cagtgcgaaa cacgcactgt tgagtagagt 60gtgagctccg taactggtcg cgtc
845459DNAArtificial SequenceDesigned
sequence in accordance with embodiments 54ggcagtggaa ggacgggtcc
ggcgtggaaa cacgccgttg agtagagtgt gagccactg 595551DNAArtificial
SequenceDesigned sequence in accordance with embodiments
55ggagacggtc gggtccagat attcgtatct gtcgagtaga gtgtgggctc c
51