Selecting the right vector for your molecular biology experiment is fundamental to ensuring the experiment’s success. There are many elements to consider when choosing which vector backbone to use. To make it easier, we have compiled a list of some of the key factors to keep in mind when choosing a vector:
Cloning or Expression?
The first question to ask yourself is what you intend to do with your vector. Cloning vectors are useful for generating many copies of your gene, whereas expression vectors drive the actual expression of the gene into mRNA and protein in the target organism. Cloning vectors usually contain features associated with the insertion or removal of DNA fragments, such as multiple cloning sites with many restriction sites and antibiotic resistance genes. Expression vectors contain additional organism-specific sequences relating to expression, such as promoters and ribosome binding sites (the Shine-Dalgarno sequence in prokaryotes) or Kozak sequences (in eukaryotes). A related vector type is the transcription-only vector, which goes only as far as mRNA production and requires fewer components than an expression vector.
Insert size is relevant for both cloning and expression vectors. The only aspect to consider here is whether you are cloning a large or small DNA fragment. Most general plasmids (e.g. pUC) can cope with inserts up to 15 kb. Anything bigger can complicate replication and cause problems with stability. However, if you have a large DNA insert, there are special types of vectors suitable for such applications, which have a different origin of replication. As a general remark, the origins of replication used in bigger vectors give a lower copy number than those of smaller vectors. Keep in mind that the bigger the vector is, the lower its stability and its transfection or transformation efficiency. Nevertheless, vectors can usually get up to 52 kb.
Both cloning and expression vectors require you to choose a selectable marker. This marker allows for the identification of a positive transformant. When you grow your cells under selective conditions, only the cells that have incorporated the vector will survive (a process called selection). There are two major types of selectable markers:
- Drug-resistance markers: incorporation of a gene encoding an enzyme that inactivates a specific antibiotic.
- Auxotrophic markers: incorporation of a gene that allows cells carrying the marker to survive without an essential nutrient in the medium.
When choosing your selection strategy, you should consider using:
- A positive selection - under selection (antibiotic or nutrient), only cells that incorporated the selectable marker will survive.
- A negative selection - under selection, only cells that did not incorporate the selectable marker will survive.
Here is a summary of the common selectable marker genes and their respective antibiotics and target organisms:
Restriction Sites in MCS
A crucial step during the cloning process is the ligation of the DNA fragment to the plasmid, which is greatly facilitated by the use of specific restriction enzymes. This means it is important to check that the restriction sites you plan to use are compatible with your insert, and in particular that the chosen enzymes do not also cut inside it. Nowadays, this is less of a problem because most modern vectors include an artificial stretch of DNA called an MCS (multiple cloning site) containing recognition sites for many different restriction endonucleases. If all else fails, blunt-end ligations are possible but often difficult to complete successfully.
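The compatibility check described above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a cloning tool: the enzyme table covers only four common enzymes, the function names are made up for this example, and the insert sequence is hypothetical.

```python
# Recognition sequences of a few widely used enzymes. All four are palindromic,
# so scanning one strand is enough to find every site.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
}


def find_sites(sequence, sites=RECOGNITION_SITES):
    """Return {enzyme: [0-based start positions]} for each enzyme in `sites`."""
    seq = sequence.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        hits[enzyme] = positions
    return hits


def usable_enzymes(insert, sites=RECOGNITION_SITES):
    """Enzymes that do NOT cut inside the insert and are therefore safe to clone with."""
    return sorted(enzyme for enzyme, pos in find_sites(insert, sites).items() if not pos)


# Hypothetical insert with one internal EcoRI site.
example_insert = "ATGGCTGAATTCGCTAGCTAA"
print(find_sites(example_insert)["EcoRI"])  # [6]
print(usable_enzymes(example_insert))       # ['BamHI', 'HindIII', 'XhoI']
```

In this example EcoRI would cut the insert itself, so only the remaining MCS enzymes are candidates for cloning it.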
Avoiding Self Ligation (Recircularization)
Compatible sticky ends left on the vector after digestion often cause it to self-ligate. To avoid this issue, you can use a phosphatase treatment, alongside other end modifications where needed:
- Just before the ligation step, dephosphorylate the vector using Alkaline Phosphatase. In this process the 5’ phosphate group, which the ligase needs for phosphodiester bond formation, is removed.
- Depending on how the insert and vector are prepared, other ‘end’ treatments such as blunting, A-tailing, and phosphorylation may be required.
- Polymerases allow you to modify a digested end by filling in the overhang, and exonucleases by removing it (both blunt the sticky ends), in a process called end modification.
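Whether a digested vector is at risk of self-ligation can be estimated from the overhangs its two ends carry. The sketch below assumes enzymes that leave palindromic 4-nt 5' overhangs, for which two ends are cohesive exactly when their overhang sequences are identical; the table and the function name are illustrative only.

```python
# 5' overhangs left by a few common enzymes (cut position marked with ^).
OVERHANGS = {
    "EcoRI": "AATT",    # G^AATTC
    "BamHI": "GATC",    # G^GATCC
    "BglII": "GATC",    # A^GATCT
    "HindIII": "AGCT",  # A^AGCTT
    "XhoI": "TCGA",     # C^TCGAG
    "SalI": "TCGA",     # G^TCGAC
}


def can_self_ligate(enzyme_a, enzyme_b, overhangs=OVERHANGS):
    """True if the two vector ends are cohesive with each other.

    Assumes palindromic 5' overhangs, where identical sequence means compatible.
    """
    return overhangs[enzyme_a] == overhangs[enzyme_b]


print(can_self_ligate("EcoRI", "EcoRI"))    # True: single digest, dephosphorylate
print(can_self_ligate("BamHI", "BglII"))    # True: different enzymes, same overhang
print(can_self_ligate("EcoRI", "HindIII"))  # False: directional cloning
```

A `True` result suggests that dephosphorylating the vector before ligation is worthwhile; a `False` result means the two ends cannot anneal to each other, although an incompletely digested vector can still give background colonies.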
For cloning vectors only:
If you choose to work with a cloning vector, you need to decide which copy number (high, medium, or low) will give you the desired number of copies at the end of the process. Usually, a high-copy vector is the best approach to produce the highest yields. For example, pBluescript has a copy number of 300-500 and pUC can reach 700. You may be wondering why some vectors have a more modest copy number of 10-12. These are more specialised vectors that were developed to counteract some problems caused by high-copy vectors. Your DNA fragment can, for example, become toxic to the cell when present at high levels. In this case, the best way to avoid any problems may be to use a low-copy vector. Another option is to use BACs or YACs (bacterial and yeast artificial chromosomes, respectively). These vectors are used for sequences up to 350 kb and have a single copy in each cell.
For expression vectors only:
If you want to drive the expression of your desired gene, you need a vector containing elements that are functional in your host organism. Expression vectors produce proteins through transcription of the vector's insert followed by translation of the mRNA produced. Different host organisms share similar requirements, but the specific elements differ from host to host: a selectable marker, a promoter for initiation of transcription, a ribosomal binding site for translation initiation, a termination signal, and so on. It is also important to codon-optimize your insert for the chosen host so that translation levels are optimized as well.
The common organisms used for high protein expression are:
- Mammalian - usually CHO cells
- Insects - using the Baculovirus system
- Pichia (Pichia pastoris) - a single-celled eukaryotic fungus capable of glycosylation, especially suited for large scale protein production
- S. cerevisiae (Saccharomyces cerevisiae) - the workhorse of eukaryotic research
- E. coli (Escherichia coli) - the workhorse of prokaryotic research, produces basic proteins without post-translational modifications
- Plants - usually tobacco cell culture
In prokaryotes, the commonly used inducible promoters are promoters derived from the Lac operon and the T7 promoter. Other strong promoters used include the Trp promoter and the Tac promoter, which is a hybrid of both the Trp and Lac Operon promoters.
Promoters can be inducible, repressible, conditional, or constitutive. Conditional promoters are activated by conditions inside the cell, which allows the gene to be expressed only in certain parts of the organism or under certain conditions. Inducible systems, such as Tetracycline-controlled transcriptional activation, let you switch expression on or off yourself. Keep in mind that a strong promoter can lead to loss of protein in inclusion bodies as well as cause metabolic stress in the organism.
A transcription terminator is a sequence that marks the end of a gene or operon in genomic DNA during transcription. This sequence mediates transcriptional termination by providing signals in the newly synthesized mRNA that trigger processes to release the mRNA from the transcriptional complex. These processes include the direct interaction of the mRNA secondary structure with the complex and/or the indirect activities of recruited termination factors.
The Ribosome Binding Site (RBS) is a sequence of nucleotides upstream of the start codon of an mRNA transcript that is responsible for the recruitment of a ribosome during the initiation of protein translation in prokaryotes. The RBS sequence follows the promoter and ensures efficient translation of the protein of interest. In eukaryotes, there is no equivalent RBS; instead, the Kozak sequence surrounding the start codon helps the ribosome recognize the correct initiation site.
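As a rough illustration of where the prokaryotic RBS sits, the following Python sketch looks for a Shine-Dalgarno-like core in the region just upstream of a start codon. It is a simple string match against the AGGAGG consensus, not a thermodynamic model; the 20-nt window, the 4-nt minimum match, and the function names are assumptions chosen for this example, and the test sequence is hypothetical.

```python
SD_CONSENSUS = "AGGAGG"  # Shine-Dalgarno consensus, written as DNA


def sd_candidates(min_len=4):
    """All substrings of the consensus with at least `min_len` bases, longest first."""
    subs = {
        SD_CONSENSUS[i:j]
        for i in range(len(SD_CONSENSUS))
        for j in range(i + min_len, len(SD_CONSENSUS) + 1)
    }
    return sorted(subs, key=lambda s: (-len(s), s))


def find_sd(sequence, start_codon_index, window=20):
    """Return (core, spacing) for the longest SD-like match upstream of the start codon.

    `spacing` is the number of bases between the end of the match and the
    start codon. Returns None if no match is found in the window.
    """
    seq = sequence.upper().replace("U", "T")
    upstream = seq[max(0, start_codon_index - window):start_codon_index]
    for core in sd_candidates():
        pos = upstream.rfind(core)
        if pos != -1:
            return core, len(upstream) - (pos + len(core))
    return None


# Hypothetical 5' region: SD core, a 7-nt spacer, then the ATG start codon.
example = "TTTAAGGAGGTATACATATGAAA"
print(find_sd(example, example.find("ATG")))  # ('AGGAGG', 7)
```

A match only shows that an SD-like motif is present at a given spacing; it says nothing about how strongly the site initiates translation.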
In order to ensure that soluble protein is produced at the end of the process, it is imperative that the rate of translation is not faster than the rate of protein folding. While transcription and translation rates can be controlled, protein folding, secretion, and membrane insertion rates cannot. Therefore, it is not optimal to alter a cellular process to simply produce as much protein as possible because protein that is not folded fast enough negatively impacts the processes downstream. Ultimately, the goal is to optimize the entire system to produce more soluble and active protein.
In order to predict translation initiation rate and the protein expression levels in bacteria, you can use the RBS Calculator developed by Prof. Howard Salis. This tool has the ability to predict the translation initiation rate for each start codon in an mRNA sequence. It also enables the design of the ideal RBS to achieve optimal translation rate, thus controlling protein expression.
Consider adding a tag or a fusion protein to your vector in order to further understand the function of a specific gene. For example, fusing your protein to an epitope tag, such as FLAG or HA, makes it easy to identify your protein using an antibody against that epitope. This allows you to conduct western blots and immunoprecipitations of the protein. It also makes it possible to see the protein’s localization using immunohistochemistry, even if you do not have an antibody specific to your protein. Another common practice is to fuse your protein to another protein, such as GFP, which allows you to visualize the cellular localization of your protein.
Just remember that when you are designing your plasmid you should keep your gene "in frame" with the fusion protein. This means that the final product should be translated as a single string of amino acids that preserves the sequence of your gene and of the fusion protein. Another concern is the removal of the stop codon from the N-terminal CDS, which is crucial so that translation does not stop at the end of the first protein.
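Both checks, staying in frame and removing the stop codon of the N-terminal CDS, are easy to automate. The Python sketch below is one possible way to do it; the function names are invented for this example and the two short sequences are hypothetical stand-ins for a real gene and fusion partner.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(dna):
    """Split a sequence into complete codons, ignoring any trailing 1-2 bases."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def strip_terminal_stop(cds):
    """Drop the final stop codon of an in-frame CDS, if it has one."""
    cds = cds.upper()
    if len(cds) % 3 == 0 and cds[-3:] in STOP_CODONS:
        return cds[:-3]
    return cds


def build_fusion(n_terminal_cds, c_terminal_cds, linker=""):
    """Join two CDSs (optionally via a linker) and list any frame problems."""
    n_part = strip_terminal_stop(n_terminal_cds)
    fusion = n_part + linker.upper() + c_terminal_cds.upper()
    problems = []
    # The C-terminal partner stays in frame only if everything before it
    # is a whole number of codons.
    if (len(n_part) + len(linker)) % 3 != 0:
        problems.append("C-terminal partner is out of frame")
    # Any stop codon before the last codon would truncate the fusion protein.
    internal_stops = [
        i for i, codon in enumerate(split_codons(fusion)[:-1]) if codon in STOP_CODONS
    ]
    if internal_stops:
        problems.append(f"premature stop at codon(s) {internal_stops}")
    return fusion, problems


# Hypothetical gene (ATG GCT AAA TAA) fused upstream of a hypothetical partner.
fusion, problems = build_fusion("ATGGCTAAATAA", "ATGGTGAGCAAGTAA")
print(fusion)    # ATGGCTAAAATGGTGAGCAAGTAA
print(problems)  # []
```

The stop codon of the first CDS is removed before joining, and an empty problem list means the second protein is read in the same frame as the first.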
A list of commonly used tags/fusion proteins can be found here.
A Final Note:
In addition to choosing the right vector, do not forget to choose the right design software platform for your molecular biology experiment. Genome Compiler is a convenient all-in-one environment that allows you to seamlessly import your backbones and genes of interest from various resources directly into cloning wizards and to intuitively simulate your cloning. With the Genome Compiler wizards you are able to easily add and edit the elements into your cloning vector. It also allows you to codon optimize your sequence and to modify its ends. Cloning and expression vectors from a plethora of repositories and databases can be found in Genome Compiler. The software’s Materials Box contains vectors from Sigma-Aldrich, Addgene, Synberc, Lucigen, and more. The RBS Calculator is also integrated inside Genome Compiler, as well as libraries of promoters, terminators, coding sequences, and other commonly used parts. You can also collaborate easily and share the sequences with your colleagues. Once your design is ready, you can order the sequences to be synthesized directly from Genome Compiler.
Happy cloning! 🙂