Proteins are large molecules and are usually ineffective as drugs because of delivery and stability issues. Because it is so large, a single protein may have multiple biological functions, each defined by localized interactions between a specific sequence of amino acids in the protein and another protein or a non-protein ligand. In peptide-based drug discovery, the conventional one protein-one experiment strategy is too time-consuming and costly. What is required is high-throughput screening, which tests a large number of compounds quickly and efficiently in parallel.
Drug development is broken down into several distinct steps. The first step is to “map” the active sequences (i.e., the epitope) of a protein. This process, called epitope mapping, determines the minimum peptide sequence that makes up the active domain of the protein. Epitope mapping relies on synthesizing a peptide library of overlapping peptide sequences that together make up the original protein. Selecting the sequences of a peptide library involves the following considerations:
- As the length of the peptide grows, the number of peptides to synthesize decreases.
- As the offset number grows (i.e. the number of residues that the peptide sequence shifts from the original protein sequence), the number of peptides to synthesize decreases.
- As the peptide sequence grows longer, the potential to achieve multiple hits (i.e. the peptide sequences that contain all of the essential residues in the epitope) increases.
The length of the protein sequence, together with the chosen peptide length and offset number, determines the number of peptides in the library. A longer peptide length combined with a smaller offset number would be ideal, but the monetary cost could become too great. Shorter peptides, in contrast, lead to more sequences to synthesize but are individually more economical. Because it is difficult to predetermine the ideal minimum number of peptides, the common practice is to use peptides of 8 to 20 residues (preferably 12 to 16) with an offset number of approximately 1/3 of the peptide length.
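As a rough illustration of how peptide length and offset drive library size, the following Python sketch tiles a protein sequence with overlapping peptides. The function name `overlapping_peptides`, the 30-residue example sequence, and the choice to add one final peptide so that the C-terminal residues are always covered are assumptions made for this example, not part of any specific synthesis protocol.

```python
def overlapping_peptides(protein, length, offset):
    """Tile `protein` with peptides of `length` residues, shifted by `offset`.

    Assumption: if the regular tiling stops short of the C-terminus, one
    extra peptide ending at the last residue is appended for full coverage.
    """
    if length <= 0 or offset <= 0:
        raise ValueError("length and offset must be positive")
    if length > len(protein):
        raise ValueError("peptide length exceeds protein length")

    last_start = len(protein) - length
    starts = list(range(0, last_start + 1, offset))
    if starts[-1] != last_start:
        starts.append(last_start)  # cover the C-terminal residues
    return [protein[s:s + length] for s in starts]


# Hypothetical 30-residue sequence, used only to show the counts.
example_protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"

for length, offset in [(12, 4), (12, 6), (15, 4)]:
    library = overlapping_peptides(example_protein, length, offset)
    print(f"{length}-mers, offset {offset}: {len(library)} peptides")
```

For this 30-residue example, 12-mers at an offset of 4 (1/3 of the peptide length) give 6 peptides. Widening the offset to 6 gives 4 peptides, and lengthening the peptides to 15 residues at an offset of 4 gives 5, which matches the trends in the list above.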
PEPTIDE LIBRARY TYPES AND STRATEGIES
The next step in drug development is peptide sequence optimization, which determines the structure-function relationships of the targeted epitope using four practical strategies.
- Alanine Scanning Library: Alanine (chosen because it is the smallest amino acid that maintains chirality) is systematically substituted for each amino acid position in the epitope. If alanine takes the place of an essential amino acid, the result is a significant reduction in activity. The relative importance of that specific amino acid can also be measured by the degree of activity reduction.
- Truncation Library: Involves the systematic truncation of the flanking residues to determine the minimum length required for optimum peptide activity.
- Random Library: A shotgun approach. A mixture of all 20 amino acids, or a set combination of amino acids, is simultaneously substituted for selected residues in the peptide sequence (i.e. the wobble sequence). The mixtures of these random libraries are then analyzed.
- Positional Scanning Library: A selected position or positions in a peptide sequence are each systematically replaced with different amino acids. The resulting change in activity reveals the preferred amino acid residues at these positions.
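The four library types above can be sketched as simple sequence operations. The Python below is a minimal illustration under stated assumptions: the function names, the 8-residue example peptide `DKLAFEGW`, the decision to skip positions that are already alanine, and the one-terminus-at-a-time truncation scheme are all hypothetical choices made for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def alanine_scan(peptide):
    """One variant per position, with that residue replaced by alanine.

    Assumption: positions that are already alanine are skipped.
    """
    return [
        peptide[:i] + "A" + peptide[i + 1:]
        for i, residue in enumerate(peptide)
        if residue != "A"
    ]


def truncation_library(peptide, min_length):
    """Remove flanking residues one at a time from either terminus."""
    max_cut = len(peptide) - min_length
    n_truncated = [peptide[i:] for i in range(1, max_cut + 1)]
    c_truncated = [peptide[:len(peptide) - i] for i in range(1, max_cut + 1)]
    return n_truncated + c_truncated


def positional_scan(peptide, position):
    """Replace the residue at `position` with each other amino acid."""
    return [
        peptide[:position] + aa + peptide[position + 1:]
        for aa in AMINO_ACIDS
        if aa != peptide[position]
    ]


def random_library_size(n_positions, alphabet_size=20):
    """Number of distinct sequences in a mixture with randomized positions."""
    return alphabet_size ** n_positions


peptide = "DKLAFEGW"  # hypothetical epitope
print(alanine_scan(peptide))
print(truncation_library(peptide, min_length=6))
print(len(positional_scan(peptide, position=0)))
print(random_library_size(2))
```

The size of a random library grows exponentially with the number of randomized positions (20 raised to that number for a full amino acid mixture), which is why those libraries are analyzed as mixtures, whereas the other three types grow only linearly with peptide length.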
The last step in peptide-based drug development is sequence stabilization. The peptide must be structurally stabilized to preserve its potency over time. Three different strategies can be employed to achieve this goal:
- The most common method is to substitute selected amino acids with non-standard amino acids, such as homologs of natural amino acids (e.g., ornithine, homolysine, norleucine, and norvaline) or the chiral analogs (D-forms) of the naturally occurring amino acids (L-forms).
- Another method is to incorporate intramolecular bridges to form cyclic structures.
- Stabilization can also be achieved through chemical modification of the N- and C-termini (usually by acetylation and amidation, respectively).
Peptide Library and Mapping Citations
Karlsson, Erik A., et al. mBio 7.4 (2016): e01144-16.
"Peptides 20 amino acids in length spanning the HA and NA proteins of the A/Anhui/1/2013 (H7N9) virus were generated with 15-amino-acid overlaps, resulting in the synthesis of 110 HA peptides and 90 NA peptides synthesized at >90% purity (CPC Scientific). A small number of peptides were synthesized at >70% purity, following multiple synthesis and purification attempts. A poly(K) linker was added to each peptide to increase solubility and to improve the binding orientation of peptides to the Hydrogel slides."
Ren, Yin, and Sangeeta N. Bhatia. U.S. Patent No. 9,006,415. 14 Apr. 2015.
"The tandem peptide library used in this work was synthesized via standard FMOC solid-phase peptide synthesis and purified by high-performance liquid chromatography at the MIT Biopolymers Core, Tufts University Core Facility or CPC Scientific, Inc."
Shojaei, Farbod, et al. Journal of Experimental & Clinical Cancer Research 31.1 (2012): 1.
- Pfizer Global Research and Development, Department of Oncology, La Jolla, CA, USA
"Epitope mapping studies were carried out using an overlapping series of synthetic peptides (CPC Scientific, CA) designed based on the primary sequence of OPN. Peptides corresponding to the region 143-172 of human OPN are listed below [..]"
Wang, Xuelian, et al. Clinical and Vaccine Immunology 15.6 (2008): 937-945.
"A series of 15-mer peptides overlapping each other by 10 amino acids and a series of 9-mer peptides overlapping each other by 8 amino acids covering the HPV16 E6 protein have been described (20). To define the minimal and optimal amino acid sequences of the CD8 T-cell epitope, 8-mer, 10-mer, 11-mer, and homologous peptides (see Table 1) were synthesized as needed (CPC Scientific, Inc., San Jose, CA)."
Stair, Jacqueline L., et al. Journal of Combinatorial Chemistry 8.6 (2006): 929-934.
"The combinatorial library (CPC Scientific) was composed of the sequence GXXGXXGXXGXX (X = cysteine, aspartic acid, or glutamic acid; G = glycine) and synthesized onto TentaGel Macrobeads..."
