Loop modeling is a problem in protein structure prediction that requires predicting the conformations of loop regions in proteins, with or without the use of a structural template. Computer programs that solve this problem have been used to research a broad range of scientific topics, from ADP to breast cancer.[1] [2] Because a protein's function is determined by its shape and the physicochemical properties of its exposed surface, an accurate model is important for protein/ligand interaction studies.[3] The problem arises often in homology modeling, where the tertiary structure of an amino acid sequence is predicted based on a sequence alignment to a template, that is, a second sequence whose structure is known. Because loops have highly variable sequences even within a given structural motif or protein fold, they often correspond to unaligned regions in sequence alignments; they also tend to be located at the solvent-exposed surface of globular proteins and thus are more conformationally flexible. Consequently, they often cannot be modeled using standard homology modeling techniques. More constrained versions of loop modeling are also used in the data-fitting stages of solving a protein structure by X-ray crystallography, because loops can correspond to regions of low electron density and are therefore difficult to resolve.
Regions of a structural model that are predicted by non-template-based loop modeling tend to be much less accurate than regions predicted using template-based techniques. The inaccuracy increases with the number of amino acids in the loop. The side-chain dihedral angles of the loop amino acids are often approximated from a rotamer library, but this approximation can worsen the inaccuracy of side-chain packing in the overall model. Andrej Sali's homology modeling suite MODELLER includes a facility explicitly designed for loop modeling by a satisfaction of spatial restraints method. All methods require a PDB file of the structure to be uploaded, and some also require the loop location to be specified.
In general, the most accurate predictions are for loops of fewer than 8 amino acids. Extremely short loops of three residues can be determined from geometry alone, provided that the bond lengths and bond angles are specified. Slightly longer loops are often determined by a "spare parts" approach, in which loops of similar length are taken from known crystal structures and adapted to the geometry of the flanking segments. In some methods, the bond lengths and angles of the loop region are allowed to vary in order to obtain a better fit; in others, the constraints of the flanking segments may be varied to find more "protein-like" loop conformations. Such short loops may be modeled almost as accurately as the homology model upon which they are based. Loops in proteins may also not be well-structured, and may therefore have no single conformation that could be predicted; NMR experiments indicate that solvent-exposed loops are "floppy" and adopt many conformations, while the loop conformations seen by X-ray crystallography may merely reflect crystal packing interactions or the stabilizing influence of crystallization co-solvents.
As mentioned above, homology-based methods use a database to align the target protein gap with a known template protein. A database of known structures is searched for a loop that fits the gap of interest by similarity of sequence and of stems (the edges of the gap created by the unknown loop structure). The success of this method depends largely on the quality of that alignment. Since the loop is the least conserved portion of a protein’s structure, the homology-based method cannot always find a known template that aligns with the target sequence. Fortunately, template databases continually add new structures, so failing to find an alignment is becoming less of an issue. Some programs that use this method are SuperLooper and FREAD.
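The stem-matching part of such a database search can be illustrated with a minimal sketch. It assumes a hypothetical in-memory list of fragments, each described only by its loop length and the coordinates of its two stem atoms, and it ranks fragments by how closely their stem separation matches that of the gap. The names `rank_candidates` and `FRAGMENTS`, the 1.0 Å tolerance, and the toy coordinates are illustrative assumptions; sequence similarity is omitted, and this is not how SuperLooper or FREAD are implemented.

```python
import math

def stem_separation(stem_start, stem_end):
    """Distance between the two stem (anchor) atoms, in angstroms."""
    return math.dist(stem_start, stem_end)

def rank_candidates(gap_start, gap_end, gap_length, fragments, tolerance=1.0):
    """Return names of fragments that have the same length as the gap and a
    stem separation within `tolerance` of the gap's, best fit first."""
    target = stem_separation(gap_start, gap_end)
    hits = []
    for frag in fragments:
        if frag["length"] != gap_length:
            continue  # only fragments with the same number of residues
        diff = abs(stem_separation(frag["start"], frag["end"]) - target)
        if diff <= tolerance:
            hits.append((diff, frag["name"]))
    return [name for _, name in sorted(hits)]

# Hypothetical toy database; real databases hold fragments cut from known structures.
FRAGMENTS = [
    {"name": "fragA", "length": 5, "start": (0.0, 0.0, 0.0), "end": (9.0, 0.0, 0.0)},
    {"name": "fragB", "length": 5, "start": (0.0, 0.0, 0.0), "end": (6.0, 8.0, 0.0)},
    {"name": "fragC", "length": 7, "start": (0.0, 0.0, 0.0), "end": (10.0, 0.0, 0.0)},
    {"name": "fragD", "length": 5, "start": (1.0, 1.0, 1.0), "end": (1.0, 1.0, 15.0)},
]

# Gap of 5 residues whose stems are 9.8 angstroms apart.
print(rank_candidates((2.0, 0.0, 0.0), (2.0, 9.8, 0.0), 5, FRAGMENTS))
```

Comparing only the stem separation keeps the sketch independent of the orientation of each fragment; a fuller treatment would superimpose several stem atoms and also score sequence similarity.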
Otherwise known as ab initio methods, non-template-based approaches use a statistical model to fill in the gaps created by the unknown loop structure. Programs of this kind include MODELLER, Loopy, and RAPPER, but each approaches the problem in a different manner. For example, Loopy uses samples of torsion angle pairs to generate the initial loop structure and then revises this structure to maintain a realistic shape and closure, while RAPPER builds from one end of the gap to the other by extending the stem with different sampled angles until the gap is closed.[4] Yet another method is the “divide and conquer” approach, which subdivides the loop into two segments and then repeatedly divides and transforms each segment until the loop is small enough to be solved.[5] Even with all these methods, non-template-based approaches are most accurate for loops of up to 12 residues (amino acids within the loop).
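The idea of growing a loop from one stem by sampling angles until the gap closes can be shown with a deliberately simplified toy. The sketch below works in two dimensions, treats each residue as a single point separated from the next by a fixed 3.8 Å (a typical Cα–Cα distance), samples bond directions uniformly at random, and accepts the first chain whose end lands within a tolerance of the far stem. The function names, the tolerance, and the trial limit are assumptions made for illustration; real programs such as RAPPER and Loopy sample backbone torsion angles in three dimensions and apply steric and energetic filters that are not modeled here.

```python
import math
import random

def build_chain(start, n_bonds, bond_length, rng):
    """Grow a 2D chain from `start`, sampling one direction angle per bond."""
    points = [start]
    for _ in range(n_bonds):
        angle = rng.uniform(-math.pi, math.pi)
        x, y = points[-1]
        points.append((x + bond_length * math.cos(angle),
                       y + bond_length * math.sin(angle)))
    return points

def sample_closed_loop(start, end, n_bonds, bond_length=3.8, tol=0.5,
                       max_trials=200000, seed=0):
    """Return the first sampled chain whose last point lies within `tol`
    of `end`, or None if no trial closes the gap."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same chain
    for _ in range(max_trials):
        chain = build_chain(start, n_bonds, bond_length, rng)
        if math.dist(chain[-1], end) <= tol:
            return chain
    return None

# Close an 8 angstrom gap with four bonds.
loop = sample_closed_loop((0.0, 0.0), (8.0, 0.0), n_bonds=4)
if loop is not None:
    print("closure error:", round(math.dist(loop[-1], (8.0, 0.0)), 3))
```

Pure rejection sampling like this becomes rapidly less efficient as the loop grows, which is one way to see why accuracy falls off for longer loops and why practical methods guide or repair the sampled conformations.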
Three problems arise when using a non-template-based technique. First, there are constraints that limit the possibilities for local region modeling. One such constraint is that the loop termini are required to end at the correct anchor positions. Another is that the backbone dihedral angles must fall within the allowed regions of Ramachandran space. Second, a modeling program has to use a set procedure. Some programs use the “spare parts” approach mentioned above. Other programs use a de novo approach that samples sterically feasible loop conformations and selects the best one. Third, determining the best model means that a scoring method must be created to compare the various conformations.[6]
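To make the scoring step concrete, the following sketch ranks candidate conformations with a hypothetical score that adds the distance between the loop's last atom and its anchor to a penalty for each pair of non-adjacent atoms that approach too closely. The 3.0 Å clash threshold, the clash weight of 10, and the toy coordinates are assumptions chosen for illustration; published scoring functions are considerably more elaborate.

```python
import math

def closure_error(conformation, anchor):
    """Distance from the last loop atom to the required anchor position."""
    return math.dist(conformation[-1], anchor)

def count_clashes(conformation, min_dist=3.0):
    """Count pairs of non-adjacent atoms closer than `min_dist` angstroms."""
    clashes = 0
    for i in range(len(conformation)):
        for j in range(i + 2, len(conformation)):
            if math.dist(conformation[i], conformation[j]) < min_dist:
                clashes += 1
    return clashes

def score(conformation, anchor, clash_weight=10.0):
    """Lower is better: closure error plus a weighted clash count."""
    return closure_error(conformation, anchor) + clash_weight * count_clashes(conformation)

def best_conformation(candidates, anchor):
    """Select the candidate with the lowest score."""
    return min(candidates, key=lambda c: score(c, anchor))

# Hypothetical candidates: one well closed, one with a clash, one left open.
ANCHOR = (11.4, 0.5, 0.0)
GOOD = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
CLASHING = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (1.0, 1.0, 0.0), (11.4, 0.5, 0.0)]
OPEN = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]

best = best_conformation([CLASHING, OPEN, GOOD], ANCHOR)
print(best is GOOD)
```

Weighting clashes heavily relative to the closure error means a sterically infeasible loop is rejected even when it reaches the anchor exactly, as the clashing candidate does here.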