We indexed the 200 possible amino acid pairs for their compatibility regarding the three major physicochemical properties (size, charge and hydrophobicity) and constructed Size, Charge and Hydropathy Compatibility Indices and Matrices (SCI & SCM, CCI & CCM, and HCI & HCM). Each index characterized the expected strength of interaction (compatibility) of two amino acids by numbers from 1 (not compatible) to 20 (highly compatible). We found statistically significant positive correlations between these indices and the propensity for amino acid co-locations in real protein structures (a sample containing a total of 34630 co-locations in 80 different protein structures): for HCI, p < 0.01 (n = 400 in 10 subgroups); for SCI, p < 1.3E-08 (n = 400 in 10 subgroups); for CCI, p < 0.01 (n = 175). Size compatibility between residues (well known to exist in nucleic acids) is a novel observation for proteins. Regression analyses indicated at least 7 well-distinguished clusters of size compatibility and 5 clusters of charge compatibility. We tried to predict or reconstruct simple 2D representations of 3D structures from the sequence by applying a dot plot-like method with these matrices. The location and pattern of the most compatible subsequences were very similar or identical when the three fundamentally different matrices were used, which indicates the consistency of physicochemical compatibility. However, this was not sufficient to choose one preferred configuration among the many possible predicted options.

The protein folding problem has been one of the grand challenges in computational molecular biology. The problem is to predict the native three-dimensional structure of a protein from its amino acid sequence. Existing approaches are commonly classified as (1) comparative modeling, (2) fold recognition and (3) ab initio methods. The first two are knowledge-based (database-driven) methods, i.e. they assume that some template sequence reliably similar to the target sequence already exists and that its sequence-structure relationship is known. True ab initio approaches rely on Anfinsen's thermodynamic principle, which states that protein folding is thermodynamically determined: the amino acid sequence contains all the information necessary to specify the correct three-dimensional structure, so that, given a proper environment, a protein would fold up spontaneously into the conformation that minimizes the total free energy of the system. None of the protein structure prediction methods performs satisfactorily, which is very frustrating because genome sequencing projects are producing numerous novel coding sequences, and understanding the structure is probably necessary in order to understand the function. Some theoretical considerations suggest that the reason for this inadequacy is probably not methodological and that the existing methods perform nearly optimally, especially in combination with each other.
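The compatibility indices express the interaction strength of an amino acid pair as an integer from 1 (not compatible) to 20 (highly compatible). The exact construction of SCI, CCI and HCI is not given in this section, so the following Python sketch shows only one way such an index could be built, under explicitly stated assumptions. It takes the published Kyte–Doolittle hydropathy scale as input. It also applies a hypothetical rule in which residues with similar hydropathy count as more compatible. Finally, it rank-scales the raw scores of every unordered residue pair into 20 equal-sized bins. The names `raw_hydropathy_compat`, `to_index` and `index_of` are illustrative only.

```python
from itertools import combinations_with_replacement

# Kyte-Doolittle hydropathy values, used here only as example input;
# the HCI in the text may be derived differently.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def raw_hydropathy_compat(a: str, b: str) -> float:
    """Hypothetical raw score: the smaller the hydropathy difference,
    the higher (less negative) the score."""
    return -abs(KD[a] - KD[b])


def to_index(raw: dict, levels: int = 20) -> dict:
    """Rank-scale raw pair scores into integers 1 (lowest) .. levels (highest).
    Ties are broken by sort order, so tied pairs may land in adjacent bins."""
    ordered = sorted(raw, key=raw.get)
    n = len(ordered)
    return {pair: 1 + (rank * levels) // n for rank, pair in enumerate(ordered)}


# Every unordered pair of the 20 residues, identical pairs included,
# keyed with the letters in alphabetical order.
pairs = list(combinations_with_replacement(sorted(KD), 2))
raw = {p: raw_hydropathy_compat(*p) for p in pairs}
hci = to_index(raw)


def index_of(a: str, b: str) -> int:
    """Symmetric lookup: index_of('R', 'I') == index_of('I', 'R')."""
    return hci[tuple(sorted((a, b)))]


print(index_of("R", "I"))  # 1: the most different hydropathy values on the scale
```

Rank scaling makes the index depend only on the ordering of the raw scores. As a result, indices built from different properties (size, charge, hydropathy) share the same 1 to 20 scale.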
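The dot plot-like reconstruction can be pictured as scoring pairs of sequence windows with a compatibility matrix. The sketch below is not the authors' procedure, and it rests on several assumptions. First, compatible subsequences are taken to pair antiparallel, as in a β-hairpin. Second, compatibility is averaged over a fixed window. Third, a hypothetical three-level charge rule (`charge_compat`, which treats K and R as positive and D and E as negative) stands in for the real CCM. The function `antiparallel_dot_plot` and its `window` and `min_loop` parameters are likewise illustrative.

```python
POSITIVE = set("KR")  # assumed positively charged residues
NEGATIVE = set("DE")  # assumed negatively charged residues


def charge_compat(a: str, b: str) -> int:
    """Hypothetical 3-level charge compatibility on the 1..20 scale."""
    if (a in POSITIVE and b in NEGATIVE) or (a in NEGATIVE and b in POSITIVE):
        return 20  # opposite charges
    if (a in POSITIVE and b in POSITIVE) or (a in NEGATIVE and b in NEGATIVE):
        return 1   # like charges
    return 10      # at least one residue treated as uncharged


def antiparallel_dot_plot(seq: str, compat, window: int = 4, min_loop: int = 2) -> dict:
    """Score antiparallel pairings of two windows of `seq`.

    Key (i, j) pairs seq[i], seq[i+1], ... with seq[j], seq[j-1], ...
    The value is the mean compatibility over `window` residue pairs.
    """
    n = len(seq)
    plot = {}
    for i in range(n):
        for j in range(i + 1, n):
            top_end = i + window - 1       # last residue of the first strand
            bottom_start = j - window + 1  # first residue of the second strand
            # strands must not overlap and need at least `min_loop` residues between them
            if bottom_start - top_end - 1 < min_loop:
                continue
            total = sum(compat(seq[i + k], seq[j - k]) for k in range(window))
            plot[(i, j)] = total / window
    return plot


plot = antiparallel_dot_plot("KKKKGGGGDDDD", charge_compat)
best = max(plot, key=plot.get)
print(best, plot[best])  # (0, 11) 20.0: KKKK pairs with DDDD across the GGGG loop
```

Many windows can score similarly in real sequences, so a plot like this yields a set of candidate pairings rather than a single preferred fold. Swapping `charge_compat` for a size or hydropathy rule lets the same scan be repeated with each matrix and the resulting patterns compared.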