(Ramachandran & rotamers)

- `README.html` - this file (supplemental documentation and notes for posterity ;)
- `howto/` - additional documentation on how to use this data for specific tasks

- `StructValid.pdf` - the 2003 paper by Lovell, *et al.* that fully describes the Ramachandran data, including the density-dependent smoothing method. The rotamer data has not yet been published (as of April 2003), but the information on materials & methods is equally applicable. Please read the paper carefully before trying to use this data in your own research!
- `PenultRotLib.pdf` - the 2000 paper by Lovell, *et al.* that describes an earlier analysis of sidechain rotamers. It focuses on identifying and naming true rotamers (*i.e.*, energy minima) and so is complementary to this data, which defines a probability distribution over all possible conformations.

- `Makefile` - a file that will re-create `stat/`, `pct/`, and `kin/` from the data in `srcdata/`
- `lib/` - Java software used to smooth the input data (Silk)
- `scriptbin/` - a collection of AWK scripts that were used to prepare the input data
- `srcdata/` - quality-filtered input data that was extracted from the Top500 database

- `stat/` - raw density traces that can be used for statistical purposes, such as Boltzmann energy potentials
- `pct/` - density traces that have been converted so as to be useful in determining whether a given conformation is allowed or an outlier
- `kin/` - kinemage-format illustrations for exploring the data interactively

- The Top500 database of PDB files - available from http://kinemage.biochem.duke.edu
- The `kin2Dcont` and `kin3Dcont` contouring software - available from http://kinemage.biochem.duke.edu

This gives us more than 100,000 residues to analyze. To remove some of the noisiest data, poorly determined residues (for example, those with high crystallographic B-factors) were excluded.

Because glycine is a symmetrical molecule, the local physical constraints on its phi-psi preferences should also be symmetrical. However, natural selection favors glycines only where they are necessary, and so there is a frequency bias towards L-alpha in the natural distribution. To generate our density traces, we therefore symmetrized the glycine data, so that each observation at (phi, psi) also contributes at (-phi, -psi).
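The symmetrization step can be sketched as follows. This is an illustrative helper under assumed conventions (half-weighted duplicate points), not the actual Silk/AWK pipeline:

```python
def symmetrize_glycine(points):
    """Duplicate each (phi, psi) observation at (-phi, -psi), giving each
    copy half weight so the total count is preserved.  The resulting set
    is exactly symmetric under inversion, as glycine's sterics dictate.
    (Illustrative sketch only -- not the distributed software.)"""
    out = []
    for phi, psi in points:
        out.append((phi, psi, 0.5))
        out.append((-phi, -psi, 0.5))
    return out

# Two hypothetical glycine observations in degrees:
data = [(-80.0, 170.0), (60.0, 40.0)]
sym = symmetrize_glycine(data)
# sym now holds four half-weight points, inversion-symmetric as a set
```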

As described in The Penultimate Rotamer Library, previous rotamer libraries have included some physically impossible "decoy" rotamers for leucine that fill roughly the same space as real leucine rotamers. We have taken care to exclude these decoy conformations from the present data.

After inspecting the distributions for phenylalanine and tyrosine, we conclude that there is no observable difference between the two. Therefore, in order to improve the quality of the data, the rotamer data for phenylalanine and tyrosine have been pooled into a single, shared distribution.

We are now classifying density levels by the fraction of data points they enclose -- that is, the fraction of observations that fall at or above a given density level.
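One way to compute such a level, assuming the smoothed density has been sampled at each data point, is to rank those values and take the appropriate order statistic (names and interface are illustrative, not from Silk):

```python
def level_for_fraction(point_densities, fraction):
    """Return the density level that encloses `fraction` of the data:
    the value such that that fraction of the observations lie at or
    above it.  `point_densities` are the smoothed-density values
    evaluated at the data points themselves."""
    ranked = sorted(point_densities, reverse=True)
    k = min(max(int(fraction * len(ranked)), 1), len(ranked))
    return ranked[k - 1]

# Toy example: the level enclosing 60% of five points
densities = [0.1, 0.5, 0.3, 0.9, 0.2]
cutoff = level_for_fraction(densities, 0.6)
# exactly 3 of the 5 points have density >= cutoff
```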

Another important change relates to the way the smoothing functions are specified. In the paper, we give the maximum radius of the cosine function, the distance at which it falls to zero. The software now uses the half-width instead: the distance at which the function falls to half of its maximum value, which for the cosine is half of its maximum radius.

In fact, we don't actually use a Gaussian (something like exp[-x^2]), but rather a cosine-based smoothing function.

However, the choice of smoothing function has far less impact on the outcome than does the use of our density-dependent smoothing algorithm. The problem with the traditional, one-pass, Gaussian-smoothing analysis is that it blurs out the boundaries of the Ramachandran plot. Some regions, like the shallow "beach" in the lower-left of the general plot, have very sparse populations and soft boundaries. Other regions, like alpha helix, are "cliffs" that have very high populations (many orders of magnitude above the other regions) with very hard boundaries (the population falls to zero just a few degrees to the right). The traditional analysis is incapable of treating both in a way that gives physically realistic results -- either the beach is left too lumpy or the cliff is smeared out.

It is for this reason that we developed the density-dependent smoothing algorithm, which smooths the dense regions less and the sparse regions more. In this application the cosine has some advantages over the Gaussian, because it falls to zero at a finite distance. Thus, it can be computed without truncation, so its volume really sums to 1. Also, the Gaussian must be evaluated further out [we use 4.5 halfwidths as the limit (where the value falls to ~ 1e-6 of its maximum), rather than just 2 for the cosine] in order to get a good approximation, which means it can take substantially longer to compute, particularly for higher-dimensional data spaces. Finally, because its tails actually go to zero, the cosine is less prone to smearing out the cliffs than the Gaussian is. The suggestion for constructing a density trace using cosines was taken from the NCSS statistical analysis software package; see http://exploringdata.cqu.edu.au/den_trac.htm.
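The two kernels and their half-width relationship can be illustrated directly. This sketch assumes a raised-cosine form for the kernel (an assumption consistent with the text: it falls to half maximum at one half-width, to zero at two half-widths, and the comparably-scaled Gaussian falls to roughly 1e-6 of maximum at 4.5 half-widths):

```python
import math

def cosine_kernel(x, halfwidth):
    """Raised-cosine smoothing kernel.  Value is 1 at x = 0, 0.5 at one
    half-width, and exactly 0 at two half-widths (the maximum radius),
    so no truncation is ever needed."""
    radius = 2.0 * halfwidth
    if abs(x) >= radius:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * x / radius))

def gaussian_kernel(x, halfwidth):
    """Gaussian scaled so that it, too, falls to half maximum at one
    half-width.  Its tails never reach zero, so it must be evaluated
    out to ~4.5 half-widths before truncation error is negligible."""
    sigma = halfwidth / math.sqrt(2.0 * math.log(2.0))
    return math.exp(-x * x / (2.0 * sigma * sigma))
```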

As far as we know, the density-dependent smoothing algorithm is completely novel; we were unable to find any existing statistical technique that treats this type of problem. Our approach attempts to better represent what we believe is the true underlying structure of the (noisy) data. Thus, this analysis is almost like image processing, in which one filters and manipulates a noisy photograph in an attempt to extract a clearer image of the original subject. The resulting image is quite different from the original photo, but (hopefully) is a better representation of the true subject.
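The core idea -- smooth dense regions less and sparse regions more -- can be realized as a two-pass adaptive density trace. The following one-dimensional sketch is an assumed form of the general technique, not the Silk implementation: the helper names, the geometric-mean pivot, and the alpha exponent are all illustrative choices.

```python
import math

def raised_cosine(x, radius):
    # Cosine kernel that falls to exactly zero at the given radius.
    if abs(x) >= radius:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * x / radius))

def adaptive_trace(data, grid, base_radius, alpha=0.5):
    """Two-pass, density-dependent density trace (illustrative sketch).
    Pass 1 estimates a pilot density at each data point with a fixed
    radius.  Pass 2 re-smooths each point with a radius that shrinks
    where the pilot density is high and grows where it is low."""
    n = len(data)
    pilot = [sum(raised_cosine(d - e, base_radius) for e in data) / n
             for d in data]
    # Geometric mean of the pilot densities as the pivot point
    gmean = math.exp(sum(math.log(p) for p in pilot) / n)
    radii = [base_radius * (gmean / p) ** alpha for p in pilot]
    return [sum(raised_cosine(g - d, r) for d, r in zip(data, radii)) / n
            for g in grid]

# Toy data: a dense pair near 0 and one isolated point at 5
trace = adaptive_trace([0.0, 0.1, 5.0], [0.0, 2.5, 5.0], 1.0)
```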

Each

This data would be appropriate for statistical applications, such as predicting the energy difference between two conformational states. Normalizing the data in a way that is appropriate to the application at hand is left to the user.
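For instance, converting a pair of density values into a Boltzmann-style energy difference might look like the following sketch; the temperature and units are the user's choice, and both densities must share the same normalization:

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def boltzmann_energy_diff(density_a, density_b, temperature=298.0):
    """Estimate the energy difference (kcal/mol) between two
    conformational states from their density-trace values, via
    dE = -RT * ln(p_a / p_b).  Illustrative helper, not part of
    the distributed software."""
    return -R_KCAL * temperature * math.log(density_a / density_b)
```

A state twice as populated as another comes out roughly 0.41 kcal/mol more favorable at 298 K.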

Note that the Ramachandran plots are heavily biased by inter-residue interactions -- secondary structure. For this reason, alpha helix and beta sheet conformations are greatly over-represented relative to their individual energies. You may find it more helpful to work with the data labeled "nosec," which has all residues in repetitive secondary structure removed.