The residue composition of a ligand-binding site determines the interactions available for diffusion-mediated ligand binding, and understanding the general composition of these sites is essential if we are to gain insight into the functional diversity of the proteome. Many structure-based drug design methods use such heuristic information to improve the prediction or characterization of ligand-binding sites in proteins of unknown function. The Binding MOAD database is one of the largest curated sets of protein-ligand complexes and provides a source of diverse, high-quality data for establishing general trends in residue composition from currently available protein structures. We present an analysis of 3,295 non-redundant proteins with 9,114 non-redundant binding sites to identify residues that are over-represented in binding regions relative to the rest of the protein surface. The Binding MOAD database distinguishes biologically relevant ("valid") ligands from "invalid" small molecules bound to the protein; invalid ligands are present in the crystallization medium and serve no known biological function. Contacts differ between these two classes of ligands, indicating that the residue composition of biologically relevant binding sites is distinct not only from the rest of the protein surface but also from surface regions capable of opportunistic binding of non-functional small molecules. To confirm these trends, we perform a rigorous analysis of how residue propensity varies with the size of the dataset and with the content bias inherent in structure sets obtained from a large protein structure database. We suggest the optimal dataset size for establishing general trends in residue propensities, as well as strategies for assessing the significance of such trends, for future studies of binding-site composition.
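As an illustration of the kind of quantity discussed above, the sketch below computes a residue propensity as the fraction of a residue type among binding-site residues divided by its fraction among the remaining surface residues, so that values above 1 indicate over-representation in binding sites. This ratio is a common convention and is assumed here; it is not necessarily the exact formula used in the study. It also shows one hypothetical way to probe how propensities vary with dataset size: repeatedly subsampling `k` proteins and reporting the mean and spread of each propensity. The function names `propensity` and `subsample_propensity` and the per-protein input format (a list of binding-site residue names paired with a list of other surface residue names) are assumptions for illustration only.

```python
import random
import statistics
from collections import Counter


def propensity(site_residues, surface_residues):
    """Hypothetical helper: fraction of each residue type in binding sites
    divided by its fraction on the rest of the surface (assumed definition).
    Residue types absent from the surface background are omitted to avoid
    division by zero."""
    if not site_residues or not surface_residues:
        raise ValueError("both residue lists must be non-empty")
    site_counts = Counter(site_residues)
    surf_counts = Counter(surface_residues)
    n_site = len(site_residues)
    n_surf = len(surface_residues)
    result = {}
    for res, surf_n in surf_counts.items():
        site_frac = site_counts.get(res, 0) / n_site
        surf_frac = surf_n / n_surf
        result[res] = site_frac / surf_frac
    return result


def subsample_propensity(proteins, k, n_rounds=100, seed=0):
    """Hypothetical stability check: draw k proteins without replacement
    n_rounds times, pool their residues, and return (mean, population std)
    of each residue type's propensity across rounds."""
    rng = random.Random(seed)
    samples = {}
    for _ in range(n_rounds):
        chosen = rng.sample(proteins, k)
        # Pool binding-site and surface residues over the sampled proteins.
        site = [r for site_res, _ in chosen for r in site_res]
        surf = [r for _, surf_res in chosen for r in surf_res]
        for res, p in propensity(site, surf).items():
            samples.setdefault(res, []).append(p)
    return {res: (statistics.mean(v), statistics.pstdev(v))
            for res, v in samples.items()}


# Toy example with made-up residue lists for a single protein.
site = ["TRP", "HIS", "HIS", "GLY"]
surface = ["LYS", "LYS", "GLU", "HIS", "GLY", "GLY", "ASP", "SER"]
print(propensity(site, surface)["HIS"])  # 4.0: 0.5 of site vs 0.125 of surface
```

Pooling residues across the sampled proteins, rather than averaging per-protein ratios, keeps small binding sites from producing extreme ratios; a shrinking standard deviation as `k` grows is one rough signal that the dataset is large enough for a propensity to be stable.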
PLOS COMPUTATIONAL BIOLOGY
proteins,protein binding,computational biology,amino acid sequence,ligands,binding sites,hydrogen bonding
Protein ligand,Plasma protein binding,Protein structure database,Binding site,Biology,Ligand (biochemistry),Small molecule,Proteome,Bioinformatics,Protein structure