Date of Thesis
Bachelor of Science
Brian R. King
Luiz Felipe Perrone
Machine learning, Data mining, Protein contact map, Decision trees, Bagging
The function and structure of a protein are intrinsically related. To understand how proteins function, medical and biological researchers need to determine their three-dimensional structure. Traditional experimental methods, such as NMR spectroscopy and X-ray crystallography, are inefficient compared to computational methods. Fortunately, bioinformatics has made substantial progress in protein structure prediction. Despite these achievements, the computational complexity of protein folding remains a challenge. Many methods therefore aim instead to predict a protein's contact map from its sequence using machine learning algorithms. In this thesis, we introduce a novel ensemble method for protein contact map prediction based on bagging multiple decision trees. A random sampling method is used to address the large class imbalance in contact maps. To generalize the feature space, we further cluster the amino acid alphabet, reducing it from twenty letters to ten. We also develop a software tool for viewing a protein contact map at a given threshold and separation. The parameters of the decision trees are determined experimentally, and the overall results are evaluated on the top L, L/2, and L/5 predictions for a protein of length L.
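To make the terms "threshold", "separation", and "first L/k predictions" concrete, the following is a minimal Python sketch, not the code used in the thesis. It assumes that a contact is a pair of residues whose representative atoms lie closer than a distance threshold (8 Å here, a common convention and not a value taken from the thesis), that separation means the minimum distance between the two residues along the sequence, and that the first k predictions are scored by the fraction that are true contacts. The function names contact_map and top_k_precision and all default values are hypothetical.

```python
import math


def contact_map(coords, threshold=8.0, min_separation=6):
    """Return the set of residue pairs (i, j), i < j, that are in contact.

    coords: one (x, y, z) tuple per residue (e.g. C-alpha positions).
    threshold: distance cutoff; 8.0 is an assumed default, not the thesis value.
    min_separation: minimum |i - j| along the sequence (assumed default).
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                contacts.add((i, j))
    return contacts


def top_k_precision(scores, true_contacts, k):
    """Fraction of the k highest-scoring predicted pairs that are true contacts.

    scores: dict mapping a residue pair (i, j) to a predicted contact score,
            for instance the fraction of bagged trees voting "contact".
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    return sum(pair in true_contacts for pair in ranked) / len(ranked)
```

For a protein of length L, the three evaluation settings would correspond to calling top_k_precision with k = L, k = L // 2, and k = L // 5 under these assumptions.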
Ren, Chuqiao, "Predicting Protein Contact Map By Bagging Decision Trees" (2015). Honors Theses. 329.