Date of Thesis



With a virus such as Human Immunodeficiency Virus (HIV) that has infected millions of people worldwide, and with many unaware that they are infected, it becomes vital to understand how the virus works and how it functions at the molecular level. Because there currently is no vaccine and no way to eradicate the virus from an infected person, any information about how the virus interacts with its host greatly increases the chances of understanding how HIV works and brings scientists one step closer to being able to combat such a destructive virus. Thousands of HIV viruses have been sequenced and are available in many online databases for public use. Attributes that are linked to each sequence include the viral load within the host and how sick the patient is currently. Being able to predict the stage of infection for someone is a valuable resource, as it could potentially aid in treatment options and proper medication use. Our approach of analyzing region-specific amino acid composition for select genes has been able to predict patient disease state up to an accuracy of 85.4%. Moreover, we output a set of classification rules based on the sequence that may prove useful for diagnosing the expected clinical outcome of the infected patient.


HIV, Bioinformatics, Machine Learning, Sequences

Access Type

Honors Thesis (Bucknell Access Only)

Degree Type

Bachelor of Science



First Advisor

Marie Pizzorno

Second Advisor

Brian King

Third Advisor

Brian King