Winter 2016: Introduction to Bioinformatics and Computational Biology

Schedule: Monday and Wednesday 9-10:20am

Location: TTI-C conference room 526 on the 5th floor, 6045 S Kenwood Ave, Chicago, IL 60637

Instructor: Jinbo Xu (jinboxu@gmail.com, office: TTI-C room 528)

Students can register for this course through the University of Chicago (CS department).

With large-scale genomic, expression and structural data now available, mathematics, statistics and computer science are being used extensively to understand biological data at the molecular level. This course focuses on applying machine learning and computer algorithms to problems in molecular biology. In particular, it covers fundamental computational molecular biology problems, including sequence alignment, homology search, RNA/protein structure analysis and prediction, gene expression, biological network analysis and next-generation sequencing.

Students are strongly encouraged to read the following materials before attending this class, since they will not be covered in class.

1. The Department of Energy's Primer on Molecular Genetics.

2. The Department of Energy's Overview of the Human Genome Project.

3. Hunter's molecular biology for computer scientists.

4. National New Biology Initiative: A New Biology for the 21st Century.

Syllabus

Here is a syllabus for this course. A temporary reading list is available here.

Intended Audience

Graduate students or senior undergraduate students with a background in math, CS, statistics or biology. To complete the assignments and the final research project, students will need to do some programming in C++, Java, Matlab, Python, R or other scientific computing software.

Evaluation

There will be no examination for this course. The final grade consists of three components: homework, one final research project and attendance. For each homework assignment, you can re-implement a popular algorithm or conduct an experiment to compare several popular bioinformatics tools, and then summarize your work in a technical report (around 5 pages). The homework assignments account for 50% of the final grade. The final research project requires you to develop a new algorithm for a bioinformatics problem. You are not required to come up with extremely innovative ideas, although this is highly encouraged; an incremental improvement over existing algorithms is acceptable. Please hand in a report on the final research project, which accounts for 40% of the final grade. All students are required to complete both the homework and the final research project. However, undergraduate students will be marked more generously. Students must attend class to earn the remaining 10%.

Homework Assignments

1)     Redevelop the PAM and BLOSUM matrices and compare them with the published matrices.

2)     Conduct experiments to compare PSI-BLAST, CS-BLAST and HHBlits

3)     Reimplement the dynamic programming algorithm for local sequence alignment and compare your code with established tools such as FASTA, BLAST and existing Smith-Waterman implementations

4)     Design an experiment to study how accurate the BLAST E-value estimation is (for protein homology search). Use a random model taught in class for both the query sequence and the database.

5)     Benchmark several multiple sequence alignment tools such as ProbCons, T-Coffee, MUSCLE

You can use existing libraries or Matlab to implement your algorithm. However, please clearly point out your contribution in your report. If you use other bioinformatics libraries, please pay more attention to result analysis.
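
As a rough starting point for assignment 3, the local alignment recurrence can be sketched as below. This is only a minimal illustration, not a reference solution: the function name smith_waterman and the match/mismatch/gap values are assumptions chosen for readability, and a linear gap penalty stands in for the substitution matrices (PAM, BLOSUM) and affine gap penalties that real protein alignment tools normally use.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one optimal local alignment.

    The scoring values are illustrative assumptions; gaps are scored linearly.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment here
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("XXACGTXX", "YYACGTYY"))  # (8, 'ACGT', 'ACGT')
```

The sketch takes quadratic time and memory in the sequence lengths, which is fine for a correctness comparison against established tools but not for database-scale search.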

Research Projects

Please choose one of the following topics. You are also encouraged to propose your own topic. However, you cannot work on the same topic for both an assignment and your research project.

1.     Develop new algorithms for pairwise or multiple protein-protein interaction network alignment

2.     Develop new algorithms for network motif discovery

3.     Develop new algorithms for the generation of degree-preserving random networks

4.     Develop new algorithms for protein interface alignment

5.     Develop new algorithms for alignment of protein binding sites

6.     Develop new algorithms for protein binding site prediction

7.     Develop new algorithms for RNA pseudoknot prediction
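
For topic 3, a common baseline that a new method could be compared against is the double-edge swap: pick two edges (a, b) and (c, d) at random and rewire them to (a, d) and (c, b), rejecting any swap that would create a self-loop or a duplicate edge. The sketch below is a minimal illustration under stated assumptions, not a recommended design: the function names degree_preserving_rewire and degree_sequence and the max_tries cutoff are hypothetical, and the input is assumed to be a simple undirected graph given as a list of node pairs.

```python
import random
from collections import Counter


def degree_sequence(edges):
    """Map each node to its degree in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_preserving_rewire(edges, n_swaps, seed=0, max_tries=None):
    """Randomize a simple undirected graph by repeated double-edge swaps.

    Assumes the input has no self-loops and no duplicate edges.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    if len(edge_list) < 2:
        return edge_list
    if max_tries is None:
        max_tries = 100 * n_swaps  # arbitrary cutoff so the loop always ends

    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if a == d or c == b:
            continue  # swap would create a self-loop
        e1, e2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if e1 in edge_set or e2 in edge_set:
            continue  # swap would create a duplicate edge
        edge_set.discard(edge_list[i])
        edge_set.discard(edge_list[j])
        edge_set.update((e1, e2))
        edge_list[i], edge_list[j] = e1, e2
        done += 1
    return edge_list


ring = [(k, (k + 1) % 10) for k in range(10)] + [(0, 5), (2, 7)]
shuffled = degree_preserving_rewire(ring, n_swaps=20, seed=1)
print(degree_sequence(shuffled) == degree_sequence(ring))
```

Every accepted swap removes one edge endpoint from each of the four nodes and adds one back, so each node keeps its degree. How many swaps are needed before the result is close to a uniform sample is a separate question that the sketch does not address.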

The final project is due in early to mid-March 2016. Please send me a brief abstract (one paragraph) describing what you want to work on before mid-February 2016. If you need your final grade to graduate, please talk to me and hand in the final project earlier. If you need more time to complete the research project, please also talk to me.