SC²S Colloquium - June 25, 2008
Date: June 25
Room: 02.07.023
Time: 1 pm, s.t.
1:00 pm - Thomas Haller: Multireference Alignment of 3D Data from Electron Tomography (DA)
The goal of this thesis is to develop software for fast alignment and averaging of sub-tomograms. The sub-tomograms contain the three-dimensional structural information of macromolecular protein complexes imaged by cryo-electron tomography. Cryo-electron tomography is an emerging technique in structural biology that can resolve large complexes at nanometre resolution, providing insight into their function and the interactions that take place within a cellular context. However, the attainable resolution is limited by the low signal-to-noise ratio and by distortion from artifacts such as the "missing wedge" (i.e., structural information that is missing because of inherent limitations of electron tomography). Coherent averaging of sub-tomograms improves the signal-to-noise ratio and compensates for such artifacts, and thereby improves resolution.
The application presented here computes the average of the randomly oriented complexes and automatically determines the unknown alignments/orientations. The algorithm is iterative: sub-tomograms are aligned to estimated averages, the averages are recalculated from the aligned sub-tomograms, and the new averages are used to refine the alignments in the next iteration. The alignment between a sub-tomogram and an estimated average is determined by searching for the highest correlation score through hierarchical rotational sampling. Normalized cross-correlation is used as the similarity measure, with special consideration for the missing wedge. By handling multiple references/averages and assigning each sub-tomogram to the (class) average it matches best, the software realizes a rotationally invariant k-means clustering that classifies sub-tomograms according to their structure.
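The align, classify, and re-average loop described above can be illustrated with a small toy sketch. This is not the thesis software: it assumes one-dimensional signals in place of 3D sub-tomograms, cyclic shifts in place of rotations, an exhaustive search in place of hierarchical rotational sampling, and it ignores the missing wedge entirely. All function names (ncc, shift, best_alignment, multireference_align) are hypothetical and chosen only for this illustration.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom > 0 else 0.0


def shift(sig, k):
    """Cyclically shift a signal by k samples (toy stand-in for a rotation)."""
    s = list(sig)
    k %= len(s)
    return s[-k:] + s[:-k] if k else s


def best_alignment(sig, ref):
    """Try every shift and return (score, shift) with the highest NCC."""
    return max((ncc(shift(sig, k), ref), k) for k in range(len(sig)))


def multireference_align(signals, references, iterations=5):
    """Alternate between alignment/classification and re-averaging.

    Returns the class label of each signal and the final class averages.
    """
    refs = [list(r) for r in references]
    labels = [0] * len(signals)
    for _ in range(iterations):
        aligned = [[] for _ in refs]
        for i, sig in enumerate(signals):
            # Align the signal to every reference; keep the best class.
            best = None
            for c, ref in enumerate(refs):
                score, k = best_alignment(sig, ref)
                if best is None or score > best[0]:
                    best = (score, c, k)
            _, c, k = best
            labels[i] = c
            aligned[c].append(shift(sig, k))
        # Recompute each class average from its aligned members.
        for c, members in enumerate(aligned):
            if members:
                refs[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, refs
```

Because the best shift is searched separately for every reference, the class assignment does not depend on how each input signal happens to be shifted, which is the toy analogue of the rotational invariance mentioned above. As in ordinary k-means, the result depends on the initial references; in this sketch they are simply passed in by the caller.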
Because the multireference alignment problem is computationally expensive, computation time is a critical issue. The software is therefore parallelized using the Message Passing Interface (MPI) and adapted to run on clusters.