Foldcomp
24 March, 2023 - Categories: project - Tags: protein, protein-structure, compression, c++, bioinformatics, bioinformatics-software
Repository Presentation Paper Slides ISMB2022 poster ISMB2023 poster


Foldcomp is a protein structure compression tool written in C++ that uses torsion angles to represent protein structures in a compact format. It compresses the backbone atoms to 8 bytes and the side chain to an additional 4-5 bytes per residue, so an average-sized protein of 350 residues requires ~6 kB. Foldcomp's compressed format thus stores protein structures in only 13 bytes per residue, which reduces the required storage space by an order of magnitude compared to saving 3D coordinates directly. We achieve this reduction by encoding the backbone torsion angles as well as the side-chain angles in a compact binary file format (FCZ). By adopting MMseqs2's database format, we also reduce the number of files.
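To make the per-residue figures above concrete, here is a minimal back-of-the-envelope sketch in Python. It is not part of Foldcomp and does not use its API; the function name `estimate_payload_bytes` and the constants are hypothetical, introduced only for illustration. It counts just the per-residue payload (8 bytes of backbone plus 4-5 bytes of side chain), so it does not model any per-file overhead and comes out below the ~6 kB quoted for a 350-residue protein.

```python
# Per-residue sizes quoted in the text; the names are illustrative only.
BACKBONE_BYTES = 8
SIDE_CHAIN_BYTES_MIN = 4
SIDE_CHAIN_BYTES_MAX = 5


def estimate_payload_bytes(n_residues: int) -> tuple[int, int]:
    """Return (low, high) bounds on the per-residue payload in bytes.

    Assumption: only the per-residue cost is counted; any header or
    other per-file overhead is ignored.
    """
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    low = n_residues * (BACKBONE_BYTES + SIDE_CHAIN_BYTES_MIN)
    high = n_residues * (BACKBONE_BYTES + SIDE_CHAIN_BYTES_MAX)
    return low, high


if __name__ == "__main__":
    low, high = estimate_payload_bytes(350)
    print(f"350 residues: {low}-{high} bytes of per-residue payload")
```

Because the estimate scales linearly with the number of residues, the same function gives a quick lower bound on storage for a whole database if you pass the total residue count.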