StrucTFactor

Transcription factor prediction using protein 3D secondary structures

For more details, see our GitHub repository: https://github.com/lieboldj/StrucTFactor

Contact Information: Khalique Newaz
E-mail: khalique.newaz@uni-hamburg.de

Transcription factors (TFs) are DNA-binding proteins that regulate gene expression. Traditional methods predict that a protein is a TF if it contains any DNA-binding domain (DBD) of a known TF. However, this approach fails to identify novel TFs that do not contain any known DBD. Recently proposed TF prediction methods do not rely on DBDs. Instead, they use features of protein sequences to train a machine learning model, and then use the trained model to predict whether a protein is a TF. Because the three-dimensional (3D) structure of a protein captures more information than its sequence, using 3D protein structures will likely allow for more accurate prediction of novel TFs.

[Figure: Overview of our TF prediction method, StrucTFactor]

Software / Hardware prerequisites

Before you begin, ensure that you have the following prerequisites installed on your system:

Installation Steps

The following installation steps were tested with CUDA version 12.2:

1. Clone the repository:
   git clone https://github.com/lieboldj/StrucTFactor.git

2. Create a conda environment:
   conda env create -f stf.yml
   conda activate stf
        
If you want to benchmark with DeepReg, also install TensorFlow.
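
Before running the steps above, it can help to confirm that the command-line tools they rely on are available. The following is a minimal sketch, not part of StrucTFactor: `git` and `conda` are the tools invoked in the installation steps, while checking for `nvidia-smi` is an assumption about how one might confirm that an NVIDIA driver is present. The helper name `missing_tools` is hypothetical.

```python
import shutil

# Tools invoked by the installation steps above.
REQUIRED_TOOLS = ("git", "conda")
# Assumption: nvidia-smi ships with the NVIDIA driver and is a common way
# to confirm that a CUDA-capable setup is present. It is not named in the README.
OPTIONAL_TOOLS = ("nvidia-smi",)


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    for label, tools in (("required", REQUIRED_TOOLS), ("optional", OPTIONAL_TOOLS)):
        missing = missing_tools(tools)
        status = "all found" if not missing else "missing: " + ", ".join(missing)
        print(f"{label} tools: {status}")
```

This only checks that the executables are on PATH; it does not verify versions, so the CUDA version still needs to be checked separately.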

Predict TF/non-TF for a PDB file

Given a PDB file, run the following to predict whether the protein is a TF:

cd strucTFactor
python predictTFwithStrucTFactor.py -i <pdb_file>
        

The result will be printed in your terminal and written to output.csv.
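
To predict several proteins in one go, the command above can be wrapped in a small loop. The sketch below is not part of StrucTFactor; it only uses the script name and the `-i` option shown above, and the helper names `build_command` and `predict_all` are hypothetical. It assumes it is run from the directory that contains `predictTFwithStrucTFactor.py`. Whether output.csv is overwritten or extended on each run is not stated here, so check this before relying on the file after a batch.

```python
import subprocess
import sys
from pathlib import Path

# Script name taken from the command shown above.
SCRIPT = "predictTFwithStrucTFactor.py"


def build_command(pdb_file, python=sys.executable):
    """Build the documented command line for a single PDB file."""
    return [python, SCRIPT, "-i", str(pdb_file)]


def predict_all(pdb_dir, run=subprocess.run):
    """Run the prediction script once per *.pdb file in `pdb_dir`.

    Returns the processed files in sorted order.
    """
    processed = []
    for pdb_file in sorted(Path(pdb_dir).glob("*.pdb")):
        # check=True raises CalledProcessError if the script exits non-zero.
        run(build_command(pdb_file), check=True)
        processed.append(pdb_file)
    return processed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python predict_all.py <directory_with_pdb_files>
    for done in predict_all(sys.argv[1]):
        print(f"processed {done}")
```

Running one process per file keeps the sketch independent of the script's internals, at the cost of reloading the model for every protein.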

Please cite: Liebold J., Neuhaus F., Geiser J., Kurtz S., Baumbach J., Newaz K. (2024). Transcription factor prediction using protein 3D secondary structures.

© 2024. All rights reserved