Bioinformatics projects for internship, master thesis, or bachelor thesis in the Sonnhammer group:
Projects are offered in the area of protein function prediction and analysis, mainly in three areas:
- Gene network construction and analysis
- Orthology identification and analysis
- Protein domain architecture evolution and analysis
Specific project descriptions are listed here:
- HiPathway - discovery of novel pathways from high-throughput data
The goal of this project is to use high-throughput omics data to derive groups of proteins with a coherent function and to map these to known pathways. This will reveal which pathways are rediscoverable and also provide novel protein sets that represent novel pathways. There are several databases and methods that can be used, and the goal is to apply several of them and compare the results. The project will involve script writing and analysis of results.
- Modules in gene regulatory networks
The aim of the project is to find biologically relevant or functional modules in Gene Regulatory Networks (GRNs) inferred from cancer cell line knockdown data. This will be done by applying module detection methods to large GRNs in order to identify submodules. The found modules will be characterized and evaluated for reproducibility and functional features. The project will involve script writing and analysis of results.
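As a rough illustration of what "module detection" can mean in its simplest form, the sketch below drops weak links and reports the connected components that remain. This is only a stand-in: the function name `modules_by_threshold`, the gene labels, the weights, and the cut-off are all hypothetical, the network is treated as undirected, and the project itself would evaluate dedicated module detection methods.

```python
from collections import defaultdict

def modules_by_threshold(weighted_edges, min_weight):
    """Group genes into modules: connected components of the network
    that remain after removing links with |weight| below min_weight."""
    adjacency = defaultdict(set)
    for source, target, weight in weighted_edges:
        if abs(weight) >= min_weight:
            # Direction is ignored here (an assumption made for simplicity).
            adjacency[source].add(target)
            adjacency[target].add(source)

    seen, modules = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects one component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        modules.append(sorted(component))
    return modules

# Hypothetical toy GRN: (regulator, target, signed link weight)
toy_edges = [
    ("G1", "G2", 0.9),
    ("G2", "G3", 0.8),
    ("G4", "G5", -0.7),
    ("G5", "G1", 0.1),   # weak link, removed by the threshold
]
modules = modules_by_threshold(toy_edges, min_weight=0.5)
print(modules)
```

With the toy data, the weak G5-G1 link is discarded, so the network splits into two modules.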
- Differential network analysis
The project aims to identify modules (sets of tightly interconnected genes) in gene networks of prostate cancer. Further, modular differences between healthy and cancerous regions of tissue samples will be studied to reveal changed genes and pathways. The project will involve script writing and analysis of results.
- Fast orthology analysis
Orthology inference in InParanoid and Hieranoid uses the BLAST tool. Recently, much faster homology search tools have become available. The goal is to speed up these orthology inference algorithms while retaining the same accuracy. An additional aim is to make the 2-pass homology search strategy more efficient. The project will involve script writing (mostly Perl) and analysis of results.
- Benchmarking the next generation of homology search tools
Recently, ultrafast homology search tools like MMseqs2 and DIAMOND have become popular, as they claim to combine much higher speed than BLAST with the same sensitivity. The goal of the project is to use a previously developed benchmark to evaluate the sensitivity and speed of these new tools. The project will involve script writing and analysis of results.
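To make the notion of sensitivity concrete, here is a minimal sketch of how it could be scored against a set of known homolog pairs. The protein identifiers, the tool names `tool_a` and `tool_b`, and the data layout are hypothetical; the actual benchmark used in the project may define true positives differently.

```python
def sensitivity(predicted_pairs, true_pairs):
    """Fraction of known homolog pairs that a search tool recovered.
    Pairs are treated as unordered, so (a, b) equals (b, a)."""
    true_set = {frozenset(pair) for pair in true_pairs}
    found = {frozenset(pair) for pair in predicted_pairs} & true_set
    return len(found) / len(true_set) if true_set else 0.0

# Hypothetical reference set of true homolog pairs
true_pairs = [("P1", "P2"), ("P1", "P3"), ("P4", "P5"), ("P6", "P7")]

# Hypothetical hits reported by two tools
tool_hits = {
    "tool_a": [("P2", "P1"), ("P1", "P3"), ("P4", "P5"), ("P8", "P9")],
    "tool_b": [("P1", "P2"), ("P6", "P7")],
}

results = {name: sensitivity(hits, true_pairs) for name, hits in tool_hits.items()}
for name, value in results.items():
    print(f"{name}: sensitivity = {value:.2f}")
```

Speed would be measured separately, for example by timing each tool on the same query and database.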
- Network-based identification of new disease genes
The goal is to use the FunCoup network to identify genes not previously associated with a disease but with an enrichment of network links to known disease genes. Furthermore, conservation of the relevant subnetwork in model organisms will be analysed in order to identify particularly interesting new candidates. The project will involve script writing and analysis of results.
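One simple way to score "enrichment of network links to known disease genes" is a hypergeometric test on each candidate's neighbours, sketched below. This is an illustrative assumption, not necessarily the statistic the project would use: it ignores link confidence and node degree bias, and the toy network, gene labels, and function names are hypothetical.

```python
from collections import defaultdict
from math import comb

def hypergeom_tail(x, population, successes, draws):
    """P(X >= x) when drawing `draws` items without replacement from
    `population` items, of which `successes` are marked."""
    upper = min(successes, draws)
    total = comb(population, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(x, upper + 1))
    return hits / total

def rank_candidates(edges, disease_genes):
    """Rank genes not yet linked to the disease by how unexpectedly many
    of their network neighbours are known disease genes."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    genes = set(neighbours)
    ranked = []
    for gene in sorted(genes - disease_genes):
        degree = len(neighbours[gene])
        linked = len(neighbours[gene] & disease_genes)
        # Population: all other genes in the network.
        p = hypergeom_tail(linked, len(genes) - 1, len(disease_genes & genes), degree)
        ranked.append((gene, p))
    return sorted(ranked, key=lambda item: item[1])

# Hypothetical toy network and known disease genes
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H")]
disease = {"B", "C", "D"}

ranked = rank_candidates(edges, disease)
for gene, p in ranked:
    print(f"{gene}: p = {p:.4f}")
```

In the toy network, gene A is linked only to known disease genes and therefore ranks first.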
- Disease module detection and analysis
- Domain orthology
Orthology in InParanoid, Hieranoid, and most other ortholog databases is defined on the whole-protein level. The goal is to explore the difference when orthology is defined on the domain level, and to establish rules for when domain orthology can give an advantage. The project will involve script writing and analysis of results.
- Benchmarking of pathway analysis methods
BinoX is a new tool for measuring crosstalk between gene sets, particularly aimed at pathway annotation. The goal is to benchmark BinoX and other similar tools such as NEAT, NEA+, and LEGO, possibly also in combination with clustering of the gene sets. The project will involve script writing and analysis of results.
- Ultrafast sequence clustering for Pfam-B
Pfam is a comprehensive database of annotated protein domains, called Pfam-A. Between these Pfam-A domains are long stretches of unclassified sequence. In the past, homology-based methods were used to cluster these stretches into Pfam-B domains, but with the huge growth of the sequence databases this approach is no longer feasible. Instead, we want to employ ultrafast alignment-free methods to produce an approximate but feasible sequence clustering. The project will involve exploration of alignment-free algorithms and packages, programming, script writing, and analysis of results.
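As one example of what an alignment-free approach might look like, the sketch below compares sequences by the overlap of their k-mer sets (Jaccard similarity) and clusters them greedily around representatives. The sequences, the word length `k`, and the similarity threshold are hypothetical choices for illustration; a scalable method would need indexing or sketching rather than comparing against every representative.

```python
def kmers(sequence, k=3):
    """Set of all overlapping words of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def jaccard(a, b):
    """Shared k-mers divided by all k-mers seen in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def greedy_cluster(sequences, k=3, threshold=0.5):
    """Assign each sequence to the first representative it resembles;
    otherwise it becomes the representative of a new cluster."""
    representatives = []   # (representative id, its k-mer set)
    clusters = {}          # representative id -> member ids
    for seq_id, sequence in sequences.items():
        profile = kmers(sequence, k)
        for rep_id, rep_profile in representatives:
            if jaccard(profile, rep_profile) >= threshold:
                clusters[rep_id].append(seq_id)
                break
        else:
            representatives.append((seq_id, profile))
            clusters[seq_id] = [seq_id]
    return clusters

# Hypothetical toy sequences: s1 and s2 differ only in the last residue
toy_sequences = {
    "s1": "MKTAYIAKQRQISFVK",
    "s2": "MKTAYIAKQRQISFVR",
    "s3": "GSHMLEDPVAGWWWLL",
}
clusters = greedy_cluster(toy_sequences)
print(clusters)
```

No alignment is computed at any point; similarity is judged from shared words alone, which is what makes this family of methods fast but approximate.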
- Development of an interactive website to explore protein networks
The FunCoup network database is a vast resource of functional coupling between proteins and genes. The goal of the project is to develop a new network viewer with modern web technologies that is integrated with the FunCoup database and can make full use of its features. The project will mainly involve programming.
- Pathway crosstalk enrichment visualisation
The PathwAX website provides online pathway annotation based on crosstalk derived from FunCoup's genome-wide functional association networks. The goal is to develop a network visualisation tool that shows the crosstalk between a given query gene set and a given pathway. The project will mainly involve programming.
- Development of an online Hieranoid database
- A tool for analysing protein domain architecture evolution
- Protein domain architecture evolution
- Benchmarking the next generation of homology inference tools
- Adaptive evolution in birds
- Interactive website to explore protein networks