April 8, 2005

This material concerns a comparison of the SAM and HMMER profile HMM packages carried out by Markus Wistrand and Erik Sonnhammer. Please also see the paper:

"Improved profile HMM performance by assessment of critical features in SAM and HMMER", Wistrand et al., BMC Bioinformatics, 6:99

Included material

- TEST.tar.gz: The test set used in the comparison. It consists of 505 Pfam alignments, each matched to the superfamily level in SCOP. HMMs were trained from the Pfam alignments and then tested on ASTRAL sequences that belong to the same superfamily (positives) or to a different fold (negatives). Packed with tar and gzip.

- INDEX: Specifies which positive and negative sequences are to be used to test each HMM built from a particular alignment. Each entry has the form:

    > Pfam_fam_alignment Test_seq_prefix average_length_of_seqs average_identity_of_seqs
    positive seq 1
    positive seq 2
    ....

- HiddenMarkovModel_mw.pm and model_convert.pl: The Perl code used to convert between HMMER and SAM models. It is almost identical to M. Madera and J. Gough's free code, but adds the ability to convert to and from HMMER global/local models. There are also one or two additional minor changes (see the code). See Madera and Gough's excellent description: http://www.mrc-lmb.cam.ac.uk/genomes/julian/convert/convert.html

- CODE.tar: My modifications to the free HMMER code (version 2.3.2). They include an implementation of an entropy-based method for calculating the effective sequence number, as well as the atp algorithm, which trains the HMM on negative data. Download the file and unpack it with 'tar -xf CODE.tar', then see the README file in SAM_HMMER_code for further details.

- recode3.20comp.pri: A Dirichlet mixture prior estimated by the Computational Biology group at UCSC. I have modified its format so that it can be used in HMMER (--prior recode3.20comp.pri). It was used during the test.
- recode3.20comp.fssp-trans.pri: Two UCSC priors (recode3.20comp.pri and fssp-trained.regularizer) combined into a single prior file that HMMER can read. It was also used during the test.

NOTE: The UCSC work on Dirichlet priors is freely available here: http://www.cse.ucsc.edu/research/compbio/dirichlets/index.html

HMMER is distributed under the terms of the GNU General Public License as published by the Free Software Foundation (see LICENSE, distributed along with HMMER). Any modifications to HMMER must also be distributed under the same terms. This means that this code is free to anyone and that any changes to it must also be made freely available.

See INSTALL for how to install SAM_HMMER_code. HMMER 2.3.2 (available from http://hmmer.wustl.edu/) must be installed first.
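The recode3.20comp.pri and recode3.20comp.fssp-trans.pri files listed above are Dirichlet mixture priors. As a rough illustration of what such a prior does, the Python sketch below shows the standard posterior-mean estimate that turns the observed residue counts of one alignment column into emission probabilities under a Dirichlet mixture. It is not the HMMER 2.3.2 code and does not read the .pri file format. The function names, the toy three-letter alphabet, and the example mixture parameters are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def log_dirichlet_multinomial(counts: Sequence[float], alpha: Sequence[float]) -> float:
    """log P(counts | alpha) for a Dirichlet-multinomial distribution.

    Counts may be fractional (e.g. sequence-weighted); all alpha values must be > 0.
    """
    n_tot = sum(counts)
    a_tot = sum(alpha)
    lp = math.lgamma(n_tot + 1) + math.lgamma(a_tot) - math.lgamma(n_tot + a_tot)
    for n_i, a_i in zip(counts, alpha):
        lp += math.lgamma(n_i + a_i) - math.lgamma(n_i + 1) - math.lgamma(a_i)
    return lp


def mixture_posterior_mean(counts: Sequence[float],
                           weights: Sequence[float],
                           alphas: Sequence[Sequence[float]]) -> List[float]:
    """Posterior-mean residue probabilities under a Dirichlet mixture prior.

    p_i = sum_j P(j | n) * (n_i + alpha_j,i) / (|n| + |alpha_j|),
    where P(j | n) is proportional to q_j * P(n | alpha_j).
    """
    # Unnormalized log posterior of each mixture component j.
    logs = [math.log(q) + log_dirichlet_multinomial(counts, a)
            for q, a in zip(weights, alphas)]
    # Normalize with the log-sum-exp trick to avoid underflow.
    m = max(logs)
    post = [math.exp(l - m) for l in logs]
    z = sum(post)
    post = [p / z for p in post]

    n_tot = sum(counts)
    probs = [0.0] * len(counts)
    for p_j, alpha in zip(post, alphas):
        a_tot = sum(alpha)
        for i, (n_i, a_i) in enumerate(zip(counts, alpha)):
            probs[i] += p_j * (n_i + a_i) / (n_tot + a_tot)
    return probs


if __name__ == "__main__":
    # Hypothetical two-component mixture over a toy three-letter alphabet.
    column_counts = [5.0, 0.0, 0.0]
    mixture_weights = [0.5, 0.5]
    mixture_alphas = [[10.0, 1.0, 1.0], [1.0, 10.0, 1.0]]
    print(mixture_posterior_mean(column_counts, mixture_weights, mixture_alphas))
```

With a single component, the estimate reduces to ordinary pseudocounts, (n_i + alpha_i) / (|n| + |alpha|). With several components, the column's counts decide which components dominate, which is what lets a mixture prior adapt to different kinds of alignment columns.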