1. HRGM_CD-HIT_[similarity-level]_Proteins.faa
- Amino acid sequences (FASTA format)
2. HRGM_CD-HIT_[similarity-level]_eggnogmapper.tsv
- eggNOG-mapper functional annotation results
3. HRGM_CD-HIT_[similarity-level]_cluster_info.tsv
- col1: representative sequence (selected by CD-HIT; the longest sequence in the cluster)
- col2: member sequences (separated by ';')
4. Taxonomic annotation of proteins
- Species of origin of the member proteins
- col1: protein name
- col2: taxonomic level of the lowest common ancestor
- col3: species list (separated by '|')
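Because these tables are plain text, they can be loaded without special tooling. The sketch below is a minimal Python example, not part of the catalog itself. It assumes the cluster table and the taxonomic annotation table are both tab-separated with columns in the order listed above, and that neither has a header line unless `has_header=True` is passed. The function names `parse_cluster_info` and `parse_taxonomy` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_cluster_info(lines: Iterable[str], has_header: bool = False) -> Dict[str, List[str]]:
    """Map each representative (col1) to its members (col2, split on ';').

    Assumes tab-separated columns; blank or malformed lines are skipped.
    """
    clusters: Dict[str, List[str]] = {}
    for i, line in enumerate(lines):
        if has_header and i == 0:
            continue  # skip an optional header row
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        clusters[fields[0]] = [m for m in fields[1].split(";") if m]
    return clusters


def parse_taxonomy(lines: Iterable[str], has_header: bool = False) -> Dict[str, Tuple[str, List[str]]]:
    """Map protein name (col1) to (LCA taxonomic level (col2), species list (col3, split on '|')).

    Assumes the taxonomic annotation table is also tab-separated (not stated in the README).
    """
    taxa: Dict[str, Tuple[str, List[str]]] = {}
    for i, line in enumerate(lines):
        if has_header and i == 0:
            continue  # skip an optional header row
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # blank or malformed line
        taxa[fields[0]] = (fields[1], [s for s in fields[2].split("|") if s])
    return taxa
```

Both functions accept any iterable of lines, so an open file handle works directly, e.g. `with open(path) as fh: clusters = parse_cluster_info(fh)`, where `path` is the cluster file with `[similarity-level]` replaced by the actual value.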
WARNING: Please verify the md5sum of the large files.
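On most Linux systems `md5sum <file>` prints the checksum directly. Where that tool is unavailable, a minimal Python sketch using the standard `hashlib` module computes the same MD5 hex digest. It reads the file in chunks so very large files do not need to fit in memory. The expected checksum values are not listed in this README, so `expected` below stands for whatever value is distributed alongside the files; the names `md5sum` and `verify` are hypothetical helpers.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file's MD5 with an expected hex string (case and surrounding whitespace ignored)."""
    return md5sum(path) == expected.strip().lower()
```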
Current directory: data/protein_catalog/4.HRGM_Proteins