Guide

0.KIJ_redundant_Proteins/
- Redundant protein set predicted from 29,082 KIJ_Genomes (protein count: 64.7M)
1.KIJ_unique_Proteins/
- Identical proteins were removed from the redundant proteins (protein count: 22.1M)
2.KIJ_CD-HIT-100_Proteins/
- CD-HIT was run with a 100% similarity cutoff on 1.KIJ_unique_Proteins (protein count: 20.6M)
3.KIJ-UHGP_unique_Proteins/
- KIJ_CD-HIT-100_Proteins and UHGP-100 were merged and identical sequences were removed (protein count: 107.0M)
4.HRGM_Proteins/
- Final HRGM protein catalog.
- CD-HIT was run sequentially with 100%, 95%, 90%, 70%, and 50% similarity cutoffs on KIJ_CD-HIT-100_Proteins (see the Methods section of the original paper).
- Protein count
i ) HRGM-100: 103.7M
ii ) HRGM-95 : 20.0M
iii) HRGM-90 : 14.8M
iv ) HRGM-70 : 8.5M
v ) HRGM-50 : 4.7M
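
The two operations named above (removing identical sequences, then running CD-HIT at decreasing cutoffs) can be sketched in Python as follows. This is a minimal illustration, not the pipeline used to build the catalog. Its assumptions: "sequentially" is read as each CD-HIT run taking the previous run's output as input; the file names (`input.faa`, `HRGM-100.faa`, ...) and the function names are hypothetical; the word sizes (`-n`) follow the ranges recommended in the CD-HIT user guide; any other options the authors used are not listed in this README and are left out.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_proteins(records):
    """Keep the first record for each distinct sequence (exact match only)."""
    seen = set()
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            yield header, seq


# Cutoffs (percent) from this README, paired with an assumed cd-hit word size (-n).
THRESHOLDS = [(100, 5), (95, 5), (90, 5), (70, 5), (50, 3)]


def cdhit_commands(first_input, prefix="HRGM"):
    """Build (but do not run) one cd-hit command per cutoff, chained in order."""
    commands, current = [], first_input
    for percent, word_size in THRESHOLDS:
        output = f"{prefix}-{percent}.faa"  # hypothetical output name
        commands.append([
            "cd-hit", "-i", current, "-o", output,
            "-c", f"{percent / 100:.2f}", "-n", str(word_size),
        ])
        current = output  # assumption: the next level clusters this output
    return commands


if __name__ == "__main__":
    demo = io.StringIO(">a\nMKV\nLL\n>b\nMKVLL\n>c\nMAA\n")
    for header, seq in unique_proteins(read_fasta(demo)):
        print(header, seq)  # "b" is dropped because its sequence equals "a"
    for cmd in cdhit_commands("input.faa"):
        print(" ".join(cmd))
```

Holding every distinct sequence in a Python set is fine for a small test file but would need a lot of memory at the scale of tens of millions of proteins; storing a hash of each sequence instead would be one way to reduce that.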
Present directory - data/protein_catalog
Name | Last modified | Size |
---|---|---|
Parent Directory | - | - |
0.KIJ_redundant_Proteins | 2020-06-23 01:30:22 | - |
1.KIJ_unique_Proteins | 2021-10-21 14:51:55 | - |
2.KIJ_CD-HIT-100_Proteins | 2020-11-10 09:43:21 | - |
3.KIJ-UHGP_unique_Proteins | 2020-11-10 09:43:29 | - |
4.HRGM_Proteins | 2021-01-13 14:06:56 | - |
README.txt | 2020-11-09 19:48:28 | 798 B |