Guide


0. KIJ_redundant_Proteins/
- Redundant protein set predicted from 29,082 KIJ_Genomes (protein count: 64.7M)

1. KIJ_unique_Proteins/
- Identical proteins were removed from 0.KIJ_redundant_Proteins (protein count: 22.1M)
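
A minimal sketch of exact-duplicate removal, assuming the proteins are stored as FASTA text and that "identical" means the same amino-acid string. The function names and the toy records are hypothetical and are not taken from the HRGM pipeline.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_proteins(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Keep only the first record seen for each distinct sequence."""
    seen = set()
    for header, seq in records:
        key = seq.upper()  # assumption: letter case is not significant
        if key not in seen:
            seen.add(key)
            yield header, seq


# Toy input: p1 and p2 encode the same sequence, wrapped differently.
example = """>p1
MKT
AIL
>p2
MKTAIL
>p3
MKTAIV
""".splitlines()

kept = list(unique_proteins(read_fasta(example)))
print(kept)  # [('p1', 'MKTAIL'), ('p3', 'MKTAIV')]
```

At the scale of tens of millions of proteins, storing a fixed-size digest of each sequence instead of the sequence itself would reduce memory use; that variant is left out to keep the sketch short.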

2. KIJ_CD-HIT-100_Proteins/
- CD-HIT with a 100% similarity cutoff was run on 1.KIJ_unique_Proteins (protein count: 20.6M)

3. KIJ-UHGP_unique_Proteins/
- 2.KIJ_CD-HIT-100_Proteins and UHGP-100 were merged, and identical sequences were removed (protein count: 107.0M)
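
The merge step can be pictured as a union keyed on the sequence itself. The sketch below is only an illustration under that assumption: `merge_unique`, the source labels, and the toy records are hypothetical, and it also records which source contributed each surviving sequence.

```python
from typing import Dict, Tuple


def merge_unique(*named_sets: Tuple[str, Dict[str, str]]) -> Dict[str, Tuple[str, str]]:
    """Merge {protein_id: sequence} sets, keeping one record per distinct sequence.

    Returns {sequence: (source_label, protein_id)}; the first source listed wins.
    """
    merged: Dict[str, Tuple[str, str]] = {}
    for source, records in named_sets:
        for protein_id, seq in records.items():
            merged.setdefault(seq, (source, protein_id))
    return merged


# Toy data: one sequence is shared between the two sets.
kij = {"kij_1": "MKTAIL", "kij_2": "MSSQ"}
uhgp = {"uhgp_1": "MKTAIL", "uhgp_2": "MPLV"}

merged = merge_unique(("KIJ", kij), ("UHGP", uhgp))
print(len(merged))       # 3
print(merged["MKTAIL"])  # ('KIJ', 'kij_1')
```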

4. HRGM_Proteins/
- Final HRGM protein catalog.
- CD-HIT was run sequentially at 100%, 95%, 90%, 70%, and 50% similarity on 3.KIJ-UHGP_unique_Proteins. (See the Methods section of the original paper.)
- Protein counts:
  i)   HRGM-100: 103.7M
  ii)  HRGM-95:   20.0M
  iii) HRGM-90:   14.8M
  iv)  HRGM-70:    8.5M
  v)   HRGM-50:    4.7M
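
As a rough illustration of sequential clustering, the sketch below builds one CD-HIT command per threshold and feeds each level's output into the next. It assumes that "sequentially" means this kind of chaining. The file names are hypothetical, and the word sizes (`-n`) follow the ranges commonly recommended in the CD-HIT user guide rather than parameters confirmed by the paper. Only the `-i`, `-o`, `-c`, and `-n` options are used, and nothing is executed.

```python
import shlex
from typing import List, Sequence


def word_size(identity_pct: int) -> int:
    """Word size (-n) commonly recommended for a given identity threshold."""
    if identity_pct >= 70:
        return 5
    if identity_pct >= 60:
        return 4
    if identity_pct >= 50:
        return 3
    return 2


def cdhit_commands(first_input: str,
                   thresholds_pct: Sequence[int] = (100, 95, 90, 70, 50)) -> List[List[str]]:
    """Build one cd-hit command per threshold, chaining outputs to inputs."""
    commands = []
    current = first_input
    for pct in thresholds_pct:
        output = f"HRGM-{pct}.faa"  # hypothetical output file name
        commands.append([
            "cd-hit",
            "-i", current,
            "-o", output,
            "-c", f"{pct / 100:.2f}",
            "-n", str(word_size(pct)),
        ])
        current = output  # the next level clusters this level's representatives
    return commands


# "input.faa" is a placeholder for the protein set the clustering starts from.
for cmd in cdhit_commands("input.faa"):
    print(shlex.join(cmd))
```

Each printed line could be passed to `subprocess.run` once the paths and any resource options have been checked against the installed CD-HIT version.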




Present directory: data/protein_catalog

Name                          Last modified         Size
Parent Directory              -                     -
0.KIJ_redundant_Proteins      2020-06-23 01:30:22   -
1.KIJ_unique_Proteins         2021-10-21 14:51:55   -
2.KIJ_CD-HIT-100_Proteins     2020-11-10 09:43:21   -
3.KIJ-UHGP_unique_Proteins    2020-11-10 09:43:29   -
4.HRGM_Proteins               2021-01-13 14:06:56   -
README.txt                    2020-11-09 19:48:28   798 B