1. Description: KIJ_CD-HIT-100_Proteins and UHGP-100_unique proteins were merged, and identical proteins were de-replicated
2. Number of proteins: 107 million
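3. Protein fasta file: KIJ-UHGP_unique_Proteins.faa (provided gzip-compressed as KIJ-UHGP_unique_Proteins.faa.gz)
4. Cluster info file: KIJ-UHGP_unique_Proteins.cluster_info.tsv (provided gzip-compressed as KIJ-UHGP_unique_Proteins.cluster_info.tsv.gz)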
>Format: 1st column - representative protein
         2nd column - member proteins (separated by ';')
>The representative protein is the longest sequence in the cluster.
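
The cluster info format described above could be read with a short Python sketch like the one below. It is only an illustration, not a tool shipped with this catalog. It assumes the two columns are tab-separated (suggested by the .tsv extension) and that the file has no header row; neither is stated in this readme. The function names parse_cluster_line and iter_clusters are hypothetical.

```python
import gzip
from typing import Iterator, List, Tuple


def parse_cluster_line(line: str) -> Tuple[str, List[str]]:
    """Split one cluster_info line into (representative, member proteins).

    Assumption: columns are separated by a single tab character.
    """
    representative, _, members_field = line.rstrip("\r\n").partition("\t")
    # Members are separated by ';' as described in the readme.
    members = [m for m in members_field.split(";") if m]
    return representative, members


def iter_clusters(path: str) -> Iterator[Tuple[str, List[str]]]:
    """Stream clusters from a plain or gzip-compressed cluster_info file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                yield parse_cluster_line(line)
```

Because the file is large, iter_clusters yields one cluster at a time instead of loading everything into memory, for example: for rep, members in iter_clusters("KIJ-UHGP_unique_Proteins.cluster_info.tsv.gz"): ...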


Present directory - data/protein_catalog/3.KIJ-UHGP_unique_Proteins

Name                                          Last modified        Size
Parent Directory                              -                    -
KIJ-UHGP_unique_Proteins.cluster_info.tsv.gz  2020-06-17 11:34:58  3 GB
KIJ-UHGP_unique_Proteins.faa.gz               2020-06-17 11:35:50  23 GB
readme.txt                                    2020-11-04 20:02:40  445 B