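1. Description: Proteins in KIJ_unique_Proteins.faa were further clustered with CD-HIT.
CD-HIT options: -c 1.0 -aS 0.8 -n 5 (100% sequence identity, alignment covering at least 80% of the shorter sequence, word length 5)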
2. Number of proteins: 20,662,850
3. Protein fasta file: KIJ_CD-HIT-100_Proteins.faa
4. Cluster info file: KIJ_CD-HIT-100_Proteins.cluster_info.tsv
>format: 1st column - representative
2nd column - member proteins (separated by ';')
>Representative protein is the longest sequence of the cluster.
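
The two-column layout described above can be read with a few lines of Python. This is only a minimal sketch, not part of the original pipeline: it assumes the columns are tab-separated (suggested by the .tsv extension) and that the file is gzip-compressed as distributed. The function names and the protein IDs in the demo are hypothetical, and whether the representative also appears in its own member list is not stated here, so the mapping adds it explicitly.

```python
import gzip
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_cluster_info(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (representative, members) for each non-empty cluster line.

    Assumes column 1 and column 2 are separated by a tab and that
    member proteins in column 2 are separated by ';'.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        representative, _, member_field = line.partition("\t")
        members = [m for m in member_field.split(";") if m]
        yield representative, members


def member_to_representative(lines: Iterable[str]) -> Dict[str, str]:
    """Map every protein ID to the representative of its cluster."""
    mapping: Dict[str, str] = {}
    for representative, members in parse_cluster_info(lines):
        # Add the representative itself in case it is not listed as a member.
        mapping[representative] = representative
        for member in members:
            mapping[member] = representative
    return mapping


def load_mapping(path: str) -> Dict[str, str]:
    """Read a gzip-compressed cluster info file (assumed layout as above)."""
    with gzip.open(path, "rt") as handle:
        return member_to_representative(handle)


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    demo = ["protA\tprotA;protB;protC\n", "protD\tprotD\n"]
    for representative, members in parse_cluster_info(demo):
        print(representative, len(members))
```

For the real file, `load_mapping("KIJ_CD-HIT-100_Proteins.cluster_info.tsv.gz")` would build the full lookup table. With about 20 million clusters this dictionary can take several gigabytes of memory, so iterating with `parse_cluster_info` may be preferable when only a single pass is needed.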


Present directory - data/protein_catalog/2.KIJ_CD-HIT-100_Proteins

Name                                         Last modified        Size
KIJ_CD-HIT-100_Proteins.cluster_info.tsv.gz  2020-06-17 11:36:12  232 MB
KIJ_CD-HIT-100_Proteins.faa.gz               2020-06-16 18:54:35  4 GB
readme.txt                                   2020-06-16 19:02:14  445 B