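1. Description: Proteins in KIJ_unique_Proteins.faa were further clustered with CD-HIT.
CD-HIT options: -c 1.0 -aS 0.8 -n 5 (100% sequence identity, alignment covering at least 80% of the shorter sequence, word length 5)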
2. Number of proteins: 20,662,850
3. Protein fasta file: KIJ_CD-HIT-100_Proteins.faa
4. Cluster info file: KIJ_CD-HIT-100_Proteins.cluster_info.tsv
>format: 1st column - representative protein
2nd column - member proteins (separated by ';')
>The representative protein is the longest sequence in the cluster.
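
The two-column layout described above can be read with a few lines of Python. This is only a minimal sketch under stated assumptions: the columns are taken to be tab-separated (suggested by the .tsv extension), the file is assumed to have no header row, and it is not known from this readme whether the member column also lists the representative itself, so the sketch maps the representative to itself explicitly. The function names `parse_cluster_info` and `load_member_to_representative` are hypothetical and not part of the dataset.

```python
import gzip
from typing import Dict, Iterator, List, TextIO, Tuple


def parse_cluster_info(handle: TextIO) -> Iterator[Tuple[str, List[str]]]:
    """Yield (representative, members) for each non-empty line.

    Assumes: column 1 and column 2 are separated by a tab, members are
    separated by ';', and there is no header row.
    """
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        representative, _, member_field = line.partition("\t")
        # Drop empty entries that a trailing ';' would otherwise produce.
        members = [m for m in member_field.split(";") if m]
        yield representative, members


def load_member_to_representative(path: str) -> Dict[str, str]:
    """Build a member -> representative lookup from a plain or gzipped file."""
    opener = gzip.open if path.endswith(".gz") else open
    mapping: Dict[str, str] = {}
    with opener(path, "rt") as handle:
        for representative, members in parse_cluster_info(handle):
            # Assumption: the representative may or may not appear in the
            # member column, so it is mapped to itself either way.
            mapping[representative] = representative
            for member in members:
                mapping[member] = representative
    return mapping
```

With roughly 20 million proteins, a full in-memory dictionary may be large; iterating with `parse_cluster_info` and keeping only the clusters of interest is likely the lighter option.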
Present directory - data/protein_catalog/2.KIJ_CD-HIT-100_Proteins
Name | Last modified | Size
---|---|---
KIJ_CD-HIT-100_Proteins.cluster_info.tsv.gz | 2020-06-17 11:36:12 | 232 MB
KIJ_CD-HIT-100_Proteins.faa.gz | 2020-06-16 18:54:35 | 4 GB
readme.txt | 2020-06-16 19:02:14 | 445 B