GEMs/
: Genome-scale metabolic models (GEMs) reconstructed for non-redundant genomes in HRGMv2.
⚠️ GEM reconstruction failed for the following four genomes for unknown reasons:
(GENOME087726 in HRGMv2_0709, GENOME205746 in HRGMv2_3350, GENOME226109 in HRGMv2_3524, GENOME227506 in HRGMv2_3550)
- GEMs_results/ : Individual GEMs (XML format), one per non-redundant genome
- For bulk download:
1. GEMs.tar.gz – compressed archive of the entire GEMs/ folder
2. download_link_info.tsv – table listing full download URLs for each GEM file
HRGMv2_Genomes/
: Final genome catalog representing 4,824 non-redundant species in HRGMv2
- HRGMv2_Rep_Genome/ : Genome assemblies (FASTA format) of 4,824 representative genomes (one per species)
- HRGMv2_Pangenomes/ : Pangenomes for each species, including core/accessory gene sets and Panaroo outputs
Total_Genomes/
: All genome sequences used during HRGMv2 construction, grouped by redundancy level
- Redundant_genomes/ : Genome sequences of 230,632 input genomes (prior to dereplication)
- Nonredundant_genomes/ : Final set of 155,211 dereplicated genomes used to define HRGMv2 species
- For bulk download:
1. Redundant_genomes.tar.gz – archive of the Redundant_genomes/ folder
2. Nonredundant_genomes.tar.gz – archive of the Nonredundant_genomes/ folder
3. download_link_info.tsv – table with full download links for each genome
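A minimal sketch of how the URLs in a download_link_info.tsv table could be collected for scripted downloads. The column layout of the table is not documented here, so the header names ("file", "url") and the example.org links in the sample are placeholders, not the real contents; check the actual header line and pass the correct column name.

```python
import csv
import io


def read_download_links(handle, url_column):
    """Return the non-empty values of `url_column` from a tab-separated table.

    `url_column` is supplied by the caller because the real header names
    of download_link_info.tsv are not stated in this README.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or url_column not in reader.fieldnames:
        raise KeyError(f"column {url_column!r} not found in header: {reader.fieldnames}")
    return [row[url_column] for row in reader if row[url_column]]


# Hypothetical sample standing in for the real table (placeholder names and URLs).
sample = io.StringIO(
    "file\turl\n"
    "a.fna\thttps://example.org/a.fna\n"
    "b.fna\thttps://example.org/b.fna\n"
)
links = read_download_links(sample, url_column="url")
print(len(links))
```

With the real file, open it with open("download_link_info.tsv", newline="") and pass that handle instead of the in-memory sample.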
Taxonomy_Profiling/
: Resources for performing taxonomic profiling using HRGMv2 species
- 16S_rRNA/ : Predicted 16S rRNA sequences and related statistics
- HRGMv2_kraken2_customdb/ : Custom taxonomy database for Kraken2 and Bracken
- HRGMv2_metaphlan_customdb/ : MetaPhlAn4-compatible custom database
** METADATA
- HRGMv2_Cluster_metadata.tsv : Species-level metadata for the 4,824 HRGMv2 clusters (e.g., taxonomy, genome quality)
- Dereplication_genomes_metadata.tsv : Metadata for all 230,632 genomes used prior to dereplication
- HRGMv2_gtdbr220_results.tsv : GTDB r220-based taxonomic assignments for the 4,824 HRGMv2 species
* File system structure:
Most large directories follow a 3-level or 4-level hierarchy, which makes navigation easier and keeps any single folder from holding too many files.
Example – for HRGMv2_Rep_Genome/:
HRGMv2_Rep_Genome/                  ← Root directory
└── HRGMv2_20XX/                    ← Level 1 (group of ~100 genomes)
    └── HRGMv2_204X/                ← Level 2 (group of ~10 genomes)
        ├── HRGMv2_2040.fna         ← Level 3 (genome FASTA file)
        ├── HRGMv2_2041.fna
        └── ...
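
The layout above can be turned into a small path helper. This is a sketch that assumes the pattern shown in the single example holds for every species ID: a four-digit number, with the last two digits replaced by "XX" at level 1 and the last digit replaced by "X" at level 2. The function name rep_genome_path is hypothetical, and directories that use a 4-level layout are not covered.

```python
from pathlib import PurePosixPath


def rep_genome_path(species_id, root="HRGMv2_Rep_Genome"):
    """Build the relative path of a representative genome FASTA file.

    Assumes IDs of the form '<prefix>_<4 digits>', e.g. 'HRGMv2_2040',
    and the 3-level layout shown in the README example.
    """
    prefix, sep, digits = species_id.rpartition("_")
    if not sep or len(digits) != 4 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"unexpected species ID format: {species_id!r}")
    level1 = f"{prefix}_{digits[:2]}XX"  # group of ~100 genomes
    level2 = f"{prefix}_{digits[:3]}X"   # group of ~10 genomes
    return PurePosixPath(root) / level1 / level2 / f"{species_id}.fna"


print(rep_genome_path("HRGMv2_2040"))
# HRGMv2_Rep_Genome/HRGMv2_20XX/HRGMv2_204X/HRGMv2_2040.fna
```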