GEMs/
: Genome-scale metabolic models (GEMs) reconstructed for non-redundant genomes in HRGMv2.
⚠️ GEM reconstruction failed for the following four genomes; the cause is unknown:
(GENOME087726 in HRGMv2_0709, GENOME205746 in HRGMv2_3350, GENOME226109 in HRGMv2_3524, GENOME227506 in HRGMv2_3550)
- GEMs_results/ : Individual GEMs (XML format), one per non-redundant genome
- For bulk download:
  1. GEMs.tar.gz – compressed archive of the entire GEMs/ folder
  2. download_link_info.tsv – table listing full download URLs for each GEM file
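When iterating over non-redundant genomes, it can help to set aside the four genomes that have no GEM. The following is a minimal Python sketch; the function name and the example ID GENOME000001 are hypothetical, while the four failed genome IDs are taken from the warning above.

```python
# Genomes for which GEM reconstruction failed, as listed in this README
# (genome ID -> HRGMv2 species cluster).
FAILED_GEM_GENOMES = {
    "GENOME087726": "HRGMv2_0709",
    "GENOME205746": "HRGMv2_3350",
    "GENOME226109": "HRGMv2_3524",
    "GENOME227506": "HRGMv2_3550",
}


def split_by_gem_availability(genome_ids):
    """Return (with_gem, without_gem) lists, preserving input order."""
    with_gem, without_gem = [], []
    for genome_id in genome_ids:
        if genome_id in FAILED_GEM_GENOMES:
            without_gem.append(genome_id)
        else:
            with_gem.append(genome_id)
    return with_gem, without_gem


# GENOME000001 is a made-up ID used only for illustration.
usable, missing = split_by_gem_availability(["GENOME000001", "GENOME087726"])
```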
HRGMv2_Genomes/
: Final genome catalog representing 4,824 non-redundant species in HRGMv2
- HRGMv2_Rep_Genome/ : Genome assemblies (FASTA format) of 4,824 representative genomes (one per species)
- HRGMv2_Pangenomes/ : Pangenomes for each species, including core/accessory gene sets and Panaroo outputs
Total_Genomes/
: All genome sequences used during HRGMv2 construction, grouped by redundancy level
- Redundant_genomes/ : Genome sequences of 230,632 input genomes (prior to dereplication)
- Nonredundant_genomes/ : Final set of 155,211 dereplicated genomes used to define HRGMv2 species
- For bulk download:
  1. Redundant_genomes.tar.gz – archive of the Redundant_genomes/ folder
  2. Nonredundant_genomes.tar.gz – archive of the Nonredundant_genomes/ folder
  3. download_link_info.tsv – table with full download links for each genome
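The column layout of download_link_info.tsv is not described in this README, so the sketch below makes no assumption about column names. It simply collects every tab-separated cell that looks like a URL. The function name and the sample row are hypothetical.

```python
import csv
import io


def extract_urls(lines):
    """Collect every cell that looks like a URL from tab-separated lines.

    `lines` can be an open file handle or any iterable of text lines.
    """
    urls = []
    for row in csv.reader(lines, delimiter="\t"):
        for cell in row:
            cell = cell.strip()
            if cell.startswith(("http://", "https://", "ftp://")):
                urls.append(cell)
    return urls


# Illustration with made-up content; the real file may look different.
sample = io.StringIO("name\turl\nexample.fna\thttps://example.org/example.fna\n")
print(extract_urls(sample))
```

For the real file, open it with `open("download_link_info.tsv", newline="")` and pass the handle to `extract_urls`.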
Taxonomy_Profiling/
: Resources for performing taxonomic profiling using HRGMv2 species
- 16S_rRNA/ : Predicted 16S rRNA sequences and related statistics
- HRGMv2_kraken2_customdb/ : Custom taxonomy database for Kraken2 and Bracken
- HRGMv2_metaphlan_customdb/ : MetaPhlAn4-compatible custom database
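As a rough illustration of how the Kraken2/Bracken database might be used, the sketch below builds the command lines as Python argument lists. The input and output file names, thread count, and read length are placeholders, and it is an assumption that the database directory contains Bracken k-mer distribution files for the chosen read length. Only standard Kraken2 and Bracken options are used.

```python
def kraken2_command(db_dir, reads, report_path, output_path, threads=4):
    """Build a Kraken2 classification command as an argument list."""
    return [
        "kraken2",
        "--db", db_dir,
        "--threads", str(threads),
        "--report", report_path,
        "--output", output_path,
        *reads,
    ]


def bracken_command(db_dir, report_path, output_path, read_length=150, level="S"):
    """Build a Bracken abundance re-estimation command (species level by default)."""
    return [
        "bracken",
        "-d", db_dir,
        "-i", report_path,
        "-o", output_path,
        "-r", str(read_length),
        "-l", level,
    ]


# Placeholder file names, for illustration only.
db = "HRGMv2_kraken2_customdb"
print(kraken2_command(db, ["sample.fastq"], "sample.kreport", "sample.kraken"))
print(bracken_command(db, "sample.kreport", "sample.bracken"))
```

The lists can be passed to `subprocess.run(cmd, check=True)` once Kraken2 and Bracken are installed.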
* Metadata files:
- HRGMv2_Cluster_metadata.tsv : Species-level metadata for the 4,824 HRGMv2 clusters (e.g., taxonomy, genome quality)
- Dereplication_genomes_metadata.tsv : Metadata for all 230,632 genomes used prior to dereplication
- HRGMv2_gtdbr220_results.tsv : GTDB r220-based taxonomic assignments for the 4,824 HRGMv2 species
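A quick sanity check after download is to compare each table's row count with the numbers stated above. This sketch assumes each file has a single header line and one row per cluster or genome, which this README does not confirm. The function name is hypothetical.

```python
import csv
import io

# Row counts expected from the descriptions in this README.
EXPECTED_ROWS = {
    "HRGMv2_Cluster_metadata.tsv": 4824,
    "Dereplication_genomes_metadata.tsv": 230632,
    "HRGMv2_gtdbr220_results.tsv": 4824,
}


def count_data_rows(lines):
    """Count non-empty rows after the header in tab-separated lines."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header line (assumed to be present)
    return sum(1 for row in reader if row)


# Illustration with made-up content.
print(count_data_rows(io.StringIO("id\tvalue\nA\t1\nB\t2\n")))
```

For a real file, pass `open(name, newline="")` to `count_data_rows` and compare the result with `EXPECTED_ROWS[name]`.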
* File system structure:
Most large directories follow a 3-level or 4-level hierarchical structure to ease navigation and avoid overcrowding individual folders.
Example – for HRGMv2_Rep_Genome/:
HRGMv2_Rep_Genome/                ← Root directory
└── HRGMv2_20XX/                  ← Level 1 (group of ~100 genomes)
    └── HRGMv2_204X/              ← Level 2 (group of ~10 genomes)
        ├── HRGMv2_2040.fna       ← Level 3 (genome FASTA file)
        ├── HRGMv2_2041.fna
        └── ...
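The nesting above can be derived from a species ID. The following Python sketch assumes that every ID has the form HRGMv2_ followed by four digits and that the pattern shown for HRGMv2_2040 holds for all representative genomes. The function name is hypothetical.

```python
import re


def rep_genome_path(species_id, root="HRGMv2_Rep_Genome"):
    """Build the relative path of a representative genome FASTA file.

    Assumes IDs look like 'HRGMv2_2040' (prefix plus four digits).
    """
    match = re.fullmatch(r"HRGMv2_(\d{4})", species_id)
    if match is None:
        raise ValueError(f"unexpected species ID: {species_id!r}")
    digits = match.group(1)
    level1 = f"HRGMv2_{digits[:2]}XX"  # group of ~100 genomes
    level2 = f"HRGMv2_{digits[:3]}X"   # group of ~10 genomes
    return f"{root}/{level1}/{level2}/{species_id}.fna"


print(rep_genome_path("HRGMv2_2040"))
# HRGMv2_Rep_Genome/HRGMv2_20XX/HRGMv2_204X/HRGMv2_2040.fna
```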
Present directory - data/genome_catalog
| Name | Last modified | Size |
|---|---|---|
| GEMs | 2025-07-21 10:49:58 | - |
| HRGMv2_Genomes | 2025-07-21 11:01:01 | - |
| Taxonomy_Profiling | 2025-07-21 10:47:37 | - |
| Total_Genomes | 2025-04-02 04:03:29 | - |
| Dereplication_genomes_metadata.tsv | 2025-02-16 22:09:35 | 79 MB |
| HRGMv2_Cluster_metadata.tsv | 2025-02-16 22:09:35 | 1 MB |
| HRGMv2_gtdbr220_results.tsv | 2025-04-21 19:43:01 | 1 MB |
| README.txt | 2025-07-21 10:37:52 | 3 KB |