GEMs/

: Genome-scale metabolic models (GEMs) reconstructed for non-redundant genomes in HRGMv2.
    ⚠️ GEM reconstruction failed for four genomes (cause unknown):
    (GENOME087726 in HRGMv2_0709, GENOME205746 in HRGMv2_3350, GENOME226109 in HRGMv2_3524, GENOME227506 in HRGMv2_3550)

- GEMs_results/ : Individual GEMs (XML format), one for each non-redundant genome
- For bulk download:
    1. GEMs.tar.gz – compressed archive of the entire GEMs/ folder
    2. download_link_info.tsv – table listing full download URLs for each GEM file
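As a quick sanity check on a downloaded GEM, the sketch below counts metabolites, reactions, and gene products in one model file. It assumes the XML files are SBML (this README only says "XML"), with genes encoded as FBC `geneProduct` elements; `summarize_gem` is a hypothetical helper, not part of the HRGMv2 release.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_gem(xml_source) -> dict:
    """Count <species>, <reaction> and <geneProduct> elements in a model.

    Assumption: the GEM file follows SBML conventions (FBC package for genes).
    xml_source may be a file path or a binary file object.
    """
    counts = {"species": 0, "reaction": 0, "geneProduct": 0}
    # iterparse reads the document incrementally and reports each closed element
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        name = local_name(elem.tag)
        if name in counts:
            counts[name] += 1
        # children have already been counted, so free this element's content
        elem.clear()
    return counts
```

Usage: `summarize_gem("GEMs_results/.../<genome>.xml")` (path elided). The namespace is stripped so the same code works for SBML Level 2 and Level 3 files.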

HRGMv2_Genomes/

: Final genome catalog representing 4,824 non-redundant species in HRGMv2

- HRGMv2_Rep_Genome/ : Genome assemblies (FASTA format) of 4,824 representative genomes (one per species)
- HRGMv2_Pangenomes/ : Pangenomes for each species, including core/accessory gene sets and Panaroo outputs
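The core/accessory split can be recomputed from Panaroo's presence/absence matrix, for example to try a different cutoff. This is a minimal sketch assuming each species folder contains Panaroo's `gene_presence_absence.Rtab` (genes as rows, genomes as columns, 0/1 cells); the 0.95 threshold is illustrative and not necessarily the cutoff used for HRGMv2. `classify_genes` is a hypothetical helper.

```python
import csv


def classify_genes(rtab_path: str, core_threshold: float = 0.95) -> dict:
    """Split genes into core and accessory lists from a Panaroo .Rtab matrix.

    Layout assumed: tab-separated, first column = gene/cluster name,
    remaining columns = one genome each, cells = 1 (present) or 0 (absent).
    """
    core, accessory = [], []
    with open(rtab_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        n_genomes = len(header) - 1
        for row in reader:
            if not row:
                continue
            gene, calls = row[0], row[1:]
            present = sum(1 for c in calls if int(c) > 0)
            # core = present in at least core_threshold of all genomes
            if present / n_genomes >= core_threshold:
                core.append(gene)
            else:
                accessory.append(gene)
    return {"core": core, "accessory": accessory}
```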

Total_Genomes/

: All genome sequences used during HRGMv2 construction, split into pre-dereplication and post-dereplication sets

- Redundant_genomes/ : Genome sequences of 230,632 input genomes (prior to dereplication)
- Nonredundant_genomes/ : Final set of 155,211 dereplicated genomes used to define HRGMv2 species

- For bulk download:
    1. Redundant_genomes.tar.gz – archive of the Redundant_genomes/ folder
    2. Nonredundant_genomes.tar.gz – archive of the Nonredundant_genomes/ folder
    3. download_link_info.tsv – table with full download links for each genome
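The per-file links in `download_link_info.tsv` (here and in GEMs/) can be fetched with the Python standard library. The column layout of that table is not described in this README, so this sketch simply collects every field that starts with `http://` or `https://`; `read_download_links` and `download_all` are hypothetical helpers.

```python
import csv
import urllib.request
from pathlib import Path


def read_download_links(tsv_path: str) -> list[str]:
    """Collect every http(s) URL in a download_link_info.tsv table.

    Assumption: column names are unknown, so all fields are scanned.
    """
    urls = []
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            for field in row:
                field = field.strip()
                if field.startswith(("http://", "https://")):
                    urls.append(field)
    return urls


def download_all(urls: list[str], out_dir: str) -> None:
    """Download each URL into out_dir, skipping files that already exist."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    for url in urls:
        # name the local file after the last segment of the URL
        dest = target / url.rstrip("/").rsplit("/", 1)[-1]
        if not dest.exists():
            urllib.request.urlretrieve(url, dest)
```

Because `download_all` saves everything into a single folder, it suits a subset of files; for the full genome sets, the `.tar.gz` archives are likely the more practical option.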

Taxonomy_Profiling/

: Resources for performing taxonomic profiling using HRGMv2 species

- 16S_rRNA/ : Predicted 16S rRNA sequences and related statistics
- HRGMv2_kraken2_customdb/ : Custom taxonomy database for Kraken2 and Bracken
- HRGMv2_metaphlan_customdb/ : MetaPhlAn4-compatible custom database
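A typical Kraken2 + Bracken run against the custom database might look like the sketch below, which only assembles and runs the two command lines. The flags are standard Kraken2/Bracken options, but the read length (`-r 150`) is an assumption: Bracken needs a matching `database<N>mers.kmer_distrib` file inside the database folder, and which read lengths HRGMv2_kraken2_customdb provides is not stated here. The helper names are hypothetical.

```python
import subprocess


def kraken2_bracken_commands(db_dir: str, read1: str, read2: str, sample: str,
                             threads: int = 8, read_len: int = 150,
                             level: str = "S") -> tuple[list[str], list[str]]:
    """Build kraken2 and Bracken command lines for one paired-end sample."""
    report = f"{sample}.kraken2.report"
    kraken2 = [
        "kraken2", "--db", db_dir, "--threads", str(threads), "--paired",
        "--report", report, "--output", f"{sample}.kraken2.out",
        read1, read2,
    ]
    # Bracken re-estimates abundances at `level` (S = species) from the report;
    # read_len must match a kmer_distrib file built for this database (assumed 150)
    bracken = [
        "bracken", "-d", db_dir, "-i", report,
        "-o", f"{sample}.bracken", "-r", str(read_len), "-l", level,
    ]
    return kraken2, bracken


def run_profile(*args, **kwargs) -> None:
    """Run kraken2, then Bracken; stop if either command fails."""
    for cmd in kraken2_bracken_commands(*args, **kwargs):
        subprocess.run(cmd, check=True)
```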

** METADATA

- HRGMv2_Cluster_metadata.tsv : Species-level metadata for the 4,824 HRGMv2 clusters (e.g., taxonomy and genome quality)
- Dereplication_genomes_metadata.tsv : Metadata for all 230,632 genomes used prior to dereplication
- HRGMv2_gtdbr220_results.tsv : GTDB r220-based taxonomic assignments for the 4,824 HRGMv2 species
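GTDB lineages are written as semicolon-separated, rank-prefixed strings (`d__`, `p__`, ..., `s__`). The sketch below splits such a string into ranks. `load_lineages` assumes `HRGMv2_gtdbr220_results.tsv` follows the GTDB-Tk summary layout with `user_genome` and `classification` columns; check this against the file header before relying on it.

```python
import csv

RANKS = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}


def parse_gtdb_lineage(classification: str) -> dict:
    """Split a GTDB lineage string into a rank -> name mapping.

    Empty ranks (e.g. a bare 's__') are returned as None.
    """
    lineage = {}
    for token in classification.strip().split(";"):
        prefix, _, name = token.partition("__")
        if prefix in RANKS:
            lineage[RANKS[prefix]] = name or None
    return lineage


def load_lineages(tsv_path: str, id_col: str = "user_genome",
                  lineage_col: str = "classification") -> dict:
    """Map each genome ID to its parsed lineage (column names are assumed)."""
    with open(tsv_path, newline="") as handle:
        return {row[id_col]: parse_gtdb_lineage(row[lineage_col])
                for row in csv.DictReader(handle, delimiter="\t")}
```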

* File system structure:

Most large directories use a 3-level or 4-level hierarchy to ease navigation and to avoid storing too many files in a single folder.
Example – for HRGMv2_Rep_Genome/:


HRGMv2_Rep_Genome/            ← Root directory
└── HRGMv2_20XX/              ← Level 1 (group of ~100 genomes)
    └── HRGMv2_204X/          ← Level 2 (group of ~10 genomes)
        ├── HRGMv2_2040.fna   ← Level 3 (genome FASTA file)
        ├── HRGMv2_2041.fna
        └── ...
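Assuming the pattern in this example holds for every representative genome (zero-padded four-digit IDs, one folder per hundred IDs and one per ten), a genome's path can be derived from its ID. `rep_genome_path` is a hypothetical helper; directories with a 4-level layout would need an extra level.

```python
from pathlib import PurePosixPath


def rep_genome_path(genome_id: str, root: str = "HRGMv2_Rep_Genome") -> PurePosixPath:
    """Map e.g. 'HRGMv2_2040' to HRGMv2_Rep_Genome/HRGMv2_20XX/HRGMv2_204X/HRGMv2_2040.fna.

    Assumption: IDs are '<prefix>_<4 digits>' and the 3-level pattern above applies.
    """
    prefix, number = genome_id.rsplit("_", 1)
    level1 = f"{prefix}_{number[:-2]}XX"  # group of ~100 genomes
    level2 = f"{prefix}_{number[:-1]}X"   # group of ~10 genomes
    return PurePosixPath(root, level1, level2, f"{genome_id}.fna")
```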



Present directory: data/genome_catalog

Name                                  Last modified         Size
GEMs                                  2025-07-21 10:49:58   -
HRGMv2_Genomes                        2025-07-21 11:01:01   -
Taxonomy_Profiling                    2025-07-21 10:47:37   -
Total_Genomes                         2025-04-02 04:03:29   -
Dereplication_genomes_metadata.tsv    2025-02-16 22:09:35   79 MB
HRGMv2_Cluster_metadata.tsv           2025-02-16 22:09:35   1 MB
HRGMv2_gtdbr220_results.tsv           2025-04-21 19:43:01   1 MB
README.txt                            2025-07-21 10:37:52   3 KB