Downloading complete mitochondrial genomes

Updated on 2025-07-02

Kirill Kryukov

Summary

This page describes the method I used to download all available complete mitochondrial genomes from the NCBI Organelle and NCBI Nucleotide databases. I needed this data for computing some statistics. This page and the scripts are in the public domain; feel free to use them. Please email any feedback to kkryukov@gmail.com.

NCBI Organelle

Downloading

Follow steps in "ncbiorg-1-download.txt":

  1. Open NCBI "Organelle" in browser: https://www.ncbi.nlm.nih.gov/datasets/organelle/.
  2. In "Selected taxa" type "Eukaryota".
  3. Click "Filters", select "Mitochondrion".
  4. Click the checkbox before "Scientific name" to select all entries.
  5. To get summary, click "Download", select "Download Table". This downloads "ncbi_dataset.tsv".
  6. To get sequences, click "Download", select "Download Package". This downloads "ncbi_dataset.zip".

Processing

  1. Use ncbiorg-2-get-sequences.pl to compress nucleotide sequences into "ncbiorg-sequences.fna.naf".
  2. Use ncbiorg-3-get-unique-names.sh to write the list of unique organism names (column 8 of the table) into "ncbiorg-unique-organism-names.txt". The script runs: cat ncbi_dataset.tsv | tail -n +2 | cut -f 8 | sort | uniq >ncbiorg-unique-organism-names.txt
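
The "cut -f 8" step relies on the organism name staying in the eighth column. As a slightly more defensive variant, the column can be looked up by its header text instead. This is only a sketch: the function name "unique_column_values" is made up for illustration, and the header text you pass in must be checked against the first line of your own "ncbi_dataset.tsv".

```shell
# Print the sorted unique values of one named column of a tab-separated file.
# $1 = TSV file, $2 = exact header text of the wanted column.
# (Hypothetical helper, not one of the scripts from this page.)
unique_column_values() {
    awk -F '\t' -v name="$2" '
        NR == 1 {
            # Find the index of the column whose header matches exactly.
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            if (!col) { print "column not found: " name > "/dev/stderr"; exit 1 }
            next
        }
        { print $col }
    ' "$1" | sort -u
}
```

Usage, assuming the header of column 8 reads "Organism Name" (an assumption, please verify): unique_column_values ncbi_dataset.tsv "Organism Name" >ncbiorg-unique-organism-names.txt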

NCBI Nucleotide

Downloading

Follow steps in "ncbinuc-1-download.txt":

  1. Open NCBI Nucleotide page in browser: https://www.ncbi.nlm.nih.gov/nuccore.
  2. Search for "Eukaryota".
  3. Set "Genetic compartments" to "Mitochondrion".
  4. Set "Molecule types" to "genomic DNA/RNA".
  5. Set "Sequence length" to "Custom range..." of 5500 to 100000000. The lower bound sits just below the smallest known mitochondrial genome, which is 5,967 bp in Plasmodium falciparum (https://en.wikipedia.org/wiki/Mitochondrial_DNA).
  6. To get summary, use "Send to:" => "Complete Record", click on "File", choose Format: "Summary", click "Create File". This downloads a file named "nuccore_result.txt".
  7. To get sequences, use "Send to:" => "Complete Record", click on "File", choose Format: "FASTA", click "Create File". This downloads a file named "sequences.fasta".
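
A large browser download can end up incomplete, so it may be worth comparing the number of records in "sequences.fasta" with the number of hits shown on the search page. The following is a minimal sketch of such a check. The function name "count_fasta_records" is made up for illustration and is not part of the scripts on this page.

```shell
# Count FASTA records in a file by counting header lines (those starting with ">").
# $1 = FASTA file. (Hypothetical helper.)
count_fasta_records() {
    grep -c '^>' "$1"
}
```

Usage: count_fasta_records sequences.fasta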

Finding complete genomes

  1. Use ncbinuc-2-compress.pl to compress sequences.
  2. Use ncbinuc-3-get-names.pl to get names into "nuccore_results.names.txt".
  3. Use ncbinuc-4-get-complete-genome-names.pl to parse the names and save those with complete genomes into "nuccore_results.names.complete-genome.txt".
  4. Use ncbinuc-5-truncate-names.pl to keep only accessions and organism names in sequence names.
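
The Perl scripts themselves are not reproduced on this page, so the following is only a rough sketch of the general idea behind steps 3 and 4, not the actual logic of ncbinuc-4-get-complete-genome-names.pl or ncbinuc-5-truncate-names.pl. It assumes FASTA headers of the form ">ACCESSION Organism name mitochondrion, complete genome". The function names and all matching patterns are assumptions made for illustration.

```shell
# Print headers whose title looks like a complete mitochondrial genome.
# $1 = FASTA file. The patterns are assumptions, not the rules used by the author.
complete_genome_headers() {
    grep '^>' "$1" \
        | grep -i 'mitochondri' \
        | grep -i 'complete genome' \
        | grep -i -v -E 'nearly complete|almost complete'
}

# Reduce each header read from stdin to "accession organism": drop the leading ">"
# and everything from " mitochondri" onward (case-sensitive).
truncate_header() {
    sed -e 's/^>//' -e 's/ mitochondri.*$//'
}
```

Usage: complete_genome_headers sequences.fasta | truncate_header. Headers that carry extra words before "mitochondrion" (for example an isolate or strain label) keep those words after truncation, so the result is not always a clean organism name.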

Comments

On 2025-06-27 NCBI Organelle contained mitochondrial genomes of 18,133 unique organisms. On the same day, NCBI Nucleotide contained complete mitochondrial genomes for 31,801 unique organisms. It's probably best to use both resources.

Detection of complete genomes from NCBI Nucleotide is based on parsing sequence names, which can be unreliable. It may miss some complete genomes, and it may erroneously classify some partial sequences as complete. This may be acceptable for some purposes, but please take care when using this data.