Downloading gnomAD

PUBLISHED ON JUN 20, 2019 — BIOINFORMATICS, GENETICS

Here is a concise guide to downloading gnomAD using gsutil. I typically work on a server and (begrudgingly) use conda to manage software. The only real trick here is getting the conda environment set up; I could just as easily have called this post “Using gsutil with conda,” because gsutil itself is pretty easy to use. gsutil, which is needed to access the gnomAD data, is buried within the ‘google-cloud-sdk’ conda package, not to be confused with the numerous other ‘google-cloud-’ conda packages. (A conda search for gsutil will not, at the time of writing, find the correct google-cloud-sdk package.) The Google Cloud SDK requires Python 2, so I think it makes sense to create a dedicated conda environment:

# Create a new conda env with gsutil AND crcmod -- which will allow for
# "sliced" downloads.
conda create --name GoogleCloud python=2 google-cloud-sdk crcmod

You can now activate the new environment and use gsutil to download gnomAD data:

conda activate GoogleCloud
gsutil ls gs://gnomad-public/release
# gs://gnomad-public/release/
# gs://gnomad-public/release/2.0.1/
# gs://gnomad-public/release/2.0.2/
# gs://gnomad-public/release/2.1.1/
# gs://gnomad-public/release/2.1/
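Since the layout differs between releases, it can help to look inside a release and check its size before committing to a download. The following is a minimal sketch, not part of the original workflow: the helper names `list_release` and `release_size` are made up for illustration, and the bucket path is simply the one listed above.

```shell
# Sketch only: BUCKET is the path used in this post; the function
# names are hypothetical helpers, not part of gsutil.
BUCKET="gs://gnomad-public/release"

# List the immediate contents of one release, e.g. `list_release 2.1.1`.
list_release() {
    gsutil ls "${BUCKET}/$1/"
}

# Report the total size of one release in human-readable units.
release_size() {
    gsutil du -s -h "${BUCKET}/$1/"
}
```

For example, `list_release 2.1.1` shows the top-level subdirectories of that release, and `release_size 2.1.1` gives a rough idea of how much disk space a full copy would need.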

Before starting the download, make sure crcmod is correctly compiled. A compiled crcmod is what allows the "sliced" downloads mentioned above, and it speeds up the download significantly (from days to hours, in my case).

gsutil version -l
#gsutil version: 4.38
#checksum: 58d3e78c61e7e0e80813a6ebc26085f6 (OK)
#boto version: 2.49.0
#python version: 2.7.15 | packaged by conda-forge | (default, Feb 28 2019, 04:00:11) [GCC 7.3.0]
#OS: Linux 3.10.0-693.5.2.el7.x86_64
#multiprocessing available: True
#using cloud sdk: True
#pass cloud sdk credentials to gsutil: True
#config path(s): No config found
#gsutil path: /home/dfiler/tools/miniconda3/envs/GoogleCloud/share/google-cloud-sdk-251.0.0-0/bin/gsutil
#compiled crcmod: True
#installed via package manager: False
#editable install: False
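The line that matters in this output is `compiled crcmod`. If the check needs to run inside a job script rather than by eye, something like the sketch below could work. The function name `has_compiled_crcmod` is hypothetical, and it assumes the output format shown above (a line reading `compiled crcmod: True`).

```shell
# Hypothetical helper: reads `gsutil version -l` output on stdin and
# succeeds only if it contains the line "compiled crcmod: True".
has_compiled_crcmod() {
    grep -q 'compiled crcmod: True'
}

# Usage (not run here):
#   gsutil version -l | has_compiled_crcmod \
#       || echo "crcmod is not compiled; downloads may be slow" >&2
```

Taking the text on stdin keeps the check independent of gsutil itself, so it can be tried out on a saved copy of the output.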

Make sure compiled crcmod is True. We can now download data from the desired release. Here, I will show how to download the GRCh38 liftover exome and genome sites VCFs from release 2.1.1:

mkdir -p gnomAD/r2.1.1
gsutil -m cp gs://gnomad-public/release/2.1.1/liftover_grch38/vcf/exomes/gnomad.exomes.r2.1.1.sites.liftover_grch38* gnomAD/r2.1.1/
gsutil -m cp gs://gnomad-public/release/2.1.1/liftover_grch38/vcf/genomes/gnomad.genomes.r2.1.1.sites.liftover_grch38* gnomAD/r2.1.1/

Note: not all releases have the same subdirectory structure. Use the -r flag with gsutil cp to copy a whole subdirectory. Also, these downloads take a while, so submit them to the scheduler or use a screen session if you’re working on a server.
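As a rough illustration of the -r flag, the sketch below wraps a recursive copy in a small function. The name `download_dir` is made up for this example; the gsutil flags (`-m`, `cp`, `-r`) are the ones already used or mentioned in this post.

```shell
# Hypothetical wrapper: recursively copy one bucket "directory".
#   $1 = source gs:// URL
#   $2 = local destination directory (created if missing)
download_dir() {
    mkdir -p "$2" && gsutil -m cp -r "$1" "$2"
}
```

A call such as `download_dir gs://gnomad-public/release/2.1.1/liftover_grch38/vcf/exomes gnomAD/r2.1.1/` would then fetch everything under that exomes directory. On a server, run it inside a screen session or put it in a scheduler job script so it survives a dropped connection.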