Climate Prediction Center
 
 

Fast Downloading of GRIB Files
Partial http transfers
 
Introduction

Downloading meteorological data can be a pain. Servers are under-powered, connections are slow, and the "bean counters" figure that 800 GB will store a trillion spreadsheets, so who would want more disk space? I can't help with the last problem, but downloading data in GRIB files can be made faster.

Often people only need a few fields from a GRIB file. For example, the GFS forecasts contain over 600 fields per forecast time. Many people are only interested in a few fields such as the precipitation or 500 mb heights. Assuming we only wanted two fields, downloading 600+ fields to get two fields is just silly.

If You are Lucky, it is Simple

Some datasets have pre-configured scripts to download the data. See Part 2 for more information.

Details

The http protocol allows "random access" reading (byte-range requests). To use it, we need an index file that gives the byte location of each field and an http program that supports random access. For the index file, we can modify a wgrib2 inventory. For the random-access http(s) program, we can use cURL. Both are freely available, widely used, work on many platforms and are easily scripted/automated/put into a cronjob.
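Why an inventory is enough to serve as the index: the second field of each line of a wgrib/wgrib2 "-s" inventory is the byte offset at which that GRIB message starts, so a message runs up to the byte before the next message's offset. Below is a minimal sketch of that bookkeeping in sh and awk. It is not the code inside get_inv.pl, and the "range=START-END" field it appends is a format chosen for this sketch only.

```sh
# Sketch only, not get_inv.pl: read a wgrib/wgrib2 "-s" inventory on stdin
# and append a hypothetical "range=START-END" field to every line.
# Field 1 is the record number, field 2 the byte offset of the message.
add_ranges() {
    awk -F: '
        { line[NR] = $0; off[NR] = $2 }
        END {
            for (i = 1; i <= NR; i++) {
                # A message ends one byte before the next *different* offset;
                # wgrib2 sub-messages ("2.1", "2.2") share a single offset.
                stop = ""
                for (j = i + 1; j <= NR; j++) {
                    if (off[j] != off[i]) {
                        stop = sprintf("%.0f", off[j] - 1)
                        break
                    }
                }
                l = line[i]
                sub(/:$/, "", l)    # drop the trailing colon, if any
                # The last message gets an open-ended range ("to end of file").
                print l ":range=" off[i] "-" stop
            }
        }'
}
```

For example, curl -f -s INV_URL | add_ranges | grep ":HGT:500 mb:" would list the 500 mb height together with its byte range. An open-ended range such as 2500- is accepted by curl -r and means "from byte 2500 to the end of the file".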

The basic format of the quick download is,

   get_inv.pl INV_URL | grep (options) FIELDS | get_grib.pl GRIB_URL OUTPUT
  
   INV_URL is the URL of a wgrib/wgrib2 inventory
       ex. https://nomad3.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12.inv
  
   grep (options) FIELDS selects the desired fields (wgrib compatible)
       ex. grep -F ":HGT:500 mb:" selects ":HGT:500 mb:"
       ex. grep -E ":(HGT|TMP):500 mb:" selects ":HGT:500 mb:" and ":TMP:500 mb:"
  
   GRIB_URL is the URL of the grib file
       ex. https://nomad3.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12
  
   OUTPUT is the name of the downloaded grib file

The "get_inv.pl INV_URL" downloads the wgrib inventory off the net and adds a range field. The "grep FIELDS" uses the grep command to select desired fields from the inventory. Use of the "grep FIELDS" is similar to the procedure used when using wgrib/wgrib2 to extract fields. The "get_grib.pl GRIB_URL OUTPUT" uses the filtered inventory to select the fields from GRIB_URL to download. The selected fields are saved in OUTPUT.
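The last stage of the pipeline turns the selected inventory lines into byte-range requests. Here is a minimal sketch of that idea, assuming each selected line ends in a "range=START-END" field (a format invented for this sketch, not necessarily what get_inv.pl emits). Ranges that overlap or touch are merged so the field list becomes as few requests as possible, and each range is fetched with its own curl -r call and appended to OUTPUT. One request per range is used because a single HTTP request for several ranges comes back as a multipart/byteranges body with MIME separators, which is not a valid GRIB file. Both function names are hypothetical.

```sh
# Sketch only, not get_grib.pl.
# merge_ranges: stdin = selected inventory lines ending in "range=START-END"
# (END may be empty, meaning open-ended), in file order;
# stdout = one merged START-END per line.
merge_ranges() {
    sed -n 's/.*:range=\([0-9]*-[0-9]*\)$/\1/p' | awk -F- '
        NR > 1 && (e == "" || $1 + 0 <= e + 1) {
            # overlaps, touches or repeats (sub-messages) the current range
            if ($2 == "") e = ""
            else if (e != "" && $2 + 0 > e + 0) e = $2
            next
        }
        {
            if (NR > 1) print s "-" e
            s = $1; e = $2
        }
        END { if (NR > 0) print s "-" e }'
}

# fetch_ranges URL OUTPUT: download each range read from stdin and append it
# to OUTPUT.  Concatenated GRIB messages are themselves a valid GRIB file.
fetch_ranges() {
    url=$1 out=$2
    : > "$out"
    while read -r r; do
        curl -f -s -r "$r" "$url" < /dev/null >> "$out" || return 1
    done
}
```

A usage sketch would be: ... | grep ":HGT:500 mb:" | merge_ranges | fetch_ranges GRIB_URL out.grb.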

Examples
get_inv.pl https://nomad3.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12.inv | \
grep ":HGT:500 mb:" | \
get_grib.pl https://nomad3.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12 out.grb
  
The above example can be written on one line without the backslashes. (A backslash at the end of a line is the unix convention for continuing a command on the next line.) The example downloads the 500 mb height from the 12-hour forecast (f12) of the 00Z (t00z) GFS run on the NCEP NOMAD3 server.
  
get_inv.pl https://nomad2.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12.inv | \
grep -E "(:HGT:500 mb:|:TMP:1000 mb:)" | \
get_grib.pl https://nomad2.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12 out.grb
  
The above example is similar to the earlier example except it downloads both the 500 mb height and the 1000 mb temperature.
Warning: Metacharacters

In the beginning, you could filter the inventory with strings like

  egrep ":(UGRD|VGRD|TMP|HGT):(1000|500|200) mb:"
  egrep "(:UGRD:200 mb:|:TMP:2 m above ground:)"
First, egrep was deprecated and replaced by "grep -E". No big deal. Then someone decided to put regex metacharacters into the official level information. Imagine trying to do
  grep -E "(:UGRD:200 mb:|:HGT:PV=2e-06 (Km^2/kg/s) surface:)"
You see the problem. The HGT level field contains "(" and ")". To remove the special meaning of "(" and ")", they should be escaped as \( and \). The caret "^" also has a special meaning and should be escaped too. The fixed line is
  grep -E "(:UGRD:200 mb:|:HGT:PV=2e-06 \(Km\^2/kg/s\) surface:)"
You should backslash-escape all the regex metacharacters, which include
 
\,^,$,.,|,?,*,+,(,),[,],{,}
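Escaping by hand is error-prone, so the escaping can be done by a small function. This is a sketch with a made-up name, re_escape: it prefixes every character from the list above with a backslash, so a level string copied from an inventory can be dropped into a grep -E pattern.

```sh
# Backslash-escape the ERE metacharacters \ ^ $ . | ? * + ( ) [ ] { }
# in $1 so the result matches the string literally under grep -E.
re_escape() {
    printf '%s\n' "$1" | sed 's/[][\\^$.|?*+(){}]/\\&/g'
}

# Usage:
# lev=$(re_escape 'PV=2e-06 (Km^2/kg/s) surface')
# grep -E "(:UGRD:200 mb:|:HGT:$lev:)" < my_inv
```

When every pattern is a plain string, grep -F avoids the problem altogether, and repeating -e gives the same "either/or" effect: grep -F -e ':UGRD:200 mb:' -e ':HGT:PV=2e-06 (Km^2/kg/s) surface:'.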
Sample Script

Here is an example of downloading a year of R2 (NCEP/DOE Reanalysis-2) data.

#!/bin/sh
# simple script to download 4x daily V winds at 10mb
# from the R2 archive

set -x
date=197901
enddate=197912
while [ $date -le $enddate ]
do
     url="https://nomad3.ncep.noaa.gov/pub/reanalysis-2/6hr/pgb/pgb.$date"
     get_inv.pl "${url}.inv" | grep ":VGRD:" | grep ":10 mb" | \
     get_grib.pl "${url}" pgb.$date
     date=$(($date + 1))
     if [ $(($date % 100)) -eq 13 ] ; then
         date=$(($date - 12 + 100));
     fi
done
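The script above steps its YYYYMM counter by hand. That stepping can be moved into a small function. The sketch below uses a hypothetical helper, months, that prints every YYYYMM from START to END inclusive. After December it adds 89 (197912 + 89 = 198001), which is the same as the script's "+1, then -12 +100" step.

```sh
# Print each YYYYMM from $1 to $2 inclusive (hypothetical helper).
months() {
    ym=$1
    while [ "$ym" -le "$2" ]; do
        echo "$ym"
        if [ $((ym % 100)) -eq 12 ]; then
            ym=$((ym + 89))    # December -> January of the next year
        else
            ym=$((ym + 1))
        fi
    done
}

# Usage with the R2 example above:
# for date in $(months 197901 197912); do
#     url="https://nomad3.ncep.noaa.gov/pub/reanalysis-2/6hr/pgb/pgb.$date"
#     get_inv.pl "${url}.inv" | grep ":VGRD:" | grep ":10 mb" | \
#         get_grib.pl "${url}" "pgb.$date"
# done
```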
Requirements
  1. perl
  2. grep
  3. cURL
  4. grib files and their wgrib inventory on an http server
  5. get_inv.pl
  6. get_grib.pl
Configuration (UNIX/Linux)
The first two lines of get_inv.pl and get_grib.pl need to be modified. The first line should point to your perl interpreter. The second line needs to point to the location of curl if it is not on your path.
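Before starting a long download, it can help to confirm that everything in the requirements list is reachable. The sketch below uses the POSIX "command -v" builtin (the function name is made up). "command -v perl" also prints the path of the perl interpreter, which is the path the first line of the scripts should name. get_inv.pl and get_grib.pl are found only if they are executable and on your PATH.

```sh
# Report each program that is not on PATH; return 1 if any is missing.
require_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            status=1
        fi
    done
    return $status
}

# Usage: require_tools perl grep curl get_inv.pl get_grib.pl || exit 1
```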

Usage: Windows

There have been some reports that the perl scripts didn't work on Windows machines. The problem was solved by Alexander Ryan.
Hi Wesley,

thought this might be of some use to your win32 users.

I had the following problem when running the get_grib.pl file as per your instructions.

run this
grep ":UGRD:" < my_inv | get_grib.pl $URL ugrd.grb
and I would get the error No download! No matching grib fields. on further
investigation I found that it was just skipping the while STDIN part of the
code. a few google searches later and I found that for some strange reason in
the pipe I needed to specify the path or command for perl even though the file
associations for .pl are set up. (don't figure)

this works for me

grep ":UGRD:" < my_inv | PERL get_grib.pl $URL ugrd.grb

Regards and thanks for the fine service
Alexander Ryan


Another email from Alexander

Hi Wesley,
Further to my last email, here are some details regarding the environment I run this all on, for your reference.

My computer is P4 1.7GHz with 1Gb Ram running Windows 2000 service pack 4
Perl version :V5.6.1 provided by https://www.activestate.com
cUrl Version: 7.15.4 from https://curl.haxx.se/
grep & egrep: win32 versions of grep and egrep, I found both at https://unxutils.sourceforge.net who provide some useful ports of common GNU utilities to native Win32. (no cygwin required)

so far this is working fine

Regards Alexander



Apparently,
 
   get_inv.pl INV_URL | grep FIELDS | perl get_grib.pl URL OUTPUT
 
should work. Linux users probably will gravitate towards the cygwin system because it includes bash, an X-server, compilers and the usual unix tools.

Tips
If you want to download multiple fields, for example, precipitation and 2 meter temperature, you can type,  
 
     URL="https://www.ftp.ncep.noaa.gov/data/nccf/com/gfs/prod/gfs.2006070312/gfs.t12z.pgrb2f00"
     get_inv.pl $URL.idx | grep -E ':(PRATE|TMP:2 m above ground):' | get_grib.pl $URL out
 
The above code will put the precipitation and 2-m temperature in the file out. Of course, grep -E understands regular expressions, which is a very powerful feature.

If you are doing multiple downloads from the same file, you can save time by keeping a local copy of the inventory. For example,
 
     URL="https://www.ftp.ncep.noaa.gov/data/nccf/com/gfs/prod/gfs.2006070312/gfs.t12z.pgrb2f00"
     get_inv.pl $URL.idx > my_inv
     grep ":UGRD:" < my_inv | get_grib.pl $URL ugrd.grb
     grep ":VGRD:" < my_inv | get_grib.pl $URL vgrd.grb
     grep ":TMP:" < my_inv | get_grib.pl $URL tmp.grb
 
The above code saves two extra downloads of the inventory.  
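The same saving can be automated. Below is a sketch of a hypothetical helper, cached_inv, that runs get_inv.pl only when the local copy is missing or empty and otherwise prints the saved file. INV_CMD is an invented hook for substituting another command. The helper does not check that the saved inventory belongs to INV_URL, so delete the cache when moving to a different file or forecast cycle.

```sh
# cached_inv INV_URL CACHE_FILE: print the inventory, fetching it only once.
cached_inv() {
    if [ ! -s "$2" ]; then
        "${INV_CMD:-get_inv.pl}" "$1" > "$2" || { rm -f "$2"; return 1; }
    fi
    cat "$2"
}

# Usage:
# URL="https://www.ftp.ncep.noaa.gov/data/nccf/com/gfs/prod/gfs.2006070312/gfs.t12z.pgrb2f00"
# for var in UGRD VGRD TMP; do
#     cached_inv "$URL.idx" my_inv | grep ":$var:" | get_grib.pl "$URL" "$var.grb"
# done
```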

Some people have slow internet connections. A user was complaining about bad downloads. It turned out that the user was using a modem and cURL was "timing out". The user solved the problem by adding the options "-y 30 -Y 30" to the cURL commands inside get_inv.pl and get_grib.pl. The options tell curl to "time out" only when the download rate is less than 30 bytes per second for 30 seconds. Glad I don't have to use a modem.
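The same limits can be written with curl's long option names: -Y 30 is --speed-limit 30 (bytes per second) and -y 30 is --speed-time 30 (seconds). Below is a sketch of a single-range download using them. The function name and the retry count are illustrative assumptions, not taken from get_inv.pl or get_grib.pl. --retry asks curl to repeat a transfer that failed with a transient error such as a timeout.

```sh
# fetch_range URL RANGE OUTPUT: download one byte range, giving up on a
# transfer that stays below 30 bytes/s for 30 s, with up to 3 retries.
fetch_range() {
    curl -f -s \
        --speed-limit 30 --speed-time 30 \
        --retry 3 \
        -r "$2" -o "$3" "$1"
}

# Usage: fetch_range "$URL" 0-99999 first_part.grb
```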

Notes for Data Providers

The grib data need to be accessible on an http server that supports byte-range requests. Often this is a minor change in the httpd configuration.
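Whether a server honors partial transfers can be checked from the client side. The sketch below uses a made-up function name. It reads HTTP response headers (for example from curl -s -I URL) on stdin and succeeds when they include "Accept-Ranges: bytes". That header is optional, so its absence does not prove ranges fail. The direct test is to request a small range, e.g. curl -s -o /dev/null -w '%{http_code}' -r 0-99 URL, and look for status 206 (Partial Content) rather than 200.

```sh
# Succeed if the HTTP response headers on stdin advertise byte ranges.
# Header names are case-insensitive (HTTP/2 sends them in lower case).
accepts_ranges() {
    grep -qi '^accept-ranges:[[:space:]]*bytes'
}

# Usage (placeholder URL):
# curl -s -I "https://example.com/data/gfs.t12z.pgrb2f00" | accepts_ranges && echo "ranges OK"
```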

The users will need a wgrib inventory (grib-1) or a wgrib2 inventory (grib-2). It is convenient if the inventory is in the same directory as the data files and uses the '.inv' suffix convention. The inventory can be created by,
 
     GRIB-1: wgrib -s grib_file > grib_file.inv  
     GRIB-2: wgrib2 -s grib_file > grib_file.inv
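For a directory of files, the inventories can be kept current with a loop. The sketch below uses a hypothetical function name. It rebuilds FILE.inv only when the GRIB file is newer than its inventory, and it writes to a temporary name first so a user never downloads a half-written inventory. WGRIB defaults to wgrib2; set WGRIB=wgrib for GRIB-1 files. The -nt file test is a widely supported extension to test, available in bash, dash and ksh.

```sh
# make_invs FILE...: rebuild FILE.inv when FILE is newer than its inventory.
make_invs() {
    for f in "$@"; do
        if [ ! -e "$f.inv" ] || [ "$f" -nt "$f.inv" ]; then
            if "${WGRIB:-wgrib2}" -s "$f" > "$f.inv.tmp"; then
                mv "$f.inv.tmp" "$f.inv"    # swap in the finished inventory
            else
                rm -f "$f.inv.tmp"
                echo "inventory failed: $f" >&2
            fi
        fi
    done
}

# Usage: make_invs gfs.t12z.pgrb2f*
#        WGRIB=wgrib make_invs *.grb      # GRIB-1 files
```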
 

GRIB-2

Grib-2 has been supported since the summer of 2006.

Notes

In theory, curl allows random access to FTP servers, but in practice we found this to be slow (each random access is its own FTP session). Support for FTP access was dropped in 2/2005 because we want data providers to use the faster http protocol.

Regional Subsetting

The need for regional subsetting grows as the grids get finer and finer. With grib2, regional subsetting might be possible on the client side, but it would take some tricky coding. Right now, I am happy with the g2subset software that is running on the nomads servers. This server software is faster than the grib1 software (ftp2u/ftp4u) even with the overhead of the jpeg2000 decompression.


Created: 1/21/2005, modified 6/2017 information, modified 9/2020 to remove the news about https
comments: Wesley.Ebisuzaki@noaa.gov

NOAA/ National Weather Service
National Centers for Environmental Prediction
Climate Prediction Center
5830 University Research Court
College Park, Maryland 20740
Page last modified: November 5, 2002