Climate Prediction Center

wgrib2ms

Introduction

Wgrib2 was designed to be parallelized by what may be called dataflow programming: data flows into a black box and data flows out. One way to parallelize is to divide the data flow into N streams, process each stream separately, and then recombine the streams at the end of the processing. Wgrib2ms parallelizes wgrib2 this way. Because this approach uses pipes, it is limited by pipe speed, disk speed, the number of CPUs on a node, and the overhead of setting up and running the parallel job. Pipe speed can be increased by increasing the pipe buffer size (Linux kernel 2.6.35+).
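As a rough illustration of the pipe buffer remark, the sketch below tries to enlarge a pipe's buffer from Python on Linux. It assumes a Linux kernel of 2.6.35 or later. The helper name enlarge_pipe and the 1 MiB request are illustrative choices, and nothing here is taken from the wgrib2ms script itself.

```python
import fcntl
import os

# Linux-specific fcntl commands. The numeric fallbacks are the values from
# <linux/fcntl.h>, used when this Python version does not export the names.
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)


def enlarge_pipe(fd, requested_bytes=1 << 20):
    """Try to grow the buffer of the pipe behind fd (hypothetical helper).

    Returns the buffer size the kernel reports afterwards, or None when the
    platform cannot report a pipe size at all.
    """
    try:
        fcntl.fcntl(fd, F_SETPIPE_SZ, requested_bytes)
    except OSError:
        # Request refused (for example, above /proc/sys/fs/pipe-max-size)
        # or not supported; fall through and report the current size.
        pass
    try:
        return fcntl.fcntl(fd, F_GETPIPE_SZ)
    except OSError:
        return None


read_fd, write_fd = os.pipe()
new_size = enlarge_pipe(write_fd)
print("pipe buffer size:", new_size)
os.close(read_fd)
os.close(write_fd)
```

The kernel may round the request up or refuse it, so the value to trust is the one read back with F_GETPIPE_SZ.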

Wgrib2ms parallelizes a wgrib2 command by dividing the data flow into N streams, which are processed independently. Only a limited number of output options are supported. Note that the inventory from wgrib2ms is in a different order than the inventory from the equivalent wgrib2 command. Each grid (submessage) is processed by sending it to one of the N streams. Since -new_grid requires vector fields to be processed in order, this division of labor is incompatible with -new_grid. Any wgrib2 option that requires a particular order of processing is incompatible with wgrib2ms.
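The split, process, recombine pattern can be sketched in a few lines of Python. This is a conceptual model only: it assumes records are dealt to the streams round-robin, uses threads and in-memory lists as stand-ins for the separate wgrib2 processes and pipes, and all function names are hypothetical. The last lines show one plausible way an inventory listed stream by stream ends up in a different order from the input.

```python
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(records, n):
    """Deal records into n streams: record i goes to stream i % n (assumed policy)."""
    return [records[i::n] for i in range(n)]


def process_stream(stream):
    """Stand-in for one wgrib2 copy: each record is transformed independently."""
    return [rec.upper() for rec in stream]


def merge_round_robin(streams):
    """Recombine the processed streams in the order the records were dealt."""
    merged = []
    longest = max((len(s) for s in streams), default=0)
    for i in range(longest):
        for s in streams:
            if i < len(s):
                merged.append(s[i])
    return merged


def run_parallel(records, n):
    """Split the records into n streams, process them concurrently, and merge."""
    streams = split_round_robin(records, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        processed = list(pool.map(process_stream, streams))
    return merge_round_robin(processed)


records = ["rec1", "rec2", "rec3", "rec4", "rec5"]
output = run_parallel(records, 3)
# Records listed stream by stream, as separate per-stream inventories would be.
stream_order = [rec for stream in split_round_robin(records, 3) for rec in stream]
print(output)
print(stream_order)
```

Because each stand-in worker sees only its own records, any step that needs two particular records together, or needs them in sequence, does not fit this model.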

wgrib2 output options supported by wgrib2ms

-grib
-grib_out
-ijsmall_grib
-new_grid
all other output options should not be used

wgrib2ms restrictions on the output options

1. Each output option must write to a different file.
2. Each output option must write to the output file for every record processed.
3. You can use the -match option because -match selects the record prior to processing.
4. You cannot use -if to select the record to be output (see restriction 2).
5. Output options can only write grib (e.g., -netcdf and -csv are not allowed).
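The restrictions above lend themselves to a simple pre-flight check. The following sketch is not part of wgrib2ms; the function name and tables are hypothetical, the argument counts reflect wgrib2's option syntax as I understand it (the output file is the last argument of each supported option), and the list of non-grib outputs contains only the two examples named above.

```python
# Supported output options and the number of arguments each takes
# (assumed from wgrib2's syntax; verify against your wgrib2 version).
SUPPORTED_OUTPUTS = {"-grib": 1, "-grib_out": 1, "-ijsmall_grib": 3, "-new_grid": 4}
# Non-grib outputs named in the restrictions; this list is not exhaustive.
NON_GRIB_OUTPUTS = {"-netcdf", "-csv"}


def check_wgrib2ms_args(args):
    """Return a list of problems found in a wgrib2ms option list."""
    problems = []
    seen_files = set()
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in SUPPORTED_OUTPUTS:
            nargs = SUPPORTED_OUTPUTS[arg]
            if i + nargs >= len(args):
                problems.append(f"{arg} is missing arguments")
                break
            out_file = args[i + nargs]  # the file is the option's last argument
            if out_file in seen_files:
                problems.append(f"{out_file} is written by more than one output option")
            seen_files.add(out_file)
            i += nargs + 1
            continue
        if arg == "-if":
            problems.append("-if cannot be used to select records for output")
        elif arg in NON_GRIB_OUTPUTS:
            problems.append(f"{arg} does not write grib")
        i += 1
    return problems


print(check_wgrib2ms_args(["IN.grb", "-grib", "a.grb", "-grib_out", "a.grb"]))
```

A check like this catches only the mechanical restrictions; it cannot tell whether an option depends on the order of processing.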

wgrib2 reading options supported by wgrib2ms

processing a regular grib file (not a pipe)
-i (reading inventory from stdin) added v1.1
-import will cause problems

wgrib2 options that work differently in wgrib2ms

Some options still work but may behave differently in wgrib2ms. Since the processing is split into N streams, each copy of wgrib2 will not see all the records. For example, you may want to calculate the 1000 mb-500 mb thickness. If one copy of wgrib2 gets the 1000 mb Z and another gets the 500 mb Z, then you can't calculate the thickness. This will affect:

-rpn
-import
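To make the thickness example concrete, the sketch below checks whether the two height records would land in the same stream. It assumes records are dealt round-robin by position, which may not match how wgrib2ms actually assigns them; the inventory and the function names are made up for illustration.

```python
def stream_of(index, n):
    """Stream that record number index is sent to under round-robin dealing."""
    return index % n


def in_same_stream(inventory, needed, n):
    """True if every record named in needed is handled by one and the same stream."""
    streams = {stream_of(i, n) for i, name in enumerate(inventory) if name in needed}
    return len(streams) == 1


# Hypothetical inventory: the two heights needed for the thickness are
# separated by an unrelated record.
inventory = ["HGT:1000 mb", "TMP:1000 mb", "HGT:500 mb", "TMP:500 mb", "UGRD:500 mb"]
needed = {"HGT:1000 mb", "HGT:500 mb"}

for n in (2, 3, 4):
    print(n, in_same_stream(inventory, needed, n))
```

With this inventory the two heights share a stream for N = 2 but not for N = 3 or N = 4, so a calculation that happens to work for one stream count can fail for another.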

Usage

wgrib2ms N (wgrib2 subset options)
  for N > 1, execute wgrib2 (wgrib2 subset options) in N streams
  for N < -1, produces script running -N streams

v1.1+
  grep ":HGT:" nam.idx | wgrib2ms 3 -i nam.grb2 -set_grib_type c3 -grib_out HGT.c3
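The v1.1+ example can also be driven from Python. In this sketch the inventory lines are filtered in Python instead of with grep and then passed to wgrib2ms on stdin. The function names are hypothetical, the file names are the ones from the example, and wgrib2ms is assumed to be on the PATH. Stream counts between -1 and 1 are rejected because the usage text only describes N > 1 and N < -1.

```python
import subprocess


def select_inventory(index_lines, pattern):
    """Keep the index lines that contain pattern, like a fixed-string grep."""
    return [line for line in index_lines if pattern in line]


def build_command(n_streams, grib_file, options):
    """Build the wgrib2ms argument list for reading the inventory from stdin."""
    if -1 <= n_streams <= 1:
        raise ValueError("the usage text only covers N > 1 and N < -1")
    return ["wgrib2ms", str(n_streams), "-i", grib_file, *options]


def run_selected(index_path, pattern, n_streams, grib_file, options):
    """Filter the index file and feed the selected lines to wgrib2ms."""
    with open(index_path) as f:
        selected = select_inventory(f.read().splitlines(), pattern)
    cmd = build_command(n_streams, grib_file, options)
    return subprocess.run(cmd, input="\n".join(selected) + "\n", text=True, check=True)


cmd = build_command(3, "nam.grb2", ["-set_grib_type", "c3", "-grib_out", "HGT.c3"])
print(" ".join(cmd))
```

Calling run_selected("nam.idx", ":HGT:", 3, "nam.grb2", ["-set_grib_type", "c3", "-grib_out", "HGT.c3"]) would then correspond to the shell pipeline above.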

Example

wgrib2ms 4 IN.grb -set_grib_type c3 -new_grid_winds -new_grid ncep grid 221 out22.grb -new_grid ncep grid 3 out3.grb

Observations

Using CentOS 6.4 on an FX 8320 (8 cores), there was little speedup with N > 4 when using 1 MB grib messages. With grib messages smaller than 64 KB (the pipe buffer size), the processing scaled better with the number of streams. The program gmerge should be made multithreaded.

Code location: https://www.ftp.cpc.ncep.noaa.gov/wd51we/wgrib2_aux_progs/wgrib2m