wgrib2ms
Introduction
Wgrib2 was designed to be parallelized by what may be called dataflow programming:
data flows into a black box and data flows out. One way to parallelize is to
divide the data flow into N streams, process each stream separately, and then recombine
the streams at the end of the processing. Wgrib2ms parallelizes wgrib2 this way.
Because this approach uses pipes, its speed is limited by pipe speed, disk speed,
the number of CPUs on a node, and the overhead of setting up and running the
parallel job. Pipe speed can be increased by increasing the pipe buffer size
(Linux kernel 2.6.35+).
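The split, process, and recombine pattern described above can be sketched in plain shell. This is only an illustration of the idea and not how wgrib2ms is implemented: the function name split_process_merge is hypothetical, text lines stand in for grib records, temporary files stand in for pipes, and round-robin assignment of records to streams is an assumption.

```shell
# Conceptual sketch: split stdin into N streams, process each stream in a
# background job, then recombine the results. Not the wgrib2ms implementation.
split_process_merge() {
    n=$1
    workdir=$(mktemp -d) || return 1

    # Split: record k goes to stream (k-1) mod N (round-robin is an assumption).
    awk -v n="$n" -v dir="$workdir" '{ print > (dir "/in." ((NR - 1) % n)) }'

    # Process: one background job per stream; sed stands in for wgrib2.
    i=0
    while [ "$i" -lt "$n" ]; do
        if [ -f "$workdir/in.$i" ]; then
            sed 's/^/processed /' "$workdir/in.$i" > "$workdir/out.$i" &
        fi
        i=$((i + 1))
    done
    wait

    # Recombine: concatenate the per-stream outputs.
    cat "$workdir"/out.*
    rm -rf "$workdir"
}

# Six records in two streams: records 1, 3, 5 come out before 2, 4, 6.
printf '%s\n' 1 2 3 4 5 6 | split_process_merge 2
```

The recombined output is not in the input order, which is the same kind of reordering that makes a parallel inventory differ from a serial one.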
Wgrib2ms parallelizes a wgrib2 command by dividing the data flow into N
streams which are processed independently. Only a limited number of output
options are supported. Note that the inventory from wgrib2ms is in a different
order than the inventory from a wgrib2 command. Each grid (submessage) is processed
by sending it to one of the N streams. Since -new_grid requires vector
fields to be processed in order, this division of labor is incompatible with -new_grid.
Any wgrib2 option that requires an order of processing is incompatible with wgrib2ms.
wgrib2 output options supported by wgrib2ms
- -grib
- -grib_out
- -ijsmall_grib
- -new_grid
- all other output options should not be used
wgrib2ms restrictions on the output options
- Each output option must write to a different file
- Each output option must write to the output file for every record processed.
- You can use the -match option because -match selects the record prior to processing
- You cannot use -if to select the record to be output (see the second restriction above)
- Output options can only write grib (e.g., -netcdf and -csv are not allowed)
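Some of these restrictions can be checked before a job is launched. The following is a minimal sketch, assuming a hypothetical helper named check_wgrib2ms_args that is not part of wgrib2ms. It only looks for the options named in the list above (-if, -netcdf, -csv) and does not check that each output option writes to a different file.

```shell
# Hypothetical pre-flight check; not part of wgrib2 or wgrib2ms.
# Scans the arguments intended for wgrib2ms and reports options that the
# restrictions above rule out. Returns 0 if none are found, 1 otherwise.
check_wgrib2ms_args() {
    status=0
    for arg in "$@"; do
        case "$arg" in
            -if)
                echo "not supported: -if (every output option must write for every record)"
                status=1 ;;
            -netcdf|-csv)
                echo "not supported: $arg (output must be grib)"
                status=1 ;;
        esac
    done
    return "$status"
}

# -match is fine because it selects records before processing.
check_wgrib2ms_args IN.grb -match ":HGT:" -grib out.grb && echo "arguments look usable"
```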
wgrib2 reading options supported by wgrib2ms
- processing a regular grib file (not a pipe)
- -i (reading inventory from stdin) added v1.1
- -import will cause problems
wgrib2 options that work differently in wgrib2ms
Some options still work but may behave differently in wgrib2ms.
Since the processing is split into N streams, each copy of
wgrib2 will not see all the records. For example, you
may want to calculate the 1000 mb-500 mb thickness. If one
copy of wgrib2 gets the 1000 mb Z and another gets the 500 mb Z,
then you can't calculate the thickness. This will affect
- -rpn
- -import
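The thickness problem can be illustrated with a toy inventory. This sketch assumes records are handed to streams round-robin, which is an assumption for illustration only; the function name assign_streams and the sample inventory lines are hypothetical.

```shell
# Illustration only: show which stream each inventory line would go to if
# records were distributed round-robin (an assumption, not documented behavior).
assign_streams() {
    awk -v n="$1" -F: '{ printf "stream %d: %s %s\n", (NR - 1) % n, $4, $5 }'
}

# With two streams, the two heights needed for the thickness are separated.
printf '%s\n' \
    '1:0:d=2024010100:HGT:1000 mb:anl:' \
    '2:1000:d=2024010100:HGT:500 mb:anl:' | assign_streams 2
```

Under this assumption the 1000 mb height lands in stream 0 and the 500 mb height in stream 1, so neither copy of wgrib2 holds both fields.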
Usage
wgrib2ms N (wgrib2 subset options)
for N > 1, execute wgrib2 (wgrib2 subset options) in N streams
for N < -1, produce a script that runs -N streams
v1.1+
grep ":HGT:" nam.idx | wgrib2ms 3 -i nam.grb2 -set_grib_type c3 -grib_out HGT.c3
Example
wgrib2ms 4 IN.grb -set_grib_type c3 -new_grid_winds -new_grid ncep grid 221 out22.grb -new_grid ncep grid 3 out3.grb
Observations
Using CentOS 6.4 on an FX 8320 (8 cores), there was little speedup with N > 4 when
using 1 MB grib messages. Using grib messages < 64 KB (the pipe buffer size), the
processing scaled better with the number of streams. The gmerge program
should be rewritten to be multithreaded.
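On Linux the default pipe capacity is 64 KB, and since kernel 2.6.35 a process can request a larger pipe with fcntl(F_SETPIPE_SZ), up to the limit in /proc/sys/fs/pipe-max-size for unprivileged users. The sketch below only reads that limit; the function name pipe_max_size is hypothetical, and whether wgrib2ms itself enlarges its pipes is not stated here.

```shell
# Print the largest pipe buffer (in bytes) an unprivileged process may request
# on Linux, or "unknown" where the proc file is not available.
pipe_max_size() {
    if [ -r /proc/sys/fs/pipe-max-size ]; then
        cat /proc/sys/fs/pipe-max-size
    else
        echo "unknown"
    fi
}

echo "pipe-max-size: $(pipe_max_size)"
```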
Code location: https://www.ftp.cpc.ncep.noaa.gov/wd51we/wgrib2_aux_progs/wgrib2m
See also:
wgrib2m