Climate Prediction Center - alt+g2ctl+gmp home page


alt_g2ctl, alt_gmp

Alt_g2ctl and alt_gmp are alternatives to g2ctl and gribmap, the two programs that create control and index files so that GrADS can read grib2 files. Why do you need alternatives? Well, grib has evolved. At one time, a variable name, level and time were enough to uniquely identify a field. Those days are gone. The field MASSDEN (mass density) can have several modifiers such as type of aerosol, size of aerosol, and chemical composition. You can add other modifiers such as ensemble information, and analysis/forecast/analysis error. Specifying the field by a fixed-format string of numbers became difficult. An alternative is to store the field identifier as a free-format text string.

The alt programs use the wgrib2 extended variable name, time stamp and level to create the field identifier. This creates a flexible method for identifying the field in the ctl file. Alt_g2ctl is a variant of the perl program g2ctl. Alt_gmp, which replaces gribmap, is a simpler program (628 vs. 2238 lines) because it generates the index file from wgrib2 inventories rather than grib files.

Alt_g2ctl takes grib2 files and creates GrADS control files. Alt_gmp takes the control files generated by alt_g2ctl and creates GrADS index files. The alt control files are compatible with GrADS but not with gribmap. Here is the workflow.

     std:   g2ctl (1) -> gribmap (3) -> GrADS
     alt:   alt_g2ctl (2) -> alt_gmp (3) -> GrADS

     (1) std-format grib2 control file format
     (2) alt-format grib2 control file format, compatible with GrADS
     (3) std format index file

Speed and BIG datasets

The current gribmap is single threaded. You paid all that money for more cores and you are stuck using one core. Alt_gmp will take advantage of the extra cores. It won't make any difference with a non-templated data set, but you'll notice the difference with a TB data set. For example, suppose you have a templated data set of 40 years of daily files (40 * 365.25 = 14610 files). Gribmap will use one thread to read all 14610 files. You can run alt_gmp with N threads, and each thread will read about 14610/N files.

Another problem with making index files of big datasets is that if you update one file (for example, add the current month to an archive), gribmap forces you to reread the entire data set with one thread. (Note: GrADS 2.1.a2 fixes this problem with an update mode.) Alt_gmp has a threaded update mode. This update mode works by saving the scan of each grib file in a tiny file (a compressed wgrib2 inventory).

The New Design

The alt gribmap (alt_gmp) is a completely new implementation of the gribmap program. It uses the more flexible alt-grib2 control format. Apart from verbosity (-v) and the optional -i, there are no options for the alt_gmp command. All the options used in alt_g2ctl are embedded in the ctl file and used by alt_gmp. This avoids the common error of running g2ctl and gribmap with different options. (For example, if the ctl file was made with the -0 option, the idx file will also be made with the -0 option.) Alt_gmp is also multitasked (user selectable) and has an update option. The latter two features are important when working with large grib2 datasets. The design of alt_gmp is different from gribmap. Gribmap is C code that reads the ctl and grib files to create an index file. Alt_gmp is a perl script that

(1) scans the ctl file (perl is great for text scanning)
(2) finds the grib files that need to be read (file1..filen)
(3) do file = file1 .. filen (this loop is threaded)
(4)    runs wgrib2 on the file to create an inventory
(5)    reads the wgrib2 inventory to update the index
(6) enddo
(7) writes out the index
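
The loop above can be sketched in a few lines. This is only an illustration in Python, not the perl code of alt_gmp: the function names are made up, and scan_file is a stand-in for "run wgrib2 on the file and read its inventory".

```python
from concurrent.futures import ThreadPoolExecutor


def scan_file(path):
    """Hypothetical stand-in for running wgrib2 on one grib file.

    The real alt_gmp calls wgrib2; here we just return a fake one-line inventory.
    """
    return [f"1:0:{path}"]


def build_index(files, nthreads=1, scan=scan_file):
    """Scan every file with up to nthreads workers and merge the results."""
    index = {}
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        # pool.map returns results in the same order as `files`,
        # so the merged index does not depend on which thread finished first.
        for path, inventory in zip(files, pool.map(scan, files)):
            index[path] = inventory
    return index
```

The scans run in parallel, but the merge into the index is done by one thread, which keeps the step "writes out the index" simple.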

In the update mode, a small wgrib2 inventory file is written for each grib file. In future runs of alt_gmp, the small inventory file, if present, is read instead of the much larger grib2 file. This saves a considerable amount of time for large datasets.
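
A minimal sketch of the update-mode idea, assuming the saved inventory sits next to the grib file as grib_file + suffix + ".gz". The function name, the default suffix and the scan callback are illustrative assumptions, not the alt_gmp code.

```python
import gzip
import os


def load_inventory(grib_path, scan, suffix=".invd01"):
    """Return inventory lines, reading the saved copy when it exists.

    `scan` is a caller-supplied function (hypothetical) that does the
    expensive work of reading the grib file and returning inventory lines.
    """
    cache = grib_path + suffix + ".gz"
    if os.path.exists(cache):
        # Fast path: read the small gzipped inventory, not the grib file.
        with gzip.open(cache, "rt") as fh:
            return fh.read().splitlines()
    lines = scan(grib_path)
    # Save the scan so the next run can skip the grib file.
    with gzip.open(cache, "wt") as fh:
        fh.write("".join(line + "\n" for line in lines))
    return lines
```

Note that this sketch never checks whether the grib file has changed since the inventory was saved.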

Alt_g2ctl is a modification of g2ctl. It remains a perl script that uses wgrib2 to query the files.

The Good

big (templated) data sets: multitasking (linux and unix) vs single task with gribmap
big (templated) data sets: fast update mode
grib2 support: more flexible
grib2 support: can support variants that g2ctl may not support
grib2 support: easier to support variants that g2ctl doesn't (yet) support
ensemble dimension not supported yet
alt_g2ctl and alt_gmp are more integrated. Options for alt_g2ctl are saved and passed to alt_gmp.
code is simpler and easier to maintain (IMHO)

The Bad

immature: 2GB+ files not supported (is there a need?)
ensemble dimension not supported yet
immature: many small features are not supported yet
limitations of g2ctl still remain in alt_g2ctl

You may notice that "ensemble dimension not supported yet" is both a Good and a Bad. A former boss wanted to have a dataset of ensemble forecasts. There were multiple starting times and multiple forecast hours. With the alt system, he was able to use the time dimension to refer to the valid time and special names to indicate the ensemble and forecast hour. The ctl file had to be hand modified but it worked. In another case, a user had a time series of 120 hour forecasts by the control forecast. Since the forecasts had ensemble information (control), the ctl file needed an EDEF section. However, the starting time of the control forecast was not fixed. It could be handled by alt_gmp but not gribmap.

Suggestions

I (WNE) use both alt_g2ctl and g2ctl for my work. For ensembles and large (2GB+) files, I use g2ctl/gribmap. For large reanalysis data sets, I use alt_g2ctl/alt_gmp. For newer grib2 product definition templates, you may have to use alt_g2ctl. For simple stuff, there is no big difference between g2ctl and alt_g2ctl.

Status

Released in early 2013.
5/2021: still doesn't support 2GB+ files and the ensemble dimension
5/2021: supports some grib files that g2ctl does not
5/2021: on new machines, I tend to only install alt_g2ctl. YMMV

Instructions: alt_gmp


     alt_gmp  (-v) (-v) (-v) (-i) FILE.CTL


     -i FILE.CTL   identifies the control file
                   -i is optional (v 0.0.3)

     -v           set verbosity level = 1
     -v -v        set verbosity level = 2
     -v -v -v     set verbosity level = 3

      Comments: No options are allowed.  The options like -update, -b, -0
                are embedded in the .ctl file by alt_g2ctl.

Instructions: alt_g2ctl

alt_g2ctl [list of options] TEMPLATE [INDEX] > FILE.CTL

TEMPLATE   The TEMPLATE may include an optional directory.
           TEMPLATE may contain template wildcards: %y4, %y2, %m2,
           %d2, %h2, %n2, %f2, %f3.
           Alt_g2ctl only understands a fraction of the possible template
           possibilities. The template wildcards can only be in the
           filename. The chronological order of the files must match
           the sorted namelist. This restricts the order of the template
           wildcards.

           If your filename and directory structure do not match this,
           make the CTL file for a single file. Then add the templates
           and adjust the TDEF statement. This works because alt_gmp
           does not require the sorted namelist to be in chronological order.

INDEX      The INDEX file name is optional. If not provided, the name of
           the index file will be generated.
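
To see why the order of the wildcards matters, here is a small sketch that expands a template for a run of daily files. It is a simplified illustration, not the alt_g2ctl code: the function names and the example file names are made up, and the forecast-hour wildcards (%f2, %f3) are left out.

```python
from datetime import datetime, timedelta


def expand_template(template, when):
    """Replace the date/time wildcards in a file name template."""
    subs = {
        "%y4": f"{when.year:04d}",
        "%y2": f"{when.year % 100:02d}",
        "%m2": f"{when.month:02d}",
        "%d2": f"{when.day:02d}",
        "%h2": f"{when.hour:02d}",
        "%n2": f"{when.minute:02d}",
    }
    for key, value in subs.items():
        template = template.replace(key, value)
    return template


def daily_files(template, start, ndays):
    """File names for ndays consecutive days, in chronological order."""
    return [expand_template(template, start + timedelta(days=i))
            for i in range(ndays)]


start = datetime(2020, 2, 28)
good = daily_files("pgb.%y4%m2%d2.grb2", start, 3)   # year, month, day
bad = daily_files("pgb.%d2%m2%y4.grb2", start, 3)    # day, month, year
print(sorted(good) == good)   # sorted order is chronological
print(sorted(bad) == bad)     # sorted order is not chronological
```

With year-month-day ordering the sorted namelist is chronological. With day-month-year ordering it is not, which is the case where you would make the ctl file from a single file and add the template by hand.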

Options

-0              .. use analysis time
                   same as g2ctl/gribmap
-0t             .. use analysis time + fhour
-b              .. use start of ave/acc period or fcst time
                   same as g2ctl/gribmap
-bt             .. use start of ave/acc period or fcst time + fhour
-e              .. use end of ave/acc period or fcst time (default)
-et             .. use end of ave/acc period or fcst time + fhour
-update         .. alt_gmp will be in fast update mode
-nthreads N     .. number of threads used by alt_gmp (default=1)
-wgrib2 EXE     .. replace wgrib2 by EXE
-short          .. remove comments and shorten variable names
-match X        .. only match X (regex) from wgrib2 inv, can be repeated
-not X          .. not X (regex) from wgrib2 inv, can be repeated
-prs            .. pressure (mb) vertical coordinates (default)
-iso            .. pot temp (K) vertical coordinates
-dsl            .. below sea level (m) vertical coordinates
-bsl            .. below sea level (m) vertical coordinates (obsolete)
-sig            .. sigma (0..1) vertical coordinates
-no_profile     .. no vertical coordinates
-365            .. 365 day calendar
-ts[timestep]   .. set timestep for individual time files (e.g. -ts6hr)
-lc             .. set lowercase option for parameter names
-pdef_linear    .. linear interpolation for thinned grids
-raw            .. raw grid

Note 1: the index file will be generated by the alt_gmp program, default: grib_file.idx
Note 2: the pdef file is only generated for thinned lat-lon grids, default: grib_file.pdef
Note 3: template options supported: %y4 %y2 %m2 %d2 %h2 %n2 %f2 %f3
Note 4: Pre 5/2021: -match, -not only apply to generation of the ctl file;
                    could not handle aliasing of fields
        5/2021+:    -match, -not apply to generation of the ctl and idx files;
                    needed when fields are aliased

The number of threads used to scan the grib2 files is set by the option -nthreads. For a single lightly used local disk, setting nthreads to one is the fastest as it minimizes disk head movements. For some systems, NFS is the speed limiting factor. By setting nthreads larger than one, you can improve the speed of NFS for loading multiple files. Finally, some filesystems have huge bandwidths and the limiting factor is the number of threads. Note that the nthreads option only applies to templated data sets.

Variable names are too long

GrADS allows up to 15 characters for the variable names. Alt_g2ctl uses the wgrib2 extended name as the variable name. Unfortunately the extended name can get quite long. Here is the var section from a ctl file.

vars 3
APCPdpercentile_from_climate_distributionsfc 0 0 "APCP.percentile_from_climate_distribution:
   surface" * APCP.percentile_from_climate_distribution:surface
TMPdpercentile_from_climate_distribution2m 0 0 "TMP.percentile_from_climate_distribution:2 m
   above ground" * TMP.percentile_from_climate_distribution:2 m above ground
WINDdpercentile_from_climate_distribution10m 0 0 "WIND.percentile_from_climate_distribution:
   10 m above ground" * WIND.percentile_from_climate_distribution:10 m above ground
endvars

In this example, the variable name exceeds 15 characters so GrADS shortens the name to 15 characters, which may cause the variable names to overlap. In this example, the total length of the line may also cause problems for GrADS. The solution is to use the -short option in alt_g2ctl. The option replaces the variable name by "v(integer)" and eliminates the comment field. Using the same data file, the var section becomes,

vars 3
v1 0 0 "APCP.percentile_from_climate_distribution:surface"
v2 0 0 "TMP.percentile_from_climate_distribution:2 m above ground"
v3 0 0 "WIND.percentile_from_climate_distribution:10 m above ground"
endvars
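
The renaming is simple enough to sketch. The function below is an illustration only (the name short_vars is made up, and the two "0 0" fields are just copied from the example above); it numbers the wgrib2 descriptors and writes them in the short form.

```python
def short_vars(descriptors):
    """Build a -short style var section: v1, v2, ... with no comment field."""
    lines = [f"vars {len(descriptors)}"]
    for i, desc in enumerate(descriptors, start=1):
        # The quoted wgrib2 descriptor is kept; only the GrADS name is shortened.
        lines.append(f'v{i} 0 0 "{desc}"')
    lines.append("endvars")
    return lines


for line in short_vars([
    "APCP.percentile_from_climate_distribution:surface",
    "TMP.percentile_from_climate_distribution:2 m above ground",
    "WIND.percentile_from_climate_distribution:10 m above ground",
]):
    print(line)
```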

From GrADS, you can see the variable definitions by "q file".

When you use the -short option, there is a constant need to do a "q file" and search the results to find the variable name. When the number of variables gets large, you waste time trying to find the variable name. So you can create a ctl file with only a subset of the variables by using the "-match X" and "-not X" options. "X" can be a regular expression, and the -match and -not options can be repeated.

Multiple fields with the same ctl name

Consider the following grib file. The fields are the temperature, minimum temperature (0-3 hour forecast), and maximum temperature (0-3 hour forecast). If you make the default control file, you only get one ctl name.

-sh-4.2$ wgrib2 tmp.grb 
1:0:d=2020010815:TMP:2 m above ground:anl:ens mean
2:147764:d=2020010815:TMP:2 m above ground:0-3 hour min fcst:ens mean
3:295134:d=2020010815:TMP:2 m above ground:0-3 hour max fcst:ens mean

-sh-4.2$ alt_g2ctl tmp.grb 
...
vars 1
TMPdens_mean2m 0 0 "TMP.ens_mean:2 m above ground" * TMP.ens_mean:2 m above ground
endvars

The difference between the three fields is in the forecast time stamp. To get three ctl names, you need to use the option -0t:

-sh-4.2$ alt_g2ctl -0t tmp.grb
...
vars 3
TMPdens_mean03hourmaxfcst2m 0 0 "TMP.ens_mean:0-3 hour max fcst:2 m above ground" * TMP.ens_mean:0-3 hour max fcst:2 m above ground
TMPdens_mean03hourminfcst2m 0 0 "TMP.ens_mean:0-3 hour min fcst:2 m above ground" * TMP.ens_mean:0-3 hour min fcst:2 m above ground
TMPdens_meananl2m 0 0 "TMP.ens_mean:anl:2 m above ground" * TMP.ens_mean:anl:2 m above ground
endvars

The -0t option solves most of the problems with multiple fields that have one ctl name. An exception is a time series of monthly means.

-sh-4.2$ wgrib2  mon_tmp_202012.grb 
1:0:d=2020120100:TMP:2 m above ground:248@3 hour ave(0-3 hour max fcst),missing=0:ens mean
2:138135:d=2020120100:TMP:2 m above ground:248@3 hour ave(0-3 hour min fcst),missing=0:ens mean
3:276861:d=2020120100:TMP:2 m above ground:248@3 hour ave(anl),missing=0:ens mean

For monthly means, the inventory differs depending on the number of days in the month. (248@3 hour ave(anl) will be 240@3 hour ave(anl) for a 30 day month.) As a result, alt_gmp will consider the fields to be different and will not make them a time series. Therefore, the -0t option will not work. To display the 3 fields, you have to make three control files.
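
The 248 and 240 are just the number of 3-hourly samples in the month. A quick check in Python (the function name is only for illustration):

```python
import calendar


def samples_per_month(year, month, step_hours=3):
    """Number of step_hours-spaced samples in a calendar month."""
    days = calendar.monthrange(year, month)[1]   # days in the month
    return days * 24 // step_hours


print(samples_per_month(2020, 12))   # 31 day month
print(samples_per_month(2020, 11))   # 30 day month
```

Because this count is part of the inventory text, the text changes from month to month.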

$ alt_g2ctl -0 -not ' (min|max) ' mon_tmp_%y4%m2.grb flx_anl.idx >flx_anl.ctl
$ alt_g2ctl -0 -match ' max ' mon_tmp_%y4%m2.grb flx_max.idx >flx_max.ctl
$ alt_g2ctl -0 -match ' min ' mon_tmp_%y4%m2.grb flx_min.idx >flx_min.ctl

Note: the index file has to be specified, otherwise the same index file will be
used causing a mid-summer night's ordeal.

$ alt_gmp flx_anl.ctl
$ alt_gmp flx_max.ctl
$ alt_gmp flx_min.ctl

WARNING: you have to delete pre-existing *.invd??.gz files in the same directory as the grib files
WARNING: you cannot use the update option because the ctl files use the same *.gz files
WARNING: you have to use alt_gmp 0.0.8+ (May 2021+)
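
If you want to check the regular expressions before making the three control files, you can try them on the wgrib2 inventory shown earlier. The sketch below is an illustration in Python, not what alt_g2ctl runs. It assumes that repeated -match patterns must all match and that any -not pattern rejects the line, which is how wgrib2 treats its own -match and -not options.

```python
import re


def select(inventory, match=(), not_=()):
    """Keep inventory lines that pass every match pattern and no not_ pattern."""
    keep = []
    for line in inventory:
        if (all(re.search(p, line) for p in match)
                and not any(re.search(p, line) for p in not_)):
            keep.append(line)
    return keep


inv = [
    "1:0:d=2020120100:TMP:2 m above ground:248@3 hour ave(0-3 hour max fcst),missing=0:ens mean",
    "2:138135:d=2020120100:TMP:2 m above ground:248@3 hour ave(0-3 hour min fcst),missing=0:ens mean",
    "3:276861:d=2020120100:TMP:2 m above ground:248@3 hour ave(anl),missing=0:ens mean",
]
print(select(inv, not_=[" (min|max) "]))   # the anl field
print(select(inv, match=[" max "]))        # the max field
print(select(inv, match=[" min "]))        # the min field
```

The spaces around min and max in the patterns matter: they keep the pattern from matching inside some other word in the inventory line.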

In some cases, you will have to use the old technique of taking the original
grib file and splitting it into pieces. For example, the min fields could be
written to a "min" grib file, and the max fields could be written
to a "max" grib file. The third grib file would have all the fields except the
min and max fields. This technique had to be used with a flux file that had
both 3-hour average and instantaneous latent heat fluxes. The -match and -not
technique didn't work because we wanted the ctl file to contain the 3-hour fluxes
and instantaneous values of soil moisture, snow depth, 2 m temperature, etc.
In this case, 4 grib files were created: min fields, max fields, instantaneous
fields that duplicated time averaged fields, and everything else. The
min and max fields could have been included with the "everything else" file;
however, that could cause difficulties with some
grib programs.

What are the *.invd??.gz files?

The *.invd??.gz files are gzipped inventories of the grib files. Alt_gmp
will read the inventories in preference to the grib files. This increases
the speed of alt_gmp many-fold. If you change a grib file, you have to delete
the corresponding *invd*gz file too. The number in the file name corresponds
to the type of inventory being made. For example, if you use the analysis
time instead of the forecast or verification time, then you get two
different types of inventories.
Since the inventories can change with the version of wgrib2, you may need
to remake the *invd*gz files if you adopt a new version of wgrib2.
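
Since a changed grib file needs its inventory deleted by hand, a small helper can list the inventories that look out of date. This is a sketch, not part of alt_gmp. It assumes the inventories are named grib_file + ".invdNN.gz" in the same directory and simply compares modification times; the function name is made up.

```python
import glob
import os


def stale_inventories(grib_path):
    """List saved inventories that are older than their grib file."""
    stale = []
    # glob.escape protects any wildcard characters in the grib file name.
    for inv in glob.glob(glob.escape(grib_path) + ".invd??.gz"):
        if os.path.getmtime(inv) < os.path.getmtime(grib_path):
            stale.append(inv)
    return sorted(stale)
```

A modification-time test will not catch a new version of wgrib2. In that case all the inventories should be remade.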

Principles of Operation

GrADS defines the varname field in the ctl file as

http://cola.gmu.edu/grads/gadoc/descriptorfile.html#VARS

The format of the variable records is as follows:

varname levs units description v2.0.1 or earlier
varname levs units description v2.0.2 or later

G2ctl and gribmap use the newer format. Alt_g2ctl and alt_gmp use this
format,

varname levs "wgrib2 inventory fragment" units description alt_g2ctl/alt_gmp

GrADS interprets this variant as the old format with a long units description.

One problem with g2ctl/gribmap is that you have to use the same flags such as "-0"
for both programs. Occasionally you have to refresh the ctl/idx files and you
may have forgotten which flag you should use. Alt_g2ctl/alt_gmp solves the problem by
writing/reading the flags in the ctl file. Additionally the type of wgrib2 inventory
is saved in the ctl file as well as the suffix of the wgrib2 inventory files:
* alt_gmp options: update=0
* alt_gmp options: nthreads=1
* alt_gmp options: big=0
* wgrib2 inventory flags: -npts -set_ext_name 1 -end_FT -ext_name -lev
* wgrib2 inv suffix: .invd01
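
Reading the flags back is a plain text scan of the comment lines. The sketch below is in Python for illustration (alt_gmp itself is a perl script, and the function name is made up). It assumes the "* alt_gmp options: key=value" layout shown above.

```python
def read_alt_gmp_options(ctl_lines):
    """Collect key=value pairs from '* alt_gmp options:' comment lines."""
    prefix = "* alt_gmp options:"
    options = {}
    for line in ctl_lines:
        if line.startswith(prefix):
            key, _, value = line[len(prefix):].strip().partition("=")
            options[key] = value
    return options


ctl = [
    "* alt_gmp options: update=0",
    "* alt_gmp options: nthreads=1",
    "* alt_gmp options: big=0",
    "* wgrib2 inventory flags: -npts -set_ext_name 1 -end_FT -ext_name -lev",
    "* wgrib2 inv suffix: .invd01",
]
print(read_alt_gmp_options(ctl))
```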

Alt_g2ctl is based on g2ctl. However it uses wgrib2 to generate
an "inventory fragment" to identify the grib message instead of a sequence
of numbers.
Alt_gmp and gribmap read the ctl file and create a table of the
indices of the grib messages. Alt_gmp is a perl script and is not based
on gribmap. Alt_gmp doesn't handle 2GB+ files because I had problems
reverse engineering the format of the index table. Alt_gmp calls
wgrib2 to generate a wgrib2 inventory. Fields are identified by
matching the wgrib2 inventory fragments to the wgrib2 inventory.

Code

by https
You need to use a recent version of wgrib2.

Comments: Wesley.Ebisuzaki@noaa.gov


NOAA / National Weather Service, National Centers for Environmental Prediction, Climate Prediction Center, 5830 University Research Court, College Park, Maryland 20740. Climate Prediction Center Web Team. Page modified: Jan 2018, May 2021, Jan 2023.