
<!--Copyright (C) 1988-2005 by the Institute of Global Environment and Society (IGES). See file COPYRIGHT for more information.-->

<html>
<head>
<title>The Ensemble Dimension</title>
<link href="../../assets/NewIGES.css" rel="stylesheet" type="text/css">
<style type="text/css">
<!--
.style2 {color: #990000}
-->
</style>
</head>
<body
 text="#000000">


<h2>The Ensemble Dimension</h2>
<h4>
  <a href="#intro">Ensemble Handling in GrADS</a><br>
  <a href="#ddf">The EDEF entry in a descriptor file </a><br>
  <a href="#data">How to organize the data</a><br>
  <a href="#example1">Example #1: Lag Ensembles</a><br>
  <a href="#example2">Example #2: Retrospective Daily Hindcasts from CFS</a><br>
  <a href="#example3">Example #3: Ensembles with Different Lengths and Start Times in Binary Format</a><br>
</h4>
<hr>

<h3><a name="intro"></a>Ensemble Handling in GrADS</h3>
<p> GrADS version 2.0 supports a fifth dimension for gridded data sets. This extra dimension has been implemented in a general way, but it has been optimized for use with ensembles; hence the dimension is named E, or &quot;ens&quot;. A good way to illustrate how GrADS implements the E dimension is through a series of plots using real data. 
<p>Let's begin with a 16-day forecast initialized 1 January. This is a 4-dimensional data set that varies in space (X, Y, and Z) and time (T). To create the following display, we fixed the latitude, longitude, and level dimensions and set time to span the entire forecast; the 1-D plot below shows the time series of predicted values (16-day forecast of 500mb height at 45N, 60W):
<p><img src="edemo1.png" alt="edemo1">
<p>Next, suppose you rerun the same forecast 20 more times, tweaking the initial conditions a little bit for each run -- you've created a 21-member ensemble forecast. All the ensemble members  have the same length and start time. A classic technique for illustrating variability among ensemble members is to draw a &quot;spaghetti&quot; plot -- one contour line for each member. The following display shows a spaghetti plot for the same location and vertical level as drawn above, with one line for each ensemble member  (16-day forecast of 500mb height at 45N, 60W from 21 ensemble members):
<p><img src="edemo2.png" alt="edemo2" width="800" height="287">
<p>Using GrADS 2.0 and the ensemble dimension, we do not need to treat the ensemble members as separate data sets -- we can group them together and treat them as a single 5-dimensional data set. Thus we can display the same data shown in the spaghetti plot as a 2-D grid: time on the X-axis vs. ensemble member on the Y-axis. Each line in the spaghetti plot above becomes a row in the grid below, with grid boxes colored according to data values. 

<p><img src="edemo3.png" alt="edemo3" width="800" height="328">
<p>Because GrADS handles the ensemble members as a single grid, we can perform calculations over the ensemble dimension: the GrADS analysis functions operate on E the same way they do on X, Y, Z, and T. Below is yet another version of the data from the spaghetti plot; this time the display shows the ensemble mean (red line), the ensemble mean +/- one standard deviation (green bars), and the minimum and maximum values over all members (blue whiskers):
<p><img src="edemo4.png" alt="edemo4" width="799" height="294">
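<p>The reduction behind this plot can be sketched outside GrADS. The Python below is a minimal illustration, not GrADS code: it treats a small, hypothetical ensemble as a table with one row per member (E) and one column per time step (T), and at each time step reduces over E to the mean, standard deviation, minimum, and maximum. This page does not say whether GrADS uses the population or the sample standard deviation; the sketch assumes the population form.</p>
<pre><code class="language-python">import statistics


def ensemble_summary(members):
    """Reduce over the E axis at every time step.

    members holds one list of values per ensemble member; every list has
    the same number of time steps. Returns one (mean, std, min, max)
    tuple per time step.
    """
    summary = []
    for values in zip(*members):  # values: every member at one time step
        mean = statistics.mean(values)
        std = statistics.pstdev(values)  # population form (an assumption)
        summary.append((mean, std, min(values), max(values)))
    return summary


# Hypothetical 500mb heights (gpm): 3 members, 4 time steps.
members = [
    [5520.0, 5530.0, 5545.0, 5560.0],
    [5518.0, 5536.0, 5541.0, 5572.0],
    [5522.0, 5524.0, 5549.0, 5548.0],
]
for t, (mean, std, low, high) in enumerate(ensemble_summary(members), start=1):
    print(f"t={t}: mean={mean:.1f} std={std:.1f} min={low} max={high}")
</code></pre>
<p>Each tuple corresponds to one time step on the X-axis of the plot above: the mean is the red line, the mean plus or minus one standard deviation gives the green bars, and the minimum and maximum give the blue whiskers.</p>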

<h3><a name="ddf"></a>The EDEF entry in a descriptor file</h3>
<p>The first step in creating a descriptor file for a 5-D gridded data set is to add an entry describing the ensemble axis (E) with the keyword <a href="descriptorfile.html#EDEF">EDEF</a>. The E-axis is always linear, and there is no &quot;world coordinate&quot; equivalent for ensembles. An ensemble member is referred to by its grid index (1, 2, etc.) or by its name. The ensemble names are aliases for the grid indices, but they are also used as the substitution string when file <a href="templates.html">templating</a> is in use. Ensemble names must be 15 characters or fewer and contain only lowercase alphanumeric characters (<span class="style2">in version 2.0.0 and later, mixed-case ensemble names are allowed</span>). The grid for each ensemble member must be identical in X, Y, and Z, and all members must have the same list of variables. The time axis is more flexible: ensemble members may have different start times and different lengths, but they must all share the same time increment. The axis described in the TDEF entry is an envelope that spans the time ranges of all ensemble members.
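<p>Two of the rules above lend themselves to a quick check before writing EDEF: the limits on names, and the TDEF envelope that must span every member. The Python sketch below is illustrative only; the function names are hypothetical, it assumes a fixed increment expressible as a <code>timedelta</code> (monthly steps would need different arithmetic), and it accepts mixed-case names as version 2.0.0 and later does.</p>
<pre><code class="language-python">import re
from datetime import datetime, timedelta

# Assumption: mixed-case names are accepted (GrADS 2.0.0 and later);
# for older versions, narrow the class to [a-z0-9].
NAME_PATTERN = re.compile(r"[A-Za-z0-9]{1,15}")


def is_valid_name(name):
    """True if name is 1 to 15 alphanumeric characters."""
    return NAME_PATTERN.fullmatch(name) is not None


def tdef_envelope(members, step):
    """Return (first time, number of steps) spanning every member.

    members maps each ensemble name to (start time, length in steps);
    all members share one fixed increment, step (a timedelta).
    """
    first = min(start for start, _ in members.values())
    last = max(start + (length - 1) * step for start, length in members.values())
    return first, (last - first) // step + 1


# Member time axes from Example #1 (6-hourly, 65 steps each).
members = {
    "e1": (datetime(2009, 1, 1, 0), 65),
    "e2": (datetime(2009, 1, 1, 12), 65),
    "e3": (datetime(2009, 1, 2, 0), 65),
}
assert all(is_valid_name(name) for name in members)
print(tdef_envelope(members, timedelta(hours=6)))
</code></pre>
<p>For the three lag-ensemble members of Example #1 (65 six-hourly steps starting at 00z1jan2009, 12z1jan2009, and 00z2jan2009), the envelope starts at 00z1jan2009 and is 69 steps long, matching the TDEF entry in that example.</p>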
<p>There are two syntaxes for the EDEF entry: compact and expanded. The compact syntax is simpler and contains the number of ensemble members, the &quot;names&quot; keyword, and a space-delimited list of the names of the members. The expanded syntax is a collection of records framed by EDEF and ENDEDEF; each record contains a name, individual time axis information, and GRIB2 codes (if they are required).
If all of the ensemble members have an identical time axis (i.e., length, initial time, and increment are the same for each one), and the data format is <em>not</em> GRIB2, then the members are distinguished only by their names, and the compact EDEF syntax may be used. For example:
<p><code>EDEF 6 names e1 e2 e3 e4 e5 e6 <br>
EDEF 21 names cntrl p0 p1 p2 p3 p4 p5 p6 p7 p8 p9 n0 n1 n2 n3 n4 n5 n6 n7 n8 n9</code></p>
<p>If the ensemble members do not have identical time axes (i.e., their lengths or initial times differ), 
or if you need to include GRIB2 codes, then you must use the expanded EDEF syntax. Each ensemble record contains the ensemble name, its length, and its initial time. If the data are in GRIB2 format, additional comma-delimited codes follow the initial time. (See the <a href="descriptorfile.html#EDEF">EDEF reference page</a> for more details.) <a href="ensembles.html#example2">Example #2</a> below shows the TDEF and EDEF entries for a set of forecasts whose members were initialized at staggered start times.</p>
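<p>The choice between the two syntaxes can be written as a small rule. The following Python sketch (a hypothetical helper, not part of GrADS) emits the compact form only when every member has the same length and start time; the compact form carries no time information, so the members use the axis described by TDEF. It does not generate GRIB2 codes: for GRIB2 data, pass <code>force_expanded=True</code> and add those codes to each record as described on the EDEF reference page.</p>
<pre><code class="language-python">def edef_entry(members, force_expanded=False):
    """Build the EDEF entry of a complete descriptor file.

    members is a list of (name, length, start) tuples, where start is a
    GrADS time string such as "00z1jan2009". The compact syntax is used
    only when every member shares one time axis and force_expanded is
    False; pass force_expanded=True for GRIB2 data. GRIB2 codes are NOT
    generated here and would still have to be appended to each record.
    """
    names = [name for name, _, _ in members]
    time_axes = {(length, start) for _, length, start in members}
    if len(time_axes) == 1 and not force_expanded:
        return f"EDEF {len(members)} names " + " ".join(names)
    records = [f"{name}  {length}  {start}" for name, length, start in members]
    return "\n".join([f"EDEF {len(members)}", *records, "ENDEDEF"])


same_axis = [(f"e{i}", 65, "00z1jan2009") for i in range(1, 7)]
print(edef_entry(same_axis))

lagged = [("e1", 65, "00z1jan2009"), ("e2", 65, "12z1jan2009"),
          ("e3", 65, "00z2jan2009")]
print(edef_entry(lagged))
</code></pre>
<p>The first call reproduces the six-member compact example above; the second produces the expanded records used in Example #1.</p>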
<p>The EDEF entry is required in a descriptor file only if the data set varies in the E dimension. A 4-D data set does not require an &quot;EDEF 1 names 1&quot; entry: if EDEF is omitted, GrADS knows the data set is 4-D and sets up a default E axis for internal use only. Omitting EDEF for 4-D data is especially important if the descriptor file will be used to serve the data set via the GrADS Data Server. </p>
<p>To query the ensemble metadata once the file is opened, use the &quot;<code><a href="gradcomdqens.html">q ens</a></code>&quot; command. </p>
<p>&nbsp;</p>
<h3><a name="data"></a>How to organize the data</h3>
<p>As you create the descriptor file for your 5-D ensemble data set, you must also consider how to organize the data. It is possible to have all the data in one file, but it is more likely that a data set will be an aggregation of separate files, organized using <a href="templates.html">templates</a>. GrADS supports templating over the T and E dimensions, but there are some limitations on templating that depend on the data format. Note that the substitution strings for templating on T may be separated into several pieces (e.g., %y4, %m2, et al.) and can appear in different locations in the filename string in the DSET entry. For ensembles, the sole substitution string (%e) is the ensemble name (provided in the EDEF entry), which is limited to 15 characters. If you are templating over the ensemble dimension, there can be only one ensemble member per file.
Some additional considerations for constructing and managing ensemble data sets based on data format are given below. 
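<p>To make the substitution concrete, the Python sketch below fills in a handful of template codes for one time and one member. It handles only %y4, %m2, %d2, %h2, and %e; GrADS recognizes many more codes, and the function name is hypothetical.</p>
<pre><code class="language-python">from datetime import datetime


def expand_template(template, valid, ens):
    """Fill in a few GrADS template codes for one time and one member.

    Only %y4, %m2, %d2, %h2 (valid time) and %e (ensemble name) are
    handled; GrADS itself recognizes many more substitution codes.
    """
    codes = {
        "%y4": f"{valid.year:04d}",
        "%m2": f"{valid.month:02d}",
        "%d2": f"{valid.day:02d}",
        "%h2": f"{valid.hour:02d}",
        "%e": ens,
    }
    for code, text in codes.items():
        template = template.replace(code, text)
    return template


# The DSET template of Example #3 below, for member e4 in May 1995.
print(expand_template("/data/examples/monthly.%y4%m2.%e.dat",
                      datetime(1995, 5, 1), "e4"))
</code></pre>
<p>Listing the expanded names for every time step and member in this way is a quick check that each expected file exists before the descriptor is opened.</p>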
<p><strong>Binary Format:</strong> The structure of a multi-dimensional binary data set is determined by the order in which the horizontal grids are written to the file. The building blocks are stacked in a sequence according to dimension, in the following order from the fastest-varying dimension to the slowest-varying dimension: longitude (X), latitude (Y), vertical level (Z), variable (VAR), time (T), and ensemble (E).
5-D ensemble data sets are created by concatenating 4-D data sets together -- the ensemble dimension varies outside of all the others. If the data format is binary and file templating is used for the time dimension, then file templating for the ensemble dimension must also be used. If the data format is binary and files are templated together <em>only over the ensemble dimension</em>, then the entire time series for each member must be contained in the individual data files;  if the members have different lengths and start times, they must be padded with missing values so that the individual data files for each member are the same size. You can avoid padding your data files with missing data by using file templating over both the time and ensemble dimensions. Please see <a href="#example3">Example #3</a> below. 
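<p>The stacking order above fixes where each horizontal grid sits in the file. The Python sketch below computes the byte offset of one grid, assuming a plain file of 4-byte floats with no record headers (i.e., not a Fortran sequential file) that holds every time step of every member; the function and parameter names are hypothetical. A variable declared with 0 levels is counted as occupying one grid.</p>
<pre><code class="language-python">def grid_offset(v, z, t, e, nx, ny, nlevs, nt, word=4):
    """Byte offset of one X-Y grid in a single 5-D binary file.

    Assumes a direct-access file of 4-byte floats with no record headers,
    holding all nt time steps of every member (no templating). nlevs gives
    the number of grids per variable in VARS order (a variable declared
    with 0 levels still occupies one grid). Indices are zero-based.
    """
    grid_bytes = nx * ny * word      # X varies fastest, then Y
    grids_per_time = sum(nlevs)      # Z within each VAR
    grids_before = (e * nt + t) * grids_per_time + sum(nlevs[:v]) + z
    return grids_before * grid_bytes


# Example #3's grid (360 x 180, 8 surface variables, 240 months), as if
# it were stored in one untemplated file: variable ty (v=2), first month,
# second member.
print(grid_offset(2, 0, 0, 1, 360, 180, [1] * 8, 240))
</code></pre>
<p>For a file that holds a single member (templated over E only), <code>e=0</code> gives the offset within that member's file, and every member file is <code>nt * sum(nlevs) * nx * ny * 4</code> bytes long, which is why shorter members must be padded with missing values.</p>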
<p><strong>GRIB Format: </strong> The structure of a multi-dimensional GRIB data set is determined by the axis and variable declarations in the descriptor file; this information is contained in the index file created by the <a href="gradutilgribmap.html">gribmap</a> utility. The order in which the horizontal grids (records) are written to the file is not as critical as it is for binary data. The GRIB2 format has an expanded set of header fields for ensemble metadata, so two records that belong to different ensemble members but are otherwise identical can be distinguished. This is not the case for GRIB1. File templating on T and E in any combination is supported for GRIB2. If you are using the ensemble dimension with GRIB1, then you must template over the ensemble dimension, and the data for each ensemble member must be in a separate file.
<p><strong>Self-Describing File Format:</strong> The structure of a multi-dimensional data set in NetCDF or HDF-SDS format is determined by the way the coordinate dimensions in the self-describing file are matched to the 5 grid dimensions in GrADS. This matching may be  accomplished in three ways: 
<ol>
<li>Use the <a href="gradcomdsdfopen.html">sdfopen</a> command -- GrADS uses only the metadata in the self-describing file to determine how to match the coordinate dimensions in the file with longitude, latitude, level, time, and ensemble. You can use the <a href="gradcomdsdfopen.html">sdfopen</a> command for 5-D ensemble data sets served by the GrADS Data Server (GDS) -- the ensemble metadata are tailored specifically for GrADS so that <a href="gradcomdsdfopen.html">sdfopen</a> will work properly. <br>
<li>Use the <a href="gradcomdxdfopen.html">xdfopen</a> command -- GrADS  needs some external metadata to supplement or replace what is in the  self-describing file in order to match the coordinate dimensions. This external metadata is provided in a special descriptor file with a syntax especially for the <a href="gradcomdxdfopen.html">xdfopen</a> command. In an xdfopen-style descriptor, there is support for three variations on the compact syntax of the EDEF entry:<br>
<code>edef &lt;<i>SDF_dimension_name</i>&gt; <br>
edef &lt;<i>SDF_dimension_name</i>&gt; &lt;<em>size</em>&gt;<br>

edef &lt;<i>SDF_dimension_name</i>&gt; &lt;<em>size</em>&gt; names &lt;<em>list of names</em>&gt; </code><br>
Note that these variations differ from the compact EDEF syntax in a complete descriptor file because they include the name of the ensemble dimension as defined in the data file. <br>
<li>Use the <a href="gradcomdopen.html">open</a> command -- GrADS uses a <a href="SDFdescriptorfile.html">complete descriptor file</a> that contains all the metadata it needs to map the coordinate dimensions. Metadata in the data file is ignored, and the mapping of variable dimensions to grid dimensions is accomplished through the units field of the variable declaration. </ol>
<p>Note that file templating on T and E in any combination is supported for the NetCDF and HDF-SDS formats (as of version 2.0.a5). </p>
<p>&nbsp;</p>
<h3><a name="example1"></a>Example #1: <strong>Creating a Lag Ensemble Data Set</strong></h3>
<p> A lag ensemble is a collection of forecasts with different initialization times. It differs from the ensemble forecast described above because 
each member spans a  different (shifted) time range. If you consider the 24-hour geopotential height forecast in each member of a lag ensemble data set, the fields will not be valid at the same time. Similarly if you consider  the height fields from all members at a fixed valid time, each member will have a different lead time (offset from the initial time). Below is an illustration of a lag ensemble data set. The graphic is related to the example above because it shows similar 16-day forecasts of 500mb height at 45N, 60W, but in this case the 15 ensemble members were initialized at successive 12-hour intervals:
<p><img src="edemo5.png" alt="edemo5">
<p>If you want to create an ensemble data set using GRIB forecasts with varying initial times, and the data files are GRIB1 or GRIB2 without any ensemble metadata (i.e., Product Definition Template 0 or 8), then you must use file templating over E and your ensemble names must appear somewhere in your data file names. 
  It is not adequate to use only the initial-time and forecast-hour substitutions (%iy4, %im2, %id2, %ih2, and %f3) in the DSET entry together with the expanded form of EDEF to indicate the different start times of the ensemble members. This would uniquely match a file name for each time and member, but only in the special case where every member has a different initial time. If some ensemble members shared the same start time, the time metadata in the GRIB headers would be identical and there would be no way to distinguish the members. Omitting %e from the DSET entry implicitly assumes that all members have identical time axes and that all members are packed into one file for a given time. 
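<p>The Python sketch below illustrates the ambiguity with two hypothetical members, p1 and n1, that share the initial time 00z1jan2009. It fills in only the initial-time codes, %f3, and %e (the helper is hypothetical, not a GrADS function): without %e both members expand to the same file name, while with %e each member gets its own path.</p>
<pre><code class="language-python">from datetime import datetime


def grib_file(template, init, fhour, ens):
    """Fill %iy4 %im2 %id2 %ih2 (initial time), %f3 (forecast hour) and %e."""
    codes = {
        "%iy4": f"{init.year:04d}",
        "%im2": f"{init.month:02d}",
        "%id2": f"{init.day:02d}",
        "%ih2": f"{init.hour:02d}",
        "%f3": f"{fhour:03d}",
        "%e": ens,
    }
    for code, text in codes.items():
        template = template.replace(code, text)
    return template


# Hypothetical members p1 and n1, both initialized at 00z1jan2009.
init = datetime(2009, 1, 1, 0)
without_e = "gfs.%iy4%im2%id2%ih2.f%f3.grb2"
with_e = "%e/gfs.%iy4%im2%id2%ih2.f%f3.grb2"
for member in ("p1", "n1"):
    print(grib_file(without_e, init, 24, member), grib_file(with_e, init, 24, member))
</code></pre>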
<p>
A convenient way to set up a lag ensemble data set is to create symbolic links for the ensemble names that point to each directory containing a single forecast. Suppose  you have a directory structure based on the YYYYMMDDHH of the forecast initialization time:
<p>  <code> 
  ./2009010100/gfs.*.grb2<br>
  ./2009010112/gfs.*.grb2<br>
  ./2009010200/gfs.*.grb2</code>
  
<p>Create a set of symbolic links that associate an ensemble name with each directory:</p>
<p>  <code>
  ./e1 -&gt; ./2009010100<br>
  ./e2 -&gt; ./2009010112<br>
  ./e3 -&gt; ./2009010200</code></p>
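<p>The links can be made with <code>ln -s</code> in a shell, or scripted. The Python sketch below is a hypothetical helper that numbers the members e1, e2, ... in the order the directories are listed and uses relative targets, so each link behaves like <code>./e1 -&gt; ./2009010100</code> above.</p>
<pre><code class="language-python">import os


def link_members(init_dirs, root="."):
    """Create symbolic links e1, e2, ... for forecast directories.

    init_dirs lists the YYYYMMDDHH directory names in member order. Each
    link is created in root with a relative target, like running
    "ln -s 2009010100 e1" inside root. Existing links are left alone.
    Returns a dict mapping link name to target.
    """
    links = {}
    for i, target in enumerate(init_dirs, start=1):
        name = f"e{i}"
        path = os.path.join(root, name)
        if not os.path.lexists(path):
            os.symlink(target, path)  # eN points to the YYYYMMDDHH directory
        links[name] = target
    return links


# Usage, from the directory that holds the forecast directories:
# link_members(["2009010100", "2009010112", "2009010200"])
</code></pre>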
<p>And the descriptor file would contain the following entries:
<p><code>DSET ^./%e/gfs.%iy4%im2%id2%ih2.f%f3.grb2<br>
  ...<br>
  TDEF 69 linear 00z1jan2009 6hr<br>
  EDEF 3<br>
  e1  65  00z1jan2009<br>
  e2  65  12z1jan2009<br>
  e3  65  00z2jan2009<br>
  ENDEDEF</code>
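<p>With these entries, %e selects the member directory while the initial-time codes and %f3 select the file within it. The Python sketch below (hypothetical names; the <code>^</code> prefix, which makes the path relative to the descriptor file, is dropped) works out which file each member would be read from at one valid time, 00z2jan2009. Because the members start 12 hours apart, the same valid time maps to a different forecast hour in each member.</p>
<pre><code class="language-python">from datetime import datetime, timedelta

DSET = "./%e/gfs.%iy4%im2%id2%ih2.f%f3.grb2"  # DSET above, without the ^
STEP = timedelta(hours=6)                    # TDEF increment
MEMBERS = {                                  # EDEF records: initial time, length
    "e1": (datetime(2009, 1, 1, 0), 65),
    "e2": (datetime(2009, 1, 1, 12), 65),
    "e3": (datetime(2009, 1, 2, 0), 65),
}


def file_for(name, valid):
    """Return the file a member would be read from at a valid time,
    or None if that time falls outside the member's time axis."""
    init, length = MEMBERS[name]
    if (valid - init) // STEP not in range(length):
        return None
    fhour = (valid - init) // timedelta(hours=1)
    codes = {
        "%e": name,
        "%iy4": f"{init.year:04d}", "%im2": f"{init.month:02d}",
        "%id2": f"{init.day:02d}", "%ih2": f"{init.hour:02d}",
        "%f3": f"{fhour:03d}",
    }
    path = DSET
    for code, text in codes.items():
        path = path.replace(code, text)
    return path


valid = datetime(2009, 1, 2, 0)
for name in MEMBERS:
    print(name, file_for(name, valid))
</code></pre>
<p>At 00z2jan2009 the sketch gives forecast hours 024, 012, and 000 for e1, e2, and e3; for a time before a member's start, it returns <code>None</code>.</p>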
<p>&nbsp;</p>
<h3><a name="example2"></a>Example #2: Retrospective daily hindcasts from <a href="http://cfs.ncep.noaa.gov/">NCEP Climate Forecast System (CFS) </a></h3>
<p>The CFS daily hindcasts are an example of an ensemble data set with members that have different start times and different lengths. The hindcast  members are nominally 9 months long, with  unevenly staggered start times and identical end times. The following graphic shows  CFS forecasts of 500mb height at 45N, 60W, illustrating the  temporal coverage of the 15 ensemble members. A complete descriptor file for this data set is also provided.</p>
<p><img src="edemo6.png" alt="edemo6" width="800" height="283"></p>
<p><code>dset ^z500.%e.feb.2000.cfs.data<br>
  title 5D NCEP CFS Ensemble Hindcast Initialized February 2000 2.5 degree/12-hourly grid<br>
  dtype grib<br>
  index ^z500.feb.2000.cfs.map<br>
  undef 9.999e+20<br>
  options yrev template<br>
  xdef 144 linear   0 2.5<br>
  ydef  73 linear -90 2.5<br>
  zdef   1 levels 1<br>
  tdef 593 linear 12z09jan2000 12hr<br>
  edef 15<br>
  m01 593 12z09jan2000 <br>
  m02 591 12z10jan2000 <br>
  m03 589 12z11jan2000 <br>
  m04 587 12z12jan2000 <br>
  m05 585 12z13jan2000 <br>
  m06 573 12z19jan2000 <br>
  m07 571 12z20jan2000 <br>
  m08 569 12z21jan2000 <br>
  m09 567 12z22jan2000 <br>
  m10 565 12z23jan2000 <br>
  m11 551 12z30jan2000 <br>
  m12 549 12z31jan2000 <br>
  m13 547 12z01feb2000 <br>
  m14 545 12z02feb2000 <br>
  m15 543 12z03feb2000 <br>
  endedef<br>
  vars 1<br>
  z500    0  7,100,500   500mb Geopotential height [gpm]<br>
endvars<br>
@ z500 String units gpm</code></p>
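<p>Because all members end at the same time, each member's length in the EDEF block follows from its start time. The Python sketch below (hypothetical helper names; it parses only the <code>hhzddmmmyyyy</code> and <code>ddmmmyyyy</code> time forms) recomputes the lengths from the TDEF entry and the start times, which is a useful check when writing such a block by hand.</p>
<pre><code class="language-python">import re
from datetime import datetime, timedelta

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def parse_gradstime(text):
    """Parse GrADS times such as 12z09jan2000 or 1jan1988.

    Minutes and the other forms GrADS accepts are not handled here.
    """
    m = re.fullmatch(r"(?:(\d{1,2})z)?(\d{1,2})([a-z]{3})(\d{4})", text.lower())
    if m is None:
        raise ValueError(f"unsupported time string: {text}")
    hour, day, month, year = m.groups()
    return datetime(int(year), MONTHS.index(month) + 1, int(day), int(hour or 0))


def lengths_to_end(tdef_start, tdef_len, step, starts):
    """Length in steps of each member if all members end at the TDEF end."""
    end = parse_gradstime(tdef_start) + (tdef_len - 1) * step
    return {name: (end - parse_gradstime(s)) // step + 1
            for name, s in starts.items()}


# Start times copied from the EDEF block above.
starts = {
    "m01": "12z09jan2000", "m02": "12z10jan2000", "m03": "12z11jan2000",
    "m04": "12z12jan2000", "m05": "12z13jan2000", "m06": "12z19jan2000",
    "m07": "12z20jan2000", "m08": "12z21jan2000", "m09": "12z22jan2000",
    "m10": "12z23jan2000", "m11": "12z30jan2000", "m12": "12z31jan2000",
    "m13": "12z01feb2000", "m14": "12z02feb2000", "m15": "12z03feb2000",
}
print(lengths_to_end("12z09jan2000", 593, timedelta(hours=12), starts))
</code></pre>
<p>For this descriptor, the computed lengths run from 593 for m01 down to 543 for m15, matching the EDEF block above.</p>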
<p>&nbsp;</p>
<h3><a name="example3" id="example3"></a>Example #3: Ensembles with different lengths and start times in binary format</h3>
<p>This example shows how to set up a binary data set whose ensemble members have different lengths and start times. There are 6 members, spanning a period of 20 years. The figure below illustrates the time coverage of each member, and the data descriptor file follows the figure. Note that if this data set were templated only over E and not over T, the binary file for each member would have to be padded with missing values so that every member's data file was the same size, spanning the entire time axis.</p>
<p><img src="edemo7.png" alt="edemo7" width="800" height="229"></p>
<p>
<code>DSET   /data/examples/monthly.%y4%m2.%e.dat<br>
  TITLE Example of Ensembles in Binary Format<br>
  undef -9.99e8<br>
  options template<br>
  XDEF  360 LINEAR -179.5 1.0<br>
  YDEF  180 LINEAR  -89.5 1.0<br>
  ZDEF    1 linear   1     1<br>
  TDEF  240 LINEAR  1jan1988 1mo<br>
  EDEF   6<br>
  e1   &nbsp;48  1jan1988<br>
  e2   &nbsp;83  1jan1991<br>
  e3  101  1jan1992<br>
  e4  152  1may1995<br>
  e5  128  1may1997<br>
  e6   &nbsp;96  1jan2000<br>
  ENDEDEF<br>
  VARS   8<br>
  lhf    0   99 latent heat flux (W/m**2)<br>
  tx&nbsp;     0   99 zonal wind stress (N/m**2)<br>
  ty     &nbsp;0   99 meridional wind stress (N/m**2)<br>
  shf    0   99 sensible heat flux (W/m**2)<br>
  hum    0   99 surface air (~10-m) specific humidity (g/kg)<br>
  pw     &nbsp;0   99 lowest 500-m precipitable water (g/cm**2)<br>
  wpd    0   99 10-m wind speed (m/s)<br>
  hd     &nbsp;0   99 sea-air humidity difference (g/kg)<br>
  ENDVARS</code></p>
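<p>The Python sketch below uses the TDEF and EDEF entries above to show the trade-off described at the start of this example. For each member it lists the monthly files that must exist when templating over T and E, and the number of missing-value months that would be needed before and after the member if the data set were templated over E only. The sizes assume 4-byte floats and treat every variable as a single grid (each is declared with 0 levels); the helper names are hypothetical.</p>
<pre><code class="language-python">TDEF_START_YEAR = 1988   # TDEF starts at 1jan1988 (month 1)
TDEF_LENGTH = 240        # monthly time steps
MEMBERS = {              # EDEF records: start year, start month, length
    "e1": (1988, 1, 48), "e2": (1991, 1, 83), "e3": (1992, 1, 101),
    "e4": (1995, 5, 152), "e5": (1997, 5, 128), "e6": (2000, 1, 96),
}
GRID_BYTES = 360 * 180 * 4   # one XDEF x YDEF grid of 4-byte floats
NVARS = 8                    # all variables are single-level


def month_index(year, month):
    """Zero-based TDEF index of a month."""
    return (year - TDEF_START_YEAR) * 12 + month - 1


def padding(name):
    """Missing-value months needed before and after a member if the
    files were templated over E only (one file per member)."""
    year, month, length = MEMBERS[name]
    before = month_index(year, month)
    return before, TDEF_LENGTH - before - length


def member_files(name):
    """Monthly files that must exist for a member when templating over T and E."""
    year, month, length = MEMBERS[name]
    first = month_index(year, month)
    files = []
    for i in range(first, first + length):
        y, m = TDEF_START_YEAR + i // 12, i % 12 + 1
        files.append(f"/data/examples/monthly.{y:04d}{m:02d}.{name}.dat")
    return files


padded_size = TDEF_LENGTH * NVARS * GRID_BYTES   # bytes per member, E-only templating
for name in MEMBERS:
    files = member_files(name)
    print(name, padding(name), files[0], files[-1], len(files))
print("padded member file:", padded_size, "bytes; monthly file:", NVARS * GRID_BYTES, "bytes")
</code></pre>
<p>Member e1, for example, would need 192 months of padding after its 48 months of data, and each padded member file would be 497,664,000 bytes, compared with 2,073,600 bytes for each monthly file.</p>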
</body>
</html>