/* Copyright 2009, UCAR/Unidata and OPeNDAP, Inc. See the COPYRIGHT file for more information. */
The document has two primary parts.
Also, the reader should note that the code is a bit crufty and needs refactoring. This is primarily because it was originally designed to support only C, and each new language stresses the code.
In order to get ncgen to generate output for a new language, the following steps are required.
#define ENABLE_C
)
and insert a new one of the form
#define ENABLE_JAVA
int fortran_flag;
)
and insert a new declaration.
int java_flag;
fortran_flag = 0;
)
in the body of the main() procedure and add a new initialization.
java_flag = 0;
case 'l':
).
Duplicate one of the instances there and add to the conditionals.
It should look like this.
} else if(strcmp(lang_name, "java") == 0 || strcmp(lang_name, "Java") == 0) {java_flag = 1;}
#ifndef ENABLE_C
).
Add a new one for Java.
It should look like this.
#ifndef ENABLE_JAVA
if(java_flag) {
    fprintf(stderr,"Java not currently supported\n");
    exit(1);
}
#endif
In order to facilitate code generation, it is useful to look at the translations produced for other languages. The idea is to take these translations and decide what the corresponding Java (for example) code would look like, and then to modify the copy of the genc code (in genj.c) to reflect that translation.
In most of the rest of this discussion, the genc.c and cdata.c code will be used to explain the operation. Appropriate procedure renaming should be done for new languages (e.g., for Java, genc_XXX is changed to genj_XXX consistently).
void cprint(Bytebuffer* buf)
-- dump the contents of buf to output (ccode actually).
void cpartial(char* line)
-- dump the specified string to output.
void cline(char* line)
-- dump the specified string to output and add a newline.
void clined(int n, char* line)
-- dump the specified string to output preceded by
n instances of indentation.
void cflush(void)
-- dump the contents of ccode to standard output
and reset the ccode buffer.
The Bytebuffer type is an important data structure. It allows for dynamically creating strings of characters (actually arbitrary 8 bit values). Most of the operations should be obvious: examine bytebuffer.h. It is used widely in this code especially to capture sub-pieces of the generated code that must be saved for out-of-order output.
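To make the idea of a dynamically growing buffer concrete, here is a minimal sketch. It is not the bytebuffer.h API: the names Buf, buf_append, buf_cat, and buf_dup are hypothetical and exist only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Bytebuffer; see bytebuffer.h for the real type. */
typedef struct Buf {
    size_t alloc;   /* bytes allocated for content */
    size_t length;  /* bytes currently in use */
    char* content;  /* may hold arbitrary 8 bit values */
} Buf;

/* Append n bytes, growing the allocation as needed; returns 1 on success. */
static int
buf_append(Buf* b, const void* data, size_t n)
{
    if(b->length + n + 1 > b->alloc) {
        size_t newalloc = (b->alloc == 0 ? 64 : b->alloc);
        char* newcontent;
        while(newalloc < b->length + n + 1) newalloc *= 2;
        newcontent = (char*)realloc(b->content, newalloc);
        if(newcontent == NULL) return 0;
        b->content = newcontent;
        b->alloc = newalloc;
    }
    memcpy(b->content + b->length, data, n);
    b->length += n;
    b->content[b->length] = '\0'; /* keep a terminator just past the data */
    return 1;
}

/* Append a null terminated string. */
static int
buf_cat(Buf* b, const char* s)
{
    return buf_append(b, s, strlen(s));
}

/* Return a malloc'd, null terminated copy of the contents. */
static char*
buf_dup(const Buf* b)
{
    char* s = (char*)malloc(b->length + 1);
    if(s == NULL) return NULL;
    if(b->length > 0) memcpy(s, b->content, b->length);
    s[b->length] = '\0';
    return s;
}
```

A caller would start from a zeroed Buf, append pieces of generated code in whatever order they become available, and copy the result out once the pieces can be emitted.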
It has at its disposal several global lists of Symbols. Note that the lists cross all groups.
The superficial operation of gen_ncc is as follows; the details are provided later where the operation is complex.
The following code generates C code for defining the groups. It is fairly canonical and can be seen repeated in variant form when defining dimensions, types, variables, and attributes.
This code is redundant, but for consistency the root group ncid is stored like all other group ncids. Note that nprintf is a macro wrapper around snprintf.
nprintf(stmt,sizeof(stmt),"%s%s = ncid;",indented(1),groupncid(rootgroup));
cline(stmt);
The loop walks all group symbols in preorder form and generates a C call to nc_def_grp using parameters taken from the group Symbol instance (gsym). The call to nc_def_grp is followed by a call to the check_err procedure to verify the operation's result code.
for(igrp=0;igrp<listlength(...);igrp++) {
    Symbol* gsym = (Symbol*)listget(...,igrp);
    ...
    if(gsym->container == NULL) PANIC("null container");
    nprintf(stmt,sizeof(stmt),
            "%sstat = nc_def_grp(%s, \"%s\", &%s);",
            indented(1),
            groupncid(gsym->container),
            gsym->name,
            groupncid(gsym));
    cline(stmt); // print the def_grp call
    clined(1,"check_err(stat,__LINE__,__FILE__);");
}
flushcode();
Note the call to indented(). It generates a blank string corresponding to indentation to a level of its argument N; level N might result in more or less than N blank characters.
Note also that one must be careful when dumping names (e.g. gsym->name above) if the name is expected to contain utf8 characters. For C, utf8 works fine in strings, but with a language like Java, which takes utf-16 characters, some special encoding is required to convert the non-ascii characters to use the \uxxxx form.
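As an illustration of the kind of encoding involved, the following sketch converts a UTF-8 name into a form using \uxxxx escapes. The function java_escape is hypothetical and is not part of ncgen. It assumes well-formed input, does not reject overlong encodings, and leaves ASCII characters untouched, so characters such as the double quote and backslash would still need their own escaping in a real string literal.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy a UTF-8 encoded name into dst, replacing each
   non-ASCII character with \uxxxx escapes (a surrogate pair above U+FFFF).
   Returns 1 on success, 0 if dst is too small or the input is malformed. */
static int
java_escape(const char* src, char* dst, size_t dstlen)
{
    const unsigned char* p = (const unsigned char*)src;
    size_t used = 0;
    if(dstlen == 0) return 0;
    while(*p) {
        unsigned long cp;
        int extra, i;
        if(*p < 0x80) {cp = *p; extra = 0;}
        else if((*p & 0xE0) == 0xC0) {cp = *p & 0x1F; extra = 1;}
        else if((*p & 0xF0) == 0xE0) {cp = *p & 0x0F; extra = 2;}
        else if((*p & 0xF8) == 0xF0) {cp = *p & 0x07; extra = 3;}
        else return 0; /* not a valid lead byte */
        p++;
        for(i=0;i<extra;i++,p++) {
            if((*p & 0xC0) != 0x80) return 0; /* bad continuation byte */
            cp = (cp << 6) | (*p & 0x3F);
        }
        if(used + 13 > dstlen) return 0; /* worst case: 12 chars plus nul */
        if(cp < 0x80) {
            dst[used++] = (char)cp;
        } else if(cp < 0x10000) {
            used += (size_t)sprintf(dst+used, "\\u%04lx", cp);
        } else {
            cp -= 0x10000; /* split into a utf-16 surrogate pair */
            used += (size_t)sprintf(dst+used, "\\u%04lx\\u%04lx",
                                    0xD800 + (cp >> 10),
                                    0xDC00 + (cp & 0x3FF));
        }
    }
    dst[used] = '\0';
    return 1;
}
```

A generator for Java would apply something like this to names (e.g. gsym->name) before placing them inside a generated string literal.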
The code to generate dimensions, types, attributes, variables is similar, although often more complex.
The code to generate C equivalents of CDL types is in the procedure definectype(). Note that this code is not the code that invokes e.g. nc_def_vlen. The generated C types are used when generating datalists so that the standard C constant assignment mechanism will produce the correct memory values.
For non-C languages, the interaction between this code and the nc_def_TYPE code may be rather more complex than with C.
The genc_deftype procedure is the one that actually generates C code to define the netcdf types. The generated C code is designed to store the resulting typeid into the C variable defined earlier for holding that typeid.
Note that for compound types, the NC_COMPOUND_OFFSET macro is normally used to match netcdf offsets to the corresponding struct type generated in definectype. However, there is a flag, TESTALIGNMENT, that can be set to use a computed value for the offset. And for non-C languages, handling offsets is tricky and is addressed in more detail below.
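The relationship between a generated struct and the field offsets can be sketched as follows. This assumes that NC_COMPOUND_OFFSET behaves like the standard offsetof macro; the struct obs_t, the FieldInfo type, and the obs_fields table are hypothetical names invented for this example.

```c
#include <stddef.h>

/* Hypothetical C struct of the kind generated for a CDL compound type. */
typedef struct obs_t {
    char   flag;
    double value;
    int    count;
} obs_t;

/* Hypothetical table pairing each field name with its offset in obs_t. */
typedef struct FieldInfo {
    const char* name;
    size_t offset;
} FieldInfo;

/* The compiler decides the padding, so the offsets are taken from the
   struct itself rather than computed by summing the field sizes. */
static const FieldInfo obs_fields[] = {
    {"flag",  offsetof(obs_t, flag)},
    {"value", offsetof(obs_t, value)},
    {"count", offsetof(obs_t, count)}
};
```

Because the offsets come from the compiled struct, they automatically agree with the in-memory layout of the C constants. A language without an equivalent of offsetof has no such shortcut, which is one reason offsets are tricky outside C.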
The idea is that one has a set of procedures in C with a simple interface that can be invoked by the output language. These procedures do the following.
This method is appropriate to use with most non-C languages, with interpreted languages (e.g., Ruby and Perl), and is probably even the best way to get FORTRAN to handle the full netcdf-4 data model.
The data generation code is divided into two primary groups. One group handles all non-primitive variables and types. The other group handles all primitive variables and types (especially fields). The reason for this is that almost all languages can handle simple lists of primitive values. However, for non-primitive types, one of the methods from the previous section needs to be used.
Secondarily, the primitive handling code is divided into two groups. One group handles the character type and the other group handles all other primitive types. The code for the first group is in chardata.c and is generally usable across all languages.
This split exists for historical reasons. It turns out that it is tricky to properly handle variables (or Compound type fields) of type NC_CHAR. Here the term "proper" means to mimic the output of the original ncgen program. To this end, a set of generically useful routines is defined in the chardata.c file. These routines take a datasource and walk it to build a single string of characters, with appropriate fill, to correspond to an NC_CHAR typed variable or field. Unless your language has special requirements, it is probably best to always use these routines to process datalists for variables of type NC_CHAR.
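The general idea of building one character string with fill can be sketched as below. This is a simplified illustration, not the chardata.c code: build_chardata is a hypothetical name, and the rule used here (concatenate, truncate any excess, pad the remainder with the fill character) is an assumption made for the example.

```c
#include <string.h>

/* Hypothetical sketch: concatenate nstrings string constants into out, which
   must hold total bytes. Excess characters are dropped and any remaining
   positions are set to fill. out is not null terminated. */
static void
build_chardata(const char** strings, int nstrings, size_t total,
               char fill, char* out)
{
    size_t used = 0;
    int i;
    for(i=0;i<nstrings && used < total;i++) {
        size_t len = strlen(strings[i]);
        if(len > total - used) len = total - used; /* truncate the excess */
        memcpy(out + used, strings[i], len);
        used += len;
    }
    memset(out + used, fill, total - used); /* pad the rest with fill */
}
```

For a variable of six characters initialized with "ab","cd", this produces the four data characters followed by two fill characters.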
As a rule, the genc.c code calls a limited set of entry points into cdata.c. Again as a rule, cdata.c does not call genc.c code except for the closure mechanism described below.
The critical pieces of code for part I are the procedures genc_defineattr() and genc_definevardata() in genc.c.
As with variables, defining attributes of type NC_CHAR requires use of the gen_charXXX procedures.
As an aside, commas are added when needed to the list of constants using the commify procedure.
There are three primary procedures that are called from the genj.c code.
Internally, each of these three procedures invokes the genc_data procedure to process part of a datalist.
Basically, each call to the callback will generate C code for some C constants and calls to nc_put_vara(). The closure data structure (struct Putvar) is defined as follows.
typedef struct Putvar {
    int (*putvar)(struct Putvar*, Odometer*, Bytebuffer*);
    int rank;
    Bytebuffer* code;
    size_t startset[NC_MAX_VAR_DIMS];
    struct CDF {
        int grpid;
        int varid;
    } cdf;
    struct C {
        Symbol* var;
    } c;
} Putvar;
An instance of the closure is created for each variable that is the target of nc_put_vara(). It is initialized with the variable's symbol, rank, group id and variable id. It is also provided with a Bytebuffer into which it is supposed to store the generated C code. The startset is the cached previous set of dimension indices used for generating the nc_put_vara (see below).
The callback (field "putvar") that generates the C code is set to the procedure cputvara() (defined in genc.c). This procedure takes as arguments the closure object, an odometer describing the current set of dimension indices, and a Bytebuffer containing the generated C constants to be assigned to this slice of the variable.
Every time the closure procedure is called, it generates a C variable to hold the generated C constant. It also generates C constants to hold the start and count vectors required by nc_put_vara. It then generates an nc_put_vara() call. The start vector argument for the nc_put_vara is defined by the startset field of the closure. The count vector argument to nc_put_vara is computed from the current cached start vector and from the indices in the odometer. After the nc_put_vara() is generated, the odometer vector is assigned to the startset field in the closure for use on the next call.
There are some important assumptions about the state of the odometer when it is called.
In particular, this means that the start vector is zero for all positions except position zero. For all positions except zero, the count vector entry is the index in the odometer, which is assumed to be the max.
For start position zero, the position is taken from the last saved startset. The count position zero is the difference between that last start position and the current odometer zeroth index.
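The start and count rules just described can be sketched as a small function. The name next_slice and its parameter layout are hypothetical; in particular, the odometer indices are passed here as a plain array rather than as an Odometer object, and rank is assumed to be at least one.

```c
#include <stddef.h>

/* Hypothetical sketch of the start/count computation described above.
   startset:  start indices cached from the previous call (updated here)
   odomindex: current odometer indices; positions 1..rank-1 are assumed
              to be at their maximum
   start, count: outputs, suitable as the vectors passed to nc_put_vara */
static void
next_slice(int rank, size_t* startset, const size_t* odomindex,
           size_t* start, size_t* count)
{
    int i;
    /* Position zero continues from where the previous slice stopped. */
    start[0] = startset[0];
    count[0] = odomindex[0] - startset[0];
    /* All other positions cover their full extent. */
    for(i=1;i<rank;i++) {
        start[i] = 0;
        count[i] = odomindex[i];
    }
    /* Remember the odometer state for the next call. */
    for(i=0;i<rank;i++) startset[i] = odomindex[i];
}
```

For example, with a variable of rank 3 whose inner dimensions are 5 and 3, two successive calls with odometer indices {2,5,3} and then {4,5,3} yield slices starting at rows 0 and 2, each covering two rows.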
As an optimization, ncgen tracks which datatypes will require use of vlen constants. This is any type whose definition is a vlen or whose basetype contains a vlen type.
The vlen generation process is two-fold. First, in the procedure processdatalist1() in semantics.c, the location of the struct Datalist objects that correspond to vlen constants is stored in a list called vlenconstants. When detected, each such Datalist object is tagged with a unique identifier and the vlen length (count). These will be used later to generate references to the vlen constant. These counts are only accurate for non-char typed variables; special handling is in place to handle character vlen constants.
The second vlen constant processing action is in the procedure genc_vlenconstant() in cdata.c. It walks the vlenconstants list and generates C code for C variables to define the vlen constant and C code to assign the vlen constant's data to that C variable.
When, later, the genc_datalist procedure encounters a Datalist tagged as representing a vlen constant, it can generate a nc_vlen_t constant as {<count>,<vlenconstantname>} and use it directly in the generated C datalist constant.
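The shape of the resulting generated code might look roughly like the following sketch. The type vlen_t is a local stand-in with the same two fields as nc_vlen_t (a length and a pointer), and the variable names vlen_1, vlen_2, and var_data are hypothetical; the names that ncgen actually generates may differ.

```c
#include <stddef.h>

/* Local stand-in with the same fields as netcdf's nc_vlen_t. */
typedef struct vlen_t {
    size_t len; /* number of elements in this vlen instance */
    void* p;    /* pointer to the element data */
} vlen_t;

/* Step 1: each vlen constant gets its own C variable holding its data. */
static const int vlen_1[] = {1, 2, 3};
static const int vlen_2[] = {4, 5};

/* Step 2: the datalist constant refers to those variables using the
   {<count>,<vlenconstantname>} form. */
static const vlen_t var_data[] = {
    {3, (void*)vlen_1},
    {2, (void*)vlen_2}
};
```

This two-step layout is why the vlen constants must be emitted before the datalist that refers to them.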
The pool mechanism wraps malloc and records the malloc'd memory in a circular buffer. When the buffer reaches its maximum size, previously allocated pool buffers are free'd. This is good in that the user does not have to litter code with free() statements. It is bad in that the pool allocated memory can be free'd too early if the memory does not have a short enough life. If you suspect the latter, then bump the size of the circular buffer and see if the problem goes away. If so, then your code is probably holding on to a pool buffer too long and should use regular malloc/free.
In the end, I am not sure if this is a good idea, but it does make the code simpler.
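The behavior of such a pool can be sketched in a few lines. This is an illustration of the technique only: pool_alloc, POOLMAX, pool, and poolnext are hypothetical names, and the buffer size is deliberately tiny so that the wraparound is easy to follow.

```c
#include <stdlib.h>

#define POOLMAX 4 /* hypothetical size; a real pool would be much larger */

static void* pool[POOLMAX]; /* circular buffer of live allocations */
static int poolnext = 0;    /* slot that the next allocation will reuse */

/* Allocate zeroed memory and record it in the circular buffer. Once the
   buffer has wrapped around, the allocation previously recorded in the
   reused slot is free'd. */
static void*
pool_alloc(size_t size)
{
    void* mem = calloc(1, size);
    if(mem == NULL) return NULL;
    free(pool[poolnext]); /* free(NULL) is a no-op for unused slots */
    pool[poolnext] = mem;
    poolnext = (poolnext + 1) % POOLMAX;
    return mem;
}
```

With POOLMAX set to 4, the fifth call to pool_alloc frees the memory returned by the first call. Any code still holding that first pointer is then using freed memory, which is exactly the "free'd too early" hazard described above.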
The canonical code for non-destructive walking of a List is as follows.
for(i=0;i<listlength(list);i++) {
    T* element = (T*)listget(list,i);
    ...
}
Bytebuffer provides two ways to access its internal buffer of characters.
One is "bbContents()", which returns a direct pointer to the buffer,
and the other is "bbDup()", which returns a malloc'd string containing
the contents and is guaranteed to be null terminated.
Odometer: Multi-Dimensional Array Handling
The odometer data type is used to convert
multiple dimensions into a single integer.
The rule for converting a multi-dimensional
array to a single dimension is as follows.
Suppose we have the declaration
int F[2][5][3];
There are obviously a total of 2 X 5 X 3 = 30 integers in F.
Thus, these three dimensions will be reduced to a single dimension of size 30.
A particular point in the three dimensions, say [x][y][z], is reduced to
a number in the range 0..29 by computing
((x*5)+y)*3+z
The corresponding general C code is as follows.
size_t
dimmap(int rank, size_t* indices, size_t* sizes)
{
    int i;
    size_t count = 0;
    for(i=0;i<rank;i++) {
        if(i > 0) count *= sizes[i];
        count += indices[i];
    }
    return count;
}
In this code, the indices variable corresponds to x, y, and z.
The sizes variable corresponds to 2, 5, and 3.
The Odometer type stores a set of dimensions
and supports operations to iterate over all possible
dimension combinations.
An odometer is defined by the types Odometer and Dimdata.
typedef struct Dimdata {
    unsigned long datasize; // actual size of the datalist item
    unsigned long index; // 0 <= index < datasize
    unsigned long declsize;
} Dimdata;
typedef struct Odometer {
    int rank;
    Dimdata dims[NC_MAX_VAR_DIMS];
} Odometer;
The following primary operations are defined.
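As an illustration of how iteration and the flattening rule fit together, here is a minimal sketch of stepping an odometer. The procedures odom_more, odom_next, and odom_offset are hypothetical names for this example and are not necessarily the operations ncgen defines. The struct definitions are repeated so the sketch stands alone, NC_MAX_VAR_DIMS is defined locally as a stand-in for the value in netcdf.h, and datasize is assumed to be the extent used for iteration.

```c
#include <stddef.h>

#define NC_MAX_VAR_DIMS 1024 /* stand-in for the value from netcdf.h */

typedef struct Dimdata {
    unsigned long datasize; /* actual size of the datalist item */
    unsigned long index;    /* 0 <= index < datasize */
    unsigned long declsize;
} Dimdata;

typedef struct Odometer {
    int rank;
    Dimdata dims[NC_MAX_VAR_DIMS];
} Odometer;

/* Hypothetical: true while the odometer still addresses a valid element. */
static int
odom_more(const Odometer* odom)
{
    return odom->rank > 0 && odom->dims[0].index < odom->dims[0].datasize;
}

/* Hypothetical: advance the rightmost index, carrying leftward on overflow.
   Position zero is allowed to run off the end, which is how odom_more
   detects completion. */
static void
odom_next(Odometer* odom)
{
    int i;
    for(i=odom->rank-1;i>=0;i--) {
        odom->dims[i].index++;
        if(i == 0 || odom->dims[i].index < odom->dims[i].datasize) break;
        odom->dims[i].index = 0;
    }
}

/* Hypothetical: flatten the current indices using the ((x*5)+y)*3+z rule. */
static size_t
odom_offset(const Odometer* odom)
{
    int i;
    size_t count = 0;
    for(i=0;i<odom->rank;i++) {
        if(i > 0) count *= odom->dims[i].datasize;
        count += odom->dims[i].index;
    }
    return count;
}
```

For the F[2][5][3] example, starting with all indices at zero and repeatedly calling odom_next visits the flattened offsets 0 through 29 in order, after which odom_more returns false.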
Misc. Notes
Change Log