ASTM E2078-00(2010)
Standard Guide for Analytical Data Interchange Protocol for Mass Spectrometric Data
SIGNIFICANCE AND USE
General Coding Guidelines—The NetCDF libraries are supplied to developers as source code. End users receive the libraries in compiled binary form as part of a vendor's application.
Developers setting out to write a program to convert their data files to the Mass Spectrometric Data Protocol should consider using the NetCDF utilities ncgen and ncdump. After developers create the NetCDF file they should use the ncdump program to generate the ASCII representation of the data file, and examine it to ensure the data are being correctly put into the file.
Make Files for NetCDF Libraries and Utilities—In general the compilation is straightforward. The make files were modified after they were received from the Unidata Corporation, because they did not compile the first time on PCs. The changes needed to get the Unidata distribution to run on DOS are (1) rename the file MAKEFILE to UNIX.MK, and (2) rename MSOFT.MK to MAKEFILE, and then run NMAKE. The default switches in the Unidata distribution use the switches for the floating point coprocessor and Microsoft Windows options.
The protocol kit contains some complete makefile examples for Microsoft C V6.0 running on DOS. The Microsoft C V6.0 compiler manual should be consulted for the exact meaning of the compiler and linker options.
The VMS and SunOS compilation instructions are in directories for those operating systems.
NetCDF Library Build Order—The NetCDF libraries must be built in a specific order. The correct order to build the NetCDF directories is:
UTIL XDR SRC NCDUMP NCGEN NCTEST
The UTIL and XDR makefiles work as distributed using NMAKE with Microsoft C V6.0.
SCOPE
1.1 This guide covers the implementation of the Mass Spectrometric Data Protocol in analytical software applications. Implementation of this protocol requires:
1.1.1 Specification E2077, which contains the full set of data definitions. The mass spectrometric data protocol is not based upon any specific implementation; it is designed to be independent of any particular implementation so that implementations can change as technology evolves. The protocol is implemented in categories to speed its acceptance through actual use.
1.1.2 Specification E2077 contains a full description of the contents of the data communications protocol, including the analytical information categories with data elements and their attributes for most aspects of mass spectrometric tests.
1.2 The analytical information categories are a practical convenience for breaking down the standardization process into smaller, more manageable pieces. It is easier for developers to build consensus and produce working systems based on smaller information sets, without the burden and complexity of the hundreds of data elements contained in all the categories. The categories also assist vendors and end users in using the guide in their computing environments.
1.3 The network common data format (NetCDF) data interchange system is the container used to communicate data between applications in a way that is independent of both computer architectures and end-user applications. In essence, it is a special type of application designed for data interchange.
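One way to picture architecture independence: the external data representation (XDR) that NetCDF uses (see 3.1) stores a 32-bit integer as four bytes, most significant byte first, regardless of the byte order of the machine that writes or reads it. The following is a minimal sketch of that idea in C. The function names `put_be32` and `get_be32` are hypothetical and are not part of the NetCDF library or the protocol API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not NetCDF or protocol API. */

/* Store a signed 32-bit integer as 4 bytes, most significant byte first. */
void put_be32(uint8_t out[4], int32_t value)
{
    uint32_t u = (uint32_t)value;   /* two's-complement bit pattern */

    assert(out != 0);
    out[0] = (uint8_t)(u >> 24);
    out[1] = (uint8_t)(u >> 16);
    out[2] = (uint8_t)(u >> 8);
    out[3] = (uint8_t)u;
}

/* Rebuild the integer from 4 big-endian bytes on any host byte order. */
int32_t get_be32(const uint8_t in[4])
{
    uint32_t u;

    assert(in != 0);
    u = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
      | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];

    /* Convert back to signed without relying on implementation-defined casts. */
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return -(int32_t)(uint32_t)~u - 1;
}
```

Because both functions work on individual bytes rather than on the in-memory layout of an integer, a file written on one architecture decodes to the same values on another.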
1.4 The common data language (CDL) template for mass spectrometry is a language specification of the mass spectrometry dataset being interchanged. With the use of the NetCDF utilities, this human-readable template can be used to generate an equivalent binary file and the software subroutine calls needed for input and output of data in analytical applications.
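As a rough illustration of what such a human-readable template looks like, the C sketch below formats a very small CDL text with a dimensions section, a variables section, and a data section. The helper name `format_cdl_sketch` and every dataset, dimension, variable, and attribute name in it are assumptions made for the example; the actual names are defined in Specification E2077.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats a tiny CDL template into buf.
 * All names inside the template are illustrative, not taken from E2077.
 * Returns the snprintf result: the number of characters needed,
 * excluding the terminating NUL. */
int format_cdl_sketch(char *buf, size_t size, const char *dataset_name)
{
    assert(buf != NULL && dataset_name != NULL);
    return snprintf(buf, size,
        "netcdf %s {\n"
        "dimensions:\n"
        "    point_number = 3;\n"
        "variables:\n"
        "    float mass_values(point_number);\n"
        "        mass_values:units = \"M/Z\";\n"
        "    :netcdf_revision = \"2.0.1\";  // global attribute\n"
        "data:\n"
        "    mass_values = 18.0, 28.0, 44.0;\n"
        "}\n",
        dataset_name);
}
```

Text of this shape, saved to a file, is the kind of input the ncgen utility turns into a binary NetCDF file; ncdump performs the reverse conversion.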
Standards Content (Sample)
NOTICE: This standard has either been superseded and replaced by a new version or withdrawn.
Contact ASTM International (www.astm.org) for the latest information
Designation: E2078 − 00 (Reapproved 2010)
Standard Guide for Analytical Data Interchange Protocol for Mass Spectrometric Data
This standard is issued under the fixed designation E2078; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
This guide is under the jurisdiction of ASTM Committee E13 on Molecular Spectroscopy and Separation Science and is the direct responsibility of Subcommittee E13.15 on Analytical Data. Current edition approved Nov. 1, 2010. Published November 2010. Originally approved in 2000. Last previous edition approved in 2005 as E2078 – 00 (2005). DOI: 10.1520/E2078-00R10.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
1. Scope
1.1 This guide covers the implementation of the Mass Spectrometric Data Protocol in analytical software applications. Implementation of this protocol requires:
1.1.1 Specification E2077, which contains the full set of data definitions. The mass spectrometric data protocol is not based upon any specific implementation; it is designed to be independent of any particular implementation so that implementations can change as technology evolves. The protocol is implemented in categories to speed its acceptance through actual use.
1.1.2 Specification E2077 contains a full description of the contents of the data communications protocol, including the analytical information categories with data elements and their attributes for most aspects of mass spectrometric tests.
1.2 The analytical information categories are a practical convenience for breaking down the standardization process into smaller, more manageable pieces. It is easier for developers to build consensus and produce working systems based on smaller information sets, without the burden and complexity of the hundreds of data elements contained in all the categories. The categories also assist vendors and end users in using the guide in their computing environments.
1.3 The network common data format (NetCDF) data interchange system is the container used to communicate data between applications in a way that is independent of both computer architectures and end-user applications. In essence, it is a special type of application designed for data interchange.
1.4 The common data language (CDL) template for mass spectrometry is a language specification of the mass spectrometry dataset being interchanged. With the use of the NetCDF utilities, this human-readable template can be used to generate an equivalent binary file and the software subroutine calls needed for input and output of data in analytical applications.
2. Referenced Documents
2.1 ASTM Standards:
E2077 Specification for Analytical Data Interchange Protocol for Mass Spectrometric Data
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.
2.2 Other Standard:
NetCDF User’s Guide
Available from Russell K. Rew, Unidata Program Center, University Corporation for Atmospheric Research, P.O. Box 3000, Boulder, CO 80307-3000.
2.3 ISO Standards:
8601:1988 Data elements and interchange formats (First edition published 1988-06-15; with Technical Corrigendum 1 published 1991-05-01)
Available from ISO, 1 Rue de Varembe, Case Postale 56, CH 1211, Geneve, Switzerland.
3. List of Contents and Use
3.1 NetCDF Toolkit—The protocol is an application programming interface (API) layered on top of the public domain NetCDF toolkit. NetCDF is a set of tools that facilitate reading or writing platform-independent, self-describing data files. All data in a NetCDF file is written using the external data representation (XDR). XDR was developed by Sun Microsystems and is used for platform-independent file systems for all workstations and personal computers. Each NetCDF data element is self-describing - it has a name, type, and dimensionality. A NetCDF file contains three parts: a dimensions section, which defines the names and size of all dimensions used to describe variables; a variables section, which defines the names, data types, dimensionality, and attributes for all variables used in the file; and finally, a data section, which contains the actual values assigned to the variables. Attributes are numbers or strings which augment the description of variables or the file as a whole.
3.1.1 For example, a variable “x_axis_values” might contain an array of numbers representing the abscissa of a two-dimensional data set. It would have a dimension, possibly named “x_axis_size,” which would specify the number of abscissa points. The variable might have some descriptive
attributes, such as “units” (with a value of “Seconds,” perhaps), “scale_factor” (with a value of 1000.0, specifying that all stored abscissa values should be multiplied by 1000.0 to get the actual value), or “long_name” (with value “Time”, which might be used to label the abscissa when drawing a plot).
3.1.2 The NetCDF toolkit has been placed in the public domain by the Unidata Program Center, a non-profit software support organization for the University Corporation for Atmospheric Research. The Unidata Program Center is funded by the National Science Foundation, National Center for Atmospheric Research, and other organizations and provides ongoing development and support of NetCDF and related tools.
3.1.3 The NetCDF version currently supported in this implementation is 2.3.2.
3.2 Data Structures—Each of the analytical information class tables in the specification document has a corresponding data structure; however, not every field in each table has a corresponding data element in a structure, and the data structures may have elements that do not appear in any class table. Most of these differences are due to details of the implementation which could not be hidden.
3.2.1 The data structures provide the mapping between the attribute name and data type described in the specification and the field and actual data type in the file. The actual NetCDF dimension, variable, and attribute names are hidden from the API level. These names in fact are irrelevant for application programs; it is the data structure which provides the information interchange between the application and the file.
3.2.2 Each data structure and its mapping to an analytical information class are described in detail later in this guide.
3.2.3 Application Programming Interface Functions:
3.2.3.1 The application programming interface provides programmatic access to the contents of the files. Mass spectral data occurs in three forms: global information, which relates to the contents of the entire file, information which describes each part of a multi-component instrument, and information which changes on a scan-by-scan basis for spectra and library entries. API functions are provided for opening a file for reading or writing; closing a file; reading and writing global, per-component instrument, and per-scan spectral and library information; initializing and clearing data structure contents; and a few miscellaneous utility functions. Each of these functions is described in detail in a later section of this guide.
3.2.4 Enumerated Sets—Many of the attributes listed in the Analytical Data Interchange Protocol for Mass Spectrometric Data specification have an enumerated set of associated values. The attribute may take only one value from that restricted set. In the implementation, each such attribute is defined as a formal C type, and the allowed values are defined as an enumerated set of that formal type. Each enumerated value is associated with a unique string literal, and it is these string literals, not the enumeration values, which are written to or read from the file. This practice both enforces the use of the proper enumeration values and follows the NetCDF dictum that files be self-describing. If the enumeration values were written instead of the strings, then some lookup mechanism would be required external to the NetCDF file to translate the number into something meaningful.
4. Conventions
4.1 The format convention adopted in this guide is as follows:
(1) Normal text is presented in this font (Times New Roman).
(2) API symbols (functions, formal types, etc.) are presented in boldface Helvetica font.
(3) Parameters to API functions are presented in italic Helvetica font.
(4) Example code is presented in normal Helvetica font.
4.2 Other Conventions—All indices begin at zero (C convention). In several data structures, a scan_no or inst_no element must be loaded before reading or writing. This identifies the scan or instrument component number for which data will be read or written. In all cases, scan or instrument component numbers begin at zero.
4.2.1 All date/time stamps are formatted using the ISO standard 8601 format referenced in the specification. An API utility function is provided for conversion between date/time information in numeric form and ISO-8601 string format (see ms_convert_date(), below).
5. Mass Spectrometric Data Protocol Distribution Kit
5.1 It is intended that potential users of this implementation can obtain a complete NetCDF and API distribution kit from various instrument vendors’ Web sites. Information on how to obtain the kit will be posted on the ASTM website (www.astm.org) under Committee E01.25.
5.2 The Analytical Data Interchange Protocol for Mass Spectrometric Data distribution kit contains:
5.2.1 Software—NetCDF distribution kit from Unidata (with the modified makefile needed to make the kit compile out of the box).
5.2.2 NetCDF User’s Guide—supplied by Unidata Program Center.
5.2.3 Specification E2077.
5.2.4 Guide E2078.
6. Hardware and Software
6.1 This section describes the hardware and software configurations used for testing. In general, the NetCDF system puts very few requirements on the hardware because most routines are left on disk. Only routines being used at any particular time are kept in memory. Any limitations found were typically those not imposed by NetCDF but ones imposed by the operating system or environment.
6.1.1 Hardware (Personal Computers)—The personal computer system hardware originally used for testing was:
6.1.1.1 Intel 80286 processor,
6.1.1.2 640K minimum,
6.1.1.3 Monochrome, EGA, VGA graphics,
6.1.1.4 20 megabyte minimum, 80 megabyte hard-disk is typical, and
6.1.1.5 A mouse (optional).
6.1.1.6 NetCDF works well on AT-class machines and higher. NetCDF does not have the items in 6.1.1.1 – 6.1.1.5 as requirements. These are just the minimum, base-level systems that were used.
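The enumerated-set convention described in 3.2.4 can be sketched in C as follows. This is only an illustration under stated assumptions: the type `polarity_t`, its values, the string literals, and both function names are hypothetical and are not taken from Specification E2077 or from the protocol API. The sketch shows an enumerated formal type whose values are translated to string literals before being written to a file, and translated back when read.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enumerated attribute; the real sets are defined in E2077. */
typedef enum {
    POLARITY_POSITIVE = 0,
    POLARITY_NEGATIVE,
    POLARITY_COUNT          /* number of valid values, not a value itself */
} polarity_t;

/* One unique string literal per enumeration value (illustrative text). */
static const char *const polarity_strings[POLARITY_COUNT] = {
    "Positive Polarity",
    "Negative Polarity"
};

/* Enumeration value -> string that would be written to the file. */
const char *polarity_to_string(polarity_t p)
{
    assert((int)p >= 0 && (int)p < POLARITY_COUNT);
    return polarity_strings[p];
}

/* String read from the file -> enumeration value.
 * Returns 0 on success, -1 if the string is not in the set. */
int polarity_from_string(const char *s, polarity_t *out)
{
    int i;

    assert(s != NULL && out != NULL);
    for (i = 0; i < POLARITY_COUNT; i++) {
        if (strcmp(s, polarity_strings[i]) == 0) {
            *out = (polarity_t)i;
            return 0;
        }
    }
    return -1;
}
```

Writing the strings rather than the integers keeps the file readable without an external lookup table, and rejecting unknown strings on input enforces the restricted set.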
6.1.2 Software—NetCDF runs on MS-DOS, OS/2, Macintosh, Windows 95, and Windows NT operating systems for personal computers. NetCDF was originally ported from UNIX to DOS running on an IBM-PS/2 Model 80. It was recently ported to the Macintosh OS. NetCDF is written in the C programming language, and there are FORTRAN jackets available for applications that want to use FORTRAN calls. The personal computer software originally employed for testing and developing NetCDF applications was:
6.1.2.1 Microsoft DOS V3.3 or above,
6.1.2.2 Microsoft C Compiler V6.0,
6.1.2.3 Microsoft Windows V3.0,
6.1.2.4 Microsoft Windows SDK, and
6.1.2.5 NetCDF Version 2.0.1.
6.1.3 Workstations and Servers—NetCDF runs easily on UNIX workstations such as Sun 3, Sun 4, VAXstations, DECstation 3100, VAXstation II running ULTRIX or VMS, and IBM RS/6000. There are no particular hardware requirements for workstation class machines, since all workstations have the minimum hardware outlined for personal computers in 6.1.1.
7. Significance and Use
7.1 General Coding Guidelines—The NetCDF libraries are supplied to developers as source code. End users receive the libraries in compiled binary form as part of a vendor’s application.
7.1.1 Developers setting out to write a program to convert their data files to the Mass Spectrometric Data Protocol should consider using the NetCDF utilities ncgen and ncdump. After developers create the NetCDF file they should use the ncdump program to generate the ASCII representation of the data file, and examine it to ensure the data are being correctly put into the file.
7.2 Make Files for NetCDF Libraries and Utilities—In general the compilation is straightforward. The make files were modified after they were received from the Unidata Corporation, because they did not compile the first time on PCs. The changes needed to get the Unidata distribution to run on DOS are (1) rename the file MAKEFILE to UNIX.MK, and ...
7.3.1 The UTIL and XDR makefiles work as distributed using NMAKE with Microsoft C V6.0.
8. CDL Template Structure
8.1 A NetCDF template is built from CDL statements and is structured into three sections: (1) dimension declarations, (2) variable declarations, and (3) the data section.
8.2 A few points of clarification about the CDL language are given here to facilitate its understanding. For more in-depth information on CDL, please consult the NetCDF User’s Guide.
8.2.1 A NetCDF template starts with the word “NetCDF” followed by the dataset name.
8.2.2 CDL comments are indicated by two forward slash characters (//).
8.2.3 Section indicators (dimensions:, variables:, and data:) end with a colon character (:). These are the only tokens that end with a colon character.
8.2.4 Statements within sections end with the semicolon character (;).
8.2.5 Variable names beginning with numbers must be preceded by an underline character (_). Otherwise the ncgen parser will flag an error.
8.2.5.1 Underline characters were chosen for this protocol over hyphen characters, because some compilers may interpret hyphens as subtraction operators. The feature of CDL that allows implicit numerical datatyping of attributes is not being used in the first version of the template. Instead, all floating point attributes are being handled as strings. This forces programmers to explicitly type variables, thereby encouraging more deliberate programming styles. For example:
:aia_template_revision = "0.8"; //M12345
:netcdf_revision = "2.0.1"; //M12345
Consult the NetCDF User’s Guide for more complete information on CDL syntax and usage.
8.2.6 Underline characters only can be used as separators between words within variable names, like:
aia-template-revision, or aia_template_revision.
8.2.7 Numerical data types for single values can be declared implicitly by putting numbers on the right side of an assignment statement, like:
...
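The naming rules in 8.2.5 and 8.2.6 (a leading underline for names that begin with a number, and underlines rather than hyphens between words) can be sketched as a small C routine. The function name `cdl_safe_name` and its return convention are hypothetical; the routine is an illustration of the two rules as stated, not part of ncgen or the protocol API.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies name into out, replacing hyphens with
 * underlines and adding a leading underline when the name starts with
 * a digit. Returns 0 on success, -1 if out is too small. */
int cdl_safe_name(const char *name, char *out, size_t out_size)
{
    size_t len, i, pos = 0;
    int needs_prefix;

    assert(name != NULL && out != NULL);
    len = strlen(name);
    needs_prefix = isdigit((unsigned char)name[0]) != 0;

    /* Room for optional prefix, the name itself, and the terminating NUL. */
    if (len + (needs_prefix ? 1u : 0u) + 1u > out_size)
        return -1;

    if (needs_prefix)
        out[pos++] = '_';
    for (i = 0; i < len; i++)
        out[pos++] = (name[i] == '-') ? '_' : name[i];
    out[pos] = '\0';
    return 0;
}
```

For instance, passing "aia-template-revision" yields "aia_template_revision", the form shown in 8.2.6.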