ASTM E2078-00(2005)
Standard Guide for Analytical Data Interchange Protocol for Mass Spectrometric Data
SIGNIFICANCE AND USE
7.1 General Coding Guidelines—The NetCDF libraries are supplied to developers as source code. End users receive the libraries in compiled binary form as part of a vendor’s application.
7.1.1 Developers setting out to write a program to convert their data files to the Mass Spectrometric Data Protocol should consider using the NetCDF utilities ncgen and ncdump. After developers create the NetCDF file they should use the ncdump program to generate the ASCII representation of the data file, and examine it to ensure the data are being correctly put into the file.
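The check described in 7.1.1 can be scripted. The following is a minimal sketch, not part of the protocol kit: it assumes the ncgen and ncdump utilities are on the PATH and accept the command-line forms shown (ncgen -o output input, ncdump file), which should be confirmed against the NetCDF version in use. The function names and file names are hypothetical.

```python
import shutil
import subprocess


def roundtrip_commands(cdl_path, nc_path):
    """Return the two command lines for a CDL -> NetCDF -> ASCII round trip."""
    return [
        ["ncgen", "-o", str(nc_path), str(cdl_path)],  # CDL text -> binary NetCDF file
        ["ncdump", str(nc_path)],                      # binary NetCDF file -> ASCII text
    ]


def roundtrip(cdl_path, nc_path):
    """Run ncgen, then ncdump, and return the dumped text for inspection.

    Raises FileNotFoundError if either utility is not on the PATH
    (assumption: both tools are installed under these names).
    """
    for tool in ("ncgen", "ncdump"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    gen_cmd, dump_cmd = roundtrip_commands(cdl_path, nc_path)
    subprocess.run(gen_cmd, check=True)
    result = subprocess.run(dump_cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

The dumped text can then be compared by eye, or with a text diff, against the values the converter was expected to write.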
7.2 Make Files for NetCDF Libraries and Utilities—In general, the compilation is straightforward. The makefiles were modified after they were received from the Unidata Program Center, because they did not compile the first time on PCs. The changes needed to get the Unidata distribution to build on DOS are (1) rename the file MAKEFILE to UNIX.MK, and (2) rename MSOFT.MK to MAKEFILE, and then run NMAKE. By default, the makefiles in the Unidata distribution enable the compiler switches for the floating-point coprocessor and the Microsoft Windows options.
7.2.1 The protocol kit contains some complete makefile examples for Microsoft C V6.0 running on DOS. The Microsoft C V6.0 compiler manual should be consulted for the exact meaning of the compiler and linker options.
7.2.2 The VMS and SunOS compilation instructions are in directories for those operating systems.
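The two renames in 7.2 are easy to apply twice by accident. The sketch below performs them only when the directory is in the expected starting state. It is illustrative only: the function name is hypothetical, and whether the renames are needed in every source directory or only at the top level of the distribution is an assumption to verify against the kit.

```python
from pathlib import Path


def select_dos_makefile(source_dir):
    """Apply the two renames from 7.2 inside one directory.

    Returns True if the renames were made, and False (changing nothing)
    if the directory is not in the expected starting state.
    """
    source_dir = Path(source_dir)
    unix_default = source_dir / "MAKEFILE"
    unix_saved = source_dir / "UNIX.MK"
    dos_makefile = source_dir / "MSOFT.MK"

    # Proceed only from the unmodified layout so the step is never applied twice.
    if not (unix_default.is_file() and dos_makefile.is_file()) or unix_saved.exists():
        return False

    unix_default.rename(unix_saved)    # (1) MAKEFILE -> UNIX.MK
    dos_makefile.rename(unix_default)  # (2) MSOFT.MK -> MAKEFILE
    return True
```

After a successful call, running NMAKE in that directory picks up the Microsoft makefile under the default name.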
7.3 NetCDF Library Build Order—The NetCDF libraries must be built in a specific order. The correct order to build the NetCDF directories is:
UTIL
XDR
SRC
NCDUMP
NCGEN
NCTEST
7.3.1 The UTIL and XDR makefiles work as distributed using NMAKE with Microsoft C V6.0.
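The build order can be captured in a small driver so that a failure in an early directory stops the build before later directories are attempted. This is a sketch under stated assumptions: the directory names are those listed in 7.3, the make tool is invoked with no arguments, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Order taken from 7.3; later directories depend on earlier ones.
BUILD_ORDER = ("UTIL", "XDR", "SRC", "NCDUMP", "NCGEN", "NCTEST")


def build_plan(root, make_tool="NMAKE"):
    """Return (directory, command) pairs in the required order (dry run)."""
    root = Path(root)
    return [(root / name, [make_tool]) for name in BUILD_ORDER]


def run_build(root, make_tool="NMAKE"):
    """Run the make tool in each directory, stopping at the first failure."""
    for directory, command in build_plan(root, make_tool):
        # check=True raises on a non-zero exit status, so a later directory
        # is never built against a library that failed to build.
        subprocess.run(command, cwd=directory, check=True)
```

Calling build_plan first gives a list that can be printed and reviewed before run_build is used.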
SCOPE
1.1 This guide covers the implementation of the Mass Spectrometric Data Protocol in analytical software applications. Implementation of this protocol requires:
1.1.1 Specification E 2077, which contains the full set of data definitions. The mass spectrometric data protocol is not based upon any specific implementation; it is designed to be independent of any particular implementation so that implementations can change as technology evolves. The protocol is implemented in categories to speed its acceptance through actual use.
1.1.2 Specification E 2077 contains a full description of the contents of the data communications protocol, including the analytical information categories with data elements and their attributes for most aspects of mass spectrometric tests.
1.2 The analytical information categories are a practical convenience for breaking down the standardization process into smaller, more manageable pieces. It is easier for developers to build consensus and produce working systems based on smaller information sets, without the burden and complexity of the hundreds of data elements contained in all the categories. The categories also assist vendors and end users in using the guide in their computing environments.
1.3 The network common data format (NetCDF) data interchange system is the container used to communicate data between applications in a way that is independent of both computer architectures and end-user applications. In essence, it is a special type of application designed for data interchange.
1.4 The common data language (CDL) template for mass spectrometry is a language specification of the mass spectrometry dataset being interchanged. With the use of the NetCDF utilities, this human-readable template can be used to generate an equivalent binary file and the software subroutine calls needed for input and output of data in analytical applications.
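To make the idea of a human-readable template concrete, the sketch below renders a small CDL description with the three sections (dimensions, variables, data) that NetCDF files contain. It reuses the x_axis_values illustration from 3.1.1 of this guide. It is not the mass spectrometry template defined by Specification E 2077; the function names, the dataset name "example", and the data values are hypothetical.

```python
def _cdl_value(value):
    """Format an attribute value: strings are quoted, numbers are left bare."""
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return repr(value)


def render_cdl(name, dimensions, variables, data):
    """Render a minimal CDL description as text.

    dimensions: {dim_name: size}
    variables:  {var_name: (cdl_type, [dim_names], {attr_name: attr_value})}
    data:       {var_name: [values]}
    """
    lines = [f"netcdf {name} {{", "dimensions:"]
    for dim, size in dimensions.items():
        lines.append(f"    {dim} = {size} ;")
    lines.append("variables:")
    for var, (cdl_type, dims, attrs) in variables.items():
        shape = f"({', '.join(dims)})" if dims else ""
        lines.append(f"    {cdl_type} {var}{shape} ;")
        for attr, value in attrs.items():
            lines.append(f"        {var}:{attr} = {_cdl_value(value)} ;")
    lines.append("data:")
    for var, values in data.items():
        lines.append(f"    {var} = {', '.join(repr(v) for v in values)} ;")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical dataset built from the x_axis_values illustration.
EXAMPLE_CDL = render_cdl(
    "example",
    {"x_axis_size": 3},
    {"x_axis_values": ("float", ["x_axis_size"],
                       {"units": "Seconds", "scale_factor": 1000.0, "long_name": "Time"})},
    {"x_axis_values": [0.001, 0.002, 0.003]},
)
```

Text of this form is what ncgen takes as input to produce the equivalent binary file, and what ncdump produces from a binary file.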
Standards Content (Sample)
NOTICE: This standard has either been superseded and replaced by a new version or withdrawn.
Contact ASTM International (www.astm.org) for the latest information
Designation: E2078 – 00 (Reapproved 2005)
Standard Guide for Analytical Data Interchange Protocol for Mass Spectrometric Data
This standard is issued under the fixed designation E2078; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
This guide is under the jurisdiction of ASTM Committee E13 on Molecular Spectroscopy and Chromatography and is the direct responsibility of Subcommittee E13.15 on Analytical Data. Current edition approved Sept. 1, 2005. Published November 2005. Originally approved in 2000. Last previous edition approved in 2000 as E2078 – 00. DOI: 10.1520/E2078-00R05.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, United States.
2. Referenced Documents
2.1 ASTM Standards:
E2077 Specification for Analytical Data Interchange Protocol for Mass Spectrometric Data
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.
2.2 Other Standard:
NetCDF User’s Guide (available from Russell K. Rew, Unidata Program Center, University Corporation for Atmospheric Research, P.O. Box 3000, Boulder, CO 80307-3000)
2.3 ISO Standards:
8601:1988 Data elements and interchange formats (First edition published 1988-06-15; with Technical Corrigendum 1 published 1991-05-01) (available from ISO, 1 Rue de Varembe, Case Postale 56, CH 1211, Geneve, Switzerland)
3. List of Contents and Use
3.1 NetCDF Toolkit—The protocol is an application programming interface (API) layered on top of the public domain NetCDF toolkit. NetCDF is a set of tools that facilitate reading or writing platform-independent, self-describing data files. All data in a NetCDF file is written using the external data representation (XDR). XDR was developed by Sun Microsystems and is used for platform-independent file systems for all workstations and personal computers. Each NetCDF data element is self-describing: it has a name, type, and dimensionality. A NetCDF file contains three parts: a dimensions section, which defines the names and size of all dimensions used to describe variables; a variables section, which defines the names, data types, dimensionality, and attributes for all variables used in the file; and finally, a data section, which contains the actual values assigned to the variables. Attributes are numbers or strings which augment the description of variables or the file as a whole.
3.1.1 For example, a variable “x_axis_values” might contain an array of numbers representing the abscissa of a two-dimensional data set. It would have a dimension, possibly named “x_axis_size,” which would specify the number of abscissa points. The variable might have some descriptive attributes, such as “units” (with a value of “Seconds,” perhaps), “scale_factor” (with a value of 1000.0, specifying that all stored abscissa values should be multiplied by 1000.0 to get the actual value), or “long_name” (with value “Time”, which might be used to label the abscissa when drawing a plot).
3.1.2 The NetCDF toolkit has been placed in the public domain by the Unidata Program Center, a non-profit software support organization for the University Corporation for Atmospheric Research. The Unidata Program Center is funded by the National Science Foundation, National Center for Atmospheric Research, and other organizations and provides ongoing development and support of NetCDF and related tools.
3.1.3 The NetCDF version currently supported in this implementation is 2.3.2.
3.2 Data Structures—Each of the analytical information class tables in the specification document has a corresponding data structure; however, not every field in each table has a corresponding data element in a structure, and the data structures may have elements that do not appear in any class table. Most of these differences are due to details of the implementation which could not be hidden.
3.2.1 The data structures provide the mapping between the attribute name and data type described in the specification and the field and actual data type in the file. The actual NetCDF dimension, variable, and attribute names are hidden from the API level. These names in fact are irrelevant for application programs; it is the data structure which provides the information interchange between the application and the file.
3.2.2 Each data structure and its mapping to an analytical information class are described in detail later in this guide.
3.2.3 Application Programming Interface Functions:
3.2.3.1 The application programming interface provides programmatic access to the contents of the files. Mass spectral data occurs in three forms: global information, which relates to the contents of the entire file, information which describes each part of a multi-component instrument, and information which changes on a scan-by-scan basis for spectra and library entries. API functions are provided for opening a file for reading or writing; closing a file; reading and writing global, per-component instrument, and per-scan spectral and library information; initializing and clearing data structure contents; and a few miscellaneous utility functions. Each of these functions is described in detail in a later section of this guide.
3.2.4 Enumerated Sets—Many of the attributes listed in the Analytical Data Interchange Protocol for Mass Spectrometric Data specification have an enumerated set of associated values. The attribute may take only one value from that restricted set. In the implementation, each such attribute is defined as a formal C type, and the allowed values are defined as an enumerated set of that formal type. Each enumerated value is associated with a unique string literal, and it is these string literals, not the enumeration values, which are written to or read from the file. This practice both enforces the use of the proper enumeration values and follows the NetCDF dictum that files be self-describing. If the enumeration values were written instead of the strings, then some lookup mechanism would be required external to the NetCDF file to translate the number into something meaningful.
4. Conventions
4.1 The format convention adopted in this guide is as follows:
(1) Normal text is presented in this font (Times New Roman).
(2) API symbols (functions, formal types, etc.) are presented in boldface Helvetica font.
(3) Parameters to API functions are presented in italic Helvetica font.
(4) Example code is presented in normal Helvetica font.
4.2 Other Conventions—All indices begin at zero (C convention). In several data structures, a scan_no or inst_no element must be loaded before reading or writing. This identifies the scan or instrument component number for which data will be read or written. In all cases, scan or instrument component numbers begin at zero.
4.2.1 All date/time stamps are formatted using the ISO standard 8601 format referenced in the specification. An API utility function is provided for conversion between date/time information in numeric form and ISO-8601 string format (see ms_convert_date(), below).
5. Mass Spectrometric Data Protocol Distribution Kit
5.1 It is intended that potential users of this implementation can obtain a complete NetCDF and API distribution kit from various instrument vendors’ Web sites. Information on how to obtain the kit will be posted on the ASTM website (www.astm.org) under Committee E01.25.
5.2 The Analytical Data Interchange Protocol for Mass Spectrometric Data distribution kit contains:
5.2.1 Software—NetCDF distribution kit from Unidata (with the modified makefile needed to make the kit compile out of the box).
5.2.2 NetCDF User’s Guide—supplied by Unidata Program Center.
5.2.3 Specification E2077.
5.2.4 Guide E2078.
6. Hardware and Software
6.1 This section describes the hardware and software configurations used for testing. In general, the NetCDF system puts very few requirements on the hardware because most routines are left on disk. Only routines being used at any particular time are kept in memory. Any limitations found were typically those not imposed by NetCDF but ones imposed by the operating system or environment.
6.1.1 Hardware (Personal Computers)—The personal computer system hardware originally used for testing was:
6.1.1.1 Intel 80286 processor,
6.1.1.2 640K minimum,
6.1.1.3 Monochrome, EGA, VGA graphics,
6.1.1.4 20 megabyte minimum, 80 megabyte hard-disk is typical, and
6.1.1.5 A mouse (optional).
6.1.1.6 NetCDF works well on AT-class machines and higher. NetCDF does not have the items in 6.1.1.1-6.1.1.5 as requirements. These are just the minimum, base-level systems that were used.
6.1.2 Software—NetCDF runs on MS-DOS, OS/2, Macintosh, Windows 95, and Windows NT operating systems for personal computers. NetCDF was originally ported from UNIX to DOS running on an IBM-PS/2 Model 80. It was recently ported to the Macintosh OS. NetCDF is written in the C programming language, and there are FORTRAN jackets available for applications that want to use FORTRAN calls. The personal computer software originally employed for testing and developing NetCDF applications was:
6.1.2.1 Microsoft DOS V3.3 or above,
6.1.2.2 Microsoft C Compiler V6.0,
6.1.2.3 Microsoft Windows V3.0,
6.1.2.4 Microsoft Windows SDK, and
6.1.2.5 NetCDF Version 2.0.1.
6.1.3 Workstations and Servers—NetCDF runs easily on UNIX workstations such as Sun 3, Sun 4, VAXstations, DECstation 3100, VAXstation II running ULTRIX or VMS, and IBM RS/6000. There are no particular hardware requirements for workstation class machines, since all workstations have the minimum hardware outlined for personal computers in 6.1.1.
8. CDL Template Structure
8.1 A NetCDF template is built from CDL statements and is structured into three sections: (1) dimension declarations, (2) variable declarations, and (3) the data section.
8.2 A few points of clarification about the CDL language are given here to facilitate its understanding. For more in-depth information on CDL, please consult the NetCDF User’s Guide.
8.2.1 A NetCDF template starts with the word “NetCDF” followed by the dataset name.
8.2.2 CDL comments are indicated by two forward slash characters (//).
8.2.3 Section indicators (dimensions:, variables:, and data:) end with a colon character (:). These are the only tokens that end with a colon character.
8.2.4 Statements within sections end with the semicolon character (;).
8.2.5 Variable names beginning with numbers must be preceded by an underline character (_). Otherwise the ncgen parser will flag an error.
8.2.5.1 Underline characters were chosen for this protocol over hyphen characters, because some compilers may interpret hyphens as subtraction operators. The feature of CDL that allows implicit numerical datatyping of attributes is not being used in the first version of the template. Instead, all floating point attributes are being handled as strings. This forces programmers to explicitly type variables, thereby encouraging more deliberate programming styles. For example:
:aia_template_revision = "0.8"; //M12345
:netcdf_revision = "2.0.1"; //M12345
Consult the NetCDF User’s Guide for more complete information on CDL syntax and usage.
8.2.6 Only underline characters can be used as separators between words within variable names: aia_template_revision, not aia-template-revision.
8.2.7 Numerical data types for single values can be declared implicitly by putting numbers on the right side of an assignment
...
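The naming and attribute rules in 8.2.2 through 8.2.6 lend themselves to a small helper that prepares names and string-valued global attributes before they are written into a template. This is a sketch only, assuming the rules exactly as stated in this guide; the function names and the sample name 2nd_scan are hypothetical and are not part of the protocol.

```python
import re


def cdl_safe_name(name):
    """Rewrite a name so it follows the rules in 8.2.5 and 8.2.6."""
    safe = name.replace("-", "_")  # hyphens could be read as subtraction operators
    if re.match(r"[0-9]", safe):
        safe = "_" + safe          # names beginning with a number need a leading underline
    return safe


def global_string_attribute(name, value, comment=None):
    """Render a global attribute whose value is stored as a quoted string.

    Numeric values are converted to text first, matching the practice of
    handling floating point attributes as strings.
    """
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    line = f':{cdl_safe_name(name)} = "{text}" ;'  # statements end with a semicolon
    if comment:
        line += f" // {comment}"                   # CDL comments start with //
    return line
```

For instance, passing the name aia-template-revision with the value 0.8 yields an attribute line whose name uses underline separators and whose value is quoted.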