ASTM E2078-00
Standard Guide for Analytical Data Interchange Protocol for Mass Spectrometric Data
SCOPE
1.1 This guide covers the implementation of the Mass Spectrometric Data Protocol in analytical software applications. Implementation of this protocol requires:
1.1.1 Specification E 2077, which contains the full set of data definitions. The mass spectrometric data protocol is not based upon any specific implementation; it is designed to be independent of any particular implementation so that implementations can change as technology evolves. The protocol is implemented in categories to speed its acceptance through actual use.
1.1.2 Specification E 2077 contains a full description of the contents of the data communications protocol, including the analytical information categories with data elements and their attributes for most aspects of mass spectrometric tests.
1.2 The analytical information categories are a practical convenience for breaking down the standardization process into smaller, more manageable pieces. It is easier for developers to build consensus and produce working systems based on smaller information sets, without the burden and complexity of the hundreds of data elements contained in all the categories. The categories also assist vendors and end users in using the guide in their computing environments.
1.3 The network common data format (NetCDF) data interchange system is the container used to communicate data between applications in a way that is independent of both computer architectures and end-user applications. In essence, it is a special type of application designed for data interchange.
1.4 The common data language (CDL) template for mass spectrometry is a language specification of the mass spectrometry dataset being interchanged. With the use of the NetCDF utilities, this human-readable template can be used to generate an equivalent binary file and the software subroutine calls needed for input and output of data in analytical applications.
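As a hedged illustration of the architecture independence described in 1.3: this guide notes in 3.1 that NetCDF data is written using XDR, and XDR stores 32-bit integers most-significant byte first. The sketch below shows that idea only. The helper names `put_be32` and `get_be32` are hypothetical and are not part of NetCDF or of the protocol API.

```c
#include <stdint.h>

/* Hypothetical helper (not a NetCDF call): store a 32-bit unsigned value
 * most-significant byte first, the byte order XDR uses, regardless of the
 * host CPU's native byte order. */
void put_be32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Hypothetical helper: rebuild the value from four big-endian bytes. */
uint32_t get_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

Because both functions use shifts rather than copying memory, a buffer written by `put_be32` on one machine can be read back by `get_be32` on another with a different native byte order. In practice the NetCDF library performs this kind of conversion internally, so application code does not normally need it.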
Standards Content (Sample)
NOTICE: This standard has either been superseded and replaced by a new version or withdrawn.
Contact ASTM International (www.astm.org) for the latest information
Designation: E 2078 – 00
Standard Guide for Analytical Data Interchange Protocol for Mass Spectrometric Data
This standard is issued under the fixed designation E 2078; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (e) indicates an editorial change since the last revision or reapproval.
This guide is under the jurisdiction of ASTM Committee E13 on Molecular Spectroscopy and Chromatography and is the direct responsibility of Subcommittee E13.15 on Analytical Data. Current edition approved March 10, 2000. Published July 2000.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, United States.
2. Referenced Documents
2.1 ASTM Standards (Annual Book of ASTM Standards, Vol 03.06):
E 2077 Specification for Analytical Data Interchange Protocol for Mass Spectrometric Data
2.2 Other Standard:
NetCDF User’s Guide (available from Russell K. Rew, Unidata Program Center, University Corporation for Atmospheric Research, P.O. Box 3000, Boulder, CO 80307-3000)
2.3 ISO Standards:
8601:1988 Data elements and interchange formats (first edition published 1988-06-15; with Technical Corrigendum 1 published 1991-05-01) (available from ISO, 1 Rue de Varembe, Case Postale 56, CH 1211, Geneve, Switzerland)
3. List of Contents and Use
3.1 NetCDF Toolkit: The protocol is an application programming interface (API) layered on top of the public domain NetCDF toolkit. NetCDF is a set of tools that facilitate reading or writing platform-independent, self-describing data files. All data in a NetCDF file is written using the external data representation (XDR). XDR was developed by Sun Microsystems and is used for platform-independent file systems for all workstations and personal computers. Each NetCDF data element is self-describing: it has a name, type, and dimensionality. A NetCDF file contains three parts: a dimensions section, which defines the names and sizes of all dimensions used to describe variables; a variables section, which defines the names, data types, dimensionality, and attributes for all variables used in the file; and finally, a data section, which contains the actual values assigned to the variables. Attributes are numbers or strings which augment the description of variables or the file as a whole.
3.1.1 For example, a variable “x_axis_values” might contain an array of numbers representing the abscissa of a two-dimensional data set. It would have a dimension, possibly named “x_axis_size,” which would specify the number of abscissa points. The variable might have some descriptive attributes, such as “units” (with a value of “Seconds,” perhaps), “scale_factor” (with a value of 1000.0, specifying that all stored abscissa values should be multiplied by 1000.0 to get the actual value), or “long_name” (with value “Time”, which might be used to label the abscissa when drawing a plot).
3.1.2 The NetCDF toolkit has been placed in the public domain by the Unidata Program Center, a non-profit software support organization for the University Corporation for Atmospheric Research. The Unidata Program Center is funded by the National Science Foundation, National Center for Atmospheric Research, and other organizations and provides ongoing development and support of NetCDF and related tools.
3.1.3 The NetCDF version currently supported in this implementation is 2.3.2.
3.2 Data Structures: Each of the analytical information class tables in the specification document has a corresponding data structure; however, not every field in each table has a corresponding data element in a structure, and the data structures may have elements that do not appear in any class table. Most of these differences are due to details of the implementation which could not be hidden.
3.2.1 The data structures provide the mapping between the attribute name and data type described in the specification and the field and actual data type in the file. The actual NetCDF dimension, variable, and attribute names are hidden from the API level. These names in fact are irrelevant for application programs; it is the data structure which provides the information interchange between the application and the file.
3.2.2 Each data structure and its mapping to an analytical information class are described in detail later in this guide.
3.2.3 Application Programming Interface Functions:
3.2.3.1 The application programming interface provides programmatic access to the contents of the files. Mass spectral data occurs in three forms: global information, which relates to the contents of the entire file, information which describes each part of a multi-component instrument, and information which changes on a scan-by-scan basis for spectra and library entries. API functions are provided for opening a file for reading or writing; closing a file; reading and writing global, per-component instrument, and per-scan spectral and library information; initializing and clearing data structure contents; and a few miscellaneous utility functions. Each of these functions is described in detail in a later section of this guide.
3.2.4 Enumerated Sets: Many of the attributes listed in the Analytical Data Interchange Protocol for Mass Spectrometric Data specification have an enumerated set of associated values. The attribute may take only one value from that restricted set. In the implementation, each such attribute is defined as a formal C type, and the allowed values are defined as an enumerated set of that formal type. Each enumerated value is associated with a unique string literal, and it is these string literals, not the enumeration values, which are written to or read from the file. This practice both enforces the use of the proper enumeration values and follows the NetCDF dictum that files be self-describing. If the enumeration values were written instead of the strings, then some lookup mechanism would be required external to the NetCDF file to translate the number into something meaningful.
4. Conventions
4.1 The format convention adopted in this guide is as follows:
(1) Normal text is presented in this font (Times New Roman).
(2) API symbols (functions, formal types, etc.) are presented in boldface Helvetica font.
(3) Parameters to API functions are presented in italic Helvetica font.
(4) Example code is presented in normal Helvetica font.
4.2 Other Conventions: All indices begin at zero (C convention). In several data structures, a scan_no or inst_no element must be loaded before reading or writing. This identifies the scan or instrument component number for which data will be read or written. In all cases, scan or instrument component numbers begin at zero.
4.2.1 All date/time stamps are formatted using the ISO standard 8601 format referenced in the specification. An API utility function is provided for conversion between date/time information in numeric form and ISO-8601 string format (see ms_convert_date(), below).
5. Mass Spectrometric Data Protocol Distribution Kit
5.1 It is intended that potential users of this implementation can obtain a complete NetCDF and API distribution kit from various instrument vendors’ Web sites. Information on how to obtain the kit will be posted on the ASTM website (www.astm.org) under Committee E01.25.
5.2 The Analytical Data Interchange Protocol for Mass Spectrometric Data distribution kit contains:
5.2.1 Software: NetCDF distribution kit from Unidata (with the modified makefile needed to make the kit compile out of the box).
5.2.2 NetCDF User’s Guide: supplied by Unidata Program Center.
5.2.3 Specification E 2077.
5.2.4 Guide E 2078.
6. Hardware and Software
6.1 This section describes the hardware and software configurations used for testing. In general, the NetCDF system puts very few requirements on the hardware because most routines are left on disk. Only routines being used at any particular time are kept in memory. Any limitations found were typically those not imposed by NetCDF but ones imposed by the operating system or environment.
6.1.1 Hardware (Personal Computers): The personal computer system hardware originally used for testing was:
6.1.1.1 Intel 80286 processor,
6.1.1.2 640K minimum,
6.1.1.3 Monochrome, EGA, VGA graphics,
6.1.1.4 20 megabyte minimum, 80 megabyte hard-disk is typical, and
6.1.1.5 A mouse (optional).
6.1.1.6 NetCDF works well on AT-class machines and higher. NetCDF does not have the items in 6.1.1.1-6.1.1.5 as requirements. These are just the minimum, base-level systems that were used.
6.1.2 Software: NetCDF runs on MS-DOS, OS/2, Macintosh, Windows 95, and Windows NT operating systems for personal computers. NetCDF was originally ported from UNIX to DOS running on an IBM-PS/2 Model 80. It was recently ported to the Macintosh OS. NetCDF is written in the C programming language, and there are FORTRAN jackets available for applications that want to use FORTRAN calls.
The personal computer software originally employed for testing and developing NetCDF applications was:
6.1.2.1 Microsoft DOS V3.3 or above,
6.1.2.2 Microsoft C Compiler V6.0,
6.1.2.3 Microsoft Windows V3.0,
6.1.2.4 Microsoft Windows SDK, and
6.1.2.5 NetCDF Version 2.0.1.
6.1.3 Workstations and Servers: NetCDF runs easily on UNIX workstations such as Sun 3, Sun 4, VAXstations, DECstation 3100, VAXstation II running ULTRIX or VMS, and IBM RS/6000. There are no particular hardware requirements for workstation class machines, since all workstations have the minimum hardware outlined for personal computers in 6.1.1.
7. Significance and Use
7.1 General Coding Guidelines: The NetCDF libraries are supplied to developers as source code. End users receive the libraries in compiled binary form as part of a vendor’s application.
7.1.1 Developers setting out to write a program to convert their data files to the Mass Spectrometric Data Protocol should consider using the NetCDF utilities ncgen and ncdump. After developers create the NetCDF file they should use the ncdump program to generate the ASCII representation of the data file, and examine it to ensure the data are being correctly put into the file.
7.2 Make Files for NetCDF Libraries and Utilities: In general the compilation is straightforward. The makefiles were modified after they were received from the Unidata Corporation, because they did not compile the first time on PCs. The changes needed to get the Unidata distribution to run on DOS are (1) rename the file MAKEFILE to UNIX.MK, and (2) rename MSOFT.MK to MAKEFILE, and then run NMAKE. The default switches in the Unidata distribution use the switches for the floating point coprocessor and Microsoft Windows options.
7.2.1 The protocol kit contains some complete makefile examples for Microsoft C V6.0 running on DOS. The Microsoft C V6.0 compiler manual should be consulted for the
...
8.2 A few points of clarification about the CDL language are given here to facilitate its understanding. For more in-depth information on CDL, please consult the NetCDF User’s Guide.
8.2.1 A NetCDF template starts with the word “NetCDF” followed by the dataset name.
8.2.2 CDL comments are indicated by two forward slash characters (//).
8.2.3 Section indicators (dimensions:, variables:, and data:) end with a colon character (:). These are the only tokens that end with a colon character.
8.2.4 Statements within sections end with the semicolon character (;).
8.2.5 Variable names beginning with numbers must be preceded by an underline character (_). Otherwise the ncgen parser will flag an error.
8.2.5.1 Underline characters were chosen for this protocol over hyphen characters, because some compilers may interpret hyphens as subtraction operators. The feature of CDL that allows implicit numerical datatyping of attributes is not being used in the first version of the template. Instead, all floating point attributes are being handled as strings. This forces programmers to explicitly type variables, thereby encouraging more deliberate programming styles. For example:
:aia_template_revision = "0.8"; //M12345
:netcdf_revision = "2.0.1"; //M12345
Consult the NetCDF User’s Guide for more complete information on CDL syntax and usage.
8.2.6 Underline characters only can be used as separators between words within variable names, like:
aia-template-revision, or aia_template_revision.
8.2.7 Numerical data types for single values can be declared implicitly by putting numbers on the right side of an assignment statement, like:
peak_number=2; //number of peaks
These numerical datatypes can be floating point or integer values, and can be implicitly datatyped as such.
:floating_point_attribute = 1.11; //M12345
8.2.8 Numerical data types can be declared explicitly by preceding the variable defi
...
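The enumerated-set convention described in 3.2.4 (a formal C type whose values each map to a unique string literal, with the string rather than the number stored in the file) can be sketched as follows. This is a minimal illustration under stated assumptions: the type name `example_polarity_t`, its values, its strings, and both helper functions are hypothetical and are not taken from Specification E 2077 or from the distribution kit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enumerated attribute; the names and strings are
 * illustrative only and do not come from Specification E 2077. */
typedef enum {
    EXAMPLE_POLARITY_POSITIVE,
    EXAMPLE_POLARITY_NEGATIVE,
    EXAMPLE_POLARITY_COUNT      /* number of valid values, not a value */
} example_polarity_t;

/* One unique string literal per enumerated value, in enum order. */
static const char *const example_polarity_names[EXAMPLE_POLARITY_COUNT] = {
    "Positive Polarity",
    "Negative Polarity"
};

/* Return the string that would be written to the file,
 * or NULL if the value is outside the enumerated set. */
const char *example_polarity_to_string(example_polarity_t value)
{
    if ((int)value < 0 || (int)value >= EXAMPLE_POLARITY_COUNT)
        return NULL;
    return example_polarity_names[value];
}

/* Map a string read from a file back to the enumeration.
 * Returns 0 on success, -1 if the string is not in the set. */
int example_polarity_from_string(const char *text, example_polarity_t *out)
{
    int i;

    if (text == NULL || out == NULL)
        return -1;
    for (i = 0; i < EXAMPLE_POLARITY_COUNT; i++) {
        if (strcmp(text, example_polarity_names[i]) == 0) {
            *out = (example_polarity_t)i;
            return 0;
        }
    }
    return -1;
}
```

Rejecting unknown strings on read is what restricts an attribute to its enumerated set, while storing the string keeps the file readable without an external lookup table, which is the self-describing property that 3.2.4 cites.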