ASTM E2310-04(2015)
Standard Guide for Use of Spectral Searching by Curve Matching Algorithms with Data Recorded Using Mid-Infrared Spectroscopy
ABSTRACT
This guide describes spectral searching by curve matching algorithms for data recorded using mid-infrared spectroscopy. The methods described herein may also apply to these algorithms when used with other types of spectroscopic data, but each type of data search should be assessed separately. The purpose of a spectral search is to classify an unknown material and, where possible, to identify it. Spectral searching is a screening method to assist the analyst, not an absolute identification technique. It is therefore not intended to replace an expert in infrared spectroscopy and should not be used without suitable training. The Euclidean distance algorithm and the first derivative Euclidean distance algorithm are described and their use discussed. The theory and common assumptions made when using search algorithms are also discussed, along with guidelines for the use and interpretation of the search results.
SCOPE
1.1 Spectral searching is the process whereby a spectrum of an unknown material is evaluated against a library (database) of digitally recorded reference spectra. The purpose of this evaluation is to classify the unknown and, where possible, to identify it. Spectral searching is a screening method to assist the analyst, not an absolute identification technique. It is not intended to replace an expert in infrared spectroscopy and should not be used without suitable training.
1.2 The user of this guide should be aware that the results of a spectral search can be affected by the following factors, described in Section 6: (1) baselines, (2) sample purity, (3) absorbance linearity (Beer’s Law), (4) sample thickness, (5) sampling technique and sample preparation, (6) physical state of the sample, (7) wavenumber range, (8) spectral resolution, and (9) choice of algorithm.
1.2.1 Many other factors can affect spectral searching results.
1.3 This guide covers the use of search algorithms with mid-infrared spectra. The methods described herein may also apply to these algorithms when used with other types of spectroscopic data, but each type of data search should be assessed separately.
1.4 The Euclidean distance algorithm and the first derivative Euclidean distance algorithm are described and their use discussed. The theory and common assumptions made when using search algorithms are also discussed, along with guidelines for the use and interpretation of the search results.
1.5 The values stated in SI units are to be regarded as standard. No other units of measurement are included in this standard.
This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
Designation: E2310 − 04 (Reapproved 2015)
Standard Guide for Use of Spectral Searching by Curve Matching Algorithms with Data Recorded Using Mid-Infrared Spectroscopy
This standard is issued under the fixed designation E2310; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
2. Referenced Documents
2.1 ASTM Standards:
E131 Terminology Relating to Molecular Spectroscopy
E334 Practice for General Techniques of Infrared Microanalysis
E573 Practices for Internal Reflection Spectroscopy
E1252 Practice for General Techniques for Obtaining Infrared Spectra for Qualitative Analysis
E1642 Practice for General Techniques of Gas Chromatography Infrared (GC/IR) Analysis
E2105 Practice for General Techniques of Thermogravimetric Analysis (TGA) Coupled With Infrared Analysis (TGA/IR)
E2106 Practice for General Techniques of Liquid Chromatography-Infrared (LC/IR) and Size Exclusion Chromatography-Infrared (SEC/IR) Analyses
3. Terminology
3.1 Definitions—For general definitions of terms and symbols, refer to Terminology E131.
3.1.1 Euclidean distance algorithm—the Euclidean distance algorithm measures the Euclidean distance between each library spectrum and the unknown spectrum by treating the spectra as normalized vectors. The closeness of the match, or hit quality index (HQI), is calculated from the square root of the sum of the squares of the differences between the vectors for the unknown spectrum and each library spectrum.
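The calculation in 3.1.1 can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the algorithm of any particular search package. Both spectra are assumed to be absorbance arrays already sampled at the same wavenumbers (see 5.2). Each is normalized by the two-step minimum/maximum procedure of 5.1.1, and the distance is returned directly, so a smaller value means a closer match. The function names `normalize` and `euclidean_hqi` and the synthetic band data are hypothetical.

```python
import numpy as np


def normalize(absorbance: np.ndarray) -> np.ndarray:
    """Two-step normalization (5.1.1): subtract the minimum, then divide by the
    resulting maximum, so the spectrum spans 0 to 1 absorbance."""
    shifted = absorbance - absorbance.min()
    peak = shifted.max()
    if peak == 0:
        raise ValueError("cannot normalize a spectrum with no intensity variation")
    return shifted / peak


def euclidean_hqi(unknown: np.ndarray, reference: np.ndarray) -> float:
    """Square root of the sum of squared point-by-point differences between
    the normalized spectra; 0.0 means identical after normalization."""
    if unknown.shape != reference.shape:
        raise ValueError("spectra must share the same data points (see 5.2)")
    diff = normalize(unknown) - normalize(reference)
    return float(np.sqrt(np.sum(diff ** 2)))


# Usage with hypothetical synthetic data on a common 2 cm-1 grid.
wavenumbers = np.linspace(4000.0, 400.0, 1801)
band = np.exp(-((wavenumbers - 1700.0) / 20.0) ** 2)
stronger = 3.0 * band + 0.05  # same band, three times the absorbance, small offset
other = np.exp(-((wavenumbers - 1100.0) / 20.0) ** 2)  # band at another position

print(euclidean_hqi(band, stronger))  # rounding-level value, effectively 0
print(euclidean_hqi(band, other))     # much larger: the bands do not overlap
```

The `stronger` array is the first spectrum scaled by a constant, as Beer’s Law (4.1) predicts for a thicker sample of the same material, with a small constant offset added. The two normalization steps remove the offset and the scale, so the distance falls to rounding error.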
3.1.2 first derivative Euclidean distance algorithm—in the first derivative Euclidean distance algorithm, the Euclidean distance is also computed, except that the derivative of each spectrum is calculated prior to the Euclidean distance calculation.
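Continuing in the same hypothetical vein, the first derivative variant in 3.1.2 (explained further in 6.2.2 and its NOTE 3) can be illustrated by differencing adjacent data points before normalizing. `np.diff` is used here as a simple stand-in for the derivative. Commercial implementations may estimate the derivative differently, so treat that choice as an assumption. The example adds a sloping baseline to an otherwise identical synthetic spectrum.

```python
import numpy as np


def normalize(values: np.ndarray) -> np.ndarray:
    """Subtract the minimum, then divide by the resulting maximum (0 to 1 scale)."""
    shifted = values - values.min()
    return shifted / shifted.max()


def euclidean_distance(unknown: np.ndarray, reference: np.ndarray) -> float:
    """Euclidean distance between the normalized spectra themselves."""
    return float(np.linalg.norm(normalize(unknown) - normalize(reference)))


def first_derivative_distance(unknown: np.ndarray, reference: np.ndarray) -> float:
    """Euclidean distance between the normalized first differences of the spectra.

    np.diff subtracts each data point from the next one, so a linear baseline
    m*x + c becomes the constant offset m*dx, which normalization then removes.
    """
    return float(np.linalg.norm(normalize(np.diff(unknown)) - normalize(np.diff(reference))))


# Usage with hypothetical data: the same two bands, with and without a sloping baseline.
x = np.linspace(0.0, 1.0, 1001)
flat = np.exp(-((x - 0.4) / 0.02) ** 2) + 0.5 * np.exp(-((x - 0.7) / 0.03) ** 2)
sloped = flat + 0.3 * x + 0.1

print(euclidean_distance(flat, sloped))         # clearly above 0: slope read as a difference
print(first_derivative_distance(flat, sloped))  # rounding-level value, effectively 0
```

The plain Euclidean distance reads the slope as a spectral difference, while the derivative version does not. Differencing turns the linear baseline into a constant offset, and subtracting the minimum during normalization removes it.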
3.1.3 hit quality index (HQI)—a table which ranks the library spectra in the database according to their hit quality values (see 7.5).
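A hit quality table of the kind defined in 3.1.3 might be assembled as follows. In this hypothetical sketch, the library is a plain Python dictionary of reference spectra that share one wavenumber grid. The unknown is first resampled onto that grid by linear interpolation (the approach described in 5.2.2, here via `np.interp`). The entries are then sorted by Euclidean distance so the closest match is listed first. As 3.1.4.1 notes, commercial software may instead report hit quality values for which a high number is best.

```python
import numpy as np


def normalize(values: np.ndarray) -> np.ndarray:
    """Subtract the minimum, then divide by the resulting maximum (0 to 1 scale)."""
    shifted = values - values.min()
    return shifted / shifted.max()


def hit_quality_table(unknown_wn, unknown_abs, library_wn, library):
    """Rank library entries by Euclidean distance to the unknown (smallest first).

    unknown_wn, unknown_abs: wavenumbers and absorbances of the unknown, in any order.
    library_wn: wavenumber grid shared by every reference spectrum (hypothetical layout).
    library: dict mapping a reference name to its absorbance array on library_wn.
    """
    order = np.argsort(unknown_wn)  # np.interp needs increasing x values
    wn_sorted = unknown_wn[order]
    if library_wn.min() < wn_sorted[0] or library_wn.max() > wn_sorted[-1]:
        raise ValueError("library grid extends beyond the unknown's wavenumber range")
    # Linear interpolation of the unknown onto the library's data point positions
    resampled = np.interp(library_wn, wn_sorted, unknown_abs[order])
    target = normalize(resampled)
    hits = [
        (name, float(np.linalg.norm(target - normalize(ref))))
        for name, ref in library.items()
    ]
    return sorted(hits, key=lambda hit: hit[1])


def band(wn, center, width=15.0):
    """Hypothetical Gaussian absorption band used only for the example."""
    return np.exp(-((wn - center) / width) ** 2)


# Usage: the unknown is sampled every 1 cm-1, the library every 4 cm-1.
library_wn = np.arange(4000.0, 399.0, -4.0)
library = {
    "reference A": band(library_wn, 1715.0),
    "reference B": band(library_wn, 1600.0) + 0.4 * band(library_wn, 2900.0),
}
unknown_wn = np.arange(4000.0, 399.0, -1.0)
unknown_abs = 2.0 * band(unknown_wn, 1715.0) + 0.02

for name, distance in hit_quality_table(unknown_wn, unknown_abs, library_wn, library):
    print(f"{name}: {distance:.3f}")
```

Because `np.interp` returns the end values outside the data range rather than extrapolating, the sketch rejects a library grid that extends beyond the unknown's wavenumber range instead of silently padding it.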
This guide is under the jurisdiction of ASTM Committee E13 on Molecular Spectroscopy and Separation Science and is the direct responsibility of Subcommittee E13.03 on Infrared and Near Infrared Spectroscopy.
Current edition approved May 1, 2015. Published June 2015. Originally approved in 2004. Last previous edition approved in 2009 as E2310 – 04 (2009). DOI: 10.1520/E2310-04R15.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
3.1.4 hit quality value—the spectral search software compares each spectrum in the database to that of the unknown and
assigns a numeric value for each library entry demonstrating how similar the two spectra are.
3.1.4.1 Discussion—There are several methods for assigning hit quality values, and either a high or a low value can be assigned as the best match. Refer to the software manufacturer’s documentation.
3.1.5 normalization—the mathematical technique used to compensate for an intensity difference between two spectra (see 5.1).
3.1.6 peak searching—the process whereby the peak table of the spectrum of an unknown material is evaluated against a library of peak tables. Each reference spectrum in the library contains a peak table, and that peak table is individually compared to the peak table of the unknown and assigned a numerical value as to the goodness of fit.
3.1.7 reference spectrum—an established spectrum of a known compound or chemical sample.
3.1.7.1 Discussion—This spectrum is typically stored in a retrievable format so that it may be compared against the sample spectrum of an analyte.
3.1.7.2 Discussion—This term has sometimes been used to refer to a background spectrum; such usage is not recommended.
3.1.8 search algorithm—the mathematical formula used to make a point-by-point comparison of two spectra.
3.1.9 spectral library—a collection of reference spectra stored in a computer-readable form, also called a library, database, or spectral database.
3.1.10 spectral searching—the process whereby a spectrum of an unknown material is evaluated against a library of digital reference spectra. Each reference spectrum in the library is individually compared to the spectrum of the unknown and assigned a numerical value as to the goodness of fit. To perform this comparison, each data point in the unknown spectrum is compared to the corresponding point in the reference spectrum.
4. Theory
4.1 Beer’s Law—One of the basic principles that make spectral searching possible is Beer’s Law (see Terminology E131), which states that A = abc, where A is the absorbance, a is the absorptivity, b is the sample pathlength, and c is the concentration of the analyte of interest. As long as Beer’s Law applies, two spectra of the same material recorded under similar conditions can be made to appear the same by normalization of the data.
NOTE 1—In an ideal case, this is true for transmittance spectra, but there are differences in the spectral peak intensities when reflectance spectra are compared to transmittance spectra.
5. Spectral Data Pre-Treatment
5.1 Normalization:
5.1.1 Normalization of spectra compensates for the differences in sample quantity (concentration or pathlength, or both) between the samples used to generate the reference spectra in the library and the unknown. The spectra are normalized over the complete spectral range of the library. When searching less than the full spectral range of the library, the spectra must be re-normalized over the new range before an accurate comparison can be made. Normalization of a spectrum for library searching is a two-step process. First, the minimum absorbance value in the selected spectral range is subtracted from all the absorbances in the same range. The resulting values are then scaled by dividing by the maximum resulting value in the range. The end result is a spectrum (or a sub-range portion of a spectrum) where the minimum value is zero (0) and the maximum is one (1) absorbance. If the range chosen for normalization has only one or two strong bands and a few medium intensity bands, the range of the spectrum must be reselected; otherwise, the spectrum will be dominated by the strong bands and the HQI will be insensitive to the weaker fingerprint bands necessary for identification of a specific compound. Successful compound identification may require that the spectral match exclude the strongest bands; the normalization will then be based on a medium intensity band, and the weak fingerprint bands will be emphasized in the HQI.
5.2 Data Point Matching:
5.2.1 The algorithms used for searching a spectrum against a library use a calculation that mathematically compares the data points of the spectrum being searched to the data points of the spectra in the library. This requires that the data points in both the sample and library spectra occur at the same frequencies. If the data points in the sample and library spectra are not aligned in this manner, then one of the spectra must be mathematically altered (interpolated) to make the data points match. Typically, the unknown spectrum being searched is altered to match the data point spacing of the spectra in the library.
5.2.2 Data point matching is commonly accomplished using a linear data point interpolation method. In this method, the slope and offset of a line segment are calculated between the absorbances of every pair of adjacent data points in the spectrum. A new set of absorbances is calculated by locating the values that occur on the line segments at positions corresponding to the data point frequencies of the library spectrum.
6. Conditions or Issues Affecting Results
6.1 Spectral quality is one of the primary conditions or issues that can affect search results. There is no substitute for a carefully recorded spectrum. Several conditions or issues affect spectral quality as it pertains to spectral searching. These conditions or issues apply both to the spectra used to create the reference database and to the unknown spectrum.
6.2 Baselines:
6.2.1 A flat baseline is preferred for the Euclidean distance algorithm because this algorithm compares each data point in the unknown spectrum to the corresponding data point in the reference spectrum. As a result, an offset or slope in the baseline is interpreted as a difference between the two spectra. Therefore, when a spectrum with a sloping or offset baseline is evaluated using the Euclidean distance algorithm, a simple baseline correction should be used.
NOTE 2—Negative bands can also produce an offset in the baseline as a result of the data normalization process.
6.2.2 The first derivative Euclidean distance algorithm minimizes the effect of an offset or sloping baseline. In this
algorithm, the difference between a pair of adjacent points in the unknown spectrum is compared to the difference between the corresponding pair of adjacent points in the reference spectrum. In effect, the first derivative Euclidean distance algorithm looks only at the differences in the slope of adjacent data points between the two spectra. Fig. 1 shows how the two algorithms view the same two spectra.
NOTE 3—The first derivative algorithm converts a sloping baseline into an offset that is then eliminated by the normalization procedure.
FIG. 1 The bottom two spectra demonstrate the results of the first derivative of a spectrum with a sloping baseline as compared to a spectrum with a flat baseline. The two spectra in the bottom trace are almost completely overlapped.
6.3 Sample Purity:
6.3.1 The physical state of the sample should be as close as possible to the physical state of the reference materials used to obtain the library. For example, a pure liquid sample would ideally be searched against a library of spectra of only liquid reference materials. A sample that is probably a mixture, such as a commercial formulation, should be compared to a library of commercial formulations.
6.3.2 In some cases, the nature of the sample may not be well understood. An unknown sample may be a pure material or a mixture. It may have additional contaminants that will affect its spectrum by adding spurious bands. In addition, there are several other sources of spurious spectral features that may appear as either positive or negative bands. Several of these are listed below:
6.3.2.1 Features due to variations in the carbon dioxide or water vapor levels in the optical path,
6.3.2.2 Bands from a mulling agent,
6.3.2.3 Halide salts used as window material and as the diluent for both pellets and diffuse reflection analysis often contain contaminants such as adsorbed water, hydrocarbons, and nitrates. Always use dry halide salts and keep unused halide salts in a desiccator,
6.3.2.4 Water can alter the spectrum of the sample from its dry state. Spectra of inorganic samples with waters of hydration are particularly sensitive to adsorbed water,
6.3.2.5 Solvent bands from samples run in solution, and
6.3.2.6 Bands from solvents left over from an extraction or from casting a film from a solution.
NOTE 4—Retain spectra of any solvents used, so that bands due to the solvent can be identified in the spectrum of the unknown.
NOTE 5—If the solvent bands in a region of the spectrum cannot be removed (either by re-recording the spectrum using an uncontaminated sample or by spectral subtraction using the solvent reference spectrum), then that region of the spectrum should be excluded during a search. It is not sufficient to remove the offending bands digitally by drawing a straight line through the region before the search, because the search algorithm will calculate a poor match in this region for any reference spectrum containing features in the region. Note also that removing the solvent bands may remove underlying features in the sample spectrum.
Coleman, Patricia B., Practical Sample Techniques for Infrared Analysis, CRC Press, ISBN 0849342031, 8/26/93.
6.4 Absorbance Linearity (Beer’s Law):
6.4.1 A spectrum recorded using good practices (see Practices E334, E1252, E1642, E2105, and E2106) should follow Beer’s Law and so maintain the relative absorbance intensities of its bands independently of sample thickness. As long as this ratio between the bands is maintained, the spectra can be normalized and a good comparison between spectra can be made. For a spectrum to meet this requirement, each ray of light of a given frequency must pass through the same amount
6.5 Sample Thick
...
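NOTE 5 in 6.3.2 advises excluding a solvent-affected region from the search instead of drawing a straight line through it. The hypothetical NumPy sketch below contrasts the two approaches on synthetic data. The line-drawn unknown is still penalized because the reference has a genuine band in that region. By contrast, a boolean mask that drops the region from both spectra before normalization gives a near-zero distance. The band positions, the 700 to 800 cm-1 region, and the helper names are illustrative assumptions.

```python
import numpy as np


def normalize(values: np.ndarray) -> np.ndarray:
    """Subtract the minimum, then divide by the resulting maximum (0 to 1 scale)."""
    shifted = values - values.min()
    return shifted / shifted.max()


def distance(unknown, reference, keep=None):
    """Euclidean distance between normalized spectra, optionally using only the
    data points where the boolean mask `keep` is True."""
    if keep is not None:
        unknown, reference = unknown[keep], reference[keep]
    return float(np.linalg.norm(normalize(unknown) - normalize(reference)))


def band(wn, center, width=15.0):
    """Hypothetical Gaussian absorption band used only for the example."""
    return np.exp(-((wn - center) / width) ** 2)


wn = np.arange(4000.0, 399.0, -2.0)
reference = band(wn, 1730.0) + 0.6 * band(wn, 1240.0) + 0.5 * band(wn, 760.0)
# Same material, but a solvent band at 750 cm-1 sits on top of the 760 cm-1 band.
unknown = reference + 1.5 * band(wn, 750.0)
region = (wn >= 700.0) & (wn <= 800.0)  # hypothetical solvent-affected region

# The approach NOTE 5 calls insufficient: a straight line between the region's end points.
idx = np.flatnonzero(region)
flattened = unknown.copy()
flattened[idx] = np.linspace(unknown[idx[0]], unknown[idx[-1]], idx.size)

print(distance(flattened, reference))              # penalized: reference has a band there
print(distance(unknown, reference, keep=~region))  # near 0: region excluded from comparison
```

The caveat at the end of NOTE 5 still applies: the mask also discards the reference's own 760 cm-1 band, so any information the analyte carries in that region is lost to the comparison.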