I have been a longtime fan of The Charles Smith Blog. It is firmly rooted in my Feedly RSS reader, and it should be one of your regulars too. It always provides great summaries of the forensic science stories that are in the news.


For example, here is his great treatment of the closing of the Oregon State Police handwriting section:

STORY: “Oregon State Police close handwriting lab after investigation of bias, sloppiness,” by reporter Brian Denson, published by The Oregonian on February 14, 2014.

PHOTO CAPTION:  “Handwriting examiners in the Portland Metro Forensic Laboratory, in Clackamas County, used video spectral comparators for the painstaking process of matching suspects’ known writing samples to those believed to have been used in the commission of crimes. The unit is now shut down.”

GIST: “Oregon State Police officials have quietly shut down their handwriting analysis unit, investigated for lapses in quality control, and laid off their last two examiners at a time when fewer crime cases require the work. The department now farms out handwriting analyses to the Washington State Patrol’s forensics laboratory and, in some cases, FBI examiners, according to Lt. Gregg Hastings, an OSP spokesman. Hastings confirmed that the Questioned Documents Unit, the formal name of the handwriting examination group, formally closed on Dec. 7, 2012, nine months after OSP officials suspended the unit’s examinations. Oregon State Police had faced the possibility that the findings of its two full-time examiners had caused problems with criminal prosecutions. This forced them to confront a whodunit under the roof of their own forensics lab. The Oregonian first reported on the controversy in late 2012, after state police discovered that one of its two handwriting examiners committed a serious error in a suspected murder-for-hire case. Internal reviews of work in the handwriting examination unit, obtained by the newspaper through open-records requests, detailed allegations of bias, sloppy work, and dishonesty………State police officials looking into miscues in the handwriting unit in 2012 identified 45 cases that required outside reviews by qualified handwriting examiners. They notified prosecutors handling the cases. Hastings said this week that it appeared no criminal cases were significantly altered or harmed by the findings. Killing the Questioned Documents Unit prompted the layoffs of its last two full-time analysts: Ron Emmons and Christina Kelley. The two had been on paid administrative leave in the $66,000-a-year jobs until state police formally closed their unit.”

According to several sources, the very first use of handwriting examination in the courtroom came in the 1792 trial Goodtitle d. Revett v. Braham, reported at 100 Eng. Rep. 1139 (1792). As a scientific endeavor, the validity of handwriting examination for conclusions as to source has been under sustained attack ever since the 2009 report of the National Research Council of the National Academy of Sciences, “Strengthening Forensic Science in the United States: A Path Forward.” We find in the NAS report the following apt summary of the discipline as typically practiced in the United States today:

The examination of handwritten items typically involves the comparison of a questioned item submitted for examination along with a known item of established origin associated with the matter under investigation. Requirements for comparison are that the writing be of the same type (handwritten/cursive versus hand printed) and that it be comparable text (similar letter/word combinations). Special situations involving unnatural writing are forgery (an attempt to imitate/duplicate the writing of another person) and disguise (an attempt to avoid identification as the writer). The basis for comparison is that handwriting/handprinting/numerals can be examined to obtain writing characteristics (also referred to as features or attributes). The characteristics are further classified into class characteristics (the style that the writer was taught), individual characteristics (the writer’s personal style), and gross/subtle characteristics. Specific attributes used for comparison of handwriting are also referred to as discriminating elements, of which Huber and Headrick have identified 21. Comparisons are based on the high likelihood that no two persons write the same way, while considering the fact that every person’s writing has its own variabilities. Thus, an analysis of handwriting must compare interpersonal variability—some characterization of how handwriting features vary across a population of possible writers—with intrapersonal variability—how much an individual’s handwriting can vary from sample to sample. Determining that two samples were written by the same person depends on showing that their degree of variability, by some measure, is more consistent with intrapersonal variability than with interpersonal variability. Some cases of forgery are characterized by signatures with too little variability, and are thus inconsistent with the fact that we all have intrapersonal variability in our writing.

Scientific Interpretation and Reporting of Results
Terminology has been developed for expressing the subjective conclusions of handwriting comparison and identification, taking into account that there are an infinite number of gradations or opinions toward an identification or elimination. Several scales, such as a five-point scale and a nine-point scale, are used by questioned document examiners worldwide.
The nine-point scale is as follows:
1. Identification (a definite conclusion that the questioned writing matches another sample)
2. Strong probability (evidence is persuasive, yet some critical quality is missing)
3. Probable (points strongly towards identification)
4. Indications [that the same person] did [create both samples] (there are a few significant features)
5. No conclusion (used when there are limiting factors such as disguise, or lack of comparable writing)
6. Indications [that the same person] did not [create both samples] (same weight as indications with a weak opinion)
7. Probably did not (evidence is quite strong)
8. Strong probably did not (virtual certainty)
9. Elimination (highest degree of confidence)

Boy, doesn’t that all sound wonderfully subjective?

All you have to do is consider the following example and ask: did this come from one person or more than one?

The above comes from the FBI review article “Handwriting Examination: Meeting the Challenges of Science and the Law.” It is Figure 4 of the review article and is entitled “Four signatures written by the same individual, demonstrating variation.” There is clear variation among the four samples. Is the variation due to one person or to different people? This is a basic question. It is not even close to the more complex question that the typical handwriting examiner is asked to answer, and offers an opinion on in court every day: attribution to a particular source. When dealing with a true unknown, such as the typical seized and submitted sample (e.g., a bank robbery note), the source of the variation illustrated above drives the validity of any opinion attributing the writing to a sole source (or not). In this particular case the sample is not an unknown: the examiner saw the person write these four “Samantha Scott Smith” entries, so it was easy to conclude that this was simple intra-personal variation (same person, just “natural” variation) and not inter-personal variation (coming from two or more people).
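To make the intra-personal versus inter-personal question concrete, here is a minimal sketch of how such a comparison could be put on a numerical footing. Everything in it is an assumption for illustration: the "distance" scores are invented numbers, the normal-distribution model is a simplification, and the function name `likelihood_ratio` is hypothetical. It is not the method used by any examiner or by any particular software.

```python
from statistics import NormalDist

# Hypothetical feature distances (arbitrary units), invented for illustration only.
# A smaller distance means the two writing samples look more alike.
within_writer = [0.8, 1.1, 0.9, 1.3, 1.0, 1.2]    # same writer, different occasions
between_writers = [2.4, 3.1, 2.8, 1.9, 3.5, 2.6]  # different writers


def likelihood_ratio(distance, same=within_writer, different=between_writers):
    """How much more probable is this distance under 'same writer' than 'different writers'?

    Assumes, purely for illustration, that each set of distances is roughly normal.
    """
    same_model = NormalDist.from_samples(same)
    different_model = NormalDist.from_samples(different)
    return same_model.pdf(distance) / different_model.pdf(distance)


if __name__ == "__main__":
    for d in (1.0, 1.8, 3.0):
        print(f"distance={d}: likelihood ratio={likelihood_ratio(d):.3g}")
```

A ratio well above 1 favors intra-personal variation and a ratio well below 1 favors inter-personal variation. The point of the sketch is that the answer depends entirely on having measured both distributions, which is exactly what a purely subjective comparison lacks.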

The theory that handwriting analysis can be used to attribute source stands upon two major tenets:

  1. that each person’s handwriting is unique (proponents even claim that, unlike with DNA, identical twins will write differently), and
  2. that a given person’s handwriting is also relatively stable and changes little over time.

The underlying idea is that a sample of a person’s handwriting can be compared to a handwritten document of similar style (meaning cursive to cursive and print to print) to determine and authenticate the document’s writer; if the writing styles “match,” it is likely that one person wrote both documents. Unlike other parts of Questioned Document Examination where chemistry is involved, the typical handwriting expert is ultimately making a value judgment of “sufficient similarity” based upon that examiner’s training, knowledge and experience. Although there are published standards for this interpretation, such as ASTM E2290 – 07a (Standard Guide for Examination of Handwritten Items), even that standard leaves room for a very high degree of subjectivity and variability.

In the typical crime laboratory, the analysts are not using more empirical methods such as the CEDAR-FOX system. Even when the FBI wanted to defend the science of handwriting analysis, it relied not on human interpretation alone but on human interpretation guided by the CEDAR-FOX system, as one can see in the review article cited above.

In the typical crime laboratory, three things make the opinion expressed very vulnerable to bias: the failure to rely on more objective means of evaluation such as the CEDAR-FOX system, the lack of proof of the tenets that supposedly ground handwriting analysis, and the lack of objective criteria that are not subject to inter-rater or intra-rater variability. That is why the state of Oregon ultimately did the correct thing and abolished the handwriting unit in its state crime laboratory.

Despite all of this, in the last major examination of its validity under the Daubert standard, United States v. Prime, 431 F.3d 1147 (9th Cir. 2005), it survived scrutiny. Even so, this forensic science discipline is certainly subject to legitimate attack in the courtroom.

Although he was talking about graphology, the truth of handwriting analysis as practiced in a good number of crime laboratories is best summed up by a quote from Barry L. Beyerstein, Ph.D.: “[T]hey simply interpret the way we form these various features on the page in much the same way ancient oracles interpreted the entrails of oxen or smoke in the air. i.e., it’s a kind of magical divination or fortune telling where ‘like begets like.’”


In the past, we have blogged on the severe limitations of pattern recognition as a forensic science discipline (Pattern Recognition is it Science or an Art?). In fact, the National Research Council of the National Academy of Sciences pointed to one form of pattern recognition as being most problematic: forensic odontology.

Most broadly defined, it is the practice of applying dental principles to the legal world. It can be used in mass disaster events to help identify the dead. In the courtroom, forensic odontology is predominantly in the form of bite mark evidence.

Forensic odontology has been with us as admissible evidence in the United States since 1849, which is nearly 50 years longer than fingerprints. J.W. Webster was convicted of the murder of George Parkman after Parkman’s incinerated remains were identified by Nathan Cooley Keep. Keep was a dentist who had made partial dentures for Parkman. He came into court, identified the charred remains of the body, and concluded source because he had physically taken the dentures and placed them into the casts. He thought they were a “perfect match.” The identification of the body, accomplished by the comparison of these dentures to the casts alone, led to the conviction. Webster was put to death. A more famous case in which forensic odontology was used is that of serial killer Ted Bundy, who left a bite mark on the buttock of a victim in 1978, which helped secure his conviction. He too was executed by the government.

As with most pattern recognition disciplines, there are large issues in its application.

  • In the courtroom there are no specific requirements for practicing forensic odontology. In fact, one does not even need to be a practicing dentist.
  • The analysis undertaken is largely arbitrary and subject to the whim of each examiner. There are no universally accepted or practiced protocols or instructions as to how the analysis must be undertaken (no standards).
  • There is insufficient study that the techniques used can correctly identify specific or unique source, meaning that the marks left (the unknown) can be traced and attributed uniquely to one specific source.
  • A person’s dental profile changes over time.
  • Frequently there is unequal application of force in the real world. The bite itself made in the real world and in uncontrolled conditions is totally different than the exemplars that are taken with equal application of force and under controlled conditions.
Wax Exemplar of Ted Bundy

  • If the impression that is the unknown is left on skin, the medium (the skin) can change over time as the bite mark heals or the body decomposes, and therefore distort the original impression left. The few studies that have been completed were not done on humans, but rather on pigskin. Pigskin and human skin behave in dynamically different ways due to differences in elasticity.
Bite mark impressions can change over time

  • Each dentition can produce variable impressions and can change based on pressure and surface of contact.
Different impressions left based upon application of force

  • It is not highly regulated or monitored and has virtually non-existent Quality Assurance safeguards.

In fact, as reported in the New York Times story “Evidence From Bite Marks, It Turns Out, Is Not So Elementary,” the rate of false positives is alarming: up to 65 percent, according to a study referenced in the article.

Critics of bite mark comparison cite the case of Ray Krone, an Arizona man convicted of murder on bite mark evidence left on a woman’s breast. He was 100% positively identified as being the only source for the bite mark. He was later exonerated by DNA. Similarly, Roy Brown was convicted of murder due in part to bite-mark evidence, and freed after DNA testing of the saliva left in the bite wounds matched someone else.

The very basic technique and analysis employed by most forensic odontologists is as follows:

  • Bite marks are photographed with a scale
  • Photographs of bite marks on skin are taken at repeated intervals
  • Casts of the impression are taken
  • Impressions are traced onto transparencies
  • Casts of the suspect’s teeth are taken
  • The suspect’s cast is compared with the bite mark

And then there are cases of downright failures of human integrity, where fraud is committed, such as this story:

Video Shows Controversial Forensic Specialist Michael West Fabricating Bite Marks

According to that report:

On Aug. 9, The Huffington Post reported on the case of Leigh Stubbs, a Mississippi woman serving a 44-year sentence for assault and drug charges. Stubbs was convicted in large part due to the testimony of Michael West, a disgraced bite mark specialist. Though West has been largely discredited, prosecutors and state officials in Mississippi (and to a lesser extent in Louisiana) continue to defend convictions won based on his testimony.

In Stubbs’ case, West presented two key pieces of evidence. The first involved the bite mark wizardry that made him famous, and then infamous: West claimed to have found bite marks on alleged victim Kim Williams that medical personnel hadn’t seen. He then used a dental mold of Stubbs’ teeth to perform an analysis on the marks, and would later testify that it was a “probability” that Stubbs had bitten Williams.


On Wednesday, forensic specialists Mike Bowers and David Averill posted a video recording of West’s examination of Williams on their site, Bitemarks.org. In his initial examination, West claims to have “missed” the evidence of a bite mark. He testified he found it in a subsequent examination performed days later. That examination is depicted in the video below. Note that at the 50-second mark, a bite mark appears in Williams’ skin, seemingly out of nowhere.


In a series of posts, we are going to talk about Mass Spectrometry.

  1. Introduction-The different configurations and the Electron Impact process
  2. What types of mass analyzers are there?
  3. What type of detectors are there?
  4. What types of analysis can be done?
  5. How do you read the output?
  6. How do they come to a qualitative measure using software?
  7. How do they quantitate the results?
  8. Do you need chromatography if you are using Mass Spectrometry?
  9. Other topics of interest about GC-MS

There seems to be a debate, more like a scientific war, between spectroscopists and chromatographers. It boils down to this fundamental question:

Does co-elution matter if one uses Mass Spectrometry?

Well, the answer is yes, of course. Here is why…

  1. If it were indeed true that we do not need well resolved (separated) and specific peaks before we use mass spectrometry (i.e., co-eluting peaks don’t matter), then we would not waste our time with the chromatography aspect of the gas chromatograph. The GC part of the GC-MS method takes the most time in this technique, and if we could cut that out completely and instead perform what is called a direct introduction (DI) or direct interface (DI) probe into the MS alone, then we would increase throughput tremendously. We could test so many more samples. But we don’t, and for good reason. One of the reasons that we do not simply perform DI, and that we need chromatography with well resolved peaks, is that it is very easy to use too much sample in the DI method.

    DI to the MS

  2. If you are testing pure compounds, then DI may be a very useful technique. It is fast and rugged and requires a very small sample. However, in our world, the forensic world, getting a pre-consumption unknown drug in pure form is extremely rare. Further, if the sample is in a post-consumption matrix (e.g., blood, tears, sweat, urine), then we know that the sample is not pure.
  3. Professors Harold McNair, PhD and Fred W. McLafferty, PhD as well as Dr. Marvin C. McMaster and Dr. Lee Polite warn against introducing into MS analysis anything other than pure compounds (one way to get pure compounds is through the use of a GC). Professor McLafferty in Interpretation of Mass Spectra wrote as follows: “If several compounds are present in the sample, the resulting spectrum will represent a linear superposition of the component spectra.” Dr. McNair puts it even more simply: “Just don’t do it. Use the advantages of good chromatography first, then you have little chance of error in the reporting of your results.” Another authority in the field, Dr. Marvin C. McMaster, writes, “A mass spectrometer is an excellent tool for clearly identifying the structure of a single compound, but it is less useful when presented with a mixture.” He further writes, “A good chromatographic separation based on correct selection of injector type and throat material, column support, carrier gas and oven temperature ramping, and a properly designed interface feeding into the ion source can make or break the mass spectrometric analysis.” He concludes, “The mass spectrometer is designed to analyze only very clean materials.” Another noted international instructor for Agilent, Lee Polite, PhD, MBA, writes, “If you want to be sure and you are in the business of being sure, then separation first always before MS work.”
  4. The other issue is human integrity. There are a lot of analysts who have high standards for themselves, who really care about what they do and want high-quality results. However, there are some who do not share that vision or care. In the worst case, there is fraud. Co-elution from the GC into the MS invites issues surrounding human integrity.
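Professor McLafferty’s point about a mixture producing a linear superposition of spectra can be shown with a few lines of arithmetic. The sketch below is only an illustration: the two stick spectra are made-up numbers, and `simple_match` is a plain cosine similarity scaled to 1000 that stands in for a library match score. It is not the NIST search algorithm.

```python
import math

# Hypothetical stick spectra as {m/z: relative abundance}; the values are invented.
compound_a = {43: 100.0, 58: 40.0, 71: 25.0, 86: 10.0}
compound_b = {44: 100.0, 58: 15.0, 91: 60.0, 119: 30.0}


def mix(spectrum_1, spectrum_2, fraction_1):
    """Linear superposition: the weighted sum of two component spectra."""
    fraction_2 = 1.0 - fraction_1
    masses = set(spectrum_1) | set(spectrum_2)
    return {
        mz: fraction_1 * spectrum_1.get(mz, 0.0) + fraction_2 * spectrum_2.get(mz, 0.0)
        for mz in masses
    }


def simple_match(unknown, reference):
    """Cosine similarity scaled to 0..1000 (a stand-in, NOT the NIST algorithm)."""
    masses = set(unknown) | set(reference)
    dot = sum(unknown.get(mz, 0.0) * reference.get(mz, 0.0) for mz in masses)
    norm_unknown = math.sqrt(sum(v * v for v in unknown.values()))
    norm_reference = math.sqrt(sum(v * v for v in reference.values()))
    return round(1000 * dot / (norm_unknown * norm_reference))


if __name__ == "__main__":
    print("pure A vs library A:", simple_match(compound_a, compound_a))
    for fraction_a in (0.9, 0.7, 0.5):
        co_eluting = mix(compound_a, compound_b, fraction_a)
        print(f"{fraction_a:.0%} A + B vs library A:", simple_match(co_eluting, compound_a))
```

The more of the second compound that co-elutes, the further the combined spectrum drifts from the library entry for the first, and the software has no way of knowing that it is looking at two compounds unless the chromatography has separated them.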

It is a question of could versus should. Could you perform MS without GC, or use GC in a way that doesn’t resolve peaks and does not provide a purified compound to the MS for analysis? Sure. You clearly can. BUT, will you be right in your result? Possibly not. Clearly, best practice would be to use the powerful tool of GC as it is intended and designed, which is to provide purity and specificity in the effluent. Why would you invite or promote the possibility of error if you did not have to? Why would you invite or promote reliance on human integrity? Why, if it is not necessary?

As the video above shows us, there is clearly always a “human factor” in all of this analysis. In fact, there is a lot! To a degree, we are left to the discretion of a human being. Scary.

The reporting that is provided is just a small sliver of what can be provided to reviewing individuals. For example, what reviewing counsel and experts typically get is a one-sheet piece of paper stating a conclusion.

Here is a typical conclusory report that a defense attorney may get. As you can see, there is no detail, just a conclusion.

Conclusory Report

Here is a typical auto-report from GC-MSD Agilent software. Again, not a lot of detail is provided.

But we can get a lot more information from the GC-MSD software such as these from Agilent:

And we can get significantly more information from the NIST search software simply by right clicking on the screen below such as these reports:

A legend of the graphic user interface


I am being very specific with my language here when the question posed is “How do THEY come to a qualitative measure using software?” and is not the question of “How SHOULD you come to a qualitative measure using software?”

Let’s look at how THEY do it first….

In this blog, we have posted on this particular topic before. Therefore, I would encourage you to review that post now, then come back to this post. Professor McLafferty of Cornell once wrote in his book Interpretation of Mass Spectra, “The mass spectrum shows the mass of the molecule and the masses of pieces from it. Thus the chemist does not have to learn anything new– the approach is similar to an arithmetic brain-teaser.”

Mass Spectrometry is only computer assisted pattern recognition

Now, presuming you have reviewed the earlier post above, we can look at some of the most amazing parts of MS work (at least to me) and see how subjective it really is. It is truly open to interpretation.

When we just simply run the NIST spectral library searches, we have some issues of concern.

The Graphic User Interface for GC-MS work

Since all that they do for the qualitative measure in EI-based GC-MS is perform computer-assisted pattern recognition, we need to be sure that the standard the unknown is compared against is from an unimpeachable source. Is it a traceable library of spectra, or is it local forced integration that is anecdotal in nature and therefore not independently adjudicated?

To try to solve this uniformity problem and to homogenize the standards, Professor Fred McLafferty of Cornell University in 1995 began to get his colleagues together and collect spectra. This resulted in the first EI-based mass spectrometry library, which was later converted to what we now enjoy as the NIST/EPA/NIH Mass Spectral Library. The most current version of the EI library, NIST ’08, includes:

  • mainlib (main EI MS library)=191,436
  • replib (replicate spectra)=28,307
  • nist_salts (EI Salt Library)=717
  • nist_msms (MS/MS Library)=14,802 spectra of 5,308 precursor ions, of which 3,898 are positive ions and 1,410 are negative ions
A legend of the graphic user interface

On the graphic user interface as shown above, there are three very important places that we want to know about in order to judge the validity of this qualitative measure: the text info for the selected hit spectrum, the hit histogram, and the hit list.

All three of these key areas are affected by the particular type of search algorithm that is used. Is it an identity search or a similarity search?

  • An “Identity” search is designed to find exact matches of the compound that produced the submitted spectrum and therefore presumes that the unknown compound is represented in the reference library.
  • A “Similarity” search is optimized to find similar compounds and is intended for use when a compound cannot be identified by the “Identity” search.

What happens in the real world is that the analyst may look at the Hit Histogram and then the Hit List. They generally ignore the other screens and rarely look at the Text Info for the Selected Hit Spectrum, although it is perhaps the most important window.

  • The Hit List has different columns of information. The Match Factor is a number in arbitrary units where a perfect match is 1000. It is the comparison between the unknown and the library (direct match). The probability value for a hit is derived assuming that the compound is represented by a spectrum in the libraries searched. It employs only the differences between adjacent hits in the hit list to get the relative probability that any hit in the hit list is correct. While many state scientists discount the probability value, it is arguably the most “true” (correct) value that we can use to judge the value of the qualitative judgment call. Although there are no universal written criteria, it is thought that a match score of 950 or greater is an excellent match; 900-950 is a good match; 800-900 is a fair match; less than 800 is a very poor match. The Hit List can be thought of as a ranked top-100 list of the compounds that the computer thinks the unknown could be.
  • The Hit Histogram is the number of hits vs. their Match Factors. It is displayed in the pane located just above the Hit List.
  • The “Text Info for the Selected Hit Spectrum” will give us the name of the compound, its diagnostic ions, its CAS registry number, its NIST number (if it is a NIST traceable spectrum) and most importantly the source of where the standard comes from.
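The informal match score bands mentioned for the Hit List can be written down as a small function, which also exposes how arbitrary the cut-offs are. This is only a sketch: the function name `describe_match` and the example hit list are hypothetical, and because the quoted bands share their endpoints (900 and 950 each appear in two bands), the sketch assumes a shared endpoint belongs to the higher band.

```python
def describe_match(match_factor):
    """Map a 0..1000 match factor to the informal bands quoted in the text.

    Assumption: a score sitting exactly on a shared endpoint (900 or 950)
    is placed in the higher band, since the text does not say which applies.
    """
    if not 0 <= match_factor <= 1000:
        raise ValueError("match factor must be between 0 and 1000")
    if match_factor >= 950:
        return "excellent"
    if match_factor >= 900:
        return "good"
    if match_factor >= 800:
        return "fair"
    return "very poor"


# Hypothetical hit list of (compound name, match factor); names and scores are invented.
hit_list = [("compound X", 962), ("compound Y", 915), ("compound Z", 790)]

if __name__ == "__main__":
    for name, score in sorted(hit_list, key=lambda hit: hit[1], reverse=True):
        print(f"{name}: {score} ({describe_match(score)})")
```

Notice that a one-point change in the score can move a hit from one label to another, which is one more reason the label alone should never be treated as an identification.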

The NIST Spectral Library search is very popular. It is a shortcut around thinking through the fundamentals that mass fragmentation is based upon, which is nothing more than good old-fashioned acid-base chemistry. Exclusive reliance upon this simplified method of analysis as a means of identification can lead to an improper answer. There has always been an upstream-downstream problem with the NIST spectral library, where over the years inferior spectra have made their way into the official NIST library. In fact, Professor McLafferty noted in his book Interpretation of Mass Spectra, “[O]ver the last decade approximately 60,000 errors have been corrected in the reference file [of the NIST library].” Just consider the below as an example:

Upstream-downstream problems exist in the NIST spectral library

At least one organization has published that simple reliance on any mass spectral library is insufficient in and of itself. In section 8.2.10 of the SOFT/AAFS Forensic Toxicology Laboratory Guidelines (2006 version), we find the following language:

8.2.10 In routine practice, interpretation of GC/MS-EI full scan mass spectra is performed by the instrument’s software as a semi-automated search against a commercial or user-compiled library. The quality of the match or “fit” may be aided by the factor that is generated, either as a ratio or percentage, where 1.0 or 100% are “perfect” matches. However, such “match factors” must be used as guides only and are not sufficiently reliable to be used as the final determinant of identification. Final review of a “library match” must be performed by a toxicologist with considerable experience in interpreting mass spectra; experience and critical judgement are essential. Interpretation, at a minimum, should be based on the following principles:
For a match to be considered “positive”, all of the major and diagnostic ions present in the known (reference) spectrum must be present in the “unknown”. Occasionally, ions that are in the reference spectra may be missing from the “unknown” due to the low overall abundance of the mass spectrum. If additional major ions are present in the “unknown” it is good practice to try to determine if the “extra” ions are from a co-eluting substance or “background” such as column bleed or diffusion pump oil. Examination of reconstructed ion chromatograms of the suspected co-eluting substance relative to major ions from the reference spectrum will help to determine this.

[Thank you to Dr. Stefan Rose, MD for pointing out the reference to the SOFT/AAFS guideline.]
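The minimum check described in section 8.2.10 (every major ion of the reference must be present in the unknown, and extra major ions in the unknown must be explained) can be sketched as follows. The spectra are invented, the function name `compare_ions` is hypothetical, and the 10% cut-off for what counts as a “major” ion is an assumption, because the guideline does not give a number.

```python
def compare_ions(reference, unknown, major_threshold=10.0):
    """Compare major ions between a reference spectrum and an unknown spectrum.

    Spectra are {m/z: relative abundance} with the base peak at 100.
    An ion is treated as "major" when its relative abundance is at least
    `major_threshold` (an assumed cut-off, not taken from the guideline).

    Returns (missing, extra):
      missing - major reference ions that do not appear in the unknown at all
      extra   - major unknown ions that do not appear in the reference at all
    """
    reference_major = {mz for mz, ab in reference.items() if ab >= major_threshold}
    unknown_major = {mz for mz, ab in unknown.items() if ab >= major_threshold}
    missing = sorted(reference_major - set(unknown))
    extra = sorted(unknown_major - set(reference))
    return missing, extra


# Hypothetical spectra, invented for illustration.
reference_spectrum = {58: 100.0, 91: 35.0, 134: 12.0, 65: 8.0}
unknown_spectrum = {58: 100.0, 91: 30.0, 134: 10.0, 207: 22.0}

if __name__ == "__main__":
    missing, extra = compare_ions(reference_spectrum, unknown_spectrum)
    print("major reference ions missing from unknown:", missing)
    print("major unknown ions not in reference:", extra)
```

An “extra” ion does not by itself defeat a match, but it is the analyst’s job to decide whether it comes from a co-eluting substance or from background, which is precisely the judgment call the guideline says cannot be left to the match factor.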

Earlier in the same guideline we find the following language:

8.2.9 Where mass spectrometry is used in selected ion monitoring mode for the identification of an analyte, whether as part of a quantitative procedure or not, the use of at least one qualifying ion for each analyte and internal standard, in addition to a primary ion for each, is strongly encouraged where possible. Commonly used acceptance criteria for ion ratios is ±20% relative to that of the corresponding control or calibrator. However, it is recognized that some ion ratios are concentration dependent and that comparison to a calibrator or control of similar concentration may be necessary, rather than an average for the entire calibration. Ion ratios for LC/MS assays may be more concentration and time dependent than for GC/MS and therefore acceptable ion ratio ranges of up to ±25% or 30% may be appropriate.
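The ±20% ion ratio criterion in section 8.2.9 is simple arithmetic, and it helps to see it written out. In this sketch the abundances and the calibrator ratio are invented numbers and the function names are hypothetical; the only thing taken from the guideline is the idea of a relative tolerance around the ratio seen in a calibrator or control.

```python
def ion_ratio(qualifier_abundance, primary_abundance):
    """Ratio of the qualifying ion's abundance to the primary ion's abundance."""
    return qualifier_abundance / primary_abundance


def ratio_acceptable(sample_ratio, calibrator_ratio, tolerance=0.20):
    """True when the sample ratio is within +/- tolerance (relative) of the calibrator ratio."""
    return abs(sample_ratio - calibrator_ratio) <= tolerance * calibrator_ratio


if __name__ == "__main__":
    calibrator = 0.50  # hypothetical ratio measured in the calibrator
    # With a 20% relative tolerance the acceptance window is 0.40 to 0.60.
    for qualifier, primary in ((450, 1000), (650, 1000)):
        ratio = ion_ratio(qualifier, primary)
        print(f"ratio={ratio:.2f} acceptable={ratio_acceptable(ratio, calibrator)}")
```

Widening `tolerance` to 0.30, as the guideline contemplates for some LC/MS assays, changes which samples pass, so the tolerance actually applied should always be disclosed along with the result.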

However, the ultimate question is: “How SHOULD we come to a qualitative measure using software?”

The answer is simple and old: the acid-base chemistry approach as articulated by Drs. Fred W. McLafferty and Frantisek Turecek in their book Interpretation of Mass Spectra. In this book they set out a “Standard Interpretation Procedure” which provides a useful and universal standard of how to use the powerful tool of Mass Spectrometry correctly to arrive at a qualitative measure. The “Standard Interpretation Procedure” is as follows:

    1. Study all available information (spectroscopic, chemical, sample history). Give explicit directions for obtaining spectrum. Verify m/z assignments.
    2. Using isotopic abundances where possible, deduce the elemental composition of each peak in the spectrum; calculate rings plus double bonds.
    3. Test molecular ion identity; must be highest mass peak in spectrum, odd-electron ion, and give logical neutral losses. Check with CI or other soft ionization. (Don’t just rely on the EI-based GC-MS result)
    4. Mark “important” ions: odd-electron and those of highest abundance, highest mass, and/or highest mass in a group of peaks.
    5. Study general appearance of spectrum; molecular stability, labile bonds.
    6. Postulate and rank possible structural assignments for:
        (a) important low-mass ion series;
        (b) important primary neutral fragments from M+ indicated by high-mass ions (loss of largest alkyl favored) plus those from secondary fragmentations indicated by CAD spectra;
        (c) important characteristic ions.
    7. Postulate molecular structures; test against reference spectrum, against spectra of similar compounds, or against spectra predicted from mechanisms of ion decompositions.
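The “rings plus double bonds” calculation in step 2 is a good example of how much of this procedure is plain arithmetic. For a composition with c carbons, h hydrogens and n nitrogens, the value is c - h/2 + n/2 + 1, with halogens counted like hydrogens and divalent atoms such as oxygen ignored. The function name below is hypothetical and the sketch covers only these common elements.

```python
def rings_plus_double_bonds(c, h, n=0, halogens=0):
    """Rings plus double bonds: c - (h + halogens)/2 + n/2 + 1.

    Oxygen and sulfur (divalent) do not change the value, so they are not
    arguments. A whole-number result is expected for an odd-electron ion such
    as a molecular ion; a result ending in .5 indicates an even-electron ion.
    """
    return c - (h + halogens) / 2 + n / 2 + 1


if __name__ == "__main__":
    print("benzene  C6H6      :", rings_plus_double_bonds(c=6, h=6))
    print("caffeine C8H10N4O2 :", rings_plus_double_bonds(c=8, h=10, n=4))
    print("ethyl cation C2H5+ :", rings_plus_double_bonds(c=2, h=5))
```

If the elemental composition proposed for a peak gives a value that makes no chemical sense for the structure the library suggests, that is a reason to doubt the library hit, no matter how high the match score is.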

Although there is not formally an eighth step, Professor McLafferty reminds us that there is another important aspect to remember in all of this if we want to make sure that we are reporting out valid and high quality results with great confidence. He writes:

Remember that abundance values can vary by more than an order of magnitude between instruments; so you must measure a reference spectra of the postulated compound under the same instrumental conditions used with the unknown in order to have high confidence in the answer.

Later he writes:

“The most reliable match is obtained by running the unknown and reference mass spectra under closely identical experimental conditions on the same instrument.”

So even the “father” of the NIST library requires verification by running reference materials to confirm that spectra are in fact repeatable under the specific conditions of the particular instrument. Dr. Marvin C. McMaster has plainly expressed the same notion in the following thoughts:

One of the problems with spectral library databases is that some of their structures are inaccurate or just plain wrong. The original interpretation of their structures may have been incorrect or mistakes may have been made in entering them. The previous Wiley/NIST spectral database with 225,000 compounds was thought to have up to 8% incorrect structures.

There are many aspects of GC-MS that can lead to erroneous identifications beyond this upstream-downstream problem such as:

  1. Were the tuning conditions used to prepare the reference standard in the library the same as, or different from, those under which the unknown was run?
  2. The unknown and the reference spectra could have been run on a different type of mass spectrometer with different mass linearity. For example, some of the spectra in the library are so old that they were run on magnetic sector instruments rather than quadrupoles.
  3. The reference standard in the library may not have been from a pure compound. If the reference spectra in the library were run on impure compounds, then there will likely be additional fragment peaks due to the overlapping and the lack of resolution between compounds.

GC-MS work should be reserved for highly trained, credentialed scientists who have a fundamental and strong understanding of chemistry, not someone who can simply follow a procedure, right-click or double-click, and read a screen without giving any thought to it all.