Independent Consultants in Environmental and Forensic Chemistry
Volume 3, Issue 4 Fall 1999
How can you be sure your results are accurate?
One of the toughest questions to answer is whether the result of an analytical test is accurate, that is, how close the result is to the true answer. The problem with an environmental sample is that the true answer is not known. Analytical chemists, however, have many precautions and techniques available to estimate the accuracy of a result.
To monitor the accuracy of analyses, QA/QC programs should require the introduction of QC samples into the analytical scheme. These QC samples routinely include matrix spikes, blank spikes, and laboratory control samples (LCS). The "true" answer for each of these types of samples is known, so the analyst knows immediately if accuracy is a problem. However, unless the LCS is from an external source, each of these types of samples is generated internally, and the laboratory can be very consistent in its inaccuracy. This consistency would lead the laboratory to believe that its results are accurate when, in fact, they may not be.
Another means of monitoring the accuracy of environmental analytical measurements is the proficiency testing (PT) or performance evaluation (PE) sample. A PT sample is a sample of the matrix (air, water, soil, biota) with a known amount of the analyte present. A PT sample can be "single blind" or "double blind." A "single blind" PT sample is one the laboratory recognizes as a test but for which it still does not know the correct answer. With a "double blind" PT sample, the laboratory does not know how much analyte is present in the sample, nor does it know that it is a PT sample. In concept, either PT sample is a powerful tool for monitoring the accuracy of a laboratory's measurements. However, in practice, a single blind PT sample is often handled differently than other samples in that it is analyzed multiple times, accompanied by many other QC samples, to ensure that the correct answer is obtained.
Consequently, to assess the accuracy routinely provided by a laboratory for field samples, a double blind PT sample can be introduced into the sample train in the field.
For example, a double blind PT sample for PCB 1242 can be produced using soil(s) from the site. The soil can be homogenized and PCB 1242 added, if necessary. The homogenized soil is then analyzed for PCB 1242 to determine a statistically valid concentration with an error term. The concentration(s) of interest or concern (e.g., 10 ppm) are targeted. Other concentration levels are recommended so that the laboratory cannot easily determine it is being tested; therefore, 5 mg/kg, 25 mg/kg, and PCB-free soil could be produced as PT samples as well. These PT samples would be added to the sample load in the field, with chain-of-custody documentation, as if they were true field samples. Sample identifications for the PT samples should be consistent with those of actual field samples to further disguise their presence.
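As a sketch of how that "statistically valid concentration with an error term" might be established (all replicate values below are invented for illustration, not taken from the newsletter), the reference value can be reported as the mean of replicate analyses with a 95% confidence interval:

```python
import math
import statistics

# Invented replicate results (ppm) for the homogenized PT soil
replicates_ppm = [9.6, 10.4, 9.9, 10.2, 9.8, 10.1, 10.3, 9.7]

mean = statistics.mean(replicates_ppm)
s = statistics.stdev(replicates_ppm)     # sample standard deviation
n = len(replicates_ppm)
t_95 = 2.365                             # two-sided t value, 95%, n-1 = 7 df
half_width = t_95 * s / math.sqrt(n)     # 95% confidence half-interval

print(f"Reference value: {mean:.2f} +/- {half_width:.2f} ppm (95% CI)")
```

The laboratory's reported results for the double blind samples can then be judged against this interval.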
Results for the PT samples should be obtained early in the project, when any problems can be quickly corrected. Obtaining PT sample results at the end of a project will be of little use in identifying problems early and preventing continued inaccuracies during the analyses. The use of double blind PT samples to monitor the laboratory's accuracy may, in the long run, save time, money, and aggravation.
Are field analytical measurements any good?
Historically, the perception has been that measurements made in the field are not as accurate or precise as those made in a permanent or fixed laboratory. This is a misconception. No reason exists to automatically reject measurements made in the field or to believe that they are inferior to measurements made in a fixed laboratory. The additional handling required to package and ship samples to a fixed laboratory increases the probability that artifacts will enter into the measurement results. On-site analyses minimize the handling of samples and can give results more representative of actual conditions.
Frequently, field measurements are considered inferior to fixed laboratory results because they lack the necessary supporting documentation and controls that are routinely associated with a fixed laboratory. However, this is by design. The results were meant "only for screening." Chains of custody are not necessary. Full (or any) instrument calibration is not necessary. Standing operating procedures are not necessary.
Quality control samples are not necessary. Standard reference materials are not necessary. Instrument maintenance is not necessary. Operator training is not necessary. After all, the results are only for screening, and critical decisions will not be based on the results. Then why bother doing them?
The truth is that decisions will be made based on the results of the field measurements. Frequently, the argument is that the measurement only has to be close, not precisely accurate. But how close is close, and how do you know how close you are? If the operator has had inadequate training, how do you know he knows how to do the test at all? If a chain of custody is not generated, how do you even know that you have analyzed the right sample? And what about those times you made field measurements that you later wished had been more accurate?
On-site measurements can be made and documented properly with little extra effort. They can be fast, accurate, precise, cost effective, and defensible in a court of law. If good analytical practices are implemented, field measurements can be at least as good as, if not better than, those generated in a fixed laboratory. This means that all of the quality assurance (QA) parameters must be the same as those of a fixed laboratory for the same measurement.
Is your laboratory playing by the rules?
At the Waste Testing and Quality Assurance (WTQA) symposiums in 1998 and 1999, Joseph Solsky (USACE) gave insightful and sobering presentations about "questionable" practices in environmental analytical laboratories. Although these questionable practices do not rise to the level of fraud, they can have profound effects on the real and perceived quality of the data reported by the laboratory, i.e., the data that you use to make decisions about your site, the data that you present to the regulatory agency, the data that you count on!
For example, many of today's analyses are conducted in accordance with EPA's Test Methods for Evaluating Solid Waste (SW-846). A method states that, to calibrate an instrument, "a minimum of five points should be used to establish the initial calibration curve." The laboratory analyzes eight standards... and then chooses the five "best" results to establish the linearity of the curve. Is this compliant with the method? Yes. Is it technically sound? Yes, if the excluded standards are at the upper and/or lower end of the curve, thus limiting the linear range. However, excluding a valid data point from the middle of the curve because it doesn't "fit" is, quite simply, technically wrong. Omitting that point creates a perceived linearity contrary to the evidence. But it is still compliant with the "rules."
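The arithmetic of this cherry-picking can be sketched numerically. In the invented example below, linearity is judged by the percent relative standard deviation (%RSD) of the calibration factors, one common SW-846-style linearity check (the specific criterion and all data here are illustrative assumptions, not quoted from the method):

```python
import statistics

# Invented 8-point initial calibration; the 20-unit standard doesn't "fit"
conc = [1, 2, 5, 10, 20, 50, 100, 200]         # standard concentrations
resp = [10, 20, 50, 100, 90, 500, 1000, 2000]  # instrument responses

def rsd_percent(factors):
    """Percent relative standard deviation of the calibration factors."""
    return 100 * statistics.stdev(factors) / statistics.mean(factors)

cal_factors = [r / c for r, c in zip(resp, conc)]
rsd_all = rsd_percent(cal_factors)       # ~20.9%: fails a <=20% linearity check

# The "five best": quietly drop the mid-curve point (plus two others)
keep = [0, 2, 3, 5, 6]
rsd_five = rsd_percent([cal_factors[i] for i in keep])  # 0%: looks perfectly linear
```

Dropping endpoints would honestly narrow the linear range; dropping the mid-curve point simply hides the evidence of nonlinearity.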
For continuing calibration, the method says "if the average of the [percent difference (%D)] responses for all analytes is within 15%, then the calibration has been verified." If a continuing calibration standard with 15 target analytes has %Ds for 10 of them at 5%; two at 10%; one at 20%; one at 40%; and one at 60%, the average %D is 12.7%. The laboratory can report that the calibration is verified, right? According to the method language, yes! But if the target analyte with a %D of 60% is the key contaminant at your site, you would have a false confidence in the reliability of your results. The average %D is meaningless with respect to any single analyte - only the individual %D can tell you about the accuracy of the response for the compound that is of concern to you.
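The averaging trap is simple arithmetic; a short sketch with the same %D values:

```python
# %D values for the 15 target analytes in one continuing calibration standard
percent_d = [5] * 10 + [10, 10, 20, 40, 60]

average = sum(percent_d) / len(percent_d)
print(f"average %D = {average:.1f}%")               # 12.7%: "calibration verified"
print(f"worst single analyte = {max(percent_d)}%")  # 60%: unreliable for that one
```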
In gas chromatography (GC), a compound is identified by its retention time (RT) - the time elapsed from introduction of the sample into the GC to the compound's appearance at the detector. A response must be observed within a specific time frame, or window, in order to identify a particular compound. These windows are established by the laboratory for each analyte by, according to the method, analyzing at least three standards over a three-day period. The average and standard deviation of the RT are calculated. The window for the analyte is plus or minus three times the standard deviation, or plus or minus 0.03 minutes (total window width of 3.6 seconds), whichever is greater. If the RT windows are too narrow, the likelihood of false negatives (i.e., reporting an "ND," or non-detect, when an analyte is actually present) is increased. More commonly, the RT windows are too wide, and false positives (i.e., reporting a positive result when the target analyte is not present) result.
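The window calculation described above can be sketched as follows (the retention times are invented; the ±3-standard-deviation rule with a ±0.03-minute floor is as stated in the text):

```python
import statistics

# Invented RTs (minutes) for one analyte: three standards over three days
rt_minutes = [12.33, 12.34, 12.33]

mean_rt = statistics.mean(rt_minutes)
three_sd = 3 * statistics.stdev(rt_minutes)  # ~0.017 min for these data
half_width = max(three_sd, 0.03)             # floor: +/- 0.03 min (3.6 s total width)
window = (mean_rt - half_width, mean_rt + half_width)
```

Here three standard deviations fall below the floor, so the ±0.03-minute minimum applies; a laboratory using ±0.30 minutes instead would accept peaks anywhere in a 36-second span.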
RT windows as wide as ±0.30 minutes (total window width of 36 seconds) have been observed. Not surprisingly, many false positive results have been reported under these circumstances. Frequently, the analyst simply relies on the data system print-out and reports whatever it says, even in GC/MS analyses where a mass spectrum is available to verify (or refute) the reported identification.
SW-846 methods are designed to provide guidance to qualified and experienced analytical chemists who will implement them with integrity, attention to detail, and common sense. All too often, they are performed by analytical technicians who try to simply "follow the rules" without understanding how the method works. When asked to explain practices like those described here, an all-too-common response is "but that's how everyone does it" or "the method doesn't say I can't do it that way."
As we move away from "cookbook" methods (like the contract laboratory program methods) and closer to performance based measurement systems (PBMS), these kinds of questionable practices can be expected to increase. To avoid them (or to deal with them, as the case may be) you, too, may need to learn the rules of the game.
Would you like a date with this model?
Ever since the desktop personal computer (PC) became available for business applications, the industry has advanced to a point where complicated calculations can be performed by a beige box on, or under, your desk in a matter of minutes, if not seconds. The environmental field, with its reams of geological, hydrological, and chemical data, was particularly susceptible to the use of the PC for trying to make practical sense of the collected data. The ability of the PC to rapidly evaluate numerous "what-if" situations and to graphically display the results on a screen has allowed the environmental professional to not only show the current state of contamination at a site but also extrapolate to what was and what will be. Or is it more like what could have been or could be? Or is it none of the above?
These extrapolations are models, a form into which the known data fit. In the environmental field, these models are often developed and applied to groundwater flows to determine where and when a contaminant release occurred (age-dating) or when it will reach a given location at a given concentration. These projections are oftentimes based on a set of data collected at a single given point in time, i.e., when the samples were collected, when water level measurements were taken, when pump tests were conducted, etc. What we have is a model for which we have the solution at one point in time. To obtain the solution at another point in time, some assumptions must be made.
For example, if we are attempting to model the flow of groundwater contamination, one of the first assumptions is that the groundwater flow has not changed with time. Changed from what? From the current flow rate? How do we know what that is?
Estimates of groundwater flow direction and velocity are based on a variety of factors including groundwater elevations, soil conductivity, leaks to other aquifers or bodies of water, etc. For a given site, depending on the sophistication of the model, an average conductivity could be used for the entire site, or the site could be segmented around each determination of the conductivity. The accuracy of the model is then dependent on how well we can define the conductivity over the site. Even given multiple determinations of conductivity, the modeler must still assign an area over which each conductivity is operative. Conceivably, the modeler could assume a conductivity that was never measured in order to obtain a model that fits the observed results. These assignments can greatly affect the accuracy of the resulting depiction of current groundwater flow as well as any extrapolations to the past or the future.
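To make that sensitivity concrete, here is a back-of-the-envelope sketch using Darcy's law for seepage velocity, v = K·i/n (hydraulic conductivity K, gradient i, effective porosity n). All parameter values are invented assumptions, not site data from the newsletter:

```python
def travel_time_years(distance_m, k_m_per_day, gradient, porosity):
    """Plume travel time from Darcy seepage velocity v = K * i / n."""
    velocity_m_per_day = k_m_per_day * gradient / porosity
    return distance_m / velocity_m_per_day / 365.0

# Same 500 m flow path, two plausible conductivity assignments for one site:
t_slow = travel_time_years(500, k_m_per_day=1.0, gradient=0.005, porosity=0.3)
t_fast = travel_time_years(500, k_m_per_day=10.0, gradient=0.005, porosity=0.3)
# A tenfold difference in assigned K yields a tenfold difference in the age-date.
```

If the modeler assumes a conductivity that was never measured, the extrapolated release date moves with it.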
The use of computer models to assist in the elucidation of environmental problems is a powerful tool. However, limitations to the technique still exist. Extreme caution must be exercised in relying on the output unless the assumptions, and particularly the bases and effects of those assumptions, are completely understood. Perhaps every model should be accompanied by a list of the assumptions made, no matter how trivial, to develop it. Also, we must remember that a model is a scenario consistent with the available data, but that scenario may not be, and probably is not, unique. GIGO still stands for "garbage in, garbage out," not "garbage in, gospel out."