Data Analysis, Error Estimation & Treatment
1. Systematic and Random Errors
Errors are an inevitable consequence of making measurements, although 'uncertainty' is perhaps a better word for them.
A systematic error is one which occurs with the same sign in every measurement, e.g. times measured on a slow-running stopwatch. Finding and eliminating systematic errors is difficult, because repeating the measurement does not reveal them; they can often be found through experience in judging whether a given result 'makes sense', or by swapping the apparatus.
Random errors arise from unpredictable fluctuations in the measurement, caused by the apparatus or by the observer. Random errors can be assessed by making multiple measurements of the same observable. This not only reveals the presence of random errors, but also allows their magnitude to be estimated.
The terms accuracy and precision are often used synonymously, although this is incorrect.
The term precision is used to describe the statistical uncertainty of our results, i.e. it describes the extent of our random errors.
On the other hand, accuracy describes the extent to which our observations agree with the true value of the observable.
It is therefore possible to have an experiment that is very precise, in that it consistently gives the same result, but inaccurate in that this result disagrees with the correct answer because of systematic errors.
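To make the distinction concrete, the short Python sketch below compares two hypothetical sets of stopwatch readings of an interval whose true length is assumed to be 10.00 s. The offset of the mean from the true value reflects accuracy (systematic error), while the sample standard deviation reflects precision (random error). The data and `TRUE_VALUE` are invented purely for illustration.

```python
import statistics

def bias_and_spread(readings, true_value):
    """Return (mean - true_value, sample standard deviation) for a set of readings."""
    mean = statistics.mean(readings)
    spread = statistics.stdev(readings)  # uses n - 1 in the denominator
    return mean - true_value, spread

TRUE_VALUE = 10.00  # s, assumed known for this illustration only

# Hypothetical data: a slow-running stopwatch gives consistent but low readings.
precise_but_inaccurate = [9.80, 9.81, 9.79, 9.80, 9.80]
# Hypothetical data: scattered readings centred on the true value.
accurate_but_imprecise = [9.7, 10.3, 9.9, 10.2, 9.9]

for label, data in [("precise, inaccurate", precise_but_inaccurate),
                    ("accurate, imprecise", accurate_but_imprecise)]:
    bias, spread = bias_and_spread(data, TRUE_VALUE)
    print(f"{label}: bias = {bias:+.2f} s, spread = {spread:.2f} s")
```

The first set has a small spread but a bias of about -0.20 s that no amount of repetition would reveal; the second has no bias but a much larger spread.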
3. Multiple Measurements and Approximating Errors
For a set of $n$ measurements $x_1, x_2, \ldots, x_n$ with a Gaussian distribution of errors, the arithmetic mean of the data is:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
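We now need to address the important question of the precision of our estimate of the mean. The more measurements we make, the more precise our estimate of the mean should become. One estimate of the error on the mean is the standard error, $s_m$:

$$s_m = \frac{s}{\sqrt{n}}, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

where $s$ is the standard deviation of the individual measurements.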
However, this estimate assumes that the measurements are an accurate sample of how the measured values vary about the true value, and this assumption becomes less valid the fewer measurements are made. The error usually quoted is therefore the 95% confidence limit, $t_{95}\,s_m$, where the value of $t_{95}$ depends on $n - 1$, the number of degrees of freedom. The value of $t_{95}$ for your number of degrees of freedom is read from a table of Student's t-values.
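The steps above can be sketched in Python using only the standard library. Because the standard library has no Student's t distribution, the two-sided 95% t-values for 1 to 10 degrees of freedom are hard-coded here from a standard table; for more measurements you would need to extend the table. The readings and the helper name `mean_with_confidence` are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student's t-values, keyed by degrees of freedom (n - 1).
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
       6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def mean_with_confidence(readings):
    """Return (mean, standard error s_m, 95% confidence limit t95 * s_m)."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings to estimate the spread")
    dof = n - 1
    if dof not in T95:
        raise ValueError(f"no t-value tabulated for {dof} degrees of freedom")
    mean = statistics.mean(readings)
    s = statistics.stdev(readings)  # sample standard deviation (n - 1 denominator)
    s_m = s / math.sqrt(n)          # standard error of the mean
    return mean, s_m, T95[dof] * s_m

readings = [9.8, 10.1, 10.0, 9.9, 10.2]  # hypothetical repeat measurements
mean, s_m, ci95 = mean_with_confidence(readings)
print(f"mean = {mean:.2f}, s_m = {s_m:.3f}, result = {mean:.1f} ± {ci95:.1f}")
```

With five readings there are four degrees of freedom, so the 95% limit is 2.776 times the standard error, noticeably wider than $s_m$ alone.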
It is not always possible to make multiple observations of a given observable because of, say, lack of time or insufficient quantities of reagents. In these situations, the error on a given measurement may be estimated using common sense, e.g. a distance measured on a metre ruler is arguably precise to ± 1 mm. Such subjective estimates of uncertainty must be made and recorded at the time of measurement; they depend on details of scale marking and on the eyesight and dexterity of the observer, and cannot satisfactorily be made after leaving the laboratory.
4. Recording and Tabulating Experimental Data
When recording data, write your observations carefully and completely. Do not use small scraps of paper. Label columns of data correctly and record ALL your results: nothing which is observed is 'wrong'. Only after careful assessment should a data point be discarded.
Label columns in tables and the axes of graphs by giving the physical quantity, followed by / and the appropriate units,
e.g. Length: L / m; Temperature: T / K; Surface area: A / m²
Note:
- If the logarithm, square root, or some other function of a quantity is to be represented, the units must appear within the argument, e.g. ln[p/(N m⁻²)].
- In cases where the quantity to be plotted is an inverse, say the inverse of absolute temperature, the axes may be labelled 1/(T/K); a more elegant way of writing this is K/T.
- If very small or very large numbers are measured in an experiment, say times of 0 s, 500 s, 1000 s, 1500 s, ..., then writing out all the zeros is time-consuming and untidy. Instead, the data should be presented as 0.0, 0.5, 1.0, 1.5 and labelled as 10⁻³ t/s. This means that the figures on the axis are 10⁻³ × the time in seconds. For example, if the number 0.5 is in a table with the column labelled 10⁴ t/s, then the actual value of this observation is 0.5 × 10⁻⁴ s = 0.00005 s.
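The scaling convention can be checked with two one-line conversions. This is a minimal sketch with hypothetical helper names: if a column is labelled 10^k t/s, each entry equals 10^k times the time in seconds, so the time is recovered by dividing the entry by 10^k.

```python
def to_table_entries(values, k):
    """Scale raw values for a column labelled '10^k x / unit'."""
    return [v * 10**k for v in values]

def from_table_entry(entry, k):
    """Recover the physical value from an entry in a column labelled '10^k x / unit'."""
    return entry / 10**k

times = [0, 500, 1000, 1500]          # t / s
print(to_table_entries(times, -3))    # entries for a column labelled 10^-3 t/s
print(from_table_entry(0.5, 4))       # entry 0.5 in a column labelled 10^4 t/s, in s
```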
Important Note: Always give the units when quoting a result.
It is pointless to quote a result to more significant figures than the precision of your measurement justifies. For example, if you are measuring a distance using a ruler, then a result to the nearest mm is probably the best that can be achieved. (Note: quote the error to the same number of decimal places as the measurement.) Thus:
L = (0.090 ± 0.001) m, not: L = (0.09 ± 0.001234) m
Furthermore, when using your measurement in an equation to calculate other results, there is no point in writing the new number with more significant figures than were originally considered justified; just because your calculator gives the extra significant figures does not mean you should use them all!
In practice, it is usual to quote as significant figures all the digits that are certain, plus the first uncertain one.
It is recommended that extra significant figures be carried through any mathematical manipulation of your data, to avoid rounding errors, but that the final answer again be quoted to the appropriate level.
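As a sketch of these rules, the hypothetical helper below rounds an uncertainty to one significant figure (a common convention, assumed here) and then rounds the value to the same decimal place, so that the last quoted digit is the first uncertain one. It relies on Python's built-in `round`, and full precision should be kept until this final step.

```python
import math

def format_result(value, error):
    """Round error to one significant figure and value to the same decimal place."""
    if error <= 0:
        raise ValueError("error must be positive")
    exponent = math.floor(math.log10(error))  # power of ten of the error's first digit
    decimals = -exponent
    rounded_error = round(error, decimals)
    # Rounding can carry into the next digit (e.g. 0.0096 -> 0.010); keep one figure.
    if rounded_error >= 10 ** (exponent + 1):
        decimals -= 1
        rounded_error = round(error, decimals)
    rounded_value = round(value, decimals)
    places = max(decimals, 0)
    return f"{rounded_value:.{places}f} ± {rounded_error:.{places}f}"

print(format_result(0.0903, 0.001234))  # the ruler example, in m
```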
Graphs can be plotted using the Excel Spreadsheet package on the computers in the laboratory. When plotting your graphs, remember:
- Label the axes in the same way as table columns.
- Display error bars.
- The data should fill the whole page, and not just one corner.
- Try to manipulate formulae to give a straight-line relationship.
- A fitted straight line should fit all experimental points to within their errors.
The analysis of experimental results often involves drawing a straight line through the data points. The Linear Regression (or Least Squares) technique allows such a line to be calculated, and this procedure is available in Excel (as a Data Analysis option under Tools). In performing a Least Squares calculation, we are making a number of assumptions:
- A linear relationship exists between our two sets of data.
- All points are equally precise.
- The y-data contains all the error.
- There are no obvious outliers in the data set.
Note: If you believe, by inspection of your graph, that one point is an outlier, then it should be excluded from the regression calculation, and a note made to that effect in your write-up.
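Under the assumptions listed above, the least-squares slope and intercept, together with their standard errors, can be computed directly. The sketch below uses the standard unweighted formulas, with the residual standard deviation taken over n - 2 degrees of freedom; it is an illustration rather than a replacement for Excel's regression output, and the data and the helper name `least_squares` are hypothetical. An excluded outlier is simply left out of `x` and `y`.

```python
import math

def least_squares(x, y):
    """Unweighted straight-line fit y = a + b*x, assuming all the error is in y.

    Returns (slope b, intercept a, standard error of b, standard error of a).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    # Residual standard deviation of y about the line, with n - 2 degrees of freedom.
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(ss_res / (n - 2))
    s_b = s_yx / math.sqrt(sxx)
    s_a = s_yx * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))
    return b, a, s_b, s_a

x = [0.0, 1.0, 2.0, 3.0]  # hypothetical data
y = [1.0, 3.0, 4.0, 7.0]
b, a, s_b, s_a = least_squares(x, y)
print(f"slope = {b:.2f} ± {s_b:.2f}, intercept = {a:.2f} ± {s_a:.2f}")
```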
The final result of an experiment usually depends on the combination of a number of measured properties, each of which has some associated error. The combined error may be calculated by using either mathematical relationships, or worst case scenarios:
- Mathematical relationships
- Worst case scenarios
e.g.
- If y = a / b, where a = (1.2 ± 0.3) kg and b = (7.0 ± 0.6) m³
- Then y = 1.2 / 7.0 = 0.17 kg/m³
- The worst case values of y are (1.2 + 0.3) / (7.0 - 0.6) = 0.23 kg/m³ and (1.2 - 0.3) / (7.0 + 0.6) = 0.12 kg/m³
- The differences between the calculated and worst case values are 0.06 kg/m³ and 0.05 kg/m³
- Therefore, taking the larger difference, y = (0.17 ± 0.06) kg/m³
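The worst-case procedure generalises to any formula: evaluate it at every combination of each input plus or minus its error, and take the largest departure from the central value. The sketch below assumes the formula is monotonic in each input over the error range (true for a / b with positive a and b), so the extremes lie at those combinations; `worst_case` is a hypothetical helper name.

```python
from itertools import product

def worst_case(f, values, errors):
    """Return (f at the central values, largest deviation over all value ± error combinations).

    Assumes f is monotonic in each argument over the error range, so the
    extremes occur at the corners (every combination of +error and -error).
    """
    central = f(*values)
    deviations = []
    for signs in product((-1, 1), repeat=len(values)):
        shifted = [v + s * e for v, s, e in zip(values, signs, errors)]
        deviations.append(abs(f(*shifted) - central))
    return central, max(deviations)

# The worked example: y = a / b, a = (1.2 ± 0.3) kg, b = (7.0 ± 0.6) m^3
y, dy = worst_case(lambda a, b: a / b, values=[1.2, 7.0], errors=[0.3, 0.6])
print(f"y = ({y:.2f} ± {dy:.2f}) kg/m^3")  # y = (0.17 ± 0.06) kg/m^3
```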
Appendix A: Estimating Volume Errors
TOTAL VOLUME ERROR
The total volume error in a piece of glassware is the sum of the (estimated) fill error and the manufacturer's tolerance.
Example: If a flask had a quoted manufacturer's tolerance of ± 0.01 cm³ and the fill error had been estimated as ± 0.04 cm³, then the total volume error would be ± (0.04 + 0.01) cm³ = ± 0.05 cm³.
FILL ERROR - SINGLE FILL GLASSWARE (e.g. volumetric flasks, bulb pipettes)
The fill error for this type of glassware should be calculated by estimating the maximum distance the actual liquid level could have been from the fill mark, and determining what volume this corresponds to.
Example: If, for a volumetric flask, you estimate that there could be a maximum difference of h = 0.05 cm between the marked fill level and the actual level of liquid, and that the (estimated) diameter of the flask at this point is d = 1 cm, then:

$$\text{Error in volume} = \pm\frac{h\pi d^2}{4} = \pm\frac{(0.05\ \text{cm})\,\pi\,(1\ \text{cm})^2}{4} \approx \pm 0.04\ \text{cm}^3$$

FILL ERROR - GRADUATED GLASSWARE (e.g. burettes, measuring cylinders)
The resolution of apparatus where measurements are made using graduations is defined as the smallest variation you can distinguish with certainty using the graduations.
Example: If a piece of glassware had graduations every 2 ml, then:
- if you could read the level to the nearest graduation, the resolution would be 2 ml;
- if you could read the level to half-way between graduations, the resolution would be 1 ml;
- if you could read the level to the nearest quarter between graduations, the resolution would be 0.5 ml; and so on.
Having established the resolution, take the fill error as half the resolution.
MANUFACTURER'S TOLERANCE (or ERROR)
Most items of glassware have an uncertainty marked on them. This is the manufacturer's tolerance. It typically means that even if the item were filled "perfectly" to a fill point with pure water at a specific temperature (usually 20 °C), this uncertainty in the volume would remain.
If no manufacturer's tolerance is marked on the glassware, the value can be looked up in a Table of Standard Values. If neither the manufacturer's tolerance nor the Class is shown, the glassware should be assumed to be Class B.
Note that manufacturer's tolerances should be ignored when estimating volume errors associated with beakers or conical flasks. For these items the fill error is so large compared with the manufacturer's tolerance that the tolerance can be ignored (i.e. assumed to be negligible).
Appendix B: Estimating Errors on Digital Readouts
If the reading is fluctuating: estimate the range over which it is fluctuating. The actual value should be taken as the mid-point of that range, with the associated error being half the range.
Example: If the reading on a balance was fluctuating between 1.017 g and 1.021 g, the value recorded should be ½(1.021 + 1.017) ± ½(1.021 - 1.017) = (1.019 ± 0.002) g.
If the reading is stable: the resolution of the display is the smallest amount by which the display can change, and the error is half of this resolution.
Example: If a display reading was stable (at 1.08) and the smallest amount the display could change by was 0.01, then the error in the reading would be ± (0.01/2) = ± 0.005.
Appendix C: Estimating Errors on Spectra
When spectra are obtained, it is typically desired to record the positions of peaks on the spectrum, along with their associated errors. In such cases, the error should be taken as half the spectrum resolution. The spectrum resolution is the difference between consecutive x-values (at the point where the peak is measured).
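The rules in Appendices A to C reduce to simple arithmetic. The Python sketch below collects them as small helper functions (all names are hypothetical) and reproduces the worked examples; the single-fill formula assumes, as in the Appendix A example, that the liquid column at the fill mark is a cylinder of diameter d.

```python
import math

def total_volume_error(fill_error, tolerance):
    """Appendix A: total volume error = fill error + manufacturer's tolerance."""
    return fill_error + tolerance

def single_fill_error(h, d):
    """Appendix A: volume of a cylinder of height h and diameter d, i.e. h * pi * d^2 / 4."""
    return h * math.pi * d ** 2 / 4

def graduated_fill_error(resolution):
    """Appendix A: fill error for graduated glassware is half the resolution."""
    return resolution / 2

def fluctuating_reading(low, high):
    """Appendix B: mid-point of the fluctuation range, with half the range as the error."""
    return (low + high) / 2, (high - low) / 2

def stable_reading_error(display_resolution):
    """Appendix B: error on a stable digital reading is half the display resolution."""
    return display_resolution / 2

def peak_position_error(x_values, peak_index):
    """Appendix C: half the spacing between consecutive x-values at the peak.

    Uses the interval after the peak point, or the one before it if the
    peak is the last point in the spectrum.
    """
    i = min(peak_index, len(x_values) - 2)
    return abs(x_values[i + 1] - x_values[i]) / 2

# The worked examples from the appendices:
print(f"single-fill error = ± {single_fill_error(0.05, 1.0):.2f} cm^3")
print(f"total volume error = ± {total_volume_error(0.04, 0.01):.2f} cm^3")
value, error = fluctuating_reading(1.017, 1.021)
print(f"balance reading = ({value:.3f} ± {error:.3f}) g")
print(f"stable display error = ± {stable_reading_error(0.01):.3f}")
```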