Making Sense of Attribute Gage R&R Calculations
Measurement error is unavoidable. There will always be some measurement variation that is due to the measurement system itself.
The most problematic measurement system issues come from measuring attribute data in terms that rely on human judgment, such as good/bad or pass/fail. This is because it is very difficult for all testers to apply the same operational definition of what is “good” and what is “bad.”
However, such measurement systems are found throughout industry. One example is quality control inspectors using a high-powered microscope to determine whether a pair of contact lenses is defect free. Hence, it is important to quantify how well such measurement systems are working.
The tool used for this kind of analysis is called attribute gage R&R. The R&R stands for repeatability and reproducibility. Repeatability means that the same operator, measuring the same thing, using the same gage, should get the same reading every time. Reproducibility means that different operators, measuring the same thing, using the same gage, should get the same reading every time.
Attribute gage R&R reveals two important findings: the percentage of repeatability and the percentage of reproducibility. Ideally, both percentages should be 100 percent, but generally the rule of thumb is that anything above 90 percent is quite adequate.
Obtaining these percentages requires only simple mathematics; there is really no need for sophisticated software. Nevertheless, Minitab has a module called Attribute Agreement Analysis (in Minitab 13, it was called Attribute Gage R&R) that does the same and much more, which makes analysts’ lives easier.
Having said that, it is important for analysts to understand what the statistical software is doing in order to make good sense of the report. In this article, the steps are reproduced using spreadsheet software with a case study as an example.
Steps to Calculate Gage R&R
Step 1: Select 20 to 30 test samples that represent the full range of variation encountered in actual production runs. Practically speaking, if only “clearly good” parts and “clearly bad” parts are chosen, the ability of the measurement system to accurately categorize the ones in between will not be tested. For maximum confidence, a 50-50 mix of good and bad parts is recommended; a 30:70 ratio is acceptable.
Step 2: Have a master appraiser categorize each test sample into its true attribute category.
Figure 1: Master Appraiser Categorizations
Step 3: Select two to three inspectors and have them categorize each test sample without knowing how the master appraiser has rated it.
Step 4: Place the test samples in a new random order and have the inspectors repeat their assessments.
Figure 2: Test Samples
Step 5: For each inspector, count the number of times his or her two readings agree. Divide this number by the total number inspected to obtain the percentage of agreement. This is the individual repeatability of that inspector (Minitab calls this “Within Appraiser”).
To obtain the overall repeatability, average the individual repeatability percentages of all inspectors. In this case study, the overall repeatability is 95.56 percent, which means that if the measurements are repeated on the same set of items, there is a 95.56 percent chance of getting the same results, which is not bad but not perfect.
Figure 3: Individual Repeatability
In this case, the individual repeatability of Operator 1 is only 90 percent. This means that Operator 1 is consistent with himself only 90 percent of the time. He needs retraining.
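The Step 5 arithmetic can be sketched in a few lines of Python. The readings below are purely hypothetical (they are not the case study’s data): each inspector has two lists of pass/fail calls over the same set of samples, and repeatability is simply the fraction of samples where the two calls match.

```python
# Hypothetical trial data: two rounds of pass ("P") / fail ("F") calls per
# inspector on the same five samples. Real studies use 20-30 samples.
trials = {
    "Operator 1": (["P", "F", "P", "P", "F"], ["P", "F", "F", "P", "F"]),
    "Operator 2": (["P", "F", "P", "P", "F"], ["P", "F", "P", "P", "F"]),
    "Operator 3": (["P", "F", "P", "P", "F"], ["P", "F", "P", "F", "F"]),
}

def repeatability(first, second):
    # Percent of samples on which the inspector's two readings agree
    # ("Within Appraiser" in Minitab's terms).
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)

individual = {op: repeatability(r1, r2) for op, (r1, r2) in trials.items()}
overall = sum(individual.values()) / len(individual)  # average across inspectors
```

With this toy data, Operators 1 and 3 each contradict themselves on one of five samples (80 percent), while Operator 2 is perfectly consistent (100 percent).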
Step 6: Compute the number of times each inspector’s two assessments agree both with each other and with the standard produced by the master appraiser in Step 2.
Figure 4: Individual Effectiveness
This percentage is called the individual effectiveness (Minitab calls this “Each Appraiser vs. Standard”). In this case, Operator 1 is in agreement with the standard only 80 percent of the time. He needs retraining.
Step 7: Compute the percentage of times all the inspectors’ assessments agree for the first and second measurement for each sample item.
Figure 5: Reproducibility of the Measurement System
This percentage is the reproducibility of the measurement system (Minitab calls this “Between Appraiser”). All three inspectors agree with each other only 83.3 percent of the time. They may not all be using exactly the same operational definition for pass/fail all the time, or they may have a very slight difference in interpretation of what constitutes a pass and a failure.
Step 8: Compute the percentage of the time all the inspectors’ assessments agree with each other and with the standard.
Figure 6: Overall Effectiveness of the Measurement System
This percentage gives the overall effectiveness of the measurement system (Minitab calls this “All Appraiser vs. Standard”). It is the percentage of the time all inspectors agree and their agreement matches the standard.
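Step 8 tightens Step 7 by one condition: every reading must not only agree with the others but also match the standard. A hypothetical sketch:

```python
standard = ["P", "F", "P"]  # master appraiser's ratings (hypothetical)
readings = {
    "Op1": (["P", "F", "P"], ["P", "F", "P"]),
    "Op2": (["P", "F", "F"], ["P", "F", "F"]),
    "Op3": (["P", "F", "P"], ["P", "F", "P"]),
}

def overall_effectiveness(std, readings):
    # Percent of samples where every reading from every inspector, across
    # both trials, equals the standard ("All Appraiser vs. Standard").
    agree = 0
    for i, s in enumerate(std):
        agree += all(trial[i] == s for pair in readings.values() for trial in pair)
    return 100.0 * agree / len(std)
```

Because overall effectiveness adds the match-the-standard condition on top of full agreement, it can never exceed the between-appraiser reproducibility from Step 7.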
Minitab produces many more statistics in the output of the attribute agreement analysis, but for most cases and uses, the analysis outlined in this article should suffice.
So What If the Gage R&R Is Not Good?
The key in all measurement systems is having a clear test method and clear criteria for what to accept and what to reject. The steps are as follows:
1. Identify what is to be measured.
2. Select the measurement instrument.
3. Develop the test method and criteria for pass or fail.
4. Test the test method and criteria (the operational definition) with some test samples (perform a gage R&R study).
5. Confirm that the gage R&R in the study is close to 100 percent.
6. Document the test method and criteria.
7. Train all inspectors on the test method and criteria.
8. Pilot run the new test method and criteria and perform periodic gage R&Rs to check if the measurement system is good.
9. Launch the new test method and criteria.