Every
day I receive questions by email
and phone. I prefer email. If you have The New Weibull Handbook and/or the
SuperSMITH software, there is no charge for the answers so feel
free to email your questions to me.
Ask Dr. Bob. Over the years I’ve answered thousands of questions. I have
selected the ones herein because they are either frequently asked or they
are interesting in some sense. I hope you
will find the answers helpful in rounding out your
Weibull background.
The index is up front
so your search is simple and easy. Each word in the index is hyperlinked
to the answer. Another hyperlink returns you to the index. If you want
to find every instance of the word in this document, use your search function—remember
some of the definitions you need are answered under different issues.
All references are to The New Weibull
Handbook (NWH) unless otherwise indicated.
This document may be
downloaded as a PDF
(200 KB) from Paul Barringer's website.
Send me your email comments and other
questions for inclusion into this document.
Last revision: May 27,
2005
Quality, Probability,
Sampling [index]
July 2005 From Richard Unkle at GE
Transportation:
Hi Bob. I'd butter you up a bit for some help on this next question,
but know it's probably not necessary, as I believe you'll have a
simple answer.
Here's the situation. There's a particular part on our locomotive that
has failed at least 5 times, that we know of, in a given failure mode.
Now, all five failures have come from parts made by the same supplier.
No other supplier's part has experienced this failure mode. However,
it's been 5 out of more than 1 million parts made, versus zero
failures out of much more than a million parts made by all other
suppliers combined. I thought about doing a weibull on the 5 failures,
and then comparing that to a one-parameter weibull for the rest of the
data. The problem is we don't have all the data necessary (in terms of
accurate suspension and even failed part ages) to do that, and would
be too expensive and time consuming to get
the data.
So, barring the above approach, is there some kind of procedure that
works as follows? If I have zero failures out of X million parts made
by multiple suppliers, what's the likelihood that I would see 5
failures out of Y million parts made by any single supplier? I have
some thoughts on how to do that, but thought you might know without
having to think about it too hard.
As always, I am in your debt for sharing your knowledge with me.
Best Regards, Rich
Dr. Bob’s reply:
Let's assume the bad vendor has produced B parts and the good
vendors have produced G parts.
If B=G: the probability is (1/2)^5 = 1/32. This is based on the
probability of randomly selecting a B part being 1/2. Rigorously we
should use sampling without replacement for the second and later
draws, but this is a negligible difference for large samples.
Conclusion or inference: This is unlikely but it could happen.
If there are 2B=G parts the probability is (1/3)^5 = 1/243.
Inference: This probability is so small (0.4%) I would say it is
highly unlikely to have happened due to random chance.
If there were 3B=G parts the probability is (1/4)^5 =
1/1024. Inference: This is extremely small, on the order of three
sigma for the normal. Statisticians would say it could not happen
due to random chance. You have one bad vendor, producing a batch
problem, as a result.
Take Care, Bob
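The arithmetic in this reply can be checked with a few lines of Python. This is only a sketch of the reasoning above: it assumes each failure is an independent draw from the pooled population (sampling with replacement, which the reply notes is a negligible approximation for large samples). The function name and the part counts are illustrative, not from the original exchange.

```python
def prob_all_failures_from_one_vendor(bad_parts, good_parts, failures):
    """Chance that every one of `failures` randomly drawn parts
    comes from the vendor that produced `bad_parts` units."""
    share = bad_parts / (bad_parts + good_parts)
    return share ** failures


# Ratios G/B of 1, 2 and 3, as in the reply (B = 1 million is illustrative).
for ratio in (1, 2, 3):
    p = prob_all_failures_from_one_vendor(1_000_000, ratio * 1_000_000, 5)
    print(f"G = {ratio}B: probability = {p:.5f} (1 in {round(1 / p)})")
```

The larger the good vendors' share of production, the harder it is to attribute five out of five failures to chance.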
Adjusted Rank Algorithm,
Plotting Positions with Suspensions
[index]
April 2003. I want to know if there
is more information about how the adjusted rank algorithm (that
adjusts the plotting positions for suspensions) was calculated. You reference
a Drew Auth as the engineer that came up with the simplified version
of something originally done by Leonard G. Johnson. His book, "The
Statistical Treatment of Fatigue Experimentation," has been out of print
for some time. Any help you could give in tracking down how this adjusted
rank algorithm was derived would be most appreciated.
Dr. Bob’s reply:
The derivation
in Johnson’s book is not too clear; however
Charles Mischke’s ASME
paper [pdf] is excellent, very clear. See also Chapter Two for a
case study of using the Drew Auth formula.
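For readers who want to see the mechanics, here is a minimal sketch of the adjusted rank calculation. It assumes the commonly quoted form of the formula, adjusted rank = (reverse rank x previous adjusted rank + N + 1) / (reverse rank + 1), followed by Benard's approximation (i - 0.3) / (N + 0.4) for the median rank. Check both against Chapter Two before relying on them. The function name and the data set are illustrative only.

```python
def adjusted_ranks(observations):
    """observations: list of (age, is_failure) pairs.
    Returns (age, adjusted_rank, median_rank) for each failure."""
    ordered = sorted(observations, key=lambda obs: obs[0])
    n = len(ordered)
    previous = 0.0  # adjusted rank of the previous failure
    results = []
    for position, (age, is_failure) in enumerate(ordered):
        reverse_rank = n - position  # n for the earliest unit, 1 for the latest
        if not is_failure:
            continue  # suspensions are not plotted; they only shift later ranks
        adjusted = (reverse_rank * previous + n + 1) / (reverse_rank + 1)
        median_rank = (adjusted - 0.3) / (n + 0.4)  # Benard's approximation
        results.append((age, adjusted, median_rank))
        previous = adjusted
    return results


# Illustrative data: five failures (True) and three suspensions (False).
data = [(10, False), (30, True), (45, False), (49, True),
        (82, True), (90, True), (96, True), (100, False)]
for age, rank, plot_position in adjusted_ranks(data):
    print(f"age {age}: adjusted rank {rank:.4f}, median rank {plot_position:.4f}")
```

With no suspensions the adjusted ranks reduce to 1, 2, 3, ..., which is a quick sanity check on the formula.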
Aggregate Cumulative Hazard, Batch Problems
[index]
A batch problem means that one of the
failure modes only applies to a portion, a subset of the full sample.
The other units are immune to the batch failure mode. Batch problems
are common in production when something is changed. Usually the newer
units are affected. Throughout the Weibull Handbook there are clues
for identifying a batch problem. They are summarized in Chapter 8. The
Aggregated Cumulative Hazard method, (ACH), was developed by Rolls Royce.
It detects batch problems using the hazard function and provides a plot
of the batch. The size of the batch is approximately indicated by the
plot. The hazard function is the probability density function divided
by the reliability function which gives the instantaneous failure rate
at any age. The cumulative hazard function is the summation (integral) of the hazard
function over time. See Appendix F for a complete presentation with
examples.
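The two definitions above can be illustrated for a two-parameter Weibull. This sketch is not the ACH method itself (see Appendix F for that). It only shows the hazard as pdf divided by reliability and the cumulative hazard, using the standard Weibull formulas. The function names and parameter values are illustrative.

```python
import math


def weibull_reliability(t, eta, beta):
    return math.exp(-((t / eta) ** beta))


def weibull_pdf(t, eta, beta):
    return (beta / eta) * (t / eta) ** (beta - 1) * weibull_reliability(t, eta, beta)


def weibull_hazard(t, eta, beta):
    # Instantaneous failure rate: pdf divided by reliability.
    return weibull_pdf(t, eta, beta) / weibull_reliability(t, eta, beta)


def weibull_cumulative_hazard(t, eta, beta):
    # Integral of the hazard from 0 to t; equals -ln(R(t)).
    return (t / eta) ** beta


for age in (250.0, 500.0, 1000.0):
    print(age, weibull_hazard(age, 1000.0, 3.0),
          weibull_cumulative_hazard(age, 1000.0, 3.0))
```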
Average
& Standard Deviation
[index]
January 2003. Hi Dr. Bob, I was your
student last year in the Weibull course at Lohmar, Germany. I see that
the average and the standard deviation estimates in the
SuperSMITH
software are different from the simple average and standard deviation
(as calculated by Excel, for example). Please explain why there are differences.
Thanks a lot for your help.
Dr. Bob’s reply:
You cannot
use the usual equations for the mean and standard deviation because
they do not account for suspensions. Instead we solve for eta and beta
by whatever method you have selected and then use the Weibull equations
in Appendix G to estimate the mean, mu, and the standard deviation.
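As a sketch of what that calculation looks like, the standard two-parameter Weibull moment formulas are mean = eta x Gamma(1 + 1/beta) and standard deviation = eta x sqrt(Gamma(1 + 2/beta) - Gamma(1 + 1/beta)^2). It is an assumption that these match the Appendix G equations exactly; the function names and the eta and beta values are illustrative.

```python
import math


def weibull_mean(eta, beta):
    return eta * math.gamma(1.0 + 1.0 / beta)


def weibull_std(eta, beta):
    g1 = math.gamma(1.0 + 1.0 / beta)
    g2 = math.gamma(1.0 + 2.0 / beta)
    return eta * math.sqrt(g2 - g1 ** 2)


# Illustrative fitted parameters, not from the question.
eta, beta = 1000.0, 2.0
print(f"mean = {weibull_mean(eta, beta):.1f}, std = {weibull_std(eta, beta):.1f}")
```

Because eta and beta are fitted with the suspensions included, these estimates generally differ from the simple average and standard deviation of the failure times alone.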

Batch Problem
Clues [index]
July 2003. In my research, I have come across an example wherein I
get some conflicting results when comparing the standard Weibull 2-parameter
regression model to the MLE-RBA model (four failures in the example with
lots of suspensions). The MLE and regression solutions agree quite well,
but the MLE-RBA does not. When doing risk analysis,
the MLE and regression models predict quite closely what the actual number
of failures are, but the RBA model would lead one to believe that a batch
problem may exist. Without delving deeper into the handbook, perhaps you
could provide some guidance on this example, as it could impact the research
I am doing when I have to use small failure samples.
Dr. Bob’s reply:
I will have
to give you the prize for coming up with interesting problems. Although
I admit the "now risk" and the
ACH batch detection methods show only a small probability of a
batch
problem, the number one test, lots of right suspensions, shows strong
evidence of a batch problem or phony data. The probability of having
47 right suspensions at the B10 life without a failure is significantly
less than 1%. The fact that MLE beta is less than the MRR beta is more
evidence, (discovered by Geoff Cole of Rolls Royce). For this small
sample we expect the MLE beta to be greater than the MRR beta. You either
have a batch problem or bad data. Note that if you delete the 47 right
suspensions you have almost a perfect fit and a steeper beta. I suspect
there is something wrong about the right suspensions.
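A rough check on the "less than 1%" figure, under the simplifying assumptions that all 47 suspended units are independent and each has just reached the B10 life (so each survives with probability 0.90):

```python
# Probability that 47 independent units all survive to the B10 life.
survival_at_b10 = 0.90
suspensions = 47
prob_no_failures = survival_at_b10 ** suspensions
print(f"{prob_no_failures:.4f}")  # about 0.007, i.e. under 1%
```

Units suspended beyond the B10 life would make the probability smaller still.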
Confidence
Intervals, System Reliability, Component Estimates
[index]
June 2003. I need to compute the
confidence
intervals (upper and lower) for the failure rate of a system with three
identical components in series. I have the individual component reliability
and of course the individual failure rate with its confidence bounds based
upon a particular set of measurements. The distribution is log normal
and I have the estimated parameters for the probability density function.
This data can give me a median CDF with confidence intervals. What I want
is to take this same data and get CDF (probability of failure) for a system
with three of these components in place and the associated confidence
intervals for this system. The CDF would be for one to three failures.
I know I can get probability of failure from 1-(Rel)^3 but I do not know
how to get the associated confidence intervals.
Dr. Bob’s reply:
You won't like
my answer. Confidence interval estimates are a function of data, and
you have no data on the system. Further, confidence interval estimates
are not invariant under transformation which means you must always calculate
the confidence estimates last in the calculation. You cannot calculate
confidence intervals from confidence intervals as the result is garbage.
Therefore, the answer is there is no way you can make the confidence
interval estimate for the system without data on the system. ... Can
you play games with this? Of course you can, but not rigorously. You
might divide the time on the components tested by three and say "if
we had tested them in series" we would have had F failures in T time
and go from there...As long as your customer understands the meaning
of "if" you could do this. Strictly speaking the answer would not be
a confidence interval...but your customer might buy it.
Goodness of Fit,
Cramér-von Mises and Kolmogorov-Smirnov
[index]
May 2003. My statisticians recommend
Cramér-von Mises and Kolmogorov-Smirnov as measures of goodness of fit,
but you do not. Why?
Dr. Bob’s reply:
Comparing all
the known goodness of fit tests, the two best are
the likelihood ratio test and our "p" value for the correlation coefficient
squared (Coefficient of Determination), [as shown in Chi Chao Liu's
thesis]. The likelihood ratio test, described in Chapter 5, is used with
MLE or, better, with MLE Reduced Bias Adjustment. The p value for r squared,
presented in Chapter 3, goes with median rank regression, X on Y. Of
course, either technique may be used with either method. For testing
goodness of fit both are excellent and we recommend using both. For
distribution analysis, both work well, but you need at least 21 failures
to have enough information to make a credible choice with any method.
With fewer than 21 failures, always use the Weibull 2P even if you know the
data are log normal. If you do not have Chi Chao Liu's thesis you may
download it in two parts: 1) Abstract (1.2
Meg) 2) Dissertation (15.8 Meg).
Mixtures
of Failure Modes, Competing Failure Modes
[index]
April 2003. The failures were produced
by two competitive, "dueling," failure modes. We cannot separate the data
into two failure modes. How can we do a failure forecast?
Dr. Bob’s reply:
The cumulative
probability of failure considering both modes is [1 - R1(t) x R2(t)],
where R1(t) and R2(t) are the reliabilities of the two modes at age t.
The first step is to do a mixture analysis with WSW. If the "p" value supports
two Weibulls rather than one Weibull, it is now possible to do an Abernethy
risk analysis with the WSW, version 4.0V and later. Alternatively, you
could use Monte Carlo Simulation. "Raptor" would be a good choice
for the simulation software.
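Once the two Weibulls have been estimated, the combined probability of failure for two competing modes is one minus the product of the two reliabilities. The sketch below assumes the modes act independently on every unit. The eta and beta values are hypothetical, chosen only to show one infant mortality mode and one wearout mode.

```python
import math


def weibull_reliability(t, eta, beta):
    return math.exp(-((t / eta) ** beta))


def competing_modes_cdf(t, mode_1, mode_2):
    """Probability of failure by age t when either mode can fail the unit.
    Each mode is an (eta, beta) pair."""
    r1 = weibull_reliability(t, *mode_1)
    r2 = weibull_reliability(t, *mode_2)
    return 1.0 - r1 * r2


infant_mortality = (5000.0, 0.7)  # hypothetical (eta, beta)
wearout = (1200.0, 3.5)           # hypothetical (eta, beta)
for age in (100.0, 500.0, 1000.0):
    print(age, round(competing_modes_cdf(age, infant_mortality, wearout), 4))
```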
Exponential
Distribution [index]
May 2003. In Chapter 8 discussing
the exponential distribution, you mention the mean time between in-flight
shutdowns for commercial engines of 25,000 engine operating hours. As
typical commercial flights are 2-4 hours this number seems extraordinary.
Dr. Bob’s reply:
At the time
I wrote that section, 25,000 was the standard; today it is much higher
and it is extraordinary. To put that in context, years ago I had lunch
with Dr. Von Ohain and the President of Pratt & Whitney Aircraft. Dr.
Von Ohain invented the gas turbine and developed the Jumo engine for
Germany. He said the mean time between in-flight shutdowns on the Jumo
engines in the ME262 was 25 hours. When I expressed shock, he said that
was good enough as the mean life of the ME262 was 7 hours and 10 minutes.
In the six decades that have followed World War II we have made progress
[with gas turbines]!
Small
Sample Sizes, Failures (few data points)
[index]
July 2003. When I recently attended
your excellent Weibull Workshop you said best practice for 20 failures
or less was to always use the two parameter Weibull even if you know that
the log normal is the parent distribution. This is very hard for me to
believe. Could you give some reasoning for this recommendation?
Dr. Bob’s reply:
This recommendation
comes from Chi Chao Liu’s thesis. His massive study of thousands of
Weibull and log normal data sets with and without all types of suspensions
showed that the Weibull 2P is always more conservative in the lower
tail than the log normal. It also showed that for betas of one or less
the mean square error of the log normal was much greater, even an order
of magnitude greater than the Weibull 2P even when the data was log
normal. For betas greater than one the log normal MSE is slightly less
than the Weibull 2p for log normal data but the log normal B lives from
the lower tail are always more optimistic than the Weibull 2P B lives.
Therefore, for engineering problems, we recommend the Weibull 2P for
all samples with 20 failures or less. For more detail see Liu’s thesis
available as a free download [abstract
or dissertation].
Interval and Inspection Data, Inspection Option, Data Input
[index]
February 2003. With interval and
inspection data I use the data shortcut to input points at the same
value. For example, 88x9 means nine failures occurred at time eighty-eight.
This data appears as the number 9 on the plot. If I change from median
rank regression (MRR) to the Inspection Option the plot position of the
9 changes. Why?
Dr. Bob’s reply:
If you input
the 9 failures separately they would appear as a vertical column
of points on the Weibull plot. With MRR the 9 is located in the middle
of this column which is approximately where we expect the standard Weibull
line to appear. With the Inspection Option the 9 is located at the topmost
point of the column, where we expect the Inspection Option line to appear.
Mean Time Between
Failure (MTBF) and Mean Time To Failure (MTTF)
[index]
May 2003. What is the difference between
MTTF and MTBF?
Dr. Bob’s reply:
Mean Time To
Failure (MTTF) is the average life of the part. Mean Time Between Failures
(MTBF) is the average time between failures for all the parts in the
fleet. To estimate MTBF divide the total operating time on all parts
by the number of failures. With a "complete" sample, (no suspensions)
the two parameters are identical. MTTF does not change with or without
suspensions other than statistical scatter. MTBF is heavily influenced
by suspensions. For example for wearout failure modes a young fleet
will have enormous MTBF but MTTF does not change. For a very old fleet
that has been through many overhauls and parts replacements, MTBF will
converge toward MTTF.
These two parameters are different
and have different applications. MTTF is the parameter of the exponential
distribution and is related to eta, the characteristic life of the Weibull
distribution. MTBF is used for fleets of repairable systems and is useful
for maintainability analysis.
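The MTBF estimate described above (total operating time on all parts divided by the number of failures) is simple enough to sketch directly. The fleet data here are made up for illustration.

```python
def mtbf(unit_hours, failure_count):
    """Total operating time on all units divided by the number of failures."""
    if failure_count <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return sum(unit_hours) / failure_count


# Complete sample (every unit failed): MTBF equals the average life.
complete_sample = [400.0, 900.0, 1100.0, 1600.0]
print(mtbf(complete_sample, 4))

# Young fleet: one failure at 400 hours plus 20 unfailed units at 300 hours.
young_fleet = [400.0] + [300.0] * 20
print(mtbf(young_fleet, 1))
```

The second case shows how suspensions inflate MTBF for a young fleet even though the underlying life distribution has not changed.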
Unbiased Estimates,
Bias, Mean Square Error, MLE VS
[index]
February 2003. In the New Weibull
Handbook you use median bias for most of the comparison of methods
and yet you write about unbiased estimates. I thought an unbiased
estimate is one whose expected value equals the true value. I am confused
as the expected value is the mean value. Correct?
Dr. Bob’s reply:
You are correct
that an unbiased estimate is one whose expected value equals the true
value and that the mean value is the expected value. However, the mean
value is not a good measure, or a typical measure, for skewed distributions
and most life distributions are skewed. We recommend the median value
instead of the mean value for skewed distributions. Statistical estimates
will split 50/50 around the median. For example, we recommend the median
rank plotting instead of the mean rank positions for life data analysis.
In the case you selected, five failures, the sampling distributions of B1 and beta are highly skewed.
See the simulation results below. For comparing the accuracy of alternative
methods like MRR versus MLE, I use the median bias rather than the mean
bias. With MLE-RBA Weibull we employ the
median bias correction as the standard or default, but we also offer
a mean bias correction in our software. These are quite different corrections;
the median bias correction is C4^3.5 versus C4^6 for mean bias. The
mean bias correction is much larger.
Monte Carlo Simulation, N=5. True values: eta=1000, beta=1, B1=10. Entries are Median/Mean.
| Method | Eta | Beta | B1 |
| MRR X on Y | 972/1059 | 1.05/1.179 | 12.5/37.3 |
| MLE | 942/1028 | 1.25/1.45 | 23.9/57.2 |
| MLE-RBA | 945/1022 | 1.02/0.991 | 10.4/22.4 |
| MRR Y on X | 1068/1124 | 0.912/1.05 | 6.3/27.9 |
Note that statisticians prefer mean
square error to bias as a measure of accuracy. Engineers prefer bias
as a measure because they want to know if the estimate is optimistic
or pessimistic. If a failure produces a health, or death, or crash risk,
engineers want either an unbiased estimate or a conservative estimate.
Optimistic bias estimators like small sample MLE failure forecasts and
B life are unacceptable. Our conclusion is to recommend methods with
the smallest median bias as our best practice. MRR and MLE-RBA for small
and moderate sample sizes are examples.
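To give a feel for the size of the two corrections mentioned above, here is a sketch that assumes C4 is the standard quality-control constant, C4(n) = sqrt(2/(n-1)) x Gamma(n/2) / Gamma((n-1)/2), evaluated at the number of failures. That evaluation point and the function names are assumptions for illustration. The software's exact implementation may differ.

```python
import math


def c4(n):
    """Standard C4 constant for a sample of size n (n >= 2)."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)


def bias_corrected_beta(beta_mle, n_failures, exponent=3.5):
    """Multiply the MLE beta by C4 raised to 3.5 (median) or 6 (mean)."""
    return beta_mle * c4(n_failures) ** exponent


n = 5
print(f"C4({n}) = {c4(n):.4f}")
print(f"median correction factor = {c4(n) ** 3.5:.3f}")
print(f"mean correction factor   = {c4(n) ** 6:.3f}")
print(f"median-corrected beta for MLE beta 1.25: {bias_corrected_beta(1.25, n):.3f}")
```

Under these assumptions, applying the median factor to the median MLE beta of 1.25 in the table gives a value close to the true beta of 1.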
Mixtures
of Failure Modes, System Models, Exponential
[index]
June 2003. I have read much of your
Handbook and have been learning how to apply Weibull analysis to benefit
our company. I am writing to see if you could elaborate for me on a statement
made on p. 46 of the Handbook: "Weibulls for a system or component with
many modes mixed together will tend toward a beta of one. These Weibulls
should not be employed if there is any way to categorize the data into
separate, more accurate failure modes. Using a Weibull plot with mixtures
of many failure modes is the equivalent of assuming the exponential distribution
applies. The exponential results are often misleading and yet this is
common practice." … I have performed analyses on individual failure modes
pertaining to our product, but I anticipate management will request
a Weibull analysis for a product with all failure modes lumped together,
for the purposes of making decisions regarding warranty terms. According
to your statement above, this would appear to be a risky proposition.
However, I do not fully understand why. What would be the harm in lumping
all failure modes together and performing Weibull analysis in order to
determine the overall failure risk for the product?
Dr. Bob’s reply:
Your method
will forecast the same number of claims each month for a constant fleet
size... Does this make sense? A better answer is to use Monte Carlo
simulation. There is free software available to do that called "Raptor."
Download it from the net at Raptor.com.
Or better yet, make a failure-warranty forecast for each failure mode
from the individual Weibulls and plot them all in
WinSMITH Visual, then
sum the Y values to obtain the system forecast of cumulative failures
by months. For repairable systems, this method produces accurate forecasts.
Lumping them all together assumes no wearout modes, no infant mortality,
everything is exponential. Does that make sense? You might consider
attending one of our Weibull Workshops. We will teach you everything
about warranty forecasting with Kaplan Meier, the inspection option
and Crow-AMSAA.
Random or Exponential
Data, Confidence Bounds for Beta
[index]
May 2003. Hi Dr. Bob, I'm lost. I
have a Beta of 0.80 and the BetaU is 1.03 and BetaL is 0.70. How does
that tell me if the failure is close enough to be "random"/exponential?
Dr. Bob’s reply:
Your data is
not significantly different from an exponential at whatever level of
confidence you selected to get the bounds on beta because the value
one lies within the interval. If your double sided confidence bounds
are at 90% confidence, your data is not significantly different from
an exponential at 90% confidence.
Statistical Text References
[index]
May 2003. I have recently been reassigned
and need to use statistics in my new position. Could you recommend some
introductory statistical texts?
Dr. Bob’s reply:
Some recommendations:
- "Introduction to Statistical Analysis,"
Dixon and Massey, McGraw Hill
- "Introduction to the Theory of Statistics,"
Mood & Graybill, McGraw Hill
- "Introduction to Statistical Inference,"
Keeping, Van Nostrand. This one is a little heavier.
- "The Cartoon Guide to Statistics,"
Gonick and Smith, HarperPerennial. This paperback is very easy to read and really
quite good.
The first three are old standards.
Suggest Amazon.com
books.
Unbiased
Estimate [index]
In the New Weibull Handbook you use
median bias for most of the comparison of methods and yet you write about
unbiased estimates. I thought an unbiased estimate is one whose
expected value equals the true value. I am confused as the expected value
is the mean value. Correct?
Dr. Bob’s reply:
You are correct
that an unbiased estimate is one whose expected value equals the true
value and that the mean value is the expected value. However, the mean
value is not a good measure, or a typical measure, for skewed distributions
and most life distributions are skewed. We recommend the median value
instead of the mean value for skewed distributions. Statistical estimates
will split 50/50 around the median. For example, we recommend the median
rank plotting instead of the mean rank positions for life data analysis.
In the case you selected, five failures, the sampling distributions
of B1 and beta are highly skewed. See the simulation results below.
For comparing the accuracy of alternative methods like MRR versus MLE,
I use the median bias rather than the mean bias. With
MLE-RBA Weibull we employ the median bias
correction as the standard or default, but we also offer a mean bias
correction in our software. These are quite different corrections; the
median bias correction is C4^3.5 versus C4^6 for mean bias. The mean
bias correction is much larger.
Monte Carlo Simulation, N=5. True values: eta=1000, beta=1, B1=10. Entries are Median/Mean.
| Method | Eta | Beta | B1 |
| MRR X on Y | 972/1059 | 1.05/1.179 | 12.5/37.3 |
| MLE | 942/1028 | 1.25/1.45 | 23.9/57.2 |
| MLE-RBA | 945/1022 | 1.02/0.991 | 10.4/22.4 |
| MRR Y on X | 1068/1124 | 0.912/1.05 | 6.3/27.9 |
Note that statisticians prefer mean
square error to bias as a measure of accuracy. Engineers prefer bias
as a measure because they want to know if the estimate is optimistic
or pessimistic. If a failure produces a health, or death, or crash risk,
engineers want either an unbiased estimate or a conservative estimate.
Optimistic bias estimators like small sample MLE failure forecasts and
B life are unacceptable. Our conclusion is to recommend methods with
the smallest median bias as our best practice. MRR and
MLE-RBA for small and moderate sample sizes
are examples.
Mixtures of Two
Failure Modes [index]
The failures were produced by two
competitive, "dueling," failure modes. We cannot separate the data into
two failure modes. How can we do a failure forecast?
Dr. Bob’s reply: The cumulative probability of failure
considering both modes is [1 - R1(t) x R2(t)]. The first step is to do
a mixture analysis with
WSW. If the "p" value supports
two Weibulls
rather than one Weibull, it is now possible to do an Abernethy risk
analysis with the
WSW, version 4.0V and later. YBATH is a good alternative
to the WSW mixture solution. You could also use Monte Carlo Simulation.
"Raptor" would be a good choice for the simulation software.
Warranty Claims
Forecast, Kaplan-Meier, Inspection Option
[index]
April 2003. For warranty claims
forecasting by age you recommend both the Inspection Option and the
Kaplan-Meier model. Which should I use?
Dr. Bob’s reply:
We have research
underway comparing all the interval methods, Inspection Option, Probit,
K-M, and Interval MLE but it is not completed. Industry seems to prefer
the Inspection Option for warranty claims by age but many use K-M.
There is a problem with KM. When we
use the KM with the actuarial correction, some of your suspensions are
eliminated from the data set. This prohibits using the Abernethy Risk
forecast as the suspension histogram is wrong. However, this does not
affect the estimate of percent claims at the end of the warranty period
which is the usual objective. The Inspection Option does not have this
problem.
If the probability of repeat warranty
claims for the same problem is significant you may want to consider
Wayne Nelson’s Graphical Repair method described in Appendix M.
May 2005 update: The research mentioned above is almost completed and suggests
the MLE-Interval method is most accurate, Probit next, and then the
Inspection Option although the differences are not large. K-M was not
included in the study. Based on these results from Todd Marquardt, we
now recommend the MLE-Interval method for warranty claims predictions
by age.
Weibull
Analysis [index]
Use of the Weibull distribution provides
accurate failure analysis and risk predictions with extremely small
samples using simple and useful graphical plots. Solutions are possible
at the earliest stage of a problem without the requirement to "crash
a few more". Small samples also allow cost-effective component testing.
Weibull analysis is a key discipline for reliability, maintainability,
safety, and supportability (RMS) engineering largely because of new,
credible, and accurate quantitative methods. See
Chapter One.
Weibull
3-Parameter [index]
March 2003. When I use a three
parameter Weibull the slope, beta, is less than the two parameter
beta. Which beta is correct?
Dr. Bob’s reply:
The 3-P Weibull
is a much more complex distribution than the two parameter and we have
fixed requirements to meet before we adopt the 3P solution. Remember
the four hard fixed rules for using 3-parameter:
- you must have 21 or more failures,
some experts say 100.
- you must be able to explain why
the physics of failure support a guaranteed failure free zone,
- the 2-p plot should show curvature,
- and the distribution analysis must
favor the 3-p.
If you meet all these criteria above
the 3-p distribution is the best distribution and the 3-P beta is the
correct beta. The 2-p beta is irrelevant. For example, when we take
data sets with missing data and fit the 3-p, and compare them to data
sets from the same source without missing data, the 2-p with all the
data falls on top of the 3-p with missing data, with the same beta.
Temperature Weibulls, Challenger Space
Shuttle, NASA [index]
The Challenger Accident.
I was
honored by being invited to do a Weibull Workshop at the NASA Goddard
Space Flight Center. One of the NASA engineers asked me if he could
do a Weibull plot with the data on O-ring damage from the Challenger-Space
Shuttle history. He provided the following data and linear plot which
are available on the Internet [Dunar]:

This curve fit on the linear plot is
obviously not very good. More complex regression modeling is attempted
in [Wujek] but it also has poor goodness of fit. These models are better
than nothing but inadequate.
My Weibull experience with
temperature
is that increased temperatures accelerate failures, but here the
reverse is true. What to do? I thought perhaps if we took the reciprocal
of absolute temperature as the accelerating parameter that might work.
The figure below shows my results.
Beta is 83 indicating extremely rapidly increasing O-ring damage with
colder temperatures. The fit is excellent with a "p" value for the coefficient
of determination of 74.5%. (Any value above 10% is considered an acceptable
fit.) The probability of damage at the Challenger’s 31 degrees F is
100%. It is easy to be smart retrospectively and I do not mean to criticize
NASA for whom I have great respect. However, it certainly would have
been enlightening if they had made this Weibull plot.

References:
- A. J. Dunar & Stephen P. Waring,
"Power To Explore: History of Marshall Space Flight Center 1960-1990,"
Government Printing Office, 033-000-01221-7, Chapter IX, "The Challenger
Accident."
- J. H. Wujek, "Challenger
O-Ring Data Analysis," OnLine Ethics Center.
Are two designs significantly
different? [index]
A problem from Ronald Shop, DAF Trucks,
Netherlands:
The data set is extremely small:
1. Old Design: Failures at 5768, 6230, 2394, and 3390.
2. New Design: 1 Failure at 11020; 1 Suspension at 3666.
Using Weibayes the lower 90% bound
on the new set is to the right of the Weibull line. Does this mean the
new design is significantly better? If I use the likelihood ratio test
the p value is 74% using the Fulton Factor. These results seem inconsistent.
Which method should I believe? Your comments please.
Dr. Bob’s reply:
To the degree
that you have a good estimate of beta, Weibayes would be the best test,
and more precise than the likelihood ratio (LR) test. Weibayes eliminates
the uncertainty in beta which is larger than the uncertainty in eta;
the LR test includes both uncertainties. My analysis using MRR estimated
beta as 2.287 as you show on your plot. If I use
MLE-RBA the estimate is 1.954. Using either
of these betas shows the 90% bound to the right of the Weibull plot
indicating a significant difference. Further, in another E-mail Ronald
told me they had prior experience with this failure mode showing a beta
of 2.2 which he used for his Weibayes lower bound. Ronald is commended
for an excellent analysis.
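For orientation, the Weibayes point estimate of eta for the new design can be sketched as follows. It assumes the usual Weibayes form, eta = (sum of t_i^beta over all units / number of failures)^(1/beta), with beta fixed at the prior value of 2.2 mentioned above. This gives only the point estimate. The lower 90% bound discussed in the reply needs an additional confidence calculation that is not shown here. The function name is illustrative.

```python
def weibayes_eta(ages, beta, failures):
    """Weibayes characteristic life with beta assumed known.
    `ages` holds the age of every unit, failed or suspended."""
    if failures < 1:
        raise ValueError("this sketch assumes at least one failure")
    return (sum(age ** beta for age in ages) / failures) ** (1.0 / beta)


# New design: one failure at 11020 and one suspension at 3666.
new_design_ages = [11020.0, 3666.0]
eta_new = weibayes_eta(new_design_ages, beta=2.2, failures=1)
print(f"Weibayes eta for the new design: {eta_new:.0f}")
```

The suspension adds running time without adding a failure, so the estimate lands somewhat above the single failure age.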

MLE vs. Regression
[index]
Dr. Bob, I'd appreciate hearing any
comments you might have about this article. It is a discussion of MLE
vs. Regression in the case of many suspensions. Thanks, James
In previous issues, we have discussed
the maximum likelihood estimation (MLE) and rank regression methods for
parameter estimation. We have also mentioned that when your data set contains
a large number of suspensions, it is suggested that you use MLE for parameter
estimation. But why? We will take a closer look at this "rule of thumb"
and illustrate how rank regression does not take into account the characteristics
of the entered data when there are a large number of suspensions.
Even though the median rank adjustment
method is the most widely used method for performing suspended items analysis,
there is a shortcoming to the method that must be understood. As you may
have noticed when using this analysis method for suspended items, the
position where the failure occurred is taken into account but not the
exact time-to-suspension. For example, rank regression would yield the
exact same results if late suspension times are changed but are still
late, that is, beyond the last failure.
This shortfall is significant when
the number of failures is small and the number of suspensions is large
and not spread uniformly between failures, as with these data sets. In
cases like this, it is highly recommended that you use maximum likelihood
estimation (MLE) to solve for the parameters instead of least squares,
since maximum likelihood does not look at ranks or plotting positions,
but rather considers each unique time-to-failure or suspension.
Dr. Bob’s reply:
James, they
missed the point! Which method is most accurate? MRR or MLE with suspensions?
My objective is to use the method that is most accurate and conservative.
MRR is almost always more accurate than MLE with small samples and suspensions.
To answer the question we would use Monte Carlo simulation with all
kinds of suspensions in large and small numbers with small samples of
failures. The combinations are huge!! I have of course completed many
of these studies and they favor MRR in general but you can find cases
that favor MLE.
Many others have looked at this. Dr. Wenham at GKN International in
Wolverhampton was promoted to be the worldwide reliability guru for
the corporation. Her predecessor, a mathematical statistician, had made
MLE the corporate standard. Questioning this decision Dr. Wenham did
a massive study with and without suspensions and concluded MRR was more
accurate and changed the corporate standard to MRR. "The Rank Regression
(RR) method calculates Weibull parameters with the best combination
of accuracy and ease of interpretation over more of the range of conditions
thought to be common within GKN than any other method investigated."
Further, under Recommendations: "The RR method should be adopted for
Weibull analysis throughout GKN."
The largest study ever completed was done by Dr. Chi Chao Liu for his
doctoral thesis. He looked at thousands of simulations with all kinds
and sample sizes of suspensions. His conclusion was the same as Dr.
Wenham's: MRR is more accurate than MLE for small samples with and without
suspensions. His thesis is available as a download from Barringer's
website below if you want to read it. "The results show that the empirical
equations fit the data very well if the Median Rank Regression method
is used. Points estimated (from the Weibull and lognormal) by MLE are
more scattered."
Both studies are referenced in The New Weibull Handbook.
Waloddi Weibull's
position on the subject: [Weibull 1967] states, "The Maximum Likelihood
Method is the most efficient one of the general methods, but it requires
complicated equations, and, above all, the estimates from small samples
are heavily biased." This was written by Weibull after decades of study
and numerous research projects, many for the US Air Force.
Finally, the criticism of MRR that it does not weigh suspensions as
much as MLE is a double-edged sword. Most in-service data includes suspension
data that is somewhere between inaccurate, wrong and non-existent. The
fact that MLE heavily weighs suspensions is bad with bad suspension
data, but with MRR it makes little difference; in other words, MRR is
more robust.
Zero Failure Testing
Subject: n = -(eta/t)^beta ln(1-Confidence)
Dear Dr. Robert Abernethy, This is how I thought about the above equation.
Please comment about it. The equation was created by combining n = ln(1-conf)
/ ln(Reliability) and R(t) = exp(-(t/eta)^beta). They are like oranges and
apples [discrete and continuous, non-time-dependent and time-dependent,
respectively]. He has combined two valid equations, coming from two different
disciplines, to create a non-valid equation. On one hand, you take beta=1
while allowing use of the whole spectrum of beta values. Therefore, the equation
is not a valid equation.
Dr. Bob’s reply:
The equation
relates to equation 6-3 in The New Weibull Handbook and is the basis
for zero failure Weibayes testing. These tests have the minimum
test-time requirements for substantiation among all types of Weibull
tests so they are very popular. I developed or designed these tests
in the seventies so they have been around a long time. If you have "The
New Weibull Handbook" it provides the background.
It is apples and oranges but in this case it is quite proper and rigorous.
Start with the zero failure term of the binomial for successes and failures.
Probability of zero failures = p^n where p is the reliability and n
is the number of trials. This is the apples, (discrete) part of the
problem. The null hypothesis for substantiation tests is that the reliability
is no better than the value to be demonstrated and the probability of
passing the test if that is true is the complement of the confidence.
Therefore, (1-C) = R^n. Now we substitute the Weibull reliability function
for R and solve for t, the test duration. This is the oranges part, the
continuous distribution. This is a simple conjugation.
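The steps above can be written out as a short Python sketch. Substituting R = exp(-(t/eta)^beta) into (1-C) = R^n gives ln(1-C) = -n (t/eta)^beta, which can be solved either for n (the equation in the subject line) or for t. The function names and the example numbers are hypothetical; eta here is the characteristic life to be demonstrated and beta is assumed known.

```python
import math


def units_required(eta, t, beta, confidence):
    """n = -(eta/t)^beta * ln(1 - C), rounded up to a whole number of units."""
    return math.ceil(-((eta / t) ** beta) * math.log(1.0 - confidence))


def duration_per_unit(eta, n, beta, confidence):
    """Solve (1 - C) = exp(-n * (t/eta)^beta) for t, the test time per unit."""
    return eta * (-math.log(1.0 - confidence) / n) ** (1.0 / beta)


# Hypothetical example: demonstrate eta = 1000 hours with 90% confidence, beta = 2.
n = units_required(eta=1000.0, t=1000.0, beta=2.0, confidence=0.90)
t = duration_per_unit(eta=1000.0, n=n, beta=2.0, confidence=0.90)
print(n, t)

# Check: the chance of passing when eta is exactly 1000 equals 1 - C.
print(math.exp(-n * (t / 1000.0) ** 2.0))
```

The last line is the consistency check: with n units each run for t without failure, the probability of zero failures at the demonstrated eta equals the complement of the confidence.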
System reliability often requires mixing apples and oranges. More often
the conjugation of apples and oranges for systems is complex and may
require Monte Carlo. For example, my doctoral thesis (1965) modeled
the reliability of large liquid rocket engines, conjugating the binomial
requirement of start, accelerate, operate and shutdown without mechanical
failure, with the performance requirements of specific impulse, mixture
ratio control and thrust, all normally distributed. Although I was able
to derive analytical solutions for conjugate Bayesian and likelihood
solutions, the method was too complex to ever be widely accepted. Fortunately
my thesis was accepted.
Hope I have helped. Let me know if you have questions.
Zero Failure
Weibayes [index]
Dr. Bob: We have a real problem
with a new component under test. We want to run a zero failure
Weibayes test to prove it meets the contractual requirement but we
have absolutely no information on what the dominant failure mode may
be, let alone the beta. The requirement is 99% reliability at 10,000
cycles with 90% confidence. How long should we test how many
components?
Dr. Bob’s reply:
Beta will be somewhere between 0.7 and 5.0 for almost all good
components. Design your test with each beta, 0.7 and 5.0, and use
the test with the longest test duration. The alternative is to use a
Binomial test, see Table 8-1, but the sample sizes are too large as
the binomial is imprecise. 230 tests for 10,000 cycles without
failure would meet your requirement.
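As a rough sketch of this procedure in Python: convert the requirement (99% reliability at 10,000 cycles) into a characteristic life for each assumed beta, compute the zero failure test duration per unit for a given number of units at 90% confidence, and keep the longer of the two designs. The function names and the candidate sample sizes are hypothetical.

```python
import math

R_REQ, T_REQ, CONF = 0.99, 10_000.0, 0.90   # requirement from the question


def binomial_units(reliability, confidence):
    """Zero failure binomial test: smallest n with reliability^n <= 1 - C."""
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


def weibayes_duration(n, beta, reliability=R_REQ, t_req=T_REQ, confidence=CONF):
    """Cycles each of n units must survive in a zero failure Weibayes test."""
    eta = t_req / (-math.log(reliability)) ** (1.0 / beta)   # eta to demonstrate
    return eta * (-math.log(1.0 - confidence) / n) ** (1.0 / beta)


for n in (5, 10, 20, 50):                    # hypothetical sample sizes
    durations = {beta: weibayes_duration(n, beta) for beta in (0.7, 5.0)}
    governing_beta = max(durations, key=durations.get)
    print(n, "units:", round(durations[governing_beta]), "cycles each,",
          "governed by beta =", governing_beta)

print(binomial_units(R_REQ, CONF))           # 230
```

With fewer than about 230 units, each unit must run longer than 10,000 cycles and the beta = 0.7 design gives the longer duration, so it governs. The binomial figure is the special case where every unit is run to exactly 10,000 cycles.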