Posted on behalf of Colin Parish


Where failures are random, time-based replacement or overhaul cannot prevent ALL failures but, if the time interval is, say, 25% of the MTBF, then it can prevent about 60% of the expected failures. The question is ‘why would you want to do this?’, when you would be sacrificing a considerable amount of the ‘useful’ life of the component.

By definition, a breakdown can occur at any time, whereas time based maintenance is usually scheduled to be performed on a down day. In circumstances where there is a considerable cost of ‘lost product’ due to a breakdown, but little or none due to time based replacement or overhaul, then it may be cheaper overall to adopt a time based maintenance strategy.

To determine whether this is so, you need first to assemble ALL the cost data for both scenarios. Then, for any given time interval, you can calculate the probability of unexpected failures using the appropriate Weibull function. For each time interval, factor the breakdown costs by this probability, add the costs of time based maintenance, and annualize the result.
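As a rough illustration of the calculation described above, here is a minimal sketch. It assumes the textbook age-replacement cost-rate model (expected cost per cycle divided by expected cycle length), which is not necessarily what any particular RCM package does. The function names, the cost figures, and the scale parameter `eta` are hypothetical and chosen only for the example.

```python
import math


def weibull_reliability(t, beta, eta):
    """Probability of surviving to age t (Weibull shape beta, scale eta)."""
    return math.exp(-((t / eta) ** beta))


def mean_time_in_service(interval, beta, eta, steps=2000):
    """Trapezoidal integral of R(t) from 0 to interval: the expected time a
    unit runs before it either fails or is replaced on schedule."""
    h = interval / steps
    total = 0.5 * (1.0 + weibull_reliability(interval, beta, eta))
    for i in range(1, steps):
        total += weibull_reliability(i * h, beta, eta)
    return total * h


def cost_rate(interval, beta, eta, cost_planned, cost_breakdown):
    """Expected cost per unit time when replacing at `interval` or on failure."""
    r = weibull_reliability(interval, beta, eta)
    cost_per_cycle = cost_breakdown * (1.0 - r) + cost_planned * r
    return cost_per_cycle / mean_time_in_service(interval, beta, eta)


if __name__ == "__main__":
    # Assumed figures: planned job costs 1 unit, breakdown costs 10 units.
    for beta in (1.0, 3.0):
        print(f"beta = {beta}")
        for interval in (250, 500, 1000, 5000):
            rate = cost_rate(interval, beta, 1000.0, 1.0, 10.0)
            print(f"  replace every {interval:>5} h: cost per hour = {rate:.5f}")
```

Under these assumptions the cost rate for β = 1 only falls as the interval grows, so scheduled replacement never pays for a constant hazard; an interior optimum appears only when β is greater than 1.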

If this seems like a tortuous task, believe me, it is. Fortunately, there are ‘expert systems’ for RCM analysis that will do all this calculation for you and provide a graph of annualised costs vs. the frequency of predictive or preventive maintenance. From this you can choose either a suitable frequency or to ‘operate to failure’.

Colin Parish
Chameau Systems Limited
colincparish@ntlworld.com
United Kingdom

This message has been edited. Last edited by: Terrence O'Hanlon,
 
Posts: 776 | Location: Southwest Florida Gulf | Registered: 03 April 2004

Posted on behalf of Christopher Smith

I was wondering if you could clarify the math behind it for me.

If I assume that “random” as used above means β=1, then for a given mission time there should be no difference in the number of failures experienced whether the part is replaced at 25% of MTBF or at 100% of MTBF.

Example:

GIVEN:
Mission time = 1000 hrs
MTBF = 1000hr
β=1 (random)

Replacement at 25% of MTBF means that the part will be replaced every 250 hr.

Reliability for the first 250 hr would be as follows (equation symbols could not be posted here; see attachment):

R(250) = exp(-250/1000) = exp(-0.25) ≈ 0.7788

77.88% chance of no failure

This will be the same for each of the 4 periods that comprise the 1000hr mission time.
Hence the reliability for the mission time can be shown as a series system of 4 components that each have a 77.88% reliability.

Therefore:

R(mission) = 0.7788^4 ≈ 0.3679, i.e. a 36.79% chance of no failure over the 1000 hr mission.

Now for the case with preventive maintenance replacement not scheduled until the MTBF:

If there is no replacement and the original unit is in service for the 1000 hr mission time then the reliability of that component is as follows:

R(1000) = exp(-1000/1000) = exp(-1) ≈ 0.3679, i.e. the same 36.79% chance of no failure.
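The comparison in this example can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and `eta` is the Weibull scale parameter, which equals the MTBF only when β = 1.

```python
import math


def weibull_reliability(t, beta, eta):
    """Probability of no failure up to age t."""
    return math.exp(-((t / eta) ** beta))


def mission_reliability(mission, interval, beta, eta):
    """Chance of no failure over `mission` when the part is renewed every
    `interval` hours. Assumes the mission is a whole number of intervals."""
    periods = round(mission / interval)
    return weibull_reliability(interval, beta, eta) ** periods


if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.0):
        with_pm = mission_reliability(1000, 250, beta, 1000)
        without_pm = weibull_reliability(1000, beta, 1000)
        print(f"beta={beta}: replace every 250 h -> {with_pm:.4f}, "
              f"no replacement -> {without_pm:.4f}")
```

For β = 1 the two figures coincide, which is the point of the example. Trying β below or above 1 shows replacement making the mission reliability worse or better respectively.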

I’m not sure what math is behind the statement “it can prevent about 60% of the expected failures”. Is the author using a different shape parameter or a different interpretation of the term “random”?

Thanks in advance for looking into this for me.

Christopher Smith

Attachment: Reliability_Tip.pdf (75 Kb). This attachment includes the equation.
 
Posted on behalf of Colin

Gentlemen; my apologies are due.

Christopher, you are absolutely correct (at least someone out there is awake).

There is only one truly ‘random’ probability density function and that is the negative exponential function. The Weibull pdf with β = 1 is exactly that curve.

Its derived function, the Weibull ‘hazard’ function (the pdf divided by the probability of survival), then has a constant value of 1/MTBF, and hence there can be no benefit in replacing a component that has not failed. Furthermore, for a Weibull pdf where β < 1, replacement is positively harmful.
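For reference, the standard two-parameter Weibull expressions behind this statement can be written as follows. The scale parameter η (characteristic life) does not appear in the post and is introduced here as an assumption; for β = 1 it equals the MTBF.

```latex
\begin{align}
  f(t) &= \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}
          \exp\!\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]
          && \text{(pdf)} \\
  R(t) &= \exp\!\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]
          && \text{(probability of survival)} \\
  h(t) &= \frac{f(t)}{R(t)}
        = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}
          && \text{(hazard)}
\end{align}
```

Setting β = 1 gives h(t) = 1/η for all t, a constant. For β < 1 the exponent β − 1 is negative, so h(t) falls with age and a new part is more likely to fail than the old one it replaces.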

I did use the correct Weibull pdf. My error was, having used the Weibull cumulative function to calculate the probability of breakdown in the period prior to replacement, to assume that replacement of the failed component would reset this function to time zero. Whilst you can ‘physically’ do this for any value of β, it is only valid if the hazard function increases from zero at time zero: it is nonsense for β = 1 and below (as I discovered recently and you were kind enough to point out, Christopher; obviously it was not my turn to have the family brain cell when I did this).

Terrence, I think I owe your readers an apology and a warning not to fall into the same trap as I did. Perhaps I should also add a simple explanation of the difference between pdf, cumulative and hazard curves since I suspect that many people believe that the ‘flat’ random hazard curve is the Weibull pdf.
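To make the distinction between the three curves concrete, here is a small sketch that evaluates them for β = 1. The function names and the scale parameter `eta` (taken as 1000 hr, the MTBF in the earlier example) are illustrative assumptions.

```python
import math


def weibull_pdf(t, beta, eta):
    """Probability density of failure at age t."""
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))


def weibull_cdf(t, beta, eta):
    """Cumulative probability of having failed by age t."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def weibull_hazard(t, beta, eta):
    """Failure rate at age t, given survival to age t (pdf / survival)."""
    return weibull_pdf(t, beta, eta) / (1.0 - weibull_cdf(t, beta, eta))


if __name__ == "__main__":
    beta, eta = 1.0, 1000.0
    print("   t      pdf        cumulative  hazard")
    for t in (100.0, 500.0, 1000.0, 2000.0):
        print(f"{t:6.0f}  {weibull_pdf(t, beta, eta):.6f}   "
              f"{weibull_cdf(t, beta, eta):.4f}      "
              f"{weibull_hazard(t, beta, eta):.6f}")
```

With β = 1 the hazard column stays flat while the pdf column decays and the cumulative column climbs towards 1, so the ‘flat’ curve is the hazard, not the pdf.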

Please let me know if you would like me to write something in plain English.

Regards. Colin.

This message has been edited. Last edited by: Terrence O'Hanlon,
 
Posted on behalf of Chris

Colin,
Thanks for clearing that up.

Chris
 
Vee
May I invite those maintainers who are confused by this set of exchanges on terms such as Weibull, pdf, MTBF, Exponential, Beta etc. to attend the post-conference workshop at MARTS 2005 in Chicago on 26th May on "Reliability Engineering for Maintenance Practitioners"? I promise to keep the math to a minimum and expect that delegates will have a pretty good idea of the concepts and their application by the end of the day. Regards.

V.Narayan.


Regards,
V.Narayan (Vee)
Lead Author, 100 Years of Maintenance: Practical Lessons from Three Lifetimes, Industrial Press.NY ISBN-13: 978-0831133238
Author, Effective Maintenance Management: Risk and Reliability Strategies for Optimizing Performance, 2004, Industrial Press NY ISBN-13: 978-0831131784
 
Posts: 772 | Location: Scotland, UK. | Registered: 16 May 2004

<Ozgipsy>
Gents,

I am a little surprised by this conversation; however, it could be that I have misunderstood what was being said. (Not the first time, sadly.)

I think there are several points here that deserve to be addressed, however I am going to restrict this answer to the following themes:

  • The random failure curve as represented within RCM (The original report and subsequent reports looking at validating or replicating their results)

  • The selection process for routine maintenance tasks and the RCM logic for task selection

  • Justification for task selection and the danger of many "expert" systems. (Particularly those based around cost optimization theories)


The random failure curve

When the failure curves were first generated they were done using concepts of conditional probability with a smoothing algorithm to represent the 6 different failure types.

Weibull analysis was not used at the time and is a different beast to the conditional probability concepts that were used initially. There are some similarities in that both include elements of Bayesian maths concepts, but ultimately they are different themes. (Weibull is also somewhat limited generally, however this is a different discussion.)

Random conditional probability, which was also captured in the SAE JA1011 RCM standard (www.sae.org) as being relevant for modern RCM analyses, applies to complex equipment such as Hydraulics, Pneumatics, bearings and other "complex" items.

This does not mean that there is not a probability of failure, or that there is not an MTBF; both exist. However, RCM bases itself on the conditional probability of failure as this is the more accurate means of determining failure characteristics. (We will also leave the entire debate regarding asset failure information for another time.)

In fact basing any form of maintenance regime on probability alone could potentially increase risk exposure.

The RCM Selection Task Process

The initial task that is offered up in RCM task selection processes is that of On-Condition maintenance, or predictive maintenance. There were several reasons given for this, including the ability to mitigate the consequences of safety or environmental incidents as well as the ability of the method to use the entire span of useable life, regardless of whether the failure mode is random, age based or infant mortality based.

Throughout the entire decision process there are questions relating to the applicability and effectiveness of the task, all of which need to be answered in the positive prior to selecting the task.

The second and third choices are then age based, or hard time, tasks. (Preventive replacement and preventive restoration tasks)

The first applicability question here is along the lines of "Is there a point where the conditional probability begins to rise?". The intention here is to establish whether there is a life or not. If the answer is NO then life based tasks are not the most applicable tasks and the analyst is directed to move on in the process.

Therefore failure modes with random failure patterns are NEVER managed by age based tasks. (Ever!) This is because they achieve NOTHING, regardless of whether there is a probability curve, or an MTBF. (Both of which are not true representations of "life" within the context of RCM)
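The applicability question about a rising conditional probability can be illustrated numerically. The sketch below uses a Weibull curve purely as a convenient stand-in (an assumption; the original RCM failure patterns were not derived this way), and the function names and parameter values are hypothetical.

```python
import math


def weibull_reliability(t, beta, eta):
    """Probability of surviving to age t."""
    return math.exp(-((t / eta) ** beta))


def conditional_failure_probability(age, window, beta, eta):
    """Probability of failing within the next `window` hours, given that the
    item has already survived to `age`."""
    r_now = weibull_reliability(age, beta, eta)
    r_later = weibull_reliability(age + window, beta, eta)
    return 1.0 - r_later / r_now


if __name__ == "__main__":
    for beta in (0.5, 1.0, 3.0):
        values = [conditional_failure_probability(age, 100.0, beta, 1000.0)
                  for age in (0.0, 500.0, 1000.0, 2000.0)]
        print(f"beta={beta}: " + ", ".join(f"{v:.4f}" for v in values))
```

A row that stays level across ages has no point at which the conditional probability begins to rise, so an age limit has nothing to act on. Only a rising row suggests a "life".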

Final choices differ depending on the consequences of the failure mode. However, one of the options is redesign, for which the original report also gives guidance on cost effectiveness.

Expert Systems

There are two common problems that I have noted recently with Cost-Algorithm systems, or expert systems within the field of reliability.

The first is that they take analysts directly to replacement or refurbishment and begin from the outdated view of age based failure. This is a fundamental error and one that leads companies to establish maintenance policies based on outdated thinking. (Replacement and "life" considerations are useful as part of Whole-of-Life considerations but not for establishing lifetime maintenance regimes.)

The second is that cost-algorithm systems focus only on costs rather than on all of the areas where asset management impacts corporate performance, in particular the management of risk as it pertains to safety and to the environment. Reducing these to a dollar cost figure is by itself not a problem, but then comparing these in a similar way to failure modes with purely operational or economic consequences is dangerous and is not aligned with the way that modern societies value human life and the environment.

Justifying tasks for economic consequences is based on cost effectiveness; this much is true. However, for hidden consequences, safety consequences and environmental consequences the justification relates to the minimization of risk, not the cost effectiveness of executing the task.

I hope this adds some clarity to the discussion.

Regards

Daryl Mather
www.strategic-advantages.com
 
Vee
Daryl and others,

I note that you are using the term 'random' to imply exponential distribution.

A number of discrete and continuous distributions are 'random'. Thus the Binomial and Poisson distributions are random discrete, while distributions such as Exponential, Normal, Lognormal, Weibull etc. are all of the continuous random variety.

V.Narayan.


Vee
Daryl,

I do not agree with the following statement in your note (emphasis on NEVER).

quote:
Therefore failure modes with random failure patterns are NEVER managed by age based tasks. (Ever!) This is because they achieve NOTHING, regardless of whether there is a probability curve, or an MTBF. (Both of which are not true representations of "life" within the context of RCM)



In the case of hidden failures, when Testing is the applicable and often most effective task, sometimes it is more economical to replace the item with a pre-tested unit, and test the original item later on the test bench. This then becomes a Scheduled Replacement activity, which is a hard-time task. Examples are Relief valves or Smoke Detectors, especially on Unattended Offshore Oil & Gas Platforms.

I do not dispute the thrust of your argument, only the use of the emphatic 'NEVER'.

V.Narayan.

This message has been edited. Last edited by: Vee,


<Ozgipsy>
Hi Vee,

I hope all is well with you.

Of all the professionals in the world I least like disagreeing with yourself.

If the failure mode has a random conditional probability of failure, then I think you will find that what you have just described to us is a Failure Finding task. (Fourth choice in the side of an RCM decision diagram that is aimed at hidden consequences)

When the failure mode is one that has a random conditional probability of failure, and previous tasks such as on-condition are not possible, the frequency of the task is driven by limiting risk to within tolerable levels.

This means that the frequency will have been set by a range of factors including the frequency of the protected situation, the availability of the protective item and the level of risk the organization is willing to accept in this particular mechanism.
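As a rough illustration of how those three factors can combine, the sketch below uses a commonly quoted linear approximation for a failure-finding interval: twice the allowed unavailability of the protective device times its MTBF. It assumes a constant failure rate for the device and an interval that is short relative to its MTBF. All names and figures are hypothetical and are not taken from the post.

```python
import math


def required_unavailability(mean_time_between_demands, tolerable_time_between_multiple_failures):
    """Largest fraction of time the protective device may be in a failed
    state for the multiple failure to stay within the tolerable rate."""
    return mean_time_between_demands / tolerable_time_between_multiple_failures


def failure_finding_interval(device_mtbf, unavailability):
    """Linear approximation: average unavailability is about interval / (2 * MTBF)."""
    return 2.0 * unavailability * device_mtbf


def average_unavailability(interval, device_mtbf):
    """Average fraction of time spent failed between tests, for a device with
    a constant failure rate that is restored at each test."""
    x = interval / device_mtbf
    return 1.0 + math.expm1(-x) / x


if __name__ == "__main__":
    # Assumed figures, in years: one demand per 10 years, a multiple failure
    # tolerated once per 10,000 years, protective device MTBF of 100 years.
    u = required_unavailability(10.0, 10_000.0)
    interval = failure_finding_interval(100.0, u)
    print(f"allowed unavailability: {u:.4%}")
    print(f"failure-finding interval: {interval:.2f} years")
    print(f"check, average unavailability at that interval: "
          f"{average_unavailability(interval, 100.0):.4%}")
```

Note that no cost term appears anywhere in the calculation; the interval follows entirely from the tolerable risk, the demand rate and the reliability of the protective device.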

In regulatory considerations regarding items such as these this is also the thinking that takes place.

The fact that they take out the valve or other item to test if it is still functioning well, and replace it with another, would be a tactic that they have adopted to minimize the unavailability of the item or for other operational reason. It does not make this task a preventive replacement task.

If this is not the case, and they have been set by some form of age-based relationship (e.g. after a particular time not determined by risk but rather by age), then the item would have been deemed to not have a random conditional probability of failure. (That is, it has a "Life".)

I hope this is not the case, and it is not often the case, as it is extremely difficult to find age related failures that do not exhibit random failures during the early parts of life, thus increasing risk of failure of the safety device substantially.

It is difficult to comment in depth on this issue as we are not privy to the analytical processes or thinking that went on at the time of establishing the regime.

However, the statement stands: time based routine tasks are NEVER relevant for failure modes that correspond to a random conditional probability of failure.

Sorry mate...

Regards

Daryl Mather
www.strategic-advantages.com
 
Vee
Hi Daryl,

I do not dispute your arguments. All I am saying is that in relation to hidden failures:

- We have to find applicable tasks; in this case, failure finding is an applicable task. So is a hard-time replacement, but the latter may be too expensive.
- We have to find effective tasks, i.e., ones that will address the root cause of the problem. Again, failure finding is effective, but so is a hard-time replacement.
- Finally, it has also got to be cost effective. In certain cases, the scheduled replacement is cheaper than testing at site.

In these special cases, e.g. PRVs or Smoke Detectors, it is sometimes an economic decision to replace rather than test at site. Such a replacement does not always address a failure, since there may be none; in this sense it is a preventive time-based action, based only on economic considerations such as, as you have pointed out, downtime or downtime costs or both. Hence the word 'NEVER' is inappropriate.

I think I understand the theory of conditional probability well enough to know that with a constant hazard rate, there is no age-related failure. By the way, 'random' does not necessarily mean constant hazard, as I mentioned in my second post. Other non-constant-hazard distributions can also be 'random'.

Good talking with you; it is always stimulating! Regards.

V.Narayan.


<Ozgipsy>
Dear Vee,

There are a number of issues in what you have just written.

First the concepts of applicable and effective are somewhat confused.

Applicable refers to the technical feasibility of tasks while effectiveness refers to its ability to be cheaper over time or to reduce the level of exposure to risk to within tolerable levels.

And in the case of random conditional probability of failure, time based tasks are NOT applicable under any circumstances. (For the reasons set out previously)

The frequency of the testing, the failure finding task, is determined by risk calculations (provided it is a random failure pattern and on-condition was not able to be used), definitely NOT by economic considerations. I would go as far as saying that if this were the case then it is a tragic error!

If the "how" is determined by economic considerations, then this is another issue entirely. However, the drive of this task appears to be periodic testing to ensure that it remains in a functional condition. (Thus a failure finding task)

I am not sure how else I can illustrate this concept Vee. Particularly in a text forum such as this one.

Regards

Daryl Mather
www.strategic-advantages.com
 
Vee
Thanks Daryl, I appreciate your constructive approach.

However, I am pretty clear about what I am saying.

Quoting John Moubray, "A proactive task is worth doing if it reduces the consequences of the associated failure mode to the extent that justifies the cost of doing the task".

If we assume for ease of understanding that the failure distribution follows Weibull, then you have already stated words to the effect that:

1. for beta less than 1, do nothing till failure
2. for beta=1, for evident failures do nothing unless degradation has set in and incipiency can be measured. For hidden failures, test to detect state at a frequency that will give us the required availability.
3. for beta>>1, time-based preventive tasks are applicable, since wear-out is indicated.
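The three cases in the list above can be written as a small lookup. This is only a sketch: the post gives no numeric cut-offs, so the `band` around β = 1 and the `wear_out_threshold` standing in for "beta>>1" are illustrative assumptions, as are the function name and return strings.

```python
def suggested_task(beta, hidden_failure, band=0.05, wear_out_threshold=2.0):
    """Map a Weibull shape parameter to the strategy named in the list above.

    `band` and `wear_out_threshold` are assumed values for illustration;
    the list itself says only "beta=1" and "beta>>1".
    """
    if beta < 1.0 - band:
        return "do nothing till failure"
    if beta <= 1.0 + band:
        if hidden_failure:
            return "failure-finding test at a frequency giving the required availability"
        return "no scheduled task unless incipient degradation can be measured"
    if beta >= wear_out_threshold:
        return "time-based preventive task"
    return "not covered by the list; needs case-by-case analysis"


if __name__ == "__main__":
    for beta, hidden in ((0.7, False), (1.0, False), (1.0, True), (3.5, False), (1.4, False)):
        print(f"beta={beta}, hidden={hidden}: {suggested_task(beta, hidden)}")
```

The final branch is deliberate: shapes a little above 1 are not addressed by the list, so the sketch declines to guess.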

So I am entirely in agreement with you with respect to item 2 above. But, as my examples of PRVs and Smoke Detectors in the earlier post show, in SOME cases, a preventive replacement strategy is cheaper (due to downtime costs) for such hidden failures. Not always, but sometimes. The emphasis is on the last sentence of Moubray's quote.

The 'how' as you put it has an economic significance. Ultimately, the task we do must make economic sense. Hence my objection to the word 'NEVER'.

I note that you continue to use the word 'random' to imply an exponential distribution. They are not interchangeable as there are other random distributions.

V.Narayan.


<Ozgipsy>
Dear Vee,

There are actually some fundamental issues here that are being either deliberately or unintentionally ignored. As such I am going to make one last effort to clear this up, and if that's no good then you can have the last word on the matter.

quote:

Quoting John Moubray, "A proactive task is worth doing if it reduces the consequences of the associated failure mode to the extent that justifies the cost of doing the task".


If you look deeper at what Moubray wrote, as well as at what the original RCM report stated, which was confirmed later in the RCM Standard (SAE JA1011), you will note that this theme is developed into considerably more detail than this initial statement.

In fact there are definite differences in the way that failure modes with safety, environmental and hidden consequences are treated. That treatment does not take into account "cost effectiveness"; rather, it takes into account the reduction of exposure to risk to within tolerable levels. (As stated a number of times)

In fact, and this is important, basing the frequency of a routine, or proactive, task that has safety or environmental consequences on cost calculations is not only wrong but is unethical!

The word random that I am using refers to the "random conditional probability of failure" that was initially uncovered in the 1978 report and then further proved in a range of other US military reports. It is real, it exists and it is not a replacement term for any exponential distribution. Rather, it is the proven failure characteristic for many failure modes on complex equipment.

(If you recall, I started off this thread talking about the limitations of Weibull and the fact that this was not a consideration in the initial RCM study.)

Now, for the last time, the tests that you are describing are essentially Failure Finding or detective maintenance tasks.

That is, you are testing the item after a predetermined time to ensure that it has not failed!

The time period, if it has a random conditional probability of failure, is determined by risk calculations. (This particular statement could be expanded to two or three volumes of work, so I am not going to get into it too much here.)

WHY? Because when an item has a random failure pattern, the likelihood of it failing today is the same as tomorrow, the same as in 5 years and the same as in 10 years.
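If one reads this statement as a constant hazard rate λ (an assumption made here for illustration; the post itself does not tie the pattern to any named distribution), it corresponds to what is usually called the memoryless property. With T the age at failure, t the age already reached and s the look-ahead period:

```latex
\begin{align}
  P(T > t + s \mid T > t)
    &= \frac{P(T > t + s)}{P(T > t)}
     = \frac{e^{-\lambda (t + s)}}{e^{-\lambda t}}
     = e^{-\lambda s}
     = P(T > s)
\end{align}
```

The age t cancels out, so an item that has survived 10 years has the same chance of surviving the next period as a new one, and replacing it on age alone changes nothing.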

This is a fundamental point. If we are not able to get a handle on it, there is no way that people are going to understand the absolute futility of doing a task based on the assumption that the item has a "life".

When the item is taken down to be tested, the oil and gas industry does tend to replace it with one that has been tested and is working. In most cases this has nothing to do with the item having a life; it has more to do with the fact that they want to reduce the level of unavailability of the item while they are testing it.

I think that you will find that the Piper Alpha disaster is as much of a contributor to this as are any cost calculations. As you well know, the unavailability of the item in that particular case was a key element, along with many others, of the eventual disaster.

So, in summary, and for the last time:

If an item has a random conditional probability of failure then the likelihood of it failing does not increase with time.

Therefore any time based maintenance would be futile as it would be trying to impose rules on an item where they do not exist. (The belief that all equipment is more likely to fail as it gets older is a classic error in thinking.)

Regardless of how they decide to do the failure finding test, which may well be organized for the most cost effective outcome (although this is not often the case), the task remains a failure finding task because this is essentially what they are doing. And they are doing so at an interval that is based upon risk calculations, not some erroneous view that random failure patterns can have a life.

Hence, it is not a PREVENTIVE REPLACEMENT STRATEGY, it is a FAILURE FINDING STRATEGY.

Life based maintenance on items with a random failure pattern is NEVER technically feasible, and cannot be stated as EFFECTIVE in the way that the term is used in RCM.

If you want to keep going with this I am happy to address this by email personally, as far as public debate goes I have said what I wanted to say. (And repeated it several times)

Regards

Daryl Mather
www.strategic-advantages.com
 


Copyright © 2004-2008 NetexpressUSA Inc. All rights reserved.