Following this definitely productive discussion, could somebody define the term "reliability" quantitatively, and specifically for the whole plant?

If, for a single item failing randomly, reliability is defined as:
R = 1 - P (P is the probability of failure, derived from MTBF)
and for a single item with a time-related failure rate the expression for R is more complex (can't remember it now), how do you measure R for a whole plant?
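For the random-failure case, the usual textbook relation between MTBF and reliability over a mission time t is R(t) = exp(-t / MTBF), which assumes a constant failure rate. The sketch below only illustrates that relation; the MTBF and mission time are made-up numbers, not values from this thread.

```python
import math


def reliability_constant_rate(t_hours, mtbf_hours):
    """R(t) = exp(-t / MTBF), valid only for a constant (random) failure rate."""
    return math.exp(-t_hours / mtbf_hours)


# Hypothetical values, chosen only for illustration.
mtbf = 10_000.0    # hours
mission = 1_000.0  # hours

r = reliability_constant_rate(mission, mtbf)
p = 1.0 - r  # probability of at least one failure during the mission

print(f"R = {r:.4f}, P = {p:.4f}")
```

Note that R depends on the mission time chosen: the same item has a different reliability over one month than over one year.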

Of course one can connect the reliabilities of individual pieces of equipment with logical ANDs and ORs,
thus calculating overall plant reliability, but is that the way for a plant with 1000+ pieces of equipment? Sounds cumbersome to me... not to mention that the reliabilities of individual pieces of equipment are not all available.

Dave

Posts: 1340 | Location: Texas | Registered: 22 February 2005
Vee
Dave,

Taking your car as an example, it has many systems that need to be reliable for the car as a whole to be reliable. These include, e.g., power generation, transmission, suspension, safety systems such as braking, lighting, etc.

Each of these systems has sub-systems. Taking power transmission as an example, you have a clutch assembly, gearbox, power shaft, differential and wheels. Each of these has components, such as bearings.

We can work from the top down: for the car as a whole, failure may mean that one or more functions do not work on demand, for example:
- the engine does not start
- a tire is flat
- the wipers don't work
- a brake light is fused, etc.

The time between these 'failures' affects the car as a whole. So if we record all these failures and their dates, we can compute the car's reliability.
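A minimal sketch of this top-down calculation, assuming all we have is a log of failure dates for the car as a whole. The dates are invented for illustration, and repair time is ignored, so the result is the mean interval between failure events rather than a strict operating-time MTBF.

```python
from datetime import date

# Hypothetical failure log for the car as a whole (any loss of function counts).
failure_dates = [
    date(2007, 1, 10),   # engine did not start
    date(2007, 3, 1),    # flat tire
    date(2007, 3, 25),   # wipers failed
    date(2007, 6, 18),   # brake light fused
]


def mean_days_between_failures(dates):
    """Average gap, in days, between consecutive recorded failures."""
    ordered = sorted(dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two failure dates")
    return sum(gaps) / len(gaps)


print(mean_days_between_failures(failure_dates))
```

With only a handful of events the number is a rough indicator at best; it says nothing about which failure mode dominates.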

We can also work bottom-up. This is more tedious, as we have to build a mathematical model, using, as you mentioned, AND/OR logic symbols. If we know the reliability parameters of each component, we can 'run' a simulation model to get the reliability of the sub-assembly. We then build a mathematical model of the car, assembling all the sub-systems, again using logic operators AND/OR. We run this model and get the reliability of the car. In practice, we build just one large model with all the systems broken down to component level.
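The AND/OR logic can be sketched analytically for the simplest case, assuming independent components and fixed reliabilities over a given mission time. In success terms, AND means every block must work (series) and OR means at least one redundant block must work (parallel). The component names and values are hypothetical; a real model would normally use failure distributions and simulation, as described above.

```python
from math import prod


def series(reliabilities):
    """AND logic: the system works only if every block works."""
    return prod(reliabilities)


def parallel(reliabilities):
    """OR logic: the system works if at least one redundant block works."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical sub-system: two redundant pumps feeding one gearbox and one motor.
pumps = parallel([0.90, 0.90])
sub_system = series([pumps, 0.95, 0.98])

print(f"pumps: {pumps:.4f}, sub-system: {sub_system:.4f}")
```

The same two functions can be nested to any depth, which is how a plant with many items is assembled from sub-systems; the independence assumption is the part that most often fails in practice.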

The top-down method is based on historical data and is 'after-the-event'. When building a new project we can use the bottom-up approach.

I have tried to put all this in simple terms. The reality is somewhat more complex, but I hope you get the picture.


Regards,
V.Narayan (Vee)
Lead Author, 100 Years of Maintenance: Practical Lessons from Three Lifetimes, Industrial Press.NY ISBN-13: 978-0831133238
Author, Effective Maintenance Management: Risk and Reliability Strategies for Optimizing Performance, 2004, Industrial Press NY ISBN-13: 978-0831131784
 
Posts: 1027 | Location: Scotland, UK. | Registered: 16 May 2004
quote:
Originally posted by Vee:

We can work at the top end; for the car as a whole, failure may mean one or more of the following may not work on demand:
- engine does not start
- a tire is flat
- the wipers don't work
- brake light is fused etc.

The time between these 'failures' affects the car as whole. So if we record all these failures and their dates, we can compute the car's reliability.



Vee,

If you view the reliability of a complex system as a statistical parameter, calculated as a function of MTBF, then it will be extremely non-representative. The reason is that you do not want to mix different failure modes in the same pot while calculating it. In the case of the car, a camshaft failure in the engine cannot be compared with a wiper failure in significance, consequences, time to correct, or frequency of occurrence, although either one may prevent you from driving.

My point: one needs to know each component's reliability and then calculate the system's reliability. It is hard to do in real life, though.

Dave
 
Posts: 1340 | Location: Texas | Registered: 22 February 2005
Hi, Dave. In the process industry, we seem to use historical data to generate a model (Weibull or other types), and we calculate the system availability with that model. It may not be accurate, but we don't know how to get the availability any better than this. But have you seen people calculate the availability bottom-up? That must be time-consuming. Just curious to know. Thanks.
 
Posts: 2 | Location: manchester | Registered: 12 September 2007
Vee
Dave,
You said
quote:
The reason being is that you do not want to mix in the same pot different failure modes while calculating it. In the case of the car, failure of the cam shaft in the engine can not be compared with a wiper failure by its significance, consequences,

You must define what is a failure of the car. If for argument's sake, you accept that the car meets its functional requirements when
- the brake lights are fused or
- the traffic signal lights don't work, or
- the wiper does not work
then you are absolutely right. After all, with all these faults, the car will still get you from A to B. But if the traffic police catch you without functioning brake lights or signaling lights, I think you will have an embarrassing discussion. Certainly, if the car needs an annual certification for road-worthiness, it will fail.
So to take the car as a whole, EVERY failure that you define as a loss of function of the car has the same significance. A fused brake light is no more or no less important than a flat tire or emissions beyond preset limits. With these qualifications, for the car as a whole (or its equivalent, a Manufacturing Plant), we can compute the reliability parameters.

MTBF is an average value. It is useful as a rough metric, a sort of quick and dirty number. When we want to do some serious reliability analysis, we need the actual failure distribution, or probability density curve. This is not easy to do for every failure mode. We have to compute the parameters that define such curves. One of the more popular distributions, which appears to fit most equipment failure data, is the Weibull. Three parameters, denoted by the Greek symbols eta, gamma and beta, define any Weibull distribution. The fun part of Weibull is that we can get approximations of other distributions such as the Gaussian (or normal), exponential, log-normal and a whole host of other distributions that one might face in real life. That is why it is popular. We need such data when we build models, not just MTBFs.
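For reference, the three-parameter Weibull reliability function is commonly written R(t) = exp(-((t - gamma) / eta)^beta), with eta the scale (characteristic life), beta the shape and gamma the location (failure-free period). The sketch below uses made-up parameter values and shows that beta = 1 reduces to the exponential, constant-failure-rate case.

```python
import math


def weibull_reliability(t, eta, beta, gamma=0.0):
    """Three-parameter Weibull survival probability at time t.

    eta: scale (characteristic life), beta: shape, gamma: location.
    """
    if t <= gamma:
        return 1.0  # no failures are possible before the failure-free period ends
    return math.exp(-(((t - gamma) / eta) ** beta))


# Hypothetical parameters, for illustration only.
for beta in (0.7, 1.0, 3.5):
    r = weibull_reliability(1_000.0, eta=10_000.0, beta=beta)
    print(f"beta = {beta}: R(1000 h) = {r:.4f}")
```

A shape below 1 corresponds to a falling failure rate (early failures), 1 to a constant rate, and above 1 to a rising rate (wear-out).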

Scarlett, a good modeler will do a reality check by doing what is called 'history-matching'. This process allows one to validate the model for its closeness to reality.
You ask
Quote But have you seen people calculate the availability by bottom-up? That must be time-consuming. Just curious to know it Unquote
The answer is yes. There are many consulting firms that can help you do this, and there are many software packages for this purpose.
Posts: 1027 | Location: Scotland, UK. | Registered: 16 May 2004
Vee, Scarlett,

IMO, the fact that people in the maintenance reliability business are using plant MTBF as an indicator of reliability is unfortunate. Calculating reliability from the bottom up is the correct way of doing it, although it is labor intensive.

A Weibull distribution will have a more distinctive pattern (meaning the data won't be smeared as much) if fewer failure modes are included in a sample. The perfect case is a single failure mode. And when one considers smaller components, such as a motor or fan, fewer modes are mixed together than in a complex system.

Vee, you said that MTBF is a "quick and dirty number". I can't agree more, but how dirty? When you estimate, for example, the speed of a car, you know that you may be no more than, say, 10% off, but in all likelihood you do not know how far a plant's MTBF/reliability, calculated in such an overly simplistic way, is from the true one. Without knowing its accuracy, how can one rely on it?

Another point... In general, statistical analysis in reliability is a process in which one takes a sample of, as in this case, times between failures, calculates reliability, and then makes a reliability inference about the whole population. In a real plant we do not have to take a sample. We can deal with the population, since each failure and the amount of downtime for every piece of equipment is stored in the CMMS database.

Therefore here is a suggestion. Forget about MTBF and statistics. Work with the whole population. Sum up the uptimes of every individual piece of equipment in the plant, such as motors, pumps, gearboxes, transmissions, fans, etc., and find a parameter, which we'll call

Availability = sum_of_uptimes / maximum_available_sum_of_uptimes

Any CMMS can do it easily if the data is entered properly. This parameter may not reflect the percentage of time the plant was making product that went out of the door, but it will be representative of the reliability efforts.
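A minimal sketch of that calculation, assuming the CMMS can export total downtime per piece of equipment for a reporting period and that every item was required for the whole period. The equipment tags and hours are hypothetical.

```python
# Hypothetical CMMS export: downtime in hours per equipment tag for one period.
period_hours = 720.0
downtime_hours = {"pump_A": 12.0, "motor_B": 0.0, "fan_C": 36.0}


def machine_availability(period, downtime):
    """Fraction of the period one machine was available."""
    return (period - downtime) / period


def summed_availability(period, downtimes):
    """sum_of_uptimes / maximum_available_sum_of_uptimes over all equipment."""
    sum_of_uptimes = sum(period - d for d in downtimes)
    maximum_available = period * len(downtimes)
    return sum_of_uptimes / maximum_available


for tag, hours in downtime_hours.items():
    print(f"{tag}: {machine_availability(period_hours, hours):.4f}")
print(f"all equipment: {summed_availability(period_hours, downtime_hours.values()):.4f}")
```

Because every item carries equal weight, a critical compressor and a spare fan contribute identically to the total, which is one reason this figure is not the same thing as plant production availability.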

David
 
Posts: 1340 | Location: Texas | Registered: 22 February 2005
Vee
Dave,
If I understand you correctly, when you say
quote:
We can deal with the population since each failure and amount of downtime for every piece of equipmnet is stored in the CMMS database.

that we work with the whole population, then there is a problem, as follows.

For ANY reliability analysis, be it merely computing MTBF or Weibull parameters, we must satisfy two conditions:
1. The failure mode under consideration must be independent, i.e., it must not influence another. Thus a bearing failure should not induce a seal failure, and one tire should not influence another tire's performance in a car.
2. Every data point for a failure mode must come from identical conditions. Thus if your car tire population is 4, then EVERY tire must be operating in the same operating context (function, load) AND be subject to identical degradation mechanisms (traction or not, cambering, brake type, air pressure, make of tire, etc.)

These conditions are clearly not possible to obtain in practice: the left front wheel works under different functional requirements and load than the right rear wheel. Every tire influences every other tire as well.

For these reasons, we HAVE to select samples that minimise such errors.

After all, pressure relief valves are not all identical in size, design, or service conditions. How often do we see PRV MTBFs that are based on populations? How believable are such numbers?

Posts: 1027 | Location: Scotland, UK. | Registered: 16 May 2004
quote:
Originally posted by Vee:
.. if we work with the whole poplulation, then there is a problem ...


Vee,

I do not see any of the points listed above as a problem: if the whole population is taken (all machines at the plant), then calculating MTBF, where failure independence and failure modes have to be considered, won't be necessary. Instead, as I suggested previously, Availability (for each individual machine) in a specified time period will be used:

Availability = Time_period - Downtime

or

% Availability = ( Time_period - Downtime ) / Time_period

could be calculated for an individual machine. I believe Availability is a representative measure of current reliability.

In other words we won't be dealing with time-to-failure distribution at all. It won't be statistical prediction in time based on a sample distribution. It will be a delayed measure of the time during which a machine or a plant was available for usage in a specified past period.

How to proceed to the WHOLE PLANT AVAILABILITY I am not sure.

Dave
 
Posts: 1340 | Location: Texas | Registered: 22 February 2005
Vee
Dave,
quote:
% Availability = ( Time_period - Downtime ) / Time_period

could be calculated for an individual machine. I believe Availability is a representative measure of current reliability.

The formula is quite correct of course, but Availability is not always representative of Reliability. There is a small matter of maintainability to consider; the higher the maintainability, the higher the Availability.
In the special case of Hidden failures, and when we assume that the failed item is replaced or repaired as soon as we know of it, then Availability is indeed equal to or at least close to Reliability.

quote:
How to proceed to the WHOLE PLANT AVAILABILITY I am not sure.

I am not sure I understand your difficulty. Using a top-down approach you use exactly the same formula as you have stated, only this time the numbers are for the Plant as a whole.

Posts: 1027 | Location: Scotland, UK. | Registered: 16 May 2004
Hi David,

Point 1 – Increase availability up to what limit?

As in many other circumstances in engineering, one aims to obtain the maximum value at the minimum cost. This can be achieved by compromising. As Vee stated, the benefit tends monotonically to some specific value while the cost rises steeply as some variable of interest is increased or decreased. Please see the attached Exhibit “Optimality”. This figure is classical and depicts the idea in quite a comprehensive way. Normally we stop when the increment of the benefit (in €) becomes lower than the increment of the cost (this happens in the vicinity of points Q1 and Q2). The same principle applies to the reliability or availability of a system. When hazardous circumstances are involved that might cause deaths or injuries, you can use ALARP (As Low As Reasonably Practicable) principles http://en.wikipedia.org/wiki/ALARP or risk trees (if you work for an insurance company).

Point 2 – Availability versus reliability

Availability of a system comprises its reliability, measured by MTTM (Mean Time To Maintenance, sometimes corrective and sometimes preventive), and its maintainability, measured by MTTR (Mean Time To Repair, Replace, Restore or Recover). In the case of a repairable system you use both indicators, among others, to get a picture of how maintenance is performing over time. When availability is to be increased, the gain is higher when MTTR is reduced by a certain percentage than when MTTM is increased by the same percentage. Also, results are often obtained more quickly and at a lower cost. In short, you can increase the availability of any piece of equipment just by improving workmanship methods during the period the piece of equipment is put at your disposal by Production, which has nothing to do with reliability. Reliability, in turn, can be improved just by choosing a different periodicity when maintenance is time based (or based on another unit), or by providing better operational conditions (environment and/or operator skills), or by better engineering, which is often rather difficult to put into practice.
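The claim about equal percentage changes can be checked numerically. The sketch assumes the common steady-state relation Availability = MTTM / (MTTM + MTTR), which the post does not state explicitly; the hours used are hypothetical.

```python
def availability(mttm_hours, mttr_hours):
    """Steady-state availability, assumed here as MTTM / (MTTM + MTTR)."""
    return mttm_hours / (mttm_hours + mttr_hours)


# Hypothetical baseline values.
mttm, mttr, change = 500.0, 20.0, 0.20

baseline = availability(mttm, mttr)
with_shorter_repair = availability(mttm, mttr * (1.0 - change))
with_longer_interval = availability(mttm * (1.0 + change), mttr)

print(f"baseline:      {baseline:.4f}")
print(f"MTTR down 20%: {with_shorter_repair:.4f}")
print(f"MTTM up 20%:   {with_longer_interval:.4f}")
```

Under this formula, cutting MTTR by a fraction p always beats raising MTTM by the same p, because (1 - p) is smaller than 1 / (1 + p) for any p between 0 and 1. The difference is small when availability is already high.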

Point 3 – population versus sample

Allow me a correction from a statistical perspective:

In management performance monitoring, regardless of which indicator is being used, we cannot refer to a population when addressing data, as this can be as large as we can imagine. We are always in a position where a few observations (forming a sample) are available, which can be processed on a timely basis to obtain some meaningful measure of performance as time goes by. Suppose you gather information every week; you could have done it, say, every day, or every hour, and so forth. Furthermore, suppose you gather information every week and compute the mean over the last 100 events, for example (this could suffice as long as the coefficient of variation doesn't exceed a certain allowable empirical threshold). Does this group of 100 observations form a population? Of course not, because you could have extended your time window back in time to embrace, say, 150 or 200 observations (if available and considered representative). This means that, on such occasions, we are actually treating samples, so we always have some degree of uncertainty when reading or reporting a number obtained by manipulating other numbers that have not come from a continuous spectrum. And because we have samples, we calculate statistics instead of parameters (such as means, variances and the like), together with their confidence intervals.
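As an illustration of reporting a statistic with its confidence interval, the sketch below computes the mean, an approximate 95% interval and the coefficient of variation for a window of 100 observations. The observations are randomly generated stand-ins, not real plant data, and the interval uses the normal approximation, which is reasonable for a window of this size but too narrow for small samples.

```python
import math
import random
import statistics


def mean_with_confidence_interval(sample, confidence=0.95):
    """Return (mean, lower, upper, coefficient_of_variation) for a sample.

    Uses the normal approximation for the interval of the mean.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    std = statistics.stdev(sample)  # sample standard deviation (n - 1)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * std / math.sqrt(n)
    return mean, mean - half_width, mean + half_width, std / mean


# Hypothetical window: the last 100 times between events, in hours.
rng = random.Random(42)
window = [rng.expovariate(1.0 / 200.0) for _ in range(100)]

mean, low, high, cv = mean_with_confidence_interval(window)
print(f"mean = {mean:.1f} h, 95% CI = [{low:.1f}, {high:.1f}], CV = {cv:.2f}")
```

Extending the window to 150 or 200 observations narrows the interval roughly with the square root of the sample size, provided the older data are still representative.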

Point 4 – The whole plant availability

Please see an example of a system availability calculation in the attached Excel file “System availability”.

Point 5 – Trend estimation

I will start a new thread with the subject “monitoring of performance indicators” which I think will be of your interest.

Regards,

Rui Assis

Attachment (PowerPoint): Optimality.ppt (68 KB)

Posts: 29 | Location: Lisbon | Registered: 22 October 2007
David,

Here is the Excel file "System availability" that couldn't be accommodated in the previous post.

Regards,

Rui Assis

Attachment (Excel spreadsheet): System_availability.xls (26 KB)

Posts: 29 | Location: Lisbon | Registered: 22 October 2007

Copyright © 2004-2008 NetexpressUSA Inc. All rights reserved.