% Reliability Success
Following this decidedly productive discussion, could somebody define the term "reliability" quantitatively, and specifically for a whole plant?

For a single item that fails randomly, reliability is defined as R = 1 - P, where P is the probability of failure, derived from the MTBF. For a single item with a time-dependent failure rate, the expression for R is more complex (I can't remember it now). But how do you measure R for a whole plant? Of course, one can connect the reliabilities of individual pieces of equipment with logical ANDs and ORs and so calculate the overall plant reliability, but is that the way for a plant with 1000+ pieces of equipment? It sounds cumbersome to me, not to mention that the reliabilities of individual pieces of equipment are not all available.

Dave
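For the single-item case raised in the question, here is a minimal sketch. It assumes a constant failure rate (exponentially distributed time to failure), which the post does not state explicitly; under that assumption R(t) = exp(-t / MTBF) and P = 1 - R. The function name and the numbers are illustrative only.

```python
import math


def reliability_constant_rate(mission_hours: float, mtbf_hours: float) -> float:
    """Return R(t) = exp(-t / MTBF).

    Valid only under the assumption of a constant failure rate.
    """
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    if mission_hours < 0:
        raise ValueError("mission time cannot be negative")
    return math.exp(-mission_hours / mtbf_hours)


if __name__ == "__main__":
    # Hypothetical item: MTBF of 5000 h, mission time of 1000 h.
    r = reliability_constant_rate(1000, 5000)
    print(f"R = {r:.3f}, P(failure) = {1 - r:.3f}")
```

Under this assumption, an item run for a time equal to its MTBF has a reliability of only exp(-1), roughly 37%, which is one reason MTBF on its own can be misread.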
||||
|
Dave,
Taking your car as an example, it has many systems that need to be reliable for the car as a whole to be reliable. These include, e.g., power generation, transmission, suspension, safety systems such as braking, lighting, etc. Each of these systems has sub-systems. Taking power transmission as an example, you have a clutch assembly, gearbox, power shaft, differential, and wheels. Each of these has components, such as bearings.

We can work at the top end; for the car as a whole, failure may mean that one or more of the following does not work on demand:
- engine does not start
- a tire is flat
- the wipers don't work
- a brake light is fused
etc.

The time between these 'failures' affects the car as a whole. So if we record all these failures and their dates, we can compute the car's reliability.

We can also work bottom-up. This is more tedious, as we have to build a mathematical model using, as you mentioned, AND/OR logic symbols. If we know the reliability parameters of each component, we can 'run' a simulation model to get the reliability of the sub-assembly. We then build a mathematical model of the car, assembling all the sub-systems, again using the logic operators AND/OR. We run this model and get the reliability of the car. In practice, we build just one large model with all the systems broken down to component level.

The top-down method is based on historical data and is 'after-the-event'. When building a new project, we can use the bottom-up approach.

I have tried to put all this in simple terms. The reality is somewhat more complex, but I hope you get the picture.

Regards, V.Narayan (Vee)
Lead Author, 100 Years of Maintenance: Practical Lessons from Three Lifetimes, Industrial Press NY, ISBN-13: 978-0831133238
Author, Effective Maintenance Management: Risk and Reliability Strategies for Optimizing Performance, 2004, Industrial Press NY, ISBN-13: 978-0831131784
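The bottom-up AND/OR logic described above can be sketched as follows. This is a simplified illustration, not the simulation model the post refers to: it assumes independent failures, fixed reliabilities over a common period, and made-up numbers. The helper names `series` and `parallel` and the grouping of the car's sub-systems are hypothetical.

```python
from math import prod


def series(*reliabilities: float) -> float:
    """All items must work (AND of successes): R = product of the R_i."""
    return prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """At least one item must work (redundancy): R = 1 - product of (1 - R_i)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


if __name__ == "__main__":
    # Made-up reliabilities over one year, for illustration only.
    transmission = series(0.99, 0.995, 0.99)   # clutch, gearbox, differential
    braking = parallel(0.97, 0.97)             # assumed dual-circuit redundancy
    engine = 0.98
    car = series(engine, transmission, braking)
    print(f"transmission={transmission:.4f} braking={braking:.4f} car={car:.4f}")
```

Even with each component above 97%, the series product pulls the whole-car figure below its weakest sub-system, which is why a plant with 1000+ items needs good component data for this approach to be meaningful.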
||||
|
Vee, if you treat the reliability of a complex system as a statistical parameter, calculated as a function of MTBF, then it will be extremely unrepresentative. The reason is that you do not want to mix different failure modes in the same pot when calculating it. In the case of the car, a failure of the camshaft in the engine cannot be compared with a wiper failure in its significance, consequences, time to correct, or frequency of occurrence, although either one may prevent you from driving. My point: one needs to know each component's reliability and then calculate the system's reliability. It is hard to do in real life, though. Dave
||||
|
Hi Dave, in the process industry we tend to use historical data to generate a model (Weibull or other types), and we calculate the system availability from that model. It may not be accurate, but we don't know how to get a better estimate of availability than this. But have you seen people calculate the availability bottom-up? That must be time-consuming. Just curious to know. Thanks.
|
||||
|
Dave,
You said
You must define what constitutes a failure of the car. If, for argument's sake, you accept that the car meets its functional requirements when
- the brake lights are fused, or
- the traffic signal lights don't work, or
- the wiper does not work,
then you are absolutely right. After all, with all these faults, the car will still get you from A to B. But if the traffic police catch you without functioning brake lights or signaling lights, I think you will have an embarrassing discussion. Certainly, if the car needs an annual certification for road-worthiness, it will fail.

So, taking the car as a whole, EVERY failure that you define as a loss of function of the car has the same significance. A fused brake light is no more and no less important than a flat tire or emissions beyond preset limits. With these qualifications, for the car as a whole (or its equivalent, a manufacturing plant), we can compute the reliability parameters.

MTBF is an average value. It is useful as a rough metric, a sort of quick and dirty number. When we want to do some serious reliability analysis, we need the actual failure distribution, or probability density curve. This is not easy to do for every failure mode. We have to compute the parameters that define such curves. One of the more popular distributions, which appears to fit most equipment failure data, is the Weibull. Three parameters, denoted by the Greek symbols eta, gamma and beta, can define any Weibull distribution. The fun part of Weibull is that we can get approximations of other distributions such as the Gaussian (or normal), exponential, log-normal and a whole host of other distributions that one might face in real life. That is why it is popular. We need such data when we build models, not just MTBFs.

Scarlett, a good modeler will do a reality check by doing what is called 'history-matching'. This process allows one to validate the model for its closeness to reality. You ask: "But have you seen people calculate the availability by bottom-up? That must be time-consuming. Just curious to know it." The answer is yes. There are many consulting firms who can help you do this, and there are many software packages for this purpose.

Regards, V.Narayan (Vee)
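As a rough illustration of the three-parameter Weibull mentioned above, the sketch below evaluates the reliability function R(t) = exp(-((t - gamma) / eta)^beta), with eta as the scale, beta as the shape and gamma as the location (failure-free time). The function name and the numbers are assumptions for illustration; fitting the parameters to real failure data is a separate step not shown here.

```python
import math


def weibull_reliability(t: float, eta: float, beta: float, gamma: float = 0.0) -> float:
    """Three-parameter Weibull reliability.

    eta   -- scale (characteristic life), must be > 0
    beta  -- shape, must be > 0
    gamma -- location (failure-free period); R = 1 for t <= gamma
    """
    if eta <= 0 or beta <= 0:
        raise ValueError("eta and beta must be positive")
    if t <= gamma:
        return 1.0
    return math.exp(-(((t - gamma) / eta) ** beta))


if __name__ == "__main__":
    # Hypothetical pump seal: eta = 8000 h, no failure-free period.
    for beta in (0.7, 1.0, 3.0):
        r = weibull_reliability(4000, eta=8000, beta=beta)
        print(f"beta={beta}: R(4000 h) = {r:.3f}")
```

With beta = 1 and gamma = 0 the expression reduces to the exponential case exp(-t / eta), so eta then plays the role of the MTBF; a beta below 1 corresponds to a falling failure rate and a beta above 1 to a rising (wear-out) failure rate.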
||||
|
Vee, Scarlett,
IMO, the fact that people in the maintenance reliability business are using plant MTBF as an indicator of reliability is unfortunate. Calculating reliability from the bottom up is the correct way of doing it, although it is labor intensive. A Weibull distribution will have a more distinctive pattern (meaning the data won't be smeared as much) if fewer failure modes are included in a sample. The perfect case is a single failure mode. And when one considers smaller components, such as a motor or a fan, fewer modes are mixed together than in a complex system.

Vee, you said that MTBF is a "quick and dirty number". I can't agree more, but how dirty? When you estimate, for example, the speed of a car, you know that you may be no more than, say, 10% off, but in all likelihood you do not know how far a plant MTBF/reliability calculated in such an overly simplistic way will be from the true one. Without knowing its accuracy, how can one rely on it?

Another point. In general, statistical analysis in reliability is a process in which one takes a sample of, as in this case, times between failures, calculates reliability, and then makes a reliability inference about the whole population. In a real plant we do not have to take a sample. We can deal with the population, since each failure and the amount of downtime for every piece of equipment is stored in the CMMS database.

Therefore, here is a suggestion. Forget about MTBF and statistics. Work with the whole population. Sum up all uptimes for each individual piece of equipment in the plant, such as motors, pumps, gearboxes, transmissions, fans, etc., and find a parameter, which we'll call

Availability = sum_of_uptimes / maximum_available_sum_of_uptimes

Any CMMS can do it easily if the data is entered properly. This parameter may not reflect the percentage of time the plant was making product that went out of the door, but it will be representative with regard to the reliability efforts.

David
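The suggested whole-population figure can be sketched as below. This is only an illustration of the ratio sum_of_uptimes / maximum_available_sum_of_uptimes: the `EquipmentRecord` structure, the equipment tags and the downtime values are hypothetical and do not correspond to any particular CMMS export, and every item is assumed to be required for the full period.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class EquipmentRecord:
    tag: str               # hypothetical equipment identifier
    downtime_hours: float  # total recorded downtime in the period


def plant_availability(records: Iterable[EquipmentRecord], period_hours: float) -> float:
    """Sum of uptimes divided by the maximum possible sum of uptimes."""
    records = list(records)
    if not records or period_hours <= 0:
        raise ValueError("need at least one record and a positive period")
    total_uptime = sum(period_hours - r.downtime_hours for r in records)
    max_uptime = period_hours * len(records)
    return total_uptime / max_uptime


if __name__ == "__main__":
    month = 720.0  # hours in a 30-day period
    data = [
        EquipmentRecord("P-101 pump", 12.0),
        EquipmentRecord("M-201 motor", 0.0),
        EquipmentRecord("F-301 fan", 36.0),
    ]
    print(f"Availability = {plant_availability(data, month):.4f}")
```

Note that this ratio weights every item equally, so an hour lost on a spare fan counts the same as an hour lost on a critical pump.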
||||
|
Dave,
If I understand you correctly, when you say
we work with the whole population, then there is a problem, as follows. For ANY reliability analysis, be it merely computing MTBF or Weibull parameters, we must satisfy two conditions:
1. The failure mode under consideration must be independent, i.e., it must not influence another. Thus a bearing failure should not induce a seal failure, and one tire should not influence another tire's performance in a car.
2. Every data point with respect to failure modes must be obtained under identical conditions. Thus if your car tire population is 4, then EVERY tire must be operating in the same operating context (function, load) AND be subject to identical degradation mechanisms (traction or not, cambering, brake type, air pressure, make of tire, etc.).

These conditions are clearly not possible to obtain in practice: the left front wheel works under different functional requirements and load than the rear right wheel. Every tire influences every other tire as well. For these reasons, we HAVE to select samples that minimise such errors. After all, pressure relief valves are not all identical in size, design, or service conditions. How often do we see PRV MTBFs that are based on populations? How believable are such numbers?

Regards, V.Narayan (Vee)
||||
|
Vee, I do not see any of the points listed above as a problem: if the whole population is taken (all machines at the plant), then calculating MTBF, where failure independence and failure modes have to be considered, won't be necessary. Instead, as I suggested previously, Availability (for each individual machine) in a specified time period will be used:

Availability = Time_period - Downtime
or
% Availability = ( Time_period - Downtime ) / Time_period

This could be calculated for an individual machine. I believe Availability is a representative measure of current reliability. In other words, we won't be dealing with a time-to-failure distribution at all. It won't be a statistical prediction in time based on a sample distribution. It will be a delayed measure of the time during which a machine or a plant was available for use in a specified past period. How to proceed to the WHOLE PLANT AVAILABILITY, I am not sure. Dave
||||
|
Dave,
The formula is quite correct of course, but Availability is not always representative of Reliability. There is a small matter of maintainability to consider; the higher the maintainability, the higher the Availability. In the special case of Hidden failures, and when we assume that the failed item is replaced or repaired as soon as we know of it, then Availability is indeed equal to or at least close to Reliability.
I am not sure I understand your difficulty. Using a top-down approach you use exactly the same formula as you have stated, only this time the numbers are for the Plant as a whole.

Regards, V.Narayan (Vee)
||||
|
Hi David,
Point 1 – Increase availability up to what limit?
As in many other circumstances in engineering, one aims to obtain the maximum value at the minimum cost. This can be achieved by compromise. As Vee stated, the benefit tends monotonically to some specific value while the cost rises steeply as some variable of interest is increased or decreased. Please see the attached Exhibit “Optimality”. This figure is classical and depicts the idea quite comprehensively. Normally we stop when the increment in benefit (in €) becomes lower than the increment in cost (this happens in the vicinity of points Q1 and Q2). The same principle applies to the reliability or availability of a system. When hazardous circumstances are involved that might cause deaths or injuries, you can use ALARP (As Low As Reasonably Practicable) principles (http://en.wikipedia.org/wiki/ALARP) or risk trees (if you work for an insurance company).

Point 2 – Availability versus reliability
The availability of a system comprises its reliability, measured by MTTM (Mean Time To Maintenance, sometimes corrective and sometimes preventive), and its maintainability, measured by MTTR (Mean Time To Repair, Replace, Restore or Recover). In the case of a repairable system you use both indicators, among others, to get a picture of how maintenance is performing over the course of time. When availability is to be increased, the gain is higher when MTTR is reduced by a certain percentage than when MTTM is increased by the same percentage. Also, results are often obtained more quickly and at a lower cost. In short, you can increase the availability of any piece of equipment just by improving workmanship methods during the period the equipment is put at your disposal by Production, which has nothing to do with reliability. Reliability, in turn, can be improved just by choosing a different periodicity when maintenance is time based (or based on some other unit), or by providing better operating conditions (environment and/or operator skills), or, still, by better engineering, which is often rather difficult to put into practice.

Point 3 – Population versus sample
Allow me a correction from a statistical perspective. In management performance monitoring, regardless of which indicator is being used, we cannot refer to a population when addressing data, as this can be as large as we can imagine. We are always in a position where a few observations (forming a sample) are available and can be treated on a timely basis in order to obtain some meaningful measure of performance as time goes by. Suppose you gather information every week; you could have done it, say, every day, or every hour, and so forth. Furthermore, suppose you gather information every week and compute the mean over the last 100 events, for example (this could suffice as long as the coefficient of variation doesn't exceed a certain allowable empirical threshold). Does this group of 100 observations form a population? Of course not, because you could have extended your time window back in time to embrace, say, 150 or 200 observations (if available and considered representative). This means that, on such occasions, we are actually treating samples, so we always have some degree of uncertainty when reading or reporting a number obtained by manipulating other numbers that have not come from a continuum. And because we have samples, we calculate statistics (such as means, variances and the like) rather than parameters, and report them with confidence intervals.

Point 4 – The whole plant availability
Please see an example of a system availability calculation in the attached Excel file “System availability”.

Point 5 – Trend estimation
I will start a new thread on the subject “monitoring of performance indicators”, which I think will be of interest to you.

Regards, Rui Assis
Attachment: Optimality.ppt (68 KB, 9 downloads)
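The claim in Point 2, that cutting MTTR by a given percentage raises availability more than raising MTTM by the same percentage, can be checked numerically. The sketch below assumes the common steady-state form Availability = MTTM / (MTTM + MTTR); the post does not give a formula, so this form, the function names and the numbers are assumptions.

```python
def availability(mttm: float, mttr: float) -> float:
    """Steady-state availability, assuming A = MTTM / (MTTM + MTTR)."""
    if mttm <= 0 or mttr < 0:
        raise ValueError("MTTM must be positive and MTTR non-negative")
    return mttm / (mttm + mttr)


def compare_improvements(mttm: float, mttr: float, fraction: float):
    """Return (baseline, MTTR reduced by fraction, MTTM increased by fraction)."""
    base = availability(mttm, mttr)
    shorter_repair = availability(mttm, mttr * (1.0 - fraction))
    longer_uptime = availability(mttm * (1.0 + fraction), mttr)
    return base, shorter_repair, longer_uptime


if __name__ == "__main__":
    # Hypothetical machine: 100 h between maintenance actions, 10 h to restore.
    base, by_mttr, by_mttm = compare_improvements(100.0, 10.0, 0.20)
    print(f"baseline={base:.4f}  MTTR -20%={by_mttr:.4f}  MTTM +20%={by_mttm:.4f}")
```

Under this formula the ordering holds for any positive percentage p, because reducing MTTR by p scales it by (1 - p), while raising MTTM by p is equivalent to scaling MTTR by 1 / (1 + p), and (1 - p) is always the smaller factor.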
||||
|
David,
Here is the Excel file "System availability" that couldn't be accommodated in the previous post. Regards, Rui Assis
Attachment: System_availability.xls (26 KB, 15 downloads)
||||
|

