Myron Tribus’ UCLA Statistical Thermodynamics class introduced me to entropy, -SUM[p(t)ln(p(t))]. (p(t) is the probability of state t of a system.) Professor Tribus later advocated maximum-entropy reliability estimation, because that “…best represents the current state of knowledge about a system…” [Principle of maximum entropy – Wikipedia] Caution! This article contains statistical neurohazards.
Claude Shannon wrote that entropy (log base 2) represents information bits, “…an absolute mathematical limit on how well data from the source can be losslessly compressed onto a perfectly noiseless channel.” [Beirlant et al.]
Maximum likelihood estimation is one way to estimate reliability from data. It maximizes the likelihood (the joint probability or probability density) of the observed data, PRODUCT[p(t)], e.g., for observed failures at ages t. That is equivalent to maximizing the log likelihood SUM[ln(p(t))], or minimizing -SUM[ln(p(t))]. Maximum entropy reliability estimation maximizes entropy -SUM[p(t)ln(p(t))]. That’s the same as maximizing the expected value, -SUM[p(t)ln(p(t))], of the negative log likelihood -ln(p(t)). Fine, if you have life data: ages at failures t, censored or not.
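A minimal Python sketch of the two quantities, entropy -SUM[p(t)ln(p(t))] and log likelihood SUM[ln(p(t))], may help. The distributions and the observed states are made-up illustration values, not data from this article. When the observed frequencies match p(t), the average negative log likelihood equals the entropy.

```python
import math

def entropy(p):
    """Entropy -SUM[p ln p] in nats; zero-probability states contribute 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def sum_log_likelihood(p, observed):
    """Log likelihood SUM[ln p(t)] over observed states (indices into p)."""
    return sum(math.log(p[t]) for t in observed)

# Hypothetical distributions over four states (illustration only).
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.7, 0.1, 0.1, 0.1]

# Hypothetical sample whose frequencies match `peaked` exactly.
observed = [0] * 7 + [1, 2, 3]

print("entropy(uniform):", entropy(uniform))
print("entropy(peaked): ", entropy(peaked))
print("average negative log likelihood under peaked:",
      -sum_log_likelihood(peaked, observed) / len(observed))
```

The uniform distribution has the larger entropy, which is the sense in which maximum entropy assumes the least beyond what the data support.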
What if You Don’t Have Life Data?
There’s reliability information in ships and returns counts [Want Field Reliability, Without Life Data? (accendoreliability.com)]. How much information? Compare the information (entropy) in reliability estimates from life data vs. estimates without life data (period ships and returns counts); i.e., -SUM[p(t)ln(p(t))] conditional on life data vs. -SUM[q(t)ln(q(t))] conditional on ships at rate λ(t) and returns R(t). The relative entropy or information loss, SUM[p(t)ln(p(t)/q(t))], is the Kullback-Leibler divergence (between estimates with vs. without life data).
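As a rough illustration, the Kullback-Leibler divergence can be computed as below, using the usual non-negative convention SUM[p(t)ln(p(t)/q(t))]. The two distributions are hypothetical stand-ins for an estimate from life data (p) and an estimate from ships and returns (q); they are not results from this article.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence SUM[p ln(p/q)] in nats.

    Assumes q(t) > 0 wherever p(t) > 0; states with p(t) = 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Entropy -SUM[p ln p] in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical failure-age distributions (illustration only).
p = [0.5, 0.3, 0.2]   # e.g., estimated from life data
q = [0.4, 0.4, 0.2]   # e.g., estimated from ships and returns

print("entropy of p:", entropy(p))
print("entropy of q:", entropy(q))
print("KL divergence of q from p:", kl_divergence(p, q))
```

The divergence is zero only when the two estimates agree, so it measures how much is lost by using q in place of p.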
If you want more about the equivalence among maximum likelihood estimation, information, entropy, and Kullback-Leibler divergence, see the following references. [David Darmon, ThirdOrderScientist, https://thirdorderscientist.org/homoclinic-orbit/2013/4/1/maximum-likelihood-and-entropy/ and Korbinian Strimmer, http://www.strimmerlab.org/publications/lecture-notes/MATH20802/02-likelihood2.html]. For comparisons with vs. without life data, see https://sites.google.com/site/fieldreliability/random-tandem-queues-and-reliability-estimation-without-life-data/.
How to Get Field Reliability Information from Ships and Returns?
Imagine that a product’s or part’s lifetime is the service time in a self-service system. Observe an M/G/infinity queue without identifying individual products or parts. (“M” denotes Poisson inputs at rate λ (“ships”), “G” denotes the service time cumulative distribution function G(t), and “infinity” means self-service.) M/G/infinity inputs and outputs correspond to ships and returns. Cumulative outputs (“returns”) have a Poisson distribution with mean λ*G(t), where t is the time from the start of the system’s inputs [Mirasol]. If you know or can estimate the input rate λ, you can estimate G(t) from output counts R(t1), R(t2),… in periods t1, t2,… The estimates could be G(t1) = R(t1)/λ, G(t2) = (R(t1)+R(t2))/(2λ),… BUT, because a cumulative distribution function G(t) is not supposed to decrease with age t, estimation requires “majorization,” i.e., a way to make the estimates of G(t) non-decreasing.
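The sketch below computes the unadjusted estimates in the pattern given above, reading the k-th estimate as cumulative returns divided by k times λ, and flags the periods where the estimates decrease. That reading, the returns counts, and the input rate are assumptions for illustration only.

```python
from itertools import accumulate

def naive_cdf_estimates(returns, lam):
    """k-th estimate = (R(t1) + ... + R(tk)) / (k * lam), as read from the text."""
    cumulative = accumulate(returns)
    return [c / (k * lam) for k, c in enumerate(cumulative, start=1)]

def violations(estimates):
    """Indices k where the estimate is smaller than the one before it."""
    return [k for k in range(1, len(estimates)) if estimates[k] < estimates[k - 1]]

# Hypothetical period returns and a hypothetical known input rate.
returns = [2, 6, 1, 9]
lam = 100.0

estimates = naive_cdf_estimates(returns, lam)
print("unadjusted estimates:", estimates)
print("decreasing at periods (0-based):", violations(estimates))
```

With these made-up counts the third estimate falls below the second, which is the kind of violation that pooling has to repair.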
How to Maximize the Likelihood Function of Ships and Returns Counts?
Cory Atwood, Editor of the ASA SPES Newsletter that published my 1999 article, asked me for the likelihood function for M/G/infinity inputs and outputs. The likelihood function is the product, PRODUCT[Poisson[R(t), λ*g(t)]], where Poisson[count, mean] is the Poisson probability of R(t) outputs (returns) in period t from prior ships cohorts S(s), s<=t. The likelihood function is a product, because Poisson processes have independent increments (periods). The system output rate λ*g(t) in period t is the input rate times the probability density function g(t) of the lifetime distribution G(t).
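A minimal sketch of this log likelihood follows, taking the period mean as λ*g(t) exactly as written above with a constant input rate λ. The function names and the counts are hypothetical. With ships cohorts that vary by period, the mean would have to combine the earlier cohorts, which this sketch does not attempt.

```python
import math

def poisson_log_pmf(count, mean):
    """ln of the Poisson probability of `count` events with the given mean (> 0)."""
    return count * math.log(mean) - mean - math.lgamma(count + 1)

def poisson_log_likelihood(returns, lam, g):
    """ln PRODUCT[Poisson[R(t), lam*g(t)]] = SUM of the period log probabilities."""
    return sum(poisson_log_pmf(r, lam * gt) for r, gt in zip(returns, g))

# Hypothetical returns per period and a hypothetical constant input rate.
returns = [3, 5, 2]
lam = 100.0

g_unconstrained = [r / lam for r in returns]   # sets each period mean equal to its count
g_other = [0.04, 0.04, 0.02]                   # any other candidate density values

print("log likelihood at R(t)/lam:   ", poisson_log_likelihood(returns, lam, g_unconstrained))
print("log likelihood at other guess:", poisson_log_likelihood(returns, lam, g_other))
```

Because the periods are independent, each term is maximized separately when its mean equals its count, so without order restrictions the maximizing values are R(t)/λ. The constraint that G(t) be non-decreasing is what makes pooling necessary.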
Albert Marshall and Frank Proschan proved that majorization gave the maximum likelihood estimator of an increasing failure rate function from periodic failure rate estimates. That is equivalent to maximizing the likelihood of the outputs of an M/G/infinity system. I borrowed a majorization method called “Pool Adjacent Violators Algorithm” (PAVA) [Barlow et al.]. The PAVA majorization formula for Poisson output at cumulative rate λG(t) is
$$ \displaystyle \lambda G\left(t_{k}\right)=\max_{1\leq\alpha\leq k}\,\min_{k\leq\beta\leq n}\left\{ \frac{R\left(\alpha\right)+R\left(\alpha+1\right)+\ldots+R\left(\beta\right)}{S\left(\alpha\right)+S\left(\alpha+1\right)+\ldots+S\left(\beta\right)}\right\} $$
for k = 1,2,…,n, where n is the total number of ships cohorts, S(t) are ships and R(t) are returns in period t.
Basically, PAVA means, if estimates “violate”, i.e., G(t1)>G(t2) for some t1<t2, “pool” the adjacent estimates of G(t1) and G(t2). So the maximum likelihood estimate of G(t) is the “majorized” λG(t)/λ.
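The pooling idea can be sketched in Python as follows. This is an illustrative implementation, not the npmle.xlsx or R code: it pools adjacent periods whenever the ratio of returns to ships decreases, and it also evaluates the max-min formula directly so the two can be compared. The ships and returns counts are made up.

```python
def pava(returns, ships):
    """Non-decreasing fit to the ratios R/S by pooling adjacent violators."""
    blocks = []  # each block holds [pooled returns, pooled ships, periods in block]
    for r, s in zip(returns, ships):
        blocks.append([r, s, 1])
        # Pool while the previous block's ratio exceeds the last block's ratio
        # (cross-multiplied to avoid division; assumes ships > 0).
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            r2, s2, n2 = blocks.pop()
            blocks[-1][0] += r2
            blocks[-1][1] += s2
            blocks[-1][2] += n2
    fitted = []
    for r, s, n in blocks:
        fitted.extend([r / s] * n)
    return fitted

def max_min(returns, ships):
    """Direct evaluation of the max-min formula (0-based indices)."""
    n = len(returns)
    return [
        max(
            min(sum(returns[a:b + 1]) / sum(ships[a:b + 1]) for b in range(k, n))
            for a in range(k + 1)
        )
        for k in range(n)
    ]

# Hypothetical ships and returns by period (illustration only).
ships = [100, 100, 100, 100]
returns = [2, 6, 1, 9]

print("raw ratios:", [r / s for r, s in zip(returns, ships)])
print("PAVA:      ", pava(returns, ships))
print("max-min:   ", max_min(returns, ships))
```

In this example the second and third periods violate the ordering, so they are pooled into one block with ratio 7/200, and both routes give the same non-decreasing sequence. The pooling version does far less work than the formula's nested sums.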
How to Compute a Reliability Estimate from Ships and Returns?
Tables 3 and 4 of the spreadsheet “npmle.xlsx” at https://sites.google.com/site/fieldreliability/home/files-workbooks-etc implement PAVA. Mark Felthauser converted my PAVA spreadsheet and VBA software into an R script that implements PAVA. I converted Mark’s R script into an R script that directly maximizes the Poisson likelihood function and deals with inadmissible or missing data. If you prefer R scripts, let me know; otherwise use npmle.xlsx, or send your ships and returns data to pstlarry@yahoo.com to make nonparametric population estimates of your products’ or parts’ field reliability functions without life data.
References
- R. E. Barlow, D. J. Bartholomew, J. M. Bremner, and H. D. Brunk, “Statistical Inference Under Order Restrictions,” John Wiley & Sons, Dec. 1973, https://doi.org/10.1111/j.1467-9574.1973.tb00228.x
- J. Beirlant, E. J. Dudewicz, L. Györfi, and E. C. van der Meulen, “Nonparametric Entropy Estimation: an Overview,” July 2001
- L. George, “Field Reliability Estimation Without Life Data,” ASA SPES Newsletter, Dec. 1999
- L. George, “Random-Tandem Queues and Reliability Estimation Without Life Data,” 2019, Random-Tandem Queues and Reliability Estimation, Without Life Data – Field Reliability (google.com), RANDTAND.DOCX or RANDTAND.PDF
- A. W. Marshall and F. Proschan, “Maximum Likelihood Estimation for Distributions with Monotone Failure Rate,” The Annals of Mathematical Statistics, 36, 69-77 (1965)
- N. M. Mirasol, “The Output of an M/G/infinity Queuing System is Poisson,” Operations Research, 11, 282-284 (1963)
- Claude Shannon, “A Mathematical Theory of Communication,” Bell System Technical Journal, Vol. 27 (4): 623–656 (1948), doi:10.1002/j.1538-7305.1948.tb00917.x
Larry George says
Thank Fred for getting the formula translated nicely.
Sorry about that: some Greek letters or symbols didn’t translate:
S = SUM as in entropy = -SUM[p(t)log(p(t))]
P = Product as in likelihood = PRODUCT[Poisson(.,.)]
l = lambda as in lambda*G(t)
scrambled egg symbol should have been s <= t.
I will spell out symbols in future articles.
If you ask (pstlarry@yahoo.com), I will send a *.docx or *.pdf version of the article with the original symbols.