Under what conditions does the binomial distribution approach the Poisson distribution?

$\begingroup$

I'm reading two books and they say different things.

One book says that for a binomial random variable $X \sim \text{Bin}(n,p)$, as $n \to +\infty$, $X$ approaches the Poisson distribution $\text{Po}(np)$.

The other book says $X$ approaches the normal distribution $\text{N}(np,np(1-p))$.

I'm confused.

asked Aug 6, 2018 at 12:27


$\endgroup$

$\begingroup$

Either the two books are sloppy or you're not reading precisely.

  • If $n\to\infty$ and $p\to0$ while $np$ approaches some positive number $\lambda,$ then the binomial distribution approaches a Poisson distribution with expected value $\lambda.$

  • If $n\to\infty$ as $p$ stays fixed, and $X\sim\operatorname{Binomial}(n,p)$ then the distribution of $(X-np)/\sqrt{np(1-p)}$ approaches the standard normal distribution, i.e. the normal distribution with expected value $0$ and standard deviation $1.$

It is sloppy to say that something approaches a limit that itself depends on $n$ as $n\to\infty,$ unless that statement is precisely defined and not meant literally. Thus the statement that something approaches $\operatorname{Normal}(np, np(1-p))$ as $n\to\infty$ is not to be taken literally; rather, it means what is stated in the second bullet point above, where the limit, the standard normal distribution, does not depend on $n.$ In the first bullet point above, the statement that something approaches $\operatorname{Poisson}(np)$ can make sense only because $np$ does not depend on $n,$ that is, what is considered is a limit as $n\to\infty$ and $p\to0$ in which $np$ remains fixed.
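The first bullet point can be checked numerically. The following is a minimal sketch, not part of the original answer: it holds $np=\lambda$ fixed at an illustrative value of 3 and measures how far $\operatorname{Binomial}(n,\lambda/n)$ is from $\operatorname{Poisson}(\lambda)$ as $n$ grows. The helper names and the truncation point `kmax` are assumptions made for the illustration.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def tv_distance(n, lam, kmax=60):
    """Approximate total variation distance between Binomial(n, lam/n)
    and Poisson(lam), summing over k = 0..kmax (the tail beyond kmax is ignored)."""
    p = lam / n
    return 0.5 * sum(
        abs((binom_pmf(k, n, p) if k <= n else 0.0) - poisson_pmf(k, lam))
        for k in range(kmax + 1)
    )

lam = 3.0  # illustrative value of the fixed product n*p (an assumption)
distances = {n: tv_distance(n, lam) for n in (10, 100, 1000, 10000)}
for n, d in distances.items():
    print(f"n={n:>6}  p={lam/n:.4f}  TV distance ~ {d:.5f}")
```

The distance shrinks as $n$ grows with $np$ held fixed, which is the sense in which the binomial distribution "approaches" the Poisson distribution.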

answered Aug 6, 2018 at 12:34

$\endgroup$


Cytometry

Nader Rifai PhD, in Tietz Textbook of Clinical Chemistry and Molecular Diagnostics, 2018

Counting Cells Scientifically: The Poisson Distribution

Before 1900, statistical methods were not rigorously applied in either clinical medicine or experimental science. The journal Biometrika began publication in 1901. In 1907, an article by “Student” demonstrated that the theoretical minimum error of cell counts with a hemocytometer varies with the square root of the number of cells actually counted,8 fitting the Poisson distribution described 70 years earlier. Poisson statistics apply to cells counted by any method and to other objects encountered (and counted) in cytometry, notably the photoelectrons generated by scattered or emitted light from cells interacting with cytometers' detectors. William S. Gosset had published under the pseudonym “Student” because his employers at the Guinness brewery were concerned that their competitors might benefit as they had from statistics (and cytometry); he had counted yeasts rather than blood cells in the hemocytometer.
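The square-root relationship can be illustrated with a short sketch. For a Poisson count with mean N, the standard deviation is √N, so the relative error is 1/√N. The function name and the example counts below are assumptions chosen for illustration, not values from the chapter.

```python
import math

def relative_counting_error(cells_counted):
    """Poisson coefficient of variation: sd / mean = sqrt(N) / N = 1 / sqrt(N)."""
    return 1.0 / math.sqrt(cells_counted)

# Hypothetical numbers of cells counted
for n in (100, 400, 10_000):
    err = relative_counting_error(n)
    print(f"{n:>6} cells counted -> minimum relative error ~ {err:.1%}")
```

Under this assumption, halving the relative error requires counting four times as many cells.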

In 1910, Ronald Ross, who had won the 1902 Nobel Prize in Medicine and would soon be knighted for his discovery that mosquitoes transmitted malaria, applied Gosset's findings to calculate how much blood he and his fellow malariologists needed to analyze to detect small numbers of parasites with reasonable precision. The required amount, several microliters, spread thickly on a glass slide, would take an observer over an hour to examine thoroughly using a high-power oil immersion lens. This might be acceptable for research but would be difficult to implement on a regular basis for clinical use; there was, however, no technology even imaginable as a replacement for a human observer at that time.

Counting cells using a hemocytometer and a microscope requires only that the observer be able to distinguish the cells of interest from everything else in the sample. Even that level of discrimination may not always be necessary. Consider the cellular ecology of human blood, a common sample for cytometry.

Red blood cells are the most abundant (~5,000,000/µL whole blood); their very numbers require a sample to be diluted several hundredfold to keep cells separated enough to be counted. The RBC concentration in whole blood is calculated from the number counted and the known dilution factor. Normal RBC volume is approximately 90 fL.

The typical WBC concentration in normal blood is 5000 to 10,000/µL, meaning that only one or two WBCs accompany each 1000 RBCs. WBCs vary in size from approximately 200 fL (lymphocytes) to more than 500 fL (monocytes), but there are larger lymphocytes and smaller monocytes. Although their hemoglobin content, lack of a nucleus, and smaller size make RBCs simple to discriminate from WBCs by microscopy or cytometry, most modern automated cell counters, which simply measure approximate cell size, do not make the distinction and instead include WBCs in RBC counts, with negligible effects on accuracy.

It had been known since the early days of hemocytometry that RBCs could be lysed and WBCs preserved for counting by diluting a blood sample with a hypotonic medium or with chemicals such as acids or detergents, and the same dilution procedure was later adapted to flow cytometric counters, which typically count WBCs in blood diluted approximately 1:10.

Finding Probabilities

R.H. Riffenburgh, in Statistics in Medicine (Third Edition), 2012

Poisson Events Described

The Poisson distribution arises from situations in which there is a large number of opportunities for the event under scrutiny to occur but a small chance that it will occur on any one trial. The number of cases of bubonic plague would follow Poisson: a large number of patients can be found with chills, fever, tender enlarged lymph nodes, and restless confusion, but the chance of the syndrome being plague is extremely small for any randomly chosen patient. This distribution is named for Siméon Denis Poisson, who published the theory in 1837. The classic use of Poisson was in predicting the number of deaths of Prussian army officers from horse kicks from 1875 to 1894; there was a large number of kicks, but the chance of death from a randomly chosen kick was small.


URL: https://www.sciencedirect.com/science/article/pii/B9780123848642000068

Radiobiology of Radiotherapy and Radiosurgery

H. Richard Winn MD, in Youmans and Winn Neurological Surgery, 2017

Conventional Radiation

In order to achieve durable tumor control, all or at least a significant proportion of clonogenic cells must be eliminated so that they can no longer maintain the tumor. The probability of tumor control is derived from the Poisson distribution using the equation $P = e^{-n}$, where $P$ is the probability of tumor control and $n$ is the average number of surviving clonogens after radiation. For example, reduction of a clonogenic population initially consisting of $10^9$ cells by 9 logarithms would give a 37% probability of tumor control, and reduction by 10 logarithms would give about a 90% probability of tumor control.
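The two percentages quoted above follow directly from the Poisson zero term. The sketch below assumes an initial population of 10^9 clonogens and treats a "log" of cell kill as a tenfold reduction; the function name is hypothetical.

```python
import math

def tumor_control_probability(initial_clonogens, log_kill):
    """P = exp(-n), where n is the mean number of surviving clonogens
    after reducing the initial population by `log_kill` factors of ten."""
    n_surviving = initial_clonogens * 10.0 ** (-log_kill)
    return math.exp(-n_surviving)

tcp_9 = tumor_control_probability(1e9, 9)    # about 1 surviving clonogen on average
tcp_10 = tumor_control_probability(1e9, 10)  # about 0.1 surviving clonogens on average
print(f"9 logs:  {tcp_9:.1%}")
print(f"10 logs: {tcp_10:.1%}")
```

With one surviving clonogen on average, the chance that none survives is e^(-1), roughly 37%; with 0.1 on average it is e^(-0.1), roughly 90%.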

The Poisson Distribution

Julien I.E. Hoffman, in Biostatistics for Medical and Biomedical Practitioners, 2015

Relationship to the Binomial Distribution

The Poisson distribution approximates the binomial distribution closely when n is very large and p is very small. It is the limiting form of the binomial distribution when n→∞ and p→0 with np = μ held constant and less than 5. In the binomial distribution, the mean is given by np, and the standard deviation by √(npq). If n is large and p is very small, as in the Poisson approximation to the binomial, then the mean is still np, but the standard deviation is now ∼√(np), because q is almost 1. Therefore, the limiting value of the standard deviation, as the binomial distribution approaches the Poisson distribution, is the square root of the mean. This mathematical distribution can be applied to a binomial distribution in which the probability of an event, p, is very small and n is large, and also to a rare, random event in which we know the number of events that occur but do not know the number that do not occur. We know how many people in the cavalry corps were kicked to death by mules, but there is no way of knowing how many were not kicked to death by mules. We can count how many telephone calls were made to a particular number, but cannot know how many were not made. We know how many people were struck by lightning, but not how many were not struck. For the binomial example, we could calculate the probabilities of 0, 1, 2, etc., events from the binomial distribution, but when p is very small and n is very big, the Poisson calculation is simpler.
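As a rough numerical check of the standard-deviation argument, the sketch below compares √(npq) with √(np) and the first few probabilities under both distributions. The values n = 10,000 and p = 0.0002 are assumptions chosen only to make n large and p small.

```python
import math

n, p = 10_000, 0.0002   # illustrative values: many trials, rare event
q = 1 - p
mu = n * p              # mean number of events, about 2

sd_binomial = math.sqrt(n * p * q)  # exact binomial standard deviation
sd_poisson = math.sqrt(mu)          # Poisson standard deviation (q treated as 1)

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mu):
    """P(Y = k) for Y ~ Poisson(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)

print(f"sd binomial = {sd_binomial:.5f}, sd Poisson = {sd_poisson:.5f}")
for k in range(6):
    print(f"k={k}: binomial {binom_pmf(k, n, p):.6f}  Poisson {poisson_pmf(k, mu):.6f}")

max_gap = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(6))
```

The Poisson probabilities need only the mean, whereas the binomial ones need both n and p, which is why the Poisson calculation is the simpler one in this regime.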


URL: https://www.sciencedirect.com/science/article/pii/B9780128023877000184

Pulmonary Alveolar Proteinosis Syndrome

V. Courtney Broaddus MD, in Murray & Nadel's Textbook of Respiratory Medicine, 2022

Epidemiology

Disorders of surfactant homeostasis are found in worldwide distribution but are quite rare. Since the initial description of PAP, more than 1000 separate cases of PAP2,4–6,23,89,105,106 and substantially fewer cases of PSMD have been reported in the medical literature (eTable 98.1). A comprehensive meta-analysis of 410 separate cases of PAP representing all clinical subtypes and most or all cases of PAP reported in the medical literature through 1998 found that patients were more likely to be male (male/female ratio = 2.65:1.0) and that men predominated among smokers (male/female ratio = 2.78:1.0) but not among nonsmokers (male/female ratio = 0.69:1.0).2 These results suggested that the high proportion of men among PAP patients may be explained by their higher frequency of tobacco use. More recent evaluations have refined these estimates for individual diseases. Consequently, the epidemiology of each is considered individually.

eTable 98.1 Comparison of Demographic and Epidemiology Data Among Five Large Series of PAP Patients

Seymour2 (n = 410)   Inoue4 (n = 233)   Xu5 (n = 241)   Bonella6 (n = 70)   Campo7 (n = 81)
Age at diagnosis (mean, range) 39 (30–46) 51 (41–58) 42 (N/A) 43 (18–78) 40 (26–54)
Ratio (male/female) 2.6 2.0 2.2 1.3 2.0
Primary PAP (%) N/A 90 N/A 91 90
Secondary PAP (%) N/A 10 N/A 9 3.7
Time to diagnosis (months) 7 (3–19) 10 (4–36) N/A 9 (1–36) 11 (0–27)
Spontaneous remitters (%) 6 5 N/A 5 7
Smoking habits (%)
Never 28 43 21 36
Previous N/A 29 30 42
Current N/A 29 49 22
Dust exposure (%) N/A 26 N/A 54 32
Whole-lung lavage 54% N/A 59% 90% 54%

N/A, not available; PAP, pulmonary alveolar proteinosis.

Cell growth

Frank H. Stephenson, in Calculations for Molecular Biology and Biotechnology (Second Edition), 2010

3.12.1 The Poisson distribution

The Poisson distribution is used to describe the distribution of rare events in a large population. For example, at any particular time, there is a certain probability that a particular cell within a large population of cells will acquire a mutation. Mutation acquisition is a rare event. If the large population of cells is divided into smaller cultures, as is done in the fluctuation test, the Poisson distribution can be used to determine the probability that any particular small culture will contain a mutated cell.

Calculating a Poisson distribution probability requires the use of the number e, described in the following box.

The number e

In molecular biology, statistics, physics, and engineering, most calculations employing the use of logarithms are in one of two bases, either base 10 or base e. The number e is the base of the natural logarithms, designated as ln. For example, ln 2 is equivalent to $\log_e 2$. The value of e is roughly equal to 2.7182818. e is called an irrational number because its decimal representation neither terminates nor repeats. In that regard, it is like the number pi (π) (the ratio of the circumference of a circle to its diameter). In fact, pi and e are related by the expression $e^{i\pi} = -1$, where i is equal to the square root of −1.

Many calculators have an ln key for finding natural logarithms. Many calculators also have an $e^x$ key, used to find the antilogarithm base e.

The Poisson distribution is written mathematically as

$P = \dfrac{e^{-m}\, m^{r}}{r!}$

where P is the fraction of samples that will contain r objects each, if an average of m objects per sample is distributed at random over the collection of samples. (The m component is sometimes referred to as the expectation.) The component e is the base of the natural system of logarithms (see previous box). The exclamation mark, !, is the symbol for factorial. The factorial of a number is the product of the specified number and each positive integer less than itself down to and including 1. For example, 5! (read ‘5 factorial’) is equal to 5 × 4 × 3 × 2 × 1 = 120. 0! is equal to 1.
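The formula translates directly into a few lines of code. In this sketch the function name and the value m = 0.5 mutant cells per culture are hypothetical, chosen to echo the fluctuation-test setting described above.

```python
import math

def poisson_fraction(r, m):
    """Fraction of samples expected to contain exactly r objects when the
    average is m objects per sample: P = e^(-m) * m^r / r!"""
    return math.exp(-m) * m**r / math.factorial(r)

m = 0.5  # hypothetical average number of mutant cells per small culture
p_none = poisson_fraction(0, m)   # cultures with no mutant cell
p_at_least_one = 1 - p_none       # cultures with one or more mutant cells
total = sum(poisson_fraction(r, m) for r in range(50))

print(f"P(0 mutants)   = {p_none:.4f}")
print(f"P(>=1 mutant)  = {p_at_least_one:.4f}")
```

Setting r = 0 reduces the formula to e^(-m), because m^0 = 1 and 0! = 1.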


URL: https://www.sciencedirect.com/science/article/pii/B9780123756909000036

STATIONARY DISTRIBUTIONS IN STOCHASTIC KINETICS*

J. Tóth, T.L. Török, in Mathematical and Computational Methods in Physiology, 1981

4 POISSONIAN STATIONARY DISTRIBUTION

The significance of the Poisson distribution is generally overemphasized in the literature. In this field, it was conjectured that the stationary distribution in stochastic kinetics is usually Poissonian /Prigogine, 1978/. This is not the case by far as our theorems below show.

Theorem 3 /Érdi and Tóth, 1979/: The necessary and sufficient condition for a simple birth and death process to have a Poisson stationary distribution

/10/ $P(n)=\prod_{i=1}^{M}\frac{\lambda_i^{n_i}}{n_i!}\,e^{-\lambda_i} \quad (\lambda_1,\lambda_2,\dots,\lambda_M\in\mathbb{R}^+)$

/if it has a non-degenerate stationary distribution at all/ is that it be a linear one, i.e. that the functions $\varphi_i$ in Eq. /2/ be homogeneous linear functions.

The proof is based upon a special case of Eq. /9/ and upon the corresponding definitions.-

Theorem 4 /Tóth, 1980/: The stationary distribution of a detailed balanced complex chemical reaction belonging to the class of simple Markov population processes, i.e. a process with

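/11/ $\bar\alpha_i(n_i) := \sum_{k=0}^{a_i} \alpha_{i,k}\, n_i^k$

/12/ $\bar\beta_i(n_i) := \sum_{k=0}^{b_i} \beta_{i,k}\, n_i^k$

/13/ $\bar\gamma_{ij}(n_i) := \sum_{k=0}^{o_{ij}} \gamma_{ij,k}\, n_i^k$

$\alpha_{i,k}, \beta_{i,k}, \gamma_{ij,k} \in \mathbb{R}_0^+;\quad n_i \in \mathbb{N}_0;\quad i,j \in \{1,2,\dots,M\}$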

has a stationary distribution of the form expressed by Eq. /10/ if and only if the following relations hold:

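/14/ $a_i + 1 = b_i, \quad c_{ij} = 1, \quad \gamma_{ij,0} = 0$

/16/ $\alpha_{i,r} = \dfrac{\alpha_{i,a_i}}{\beta_{i,a_i+1}} \sum_{k=r+1}^{a_i+1} \binom{k-1}{r} \beta_{i,k} \quad (r = 0,1,\dots,a_i-1)$

/18/ $\gamma_{ji,1} = \gamma_{ij,1}\,\dfrac{\alpha_{i,a_i}\,\beta_{j,a_j+1}}{\alpha_{j,a_j}\,\beta_{i,a_i+1}}$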

The proof of the theorem is based upon the comparison of the coefficients of multivariable polynomials, and it is very similar to the proof of the one-dimensional case as described by Tóth and Török /1980/.-

The transition intensities in Eqs. /11/–/13/ may be related to the following complex chemical reaction:

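/19/ $kA(i) \underset{\beta_{i,k+1}}{\overset{\alpha_{i,k}}{\rightleftharpoons}} (k+1)A(i); \quad k \in \{0,1,\dots,a_i\}$

/20/ $kA(i) \overset{\gamma_{ij,k}}{\longrightarrow} A(j) + (k-1)A(i); \quad k \in \{0,1,\dots,o_{ij}\}$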

We conclude that the complex chemical reaction expressed by Eqs. /19/–/20/ results in a Poissonian stationary distribution if and only if the respective reaction rates are interdependent and the dependence is as expressed in Eqs. /14/–/18/.


URL: https://www.sciencedirect.com/science/article/pii/B9780080273563500183

Development of Early Warning Models

Yajia Lan, ... Shengjie Lai, in Early Warning for Infectious Disease Outbreak, 2017

3.2.8 Poisson Regression

Poisson regression is a time series regression model that is based on the Poisson distribution and is applicable for early warning and predicting diseases that have low incidence rates. It assumes that the number/incidence of cases at time t follows a Poisson distribution with mean $\mu_t$, i.e., $Y_t \sim P(\mu_t)$, and that $\mu_t$ can be expressed as a log-linear model of time t, as shown in Eq. (3.25).

(3.25) $\log \mu_t = \alpha + \beta \cdot t$

When using the Poisson regression model, the first issue to be addressed is the asymmetry of the distribution. Early warning limits that have symmetric intervals may reduce the efficiency of early warning models, as they may result in false positive signals. One solution to this problem is to transform the data. For counts that conform to the Poisson distribution, a 2/3-power transformation produces an approximately symmetric distribution, and the exponent 3/2 in the interval below transforms the limits back to the count scale. The 100(1 − α)% prediction interval for the count at time t is as follows:

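(3.26) $\hat\mu_t \times \left\{1 \pm \frac{2}{3}\, z_{\alpha/2} \times \sqrt{\frac{1}{\hat\mu_t} + V_t}\right\}^{3/2}$

where $\hat\mu_t = \exp(\hat\alpha + \hat\beta t)$ is derived from the estimate of the baseline value, and

(3.27) $V_t = \operatorname{var}(\hat\alpha) + t^2 \operatorname{var}(\hat\beta) + 2t \operatorname{cov}(\hat\alpha, \hat\beta)$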

In Eqs. (3.26) and (3.27), all parameters that are used to calculate early warning limits can be obtained with standard regression analysis software.

Example: Table 3.4 lists scarlet fever cases for a period of 25 weeks in a region. These data are fitted with a Poisson regression early warning model.

Table 3.4. Simulated Monitoring Data for the Past 25 Weeks

Monitor week (t)   Report number (y)   Monitor week (t)   Report number (y)
1 1 14 2
2 0 15 0
3 1 16 5
4 0 17 1
5 0 18 1
6 1 19 2
7 2 20 1
8 7 21 1
9 1 22 2
10 6 23 1
11 0 24 1
12 1 25 1
13 2

Model parameters are calculated by using surveillance data:

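$\hat\alpha = 0.438602,\quad \hat\beta = 0.002404,\quad \operatorname{var}(\hat\alpha) = 0.107835,\quad \operatorname{var}(\hat\beta) = 0.000481,\quad \operatorname{cov}(\hat\alpha,\hat\beta) = -0.006311$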

If the current week is the 26th week, the predictive value is:

$\hat\mu_{26} = \exp(\hat\alpha + 26\hat\beta) = 1.650545, \quad V_{26} = 0.104709$

The 95% prediction limit is:

$1.650545 \times \left\{1 + \frac{2}{3} \times 1.96 \times \sqrt{\frac{1}{1.650545} + 0.104709}\right\}^{3/2} = 5.0$

When the observed count exceeds 5, it is flagged as an aberration.
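The worked example can be reproduced with a short sketch. It assumes the prediction limit has the form μ̂ × {1 + (2/3) z √(1/μ̂ + V)}^(3/2), evaluates the baseline at week t = 26 as stated in the text, and uses the printed parameter estimates; the variable names are illustrative. Recomputing V from the rounded parameters gives a value slightly different from the 0.104709 quoted above, presumably because of rounding in the printed estimates.

```python
import math

# Parameter estimates as printed in the example
alpha_hat, beta_hat = 0.438602, 0.002404
var_alpha, var_beta, cov_ab = 0.107835, 0.000481, -0.006311
z = 1.96   # two-sided 95% normal quantile
t = 26     # week being predicted

# Baseline estimate from the log-linear model: mu_t = exp(alpha + beta * t)
mu_hat = math.exp(alpha_hat + beta_hat * t)

# Variance term: V_t = var(alpha) + t^2 var(beta) + 2 t cov(alpha, beta)
v_t = var_alpha + t**2 * var_beta + 2 * t * cov_ab

# Upper prediction limit on the count scale (assumed form of the interval)
upper = mu_hat * (1 + (2 / 3) * z * math.sqrt(1 / mu_hat + v_t)) ** 1.5

print(f"mu_hat = {mu_hat:.6f}, V_t = {v_t:.6f}, upper limit = {upper:.2f}")
```

The upper limit comes out at about 5, in line with the threshold used in the example.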

In contrast to normal error regression, this method is used for rare events (e.g., in a hospital setting). However, the Poisson assumptions are not met in many circumstances, and other methods are needed to address extra-Poisson variation.


URL: https://www.sciencedirect.com/science/article/pii/B9780128123430000035

Confidence Intervals

R.H. Riffenburgh, in Statistics in Medicine (Third Edition), 2012

Example Completed, Rare Event: CI on Proportion Children’s Lead Levels

Because p = 0.012, π is evidently close to 0, implying the Poisson distribution. σ = √(p/n) = √(0.012/2500) = 0.00219. The hospital administration wants to be sure that it does not have too many high-lead children in its catchment and therefore chooses a 99% confidence interval. From Table 7.1, the 0.99 two-tailed 1−α yields a corresponding z of 2.576. Replacing the 1.96 in Eq. (7.11) with 2.576, we obtain

P[p − 2.576 × σ < π < p + 2.576 × σ] = P[0.0064 < π < 0.0176] = 0.99.

By focusing on the right tail, the hospital administration may be 99.5% confident that no more than 1.8% of children in its catchment have high lead levels.
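The interval can be recomputed with a few lines. This sketch takes σ as the square root of p/n, which reproduces the 0.00219 quoted in the example; the variable names are illustrative.

```python
import math

p = 0.012   # observed proportion of children with high lead levels
n = 2500    # sample size
z = 2.576   # two-tailed 99% normal quantile

# Poisson-based standard error of a rare-event proportion
sigma = math.sqrt(p / n)

lower = p - z * sigma
upper = p + z * sigma
print(f"sigma = {sigma:.5f}")
print(f"99% CI for pi: ({lower:.4f}, {upper:.4f})")
```

The upper bound of roughly 0.018 corresponds to the 1.8% figure in the conclusion; because only the right tail is used, the one-sided confidence is 99.5%.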


URL: https://www.sciencedirect.com/science/article/pii/B978012384864200007X

Chain Polymerization I

Norman C. Billingham, in Comprehensive Polymer Science and Supplements, 1989

4.3.4.1 Instantaneous initiation with termination or transfer

Szwarc4 has emphasized the practical conditions required for synthesis of a polymer with a Poisson distribution of molecular weights. All chains must be initiated simultaneously, termination and transfer must be excluded, the reaction conditions must be spatially homogeneous, the rate of propagation must be independent of chain length and all propagation steps must be identical and irreversible. These conditions are extremely difficult to achieve. In particular, increasing the molecular weight of the polymer requires either a higher monomer concentration or a lower active centre concentration. Since the monomer concentration is limited by the need to keep the solution fluid and homogeneous, higher molecular weights imply reduced active centre concentrations and it becomes increasingly harder to eliminate termination by impurities.

Szwarc4 and Peebles2 both give exhaustive treatments of the effects of impurity termination in living polymerization. The precise effect depends on the exact reaction. If impurities added with the monomer very rapidly kill a proportion of the active centres, then the distribution remains narrow but the average molecular size is higher than expected. Termination which is slower relative to propagation causes the deactivation of some chains at the same time as others continue to propagate and the molecular weight distribution becomes broadened in a complex way which depends on whether termination occurs at a constant rate or at a rate which varies with conversion.2 An understanding of these effects is fundamental in attempts to synthesize monodisperse polymers.

The effect of chain transfer on a living polymerization with rapid initiation has been analyzed by Coleman et al.9 and by Orofino and Wenger.10 This situation is common in cationic polymerizations where initiation is often very rapid, termination slow or nonexistent and the reaction dominated by chain transfer to monomer (Volume 3, Part 4). The effect is to broaden the distribution because although all active centres are generated simultaneously, molecular chains are generated randomly. If the transfer reaction is sufficiently rapid relative to the propagation, then the polymer produced in a batch reactor is expected to have a most probable distribution.


URL: https://www.sciencedirect.com/science/article/pii/B9780080967011000665

What are the 3 conditions for a Poisson distribution?

Poisson process criteria: Events are independent of each other, so the occurrence of one event does not affect the probability that another event will occur. The average rate (events per time period) is constant. Two events cannot occur at the same time.

What are the conditions for binomial distribution to be used?

The binomial distribution describes the behavior of a count variable X if the following conditions apply: 1: The number of observations n is fixed. 2: Each observation is independent. 3: Each observation represents one of two outcomes ("success" or "failure").
Binomial distribution describes the distribution of binary data from a finite sample. Thus it gives the probability of getting r events out of n trials. Poisson distribution describes the distribution of binary data from an infinite sample. Thus it gives the probability of getting r events in a population.

What are the conditions under which the binomial (n, p) distribution can be approximated by the Poisson and normal distributions respectively?

The Poisson distribution can be derived as a limiting case of the binomial distribution Bin(n, p) as the number n of trials goes to infinity while the expected number np of successes remains fixed (the law of rare events). The normal approximation applies instead when n goes to infinity while p stays fixed.