
Talk:Central limit theorem: Difference between revisions

SineBot (minor edit): Signing comment by 76.76.233.148 - "Help, clarification"
O18: move recent comment to end.

==Help, clarification==

Can we be careful to define "sum of distribution functions"? Clearly this is not, say, the process SUM(U,V) = (U + V)/2, which does yield a distribution function with all the nice properties and looks like a "sum". The use of convolution needs to be well explained and tied to the notion of the "distribution of the means". If we have a random process which, say, is constrained to output values in the range [0,1], then clearly the means will never reach a limit which is a normal distribution.

<math>Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}},</math>
The above appears on the page without the forward slash for division between σ and √n. Preceding unsigned comment added by 76.76.233.148 23:31, 14 November 2009 (UTC)

Correction to the correction above: never mind. I neglected to note that the numerator contains just the sum and not the mean, so the formula is correct (but arcane) as it stands. Preceding unsigned comment added by 76.76.233.148 23:52, 14 November 2009 (UTC)
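A small simulation may help separate the two readings of "sum" raised above. It is a sketch under assumptions that are not in the thread: Uniform(0, 1) observations (so μ = 1/2 and σ² = 1/12), samples of size n = 30, 20,000 repetitions, and NumPy's default generator with a fixed seed; the helper names are made up for illustration. The raw sample means do stay inside [0, 1] and pile up around 1/2, but the standardized quantity √n(X̄ − μ)/σ, which equals Z_n, is what approaches the standard normal, so its empirical CDF should sit close to Φ.

```python
import math
import numpy as np

def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def standardized_means(n: int, trials: int, seed: int = 0) -> np.ndarray:
    """Draw `trials` samples of size n from Uniform(0, 1); return sqrt(n) * (mean - mu) / sigma."""
    rng = np.random.default_rng(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and standard deviation of Uniform(0, 1)
    means = rng.random((trials, n)).mean(axis=1)  # every sample mean lies in [0, 1]
    return math.sqrt(n) * (means - mu) / sigma

z = standardized_means(n=30, trials=20_000)
for x in (-1.0, 0.0, 1.0):
    empirical = float(np.mean(z <= x))
    print(f"P(Z <= {x:+.0f}): simulated {empirical:.3f}, normal {std_normal_cdf(x):.3f}")
```

Because the standardization multiplies by √n, the possible range of Z grows like √(3n) even though every individual mean is confined to [0, 1]; that is how a bounded process can still have a normal limit after centering and scaling.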


==This is to bring contents to the top==


: Surely the reference to Turing's fellowship dissertation should refer not to ''Cambridge University'' but to ''King's College Cambridge''! 86.22.72.56 18:13, 10 November 2009 (UTC)

Revision as of 02:16, 15 November 2009

WikiProject Mathematics: B‑class, Top‑priority
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as B-class on Wikipedia's content assessment scale.
This article has been rated as Top-priority on the project's priority scale.
WikiProject Statistics: Unassessed
This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on Wikipedia's content assessment scale.
This article has not yet received a rating on the importance scale.

This is to bring contents to the top

I removed this:

An interesting illustration of the central tendency, or Central Limit Theorem, is to compare, for a number of lifts (elevators for those on the left-hand side of the Atlantic), the maximum load and the maximum number of people. For small lifts holding only a few people, the maximum load divided by maximum number of people is usually greater than it is in large lifts holding a larger number of people. This is necessary because some small groups of people who fill the lift may well have several people who are above average weight (just as, on other occasions, other small groups may have several who are well below average weight), whereas the larger the sample (the number of people in the large lift) the nearer the proportion of overweight people will be to the norm for the whole population.

While it is a nice example, it doesn't illustrate the Central limit theorem, whose gist is that the sum is approximately normally distributed. I don't quite know where to put this example, though. Maybe in standard deviation or normal distribution? AxelBoldt 21:02 Oct 14, 2002 (UTC)


I've encountered another definition of "the" central limit theorem.

My statistics textbook (Mathematical Statistics with Applications, 6th edition, by Wackerly, Mendenhall III, and Scheaffer) defines it in this way:

If <math>Y_1, Y_2, \ldots, Y_n</math> are iid with mean μ and standard deviation σ, then <math>\sqrt{n}\,(\bar{Y} - \mu)/\sigma</math> converges to a standard normal distribution as n goes to infinity. (my paraphrase)

The HyperStat on-line basic statistics text says

The central limit theorem states that given a distribution with a mean m and variance s², the sampling distribution of the mean approaches a normal distribution with a mean (m) and a variance s²/N as N, the sample size, increases. (quoted directly)

I suppose this follows from the definition given in this article. Nonetheless, it is not identical to the one given in the article.

Is there a general trend for more basic/applied statistics books to use this mean-centric definition, while more advanced/theoretical ones use the definition given in the article? Is the definition given in the article better somehow? (I assume the mean-centric definition can be derived from it, but not vice versa.) Should the article also mention the mean-centric definition, since it seems to be somewhat popular?

--Ryguasu 10:52 Dec 2, 2002 (UTC)

No --- the "mean-centric" version and the "sum-centric" version are trivially exactly the same thing; either can be derived from the other, and it's completely trivial: Just multiply both the numerator and the denominator by the same thing; you need to figure out which thing. Michael Hardy 04:34 Feb 21, 2003 (UTC)
Right. This became obvious to me sometime after posting the question. Nonetheless, I think I'm going to stick in the mean-based formulation at some point; I've found more books using only the mean-based definition, and I imagine that some not so mathematically inclined people who nonetheless have to brush up against the CLT (certain social scientists come to mind) might like having what is not trivial to them pointed out. I agree, however, that unless proofs of the CLT typically involve the mean-based formulation, the one currently given on this page should be presented as more fundamental. --Ryguasu
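Writing X̄_n = S_n/n, the "same thing" to divide numerator and denominator by is n. The chain below is one way to spell out that step, using the S_n, μ, σ notation of the formula at the top of this page:

```latex
% \bar{X}_n = S_n / n, hence S_n - n\mu = n(\bar{X}_n - \mu).
\frac{S_n - n\mu}{\sigma\sqrt{n}}
  = \frac{n(\bar{X}_n - \mu)}{\sigma\sqrt{n}}
  = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma}
  = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}.
```

The last form is the "mean-centric" statement (the mean standardized by its standard error σ/√n); the first is the "sum-centric" one.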

Maybe I'm getting in over my head here, but do you really need to normalize <math>S_n</math> to say anything precise here? Can't we clarify the first "informal" claim of convergence of <math>S_n</math> by saying, parallel to what AxelBoldt has said for the normalized (i.e. <math>Z_n</math>) case:

The distribution of <math>S_n</math> converges towards the normal distribution <math>N(n\mu, \sigma^2 n)</math> as n approaches ∞. This means: if F(z) is the cumulative distribution function of <math>N(n\mu, \sigma^2 n)</math>, then for every real number z, we have
<math>\lim_{n\to\infty} \Pr(S_n \le z) = F(z).</math>

Is there a lurking desire here to state the non-standard normal part as a corollary, rather than as central to the CLT? That might be ok, although the general-purpose version looks more useful to me.

--Ryguasu 01:18 Dec 11, 2002 (UTC)

The problem is that on one side of your equality you have a limit as n approaches infinity, so that the value of that side does not depend on anything called n, and which CDF you've got on the other side does depend on the value of n. -- Mike Hardy

Actually, the CDF on the right-hand side depends on z, not on n. There are no free n's anywhere. --Ryguasu

It does depend on n, but your notation inappropriately suppresses that dependency. You defined F(z) as the cumulative distribution function of <math>N(n\mu, \sigma^2 n)</math>. AxelBoldt 02:23 Dec 14, 2002 (UTC)

Excellent point. Nonetheless, I find it suspicious that someone with more mathematical experience than me can't express the "informal" claim in a rigorous manner. At Talk:Normal distribution, you mentioned "goodness of fit" tests. Couldn't you express the informal version formally, through some limit statement about the results of such a test as the number of samples/trials goes to infinity? --Ryguasu 02:11 Jan 30, 2003 (UTC)

Probably, I don't know. But the version given in the article is also a rigorous statement of the "informal" claim you have in mind. AxelBoldt 00:55 Jan 31, 2003 (UTC)
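One way to state the "informal" claim rigorously while letting the comparison CDF depend on n, as AxelBoldt points out it must: write F_n(z) = Φ((z − nμ)/(σ√n)), which is exactly the CDF of N(nμ, σ²n), and measure the largest gap between it and the CDF of S_n. Since the limiting CDF Φ is continuous, Pólya's theorem upgrades the pointwise convergence of Z_n to uniform convergence, which gives:

```latex
% Substituting x = (z - n\mu)/(\sigma\sqrt{n}) (a bijection, since \sigma > 0):
\sup_{z \in \mathbb{R}} \left| \Pr(S_n \le z) - \Phi\!\left(\frac{z - n\mu}{\sigma\sqrt{n}}\right) \right|
  = \sup_{x \in \mathbb{R}} \left| \Pr(Z_n \le x) - \Phi(x) \right|
  \longrightarrow 0 \qquad (n \to \infty).
```

So "S_n is approximately N(nμ, σ²n)" can be read as "the distance between the two CDFs goes to zero", which is equivalent to the Z_n statement in the article.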


How about adding some examples? (This is something most of the math pages are lacking.) How about an illustration involving coin flips? I.e., X_n is defined on the probability space [0, 1] so that X_n is 1 with probability 1/2 and -1 with probability 1/2. A series of graphs and equations could be given.
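A sketch of the suggested coin-flip example, assuming fair ±1 flips as described (so μ = 0, σ = 1 and Z_n = S_n/√n). Rather than simulating, it uses S_n = 2K − n with K ~ Binomial(n, 1/2), so P(S_n ≤ z) can be computed exactly from integer binomial coefficients; the function names are illustrative only.

```python
import math

def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def coin_sum_cdf(n: int, z: float) -> float:
    """Exact P(S_n <= z) for S_n = X_1 + ... + X_n with fair +/-1 flips X_i.

    S_n = 2K - n with K ~ Binomial(n, 1/2), so S_n <= z exactly when K <= (z + n) / 2.
    """
    k_max = min(n, math.floor((z + n) / 2))
    if k_max < 0:
        return 0.0
    favourable, c = 0, 1  # c runs through comb(n, 0), comb(n, 1), ...
    for k in range(k_max + 1):
        favourable += c
        c = c * (n - k) // (k + 1)  # exact update to comb(n, k + 1)
    return favourable / 2**n  # exact integer ratio, rounded once to a float

for n in (10, 100, 10_000):
    exact = coin_sum_cdf(n, math.sqrt(n))  # P(Z_n <= 1), since Z_n = S_n / sqrt(n)
    print(f"n={n:>6}: P(Z_n <= 1) = {exact:.4f}   Phi(1) = {std_normal_cdf(1.0):.4f}")
```

The exact probabilities should move toward Φ(1) ≈ 0.8413 as n grows, while for small n the staircase shape of the exact CDF is still clearly visible; a series of graphs could plot exactly these values.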


In the article, there is a comment reading, "picture of a distribution being "smoothed out" by summation would be nice". I've created an animated gif to address this comment. Since animated gifs are considered questionable, I am posting it to the talk page to see if others think it's a good idea. (The image has a rather large footprint on the screen. If anyone can easily shrink it, that would be good. With the rather rudimentary image manipulation tools at my disposal, it would be a moderately involved undertaking for me, so I'm not going to do it unless it's a worthwhile effort.)

I also propose the following explanatory text:

The figure below demonstrates the central limit theorem in action. It shows the distribution of the random variable <math>Y = S_n/\sqrt{n}</math> for values of n from 1 to 7. (In this particular case, the random variables <math>X_i</math> have variance equal to 1, so the variance of <math>S_n</math> is equal to n. The factor <math>1/\sqrt{n}</math> scales Y so that its variance is equal to 1 independent of n.)

Any and all comments appreciated. -- Cyan 22:15, 2 Feb 2004 (UTC)

Testing...

The central limit theorem in action

Yes, using the thumbnail feature would be a quick work-around. I don't know anything about this, but the diagram seems useful to me (it's particularly useful that it pauses between repetitions). You can count along in your head 1 to 7 as the shape of the graph changes, it doesn't rely on captions you need to read at the same time as observing the graph. I give it my uninformed support.  :) (Plus, if this is replicating information already included in the text then that's even better; relying on an animated gif to impart key information rather than to give an example of it would be a bad thing). fabiform | talk 04:32, 4 Feb 2004 (UTC)

An animation can't be printed, and I've always found animated diagrams to be very frustrating, particularly in a case like this. I have to wait for it to come around again if I'm trying to wrap my head around some individual part of it. There's no pause button, no frame forward, no rewind, at least in most browsers. I'd rather see such images side by side in most cases. Perhaps an animation in addition might be neat, but forcing it on readers is to me not friendly.
Here's a quick vertically flattened version (which could float to the side of the body text, for instance). A horizontal version might be better, or break it on two lines. --Brion 09:15, 4 Feb 2004 (UTC)
My $0.02:
a) This is, indeed, an example of an appropriate use of an animated GIF. There's no actual need to change it. However...
b) I actually think that in this particular case the separate pictures are really just as good. I find the animation irritatingly jumpy, and, of course, the constant-time steps are too fast for the early steps (where you might even want to take a moment to visualize the convolution in your head, and notice that you go from two sharp peaks to three blunt peaks to a single broad peak with four bumps), and too slow for the later steps (which all look alike). This is a nit-pick, though.
c) Footprint of the animated version is OK. Note, however, that you could easily reduce the extent of the X axis to +/- 3.5. Maybe by the last iteration there is some data outside those limits and maybe you know it's there, but visually it doesn't matter.
d) The individual thumbnails in Brion's version need a bit of work. They're currently too small and the vertical arrangement isn't very good. You're going to get a million "try this, try that" suggestions, each of which would be a couple of hours' work to try... mine is that you use a table and put them into some kind of comic strip format, maybe two rows of four, maybe four rows of two... yes, you'd need to provide an eighth image but since it would look just the same as the seventh that wouldn't be a problem... you'd need to tinker with the axis labelling, slightly bigger type, perhaps slightly fewer divisions... the axis labels (numbers) do NOT need to be TRULY legible, they should be reduced with antialiased smoothing, it's OK if they look blurry when you enlarge them, but they need to be just legible enough that you think you're seeing 1, 2, 3...
Very appropriate to the subject matter, by the way, and a nice illustration. Good stuff! Dpbsmith 11:37, 4 Feb 2004 (UTC)

Thanks for all the comments, folks! Here's what I'm going to do. As Dpbsmith and Brion suggest, I'm going to create a static image in 2 strips of 4 graphs. I'll play around with the x-axis limits for aesthetic effect, and I'll include a link to the animated gif for those of our readers who want to click on it. The reason to include it at all is that the last few panels will be indistinguishable as static images, but small changes will be apparent in the animated version, thus giving the viewer a sense of the scale of changes in distribution that occur past a certain value of n. -- Cyan 16:04, 4 Feb 2004 (UTC)

I looked at the different proposed diagrams, and I think I prefer the 2 strips of 4 graphs idea. I like the static images better than the animated image. -- It occurs to me that the illustration of the central limit theorem could be expanded by showing two or more different initial distributions, or adding a different distribution each time (not identical). After all the whole point of the theorem is that for a large class of distributions, adding them together brings you to the same limiting distribution. Thoughts? Happy editing, Wile E. Heresiarch 02:47, 18 Mar 2004 (UTC)
Oh, just a minor followup -- maybe it would help if the same example shown on the main central limit theorem page was the same as one of (hopefully several) examples shown in illustration of the central limit theorem. I'm thinking the main page could just show the phenomenon, and the illustration page could go into more detail. Thinking out loud, Wile E. Heresiarch 14:08, 18 Mar 2004 (UTC)
Yet another half-baked idea -- maybe the effect of the animation can be sort-of imitated by leaving each plotted line in the succeeding figures, but grayed-out or something like that. So you could see just how much the line is changing, and the old lines won't block out the new ones if we use a lighter/grayer color. Wile E. Heresiarch 14:15, 18 Mar 2004 (UTC)

I have to agree with the no-animation camp. While it does show the progression nicely, having to watch it repeat a few times isn't ideal, and it distracts from the article. The images are great though, and as shown above they work nicely in a line. One other problem with animation is that it can show effects that are not there - the line looks to move which kinda hides the fact that it is a convolution. There might be a case to argue for a link to the animated version, but I would argue it is unnecessary. Good work folks. Mat-C 00:41, 18 Apr 2004 (UTC)

Mat-C, maybe you can look at the figures in Student's t-distribution and tell me what you think -- I attempted to show the progression of the t distribution to the normal distribution by using different colors. How successful was that, do you think? Thanks for any comments, Wile E. Heresiarch 02:53, 19 Apr 2004 (UTC)

Just for those who are wondering, the reason I haven't followed up on producing a set of images is because I discovered that the numerical convolution method I'm using isn't actually converging to a Gaussian. The images above look like Gaussians, but in fact are flatter and have wider tails than a Gaussian actually has. In fact, if I start with a Gaussian, the convolution moves it away from Gaussianity, flattening it and widening the tails. I haven't the time to devote to correcting this problem right now... I may get to it at some less busy time in the future. -- Cyan 05:53, 18 Apr 2004 (UTC)

Hmm, can you tell me a little about how you're going about the convolution, then? The reason that I ask is that I have also computed a numerical convolution (via FFT) for the figures on the illustration of the central limit theorem page, and I'd like to try to make sure those figures don't have the same problem. Thanks for any info. Wile E. Heresiarch 02:53, 19 Apr 2004 (UTC)
I used a two-sided filter algorithm based on MATLAB's built-in one-sided "filter" function (more info on this function here). I convolved a vector containing discrete samples of the distribution with the original distribution, and then rescaled it back to standard deviation 1, which involves resampling the distribution so that the discrete grid matches that of the original distribution. Apparently this quick and dirty procedure is affected by some kind of numerical error, because the distribution it converges to is not Gaussian. If you want to check the convergence, why not just plot a Gaussian over your filter-derived distribution? -- Cyan 04:54, 19 Apr 2004 (UTC)
Thanks for your comments. Just a thought -- the problem that you describe might be caused by the discretization effects -- I ran into that when working on another convolution problem and found the convolution result slowly drifting away from the correct result. I think it might be possible to solve the problem without resampling, which could reduce the discretization error. I think I'll post the Octave code which I used to construct the figures -- then it can be inspected and compared, as well as making it possible to "try this at home". Happy editing, Wile E. Heresiarch 02:22, 20 Apr 2004 (UTC)
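Cyan's MATLAB code is not shown here, so the following is not a reproduction of it; it is a minimal sketch of the no-resampling alternative suggested above, written in Python with NumPy. Assumptions: the starting distribution is Uniform(−√3, √3) (mean 0, variance 1), sampled at m = 200 grid midpoints, and the helper names are hypothetical. Each step convolves with the original samples on a fixed grid spacing, and the rescaling to unit variance happens only when comparing with the Gaussian, so the density itself is never interpolated onto a new grid.

```python
import math
import numpy as np

def uniform_unit_variance(m: int):
    """Midpoint samples of the Uniform(-sqrt(3), sqrt(3)) density (mean 0, variance 1).

    Returns (x0, dx, density); the grid points are x0 + j*dx for j = 0..m-1.
    """
    half_width = math.sqrt(3.0)
    dx = 2.0 * half_width / m
    x0 = -half_width + 0.5 * dx
    return x0, dx, np.full(m, 1.0 / (2.0 * half_width))

def convolution_powers(n_max: int, m: int = 200):
    """Yield (n, x, f) for n = 1..n_max, where f approximates the density of S_n on grid x.

    The spacing dx never changes: convolving samples on grids that start at a and b
    gives samples on a grid starting at a + b, so no resampling is needed.
    """
    x0, dx, base = uniform_unit_variance(m)
    f, start = base.copy(), x0
    for n in range(1, n_max + 1):
        yield n, start + dx * np.arange(f.size), f
        f = np.convolve(f, base) * dx  # density of S_{n+1} = S_n + X_{n+1}
        start += x0

def max_gaussian_error(n: int, x: np.ndarray, f: np.ndarray) -> float:
    """Largest gap between the density of S_n / sqrt(n) and the N(0, 1) density on the grid."""
    y = x / math.sqrt(n)
    phi = np.exp(-0.5 * y**2) / math.sqrt(2.0 * math.pi)
    return float(np.max(np.abs(math.sqrt(n) * f - phi)))

errors, masses, var_ratios = {}, {}, {}
for n, x, f in convolution_powers(8):
    dx = x[1] - x[0]
    masses[n] = float(np.sum(f) * dx)                  # should stay 1
    var_ratios[n] = float(np.sum(x**2 * f) * dx) / n   # should stay close to 1
    errors[n] = max_gaussian_error(n, x, f)            # should shrink as n grows
    print(f"n={n}: mass={masses[n]:.6f}, var/n={var_ratios[n]:.4f}, "
          f"max density error={errors[n]:.4f}")
```

Because the sampled density here is really a probability mass function divided by dx, each convolution is exact for the discretized variable, so total mass and variance are preserved by construction. If a pipeline fails these two checks, the resampling step would be a natural place to look. Plotting √n·f against the N(0, 1) density gives the visual version of the same comparison.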

o(t²)

Just a note: o(t²), t → 0, refers to a function which goes to zero more quickly than t² (like t³), and not a function 'like' t², which would be O(t²). Hence, I have reverted the recent edits that changed o(t²) to o(t³). Notably, the article on Big-O notation does not discuss limits other than the limit as t → ∞. However, it should do so! Ben Cairns 06:56, 14 Feb 2005 (UTC).

o(t²) Reply

Sorry, I did not see your message (Bjcairns) in the discussion entry. I confused big O with small o: I thought this o referred to the higher-order corrections of the Taylor expansion. I suppose that you are right, so I changed the article back to its previous version with o(t²) without being logged in. That IP, 143.233.xxx.xxx etc., is mine :) My version is perhaps correct if we consider big O and not small o. Theofilatos 17:07, 17 Feb 2005 (UTC)
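For readers following this exchange, the two symbols can be pinned down as below. These are the standard definitions; X is assumed to have mean 0 and finite variance σ², as in the usual characteristic-function proof. The bound on the remainder r(t) comes from the elementary inequality |e^{ix} − 1 − ix + x²/2| ≤ min(|x|³/6, |x|²).

```latex
% Little-o and big-O as t -> 0:
f(t) = o(t^2) \iff \lim_{t \to 0} \frac{f(t)}{t^2} = 0,
\qquad
f(t) = O(t^3) \iff \exists\, C, \delta > 0 :\ |f(t)| \le C |t|^3 \ \text{for } 0 < |t| < \delta.

% Expansion of the characteristic function when E[X] = 0 and Var(X) = \sigma^2 < \infty:
\varphi_X(t) = \mathbb{E}\, e^{itX} = 1 - \tfrac{1}{2}\sigma^2 t^2 + r(t),
\qquad
|r(t)| \le \mathbb{E}\!\left[\min\!\left(\tfrac{|tX|^3}{6},\ |tX|^2\right)\right].
```

Dividing the bound by t² and letting t → 0 (dominated convergence, with X² as the dominating function) gives r(t) = o(t²) under finite variance alone. If E|X|³ is also finite, the |tX|³/6 term gives r(t) = O(t³). Writing o(t³) would generally be wrong, since when the third moment exists the t³ coefficient is −iE[X³]/6, which need not vanish.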


Needs layman's language too

This article seems to be very mathematically complex. It could benefit from some simple layman's language. Ian Howlett 13:24, 30 June 2005 (UTC).

Quotation marks de-emphasize

Quotation marks around a word often mean something like: that's what some people are often heard to call it, but I don't want to commit myself to agreeing. Thus they de-emphasize. If you write "John has a 'degree' from the University of the Ozarks", the quotation marks enclosing the word "degree" mean that maybe John and some others call it a "degree", but you don't necessarily agree. Often quotation marks mean "don't take this word literally." That is the meaning of the quotation marks around "the" in the section heading that says "The" central limit theorem. The word "the" in this context implies uniqueness: that there is only one central limit theorem. In fact there are many, with varying assumptions: sometimes independence is relaxed; sometimes identical distribution is relaxed; sometimes the random variables live in some space besides the real line, etc. The quotation marks mean that often people call this one "the" central limit theorem, but the word "the" should not be taken too literally. Michael Hardy 18:16, 16 September 2005 (UTC)

I am aware of the use of quotes in this way, I use them like that "every" day. :) However, I find it strange that somebody would quote the word the. Oleg Alexandrov 18:42, 16 September 2005 (UTC)
Ironic emphasis of "the" is common enough in informal American English (dunno if the Brits use it too). I don't think we want an easily misunderstood wordplay here. I've replaced "The" central limit theorem with Classical central limit theorem. Feel free to find a different adjective. There are other uses of "scare quotes" in the article which should be reviewed. Regards & happy editing, Wile E. Heresiarch 03:08, 19 September 2005 (UTC)

Hi guys, I'm an editor who delves a lot into physics (statistical mechanics especially) and a bit into statistics. That means I use this theorem a lot. I'll have other things to say, but for now I just need to share something that sprang to mind (not that it's original work; someone probably thought of it before me):

there is very probably a link between the non-independent case and polymer physics. A real-world polymer is basically a correlated random walk, although this correlation tends to decrease exponentially. Yet the object follows the Central Limit Theorem. On this, see the ideal chain and worm-like chain articles, especially the parts about the Kuhn segment.

Either mathematicians have a version of the non-independent CLT corresponding to this, in which case, as a polymer and random walk editor, I need to know, or this should probably be added as another case of the non-independent CLT, in some form or other. (ThorinMuglindir 23:40, 25 October 2005 (UTC))
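One standard way to make the polymer analogy quantitative, offered as a sketch rather than as the specific theorem asked for: take a stationary sequence with variance σ² and correlations Corr(X_i, X_j) = ρ^{|i−j|}, an exponentially decaying, AR(1)-type correlation (the model and the value ρ = 0.8 are assumptions for illustration). Then Var(S_n)/n tends to the long-run variance σ²(1 + ρ)/(1 − ρ), and central limit theorems for weakly dependent (mixing) sequences say that, under suitable conditions, S_n is approximately normal with that inflated variance. The code computes the exact finite-n variance and compares it with the limit.

```python
def correlated_sum_variance(n: int, rho: float, sigma2: float = 1.0) -> float:
    """Exact Var(S_n) when Var(X_i) = sigma2 and Corr(X_i, X_j) = rho**|i - j|."""
    # Lag k = 1..n-1 appears (n - k) times above the diagonal of the covariance
    # matrix and as many times below it, hence the factor 2.
    return sigma2 * (n + 2.0 * sum((n - k) * rho**k for k in range(1, n)))

def long_run_variance(rho: float, sigma2: float = 1.0) -> float:
    """Limit of Var(S_n)/n: sigma2 * (1 + 2 * sum_{k>=1} rho**k) = sigma2 * (1 + rho) / (1 - rho)."""
    return sigma2 * (1.0 + rho) / (1.0 - rho)

rho = 0.8  # hypothetical correlation between neighbouring steps
for n in (10, 100, 1_000, 10_000):
    print(f"n={n:>6}: Var(S_n)/n = {correlated_sum_variance(n, rho) / n:.4f}, "
          f"limit = {long_run_variance(rho):.4f}")
```

The factor (1 + ρ)/(1 − ρ) is the same expression as the characteristic ratio C∞ of the freely rotating chain with bond correlation cos θ = ρ, which is one concrete link to the Kuhn-segment picture: many correlated steps behave, at large scales, like fewer independent ones.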

A few thoughts

From the article: "The density of the sum of two or more independent variables is the convolution of their densities (if these densities exist)."

This should also appear (and probably be explained in detail) in probability density function.

A section dedicated to the CLT and the Fourier transform wouldn't be superfluous either, as the CLT is quite easy to demonstrate in Fourier space. Such considerations are currently mentioned, but in very little detail. That would let us say that the convergence in the CLT is faster in the low Fourier modes and slower in the high Fourier modes (if you don't renormalize the sum, there can even be no convergence at all in the high Fourier modes in some cases; see below). I wouldn't attempt to formalize that in a clean mathematical way myself, though.
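The Fourier-space argument mentioned here is short enough to sketch. The version below assumes iid X_i with mean 0 and variance 1 and uses the expansion φ(t) = 1 − t²/2 + o(t²) of their characteristic function φ; it is the standard textbook route, not a quotation of the article.

```latex
% Z_n = S_n / \sqrt{n}; independence turns the n-fold convolution into an n-th power.
\varphi_{Z_n}(t) = \mathbb{E}\, e^{itS_n/\sqrt{n}}
  = \varphi\!\left(\frac{t}{\sqrt{n}}\right)^{n}
  = \left[1 - \frac{t^2}{2n} + o\!\left(\frac{t^2}{n}\right)\right]^{n}
  \longrightarrow e^{-t^2/2}
  \qquad (n \to \infty,\ t \text{ fixed}).
```

Since e^{−t²/2} is the characteristic function of N(0, 1), Lévy's continuity theorem turns this pointwise convergence into convergence in distribution. The limit holds for each fixed t; for a given n the expansion is only accurate where t/√n is small, which is one way to read the remark above about low and high Fourier modes.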

Some singular cases that might be worth explaining

As I said yesterday, I am very much into editing polymer and random walk material for physics, which leads me to link to the CLT a lot. There is a case where convergence toward the CLT limit is singular, yet it arises quite often in random walks: namely, random walks on a lattice.

Take for instance independent variables which take the values −1 or 1 with probability 1/2 each, and sum N of them.

If you look at the density function you obtain, it is not strictly a Gaussian. It is a series of successive Dirac delta functions. Now the CLT is not that far off, because the amplitudes of the Dirac peaks of the resulting sum vary according to a Gaussian curve. So if you look at the function in the low Fourier modes, it will correspond to the Gaussian curve predicted by the CLT. For high Fourier modes (k > 2πN, or k > 2π if you don't renormalize the sum), the density of the resulting sum has nothing to do with a Gaussian.

The situation is different if you consider a sum of continuous variables, or a lattice-free random walk: the same problem does not arise.

For example, consider the continuous variable that is uniformly distributed in [−√3, √3], and sum a large number of independent realisations of this variable. This variable has the same mean and variance as the previous one (mean 0, variance 1), yet you won't obtain a series of Dirac peaks as in the previous case. The resulting density will look like a Gaussian at pretty much any scale, including in the high Fourier range...

All this will probably be clearer with a formula: for the latter variable (variance 1, mean 0), the (normalized) sum converges toward the density function P(X), corresponding to N(0,1/N). Now, strictly speaking, the density function of the former variable's sum does not converge toward P(X), but rather toward:

, where delta is the Dirac delta function

About this, here are my questions: is the above somehow related to what you say about the third moment of the variable controlling the speed of convergence? Can this difference in convergence between the high and low Fourier modes be formalized mathematically? (192.54.193.37 08:58, 26 October 2005 (UTC))

Oh, I again forgot to log in... Well, the section above is from ThorinMuglindir 09:00, 26 October 2005 (UTC)
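The lattice effect described above shows up directly in the characteristic functions. For the ±1 variable, E[e^{ikX}] = cos k, so the unnormalized sum S_N has characteristic function cos(k)^N, whose modulus returns to 1 at k = π however large N is. The Uniform(−√3, √3) variable (same mean and variance) has characteristic function sin(√3k)/(√3k), whose modulus is below 1 for every k ≠ 0, so its N-th power decays. The sketch below compares both with the Gaussian prediction e^{−Nk²/2}; N = 50 and the sampled k values are arbitrary choices, and the function names are illustrative.

```python
import math

def cf_coin(k: float) -> float:
    """Characteristic function of X = +/-1 with probability 1/2 each: E[exp(ikX)] = cos(k)."""
    return math.cos(k)

def cf_uniform(k: float) -> float:
    """Characteristic function of Uniform(-sqrt(3), sqrt(3)), which has mean 0 and variance 1."""
    a = math.sqrt(3.0) * k
    return 1.0 if a == 0.0 else math.sin(a) / a

def cf_gauss(k: float, n: int) -> float:
    """Characteristic function of N(0, n), the CLT prediction for the unnormalized sum S_n."""
    return math.exp(-0.5 * n * k * k)

N = 50
for k in (0.05, 0.2, math.pi / 2, math.pi):
    print(f"k={k:.3f}: |coin|^N={abs(cf_coin(k)) ** N:.3e}, "
          f"|uniform|^N={abs(cf_uniform(k)) ** N:.3e}, Gaussian={cf_gauss(k, N):.3e}")
```

At small k all three agree, while at k = π the lattice sum keeps modulus 1 and the other two are negligible. For lattice variables, the local limit theorem is one standard way of formalizing the "Gaussian envelope of Dirac peaks" picture.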

Not all of what you are saying is immediately clear to me, but you seem to be talking about discrete random variables compared with continuous random variables. Discrete random variables do not have probability density functions, but the central limit theorem is not about densities anyway. It is about convergence in distribution, i.e. about cumulative distribution functions. So there is less of a problem in comparing discrete and continuous random variables. The classic example is the normal approximation to the binomial distribution; even here the approximation can be misleading in the tails, as it often is when applying the central limit theorem. --Henrygb 23:55, 26 October 2005 (UTC)
Thanks, my question was indeed related to that binomial distribution. Just as a remark, it is often possible and useful to define a probability density function for a discrete variable, using the Dirac delta function. Mathematically speaking, the Dirac delta function is not a function, but it's still a distribution (distribution not in the sense of statistics, but in the sense of topology, that is to say, an object in the closure of the space of functions). Of course, when you do physics you couldn't care less about what exactly is a function and what is a distribution... What I wrote above is just a reformulation of the meaning of the graph that compares the curve and the histogram in the binomial distribution article, a reformulation based on Dirac delta functions. (ThorinMuglindir 10:03, 27 October 2005 (UTC))

I'll add a very short bit to the article, explaining that the CLT can also be adapted to sums of discrete variables, although in a slightly different form, and link to binomial distribution as an example, if only to avoid confusing a reader who comes here from, say, the random walk article, where the CLT is applied to a sum of discrete variables. ThorinMuglindir 10:04, 27 October 2005 (UTC)

sum has finite variance, or the random numbers themselves?

The first paragraph states: The most important and famous result is called simply The Central Limit Theorem which states that if the sum of the variables has a finite variance, then it will be approximately normally distributed.

The random variables must have finite variance, right? This was the impression I got from http://mathworld.wolfram.com/CentralLimitTheorem.html. I am not skilled at mathematics so I do not know if saying the sum has a finite variance is correct. Thank you. Jason Katz-Brown 05:32, 11 February 2006 (UTC)

They say the same thing: variances are non-negative so the sum of a finite number of them will be finite if and only if each of them is. --Henrygb 16:31, 11 February 2006 (UTC)

Organigram?!

From the article:

This means that if we build an organigram of the realisations of the sum of n independent identical discrete variables, the curve that joins the centers of the upper faces of the rectangles forming the organigram converges toward a gaussian curve as n approaches ∞. The binomial distribution article details such an application of the central limit theorem in the simple case of a discrete variable taking only two possible values.

Huh? We draw an organizational chart of what, and how? I suppose "independent identical" is meant to be iid (as independent and identical is a contradiction), but what about the rest of it? Given the binomial distribution article reference, "organigram" is probably meant to be "histogram", though I don't see how the curve would join the "centers" of the upper faces more than any other points on them (a description which makes more sense for organigrams, even though as a whole using organigrams to depict distributions would be a strange idea). The histogram, then, is presumably of the probability distribution of a random variable that is the sum of n iid discrete random variables (or approximation of the same by frequencies in a finite sample of the random variable, but then we need to take limit at infinite sample size or the other limit won't converge). Is this interpretation correct? I'm not sure. How to explain it in good encyclopedia style? I don't know. As it stands this part of the article is very confusing and should be fixed, preferably by someone who knows something about probability theory (I don't, so I'm not touching it). 82.103.195.147 20:51, 12 August 2006 (UTC)

What about the sum of non-identically distributed random variables?

This is a personal doubt, but probably more people coming to this page have it. Is there any result related to the CLT that says anything about the sum of random variables in general? For example, in my problem I have 40 Beta variables, each with its own mean and variance. I think there is some result saying their sum is a Normal variable with mean and variance equal to the sums of their means and variances. Is that right? Arauzo 10:15, 17 September 2006 (UTC)

See the sections on Lyapunov condition and Lindeberg condition. --Henrygb 17:54, 17 September 2006 (UTC)
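For the 40-Beta question, a sketch of the arithmetic, assuming the variables are independent; the parameter pairs below are hypothetical placeholders, not Arauzo's actual variables. The mean and variance of a sum of independent variables are the sums of the means and variances; whether the normal shape is justified is what the Lyapunov condition addresses. With δ = 1 and every X_i in [0, 1], |X_i − μ_i| ≤ 1 gives E|X_i − μ_i|³ ≤ Var(X_i), so the Lyapunov ratio is at most 1/s_n, where s_n² is the total variance.

```python
import math

def beta_mean_var(a: float, b: float):
    """Mean and variance of a Beta(a, b) random variable."""
    s = a + b
    return a / s, a * b / (s * s * (s + 1.0))

def normal_approx_for_beta_sum(params):
    """For independent X_i ~ Beta(a_i, b_i), return (mean of sum, variance of sum, Lyapunov bound).

    Each X_i lies in [0, 1], so |X_i - mu_i| <= 1 and E|X_i - mu_i|**3 <= Var(X_i);
    hence the delta = 1 Lyapunov ratio sum_i E|X_i - mu_i|**3 / s_n**3 is at most 1 / s_n.
    """
    moments = [beta_mean_var(a, b) for a, b in params]
    mean_sum = sum(m for m, _ in moments)
    var_sum = sum(v for _, v in moments)  # independence: variances add
    return mean_sum, var_sum, 1.0 / math.sqrt(var_sum)

# Hypothetical (a, b) pairs standing in for "40 Beta variables".
params = [(2.0, 3.0)] * 20 + [(5.0, 1.5)] * 20
mu, s2, bound = normal_approx_for_beta_sum(params)
print(f"sum approx N({mu:.3f}, {s2:.4f}); Lyapunov ratio <= {bound:.3f}")
```

The bound is crude: it only shows that the condition holds whenever the total variance keeps growing as terms are added. For a fixed number of terms, a Berry–Esseen-type inequality converts the same third-moment ratio into an explicit, if conservative, bound on the error of the normal approximation.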
I read the article and understand that whether the random variables are identical or not, their sum will be normally distributed. I disagree with the condition that the random variables must be identical.--Piyatad 09:56, 28 November 2006 (UTC)
It doesn't say they must be identical. It says that IF their distributions are identical (the distributions, not the random variables!) THEN etc. etc. But it also says:
Several generalizations for finite variance exist which do not require identical distribution but incorporate some condition which guarantees that none of the variables exert a much larger influence than the others.
So there you have it: the article says the distributions do not need to be identical if "some other condition" holds. Just which other condition depends on which version of the theorem you're talking about. I think equal variances may be more than strong enough; and if I weren't writing this comment in some haste I just might say that's obvious... Michael Hardy 01:45, 4 December 2006 (UTC)
I think we should state very explicitly in the first paragraph that the CLT applies to the sum of arbitrary distributions, not only identical distributions. The current version is causing misunderstanding in standard deviation, where they state that "... [the classical central limit theorem] says that sums of many independent, identically-distributed random variables tend towards the normal distribution as a limit." My suggestion is to replace the last sentence in the first paragraph with the following: "The most important and famous result is called The Central Limit Theorem which states that if the sum of independent and arbitrarily-distributed variables has a finite variance, then it will be approximately normally distributed (i.e., following a normal or Gaussian distribution)." Please comment.
It's a long time since the above, but anyway... I think that equal variances are not enough and that it would be incorrect to have "The most important and famous result is called The Central Limit Theorem which states that if the sum of independent and arbitrarily-distributed variables has a finite variance, then it will be approximately normally distributed (i.e., following a normal or Gaussian distribution)." The "proof" in the article might be simplified by working with cumulant generating functions (cgf), and this would make the point here clearer. Just write the expansion to include the skewness, and work out the effect on the cgf for the average. You get something involving the skewnesses of the individual components, and for the CLT to hold this term must converge to zero as the number of samples increases. To defeat the CLT assumption you just need to find a sequence of skewnesses which increases fast enough. Thus a CLT result does need some caveats: (quote)
Several generalizations for finite variance exist which do not require identical distribution but incorporate some condition which guarantees that none of the variables exert a much larger influence than the others.
... where the condition would need to apply to all aspects of the distributions of the variables. Melcombe (talk) 15:05, 2 April 2008 (UTC)
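The cumulant argument above can be made concrete. Cumulants of independent variables add, so the skewness of S_n is Σ κ₃(X_i) / (Σ σ_i²)^{3/2}. The Python sketch below assumes unit variances, so each variable's skewness equals its third cumulant, and compares a constant skewness with skewness growing like i. The helper `bernoulli_p_for_skewness` is hypothetical and is included only to show that such a sequence exists: a Bernoulli(p) variable can have any prescribed non-negative skewness, and standardizing it does not change that.

```python
import math

def skewness_of_sum(variances, third_cumulants):
    """Skewness of S_n = X_1 + ... + X_n for independent X_i.

    Cumulants add under independence: kappa_3(S_n) = sum of kappa_3(X_i) and
    Var(S_n) = sum of Var(X_i), so skewness(S_n) = kappa_3(S_n) / Var(S_n)**1.5.
    """
    return sum(third_cumulants) / sum(variances) ** 1.5

def bernoulli_p_for_skewness(g):
    """p in (0, 1/2] such that a Bernoulli(p) variable has skewness g >= 0.

    Solves (1 - 2p) / sqrt(p (1 - p)) = g, i.e. (4 + g^2) p^2 - (4 + g^2) p + 1 = 0.
    """
    return 0.5 * (1 - g / math.sqrt(4 + g * g))

for n in (10, 100, 1000):
    constant = skewness_of_sum([1.0] * n, [2.0] * n)          # every X_i has skewness 2
    growing = skewness_of_sum([1.0] * n, [float(i) for i in range(1, n + 1)])  # X_i has skewness i
    print(n, round(constant, 4), round(growing, 4))
```

With constant skewness the sum's skewness falls like 2/√n; with skewness i it grows like √n/2, so no normal limit is possible even though every variance equals 1.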

Large sample size

The need for a large sample size should be included: n ≥ 30 to 70 for it to count as large. 70.111.238.17 14:11, 1 October 2006 (UTC)

It says as n approaches ∞, and that is certainly quite large. But some "rules of thumb" could be added too. In many cases, "≥ 30" is quite conservative. Michael Hardy 21:29, 2 October 2006 (UTC)
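One way to see what such a rule of thumb does and does not guarantee (a sketch, not a standard criterion): for the mean of n iid Exponential(1) observations the sum is Gamma(n, 1), whose upper tail has the closed form P(Gamma(n, 1) > x) = Σ_{k<n} e^{−x} x^k / k!. The Python below computes the exact probability that the mean exceeds μ + 1.645 σ/√n, which the normal approximation puts at about 0.05. The exponential example and the 1.645 cut-off are choices made for illustration.

```python
import math

def erlang_sf(n, x):
    """P(Y > x) for Y ~ Gamma(shape=n, scale=1) with integer n >= 1.

    Uses P(Y > x) = P(Poisson(x) <= n - 1) = sum_{k<n} exp(-x) x^k / k!,
    with each term evaluated in log space to avoid overflow for large n.
    """
    return sum(math.exp(-x + k * math.log(x) - math.lgamma(k + 1)) for k in range(n))

def upper_tail_of_mean(n, z=1.645):
    """Exact P(mean > mu + z * sigma / sqrt(n)) for n iid Exponential(1) observations.

    Exponential(1) has mu = sigma = 1, and mean > 1 + z / sqrt(n) exactly when
    the sum (a Gamma(n, 1) variable) exceeds n + z * sqrt(n).
    """
    return erlang_sf(n, n + z * math.sqrt(n))

for n in (5, 30, 100, 1000):
    print(n, round(upper_tail_of_mean(n), 4))   # the normal approximation says about 0.05
```

The exact tail probabilities shrink towards 0.05 only gradually as n grows, because the skewness of the mean, 2/√n, decays slowly.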

central {limit theorem} or {central limit} theorem?

The article now says:

This is a limit theorem and is about central limits.

I've long thought it was the central theorem on limits, not the theorem on central limits. Can someone explain just what "central limits" are? The article's present comment seems to confuse rather than to clarify. Michael Hardy 02:51, 6 November 2006 (UTC)

I removed a bit that said that the theorem was NOT a 'central' theorem, but was a theorem about 'central limits'. The text now says what it is, not what it isn't. I hope that clears it up. 8_)--Light current 03:24, 6 November 2006 (UTC)

I don't see how it clears up what a "central limit" is. What is a "central limit"? Michael Hardy 03:42, 6 November 2006 (UTC)

...and now I've edited it to say it's a central theorem about limits, not a theorem about central limits. Can no one explain what a "central limit" is? I suspect no one can, because I suspect there's no such thing. Michael Hardy 03:45, 6 November 2006 (UTC)

I don't think you are correct. You should read the whole thing; then you'll see. Excerpt from the page:

Note the following apparent "paradox": by adding many independent identically distributed positive variables, one gets approximately a normal distribution. But for every normally distributed variable, the probability that it is negative is non-zero! How is it possible to get negative numbers from adding only positives? The reason is simple: the theorem applies to terms centered about the mean. Without that standardization, the distribution would, as intuition suggests, escape away to infinity.

My itals, bolding--Light current 03:52, 6 November 2006 (UTC)[reply]

So a central limit is one that is evenly distributed about zero.--Light current 03:54, 6 November 2006 (UTC)[reply]

Hey, I've just noticed you are a statistician!! Why are you asking me about stats? 8-)--Light current 03:55, 6 November 2006 (UTC)

I've removed the controversial statement until we can get the proper dope on it 8-)--Light current 04:01, 6 November 2006 (UTC)

I agree with M.Hardy. Historically, Polya introduced the name in German in 1920, "zentraler Grenzwertsatz", which means central theorem-about-the-limit (George Polya "Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung und das Momentenproblem," Mathematische Zeitschrift, 8 (1920), 171-181). Dangauthier 14:55, 5 April 2007 (UTC)

I just (two years later!) added this ref to the history section of the article. Qwfp (talk) 14:58, 15 May 2009 (UTC)

Making it Easier to Understand

  • I reckon the section on Classical CLT should start by stating the theorem. Justification should come after this. So the passage might read: "The central limit theorem says that the means of samples are normally distributed." - comments please —The preceding unsigned comment was added by 212.159.75.167 (talk) 20:18, 3 January 2007 (UTC).
  • Another suggestion is to provide an "everyday" example with simulated graphs, such as the colors of adjacent cars that stop at an intersection or something. -unsigned-

30 Individual Samples

I have heard that 30 individual samples will meet the requirements of the C.L.T. and therefore be considered a "statistically" valid sample, assuming they were randomly selected using proper statistical procedure. This seems wrong to me, but I'd like to know what others say. Gautam Discuss 06:45, 8 June 2007 (UTC)

Often far fewer than 30 is enough; sometimes many more is not. It depends on what distribution you're sampling from. But don't call them samples; call them observations within a sample. Michael Hardy 06:48, 8 June 2007 (UTC)
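To illustrate how strongly the answer depends on the distribution being sampled, here is a Python sketch using one crude yardstick, chosen only for illustration: the smallest n for which the skewness of the mean of n Bernoulli(p) observations, (1 − 2p)/√(n p (1 − p)), falls to 0.2 or below. The threshold 0.2 and the function names are hypothetical. Skewness alone does not guarantee a good normal approximation; p = 0.5 passes at n = 1 even though a single coin flip is far from normal.

```python
import math

def bernoulli_mean_skewness(p, n):
    """Skewness of the mean of n iid Bernoulli(p) observations."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))

def n_needed(p, target=0.2):
    """Smallest n >= 1 with |skewness of the mean| <= target.

    From (1 - 2p)^2 / (n p (1 - p)) <= target^2,
    i.e. n >= (1 - 2p)^2 / (target^2 p (1 - p)).
    """
    return max(1, math.ceil((1 - 2 * p) ** 2 / (target ** 2 * p * (1 - p))))

for p in (0.5, 0.3, 0.1, 0.01):
    print(p, n_needed(p))
```

The required n climbs from a handful for moderate p to thousands for p = 0.01, which is why a single fixed cut-off such as 30 cannot be right for every distribution.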

Additive mean and variance

The CLT indicates for large sample size (n>29 or 100),[1] that the sampling distribution will have the same mean as the population, but variance divided by sample size

The CLT doesn't say that, and it doesn't depend on large sample size. The expected value of the sum is the sum of the expected values, always. For independent random variables the variance of the sum is always the sum of the variances. 72.75.76.121 (talk) 23:43, 5 December 2007 (UTC)
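These are exact identities rather than large-sample approximations, which can be checked with exact arithmetic. The Python sketch below uses a fair die purely as a convenient example. It builds the distribution of a sum of independent dice by convolution with `Fraction` probabilities, then prints the mean and variance, which come out as exactly n · 7/2 and n · 35/12 for every n, however small.

```python
from fractions import Fraction

def convolve(p, q):
    """Distribution of X + Y for independent X, Y given as {value: probability} dicts."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0) + px * qy
    return out

def mean_var(pmf):
    """Exact mean and variance of a distribution given as {value: probability}."""
    m = sum(x * px for x, px in pmf.items())
    return m, sum((x - m) ** 2 * px for x, px in pmf.items())

die = {k: Fraction(1, 6) for k in range(1, 7)}   # mean 7/2, variance 35/12
total = die
for n in range(2, 6):
    total = convolve(total, die)                   # distribution of the sum of n dice
    print(n, *mean_var(total))                     # equals n * 7/2 and n * 35/12
```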


Standard error of sum is σ n^{1/2}

In the following:

Consider the sum S_n = X_1 + ... + X_n. Then the expected value of S_n is nμ and its standard error is σ n^{-1/2}. Furthermore, informally speaking, the distribution of S_n approaches the normal distribution N(nμ, σ²n) as n approaches ∞.


σ n^{-1/2} should be σ n^{1/2}, as this refers to the standard error of the sum, S_n, not of the sample mean. I made this change. tom fisher-york (talk) 16:30, 14 December 2007 (UTC)
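For reference, here is the one-line derivation behind this correction, assuming X_1, ..., X_n are independent with common variance σ² and writing \bar X_n = S_n / n for the sample mean (notation introduced here):

```latex
\begin{aligned}
\operatorname{Var}(S_n) &= \sum_{i=1}^{n} \operatorname{Var}(X_i) = n\sigma^2,
  &\quad \operatorname{SD}(S_n) &= \sigma n^{1/2},\\
\operatorname{Var}(\bar X_n) &= \frac{\operatorname{Var}(S_n)}{n^2} = \frac{\sigma^2}{n},
  &\quad \operatorname{SD}(\bar X_n) &= \sigma n^{-1/2}.
\end{aligned}
```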

Uses of CLT?

It would be great if this article included a bit about why the CLT is so cool.

For those of us who are getting into some real math through Wikipedia, it's really great when there is an example application of a theorem like the CLT. For people who already know all about the CLT, it may seem like an impossible task to select an example (like giving an example use of addition), but I think the CLT is just arcane enough to warrant a bit of practical stuff so people can see why it is so cool. For me, I was just trying to figure out the probability of something, something with a distinctly non-normal distribution. I ran 1,000 simulations and saw (naturally) that the simulations were generating a very familiar (normal) distribution. Wow, I thought. That's the Central Limit Theorem at work! So I was able to avoid further simulations and use a plain old Z test to estimate the final result: p < e-26. Even with a very fast computer, you could not do enough simulations of my problem to establish this probability. With the CLT, there is no need.

Just my 2 cents. I could add this myself if folks agree... Tombadog (talk) 12:45, 13 March 2008 (UTC)
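One caution fits this use of the CLT. The theorem controls the absolute error of the normal approximation, not its relative error far out in the tail, so a normal-based probability like p < e-26 is best read as an order-of-magnitude figure. The Python sketch below uses a Binomial(1000, 1/2) count as a stand-in example, not the problem described above. It compares exact upper-tail probabilities with the normal approximation (with a continuity correction) at increasing distances from the mean.

```python
import math

def binom_upper_tail(n, p, k):
    """Exact P(S >= k) for S ~ Binomial(n, p); each term is computed in log space."""
    log_terms = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                 + j * math.log(p) + (n - j) * math.log(1 - p) for j in range(k, n + 1))
    return sum(math.exp(t) for t in log_terms)

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def compare(n, p, k):
    """Exact tail, normal approximation with continuity correction, and their ratio."""
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    exact = binom_upper_tail(n, p, k)
    approx = normal_upper_tail((k - 0.5 - mu) / sd)
    return exact, approx, approx / exact

for k in (520, 550, 600, 650):
    print(k, compare(1000, 0.5, k))
```

Near the centre the ratio is essentially 1, but about nine standard deviations out it is off by roughly a factor of two.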

iid?

Do the random variables really have to be iid? Does the CLT work if the variables are independent but have different distributions? Let's say I build a house in 10 (or more) successive steps, from the foundation to the roof. Every step takes some time (length = random variable). The steps are independent of each other but have different distributions (normal, exponential, uniform, whatever...). The total length (the sum of the 10 random variables) should then be approximately normally distributed regardless of the distribution of each single variable (or is it not)? --217.83.60.191 (talk) 16:25, 18 March 2008 (UTC)

If you simply drop the assumption of identical distribution, then the resulting statement is not true. But the assumption of identical distribution can be replaced by any of various other assumptions with the result that the theorem is still true. I don't have any of the details at the tip of my tongue, but probably such things should be added to the article. Michael Hardy (talk) 17:02, 18 March 2008 (UTC)
See discussion above under "What about the sum of non-identically distributed random variables". Melcombe (talk) 15:07, 2 April 2008 (UTC)
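The house-building example can be simulated directly. The Python sketch below uses ten made-up step distributions (normal, exponential, uniform, gamma, triangular, lognormal, scaled beta; all hypothetical). It compares the sample skewness of the total with that of a single exponential step. The total should come out much less skewed than one exponential step, but not symmetric. With only ten steps, one of which (the exponential with mean 3) carries a sizeable share of the variance, the result is only roughly normal, which echoes the "no single variable dominates" requirement in the generalizations mentioned above.

```python
import random
import statistics

def build_time(rng):
    """One simulated total: 10 independent steps with hypothetical, mixed distributions."""
    steps = [
        rng.gauss(5, 1),               # normal
        rng.expovariate(1 / 3),        # exponential, mean 3
        rng.uniform(2, 6),             # uniform
        rng.gammavariate(2, 1.5),      # gamma, shape 2, scale 1.5
        rng.triangular(1, 8, 2),       # triangular, low 1, high 8, mode 2
        rng.expovariate(1 / 2),        # exponential, mean 2
        rng.uniform(0, 4),             # uniform
        rng.gauss(4, 0.5),             # normal
        rng.lognormvariate(1, 0.4),    # lognormal
        10 * rng.betavariate(2, 5),    # scaled beta
    ]
    return sum(steps)

def skewness(xs):
    """Sample skewness: mean of the cubed standardized values."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

rng = random.Random(42)
totals = [build_time(rng) for _ in range(20_000)]
single = [rng.expovariate(1 / 3) for _ in range(20_000)]
print(round(skewness(single), 2), round(skewness(totals), 2))
```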

CLT in real life?

I remember back in the old days at university when I heard an explanation of the CLT in real life from a friend of a friend. Most things in nature are normally distributed, like the length of some kind of plant, the lifespan of some kind of animal, the temperature on a particular day or whatever. All these random variables are influenced by many other variables. So the normal distribution is so important because everything (every variable) that is the result of many, many other variables is approx. normally distributed. It is not really the sum of other variables in a mathematical way, as we don't know exactly the relationship between the variables, but it's close to that. --Unify (talk) 23:46, 18 March 2008 (UTC)


Correction needed

Under "Proof of the central limit theorem", the article has

For any random variable, Y, with zero mean and unit variance (var(Y) = 1), the characteristic function of Y is, by Taylor's theorem,

The correction required relates to "by Taylor's theorem" ... the expansion might be a Taylor expansion, but the result that the expansion is valid in this case is not (i.e. assuming that the variance exists and not assuming higher order moments). Does anyone have a proper reference for the result, or should the sentence just be rephrased? Melcombe (talk) 15:16, 2 April 2008 (UTC)


Proof of the central limit theorem

I think the statement about the "remarkably simple proof" is not appropriate. The version of Taylor's theorem that is used here is probably known only to a small fraction of the readers (the characteristic function has complex values), the "simple properties of characteristic functions" (referring to linear transformations) are not explained, and the convergence to an exponential would deserve a reference. Above all, the heavy tool applied here, the Lévy continuity theorem, is not trivial at all. In my opinion, expressing a proof in terms of another difficult theorem doesn't make it simple. It would therefore be more appropriate to state that rigorous proofs of the central limit theorem tend to be cumbersome, but that a non-rigorous argument is easily obtained from the Taylor expansion of the characteristic functions. —Preceding unsigned comment added by 84.56.2.132 (talk) 06:59, 17 July 2008 (UTC)
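The non-rigorous argument can at least be checked numerically. If Y has mean 0 and variance 1, then φ_Y(t/√n)^n = (1 − t²/(2n) + o(1/n))^n → e^{−t²/2}. The Python sketch below takes Y uniform on [−√3, √3], a choice made for illustration because its characteristic function sin(√3 t)/(√3 t) is explicit. It evaluates the characteristic function of the standardized sum at one value of t. This illustrates the limit; it does not replace the Lévy continuity step.

```python
import math

def cf_uniform(t):
    """Characteristic function of Y ~ Uniform(-sqrt(3), sqrt(3)) (mean 0, variance 1)."""
    a = math.sqrt(3) * t
    return 1.0 if a == 0 else math.sin(a) / a

def cf_standardized_sum(t, n):
    """Characteristic function of (Y_1 + ... + Y_n) / sqrt(n) for iid copies of Y."""
    return cf_uniform(t / math.sqrt(n)) ** n

t = 1.5
for n in (1, 5, 50, 500):
    print(n, cf_standardized_sum(t, n), math.exp(-t * t / 2))
```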

Lack of independence: "Expert attention"

I have added a ref about the CLT for convex bodies and removed the tag. Unfortunately, the tag "expert attention needed" was inserted with no explanation of what it was for. Thus, I am not sure: is it still needed, or not? Boris Tsirelson (talk) 19:52, 18 October 2008 (UTC)

Well, other than adding reference(s) to this section, nothing much has changed. While I didn't add the tag, there seem to be two points needing attention. Firstly, for the first three items in the list, it would be good to have an outline of the sort of conditions/assumptions being imposed for the CLT to hold: possibly a single summary might do for all three? Secondly, for the 4th item (convex bodies) it seems necessary to say how a CLT can apply to "bodies" or "sets" when all the discussion above is about random things which are numerically-valued. I guess there would be the question of whether such cases should be dealt with under the heading of "dependence", or might be better with their own slot, depending on what is being meant. Melcombe (talk) 14:15, 22 October 2008 (UTC)
There is also the question of the heading "lack of independence". At least for some of the topics indicated, it seems that both "non-identical" and "non-independent" are allowed, whereas there is at least an implication that what is covered is the case of "identical" and "non-independent". Melcombe (talk) 14:27, 22 October 2008 (UTC)
I see. Well, maybe I'll do some more. Why "lack of independence"? For two reasons. First, some time ago the main part of the article covered the non-identical case. Second (and more important), "non-identical" is a relatively small problem, while "non-independent" is relatively hard. About convex bodies: it is meant that a point is chosen from a given convex body (uniformly); its coordinates are "random things which are numerically-valued", but dependent in a way not typical for probability theory (till now); covering this case is considered important progress. Boris Tsirelson (talk) 15:21, 22 October 2008 (UTC)
OK, some improvements would be good. Regarding "convex bodies", it is not obvious how this might differ from a non-independent version of what is in the subsection titled "Multidimensional central limit theorem" ... what role does the "convex" bit have? Is it just to ensure that the mean is within the "body"? The abstract and first page of the cited article are not particularly informative to me. Melcombe (talk) 09:12, 23 October 2008 (UTC)
I am reading some sources. About convex body: it is completely different, since the large parameter is not the number of summands (this is just 1: no sum at all, just a single random vector) but rather the dimension of the body (and the random vector), that is, the number of (random, dependent) coordinates. But wait, I'll try to write it down some day. Boris Tsirelson (talk) 13:53, 23 October 2008 (UTC)
I did something, and shall continue. Boris Tsirelson (talk) 20:15, 23 October 2008 (UTC)

Error in "Lindeberg condition"?

I suspect the Lindeberg condition in the section discussing non-identically distributed random variables might be incorrect. Possibly there should be an average instead of a plain sum, or something like that. I suspect that in the current form, the condition is essentially never satisfied. Not sure about this, though. I'll check this if I find the original reference. --130.231.89.82 (talk) 09:19, 22 October 2008 (UTC)

Why incorrect? The sum in it corresponds to the sum in the definition of s_n^2. It means that strongly deviating values do not contribute to the variance (in the limit). I believe it is correct. Boris Tsirelson (talk) 20:21, 23 October 2008 (UTC)
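For reference, the condition as usually stated divides the sum by s_n² = Σ σ_k², which plays the role of the "average" asked about above: L_n(ε) = (1/s_n²) Σ_k E[(X_k − μ_k)²; |X_k − μ_k| > ε s_n] → 0 for every ε > 0. The Python sketch below (function name hypothetical) evaluates L_n(ε) exactly for independent discrete variables. With 400 fair ±1 variables it is 0 at ε = 0.1. Replacing one of them with a ±20 variable, which carries about half of the total variance, leaves it near 1/2; if that pattern continued as n grows, the condition would fail.

```python
import math

def lindeberg_ratio(pmfs, eps):
    """L_n(eps) = (1 / s_n^2) * sum_k E[(X_k - mu_k)^2; |X_k - mu_k| > eps * s_n].

    pmfs: distributions of independent discrete X_1..X_n as {value: probability}.
    """
    centred = []
    for pmf in pmfs:
        mu = sum(x * p for x, p in pmf.items())
        centred.append([(x - mu, p) for x, p in pmf.items()])
    s2 = sum(d * d * p for c in centred for d, p in c)   # s_n^2 = sum of the variances
    cut = eps * math.sqrt(s2)
    tail = sum(d * d * p for c in centred for d, p in c if abs(d) > cut)
    return tail / s2

coin = {-1: 0.5, 1: 0.5}
big = {-20: 0.5, 20: 0.5}
print(lindeberg_ratio([coin] * 400, 0.1))           # no term exceeds 0.1 * s_n = 2
print(lindeberg_ratio([coin] * 399 + [big], 0.1))   # the single big term contributes 400 / 799
```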

Error in Central limit theorem for Gaussian polytopes

In this new (sub)section, there is "for all t in R" ... but t only appears as a dummy integration variable. Melcombe (talk) 16:26, 28 October 2008 (UTC)

Oops... You are right, thank you. I'll correct it soon. In fact, I see that the formulation is clumsy; we can just say "converge in distribution". (In contrast, "CLT for convex bodies" includes some uniformity over all bodies or densities.) Strangely, the clumsy formulation is used by the authors. Boris Tsirelson (talk) 18:55, 28 October 2008 (UTC)

Asymptotic normality for statistical estimators

To User:Melcombe (and maybe someone else): Probably you could add some results about asymptotic normality for statistical estimators. Boris Tsirelson (talk) 07:08, 4 November 2008 (UTC)

Re-averaged?

Back on 26 October 2008, an anonymous editor inserted the word "re-averaged" into the definition of the CLT in the first sentence of the article. It has been there ever since, and has been echoed on countless websites. But what is a "re-averaged sum"? I submit that this edit was well-intended, but meaningless and very confusing.

Further, in Theorems 1 and 2 in the article by Le Cam, there is no reference to identically distributed random variables. Indeed, the point of many modern versions of the CLT is that the sequence of independent random variables need not be identically distributed. I submit that the phrase "identically distributed" does not belong in the opening paragraph at all.

Accordingly, I have tried to rewrite the lead paragraph so that it is both correct and generally understandable. I hope it is acceptable to all of you. —Aetheling (talk) 20:01, 11 March 2009 (UTC)

I think the term was meant to be linked to the sentence (which you have left) "They all express the fact that a sum of many independent random variables will tend to be distributed according to one of a small set of "attractor" distributions." ...presumably this is a nod in the direction of stable distributions. Given this, it may be that "re-averaged" was meant to be something like "rescaled", referring to the fact that a divisor other than n is required for a non-degenerate limit distribution.
Or did it mean, subtract the expectation? Boris Tsirelson (talk) 10:07, 12 March 2009 (UTC)

Turing and the CLT

Surely the reference to Turing's fellowship dissertation should refer not to Cambridge University but to King's College Cambridge! 86.22.72.56 (talk) 18:13, 10 November 2009 (UTC)

Help, clarification

Can we be careful to define "sum of distribution functions"? Clearly this is not the process, say, SUM(U,V) = (U + V)/2, which indeed yields a distribution function with all the nice properties and looks like a "sum". The usage of convolution needs to be well explained and tied to the notion of the "distribution of the means". If we have a random process which, say, is constrained to output values in the range [0,1], then clearly the means will never reach a limit which is a normal distribution.

   Z_n = \frac{S_n - n \mu}{\sigma \sqrt{n}}\,, 

the above appears on the page missing the forward slash for division between sigma and sqrt(n) —Preceding unsigned comment added by 76.76.233.148 (talk) 23:31, 14 November 2009 (UTC)
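On the first point: the distribution of a sum X + Y of independent variables is the convolution of their distributions. This is different from averaging two distribution functions, (F_X + F_Y)/2, which gives a mixture. For variables confined to [0, 1], the plain mean indeed does not become normal; it concentrates at μ. The theorem concerns the rescaled quantity √n(mean − μ)/σ, whose range grows with n. The Python sketch below uses a made-up three-point variable on {0, 0.5, 1}. It computes the exact distribution by repeated convolution and compares P(√n(mean − μ)/σ ≤ 1) with Φ(1) ≈ 0.8413.

```python
import math

def convolve(p, q):
    """pmf of X + Y for independent X, Y on the grid 0, 1, 2, ...: r[k] = sum_j p[j] * q[k - j]."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

# Hypothetical bounded variable: values 0, 0.5, 1 (grid index k means value k * STEP).
BASE = [0.5, 0.3, 0.2]
STEP = 0.5
MU = sum(k * STEP * p for k, p in enumerate(BASE))                          # 0.35 up to rounding
SIGMA = math.sqrt(sum((k * STEP - MU) ** 2 * p for k, p in enumerate(BASE)))

def cdf_standardized_mean(n, z):
    """Exact P(sqrt(n) * (mean of n copies - MU) / SIGMA <= z)."""
    pmf = [1.0]
    for _ in range(n):
        pmf = convolve(pmf, BASE)      # pmf of the sum, on the grid
    return sum(p for k, p in enumerate(pmf)
               if math.sqrt(n) * (k * STEP / n - MU) / SIGMA <= z)

phi_1 = 0.5 * (1 + math.erf(1 / math.sqrt(2)))   # standard normal CDF at 1
for n in (1, 10, 100, 400):
    print(n, round(cdf_standardized_mean(n, 1.0), 4), round(phi_1, 4))
```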

Correction to correction above: Never mind... I neglected to note that the numerator contains just the sum and not the mean, so the formula is correct (but arcane) as it stands... —Preceding unsigned comment added by 76.76.233.148 (talk) 23:52, 14 November 2009 (UTC)
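Since the formula on the page is correct, it may help to note that it is the same quantity as the standardized sample mean. Dividing numerator and denominator by n gives (S_n − nμ)/(σ√n) = (S_n/n − μ)/(σ/√n). Here is a quick numerical check in Python, with Uniform(0, 1) data (μ = 1/2, σ = 1/√12) chosen only for illustration:

```python
import math
import random

def z_from_sum(xs, mu, sigma):
    """Z_n = (S_n - n * mu) / (sigma * sqrt(n)), with S_n the sum of the observations."""
    n = len(xs)
    return (sum(xs) - n * mu) / (sigma * math.sqrt(n))

def z_from_mean(xs, mu, sigma):
    """The same statistic written with the sample mean: (mean - mu) / (sigma / sqrt(n))."""
    n = len(xs)
    return (sum(xs) / n - mu) / (sigma / math.sqrt(n))

rng = random.Random(1)
xs = [rng.random() for _ in range(50)]     # Uniform(0, 1) data
mu, sigma = 0.5, 1 / math.sqrt(12)
print(z_from_sum(xs, mu, sigma), z_from_mean(xs, mu, sigma))
```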