# 誰將隨機變量引入概率？

Who and when introduced the concept of random variable, was it a "basic notion" before measure theory?

Seneta's paper也很有趣，但是它以伯努利（Bernoulli）的1713 Ars Conjectandi開始，以現代的帶有隨機變量的符號重寫了所有內容，因此很難分辨它們或符號的起源。

Who introduced the set builder notation involving random variables, like Pr$(|\xi|>\varepsilon)$, and when did it become common to express results in it?

This question has no definitive answer, because people were operating with random variables long before any rigorous definition was given.

Probability theory begins in 16-th century, if not earlier. Cardano wrote a book on it, for example. In 1773 de Moivre wrote an important book where he essentially introduced the principal method of modern probability (harmonic analysis). This was much developed by Laplace. Important development happened in 19-th century (Chebyshev, for example). All these people were talking about random variables without giving a precise definition. I mean a "precise definition" from the point of view of modern mathematician who only understands definitions given in terms of set theory. Of course there was no set theory yet.

By the beginning of 20-th century probability was a highly developed theory, with substantial applications to physics (Brownian motion, for example). All essential applications of probability already existed by the beginning of 20-th century, including statistical mechanics, financial mathematics and, of course statistics.

Attempts to extablish a rigorous justification in the spirit of newly developed set theory started in the second decade of 20-th century. One early system of axioms was due to S. Bernstein, another to J. Schauder. Still another system of foundations was due to von Mises.

Kolmogorov's system was published in 1933. Gradually (not immediately!) it took over.

To answer your other question, no this cannot be compared to Euclid. Euclid gave a systematic and comprehensive exposition of all mathematics that existed at that time. Kolmogorov's book is a tiny booklet, where he only proposes a new system of axioms, proves a few very general theorems. It is a little research monograph, not an exposition of the known things.

Remark. Modern education system makes many young mathematicians thing that there was no probability theory before Kolmogorov and no algebraic geometry before Grothendieck. Developing this principle one step further, one can say that there was no mathematics before Cantor:-) Because all modern mathematics is based on set theory.

Concerning the notation $$\text{Pr}(|\xi|>\varepsilon)$$ here's what I've found so far:

Cajori's 1929 A History of Mathematical Notations says nothing on probability theory, which suggest that the subject had not yet developed any special or widely adopted notation around the beginning of the 20th century. This seems to be supported by Jeff Miller, who writes:

Symbols for the probability of an event $$A$$ on the pattern of $$P(A)$$ or $$Pr(A)$$ are a relatively recent development given that probability has been studied for centuries. A. N. Kolmogorov's Grundbegriffe der Wahrscheinlichkeitsrechnung (1933) used the symbol $$\mathbf{P}(A)$$. The use of upper-case letters for events was taken from set theory where they referred to sets. H. Cramér's Random Variables and Probability Distributions (1937), "the first modern book on probability in English," used $$P(A)$$. In the same year J. V. Uspensky (Introduction to Mathematical Probability) wrote simply $$(A)$$, following A. A. Markov Wahrscheinlichkeitsrechnung (1912, p. 179). W. Feller's influential An Introduction to Probability Theory and its Applications volume 1 (1950) uses $$Pr\{A\}$$ and $$\mathbf{P}\{A\}$$in later editions.

But digging a little deeper one finds a few earlier sources.

Before giving some, let me note that I find Miller's claim that upper-case letters for events were taken from set theory quite dubious, given that already Bayes in An Essay towards solving a Problem in the Doctrine of Chances (1763) used upper case $$M$$ for an event.

Next, let me quote Markov from his Wahrscheinlichkeitsrechnung (1912) p. 14-15 (my translation):

We don't find it superfluous to express the theorem of multiplications of probabilities via the formula $$(AB)=(A)(B,A)=(B)(A,B),$$ where $$(AB)$$ denotes the probability of the simultaneous occurrence of both events $$A$$ and $$B$$, $$(A)$$ and $$(B)$$ denote respectively the probabilities of events $$A$$ and $$B$$, $$(B,A)$$ denotes the probability of event $$B$$, when $$A$$ is known for a fact, and $$(A,B)$$ denotes the probability of event $$A$$, when $$B$$ is known to have occurred.

I don't have access to his first Russian edition (Исчисление вероятностей 1900), but Poincaré in his Calcul des probabilités (1896) p. 37 used almost the same notation as Markov.

La probabilité pour que $$A$$ et $$B$$ se produisent tous deux est égale à la probabilité pour que $$B$$ se produise, multipliée par la probabilité pour que $$A$$ se produise, quand on sait que $$B$$ s'est produit.

Ou, inversement, elle est égale à la probabilité pour que $$A$$ se produise, multipliée par la probabilité pour que $$B$$ se produise, quand on suppose que $$A$$ doit se produire. $$(A \text{ et } B) =(B) (A \text{ si } B) = (A) (B \text{ si } A).$$

It's interesting that unlike Markov, Poincaré doesn't provided an explicit definition of the notation $$(A)$$. Instead he evolves it from what look like labels to equations some pages earlier, into mathematical objects that can appear in equations.

For a notation that's closer to $$\text{Pr}(A)$$, it seems like Hausdorff and his doctor father Bruns played a role. Bruns in his Wahrscheinlichkeitsrechnung und Kollektivmaßlehre (1906) writes $$\mathfrak{W}(E)$$ for the probability of an event $$E$$ ($$\mathfrak{W}$$ abbreviating Wahrscheinlichkeit). The book is based on lectures that Bruns gave every two years since the beginning of the 1880's at the University of Leipzig. Hausdorff attended these lectures in 1890 and stenographed them (but I don't have access to Hausdorffs notes).

In 1900/01 Hausdorff himself lectured on probability theory and introduced the notation $$p_F(E)$$ for conditional probability in his Beiträge zur Wahrscheinlichkeitsrechnung (1901):

Wenn bei dem Versuche, der über das Eintreffen oder Ausbleiben des Ereignisses $$E$$ entscheidet, die Zahl der überhaupt günstigen gleichberechtigten Fälle durch die Zahl der überhaupt möglichen gleichberechtigten Fälle dividiert wird, so entsteht die Wahrscheinlichkeit von $$E$$ schlechthin, die absolute Wahrscheinlichkeit $$p(E)$$. Werden hingegen unter den günstigen und möglichen Fällen nur diejenigen gezählt, die ein bestimmtes anderes Ereignis $$F$$ herbeiführen, so entsteht die relative Wahrscheinlichkeit von $$E$$ unter der Voraussetzung, dass $$F$$ verwirklicht sei, ein Begriff, für den sich die Schreibung $$p_F(E)$$ und etwa die Ausdrucksweise relative Wahrscheinlichkeit von $$E$$, posito $$F$$ empfehlen dürfte.

My translation:

If in the attempt of deciding the occurrence or absence of an event $$E$$, the number of favourable and equally likely cases is divided by the number of all possible and equally likely cases, the probability of $$E$$ proper emerges, the absolute probability $$p(E)$$. If on the other hand, among the favourable and possible cases only those are counted, which induce another determined event $$F$$, then the relative probability of $$E$$ under the condition that $$F$$ is realised emerges, a concept for which the notation $$p_F(E)$$ and the parlance relative probability of $$E$$, posito $$F$$ might be recommended.

There is a long commentary by W. Purkert appended to this article in Felix Hausdorff, Gesammelte Werke, Band V (2005). There Purkert explains that Hausdorff influenced several authors with it.

For instance Bruns, in a footnote of his 1906 book, writes that Bayes' Theorem is a simple corollary from purely arithmetical theorems, if one introduces the notion of ,conditional' probability as Hausdorff did. (But I don't know if Bruns' $$\mathfrak{W}(E)$$ was inspired by Hausdorff's $$p(E)$$ or if Bruns already used it before 1900.)

Czuber, who authored the article on probability theory in Klein's Encyclopedia of mathematics (1900) and the book Wahrscheinlichkeitsrechnung und ihre Anwendung auf Fehlerausgleichung, Statistik und Lehensversicherung (1903) adopted Hausdorffs notation in the second revised edition of his book (1908), writing $$\mathfrak{W}$$ like Bruns, instead of Hausdroff's $$p$$. Czuber acknowledges Hausdorff in a footnote on p. 45 for bringing clarity to the subject with this notation. Also Broggi adopted Hausdorffs notation in Versicherungsmathematik (1911) (Matematica attuariale).

In Hausdorff's later lectures, Wahrscheinlichkeitsrechnung (1923, 1931, Gesammelte Werke Band V p. 595) one also finds a definition of random variable with finitely many values and a (first?) use of an inequality to stand for an event.

There he uses the notation $$w(A)$$ instead of $$p(A)$$ and writes (p. 608 Gesammelete Werke Band V, my translation)

Let $$A_1,A_2,\ldots , A_m$$ be a complete disjunction, $$w(A_i) = p_i, \sum p_1=1$$. Imagine that to each case $$A_i$$ is associated a real number $$x_i$$ (for instance the number $$i$$). If the $$p_i$$ are $$> 0$$ and the $$x_i$$ are pairwise different, and $$x$$ denotes one of them, then we call $$x$$ a variable, which can assume the values $$x_1,\ldots,x_m$$; only that here, compared to the common use of language, the variable is made more precise, in that it can assume each value $$x_i$$ with a certain probability $$p_i$$. We call the embodiment [Innbegriff] of the $$p_i, x_i$$ a distribution of the variabel $$x$$.

Such distributions play the main role in applications of the calculus of probabilities.

Immediately after that he defines the expectation value and the $$k$$-th moment $$\mu_k$$ of a variable $$x$$ and on the next page writes

If $$t$$ is a positive number and the sum $$\overset{*}{\sum}$$ only runs over those $$i$$ with $$|x_i|\geq t$$, then for $$k=2,4,\ldots$$ $$\mu_k = \sum p_i x^k_i \geq \overset{*}{\sum} p_i x^k_i \geq t^k \cdot \overset{*}{\sum} p_i,$$ $$\overset{*}{\sum} p_i \leq \dfrac{\mu_k}{t^k}$$ or with the obvious notation $$w(|x| \geq t)\leq \dfrac{\mu_k}{t^k},\quad w(|x| < t) \geq 1 - \dfrac{\mu_k}{t^k}\; (\text{Tschebyscheff})$$

I've searched in Todhunter's A History of the mathematical theory of probability (1865) and Czuber's Encyclopedia article (1900) for even earlier uses of notations like $$p(A), P(A), Pr(A)$$, but couldn't find any. Of course, the use of upper/lower-case $$P$$ to stand for probability is much older than 1900. Laplace for instance uses it often. But from a modern perspective the $$p$$ of Laplace and earlier mathematicians should be understood as a number between 0 and 1, while the $$p$$ introduced by Hausdorff or the parenthesis used by Poincare and Markov can be seen as operators, that accept events as inputs and yield numbers between 0 and 1 as outputs.

What I haven't looked for is the first use of the notation $$P(A|B)$$ for conditional probabilities. A related question is https://mathoverflow.net/q/163582/745.