- Boxes: definitions
- Ellipses: theorems and lemmas
- Blue border: the statement of this result is ready to be formalized; all prerequisites are done
- Orange border: the statement of this result is not ready to be formalized; the blueprint needs more work
- Blue background: the proof of this result is ready to be formalized; all prerequisites are done
- Green border: the statement of this result is formalized
- Green background: the proof of this result is formalized
- Dark green background: the proof of this result and all its ancestors are formalized
Let \(A\) be an \(n\)-element set, and let \(k\) be an integer between 0 and \(n\). Then a \(k\)-permutation of \(A\) is an ordered listing of a subset of \(A\) of size \(k\).
The sum of the elements in the \(n\)th row of Pascal’s triangle is \(2^n\). If the elements in the \(n\)th row of Pascal’s triangle are added with alternating signs, the sum is 0.
The first statement in the corollary follows from the fact that
\[2^n = (1 + 1)^n = \sum _{j=0}^{n} {n \choose j}\ ,\]
and the second from the fact that
\[0 = (1 - 1)^n = \sum _{j=0}^{n} (-1)^j {n \choose j}\ .\]
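Both facts in the corollary are easy to check numerically; a minimal sketch (the row index \(n = 10\) is an arbitrary choice):

```python
from math import comb

n = 10
row = [comb(n, j) for j in range(n + 1)]

# Sum of the nth row of Pascal's triangle is 2^n.
assert sum(row) == 2 ** n

# Alternating-sign sum of the nth row is 0 (for n >= 1).
assert sum((-1) ** j * comb(n, j) for j in range(n + 1)) == 0
```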
Suppose we have an experiment whose outcome depends on chance. We represent the outcome of the experiment by a capital Roman letter, such as \(X\), called a random variable. The sample space of the experiment is the set of all possible outcomes. If the sample space is either finite or countably infinite, the random variable is said to be discrete.
Let \(X\) be a random variable which denotes the value of the outcome of a certain experiment, and assume that this experiment has only finitely many possible outcomes. Let \(\Omega \) be the sample space of the experiment (i.e., the set of all possible values of \(X\), or equivalently, the set of all possible outcomes of the experiment). A distribution function for \(X\) is a real-valued function \(m\) whose domain is \(\Omega \) and which satisfies:
\(m(\omega ) \geq 0\) for all \(\omega \in \Omega \), and
\(\sum \limits _{\omega \in \Omega } m(\omega ) = 1\).
For any subset \(E\) of \(\Omega \), we define the probability of \(E\) to be the number \(P(E)\) given by
\[P(E) = \sum _{\omega \in E} m(\omega )\ .\]
The uniform distribution on a sample space \(\Omega \) containing \(n\) elements is the function \(m\) defined by
\[m(\omega ) = \frac{1}{n}\]
for every \(\omega \in \Omega \).
If \(P(E) = p\), the odds in favor of the event \(E\) occurring are \(r : s\) (\(r\) to \(s\)) where \(r/s = p/(1-p)\). If \(r\) and \(s\) are given, then \(p\) can be found by using the equation \(p = r/(r+s)\).
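The two conversions between probability and odds can be sketched directly; `odds_in_favor` and `probability_from_odds` are hypothetical helper names, and exact rational arithmetic is used so the odds reduce to lowest terms:

```python
from fractions import Fraction

def odds_in_favor(p):
    """Return the odds r : s in favor of an event with probability p."""
    p = Fraction(p)
    ratio = p / (1 - p)          # r/s = p/(1-p)
    return ratio.numerator, ratio.denominator

def probability_from_odds(r, s):
    """Recover p from the odds r : s via p = r/(r+s)."""
    return Fraction(r, r + s)

r, s = odds_in_favor(Fraction(2, 5))   # p = 2/5 gives odds 2 : 3
assert (r, s) == (2, 3)
assert probability_from_odds(r, s) == Fraction(2, 5)
```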
Let \(A\) be any finite set. A permutation of \(A\) is a one-to-one mapping of \(A\) onto itself.
Let \(a_n\) and \(b_n\) be two sequences of numbers. We say that \(a_n\) is asymptotically equal to \(b_n\), and write \(a_n \sim b_n\), if
\[\lim _{n \rightarrow \infty } \frac{a_n}{b_n} = 1\ .\]
Let \(\sigma \) be a permutation of the set \(\{ 1, 2, \ldots , n\} \). Then \(i\) is a record of \(\sigma \) if either \(i = 1\) or \(\sigma (j) < \sigma (i)\) for every \(j = 1, \ldots , i - 1\).
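A direct transcription of this definition (the helper name `records` and the sample permutation are illustrative):

```python
def records(sigma):
    """Indices i (1-based) that are records of the permutation sigma.

    sigma is given as a list with sigma[i-1] = sigma(i); i is a record
    if i == 1 or sigma(j) < sigma(i) for every j < i.
    """
    recs = []
    for i in range(1, len(sigma) + 1):
        if i == 1 or all(sigma[j - 1] < sigma[i - 1] for j in range(1, i)):
            recs.append(i)
    return recs

# In (3, 1, 4, 2, 5) the values 3, 4 and 5 each exceed everything before them.
assert records([3, 1, 4, 2, 5]) == [1, 3, 5]
```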
A Bernoulli trials process is a sequence of \(n\) chance experiments such that
Each experiment has two possible outcomes, which we may call success and failure.
The probability \(p\) of success on each experiment is the same for each experiment, and this probability is not affected by any knowledge of previous outcomes. The probability \(q\) of failure is given by \(q = 1 - p\).
Let \(n\) be a positive integer, and let \(p\) be a real number between 0 and 1. Let \(B\) be the random variable which counts the number of successes in a Bernoulli trials process with parameters \(n\) and \(p\). Then the distribution \(b(n, p, k)\) of \(B\) is called the binomial distribution.
A set of events \(\{ A_1,\ A_2,\ \ldots ,\ A_n\} \) is said to be mutually independent if for any subset \(\{ A_i,\ A_j,\ldots ,\ A_m\} \) of these events we have
\[P(A_i \cap A_j \cap \cdots \cap A_m) = P(A_i)\, P(A_j) \cdots P(A_m)\ ,\]
or equivalently, if for any sequence \(\bar A_1\), \(\bar A_2\), …, \(\bar A_n\) with \(\bar A_j = A_j\) or \(\tilde A_j\),
\[P(\bar A_1 \cap \bar A_2 \cap \cdots \cap \bar A_n) = P(\bar A_1)\, P(\bar A_2) \cdots P(\bar A_n)\ .\]
(For a proof of the equivalence in the case \(n = 3\), see Exercise 33.)
Let \(X_1, X_2, \ldots , X_n\) be random variables associated with an experiment. Suppose that the sample space (i.e., the set of possible outcomes) of \(X_i\) is the set \(R_i\). Then the joint random variable \({\bar X} = (X_1, X_2, \ldots , X_n)\) is defined to be the random variable whose outcomes consist of ordered \(n\)-tuples of outcomes, with the \(i\)th coordinate lying in the set \(R_i\). The sample space \(\Omega \) of \({\bar X}\) is the Cartesian product of the \(R_i\)’s:
\[\Omega = R_1 \times R_2 \times \cdots \times R_n\ .\]
The joint distribution function of \({\bar X}\) is the function which gives the probability of each of the outcomes of \({\bar X}\).
The random variables \(X_1\), \(X_2\), …, \(X_n\) are mutually independent if
\[P(X_1 = r_1, X_2 = r_2, \ldots , X_n = r_n) = P(X_1 = r_1)\, P(X_2 = r_2) \cdots P(X_n = r_n)\]
for any choice of \(r_1, r_2, \ldots , r_n\). Thus, if \(X_1, X_2, \ldots , X_n\) are mutually independent, then the joint distribution function of the random variable
\[{\bar X} = (X_1, X_2, \ldots , X_n)\]
is just the product of the individual distribution functions. When two random variables are mutually independent, we shall say more briefly that they are independent.
Let \(X_1, X_2, \ldots , X_n\) be continuous random variables associated with an experiment, and let \({\bar X} = (X_1, X_2, \ldots , X_n)\). Then the joint cumulative distribution function of \({\bar X}\) is defined by
\[F(x_1, x_2, \ldots , x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots , X_n \le x_n)\ .\]
The joint density function of \({\bar X}\) satisfies the following equation:
\[f(x_1, x_2, \ldots , x_n) = \frac{\partial ^n F(x_1, x_2, \ldots , x_n)}{\partial x_1\, \partial x_2 \cdots \partial x_n}\ .\]
Let \(X_1\), \(X_2\), …, \(X_n\) be continuous random variables with cumulative distribution functions \(F_1(x), F_2(x), \ldots , F_n(x)\). Then these random variables are mutually independent if
\[P(X_1 \le x_1, X_2 \le x_2, \ldots , X_n \le x_n) = P(X_1 \le x_1)\, P(X_2 \le x_2) \cdots P(X_n \le x_n)\]
for any choice of \(x_1, x_2, \ldots , x_n\). Thus, if \(X_1, X_2, \ldots , X_n\) are mutually independent, then the joint cumulative distribution function of the random variable \({\bar X} = (X_1, X_2, \ldots , X_n)\) is just the product of the individual cumulative distribution functions. When two random variables are mutually independent, we shall say more briefly that they are independent.
A sequence \(X_1\), \(X_2\), …, \(X_n\) of random variables \(X_i\) that are mutually independent and have the same density is called an independent trials process.
A sequence of random variables \(X_1\), \(X_2\), …, \(X_n\) that are mutually independent and that have the same distribution is called a sequence of independent trials or an independent trials process.
Independent trials processes arise naturally in the following way. We have a single experiment with sample space \(R = \{ r_1, r_2, \dots , r_s\} \) and a distribution function
\[m(r_k) = p_k\ , \qquad 1 \le k \le s\ .\]
We repeat this experiment \(n\) times. To describe this total experiment, we choose as sample space the space
\[\Omega = R \times R \times \cdots \times R\ ,\]
consisting of all possible sequences \(\omega = (\omega _1,\omega _2,\dots ,\omega _n)\) where the value of each \(\omega _j\) is chosen from \(R\). We assign a distribution function to be the product distribution
\[m(\omega ) = m(\omega _1) \cdot m(\omega _2) \cdots m(\omega _n)\ ,\]
with \(m(\omega _j) = p_k\) when \(\omega _j = r_k\). Then we let \(X_j\) denote the \(j\)th coordinate of the outcome \((\omega _1, \omega _2, \ldots , \omega _n)\). The random variables \(X_1\), …, \(X_n\) form an independent trials process.
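The construction can be sketched for a small concrete experiment; the outcome labels and probabilities below are arbitrary illustrative numbers:

```python
from itertools import product
from math import isclose, prod

# A single experiment with sample space R and distribution function m
# (hypothetical values for illustration).
m = {"a": 0.5, "b": 0.3, "c": 0.2}
n = 3

# Product distribution on Omega = R^n: m(omega) = m(omega_1)...m(omega_n).
joint = {omega: prod(m[w] for w in omega)
         for omega in product(m, repeat=n)}

# The product distribution is itself a distribution function on Omega ...
assert isclose(sum(joint.values()), 1.0)

# ... and each coordinate X_j has the original distribution m.
marginal = sum(pr for omega, pr in joint.items() if omega[1] == "b")
assert isclose(marginal, m["b"])
```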
The probabilities assigned to events by a distribution function on a sample space \(\Omega \) satisfy the following properties:
\(P(E) \geq 0\) for every \(E \subset \Omega \ \).
\(P(\Omega ) = 1\ \).
If \(E \subset F \subset \Omega \), then \(P(E) \leq P(F)\ \).
If \(A\) and \(B\) are disjoint subsets of \(\Omega \), then \(P(A \cup B) = P(A) + P(B)\ \).
\(P(\tilde A) = 1 - P(A)\) for every \(A \subset \Omega \ \).
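All five properties can be verified by brute force on a small sample space; the distribution below is an arbitrary illustrative choice:

```python
from fractions import Fraction
from itertools import chain, combinations

# A small sample space with a distribution function (illustrative numbers).
m = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
Omega = set(m)

def P(E):
    return sum(m[w] for w in E)

# Every subset of Omega.
subsets = [set(s) for s in chain.from_iterable(
    combinations(Omega, k) for k in range(len(Omega) + 1))]

assert all(P(E) >= 0 for E in subsets)                  # P(E) >= 0
assert P(Omega) == 1                                    # P(Omega) = 1
assert all(P(E) <= P(F) for E in subsets for F in subsets if E <= F)
assert all(P(A | B) == P(A) + P(B)                      # additivity on
           for A in subsets for B in subsets if not A & B)  # disjoint sets
assert all(P(Omega - A) == 1 - P(A) for A in subsets)   # complement rule
```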
The total number of permutations of a set \(A\) of \(n\) elements is given by \(n \cdot (n~ -~ 1) \cdot (n - 2) \cdot \ldots \cdot 1\).
Let \(P\) be a probability distribution on a sample space \(\Omega \), and let \(\{ A_1,\ A_2,\ \dots ,\ A_n\} \) be a finite set of events. Then
\[P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum _{i=1}^{n} P(A_i) - \sum _{i < j} P(A_i \cap A_j) + \sum _{i < j < k} P(A_i \cap A_j \cap A_k) - \cdots \ . \tag{3.3}\]
That is, to find the probability that at least one of \(n\) events \(A_i\) occurs, first add the probability of each event, then subtract the probabilities of all possible two-way intersections, add the probability of all three-way intersections, and so forth.
If the outcome \(\omega \) occurs in at least one of the events \(A_i\), its probability is added exactly once by the left side of Equation 3.3. We must show that it is added exactly once by the right side of Equation 3.3. Assume that \(\omega \) is in exactly \(k\) of the sets. Then its probability is added \(k\) times in the first term, subtracted \({k \choose 2}\) times in the second term, added \({k \choose 3}\) times in the third term, and so forth. Thus, the total number of times that it is added is
\[{k \choose 1} - {k \choose 2} + {k \choose 3} - \cdots + (-1)^{k+1} {k \choose k}\ .\]
But
\[0 = (1 - 1)^k = \sum _{j=0}^{k} (-1)^j {k \choose j} = 1 - \sum _{j=1}^{k} (-1)^{j+1} {k \choose j}\ .\]
Hence,
\[1 = \sum _{j=1}^{k} (-1)^{j+1} {k \choose j} = {k \choose 1} - {k \choose 2} + {k \choose 3} - \cdots + (-1)^{k+1} {k \choose k}\ .\]
If the outcome \(\omega \) is not in any of the events \(A_i\), then it is not counted on either side of the equation.
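The inclusion-exclusion formula can also be checked mechanically for a concrete choice of events; the sample space and the events below are hypothetical:

```python
from fractions import Fraction
from itertools import combinations

# Uniform distribution on a small sample space (illustrative choice).
Omega = set(range(10))
def P(E):
    return Fraction(len(E), len(Omega))

A = [{0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 0}]   # hypothetical events
n = len(A)

# Right side of Equation 3.3: alternating sums over all k-way intersections.
rhs = Fraction(0)
for k in range(1, n + 1):
    for group in combinations(A, k):
        rhs += (-1) ** (k + 1) * P(set.intersection(*group))

assert P(set().union(*A)) == rhs
```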
The total number of \(k\)-permutations of a set \(A\) of \(n\) elements is given by \(n \cdot (n-1) \cdot (n-2) \cdot \ldots \cdot (n - k + 1)\).
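A quick numerical check of the falling-factorial count (the values \(n = 6\), \(k = 3\) are arbitrary):

```python
from itertools import permutations
from math import perm

n, k = 6, 3
falling = 1
for i in range(k):
    falling *= n - i          # n(n-1)(n-2)...(n-k+1)

# Agrees with the library count and with direct enumeration.
assert falling == perm(n, k) == len(list(permutations(range(n), k)))
assert falling == 6 * 5 * 4
```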
Let \(a\) and \(b\) be two positive integers. Let \(S_{a,b}\) be the set of all ordered pairs in which the first entry is an \(a\)-shuffle and the second entry is a \(b\)-shuffle. Let \(S_{ab}\) be the set of all \(ab\)-shuffles. Then there is a 1-1 correspondence between \(S_{a,b}\) and \(S_{ab}\) with the following property. Suppose that \((T_1, T_2)\) corresponds to \(T_3\). If \(T_1\) is applied to the identity ordering, and \(T_2\) is applied to the resulting ordering, then the final ordering is the same as the ordering that is obtained by applying \(T_3\) to the identity ordering.
The easiest way to describe the required correspondence is through the idea of an unshuffle. An \(a\)-unshuffle begins with a deck of \(n\) cards. One by one, cards are taken from the top of the deck and placed, with equal probability, on the bottom of any one of \(a\) stacks, where the stacks are labelled from 0 to \(a-1\). After all of the cards have been distributed, we combine the stacks to form one stack by placing stack \(i\) on top of stack \(i+1\), for \(0 \le i \le a-2\). It is easy to see that if one starts with a deck, there is exactly one way to cut the deck to obtain the \(a\) stacks generated by the \(a\)-unshuffle, and with these \(a\) stacks, there is exactly one way to interleave them to obtain the deck in the order that it was in before the unshuffle was performed. Thus, this \(a\)-unshuffle corresponds to a unique \(a\)-shuffle, and this \(a\)-shuffle is the inverse of the original \(a\)-unshuffle.
If we apply an \(ab\)-unshuffle \(U_3\) to a deck, we obtain a set of \(ab\) stacks, which are then combined, in order, to form one stack. We label these stacks with ordered pairs of integers, where the first coordinate is between 0 and \(a-1\), and the second coordinate is between 0 and \(b-1\). Then we label each card with the label of its stack. The number of possible labels is \(ab\), as required. Using this labelling, we can describe how to find a \(b\)-unshuffle and an \(a\)-unshuffle, such that if these two unshuffles are applied in this order to the deck, we obtain the same set of \(ab\) stacks as were obtained by the \(ab\)-unshuffle.
To obtain the \(b\)-unshuffle \(U_2\), we sort the deck into \(b\) stacks, with the \(i\)th stack containing all of the cards with second coordinate \(i\), for \(0 \le i \le b-1\). Then these stacks are combined to form one stack. The \(a\)-unshuffle \(U_1\) proceeds in the same manner, except that the first coordinates of the labels are used. The resulting \(a\) stacks are then combined to form one stack.
The above description shows that the cards ending up on top are all those labelled \((0, 0)\). These are followed by those labelled \((0, 1),\ (0, 2),\) \(\ \ldots ,\ (0, b - 1),\ (1, 0),\ (1,1),\ldots ,\ (a-1, b-1)\). Furthermore, the relative order of any pair of cards with the same labels is never altered. But this is exactly the same as an \(ab\)-unshuffle, if, at the beginning of such an unshuffle, we label each of the cards with one of the labels \((0, 0),\ (0, 1),\ \ldots ,\ (0, b-1),\ (1, 0),\ (1,1),\ \ldots ,\ (a-1, b-1)\). This completes the proof.
If \(D\) is any ordering that is the result of applying an \(a\)-shuffle and then a \(b\)-shuffle to the identity ordering, then the probability assigned to \(D\) by this pair of operations is the same as the probability assigned to \(D\) by the process of applying an \(ab\)-shuffle to the identity ordering.
Call the sample space of \(a\)-shuffles \(S_a\). If we label the stacks by the integers from \(0\) to \(a-1\), then each cut-interleaving pair, i.e., shuffle, corresponds to exactly one \(n\)-digit base \(a\) integer, where the \(i\)th digit in the integer is the stack of which the \(i\)th card is a member. Thus, the number of cut-interleaving pairs is equal to the number of \(n\)-digit base \(a\) integers, which is \(a^n\). Of course, not all of these pairs lead to different orderings. The number of pairs leading to a given ordering will be discussed later. For our purposes it is enough to point out that it is the cut-interleaving pairs that determine the probability assignment.
The previous theorem shows that there is a 1-1 correspondence between \(S_{a,b}\) and \(S_{ab}\). Furthermore, corresponding elements give the same ordering when applied to the identity ordering. Given any ordering \(D\), let \(m_1\) be the number of elements of \(S_{a,b}\) which, when applied to the identity ordering, result in \(D\). Let \(m_2\) be the number of elements of \(S_{ab}\) which, when applied to the identity ordering, result in \(D\). The previous theorem implies that \(m_1 = m_2\). Thus, both sets assign the probability
\[\frac{m_1}{(ab)^n}\]
to \(D\). This completes the proof.
If an ordering of length \(n\) has \(r\) rising sequences, then the number of cut-interleaving pairs under an \(a\)-shuffle of the identity ordering which lead to the ordering is
\[{n + a - r \choose n}\ .\]
To see why this is true, we need to count the number of ways in which the cut in an \(a\)-shuffle can be performed which will lead to a given ordering with \(r\) rising sequences. We can disregard the interleavings, since once a cut has been made, at most one interleaving will lead to a given ordering. Since the given ordering has \(r\) rising sequences, \(r-1\) of the division points in the cut are determined. The remaining \(a - 1 - (r - 1) = a - r\) division points can be placed anywhere. The number of places to put these remaining division points is \(n+1\) (which is the number of spaces between the consecutive pairs of cards, including the positions at the beginning and the end of the deck). These places are chosen with repetition allowed, so the number of ways to make these choices is
\[{n + a - r \choose n}\ .\]
In particular, this means that if \(D\) is an ordering that is the result of applying an \(a\)-shuffle to the identity ordering, and if \(D\) has \(r\) rising sequences, then the probability assigned to \(D\) by this process is
\[\frac{{n + a - r \choose n}}{a^n}\ .\]
This completes the proof.
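This count can be verified by brute force for small decks, using the digit encoding of cut-interleaving pairs described in the proof above (the ith digit is the stack of the ith card of the final ordering); a sketch, with arbitrary small values of the deck size and \(a\):

```python
from collections import Counter
from itertools import product
from math import comb

def shuffle_result(digits):
    """Ordering produced by the cut-interleaving pair encoded by `digits`.

    digits[i] is the stack (packet) holding the card that ends up in
    position i; within a packet the original relative order is kept, so
    stably sorting positions by digit recovers the card values.
    """
    by_packet = sorted(range(len(digits)), key=lambda p: (digits[p], p))
    D = [0] * len(digits)
    for value, pos in enumerate(by_packet, start=1):
        D[pos] = value
    return tuple(D)

def rising_sequences(D):
    pos = {card: i for i, card in enumerate(D)}
    return 1 + sum(1 for v in range(1, len(D)) if pos[v + 1] < pos[v])

n, a = 4, 3
counts = Counter(shuffle_result(d) for d in product(range(a), repeat=n))
assert sum(counts.values()) == a ** n          # a^n pairs in total
assert all(c == comb(n + a - rising_sequences(D), n)
           for D, c in counts.items())
```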
Let \(a\) and \(n\) be positive integers. Then
\[a^n = \sum _{r=1}^{n} {n + a - r \choose n} A(n, r)\ . \tag{3.5}\]
Thus,
\[A(n, a) = a^n - \sum _{r=1}^{a-1} {n + a - r \choose n} A(n, r)\ .\]
In addition,
\[A(n, 1) = 1\ .\]
The second equation can be used to calculate the values of the Eulerian numbers, and follows immediately from Equation 3.5. The last equation is a consequence of the fact that the only ordering of \(\{ 1, 2, \ldots , n\} \) with one rising sequence is the identity ordering. Thus, it remains to prove Equation 3.5. We will count the set of \(a\)-shuffles of a deck with \(n\) cards in two ways. First, we know that there are \(a^n\) such shuffles (this was noted in the proof of Theorem 3.10). But there are \(A(n, r)\) orderings of \(\{ 1, 2, \ldots , n\} \) with \(r\) rising sequences, and Theorem 3.11 states that for each such ordering, there are exactly
\[{n + a - r \choose n}\]
cut-interleaving pairs that lead to the ordering. Therefore, the right-hand side of Equation 3.5 counts the set of \(a\)-shuffles of an \(n\)-card deck. This completes the proof.
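Equation 3.5 can be checked by computing \(A(n, r)\) through exhaustive enumeration for a small \(n\); a sketch (the choices \(n = 5\) and the range of \(a\) are arbitrary):

```python
from collections import Counter
from itertools import permutations
from math import comb

def rising_sequences(D):
    """Number of rising sequences of an ordering D of {1, ..., n}.

    The value v+1 starts a new rising sequence exactly when it appears
    before v in the ordering.
    """
    pos = {card: i for i, card in enumerate(D)}
    return 1 + sum(1 for v in range(1, len(D)) if pos[v + 1] < pos[v])

n = 5
# A(n, r): number of orderings of {1, ..., n} with exactly r rising sequences.
A = Counter(rising_sequences(D) for D in permutations(range(1, n + 1)))

assert A[1] == 1                      # only the identity has r = 1
for a in range(1, 6):                 # Equation 3.5 for several values of a
    assert a ** n == sum(A[r] * comb(n + a - r, n) for r in A)
```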
For integers \(n\) and \(j\), with \(0 < j < n\), the binomial coefficients satisfy:
\[{n \choose j} = {n-1 \choose j} + {n-1 \choose j-1}\ . \tag{3.1}\]
We wish to choose a subset of \(j\) elements from a set \(U\) of \(n\) elements. Choose an element \(u\) of \(U\). Assume first that we do not want \(u\) in the subset. Then we must choose the \(j\) elements from a set of \(n - 1\) elements; this can be done in \({{n-1} \choose j}\) ways. On the other hand, assume that we do want \(u\) in the subset. Then we must choose the other \(j - 1\) elements from the remaining \(n - 1\) elements of \(U\); this can be done in \({{n-1} \choose {j - 1}}\) ways. Since \(u\) is either in our subset or not, the number of ways that we can choose a subset of \(j\) elements is the sum of the number of subsets of \(j\) elements which have \(u\) as a member and the number which do not—this is what Equation 3.1 states.
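The case split in this argument can be mirrored by brute-force enumeration (the values \(n = 7\), \(j = 3\) and the choice of the distinguished element are arbitrary):

```python
from itertools import combinations
from math import comb

n, j = 7, 3
U = set(range(n))
u = 0                                  # the distinguished element

with_u = [S for S in combinations(U, j) if u in S]
without_u = [S for S in combinations(U, j) if u not in S]

assert len(without_u) == comb(n - 1, j)        # choose all j from U - {u}
assert len(with_u) == comb(n - 1, j - 1)       # u plus j-1 from U - {u}
assert comb(n, j) == comb(n - 1, j) + comb(n - 1, j - 1)
```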
The binomial coefficients are given by the formula
\[{n \choose j} = \frac{(n)_j}{j!}\ .\]
Each subset of size \(j\) of a set of size \(n\) can be ordered in \(j!\) ways. Each of these orderings is a \(j\)-permutation of the set of size \(n\). The number of \(j\)-permutations is \((n)_j\), so the number of subsets of size \(j\) is
\[\frac{(n)_j}{j!}\ .\]
This completes the proof.
Given \(n\) Bernoulli trials with probability \(p\) of success on each experiment, the probability of exactly \(j\) successes is
\[b(n, p, j) = {n \choose j} p^j q^{n-j}\ ,\]
where \(q = 1 - p\).
We construct a tree measure as described above. We want to find the sum of the probabilities for all paths which have exactly \(j\) successes and \(n - j\) failures. Each such path is assigned a probability \(p^j q^{n - j}\). How many such paths are there? To specify a path, we have to pick, from the \(n\) possible trials, a subset of \(j\) to be successes, with the remaining \(n-j\) outcomes being failures. We can do this in \(n \choose j\) ways. Thus the sum of the probabilities is
\[{n \choose j} p^j q^{n - j}\ .\]
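The resulting binomial probabilities can be sketched directly; the function name `b` simply mirrors the notation \(b(n, p, j)\) of the theorem, and the parameter values are arbitrary:

```python
from math import comb, isclose

def b(n, p, j):
    """Probability of exactly j successes in n Bernoulli trials."""
    q = 1 - p
    return comb(n, j) * p ** j * q ** (n - j)

n, p = 10, 0.3
dist = [b(n, p, j) for j in range(n + 1)]
assert isclose(sum(dist), 1.0)         # the b(n, p, j) form a distribution
assert isclose(b(3, 0.5, 2), 0.375)    # 3 paths, each of probability 1/8
```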
(Binomial Theorem) The quantity \((a + b)^n\) can be expressed in the form
\[(a + b)^n = \sum _{j=0}^{n} {n \choose j} a^j b^{n-j}\ .\]
To see that this expansion is correct, write
\[(a + b)^n = (a + b)(a + b) \cdots (a + b)\ .\]
When we multiply this out we will have a sum of terms each of which results from a choice of an \(a\) or \(b\) for each of \(n\) factors. When we choose \(j\) \(a\)’s and \((n - j)\) \(b\)’s, we obtain a term of the form \(a^j b^{n - j}\). To determine such a term, we have to specify \(j\) of the \(n\) terms in the product from which we choose the \(a\). This can be done in \(n \choose j\) ways. Thus, collecting these terms in the sum contributes a term \({n \choose j} a^j b^{n - j}\).
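A numerical check of the expansion (the base values are arbitrary; exact rational arithmetic avoids floating-point error):

```python
from fractions import Fraction
from math import comb

def binomial_expansion(a, b, n):
    """Right-hand side of the Binomial Theorem."""
    return sum(comb(n, j) * a ** j * b ** (n - j) for j in range(n + 1))

a, b = Fraction(2), Fraction(-3)
for n in range(8):
    assert binomial_expansion(a, b, n) == (a + b) ** n
```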
Two events \(E\) and \(F\) are independent if and only if
\[P(E \cap F) = P(E)\, P(F)\ .\]
Let \(X_1\), \(X_2\), …, \(X_n\) be continuous random variables with density functions \(f_1(x), f_2(x), \ldots , f_n(x)\). Then these random variables are mutually independent if and only if
\[f(x_1, x_2, \ldots , x_n) = f_1(x_1)\, f_2(x_2) \cdots f_n(x_n)\]
for any choice of \(x_1, x_2, \ldots , x_n\), where \(f\) is the joint density function of \({\bar X} = (X_1, X_2, \ldots , X_n)\).
Let \(X_1, X_2, \ldots , X_n\) be mutually independent continuous random variables and let \(\phi _1(x), \phi _2(x), \ldots , \phi _n(x)\) be continuous functions. Then \(\phi _1(X_1), \phi _2(X_2), \ldots , \phi _n(X_n)\) are mutually independent.