PROBABILITY THEORY AND RANDOM VARIABLES

The sample space is denoted by Ω and its elements ω are called samples or experimental outcomes. Certain subsets of Ω (collections of outcomes) are called events and are denoted by Λ.

Set-theoretic notation is used. If A and B are two sets, then A ∪ B (also written A + B) is their union, A ∩ B (also written AB) is their intersection, and A - B is the complement of B with respect to A; the empty set is denoted by 0. If A ∩ B = 0, that is, if the sets are disjoint, then the corresponding events are mutually exclusive.

For a class of events we assign probabilities to the events Λ via a probability function Pr(·). That is, to each event we assign a number Pr(Λ), called the probability of Λ. The probability function satisfies the following axioms:

(i) Pr(Λ) ≥ 0
(ii) Pr(Ω) = 1
(iii) if Λi ∩ Λj = 0, i ≠ j, i, j = 1, …, n, then Pr(Λ1 ∪ Λ2 ∪ … ∪ Λn) = Pr(Λ1) + Pr(Λ2) + … + Pr(Λn)
(iv) if Λi ∩ Λj = 0, i ≠ j, i, j = 1, 2, …, then Pr(Λ1 ∪ Λ2 ∪ …) = Pr(Λ1) + Pr(Λ2) + …
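
As a concrete illustration of the additivity axiom (iii), the following sketch (an addition, not part of the original notes) builds a finite sample space with equally likely outcomes, chosen arbitrarily as a fair die, and checks additivity for two disjoint events.

    from fractions import Fraction

    omega = {1, 2, 3, 4, 5, 6}                    # sample space (a fair die, chosen for illustration)
    pr_atom = {w: Fraction(1, 6) for w in omega}  # probability of each elementary outcome

    def pr(event):
        """Probability of an event = sum of the probabilities of its outcomes."""
        return sum(pr_atom[w] for w in event)

    odd, even = {1, 3, 5}, {2, 4, 6}              # two events
    assert odd & even == set()                    # the events are disjoint
    assert pr(odd | even) == pr(odd) + pr(even)   # axiom (iii): Pr of the union = sum of the Pr
    assert pr(omega) == 1                         # axiom (ii)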

The class of events has to be defined. In defining the class of events we want the set operations (unions, intersections, complements) applied to events to yield sets that are also events. A class of sets having these closure properties is called a Borel field. A class F of ω-sets is called a Borel field if

(i) Ω ∈ F
(ii) if Λ ∈ F then Ω - Λ ∈ F
(iii) if Λ1, Λ2, …, Λn ∈ F then Λ1 ∪ Λ2 ∪ … ∪ Λn ∈ F and Λ1 ∩ Λ2 ∩ … ∩ Λn ∈ F
(iv) if Λ1, Λ2, …, Λn, … ∈ F then Λ1 ∪ Λ2 ∪ … ∈ F and Λ1 ∩ Λ2 ∩ … ∈ F

The triplet (Ω, F,Pr) is called an experiment.

Example: Ω = {real numbers}, with the Borel field F generated by the sets {ω : ω ≤ x1} for all real x1.

Example, rolling a die once: Ω = {1, 2, 3, 4, 5, 6}, Borel field F = {0, {1, 3, 5}, {2, 4, 6}, Ω}. But A = {0, {1, 3, 5}, {2, 4, 6}, {1}, Ω} is not a Borel field, because {1} ∪ {2, 4, 6} = {1, 2, 4, 6} ∉ A.
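
The closure conditions can be checked mechanically. The helper below (an added sketch; is_borel_field is a hypothetical name) tests closure under complements and pairwise unions and intersections for the two collections above, confirming that F is a Borel field while A is not.

    from itertools import combinations

    OMEGA = frozenset({1, 2, 3, 4, 5, 6})

    def is_borel_field(collection):
        """Check closure under complement and pairwise union/intersection
        for a finite collection of subsets of OMEGA."""
        sets = {frozenset(s) for s in collection}
        if OMEGA not in sets:
            return False
        for a in sets:
            if OMEGA - a not in sets:                      # closed under complements
                return False
        for a, b in combinations(sets, 2):
            if a | b not in sets or a & b not in sets:     # closed under unions and intersections
                return False
        return True

    F = [set(), {1, 3, 5}, {2, 4, 6}, OMEGA]
    A = [set(), {1, 3, 5}, {2, 4, 6}, {1}, OMEGA]
    print(is_borel_field(F))   # True
    print(is_borel_field(A))   # False: e.g. {1} ∪ {2, 4, 6} = {1, 2, 4, 6} is not in A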

RANDOM VARIABLES

A real finite-valued function x(·) defined on Ω is called a (real) random variable if, for every real number x, the inequality x(ω) ≤ x defines an ω-set whose probability is defined. A random variable is a Borel measurable function.

For a random variable the function

F_x(x) = \Pr\{ x(\omega) \le x \}     (3.1)

is defined for all real x and is called the distribution function of the random variable x. A random variable x is called discrete if there exists a mass function m_x(·) such that

F_x(x) = \sum_{\xi \le x} m_x(\xi), \qquad m_x(\xi) \ge 0     (3.2)

A random variable is called continuous if there exists a density function p_x(·) such that

F_x(x) = \int_{-\infty}^{x} p_x(\xi) \, d\xi, \qquad -\infty < x < \infty     (3.3)

If the number of points at which F_x(·) is not differentiable is countable then

p_x(x) = \frac{d}{dx} F_x(x)     (3.4)

at all x at which the derivative exists.
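
The relation between (3.3) and (3.4) can be verified numerically. The sketch below (an addition; the exponential density λe^(-λx), x ≥ 0, and the value of λ are arbitrary choices for illustration) integrates a density to obtain the distribution function and differentiates it back to recover the density.

    import numpy as np
    from scipy.integrate import quad

    lam = 2.0                                                  # rate of the illustrative density
    p = lambda x: lam * np.exp(-lam * x) if x >= 0 else 0.0    # density p_x(x)

    def F(x):
        """Distribution function F_x(x), cf. (3.3)."""
        return quad(p, 0.0, x)[0] if x > 0 else 0.0

    x0, h = 1.0, 1e-5
    dF = (F(x0 + h) - F(x0 - h)) / (2 * h)    # numerical derivative of F_x, cf. (3.4)
    print(dF, p(x0))                          # both ≈ 2*exp(-2) ≈ 0.2707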

The expectation, average, mean or first moment of a continuous random variable is defined by

E\{x\} = \int_{-\infty}^{\infty} x \, p_x(x) \, dx     (3.5)

The nth moment of x is defined by

E\{x^n\} = \int_{-\infty}^{\infty} x^n p_x(x) \, dx     (3.6)

The second moment E{x²} is called the mean square value.

The nth central moment of x is defined by

E\{(x - E\{x\})^n\} = \int_{-\infty}^{\infty} (x - E\{x\})^n p_x(x) \, dx     (3.7)

The second central moment is called the variance of x.
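
Continuing with the illustrative exponential density from the previous sketch (again an assumption, not part of the notes), the moments (3.5)-(3.7) can be evaluated by numerical quadrature and compared with the known values 1/λ, 2/λ² and 1/λ².

    import numpy as np
    from scipy.integrate import quad

    lam = 2.0
    p = lambda x: lam * np.exp(-lam * x)                             # illustrative density on [0, inf)

    mean     = quad(lambda x: x * p(x), 0, np.inf)[0]                # (3.5): E{x} = 1/lam
    mean_sq  = quad(lambda x: x**2 * p(x), 0, np.inf)[0]             # (3.6): E{x^2} = 2/lam^2
    variance = quad(lambda x: (x - mean)**2 * p(x), 0, np.inf)[0]    # (3.7) with n = 2: 1/lam^2
    print(mean, mean_sq, variance)                                   # ≈ 0.5, 0.5, 0.25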

Example: rolling a die. We can define a sample space Ω = {1, 2, 3, 4, 5, 6}, a random variable x(ω), and probabilities Pr(ω) as given, for example, in the following table

ω        1      2      3      4      5      6
x(ω)   -30    -20    -10     10     10     30
Pr(ω)   1/6    1/6    1/6    1/6    1/6    1/6

Let us introduce the subsets (events corresponding to the odd and even numbers) Λ1 = {1, 3, 5} and Λ2 = {2, 4, 6}. The corresponding Borel field is F = {0, Λ1, Λ2, Ω} and the corresponding probabilities are the elements of the row matrix Pr = [0, 0.5, 0.5, 1]. This is also the situation when tossing a coin with the two events heads and tails.
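
The mean and variance of the tabulated random variable follow directly from the discrete form of the expectation; the short fragment below (an addition) evaluates them exactly.

    from fractions import Fraction

    x_vals = [-30, -20, -10, 10, 10, 30]        # x(omega) from the table
    probs  = [Fraction(1, 6)] * 6               # Pr(omega) from the table

    mean = sum(p * x for p, x in zip(probs, x_vals))                 # E{x}
    var  = sum(p * (x - mean)**2 for p, x in zip(probs, x_vals))     # second central moment
    print(mean, var)                            # -5/3 and 3725/9 (≈ 413.9)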

Example: choice of a random phase ψ from the continuum of values in the interval -π < ψ ≤ π, thus Ω = {-π < ψ ≤ π}. Let us define the random variable x(ω) = ψ; the probability function is Pr{ψ1 ≤ ψ ≤ ψ2} = (ψ2 - ψ1)/(2π), where -π ≤ ψ1 ≤ ψ2 ≤ π.

The corresponding distribution function is:

if ψ > π then F_x = 1,

if -π ≤ ψ ≤ π then F_x = (π + ψ)/(2π),

if ψ < -π then F_x = 0.

The uniform density function p_x(ψ) = 1/(2π), -π < ψ ≤ π, results by differentiating the distribution function with respect to the random variable x = ψ.
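
A quick Monte Carlo check of the probability function above (an added sketch; the sample size and interval endpoints are arbitrary) compares the empirical probability of an interval of phases with (ψ2 - ψ1)/(2π).

    import numpy as np

    rng = np.random.default_rng(0)
    psi = rng.uniform(-np.pi, np.pi, size=1_000_000)    # samples of the random phase

    psi1, psi2 = -1.0, 2.0                              # arbitrary interval inside [-pi, pi]
    empirical   = np.mean((psi >= psi1) & (psi <= psi2))
    theoretical = (psi2 - psi1) / (2 * np.pi)
    print(empirical, theoretical)                       # both ≈ 0.477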

A random variable x is Gaussian or normally distributed if its density function is given by

p_x(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{1}{2} \left( \frac{x - m}{\sigma} \right)^2 \right]     (3.8)

where m = E{x} is the mean and σ² = E{(x - m)²} is the variance.

The normal distribution function is

F_x(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{x} \exp\left[ -\frac{1}{2} \left( \frac{\xi - m}{\sigma} \right)^2 \right] d\xi = \frac{1}{2} + \mathrm{erf}\left( \frac{x - m}{\sigma} \right)     (3.9)

where the error function is defined here as \mathrm{erf}(z) = \frac{1}{\sqrt{2\pi}} \int_{0}^{z} e^{-\xi^2/2} \, d\xi.
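
Relation (3.9) can be checked numerically; the sketch below (an addition, with arbitrary m and σ) evaluates the integral by quadrature and compares it with the erf expression. In terms of the standard-library error function, erf in the convention above equals (1/2)·math.erf(z/√2).

    import math
    from scipy.integrate import quad

    m, sigma = 1.0, 2.0                      # arbitrary mean and standard deviation
    p = lambda x: math.exp(-0.5 * ((x - m) / sigma)**2) / math.sqrt(2 * math.pi * sigma**2)

    x = 2.5
    F_integral = quad(p, -math.inf, x)[0]                       # left-hand side of (3.9)
    erf_conv   = lambda z: 0.5 * math.erf(z / math.sqrt(2))     # erf in the convention used here
    F_erf      = 0.5 + erf_conv((x - m) / sigma)                # right-hand side of (3.9)
    print(F_integral, F_erf)                                    # both ≈ 0.7734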

It is convenient to specify a random variable by its characteristic function defined as the Fourier transform of the density function

\varphi_x(u) = \int_{-\infty}^{\infty} e^{iux} p_x(x) \, dx     (3.10)

The nth moment of the random variable is

E\{x^n\} = \int_{-\infty}^{\infty} x^n p_x(x) \, dx     (3.11)

It is easy to verify the following relation

\frac{1}{i^n} \frac{d^n}{du^n} \varphi_x(u) \bigg|_{u=0} = \int_{-\infty}^{\infty} x^n e^{iux} p_x(x) \, dx \bigg|_{u=0} = E\{x^n\}     (3.12)

Thus when the characteristic function is calculated it is easy to calculate the values of the set of nth moments by differentiation.
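
As an illustration of (3.12), consider the exponential density λe^(-λx), whose characteristic function is λ/(λ - iu), a standard result assumed here; the added sketch below recovers the moments n!/λ^n by symbolic differentiation.

    import sympy as sp

    u = sp.Symbol('u', real=True)
    lam = sp.Symbol('lambda', positive=True)
    phi = lam / (lam - sp.I * u)       # characteristic function of the exponential density

    for n in range(1, 5):
        moment = sp.simplify(sp.diff(phi, u, n).subs(u, 0) / sp.I**n)   # left side of (3.12)
        print(n, moment, sp.factorial(n) / lam**n)                      # both equal n!/lambda^n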

The characteristic function for a Gaussian density with expectation m=0 is

\varphi_x(u) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} \exp\left[ iux - \frac{1}{2} \left( \frac{x}{\sigma} \right)^2 \right] dx     (3.13)

(The assumption that m = 0 is not a serious loss of generality, because we can subtract m from the random variable to obtain a random variable with zero mean.)

Introducing the change of variables y = x/σ (x = σy, dx = σ dy) in the integral, the integration finally yields the following expression

\varphi_x(u) = \exp\left( -\frac{1}{2} u^2 \sigma^2 \right)     (3.14)

Differentiation and substitution into the expression for the nth moments lead to the conclusion that for odd values of n the central moments are zero, and it is easy to establish a recursion between the expressions for consecutive even values of n. The final result is that for a normal distribution the nth central moment is equal to

E\{(x - m)^n\} = \begin{cases} 0, & \text{all odd } n \ge 1 \\ 1 \cdot 3 \cdot 5 \cdots (n-1) \, \sigma^n, & \text{all even } n \ge 2 \end{cases}     (3.15)

The even central moments grow without bound when n tends to infinity.
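
Formula (3.15) can be checked by direct numerical integration of the Gaussian density (an added sketch with σ chosen arbitrarily); the product 1·3·5···(n-1) is the double factorial (n-1)!!.

    import numpy as np
    from scipy.integrate import quad
    from scipy.special import factorial2

    sigma = 1.5                                    # arbitrary standard deviation, m = 0
    p = lambda x: np.exp(-0.5 * (x / sigma)**2) / np.sqrt(2 * np.pi * sigma**2)

    for n in range(1, 7):
        moment    = quad(lambda x: x**n * p(x), -10 * sigma, 10 * sigma)[0]
        predicted = 0.0 if n % 2 else factorial2(n - 1) * sigma**n      # (3.15)
        print(n, round(moment, 6), predicted)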

JOINTLY DISTRIBUTED RANDOM VARIABLES

Let us consider two random variables x and y. The two sets {x(ω) ≤ x} and {y(ω) ≤ y} are events with probabilities

\Pr\{ x(\omega) \le x \} = F_x(x), \qquad \Pr\{ y(\omega) \le y \} = F_y(y)     (3.16)

where F_x(x) and F_y(y) are the distribution functions of the random variables x and y. The intersection of these two sets

\{ x(\omega) \le x \} \cap \{ y(\omega) \le y \} = \{ x(\omega) \le x, \; y(\omega) \le y \}     (3.17)

is an event. The probability of this event is the joint distribution function of the jointly distributed random variables x and y

F_{x,y}(x, y) = \Pr\{ x(\omega) \le x, \; y(\omega) \le y \}.     (3.18)

Example: the temperature x(ω) measured at 6 o'clock and y(ω) measured at 12 o'clock on the same day. It is possible to consider the random variables x and y separately or to look at the pair x(ω), y(ω) jointly. To estimate the joint probability one has to consider a two-dimensional problem in the x, y plane.

In general, the continuous random variables x1, x2, …, xn defined on the same probability space are said to be jointly distributed. They may be characterized by their joint distribution function

F_{x_1, x_2, \ldots, x_n}(x_1, x_2, \ldots, x_n) = \Pr\{ x_1(\omega) \le x_1, \ldots, x_n(\omega) \le x_n \},     (3.19)

where

\{ x_1(\omega) \le x_1, \ldots, x_n(\omega) \le x_n \} = \{ x_1(\omega) \le x_1 \} \cap \cdots \cap \{ x_n(\omega) \le x_n \},     (3.20)

or by their joint density function

F_{x_1, \ldots, x_n}(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} p_{x_1, \ldots, x_n}(\xi_1, \ldots, \xi_n) \, d\xi_1 \cdots d\xi_n.     (3.21)

For the differentiable case

p_{x_1, \ldots, x_n}(x_1, \ldots, x_n) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_{x_1, \ldots, x_n}(x_1, \ldots, x_n).     (3.22)

The marginal distribution function is defined by

F_{x_1, \ldots, x_m}(x_1, \ldots, x_m) = F_{x_1, \ldots, x_n}(x_1, \ldots, x_m, \infty, \ldots, \infty).     (3.23)

The marginal density function is

p_{x_1, \ldots, x_m}(x_1, \ldots, x_m) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p_{x_1, \ldots, x_n}(x_1, \ldots, x_n) \, dx_{m+1} \cdots dx_n.     (3.24)
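
To illustrate the marginal density (3.24), the sketch below (an addition; the joint density p(x1, x2) = x1 + x2 on the unit square is an arbitrary choice) integrates out x2 symbolically and checks that the result is a properly normalized density.

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2', nonnegative=True)
    p_joint = x1 + x2                                # illustrative joint density on [0, 1]^2

    p_marginal = sp.integrate(p_joint, (x2, 0, 1))   # integrate out x2, cf. (3.24)
    print(p_marginal)                                # x1 + 1/2
    print(sp.integrate(p_marginal, (x1, 0, 1)))      # 1, so the marginal is a proper density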

The expectation of xk, 1 ≤ k ≤ n, is given by

m_k = E\{x_k\} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_k \, p_{x_1, \ldots, x_n}(x_1, \ldots, x_n) \, dx_1 \cdots dx_n.     (3.25)

The second moment of xk is

E\{x_k^2\} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_k^2 \, p_{x_1, \ldots, x_n}(x_1, \ldots, x_n) \, dx_1 \cdots dx_n.     (3.26)

Of great importance in applications is the covariance of xk and xl which is defined by

\mathrm{cov}\{x_k, x_l\} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_k - m_k)(x_l - m_l) \, p_{x_1, \ldots, x_n}(x_1, \ldots, x_n) \, dx_1 \cdots dx_n.     (3.27)

The generalization of the higher moments and central moments from the case of one random variable to the joint variables is straightforward.
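
A Monte Carlo sketch (an addition; the correlated-Gaussian construction and all parameter values are assumptions for illustration) estimates the covariance (3.27) from samples and compares it with the value built into the generator.

    import numpy as np

    rng = np.random.default_rng(1)
    cov_true = np.array([[2.0, 0.8],
                         [0.8, 1.0]])               # covariance used to generate the samples
    samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_true, size=500_000)

    xk, xl = samples[:, 0], samples[:, 1]
    mk, ml = xk.mean(), xl.mean()                   # sample means m_k, m_l, cf. (3.25)
    cov_kl = np.mean((xk - mk) * (xl - ml))         # sample analogue of (3.27)
    print(cov_kl, cov_true[0, 1])                   # ≈ 0.8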

Two jointly distributed random variables x1 and x2 are independent if any of the following equivalent conditions is satisfied

F_{x_1, x_2}(x_1, x_2) = F_{x_1}(x_1) \, F_{x_2}(x_2),     (3.28)
p_{x_1, x_2}(x_1, x_2) = p_{x_1}(x_1) \, p_{x_2}(x_2).

We say that x1, …, xn are mutually independent if

p_{x_1, \ldots, x_n}(x_1, \ldots, x_n) = p_{x_1}(x_1) \cdots p_{x_n}(x_n).     (3.29)
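
Finally, a sketch (an addition, with arbitrary thresholds) checking the factorization (3.28) empirically for two independent uniform random variables: the joint probability Pr{x1 ≤ a, x2 ≤ b} should match the product of the marginal probabilities.

    import numpy as np

    rng = np.random.default_rng(2)
    x1 = rng.uniform(0.0, 1.0, size=1_000_000)      # independent by construction
    x2 = rng.uniform(0.0, 1.0, size=1_000_000)

    a, b = 0.3, 0.7                                 # arbitrary thresholds
    joint   = np.mean((x1 <= a) & (x2 <= b))        # estimate of F_{x1,x2}(a, b)
    product = np.mean(x1 <= a) * np.mean(x2 <= b)   # F_{x1}(a) * F_{x2}(b)
    print(joint, product)                           # both ≈ 0.21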