Date: 2023.08.29
This class focuses on modes of convergence.
almost sure convergence
- defined by the probability of a limit: $P(\lim_{n\to\infty} X_n = X) = 1$
convergence in probability
- defined by the limit of a probability: $\lim_{n\to\infty} P(|X_n - X| > \varepsilon) = 0$
convergence in distribution / weak convergence
- defined by CDFs or by probabilities of sets
Modes of Convergence
Probability Space
Define a probability space by the triple $(\Omega, \mathcal{F}, P)$: the sample space $\Omega$, the $\sigma$-algebra of events $\mathcal{F}$, and the probability measure $P$.
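As a concrete instance (a standard textbook example, not specific to this lecture), a single fair coin flip:
$$\Omega = \{H, T\}, \qquad \mathcal{F} = \{\emptyset, \{H\}, \{T\}, \Omega\}, \qquad P(\{H\}) = P(\{T\}) = \tfrac{1}{2}.$$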
Sure convergence
- means $X_n(\omega) \to X(\omega)$ for every single $\omega \in \Omega$
almost sure convergence
Define it as $X_n \xrightarrow{a.s.} X$.
Suppose we have a random variable $X$ and a sequence $X_1, X_2, \ldots$; then almost sure convergence means:
$$P\left(\lim_{n\to\infty} X_n = X\right) = 1.$$
Note: the strongest of the three modes; the sequence converges to $X$ everywhere except possibly on a set of probability zero, i.e. "nearly everywhere".
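A quick illustration (a standard example, assuming the uniform space $([0,1], \mathcal{B}, \lambda)$; not from the lecture):
$$X_n(\omega) = \omega^n \xrightarrow{a.s.} 0,$$
since $\omega^n \to 0$ for every $\omega \in [0,1)$ and convergence fails only at the single point $\omega = 1$, which has probability zero.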
Convergence in probability
Define it as: $X_n$ converges in probability to $X$, written $X_n \xrightarrow{p} X$, when
$$\lim_{n\to\infty} P\left(|X_n - X| > \varepsilon\right) = 0 \quad \text{for every } \varepsilon > 0,$$
and the key is the limit of a probability (whereas almost sure convergence is the probability of a limit).
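A standard example separating this from almost sure convergence (again on $([0,1], \mathcal{B}, \lambda)$; my own illustration): the "sliding indicator" sequence
$$X_1 = \mathbf{1}_{[0,1]},\; X_2 = \mathbf{1}_{[0,\frac{1}{2}]},\; X_3 = \mathbf{1}_{[\frac{1}{2},1]},\; X_4 = \mathbf{1}_{[0,\frac{1}{4}]},\; \ldots$$
has $P(|X_n| > \varepsilon) \to 0$, so $X_n \xrightarrow{p} 0$; but every $\omega$ is hit by infinitely many of these intervals, so $X_n(\omega)$ converges for no $\omega$ and there is no almost sure convergence.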
Properties
Convergence in probability → Convergence in distribution
Convergence in distribution → Convergence in probability when X = c (a constant)
[continuous mapping theorem] for any continuous function $g$: $X_n \xrightarrow{p} X \Rightarrow g(X_n) \xrightarrow{p} g(X)$ (a simulation sketch follows this list)
For $X_n \xrightarrow{p} X$ and $Y_n \xrightarrow{p} Y$, we can have $X_n + Y_n \xrightarrow{p} X + Y$ and $X_n Y_n \xrightarrow{p} XY$
- in some special cases this carries over to convergence in distribution (e.g. when one limit is a constant, which is the setting of Slutsky's theorem below)
- if $X_n$ and $Y_n$ are independent, we also have $X_n + Y_n \xrightarrow{d} X + Y$ whenever $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ (with $X$, $Y$ independent)
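A minimal simulation sketch of the continuous mapping property (my own illustration, assuming NumPy; here $X_n$ is the sample mean of $n$ Exp(1) draws, so $X_n \xrightarrow{p} 1$ and $g(X_n) = X_n^2 \xrightarrow{p} 1$):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.05
for n in [10, 100, 1000, 10_000]:
    # 2000 Monte Carlo replications of the sample mean of n Exp(1) draws
    means = rng.exponential(1.0, size=(2000, n)).mean(axis=1)
    # continuous mapping g(x) = x^2: g(X_n) should converge in probability to g(1) = 1
    p_hat = np.mean(np.abs(means**2 - 1.0) > eps)
    print(f"n={n:6d}  P(|g(X_n) - g(1)| > {eps}) ~= {p_hat:.3f}")
```

The printed probabilities shrink toward 0 as $n$ grows, which is exactly the definition of convergence in probability.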
Convergence in distribution
Other names are "converges weakly" and "converges in law". Define it as $X_n \xrightarrow{d} X$ when $\lim_{n\to\infty} F_{X_n}(x) = F_X(x)$ at every point $x$ where $F_X$ is continuous.
Properties
Portmanteau lemma
the following two statements are equivalent:
- $X_n \xrightarrow{d} X$
- $E[f(X_n)] \to E[f(X)]$ for every bounded continuous function $f$
Continuous mapping theorem
for any continuous function $g$: $X_n \xrightarrow{d} X \Rightarrow g(X_n) \xrightarrow{d} g(X)$
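A standard use (my own illustrative example): applying $g(x) = x^2$ to a CLT limit,
$$\sqrt{n}\,\frac{\bar X_n - \mu}{\sigma} \xrightarrow{d} N(0,1) \quad\Longrightarrow\quad n\,\frac{(\bar X_n - \mu)^2}{\sigma^2} \xrightarrow{d} \chi^2_1.$$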
Other: Convergence in mean
Define: $X_n$ converges in $r$-th mean ($r \ge 1$) towards the random variable $X$, written $X_n \xrightarrow{L^r} X$, if and only if:
$$\lim_{n\to\infty} E\left[\,|X_n - X|^r\,\right] = 0.$$
So some special cases are:
- when $r = 1$, we say $X_n$ converges in mean to $X$
- when $r = 2$, we say $X_n$ converges in mean square to $X$
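Two quick facts worth recording here (standard, my own addition): by Markov's inequality, $P(|X_n - X| \ge \varepsilon) \le E|X_n - X|^r / \varepsilon^r$, so convergence in mean implies convergence in probability; the converse fails, e.g.
$$P(X_n = n) = \tfrac{1}{n},\quad P(X_n = 0) = 1 - \tfrac{1}{n} \;\Longrightarrow\; X_n \xrightarrow{p} 0 \ \text{ but }\ E|X_n - 0| = 1 \not\to 0.$$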
Theorem: from Marginal to Joint distribution
For $X_n \xrightarrow{p} X$ and $Y_n \xrightarrow{p} Y$, marginal convergence gives joint convergence: $(X_n, Y_n) \xrightarrow{p} (X, Y)$.
But when it comes to convergence in distribution, we need extra assumptions:
- For $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} c$ with $c$ a constant, we can have $(X_n, Y_n) \xrightarrow{d} (X, c)$; this is the setting of Slutsky's theorem, illustrated below.
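A standard consequence (Slutsky's theorem; my own illustrative example): combining $\sqrt{n}\,(\bar X_n - \mu) \xrightarrow{d} N(0, \sigma^2)$ with the sample standard deviation $S_n \xrightarrow{p} \sigma$ gives the usual t-statistic limit
$$\frac{\sqrt{n}\,(\bar X_n - \mu)}{S_n} \xrightarrow{d} N(0, 1).$$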
Stochastic Order Notation
Big O: stochastic boundedness
$X_n = O_p(a_n)$ iff for every $\varepsilon > 0$ there exist a finite $M > 0$ and a finite $N$ such that
$$P\left(\left|\frac{X_n}{a_n}\right| > M\right) < \varepsilon \quad \text{for all } n > N,$$
which means $X_n / a_n$ is stochastically bounded.
Small o: convergence in probability
$X_n = o_p(a_n)$ iff
$$\lim_{n\to\infty} P\left(\left|\frac{X_n}{a_n}\right| \ge \varepsilon\right) = 0$$
for every positive $\varepsilon$, which means that $a_n$ eventually grows much faster than $X_n$; equivalently, $X_n / a_n \xrightarrow{p} 0$.
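A standard instance (my own example): for i.i.d. $X_i$ with finite variance, the CLT gives
$$\bar X_n - \mu = O_p\!\left(n^{-1/2}\right), \qquad \text{and in particular} \qquad \bar X_n - \mu = o_p(1),$$
which is just the weak LLN restated in this notation.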
Theorems
LLNs
This section covers several variants of the LLN (Law of Large Numbers).
Here we use notation as follows:
- $X_1, X_2, \ldots$ is a sequence of random variables
- Denote $\mu = E[X_i]$ as the mean, and $\sigma^2 = \mathrm{Var}(X_i)$ as the variance
- i.i.d.: independent and identically distributed
- Different convergences: $\bar X_n \xrightarrow{p} \mu$ (weak) and $\bar X_n \xrightarrow{a.s.} \mu$ (strong), where $\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i$
Theorem (Bernoulli, 1713)
Given $X_i \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(p)$, then $\bar X_n \xrightarrow{p} p$, as simulated below.
- Note: a special case of the (weak) LLN.
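A minimal simulation sketch (my own, assuming NumPy): the running proportion of heads in fair coin flips settles near $p = 0.5$.

```python
import numpy as np

rng = np.random.default_rng(1)
flips = rng.integers(0, 2, size=100_000)                 # i.i.d. Bernoulli(0.5)
running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)
for n in [10, 100, 1000, 10_000, 100_000]:
    print(f"n={n:6d}  sample mean = {running_mean[n - 1]:.4f}")
```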
Forms of Theorems
when we want to pin down a specific theorem, we should specify:
Assumptions
- independence of the variables
- whether the distribution is allowed to change across $i$ (identically distributed or not)
Conclusions
- convergence in probability (weak) or almost sure (strong)
- shape of the tails (heavy-tailed or not)
Theorem (Chebyshev weak LLN)
For independent (not necessarily i.i.d.) $X_i$ with $E[X_i] = \mu$ and $\mathrm{Var}(X_i) \le c < \infty$ for all $i$: for every $\varepsilon > 0$ and $\delta > 0$ there exists $N$ such that
$$P\left(|\bar X_n - \mu| \ge \varepsilon\right) \le \delta \quad \text{for all } n > N, \qquad \text{i.e. } \bar X_n \xrightarrow{p} \mu.$$
- Assumption about the variance can be relaxed to: $\frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}(X_i) \to 0$
- Note: as long as the random variables are not too unusual (no infinite variance), averaging converges to the expectation.
Proof
Given $\varepsilon > 0$, we know (Chebyshev's inequality):
$$P\left(|\bar X_n - \mu| \ge \varepsilon\right) \le \frac{\mathrm{Var}(\bar X_n)}{\varepsilon^2}.$$
Use the property (by independence):
$$\mathrm{Var}(\bar X_n) = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}(X_i).$$
From the bounded variance $\mathrm{Var}(X_i) \le c$, we can know:
$$P\left(|\bar X_n - \mu| \ge \varepsilon\right) \le \frac{c}{n\,\varepsilon^2}.$$
According to the definition of convergence in probability, when $n \to \infty$ the bound goes to $0$, so we ensure that $\bar X_n \xrightarrow{p} \mu$. A quick numeric check of the bound follows.
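A minimal numeric check (my own, assuming NumPy; Exp(1) draws, so $\mu = 1$ and $\mathrm{Var}(X_i) = 1 \le c = 1$):

```python
import numpy as np

rng = np.random.default_rng(3)
n, eps, c = 400, 0.1, 1.0
# 20,000 Monte Carlo replications of the sample mean of n Exp(1) draws
means = rng.exponential(1.0, size=(20_000, n)).mean(axis=1)
p_hat = np.mean(np.abs(means - 1.0) >= eps)
print(f"empirical P(|mean - mu| >= {eps}) = {p_hat:.4f}")
print(f"Chebyshev bound c/(n*eps^2)      = {c / (n * eps**2):.4f}")
```

The empirical probability sits well below the (loose) Chebyshev bound, consistent with the proof.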
Theorem (Kolmogorov's 2nd strong LLN)
Given $X_i$ i.i.d., then
$$\bar X_n \xrightarrow{a.s.} \mu$$
iff $E[X_i]$ exists and equals $\mu$ for all $i$.
- Note: for an i.i.d. sequence, finite variance is not required for all vars; but if the mean does not exist, there is no a.s. convergence (see the counterexample below).
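A standard counterexample (my own addition): for i.i.d. standard Cauchy $X_i$, the mean does not exist, and in fact
$$\bar X_n \sim \mathrm{Cauchy}(0, 1) \quad \text{for every } n,$$
so the sample mean never settles down; there is no a.s. (or even in-probability) limit.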
CLTs
This section covers several variants of the CLT (Central Limit Theorem).
From the LLN we know:
$$\bar X_n - \mu \xrightarrow{p} 0.$$
Now we want to know the "shape" of such convergence, which is about the asymptotic distribution/density.
An intuitive approach is to add a scaler $a_n$ to enlarge the vanishing term: $a_n(\bar X_n - \mu)$.
The CLT implies that for a special scaler, $a_n = \sqrt{n}$,
we have the magical property that a special limiting distribution appears:
$$\sqrt{n}\,(\bar X_n - \mu) \xrightarrow{d} N(0, \sigma^2).$$
Theorem (Lindeberg-Lévy CLT)
Given:
- $X_1, X_2, \ldots$ are i.i.d.
- $E[X_i] = \mu$ and $\mathrm{Var}(X_i) = \sigma^2 < \infty$
then we have:
$$\sqrt{n}\,(\bar X_n - \mu) \xrightarrow{d} N(0, \sigma^2).$$
- Note: a special scaler results in a special distribution, and it is robust for all random vars: the normal limit appears whatever the underlying distribution of the $X_i$ (simulated below).
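A minimal simulation sketch (my own, assuming NumPy): standardized means of heavily skewed Exp(1) draws already match $N(0,1)$ quantiles closely.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 500, 10_000
# Exp(1) has mu = 1, sigma = 1 and is heavily skewed, yet the CLT applies
z = np.sqrt(n) * (rng.exponential(1.0, size=(reps, n)).mean(axis=1) - 1.0)
print("simulated quantiles:", np.round(np.quantile(z, [0.025, 0.5, 0.975]), 3))
print("N(0,1) quantiles   : [-1.96, 0.0, 1.96]")
```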
Theorem (Cramér–Wold, vector-form)
The above theorem can be easily generalized to vector form.
The following are equivalent (for random vectors $X_n, X$ in $\mathbb{R}^k$):
- $X_n \xrightarrow{d} X$
- $t^\top X_n \xrightarrow{d} t^\top X$ for all $t \in \mathbb{R}^k$
So $t^\top X_n$ is a linear combination of the entries of the random vector: vector convergence reduces to scalar convergence of all linear combinations.
Theorem (multi-var form)
- $X_1, X_2, \ldots$ are i.i.d. random vectors in $\mathbb{R}^k$
- $E[X_i] = \mu$ and $\mathrm{Cov}(X_i) = \Sigma$ (finite)
then we have:
$$\sqrt{n}\,(\bar X_n - \mu) \xrightarrow{d} N(0, \Sigma).$$
Theorem (Berry-Esseen)
let $F_n(x) = P\left(\frac{\sqrt{n}\,(\bar X_n - \mu)}{\sigma} \le x\right)$, which is our targeted CDF.
let $X_i$ be i.i.d. with finite 3rd moment $\rho = E|X_i - \mu|^3 < \infty$; then there exists a constant $C$ such that:
$$\sup_{x} \left|F_n(x) - \Phi(x)\right| \le \frac{C\,\rho}{\sigma^3 \sqrt{n}}.$$
- Note: the distance between our target CDF and the normal CDF $\Phi$ is uniformly bounded and shrinks at rate $1/\sqrt{n}$; the constant is small (known results give $C < 0.5$ in the i.i.d. case).
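A worked instance (my own arithmetic, assuming fair Bernoulli draws): for $X_i \sim \mathrm{Bernoulli}(\tfrac{1}{2})$,
$$\sigma^2 = \tfrac{1}{4}, \qquad \rho = E\left|X_i - \tfrac{1}{2}\right|^3 = \tfrac{1}{8}, \qquad \frac{\rho}{\sigma^3} = 1,$$
so the bound is $C/\sqrt{n}$: roughly $0.005$ at $n = 10{,}000$ with $C \approx 0.5$.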