[Statistics] 3.2 Bernoulli Distribution, Binomial Distribution

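An experiment with exactly two possible outcomes, such as a coin toss, a disease diagnosis, or a yes/no vote, is called a Bernoulli trial. The binomial distribution is the distribution of the number of times a given outcome occurs when such a trial is repeated independently several times.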

#Bernoulli Distribution

For a given probability \( 0 \le p \le 1\), a random variable \(X\) defined by

\[ X = \left\{ \begin{array}{cl} 1 & \text{with probability } ~p \\ 0 & \text{with probability } ~(1-p) \end{array} \right.\]

is said to follow a Bernoulli distribution. It is common to call \(X=1\) a 'success' and \(X=0\) a 'failure'.^[Footnote:1] The mean and variance follow directly from the probability distribution above.

\[ E[X] = 1 \times p + 0 \times (1-p) = p \]

\[ E[X^2] = 1^2 \times p + 0^2 \times (1-p) = p\]

\[ \mathrm{Var}(X) = E[X^2] - (E[X])^2 = p(1-p) \]
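The same calculation can be sketched in Python. This is only an illustration: the function name `bernoulli_moments` is hypothetical, and `Fraction` is used so that the arithmetic is exact.

```python
from fractions import Fraction

def bernoulli_moments(p):
    """Return (mean, variance) of a Bernoulli(p) variable, computed from its pmf."""
    pmf = {1: p, 0: 1 - p}                           # P(X=1) = p, P(X=0) = 1-p
    mean = sum(x * q for x, q in pmf.items())        # E[X]
    second = sum(x**2 * q for x, q in pmf.items())   # E[X^2]
    return mean, second - mean**2                    # Var(X) = E[X^2] - (E[X])^2

mean, var = bernoulli_moments(Fraction(1, 4))
print(mean, var)  # 1/4 3/16
```

With \(p = 1/4\) this gives \(E[X] = 1/4\) and \(\mathrm{Var}(X) = 3/16 = p(1-p)\), in agreement with the formulas above.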

#Binomial Distribution

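Now consider an experiment that repeats a Bernoulli trial \(n\) times independently, with the same success probability \(p\) each time. The outcome is a sequence of 1s (successes) and 0s (failures). Define the random variable \(Y\) as the number of successes in the \(n\) trials.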

\[ Y = \text{the number of successes in } n ~ \text{trials} \]

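Our goal is to find the probability that \(Y=y\). First, let us find the probability of 3 successes in 4 trials. One way to get 3 successes is to fail on the first trial and then succeed on the next three. The probability of this case is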

\[ \begin{gather*} {\scriptstyle \text{failure}} & & {\scriptstyle \text{success}} & & {\scriptstyle \text{success}} & & {\scriptstyle \text{success}} \\ (1-p) & \times & p & \times & p & \times & p & = & p^3(1-p)^1 \end{gather*} \]

However, this is not the only way to get 3 successes; other orderings are possible.

\[ \begin{gather*} {\scriptstyle \text{success}} & & {\scriptstyle \text{failure}} & & {\scriptstyle \text{success}} & & {\scriptstyle \text{success}} \\ p & \times & (1-p) & \times & p & \times & p & = & p^3(1-p)^1 \end{gather*} \]

\[ \begin{gather*} {\scriptstyle \text{success}} & & {\scriptstyle \text{success}} & & {\scriptstyle \text{failure}} & & {\scriptstyle \text{success}} \\ p & \times & p & \times & (1-p) & \times & p & = & p^3(1-p)^1 \end{gather*} \]

\[ \begin{gather*} {\scriptstyle \text{success}} & & {\scriptstyle \text{success}} & & {\scriptstyle \text{success}} & & {\scriptstyle \text{failure}} \\ p & \times & p & \times & p & \times & (1-p) & = & p^3(1-p)^1 \end{gather*} \]
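The orderings listed above can also be enumerated mechanically. The following Python sketch (the function names and the value `p = 0.3` are illustrative assumptions, not part of the original text) lists every length-4 sequence with exactly 3 successes and adds up their probabilities.

```python
from itertools import product

def orderings(n, y):
    """All length-n sequences of 1 (success) / 0 (failure) with exactly y successes."""
    return [seq for seq in product((1, 0), repeat=n) if sum(seq) == y]

def sequence_probability(seq, p):
    """Probability of one specific sequence of independent trials."""
    prob = 1.0
    for outcome in seq:
        prob *= p if outcome == 1 else 1 - p
    return prob

p = 0.3  # illustrative success probability
cases = orderings(4, 3)
total = sum(sequence_probability(seq, p) for seq in cases)
print(len(cases))                  # 4
print(total, 4 * p**3 * (1 - p))   # the two values agree up to rounding
```

Each of the four sequences has the same probability \(p^3(1-p)\), so the total is four times that value.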

Therefore, the probability of 3 successes in 4 trials is \(4p^3(1-p)^1\). In the same way, the probability of \(y\) successes in \(n\) trials is

\[ P(Y=y) = \left(\begin{array}{c} \text{the number of ways to get } y \\ \text{successes in } n ~\text{trials} \end{array}\right) \times p^y (1-p)^{n-y} \]

Here, the number of ways to get \(y\) successes is^[Footnote:2]

\[ \left( \begin{array}{c} n \\ y \end{array} \right) = \frac{n!}{y!(n-y)!} \]

Therefore,

\[ f_Y(y) = P(Y=y) = \left( \begin{array}{c} n \\ y \end{array} \right) p^y (1-p)^{n-y} ~~~~~ \text{where } y=0,1,2,\cdots,n \]
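As a minimal sketch, the pmf can be written directly in Python with `math.comb` (available in Python 3.8 and later). The function name `binomial_pmf` is a hypothetical choice for this illustration.

```python
from math import comb

def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ B(n, p); zero outside y = 0, 1, ..., n."""
    if not 0 <= y <= n:
        return 0.0
    return comb(n, y) * p**y * (1 - p)**(n - y)

print(binomial_pmf(3, 4, 0.5))  # 0.25
```

For \(n=4\), \(y=3\), \(p=0.5\) this reproduces \(4p^3(1-p) = 0.25\) from the worked example.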

For this function to be a probability distribution, its values must sum to 1. Verifying this requires the following theorem.

THEOREM Binomial Theorem

For a natural number \(n\),

\[ (x+y)^n = \sum _{i=0} ^n \left( \begin{array}{c} n \\ i \end{array} \right) x^i y^{n-i} \]

By the binomial theorem,

\[ \sum _{y=0} ^n f_Y(y) = \sum _{y=0} ^n \left( \begin{array}{c} n \\ y \end{array} \right) p^y (1-p)^{n-y} = (p + (1-p))^n = 1 \]
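As a quick check of this sum in the smallest non-trivial case, take \(n=2\):

\[ \sum _{y=0} ^2 f_Y(y) = (1-p)^2 + 2p(1-p) + p^2 = [(1-p) + p]^2 = 1 \]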

The probability distribution \(f_Y\) is called the binomial \((n,p)\) distribution, and the statement that a random variable \(Y\) follows the binomial \((n,p)\) distribution is written as

\[ Y \sim B(n,p)\]

The following figure shows the pmf of several binomial distributions.

Tayste / Public domain via Wikimedia

#Mean of Binomial Distribution

To find the mean of the binomial distribution, we need the following identity.^[Footnote:3]

THEOREM

\[ x \left( \begin{array}{c} n \\ x \end{array} \right) = n \left( \begin{array}{c} n-1 \\ x-1 \end{array} \right) \]

(Proof)

\[ x \left( \begin{array}{c} n \\ x \end{array} \right) = x \frac{n!}{x!(n-x)!} = \frac{n\times (n-1)!}{(x-1)![(n-1)-(x-1)]!} = n \left( \begin{array}{c} n-1 \\ x-1 \end{array} \right) \]

(End of proof)
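The identity can also be spot-checked numerically. The sketch below (with a hypothetical helper `identity_holds`) tests it for all \(1 \le x \le n < 30\); this is a sanity check, not a substitute for the proof.

```python
from math import comb

def identity_holds(n, x):
    """Check x * C(n, x) == n * C(n-1, x-1) for 1 <= x <= n."""
    return x * comb(n, x) == n * comb(n - 1, x - 1)

print(all(identity_holds(n, x) for n in range(1, 30) for x in range(1, n + 1)))  # True
```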

Now let us compute the mean.

\[ \begin{align*} E[Y] &= \sum _{y=0} ^n y\left( \begin{array}{c} n \\ y \end{array} \right)p^y (1-p)^{n-y} \\ &= \sum _{y=1} ^n y\left( \begin{array}{c} n \\ y \end{array} \right)p^y (1-p)^{n-y} & & {\scriptstyle \leftarrow ~ \text{omit }y=0}\\ &= \sum _{y=1} ^n n\left( \begin{array}{c} n-1 \\ y-1 \end{array} \right)p^y (1-p)^{n-y} \\ &= \sum _{z=0} ^{n-1} n\left( \begin{array}{c} n-1 \\ z \end{array} \right)p^{z+1} (1-p)^{n-(z+1)} & & {\scriptstyle \leftarrow ~ \text{substitute } z=y-1}\\ &= np\sum _{z=0} ^{n-1} \left( \begin{array}{c} n-1 \\ z \end{array} \right)p^{z} (1-p)^{(n-1)-z} \\ &= np & & {\scriptstyle \leftarrow ~ \text{sum} = \text{[sum of }B(n-1,p)] = 1} \end{align*} \]
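For a concrete instance, take \(n=2\) and compute the mean directly from the pmf:

\[ E[Y] = 0 \times (1-p)^2 + 1 \times 2p(1-p) + 2 \times p^2 = 2p - 2p^2 + 2p^2 = 2p \]

which matches \(np\) with \(n=2\).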

#Variance of Binomial Distribution

The variance can be found in the same way as the mean.

\[ \begin{align*} E[Y^2] &= \sum _{y=0} ^n y^2 \left( \begin{array}{c} n \\ y \end{array} \right)p^y (1-p)^{n-y} \\ &= \sum _{y=1} ^n y^2 \left( \begin{array}{c} n \\ y \end{array} \right)p^y (1-p)^{n-y} ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ {\scriptstyle \leftarrow ~ \text{omit }y=0} \\ &= \sum _{y=1} ^n yn \left( \begin{array}{c} n-1 \\ y-1 \end{array} \right) p^y(1-p)^{n-y} \\ &= n \sum _{z=0} ^{n-1} (z+1) \left( \begin{array}{c} n-1 \\ z \end{array} \right) p^{z+1} (1-p)^{(n-1)-z} ~~~~~~~~~~~~~~~~~ {\scriptstyle \leftarrow ~ \text{substitute } z=y-1} \\ &= np \sum _{z=0} ^{n-1} z \left( \begin{array}{c} n-1 \\ z \end{array} \right) p^z (1-p)^{(n-1)-z} + np \sum _{z=0} ^{n-1} \left( \begin{array}{c} n-1 \\ z \end{array} \right) p^z (1-p)^{(n-1)-z} \\ \\ &= np(n-1)p + np ~~~~~~~~~~~~~~~~~~~~ {\scriptstyle \leftarrow ~ \text{first sum}=\text{mean of }B(n-1,p)~,~ \text{second sum}=[\text{sum of }B(n-1,p)]=1} \end{align*} \]

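The two sums in the second-to-last line are, respectively, the mean and the total pmf of \(B(n-1,p)\). Using this result together with \(E[Y] = np\),

\[ \mathrm{Var}(Y) = E[Y^2] - (E[Y])^2 = np(n-1)p + np - (np)^2 = np(1-p) \]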
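The formulas \(E[Y] = np\) and \(\mathrm{Var}(Y) = np(1-p)\) can be compared against a simulation. The sketch below is an illustration under assumed values (\(n=10\), \(p=0.3\), 50,000 draws, seed 0), with hypothetical function names; it builds each draw of \(Y\) by counting successes in \(n\) simulated Bernoulli trials.

```python
import random

def simulate_binomial(n, p, rng):
    """One draw of Y: count successes in n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def sample_mean_and_variance(n, p, draws, seed=0):
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    ys = [simulate_binomial(n, p, rng) for _ in range(draws)]
    mean = sum(ys) / draws
    var = sum((y - mean) ** 2 for y in ys) / draws
    return mean, var

n, p = 10, 0.3                                # illustrative values
mean, var = sample_mean_and_variance(n, p, draws=50_000)
print(mean, n * p)                            # sample mean vs. np
print(var, n * p * (1 - p))                   # sample variance vs. np(1-p)
```

The sample values should land close to 3 and 2.1, though not exactly on them, since they are estimates from a finite number of draws.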

#Moment Generating Function of Binomial Distribution

The mgf of the binomial distribution follows easily from the binomial theorem.

\[ M_Y(t) = \sum _{y=0} ^n e^{ty} \left( \begin{array}{c} n \\ y \end{array} \right)p^y (1-p)^{n-y} = \sum _{y=0} ^n \left( \begin{array}{c} n \\ y \end{array} \right) (pe^t)^y (1-p)^{n-y} = [pe^t + (1-p)]^n \]
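As a consistency check, assume the standard property that the \(k\)-th derivative of the mgf at \(t=0\) equals \(E[Y^k]\). Differentiating the closed form gives

\[ M_Y'(t) = npe^t [pe^t + (1-p)]^{n-1} ~~~ \Rightarrow ~~~ M_Y'(0) = np \]

\[ M_Y''(t) = npe^t [pe^t + (1-p)]^{n-1} + n(n-1)p^2 e^{2t} [pe^t + (1-p)]^{n-2} ~~~ \Rightarrow ~~~ M_Y''(0) = np + n(n-1)p^2 \]

so that \(E[Y] = np\) and \(\mathrm{Var}(Y) = np + n(n-1)p^2 - (np)^2 = np(1-p)\), the same mean and variance obtained from the direct sums.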

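Footnote 1: The labels can be chosen to suit the experiment. For example, in a coin-toss experiment they can be called 'heads' and 'tails'.
Footnote 2: This is the combination \({}_nC_y\) from high-school mathematics: the number of ways to choose \(y\) people out of \(n\) without regard to order.
Footnote 3: Because of the following theorem, \(\begin{pmatrix}n\\i\end{pmatrix}\) is called a binomial coefficient.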

License: Attribution, NonCommercial, ShareAlike


피그티의 기초물리