ベイズ統計学への入門（その04/X） - 「大人の教養・知識・気付き」を伸ばすブログ

はじめに

　さまざまなテキスト

ベイズ統計の理論と方法

作者:渡辺澄夫
コロナ社

Amazon

実践Ｄａｔａ　Ｓｃｉｅｎｃｅシリーズ　ＲとＳｔａｎではじめる　ベイズ統計モデリングによるデータ分析入門 (ＫＳ情報科学専門書)

作者:馬場真哉
講談社

Amazon

などを参照しながらベイズ統計学について学んでいきます。
　また理論だけでなく、可能な限りシミュレーションを含めていくこととし、それも $\mathrm{R}$ , $\mathrm{Stan},$ $\mathrm{Python}$ や $\mathrm{Julia}$ など幅広い言語で実装していきたい。

各種バージョン情報

OS
　Windows 11 Home 22H2

R
　R-4.1.3

RStudio
　RStudio 2022.02.2+485 "Prairie Trillium" Release (8acbd38b0d4ca3c86c570cf4112a8180c48cc6fb, 2022-04-19) for Windows

Python
　3.11.0

Jupyter Notebook
　6.4.12

Julia
　1.8.0

前回

https://power-of-awareness.com/entry/2023/02/28/050000power-of-awareness.com

はじめに
前回
今回のまとめ
3.　Bayes推定
参考文献

今回のまとめ

母数空間を $\Theta$ とする。実数 $0\lt\alpha\lt1$ に対して
$\begin{aligned}\displaystyle{\int_{C}f(\theta|x)d\theta}\geq\alpha\end{aligned}$
を満たすような $C\subset\Theta$ を確率 $\alpha$ の信用集合(確信集合)という。信用集合( $\mathrm{credible\ set}$ )は事後分布の不確実性を伝える有効な手法の1つである。

定義からでは信用集合は一意に決まらない。 $C=\left[a_{\alpha},b_{\alpha}\right]$ とする等裾事後信用区間( $\mathrm{equal}$ - $\mathrm{tailed}$ $\mathrm{posterior}$ $\mathrm{credible}$ $\mathrm{intervals}$ )や、最高事後密度区間( $\mathrm{HPD}$ 区間) $S_{\alpha}$ を採用することが多い。

$\mathrm{Bayes}$ 統計学における仮説検定 $H_0=\{\theta\in S_0\}$ は、パラメータの事後分布 $p(\theta|\mathcal{D})$ において仮説 $H_i$ が成り立つ事後確率
$\begin{aligned}p_i=P_{p(\theta|\mathcal{D})}\{\theta\in S_i\}=\displaystyle{\int_{S_i}p(\theta|\mathcal{D})d\theta}\end{aligned}$
を評価する。

具体的には、 $\mathrm{Bayes}$ 統計学における2つの仮説に対する仮説検定
$\begin{aligned}H_0=\{\theta\in S_0\}\ v.s.\ H_1=\{\theta\in S_1\}\end{aligned}$
では、 $\mathrm{Bayes\ Factor}$ $B_{01}(\mathcal{D})$ を
$\begin{aligned}B_{01}(\mathcal{D}):=\displaystyle{\frac{\displaystyle{\int_{S_0}p(\theta|\mathcal{D})d\theta}}{\displaystyle{\int_{S_1}p(\theta|\mathcal{D})d\theta}}} \div \displaystyle{\frac{\displaystyle{\int_{S_0}p(\theta)d\theta}}{\displaystyle{\int_{S_1}p(\theta)d\theta}}}\end{aligned}$
で定義し、この値を以て判断する。

3.　Bayes推定

3.6　Bayes推定の実例：Bernoulli試行

　企業の信用力を評価すべく、納期に遅れる確率を考えることとし、納期に遅れる確率を $\pi$ と書くことにする。

3.6.1　逐次的にデータを活用した場合

　ある企業との取引について、 $i$ 番目の取引について納期通りに納品された場合には $0$ 、納期に遅れた場合には $1$ を取るような確率変数　

$\begin{aligned} X_i=\left\{ \begin{array}{ll} 1,\ \ 納期に遅れる,\\ 0,\ \ 納期を守る \end{array} \right. \end{aligned}$

と定義する。
　このとき、 $X_i$ の確率分布は

$\begin{aligned} P\{X_i=x_i\}&=\left\{ \begin{array}{ll} \pi,\ \ &x_i=1,\\ 1-\pi,\ \ &x_i=0, \end{array} \right.\\ p(x_i|\pi)&=\pi^{x_i}(1-\pi)^{1-x_i},\ \ x_i=0,1 \end{aligned}$

である。
　このとき、納期の遅れのパターンからある企業 $X$ 社が納期に遅れる確率を $\mathrm{Bayes}$ 統計学に基づいて推測する。
　まず $X$ 社が1回目の納期に遅れたときの納期に遅れる確率 $\pi$ の推測を考える。 $\mathrm{Bayes}$ の定理より

$\begin{aligned} p(\pi|x_1)=\displaystyle{\frac{p(x_1|\pi)p(\pi)}{\displaystyle{\int_{0}^{1}p(x_1|\pi)p(\pi)d\pi }}}\propto p(x_1|\pi)p(\pi) \end{aligned}$

が成り立つ。
　 $\pi$ について無情報事前分布を用いることとすれば、

$\begin{aligned} p(\pi)=\begin{cases} 1,\ 0\lt\pi\lt1,\\ 0,\ \pi\leq0,1\leq\pi \end{cases} \end{aligned}$

であり、また $x_i=1$ であるとき、

$\begin{aligned} p(\pi|x_1=1)\propto \pi^{x_i}(1-\pi)^{1-x_i}=\pi \end{aligned}$

である。更に

$\begin{aligned} \displaystyle{\int_{0}^{1}\pi d\pi}=\left[\displaystyle{\frac{1}{2}}\pi^2\right]^{1}_{0}=\displaystyle{\frac{1}{2}} \end{aligned}$

に注意すれば、

$\begin{aligned} p(\pi|x_1=1)=\begin{cases} 2\pi,&\ 0\lt\pi\lt1,\\ 0,&\pi\lt0,1\lt\pi \end{cases} \end{aligned}$

が成り立つ。
　次に2回目の納期にも遅れたと仮定する。このとき $\pi$ の事後分布のカーネルは $\mathrm{Bayes}$ の定理から

$\begin{aligned} p(\pi|x_1=1,x_2=1)&\propto p(x_2=1|\pi)p(\pi|x_1=1),\\ &\propto \pi^1 (1-\pi)^{1-1}2\pi,\\ &\propto \pi^2 \end{aligned}$

である。また $\displaystyle{\int_{0}^{1}}\pi^2 d\pi=\displaystyle{\frac{1}{3}}$ であるから、

$\begin{aligned} p(\pi|x_1=1,x_2=1)=3\pi^2\boldsymbol{1}_{(0,1)}(\pi) \end{aligned}$

以上、帰納的に $\mathrm{Bayes}$ の定理を適用することで $n$ 回続けて納期を遅れたという状況での $\pi$ の事後分布は

$\begin{aligned} p(\pi|x_1=1,\cdots,x_n=1)=(n+1)\pi^{n}\boldsymbol{1}_{(0,1)}(\pi) \end{aligned}$

である。

3.6.2　一括でデータを活用した場合

　前節では逐次的にデータを用いて事後分布を導出したが、次に過去の納期に関する情報をまとめて使って $\pi$ の事後分布を導出する。
　たとえば過去5回の納期に関するデータとして

$\begin{aligned} \mathcal{D}=(x_1,x_2,x_3,x_4,x_5)=(1,0,1,0,0) \end{aligned}$

を得たとしよう。
　 $X_i,i=1,2,3,4,5$ は互いに独立であるから、

$\begin{aligned} p(\mathcal{D}|\pi)=\displaystyle{\prod_{i=1}^{5}{p(x_i|\pi)}}=\displaystyle{\prod_{i=1}^{5}\pi^{x_i}(1-\pi)^{1-{x_i}}}=\displaystyle{\pi^{\sum_{i=1}^{5}x_i}(1-\pi)^{5-\sum_{i=1}^{5}x_i}} \end{aligned}$

であり、具体的に $\mathcal{D}$ が与えられていることから

$\begin{aligned} p(\mathcal{D}|\pi)=\displaystyle{\prod_{i=1}^{5}\pi^{2}(1-\pi)^{3}} \end{aligned}$

である。 $\pi$ の関数と見るとき、これを尤度という。
　 $\mathrm{Bayes}$ の定理から

$\begin{aligned} p(\pi|\mathcal{D})=\displaystyle{\frac{\displaystyle{p(\mathcal{D}|\pi)p(\pi)}}{\displaystyle{\int_{0}^{1}p(\mathcal{D}|\pi)p(\pi)d\pi}}}\propto p(\mathcal{D}|\pi)p(\pi)\propto \pi^2(1-\pi)^3 \end{aligned}$

が成り立つ。
　一般の場合として納期を $n$ 回の場合を考える。

$\begin{aligned} p(\mathcal{D}|\pi)=\displaystyle{\pi^{\sum_{i=1}^{n}x_i}(1-\pi)^{n-\sum_{i=1}^{n}x_i}} \end{aligned}$

ここで $y_n=\displaystyle{\sum_{i=1}^{n}x_i}$ と定義すると

$\begin{aligned} p(\mathcal{D}|\pi)=\displaystyle{\pi^{y_n}(1-\pi)^{n-y_n}} \end{aligned}$

である。
　 $\mathrm{Bayes}$ の定理より

$\begin{aligned} p(\pi|\mathcal{D})=\displaystyle{\frac{\pi^{y_n}(1-\pi)^{n-y_n}}{B(y_n+1,n-y_n+1)}}\boldsymbol{1}_{(0,1)}(\pi) \end{aligned}$

が成り立つ。ここで $B(\cdot,\cdot)$ はベータ関数である。これはベータ分布である。すなわち

$\begin{aligned} \pi|\mathcal{D}\sim Be(y_n+1,n-y_n+1) \end{aligned}$

である。

3.7　信用集合

　信用集合( $\mathrm{credible\ set}$ )は事後分布の不確実性を伝える有効な手法の1つである。
　母数空間を $\Theta$ とする。実数 $0\lt\alpha\lt1$ に対して

$\begin{aligned} \displaystyle{\int_{C}f(\theta|x)d\theta}\geq\alpha \end{aligned}$

を満たすような $C\subset\Theta$ を確率 $\alpha$ の信用集合(確信集合)という。特に $C$ が区間であれば、信用区間・確信区間( $\mathrm{credible\ interval}$ )という。
　しかし定義からでは、信用集合は一意に決まらない。たとえば

$\begin{aligned} P\left\{\theta\lt a_{\alpha}|\boldsymbol{x}\right\}&=\displaystyle{\frac{\alpha}{2}},\\ P\left\{\theta\gt b_{\alpha}|\boldsymbol{x}\right\}&=\displaystyle{\frac{\alpha}{2}} \end{aligned}$

を満たすような実数 $a_{\alpha},b_{\alpha}$ を選び、 $C=\left[a_{\alpha},b_{\alpha}\right]$ とする等裾事後信用区間( $\mathrm{equal}$ - $\mathrm{tailed}$ $\mathrm{posterior}$ $\mathrm{credible}$ $\mathrm{intervals}$ )や、その区間内の確率密度が必ず区間外の密度よりも大きくなる最高事後密度区間( $\mathrm{HPD}$ 区間)、すなわち

$\begin{aligned} \kappa_{\alpha}\gt0\ \mathrm{s.t.}\ F\left(\theta\in\left\{\theta|f(\theta|\boldsymbol{x})\geq\kappa_{\alpha}\right\}|\boldsymbol{x}\right)\geq1-\alpha \end{aligned}$

に対して

$\begin{aligned} S_{\alpha}&=\left\{\theta\left|\right.f(\theta|\boldsymbol{x})\geq\kappa_{\alpha}\right\} \end{aligned}$

を満たすような区間 $S_{\alpha}$ を採用することが多い。 $\mathrm{HPD}$ 区間は母数の真の値がそこに含まれる確率密度が低いような区間を含まなくなることから、母数の真の値である蓋然性がより高い区間を含み得るという点で、等裾事後信用区間よりも優れていると言える。

　伝統的な統計学でいう信頼区間に対応するのが等裾事後信用区間である。信頼区間は、ある分析において何度も推定値および信頼区間を計算したときに、推定値がその信頼区間に収まる頻度が $100(1-\alpha)\%$ であることを意味する。これに対して、信用区間は想定した確率モデルの母数がその区間に含まれる確率が $100(1-\alpha)\%$ であることを意味する。

3.8　Bayes統計学における仮説検定

　 $\mathrm{Bayes}$ 統計学における仮説検定 $H_0=\{\theta\in S_0\}$ は、パラメータの事後分布 $p(\theta|\mathcal{D})$ において仮説 $H_i$ が成り立つ事後確率

$\begin{aligned} p_i=P_{p(\theta|\mathcal{D})}\{\theta\in S_i\}=\displaystyle{\int_{S_i}p(\theta|\mathcal{D})d\theta} \end{aligned}$

を評価する。その仮説 $H_i$ の事後確率が $1$ に近ければその仮説は正しいと評価してもよかろう。
　 $\mathrm{Bayes}$ 統計学における2つの仮説に対する仮説検定*1

$\begin{aligned} H_0=\{\theta\in S_0\}\ v.s.\ H_1=\{\theta\in S_1\} \end{aligned}$

では、 $\mathrm{Bayes\ Factor}$ $B_{01}(\mathcal{D})$ を

$\begin{aligned} B_{01}(\mathcal{D}):=\displaystyle{\frac{\displaystyle{\int_{S_0}p(\theta|\mathcal{D})d\theta}}{\displaystyle{\int_{S_1}p(\theta|\mathcal{D})d\theta}}} \div \displaystyle{\frac{\displaystyle{\int_{S_0}p(\theta)d\theta}}{\displaystyle{\int_{S_1}p(\theta)d\theta}}} \end{aligned}$

で定義する。最右辺の除数を事前オッズ比、被除数を事後オッズ比という。事前オッズ比も事後オッズ比も $\mathrm{Bayes\ Factor}$ が小さければ分母の対立仮説 $H_1$ が分子の帰無仮説 $H_0$ よりも正しい確率が大きいことを意味する。したがって

事前オッズ比は事前情報において対立仮説 $H_1$ が帰無仮説 $H_0$ に対してどれだけ蓋然性の高さで優位にあるかを意味する。
事後オッズ比は情報 $\mathcal{D}$ が加わったことを考慮した対立仮説 $H_1$ の帰無仮説 $H_0$ に対する蓋然性の高さの観点での優位性を意味する。

であり、 $\mathrm{Bayes\ Factor}$ はデータ $\mathcal{D}$ が加わったことで対立仮説 $H_1$ の帰無仮説 $H_0$ に対する蓋然性の高さの観点での優位性がどれだけ変わったかを意味する。
　とはいえ $\mathrm{Bayes\ Factor}$ は事前情報にも依存している。事後確率は

$\begin{aligned} \displaystyle{\int_{S_i}p(\theta|\mathcal{D})d\theta}=\displaystyle{\int_{S_i}\frac{p(\mathcal{D}|\theta)p(\theta)}{\displaystyle{\int p(\mathcal{D}|\theta)p(\theta)d\theta}}}d\theta=\displaystyle{\frac{\displaystyle{\int_{S_i}p(\mathcal{D}|\theta)p(\theta)d\theta}}{\displaystyle{\int p(\mathcal{D}|\theta)p(\theta)d\theta}}} \end{aligned}$

であるから、事後オッズ比について

$\begin{aligned} \displaystyle{\frac{\displaystyle{\int_{S_0}p(\theta|\mathcal{D})d\theta}}{\displaystyle{\int_{S_1}p(\theta|\mathcal{D})d\theta}}}=\displaystyle{\frac{\displaystyle{\int_{S_0}p(\mathcal{D}|\theta)p(\theta)d\theta}}{\displaystyle{\int_{S_1}p(\mathcal{D}|\theta)p(\theta)d\theta}}} \end{aligned}$

が成り立つ。したがって

$\begin{aligned} B_{01}(\mathcal{D})&=\displaystyle{\frac{\displaystyle{\int_{S_0}p(\mathcal{D}|\theta)p(\theta)d\theta}}{\displaystyle{\int_{S_1}p(\mathcal{D}|\theta)p(\theta)d\theta}}}\div \displaystyle{\frac{\displaystyle{\int_{S_0}p(\theta)d\theta}}{\displaystyle{\int_{S_1}p(\theta)d\theta}}}\\ &=\displaystyle{\frac{\displaystyle{\int_{S_0}p(\mathcal{D}|\theta)\left\{\displaystyle{\frac{p(\theta)}{\displaystyle{\int_{S_0}p(\theta)d\theta}}}\right\}d\theta}}{\displaystyle{\int_{S_1}p(\mathcal{D}|\theta)\left\{\displaystyle{\frac{p(\theta)}{\displaystyle{\int_{S_1}p(\theta)d\theta}}}\right\}d\theta}}}\\ &=\displaystyle{\frac{\displaystyle{\int_{S_0}p(\mathcal{D}|\theta)p(\theta|\theta\in S_0)d\theta}}{\displaystyle{\int_{S_1}p(\mathcal{D}|\theta)p(\theta|\theta\in S_1)d\theta}}} \end{aligned}$

であり、このように $\mathrm{Bayes\ Factor}$ の評価には事前分布 $p(\theta)$ が加味されている。

3.9　将来予測と意思決定

　 $\mathrm{Bayes}$ 分析における予測の方法を考える。すなわちデータ $\mathcal{D}$ を観測して事後分布が分かっているとして、未観測の実現値を $\tilde{x}$ とすれば、その分布は未知の値である $\tilde{x}$ のデータ $\mathcal{D}$ が観測されたという条件の下での条件付確率分布

$\begin{aligned} p(\tilde{x}|\mathcal{D})=\displaystyle{\frac{p(\tilde{x},\mathcal{D})}{p(\mathcal{D})}} \end{aligned}$

を考え、これを予測分布という。ここで

$\begin{aligned} p(\tilde{x},\mathcal{D})=\displaystyle{\int p(\tilde{x},\mathcal{D}|\theta)p(\theta)}d\theta,\ p(\mathcal{D})=\displaystyle{\int p(\mathcal{D}|\theta)p(\theta)}d\theta \end{aligned}$

であるから、

$\begin{aligned} p(\tilde{x}|\mathcal{D})=\displaystyle{\frac{\displaystyle{\int p(\tilde{x},\mathcal{D}|\theta)p(\theta)}d\theta}{\displaystyle{\int p(\mathcal{D}|\theta)p(\theta)}d\theta}} \end{aligned}$

が成り立つ。
　実際には、将来データの予測値を得るべく上記の予測分布を用いたり、事前分布の観測データに対する適合度を得るべく周辺尤度

$\begin{aligned} P\left(\tilde{x}\right)=\displaystyle{\int p\left(\tilde{x}\left|\right.\theta\right)f(\theta)d}\theta \end{aligned}$

を利用したりする。

3.10　Pythonでの実装例

　中妻(2019)　PP. 37-40を基にした。

##################
### ベイズ推論 ###
##################

import numpy as np
import scipy.stats as st
import scipy.optimize as opt
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
import sys

FontPath = r'C:\\Windows\Fonts\meiryo.ttc'

jpfont = FontProperties(fname = FontPath)

###
### Bernoulli分布のHPD区間の計算

def beta_hpdi(ci0, alpha, beta, prob):
    def hpdi_conditions(v, a, b, p):
        eq1 = st.beta.cdf(v[1], a, b) - st.beta.cdf(v[0], a, b) - p
        eq2 = st.beta.cdf(v[1], a, b) - st.beta.cdf(v[0], a, b)
        return np.hstack((eq1, eq2))
    return opt.root(hpdi_conditions, ci0, args = (alpha, beta, prob)).x
def bernoulli_stats(data, a0, b0, prob):
    n = data.size
    sum_data = data.sum()
    a = sum_data + a0
    b = n - sum_data + b0
    mean_pi = st.beta.mean(a, b)
    median_pi = st.beta.median(a, b)
    mode_pi = (a - 1.0) / (a + b - 2.0)
    sd_pi = st.beta.std(a, b)
    ci_pi = st.beta.interval(prob, a, b)
    hpdi_pi = beta_hpdi(ci_pi, a, b, prob)
    stats = np.hstack((mean_pi, median_pi, mode_pi, sd_pi, ci_pi, hpdi_pi))
    stats = stats.reshape((1,8))
    stats_string = ['平均', '中央値','最頻値','標準偏差','信用区間(下限)','信用区間(上限)',
                    'HPD区間(下限)', 'HPD区間(上限)']
    param_string = ['成功確率']
    results = pd.DataFrame(stats, index = param_string, columns = stats_string)
    return results, a, b

###

p = 0.25
n = 50
np.random.seed(99)
data = st.bernoulli.rvs(p, size = n)

a0 = 1.0
b0 = 1.0
prob = 0.95

results, a, b = bernoulli_stats(data, a0, b0, prob)
print(results)
print(results.to_string(float_format = '{:,.4f}'.format))

# 事後分布のグラフ
fig1 = plt.figure(num = 1, facecolor = 'w')
q = np.linspace(0, 1, 250)
plt.plot(q, st.beta.pdf(q, a, b), 'k-', label = '事後分布')
plt.plot(q, st.beta.pdf(q, a0, b0), 'k:', label = '事前分布')
plt.xlim(0, 1)
plt.ylim(0, 7)
plt.xlabel('成功確率', fontproperties = jpfont)
plt.ylabel('確率密度', fontproperties = jpfont)
plt.legend(loc = 'best', frameon = False, prop = jpfont)
plt.show()

# 事前分布とデータの累積が事後分布の形状に与える影響の可視化
np.random.seed(99)
data = st.bernoulli.rvs(p, size = 250)
value_size = np.array([10, 50, 250])
value_a0 = np.array([1.0, 6.0])

value_b0 = np.array([1.0, 4.0])
styles = [':', '-.', '--', '-']
fig2, ax2 = plt.subplots(1, 2, sharey = 'all', sharex = 'all', num = 2, figsize = (8, 4), facecolor = 'w')

ax2[0].set_xlim(0, 1)
ax2[0].set_ylim(0, 15.5)
ax2[0].set_ylabel('確率密度', fontproperties = jpfont)

for index in range(2):
    style_index = 0
    a0_i = value_a0[index]
    b0_i = value_b0[index]
    ax2[index].plot(q, st.beta.pdf(q, a0_i, b0_i), color = 'k',
                    linestyle = styles[style_index],
                    label = '事前分布 Beta({0: < 3.1f}, {1:<3.1f})'.format(a0_i, b0_i))
    for n_j in value_size:
        style_index += 1
        sum_data = np.sum(data[:n_j])
        a_j = sum_data + a0_i
        b_j = n_j - sum_data + b0_i
        ax2[index].plot(q, st.beta.pdf(q, a_j, b_j), color = 'k',
                        linestyle = styles[style_index],
                        label = '事後分布 ( n = {0:3d} )'.format(n_j))
    ax2[index].set_xlabel('成功確率', fontproperties = jpfont)
    ax2[index].legend(loc = 'best', frameon = False, prop = jpfont)
plt.tight_layout()
plt.show()

参考文献

安道知寛(2010)「ベイズ統計モデリング」(朝倉書店)
鎌谷研吾・著　駒木文保・編(2020)「モンテカルロ統計計算」(講談社サイエンティフィック)
豊田秀樹・編著(2015)「基礎からのベイズ統計学」(朝倉書店)
中妻照雄(2007)「入門　ベイズ統計学」(朝倉書店)
中妻照雄(2013)「実践　ベイズ統計学」(朝倉書店)
中妻照雄(2019)「実践 Pythonライブラリー Pythonによるベイズ統計学入門」(朝倉書店)
馬場真哉(2019)「RとStanではじめるベイズ統計モデリングによるデータ分析入門」(講談社サイエンティフィック)
松浦健太郎(2016)「StanとRでベイズ統計モデリング」(共立出版)
渡辺澄夫(2012)「ベイズ統計の理論と方法」(コロナ社)
Andrew Gelman, John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Donald Rubin (2014) "Bayesian Data Analysis", CRC Press

*1: $\ S_0\cup S_1=\emptyset$ と仮定する。

はじめに

前回

今回のまとめ

3. Bayes推定

3.6 Bayes推定の実例：Bernoulli試行

3.6.1 逐次的にデータを活用した場合

3.6.2 一括でデータを活用した場合

3.7 信用集合

3.8 Bayes統計学における仮説検定

3.9 将来予測と意思決定

3.10 Pythonでの実装例