Monty Hall Paradox

Bayesian and Frequentist Solutions to Monty Hall's Problem

When defining probability, there are generally two schools of thought: Bayesian and frequentist. The former regards probability as our degree of belief that an event will occur, while the latter regards it as the relative frequency with which an event occurs. This post presents the use of Bayesian and frequentist methods to solve the famous Monty Hall problem.

Illustration of Monty Hall Problem

Monty Hall Problem

The first time I stumbled upon this probability puzzle was in the movie 21. This puzzle originated from an old American game show, Let's Make a Deal, and was named after its host, Monty Hall.

According to Wikipedia, the problem became widely known through a letter quoted in Marilyn vos Savant's "Ask Marilyn" column in 1990:

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

Marilyn's answer was that you should switch, as doing so raises your chances of winning the car from 1/3 to 2/3. This answer was heavily criticized because intuition suggests that, with only two doors left, each has a 1/2 probability of hiding the car, so switching should make no difference.

In this blog post, I will demonstrate how to resolve this paradox using both a Bayesian and a frequentist approach.

Think Like a Bayesian

Bayesianism takes a subjective approach to probability and is based on Bayes' theorem, a statement about conditional probability defined by the following formula.

P(A|B) = P(B|A) × P(A) / P(B)

Bayes' theorem

In short, the probability of a hypothesis A given the evidence B, also known as the posterior probability P(A|B), is derived from three components: the likelihood P(B|A), the prior probability P(A), and the probability of observing the evidence P(B).

Bayes' theorem also helps explain why different people weigh similar empirical evidence differently: they start from different priors.

In the context of the Monty Hall problem, we want to compare the probability of winning by switching with the probability of winning by sticking to our originally chosen door, given that the host has opened a door with a goat behind it.

Using the example question posed above, assuming door 3 has been revealed as a goat, the car will be behind door 1 (our initial choice) or door 2 (if we choose to switch).

To solve this, let's apply Bayes' theorem and compare

P(door 1 = car | open = door 3)

P(door 2 = car | open = door 3)

Application of Bayes' theorem to the Monty Hall problem

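Prior probability P(A)

Let's start with the simplest probability to calculate, the prior probability. This is the initial probability, at the start of the game and before door 3 is opened, that the car is behind door 1 or door 2.

Assuming the car is placed at random, each door has an equal chance of hiding the car. Therefore, P(door 1 = car) and P(door 2 = car) are both equal to 1/3.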

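Likelihood P(B|A)

Next, we compute the likelihood that the host opens door 3 under each hypothesis.

For the first hypothesis, if the car is behind door 1, the host can open either door 2 or door 3 to reveal a goat. Assuming he picks between them with equal probability:

P(open = door 3 | door 1 = car) = 1/2

On the other hand, if the car is behind door 2, the host has no choice but to open door 3, because it is the only remaining door with a goat behind it. Therefore, for the second hypothesis,

P(open = door 3 | door 2 = car) = 1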

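Joint probability P(B|A) × P(A)

With the likelihoods and prior probabilities known, we can compute the numerator of the formula for both hypotheses.

P(open = door 3 | door 1 = car) × P(door 1 = car) = 1/2 × 1/3 = 1/6

P(open = door 3 | door 2 = car) × P(door 2 = car) = 1 × 1/3 = 1/3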

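Probability of the observed evidence P(B)

We obtain the probability of the observed evidence simply by adding up the joint probabilities. This value is the probability that the host opens door 3 given that the contestant chose door 1. The third hypothesis, that the car is behind door 3, contributes nothing to the sum because the host never reveals the car.

P(open = door 3) = 1/6 + 1/3 = 1/2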

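Posterior probability P(A|B)

Finally, we solve for the posterior probabilities by plugging all of the values derived above into the formula.

P(door 1 = car | open = door 3) = (1/6) / (1/2) = 1/3

P(door 2 = car | open = door 3) = (1/3) / (1/2) = 2/3

The posterior probabilities tell us that, once the host has revealed a goat behind door 3, sticking with door 1 gives a 1/3 chance of winning the car. In contrast, if we accept the offer to switch to door 2, our chance doubles to 2/3.

Thus, the Bayesian approach supports Marilyn vos Savant's advice to always switch when given the choice.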
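The derivation above can be reproduced with exact fractions. The following is a minimal sketch, not part of the original analysis: the dictionary names are illustrative, and it assumes the host picks uniformly at random between two goat doors when he has a choice. It also includes the third hypothesis (car behind door 3), whose likelihood is zero because the host never reveals the car.

```python
from fractions import Fraction

# Scenario from the text: the contestant picked door 1, the host opened door 3.
# Each key is a hypothesis about which door hides the car.
priors = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Likelihood that the host opens door 3 under each hypothesis.
likelihoods = {
    1: Fraction(1, 2),  # car behind door 1: host picks door 2 or 3 (assumed uniform)
    2: Fraction(1, 1),  # car behind door 2: door 3 is the only goat door left
    3: Fraction(0, 1),  # car behind door 3: the host never reveals the car
}

# Numerators of Bayes' theorem, P(B|A) * P(A), for every hypothesis.
joint = {door: likelihoods[door] * priors[door] for door in priors}

# Probability of the evidence, P(B), is the sum of the joint probabilities.
evidence = sum(joint.values())

# Posterior P(A|B) for every hypothesis.
posteriors = {door: joint[door] / evidence for door in priors}

for door in sorted(posteriors):
    print(f"P(door {door} = car | open = door 3) = {posteriors[door]}")
```

Using `Fraction` keeps the arithmetic exact, so the printed posteriors can be compared directly with the hand-computed values of 1/3 and 2/3.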

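Think Like a Frequentist

In contrast to the Bayesian approach, and as the name suggests, frequentists determine probability from the frequency of outcomes in repeated sampling. For example, if we want to know the probability of getting "heads" in a coin toss, we can toss the coin x times and determine the probability from the relative frequency with which "heads" occurs.

As Bernoulli's law of large numbers states, the long-run frequency of an event converges to its theoretical probability. Therefore, to solve the Monty Hall problem, we can simulate the game show puzzle over a large number of trials and compare how often the car is won when sticking with the initial choice versus switching.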
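The coin-toss example gives a quick way to see the law of large numbers at work. The sketch below is illustrative only: the function name `heads_frequency` and the seed value are assumptions, not part of the original project. It prints the relative frequency of heads for increasingly long runs of a simulated fair coin.

```python
import random


def heads_frequency(n_tosses, seed=0):
    """Return the relative frequency of heads in n_tosses simulated fair tosses."""
    rng = random.Random(seed)  # local generator; the seed value is arbitrary
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The frequency should drift toward the theoretical probability of 0.5
# as the number of tosses grows.
for n in (10, 100, 10_000, 1_000_000):
    print(n, heads_frequency(n))
```

Short runs can land far from 0.5, while long runs tend to stay close to it, which is the behavior the Monty Hall simulation relies on.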

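For this project, I ran a Monte Carlo simulation in Python. The Monty Hall game was simulated over a large number of repetitions, and the winning odds of each strategy were recorded. The only package the simulation requires is the random package. First, I created three empty lists to store the results of each sample simulation.

# Initialize empty lists to store the simulation results

# Winning percentage if we abandon the initial choice and switch doors
chance_of_winning_ifswitch_list = []

# Winning percentage if we stick with the initially chosen door
chance_of_winning_ifdonotswitch_list = []

# Difference in winning percentage between the two strategies
percentage_diff_list = []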

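Next, the simulation is set up using for-loops.

In each simulated round, the positions of the car and the goats are randomly assigned, and a random door is selected as the contestant's initial choice. Given that the host will always open another door with a goat behind it, the decision to switch backfires only when the initially chosen door happens to hide the car.

In other words, if we choose not to switch, the probability of winning equals the probability of having picked the door with the car in the first place. Therefore, an if-else conditional statement is included to check whether the initially chosen door has the car.

If the chosen door does not have the car behind it, we record a value of 1 to denote a win for the switching strategy. If the chosen door does have the car behind it, we record a value of 0, which denotes a loss for switching and therefore a win for not switching.

To determine the winning frequency of each strategy, the winning odds are computed from a sample of 1,000 simulated rounds of the Monty Hall game.

This sampling is repeated 10,000 times to obtain the distribution of winning percentages for each strategy. A random seed is set to ensure that the results are reproducible.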

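# Repeat 10000 trials with 1000 samples per trial

for trial in range(10000):
    results_list = []

    for _ in range(1000):
        # Randomly place the car and the two goats behind the three doors
        door_list = ["car", "goat", "goat"]
        random.shuffle(door_list)
        # Randomly pick the contestant's initial door
        chosen_door_number = random.sample(range(3), 1)
        chosen_door = door_list[chosen_door_number[0]]

        # Switching wins (1) exactly when the initial pick is a goat
        if chosen_door != "car":
            results_list.append(1)
        else:
            results_list.append(0)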
    
    # Compute winning percentage if choose to switch  
    chance_of_winning_ifswitch = sum(results_list)/len(results_list)*100              
    
    # Compute winning percentage if choose not to switch
    chance_of_winning_ifdonotswitch = 100 - chance_of_winning_ifswitch                    
    
    # Compute difference in winning percentage between the two strategies             
    percentage_diff = chance_of_winning_ifswitch - chance_of_winning_ifdonotswitch  
             
    # Append the results to respective lists                
    chance_of_winning_ifswitch_list.append(chance_of_winning_ifswitch)                             
    chance_of_winning_ifdonotswitch_list.append(chance_of_winning_ifdonotswitch)    
    percentage_diff_list.append(percentage_diff)
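The loop above relies on a shortcut: it never simulates the host, because switching wins exactly when the first pick is a goat. As a cross-check, here is a minimal sketch that models the host's move explicitly. The names `play_round` and `win_rate`, the number of rounds, and the seed are illustrative assumptions, not part of the original code.

```python
import random


def play_round(rng, switch):
    """Play one Monty Hall round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    first_pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    host_options = [d for d in doors if d != first_pick and d != car]
    opened = rng.choice(host_options)
    if switch:
        # Switch to the single door that is neither picked nor opened.
        final_pick = next(d for d in doors if d != first_pick and d != opened)
    else:
        final_pick = first_pick
    return final_pick == car


def win_rate(switch, n_rounds=100_000, seed=0):
    """Estimate the winning frequency of a strategy over n_rounds rounds."""
    rng = random.Random(seed)
    wins = sum(play_round(rng, switch) for _ in range(n_rounds))
    return wins / n_rounds


print("switch:", win_rate(switch=True))
print("stay:  ", win_rate(switch=False))
```

If the shortcut is valid, the two estimates should come out close to 2/3 and 1/3 respectively.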

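The figure below illustrates the distributions of winning odds for the two strategies based on our simulation results. The blue histogram shows the distribution for the switching strategy, while the yellow histogram shows the distribution for sticking with the initial choice.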

According to the central limit theorem, when the sample size is large, the distribution of the sample mean approximates a Gaussian (normal) distribution. As expected from our extensive simulations, we observed the bell-shaped curve that characterizes a normal distribution.

The x-axes of the two histograms clearly show the difference in winning odds between the two strategies.

Across the simulated trials, the winning rate ranges from roughly 62% to 72% if you choose to switch, and from roughly 28% to 38% if you choose not to switch.

This is further supported by the 95% confidence intervals calculated for the two strategies. Across the 10,000 trial samples, the 95% confidence interval for the mean winning frequency is 66.646% to 66.705% if you switch, and 33.295% to 33.354% if you stick with the same door.
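As a rough sanity check on these figures, the expected spread can be worked out from the binomial distribution. This is a back-of-the-envelope sketch under a normal approximation, not the method used for the reported intervals, and the variable names are illustrative.

```python
import math

p_switch = 2 / 3          # theoretical win probability when switching
rounds_per_trial = 1000   # simulated rounds in each trial
n_trials = 10000          # number of trials

# Standard deviation of a single trial's winning percentage
# (standard error of a binomial proportion, in percentage points).
sd_trial = 100 * math.sqrt(p_switch * (1 - p_switch) / rounds_per_trial)

# Approximate 95% half-width for the mean over all trials.
half_width = 1.96 * sd_trial / math.sqrt(n_trials)

print(f"sd of a single trial: {sd_trial:.2f} percentage points")
print(f"95% CI half-width of the mean: {half_width:.3f} percentage points")
```

The single-trial standard deviation comes out near 1.49 percentage points, so a histogram spanning about 62% to 72% is what 10,000 trials should produce. The half-width of about 0.029 percentage points agrees with the width of the reported interval.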

Since the 95% confidence intervals do not overlap, we can infer that there is a statistically significant difference in the win rates between the two strategies.

In fact, as shown in the third histogram in orange, we are 95% confident that choosing to switch increases our odds of winning by 33.293 to 33.410 percentage points. Thus, the frequentist approach also supports Marilyn vos Savant's advice to always switch if you have the choice.

Conclusion

This blog post demonstrates the use of Bayesian and frequentist approaches to solve the Monty Hall problem. With the examples above, we can see different ways of deriving probability values.

Although the two methods are different, they arrive at nearly the same probabilities for the Monty Hall problem. The results clearly show that whether you subscribe to the Bayesian or the frequentist paradigm, switching doors is always the smarter choice for increasing your chances of winning the car.

Simulation, analysis, and visualization are all performed using Python.

# Import required packages

import random
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

####----Create Monty Hall Game Simulation----####

# Create empty lists to store simulation results output
chance_of_winning_ifswitch_list = []
chance_of_winning_ifdonotswitch_list = []
percentage_diff_list = []

# Set a seed value for reproducibility
random.seed(1234)

# Create simulation using for-loops
# Repeat 10000 trials with 1000 samples per trial
for trial in range(10000):
    results_list = []
    
    for _ in range(1000):
        door_list = ["car", "goat", "goat"]
        random.shuffle(door_list)
    
        chosen_door_number = random.sample(range(3), 1)
        chosen_door = door_list[chosen_door_number[0]]
    
        if chosen_door != "car":
            results_list.append(1)
        else:
            results_list.append(0)
    
  
    chance_of_winning_ifswitch = sum(results_list)/len(results_list)*100              # Compute winning percentage if choose to switch
    chance_of_winning_ifdonotswitch = 100 - chance_of_winning_ifswitch                # Compute winning percentage if choose not to switch
    percentage_diff = chance_of_winning_ifswitch - chance_of_winning_ifdonotswitch    # Compute difference in winning percentage between the two strategies
    
    # Append the results to respective lists
    chance_of_winning_ifswitch_list.append(chance_of_winning_ifswitch)                
    chance_of_winning_ifdonotswitch_list.append(chance_of_winning_ifdonotswitch)
    percentage_diff_list.append(percentage_diff)


####----Analysis of the simulation results----####

# Create a function to calculate 95% confidence intervals
def mean_confidence_interval(data, confidence=0.95):
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n-1)
    return m, m-h, m+h

# Compute 95% confidence intervals of chance of winning if choose to switch
print(mean_confidence_interval(chance_of_winning_ifswitch_list))

# Compute 95% confidence intervals of chance of winning if choose not to switch
print(mean_confidence_interval(chance_of_winning_ifdonotswitch_list))

# Compute 95% confidence intervals of difference in chances of winning between the two strategies
print(mean_confidence_interval(percentage_diff_list))

####----Simple visualisation of the simulation results----####

fig = plt.figure(figsize = (15,8))
ax1, ax2, ax3 = fig.subplots(3, 1)

# Title and caption of plot
fig.suptitle("Comparison of winning odds between strategies", fontsize = 20, fontweight = "bold")
fig.text(.99, .010, "Visualisation:@nxrunning", ha='right', fontsize = 15)

# Subplot number 1
ax1.set_title("Distribution of winning percentage if choose to switch", loc = "left", fontweight = "bold")
ax1.hist(chance_of_winning_ifswitch_list, alpha = 1, color = "#023047", edgecolor = "white", bins = (30))
ax1.set_ylabel("Frequency")
ax1.set_xlabel("Odds of winning (%)", fontsize = 13)
ax1.annotate("95% Confidence Interval:\n[66.646, 66.705]", xy = (70, 500), fontsize = 13)

# Subplot number 2
ax2.set_title("Distribution of winning percentage if choose not to switch", loc = "left", fontweight = "bold")
ax2.hist(chance_of_winning_ifdonotswitch_list, alpha = 1, color = "#ffb703", edgecolor = "white", bins = (30))
ax2.set_xlabel("Odds of winning (%)", fontsize = 13)
ax2.set_ylabel("Frequency")
ax2.annotate("95% Confidence Interval:\n[33.295, 33.354]", xy = (36.7, 500), fontsize = 13)

# Subplot number 3
ax3.set_title("Distribution of differences in odds of winning between the two strategies", loc = "left", fontweight = "bold")
ax3.hist(percentage_diff_list, alpha = 1, color = "#fb8500", edgecolor = "white", bins = (30))
ax3.set_ylabel("Frequency")
ax3.set_xlabel("Odds of winning (%)", fontsize = 13)
ax3.annotate("95% Confidence Interval:\n[33.293, 33.410]", xy = (40, 500), fontsize = 13)
fig.tight_layout()

# Removing top and right borders
ax1.spines['top'].set_visible(False)
ax1.spines['right'].set_visible(False)
ax2.spines['top'].set_visible(False)
ax2.spines['right'].set_visible(False)
ax3.spines['top'].set_visible(False)
ax3.spines['right'].set_visible(False)

# Adjust space between title and subplots
plt.subplots_adjust(top=0.90)

# Save plot
fig.savefig('Montyhallproblemsimulation.png', dpi=500)

The full code can be found here.



Origin blog.csdn.net/qq_40523298/article/details/130451929