
Or, One Way I Learned to Quit Believing My Prejudices

In my last article on this topic, I showed that considering background information can play a significant role in helping us make less biased judgments. What I hope to show now is that while we learn by updating the information we have through experience, limited experience can often lead to prejudices in the way we interpret the world, and that even broad and deep experience should rarely lead us to certain conclusions.

To get started, imagine playing a game in which someone asks you to infer the number of sides of a die based on the face numbers that show up in repeated throws of the die. The only information you are given beforehand is that the actual die will be selected from a set of seven dice having these numbers of faces: (4, 6, 8, 10, 12, 15, 18). Assuming you can trust the person who reports the outcome of each throw, after how many rolls of the die will you be willing to specify which die was chosen?

Let’s use the R programming language to help us think through the problem. Start by specifying the set of die possibilities, where each number represents the number of sides of a given die. (You might also want to refer to my previous article on Bayesian analysis to familiarize yourself with some of the terminology that follows.)

[Note: If you’re not into programming, you can skim past the crunchy geeky bits and still get the flavorful creamy nougat inside. But you have to practice self control to get the delayed satisfaction beneath that layer. You can do it!]

# Load the packages used throughout: knitr/kableExtra for tables,
# reshape2 for melt(), ggplot2 for plots, magrittr for the %>% pipe.
library(knitr)
library(kableExtra)
library(reshape2)
library(ggplot2)
library(magrittr)

die <- c(4, 6, 8, 10, 12, 15, 18)
n.die <- length(die)

The game master selects a die to be in play but keeps the choice hidden from us. As I’ll explain in more detail later, the game master is the aggregate market or nature, and by “in play,” I mean that some condition that we currently care about in that environment is the case.

die.choice <- 12

Given that we are tasked to infer which die is chosen, and we don’t know anything other than the number of dice in the set (before we observe the outcome of any toss), any initial guess is as good as any other. Furthermore, we can describe our inclination to believe one situation is the case to the exclusion of any others with a number between 0 and 1, where 0 means that we don’t believe at all, and 1 means that we certainly believe with no reservations. We’ll call this initial guess our prior probability (or our initial degree of belief) that any one of the dice is in play. What number should we assign to each of the possibilities? Well, since we know that any guess is as good as any other, that the possibilities are mutually exclusive, and that one of the possibilities we’ve identified will certainly be the case, a little thought should tell us that 1/(number of dice) should be the prior probability we assign to each possibility…at least initially. (Some really smart mathematicians described this as the principle of insufficient reason.) As we proceed, we’ll discuss how to update these probabilities as we aggregate the outcomes of the tosses of the die. In R, we apply this probability across a vector with a size equal to the set of die possibilities.

initial.prob <- 1/n.die
initial.probs <- rep(initial.prob, n.die)
[1] 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571
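Because the possibilities are mutually exclusive and exhaustive, these priors should form a proper probability distribution; a quick check (mine, as a sketch) confirms they sum to 1, up to floating-point rounding:

```r
# Equal priors over 7 mutually exclusive, exhaustive possibilities
# must sum to 1 (allowing for floating-point rounding).
die <- c(4, 6, 8, 10, 12, 15, 18)
initial.probs <- rep(1/length(die), length(die))
sum(initial.probs)
```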

Before we start the process of observing the die outcomes as they are tossed, create an index to keep track of those trials as they occur. Let’s start with 20 trials.

n.trials <- 20
trials <- 0:n.trials

Initialize the trials outcome vector. The first trial outcome (i.e., trial=0) will be NA because we don’t conduct a trial until trial=1.

trial.outcomes <- rep(0, n.trials+1)
trial.outcomes[1] <- NA

Establish the probability that any face of a given die occurs, assuming that each die is fair. For any die that could be in play, the probability that any of its sides occurs is 1/(number of sides). This is another application of the principle of insufficient reason.

outcome.prob <- 1/die

We will keep track of the three key probabilities in three separate arrays as the trials reveal their outcomes. We initialize each array here. The first two arrays are the same size, with shape (number of trials + 1) x (number of possibilities); the extra row holds the starting values at trial=0. The third is a vector of length (number of trials + 1).

  1. This array corresponds to the probability typically denoted as prob(evidence | hypothesis) or our “likelihood” function. For our exercise, we interpret this as the probability that we observe a revealed outcome given that the choice of a certain die is in play. The first values at trial=0 are our prior probabilities.
prob.outcome.gv.choice <-
  array(rep(0, (n.trials+1) * n.die),
        dim = c((n.trials+1), n.die))
prob.outcome.gv.choice[1, ] <- initial.probs
colnames(prob.outcome.gv.choice) <- die
  2. This array corresponds to the probability typically denoted as prob(hypothesis | evidence) or our posterior probability. For our exercise, we interpret this as the degree of belief that a certain die is in play given the cumulative observations of outcomes.
prob.choice.gv.outcome <- prob.outcome.gv.choice
  3. This vector corresponds to the sum of probabilities of outcomes on each trial. In each trial, it’s the normalizing constant. The first value at trial=0 is 1 since it is the sum of the initial prior probabilities or (number of possibilities) * (1/(number of possibilities)).
marginal.prob <- sapply(trials, function(t) sum(prob.outcome.gv.choice[t+1, ]))

Before we start our simulated experiment, let’s think a bit more about what each of those three arrays means. The first array tells us how likely it is that we should observe a given outcome if a die of a certain size is in play. If the die is four-sided, the likelihood that we would observe a 1, 2, 3, or 4 is 1/4; likewise, for a 12-sided die, any one of the faces would have a likelihood of 1/12. The second array keeps track of our degrees of belief that one of the die types was selected for play. The first set of values in this array will be our prior probabilities, which are all equal, but they will change as we observe the outcomes of the tosses. Hopefully, they will change such that our confidence grows in one die type being the case more so than the other possibilities. Finally, the third vector is the sum of the likelihoods (weighted by our current degrees of belief) for whatever outcome occurs. For example, if a 9 appears on a toss, the likelihood that we would observe a 9 under each of the possible generating dice is…

df.likelihood <- data.frame(die, ifelse(9 > die, 0, outcome.prob))
colnames(df.likelihood) <- c("Die Sides", "Likelihood Given 9")
df.likelihood %>%
  kable() %>%
  kable_styling(bootstrap_options = "striped",
                full_width = F) %>%
  column_spec(1, bold = T, border_right = T, width = "2.25cm") %>%
  column_spec(2, width = "3.75cm")
Die Sides Likelihood Given 9
4 0.0000000
6 0.0000000
8 0.0000000
10 0.1000000
12 0.0833333
15 0.0666667
18 0.0555556

The sum of those values in the right column, approximately 0.306, would be the normalizing constant for the trial on which we might observe that 9.

Run the simulation by stepping across the trials index, starting at the second position (i.e., trial=1) and skipping the initial position (i.e., trial=0). The term trials[-1] is the trials index with the initial position removed. As we step through each trial, we will update our posterior probability as

posterior = previous_posterior * outcome_likelihoods / sum(previous_posterior * outcome_likelihoods)
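That update rule can also be sketched as a small standalone function, applied here to the example outcome of 9 from the table above (the name bayes.update is mine, not part of the simulation code that follows):

```r
# A minimal sketch of one Bayesian update step.
bayes.update <- function(posterior, outcome, die) {
  # A die too small to have produced the outcome gets likelihood 0;
  # otherwise each face of an s-sided die has likelihood 1/s.
  likelihood <- ifelse(outcome > die, 0, 1/die)
  joint <- posterior * likelihood
  joint / sum(joint)   # normalize by the marginal probability
}

die <- c(4, 6, 8, 10, 12, 15, 18)
prior <- rep(1/length(die), length(die))
round(bayes.update(prior, 9, die), 3)
# With equal priors, a single observed 9 shifts belief to roughly
# 0.000 0.000 0.000 0.327 0.273 0.218 0.182
```

Notice that one roll of 9 already rules out the 4-, 6-, and 8-sided dice entirely and favors the 10-sided die over the larger ones.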

for (t in trials[-1]) {
  # Generate a new outcome for each trial based on the die in play.
  # sample() draws each face uniformly; round(runif()) would under-weight
  # the two end faces.
  trial.outcomes[t+1] <- sample(die.choice, 1)
  # Calculate the joint probability (previous posterior times likelihood)
  # for each possibility, zeroing out dice too small to show the outcome.
  prob.outcome.gv.choice[t+1, ] <-
    (1 - (trial.outcomes[t+1] > die)) *
    prob.choice.gv.outcome[t, ] * outcome.prob
  # Calculate the marginal probability (the normalizing constant) in each trial.
  marginal.prob[t+1] <- sum(prob.outcome.gv.choice[t+1, ])
  # Calculate the updated posterior for each possibility on each trial.
  prob.choice.gv.outcome[t+1, ] <- prob.outcome.gv.choice[t+1, ] /
    marginal.prob[t+1]
}

Display the trials and their associated observed outcomes in a table.

df.trial.outcomes <- data.frame(trials, trial.outcomes)
colnames(df.trial.outcomes) <- c("trial", "outcome")
df.trial.outcomes %>%
  kable(caption = "Table of trials and the associated observed outcome on each.") %>%
  kable_styling(bootstrap_options = "striped",
                full_width = F) %>%
  column_spec(1, bold = T, border_right = T) %>%
  column_spec(1:2, width = "3cm")
Table of trials and the associated observed outcome on each.
trial outcome
0 NA
1 6
2 5
3 5
4 6
5 4
6 4
7 1
8 5
9 9
10 3
11 10
12 10
13 4
14 6
15 5
16 11
17 4
18 4
19 9
20 3
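A side observation, sketched with the outcomes tabulated above: because an 11 appears at trial 16, every die with fewer than 11 sides is ruled out entirely, leaving only three live candidates.

```r
# Which dice remain logically consistent with the observed outcomes?
# Any die with fewer sides than the largest observed face is impossible.
outcomes <- c(6, 5, 5, 6, 4, 4, 1, 5, 9, 3, 10, 10, 4, 6, 5, 11, 4, 4, 9, 3)
die <- c(4, 6, 8, 10, 12, 15, 18)
die[die >= max(outcomes)]
# Only the 12-, 15-, and 18-sided dice can still be in play.
```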

Now, redefine the prob.choice.gv.outcome array as a data frame for plotting with ggplot2.

df.prob.choice.gv.outcome <- data.frame(prob.choice.gv.outcome)
colnames(df.prob.choice.gv.outcome) <- die

Transform the data frame df.prob.choice.gv.outcome into a data frame with relational structure such that trial is the leftmost column, die possibility is the next column, and the probability values are the rightmost column.

melt.prob.choice.gv.outcome <- melt(cbind(trials, df.prob.choice.gv.outcome),
                                    id.vars = "trials")

Plot the probabilities of the die possibilities as they evolve across the trials.

plot.prob.trials <-
  ggplot(data = melt.prob.choice.gv.outcome,
                    aes(x = trials,
                        y = value,
                        group = variable,
                        colour = variable))
plot.prob.trials <- plot.prob.trials + geom_line()
plot.prob.trials <- plot.prob.trials + geom_point()
plot.prob.trials <- plot.prob.trials + theme(axis.text = element_text(size=14),
                                             axis.title = element_text(size=14,face="bold"),
                                             title = element_text(size=16,face="bold"))
plot.prob.trials <- plot.prob.trials + labs(title = "Probability That a Given Die is in Play",
                                            x = "Trials",
                                            y = "Probability")
plot.prob.trials