Iterative Prisoner’s Dilemma

The Prisoner’s Dilemma is a hypothetical game theory situation.

Two criminals are caught by the authorities and questioned separately. They each must choose whether to snitch on their partner in crime. Each person may either choose to collude or betray.

If both partners choose to collude, that means both refuse to talk,  and both are released with no penalties.

If one partner betrays and the other colludes, the betrayer receives a reward, while the colluding partner is severely punished.

If both partners attempt to betray the other, both partners are punished.

When it’s setup as a game, point values are assigned for each possible outcome. In the Computer Science Principles version, the point values are as follows:

Prisoner’s Dilemma
possible outcomes
Your
move
Opp’s
move
Your
score
Opp’s
score
Both players collude ‘c’ ‘c’ 0 0
Both players betray ‘b’ ‘b’ -225 -225
 You betray opponent ‘b’ ‘c’  +50 -250
Opponent betrays you  ‘c’ ‘b’  -250 +50

Iterative Prisoner’s Dilemma

The Prisoner’s Dilemma game is pretty boring when you only play the game once, because the correct move is always to betray your opponent. Whether your opponent betrays or colludes, you always get a better score if you betray than if you collude.

When you have to play the game many times in a row with the ability to see one or more rounds of game history, things are much different. Now your opponent is likely to betray you in retaliation if you have betrayed in the previous round. Strategies that collude with each other can receive better average scores than strategies that betray each other repeatedly.

Iterative Prisoner’s Dilemma is more relevant to real life as a human being or an animal in the wild. In either case, individuals are very good at remembering who has betrayed them in the past, and you can bet that this history influences their decisions when dealing with other individuals who have betrayed or colluded with them previously.

In an Iterative Prisoner’s Dilemma tournament, each player submits a program with an algorithm that decides a move (collude or betray) for each round, given the history of the game so far. In some versions, the algorithm can only view the results of the previous one round. In others, the algorithm is allowed to use the entire history of the game up to the current round.

Strategies

Game theorists and programmers have been playing Iterative Prisoner’s Dilemma tournaments for decades. People have applied interesting math and evolving algorithms to the program with some interesting results. Some well known strategies have emerged. Feel free to do an internet search for more detail, but here are the two strategies that you should probably focus on.

“Tit for TatStrategy

In this strategy, you do the move that your opponent did to you in the previous round. If the opponent cooperated, you cooperate. If they betrayed you, you betray them in the next round.

“Win-Stay, Lose-Switch” Strategy

This strategy defines a “win” as either successful collusion or successful betrayal of the opponent. A “loss” is when the opponent successfully betrays you or both players betray each other. “Win-Stay” means keep doing the same thing you did last round if your last result was a win. If you successfully betrayed the opponent, keep betraying. If both players colluded, continue colluding. If the last round was a loss, switch to the opposite strategy. If you were successfully betrayed last round, betray your opponent. If both players betrayed each other, that isn’t working, so try colluding next round instead.

Variants of “Tit for Tat” and “Win-Stay, Lose-Switch”

First Turn Moves: Any strategy must decide if its first move will be to betray or collude. This decision is made with no information on previous round results, so it sets the tone for the strategy: it starts each game on a greedy/aggressive note or an altruistic note.

The first turn can also be randomly decided so that it colludes and betrays according to percentage probabilities.

Random Deviations from Strategy:

“Forgiving” or “Generous” strategies will randomly collude some percentage of the time in situations when they would normally betray in retaliation. This is advantageous since it helps the strategy avoid going through many consecutive rounds of mutual betrayal and retaliation, which is a problem that reduces your average game score.

“Sneaky” strategies will randomly betray some of the time in situations when they would normally collude. This is a way to steal some extra points. It will tend to cause retaliation from other strategies, but it can work if the other strategies are particularly forgiving.

Successful Strategies / Metagame

There is no single best program for Iterative Prisoner’s Dilemma. It depends on what sort of mix of programs are brought into a tournament. In general, the forgiving variants of Tit for Tat tend to be strong. This is the case whether a computer is set to let many strategies evolve and compete with each other over time or if humans write programs and develop them over the course of multiple competitions.

Bad Strategies

Programs that don’t retaliate in response to betrayal are bad, because they are vulnerable to devastating cycles of repeated betrayal by strategies like aggressive “Win-Stay, Lose Switch.”

Non-retaliating strategies include “always collude” and, ironically, “always betray.” Since those strategies do the same thing no matter what, they provide no incentive for the opponent to collude with you.

Purely random strategies and strategies that follow an unchanging pattern are also non-retaliating. If an opponent has a strategy that can recognize and exploit this type of strategy, their best move is to betray every time, since there is no retaliation penalty for doing so. Why collude if the opponent isn’t adjusting their moves in response to the game history?