
Games of chance are often people’s first exposure to statistics. Settlers of Catan is a game that revolves around the probability distribution of the sum of two independent six-sided dice. The board consists of hexagons that each offer one of five possible resources. These hexagons are normally placed in a random configuration.

Each hexagon receives a random number token.

These tokens range from 2 to 12, with the exception of 7. In Catan, a non-player piece called the ‘robber’ is moved when a 7 is rolled. The player who rolled the 7 places the robber, blocking other players from that hexagon’s resources and taking a resource from an adjacent player. Each token has circular indicators that show the relative probability of that number being rolled. For example, 6 and 8 have the highest probability of being rolled when 7 is excluded. 2 and 12 have the lowest probability and have one black circle. Players begin the game by placing two settlements onto nodes, each intersecting three hexagons. Because starting positions differ in their access to high-probability hexagons, the starting position is most commonly believed to be the key indicator of game success.

When the dice are rolled, the sum determines which resources are collected. If a hexagon has a number token equal to the sum, any player with a settlement intersecting that hexagon may collect its resource. The objective of the game is to obtain ten points before any other player. Points are obtained by spending resources.

The objective of this analysis is to determine the most likely determinants of winning the game. Players often blame the relative probability of rolling a number (“luck”) for their winning or losing. Another popular assumption is that the starting position is key to winning the game.

The dataset was obtained from Kaggle. Players are allowed two starting positions, and the first position is the player’s first choice chronologically.

The pairwise correlation plot reveals that production-related variables (lower 8) are positively correlated. The numeric token on the first hexagon intersected by each starting settlement is negatively correlated with the tokens on the other two hexagons. This was unexpected, but other analysis methods didn’t show a strong relationship.

The boxplot shows how often each possible 2D6 roll occurred (in percentages). This shows good recovery of the underlying probability distribution.

Decision Tree

A decision tree provides ranges of values for a variable that correspond to a given correlational ‘choice’. The tree is produced with respect to some response variable, in this case the total number of points each player gets. The most important variable is at the top, totalAvailable, which is a measure of the total number of resources available to the players in the game. Both robberCardsGain and robberCardsLoss also measure this. The second hexagon that players initially place their settlements by (X2) is also important, as is the number of 8’s rolled (X8).

To read a K-means graph, you look for visual clusters along a particular variable. In this case, we care about game points, the second column. Reading across the second row, we can look for clusters to see if any other variables produce an effect on the number of points. Most variables can be seen to have some effect on the number of points a player gets, particularly the number of 2’s or 12’s rolled. However, the ‘game productivity’ metrics are the most important in determining a player’s points: productivity, cards from the robber, total resource gain, loss due to trading, loss due to the robber, and tribute (the last 6 columns).

This analysis of Settlers of Catan game play data demonstrates that game productivity is the best predictor of a player winning. Game play productivity includes measures of the number of resources a player gathers or steals from other players. The number of resources gathered in a game is influenced by both random chance and player strategy.
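The two-dice probabilities behind the number tokens can be checked with a short calculation. The following is a minimal Python sketch, not part of the original analysis. The function names `two_dice_distribution` and `token_pips` are hypothetical. It assumes that the number of circles on a token equals the number of dice combinations (out of 36) that produce that sum, which is consistent with the single circle on 2 and 12.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution():
    """Return {sum: exact probability} for two fair six-sided dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}


def token_pips(total):
    """Ways (out of 36) to roll `total`, used here as the circle count.

    Assumption: circles on a token equal the number of combinations.
    A 7 has no token, so it gets zero circles.
    """
    if total == 7:
        return 0
    return 6 - abs(7 - total)


dist = two_dice_distribution()
for total, p in dist.items():
    # Print the sum, its probability, and one '*' per assumed circle.
    print(f"{total:2d}  {float(p):.3f}  {'*' * token_pips(total)}")
```

Comparing these exact probabilities with the observed roll percentages is one way to judge how well a set of recorded games recovers the underlying distribution.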
