The challenge of predicting the future lies behind many human endeavors, from gambling to business decisions. In recent decades, the technology used to make predictions has become increasingly sophisticated. Where computers were once used simply to crank through calculations, artificial intelligence and machine learning now open up new possibilities.
Naturally, there is huge interest in this technology among those who want to predict the outcome of major sports contests – from baseball fans checking out the MLB odds to NBA executives planning a franchise’s acquisitions. But when it comes to individual tournaments, can prediction software really pick the winner?
As perhaps the biggest global sports event, the football World Cup presents an ideal opportunity for scientists and others working in this field to test their technology. Picking the winner of the World Cup, like predicting the World Series winner or the winner of the Olympic 100 metres, would be a very public way to demonstrate the effectiveness of your system or your software.
Attempts to predict the World Cup winner have produced some unlikely stories, such as that of Paul the Octopus, who correctly picked the winners of 12 out of 14 games across Euro 2008 and the 2010 World Cup, including all eight of his picks at the World Cup. Other animals recruited for these efforts include donkeys, cats and penguins, but alongside these entertaining news stories, there have also been some serious predictive attempts.
Ahead of the 2018 World Cup, financial company Goldman Sachs employed a machine learning system to predict the winner. The software used 200,000 statistical models and a data mining operation that gathered huge volumes of team and player characteristics. Using this information, their software ran a million simulations of the tournament to establish the probability of each team going through to win.
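The idea of running a tournament many times over to estimate each team's chances can be sketched in a few lines of Python. This is only a toy illustration, not Goldman Sachs's model: the eight team names, their strength ratings, the Elo-style win formula and the knockout-only format are all assumptions invented for the example.

```python
import random
from collections import Counter

# Hypothetical strength ratings, invented purely for illustration.
RATINGS = {
    "A": 2000, "B": 1950, "C": 1900, "D": 1850,
    "E": 1800, "F": 1750, "G": 1700, "H": 1650,
}

def win_probability(rating_a, rating_b):
    """Elo-style chance that side A beats side B (draws are ignored)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def play_knockout(teams, rng):
    """Play single-elimination rounds until one team is left."""
    while len(teams) > 1:
        winners = []
        for i in range(0, len(teams), 2):
            a, b = teams[i], teams[i + 1]
            p = win_probability(RATINGS[a], RATINGS[b])
            winners.append(a if rng.random() < p else b)
        teams = winners
    return teams[0]

def title_probabilities(n_sims=20000, seed=0):
    """Simulate the tournament n_sims times and return title frequencies."""
    rng = random.Random(seed)
    teams = list(RATINGS)
    wins = Counter()
    for _ in range(n_sims):
        rng.shuffle(teams)  # a fresh random draw for each simulated tournament
        wins[play_knockout(teams, rng)] += 1
    return {team: wins[team] / n_sims for team in RATINGS}

if __name__ == "__main__":
    ranked = sorted(title_probabilities().items(), key=lambda kv: -kv[1])
    for team, p in ranked:
        print(f"{team}: {p:.1%}")
```

Even in this simplified setup, the strongest side wins only a minority of the simulated tournaments, which is why a "predicted winner" is better read as the most likely of many possible outcomes.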
At the same time, scientists at the Technical University of Dortmund combined conventional statistical analysis with a machine learning technique known as a ‘random forest’ to model the outcome of each game and thereby predict the outcome of the tournament. Their approach led to the prediction that Germany would win the tournament, while Goldman Sachs calculated that it would be Brazil.
In fact, it was neither. France and Croatia played out the final, with France winning. Germany failed to make it out of the group phase, while Brazil crashed out in the quarterfinals.
This highlights the problems involved in designing software to predict a specific single outcome. In the real world, even a single passage of play in one game of football, basketball or baseball is influenced by hundreds, if not thousands, of variables. Many of these cannot be predicted at all, and others can only be estimated.
Sports prediction software is far more effective, however, when it comes to producing probabilistic outcomes. Putting a percentage chance on an outcome is well within the grasp of modern software, which is why many sports betting companies and a handful of sports bettors now use this technology to help them weigh up the odds in sports betting markets. In the betting world, professionals focus on probabilities, not crude certainties, and technology can be effective here.
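To show what weighing up the odds can look like in practice, here is a minimal Python sketch that turns bookmaker prices into implied probabilities and compares them with a model's estimate. It assumes decimal odds (the total returned per unit staked); the prices and the model probability in the example are made up, and the function names are hypothetical.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds into probabilities that sum to 1.

    The raw inverse odds add up to more than 1 because of the
    bookmaker's margin (the overround), so they are rescaled.
    """
    raw = {outcome: 1.0 / price for outcome, price in decimal_odds.items()}
    overround = sum(raw.values())
    fair = {outcome: p / overround for outcome, p in raw.items()}
    return fair, overround

def expected_value(model_probability, price, stake=1.0):
    """Average profit per bet if the model's probability is correct."""
    return stake * (model_probability * price - 1.0)

if __name__ == "__main__":
    # Invented prices for a single match.
    odds = {"home": 2.10, "draw": 3.40, "away": 3.60}
    fair, overround = implied_probabilities(odds)
    for outcome, p in fair.items():
        print(f"{outcome}: {p:.1%}")
    print(f"bookmaker margin: {overround - 1.0:.1%}")
    # A model that rates the home side at 50% sees a small positive edge.
    print(f"expected value: {expected_value(0.50, odds['home']):+.3f}")
```

A bet is only attractive on this view when the model's probability is higher than the one implied by the price, and even then the edge holds on average over many bets rather than for any single game.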
The use of predictive software can also have other benefits. For example, the research conducted in Dortmund found that team rankings and national GDP had an impact on a team’s chances at the World Cup, while other factors such as the size of the nation and the nationality of the coach had no impact.
A more fundamental problem with using software to make predictions is that such technology relies on what is often described as a ‘frequentist’ approach, that is, on large samples of data that can be analyzed to estimate probabilities. This approach can fall short when data is limited. For instance, many experts predict that the desert climate of Qatar, where the 2022 World Cup will be staged, will have an impact on the outcome of the tournament because of its heat and humidity.
Yet none of the World Cup tournaments to date have been held in similar conditions. This will also be the first modern World Cup staged in November, during the middle of the European and South American domestic seasons. Weighing up these factors becomes far more difficult when there is little or no data for the software to go on. While AI is increasingly impressive in its sophistication, it doesn’t yet display the flexibility and capacity for lateral thinking that characterizes the human brain.
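The cost of having little data can be made concrete with a small Python sketch. It uses the Wilson score interval, a standard way to put a range around an observed proportion, chosen here only for illustration; the win counts are invented and do not come from any real team or study.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

if __name__ == "__main__":
    # Same observed win rate (two in three), very different sample sizes.
    for wins, games in [(2, 3), (200, 300)]:
        low, high = wilson_interval(wins, games)
        print(f"{wins}/{games} wins: plausible win rate {low:.0%} to {high:.0%}")
```

With only three games in unusual conditions, the plausible range for the true win rate is so wide that it says very little, whereas three hundred games narrow it considerably.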
That’s why the most effective approaches to predicting individual sports tournament outcomes are those where prediction technology is integrated with expert human analysis.