Poker-playing AIs typically perform well against human opponents only when play is limited to just two players. Now scientists at Carnegie Mellon University and Facebook AI Research have raised the bar even further with an AI dubbed Pluribus, which took on 15 professional human players in six-player no-limit Texas Hold ’em and won. The researchers describe how they achieved this feat in a new paper in Science.
Playing more than 5,000 hands each time, five copies of the AI took on two top professional players: Chris “Jesus” Ferguson, six-time winner of World Series of Poker events, and Darren Elias, who currently holds the record for most World Poker Tour titles. Pluribus defeated them both. It did the same in a second experiment, in which Pluribus played five pros at a time, from a pool of 13 human players, for 10,000 hands.
Co-author Tuomas Sandholm of Carnegie Mellon University has been grappling with the unique challenges poker poses for AI for the last 16 years. No-Limit Texas Hold ’em is a so-called “imperfect information” game, since there are hidden cards (held by one’s opponents in the hand) and no restrictions on the size of the bet one can make. By contrast, in chess and Go, the status of the board and all the pieces is known to all players. Poker players can (and do) bluff on occasion, so it’s also a game of misleading information.
Claudico begat Libratus
In 2015, Sandholm’s early version of a poker-playing AI, called Claudico, took on four professional players in heads-up Texas Hold ’em—where there are only two players in the hand—at a Brains vs. Artificial Intelligence tournament at the Rivers Casino in Pittsburgh. After 80,000 hands played over two weeks, Claudico didn’t quite meet the statistical threshold for declaring victory: the margin must be large enough that there is 99.98% certainty that the AI’s victory is not due to chance.
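That threshold amounts to a statistical-significance test on the AI’s average win rate per hand. As a rough illustration (the numbers below are invented for the sketch, not the tournament’s actual figures), a one-sided z-test against a 99.98% bar might look like this:

```python
from statistics import NormalDist

def victory_is_significant(mean_win_per_hand, std_per_hand, hands, confidence=0.9998):
    """One-sided z-test: is the average win rate reliably above zero?"""
    z = mean_win_per_hand / (std_per_hand / hands ** 0.5)
    threshold = NormalDist().inv_cdf(confidence)  # roughly 3.49 for 99.98%
    return z > threshold

# Hypothetical numbers: 80,000 hands, a small average edge, high variance.
# The edge is real but too small relative to the noise to clear the bar.
print(victory_is_significant(mean_win_per_hand=5, std_per_hand=600, hands=80_000))  # False
```

Poker’s high per-hand variance is why so many hands are needed: even a genuine edge can fail the test, as Claudico’s did.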
Sandholm et al. followed up in 2017 with another AI, dubbed Libratus. This time, rather than focusing on exploiting its opponents’ mistakes, the AI focused on improving its own play, apparently a more reliable approach. “We looked at fixing holes in our own strategy because it makes our own play safer and safer,” Sandholm told IEEE Spectrum at the time. “When you exploit opponents, you open yourself up to exploitation more and more.” The researchers also upped the number of games played to 120,000.
The AI prevailed, even though the four human players tried to conspire against it, coordinating on making strange bet sizes to confuse Libratus. As Ars’ Sam Machkovech wrote at the time, “Libratus emerged victorious after 120,000 combined hands of poker played against four human online-poker pros. Libratus’ $1.7 million margin of victory, combined with so many hands, clears the primary bar: victory with statistical significance.”
But Libratus was still playing against a single opponent in heads-up action. Poker with multiple players is a far more challenging problem. So Pluribus builds on that earlier work with Libratus, with a few key innovations that allow it to come up with winning strategies in multiplayer games.
Sandholm and his former graduate student, Noam Brown—who is now working on his PhD with the Facebook Artificial Intelligence Research (FAIR) group—employed “action abstraction” and “information abstraction” approaches to reduce how many different actions the AI must consider when devising its strategy. Whenever Pluribus reaches a point in the game when it must act, it forms a subgame—a representation that provides a finer-grained abstraction of the real game, akin to a blueprint, according to Sandholm.
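Action abstraction, in its simplest form, collapses the continuum of legal bet sizes into a handful of representative ones, so the AI only ever reasons about a few actions per decision. A toy sketch of the idea (the bucket fractions here are made up for illustration; the actual abstraction in the paper is far richer):

```python
def abstract_bet(bet, pot, fractions=(0.5, 1.0, 2.0)):
    """Map a real-valued bet to the nearest abstract action,
    expressed as a fraction of the pot (illustrative buckets only)."""
    ratio = bet / pot
    return min(fractions, key=lambda f: abs(f - ratio))

# A $130 bet into a $100 pot gets treated like a pot-sized bet.
print(abstract_bet(130, 100))  # 1.0
```

Information abstraction works analogously on the cards, grouping strategically similar hands into buckets so they can share one strategy.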
“It goes back a few actions and does a type of game theoretical reasoning,” he said. Each time, Pluribus must come up with four continuation strategies for each of the five human players via a new limited-lookahead search algorithm. This comes out to “four to the power of six million different continuation strategies overall,” per Sandholm.
Like Libratus, Pluribus does not use poker-specific algorithms; it simply learns the rules of this imperfect-information game and then plays against itself to devise its own winning strategy. So Pluribus figured out on its own that it was best to play a mixed strategy and be unpredictable—now the conventional wisdom among top human players. “We didn’t even say, ‘The strategy should be randomized,’” said Sandholm. “The algorithm automatically figured out that it should be randomized, and in what way, and with what probabilities in what situations.”
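This style of self-play training is built on counterfactual regret minimization, a standard technique for imperfect-information games, and the randomized strategies Sandholm describes fall out of its regret-matching step: actions that would have performed better in the past get played proportionally more often. A minimal sketch of regret matching (toy numbers, not the actual training code):

```python
def regret_matching(regrets):
    """Convert accumulated regrets into a mixed strategy: actions that
    would have done better historically are played more often."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:  # no positive regret: fall back to uniform random play
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]

# Toy regrets for fold / call / raise: raising has looked best so far,
# yet the strategy stays randomized rather than always raising.
print(regret_matching([-2.0, 1.0, 3.0]))  # [0.0, 0.25, 0.75]
```

The probabilities themselves are the output, which is why the algorithm ends up randomized “in what way, and with what probabilities” without anyone asking it to be.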
Pluribus actually confirmed one bit of conventional poker-playing wisdom: it’s just not a good idea to “limp” into a hand, that is, calling the big blind rather than folding or raising. The exception, of course, is if you’re in the small blind, when mere calling costs you half as much as the other players. But while human players typically avoid so-called “donk betting”—in which a player ends one round with a call but starts the next round with a bet—Pluribus placed donk bets far more often than its human opponents.
So, “In some ways, Pluribus plays the same way as the humans,” said Sandholm. “In other ways, it plays completely Martian strategies.” Specifically, Pluribus makes unusual bet sizes and is better at randomization.
“Its major strength is its ability to use mixed strategies,” said Elias. “That’s the same thing that humans try to do. It’s a matter of execution for humans—to do this in a perfectly random way and to do so consistently. Most people just can’t.”
“It was incredibly fascinating getting to play against the poker bot and seeing some of the strategies it chose,” said Michael “Gags” Gagliano, another participating poker player. “There were several plays that humans simply are not making at all, especially relating to its bet sizing. Bots/AI are an important part in the evolution of poker, and it was amazing to have first-hand experience in this large step toward the future.”
This type of AI could be used to design drugs to take on antibiotic-resistant bacteria, for instance, or to improve cybersecurity or military robotic systems. Sandholm cites multi-party negotiation or pricing—such as Amazon, Walmart, and Target trying to come up with the most competitive pricing against each other—as a specific application. Optimal media spending for political campaigns is another example, as well as auction bidding strategies. Sandholm has already licensed much of the poker technology developed in his lab to two startups: Strategic Machine and Strategy Robot. The first startup is interested in gaming and other entertainment applications; Strategy Robot’s focus is on defense and intelligence applications.
Potential for fraud
When Libratus beat human players in 2017, there were concerns about whether poker could still be considered a skill-based game and whether online games in particular would soon be dominated by disguised bots. Some took heart in the fact that Libratus required major supercomputer hardware: 15 million core hours of computation to develop its strategy, plus 1,400 CPU cores during live play. But Pluribus needs much less processing capability, completing its blueprint strategy in eight days using just 12,400 core hours and then running on a mere 28 cores during live play.
So is this the death knell for skill-based poker? Well, the algorithm was so successful that the researchers have decided not to release its code. “It could be very dangerous for the poker community,” Brown told Technology Review.
Sandholm acknowledges the risk of sophisticated bots swarming online poker sites, but destroying poker was never his aim, and he still thinks it’s a game of skill. “I have come to love the game, because these AIs have really shown there’s a whole additional depth to the game that humans haven’t understood, even brilliant professional players who have played millions of hands,” he said. “So I’m hoping this will contribute to the excitement of poker as a recreational game.”