Machines have a knack for beating us at our own games these days. As DeepMind’s AlphaGo routinely defeats humanity’s best in one of the world’s oldest strategy games, artificial intelligence is now turning its attention to more modern fare: video games.
Microsoft-owned AI Maluuba has just earned the highest possible score in the Atari 2600 port of the arcade classic, Ms. Pac-Man.
By maxing out its score at the highest possible 999,990 points, the AI has accomplished a gaming feat that is unlikely to be matched by humans for a while, if ever. (Our current record comes courtesy of one Wilson Oyama, at just 266,330 points.)
According to researchers on the project, the new high score demonstrates a success for Maluuba’s “Hybrid Reward Architecture”, which combines classical reinforcement learning with a novel “divide and conquer” method that uses multiple agents to assess the game and make decisions.
We have our top agent on it
Specifically, Maluuba’s AI uses more than 150 agents at once, each tasked with monitoring a single aspect of the game, be it a pellet, one of the four enemy ghosts, the bonus fruit pickup, or Ms. Pac-Man’s position.
As it plays, these agents present the AI’s top agent — described by researchers as sort of a “senior manager” for a company — with feedback on which direction it should move Ms. Pac-Man. However, the top agent doesn’t just take a vote to make its calls — it weighs the importance of each suggestion.
For example, if over a hundred agents say “move left” to get a nearby pellet, but two or three say “go right” to avoid getting attacked by a ghost, the top agent knows it’s better to dodge the baddies than grab a quick point.
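To make the difference between a simple vote and a weighted decision concrete, here is a minimal Python sketch of the pellet-versus-ghost example above. It is an illustration only, not Maluuba's actual code: the function names, the four-direction action set, and the weights (1.0 per pellet agent, 50.0 per ghost agent) are all made-up assumptions chosen so the numbers are easy to follow.

```python
from collections import Counter, defaultdict

# Hypothetical action set for the sketch.
ACTIONS = ("left", "right", "up", "down")


def majority_vote(suggestions):
    """Pick the direction suggested by the most agents, ignoring weights."""
    counts = Counter(action for action, _weight in suggestions)
    return counts.most_common(1)[0][0]


def weighted_choice(suggestions):
    """Pick the direction with the largest total weight across all agents."""
    totals = defaultdict(float)
    for action, weight in suggestions:
        totals[action] += weight
    return max(ACTIONS, key=lambda action: totals[action])


# 100 pellet agents mildly prefer "left" (assumed weight 1.0 each);
# 3 ghost agents strongly prefer "right" (assumed weight 50.0 each).
suggestions = [("left", 1.0)] * 100 + [("right", 50.0)] * 3

vote_result = majority_vote(suggestions)        # headcount: 100 vs 3
weighted_result = weighted_choice(suggestions)  # total weight: 100.0 vs 150.0
```

With these assumed weights, a plain headcount would send Ms. Pac-Man left toward the pellet, while the weighted total sends her right, away from the ghost. In a learned system the weights would come from training rather than being set by hand.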
Beyond shaming humans for their now-inferior retro gaming ability, the team at Maluuba hopes its Hybrid Reward Architecture can help machine learning improve, primarily in context-sensitive decision making.
Researchers say one potential use for the divide-and-conquer method could be in sales, assigning an agent to every potential client and directing the representative more efficiently to the ones most receptive to making a purchase at a given time.
It may be a while before we see Maluuba’s accomplishments in 1980s arcade games jump into our everyday lives, but now might be the time to accept that your days of holding that precious high score or speedrun are likely numbered.