1/ The virtue of PLAY—for humans + machines

~10yrs ago psychologists cheered PLAY for learning, exploring, social bonding, stress management

Humans have cheered PLAY to train machines + AI:
checkers, chess, Go, Jeopardy!, video games

—and now poker...

2/ Checkers, chess, Go etc. are two-player zero-sum games of perfect information with Nash equilibrium strategies. Many multi-player video games + multi-player poker add complexity: more than two players and HIDDEN INFORMATION.

CMU researchers + FB published in Science on an AI that beat elite human pros at six-player no-limit Texas hold'em

3/ Here is a description of the goal and technique of “Pluribus”, trained on self-play to beat past versions of itself, which is notable because...

4/ The (wise) technique Jeff Bezos advocates in human decision making is “regret minimization”

Most poker AIs also use an algorithm called “counterfactual regret minimization”. This one uses a variation that ran on CPUs instead of GPUs and cost an estimated $144 of compute.
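
For the curious, here is a toy sketch of the regret idea behind that family of algorithms. It is NOT Pluribus's code and not full counterfactual regret minimization: it is plain regret matching in self-play on rock-paper-scissors, a far smaller game picked only for illustration. All names (`regret_matching`, `action_values`, `self_play`, `exploitability`) and the starting bias toward rock are assumptions made up for this example.

```python
# Row player's payoff in rock-paper-scissors; the game is zero-sum,
# so the column player's payoff is the negative of each entry.
ACTIONS = ("rock", "paper", "scissors")
PAYOFF = (
    (0, -1, 1),    # rock     vs rock, paper, scissors
    (1, 0, -1),    # paper
    (-1, 1, 0),    # scissors
)


def regret_matching(regrets):
    """Turn cumulative regrets into a strategy: weight each action by its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)  # no positive regret yet: play uniformly


def action_values(opponent_strategy, as_row_player):
    """Expected payoff of each pure action against the opponent's mixed strategy."""
    n = len(ACTIONS)
    if as_row_player:
        return [sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(n)) for a in range(n)]
    return [sum(-PAYOFF[a][b] * opponent_strategy[a] for a in range(n)) for b in range(n)]


def self_play(iterations, initial_regrets=(1.0, 0.0, 0.0)):
    """Run two regret-matching learners against each other; return their average strategies."""
    n = len(ACTIONS)
    # Hypothetical starting bias: player 0 leans toward rock so the run is not trivially symmetric.
    regrets = [list(initial_regrets), [0.0] * n]
    strategy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        for player in (0, 1):
            values = action_values(strategies[1 - player], as_row_player=(player == 0))
            expected = sum(s * v for s, v in zip(strategies[player], values))
            for a in range(n):
                # Regret: how much better action a would have done than the current mix.
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(strategy, as_row_player):
    """Best payoff an opponent can get against this strategy (0 at equilibrium in this game)."""
    return max(action_values(strategy, as_row_player=not as_row_player))


if __name__ == "__main__":
    averages = self_play(20000)
    for player, avg in enumerate(averages):
        print(player, [round(p, 3) for p in avg], round(exploitability(avg, player == 0), 3))
```

The strategy played on any single round keeps cycling, but the AVERAGE strategy over all rounds drifts toward one third each for rock, paper and scissors, the Nash equilibrium of this game. Poker AIs apply the same regret bookkeeping at every decision point of a vastly larger game with hidden cards.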

5/ Full paper here: 

