SFB 303 Discussion Paper No. B - 432

Author: Cressman, R., and K. H. Schlag
Title: Updating Strategies Through Observed Play - Optimization Under Bounded Rationality
Abstract: Individuals repeatedly face a multi-decision task with unknown payoff distributions. They have minimal memory and update their strategy by observing the previous play (but not the strategy) of someone else. We select behavior rules that increase average payoffs as often as possible in a large population where all use the same rule. Here, imitation generalizes to a pasting procedure. When decisions within the task are unrelated, individuals eventually learn the efficient strategy, but the underlying dynamic is not monotone. However, when choices influence which decisions are subsequently faced in the task, play may not be efficient in the long run as it approaches a Nash equilibrium of the agent normal form.
Keywords: Multi-Armed Bandit, improving, undominated behavioral rule, play-wise imitating, replicator dynamic, monotone dynamics
JEL-Classification-Number: C72, C79
Creation-Date: April 1998
URL: ../1998/b/bonnsfb432.zip

