Thus, it is a problem in which options can greatly improve performance, but only if those options are themselves feasible to learn. Konidaris and Barto (2009a) assumed that the agent starts with primitive actions only, and that a new option is created for moving each effector over each object when the agent first successfully does so. The task is then to efficiently learn the policies for these options using abstraction selection. The agent was given an abstraction library consisting of 17 abstractions.

This happens because the agent still uses only local information in selecting actions: it only considers how the distributions change locally as a result of executing single actions or single options. This method is illustrated in the Light Box Environment in Sect. 3 below, where we call the agent using it the LOCAL agent. One way to produce a more global method is to allow the agent to use its current environment model to plan to reach configurations of environmental variable values that will likely yield more relevant information.
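The contrast between local and more global selection can be made concrete with a small sketch. The code below is illustrative only and is not the implementation used in the work described here: the chain-shaped toy model, the `step` function standing in for the agent's environment model, and the `info_gain` scores standing in for the expected change in the learned distributions are all hypothetical. The local rule looks only at the immediate score of each action in the current state; the planning rule searches the model, up to an assumed fixed horizon, for a reachable configuration with a higher score and returns the first action on a path to it.

```python
from collections import deque

# Hypothetical toy model: states 0..4 on a chain, two actions.
ACTIONS = ("left", "right")

def step(state, action):
    """Assumed deterministic environment model (stand-in for a learned one)."""
    return max(0, state - 1) if action == "left" else min(4, state + 1)

def info_gain(state, action):
    """Assumed score for how informative an action is in a state."""
    if state == 4 and action == "right":
        return 1.0   # highly informative, but only reachable after several steps
    if action == "left":
        return 0.1   # small gain available everywhere
    return 0.0

def local_choice(state, actions, info_gain):
    """Local rule: pick the action with the best immediate score."""
    return max(actions, key=lambda a: info_gain(state, a))

def planned_choice(state, actions, step, info_gain, horizon=3):
    """Breadth-first search in the model for a more informative reachable
    configuration; returns the first action on a shortest path to it,
    or the local choice if nothing better lies within the horizon."""
    best_score = max(info_gain(state, a) for a in actions)
    best_first = local_choice(state, actions, info_gain)
    frontier = deque([(state, None, 0)])
    seen = {state}
    while frontier:
        s, first, depth = frontier.popleft()
        if depth == horizon:
            continue
        for a in actions:
            nxt = step(s, a)
            if nxt in seen:
                continue
            seen.add(nxt)
            first_action = a if first is None else first
            score = max(info_gain(nxt, b) for b in actions)
            if score > best_score:
                best_score, best_first = score, first_action
            frontier.append((nxt, first_action, depth + 1))
    return best_first

local_action = local_choice(1, ACTIONS, info_gain)
planned_action = planned_choice(1, ACTIONS, step, info_gain, horizon=3)
```

Starting from state 1, the local rule moves left for the small immediate gain, whereas the planning rule moves right toward state 4. The sketch ignores path cost and model uncertainty, both of which a practical planner would need to weigh.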

