Dynamic Threshold Learning
Dynamic Threshold Learning (DTL) is a technique we have developed for learning robot controllers in real time. The key features of DTL are:
- DTL is an online learning algorithm that learns to improve the robot’s performance as it interacts with the world
- DTL is designed for dynamic control problems that involve continuous real-time responses to the environment
- DTL learns from its mistakes
- DTL learns quickly – in minutes rather than hours or days
- Rather than trying to learn from scratch, DTL makes use of domain knowledge to speed learning
Model Transition Control (MTC) is a framework we have developed for nonlinear control of robots. MTC models a nonlinear control problem as a set of linear control regimes linked by nonlinear transitions. These control regimes are represented by a state-action map (SAM), which shows the control regime that results from taking a given action in a given state.
For example, consider a simple SAM for vehicle braking with one state variable (velocity) and one action variable (braking effort). This SAM has a GRIP control regime (where the tires maintain grip on the road) and a SKID control regime (where the tires skid). The vehicle is most likely to skid when it is traveling at maximum speed and braking with maximum force, and least likely to skid when it is stationary and applying no braking force. We call these anchor points: the points in state-action space most likely to produce the corresponding control regime.
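The braking SAM can be sketched in code. This is a toy illustration only: the names, the normalized units, and the grip/skid boundary are assumptions for the example, not details from the MTC work.

```python
# Toy SAM for the braking example (illustrative assumptions throughout).
# State: velocity in [0, 1] (fraction of maximum speed).
# Action: braking effort in [0, 1] (fraction of maximum force).

# Anchor points: the state-action points most likely to produce each regime.
ANCHORS = {
    "SKID": (1.0, 1.0),  # most likely to skid: max speed, max braking
    "GRIP": (0.0, 0.0),  # least likely to skid: stationary, no braking
}

def toy_sam(velocity, effort):
    """Hand-coded stand-in for a learned state-action map: returns the
    control regime that results from applying `effort` at `velocity`.
    The product threshold is an invented boundary for illustration."""
    return "SKID" if velocity * effort > 0.5 else "GRIP"
```

The anchor points sit at the extreme corners of the state-action space, which is what lets DTL generalize from a single observation, as described next.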
DTL rapidly learns the structure of the SAM by maximizing the amount of learning extracted from each observation. When DTL observes a given control regime at a point in the SAM, it infers that every point closer than the observed point to that regime's anchor point, along all state-action axes, belongs to the same regime. For example, if a vehicle traveling at 80 mph skids when hitting the brakes with 50% effort, it is even more likely to skid when traveling at 100 mph and hitting the brakes with 70% effort.
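The generalization rule above can be written as a one-line dominance test. The function name and the normalized coordinates are assumptions for illustration; only the rule itself comes from the text.

```python
def inferred_same_regime(point, observation, anchor):
    """DTL's generalization rule: a SAM point that is at least as close to
    the regime's anchor as an observed point, along every state-action
    axis, is inferred to share that observed regime."""
    return all(abs(p - a) <= abs(o - a)
               for p, o, a in zip(point, observation, anchor))

SKID_ANCHOR = (1.0, 1.0)  # normalized (velocity, effort) anchor for SKID

# Observed: a skid at 80 mph with 50% braking effort (normalized 0.8, 0.5).
# DTL infers a skid at 100 mph with 70% effort as well.
print(inferred_same_regime((1.0, 0.7), (0.8, 0.5), SKID_ANCHOR))  # → True
```

A point at lower speed and lighter braking, such as (0.5, 0.3), is not dominated by the observation, so nothing is inferred about it from this one example.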
MTC combines DTL with an exploratory controller that initially generates highly aggressive responses to the current vehicle and environment state. This exploratory controller experiences many undesirable control regimes at first, but DTL quickly learns to limit its responses so that the robot avoids the undesirable control regimes and stays in the desired one. For more details on DTL, see the paper we have submitted to ICRA 2012.
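The interaction between the exploratory controller and DTL can be sketched as a small simulation. Everything here is a hypothetical stand-in, including the class names, the effort grid, and the toy skid physics; it shows the shape of the loop, not the authors' implementation.

```python
import random

SKID_ANCHOR = (1.0, 1.0)  # normalized (velocity, effort) most likely to skid

def closer_to_anchor(p, q, anchor):
    """True if p is at least as close to the anchor as q along every axis,
    i.e. DTL would infer that p shares q's observed control regime."""
    return all(abs(pi - ai) <= abs(qi - ai) for pi, qi, ai in zip(p, q, anchor))

def toy_skid(velocity, effort):
    """Stand-in physics: the tires skid when speed times braking demand is high."""
    return velocity * effort > 0.5

class BrakingLearner:
    def __init__(self):
        self.skid_points = []  # observed (velocity, effort) pairs that skidded

    def choose_effort(self, velocity):
        """Exploratory controller: start maximally aggressive, back off past
        any effort DTL has inferred to lie in the SKID regime."""
        for effort in (e / 20 for e in range(20, -1, -1)):  # 1.0 down to 0.0
            point = (velocity, effort)
            if not any(closer_to_anchor(point, obs, SKID_ANCHOR)
                       for obs in self.skid_points):
                return effort
        return 0.0

    def observe(self, velocity, effort, skidded):
        """Record skids; each one shrinks the region DTL considers safe."""
        if skidded:
            self.skid_points.append((velocity, effort))

random.seed(0)
learner = BrakingLearner()
for _ in range(200):
    v = random.random()
    e = learner.choose_effort(v)
    learner.observe(v, e, toy_skid(v, e))
```

Early episodes brake at full effort and skid; each skid excludes the dominated corner of the state-action space, so the allowed effort at high speed shrinks toward the grip/skid boundary, mirroring how DTL limits the exploratory controller's aggressive responses.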