The Research

Teamskills

We use a distributed shared dynamic world model for team coordination.
Sharing the same world model enables the team members to reason not only about their own actions, but also to predict the next action a team member will take. The world model is maintained locally, which can cause small differences between the world models of the robots due to network lag. Communication at the team coordination level is used to improve robustness, and is almost a necessity since the team is heterogeneous in both hardware and software: the Utrecht Pioneer 2 robot uses a subsumption architecture, while the Amsterdam/Delft Nomad Scouts operate on a hybrid architecture. Our team uses a global team behavior (a strategy) from which each robot derives its individual behavior (also known as its role).
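
As an illustration, a locally maintained world model might merge a robot's own observations with estimates broadcast by team mates, preferring the freshest data. This is a minimal sketch in Python; the names, fields, and merge rule are hypothetical, not the team's actual code:

    import time
    from dataclasses import dataclass, field
    from typing import Dict, Optional

    @dataclass
    class Estimate:
        x: float          # field coordinates in metres (hypothetical units)
        y: float
        stamp: float      # observation time, used to resolve conflicts

    @dataclass
    class WorldModel:
        """Maintained locally on each robot; copies on different robots
        may diverge slightly because broadcasts arrive with network lag."""
        ball: Optional[Estimate] = None
        robots: Dict[str, Estimate] = field(default_factory=dict)

        def observe(self, name: str, x: float, y: float) -> None:
            self.robots[name] = Estimate(x, y, time.time())

        def merge_broadcast(self, name: str, est: Estimate) -> None:
            # Keep whichever estimate is freshest; a team mate's report of
            # its own position is usually newer than our observation of it.
            local = self.robots.get(name)
            if local is None or est.stamp > local.stamp:
                self.robots[name] = est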

Our teamskills module determines the current team behavior using a simple finite state machine.
The team behavior defines the individual behavior of each player. There are about half a dozen individual behaviors available, all of which serve an attacking, defending or intercepting purpose. Utility functions based on potential fields \cite{latombe91,khatib86} are used to distribute the roles among the team members; this technique is similar to those used by other participants in the middle-size league \cite{art00,freiburg01}.
Each robot calculates its own scores for the individual behaviors that are currently useful, communicates them, and also calculates the scores of its team mates. Incoming scores from a team mate overwrite the scores calculated locally for that robot, which introduces a level of redundancy that can improve team performance. Once every robot knows the scores for the individual behaviors of each team member, the roles can be distributed easily.
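
A minimal sketch of this score-based role distribution in Python (the role names and the greedy assignment rule are illustrative; the text does not spell out the exact algorithm):

    from typing import Dict

    ROLES = ("attack", "defend", "intercept")

    def assign_roles(scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
        """Greedy assignment: repeatedly give the best remaining
        (robot, role) pair its role. scores[robot][role] holds the
        utility a robot computed, or received via broadcast, for
        playing that role."""
        assignment: Dict[str, str] = {}
        free_robots = set(scores)
        free_roles = set(ROLES)
        while free_robots and free_roles:
            robot, role = max(
                ((rb, rl) for rb in free_robots for rl in free_roles),
                key=lambda pair: scores[pair[0]][pair[1]],
            )
            assignment[robot] = role
            free_robots.remove(robot)
            free_roles.remove(role)
        return assignment

    scores = {
        "robot1": {"attack": 0.9, "defend": 0.2, "intercept": 0.5},
        "robot2": {"attack": 0.4, "defend": 0.8, "intercept": 0.3},
        "robot3": {"attack": 0.1, "defend": 0.6, "intercept": 0.7},
    }
    # A received broadcast would overwrite scores["robot2"] before assignment.
    print(assign_roles(scores))
    # -> {'robot1': 'attack', 'robot2': 'defend', 'robot3': 'intercept'}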

The exact specification of an individual behavior is not (and cannot be) the same on both architectures.
The following describes the Amsterdam/Delft approach, which is similar to the one Tambe \cite{tambe00persistent} described for use in the simulation league.
A Markov decision process is used to search through the state space of a robot's future states. First, a number of possible actions and their parameters are generated with respect to the current situation and the individual behavior. For each of these actions the probability of success is calculated, and the impact on the world of the action and the expected concurrent actions of team mates and opponents is evaluated.
Modelling team mate actions is relatively easy compared to modelling opponents with reasonable accuracy. The evaluation criteria include a role evaluation (using potential fields) as well as concepts like ball possession. The action with the maximum expected utility can then be chosen, and the process can be repeated to look further ahead. One of the challenges of this approach is calculating the probability that a certain action will succeed; after an initial estimation by hand, we want to use a technique like reinforcement learning to adjust the probabilities.
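
A one-step version of this expected-utility selection might look as follows; this is a sketch, with the action names, probabilities, and utilities invented for illustration:

    from typing import Callable, Iterable, Tuple

    def best_action(
        actions: Iterable[str],
        success_prob: Callable[[str], float],
        utility: Callable[[str, bool], float],
    ) -> Tuple[str, float]:
        """Pick the action with maximum expected utility over its
        predicted outcomes; deeper look-ahead repeats this on the
        predicted successor states."""
        best, best_eu = None, float("-inf")
        for action in actions:
            p = success_prob(action)  # initially estimated by hand
            eu = p * utility(action, True) + (1.0 - p) * utility(action, False)
            if eu > best_eu:
                best, best_eu = action, eu
        return best, best_eu

    # Toy numbers: a pass is likely but modestly useful, a shot is risky
    # but valuable; here the shot narrowly wins (EU 1.6 vs. 1.4).
    choice, eu = best_action(
        ["pass", "shoot"],
        success_prob=lambda a: {"pass": 0.8, "shoot": 0.3}[a],
        utility=lambda a, ok: {"pass": (2.0, -1.0), "shoot": (10.0, -2.0)}[a][0 if ok else 1],
    )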

Behavior-Based Reinforcement Learning

The Pioneer 2 robot of Utrecht University uses an extended version of the subsumption architecture \cite{Brooks}, in which particular behaviors such as {\em Get\_Ball}, {\em Dribble}, and {\em Score} compete for control of the robot. The competition is modelled by using preconditions for the behaviors and a priority scheme for selecting a possible behavior; a behavior such as {\em Emergency\_Brake} has the highest priority.
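
A sketch of such a precondition-plus-priority scheme in Python; the behavior names come from the text, but the priority values and precondition tests are illustrative assumptions:

    from dataclasses import dataclass
    from typing import Callable, Dict

    @dataclass
    class Behavior:
        name: str
        priority: int                         # higher wins the competition
        precondition: Callable[[Dict], bool]  # may this behavior run now?

    def select_behavior(behaviors, state):
        """Among behaviors whose preconditions hold, the one with the
        highest priority takes control of the robot."""
        eligible = [b for b in behaviors if b.precondition(state)]
        return max(eligible, key=lambda b: b.priority, default=None)

    behaviors = [
        Behavior("Emergency_Brake", 100, lambda s: s["obstacle_close"]),
        Behavior("Score",            30, lambda s: s["has_ball"] and s["goal_visible"]),
        Behavior("Dribble",          20, lambda s: s["has_ball"]),
        Behavior("Get_Ball",         10, lambda s: not s["has_ball"]),
    ]
    state = {"obstacle_close": False, "has_ball": True, "goal_visible": False}
    print(select_behavior(behaviors, state).name)  # -> Dribble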

Learning is integrated as part of the behavior-based control architecture of the robot. This is done by providing the robot with different implementations of the same behavior; for example, we use three {\em Get\_Ball} behaviors instead of just one. One implementation may play fast while allowing for errors, whereas another may play slowly but precisely. These behaviors are tested during online play against an opponent. If a particular behavior is successful against an opponent team (e.g. the {\em Get\_Ball} behavior gets the ball within 10 seconds), that behavior is strengthened and is therefore used more often. This allows us to learn the sequence of behaviors with the highest expected probability of success against an opponent. Since there are not too many alternative behaviors, a single RoboCup game could be sufficient to test enough of them, so that the best behaviors can be executed during the final part of the game. To guarantee that the best behavior is found, an exploration function occasionally tests behaviors currently thought to be sub-optimal.
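
A minimal sketch of this strengthening scheme, phrased here as an epsilon-greedy choice among alternative implementations; the text describes strengthening on success plus an exploration function, but the epsilon-greedy form and the update rule are assumptions of this sketch:

    import random

    class BehaviorSelector:
        """Keeps a strength estimate per alternative implementation of the
        same behavior (e.g. three Get_Ball variants), mostly plays the
        strongest one, and occasionally explores the others."""

        def __init__(self, variants, epsilon=0.1, lr=0.3):
            self.strength = {v: 0.0 for v in variants}
            self.epsilon = epsilon   # exploration rate
            self.lr = lr             # learning rate

        def choose(self):
            if random.random() < self.epsilon:                    # explore
                return random.choice(list(self.strength))
            return max(self.strength, key=self.strength.get)      # exploit

        def update(self, variant, succeeded):
            # Strengthen on success (e.g. ball obtained within 10 s),
            # weaken on failure.
            reward = 1.0 if succeeded else 0.0
            self.strength[variant] += self.lr * (reward - self.strength[variant])

    selector = BehaviorSelector(["get_ball_fast", "get_ball_precise", "get_ball_safe"])
    variant = selector.choose()
    selector.update(variant, succeeded=True)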

Although the opponent is not explicitly modelled, which would require a lot of training data and games, our method can deal with different opponents by quickly learning the best-functioning sequence of behaviors. This learning may provide the robot player with additional strength, possibly leading to quite attractive play during the RoboCup games.

Vision
Our vision research is mostly directed at the problems of recognition and self-localisation. We can detect and track all objects typically found in a RoboCup environment; this is done via color detection using pie-slice shaped decision regions in an adapted HSI space. We have implemented two different methods for self-localisation, a local one and a global one, both based on detecting the lines of the field. The global method first splits the image into multiple regions of interest (ROIs) and performs a rough Hough transform on each of them. It then searches for the precise lines using linear edge detectors, and these are matched to the world model, giving an estimate of the position. The local method checks a 3x3x3 grid (in x, y and $\phi$) to estimate the best position. For each candidate position it calculates the lines expected in the captured image, performs a distance transform (DT) in a rectangle around each expected line, and compares the results, giving a new best estimate. If the DT result becomes too large, we switch back to the global self-localisation. Both methods use an inclinometer to estimate the camera tilt, since even a small camera tilt can severely impair distance accuracy and therefore self-localisation.
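
A sketch of the local method's grid search in Python; the step sizes are invented, and the score function stands in for the actual DT comparison against the expected field lines:

    import itertools

    def refine_pose(x, y, phi, score, step_xy=0.05, step_phi=0.05):
        """Check a 3x3x3 grid around the current pose estimate (x, y, phi)
        and return the neighbour with the lowest distance-transform score;
        score(x, y, phi) compares expected field lines against the image."""
        candidates = [
            (x + dx * step_xy, y + dy * step_xy, phi + dp * step_phi)
            for dx, dy, dp in itertools.product((-1, 0, 1), repeat=3)
        ]
        best = min(candidates, key=lambda pose: score(*pose))
        return best, score(*best)

    # Toy score pretending the true pose is (1.0, 2.0, 0.3); if the returned
    # score exceeds a threshold, the caller falls back to the global
    # Hough-based self-localisation.
    dummy_score = lambda px, py, pp: abs(px - 1.0) + abs(py - 2.0) + abs(pp - 0.3)
    pose, err = refine_pose(0.95, 2.05, 0.25, dummy_score)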