University of Maryland

Departments: Computer Science, Kinesiology, Mechanical Engineering
Programs: ISR, NACS, Maryland Robotics Center, UMIACS

Imitation Learning by Cognitive Robots

ONR Grant N000141310597 (2013-2017)
Lockheed Martin ETL Seed Grant N000141310597 (2017)



Project Overview

Manually programming robots to carry out specific tasks is a difficult and time-consuming process. One approach to addressing this issue is imitation learning, or "learning by demonstration", in which a robot watches a person perform a task and then tries to perform the same or a similar task by imitation. However, most past work on robotic imitation learning has focused on having a robot literally copy the human demonstrator's actions without any deeper "understanding" of the demonstrator's goals and intentions. While this can be effective, it tends not to generalize well to even mildly novel situations or unexpected events. It also faces very challenging barriers when used with robots that differ substantially from the human demonstrator (e.g., six arms rather than two, non-humanoid, very large or very small).

In this context, the primary goal of this research project is to create and critically evaluate a general-purpose neurocognitive architecture for imitation learning by an autonomous system. Our emphasis is on producing a robotic system that can learn bimanual arm movement control from a single demonstration and generalize its actions much as a person does.

As elaborated below, the themes of this research include:

Virtual Demonstrator Environment
    The imitation learning process is simplified by using a virtual demonstration environment where the human demonstrator is effectively invisible to the learning robot.

Imitation Learning via Cause-Effect Reasoning
    Cause-effect reasoning is used to infer the demonstrator's goals/intentions rather than trying to literally replicate the demonstrator's arm movements.

Learning to Manipulate Non-Rigid Entities
    Learning methods are developed for manipulating non-rigid entities, such as fluids and flexible manufacturing components, via dexterous bilateral arm movements.

Investigating Planning Methods
    Methods are developed for integrating planning and acting, and different hierarchical planning methods are explored.

Learning Neural Control Methods
    Neurocomputational methods are used to implement the control system for a robot's bimanual arm movements, including the cognitive aspects of the system.

The physical platforms used in this work include Baxter robots and a pair of KUKA arms.

*Prof. Gupta is currently collaborating from USC.


Virtual Demonstrator Environment

We have developed a virtual demonstrator environment, SMILE (Simulator for Maryland Imitation Learning Environment), in which a human demonstrator manipulates objects on a table to show a robot how to perform a task. The simulator implements basic physics in an artificial reality. SMILE's object interface also supports loading complex custom 3D models using external STL files (generated using CAD tools), and allows one to specify controls such as switches and lights using XML. Object templates and instances support variable substitutions, so that one can create different objects from the same template with different parameters.
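As a rough illustration of template variable substitution, the Python sketch below fills placeholders in an XML object template to produce two blocks of different size and color from the same template. SMILE's actual template syntax and XML schema are not described on this page, so the element names, attributes, file name and `$`-style placeholders here are hypothetical; only the standard-library `string.Template` and `xml.etree.ElementTree` modules are real APIs.

```python
from string import Template
import xml.etree.ElementTree as ET

# Hypothetical template: element names, attributes and $-placeholders are
# illustrative assumptions, not SMILE's actual XML format.
BLOCK_TEMPLATE = Template("""
<object name="$name" mesh="$mesh">
  <size x="$size" y="$size" z="$size"/>
  <color rgb="$color"/>
</object>
""")


def instantiate(template: Template, **params: str) -> ET.Element:
    """Fill in every placeholder, then parse the result as an XML element."""
    # substitute() raises KeyError if a placeholder has no value.
    return ET.fromstring(template.substitute(**params).strip())


# Two instances from one template, differing only in their parameters.
red = instantiate(BLOCK_TEMPLATE, name="block_red", mesh="cube.stl",
                  size="0.05", color="1 0 0")
blue = instantiate(BLOCK_TEMPLATE, name="block_blue", mesh="cube.stl",
                   size="0.08", color="0 0 1")

for obj in (red, blue):
    print(obj.get("name"), obj.find("size").get("x"), obj.find("color").get("rgb"))
```

Using `substitute()` rather than `safe_substitute()` makes a forgotten parameter fail loudly instead of leaving a literal `$size` in the generated object.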


The video on the left below illustrates creating a demonstration in a tabletop world with toy blocks. The intent is to show a robot how to take randomly placed blocks and use them to construct two structures shaped like the letters "U" and "M" (for University of Maryland). The subwindow at the bottom right shows the observing robot's view.

The video on the right below illustrates creating a demonstration related to device maintenance. The device in this case is a mock-up of a disk drive dock cabinet. The intent is for the robot to manipulate the dock drawer, and the disks and toggle switches inside, to carry out various tasks (e.g., swapping a failing disk marked by a red indicator light with a new one).

Toy Blocks Demo 1: Stack blocks to make "UM"

Disk Drive Dock Demo: Replace disk drive indicated by red LED

While most demonstrations created by SMILE are intended for export and use by physical robots, it is possible to have fairly accurate simulated robots inside SMILE's artificial world, as illustrated here with a Baxter robot.

The video on the left below illustrates creating a second demonstration in the toy blocks world, again with the intent of constructing two structures shaped like the letters "U" and "M".

The video on the right below shows how a simulated Baxter, introduced into the artificial toy blocks world, can carry out the same task after seeing this demonstration (Demo 2).

Toy Blocks Demo 2: Stack blocks to make "UM"

Simulated Baxter Robot: Performs task within SMILE

Selected Publications

Huang D, Katz G, Gentili RJ, Reggia J. The Maryland Virtual Demonstrator Environment for Robot Imitation Learning, CS-TR-5039, Dept. of Computer Science, University of Maryland, College Park, MD, June 2014. (Superseded by CS-TR-5049, the next reference below.)

Huang D, Katz G, Gentili R, Reggia J. SMILE: Simulator for Maryland Imitation Learning Environment, Technical Report CS-TR-5049, Department of Computer Science, May 2016.

Huang D, Katz G, Langsfeld J, Gentili R, Reggia J. A Virtual Demonstrator Environment for Robot Imitation Learning, Proc. Seventh Annual IEEE International Conference on Technologies for Practical Robot Applications (TePRA), 2015.

Huang D, Katz G, Langsfeld J, Oh H, Gentili R, Reggia J. An Object-Centric Paradigm for Robot Programming by Demonstration, Proc. Ninth International Conference on Augmented Cognition, Los Angeles, Lecture Notes in Computer Science 9183, Foundations of Augmented Cognition, Springer, D. Schmorrow and C. Fidopiastis (eds.), August 2015, 745-756.

Download SMILE

Visit our SMILE download web page for further information about SMILE and to obtain an open-source copy.


Imitation Learning via Cause-Effect Reasoning

Our approach to general-purpose imitation learning is based on cause-effect reasoning. During learning, a robot infers a hierarchical representation of a demonstrator's intentions that explains why the demonstrator performed the observed actions. This allows the learning robot to create its own plan for the same task, rather than trying to duplicate the precise movements of the human demonstrator. We have completed a first version of this imitation learning system and formalized its underlying algorithms, proving their soundness and completeness. We have also experimentally compared various criteria for judging which explanations of a demonstrator's actions are most plausible. Using our algorithms, we evaluated the ability of a physical robot (Baxter from Rethink Robotics) to learn a series of maintenance tasks involving mock-ups of a disk drive dock and a pipe-switch-valve apparatus. The robot successfully generalizes skills observed in SMILE demonstrations that involve bimanual manipulation of composite objects in 3D (examples in the videos below), deriving a suitable plan of action to carry out a demonstrated task in the limited situations we have tested so far. These results indicate that cause-effect reasoning can be an effective basis for cognitive-level imitation learning. Work is underway to compare our robot's learning performance to that of human subjects learning the same tasks.
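The core idea of parsimonious explanation can be illustrated with a deliberately small, single-level Python sketch: given hypothetical cause-effect rules that map each intention to the ordered actions it produces, find the shortest sequence of intentions that accounts for an observed action sequence. This is not the project's algorithm (the published method handles multi-level hierarchies, object parameters and several parsimony criteria), and all rule and action names are made up for illustration.

```python
from functools import lru_cache

# Hypothetical cause-effect rules: each intention causes an ordered sequence
# of observable actions. Names are illustrative only.
CAUSES = {
    "move_block": ("grasp", "lift", "release"),
    "pick_up": ("grasp", "lift"),
    "put_down": ("release",),
    "swap_disk": ("open_drawer", "remove_disk", "insert_disk", "close_drawer"),
}


def explain(observed):
    """Return the shortest tuple of intentions whose effects, concatenated in
    order, reproduce `observed` exactly, or None if nothing explains it."""
    observed = tuple(observed)
    n = len(observed)

    @lru_cache(maxsize=None)
    def best(i):
        # Most parsimonious explanation of the suffix observed[i:].
        if i == n:
            return ()
        candidates = []
        for intention, effects in CAUSES.items():
            j = i + len(effects)
            if observed[i:j] == effects:
                rest = best(j)
                if rest is not None:
                    candidates.append((intention,) + rest)
        # Parsimony criterion: fewest causes wins (ties go to rule order).
        return min(candidates, key=len, default=None)

    return best(0)


print(explain(["grasp", "lift", "release"]))
print(explain(["grasp", "lift", "release", "grasp", "lift"]))
print(explain(["grasp", "release"]))
```

Here `["grasp", "lift", "release"]` could be explained either as `move_block` or as `pick_up` followed by `put_down`; preferring the explanation with fewer causes selects `move_block`. A deeper hierarchy could be built by repeating this step with a rule table whose effects are intentions, although the sketch hard-codes a single table.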


These two videos show a Baxter robot, having observed the event records from two of the SMILE videos given above, carrying out the corresponding demonstrated tasks. In each case the robot learned the task from a single demonstration. Note the coordinated bimanual actions, and note that the robot makes some generalizations (quite limited at present) by starting from initial states different from those in the demonstrations.

The video on the left is based on learning from the SMILE blocks world demonstration Demo 1 above.

The video on the right is based on learning from the SMILE disk drive dock demonstration above (shown at 8x speed).

Toy Blocks Execution: Stack blocks to make "UM"

Disk Drive Dock Execution: Generalizes to a different faulty drive

Selected Publications

Katz G, Huang D, Gentili R, Reggia J. Imitation Learning as Cause-Effect Reasoning, Proceedings of the Ninth Annual Conference on Artificial General Intelligence (AGI-16), P. Wang & B. Steunebrink (Eds.), NYC, July 2016. Received Best Student Paper Award.

Katz G, Huang D, Hauge T, Gentili R, Reggia J. A Novel Parsimonious Cause-Effect Reasoning Algorithm for Robot Imitation and Plan Recognition, IEEE Transactions on Cognitive and Developmental Systems, 2017, in press.

Katz G, Huang D, Gentili R, Reggia J. An Empirical Characterization of Parsimonious Intention Inference for Cognitive-Level Imitation Learning, Proc. 19th Intl. Conf. on Artificial Intelligence (ICAI 17), Las Vegas, July 2017, in press.

Hauge T, Katz G, Huang D, Reggia J, Gentili R. Development of a computational method to assess high-level motor planning during the performance of complex actions. Accepted. 18th NASPSPA Conference, 4-7 June 2017, San Diego, CA, USA.

Download Our Cause-Effect Reasoning Code

Visit our causal reasoning code download web page for further information and to obtain an open-source copy.


Learning to Manipulate Non-Rigid Entities

We have been developing learning methods for manipulating non-rigid entities, such as fluids and flexible manufacturing components, via dexterous bilateral arm movements. One study focused on a fluid pouring task, in which a Baxter robot holds a bottle of water in its right hand and learns to pour the correct amount of water into a moving flask. Our approach explores the task parameter space using local models of how the outcome depends on the movement parameters, and uses these models to optimize the pouring movements. Learning was highly effective: successful parameter values for new task variations were typically found within a handful of attempts. We have also developed an approach to automatic robotic cleaning of deformable objects whose stiffness characteristics are unknown. It uses a bimanual robot setup (KUKA arms) in which one arm holds the part to be cleaned while the other holds the cleaning tool. The robot quickly learns models of how the part deforms as a function of the cleaning force and grasping parameters, and learns to select the grasp location and tool parameters that allow rapid cleaning.
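A minimal Python sketch of rapid learning with local models, under strong simplifying assumptions: a single scalar trajectory parameter `theta`, a made-up `true_volume` function standing in for the physical robot and flask, and a local linear model that is fitted to the three past trials closest to the target volume and then inverted (secant-style) to propose the next attempt. The published work uses its own local metamodels and exploitation-driven updates, which this sketch does not reproduce.

```python
import math
import random


def true_volume(theta):
    """Hypothetical stand-in for the robot: poured volume (mL) as an unknown,
    nonlinear function of one trajectory parameter theta."""
    return 300.0 * (1.0 - math.exp(-2.0 * theta))


def local_linear_guess(trials, target, k=3):
    """Fit volume ~ a + b * theta to the k trials whose volumes are closest
    to the target, then invert that local model to propose a new theta."""
    nearest = sorted(trials, key=lambda t: abs(t[1] - target))[:k]
    xs = [t[0] for t in nearest]
    ys = [t[1] for t in nearest]
    mx, my = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx  # local slope (least squares)
    a = my - b * mx
    return (target - a) / b


def learn_to_pour(target, trials, tol=2.0, max_attempts=10):
    """Propose, execute and record attempts until the poured volume is within
    `tol` mL of the target. Returns (theta, attempts used)."""
    for attempt in range(1, max_attempts + 1):
        theta = local_linear_guess(trials, target)
        volume = true_volume(theta)
        trials.append((theta, volume))  # every attempt refines the local model
        if abs(volume - target) <= tol:
            return theta, attempt
    return None, max_attempts


random.seed(0)
# Initial random exploration builds the first data set.
trials = [(th, true_volume(th))
          for th in (random.uniform(0.1, 1.5) for _ in range(6))]
theta, attempts = learn_to_pour(150.0, trials)
print(f"theta={theta:.3f}, attempts={attempts}")
```

Each failed attempt is added to the data set, so the local model around the target sharpens with every try; this mirrors, in spirit, the pattern of initial random trials followed by a few targeted attempts.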


On the left, Baxter has learned a model for pouring fluids into a moving container while minimizing the number of attempts needed to successfully pour a new target volume. After a set of random trials to construct the initial model, the robot is typically able to learn how to pour new volumes in just a handful of attempts.

On the right, a bimanual robot setup is used to clean deformable parts without prior knowledge of the part stiffness characteristics. One arm grasps the part while the other cleans. As the system gains additional knowledge about that part's behavior, it optimizes both the cleaning parameters and where to hold the part to minimize the cleaning time.

Pouring Liquids into a Moving Container

Cleaning Flexible Parts

Selected Publications

Langsfeld, J., Kabir, A., Kaipa, K., Gupta, S. Online Learning of Part Deformation Models in Robotic Cleaning of Compliant Objects. ASME 2016 Manufacturing Science and Engineering Conference (MSEC), Blacksburg, VA, 2016.

Langsfeld, J., Kabir, A., Kaipa, K., Gupta, S. Robotic Bimanual Cleaning of Deformable Objects with Online Learning of Part and Tool Models. IEEE Conference on Automation Science and Engineering (CASE), Fort Worth, TX, 2016, submitted.

Langsfeld, J., Kaipa, K., Gentili, R., Reggia, J., Gupta, S.K. Incorporating Failure-to-Success Transitions in Imitation Learning for a Dynamic Pouring Task, Proc. IEEE International Conference on Intelligent Robots and Systems (IROS 2014) Workshop on Compliant Manipulation, Chicago, Sept. 2014.

Langsfeld, J., Kaipa, K., Gupta, S. Generation and Exploitation of Local Models for Rapid Learning of a Pouring Task. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Second Machine Learning in Planning and Control of Robot Motion Workshop, Hamburg, Germany, 2015.

Langsfeld, J., Kaipa, K., Gupta, S.K. Selection of Trajectory Parameters for Dynamic Pouring Tasks based on Exploitation-driven Updates of Local Metamodels. Robotica, 2017, in press.

Langsfeld, J. Learning Task Models for Robotic Manipulation of Non-rigid Objects. PhD Dissertation, University of Maryland, College Park, USA, 2017.


Investigating Planning Methods

The planning component of our neurocognitive architecture needs to operate in environments that are open-world, partially observable and dynamic. To address these issues, we developed a new knowledge-based planning formalism called Hierarchical Goal Network (HGN) planning. Our HGN planning algorithm can exploit whatever amount of planning knowledge is available, falling back on domain-independent planning techniques to fill in the gaps. We also developed a formalization of acting, the Refinement Acting Engine, and studied its integration with ongoing planning.
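The division of labor in HGN planning, using domain knowledge where it exists and domain-independent search elsewhere, can be sketched in Python on a toy version of the disk swap task. Everything here is a simplifying assumption: goals are single facts achieved in a fixed order, a "method" simply lists subgoals to achieve before its goal, and breadth-first search over STRIPS-style actions plays the role of the domain-independent fallback. Actual HGN planning (see the Shivashankar dissertation listed for this theme) uses goal formulas, method preconditions and partially ordered goal networks.

```python
from collections import deque

# Hypothetical toy domain loosely based on the disk drive dock task.
# Each action: (preconditions, facts added, facts deleted).
ACTIONS = {
    "open_drawer": ({"drawer_closed"}, {"drawer_open"}, {"drawer_closed"}),
    "close_drawer": ({"drawer_open"}, {"drawer_closed"}, {"drawer_open"}),
    "remove_disk": ({"drawer_open", "bad_disk_in"}, {"slot_empty"}, {"bad_disk_in"}),
    "insert_disk": ({"drawer_open", "slot_empty"}, {"new_disk_in"}, {"slot_empty"}),
}

# Hypothetical HGN-style knowledge: goal -> subgoals to achieve first, in order.
METHODS = {"new_disk_in": ["drawer_open", "slot_empty"]}


def step(state, name):
    """Apply an action to a frozenset state, or return None if inapplicable."""
    pre, add, delete = ACTIONS[name]
    return (state - delete) | add if pre <= state else None


def search(state, goal):
    """Domain-independent fallback: breadth-first search until `goal` holds."""
    frontier, seen = deque([(state, [])]), {state}
    while frontier:
        s, plan = frontier.popleft()
        if goal in s:
            return s, plan
        for name in ACTIONS:
            nxt = step(s, name)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [name]))
    return None


def hgn_plan(state, goals):
    """Achieve `goals` in order, decomposing with METHODS where knowledge
    exists and falling back on search to close any remaining gap."""
    plan = []
    for goal in goals:
        if goal in state:
            continue
        sub = hgn_plan(state, METHODS.get(goal, []))
        if sub is None:
            return None
        state, steps = sub
        plan += steps
        found = search(state, goal)
        if found is None:
            return None
        state, steps = found
        plan += steps
    return state, plan


final_state, plan = hgn_plan(frozenset({"drawer_closed", "bad_disk_in"}),
                             ["new_disk_in", "drawer_closed"])
print(plan)
```

With the method, the fallback search only has to bridge one-action gaps between subgoals; without it, the search would have to discover the open-remove-insert sequence on its own.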

Selected Publications

Alford, R., Kuter, U., Nau, D.S., Goldman, R.P. Plan aggregation for strong-cyclic planning in nondeterministic domains. Artificial Intelligence 216, 206-232, Nov. 2014.

Ghallab, M., Nau, D., Traverso, P. The Actor's View of Automated Planning and Acting, Artificial Intelligence, 208, 2014, 1-17.

Ghallab, M., Nau, D., Traverso, P. Automated Planning and Acting. Cambridge University Press, Cambridge, UK. 2016.

Ivankovic, F., Haslum, P., Thiebaux, S., Shivashankar, V., Nau, D. Optimal planning with global numerical state constraints. In International Conference on Automated Planning and Scheduling (ICAPS), June 2014.

Nau, D., Ghallab, M., Traverso, P. Blended Planning and Acting: Preliminary Approach, Research Challenges, Proceedings of the 29th AAAI Conference on Artificial Intelligence, 2015.

Nau, D., Ghallab, M., Traverso, P. Refinement planning and acting. In Conf. on Advances in Cognitive Systems, May 2017.

Shivashankar, V. Hierarchical Goal Networks: Formalisms and Algorithms for Planning and Acting, PhD Dissertation, University of Maryland, College Park, May 2015.

Shivashankar, V., Kaipa, K., Nau, D., Gupta, S. Towards Integrating Hierarchical Goal Networks and Motion Planners to Support Planning for Human-Robot Teams, Proceedings of the IROS Workshop on AI and Robotics, Sept. 2014.

Shivashankar, V., Alford, R., Kuter, U., Nau, D. Hierarchical Goal Networks and Goal-Driven Autonomy: Going where AI planning meets goal reasoning, ACS Workshop on Goal Reasoning, Annual Conference on Advances in Cognitive Systems, 2013, 95-110.


Learning Neural Control Methods

We are studying the use of neurocomputational methods to implement both trajectory control and cognitive control for a robot's bimanual arm movements. For example, we have evaluated a neural architecture that captures the visuo-spatial transformations required for the cognitive processes of mental simulation and imitation. Our model controls reaching movements of two 7-degree-of-freedom arms toward targets in a 3D workspace while producing human-like kinematics. It performed accurate, flexible and robust bimanual reaching movements under various conditions while avoiding extreme joint positions. We also showed how these methods can accommodate the mechanical dependencies between finger joints, so that they can be used to control robots with mechanically interdependent joints.

Complementary work has developed self-organizing maps (SOMs) that represent external input sequences with limit cycles instead of the static encodings used by past SOMs. This fundamental change in representation is more consistent with the oscillatory nature of brain activity. We used this approach to build a combined open-loop/closed-loop multi-map neurocontroller for a Baxter robotic arm, as illustrated below, demonstrating that the robot can perform fixed-point arm reaching tasks even though activity in the neural network controller is constantly oscillating.

Finally, we are studying how to implement cognitive control of sequential behaviors with attractor neural networks in which control modules act by gating the activity and learning of other neural modules. This approach successfully solved sequential card matching problems and was found to match the performance of human subjects on such tasks.
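The limit-cycle SOM architecture itself is too involved for a short example, but the underlying point made above, that a network's state can oscillate indefinitely while a derived signal stays fixed, can be illustrated with a textbook Hopf oscillator in Python. The oscillator, its parameters and the choice of a one-period average as the "readout" are illustrative assumptions, not the published controller.

```python
import math


def simulate(center, mu=1.0, omega=2 * math.pi, dt=1e-3, steps=5000):
    """Euler-integrate a Hopf oscillator centred on `center`. From almost any
    start, the state converges to a circle of radius sqrt(mu) around it."""
    cx, cy = center
    x, y = cx + 0.1, cy  # start near the centre, off the limit cycle
    trace = []
    for _ in range(steps):
        u, v = x - cx, y - cy
        r2 = u * u + v * v
        # Radial term drives |(u, v)| toward sqrt(mu); omega term rotates it.
        du = (mu - r2) * u - omega * v
        dv = (mu - r2) * v + omega * u
        x, y = x + dt * du, y + dt * dv
        trace.append((x, y))
    return trace


def readout(trace, period_steps):
    """Average the activity over the last full period: a constant signal that
    could (hypothetically) serve as an arm target despite the oscillation."""
    last = trace[-period_steps:]
    n = len(last)
    return sum(p[0] for p in last) / n, sum(p[1] for p in last) / n


target = (0.4, -0.2)
trace = simulate(target)
# omega = 2*pi rad/s and dt = 1e-3 s give one cycle per 1000 steps.
radius = math.dist(trace[-1], target)
print(f"radius ~ {radius:.3f}, readout ~ {readout(trace, 1000)}")
```

After the transient, the state keeps circling at a radius of about `sqrt(mu)` around the target, while the averaged readout stays near `(0.4, -0.2)`.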

A very short demonstration of how a limit-cycle SOM neural network can smoothly guide a robot's arm to a fixed location and hold it steady there despite ongoing oscillatory activity in the network.

Neural Net Arm Control: Arm moves to a fixed position in spite of oscillating neural net activity.

Selected Publications

Gentili R, Oh H, Huang D, Katz G, Miller R, Reggia J. Towards a Multi-Level Neural Architecture that Unifies Self-Intended and Imitated Arm Reaching Performance, Proc. 36th Annual International Conf. of the IEEE Engineering in Medicine and Biology Society, August 2014, 2537-2540.

Gentili R, Oh H, Kregling A, Reggia J. A Cortically-Inspired Model for Inverse Kinematics Computation of a Humanoid Finger with Mechanically-Coupled Joints, Bioinspiration and Biomimetics, IOP Press, 11, 2016, 036013.

Gentili R, Oh H, Miller R, Huang D, Katz G, Reggia J. A Neural Architecture for Performing Actual and Mentally Simulated Movements During Self-Intended and Observed Bimanual Arm Reaching Movements, International Journal of Social Robotics, 7 (3), 2015, 371-392.

Huang D. Self-Organizing Map Neural Architectures Based on Limit Cycle Attractors, PhD Dissertation, August 2016.

Huang D, Gentili R, Katz G, Reggia J. A Limit Cycle Self-Organizing Map Architecture for Stable Arm Control, Neural Networks, 85, 2017, 165-181.

Huang D, Gentili R, Reggia J. Limit Cycle Representation of Spatial Locations Using Self-Organizing Maps, Proc. of the IEEE Symposium Series on Computational Intelligence (SSCI), Dec. 2014.

Huang D, Gentili R, Reggia J. Self-Organizing Maps Based on Limit Cycle Attractors, Neural Networks, 63, 2015, 208-222.

Huang D, Gentili R, Reggia J. A Self-Organizing Map Architecture for Arm Reaching Based on Limit Cycle Attractors, Proc. Ninth Intl. Conference on Bio-Inspired Information and Communication Technology (BICT 2015), New York City, Dec. 2015.

Oh H. A Multiple Representations Model of the Human Mirror Neuron System for Learned Action Imitation. PhD Dissertation, University of Maryland, College Park, Dec. 2015. Advisor: R. Gentili.

Oh H, Braun A, Reggia J, Gentili R. Role of Visuospatial Processes During Observational Practice, North American Society for Psychology of Sport and Physical Activity, Montreal, 2016.

Reggia J, Monner D, Sylvester J. The Computational Explanatory Gap, Journal of Consciousness Studies, 21 (9), 2014, 153-178.

Reggia J, Huang D, Katz G. Exploring the Computational Explanatory Gap, Philosophies, 2, 5, 2017.

Sylvester J, Reggia J. Engineering Neural Systems for High-Level Problem Solving, Neural Networks, 79, 2016, 37-52.

Last updated June 2017