MS Defense: EFFECTIVENESS OF PROXIMAL POLICY OPTIMIZATION METHODS FOR NEURAL PROGRAM INDUCTION
The Neural Virtual Machine (NVM) is a novel neurocomputational architecture designed to emulate the functionality of a traditional computer. A version of the NVM called NVM-RL supports reinforcement learning based on standard policy gradient methods as a mechanism for performing neural program induction. In this thesis, I modified NVM-RL to use one of the most popular reinforcement learning algorithms, proximal policy optimization (PPO). Surprisingly, using PPO with the existing all-or-nothing reward function did not improve NVM-RL's effectiveness. However, PPO did improve NVM-RL's performance when combined with a reward function that grants partial credit for incorrect outputs, scaled by how much those outputs differ from the correct targets. I conclude that, in some situations, PPO can improve the performance of reinforcement learning during program induction, but that this improvement depends on the quality of the reward function that is used.
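The contrast between the two reward schemes described above can be sketched as follows. This is a minimal illustration only: the function names and the position-matching distance measure are assumptions for exposition, not the actual reward functions used in NVM-RL.

```python
# Hypothetical sketch contrasting an all-or-nothing reward with a
# partial-credit reward for program induction. The distance measure
# (fraction of matching output positions) is an illustrative assumption.

def all_or_nothing_reward(output, target):
    """Reward 1 only when the output exactly matches the target."""
    return 1.0 if output == target else 0.0

def partial_credit_reward(output, target):
    """Reward grows with similarity to the target: here, the
    fraction of positions where output and target agree."""
    matches = sum(o == t for o, t in zip(output, target))
    return matches / max(len(target), 1)

# A nearly correct output earns nothing under the first scheme
# but substantial credit under the second.
print(all_or_nothing_reward([1, 2, 3], [1, 2, 0]))  # 0.0
print(partial_credit_reward([1, 2, 3], [1, 2, 0]))  # 2/3
```

Under the all-or-nothing scheme, near-miss outputs provide no gradient signal to distinguish them from completely wrong ones; the partial-credit scheme shapes the reward so that incremental improvement is rewarded, which is the property the thesis identifies as important for PPO.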
Chair: Dr. James A. Reggia
Members: Dr. Dana Nau, Dr. Garrett E. Katz