PhD Proposal: Vaiolin: AI Assisted Violin Learning

Snehesh Shrestha
07.30.2021 13:00 to 15:00


The art and skill of playing a musical instrument is learned over many years of iterated lessons, practice, and feedback. It is a complex cognitive task requiring a high degree of control and coordination of the motor system, the visual and auditory sensory systems, and the somatosensory system (proprioception and touch), as well as lessons, supervision, and feedback from a teacher in the form of natural language, gestures, and demonstrations. In sports, serious athletes work with coaches and personal trainers to receive more regular practice and feedback. As in sports, correct posture of the entire body is critical when playing the violin: it is needed to obtain superior sound quality, to practice and perform for long periods, and to avoid injuries. Because sports coaches and violin teachers cannot be present all the time to provide feedback, an artificial intelligence (AI) learning assistant system, Vaiolin, can augment teaching by monitoring students' practice and providing feedback to them. It can also enhance teachers' capabilities by monitoring students' progress, summarizing strengths and weaknesses, and providing lesson recommendations.
As part of this larger system, this proposal focuses on posture-specific issues and presents the development of the following components: a) posture assessment, b) machine learning for fine motor movement, and c) a user interface for visualization and feedback. I take a human-centered design approach to the development of Vaiolin, then take a top-down approach to build an augmented learning user interface and evaluation metrics. I will present my ongoing work on a novel visualization of posture issues and on visual and auditory feedback mechanisms that explain the issues and guide students on how to fix them. To achieve this, I will discuss models used to capture the meaning of posture issues, sophisticated bowing techniques, and fine hand movements.
In this work, a combined human body and instrument geometry is learned through a spatio-temporal graph convolution network. For low-level features, I will discuss a temporally coherent human 3D pose refinement network. For violin and bow 3D pose, I will present the generation of large-scale synthetic data and the learning of the 3D pose. For the posture issues, I will define an ontology of salient issues arising from posture and movement patterns, grounded in kinesiology theories and music teaching. Finally, I will summarize the entire system and show how the workflow connects to create a scalable system.

Examining Committee:

Chair: Dr. Yiannis Aloimonos
Dept rep: Dr. Ramani Duraiswami
Members: Dr. Cornelia Fermüller
Dr. Irina Muresanu
Dr. Ge Gao