Texera: Supporting Collaborative ML-Centric Data Science for All

Talk
Prof. Chen Li
UC Irvine
Time: 10.27.2023 14:00 to 15:00

Many data science projects have team members from multiple disciplines with complementary backgrounds, including domain experts with limited IT skills and computer scientists who lack domain knowledge. Typically they rely on tools such as GitHub, Google Drive, or even email attachments to share code and data files, which is very inefficient. In this talk we present our effort to support collaborative data analytics, with a user experience similar to that provided by Google Docs for shared editing and Overleaf for paper writing. We present our open-source system, Texera, which has been under development for the past six years. It provides collaboration-oriented features such as GUI-based workflows using cloud services, shared editing, shared execution, version control, commenting, debugging, and support for multiple languages (e.g., Python and R). Given the increasing importance of machine learning in data science, Texera has rich features to support ML-related analysis. We will discuss the technical challenges related to these features and our solutions. The system has been used by more than 200 people to conduct more than 60 data projects on various topics. We will also share our vision of developing a data science community for a broad audience.