Beagle Project: Info and Dataset


Beagle: Automated Extraction and Interpretation of Visualizations from the Web

"How common is interactive visualization on the web?" "What is the most popular visualization design?" "How prevalent are pie charts really?" These questions intimate the role of interactive visualization in the real (online) world. In this project, we present our approach to answering these questions, along with our findings. First, we introduce Beagle, which mines the web for SVG-based visualizations and automatically classifies them by type (e.g., bar or pie). With Beagle, we extract over 41,000 visualizations across five different tools and repositories, and classify them with 85% accuracy across 24 visualization types. Given this visualization collection, we study usage across tools. We find that most visualizations fall under four types: bar charts, line charts, scatter charts, and geographic maps. Though controversial, pie charts are relatively rare for the visualization tools that were studied. Our findings also suggest that the total number of visualization types supported by a given tool could factor into its ease of use. However, this effect appears to be mitigated by providing users with a variety of diverse expert visualization examples. By using a scalable and automated data collection process, Beagle can support a variety of data-driven visualization design techniques, where a large input corpus is used to train machine learning models and extract design heuristics for automated visualization of new and unfamiliar datasets.
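To give a rough sense of what working with SVG-based visualizations can look like, the sketch below pulls each top-level `<svg>` element out of an HTML page and counts the element tags inside it (for example, many `rect` elements may hint at a bar chart). This is only an illustration built on the Python standard library, not Beagle's actual extractor or feature set; the names `SvgElementCounter` and `svg_element_counts` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class SvgElementCounter(HTMLParser):
    """Counts the element tags that appear inside each top-level <svg>."""

    def __init__(self):
        super().__init__()
        self.svg_depth = 0   # > 0 while the parser is inside an <svg> element
        self.charts = []     # one Counter per top-level <svg> on the page

    def handle_starttag(self, tag, attrs):
        if tag == "svg":
            if self.svg_depth == 0:
                # A new top-level <svg>: start a fresh tag tally for it.
                self.charts.append(Counter())
            self.svg_depth += 1
        elif self.svg_depth > 0:
            # Any other element nested inside an <svg> is tallied by tag name.
            self.charts[-1][tag] += 1

    def handle_endtag(self, tag):
        if tag == "svg" and self.svg_depth > 0:
            self.svg_depth -= 1


def svg_element_counts(html):
    """Return a list of Counters, one per top-level <svg> found in `html`."""
    parser = SvgElementCounter()
    parser.feed(html)
    parser.close()
    return parser.charts


if __name__ == "__main__":
    page = (
        "<html><body>"
        "<svg><g><rect/><rect/><rect/></g><text>Sales</text></svg>"
        "<p>not part of any chart</p>"
        "<svg><circle/><circle/></svg>"
        "</body></html>"
    )
    for index, counts in enumerate(svg_element_counts(page)):
        print(index, dict(counts))
```

Tag counts like these are a deliberately crude signal. A real classifier would likely also need element attributes, positions, and styles, none of which this sketch examines.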

paper info: paper | video preview
general info: README.txt | labels.json
dataset zip files: chartblocks.zip | d3.zip | fusioncharts.zip | graphiq.zip | plotly.zip
Have questions? Contact beagle@cs.umd.edu.