Testudo Course Scanner Database Schema (Complete)

This diagram includes all fields, including the temporal tracking fields (first_seen/last_seen) used for historical data tracking.

erDiagram
    snapshots {
        INTEGER id PK
        TEXT term_id
        TEXT department
        DATETIME snapshot_time
        INTEGER courses_found
        INTEGER sections_found
        TEXT status
        TEXT error_message
    }

    current_snapshot {
        TEXT term_id PK
        TEXT department PK
        INTEGER snapshot_id FK
    }

    course_info {
        TEXT term_id PK
        TEXT course_id PK
        TEXT department "GENERATED"
        TEXT title
        INTEGER credits
        TEXT grading_method
        TEXT prerequisites
        TEXT restrictions
        TEXT cross_listed_with
        TEXT credit_restrictions
        TEXT description
        INTEGER description_approved
        INTEGER first_seen_snapshot_id FK
        INTEGER last_seen_snapshot_id FK
    }

    section_info {
        TEXT term_id PK,FK
        TEXT course_id PK,FK
        TEXT section_id PK
        TEXT days
        TEXT start_time
        TEXT end_time
        TEXT building_code
        TEXT room_number
        TEXT location
        TEXT delivery_type
        INTEGER first_seen_snapshot_id FK
        INTEGER last_seen_snapshot_id FK
    }

    section_meetings {
        TEXT term_id PK,FK
        TEXT course_id PK,FK
        TEXT section_id PK,FK
        TEXT meeting_type PK
        TEXT days PK
        TEXT start_time PK
        TEXT end_time
        TEXT building_code
        TEXT room_number
        INTEGER first_seen_snapshot_id FK
        INTEGER last_seen_snapshot_id FK
    }

    instructor_info {
        TEXT term_id PK,FK
        TEXT course_id PK,FK
        TEXT section_id PK,FK
        TEXT instructor_name PK
        INTEGER first_seen_snapshot_id FK
        INTEGER last_seen_snapshot_id FK
    }

    seats_snapshot {
        TEXT term_id PK,FK
        TEXT course_id PK,FK
        TEXT section_id PK,FK
        DATETIME snapshot_time PK
        INTEGER total_seats
        INTEGER open_seats
        INTEGER waitlist_count
        INTEGER taken_seats "GENERATED"
    }

    snapshots ||--o{ current_snapshot : "current"
    snapshots ||--o{ course_info : "first_seen"
    snapshots ||--o{ course_info : "last_seen"
    snapshots ||--o{ section_info : "first_seen"
    snapshots ||--o{ section_info : "last_seen"
    snapshots ||--o{ section_meetings : "first_seen"
    snapshots ||--o{ section_meetings : "last_seen"
    snapshots ||--o{ instructor_info : "first_seen"
    snapshots ||--o{ instructor_info : "last_seen"
    course_info ||--o{ section_info : "has"
    section_info ||--o{ section_meetings : "has"
    section_info ||--o{ instructor_info : "taught_by"
    section_info ||--o{ seats_snapshot : "tracked_in"

Key Relationships

  • snapshots → current_snapshot: One snapshot can be current for many term/department combinations (1:N)

  • snapshots → course_info, section_info, etc.: Snapshots track when records were first/last seen

  • course_info → section_info: One course can have many sections (1:N)

  • section_info → section_meetings: One section can have many meeting times (1:N)

  • section_info → instructor_info: One section can have many instructors (1:N)

  • section_info → seats_snapshot: One section can have many seat snapshots over time (1:N)

Temporal Tracking

The first_seen_snapshot_id and last_seen_snapshot_id fields enable historical tracking:

  • first_seen_snapshot_id: References the snapshot when a record first appeared in scraped data

  • last_seen_snapshot_id: References the snapshot when a record was last seen in scraped data

  • Records are never deleted, allowing tracking of when courses/sections were dropped or changed

  • To find current/active records, query where last_seen_snapshot_id equals the current snapshot ID

  • To find dropped records, query where last_seen_snapshot_id is less than the current snapshot ID

  • Join with snapshots table to get the actual timestamps: JOIN snapshots ON last_seen_snapshot_id = snapshots.id

Benefits of Snapshot IDs vs Timestamps

  • Referential integrity: Foreign keys ensure first/last seen always reference actual snapshots

  • More compact storage: INTEGER vs DATETIME

  • Simpler queries: Direct comparison of IDs instead of timestamps

  • Better performance: Indexed integer comparisons are faster than datetime comparisons

Notes

  • snapshots table tracks metadata about scraping operations

  • current_snapshot table maintains the most recent snapshot for each term/department combination

  • Primary keys (PK) are shown for each table

  • Foreign keys (FK) show which fields link to parent tables

  • GENERATED fields are automatically computed by the database (department, taken_seats)

  • section_meetings composite key: The primary key includes (meeting_type, days, start_time) to allow sections to have multiple meetings of the same type (e.g., multiple labs at different times)