Building a text to SQL tool¶
We’re going to build something genuinely useful. Weirdly enough, this is a “hello world” exercise for prompt engineering.
Ask a question of your database in English, get a response from a custom SQL query written by the LLM.
The UMD course database¶
I developed a project to scrape the Testudo web site and collect data about UMD courses.
Download and save the course database from https://www.cs.umd.edu/class/fall2025/cmsc398z/files/umd-202508.db
The schema for this database is defined in umd-schema.sql. There are two Mermaid diagrams for it: a simplified one in schema_diagram.md and a complete one in schema_diagram_complete.md. The schema is designed to manage data for multiple terms, and to allow scraping a term and department multiple times so we can track changes in the number of seats and check for added or dropped courses or sections.
Prototyping against the UMD course database¶
We’re going to examine the UMD course database, and prototype against it using the sqlite-utils CLI tool:
```bash
sqlite-utils schema umd-202508.db
```
Let’s write that to a file:
```bash
sqlite-utils schema umd-202508.db > schema.sql
```
Now we can feed it to the LLM and have it write our first query:
```bash
llm -f schema.sql \
  -s "reply with sqlite SQL" \
  "how many courses are there?"
```
I got back
```
To get counts, run one of these SQLite queries depending on what you mean by "how many courses":

Total courses (rows in course_info):

SELECT COUNT(*) AS total_courses FROM course_info;

...

Courses per department:

SELECT department, COUNT(*) AS courses FROM course_info GROUP BY department ORDER BY courses DESC;

...
```
So, let's be a little more specific:
```bash
llm -f schema.sql -s "reply with sqlite SQL" \
  "list the number of courses for each department for the 202508 term"
```
which returns
```sql
SELECT department,
       COUNT(*) AS course_count
FROM course_info
WHERE term_id = '202508'
GROUP BY department
ORDER BY course_count DESC, department;
```
Note that the response doesn't always come back as pure SQL: the LLM might return several options with descriptions, as above, or return Markdown with the SQL wrapped in a fenced code block.
We could ask it not to, but we can also use the --extract flag to extract the SQL from the response if it is wrapped in a code block.
```bash
llm -f schema.sql \
  -s "reply with sqlite SQL" \
  --extract \
  "list the number of courses for each department for the 202508 term"
```
Let’s run that query in the most diabolical way possible:
```bash
sqlite-utils umd-202508.db "$(llm -f schema.sql \
  -s 'reply with sqlite SQL' \
  --extract \
  'list the number of courses for each department for the 202508 term')"
```
A more detailed schema¶
We can just use the schema extracted from the database. But in my work on the Testudo scanner, I created a more detailed schema, umd-schema.sql, which contains lots of additional documentation and examples. When generating SQL for a query, whether with an LLM or by hand, that more detailed schema may prove useful: for example, you can pass -f umd-schema.sql to llm in place of the schema.sql we extracted above.
Turning that into a Python function¶
Let’s upgrade our hacky CLI prototype into a Python function.
```python
import sqlite_utils
import llm
import csv
import sys

model = llm.get_model("gpt-5-mini")


def text_to_sql(db: sqlite_utils.Database, question: str) -> str:
    """Convert a prompt to SQL using the LLM."""
    prompt = "Schema:\n\n{}\n\nQuestion:\n\n{}".format(
        db.schema, question
    )
    return model.prompt(
        prompt,
        system="reply with SQLite SQL, not in markdown, just the SQL",
    ).text()


db = sqlite_utils.Database("umd-202508.db")
sql = text_to_sql(db, "list the number of courses for each department for the 202508 term")

# Execute and print CSV
cursor = db.conn.execute(sql)
columns = [desc[0] for desc in cursor.description]  # column names

writer = csv.writer(sys.stdout)
writer.writerow(columns)
for row in cursor:
    # row is an sqlite3.Row or tuple-like; convert values to strings and handle NULL
    writer.writerow([("" if v is None else v) for v in row])
```
Notes:

- This uses db.conn.execute(sql) to get a cursor with .description (column names) and then iterates over the rows.
- csv.writer writes rows directly to stdout; to write to a file instead, open the file and pass it to csv.writer in place of sys.stdout, as in the sketch below.
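For instance, here is a minimal sketch of that variant, reusing the cursor and columns variables from the script above (results.csv is just an example filename):

```python
# Minimal sketch: send the query results to a CSV file instead of stdout.
# "results.csv" is just an example filename.
with open("results.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh)
    writer.writerow(columns)
    for row in cursor:
        writer.writerow([("" if v is None else v) for v in row])
```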
Upgrading that to a CLI tool¶
Now that we have this working, let's turn it into a small CLI tool using argparse from the Python standard library. It should let us specify a database other than umd-202508.db, and optionally a separate schema file to send to the model instead of the one generated from the database.
```python
import argparse
import re
import sys
import csv
import time

import sqlite_utils
import llm


def extract_sql(text: str) -> str:
    """Try to extract SQL from a model response.
    - Prefer fenced code blocks (``` or ```sql).
    - Otherwise strip common leading commentary like 'SQL:'.
    - Fall back to returning the full response.
    """
    if not text:
        return ""
    # Look for fenced code block
    m = re.search(r"```(?:sql)?\n(.*?)```", text, flags=re.DOTALL | re.IGNORECASE)
    if m:
        return m.group(1).strip()
    # If there's an inline code fence `...`
    m = re.search(r"`{1,3}\s*(SELECT|WITH|INSERT|UPDATE|DELETE|CREATE|PRAGMA)", text, flags=re.IGNORECASE)
    if m:
        # try to pull from first backtick to the last backtick
        parts = re.split(r"`{1,3}", text)
        # find a part that looks like SQL
        for part in parts:
            if re.match(r"^\s*(SELECT|WITH|INSERT|UPDATE|DELETE|CREATE|PRAGMA)", part, flags=re.IGNORECASE):
                return part.strip()
    # Remove a leading "SQL:" or "SQL -"
    m = re.search(r"SQL[:\-\s]\s*(.*)", text, flags=re.DOTALL | re.IGNORECASE)
    if m:
        return m.group(1).strip()
    # As a last resort, return the whole text
    return text.strip()


def prompt_to_sql(model, schema: str, question: str, system_msg: str) -> str:
    prompt = f"Schema:\n\n{schema}\n\nQuestion:\n\n{question}"
    resp = model.prompt(prompt, system=system_msg).text()
    return extract_sql(resp)


def main():
    parser = argparse.ArgumentParser(
        description="Simple text-to-SQL CLI backed by an LLM and sqlite-utils."
    )
    parser.add_argument("db", help="Path to the sqlite database file")
    parser.add_argument("question", help="Natural language question to convert to SQL")
    parser.add_argument(
        "--schema",
        "-s",
        help="Path to a .sql file containing the schema to send to the model. If omitted, the DB schema is used.",
    )
    parser.add_argument(
        "--model",
        "-m",
        default="gpt-5-mini",
        help="LLM model to use (default: gpt-5-mini)",
    )
    parser.add_argument(
        "--no-exec",
        action="store_true",
        help="Only print the SQL generated; do not try to execute it against the DB",
    )
    parser.add_argument(
        "--max-rows",
        type=int,
        default=200,
        help="Maximum number of rows to fetch and write when executing (default: 200)",
    )
    parser.add_argument(
        "--output",
        "-o",
        help="Path to output CSV file for query results (default: results-<timestamp>.csv)",
    )
    args = parser.parse_args()

    # load DB
    db = sqlite_utils.Database(args.db)

    # load schema
    if args.schema:
        with open(args.schema, "r", encoding="utf-8") as fh:
            schema = fh.read()
    else:
        schema = db.schema

    # prepare model
    model = llm.get_model(args.model)
    system_msg = "reply with SQLite SQL, not in markdown, just the SQL"

    # Get SQL from model
    sql = prompt_to_sql(model, schema, args.question, system_msg)
    if not sql:
        print("No SQL was produced by the model.", file=sys.stderr)
        sys.exit(1)

    print("---- Generated SQL ----")
    print(sql)
    print("-----------------------")

    if args.no_exec:
        sys.exit(0)

    # Try executing the SQL. IMPORTANT: We will only execute read-only queries.
    # Do not run statements that modify the database (CREATE, INSERT, UPDATE, DELETE, DROP, etc.)
    def try_execute(sql_text):
        sql_clean = sql_text.strip()
        # Determine whether this is a read-only query we will allow
        allow_prefixes = ("SELECT", "WITH", "PRAGMA")
        first_word = sql_clean.split(None, 1)[0].upper() if sql_clean else ""
        if not sql_clean.upper().startswith(allow_prefixes):
            # Refuse to automatically execute any statement that could modify the DB.
            return False, Exception(
                "Refusing to execute non-query SQL to avoid modifying the database. "
                "Only read-only queries (SELECT/WITH/PRAGMA) are allowed."
            )
        try:
            cur = db.conn.execute(sql_text)
            # fetch rows (up to max_rows)
            rows = cur.fetchmany(args.max_rows)
            columns = [d[0] for d in cur.description] if cur.description else []
            # determine output file
            if args.output:
                out_path = args.output
            else:
                # default to a timestamped file so repeated runs don't clobber each other
                ts = int(time.time())
                out_path = f"results-{ts}.csv"
            # Write CSV
            with open(out_path, "w", newline="", encoding="utf-8") as fh:
                writer = csv.writer(fh)
                if columns:
                    writer.writerow(columns)
                for r in rows:
                    # Convert any bytes objects to strings for CSV compatibility
                    clean_row = []
                    for v in r:
                        if isinstance(v, (bytes, bytearray)):
                            try:
                                clean_row.append(v.decode("utf-8"))
                            except Exception:
                                clean_row.append(str(v))
                        else:
                            clean_row.append(v)
                    writer.writerow(clean_row)
            # Print summary to stdout
            print(f"Query executed successfully. Wrote {len(rows)} rows to {out_path}")
            if len(rows) == args.max_rows:
                print(f"... (fetched {args.max_rows} rows; limit reached)")
            return True, None
        except Exception as e:
            return False, e

    ok, err = try_execute(sql)
    if ok:
        return

    # If we get here, initial execution failed; report and attempt one repair pass (still read-only)
    print("Initial execution failed with error:", err, file=sys.stderr)
    repair_prompt = (
        f"Schema:\n\n{schema}\n\n"
        f"Question:\n\n{args.question}\n\n"
        "The SQL you produced failed when executed against the database.\n"
        f"SQL that failed:\n\n{sql}\n\n"
        f"Error:\n\n{err}\n\n"
        "Please return corrected SQLite SQL only (no markdown). Only return a read-only query "
        "(SELECT/WITH/PRAGMA). Do not return any statements that modify the database."
    )
    resp = model.prompt(repair_prompt, system=system_msg).text()
    sql2 = extract_sql(resp)
    print("---- Repaired SQL ----")
    print(sql2)
    print("----------------------")
    if not sql2:
        print("Model did not produce corrected SQL.", file=sys.stderr)
        sys.exit(1)
    ok2, err2 = try_execute(sql2)
    if ok2:
        return
    print("Repaired SQL still failed with error:", err2, file=sys.stderr)
    sys.exit(1)


if __name__ == "__main__":
    main()
```
Here's a fun note: the above block, usage examples, and notes originally just said "FILL ME", and then I ran this command:
```bash
llm -m gpt-5-mini -f text-to-sql.md -s 'Write the code for the FILL ME bit'
```
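Assuming you save the script above as text_to_sql.py (that filename, and courses.csv below, are just examples), you could run it like this:

```bash
# Generate the SQL, execute it read-only, and write the results to a CSV file
python text_to_sql.py umd-202508.db \
  "list the number of courses for each department for the 202508 term" \
  --schema umd-schema.sql --output courses.csv

# Just print the generated SQL without executing it
python text_to_sql.py umd-202508.db "how many courses are there?" --no-exec
```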
Ways to make this better¶
This is the most basic version of this, but it works pretty well!
Some ways we could make this better:
- **Examples.** The single most powerful prompt engineering trick is to give the LLM illustrative examples of what you are trying to achieve. A small number of carefully selected examples of questions and the expected SQL answers can radically improve the results (see the sketch after this list).
- **Column values.** A common failure case for text to SQL is when the question is e.g. "How many schools are in California?" and the model queries for `where state = 'California'` when it should have queried for `where state = 'CA'`. Feeding in some example values from each column can be all the model needs to get it right.
- **Data documentation.** Just like a real data analyst, the more information you can feed the model the better.
- **Explaining errors.** You can use `EXPLAIN ...` for a cheap validation of the query without running the whole thing. `EXPLAIN` will also tell you whether the indices in the database allow for efficient execution of the query.
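As a rough illustration of the first two ideas, here is a hedged sketch of how the earlier text_to_sql function could be extended to put a few question/SQL examples and some sample column values into the prompt. The EXAMPLES list is invented for illustration, and the sketch reuses the db and model objects defined earlier; db.schema and model.prompt() appear earlier on this page, while db.table_names() and the per-table .columns list are standard sqlite-utils APIs.

```python
# Hedged sketch: enrich the prompt with worked examples and sample column values.
# The EXAMPLES list is invented for illustration; replace it with question/SQL
# pairs you have verified against your own database.
EXAMPLES = [
    (
        "how many CMSC courses are offered in the 202508 term?",
        "SELECT COUNT(*) FROM course_info WHERE department = 'CMSC' AND term_id = '202508';",
    ),
]


def sample_column_values(db, table, limit=3):
    """Return a few distinct values for each column of a table."""
    samples = {}
    for column in db[table].columns:
        cur = db.conn.execute(
            f"SELECT DISTINCT [{column.name}] FROM [{table}] LIMIT {limit}"
        )
        samples[column.name] = [row[0] for row in cur.fetchall()]
    return samples


def text_to_sql_with_context(db, question):
    examples = "\n\n".join(f"Question: {q}\nSQL: {sql}" for q, sql in EXAMPLES)
    values = "\n".join(
        f"{table}: {sample_column_values(db, table)}" for table in db.table_names()
    )
    prompt = (
        f"Schema:\n\n{db.schema}\n\n"
        f"Example values for each column:\n\n{values}\n\n"
        f"Example questions and their SQL:\n\n{examples}\n\n"
        f"Question:\n\n{question}"
    )
    return model.prompt(
        prompt,
        system="reply with SQLite SQL, not in markdown, just the SQL",
    ).text()
```

For the last idea, a cheap validity check is to run db.conn.execute("EXPLAIN " + sql) on the generated query before executing it: invalid SQL fails at that step without reading any rows.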