DB2 Parallel Edition

Trace Distribution

  • Compressed, ASCII format (78.3MB)
  • Applications for Measurement and Benchmarking of I/O on Parallel Computers

    DB2 Parallel Edition is the parallel RDBMS from IBM. It partitions large relations using a subset of the attributes. DB2 uses multiple processes per processor to implement multiple threads of control. These processes are used for performing different database tasks as well as to implement asynchronous I/O.

    Input Dataset

    To drive this application, we constructed a simulated database containing records for one year's operation of a large department store. It includes records for one million customers, 100 thousand products, 10 thousand employees, 200 departments and 100 million transactions. The database schema is as follows:

    # 256 byte record, 1 million customers
    create table CUSTOMER (ssn char(10) not null, name char(50), address char(186), phone char(10)) partitioning key (ssn)

    # 256 byte record, 100K products
    create table PRODUCT (prodid integer not null, pname char(100), dept char(152)) partitioning key (prodid)

    # 128 byte record, 10K employees
    create table EMPLOYEE (empno integer not null, name char(74), workdept integer, phoneno char(10), job char(20), salary float, bonus float) partitioning key (empno)

    # 256 byte record, 200 depts
    create table DEPT (deptid integer not null, manager integer, budget float, location char(240)) partitioning key (deptid)

    # 32 byte record, 100 million sales, 500000 sales/dept, > 1 year period
    create table SALES (prodid integer, deptid integer, salesperson integer, customer char(10), amount float, code char(2)) partitioning key (deptid,prodid,salesperson,customer)

    # index on customer
    create index idx.cust on customer (ssn)

    # index on employee
    create index idx.emp on employee (empno)

    # index on sales
    create index idx.sales on sales (prodid, deptid)

    The total database size was 5.2 GB and it was stored in 831 files (including the indices).

    Workload

    We ran five consecutive queries against this database. These queries perform complex join, set, and aggregate operations on indexed and non-indexed relations:

    # query #1: List the names of all managers of all departments
    select name,phoneno,salary from EMPLOYEE, DEPT where empno = manager

    # query #2: list the sales volume of each department.
    select deptid,sum(amount) from SALES group by deptid order by 2

    # query #3: find name and phone number of all customers who have spent
    # less then $100 in the past year
    select name, phone from CUSTOMER where ssn in (select distinct customer from SALES where amount

    # query #4: list all products that brought revenue higher than $10 million
    select prodid, pname from PRODUCT where prodid in (select prodid from SALES group by prodid having sum(amount)>10000000.000)

    # query #5: how many items have we sold in the last year?
    select count(*) from SALES

    For these queries, DB2 uses a directed outer-table join strategy in which rows of the outer table is hashed (based on the partitioning key) to the node where join is performed.