DB2 Parallel Edition

Trace Distribution

Compressed, ASCII format (78.3MB)

Applications for Measurement and Benchmarking of I/O on Parallel Computers

DB2 Parallel Edition is the parallel RDBMS from IBM. It partitions large relations using a subset of the attributes. DB2 uses multiple processes per processor to implement multiple threads of control. These processes are used for performing different database tasks as well as to implement asynchronous I/O.

Input Dataset

To drive this application, we constructed a simulated database containing records for one year's operation of a large department store. It includes records for one million customers, 100 thousand products, 10 thousand employees, 200 departments and 100 million transactions. The database schema is as follows:

# 256 byte record, 1 million customers create table CUSTOMER (ssn char(10) not null, name char(50), address char(186), phone char(10)) partitioning key (ssn) # 256 byte record, 100K products create table PRODUCT (prodid integer not null, pname char(100), dept char(152)) partitioning key (prodid) # 128 byte record, 10K employees create table EMPLOYEE (empno integer not null, name char(74), workdept integer, phoneno char(10), job char(20), salary float, bonus float) partitioning key (empno) # 256 byte record, 200 depts create table DEPT (deptid integer not null, manager integer, budget float, location char(240)) partitioning key (deptid) # 32 byte record, 100 million sales, 500000 sales/dept, > 1 year period create table SALES (prodid integer, deptid integer, salesperson integer, customer char(10), amount float, code char(2)) partitioning key (deptid,prodid,salesperson,customer) # index on customer create index idx.cust on customer (ssn) # index on employee create index idx.emp on employee (empno) # index on sales create index idx.sales on sales (prodid, deptid) The total database size was 5.2 GB and it was stored in 831 files (including the indices).

Workload

We ran five consecutive queries against this database. These queries perform complex join, set, and aggregate operations on indexed and non-indexed relations:

# query #1: List the names of all managers of all departments select name,phoneno,salary from EMPLOYEE, DEPT where empno = manager # query #2: list the sales volume of each department. select deptid,sum(amount) from SALES group by deptid order by 2 # query #3: find name and phone number of all customers who have spent # less then $100 in the past year select name, phone from CUSTOMER where ssn in (select distinct customer from SALES where amount # query #4: list all products that brought revenue higher than $10 million select prodid, pname from PRODUCT where prodid in (select prodid from SALES group by prodid having sum(amount)>10000000.000) # query #5: how many items have we sold in the last year? select count(*) from SALES For these queries, DB2 uses a directed outer-table join strategy in which rows of the outer table is hashed (based on the partitioning key) to the node where join is performed.