DB2 Parallel Edition


Description

DB2 Parallel Edition is the parallel RDBMS from IBM. It partitions large relations using a subset of the attributes. DB2 uses multiple processes per processor to implement multiple threads of control. These processes are used for performing different database tasks as well as to implement asynchronous I/O.

Input Dataset

To drive this application, we constructed a simulated database containing records for one year's operation of a large department store. It includes records for one million customers, 100 thousand products, 10 thousand employees, 200 departments and 100 million transactions. The database schema is as follows:

# 256 byte record, 1 million customers
create table CUSTOMER (ssn char(10) not null, name char(50), address char(186), phone char(10)) partitioning key (ssn)

# 256 byte record, 100K products
create table PRODUCT (prodid integer not null, pname char(100), dept char(152)) partitioning key (prodid)

# 128 byte record, 10K employees
create table EMPLOYEE (empno integer not null, name char(74), workdept integer, phoneno char(10), job char(20), salary float, bonus float) partitioning key (empno)

# 256 byte record, 200 depts
create table DEPT (deptid integer not null, manager integer, budget float, location char(240)) partitioning key (deptid)

# 32 byte record, 100 million sales, 500000 sales/dept, > 1 year period
create table SALES (prodid integer, deptid integer, salesperson integer, customer char(10), amount float, code char(2)) partitioning key (deptid,prodid,salesperson,customer)

# index on customer
create index idx.cust on customer (ssn)

# index on employee
create index idx.emp on employee (empno)

# index on sales
create index idx.sales on sales (prodid, deptid)

The total database size was 5.2 GB and it was stored in 831 files (including the indices).

Workload

We ran five consecutive queries against this database. These queries perform complex join, set, and aggregate operations on indexed and non-indexed relations:

# query #1: List the names of all managers of all departments
select name,phoneno,salary from EMPLOYEE, DEPT where empno = manager

# query #2: list the sales volume of each department.
select deptid,sum(amount) from SALES group by deptid order by 2

# query #3: find name and phone number of all customers who have spent
# less then $100 in the past year
select name, phone from CUSTOMER where ssn in
(select distinct customer from SALES where amount < 100)


# query #4: list all products that brought revenue higher than $10 million
select prodid, pname from PRODUCT where prodid in
(select prodid from SALES group by prodid having sum(amount)>10000000.000)


# query #5: how many items have we sold in the last year?
select count(*) from SALES

For these queries, DB2 uses a directed outer-table join strategy in which rows of the outer table is hashed (based on the partitioning key) to the node where join is performed.

Traces

You can download the trace files in the following formats:

----------------------------------------------------------------------
Last updated on Tue May 27 12:37:44 EDT 1997 by Mustafa Uysal (uysal@cs.umd.edu ).