Optimizing the computation (cont.)
Cache results
- caching (in memory) the results of a group-by from which other group-bys are computed, to reduce disk I/O
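
A minimal sketch of the idea, assuming a tiny in-memory table with hypothetical columns (A, B, C, measure) and SUM as the aggregate. The helper name `group_by` is illustrative only. The AB result is kept in memory and A is derived from it, so the base table is read only once.

```python
from collections import defaultdict

def group_by(rows, key_idx, val_idx):
    """Sum the value column for each combination of the key columns."""
    out = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_idx)
        out[key] += row[val_idx]
    return dict(out)

# Hypothetical base table: (A, B, C, measure)
base = [
    ("a1", "b1", "c1", 5),
    ("a1", "b1", "c2", 3),
    ("a1", "b2", "c1", 2),
    ("a2", "b1", "c1", 7),
]

# One pass over the base data produces AB, which stays in memory.
ab = group_by(base, key_idx=(0, 1), val_idx=3)

# A is derived from the cached AB result, not from the base table.
ab_rows = [key + (total,) for key, total in ab.items()]
a = group_by(ab_rows, key_idx=(0,), val_idx=2)
```

This works for distributive aggregates such as SUM; other aggregates may need extra bookkeeping.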
Amortize-scans
- amortizing disk reads by computing as many group-bys as possible together in memory
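
As a rough illustration, assuming the same hypothetical (A, B, C, measure) table and SUM aggregate, several group-bys can be filled during a single scan. The function name `multi_group_by` is an assumption made for this sketch.

```python
from collections import defaultdict

def multi_group_by(rows, key_sets, val_idx):
    """Compute several group-bys in a single pass over the rows."""
    results = {keys: defaultdict(int) for keys in key_sets}
    for row in rows:  # each row is read exactly once
        for keys, table in results.items():
            table[tuple(row[i] for i in keys)] += row[val_idx]
    return {keys: dict(table) for keys, table in results.items()}

# Hypothetical base table: (A, B, C, measure)
base = [
    ("a1", "b1", "c1", 5),
    ("a1", "b1", "c2", 3),
    ("a1", "b2", "c1", 2),
    ("a2", "b1", "c1", 7),
]

# A, B and AB are all computed from one scan of the base data.
cube = multi_group_by(base, key_sets=[(0,), (1,), (0, 1)], val_idx=3)
```

The number of group-bys that can share a scan is limited by how many result tables fit in memory at once.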
Share-sorts (for SORT only)
- sharing sorting cost across multiple group-bys
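
A small sketch under the same assumptions (hypothetical columns, SUM aggregate): data sorted on (A, B) is also sorted on A, so one sort can serve both group-bys. `stream_group_by` is an illustrative helper, not a library function.

```python
from itertools import groupby

# Hypothetical base table: (A, B, C, measure)
base = [
    ("a2", "b1", "c1", 7),
    ("a1", "b2", "c1", 2),
    ("a1", "b1", "c1", 5),
    ("a1", "b1", "c2", 3),
]

# The only sort: order by (A, B).
sorted_rows = sorted(base, key=lambda r: (r[0], r[1]))

def stream_group_by(rows, n_keys, val_idx):
    """Aggregate rows that are already sorted on their first n_keys columns."""
    return [
        (key, sum(r[val_idx] for r in group))
        for key, group in groupby(rows, key=lambda r: r[:n_keys])
    ]

ab = stream_group_by(sorted_rows, 2, 3)  # uses the full sort order
a = stream_group_by(sorted_rows, 1, 3)   # reuses the same order via its prefix
```

Only group-bys whose attributes form a prefix of the sort order can share the sort this way.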
Share-partitions (for HASH only)
- when the hash table is too large to fit in memory, the data is partitioned and aggregation is done for each partition that fits in memory; the partitioning cost is shared across multiple group-bys
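
A simplified sketch, assuming the same hypothetical table and simulating partitions as in-memory lists rather than disk files. Partitioning on A keeps all rows with the same A, and therefore the same (A, B), in one partition, so a single partitioning step can serve both group-bys. The names `partition` and `hash_group_by` are illustrative.

```python
from collections import defaultdict

def partition(rows, key_idx, n_parts):
    """Split rows into n_parts buckets by hashing one column."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key_idx]) % n_parts].append(row)
    return parts

def hash_group_by(rows, key_idx, val_idx):
    """Hash-based SUM aggregation over one partition."""
    table = defaultdict(int)
    for row in rows:
        table[tuple(row[i] for i in key_idx)] += row[val_idx]
    return dict(table)

# Hypothetical base table: (A, B, C, measure)
base = [
    ("a1", "b1", "c1", 5),
    ("a1", "b1", "c2", 3),
    ("a1", "b2", "c1", 2),
    ("a2", "b1", "c1", 7),
]

a, ab = {}, {}
for part in partition(base, key_idx=0, n_parts=2):
    # Keys never span partitions, so per-partition results simply union.
    a.update(hash_group_by(part, (0,), 3))
    ab.update(hash_group_by(part, (0, 1), 3))
```

A group-by that does not include the partitioning attribute (B alone, for example) cannot reuse these partitions directly.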
THESE OPTIMIZATIONS CAN CONFLICT WITH ONE ANOTHER: THEY REPRESENT TRADE-OFFS