## On Scheduling Coflows

University of Maryland, College Park

*: currently an employee of Google Inc.

### Facebook analytics jobs spend 33% of their runtime in communication*!

*: Chowdhury, Mosharaf, et al. Managing data transfers in computer clusters with orchestra. ACM SIGCOMM Computer Communication Review. Vol. 41. No. 4. ACM, 2011.

# Coflow*

*:Chowdhury, Mosharaf, and Stoica, Ion. Coflow: A networking abstraction for cluster applications.Proceedings of the 11th ACM Workshop on Hot Topics in Networks. ACM, 2012.

### Problem Description

• $$n$$ coflows, coflow will be represented as matrix $$D^j = [d^j_{io}]$$
• For every coflow $$j$$:
• $$r_j$$: release time
• $$w_j$$: weight
• Minimize: $$\sum_{i=1}^n w_j\cdot C_j$$.
• $$C_j$$: completion time. A coflow is finished when all its flows are scheduled

### Problem Description

• $$d^j_{io}$$: demand from input port $$i$$ to output port $$o$$
• Discrete time
• At each time slot an input/output port can send/receive at most one unit of flow.

### Theorem

A coflow $$G$$ with largest degree $$\Delta(G)$$ can be scheduled in $$\Delta(G)$$ time.

### Existing Results*

Deterministic Randomized
Zero release time $$\frac{64}{3}$$ $$8 + \frac{16\sqrt{2}}{3}$$
Arbitrary release time** $$\frac{76}{3}$$ $$9 + \frac{16\sqrt{2}}{3}$$
*:Z. Qiu, C. Stein, and Y. Zhong Minimizing the total weighted completion time of coflows in datacenter networks In Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures (pp. 294-303). 2015, ACM. **:The authors claimed a ratio of $$\frac{67}{3}$$, but the proof only holds when all release times are the same.

### Improvements

S. Khuller and M. Purohit* use black-box reduction to Concurrent Open Shop, and get

• Zero release time, deterministic: 8-approx
• Arbitrary release time, deterministic: 12-approx

*:Samir Khuller and Manish Purohit Brief Announcement: Improved Approximation Algorithms for Scheduling Co-Flows In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures (pp. 239-240). ACM.

### Our Contribution

• Zero release time*: 4-approx
• Arbitrary release time*: 5-approx
• Combinatorial using primal-dual
*: An independent work A New Improved Bound for Coflow Scheduling by Shafiee, Ghaderi in SPAA 2017 gave the same bound, but relies on solving an LP.

## Dive into the problem

### Algorithm Overview

1. Use LP (or Primal-Dual) to find a solution, which generates an ordering of coflows according to the non-decreasing order of completion time.
2. Consider each coflow $$i$$ according to the ordering, move edges backward, i.e. move flows to previous coflows $$j < i$$ without violating the largest degree $$\Delta(G_j)$$. The new graph is denoted as $$G_i'$$.
3. Schedule the coflows one by one, within $$\Delta(G_i')$$ time, using modified Hall's Theorem.

### Concurrent Open Shop

• $$n$$ jobs to be scheduled on $$m$$ different machines
• Each job $$j$$ consists of many tasks. For each machine $$i$$, there is an task that takes $$d^j_i$$ time to finish
• Minimize weighted completion time

### Relationship with Concurrent Open Shop

• Concurrent Open Shop $$\Longrightarrow$$ Coflow:
• diagonal matrix

### Relationship with Concurrent Open Shop

• Coflow $$\Longrightarrow$$ Concurrent Open shop:
• Each input port $$i$$ is a machine
• Load $$L_i = \sum_{o\in \text{output ports}} d_{i,o}$$
• Each output port $$o$$ is a machine
• Load $$L_o = \sum_{i\in \text{input ports}} d_{i,o}$$

### LP for Concurrent Open Shop

\begin{align} &\min\sum_{j\in J}w_jC_j\\ \text{s.t.,}\ \ & C_j \geq r_j + L_{i,j},&& \forall j, \forall i\in M\\ & \sum_{j\in S}L_{i,j}C_j \geq \frac{1}{2}\left(\sum_{j\in S}L_{i,j}^2 + (\sum_{j\in S}L_{i,j})^2\right), && \forall S\subseteq J, \forall i\in M \end{align}

### Parallel Constraint

$$\sum_{j\in S}L_{i,j} C_j \geq \frac{1}{2}\left(\sum_{j\in S} L_{i,j}^2 + \left(\sum_{j\in S} L_{i,j}\right)^2\right)$$

Assume $$S = \{j_1, \dots, j_t\}$$, where $$C_{j_1} < \cdots < C_{j_t}$$

\begin{align} C_{j_1} & \geq L_{i, j_1},\\ C_{j_2} & \geq L_{i, j_1} + L_{i, j_2},\\ C_{j_3} & \geq L_{i, j_1} + L_{i, j_2} + L_{i, j_3},\\ & \ \ \vdots \end{align}

\begin{align} &\textstyle \sum_{j\in S} L_{i, j} C_j\\ \Longrightarrow\hspace{8em}=& \textstyle\sum_{p=1}^t L_{i, j_p}C_{j_p}\\ \geq& \textstyle\sum_{p=1}^t L_{i, j_p} \sum_{k=1}^p L_{i, j_k}\\ =& \cdots \end{align}

### Algorithm Overview

1. Use LP (or Primal-Dual) to find a solution, which generates an ordering of coflows according to the non-decreasing order of completion time.
2. Consider each coflow $$i$$ according to the ordering, move edges backward, i.e. move flows to previous coflows $$j < i$$ without violating the largest degree $$\Delta(G_j)$$. The new graph is denoted as $$G_i'$$.
3. Schedule the coflows one by one, within $$\Delta(G_i')$$ time, using modified Hall's Theorem.

### Moving Edges

$$\Delta(G_1)\vphantom{G_1'} = 100, \Delta(G_2) = 100, \Delta(G_3) = 100.$$

### Moving Edges

$$\Delta(G_1') = 100, \Delta(G_2') = 1,\phantom{00} \Delta(G_3') = 100.$$

### Moving Edges

$$\Delta(G_1') = 100, \Delta(G_2') = 1,\phantom{00} \Delta(G_3') = 0\phantom{00}.$$

# 2-approx

for this moving edges step

### Lemma 1

The completion time of job $$j$$ in an optimal LP satisfies:

$\overline{C_j} \geq \frac{1}{2}\max_i\left\{\sum_{k=1}^j L_{i,j}\right\} = \frac{1}{2}\Delta(\bigcup_{k\leq j} G_k)$

### Lemma 2

The completion time of job $$j$$ by our algorithm satisfies:

$C_j(\text{alg}) = \sum_{k \leq j}\Delta(G'_k) \leq 2 \Delta(\bigcup_{k\leq j} G_k)$

### Theorem

The completion time of job $$j$$ by our algorithm satisfies

$C_j(\text{alg}) \leq 2 \Delta(\bigcup_{k\leq j} G_k) \leq 4 \overline{C_j}$

# 4-approx

without release time

### Proof

\begin{align} 2\Delta(\hat{G_j}) = &2\Delta(\bigcup_{k\leq j}G_k)\\ \geq & 2\max\{\textsf{deg}_{\hat{G_j}}(u), \textsf{deg}_{\hat{G_j}}(v)\}\\ \geq & 2\max\{\sum_{G\in S_u}\textsf{deg}_{G}(u), \sum_{G\in S_v}\textsf{deg}_{G}(v)\}\\ = & 2\max\{\sum_{G\in S_u}\Delta(G), \sum_{G\in S_v}\Delta(G)\}\\ \geq & \sum_{G\in S_u}\Delta(G) + \sum_{G\in S_v}\Delta(G) \geq \sum_{G\in S} \Delta(G) \end{align}

# Combinatorial?

### Dual LP

$\max \sum_{j\in J}\sum_{i \in M} \alpha_{i,j}(r_j + L_{i,j}) + \sum_{i\in M} \sum_{S\subseteq J} \beta_{i,S}\frac{1}{2}\left(\sum_{j\in S} L^2_{i,j} + (\sum_{j\in S}L_{i,j})^2\right)$

\begin{align} \text{subject to, } &\sum_{i\in M}\alpha_{i,j} + \sum_{i\in M} \sum_{S| j\in S} L_{i,j}\beta_{i,S} < w_j \quad && \forall j\in J\\ &\alpha_{i,j} \geq 0 && \forall j\in J, i \in M\\ &\beta_{i,S} \geq 0 && \forall i \in M, \forall S\subseteq J \end{align}

### Primal Dual Algorithm Sketch

1. Consider the most loaded port $$i$$ and the coflow $$j$$ with largest release time
2. Test if $$r_j > \kappa \cdot L_i$$:
• If $$r_j > \kappa \cdot L_i$$: put coflow $$j$$ at last
• If $$r_j \leq \kappa \cdot L_i$$: raise the corresponding dual variable, put it at first if some constraint gets tight
3. Loop through step 1, 2, until all coflows got a position
4. Use the ordering above to move edges backward, and schedule each coflow

## Open questions

• Consider flow time instead of completion time, possibly using augmented resources.
• Since coflow scheduling generalizes concurrent open shop, it is NP-hard to approximate it within a factor better than $$(2-ε)$$*. We have an approximation factor of 4.
*: S. Sachdeva and R. Saket. Optimal inapproximability for scheduling problems via structural hardness for hypergraph vertex cover, Computational Complexity (CCC), 2013 IEEE Conference on. IEEE.