On Scheduling Coflows

Saba Ahmadi, Samir Khuller, Manish Purohit*, Sheng Yang

University of Maryland, College Park

*: currently an employee of Google Inc.

Datacenters around the world

Picture origin: https://crmgiant.eu/research-and-discussion/explaining-the-business-value-of-crm-in-the-cloud/

Facebook analytics jobs spend 33% of their runtime in communication*!

*: Chowdhury, Mosharaf, et al. Managing data transfers in computer clusters with orchestra. ACM SIGCOMM Computer Communication Review. Vol. 41. No. 4. ACM, 2011.

Coflow*

*:Chowdhury, Mosharaf, and Stoica, Ion. Coflow: A networking abstraction for cluster applications.Proceedings of the 11th ACM Workshop on Hot Topics in Networks. ACM, 2012.

Problem Description

\(n\) coflows, coflow will be represented as matrix \(D^j = [d^j_{io}]\)
For every coflow \(j\):
- \(r_j\): release time
- \(w_j\): weight
Minimize: \(\sum_{i=1}^n w_j\cdot C_j\).
- \(C_j\): completion time. A coflow is finished when all its flows are scheduled

Problem Description

\(d^j_{io}\): demand from input port \(i\) to output port \(o\)
Discrete time
At each time slot an input/output port can send/receive at most one unit of flow.

Schedule a single coflow: matching!

Matching

Theorem

A coflow \(G\) with largest degree \(\Delta(G)\) can be scheduled in \(\Delta(G)\) time.

Existing Results*

	Deterministic	Randomized
Zero release time	\(\frac{64}{3}\)	\(8 + \frac{16\sqrt{2}}{3}\)
Arbitrary release time**	\(\frac{76}{3}\)	\(9 + \frac{16\sqrt{2}}{3}\)

*:Z. Qiu, C. Stein, and Y. Zhong Minimizing the total weighted completion time of coflows in datacenter networks In Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures (pp. 294-303). 2015, ACM. **:The authors claimed a ratio of \(\frac{67}{3}\), but the proof only holds when all release times are the same.

Improvements

S. Khuller and M. Purohit* use black-box reduction to Concurrent Open Shop, and get

Zero release time, deterministic: 8-approx
Arbitrary release time, deterministic: 12-approx

*:Samir Khuller and Manish Purohit Brief Announcement: Improved Approximation Algorithms for Scheduling Co-Flows In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures (pp. 239-240). ACM.

Our Contribution

Zero release time*: 4-approx
Arbitrary release time*: 5-approx
Combinatorial using primal-dual

*: An independent work A New Improved Bound for Coflow Scheduling by Shafiee, Ghaderi in SPAA 2017 gave the same bound, but relies on solving an LP.

Dive into the problem

Algorithm Overview

Use LP (or Primal-Dual) to find a solution, which generates an ordering of coflows according to the non-decreasing order of completion time.
Consider each coflow \(i\) according to the ordering, move edges backward, i.e. move flows to previous coflows \(j < i\) without violating the largest degree \(\Delta(G_j)\). The new graph is denoted as \(G_i'\).
Schedule the coflows one by one, within \(\Delta(G_i')\) time, using modified Hall's Theorem.

Concurrent Open Shop

\(n\) jobs to be scheduled on \(m\) different machines
Each job \(j\) consists of many tasks. For each machine \(i\), there is an task that takes \(d^j_i\) time to finish
Minimize weighted completion time

Relationship with Concurrent Open Shop

Concurrent Open Shop \(\Longrightarrow\) Coflow:
- diagonal matrix

Relationship with Concurrent Open Shop

Coflow \(\Longrightarrow\) Concurrent Open shop:
- Each input port \(i\) is a machine
- Load \(L_i = \sum_{o\in \text{output ports}} d_{i,o}\)
- Each output port \(o\) is a machine
- Load \(L_o = \sum_{i\in \text{input ports}} d_{i,o}\)

LP for Concurrent Open Shop

\[\begin{align} &\min\sum_{j\in J}w_jC_j\\ \text{s.t.,}\ \ & C_j \geq r_j + L_{i,j},&& \forall j, \forall i\in M\\ & \sum_{j\in S}L_{i,j}C_j \geq \frac{1}{2}\left(\sum_{j\in S}L_{i,j}^2 + (\sum_{j\in S}L_{i,j})^2\right), && \forall S\subseteq J, \forall i\in M \end{align}\]

Parallel Constraint

\(\sum_{j\in S}L_{i,j} C_j \geq \frac{1}{2}\left(\sum_{j\in S} L_{i,j}^2 + \left(\sum_{j\in S} L_{i,j}\right)^2\right) \)

Assume \(S = \{j_1, \dots, j_t\}\), where \(C_{j_1} < \cdots < C_{j_t}\)

\[\begin{align} C_{j_1} & \geq L_{i, j_1},\\ C_{j_2} & \geq L_{i, j_1} + L_{i, j_2},\\ C_{j_3} & \geq L_{i, j_1} + L_{i, j_2} + L_{i, j_3},\\ & \ \ \vdots \end{align}\]

\[\begin{align} &\textstyle \sum_{j\in S} L_{i, j} C_j\\ \Longrightarrow\hspace{8em}=& \textstyle\sum_{p=1}^t L_{i, j_p}C_{j_p}\\ \geq& \textstyle\sum_{p=1}^t L_{i, j_p} \sum_{k=1}^p L_{i, j_k}\\ =& \cdots \end{align}\]

Algorithm Overview

Use LP (or Primal-Dual) to find a solution, which generates an ordering of coflows according to the non-decreasing order of completion time.
Consider each coflow \(i\) according to the ordering, move edges backward, i.e. move flows to previous coflows \(j < i\) without violating the largest degree \(\Delta(G_j)\). The new graph is denoted as \(G_i'\).
Schedule the coflows one by one, within \(\Delta(G_i')\) time, using modified Hall's Theorem.

Moving Edges

\(\Delta(G_1)\vphantom{G_1'} = 100, \Delta(G_2) = 100, \Delta(G_3) = 100. \)

Moving Edges

\(\Delta(G_1') = 100, \Delta(G_2') = 1,\phantom{00} \Delta(G_3') = 100. \)

Moving Edges

\(\Delta(G_1') = 100, \Delta(G_2') = 1,\phantom{00} \Delta(G_3') = 0\phantom{00}. \)

2-approx

for this moving edges step

Lemma 1

The completion time of job \(j\) in an optimal LP satisfies:

\[\overline{C_j} \geq \frac{1}{2}\max_i\left\{\sum_{k=1}^j L_{i,j}\right\} = \frac{1}{2}\Delta(\bigcup_{k\leq j} G_k)\]

Lemma 2

The completion time of job \(j\) by our algorithm satisfies:

\[C_j(\text{alg}) = \sum_{k \leq j}\Delta(G'_k) \leq 2 \Delta(\bigcup_{k\leq j} G_k)\]

Theorem

The completion time of job \(j\) by our algorithm satisfies

\[C_j(\text{alg}) \leq 2 \Delta(\bigcup_{k\leq j} G_k) \leq 4 \overline{C_j}\]

4-approx

without release time

Proof

\[\begin{align} 2\Delta(\hat{G_j}) = &2\Delta(\bigcup_{k\leq j}G_k)\\ \geq & 2\max\{\textsf{deg}_{\hat{G_j}}(u), \textsf{deg}_{\hat{G_j}}(v)\}\\ \geq & 2\max\{\sum_{G\in S_u}\textsf{deg}_{G}(u), \sum_{G\in S_v}\textsf{deg}_{G}(v)\}\\ = & 2\max\{\sum_{G\in S_u}\Delta(G), \sum_{G\in S_v}\Delta(G)\}\\ \geq & \sum_{G\in S_u}\Delta(G) + \sum_{G\in S_v}\Delta(G) \geq \sum_{G\in S} \Delta(G) \end{align}\]

Combinatorial?

Dual LP

\[\max \sum_{j\in J}\sum_{i \in M} \alpha_{i,j}(r_j + L_{i,j}) + \sum_{i\in M} \sum_{S\subseteq J} \beta_{i,S}\frac{1}{2}\left(\sum_{j\in S} L^2_{i,j} + (\sum_{j\in S}L_{i,j})^2\right)\]

\[\begin{align} \text{subject to, } &\sum_{i\in M}\alpha_{i,j} + \sum_{i\in M} \sum_{S| j\in S} L_{i,j}\beta_{i,S} < w_j \quad && \forall j\in J\\ &\alpha_{i,j} \geq 0 && \forall j\in J, i \in M\\ &\beta_{i,S} \geq 0 && \forall i \in M, \forall S\subseteq J \end{align}\]

Primal Dual Algorithm Sketch

Consider the most loaded port \(i\) and the coflow \(j\) with largest release time
Test if \(r_j > \kappa \cdot L_i\):
- If \(r_j > \kappa \cdot L_i\): put coflow \(j\) at last
- If \(r_j \leq \kappa \cdot L_i\): raise the corresponding dual variable, put it at first if some constraint gets tight
Loop through step 1, 2, until all coflows got a position
Use the ordering above to move edges backward, and schedule each coflow

Open questions

Consider flow time instead of completion time, possibly using augmented resources.
Since coflow scheduling generalizes concurrent open shop, it is NP-hard to approximate it within a factor better than \((2-ε)\)*. We have an approximation factor of 4.

*: S. Sachdeva and R. Saket. Optimal inapproximability for scheduling problems via structural hardness for hypergraph vertex cover, Computational Complexity (CCC), 2013 IEEE Conference on. IEEE.

Q & A

Powered by

Reveal.js: the slide framework
animate.css: some animation
magic.css: some other animation