On Scheduling Coflows

Saba Ahmadi, Samir Khuller, Manish Purohit*, Sheng Yang

University of Maryland, College Park

*: currently an employee of Google Inc.

Facebook analytics jobs spend 33% of their runtime in communication*!

*: Chowdhury, Mosharaf, et al. Managing data transfers in computer clusters with orchestra. ACM SIGCOMM Computer Communication Review. Vol. 41. No. 4. ACM, 2011.

Coflow*

*:Chowdhury, Mosharaf, and Stoica, Ion. Coflow: A networking abstraction for cluster applications.Proceedings of the 11th ACM Workshop on Hot Topics in Networks. ACM, 2012.

Problem Description

Problem Description

Problem Description

  • \(n\) coflows, coflow will be represented as matrix \(D^j = [d^j_{io}]\)
  • For every coflow \(j\):
    • \(r_j\): release time
    • \(w_j\): weight
  • Minimize: \(\sum_{i=1}^n w_j\cdot C_j\).
    • \(C_j\): completion time. A coflow is finished when all its flows are scheduled

Problem Description

  • \(d^j_{io}\): demand from input port \(i\) to output port \(o\)
  • Discrete time
  • At each time slot an input/output port can send/receive at most one unit of flow.

Schedule a single coflow: matching!

Matching

Matching

Matching

Theorem

A coflow \(G\) with largest degree \(\Delta(G)\) can be scheduled in \(\Delta(G)\) time.

Existing Results*

Deterministic Randomized
Zero release time \(\frac{64}{3}\) \(8 + \frac{16\sqrt{2}}{3}\)
Arbitrary release time** \(\frac{76}{3}\) \(9 + \frac{16\sqrt{2}}{3}\)
*:Z. Qiu, C. Stein, and Y. Zhong Minimizing the total weighted completion time of coflows in datacenter networks In Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures (pp. 294-303). 2015, ACM. **:The authors claimed a ratio of \(\frac{67}{3}\), but the proof only holds when all release times are the same.

Improvements

S. Khuller and M. Purohit* use black-box reduction to Concurrent Open Shop, and get

  • Zero release time, deterministic: 8-approx
  • Arbitrary release time, deterministic: 12-approx

*:Samir Khuller and Manish Purohit Brief Announcement: Improved Approximation Algorithms for Scheduling Co-Flows In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures (pp. 239-240). ACM.

Our Contribution

  • Zero release time*: 4-approx
  • Arbitrary release time*: 5-approx
  • Combinatorial using primal-dual
*: An independent work A New Improved Bound for Coflow Scheduling by Shafiee, Ghaderi in SPAA 2017 gave the same bound, but relies on solving an LP.

Dive into the problem

Algorithm Overview

  1. Use LP (or Primal-Dual) to find a solution, which generates an ordering of coflows according to the non-decreasing order of completion time.
  2. Consider each coflow \(i\) according to the ordering, move edges backward, i.e. move flows to previous coflows \(j < i\) without violating the largest degree \(\Delta(G_j)\). The new graph is denoted as \(G_i'\).
  3. Schedule the coflows one by one, within \(\Delta(G_i')\) time, using modified Hall's Theorem.

Concurrent Open Shop

  • \(n\) jobs to be scheduled on \(m\) different machines
  • Each job \(j\) consists of many tasks. For each machine \(i\), there is an task that takes \(d^j_i\) time to finish
  • Minimize weighted completion time

Relationship with Concurrent Open Shop

  • Concurrent Open Shop \(\Longrightarrow\) Coflow:
    • diagonal matrix

Relationship with Concurrent Open Shop

  • Coflow \(\Longrightarrow\) Concurrent Open shop:
    • Each input port \(i\) is a machine
    • Load \(L_i = \sum_{o\in \text{output ports}} d_{i,o}\)
    • Each output port \(o\) is a machine
    • Load \(L_o = \sum_{i\in \text{input ports}} d_{i,o}\)

LP for Concurrent Open Shop

\[\begin{align} &\min\sum_{j\in J}w_jC_j\\ \text{s.t.,}\ \ & C_j \geq r_j + L_{i,j},&& \forall j, \forall i\in M\\ & \sum_{j\in S}L_{i,j}C_j \geq \frac{1}{2}\left(\sum_{j\in S}L_{i,j}^2 + (\sum_{j\in S}L_{i,j})^2\right), && \forall S\subseteq J, \forall i\in M \end{align}\]

Parallel Constraint

\(\sum_{j\in S}L_{i,j} C_j \geq \frac{1}{2}\left(\sum_{j\in S} L_{i,j}^2 + \left(\sum_{j\in S} L_{i,j}\right)^2\right) \)

Assume \(S = \{j_1, \dots, j_t\}\), where \(C_{j_1} < \cdots < C_{j_t}\)

\[\begin{align} C_{j_1} & \geq L_{i, j_1},\\ C_{j_2} & \geq L_{i, j_1} + L_{i, j_2},\\ C_{j_3} & \geq L_{i, j_1} + L_{i, j_2} + L_{i, j_3},\\ & \ \ \vdots \end{align}\]

\[\begin{align} &\textstyle \sum_{j\in S} L_{i, j} C_j\\ \Longrightarrow\hspace{8em}=& \textstyle\sum_{p=1}^t L_{i, j_p}C_{j_p}\\ \geq& \textstyle\sum_{p=1}^t L_{i, j_p} \sum_{k=1}^p L_{i, j_k}\\ =& \cdots \end{align}\]

Algorithm Overview

  1. Use LP (or Primal-Dual) to find a solution, which generates an ordering of coflows according to the non-decreasing order of completion time.
  2. Consider each coflow \(i\) according to the ordering, move edges backward, i.e. move flows to previous coflows \(j < i\) without violating the largest degree \(\Delta(G_j)\). The new graph is denoted as \(G_i'\).
  3. Schedule the coflows one by one, within \(\Delta(G_i')\) time, using modified Hall's Theorem.

Moving Edges

\(\Delta(G_1)\vphantom{G_1'} = 100, \Delta(G_2) = 100, \Delta(G_3) = 100. \)

Moving Edges

\(\Delta(G_1') = 100, \Delta(G_2') = 1,\phantom{00} \Delta(G_3') = 100. \)

Moving Edges

\(\Delta(G_1') = 100, \Delta(G_2') = 1,\phantom{00} \Delta(G_3') = 0\phantom{00}. \)

2-approx

for this moving edges step

Lemma 1

The completion time of job \(j\) in an optimal LP satisfies:

\[\overline{C_j} \geq \frac{1}{2}\max_i\left\{\sum_{k=1}^j L_{i,j}\right\} = \frac{1}{2}\Delta(\bigcup_{k\leq j} G_k)\]

Lemma 2

The completion time of job \(j\) by our algorithm satisfies:

\[C_j(\text{alg}) = \sum_{k \leq j}\Delta(G'_k) \leq 2 \Delta(\bigcup_{k\leq j} G_k)\]

Theorem

The completion time of job \(j\) by our algorithm satisfies

\[C_j(\text{alg}) \leq 2 \Delta(\bigcup_{k\leq j} G_k) \leq 4 \overline{C_j}\]

4-approx

without release time

Proof

Proof

Proof

Proof

\[\begin{align} 2\Delta(\hat{G_j}) = &2\Delta(\bigcup_{k\leq j}G_k)\\ \geq & 2\max\{\textsf{deg}_{\hat{G_j}}(u), \textsf{deg}_{\hat{G_j}}(v)\}\\ \geq & 2\max\{\sum_{G\in S_u}\textsf{deg}_{G}(u), \sum_{G\in S_v}\textsf{deg}_{G}(v)\}\\ = & 2\max\{\sum_{G\in S_u}\Delta(G), \sum_{G\in S_v}\Delta(G)\}\\ \geq & \sum_{G\in S_u}\Delta(G) + \sum_{G\in S_v}\Delta(G) \geq \sum_{G\in S} \Delta(G) \end{align}\]

Combinatorial?

Dual LP

\[\max \sum_{j\in J}\sum_{i \in M} \alpha_{i,j}(r_j + L_{i,j}) + \sum_{i\in M} \sum_{S\subseteq J} \beta_{i,S}\frac{1}{2}\left(\sum_{j\in S} L^2_{i,j} + (\sum_{j\in S}L_{i,j})^2\right)\]

\[\begin{align} \text{subject to, } &\sum_{i\in M}\alpha_{i,j} + \sum_{i\in M} \sum_{S| j\in S} L_{i,j}\beta_{i,S} < w_j \quad && \forall j\in J\\ &\alpha_{i,j} \geq 0 && \forall j\in J, i \in M\\ &\beta_{i,S} \geq 0 && \forall i \in M, \forall S\subseteq J \end{align}\]

Primal Dual Algorithm Sketch

  1. Consider the most loaded port \(i\) and the coflow \(j\) with largest release time
  2. Test if \(r_j > \kappa \cdot L_i\):
    • If \(r_j > \kappa \cdot L_i\): put coflow \(j\) at last
    • If \(r_j \leq \kappa \cdot L_i\): raise the corresponding dual variable, put it at first if some constraint gets tight
  3. Loop through step 1, 2, until all coflows got a position
  4. Use the ordering above to move edges backward, and schedule each coflow

Open questions

  • Consider flow time instead of completion time, possibly using augmented resources.
  • Since coflow scheduling generalizes concurrent open shop, it is NP-hard to approximate it within a factor better than \((2-ε)\)*. We have an approximation factor of 4.
*: S. Sachdeva and R. Saket. Optimal inapproximability for scheduling problems via structural hardness for hypergraph vertex cover, Computational Complexity (CCC), 2013 IEEE Conference on. IEEE.

Q & A

Powered by