I will present a paper by Papadimitriou and Yannakakis on this topic. In general the design of a parallel algorithm proceeds as follows. First choose an algorithm (a DAG representing elementary computations and their interdependence), then choose a particular multiprocessor architecture and finally find a schedule whereby the algorithm is executed on the computer such that the makespan (the elapsed time for computing the last result) is minimized. The problem with this approach is that no general techniques exist that are applicable to any DAG for any parallel architecture. In this paper an attempt is made towards this generalization. An approximation algorithm is presented for the following NP complete scheduling problem. Given a DAG of tasks, and the inter processor communication delays, find a schedule (on possibly unbounded # of processors), that minimizes the makespan.