I will present a paper by Papadimitriou and Yannakakis on this topic.
In general the design of a parallel algorithm proceeds
as follows. First choose an algorithm (a DAG representing elementary
computations and their interdependence), then choose a particular
multiprocessor architecture and finally find a schedule whereby the
algorithm is executed on the computer such that the makespan (the
elapsed time for computing the last result) is minimized. The problem with
this approach is that no general techniques exist that are applicable to
any DAG for any parallel architecture.
In this paper an attempt is made towards this generalization.
An approximation algorithm is presented for the following
NP complete scheduling problem. Given a DAG of tasks, and the inter
processor communication delays, find a schedule (on possibly
unbounded # of processors), that minimizes the makespan.