Abstract
We study the All-to-All Comparison problem on large data sets using a
distributed computational resource. The problem has applications in biological
modeling and image analysis. The key to high performance is to distribute the
data and schedule the computations so that each computation runs on a machine
that already holds the data it needs. Unlike the highly successful Map-Reduce
framework (Hadoop), which partitions the data set, we must distribute the data
so that every pair of objects appears together on at least one machine.
This could easily be achieved by placing every object on every machine, but the
data is large, so we wish to minimize the amount of data replication.
We prove that this problem can be solved optimally using Finite Projective
Planes and Affine Planes.
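
To make the pair-coverage requirement concrete, the following is a minimal sketch of one way a finite projective plane can drive the placement. It is the standard textbook construction of the plane of prime order q, not necessarily the construction used in the paper, and it rests on assumptions made only for illustration: q is prime, each point of the plane is a data object, and each line is a machine. The function names `projective_plane` and `pair_counts` are hypothetical.

```python
from itertools import combinations


def projective_plane(q):
    """Build the projective plane of prime order q.

    Returns (n, lines): n is the number of points (q*q + q + 1) and lines
    is a list of frozensets of point indices, one frozenset per line.
    Assumption: q is prime, so arithmetic mod q forms a field.
    """
    # Normalized homogeneous coordinates: the first nonzero entry is 1.
    points = [(1, a, b) for a in range(q) for b in range(q)]
    points += [(0, 1, a) for a in range(q)]
    points += [(0, 0, 1)]

    lines = []
    # By duality, lines are indexed by the same normalized triples.
    # A point lies on a line when their dot product is 0 mod q.
    for line in points:
        lines.append(frozenset(
            i for i, p in enumerate(points)
            if sum(x * y for x, y in zip(line, p)) % q == 0
        ))
    return len(points), lines


def pair_counts(n, blocks):
    """For each pair of objects, count the machines that hold both."""
    return {
        (a, b): sum(1 for blk in blocks if a in blk and b in blk)
        for a, b in combinations(range(n), 2)
    }


# Illustrative assumption: objects are points, machines are lines.
n, machines = projective_plane(3)
counts = pair_counts(n, machines)
```

With q = 3 this places 13 objects on 13 machines, each machine holding 4 objects, and every pair of objects shares exactly one machine, so each comparison can be scheduled on the machine that already stores both inputs.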