Volunteer Computing (VC) projects use resources that are connected to the Internet and owned by the public. VC projects are throughput-based: performance is measured as the number of valid results delivered to the scientists within a given interval of time. The path that leads from the generation of work-units (WUs) to the validation of their results is error-prone and is characterized by delays hidden in the different phases of a WU's lifetime: from the generation of a WU to the generation of its replicas; from the generation of replicas to their distribution to workers (volunteers' hosts); from the distribution to workers to the collection of the associated results; and finally from the collection of results to their validation (a comparison of two or more replica results, which need to be in agreement).
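The lifetime phases and the throughput metric described above can be made concrete with a small sketch. The record layout below is an assumption made purely for illustration: the field names (`wu_created`, `replicas_created`, `distributed`, `collected`, `validated`, `valid`) are hypothetical and do not correspond to the schema of any BOINC trace. Timestamps are taken to be in seconds, and each WU is reduced to a single timestamp per phase.

```python
from dataclasses import dataclass


@dataclass
class WorkUnitTrace:
    """Hypothetical per-WU timestamps in seconds (illustrative field names)."""
    wu_created: float        # WU generated
    replicas_created: float  # replicas generated
    distributed: float       # replicas sent to workers
    collected: float         # results returned by workers
    validated: float         # replica results compared
    valid: bool              # True if the replica results agreed


def phase_delays(t: WorkUnitTrace) -> dict:
    """Return the delay spent in each of the four lifetime phases."""
    return {
        "wu_to_replicas": t.replicas_created - t.wu_created,
        "replicas_to_distribution": t.distributed - t.replicas_created,
        "distribution_to_collection": t.collected - t.distributed,
        "collection_to_validation": t.validated - t.collected,
    }


def throughput(traces, start: float, end: float) -> float:
    """Valid results whose validation falls in [start, end), per second."""
    n_valid = sum(1 for t in traces if t.valid and start <= t.validated < end)
    return n_valid / (end - start)
```

Summing each phase delay over many WUs would indicate, under these simplifying assumptions, which phase dominates a WU's lifetime in a given project.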
BOINC projects [1] are VC projects that differ widely from one another. They may execute longer or shorter applications with different degrees of sensitivity to errors. They may require different levels of homogeneous redundancy [2], in which replicas of a WU are assigned to workers within a computationally equivalent class (i.e., with the same operating system, or with both the same operating system and processor architecture), or they may not require any homogeneous redundancy at all. They may also be supported by worker communities with different levels of availability and reliability. This heterogeneity across BOINC projects makes it very challenging to find performance bottlenecks, as well as common scheduling strategies across projects that are able to tackle these bottlenecks. Questions that arise when addressing poor throughput include: Is a given scheduling approach effective at reducing the time from when the WU is generated to when its replicas are distributed, or from the distribution of WU replicas to their collection? Does a reduction in the lifetime of successfully completed replicas cause an increase in errors or invalid results? Will a given approach for reducing the time interval between generation and distribution of replicas be effective for all projects? Is there a correlation between a worker's features and its throughput?
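As an illustration of the homogeneous redundancy levels mentioned above, the following sketch groups workers into equivalence classes. It is not BOINC's implementation: the level names (`"none"`, `"os"`, `"os_cpu"`) and the host fields (`id`, `os`, `cpu`) are hypothetical and chosen only to mirror the three cases in the text.

```python
from collections import defaultdict


def equivalence_class(host: dict, level: str) -> tuple:
    """Map a host to its class key for a given (hypothetical) redundancy level."""
    if level == "none":
        return ()                          # every host is interchangeable
    if level == "os":
        return (host["os"],)               # same operating system
    if level == "os_cpu":
        return (host["os"], host["cpu"])   # same OS and processor architecture
    raise ValueError(f"unknown level: {level}")


def group_hosts(hosts, level: str) -> dict:
    """Group host ids by class; replicas of one WU would stay within one group."""
    groups = defaultdict(list)
    for host in hosts:
        groups[equivalence_class(host, level)].append(host["id"])
    return dict(groups)
```

A stricter level yields more and smaller groups, so fewer workers are eligible for the remaining replicas of a WU once its first replica has been assigned. This is one way in which the choice of redundancy level could affect the time between the generation and the distribution of replicas.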
The goal of the presented work is to find common threads across different BOINC projects in order to identify performance bottlenecks, and then to convert those findings into scheduling policies that improve the throughput of the projects. To understand whether there are common threads and to identify effective policies, we are analyzing traces from four BOINC projects: Predictor@Home [3] and three projects from the IBM World Community Grid (WCG) initiative (FightAIDS@Home, Proteome Folding, and Genome Comparison). The analysis is supported by statistical methods and data-mining techniques.