ProtoLife - PDT Overview

PDT Overview

You are here because you want to optimize a complex experiment. You have some way of judging the quality of your experiments by measuring an experimental response. Your experiment is complex because it has several experimental parameters (e.g., between 4 and 20) that interact with each other in unknown, unpredictable ways. You want to optimize these parameters, i.e., find values of the experimental parameters that lead to desired experimental responses. You will have some way of executing experiments in an automated or semi-automated way (e.g., using a high-throughput experimental platform).

We will consider your experiments as being grouped into generations. PDT is a tool for designing experiments for each generation, based on sophisticated machine-learning modeling of data from all past generations, with the aim of optimizing experimental responses.

Your experiments are expensive, so each generation typically contains no more than tens, hundreds, or a few thousand experiments. Predicting the next generation of experiments is therefore a small-data problem. Building predictive models for small-data problems with machine-learning techniques requires special methods (in contrast to the methods used for big data), which PDT makes conveniently available through its web interface.

Using PDT involves three basic phases, two for initialization and one that is repeated with every generation:

1. Experimental space definition

You start by choosing the type of experimental space in which your experiments are defined. Currently there are two options:

Factorial, where you specify the experimental space as a list of experimental parameters (e.g., concentrations or temperatures) and the discrete values each parameter may take when you execute an experiment. The experimental space consists of all possible combinations of values, one for each parameter.

Mixture, where you specify the experimental space as a list of experimental parameters, and a range of integer values that each parameter may take. These values correspond to the number of units of that parameter in an experiment (for example, the number of drops of solution in an experiment set up by a liquid handling robot). The number of units, summed over all parameters, is equal to a constant u_sum for all experiments. That implies that if an experiment contains u units of a certain parameter, the ratio u/u_sum represents the proportion of that parameter in the mixture.
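The two space types can be illustrated with a minimal Python sketch. It is not PDT's implementation or input format: the function names, parameter names, levels, and ranges below are hypothetical and chosen only for illustration. The factorial space is enumerated as all combinations of levels, and the mixture space as all integer unit assignments within the given ranges that sum to u_sum.

```python
from itertools import product


def factorial_space(levels):
    """Return every combination of discrete values, one value per parameter.

    levels: dict mapping parameter name -> list of allowed values.
    """
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


def mixture_space(ranges, u_sum):
    """Return every integer unit assignment within `ranges` that sums to u_sum.

    ranges: dict mapping parameter name -> (min_units, max_units), inclusive.
    """
    names = list(ranges)
    results = []

    def extend(index, remaining, partial):
        if index == len(names):
            # Keep the assignment only if all units have been used up.
            if remaining == 0:
                results.append(dict(zip(names, partial)))
            return
        low, high = ranges[names[index]]
        for units in range(low, min(high, remaining) + 1):
            extend(index + 1, remaining - units, partial + [units])

    extend(0, u_sum, [])
    return results


# Hypothetical factorial space: 3 * 2 * 3 = 18 experiments.
factorial = factorial_space({
    "concentration_mM": [1, 5, 10],
    "temperature_C": [25, 37],
    "pH": [6.5, 7.0, 7.5],
})
print(len(factorial))

# Hypothetical mixture space: drops of three solutions, 4 drops in total.
u_sum = 4
mixture = mixture_space({"A": (0, 4), "B": (0, 4), "C": (1, 3)}, u_sum)
for experiment in mixture:
    # u / u_sum is the proportion of each parameter in the mixture.
    proportions = {name: units / u_sum for name, units in experiment.items()}
    print(experiment, proportions)
```

Note how quickly the factorial space grows with the number of parameters and levels, while the mixture constraint removes every assignment whose units do not add up to u_sum.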

After choosing the type of experimental space, and specifying experimental parameters and values, you then specify the generation parameters, N_p = population size and N_r = number of replicates. These two numbers determine how many experiments there will be in a generation: N_exp = N_p (N_r + 1).
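As a quick worked example of the formula above, the following sketch computes the generation size. The function name and the sample values are hypothetical. One plausible reading of the formula, which is an assumption here because the text does not spell it out, is that each of the N_p population members is run once and then replicated N_r times.

```python
def experiments_per_generation(n_p, n_r):
    """N_exp = N_p * (N_r + 1), as given in the text above."""
    return n_p * (n_r + 1)


# Hypothetical values: population size 30, 2 replicates.
print(experiments_per_generation(30, 2))  # 30 * (2 + 1) = 90
```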

2. Initial experiments

Before launching a PDT campaign, you may want to enter experimental results (experiments and response measurements) that you have gathered in preliminary studies, e.g., calibration runs, as initial experiments. To be usable by PDT, these experiments must lie in an expanded variant of the experimental space determined in the previous experimental space definition phase. This means that each experiment must have a specified value for each of the parameters in the experimental space definition (and, if the space is a mixture, respect the constraint on the total number of units), but it is not necessary that its parameter values correspond to particular values specified in the experimental space definition.
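The two requirements above can be sketched as a small check. This is only an illustration of the stated rules, not PDT's actual validation; the function name and the example data are hypothetical.

```python
def check_initial_experiment(experiment, parameter_names, u_sum=None):
    """Return a list of problems; an empty list means the experiment is usable.

    experiment: dict mapping parameter name -> value (units for a mixture).
    parameter_names: parameters from the experimental space definition.
    u_sum: total number of units for a mixture space, or None for factorial.
    """
    problems = []
    missing = [name for name in parameter_names if name not in experiment]
    if missing:
        problems.append("missing values for: " + ", ".join(missing))
    elif u_sum is not None:
        total = sum(experiment[name] for name in parameter_names)
        if total != u_sum:
            problems.append(f"units sum to {total}, expected {u_sum}")
    # Values are deliberately NOT checked against the levels listed in the
    # space definition: initial experiments may use other values.
    return problems


names = ["A", "B", "C"]
print(check_initial_experiment({"A": 1, "B": 1, "C": 2}, names, u_sum=4))
print(check_initial_experiment({"A": 1, "B": 1}, names, u_sum=4))
print(check_initial_experiment({"A": 3, "B": 3, "C": 2}, names, u_sum=4))
```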

3. Experiment design and response measurements

PDT iteratively chooses designs, called generations, of N_exp experiments, each selected from your experimental space. The first generation is typically exploratory, aimed at "covering" the experimental space while taking into account any initial experiments you may have entered in the previous phase. At the end of every generation, PDT will: (a) use all available experimental results to build a model that predicts which of the untried experiments are most likely to have a good experimental response; (b) identify the regions of the experimental space that have not yet been covered by the tried experiments; (c) choose the design for the following generation based on (a) and (b).
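Steps (a) to (c) can be illustrated with a deliberately simple sketch. PDT's actual models and selection rule are not described here, so everything below is a stand-in: the "model" is a nearest-neighbor average, "coverage" is the distance to the nearest tried experiment, and the two are combined with a hypothetical weight. All function names and data are assumptions made for illustration.

```python
import math


def predict_response(point, results, k=3):
    """(a) Stand-in model: mean response of the k nearest tried experiments."""
    nearest = sorted(results, key=lambda r: math.dist(point, r[0]))[:k]
    return sum(response for _, response in nearest) / len(nearest)


def coverage_gap(point, tried):
    """(b) Stand-in coverage measure: distance to the nearest tried experiment."""
    return min(math.dist(point, t) for t in tried)


def design_next_generation(candidates, results, n_p, explore_weight=1.0):
    """(c) Pick the n_p untried experiments with the best combined score.

    candidates: all points of the experimental space, as tuples of numbers.
    results: list of (point, measured_response) pairs from all generations.
    """
    tried = [point for point, _ in results]
    tried_set = set(tried)
    untried = [c for c in candidates if c not in tried_set]

    def score(point):
        return (predict_response(point, results)
                + explore_weight * coverage_gap(point, tried))

    return sorted(untried, key=score, reverse=True)[:n_p]


# Hypothetical 5 x 5 factorial space and three measured experiments.
candidates = [(x, y) for x in range(5) for y in range(5)]
results = [((0, 0), 0.1), ((2, 2), 0.5), ((4, 4), 0.9)]
next_design = design_next_generation(candidates, results, n_p=4)
print(next_design)
```

The sketch returns only the distinct experiments of a generation; replicates and extra experiments would be added on top, and their measured responses appended to `results` before the next call.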

In addition to the experiments chosen by PDT for the current generation, you may optionally perform extra experiments, either to explore your own intuition or to further validate results from previous generations, e.g., for calibration. PDT treats results from extra experiments in the same way as results from PDT-designed experiments.