Umetrics Suite Blog

What is DOE? Design of Experiments Basics for Beginners

August 9, 2018

[This blog was a favorite last year, so we thought you'd like to see it again. Send us your comments!]

Whether you work in engineering, R&D, or a science lab, understanding the basics of experimental design can help you achieve more statistically optimal results from your experiments or improve your output quality.

Using Design of Experiments (DOE) techniques, you can determine the individual and interactive effects of various factors that can influence the output results of your measurements. You can also use DOE to gain knowledge and estimate the best operating conditions of a system, process or product.

DOE applies to many different investigation objectives, but can be especially important early on in a screening investigation to help you determine what the most important factors are. Then, it may help you optimize and better understand how the most important factors that you can regulate influence the responses or critical quality attributes.

Another important application area for DOE is in making production more effective by identifying factors that can reduce material and energy consumption or minimize costs and waiting time. It is also valuable for robustness testing to ensure quality before releasing a product or system to the market.

What’s the alternative?

In order to understand why Design of Experiments is so valuable, it may be helpful to take a look at what DOE helps you achieve. A good way to illustrate this is by looking at an alternative approach, one that we call the “COST” approach. The COST (Change One Separate factor at a Time) approach might be considered an intuitive or even logical way to approach your experimentation options (until, that is, you have been exposed to the ideas and thinking of DOE).

Let’s consider the example of a small chemical reaction where the goal is to find optimal conditions for yield. In this example, we can vary only two elements, or factors:

  1. the volume of the reaction container (between 500 and 700 ml), and
  2. the pH of the solution (between 2.5 and 5).

We change the experimental factors and measure the response, which in this case is the yield of the desired product. Using the COST approach, we vary just one of the factors at a time to see what effect it has on the yield.

So, for example, first we might fix the pH at 3 and change the volume of the reaction container from a low setting of 500 ml to a high of 700 ml, measuring the yield at each setting.

Below is an example of a table that shows the yield that was obtained when changing the volume from 500 to 700 ml. In the scatterplot on the right, we have plotted the measured yield against the change in reaction volume, and it doesn’t take long to see that the best volume is located at 550 ml.

Volume

Next, we evaluate what happens when we fix the volume at 550 ml (the apparently optimal level) and start to change the second factor. In this second experimental series, the pH is changed from 2.5 to 5.0, and the measured yields are listed in the table and plotted below. From this we can see that the optimal pH is around 4.5.

pH

The optimal combination for the best yield would seem to be a volume of 550 ml and pH 4.5. Sounds good, right? But let’s consider this a bit more.
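
To make the two COST series concrete, here is a minimal sketch in Python. The yield function is invented purely for illustration (it is not the data behind the plots in this post); we only assume a smooth surface in which volume and pH interact. The full grid at the end is not a recommended design. It serves only as a reference to show what the COST search missed.

```python
# Hypothetical yield surface; NOT the data from the post.
def hypothetical_yield(volume_ml, ph):
    v = (volume_ml - 600) / 100   # 500..700 ml -> -1..+1 (coded units)
    p = (ph - 3.75) / 1.25        # pH 2.5..5.0 -> -1..+1 (coded units)
    return 70 - 8*v + 22*p - 20*v*v - 10*p*p + 20*v*p

volumes = [500, 550, 600, 650, 700]
ph_levels = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

# COST series 1: hold pH at 3 and vary only the volume.
best_volume = max(volumes, key=lambda vol: hypothetical_yield(vol, 3.0))
# COST series 2: hold the volume at its apparent best and vary only the pH.
best_ph = max(ph_levels, key=lambda ph: hypothetical_yield(best_volume, ph))
cost_yield = hypothetical_yield(best_volume, best_ph)

# Reference only: evaluate every combination of the same levels.
grid_volume, grid_ph = max(
    ((vol, ph) for vol in volumes for ph in ph_levels),
    key=lambda point: hypothetical_yield(*point),
)
grid_yield = hypothetical_yield(grid_volume, grid_ph)

print(f"COST: volume={best_volume} ml, pH={best_ph}, yield={cost_yield:.1f}")
print(f"Grid: volume={grid_volume} ml, pH={grid_ph}, yield={grid_yield:.1f}")
```

On this made-up surface the COST search settles on 550 ml and pH 4.5, while a better combination exists elsewhere in the region. Because the two factors interact, the best volume at pH 3 is not the best volume at other pH values.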

Gaining a better perspective with DOE

What happens when we take more of a bird’s eye perspective, and look at the overall experimental map by number and order of experiments?

For example, in the first experimental series (indicated on the horizontal axis below), we moved the experimental settings from left to right, and we found out that 550 ml was the optimal volume.

Then in the second experimental series, we moved from bottom to top (as shown in the scatterplot below) and after a while we found out that the best yield was at experiment number 10 (pH 4.5).

Colored by yield

The problem here is that we are not really certain whether experimental point number 10 is truly the best one. The risk is that we have perceived it as the optimum without that really being the case. Another thing we may question is the number of experiments we used. Did we use the optimal number of experimental runs?

Zooming out and picturing what we have done on a map, we can see that we have only been exploring a very small part of the entire experimental space. The true relationship between the yield and the two factors, pH and volume, is represented by the contour plot pictured below. We can see that the optimal value would be somewhere at the top in the larger red area.

Response contour plot

So the problem with the COST approach is that we can reach very different conclusions if we choose other starting points. We perceive that the optimum was found, but the other, and perhaps more problematic, thing is that we didn’t realize that continuing with additional experiments would produce even higher yields.

How to design better experiments

Instead, using the DOE approach, we can build a map in a much better way. First, consider the use of just two factors, which would mean that we have a limited range of experiments. As the contour plot below shows, we would have at least four experiments (defining the corners of a rectangle).

Response contour plot

These four points can be optimally supplemented by a couple of points representing the variation in the interior part of the experimental design.
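
As a rough sketch, the corner runs plus center points for our two-factor example could be listed like this in Python. The factor ranges come from the example above; the function name and the choice of three center points are our own assumptions for illustration.

```python
from itertools import product

# Factor ranges taken from the example in the text.
factor_ranges = {"volume_ml": (500, 700), "ph": (2.5, 5.0)}

def full_factorial_with_centers(ranges, n_center=3):
    """Return the corner runs of a two-level design plus n_center center runs.

    n_center=3 is an illustrative assumption, not a recommendation.
    """
    names = list(ranges)
    # Every low/high combination: the corners of the rectangle.
    corners = [dict(zip(names, levels)) for levels in product(*ranges.values())]
    # The midpoint of each factor range: the interior point.
    center = {name: (low + high) / 2 for name, (low, high) in ranges.items()}
    return corners + [dict(center) for _ in range(n_center)]

design = full_factorial_with_centers(factor_ranges)
for run_number, run in enumerate(design, start=1):
    print(run_number, run)
```

Repeating the center point gives an estimate of run-to-run variability. In practice the runs would normally be performed in random order rather than in the order listed.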

The important thing here is that when we start to evaluate the result, we will obtain very valuable information about the direction in which to move for improving the result. We will understand that we should reposition the experimental plan according to the dashed arrow.
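
To see where that direction comes from, here is a small sketch that estimates the model coefficients from four corner runs. The yields are made-up numbers for illustration only, and the factors are in coded units (-1 for the low setting, +1 for the high setting).

```python
# Four corner runs in coded units (-1 = low, +1 = high) with made-up yields.
runs = [
    # (volume, pH, yield)
    (-1, -1, 40.0),
    (+1, -1, 44.0),
    (-1, +1, 60.0),
    (+1, +1, 72.0),
]

n = len(runs)
# In a balanced two-level design the columns are orthogonal, so each
# least-squares coefficient is simply the average of (column * response).
b0 = sum(y for _, _, y in runs) / n
b_volume = sum(v * y for v, _, y in runs) / n
b_ph = sum(p * y for _, p, y in runs) / n
b_interaction = sum(v * p * y for v, p, y in runs) / n

# Direction of steepest ascent for the main-effects part of the model.
direction = (b_volume, b_ph)
print(f"yield ~ {b0} + {b_volume}*volume + {b_ph}*pH + {b_interaction}*volume*pH")
print("move in coded direction:", direction)
```

With these illustrative numbers both coefficients are positive and the pH coefficient is the larger one, so the next set of experiments would be shifted toward higher volume and, more strongly, toward higher pH.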

However, DOE is NOT limited to looking at just two factors. It can be applied to three, four or many more factors.

If we take the approach of using three factors, the experimental protocol will start to define a cube rather than a rectangle. So the factorial points will be the corners of the cube.

Representative experiments

In this way, DOE allows you to construct a carefully prepared set of representative experiments, in which all relevant factors are varied simultaneously.
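
The same idea extends to any number of factors. Below is a minimal sketch that lists the corner runs in coded units; the factor names A, B and C are placeholders, since the text does not name a third factor.

```python
from itertools import product

def two_level_corners(factor_names):
    """All low/high combinations in coded units (-1, +1): 2**k runs for k factors."""
    return [
        dict(zip(factor_names, levels))
        for levels in product((-1, +1), repeat=len(factor_names))
    ]

cube = two_level_corners(["A", "B", "C"])  # placeholder factor names
print(len(cube))  # 8 corners of the cube
for run in cube:
    print(run)
```

Each factor appears at its low and high setting equally often, which is what lets all factors be varied at the same time while their effects are still estimated separately.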

DOE is about creating a coherent group of experiments that work together to map an interesting experimental region. So with DOE we can prepare a set of experiments that are optimally placed to bring back as much information as possible about how the factors influence the responses.

Plus, we will have support for different types of regression models. For example, we can estimate what we call a linear model, or an interaction model, or a quadratic model. So the selected experimental plan will support a specific type of model.
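
As an illustration of what the three model types contain, the sketch below expands one run's coded factor settings into the terms of each model. The function name and the model labels are our own choices for this example.

```python
from itertools import combinations

def model_terms(x, kind):
    """Expand one run's coded factor settings x into regression terms.

    kind is "linear", "interaction" or "quadratic"; each kind includes
    all terms of the previous one.
    """
    if kind not in ("linear", "interaction", "quadratic"):
        raise ValueError(f"unknown model kind: {kind!r}")
    terms = [1.0] + list(x)                               # constant + main effects
    if kind in ("interaction", "quadratic"):
        terms += [a * b for a, b in combinations(x, 2)]   # two-factor interactions
    if kind == "quadratic":
        terms += [a * a for a in x]                       # squared terms
    return terms

for kind in ("linear", "interaction", "quadratic"):
    n_terms = len(model_terms((+1, -1), kind))
    print(f"{kind}: {n_terms} coefficients for two factors")
```

A design needs at least as many distinct runs as the model has coefficients. Note also that at the corners of a two-level design every squared term equals 1, so corner runs alone cannot estimate the quadratic terms; a quadratic model needs runs at additional levels.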

Why is DOE a better approach?

We can see three main reasons that DOE is a better approach to experiment design than the COST approach.

  1. DOE suggests the correct number of runs needed (often fewer than used by the COST approach)

  2. DOE provides a model for the direction to follow

  3. Many factors can be used (not just two)

In summary, the benefits of DOE are:

  • An organized approach that connects experiments in a rational manner
  • The influence of and interactions between all factors can be estimated
  • More precise information is acquired in fewer experiments
  • Results are evaluated in the light of variability
  • Support for decision-making: map of the system (response contour plot)

 


Topics: Design of Experiments (DOE), Process Validation

Written by Lennart Eriksson

Sr Lecturer and Principal Data Scientist at Sartorius Stedim Data Analytics