Umetrics Suite Blog

Understanding the basics of batch process data analytics

October 16, 2017

Analyzing batch process data is a lot like juggling. You have multiple sets of data from different sources, and to turn them into a meaningful presentation you need a method of handling them that ensures they are all in the right place at the right time.


Deciding which type of data analytics method to use is the first step. For analyzing a process such as one used in manufacturing a biopharma product or even for making beer, both the type of data you have and your objectives will affect your choice of data analytics methods. There are three relevant methods to consider in three different situations.

1. Monitoring a process (diagnostics)

When you have just one set of input data, the data analytics method used is principal component analysis (PCA). This is the simplest situation, as you are looking at a single category of data, for example, process inputs. PCA applies to the analysis of continuous process data and batch process data, and it is the latter situation we target in this article.

Typically, the objective of this type of process analytics is to create a model that helps distinguish between good and bad process conditions or, more specifically, to monitor a process over time and recognize when a deviation from the normal situation occurs. You can use multivariate process modeling to extract and interpret what are called “assignable causes” for deviations. The goal is to understand the reason for any deviation in order to avoid such issues in the future.
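As a minimal sketch of this idea, the Python snippet below fits a PCA model to synthetic "good" data with NumPy and then checks whether a new observation deviates from it. The data, the variable names, the two-component choice, and the empirical 99th-percentile limits are all illustrative assumptions, and the residual distance is only a simplified stand-in for DModX, not the SIMCA calculation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "good" data: 200 observations of 4 correlated variables
# driven by 2 latent factors (illustrative values, not real process data).
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.8, 0.0, 0.3],
                   [0.0, 0.5, 1.0, 0.7]])
X = latent @ mixing + 0.05 * rng.normal(size=(200, 4))

# Center and scale each variable to unit variance.
mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)
Z = (X - mean) / std

# PCA via singular value decomposition; keep two components.
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
P = Vt[:2].T                        # loadings, shape (variables, components)
score_var = (Z @ P).var(axis=0, ddof=1)


def hotelling_t2(x):
    """Distance from the model center inside the PCA plane."""
    t = ((x - mean) / std) @ P
    return float(np.sum(t**2 / score_var))


def residual_distance(x):
    """Distance from the PCA plane (a simplified stand-in for DModX)."""
    z = (x - mean) / std
    return float(np.linalg.norm(z - (z @ P) @ P.T))


# Empirical 99th-percentile limits from the training data (an assumption;
# distribution-based limits are a common alternative).
t2_limit = np.percentile([hotelling_t2(x) for x in X], 99)
res_limit = np.percentile([residual_distance(x) for x in X], 99)

# A new observation that breaks the usual correlation pattern.
x_new = mean + std * np.array([3.0, -3.0, 3.0, -3.0])
is_deviating = (hotelling_t2(x_new) > t2_limit
                or residual_distance(x_new) > res_limit)
print("deviation detected:", is_deviating)
```

The two statistics answer different questions: the first asks whether the observation is extreme within the normal correlation pattern, the second whether it breaks that pattern altogether.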

2. Modeling a process output (monitor quality)

When you have two sets of data available—both process inputs and process outputs—you can move up to a slightly more ambitious level of process modeling. For example, you may be able to determine if any data residing in the block of process inputs, such as a temperature, a pressure, or a pH parameter, can inform or predict process outputs, such as yield or quality. When you have two blocks of data, you are working with a regression problem and use extensions of the principal component analysis method. These include:

  • Partial Least Squares (PLS) – this method was the standard until about 10 years ago.
  • Orthogonal PLS (OPLS) – within the last 4-5 years, OPLS has become the preferred regression tool because of the clarity and transparency of its model interpretation.

PLS and OPLS apply to the analysis of both continuous and batch processes and it is the latter situation we discuss in this article.
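To make the regression setting concrete, here is a hedged sketch of single-response PLS using the textbook NIPALS algorithm in NumPy. The function names `pls1_fit` and `pls1_predict`, the synthetic temperature, pressure, and pH inputs, and the choice of two components are assumptions made for illustration. OPLS is not shown, and this is not how any particular software package implements PLS.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        c = yk @ t / tt                 # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - c * t                 # deflate y
        weights.append(w)
        loadings.append(p)
        y_loadings.append(c)
    W, P = np.array(weights).T, np.array(loadings).T
    coef = W @ np.linalg.solve(P.T @ W, np.array(y_loadings))
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    """Predict the response for new rows of X."""
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


# Synthetic process inputs (illustrative only).
rng = np.random.default_rng(42)
n = 100
temperature = rng.normal(size=n)
pressure = 0.6 * temperature + 0.8 * rng.normal(size=n)  # correlated input
ph = rng.normal(size=n)
X = np.column_stack([temperature, pressure, ph])

# Synthetic process output: yield depends on all three inputs plus noise.
yield_ = 2.0 * temperature - 1.0 * pressure + 0.5 * ph + 0.1 * rng.normal(size=n)

model = pls1_fit(X, yield_, n_components=2)
pred = pls1_predict(model, X)
r2 = 1 - np.sum((yield_ - pred) ** 2) / np.sum((yield_ - yield_.mean()) ** 2)
print(f"R2 with 2 components: {r2:.3f}")
```

The reported R2 is an in-sample fit. In practice the number of components would be chosen by cross-validation rather than fixed in advance.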

3. Batch process modeling (modeling completed batches)

Modeling and working with batch processes implies a more complicated data architecture. This involves a two-way data table that represents the initial configuration of each batch as well as its output data in terms of quality, yield, amount, or another value measured after batch completion. Additionally, you are faced with a three-way data table containing the process evolution measurements. Being able to synchronize, analyze, interpret, and visualize the two-way and three-way data structures is a key step in batch data analytics.


Analysis of batch data requires a combination of PCA and PLS or OPLS

The key characteristic of batch data is that it has a clearly defined beginning and end. As the process evolution measurements are added in, you have a three-way array of data consisting of the dimensions of batches, times and variables. To work with batch data, a combination of principal component analysis and regression extensions (PLS and OPLS) is used in order to get a good account of all the data that are available or can be available during a batch process production.
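One common way to make a three-way array usable by two-way methods such as PCA and PLS is to unfold it into a table. The NumPy sketch below shows two unfoldings of a hypothetical array. The dimensions and the filler values are made up, and the sketch assumes every batch has the same number of time points, which real batches usually only satisfy after alignment.

```python
import numpy as np

n_batches, n_times, n_vars = 5, 10, 3

# Hypothetical three-way array: batches x time points x variables.
data = np.arange(n_batches * n_times * n_vars, dtype=float).reshape(
    n_batches, n_times, n_vars)

# Observation-wise unfolding: one row per time point of each batch.
obs_wise = data.reshape(n_batches * n_times, n_vars)

# Batch-wise unfolding: one row per batch, all time points side by side.
batch_wise = data.reshape(n_batches, n_times * n_vars)

print(obs_wise.shape)    # (50, 3)
print(batch_wise.shape)  # (5, 30)
```

The observation-wise table suits models of individual time points, while the batch-wise table suits models that treat each completed batch as a single row.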

Introducing Statistical Process Control (SPC)

Regardless of whether you are working with a continuous process or a batch process, the basic tool underpinning the analysis is called statistical process control (SPC). The root ideas of the SPC philosophy go back to the 1920s and 1930s, when Walter Shewhart developed them. His view was that a process is in a state of “statistical control” unless an event or disturbance is occurring. He devised the Shewhart control chart as a tool to discover when a process is deviating from what is expected. It is a way to visualize and monitor a parameter over time or throughout the progression of production.


The Shewhart control chart has a target value and upper and lower warning limits.

As you monitor a parameter using a Shewhart control chart, you have a target value and associated upper and lower warning and action limits. Using the chart, you can look for deviations from normal process behavior. When a deviation from normality is detected, complementary diagnostic tools can be used to look for “assignable causes” for an event, which means interpreting the deviation and trying to figure out what a corrective action would be. The goal is to make process performance more robust and, over the long term, to improve the process.
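The logic of such a chart can be sketched in a few lines of Python. The example assumes the common convention of warning limits at two standard deviations and action limits at three standard deviations from the target. The reference readings and the `classify` helper are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical in-control reference measurements of one parameter (e.g. pH).
reference = [7.02, 6.98, 7.01, 6.99, 7.00, 7.03, 6.97, 7.01, 7.00, 6.99]

target = mean(reference)
sigma = stdev(reference)

# Assumed convention: warning at 2 sigma, action at 3 sigma.
warning_limits = (target - 2 * sigma, target + 2 * sigma)
action_limits = (target - 3 * sigma, target + 3 * sigma)


def classify(value):
    """Return 'ok', 'warning', or 'action' for a new measurement."""
    deviation = abs(value - target)
    if deviation > 3 * sigma:
        return "action"
    if deviation > 2 * sigma:
        return "warning"
    return "ok"


for reading in (7.01, 7.045, 7.10):
    print(reading, classify(reading))
```

In this sketch the target and limits are estimated from reference data. A target can equally be set from a process specification.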

The same philosophy is the basis for Batch Multivariate Statistical Process Control (BSPC).

Two types of models: Batch evolution model and batch level model

For batch process analytics, two perspectives need to be merged: data over time and data that shows quality or yield parameters of finished batches. Two types of BSPC models are needed to fully account for batch process data. These two models, called the Batch Evolution Model (BEM) and the Batch Level Model (BLM), are designed to monitor the evolution of batch processes over time (BEM, left below) and to account for all data of completed batches (BLM, right below).

Via the BEM, a process “path” for normal evolution is constructed and visualized in a control chart similar to the Shewhart control chart described above. The difference is that the BSPC chart may display many lines, not just one, because there may be many finished batches. The ideal path, represented by a green line, shows the desirable process trajectory for future batches. A reliable monitoring model should be able to detect when the process does not evolve in the normal way. Any deviations are considered abnormal process events to be analyzed and acted upon.
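The idea of a normal path with control limits can be illustrated with a simplified, single-variable sketch in NumPy. A real batch evolution model charts multivariate scores, so this is only an analogy. The trajectory shape, the noise level, the three-standard-deviation limits, and the simulated upset are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ideal trajectory of one monitored quantity over 50 time points.
time = np.linspace(0.0, 1.0, 50)
ideal = 1.0 - np.exp(-4.0 * time)

# Twenty accepted "good" batches scattered around the ideal trajectory.
good_batches = ideal + 0.02 * rng.normal(size=(20, 50))

# Average path and control limits at each time point (assumed +/- 3 SD).
average = good_batches.mean(axis=0)
spread = good_batches.std(axis=0, ddof=1)
upper = average + 3 * spread
lower = average - 3 * spread


def out_of_control(batch):
    """Return the time indices where a batch leaves the control limits."""
    return np.flatnonzero((batch > upper) | (batch < lower))


# A new batch that follows the ideal path until an upset at time index 30.
new_batch = ideal.copy()
new_batch[30:] += 0.5

alarms = out_of_control(new_batch)
print("first alarm at time index:", alarms[0])
```

Plotting `average`, `upper`, `lower`, and the new batch against `time` would give a chart of the kind described above, with the average playing the role of the ideal path.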


(Batch process analytics requires two perspectives to be merged: data over time (left) and data for finished batches that also account for quality or yield parameters)


The main idea of the batch data analytics methodology is to use all available data for batch process modeling and monitoring. The BEM involves (i) working with individual observations (time points), (ii) monitoring the evolution of new batches, and (iii) classifying the current batch phase. The BLM involves (i) working with whole batches, and (ii) predicting the outcome of evolving batches (good or bad). Both PCA and PLS may be used at both levels.

For process monitoring, the models that are developed arise from a set of accepted batches representing good batch behavior and quality. These models provide a powerful means for overviewing and monitoring batches as well as making on-line predictions. Although many variables are dealt with in batch monitoring, the simplicity of presentation and interpretation of conventional SPC charts is retained in the new BSPC strategy. Deviating batch behavior can be diagnosed when charting multivariate parameters, like scores and DModX, and an interpretation of discovered process upsets is possible through, for example, contribution plots. Dominating variables are found in a plot of batch variable importance.

Batch models that are developed offline can be executed online using SIMCA-online. As long as the new batch stays inside the control limits of the control charts, the process operator knows that the batch is developing as it should. If a control limit is crossed, however, the process operator receives an alarm drawing attention to the problematic batch. By means of contribution plots and loading plots, the operator can then drill down to the variable or variables causing the process deviation. The usefulness of SIMCA-online for batch process monitoring will be discussed in an upcoming blog post, which will also demonstrate contribution plotting and drill-down capabilities to raw data.

Want to know more?

To learn more and see examples of how Batch Process Analysis can be applied using SIMCA, download the presentation and watch the recorded webinar now.



Topics: Multivariate Data Analysis, Data Analytics, SIMCA, Manufacturing Processes


Written by Lennart Eriksson

Sr Lecturer and Principal Data Scientist at Sartorius Stedim Data Analytics