Umetrics Suite Blog

OPLS vs. PLS modeling to improve bioprocess yields of batch processes

March 6, 2019

In industries that depend on bioprocessing, achieving the highest possible yields in the shortest time frame, while keeping costs down and product quality high, is often challenging. Meeting these goals requires a well-designed, well-defined and well-controlled process. And at the core of any effective process control is a set of effective process modeling tools.

OPLS, a more recent and less well-known modeling method for multivariate regression, has distinct benefits over traditional PLS modeling for some applications, such as batch bioprocesses.

Orthogonal partial least squares (OPLS) is a modification of partial least squares (PLS), a traditional technique that is often used for statistical regression modeling. The PLS method of multivariate analysis dates back to the 1960s and was updated in the early 1980s. OPLS is a modified algorithm, introduced in the early 2000s, that sometimes offers enhanced model interpretation.

An extension of PCA

Simply put, PLS is an extension of principal component analysis (PCA), a data analysis method that summarizes the information content of large data tables by means of a smaller set of “summary indices” that can be more easily visualized and analyzed. In statistical terms, PCA finds the components that explain as much of the variance in the X data table as possible, whereas PLS finds components that maximize the covariance between X and the response Y.
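To make the idea of “summary indices” concrete, here is a minimal sketch of PCA computed with a singular value decomposition in NumPy. The function name `pca_summary` and the synthetic data are assumptions made for illustration; they are not part of SIMCA or any Umetrics product.

```python
import numpy as np

def pca_summary(X, n_components=2):
    """Return scores, loadings and explained-variance fractions for X."""
    Xc = X - X.mean(axis=0)                            # column-center the data table
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]    # the "summary indices"
    loadings = Vt[:n_components].T                     # how each variable contributes
    explained = (s ** 2) / np.sum(s ** 2)              # fraction of X variance per component
    return scores, loadings, explained[:n_components]

# Synthetic example (assumed data): 10 variables driven by 2 hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(50, 10))

scores, loadings, explained = pca_summary(X, n_components=2)
print("Fraction of X variance summarized by 2 components:", explained.sum())
```

Because the ten columns here are built from only two hidden factors, two components should capture nearly all of the variation, which is the sense in which PCA gives an overview of a large table.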

So then it’s important to keep in mind that:

  • principal component analysis (PCA) is best used for data summary/overview, whereas
  • partial least squares (PLS) and orthogonal PLS (OPLS) are for regression analysis

When you need to analyze and model a very wide set of data (meaning a data set with many more variables than observations), you’ll likely be looking at using a method such as PLS or OPLS. As a variant of PLS, OPLS separates the variability in the X data table into two parts: one that is predictive of Y and another that is uncorrelated with (orthogonal to) Y.
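The split into a predictive and an orthogonal part can be sketched in a few lines of NumPy. This is a simplified illustration of the single-Y OPLS filtering step with one orthogonal component, written from the published algorithm; it is not the SIMCA implementation, and the function name `opls_one_orthogonal` and the synthetic data are assumptions.

```python
import numpy as np

def opls_one_orthogonal(X, y):
    """Split centered X into a filtered part and one Y-orthogonal component."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    w /= np.linalg.norm(w)            # predictive weight: direction of covariance with y
    t = Xc @ w                        # score along the predictive direction
    p = Xc.T @ t / (t @ t)            # loading for that score
    w_o = p - (w @ p) * w             # part of the loading that is orthogonal to w
    w_o /= np.linalg.norm(w_o)
    t_o = Xc @ w_o                    # Y-orthogonal score
    p_o = Xc.T @ t_o / (t_o @ t_o)    # Y-orthogonal loading
    X_orth = np.outer(t_o, p_o)       # variation in X that is uncorrelated with y
    X_filtered = Xc - X_orth          # what is left for the predictive component
    return X_filtered, X_orth, t_o, yc

# Synthetic example (assumed data): one factor drives y, another only affects X.
rng = np.random.default_rng(1)
n = 40
signal = rng.normal(size=n)
nuisance = rng.normal(size=n)
X = (np.outer(signal, rng.normal(size=12))
     + np.outer(nuisance, rng.normal(size=12))
     + 0.1 * rng.normal(size=(n, 12)))
y = signal + 0.1 * rng.normal(size=n)

X_filtered, X_orth, t_o, yc = opls_one_orthogonal(X, y)
print("Covariance of orthogonal score with y:", t_o @ yc)  # zero up to rounding
```

The orthogonal score has no covariance with the response by construction, so whatever it describes (here, mostly the nuisance factor) can be inspected separately from the variation that predicts Y.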

Using OPLS vs. PLS

With just a single Y-variable, a PLS model and an OPLS model fitted to the same data have identical predictive power, as long as you compare models with the same total number of components. The great asset of OPLS over PLS is that its models are much simpler to interpret, because the explained variance is separated into predictive and orthogonal parts.
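The equal-predictions claim can be checked numerically. The sketch below, which is an illustration under stated assumptions rather than the SIMCA algorithm, fits a textbook NIPALS PLS model with three components and an OPLS model with one predictive plus two orthogonal components, then compares their regression vectors. The function names and the synthetic wide data set are hypothetical.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """NIPALS PLS for one centered y; returns the regression vector b."""
    Xk = X.copy()
    W, P, c = [], [], []
    for _ in range(n_components):
        w = Xk.T @ y
        w /= np.linalg.norm(w)
        t = Xk @ w
        p = Xk.T @ t / (t @ t)
        W.append(w)
        P.append(p)
        c.append(y @ t / (t @ t))
        Xk = Xk - np.outer(t, p)                 # deflate X before the next component
    W, P, c = np.array(W).T, np.array(P).T, np.array(c)
    return W @ np.linalg.solve(P.T @ W, c)       # b = W (P'W)^-1 c

def opls_coefficients(X, y, n_orthogonal):
    """Single-y OPLS: filter n_orthogonal components, then fit one predictive one."""
    k = X.shape[1]
    Xk = X.copy()
    M = np.eye(k)                                # accumulated filter, so that Xk = X @ M
    for _ in range(n_orthogonal):
        w = Xk.T @ y
        w /= np.linalg.norm(w)
        t = Xk @ w
        p = Xk.T @ t / (t @ t)
        w_o = p - (w @ p) * w                    # weight orthogonal to the predictive one
        w_o /= np.linalg.norm(w_o)
        t_o = Xk @ w_o
        p_o = Xk.T @ t_o / (t_o @ t_o)
        step = np.eye(k) - np.outer(w_o, p_o)    # removes t_o p_o' from Xk
        Xk = Xk @ step
        M = M @ step
    w = Xk.T @ y
    w /= np.linalg.norm(w)
    t = Xk @ w
    c = y @ t / (t @ t)
    return (M @ w) * c                           # regression vector for unfiltered X

# Synthetic wide data (assumed): 25 observations, 60 variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(25, 4)) @ rng.normal(size=(4, 60)) + 0.2 * rng.normal(size=(25, 60))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=25)
Xc, yc = X - X.mean(axis=0), y - y.mean()

b_pls = pls1_coefficients(Xc, yc, n_components=3)
b_opls = opls_coefficients(Xc, yc, n_orthogonal=2)   # 1 predictive + 2 orthogonal
X_new = rng.normal(size=(5, 60))
print("Same predictions:", np.allclose(X_new @ b_pls, X_new @ b_opls))
```

Both models span the same three-dimensional subspace of X, so the predictions agree; what differs is how that subspace is divided among the components, which is where the interpretability benefit of OPLS comes from.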

The underlying data for a PLS or OPLS model can be, for example, measurements describing properties of production samples, chemical compounds or reactions, process time points of a continuous process, batches from a batch process, biological individuals, or trials of a DOE protocol.

OPLS in regression analysis

"Regression analysis is deployed both for batch evolution and batch level modeling," says Anna Persson, Senior Principal Data Scientist, Sartorius Stedim Data Analytics. Multivariate methods to aid in analyzing process data, such as PLS and OPLS, are part of Sartorius's SIMCA solution.

"Whereas both methods divide the variability in a dataset into systematic (structured) and residual (noise) – OPLS further splits the systematic variability into two components: predictive (everything correlated with the response, Y); and orthogonal (everything not correlated with the response)."

OPLS offers enhanced visualization when there is a large amount of Y-orthogonal structure in X.

So remember:

  • PLS and OPLS provide the same fit and predictive ability for a single-Y model
  • OPLS may help in clarifying and understanding correlated and non-correlated variation

Applying OPLS to PAT

Kaiser Optical Systems, Inc., a leading manufacturer of Raman spectroscopy equipment, embeds SIMCA®-Q software – part of the Umetrics® Suite of Data Analytics Solutions – in its Kaiser RamanRxnSystems™ Analyzers. Sartorius used Kaiser Raman spectra data collected from in-line monitoring of batch cell culture processes to compare the results of regression analysis using OPLS versus PLS modeling.

The findings showed a clear advantage to using OPLS for batch processes that involve in-line process monitoring of cell cultures.


Topics: OPLS, Batch processes, Statistical Process Control

Written by Marie Wensley, Marketing Manager at Sartorius Stedim Data Analytics
