Umetrics Suite Blog

Scientists Use Data Analytics to Uncover the Differences Between Coronavirus Strains

March 19, 2020

An important step on the road to creating treatments for illnesses like COVID-19, the cause of the current global pandemic, may be understanding the similarities and differences between the known strains of coronavirus. Making sense of large and complex sets of data, especially those that require novel interpretation, calls for a powerful analytics toolset to speed up the process.


An embedded multivariate data analytics tool can help researchers more easily visualize and evaluate the relevant similarities and differences between large groups of omics data in order to gain a better understanding of virus replication mechanisms.


Until late 2019, six coronaviruses were known to infect humans, including the strains that cause SARS (severe acute respiratory syndrome) and MERS (Middle East respiratory syndrome). With severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the strain that causes COVID-19, a seventh emerged.

For scientists looking to understand what makes one coronavirus different from another and to map the complexities of transmission and possible effects on humans, gaining accurate statistical data from studies of existing coronaviruses can provide valuable information.

For example, a multivariate data analysis of ultra-high performance liquid chromatography-mass spectrometry (UPLC–MS) data can help researchers more easily visualize differences in host cell lipid response during coronavirus infection. This is one way researchers can study the similarities and differences in the human body’s response to different strains of coronavirus.

Having advanced data analytics tools embedded directly in lab analysis instruments such as mass spectrometers makes the process faster and more efficient. One example is the collaboration between Sartorius and Waters Corp. (UHPLC–MS, MassLynx software) using EZinfo or SIMCA software. EZinfo provides a “light” version of the robust SIMCA software that is easy to use within the instrument.

A recent article published in Viruses (2019), “Characterization of the Lipidomic Profile of Human Coronavirus-Infected Cells: Implications for Lipid Metabolism Remodeling upon Coronavirus Replication,” explains how the authors used OPLS-DA (Orthogonal Projections to Latent Structures Discriminant Analysis) to study coronavirus-infected cells (HCoV-229E; MERS-CoV, SARS-CoV) using mass spectrometry with embedded data analytics tools.

Using Lipid Studies to Understand Virus Pathogenicity

In the human body, lipids play an important role in cellular function and are linked to multiple steps in the replication cycle of viruses. When coronaviruses infect humans, a wide variety of clinical outcomes can occur, ranging from flu-like symptoms to respiratory distress or severe pneumonia – even leading to systemic organ failure.

“As in other viruses, lipids play key roles in the life cycle of coronaviruses. Coronaviruses confiscate intracellular membranes of the host cells to generate new compartments known as double membrane vesicles (DMVs) for the amplification of the viral genome. DMVs are membranous structures that not only harbor viral proteins but also contain a specific array of hijacked host factors, which collectively orchestrate a unique lipid micro-environment optimal for coronavirus replication.” ²

Yet scientific understanding of how cellular lipids may affect the transmission and impact of human-pathogenic coronaviruses is limited. To learn more, scientists completed a study of host cell lipid response during coronavirus infection using the human coronavirus 229E (HCoV-229E) as a model coronavirus.

In this study, an MS-based lipidomics approach was established to characterize the host cell lipid changes upon coronavirus infection. Univariate and multivariate statistical analyses were applied in data processing for the selection of significant lipid features. A total of 24 lipids, including lysophospholipids and fatty acids (FAs), were identified and were consistently up-regulated in HCoV-229E-infected cells. Seven representative lipids were confirmed by authentic standards, including lysoPC (16:0/0:0), PAF C-16, lysoPE.

The significantly upregulated lipids included lysoPCs, lysoPEs and unsaturated and saturated FAs. In addition, the study showed that a number of lysophospholipids and FAs downstream of cPLA2 activation were upregulated upon HCoV-229E infection.
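The univariate side of such a feature selection can be sketched in a few lines. This is only an illustration: the lipid names, peak intensities and cut-off values below are hypothetical, and the statistics shown (a log2 fold change and a Welch t statistic) are common choices, not necessarily the ones the study's authors used.

```python
import math
from statistics import mean, variance

def log2_fold_change(control, infected):
    """log2 of the ratio of mean intensities (infected over control)."""
    return math.log2(mean(infected) / mean(control))

def welch_t(control, infected):
    """Welch's t statistic for two independent groups with unequal variances."""
    standard_error = math.sqrt(
        variance(control) / len(control) + variance(infected) / len(infected)
    )
    return (mean(infected) - mean(control)) / standard_error

# Hypothetical peak intensities: (mock-infected replicates, infected replicates).
intensities = {
    "lipid_A": ([100.0, 110.0, 90.0], [200.0, 220.0, 180.0]),
    "lipid_B": ([50.0, 55.0, 45.0], [52.0, 48.0, 50.0]),
}

# Assumed cut-offs for illustration only: roughly 1.5-fold up and t above 2.
MIN_LOG2_FC = 0.58
MIN_T = 2.0

# Keep only the lipids that pass both univariate criteria.
selected = [
    name
    for name, (control, infected) in intensities.items()
    if log2_fold_change(control, infected) > MIN_LOG2_FC
    and welch_t(control, infected) > MIN_T
]
print(selected)
```

In practice a screen like this is usually combined with a multivariate model, since single-variable tests ignore correlations between lipids.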

Data were produced using ultra-high performance liquid chromatography-electrospray ionization-quadrupole-time-of-flight mass spectrometry (UPLC-ESI-Q-TOF-MS) and analyzed using SIMCA software connected directly to the instruments.

The study demonstrated that host lipid metabolic remodeling was significantly associated with human-pathogenic coronavirus propagation. The data further suggested that lipid metabolism regulation would be a potential target for drug development to help control coronavirus infections. These results were produced using a form of data analytics known as OPLS-DA (performed by SIMCA software) that is designed to uncover and visualize the similarities and differences between large sets of data.

What is OPLS-DA?

Often, evaluating scientific or medical data involves defining or interpreting the differences between groups of data. Being able to gather meaningful insights from data, including data from lipidomics, genomics, proteomics, metabolomics, or other omics studies, requires using the right multivariate data analysis (MVDA) techniques. One central tool in this respect is orthogonal partial least squares discriminant analysis (also expanded as orthogonal projections to latent structures discriminant analysis), or OPLS-DA for short.

With OPLS-DA you are asking this question: What is the difference? Here you are targeting the variables: which variables are driving the separation between the two groups? If you have a two-group problem, the resulting OPLS-DA model will be very easy to interpret because you will have only one predictive component to interpret. This component is rendered as the x-axis in the score scatter plot arising from the OPLS-DA model.
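To make the idea of a single predictive component concrete, here is a minimal sketch of a two-group OPLS-DA with one orthogonal component, following the general single-response OPLS scheme in plain Python. It is not SIMCA's implementation, the function name and the toy data are hypothetical, and it omits scaling, cross-validation and further orthogonal components.

```python
import math

def center(values):
    """Subtract the mean from a list of numbers."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def xt_times(X, v):
    """Compute X^T v for a matrix stored as a list of rows."""
    return [dot([row[j] for row in X], v) for j in range(len(X[0]))]

def x_times(X, w):
    """Compute X w for a matrix stored as a list of rows."""
    return [dot(row, w) for row in X]

def unit(v):
    """Scale a vector to unit length (fails if the vector is all zeros)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def opls_da_one_orthogonal(X, y):
    """Return (predictive scores, orthogonal scores, predictive weights)."""
    columns = [center([row[j] for row in X]) for j in range(len(X[0]))]
    Xc = [list(row) for row in zip(*columns)]      # mean-centered X
    yc = center(y)                                 # mean-centered class labels
    w = unit(xt_times(Xc, yc))                     # predictive weights
    t = x_times(Xc, w)
    tt = dot(t, t)
    p = [v / tt for v in xt_times(Xc, t)]          # loadings of the first fit
    wp = dot(w, p)
    # The part of the loading not explained by w is the orthogonal direction.
    w_o = unit([pj - wp * wj for pj, wj in zip(p, w)])
    t_o = x_times(Xc, w_o)                         # orthogonal scores (y-axis)
    tt_o = dot(t_o, t_o)
    p_o = [v / tt_o for v in xt_times(Xc, t_o)]
    # Remove the class-unrelated variation, then project on the predictive weights.
    X_filtered = [[x - ti * pj for x, pj in zip(row, p_o)]
                  for row, ti in zip(Xc, t_o)]
    t_pred = x_times(X_filtered, w)                # predictive scores (x-axis)
    return t_pred, t_o, w

# Hypothetical data: three mock-infected and three infected samples, three variables.
# The second variable varies within each class but carries no class information.
X = [[0.5, -1.0, 2.0], [1.0, 0.0, 2.0], [1.5, 1.0, 2.0],
     [2.5, -1.0, 4.0], [3.0, 0.0, 4.0], [3.5, 1.0, 4.0]]
y = [0, 0, 0, 1, 1, 1]

t_pred, t_o, w = opls_da_one_orthogonal(X, y)
print(t_pred)
print(t_o)
```

On this toy matrix the predictive scores come out negative for the mock samples and positive for the infected ones, while the within-class spread driven by the second variable ends up in the orthogonal score. The predictive weights `w` indicate which variables drive the separation.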


To learn more about uncovering differences using OPLS-DA, read this article.

Using Data Analytics to Understand Viral Modulation Differences

Better understanding of viral replication mechanisms, such as learning which lipids are up- or down-regulated during infection, will help scientists design more effective treatments or vaccines. In studies like this, OPLS-DA is an essential tool for researchers to be able to correctly classify and distinguish causal factors and differences.

OPLS-DA provides a look at the metabolomic profile associated with a virus, and a way to compare it with the profiles of other known viruses or substances. This method, especially when embedded directly in instruments, makes it easier to see what the differences are and what the virus strains have in common.

SIMCA software (in particular when used with an OMICS skin that offers targeted graphics and analysis) can simplify MVDA analysis of omics data and identification of biomarkers and viral mechanisms.

SIMCA also offers a spectroscopy skin, a customized interface dedicated to handling spectroscopy data.

A specialized version of SIMCA (SIMCA-Q) can be embedded in any software that controls or runs advanced instruments, giving the end user easy access to targeted and specialized tools that enhance and simplify the data analytics step.

Read more about SIMCA-Q and OEM data analytics solutions for instrument makers: Umetrics OEM solutions

Want to Know More?

Watch this recorded webinar to learn more about the OMICS skin in SIMCA


Get a Hands-On Exercise to See the Differences

Want to see how it works first-hand? Download our free exercises that show you a step-by-step process for how to use OPLS-DA and PCA when reviewing omics and mass spectrometry data.

Download Omics Course



1. Chan, J.F.; Lau, S.K.; Woo, P.C. The emerging novel Middle East respiratory syndrome coronavirus: The “knowns” and “unknowns”. J. Formos. Med. Assoc. 2013, 112, 372–381.

2. Yan, B.; et al. Characterization of the Lipidomic Profile of Human Coronavirus-Infected Cells: Implications for Lipid Metabolism Remodeling upon Coronavirus Replication. Viruses 2019, 11(1), 73.

3. Zhang, J.; Pekosz, A.; Lamb, R.A. Influenza virus assembly and lipid raft microdomains: A role for the cytoplasmic tails of the spike glycoproteins. J. Virol. 2000, 74, 4634–4644.


Topics: SIMCA, Omics Data Analytics, Spectroscopy, Medicine/Health

Johan Hultman

Written by Johan Hultman

Johan Hultman is the Business Development Manager for OEM & Channel Partners at Sartorius Stedim Data Analytics, with almost 20 years of experience helping and guiding customers and partners to enable, utilize and embed Advanced Data Analytics. He has worked as a Sales Manager, Key Account Manager, Project Leader and Solutions Enabler for projects within pharma, biotech, food & beverage, chemical, semicon and many other industries and has sold more software and signed more agreements than any other person within the Umetrics organization.
