Umetrics Suite Blog

Data analytics can assist in estimating the toxic effect of everyday chemicals

November 16, 2017

An important environmental issue that has come into focus is the increasing number of chemicals that we are exposed to in everyday life. Chemicals are found in products ranging from cars and furniture to clothing and skincare, and are also by-products of combustion. The CAS REGISTRY, an international standard for chemical information, currently contains more than 134 million unique organic and inorganic chemical substances and more than 67 million sequences.

How can multivariate data analytics be used to create more accurate measurements of toxicity from exposure to chemicals in everyday products ranging from furniture to clothing?

It is of course important to understand the potential adverse effects of many of these chemicals. Unfortunately, testing every chemical directly would tie up test resources for decades, and it would also raise ethical concerns regarding animal welfare. One way forward is to estimate the potential hazards of chemicals with mathematical models, such as quantitative structure-activity relationships (QSARs).

QSAR models of the chemical structure of molecules can be used to estimate toxicity

A QSAR is a mathematical model that relates the environmental or biological effect of a series of compounds to the variation in their chemical structure. A new PhD thesis by Malin Larsson at Umeå University, Sweden, demonstrates that there is a great potential in the use of QSARs to estimate the toxicity of chemicals.

All chemical, biological, and environmental measurements are to some extent inexact. Hence, scientific models based on measured data, including QSARs, are necessarily multivariate and statistical in nature. In her thesis, Malin Larsson used the data analytical powers of the SIMCA software to build mathematical models (QSARs) of the molecular structures of dioxin-like compounds, DLCs. DLCs are compounds of high environmental concern because they are persistent and have a high tendency to bioaccumulate in the food chain, eventually posing a threat to humans who eat, for instance, fish. Most DLCs stem from the chemical families of polychlorinated dibenzo-p-dioxins/furans (PCDDs/PCDFs) and polychlorinated biphenyls (PCBs).

Malin Larsson used biological in vitro toxicity data of DLCs from human and rodent cell lines to build the QSARs, in order to understand which molecular properties of the DLCs influence and regulate adverse effects. The QSARs were calculated using two analytical methods, PCA and OPLS, in the SIMCA software. A great advantage of SIMCA is that it gives a condensed overview of molecular properties, and results are conveniently visualized using, for instance, a score plot of the type seen in the image below.

Figure Legend: PCA score plot summarizing a set of chemical and toxicological data for a series of 34 PCBs.

Note: This subset of molecules is NOT related to the studies reported by Malin Larsson. SIMCA can interpret the SMILES (Simplified Molecular Input Line Entry System) notation for chemical structures. Provided that the SMILES code for each molecule is available as a secondary observation ID, the investigator can co-chart the Item information tool with the score plot and get an instantaneous understanding of which molecular structures are involved in the current SIMCA project.
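
For readers who want a feel for what a PCA score plot summarizes, here is a minimal sketch that computes the first two score vectors with NIPALS, a classical iterative algorithm for PCA. It is not SIMCA's implementation and it does not use the thesis data: the descriptor table is synthetic, and the function names (autoscale, nipals_pca) and the two hidden factors are assumptions made purely for illustration. Only the Python standard library is used.

```python
import math
import random

def autoscale(X):
    """Centre each column to mean 0 and scale it to unit variance."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def nipals_pca(X, n_components=2, tol=1e-18, max_iter=500):
    """Return (scores, loadings), one list per component, for a scaled matrix X."""
    E = [row[:] for row in X]          # residual matrix, deflated per component
    n, k = len(E), len(E[0])
    scores, loadings = [], []
    for _ in range(n_components):
        # Start from the column with the largest remaining sum of squares.
        j0 = max(range(k), key=lambda j: sum(row[j] ** 2 for row in E))
        t = [row[j0] for row in E]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading: regress the columns of E on t, then normalise to length 1.
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # Score: project the rows of E onto the loading.
            t_new = [sum(E[i][j] * p[j] for j in range(k)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change < tol * sum(v * v for v in t):
                break
        # Deflate: remove the part of E explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]
        scores.append(t)
        loadings.append(p)
    return scores, loadings

# Synthetic table: 34 hypothetical compounds x 5 hypothetical descriptors,
# driven by two hidden factors plus a little noise (NOT real PCB data).
random.seed(1)
X = []
for _ in range(34):
    f1, f2 = random.gauss(0, 1), random.gauss(0, 1)
    signal = [f1, 0.9 * f1, -0.8 * f1, f2, 0.7 * f2]
    X.append([v + random.gauss(0, 0.1) for v in signal])

scores, loadings = nipals_pca(autoscale(X), n_components=2)

# Each compound becomes one point (t1, t2) in the score plot.
for t1, t2 in list(zip(scores[0], scores[1]))[:3]:
    print(f"t1 = {t1:7.3f}   t2 = {t2:7.3f}")
```

Compounds with similar descriptor profiles end up close together in the (t1, t2) plane, which is what makes the score plot a condensed overview of the whole table.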

Improved ability to measure toxic exposure for humans

An important finding of the thesis is that there are major differences between humans and animals in their sensitivity to DLCs. Organizations such as WHO and EU use a tool called the toxic equivalency factor, TEF, to assess the risk of exposure to toxic compounds. A TEF describes the toxicity of a given compound (a dioxin, furan, or PCB) relative to the most dangerous dioxin, 2,3,7,8-TCDD, which is assigned a TEF of 1. To calculate the total contamination in, for example, fish, the concentration of each DLC is multiplied by its TEF value. The resulting products are then summed to give the total toxic equivalent, TEQ. Since TEQ is expressed as a single number, TEQ values are used by legislative authorities to assess the risks of exposure to DLCs in food such as fish, meat, milk, eggs, and baby food. They are also used to understand and compare exposure levels of DLCs at potentially contaminated geographical sites and to monitor levels of DLCs in humans.
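
The TEQ arithmetic described above (multiply each concentration by its TEF, then sum) can be sketched in a few lines of Python. The function name total_teq, the congener selection, the concentrations, and the TEF values for the furan and the PCB are illustrative assumptions, not figures from the thesis; official TEF tables should be used for any real assessment.

```python
def total_teq(concentrations, factors):
    """Sum of concentration x toxicity factor over all compounds in a sample."""
    missing = set(concentrations) - set(factors)
    if missing:
        raise KeyError(f"No toxicity factor for: {sorted(missing)}")
    return sum(conc * factors[name] for name, conc in concentrations.items())

# Illustrative factors only. 2,3,7,8-TCDD is the reference compound (TEF = 1);
# the other two values are placeholders for this example.
tefs = {
    "2,3,7,8-TCDD": 1.0,
    "2,3,4,7,8-PeCDF": 0.3,
    "PCB-126": 0.1,
}

# Hypothetical concentrations in one sample, e.g. in pg per g fresh weight.
sample = {
    "2,3,7,8-TCDD": 0.2,
    "2,3,4,7,8-PeCDF": 1.0,
    "PCB-126": 5.0,
}

teq = total_teq(sample, tefs)  # 0.2*1.0 + 1.0*0.3 + 5.0*0.1
print(f"TEQ = {teq:.2f} pg TEQ/g")
```

The same function works unchanged with any other set of factors, as long as every compound in the sample has a factor assigned.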

Based on her research and multivariate data analytics approach, Malin Larsson extends the TEF approach with a new measure, the consensus toxicity factor, CTF. The current TEFs are primarily based on data from rat and mouse experiments. CTFs, on the other hand, are also based on human cell responses, and they show clear differences compared to the current TEFs. The CTF values of some DLCs are, for example, higher than the corresponding TEFs, whereas the CTF values of other DLCs are lower. This means that the estimated risk for humans from, for example, eating certain foods may differ from current estimates.

In one of the investigations presented in the thesis, the CTFs were applied to salmon samples from different parts of the Baltic Sea to estimate the risk associated with eating salmon. The calculations showed that the overall risk of eating salmon would be of the same order as before. However, the contribution of dioxins and polychlorinated furans to the total risk would be approximately 98% using CTFs, whereas according to the current TEFs, dioxin-like PCBs contribute up to 50% of the total estimated risk. Thus, although the concentration of PCBs in salmon would decrease now that PCBs are prohibited, the overall risk of eating salmon would still be high.
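
To see how a change of factors can shift the relative contribution of each compound class even when the measured concentrations stay the same, consider the following sketch. All numbers are invented for illustration and are not taken from the thesis; in particular, the alternative factor set only stands in for the idea of a CTF-like table, and the name class_shares is an assumption.

```python
def class_shares(concentrations, factors, classes):
    """Percentage of the total toxic equivalent contributed by each class."""
    totals = {}
    for name, conc in concentrations.items():
        cls = classes[name]
        totals[cls] = totals.get(cls, 0.0) + conc * factors[name]
    grand_total = sum(totals.values())
    return {cls: 100.0 * value / grand_total for cls, value in totals.items()}

# Which class each (hypothetical) compound belongs to.
classes = {
    "2,3,7,8-TCDD": "PCDD/F",
    "2,3,4,7,8-PeCDF": "PCDD/F",
    "PCB-126": "PCB",
}

# Hypothetical concentrations, identical in both calculations.
sample = {"2,3,7,8-TCDD": 0.2, "2,3,4,7,8-PeCDF": 1.0, "PCB-126": 5.0}

# Two invented factor sets: one TEF-like, one with a lower PCB factor.
tef_like = {"2,3,7,8-TCDD": 1.0, "2,3,4,7,8-PeCDF": 0.3, "PCB-126": 0.1}
alternative = {"2,3,7,8-TCDD": 1.0, "2,3,4,7,8-PeCDF": 0.3, "PCB-126": 0.01}

tef_shares = class_shares(sample, tef_like, classes)
alt_shares = class_shares(sample, alternative, classes)

for label, shares in (("TEF-like", tef_shares), ("alternative", alt_shares)):
    summary = ", ".join(f"{cls}: {pct:.0f}%" for cls, pct in shares.items())
    print(f"{label:12s} {summary}")
```

With these made-up inputs, lowering the factor for the PCB moves most of the estimated risk onto the dioxin and furan class, which is the kind of shift the salmon study reports.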

Data analytics software can improve our understanding of toxicity and risk exposure

The final recommendation in the thesis is that organizations such as WHO and EU should use the novel CTFs in place of TEFs when conducting future risk assessments of DLCs. The thesis also demonstrates that data analytics software such as SIMCA can be used to significantly improve our understanding of toxicity and risk exposure, not only in food but in our environment at large.

Download thesis by Malin Larsson

Want to know more?

Find out more about the SIMCA software and data visualization in this presentation and video.

Topics: Data Analytics

Written by Lennart Eriksson

Sr Lecturer and Principal Data Scientist at Sartorius Stedim Data Analytics