Umetrics Suite Blog

Racing to the breaking point with multivariate data analysis

October 23, 2018

On the west coast of southern Sweden, facing the expanse of the ocean, is the beautiful city of Gothenburg. Surrounded by a string of islands, this city has been home to sailors and merchants, seafaring and shipping, since ancient times. One of the islands to the north of Gothenburg is the picturesque island of Tjörn. Once a year, Tjörn is the location for one of the most famous sailing races in Sweden – “Tjörn Runt” or “Around Tjörn”.



One of the contestants in recent years is Seldén Mast, a company specializing in sails, masts, and other rig equipment. The company is passionate about sailing and knows how important the precision of the equipment is, whether you just want to enjoy a leisurely sailing trip or want to push your boat to the limit in a race. Seldén Mast has supplied rig systems for sailing boats that have won a large number of medals in the Olympics, World Championships, European Championships, as well as national championships.

Tjörn Runt gives Seldén Mast an excellent opportunity to test their equipment in a race. At a competition, you want to push the boat to the max without the equipment being damaged. A part of the rig system that is exposed to a lot of material stress is the jib halyard – the rope that holds up the jib, the front sail of the boat. If the jib halyard breaks, the jib sail will fall down – and you will most likely fall out of the game.

Using multivariate data analysis to find answers

Seldén Mast has used data logging for many years and in a recent race, the company logged as many as 20 different parameters – such as wind speed, wind direction, velocity, drift, and heel angle. The total sailing time was approximately 5 hours and 20 minutes, which amounts to more than 19,000 time points at which measurements were taken. The aim of logging the data was to investigate the material stress during the race.
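As a quick check on the size of the data set, the figures above can be multiplied out. The sampling rate of one record per second is an assumption made here because it matches the reported count; the post itself only states the duration and "more than 19,000" time points.

```python
# Duration reported in the post: roughly 5 hours 20 minutes.
duration_s = 5 * 3600 + 20 * 60

# Assumption: one logged record per second (not stated in the post).
assumed_rate_hz = 1
n_time_points = duration_s * assumed_rate_hz
n_parameters = 20

print(n_time_points)                 # 19200
print(n_time_points * n_parameters)  # 384000 individual values
```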

Seldén Mast has generously shared the data with Sartorius Stedim Data Analytics to find out if multivariate data analysis can be used to answer two main questions:

1. Is it possible to identify different sailing environments in the race?

2. How does the material stress or load on the jib halyard change during the race and which parameters correlate with material stress?

Tjörn Runt sailing race course map

The race starts at the northeastern point of Tjörn and heads clockwise around the island to the finish line north of Tjörn. The first boats start at 9:00 AM and more boats join the race every five minutes. (Image courtesy of the Tjörn Runt website.)

Identifying different sailing environments

As the race proceeds, the environment around the boat changes. The image to the left in the graph below shows a scatterplot consisting of 19,000 dots, where each dot represents the measurements taken at a given time point. Dots that are close together indicate time points with similar sailing environments. Dots that are far apart correspond to very different sailing environments. The scatterplot highlights that there were three main types of sailing environments during the race, each visible as a region with a higher density of dots – one elongated cluster to the left, another elongated cluster to the right that leans to the left, and a smaller cluster at the bottom of the scatterplot. The image below to the right shows the variables that define these three main types of sailing environments. For example, one influential parameter is the apparent wind speed (AWS). To corroborate this finding, the 19,000 dots are colored according to AWS, where blue dots represent low wind speed, green dots medium wind speed, and orange and red dots strong wind speed.


(left) Scatter plot summarizing the sailing conditions across the entire race. The three clusters indicate three predominant types of sailing conditions/environments. (right) Scatter plot visualizing how the logged sailing parameters combine to reflect the three sailing environments. For example, the oblong cluster to the left is characterized by high numerical values in apparent wind speed (AWS), distance made good (DMG), and true wind speed (TWS).
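The post does not name the method behind the scatterplot. A common way to produce this kind of summary is principal component analysis (PCA) on autoscaled data, and the sketch below assumes that approach. It computes only the first component by power iteration, and the data are invented: two wind regimes and three made-up columns standing in for AWS, TWS and BSP, not the Seldén Mast measurements.

```python
import math
import random

def autoscale(rows):
    """Center every column to mean 0 and scale it to unit variance."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]

def project(X, p):
    """Score of each observation: its projection onto the loading vector p."""
    return [sum(x_j * p_j for x_j, p_j in zip(x, p)) for x in X]

def first_component(X, iterations=200):
    """Power iteration on X^T X; returns (loadings, scores) of component 1."""
    k = len(X[0])
    p = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        t = project(X, p)
        p = [sum(t_i * x[j] for t_i, x in zip(t, X)) for j in range(k)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
    return p, project(X, p)

# Invented data: 100 light-wind and 100 strong-wind time points.
random.seed(0)
rows = []
for regime_wind in (4.0, 12.0):
    for _ in range(100):
        tws = random.gauss(regime_wind, 1.0)
        aws = tws + random.gauss(2.0, 0.5)
        bsp = 0.5 * tws + random.gauss(0.0, 0.3)
        rows.append([aws, tws, bsp])

loadings, scores = first_component(autoscale(rows))
light = sum(scores[:100]) / 100
strong = sum(scores[100:]) / 100
print(f"mean score, light wind: {light:.2f}; strong wind: {strong:.2f}")
```

Each time point gets one score (a dot position along one axis of the scatterplot), and each parameter gets one loading (its position in the right-hand plot). In this toy case the two wind regimes end up on opposite sides of zero, which is the kind of separation the clusters in the figure show.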


The second graph below shows the same data as the first graph, but this time with a time line along the x-axis in the left-hand trend chart. The resulting image to the left could be compared to an electrocardiogram, EKG, and can be interpreted in the same way. The higher the fluctuations, the more rapidly the sailing environment is changing. As can be seen, there are more fluctuations in the sailing environment during approximately the first two hours, followed by over an hour of more stable conditions, followed by even more ruffled conditions during the last two hours of the race. The stable period in the middle of the race most likely matches the sailing distance in the open sea from the southern tip of the island until the route turns into the archipelago again.

The green bars to the right in the column plot below show the variation of different parameters during the race. The higher the bar, the more influential the parameter. Parameters with a positive bar started out on the high side and then gradually decreased during the race. Parameters with a negative bar started out on the low side and then gradually increased during the race.



The line plot to the left shows how the first summary index of the 20 logged parameters changes across time. The time scale is elapsed time since the boat started the race. The duration of the race for the boat in question is approximately 5 hours and 20 minutes. As can be seen, sailing environments change the most at the beginning and at the end of the race. The plot to the right shows the contribution of the logged parameters to the observed changes in sailing conditions.
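The "higher fluctuations mean faster change" reading of the trend chart can be put into a number with a rolling standard deviation of the summary index. This is a minimal sketch of that idea and not a step described in the post; the score values and the window length are invented.

```python
import statistics

def rolling_std(values, window):
    """Sample standard deviation over each full trailing window."""
    return [statistics.stdev(values[i - window:i])
            for i in range(window, len(values) + 1)]

# Invented summary-index values: a restless start, then a calm stretch.
scores = [0.0, 2.0, -2.0, 2.0, -2.0, 1.0, 1.0, 1.0, 1.0, 1.0]
volatility = rolling_std(scores, window=4)

print(round(volatility[0], 3))   # restless start of the series
print(volatility[-1])            # calm end of the series: 0.0
```

A high value marks a stretch where conditions are changing quickly, a value near zero marks a stable stretch such as the open-sea leg described above.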

Material stress and parameters that correlate with material stress

Identifying changes in sailing conditions and environments during the race is one benefit of multivariate data analysis. A perhaps more intriguing question is which parameters correlate with material stress, and whether material stress can be reliably predicted in advance. It turns out that a surprisingly strong model can be obtained: it explains and predicts more than 80% of the variance in the response variable, the load on the jib halyard.

The next graph below comes from this second data analytics model, in which the information in the 20 logged parameters is used to predict the load on the jib halyard. The trend chart shows how the main summary index relating to material stress changes during the race. The higher the numerical value of the summary index, the higher the load on the jib halyard. As shown, the load is especially high between the third and fourth hour of the race. The line plot is colored according to apparent wind speed (AWS).



Trend chart of the summary index of the model predicting the load on the jib halyard. The most extensive stress occurs between three and four hours into the race.
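The post does not say which regression method produced the load model. One common choice for this kind of problem is partial least squares (PLS), and the sketch below assumes a one-component PLS1 model. The data are invented: the columns `aws`, `heel` and `temp` and the way `load` is generated are assumptions for illustration only.

```python
import math
import random

def autoscale_column(values):
    """Center a column to mean 0 and scale it to unit variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pls1_one_component(columns, y):
    """One-component PLS1 on autoscaled columns and a centered response.

    Returns the weights w, the scores t (the summary index) and the
    fraction of variance in y explained by that single component.
    """
    n = len(y)
    w = [sum(c[i] * y[i] for i in range(n)) for c in columns]  # X^T y
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    t = [sum(w[j] * columns[j][i] for j in range(len(columns)))
         for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)
    ss_res = sum((yi - q * ti) ** 2 for ti, yi in zip(t, y))
    ss_tot = sum(yi * yi for yi in y)
    return w, t, 1.0 - ss_res / ss_tot

# Invented data: load driven mostly by wind speed, a little by heel angle.
random.seed(1)
n = 300
aws = [random.gauss(10.0, 3.0) for _ in range(n)]
heel = [random.gauss(15.0, 5.0) for _ in range(n)]
temp = [random.gauss(12.0, 1.0) for _ in range(n)]   # unrelated to load
load = [3.0 * a + 0.5 * h + random.gauss(0.0, 2.0) for a, h in zip(aws, heel)]

columns = [autoscale_column(c) for c in (aws, heel, temp)]
w, t, r2 = pls1_one_component(columns, autoscale_column(load))
print(f"explained variance in load: {r2:.2f}")
```

The scores `t` play the role of the summary index in the trend chart, and `r2` corresponds to the "more than 80% of the variance" figure quoted for the real model.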

The last bar chart displays which parameters correlate most strongly with a high load on the jib halyard. This plot is sorted by size, meaning that the best predictors of the material stress are found at the sides of the plot. Not surprisingly, apparent wind speed (AWS) is the parameter that is most strongly correlated with material stress. Other parameters with a strong correlation are wind made good (WMG), distance made good (DMG), true wind speed (TWS), and heading compass course (HDC). All these parameters have a positive correlation with the load on the jib halyard, meaning that the higher their numerical value, the stronger the material stress.



Bar chart showing the influence of each parameter on the modeling and prediction of the load on the jib halyard.
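The way a sorted bar chart pushes the strongest predictors to the sides can be illustrated with plain Pearson correlations. This is a simplification: the chart above is based on the model's own measure of influence, which the post does not specify. The numbers below are invented, and only the parameter names come from the list of abbreviations.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented toy values for the load and three logged parameters.
load = [1.0, 2.0, 3.0, 4.0, 5.0]
params = {
    "AWS": [2.0, 4.1, 5.9, 8.2, 10.0],
    "TEMP": [12.0, 11.0, 12.5, 11.5, 12.0],
    "DEP": [30.0, 22.0, 25.0, 12.0, 9.0],
}

# Sort by signed correlation, highest first.
ranked = sorted(((pearson(values, load), name)
                 for name, values in params.items()), reverse=True)
for r, name in ranked:
    print(f"{name:5s} {r:+.2f}")
```

Sorting by signed value puts strong positive correlations at one end, strong negative correlations at the other, and weak predictors in the middle.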

Further applications of multivariate data analysis

During the race around the island of Tjörn, the boat is exposed to different types of stress and challenges. As shown in the graphs, multivariate data analysis can be used to get a better insight into when and how material stress occurs. The data can be used to build a model for real-time analysis – for the same boat in the same race the next year. With real-time analysis, the crew could get a faster classification of the sailing conditions at each moment and the risks of material stress. This information could tell them, at each and every moment, if they can push the boat to the max – or if they are racing to the breaking point and should reduce the load on the jib halyard.
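A real-time check of this kind could, in its simplest form, assign each new time point to the nearest known cluster and compare the predicted load with a limit. The sketch below is hypothetical throughout: the cluster names, the centroid coordinates, the 0.8 limit and the function names are all invented for illustration and do not come from the Seldén Mast model.

```python
import math

# Hypothetical cluster centroids in the two-dimensional score space.
CENTROIDS = {
    "cluster_1": (-2.0, 1.0),
    "cluster_2": (2.0, 1.0),
    "cluster_3": (0.0, -2.5),
}
# Hypothetical limit, as a fraction of the halyard's rated working load.
LOAD_LIMIT = 0.8

def classify(scores):
    """Name of the centroid closest to the given score coordinates."""
    return min(CENTROIDS, key=lambda name: math.dist(scores, CENTROIDS[name]))

def advise(scores, predicted_load):
    """Pair the current sailing environment with a simple load warning."""
    action = "reduce load" if predicted_load > LOAD_LIMIT else "keep pushing"
    return classify(scores), action

print(advise((1.7, 0.6), 0.92))   # ('cluster_2', 'reduce load')
print(advise((-1.8, 1.2), 0.40))  # ('cluster_1', 'keep pushing')
```

In practice the scores and the predicted load would come from models fitted on earlier races, and the limit would be set from the rated strength of the equipment.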

Want to know more?

Download the example exercise with analytics review.



List of abbreviations

  • AWA, apparent wind angle
  • AWS, apparent wind speed
  • BSP, boat speed
  • CMG, course made good
  • COG, course over ground
  • DEP, depth
  • DFT, drift (speed of water against boat)
  • DMG, distance made good
  • GWD, ground wind direction
  • HDC, heading compass course
  • LOG, distance sailed
  • HEL, heel angle
  • SET, compass direction of moving water
  • SOG, speed over ground
  • TEMP, air temperature
  • TWA, true wind angle
  • TWD, true wind direction
  • TWS, true wind speed
  • VMG, velocity made good



Topics: Multivariate Data Analysis, Data Interpretation & Analysis

Written by Lennart Eriksson

Sr Lecturer and Principal Data Scientist at Sartorius Stedim Data Analytics