Swiss Datastory of Car Accidents

Study of the most influential factors of car accidents in Switzerland




Introduction


In 2017, there were 17,799 accidents on Swiss roads causing bodily injuries. Even though the number of victims has been decreasing for 25 years, Switzerland still records around 20,000 victims each year, that is to say 0.25% of the population. In this data story, we want to understand which factors matter most for the number of car accidents. First, a quick investigation ranks drivers by dangerousness and victims by vulnerability, and identifies the periods with more accidents as well as the substances consumed before accidents. This national-scale context is followed by a machine-learning approach that determines which factors have the greatest influence on car accidents in each canton. Finally, we plot the correlation between these factors and accidents in Swiss cantons. Have a nice trip through our data story!


Investigation

Investigation of Car Accident Factors in Switzerland since 1992


Who are the most dangerous drivers in Switzerland?


This question matters to policy makers as well as insurance companies, since it highlights which category of the population bears the greatest responsibility for car accidents. The easiest categories to study are gender and age, and at first it seems intuitive that young drivers with less experience cause more accidents than experienced ones. However, experience may also bring unsafe driving habits, such as no longer using the indicator before turning, which is why accidents are also studied according to driving experience. Finally, some data about illegal drivers are provided.

Dangerous categories according to gender and age

When splitting drivers into categories according to their age and gender, it becomes possible to compare how dangerous each profile is. In the following graph, the number of accidents is normalized by the Swiss population of each category. Only the 5 most dangerous driver categories are shown below.
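A minimal sketch of this normalization and ranking, assuming the accident counts and the population of each (gender, age) category are available as dictionaries. The per-10’000 scale matches the rates quoted later in this story; the category keys and numbers below are hypothetical placeholders, not the Swiss figures.

```python
def rate_per_10k(accidents, population):
    """Accidents per 10,000 inhabitants for each (gender, age) category."""
    return {cat: 10_000 * accidents[cat] / population[cat]
            for cat in accidents if population.get(cat)}  # skip missing/zero populations

def top_k(rates, k=5):
    """Return the k categories with the highest normalized rate."""
    return sorted(rates, key=rates.get, reverse=True)[:k]

# Hypothetical counts and populations, for illustration only
accidents = {("M", "20"): 120, ("F", "20"): 60, ("M", "45-49"): 150}
population = {("M", "20"): 40_000, ("F", "20"): 38_000, ("M", "45-49"): 300_000}
rates = rate_per_10k(accidents, population)
ranking = top_k(rates, k=2)
```

Normalizing before ranking is what lets a small category (20-year-olds) outrank a much larger one with more raw accidents.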



The most striking observation is that the three most dangerous categories all concern men from the three youngest age classes. The only female category in the top 5 is 20-year-old women, and 20 is also the most dangerous age for men. We can suppose that at this age many drivers are still beginners, but they might feel more confident than 18-19-year-olds. Finally, the fifth category again corresponds to men who are, in theory, much more experienced, but whose accident rate in practice is very close to that of 20-year-old women. Digging a bit deeper into the data, men over 70 come 7th in 2017; however, their accident rate is still only half that of 20-year-old men.

Comparing the 2017 records with those of 1992, we see a drastic decrease, which was to be expected given the many improvements in traveler safety, both in the cars themselves and in today's stricter rules. The decrease is more marked for young people: the number of accidents dropped by two thirds for 20-year-old men!

Fun fact!

Interestingly, more than 1 in 20’000 men is involved in an accident before the age of 18! 25 years ago, some boys younger than 10 were even involved as drivers in car accidents.

On a brighter note, in 2017 women had no accidents before the legal driving age in Switzerland, i.e. 18, which was not the case 25 years earlier. However, men below the legal age were still responsible for car accidents in 2017, as detailed below.

Dangerous categories according to driving experience

We discussed above the number of accidents according to the age of drivers; however, a 25-year-old driver may have held a driving licence for only a few days. The following plot therefore shows the number of accidents according to driving experience, to see whether any counter-intuitive value comes out.
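Separating age from experience requires knowing when each driver obtained the licence. A minimal sketch, assuming (hypothetically) that each accident record carries the licence date; the function names and the class boundaries are illustrative, not those of the plotted data:

```python
from datetime import date

def experience_years(licence_date, accident_date):
    """Completed years between obtaining the licence and the accident."""
    years = accident_date.year - licence_date.year
    # One year less if the licence anniversary has not been reached yet
    if (accident_date.month, accident_date.day) < (licence_date.month, licence_date.day):
        years -= 1
    return years

def experience_class(years):
    """Coarse experience classes (hypothetical boundaries)."""
    if years < 1:
        return "less than 1 year"
    if years < 5:
        return "1-4 years"
    if years < 10:
        return "5-9 years"
    return "10 years or more"

# Whatever the driver's age, a licence from September 2016 means
# less than one year of experience at an accident in March 2017
novice = experience_class(experience_years(date(2016, 9, 1), date(2017, 3, 15)))
```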



The results are as expected: the more experienced drivers are, the fewer accidents they cause. As before, and especially for new drivers, we observe a kind of peak in the early 2000s, but the trend has clearly been decreasing in recent years.

Dangerous categories among illegal drivers

Some of the drivers responsible for accidents did not hold a driving licence and were therefore driving illegally. To which category did the most dangerous illegal drivers belong? Did these drivers represent a significant share of their category? Let’s check on the next graph.



Among illegal drivers, as in the previous cases, young men under 29 are clearly the most dangerous. However, a new category of drivers enters the game: 15-17-year-olds. Probably because of their very low experience, these drivers took first place on the podium in 2017. In addition to being safer, women drivers are also more law-abiding, since only a tiny share of the women drivers involved in a car accident had no licence. They only appear in 5th place of the top 5.

Now let’s compare 18-19-year-old men involved in an accident with and without a licence. Out of 10’000 people in this category, 34.5 caused an accident in 2017 while holding a licence, whereas only 0.32 did so without one. As a result, around 1% of the young men involved in an accident in 2017 were not allowed to drive.
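The “around 1%” figure follows directly from the two rates above: both are expressed per 10’000 men aged 18-19, so the population cancels out, and the share of unlicensed drivers among those involved is the illegal rate divided by the sum of both rates. A quick check:

```python
legal_rate = 34.5    # accidents per 10'000 men aged 18-19, with a licence (2017)
illegal_rate = 0.32  # same category, without a licence (2017)

# Both rates share the same population denominator, which therefore cancels out
share_illegal = illegal_rate / (legal_rate + illegal_rate)
print(f"{share_illegal:.2%}")  # 0.92%
```

That is slightly below 1%, consistent with the rounded value in the text.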

Who are the most vulnerable victims of car accidents?


Among drivers, passengers and pedestrians, which category has been the most exposed to car accidents? Let’s check it out in the table below.


Table of victims for the 3 categories


As expected, the number of victims dropped over 25 years. In the table above, we can clearly see a difference between pedestrian victims and car passengers. Pedestrians are not protected by the car body, whose safety has been greatly improved, which explains the smaller drop in pedestrian victims compared to car passengers.

Like pedestrian victims, driver victims have decreased by only around 20%, compared to 50% for passengers. Improving the safety of drivers should therefore become a higher priority, even though it seems more challenging for car manufacturers.

Analysis of pedestrian victims

We measured a 20% decrease in pedestrian victims since 1992. But does it mean a similar improvement for all age categories of walkers? And who among pedestrian victims are most vulnerable to car accidents?

The first table measures the exposure of various age categories to car accidents, regardless of severity.


Table of the top 5 most exposed categories of pedestrian victims


Interestingly, and quite sadly, young people are the main pedestrian victims. Their number has even increased since 1992 for people aged 15 to 20. However, it clearly dropped for younger children, which is reassuring.

A second table counts only the pedestrian victims who died in a car accident.


Table of the top 5 categories of pedestrian fatalities


Interestingly, the categories are now completely different! Older people are more likely to die from an accident: they are involved in fewer accidents, but these result in more deaths. Sadly, we also find very young children in the top 5 of the deadliest categories.

After this tragic part of the analysis focused on victims, let us turn to recommendations that could make drivers more aware of the sensitive periods and circumstances in which they should be more careful.

During which period should drivers be more careful?


The trap of accidents per month

Under which road circumstances do most accidents take place? The answer is illustrated in the next bar plot.


The most frequent circumstance is skidding, which depends not only on the driver's actions but also on the weather. It therefore seems intuitive that the most dangerous months should be in winter, when roads are more likely to be covered with ice or snow. This intuition is tested in the next plot.


What's going on?! The number of accidents is higher in summer than in winter, which is counter-intuitive! Moreover, the 6 months with the most skidding accidents are also May to October, the exact opposite of what we initially assumed. During summer months, drivers might think that there is no risk of skidding and thus be less cautious. Conversely, winter months are known to be slippery, and drivers might be more vigilant, which reduces the number of skids. A friend of mine skidded last June and had to pay the municipality for two trees he destroyed, so he won’t deny it!
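Both plots discussed here boil down to two tallies over accident records: counts per circumstance, then counts of skidding accidents per month. A minimal sketch with hypothetical (month, circumstance) records; the circumstance labels other than skidding are made up:

```python
from collections import Counter

# Hypothetical accident records: (month, main circumstance)
records = [
    (1, "skidding"), (6, "skidding"), (6, "skidding"), (7, "rear-end collision"),
    (7, "skidding"), (8, "skidding"), (12, "turning"),
]

# Bar plot of circumstances: count every accident by its circumstance
by_circumstance = Counter(circumstance for _, circumstance in records)

# Monthly profile restricted to skidding accidents
skids_by_month = Counter(month for month, circumstance in records
                         if circumstance == "skidding")
peak_month, peak_count = skids_by_month.most_common(1)[0]
```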

To conclude, if you wish to help decrease the number of car accident victims, please be careful about the stability of your vehicle when driving in the rain, even in summer :)

The rule of weekday accidents

Accident periods can also be studied at the scale of the week. Let’s look at the number of car accidents per hour from Thursday to Sunday in the next plot.


Apart from Saturdays and Sundays, all weekdays follow a similar distribution, which is why only Thursdays and Fridays are shown. We identify two peaks, one in the morning and one between 17h and 19h: they correspond to the start and end of working hours, as expected. Throughout Friday afternoon, more accidents occur than on Thursday. This might be due to the excitement of workers finishing their hard week early, who then lack caution and BOOM, get struck by a bus! Life is unfair sometimes…

At weekends, when people don’t work, the patterns are clearly different: the two peaks corresponding to working hours disappear, and there are fewer accidents overall. However, we see more of them in the early hours of the day, probably because of Friday and Saturday nights out. When comparing Thursday and Friday between 0h and 2h, Friday also shows higher accident rates: this is the Jeudredi (the Thursday night out, from the French jeudi and vendredi). Overall, Sunday appears to be the chillest day in terms of accidents.
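The hourly curves compared in the last two paragraphs are counts per (weekday, hour). A minimal sketch with hypothetical timestamps; note that with such a count, an accident at 01:20 on a Sunday is filed under Sunday even though it belongs to Saturday night, which is how the early-hour weekend accidents and the higher Friday rates between 0h and 2h show up:

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical accident timestamps (2 March 2017 was a Thursday)
timestamps = [
    datetime(2017, 3, 2, 7, 45),   # Thursday, morning commute
    datetime(2017, 3, 2, 17, 30),  # Thursday, evening commute
    datetime(2017, 3, 3, 1, 15),   # Friday 01:15, i.e. the end of Thursday night
    datetime(2017, 3, 3, 17, 10),  # Friday, evening commute
    datetime(2017, 3, 5, 1, 20),   # Sunday 01:20, i.e. the end of Saturday night
]

# Count accidents per (weekday, hour); weekday() is 0 for Monday
counts = Counter((WEEKDAYS[ts.weekday()], ts.hour) for ts in timestamps)

# 24-value hourly profile for one day, ready to plot
friday_profile = [counts[("Friday", hour)] for hour in range(24)]
```

Using a fixed list of day names rather than `strftime("%A")` keeps the labels independent of the system locale.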

It is commonly agreed that drivers have more accidents on Friday and Saturday nights because they consume substances that affect their behavior and awareness. The last part of our brief investigation therefore deals with substances consumed before car accidents.

Which substances should be the most monitored by the police?

Having mentioned Jeudredi and other night events above, the contribution of substances to accidents is a key question. Which substance is most commonly involved in car accidents? This is illustrated in the next plot.


Even though most accidents were not linked to any substance, the key substance leading to accidents in Switzerland is clearly alcohol. Indeed, in 2017, 411 accidents involving alcohol alone were counted, against only 71 for the second substance (drugs). Alcohol-related accidents have moreover been well tackled by the Swiss government and citizens, since they decreased by 58% over the last 25 years (without even normalizing for the growing population).

What about the influence of other substances on car accidents? To make it visible, alcohol has been removed from the next graph.


Interestingly, while alcohol-related accidents have decreased over the last 25 years, this trend is not observed for the other substances! The most striking case concerns drugs, which led to only 40 accidents in 1992 and more than 70 in 2017. Accidents involving medicaments also increased, from 10 to 41 in 2015, before decreasing over the last 2 years. It therefore looks as if the government’s efforts have been directed towards alcohol monitoring, which is understandable given that alcohol has a much wider impact on the number of car accidents than drugs and medicaments.
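To put the opposite trends on a common scale, the counts quoted above can be turned into relative changes with respect to 1992. A minimal sketch using only those figures:

```python
def relative_change(before, after):
    """Relative change of a count, as a fraction of the earlier value."""
    if before == 0:
        raise ValueError("relative change undefined for a zero baseline")
    return (after - before) / before

# Counts quoted in the text above
drugs = relative_change(40, 71)        # 1992 -> 2017
medicaments = relative_change(10, 41)  # 1992 -> 2015
```

This gives about +78% for drugs and +310% for medicaments (a 4.1-fold count), to be set against the 58% decrease for alcohol.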

ML Study

Study of Car Accidents per Canton based on ML


Methodology

Now that a contextual investigation of car accident factors has been provided for Switzerland, the second part builds a machine-learning model of car accidents at canton scale from various features. We first gathered several road- and accident-related datasets available for each canton. These data constitute our input features, and the outcome is the number of car accidents per canton. A model was then trained for each canton on the years 1996 to 2014 and tested on 2015: the predicted value was compared to the actual one for that year, which defines an accuracy score.
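A minimal sketch of this chronological split and of the accuracy score, assuming (hypothetically) one row per canton and year; the feature columns are omitted and the accident counts are made up. Training on 1996 to 2014 leaves only 19 yearly rows per canton, which is why the amount of data per canton is small:

```python
def split_by_year(rows, first_train_year=1996, test_year=2015):
    """Chronological split: train on first_train_year..test_year-1, test on test_year."""
    train = [row for row in rows if first_train_year <= row["year"] < test_year]
    test = [row for row in rows if row["year"] == test_year]
    return train, test

def relative_error(predicted, actual):
    """Absolute prediction error as a fraction of the observed value."""
    return abs(predicted - actual) / actual

# Hypothetical rows for one canton: one row per year, features omitted
rows = [{"year": year, "accidents": 500 - 5 * (year - 1996)} for year in range(1996, 2016)]
train, test = split_by_year(rows)
```

Splitting by year rather than at random ensures that only past years are used to predict 2015.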

Since the number of data points per canton was low, we implemented several ML strategies, compared their accuracy over the 26 cantons and selected the approach with the best predictions for 2015. One of the best approaches turned out to be the regression tree, which gave more precise results than a simple baseline for more than 80% of the cantons. Since this baseline used only dates as features, the better performance of the tree shows that the other features are useful for predicting car accidents. Our mean error when predicting the number of accidents in 2015 was only 7.38%.
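The two summary figures above (the mean error over the cantons and the share of cantons where the tree beats the dates-only baseline) can be computed as follows. This is a minimal sketch assuming the error is the relative error of each canton's 2015 prediction; the per-canton values are hypothetical:

```python
from statistics import mean

def compare_models(errors):
    """errors maps canton -> (tree_error, baseline_error), as relative errors.

    Returns the mean tree error and the share of cantons where the tree beats the baseline.
    """
    tree_errors = [tree for tree, _ in errors.values()]
    wins = sum(1 for tree, base in errors.values() if tree < base)
    return mean(tree_errors), wins / len(errors)

# Hypothetical per-canton errors for 2015 (fractions, not the real results)
errors = {"AR": (0.05, 0.12), "BE": (0.08, 0.07), "TG": (0.06, 0.10)}
mean_err, win_share = compare_models(errors)
```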

In addition to its good performance, our selected method also provides visual decision trees. Their interesting aspect is that the most influential factor appears at level 0 (the root), and the influence of features decreases as the tree level increases. An example of a decision tree is provided below for Appenzell Ausserrhoden:

Decision tree of car accidents in Appenzell Aus.


We can see that the node at the top level has a higher ‘mse’ than the nodes below it. The ‘mse’ printed in a node measures the variance of the accident counts that reach this node: the root contains all the years, hence the largest value, and the feature used to split it is the one that reduces this variance the most. Therefore, the feature at the top of the tree is the one that contributes most to the outcome, that is to say the number of car accidents.
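To make the ‘mse’ reading concrete, here is a minimal sketch of how a CART-style regression tree scores a split. We assume, without confirmation from the text, that the trees were drawn with a library of this kind, such as scikit-learn's `DecisionTreeRegressor`, whose older versions label this criterion ‘mse’. The accident counts and the split below are made up:

```python
from statistics import pvariance

def node_mse(values):
    """Impurity shown in a node: mean squared deviation from the node mean."""
    return pvariance(values)

def impurity_decrease(parent, left, right):
    """Weighted drop in MSE obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    children = len(left) / n * node_mse(left) + len(right) / n * node_mse(right)
    return node_mse(parent) - children

# Hypothetical yearly accident counts, split on a hypothetical road-expenses threshold
parent = [120, 118, 95, 90, 88, 85]
left, right = parent[:2], parent[2:]
gain = impurity_decrease(parent, left, right)
```

The tree picks, at each node, the split with the largest such decrease; the feature chosen at the root is the one achieving it over the whole dataset.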

Which factors are the most influential?


Influential factors per canton

Since each canton has different influential factors at the top levels of its decision tree, it is interesting to see which factors have more influence in which canton. A tool is therefore provided to get the top-level factors of each canton:

Despite its very “appealing” look, this tool shows that some factors are significant for only a few cantons, whereas others, such as ‘Road Expenses’, are more consistently important.

Influential factors across all cantons

To check which factors are most frequently influential in Swiss cantons, the table below shows the most recurrent key factors across our 26 cantons.


From the ranking above, it seems obvious that ‘road expenses’ are the key factor in Switzerland. Indeed, they are the feature most often selected at the top level of the decision trees when considering the 26 cantons. The question remains whether ‘road expenses’ are positively or negatively correlated with the number of car accidents. In other words, should decision makers increase or decrease ‘road expenses’? The next section aims to answer this question with graphs showing the relation between car accidents and some key features of the model.
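This ranking amounts to counting how often each feature appears at the root of the cantonal trees. A minimal sketch with a hypothetical mapping from canton to root feature (only five cantons, made-up assignments):

```python
from collections import Counter

# Hypothetical root-split feature of each canton's tree (the real study covers 26 cantons)
root_feature = {
    "AR": "road expenses", "BE": "road expenses", "VD": "road expenses",
    "TG": "police expenses", "GR": "cylinder score",
}

# Rank features by how many cantons use them at the top level
ranking = Counter(root_feature.values()).most_common()
```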

Correlation

Correlation between influential factors and accidents in cantons


Correlation with cantonal road expenses

In the previous section, the tree model identified “road expenses” as a highly influential factor of car accidents. To plot the relation between both variables, the number of accidents has been normalized per 1000 inhabitants of the canton, and road expenses per 100 km of road in the canton. Because of this normalization by road length, the canton of Basel-Stadt (an urban canton with a short road network) was always an outlier, so it has been removed from the maps.
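A minimal sketch of the two normalizations, with made-up figures (the canton values and the expense unit are assumptions); it also drops Basel-Stadt as described above:

```python
def normalize(accidents, population, road_expenses, road_length_km):
    """Return accidents per 1,000 inhabitants and road expenses per 100 km of road."""
    accidents_per_1000 = 1_000 * accidents / population
    expenses_per_100km = 100 * road_expenses / road_length_km
    return accidents_per_1000, expenses_per_100km

# Made-up canton figures: (accidents, population, road expenses, road length in km)
cantons = {
    "FR": (450, 150_000, 60_000_000, 1_200),
    "BS": (600, 190_000, 90_000_000, 150),
}
# Basel-Stadt is dropped, as in the text, because its short road network makes it an outlier
normalized = {code: normalize(*figures) for code, figures in cantons.items() if code != "BS"}
```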

Below, we can see the map of road expenses per canton, as well as the plot of car accidents versus road expenses for the cantons whose correlation is significant at the 95% confidence level.

Map of expenses in roads
Plot of victims versus road expenses


A clear negative correlation between road expenses and car accidents appears. One could argue that roads become safer with more spending on renovation; however, we are only showing a correlation here, not a causal relation. The number of accidents might have decreased over time for reasons other than an increase in road expenses.
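The text does not say which test defines the 95% confidence selection. A minimal sketch, assuming one yearly series per canton and a two-sided test of Pearson's r at the 5% level: the p-value below uses Fisher's z-transform, a normal approximation, whereas an exact test would use Student's t with n-2 degrees of freedom (as `scipy.stats.pearsonr` does). The series are made up:

```python
import math
from statistics import NormalDist, fmean

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_test(x, y, alpha=0.05):
    """Return r, an approximate two-sided p-value and whether r is significant."""
    r = pearson(x, y)
    n = len(x)
    # Fisher's z-transform; clamp r so that atanh stays finite for |r| = 1
    r_clamped = max(min(r, 1 - 1e-12), -1 + 1e-12)
    z = math.atanh(r_clamped) * math.sqrt(n - 3)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return r, p, p < alpha

# Made-up yearly series for one canton
expenses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
accidents = [10, 9.5, 8.7, 8.1, 7.2, 6.8, 5.9, 5.1, 4.4, 3.6]
r, p, significant = correlation_test(expenses, accidents)
```

With only about 20 yearly points per canton, the normal approximation is rough; the t-based test is preferable when SciPy is available.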


Correlation with cantonal police expenses

In 6 cantons, such as TG and OW, police surveillance appears in the top 3 levels of the decision tree. It is therefore worth checking the relation between accidents and this variable. As before, a map shows police expenses normalized per 100 km of road in each canton, and a plot shows their relation with car accidents for the cantons whose correlation is significant at the 95% confidence level.

Map of expenses in police
Plot of victims versus police expenses


Once again, expenses seem to be justified: the higher the expenses in police surveillance, the lower the number of accidents. However, this only holds for the cantons with a high confidence level, whereas 5 other cantons, such as Neuchatel and Vaud, show a positive correlation (>0.6 for both!). As a consequence, no clear conclusion can be drawn for Switzerland as a whole, even though police expenses are a good predictor of accidents within each canton.


Correlation with vehicle cylinder size

Depending on the canton, the average cylinder size of vehicles can vary a lot. The data split cylinder sizes into 5 categories, ranging from ‘below 1399 cm3’ to ‘above 2500 cm3’.

To get a general idea of the kind of cars used in each canton, we therefore built a cylinder score: cars with bigger cylinders get more weight, and the result is normalized to a final score between 0 and 1.
In other words, this score describes how “big cylinders” friendly a canton is.
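The exact formula of the score is not reproduced in this text. As a purely hypothetical illustration of a score matching the description (more weight to bigger cylinders, final value between 0 and 1), one can give the five classes linear weights 0 to 4 and divide the weighted average by 4; these weights are our assumption, not the original formula:

```python
def cylinder_score(counts):
    """Weighted share of big-engine vehicles, scaled to [0, 1].

    counts: vehicle counts for the five cylinder classes, ordered from the
    smallest ('below 1399 cm3') to the largest ('above 2500 cm3').
    The linear weights 0..4 are an assumption, not the original formula.
    """
    if len(counts) != 5:
        raise ValueError("expected five cylinder classes")
    total = sum(counts)
    if total == 0:
        raise ValueError("no vehicles to score")
    weighted = sum(weight * n for weight, n in enumerate(counts))
    return weighted / (4 * total)
```

A canton with only small engines scores 0, one with only the largest class scores 1, and an even spread over the five classes scores 0.5.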

Along with the map of cylinder scores per canton below, we can look at the correlation between this score and the number of victims:

Map of cylinder score
Plot of victims versus cylinder scores


For Grisons and Schwytz (the most confident cases), there seems to be a weak negative correlation.

Again, the correlation is only significant for a few cantons, namely 11. Worse, some of them exhibit positive correlations while others show negative ones. It is thus impossible to draw any conclusion from this data.

However, it is interesting to look at the shape of the scatter plots of both cantons: if you go from the highest to the lowest point (decreasing number of victims, the observed trend in time), you will first move right then left. This corresponds to the evolution in time of the “cylinder score” of the two cantons.

Correlation with gear boxes

We can also look at gearboxes: does the number of automatic vehicles influence the number of accidents? Or, to make it comparable across cantons, does a higher share of automatic vehicles on the road influence the number of accidents?


The results are more interesting than before: 16 cantons (61.5%) exhibit a clear and significant negative correlation between the number of accidents and the percentage of automatic vehicles with a confidence of 95%.



Conclusion


Throughout this data story, we hope you learned many interesting and unexpected facts about car accidents in Switzerland. Did you know how much more dangerous men are than women when driving? Could you guess that skidding, the most frequent accident circumstance, happens most often in summer? What about the number of drug-related accidents, which has been increasing over the last 25 years? Next time you drive, don’t forget to keep an eye on young pedestrians, who are the most exposed victims of car accidents! To policy makers: please keep spending money on road infrastructure, since road expenses are the top factor in our model and are correlated with fewer car accidents! Finally, if you read this data story on a Friday afternoon, please stay focused on the road when driving back home: we know your plans for the weekend are exciting, but the lives of young children, and your own, are worth even more…