
Visually exploring historic airline accidents, applying frequentist interpretations and validating changing trends with PyMC3.
On the 7th of August this year, an Air India Express flight on a repatriation mission from Dubai (United Arab Emirates) to Kozhikode (Kerala, India) skidded off the runway under heavy rainfall and fell into a valley [1].
The ensuing 35-foot drop broke the aircraft in two. The flight was ferrying a total of 190 souls, and 18 of them lost their lives as an immediate consequence of the accident. The remaining 172 were injured to varying degrees and underwent treatment [2].
The official probe into this horrifying accident will naturally be a fact-finding mission that tries to make sense of what went wrong and who is to blame.
Following this story, I started Googling recent aircraft accidents to understand the context and to look at these events from a global perspective.
This search led me to numerous webpages with lists of plane crashes, tables of statistics, accident investigation reports and sound bites from different aviation industry experts following such catastrophic accidents.
The bottom line of this search was that we are in the midst of an increasingly safe flying environment. Safety measures are more stringent than ever before, making flying a relatively safe means of transport.
But I wanted to play with these numbers myself to validate this conclusion.
The motivating question for this exercise was —
Has flying become relatively safer in recent times than in the past?
I looked at publicly available air crash data from the Aviation Safety Network and the National Transportation Safety Board (NTSB) and created a dataset that suited the needs of this exercise.
The entire exercise and dataset can be found in my repository.
Switching over to the first person plural... now.
To answer the motivating question, we divide the task into two parts —
1. Exploratory Data Analysis (EDA) in Python.
2. Probabilistic programming (PyMC3) in Python.
In this part, we look at past aircraft crashes, which form our time series for analysis. A few things to remember:
1. The Federal Aviation Administration differentiates an aircraft accident from an aircraft incident. The difference is essentially whether fatalities occurred or not.
2. Our focus in this exercise is restricted to the occurrence of the accident, rather than its cause.
3. We look at commercial aircraft accidents from 1975 till 2019.
Fig. 1— Number of accidents and fatalities per year from 1975 till 2019.
Looking at the historic time series, we can visually sense a decline in the number of accidents per year from 1978 onwards. There appears to be a minor rise in the number of accidents between 1987 and 1989, after which the numbers steadily decrease. The lowest number of accidents was observed in 2017, the safest year on record in aviation history. After 2017, the numbers seem to increase marginally.
Another clear trend is the drop in the number of fatalities over time. The 1970s and 80s were dangerous times to fly, with aircraft accidents causing, on average, nearly 2200 fatalities a year. Over time, however, this number has dropped dramatically.
When this declining trend is viewed in the context of the rising number of air travellers (green shaded region in Fig. 1), we get a better picture of airline safety.
Fig. 2 — Fatalities in the context of million passengers travelling every year
Viewed relative to the rising number of air travellers, the fatality figures show a clear decline. The number of fatalities per million passengers travelling by air each year has dropped drastically, from 5 in a million to less than 1 in a million.
(Disclaimer: Bayesians, keep that pinch of salt ready)
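The normalisation behind Fig. 2 is a single division per year. Below is a minimal pandas sketch. It assumes a hypothetical DataFrame with one row per year and columns `year`, `fatalities` and `passengers`, where `passengers` is a raw count. The column names and toy numbers are illustrative only and do not come from the author's dataset.

```python
import pandas as pd

def fatalities_per_million(df: pd.DataFrame) -> pd.Series:
    """Fatalities per million passengers carried, indexed by year.

    Assumes hypothetical columns 'year', 'fatalities' and 'passengers'
    (total passengers carried that year, as a raw count).
    """
    per_year = df.set_index("year")
    return per_year["fatalities"] / (per_year["passengers"] / 1_000_000)

# Toy numbers for illustration only, NOT the real dataset.
toy = pd.DataFrame({
    "year": [1980, 2000, 2019],
    "fatalities": [2000, 1000, 500],
    "passengers": [400_000_000, 1_600_000_000, 4_000_000_000],
})
rates = fatalities_per_million(toy)  # 5.0, 0.625, 0.125 per million
```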
Fig. 3 — Variation in the number of fatalities per aircraft accident
Another measure of aircraft safety is the number of fatalities per accident. A number of exogenous (external) factors may influence the number of fatalities in a given accident, such as the weather, the nature of the crash or the time of day. We still treat this measure as a rough indicator of aircraft safety.
There seems to be a slight downward trend beyond 1995, but it is not immediately obvious from the graph. We also see that 1985, 1996, 2014 and 2018 were particularly deadly years involving major crashes, as the large average number of fatalities per crash in those years indicates.
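The deadly years above were picked out visually. One rough way to do the same thing in code is to compute fatalities per accident and flag the years that sit well above the average. This sketch assumes hypothetical `year`, `accidents` and `fatalities` columns. The cut-off of one standard deviation above the mean is an arbitrary assumption and is not the author's rule.

```python
import pandas as pd

def fatalities_per_accident(df: pd.DataFrame) -> pd.Series:
    """Average fatalities per accident for each year (hypothetical column names)."""
    per_year = df.set_index("year")
    return per_year["fatalities"] / per_year["accidents"]

def unusually_deadly_years(ratio: pd.Series, k: float = 1.0) -> list:
    """Years whose ratio exceeds mean + k * standard deviation (k is an arbitrary choice)."""
    threshold = ratio.mean() + k * ratio.std()
    return ratio[ratio > threshold].index.tolist()

# Made-up numbers for illustration only.
toy = pd.DataFrame({
    "year": [2010, 2011, 2012, 2013, 2014],
    "accidents": [10, 10, 10, 10, 10],
    "fatalities": [100, 120, 90, 110, 400],
})
ratio = fatalities_per_accident(toy)
deadly = unusually_deadly_years(ratio)  # only 2014 clears the threshold here
```

A single extreme crash can dominate a year's ratio. This is why the measure is only a rough indicator.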
Rate of change
Fig. 4 — Yearly percentage change in number of accidents
A final piece of evidence, before we begin the probabilistic testing of the motivating question, is the yearly rate of change of accidents.
If we are truly living in safer times, we would expect the graph to show a series of successively increasing green bars. Such a window was observed only in 1979–80, 1980–84, 1999–00, 2006–07 and 2013–14. Extended periods of relatively safe travel can be seen in 1980–84 and 1996–2000.
If we look at the rate of change beyond 1995, we see that accidents have largely declined year on year (very few red bars and many more green bars).
It appears that some external factor (such as changes in aircraft design, civil aviation regulations or better ATC technology) may have caused this decline beyond 1995.
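The bars in Fig. 4 are year-on-year percentage changes. The sketch below assumes a hypothetical pandas Series of accident counts indexed by year. It computes the change with `Series.pct_change` and lists runs of consecutive declines, which correspond to the green bars. The minimum run length of two is an assumption, and the toy counts are made up.

```python
import pandas as pd

def yearly_change(accidents: pd.Series) -> pd.Series:
    """Year-on-year percentage change; negative values are declines (green bars)."""
    return accidents.pct_change() * 100

def decline_runs(change: pd.Series, min_length: int = 2) -> list:
    """(first_year, last_year) of every run of at least `min_length` consecutive declines."""
    runs, current = [], []
    for year, value in change.dropna().items():
        if value < 0:
            current.append(year)
        else:
            if len(current) >= min_length:
                runs.append((current[0], current[-1]))
            current = []
    if len(current) >= min_length:  # a run may continue to the last year
        runs.append((current[0], current[-1]))
    return runs

# Made-up accident counts indexed by year, for illustration only.
toy = pd.Series([20, 18, 15, 16, 14, 12, 13], index=range(2000, 2007))
change = yearly_change(toy)
runs = decline_runs(change)
share_declining = (change.dropna() < 0).mean()  # fraction of green bars
```

On the real series, the same share could be computed separately for slices such as `change.loc[:1995]` and `change.loc[1996:]` to compare the two periods.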
Probabilistic Programming
From our data exploration, we saw a continued decline in the number of aircraft accidents every decade, and we validated this trend with a couple of statistical measures.
We also saw that 1995 was, presumably, a turning point for the aviation industry. How can we validate this assumption?
One interesting way to do so, given the limited data and the non-repeatability of events (let us assume that we can’t simulate these accidents a million times), is to use probabilistic techniques such as Markov Chain Monte Carlo (MCMC).
One way of implementing these techniques is the PyMC3 library in Python.
PyMC3 is a Python library that helps us carry out probabilistic programming. This does not mean that the programming itself is probabilistic (it is still very much a deterministic process!). Rather, we build models out of probability distributions and use Bayesian methods to fit them.
This technique is built on a Bayesian outlook on the world. We start with a belief (called the prior probability) about a certain process or parameter. We then update this belief in light of the observed data to obtain the posterior probability, which in practice we approximate with several thousand random samples. This is the opposite of the frequentist way of looking at things (as we did in the EDA).
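Updating from prior to posterior has a closed form in one convenient special case. A Gamma prior on a Poisson rate yields a Gamma posterior; this is Gamma-Poisson conjugacy, a standard textbook result. An exponential prior with parameter alpha is the same as a Gamma prior with shape 1 and rate alpha. This minimal sketch therefore shows the exact update that a sampler would otherwise approximate. The prior mean of 20 and the yearly counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GammaBelief:
    """Belief about a Poisson rate: Gamma(shape, rate), with mean shape / rate."""
    shape: float
    rate: float

    @property
    def mean(self) -> float:
        return self.shape / self.rate

    def update(self, counts) -> "GammaBelief":
        """Conjugate update after observing Poisson counts (one count per year)."""
        return GammaBelief(self.shape + sum(counts), self.rate + len(counts))

# Exponential(alpha) prior == Gamma(shape=1, rate=alpha); here alpha = 1/20 (hypothetical).
prior = GammaBelief(shape=1.0, rate=1.0 / 20)
posterior = prior.update([12, 15, 9, 14])  # made-up yearly accident counts
```

The posterior mean, about 12.6, has moved from the prior's 20 towards the observed average of 12.5. More years of data would pull it closer still.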
The second foundation for this process is the family of Markov Chain Monte Carlo (MCMC) random sampling methods. These algorithms let us draw samples from the posterior distribution even when it has no closed form. We can then use those samples to test our prior beliefs against the data and update them.
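Here is a minimal random-walk Metropolis sampler that makes the MCMC idea concrete. It is one of the simplest MCMC algorithms; PyMC3 uses more sophisticated samplers such as NUTS for continuous variables. The sampler targets a single Poisson rate with an exponential prior. It draws from the posterior, not the prior. Because this particular posterior is known exactly (Gamma with shape 1 + sum of counts and rate alpha + number of years), the sample mean can be checked against it. The step size, chain length, burn-in and counts are all assumptions.

```python
import numpy as np

def log_posterior(lam: float, total: int, n_years: int, alpha: float) -> float:
    """Unnormalised log posterior of a Poisson rate with an Exponential(alpha) prior.

    Terms that do not depend on lam (such as log x!) are dropped.
    """
    if lam <= 0:
        return -np.inf
    return total * np.log(lam) - n_years * lam - alpha * lam

def metropolis(counts, alpha: float, n_samples: int = 20_000,
               step: float = 1.0, seed: int = 0) -> np.ndarray:
    """Random-walk Metropolis: propose lam + N(0, step^2), accept with prob min(1, ratio)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    total, n_years = counts.sum(), counts.size
    lam = counts.mean()  # start the chain at the sample mean
    current = log_posterior(lam, total, n_years, alpha)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        proposal = lam + step * rng.normal()
        candidate = log_posterior(proposal, total, n_years, alpha)
        if np.log(rng.uniform()) < candidate - current:
            lam, current = proposal, candidate
        samples[i] = lam
    return samples

counts = [12, 15, 9, 14]                   # made-up yearly accident counts
alpha = 1.0 / 20                           # hypothetical prior parameter
draws = metropolis(counts, alpha)[2_000:]  # discard the first draws as burn-in
exact_mean = (1 + sum(counts)) / (alpha + len(counts))
```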
The PyMC3 documentation and the hands-on approach of various tutorials are excellent for a high-level understanding of the library and the techniques. The book Bayesian Methods for Hackers, by Cameron Davidson-Pilon, is really helpful if you are thinking of getting your hands dirty.
Alright, so let’s test
We begin by establishing our prior beliefs about the accidents —
What kind of distribution do aircraft accidents follow?
Here we assume that the number of accidents per year follows a Poisson distribution.
$$P(x \mid \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$
x: number of accidents in a year
lambda: rate of occurrence of the accidents (expected number of accidents per year)
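The probability mass function is short enough to write by hand, which also makes the factorial in the denominator explicit. Below is a minimal sketch using only the standard library. For large counts, a log-space version or `scipy.stats.poisson` avoids floating-point overflow. The rate of 20 accidents a year is a made-up value.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """P(x | lambda) = lambda**x * exp(-lambda) / x!  (the factorial is of x, not lambda)."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam = 20.0  # made-up rate: 20 accidents a year
probabilities = [poisson_pmf(x, lam) for x in range(101)]
total_probability = sum(probabilities)  # essentially 1: almost no mass lies above 100
```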
What would be the rate of occurrence?
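Given our initial assumption, we further presume that this rate of occurrence follows an exponential distribution whose parameter is the reciprocal of the average number of accidents per year over the whole dataset. The mean of an exponential distribution is the reciprocal of its parameter, so this choice makes the expected rate under the prior equal to the observed average.
In other words,
$$\alpha = \frac{1}{\text{mean number of accidents per year, 1975 to 2019}}, \qquad \lambda \sim \text{Exponential}(\alpha), \qquad E[\lambda] = \frac{1}{\alpha}$$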
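The reciprocal of the mean is best read as the parameter of an exponential prior on lambda. This is the convention in the Bayesian Methods for Hackers switchpoint example, and it is assumed here. Because an Exponential(alpha) distribution has mean 1/alpha, the prior's expected rate equals the observed average. The quick sketch below checks this by sampling, using made-up counts.

```python
import numpy as np

def exponential_prior_parameter(counts) -> float:
    """alpha = 1 / (mean yearly count), so that E[lambda] = 1 / alpha = mean yearly count."""
    return 1.0 / float(np.mean(counts))

counts = np.array([12, 15, 9, 14])           # made-up yearly accident counts
alpha = exponential_prior_parameter(counts)  # 1 / 12.5 = 0.08
rng = np.random.default_rng(42)
# NumPy parameterises the exponential by its scale (1 / rate), hence 1 / alpha.
prior_draws = rng.exponential(scale=1.0 / alpha, size=100_000)
```

The prior is deliberately wide. Its standard deviation equals its mean, so the data, rather than the prior, end up determining the rate.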
What would be the initial turning point?
The turning point is the year before which the rate of occurrence was high and after which it became low. We initially assume that every year from 1975 to 2019 has an equal probability of being the turning point, i.e. the turning point is drawn from a discrete uniform distribution.
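The priors can be combined into a single model. This summary assumes two rates, one before and one after the turning point, that share the same exponential prior, as in the switchpoint example from Bayesian Methods for Hackers. In the notation below, $x_t$ is the number of accidents in year $t$ and $\bar{x}$ is the average yearly count. The notation is an illustrative summary rather than the author's. There are 45 years from 1975 to 2019 inclusive, hence the probability of 1/45 for each candidate year.

$$
\begin{aligned}
\alpha &= 1/\bar{x}, \\
\lambda_1,\ \lambda_2 &\sim \text{Exponential}(\alpha), \\
\tau &\sim \text{DiscreteUniform}(1975, 2019), \qquad P(\tau = t) = \tfrac{1}{45}, \\
x_t \mid \tau, \lambda_1, \lambda_2 &\sim
\begin{cases}
\text{Poisson}(\lambda_1), & t < \tau, \\
\text{Poisson}(\lambda_2), & t \ge \tau.
\end{cases}
\end{aligned}
$$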
With this set of prior beliefs, we instantiate the model:

