Using PyMC3 to Understand Aircraft Accident Trends

Visually exploring historic airline accidents, applying frequentist interpretations and validating changing trends with PyMC3.
Preface
On the 7th of August this year, an Air India Express flight on a repatriation mission from Dubai (United Arab Emirates) to Kozhikode (Kerala, India) skidded off the runway under heavy rainfall and fell into a valley [1].
The ensuing 35-foot drop broke the aircraft in two. The flight was ferrying a total of 190 souls, 18 of whom lost their lives as an immediate consequence of the accident. The remaining 172 were injured to varying degrees and underwent treatment [2].
The official probe into this horrifying accident will naturally be a fact-finding mission, trying to make sense of what went wrong and who is to blame.
Motivation
Following this story, I started Googling about recent aircraft accidents, to understand the context and to look at these events from a global perspective.
This search led me to numerous webpages containing … of plane crashes, tables of …, accident investigations and sound bites from various aviation industry experts in the wake of such catastrophic accidents.
The bottom line of this search was that we are flying in an increasingly safe environment. Safety measures are more stringent than ever before, making flying a comparatively safe means of transport.
But I wanted to play with these numbers myself to validate this conclusion.
The motivating question for this exercise was:
Has flying become relatively safer in recent times than in the past?
Data Source
I looked at publicly available air crash data on … and from the National Transportation Safety Board (NTSB), and created a dataset suited to the needs of this exercise.
The entire exercise and dataset can be found in my repository.
Switching over to the first-person plural... Now.
Tools for the Job
To answer the motivating question, we divide the task into two parts:
1. Exploratory Data Analysis (EDA) in Python.
2. Probabilistic programming (PyMC3) in Python.
Exploratory Data Analysis
In this part, we look at past aircraft crashes, which form the time series for our analysis. A few things to remember:
1. The … differentiates an aircraft accident from an aircraft incident; the difference is essentially whether or not fatalities occurred.
2. Our focus in this exercise is restricted to the occurrence of the accident, rather than its cause.
3. We look at commercial aircraft accidents from 1975 to 2019.
Trend of accidents and fatalities
Fig. 1: Number of accidents and fatalities per year, 1975 to 2019.
Looking at the historical time series, we can visually sense a decline in the number of accidents per year from 1978 onwards. There appears to be a minor rise in the number of accidents between 1987 and 1989, after which the numbers steadily decrease. The lowest number of accidents was observed in 2017, which is regarded as the safest year in aviation history. After 2017, the numbers seem to increase marginally.
Another clearly observable trend is the drop in the number of fatalities over time. The 1970s and 80s were dangerous times to fly, with aircraft accidents causing, on average, nearly 2,200 fatalities a year. Over time, however, this number has dropped dramatically.
When this declining trend is viewed in the context of the rising number of air passengers (green shaded region in Fig. 1), we get a better picture of airline safety.
Fatalities per million passengers
Fig. 2: Fatalities per million passengers travelling each year
When the declining number of fatalities is viewed against the rising number of air travellers, the downward trend becomes clear. The number of fatalities per million passengers travelling by air each year has dropped drastically, from 5 in a million to fewer than 1 in a million.
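The ratio behind Fig. 2 is simple arithmetic: yearly fatalities divided by yearly passengers, scaled to one million. A minimal sketch, assuming the yearly totals are held in plain dictionaries keyed by year (a hypothetical input format, not necessarily how the original notebook stores them):

```python
def fatalities_per_million(fatalities, passengers):
    """Fatalities per million passengers, year by year.

    fatalities, passengers: dicts mapping year -> yearly total (hypothetical format).
    Only years present in both dicts with a non-zero passenger count are returned.
    """
    return {
        # Multiply before dividing to keep round numbers exact where possible
        year: fatalities[year] * 1_000_000 / passengers[year]
        for year in sorted(fatalities.keys() & passengers.keys())
        if passengers[year] > 0
    }
```

For instance, a hypothetical year with 2,000 fatalities and 400 million passengers gives 5.0, i.e. 5 deaths per million passengers.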
(Disclaimer: Bayesians, keep that pinch of salt ready)
Fatalities per accident
Fig. 3: Variation in the number of fatalities per aircraft accident
Another measure of aircraft safety is the number of fatalities per accident. Although a number of exogenous (external) factors, such as the weather, the nature of the crash and the time of day, may influence the number of fatalities in a given accident, we still treat this measure as a rough estimate of aircraft safety.
There seems to be a slight downward trend after 1995, but it is not immediately obvious from the graph. We also see that 1985, 1996, 2014 and 2018 were particularly deadly years involving major crashes, since the average number of fatalities per crash is large in those years.
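Fatalities per accident (Fig. 3) is the same kind of ratio, and picking out the years where it spikes is just a sort. A sketch under the same hypothetical dictionary-by-year input; `fatalities_per_accident` and `deadliest_years` are illustrative helper names:

```python
def fatalities_per_accident(fatalities, accidents):
    """Average fatalities per accident for each year (years with no accidents are skipped)."""
    return {
        year: fatalities[year] / accidents[year]
        for year in sorted(fatalities.keys() & accidents.keys())
        if accidents[year] > 0
    }


def deadliest_years(per_accident, top_n=4):
    """Years with the highest average fatalities per accident, highest first."""
    return sorted(per_accident, key=per_accident.get, reverse=True)[:top_n]
```

Skipping zero-accident years avoids a division by zero; such years carry no information about how deadly an accident is.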
Rate of change
Fig. 4: Yearly percentage change in number of accidents
A final piece of evidence, before we begin the probabilistic testing of the motivating question, is the yearly rate of change of accidents.
If we are truly living in safe times, we would expect the graph to show a run of successive green bars (consecutive year-on-year declines). Such a window was observed only in 1979–80, 1980–84, 1999–00, 2006–07 and 2013–14. Extended periods of relatively safe travel can be seen in 1980–84 and 1996–2000.
If we look at the rate of change after 1995, we see that year-on-year accidents have largely declined (very few red bars and many more green bars).
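The bars in Fig. 4 can be reproduced as a year-on-year percentage change. This sketch assumes, in line with the discussion above, that a negative change (fewer accidents than the year before) corresponds to a green bar and a positive one to a red bar; the function names and the dictionary input are hypothetical:

```python
def yearly_pct_change(accidents):
    """Year-on-year percentage change in the number of accidents.

    accidents: dict mapping year -> accident count. Each year is compared with the
    previous year present in the data; a zero previous count is skipped because
    the percentage change is undefined there.
    """
    years = sorted(accidents)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        if accidents[prev] == 0:
            continue
        changes[curr] = (accidents[curr] - accidents[prev]) * 100 / accidents[prev]
    return changes


def declining_years(changes):
    """Years whose accident count fell compared with the previous year (the 'green' years)."""
    return [year for year, pct in sorted(changes.items()) if pct < 0]
```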
It appears that some external factor (such as changes in aircraft design, civil aviation regulations or better ATC technology) may have caused this decline after 1995.
Probabilistic Programming
From our data exploration, we saw a continued decline in the number of aircraft accidents decade by decade, and we supported this trend with a couple of statistical measures.
We also saw that 1995 was, presumably, a turning point for the aviation industry. How can we validate this assumption?
Given the limited data and the non-repeatability of events (let us assume that we can't simulate these accidents a million times), one interesting way to do so is to use probabilistic techniques such as Markov Chain Monte Carlo (MCMC).
And one of the ways of implementing these techniques is by means of the PyMC3 library in Python.
A quick primer
PyMC3 is a Python library that helps us carry out probabilistic programming. This does not mean that the programming itself is probabilistic (it is still very much a deterministic process!); instead, we employ probability distributions and Bayesian methods.
This technique is built on top of a Bayesian outlook on the world. We start with a belief (called the prior probability) about a certain process or parameter, and we update this belief in light of the observed data (giving the posterior probability), approximating the result with several thousand random samples. This approach is the opposite of the frequentist way of looking at things (as we did in the EDA).
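To make the prior-to-posterior update concrete without any sampling, consider a special case where it has a closed form: a Gamma prior on a Poisson rate stays Gamma after seeing the data (conjugacy). This is not the MCMC approach used in this article, only an illustration; it relies on the fact that an Exponential(α) prior is a Gamma prior with shape 1 and rate α. The helper names are hypothetical:

```python
def gamma_poisson_update(shape, rate, counts):
    """Conjugate update for a Poisson rate with a Gamma(shape, rate) prior.

    After observing counts x_1..x_n, the posterior is Gamma(shape + sum(x), rate + n).
    """
    return shape + sum(counts), rate + len(counts)


def gamma_mean(shape, rate):
    """Mean of a Gamma(shape, rate) distribution."""
    return shape / rate
```

For example, an Exponential(0.5) prior (Gamma(1, 0.5), mean 2.0) updated with two hypothetical yearly counts, 2 and 4, becomes Gamma(7, 2.5) with mean 2.8: the belief shifts toward the data.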
The second foundation for this process is the family of Markov Chain Monte Carlo (MCMC) random sampling methods. These algorithms let us draw random samples from the posterior distribution, that is, from our prior beliefs updated by the observed data, so that we can test and refine those beliefs even when the posterior has no closed form.
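The simplest member of the MCMC family is random-walk Metropolis: propose a small random move, accept it with probability min(1, posterior ratio), and repeat. The sketch below samples a single Poisson rate under an Exponential(α) prior. It illustrates the idea only and is not what PyMC3 runs (PyMC3 defaults to the NUTS sampler for continuous parameters); the step size, burn-in length and starting point are arbitrary assumptions:

```python
import math
import random


def log_posterior(lam, counts, alpha):
    """Unnormalised log posterior of a Poisson rate under an Exponential(alpha) prior.

    Constant terms (log alpha and -log x_i!) are dropped; they cancel in the
    Metropolis acceptance ratio.
    """
    if lam <= 0:
        return -math.inf  # the rate must be positive
    return sum(counts) * math.log(lam) - len(counts) * lam - alpha * lam


def metropolis(counts, alpha, n_samples=5000, burn_in=1000, step=2.0, start=1.0, seed=0):
    """Random-walk Metropolis sampler for the Poisson rate."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, counts, alpha)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = current + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        proposal_lp = log_posterior(proposal, counts, alpha)
        # Accept with probability min(1, posterior ratio); exp(-inf) == 0 rejects lam <= 0
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:  # discard the warm-up draws
            samples.append(current)
    return samples
```

Because this particular posterior happens to be Gamma(1 + Σx, α + n), the sample mean can be checked against (1 + Σx)/(α + n).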
The PyMC3 documentation and the hands-on approach by … are excellent for a high-level understanding of the library and the techniques. The book Bayesian Methods for Hackers, by Cameron Davidson-Pilon, is really helpful if you are thinking of getting your hands dirty.
Alright, so let's test
We begin by establishing our prior beliefs about the accidents:
What kind of distribution do aircraft accidents follow?
Here we assume that the accidents follow a Poisson distribution.
$$P(x \mid \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$
where $x$ is the number of accidents in a year and $\lambda$ is the rate of occurrence of accidents.
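The Poisson probability mass function is easy to evaluate directly. This sketch works in log space, using `math.lgamma(x + 1)` for log(x!), so that large yearly counts do not overflow a float; it assumes λ > 0 and an integer x ≥ 0:

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with rate lam > 0 and integer x >= 0.

    Computed as exp(x*log(lam) - lam - log(x!)), with lgamma(x + 1) == log(x!).
    """
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))
```

Summing `poisson_pmf(k, lam)` over k = 0, 1, 2, ... approaches 1, a quick sanity check on any implementation.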
What would be the rate of occurrence?
Given our initial assumption, we further presume that our prior belief about this rate of occurrence can be parameterized by the reciprocal of the average number of accidents per year over the whole dataset.
In other words,
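$$\alpha = \frac{1}{\bar{x}}$$
where $\bar{x}$ is the mean number of accidents per year from 1975 to 2019 and $\alpha$ is the parameter of the prior on $\lambda$.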
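The reciprocal of the mean is the usual way (for example in Bayesian Methods for Hackers) to parameterize an Exponential prior on a Poisson rate: an Exponential(α) distribution has mean 1/α, so choosing α = 1/x̄ makes the prior mean of λ equal to the observed average x̄ while still allowing much larger or smaller rates. A sketch, assuming that Exponential prior; `exponential_prior_rate` and `sample_prior_rates` are hypothetical helper names:

```python
import random
from statistics import mean


def exponential_prior_rate(yearly_counts):
    """alpha = 1 / (mean yearly accident count)."""
    return 1.0 / mean(yearly_counts)


def sample_prior_rates(alpha, n=10_000, seed=0):
    """Draw candidate accident rates from an Exponential(alpha) prior."""
    rng = random.Random(seed)
    # random.expovariate takes the rate alpha (not the mean) as its argument
    return [rng.expovariate(alpha) for _ in range(n)]
```

The average of many prior draws should sit close to 1/α, i.e. close to the observed mean yearly count.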
What would be the initial turning point?
The turning point is the year before which the rate of occurrence was high and after which it became low. We initially assume that every year from 1975 to 2019 has an equal probability (drawn from a discrete uniform distribution) of being the turning point.
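The turning-point prior and the resulting piecewise rate can be written out directly. A sketch, assuming the convention that the turning year itself belongs to the "after" period; this convention and both helper names are illustrative choices, not details taken from the original model:

```python
def uniform_switchpoint_prior(first_year, last_year):
    """Discrete uniform prior: every year in [first_year, last_year] is equally likely."""
    years = range(first_year, last_year + 1)
    return {year: 1.0 / len(years) for year in years}


def rate_for_year(year, tau, rate_before, rate_after):
    """Accident rate for a given year: rate_before before tau, rate_after from tau onward."""
    return rate_before if year < tau else rate_after
```

With 1975 to 2019 inclusive there are 45 candidate years, each with prior probability 1/45.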
With this set of prior beliefs, we instantiate the model.
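As a sanity check on whatever the sampler reports, the posterior over the turning year can also be computed exactly for this kind of model, because both rates can be integrated out analytically. The sketch below is not the article's PyMC3 model and swaps MCMC for exact enumeration; it assumes independent Exponential(α) priors (Gamma with shape 1 and rate α) on the before and after rates, a discrete uniform prior on the turning year, and yearly counts supplied as a list aligned with a sorted list of consecutive years (all hypothetical choices):

```python
import math


def segment_log_marginal(counts, a, b):
    """log of the integral of prod Poisson(x_i | lam) * Gamma(lam | a, b) over lam.

    The prod 1/x_i! factor is omitted: it is identical for every turning year,
    so it cancels when the posterior is normalised.
    """
    s, m = sum(counts), len(counts)
    return a * math.log(b) - math.lgamma(a) + math.lgamma(a + s) - (a + s) * math.log(b + m)


def switchpoint_posterior(years, counts, alpha):
    """Exact posterior over the turning year tau (years before tau use the 'before' rate)."""
    log_post = {}
    for i, tau in enumerate(years):
        before, after = counts[:i], counts[i:]
        # The uniform prior on tau is a constant, so it drops out after normalisation
        log_post[tau] = (segment_log_marginal(before, 1.0, alpha)
                         + segment_log_marginal(after, 1.0, alpha))
    peak = max(log_post.values())  # subtract the maximum for numerical stability
    weights = {tau: math.exp(lp - peak) for tau, lp in log_post.items()}
    total = sum(weights.values())
    return {tau: w / total for tau, w in weights.items()}
```

With yearly counts from the dataset and α = 1/x̄, the year with the highest posterior probability is the most plausible turning point; a sharply peaked posterior around 1995 would support the hunch from the EDA, while a flat one would not.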
