Covid-19: Part 2 — The Calculus of Death (A Data Science Perspective)
The Peril of Probability
Fiona Lowenstein shouldn’t have been there.
Fiona was a 26-year-old woman living in New York who worked out six times a week, did not smoke and had no prior autoimmune or respiratory conditions. She certainly had not expected to be hospitalized with Covid-19 when she went to bed on the night of Sunday 15th March. True, she had had a fever and a headache since Friday night. However, she was not unduly concerned and planned to stay at home and catch up with her work while she recovered.
In the middle of the night, though, Fiona woke up with chills, vomiting and gasping for breath. By Monday she could hardly eat, talk or walk, and had to be rushed to hospital, where she was given oxygen. After two days of treatment Fiona was released, and she is now recovering in isolation.
Fiona was fortunate. She lived to tell her tale to CNN and the New York Post, and to publish it in the Straits Times. Chloe Middleton was less lucky. She died from the virus at the age of 21 in High Wycombe, Buckinghamshire, UK. So did a 17-year-old in California and a 16-year-old in France.
But doesn’t this contradict what we have been told about the virus: that it kills the old but spares the young? What, then, killed Chloe, the two teenagers and other young Covid-19 patients?
The one-word answer is: probability.
Many of us have seen the table below, based on data for 72,314 Chinese cases as of 11th February 2020. These included confirmed, suspected and clinically diagnosed cases.
Table 1: Death Rates by Age among Covid-19 patients in China (11th February 2020)
Many indeed saw this table — or similar ones — but misread them. They thought that the data showed that young people are safe from Covid-19.
This error is due to lazy thinking and misunderstanding of probability theory.
As the Nobel laureate Daniel Kahneman has pointed out in his classic “Thinking, Fast and Slow”, most human decisions are based on reflex rather than on conscious, careful thought. This in itself is not a problem. Indeed, life would be impossible if every morning we could choose between coffee and tea only after carefully thinking through all the consequences.
So when the newscaster on morning TV says “the probability of rain today is 10%”, the mind lazily translates it into “it won’t rain today” and decides that there is no need to carry an umbrella. But that is not what the newscaster said at all. What he really said was: “if we consider all days on which weather conditions are similar to today’s, it would rain on 10% of such days.”
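The frequency reading of a forecast can be checked with a short simulation. This is a minimal sketch, assuming every "day like today" independently has the same 10% chance of rain; the function name and the seed are illustrative choices, not anything a weather service publishes.

```python
import random

def rainy_fraction(p_rain: float, n_days: int, seed: int = 0) -> float:
    """Simulate n_days 'days like today', each rainy with probability p_rain,
    and return the fraction of them on which it actually rained."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    rainy_days = sum(1 for _ in range(n_days) if rng.random() < p_rain)
    return rainy_days / n_days

# A 10% forecast does not mean "no rain": over many similar days,
# roughly one in ten turns out wet.
fraction = rainy_fraction(p_rain=0.10, n_days=100_000)
print(f"Rained on {fraction:.1%} of simulated days")
```

On any single day the umbrella question stays a gamble; only the long-run fraction is predictable.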
The life of an individual is proverbially uncertain. But statisticians are awfully good at figuring out how many people in a large group will die from a specific cause. Otherwise life insurance companies could not survive. In fact, they employ specialists, known as actuaries, just to perform such calculations.
An example illustrates what Table 1 actually tells us. Suppose that there are 20,000 Covid-19 patients younger than 30 in a population. Then about 80 of them would die from the disease. A city as large as New York is bound to have more than 20,000 young patients. Consequently, the deaths of a relatively small number of young patients are to be expected, though we could not know in advance which individuals would die. Around four times as many may require hospitalization. Since every human life is precious, it makes eminent sense for even the young to avoid behaviour that could lead to them (and, through a chain reaction, others) contracting the virus.
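The point that a group's death toll is predictable while individual outcomes are not can be sketched in Python. This sketch assumes each young patient dies independently with probability 0.4%, the rate implied by the example's own numbers (80 out of 20,000), not a figure read off Table 1; the constant and function names are illustrative.

```python
import math
import random

N_PATIENTS = 20_000         # young patients in the example
P_DEATH = 80 / N_PATIENTS   # 0.4%, implied by "about 80 of them would die"
HOSPITAL_MULTIPLIER = 4     # "around four times as many may require hospitalization"

def expected_and_spread(n: int, p: float) -> tuple[float, float]:
    """Mean and standard deviation of a binomial(n, p) death count."""
    return n * p, math.sqrt(n * p * (1 - p))

def simulate_deaths(n: int, p: float, seed: int) -> set[int]:
    """Return the IDs of patients who die in one simulated outbreak."""
    rng = random.Random(seed)
    return {i for i in range(n) if rng.random() < p}

mean, sd = expected_and_spread(N_PATIENTS, P_DEATH)
print(f"Expected deaths: {mean:.0f} (give or take about {2 * sd:.0f})")
print(f"Expected hospitalizations: {HOSPITAL_MULTIPLIER * mean:.0f}")
print(f"P(at least one death): {1 - (1 - P_DEATH) ** N_PATIENTS:.6f}")

# Two simulated cities: similar totals, but different individuals die.
city_a = simulate_deaths(N_PATIENTS, P_DEATH, seed=1)
city_b = simulate_deaths(N_PATIENTS, P_DEATH, seed=2)
print(f"City A: {len(city_a)} deaths, City B: {len(city_b)} deaths, "
      f"same individuals in both: {len(city_a & city_b)}")
```

The totals land close to 80 in both runs, yet the two sets of victims barely overlap: that is exactly the actuary's kind of knowledge.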
Young American students might not have thronged the Florida beaches during their spring break earlier this month if they had understood probability theory better. Unfortunately, this lack of understanding probably cost lives.
When I began writing this blog a week ago, a vast amount of information on Covid-19 was already available on mainstream and social media. But very little of it was data-driven analysis. (Tomas Pueyo’s excellent articles, linked at the end of this post, were among the few exceptions.)
The situation has changed since then and in fact the pendulum has begun to swing the other way. We are now beginning to be flooded by Covid-19 data and analysis, showered on us by statisticians, epidemiologists, economists and journalists. Unfortunately, their analyses are often contradictory; it’s difficult to make sense of it all and consequently many of us are confused.
Even such an apparently simple thing as calculating the death rate turns out to be surprisingly complicated.
Let’s look at the following example, which shows the death toll in Italy among confirmed cases as of today (30th March).
Source: https://www.worldometers.info/coronavirus/ (accessed on 29/3/2020)
So how should the death rate be calculated here? One way would be to divide the number of deaths (10,779) by the number of closed cases (23,809). That would give us a death rate of about 45%. But, of course, this is a huge overestimate since the majority of active cases would recover.
We could alternatively divide the number of deaths by the total number of cases. This gives us a death rate of about 11%, which is a lot lower. This, however, is an underestimate, since some of the active cases will die and so add to the numerator. I decided to make a simple adjustment to account for this.
I added a proportion of the critical cases to the numerator, since these are the active cases that will contribute to the death toll, assuming that critical cases will end in death at the same rate as the cases that have already closed. The calculations are straightforward. Of the 23,809 closed cases, 10,779 died while 13,030 recovered. Therefore, I added 45% (10,779/23,809) of the 3,906 critical cases to the deaths. With this adjustment the number of deaths increased by 1,768 and the death rate rose to 12.8%.
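The three ways of computing Italy's death rate can be written out as a short Python sketch. The function names are illustrative, and the total case count is a HYPOTHETICAL placeholder: the text quotes deaths, recoveries and critical cases but not the total, so roughly 98,000 is backed out from the "about 11%" figure. With that placeholder the sketch reproduces the roughly 45%, 11% and 12.8% figures above.

```python
from typing import Optional

def closed_case_rate(deaths: int, recovered: int) -> float:
    """Deaths as a share of closed cases: an overestimate mid-epidemic."""
    return deaths / (deaths + recovered)

def total_case_rate(deaths: int, total_cases: int) -> float:
    """Deaths as a share of all cases: an underestimate mid-epidemic."""
    return deaths / total_cases

def adjusted_rate(deaths: int, recovered: int, total_cases: int,
                  critical: Optional[int] = None) -> float:
    """Add the share of critical cases expected to die, assuming they die at
    the same rate as already-closed cases. If no critical count is reported
    (as for India), fall back to deaths / total cases."""
    if critical is None:
        return total_case_rate(deaths, total_cases)
    expected_critical_deaths = critical * closed_case_rate(deaths, recovered)
    return (deaths + expected_critical_deaths) / total_cases

# Italy, per the Worldometers figures quoted in the text.
DEATHS, RECOVERED, CRITICAL = 10_779, 13_030, 3_906
# HYPOTHETICAL: total cases were not quoted; ~98,000 is backed out from the
# "about 11%" figure. Substitute the actual count before relying on it.
TOTAL_CASES = 98_000

print(f"Closed-case rate: {closed_case_rate(DEATHS, RECOVERED):.1%}")
print(f"Total-case rate:  {total_case_rate(DEATHS, TOTAL_CASES):.1%}")
print(f"Adjusted rate:    {adjusted_rate(DEATHS, RECOVERED, TOTAL_CASES, CRITICAL):.1%}")
```

Making `critical` optional lets the same function cover countries that do not report critical cases.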
The table below gives the death rate today for 10 selected countries. For India, which does not report critical cases, I calculated the death rate by simply dividing the number of deaths by the total number of cases.
Source: https://www.worldometers.info/coronavirus/ (accessed on 30/3/2020)
The Netherlands’ figures are high because the crisis hit it relatively recently, so it has reported very few recoveries (250) against 771 deaths. The adjustment therefore adds most of its 972 critical cases to the death toll, which inflates the death rate. Simply dividing deaths by the total number of cases gives 7.1%, which in this case would be a fairer estimate.
Leaving out the Netherlands, Italy and Iran have the highest death rates and Singapore has the lowest. Death rates vary widely, from 0.4% to 12.7%. Clearly, your chance of surviving Covid-19 depends strongly on where you are when you contract the disease. You are much better off in Singapore, South Korea or Germany than in Iran, Italy or Spain.
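Why the adjustment overshoots for the Netherlands shows up when we compute the share of critical cases it books as future deaths, which is simply deaths / (deaths + recoveries). A minimal sketch using only the figures quoted in the text; the function name is illustrative.

```python
def share_of_critical_added(deaths: int, recovered: int) -> float:
    """Fraction of critical cases the adjustment counts as future deaths:
    the death share among closed cases."""
    return deaths / (deaths + recovered)

for country, deaths, recovered, critical in [
    ("Italy", 10_779, 13_030, 3_906),
    ("Netherlands", 771, 250, 972),
]:
    share = share_of_critical_added(deaths, recovered)
    print(f"{country}: {share:.0%} of {critical} critical cases -> "
          f"{share * critical:.0f} extra deaths")
```

For Italy the share is about 45% (roughly 1,768 extra deaths); for the Netherlands it is about 76% (roughly 734 of its 972 critical cases), because with only 250 recoveries the closed cases are dominated by deaths.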
Why do the death rates differ so much across countries? There are many reasons.
Death rates may be high in a country because of its demographics. Since Covid-19 inflicts more deaths among the elderly, countries with aging populations naturally have a higher death rate.
Death rates are high when a country is unable to provide quality care to Covid-19 patients.
Finally, death rates may be high because the number of confirmed cases is low. A low number of cases may therefore be a cause for concern rather than comfort: it may mean that cases are being under-reported, because the country is not testing enough or for some other reason.
Italy had a perfect storm. On the one hand, it has an aging population, with 23% of its citizens over 65. On the other hand, the Covid-19 crisis brought its healthcare system to its knees. So its high death rate comes as no surprise.
And that brings us to a deeper problem. Remember what we said about sampling in Part 1 of this blog? The confirmed cases come from the sample of the population that was tested. They are therefore only a subset of all Covid-19 cases, since many untested individuals could also be infected. The death rate we have been talking about so far is the observed death rate, based on confirmed cases. We cannot know the true death rate, based on all cases (confirmed and unconfirmed), for sure until we test the entire population.
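The demographic effect can be sketched as a weighted average: a country's overall death rate is the sum, over age bands, of each band's share of cases times that band's death rate. In this sketch the two age-band rates are HYPOTHETICAL round numbers (not taken from Table 1), cases are assumed to be spread across ages in the same proportions as the population, and only Italy's 23% over-65 share comes from the text.

```python
# HYPOTHETICAL age-specific death rates, for illustration only.
RATE_UNDER_65 = 0.005
RATE_65_PLUS = 0.08

def crude_death_rate(share_65_plus: float) -> float:
    """Overall death rate when cases follow the population's age mix
    (a simplifying assumption)."""
    return (1 - share_65_plus) * RATE_UNDER_65 + share_65_plus * RATE_65_PLUS

italy = crude_death_rate(0.23)            # 23% of Italians are over 65
younger_country = crude_death_rate(0.07)  # HYPOTHETICAL younger population
print(f"Italy-like: {italy:.2%}, younger country: {younger_country:.2%}")
```

With identical age-specific rates, the older population's overall rate comes out at roughly twice the younger one's, before any difference in healthcare quality is considered.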
But we do know that tested individuals constitute a biased sample since people with severe symptoms of the disease are likelier to be tested. Therefore, the death rate is likely to be higher among confirmed than among all cases.
How much higher? Two Stanford scientists, Eran Bendavid and Jay Bhattacharya, argue that the actual number of Covid-19 cases could be one or two orders of magnitude higher than the number of confirmed cases. If so, the true death rate would be one or two orders of magnitude lower than the observed death rate, and Covid-19 may be no deadlier than ordinary flu.
All countries necessarily under-report the number of Covid-19 cases. But can we say anything about how large the under-reporting is?
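Scaling by the under-reporting factor is simple arithmetic: if actual cases are k times the confirmed cases and every death is counted, the true death rate is the observed rate divided by k. The sketch below applies factors of 10 and 100 (one and two orders of magnitude) to an observed rate of 11%, the deaths-over-total-cases figure quoted for Italy; the assumption that no deaths are missed is a simplification and may not hold.

```python
def implied_true_rate(observed_rate: float, underreport_factor: float) -> float:
    """True death rate if actual cases = confirmed cases * underreport_factor.
    Assumes every Covid-19 death is already counted (a simplification)."""
    return observed_rate / underreport_factor

# 11% is the deaths / total-cases figure quoted for Italy.
for factor in (1, 10, 100):
    print(f"Under-reporting x{factor}: true death rate "
          f"{implied_true_rate(0.11, factor):.2%}")
```

An observed 11% becomes 1.1% or 0.11%, which is why the size of the under-reporting factor matters so much.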
Look at Iran’s peculiar curve below.
For most countries, the curve of total cases rises in a steeply exponential manner, with the number of new cases rising sharply each day. Think Italy, Spain, USA. For a few (think Singapore, South Korea), the curve flattens due to sharp policy interventions.
Iran follows neither of these two classic patterns. Its graph grows almost linearly for the first 3 weeks of March with a roughly constant number of cases being added each day. What could account for that?
The likely explanation is that the number of cases in Iran is growing much faster than the graph suggests. Iran is quite probably testing only the most severe cases, which might also account for its higher death rate. If, for instance, it can test only a roughly fixed number of people each day, the confirmed count would rise by a roughly constant amount daily, however fast the epidemic is actually spreading. There have been reports that Iran’s fight against the Coronavirus has been hamstrung by international sanctions. Iran’s government claims that refusing to lift the sanctions, which would let it deal more effectively with this crisis, is a crime against humanity; they may well be right.
Iran’s curve has, however, been rising more steeply over the last week. This may be a matter of hope rather than concern: Iran may simply be testing more widely and so finding more cases.
Bendavid & Bhattacharya, and many others, feel that it is critically important to determine the actual number of Covid-19 cases and the true death rates. This certainly matters for formulating appropriate long-term policies, especially if the Coronavirus crisis takes years to run its course, as Lee Hsien Loong, the Prime Minister of Singapore, suggested yesterday.
In the short term, however (say, the next few weeks), the difference between the confirmed and actual death rates should matter little. Covid-19 is overwhelming many healthcare systems. Masks, protective equipment and ventilators are in short supply everywhere. The first priority is clearly to “flatten the curve” by following stringent social distancing measures. To paraphrase Nassim Nicholas Taleb and Yaneer Bar-Yam in their article: when an avalanche is moving towards you, don’t spend time modelling its path; run.
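The limited-testing explanation can be illustrated with a toy simulation in which true infections grow exponentially but only a fixed number of new cases can be confirmed each day. This is a sketch, not a model of Iran: the hard daily cap stands in, as an assumption, for "testing only the most severe cases", and every parameter (1,000 initial infections, 25% daily growth, a cap of 2,000 confirmations per day) is HYPOTHETICAL.

```python
def confirmed_series(days: int, initial_infections: float, daily_growth: float,
                     test_capacity: float) -> list[float]:
    """Cumulative confirmed cases when true infections grow exponentially but
    at most `test_capacity` new cases can be confirmed per day (assumes every
    new infection is detectable if a test is available)."""
    infections = initial_infections
    confirmed = 0.0
    series = []
    for _ in range(days):
        new_infections = infections * (daily_growth - 1)  # exponential growth
        infections += new_infections
        confirmed += min(new_infections, test_capacity)   # testing bottleneck
        series.append(confirmed)
    return series

def daily_additions(series: list[float]) -> list[float]:
    """Day-by-day increase in the cumulative series."""
    return [b - a for a, b in zip([0.0] + series, series)]

uncapped = confirmed_series(21, 1_000, 1.25, float("inf"))
capped = confirmed_series(21, 1_000, 1.25, 2_000)
print([round(x) for x in daily_additions(uncapped)])  # keeps rising: exponential
print([round(x) for x in daily_additions(capped)])    # rises, then flat at 2,000: linear
```

Once the cap binds, the confirmed curve becomes a straight line even though true infections keep compounding. Raising `test_capacity` partway through would let daily additions climb again, which fits the reading that Iran may now simply be testing more widely.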
Shamanism, Voodoo and Astrology
I came across the following rant in a private Facebook post.
“Mathematical modelling is no better than shamanism, voodoo and astrology”.
Obviously, I do not agree. I taught students to build mathematical models for a decade, and for two more decades I earned my livelihood by applying them to real-world business problems. Quantitative models are essential in just about every sphere of life in today’s world.
Yet I can sympathize with this very angry netizen.
Consider the case of the UK and the Netherlands. Both have a well-deserved reputation for having some of the best modellers in the world. Both decided to follow expert advice and ride out the Coronavirus crisis without imposing strict social distancing measures, as they judged these would seriously damage the economy. Reassuring statements about everything being under control were being issued by the British government as late as 12th March.
The British government was shocked into action by a paper that Neil Ferguson and his team at Imperial College London made public on 16th March. It predicted 250,000 deaths without further strict measures. The UK began implementing serious measures just as the Coronavirus tsunami hit.
Yet, as Taleb & Bar-Yam point out in their article, the British government had relied on epidemiological models at every stage. How could the experts have got it so wrong?
Laymen may then be forgiven for being skeptical of mathematical models.
How then should we treat models? Should we dispense with them altogether and rely on gut feel?
No. A thousand times no!
We should not go into the war against the Coronavirus without models any more than a military commander should go into a shooting war without a battle plan. That is a sure recipe for disaster.
But models, as every practitioner who has applied them to real problems knows, need to be used with caution. They can go wrong for various reasons. Models are built on assumptions that could be wrong, or at least not completely correct. The data used to build a model may be inadequate. Models are necessarily simplifications of reality: a model as complex as the situation being modeled would be as useless as a road map drawn on a 1:1 scale. Some critical factor might have been left out during simplification, and its absence may only be felt when one tries to implement the model.
So models go wrong quite often and require constant review. When we are dealing with a phenomenon as little understood as Covid-19, order-of-magnitude errors are possible, perhaps even likely.
The solution, then, is not to abandon models but to use them with great caution and care. One should not believe blindly in models, just as a good general should not slavishly follow the plan drawn up before the battle begins. Before implementing a model, one should check its robustness: how well it stands up to different scenarios, especially extreme ones. During implementation it is important to keep tabs on model performance, just as a general must keep reviewing his plan as the battle progresses.
Perhaps we, the data science community, share some responsibility for this misplaced trust in models on the part of laymen and governments. There is an enormous amount of hype surrounding artificial intelligence, machine learning and big data. We, and the media, tend to trumpet the successes, such as IBM’s Watson winning Jeopardy, and downplay the flops, such as the failure of Google Flu Trends to predict the onset of flu epidemics from internet searches. Most importantly, we have not emphasized the role of judgement in modelling. As a result, many laymen (and possibly even governments) have come to view models as magic spells that solve all problems. So when models fail, people lose trust in the entire modelling process. Perhaps we should label every model we build “fragile, handle with care”.
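To make "check its robustness under different scenarios" concrete, here is a sketch of a stress test on the simplest epidemic model, the textbook SIR (susceptible, infected, recovered) model, integrated with small Euler steps. It is far simpler than the models used by Ferguson's team, and every parameter (the R0 values, a 5-day infectious period, a population of one million, 10 initial infections) is a HYPOTHETICAL choice for illustration.

```python
def sir_peak(r0: float, infectious_days: float = 5.0, population: int = 1_000_000,
             initial_infected: int = 10, days: int = 365,
             dt: float = 0.1) -> tuple[float, float]:
    """Euler-integrate a textbook SIR model; return (peak infected, day of peak)."""
    gamma = 1 / infectious_days   # recovery rate per day
    beta = r0 * gamma             # transmission rate per day
    s, i, r = population - initial_infected, float(initial_infected), 0.0
    peak, peak_day = i, 0.0
    for step in range(1, int(days / dt) + 1):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        if i > peak:
            peak, peak_day = i, step * dt
    return peak, peak_day

# Scenario sweep: how sensitive is the peak to an uncertain R0?
for r0 in (1.5, 2.5, 3.5):
    peak, day = sir_peak(r0)
    print(f"R0={r0}: peak of {peak:,.0f} infected around day {day:.0f}")
```

Small changes in a poorly known parameter move the peak several-fold, which is exactly the kind of fragility a scenario sweep is meant to expose. Lower R0 values, which social distancing aims for, give a lower and later peak: the "flattened curve".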
The Ubasute Option
Japanese legends speak of a dark time in the country’s past when some remote communities practiced ubasute. They would abandon old people on lonely mountain peaks, to die of hunger, thirst or cold. The elderly victims participated willingly in the ritual and sacrificed themselves for the sake of their community.
Even though a number of Japanese books and movies have been based on ubasute, the consensus of historians is that ubasute did not really ever exist in Japan outside the realm of folklore. Of course, in times of famine or stress, some families in Japan — as in all societies — sacrificed their elderly. But ubasute never existed as an institutionalized ritual.
Ubasute, like werewolves and vampires, should have been banished to the realm of morbid tales and horror movies. Except that it wasn’t.
On the evening of 23rd March, Dan Patrick, the 69-year-old lieutenant governor of Texas, suggested that the US should exercise the Ubasute option. Of course, he did not coin the term “Ubasute option”; to the best of my knowledge, I just did. However, Patrick did express the hope that US grandparents would sacrifice themselves to save the economy and their grandchildren’s future.
It is of course most unlikely that any country would actually choose to exercise the Ubasute option, though one cannot quite rule out this happening in authoritarian regimes on the world’s fringes. Denial of medical treatment to the elderly is morally repugnant and would rightly be viewed as a monstrous crime.
However, many countries are currently in lockdowns that paralyze their economies. Donald Trump, the President of the US, would like to get his country back to work as soon as possible. So would Jair Bolsonaro, the President of Brazil, and no doubt many other leaders as well. In such a world the Ubasute option retains considerable attraction and resurfaces in many guises.
Let’s turn to a recent article by the economists Debraj Ray & S Subramanian, published on 28th March in an Indian news outlet.
Ray & Subramanian would like to lift the lockdown in India for the under-40 segment of the population, since their chance of dying from Covid-19 is no higher than that from other causes. “Protection of the elderly and the very young must be left to households”.
It is not clear exactly how such protection would be ensured in India, where, unlike in the West, multiple generations often share a home. Perhaps the elderly would be advised to limit trips outside the home and practise better hygiene. However, Indians are not too good at following advice from their doctors or the government.
In practice, the implementation of this strategy would amount to complete lifting of the lockdown and return to “business-as-usual” mode. The result would be Ubasute on a horrifying scale.
Two mathematicians at Cambridge University, Ronojoy Adhikari & Rajesh Singh, have recently come up with a model suggesting that “business-as-usual” would lead to a holocaust in India in which between 2 and 3 million people, mostly over 40, would die. Of course, mathematical models have their limitations, as we have discussed. The predictions may well be wrong. But given such potentially devastating consequences, can we dare to take such huge risks?
Yet Trump, Bolsonaro, Ray & Subramanian are all undoubtedly correct in saying that a prolonged lockdown would destroy the economy and kill us more surely than the virus. Is there any way out? Do history and data science have anything to tell us?
Please stay tuned for my next post:
Covid-19: Part 3 — History Lessons (A Data Science Perspective) — Forthcoming
Disclosure: I am the Director of Smart Consulting Solutions Pte Ltd, incorporated in Singapore and its subsidiary Radix Analytics Pvt Ltd, incorporated in India. I am also a Visiting Faculty Member at the Indian Institute of Management, Udaipur. However, the opinions expressed in this post are solely mine and not necessarily shared by any company or institution with which I am affiliated.
Tomas Pueyo’s articles remain the best data-based analysis of the Covid-19 crisis.
Technical Note: Please note that my method of calculating death rates differs from Tomas Pueyo’s. He tries to forecast what the death rate for a country will be when the epidemic ends. I, on the other hand, try to calculate a snapshot of each country’s death rate today, correcting for the fact that we do not yet know the outcome of currently active cases.
I am, as always, grateful to my friend Dr. Ashish Kumar Dawn for many insightful comments.
Ravi Dixit and Rajat Poddar referred me to some of the articles discussed in this post.
I invite comments on the post. Please feel free to write to me at firstname.lastname@example.org