Mark Twain wrote: “Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.’”*
I am OK with this statement because I get the intent, and I hope many would agree that the entire scientific discipline of statistics is not a lie. Like any good statistician, I must insist on significant evidence to the contrary before rejecting the null hypothesis that at least some of statistics is honest, and in the absence of such evidence, I am not quite ready to concede that statistics is all one big lie.
Webster defines lying as making “an untrue statement with intent to deceive.” Lack of competence does not make one a liar, so not knowing how to use statistics correctly is a different issue. The key to lying is the “intent to deceive,” and this can take the form of an unwillingness to face reality. This past week I heard several anecdotes about someone wanting to make the results look “not so bad”; it can also go the other way, making someone else look “not so good.” The problem is not that the numbers are easy to manipulate, but rather that it is easy to appear data-driven.
Back when I taught introductory statistics courses, the syllabus always included the topic of subjectivity and the impact it may have on how results are conveyed. We looked at various mass-circulation articles, identifying the author and/or sponsor of each piece, its potential biases, and how those biases might shape the conclusions. While the results may be perfectly valid in a narrow sense, it is important to take an objective view in order to understand what is really going on. The same is true in business settings.
The assumptions are critical, especially the business assumptions, which might also be called business context or caveats and which may or may not be stated explicitly. Statistical assumptions are important for sure; in practice, however, violations of contextual assumptions are far more consequential than violations of statistical assumptions. Many methodologies are fairly robust against violations of statistical assumptions and generally still produce directionally correct results, whereas no method can rescue an analysis built on the wrong business context. One may report only the results that support one’s cause and ignore more important ones, choose the methodology or display that best tells one’s story, or structure the analysis so that the results can only justify one’s position. Selecting the data to fit a pre-formed story, rather than letting the data coalesce into a story, is the opposite of being data-driven. Call it agenda-driven analytics.
Agenda-driven analytics will tell you only what you want to hear, not necessarily what you need to hear. In that case, analytics never gets the chance to do what it can do; instead, it becomes an involuntary participant in advancing an agenda it doesn’t even support. Meanwhile, others, including customers, suffer for lack of better treatment, and depending on the context, the consequences may be quite grave.
P.S. I fully expect a flurry of hate mail from my esteemed statistical colleagues for saying that statistical assumptions are not very important!
*“Chapters from My Autobiography–XX,” North American Review no. 618, July 5, 1907.