Warning: A non-numeric value encountered in /home/msightan/public_html/wp-content/themes/Divi/functions.php on line 5837

Analytics: It’s All about People

In a recent Dilbert strip (April 24, 2015), Dilbert declares: “I found the root cause of our problems… It’s people. They’re buggy.” The thing is, even with all the hype around analytics, this is so true that it almost isn’t funny.​​​

There are two key disconnects in many analytics initiatives today: the business feeling that the data scientists do not understand the business needs, and the data scientists feeling like the business is not listening to what the data have to say. And the gap is not closing nearly as fast as we would like.

What we have is a failure to communicate.

Many data scientists have been hired, and many great algorithms have been developed, many of which have died a lonely death without ever being appreciated. In less dire situations, some analytics is being consumed, but it is far from optimal. What we all need to realize is that analytics is really just a methodology, and a data scientist is just a person with the appropriate skills to apply the said methodology. In reality, however, the business and the data scientists often have practical expectations from each other that are not very well articulated, resulting in friction that leads to distrust and/or indifference. With that said, I have a message for the business leaders and another for the data scientists.

To the business leaders: If great analytics happens in the forest, and no one is there to make decisions with it, does it make a business impact? The answer is no. Hiring data scientists does not make analytics happen. In fact, hiring data scientists is not one of the first things I would recommend to any organization starting out with analytics. Analytics is not just-add-water—there must be a culture and an ecosystem along with the right processes and functions in place to make it all work. Unless the organization was built data-driven from the ground up, building an analytics capability will always involve a degree of retrofitting. Until the people are ready to receive analytics, it will not be received.

To the data scientists: Building the best model is your job, but you must keep in mind that it is not the end objective–it is simply a means to the end. You have a specific skill set for which people are willing to pay, and your objective is to help people do better at whatever they are trying to do better. There is always a person at the end, and often in between; care for all people involved, and build a positive relationship with all of them. Build models for others–whether you like it or not, being an expert holder of a skill set means that you have a responsibility as a consultant in some form and thus a responsibility to manage your relationship with the end client, with very few exceptions. Without people ready to embrace your work, even your most beautiful algorithms will sit idle. Strive to connect with the people involved, and you will find that not only you will have an entirely different relationship with your client, but also you will approach the analysis differently.

Being successful in analytics, whether you are a business executive or a data scientist, is not at all about the capability to do analytics. It is about people working together and relating to each other. You have a much better chance of success by doing basic analytics with the right people, processes, functions, and culture, than by doing great analytics without them.​​​​​​

P.S. I’ve used the term “data scientist” here for mere convenience. What it ​​should really be called is another discussion!​

“Lies, Damned Lies, and Statistics”

Mark Twain wrote: “Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.'”*

I am OK with this statement because I get the intent, and I hope many would agree that the entire scientific discipline of statistics is not a lie. Like any good statistician, I must insist on a significant evidence to the contrary before rejecting the null hypothesis that at least some of statistics is honest, and in the absence of such evidence, I am not quite ready to concede that statistics is all a big lie.

Webster defines lying​ as making “an untrue statement with intent to deceive.” Lack of competence does not make one a liar, so not knowing how to use statistics correctly is a different issue. The key to lying is the “intent to deceive,” and this can be in the form of unwillingness to face the reality. This past week I heard multiple references to anecdotes of someone’s desire to make the results look “not so bad”; it can also go the other way to make someone else look “not so good.” It is not that the numbers are easy to manipulate, but rather that it is easy to appear data-driven.

Back when I taught introductory statistics courses, the syllabus always included the topic of subjectivity and the impact it may have on how the results are conveyed. We looked at various mass-circulation articles, identifying the author and/or the sponsor of the piece, the potential biases and their potential impact on the conclusions. While the results may be perfectly valid in one sense, it is important to take an objective view in order to understand what is really going on. The same is true in business settings.

The assumptions are critical–especially the business assumptions, which may be called business contexts or caveats that may or may not be made explicit. Statistical assumptions are important for sure; however, in practice, the violations of contextual assumptions are far more impactful than the violations of the statistical assumptions–many methodologies are fairly robust against violations of statistical assumptions and can generally produce directionally correct results. One may choose only the results that support one’s cause and ignore others that are more important, or choose the methodology or display that allows one’s story to be told, or choose to analyze in such a way that the results would only justify one’s position. Selecting the data to fit one’s pre-formed story, rather than letting the data coalesce into a story, is the opposite of being data-driven–call it agenda-driven analytics.

Agenda-driven analytics will tell you only what one wants to hear, not necessarily what one needs to hear. And in this case, analytics will never have a chance to do what it can do–it will be an involuntary participant in the advancement of an agenda it doesn’t even support. In the meanwhile, others, including the customers, suffer from lack of better treatment; depending on the context, the consequences may be quite grave.

P.S. I should fully expect a flurry of hate mails from my esteemed statistical colleagues for saying that statistical assumptions are not very important!​

*”Chapters from my Autobiography–XX,” North American Review no. 618, July 5, 1907.