What is Analytics?

Big Data, Analytics & Data Science: The Big, the Smart & the Sexy? (PART TWO)

The Wild West

The growth of analytics, both in discussions and in practice, has coincided with the growing use of terms such as "big data" and "data science". But how are these concepts related? Are they synonyms for the same thing, complementary concepts, or entirely separate entities? This post will attempt to address this issue by reviewing and comparing the literature concerning each.

This article is split into three parts: the first discussing big data, the second data science, and the third comparing each with analytics.

This is part two. For part one click here and for part three click here.

Data science and data scientist (the commonly used term for its practitioners) are terms that have come to prominence relatively recently. In the main, data science has grown in importance and recognition in response to the challenges of big data discussed in the previous part.

However, it is unclear when these terms first came into use. Davenport and Patil (2012) report that the term data scientist was coined in 2008. Smith (2006), by contrast, argues that data science started to be discussed in the late 1990s, a view supported by the launch of the Data Science Journal in 2002 (followed by the Journal of Data Science the following year). Whilst this may be true of the term in its current use, Press (2012) traces its first usage back to the late 1960s, when Peter Naur suggested it as an alternative name for computer science.

Whilst the origins of the terminology are contested, there does seem to be broad agreement about the definition of the term, although admittedly the literature, particularly the academic literature, is quite limited. Liu et al. (2009) offer the following overview:

Data science has developed […] to include the study of the capture of data, their analysis, metadata, fast retrieval, archiving, exchange, mining to find unexpected knowledge and data relationships, visualization in two and three dimensions including movement, and management.

If this definition of data science is to be accepted, then this is clearly a broad church which incorporates the full lifecycle of business data, from collection and extraction to final presentation, and would include aspects of computer science, statistical & mathematical modelling, and general business & management practices.

So is a data scientist required to be an expert in all of these fields? Would such wide-ranging expertise be feasible? Or more broadly, what exactly is a data scientist?

There are a variety of definitions suggested in the literature. Perhaps the most humorous is the joke that circulated at the SAS Analyst Conference 2013, that "a data scientist is a business analyst that lives in California" (Sicular, 2013). Some more serious suggestions have also been offered. In 2012 Forbes' Dan Woods ran a series of interviews to try to better define what a data scientist actually is. However, the results were not entirely consistent.

For example, Michael Rappa of North Carolina State University argues that "statistician" is perhaps the closest of the traditional job roles, though adding that it is also necessary to have "knowledge of the computational dimension and advanced programming languages" (Woods, 2012a). Monica Rogati of LinkedIn alternatively argues that a more general science background is appropriate, arguing that "all scientists are data scientists. In my opinion, they are half hacker, half analyst, they use data to build products and find insights" (Woods, 2012b). Steven Hillion of Greenplum, on the other hand, chooses to highlight the importance of general business skills and awareness. He argues you have to be "a good interviewer" to understand the business problems, as well as having specific domain experience (Woods, 2012c). Davenport and Patil (2012), however, consider that the most important "basic, universal skill is the ability to write code". This would suggest that computer science would be the most likely background.

With the "expert" literature seemingly offering inconsistencies, practitioner opinion may be more illuminating. However, an EMC (2012) survey of close to 500 professionals working either as data scientists or in business intelligence showed further discrepancies about the backgrounds from which data science professionals can be sought:

» 34% suggesting computer science graduates

» 27% professionals from disciplines outside of typical computer science roles

» 24% graduates from other academic disciplines

» 12% from "today’s [business intelligence] professionals" (EMC, 2012).

Rather than focusing entirely on academic or vocational background, some authors have highlighted personal characteristics as the principal requirements. Davenport and Patil (2012) cite "intense curiosity" as the most important. John Rauser of Amazon points to scepticism and the ability to be self-critical, so that they "will look as hard for evidence that refutes [their] thesis as [they do] for evidence that confirms it" (Woods, 2012d). Bladt and Filbin (2013) instead consider empathy for the less data-minded to be key, so that the data can be translated into compelling stories.

In summary, there would seem to be no obvious pathway to the position of data scientist. Data scientists, in proportions that depend on who you choose to listen to, would seem to require a mixture of programming, computer/software engineering, statistics, applied mathematics/operational research, data visualisation, requirements gathering, communication & presentation skills, management skills, curiosity, scepticism, and story-telling.

Is all this possible? Will it be a case of a very few exceptional candidates being able to hold employers to ransom for their services? Will universities be able to design courses to give graduates enough of a grounding in all of these areas? Or will data scientist become another description of someone who is a 'jack of all trades, but master of none'? These issues will be discussed further in part three.

Have your say by leaving a comment below and click here for part three where we compare data science with big data and analytics.

REFERENCES

Bladt J and Filbin B (2013). A Data Scientist's Real Job: Storytelling. Harvard Business Review. Available from: http://blogs.hbr.org/cs/2013/03/a_data_scientists_real_job_sto.html, [accessed March 2013].

Davenport TH and Patil DJ (2012). Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, 90: 70-76.

EMC (2012). Data Science Revealed: A Data-Driven Glimpse into the Burgeoning New Field, [Online]. Available from: http://uk.emc.com/collateral/about/news/emc-data-science-study-wp.pdf, [accessed March 2013].

Liu L, Zhang H, Li J, Wang R, Yu L, Yu J and Li P (2009). Building a Community of Data Scientists. Data Science Journal, 8: 201-208.

Press G (2012). Data Scientists: The Definition of Sexy, [Online]. Forbes. Available from: http://www.forbes.com/sites/gilpress/2012/09/27/data-scientists-the-definition-of-sexy/, [accessed March 2013].

Sicular S (2013). California Legislature Begets Data Scientists, [Online]. Available from: http://blogs.gartner.com/svetlana-sicular/california-legislature-begets-data-scientists/, [accessed March 2013].

Smith F (2006). Data Science as an Academic Discipline. Data Science Journal, 5: 163-164.

Woods D (2012a). What Is a Data Scientist?: Michael Rappa, Institute for Advanced Analytics, [Online]. Available from: http://www.forbes.com/sites/danwoods/2012/03/05/what-is-a-data-scientist-michael-rappa-north-carolina-state-university, [accessed March 2013].

Woods D (2012b). LinkedIn's Monica Rogati On "What Is A Data Scientist?", [Online]. Available from: http://www.forbes.com/sites/danwoods/2011/11/27/linkedins-monica-rogati-on-what-is-a-data-scientist/, [accessed March 2013].

Woods D (2012c). EMC Greenplum's Steven Hillion on What Is a Data Scientist?, [Online]. Available from: http://www.forbes.com/sites/danwoods/2011/10/11/emc-greenplums-steven-hillion-on-what-is-a-data-scientist/, [accessed March 2013].

Woods D (2012d). Amazon's John Rauser on "What Is a Data Scientist?", [Online]. Available from: http://www.forbes.com/sites/danwoods/2011/10/07/amazons-john-rauser-on-what-is-a-data-scientist/, [accessed March 2013].
