# Operational Research & Analytics

## Can Operational Research Breed the New Generation of Data Scientists?

- Created on 16 April 2013

There is little doubt that data scientists are in increasingly high demand, though "sexy" may be an adjective too far (Davenport and Patil, 2012). Whilst the term is clearly caught up in a bubble of hype, and comes with a job description that may ask too much of most candidates (see our previous post), such protests will mean little to those able to cash in on the reported six-figure salaries the title commands (e.g. Miller, 2013). So what does this mean for operational research (OR) specialists? Does data science offer the potential to boost their salaries, or does it pose a threat to their job opportunities?

*For a discussion of what data science is (and what data scientists do) click here, and for operational research (OR) click here.*

Many of the most prominent articles on data scientists have seemingly ignored OR as either a key skillset or a route into the profession (e.g. Liu et al, 2009; Davenport and Patil, 2012; EMC, 2012). Is this an oversight, or are the two fields too far apart for OR graduates and professionals to bridge the gap? Using Drew Conway’s (2013) Data Science Venn Diagram, this post discusses these issues further.

Despite the lack of discussion about the role of OR in the work of a data scientist, there is evidence that the two do overlap. In a previous post we discussed how the process of data science is effectively synonymous with analytics, and another post described the close link between analytics and OR. So, if we consider a data scientist to be an individual capable in all areas of the analytics/data science process, it follows that OR (or, more generally, mathematical and statistical modelling) represents an important skillset in their arsenal. That said, whether OR graduates and professionals possess the full range of skills of a data scientist is another question entirely.

An interesting conceptualisation of a data scientist’s skillset is given in Drew Conway’s Venn diagram, shown in Figure 1.

**Figure 1 – The Data Science Venn Diagram (Conway, 2013)**

In Conway’s conceptualisation, "Hacking Skills" essentially describes the programming and other technology-based competencies needed to obtain and manipulate data. However, he notes that a degree in Computer Science is not a prerequisite; what matters is the ability to "manipulate text files at the command-line, understanding vectorized operations, thinking algorithmically" (Conway, 2013). OR specialists will be well used to thinking "algorithmically", but command-line coding and vectorisation are not typically associated with the discipline.

"Math & Statistics Knowledge" is far more expected of OR specialists. The discipline’s focus on applied mathematical processes means that such individuals will be well equipped to manage this aspect of a data scientist’s brief. However, the cross-over area between this and "Hacking Skills" ("Machine Learning") is less typical, and is something an OR graduate seeking a data scientist role may wish to develop. Data mining and OR do have many synergies (see for example Brown *et al*, 2011; Corne *et al*, 2012), but most OR degree programmes do not have a strong focus on this area.

The third piece is "Substantive Expertise". Conway describes this section by arguing that "science is about discovery and building knowledge, which requires some motivating questions about the world and hypotheses that can be brought to data and tested". The implication is that sufficient knowledge of the broader disciplines (and, we could add, of the domain) is critical in identifying the right questions to ask and the right experiments to design. OR graduates would undoubtedly have experience with academic materials (also making them competent in the field of "Traditional Research"), and the discipline has produced a significant body of work on problem structuring models, elicitation and workshop facilitation, all aimed at better understanding the problems and questions that matter most.

The remaining overlap area is the "Danger Zone". Conway sees this as the potentially dangerous combination of people who can extract and model the data and know what they are trying to achieve, but who may not understand the mathematics and statistics rigorously enough to ensure that their tests are appropriate and accurate. In this scenario there is clear potential to reach sub-optimal or misleading recommendations. OR specialists are less likely to fall into this category, as the mathematical demands of the subject should make them well aware of the relevant issues and pitfalls.

Overall, we can conclude that OR specialists would fit many of the criteria asked of a data scientist. The discipline’s strong focus on mathematical methods and decision making, knowledge elicitation and problem structuring means they should have genuine competencies in both the "Math & Statistics Knowledge" and "Substantive Expertise" areas. However, the "Hacking" area is one not normally covered in OR training.

Whilst this is clearly an important part of Conway’s conceptualisation, there is one significant advantage for any OR specialist: learning the programming and data structures required may be far easier and cheaper than learning either of the other two areas. There are many free online resources for learning languages such as Python, Pig or PHP (typically used in extracting and manipulating web data), and commonly used applications such as R, Hadoop, MongoDB or CouchDB are open-source and free to download.

So whilst OR as an academic discipline may not provide all the skills and competencies required of a data scientist (at least in Conway's description), it does provide the skills that are not easily acquired outside of formal education. OR specialists bring the quantitative and substantive skills which, if combined with programming, "hacking" and technological skills, can produce graduates who are competent in these roles and able to earn the high salaries they command.

*For more information about the free courses and tools available online in programming, software and big data architecture, please see our weblinks section.*

*Special thanks to Drew Conway for allowing us to use his diagram in this article.*

**REFERENCES**

Brown D *et al* (2011). Future Trends in Business Analytics and Optimization. *Intelligent Data Analysis*, **15:** 1001-1017.

Conway D (2013). *The Data Science Venn Diagram*, [Online]. Available from: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram, [accessed April 2013].

Corne D, Dhaenens C and Jourdan L (2012). Synergies between Operations Research and Data Mining: The Emerging use of Multi-Objective Approaches. *European Journal of Operational Research*, **221:** 469-479.

Davenport T and Patil D (2012). Data Scientist: The Sexiest Job of the 21st Century. *Harvard Business Review*, **90:** 70-76.

EMC (2012). *Data Science Revealed: A Data-Driven Glimpse into the Burgeoning New Field*, [Online]. Available from: http://uk.emc.com/collateral/about/news/emc-data-science-study-wp.pdf, [accessed March 2013].

Liu L, Zhang H, Li J, Wang R, Yu L, Yu J and Li P (2009). Building a Community of Data Scientists. *Data Science Journal*, **8:** 201-208.

Miller CC (2013). *Data Science: The Numbers of Our Lives*, [Online]. The New York Times. Available from: http://www.nytimes.com/2013/04/14/education/edlife/universities-offer-courses-in-a-hot-new-field-data-science.html?pagewanted=all&_r=1& [accessed April 2013].