July 31, 2016
Terminology: Data scientists vs. data engineers
I learned some newish terms on my recent trip. They’re meant to solve the problem that “data scientists” used to be folks with remarkably broad skill sets, few of whom actually existed in ideal form. So instead now it is increasingly said that:
- “Data engineers” can code, run clusters, and so on, in support of what’s always been called “data science”. Their knowledge of the math of machine learning/predictive modeling and so on may, however, be limited.
- “Data scientists” can write and run scripts on single nodes; anything more on the engineering side might strain them. But they have no-apologies skills in the areas of modeling/machine learning.
Related link
- I raised concerns about the “data science” term 4 years ago.
Categories: Predictive modeling and advanced analytics
Comments
One Response to “Terminology: Data scientists vs. data engineers”
In my world, *data engineers* are generally app-side developers who make at-scale data consumable in some form. This is generally a nod to big data or newer apps. They may be DBAs or devops staff. So “data wrangling” for machine learning is often done by data engineers, but a MySQL admin at FB will likely call themselves a data engineer. Data engineers are generally not doing supervised machine learning; data scientists are.
“Data scientists” generally have a goal to elucidate NEW models or insights from data using techniques based on machine learning. This is generally distinct from BI (which is mostly representation of data or applying existing models such as KPIs or scoring). Data science requires a combination of smarts with quantitative modelling skills, tool knowledge, and deep business understanding. This is a quant skill with business alignment. (Some data science tools, such as some FOSS versions of R, are limited to a single node, perhaps leading to your description.)