July 31, 2016

Terminology: Data scientists vs. data engineers

I learned some newish terms on my recent trip. They’re meant to solve the problem that “data scientists” used to be folks with remarkably broad skill sets, few of whom actually existed in ideal form. So instead now it is increasingly said that:

“Data engineers” can code, run clusters, and so on, in support of what’s always been called “data science”. Their knowledge of the math of machine learning/predictive modeling and so on may, however, be limited.
“Data scientists” can write and run scripts on single nodes; anything more on the engineering side might strain them. But they have no-apologies skills in the areas of modeling/machine learning.

Related link

I raised concerns about the “data science” term 4 years ago.

Categories: Predictive modeling and advanced analytics


Comments

One Response to “Terminology: Data scientists vs. data engineers”

Aaron on August 10th, 2016 3:01 pm

In my world, *data engineers* are generally app-side developers who make at-scale data consumable in some form. This is generally a nod to big data or newer apps. They may be DBAs or devops staff. So "data wrangling" for machine learning is often done by data engineers, but a MySQL admin at FB will likely call themselves a data engineer too. Data engineers are generally not doing supervised machine learning; data scientists are.

“Data scientists” generally have a goal to elucidate NEW models or insights from data using techniques based on machine learning. This is generally distinct from BI (which is mostly representation of data or applying existing models such as KPIs or scoring). Data science requires a combination of smarts with quantitative modelling skills, tool knowledge, and deep business understanding. This is a quant skill with business alignment. (Some data science tools, such as some FOSS versions of R, are limited to a single node, perhaps leading to your description.)



Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers