The Future of “Big Data” Engineers
Data scientists have been the statistical wizards of the software world for years. They are the ones who work with data sets, do feature engineering, and build models. Companies need these high-demand professionals to get into the world of “Big Data”. More recently, companies have added data engineers to their ranks, because their skills complete the chain of work that data scientists depend on: data engineers manipulate, transform, and clean the raw data so that data scientists can use it. Demand for professionals with these skills far outstrips supply, and companies are having to get creative and re-imagine job objectives and requirements.
For many companies, the ideal ratio is two data engineers for every data scientist, which is nearly impossible to achieve in the current job market. Because supply falls short of demand, those who have the skill set command high pay, with starting salaries of $100,000 and above. Companies are realizing that they are paying not just to save time, but also for expert assurance that nothing wrong goes unnoticed. A company that has data scientists but lacks the engineers it needs will not get full use of the data it collects. The typical issue is that a data scientist can build an algorithm in a development environment but cannot run it on a cluster against a large data set. Someone else needs to create the tools that don’t already exist, which is why the role of the data engineer is essential.
Organizations often assume they will pick up data engineering experience as they work their way through a project, but they’re usually wrong. In response to the shortage, companies have started looking for a completely new type of engineer: the machine learning engineer, a cross between a data engineer and a data scientist. Companies looking for machine learning engineers want someone who is good not only at the data science aspects of machine learning but also at building and running systems. These engineers need specific, hard-earned, on-the-ground experience building data pipelines, data management systems, data analytics, and all of the intermediate code that makes the data available and accessible. They must also ensure that the data is correct. Desperate companies hope that combining the two specialties will solve the shortage problem and streamline the work to be done. Only time will tell.