Analysis: Five career paths to breaking into data science

Translator: Lu Miaomiao, Liang Fuqi; Proofreader: Lu Yanqin; Author: Matthew Mayo

Original link: http://www.kdnuggets.com/2017/02/5-career-paths-data-science-big-data-explained.html

Length: 4,970 words; suggested reading time: 6 minutes.

This article offers advice on how to actually get started on a data science and/or big data career path.


I have been contacted by a number of people recently (mostly via LinkedIn) for advice on getting started with data science and/or big data. These people are generally interested in getting into this "field" and need some guidance on how to get there.

 

With all due respect, however, most of these requests imply that the person asking does not really understand what they are asking for. That's fine: whatever you are learning, everyone needs to start somewhere. Rather than answering these similar questions one by one, this post lays out some basic concepts related to data science and/or big data career paths and, hopefully, provides some insight into how to actually get involved in this complex field.


Preparatory Reading

 

Before we go any further, read these articles. Seriously: read these articles.

 

  • The Data Science Puzzle, Explained

    (http://www.kdnuggets.com/2016/03/data-science-puzzle-explained.html)

  • The Data Science Puzzle, Revisited

    (http://www.kdnuggets.com/2017/01/data-science-puzzle-revisited.html)

  • Big Data and Data Science, Explained

    (http://www.kdnuggets.com/2016/11/big-data-data-science-explained.html)

  • Predictive Science vs Data Science

    (http://www.kdnuggets.com/2016/11/predictive-science-vs-data-science.html)

 

The first article gives an overview of the dominant concepts in data science, and the second, published earlier this year, updates them. The third digs deeper into the concepts behind data science and big data. The final article takes a brief look at the complexities and subtleties of the term "data science" in relation to some other terms.

 

I've broken the many career possibilities down into five manageable paths. Plenty of people may object strongly to this division of roles, but it does provide a high-level categorization of skills and professional responsibilities. I therefore believe that what follows can help newcomers navigate the bewildering array of opportunities in this professional field.


Figure: A rough breakdown of analytical careers


Data Management Professional


This is essentially an IT career, similar to that of a database administrator. Data management professionals are concerned with managing data and the infrastructure that supports it. The role has little to do with data analysis, and languages like Python and R are not required. SQL is likely to be used, along with Hadoop-related query languages such as Hive and Pig.

 

Key technologies and skills to focus on:

  • Apache Hadoop and its ecosystem

  • Apache Spark and its ecosystem

  • SQL and relational databases

  • NoSQL databases

 

Further reading:

  • Big Data Key Terms, Explained

    (http://www.kdnuggets.com/2016/08/big-data-key-terms-explained.html)

  • Database Key Terms, Explained

    (http://www.kdnuggets.com/2016/07/database-key-terms-explained.html)

  • Hadoop Key Terms, Explained

    (http://www.kdnuggets.com/2016/05/hadoop-key-terms-explained.html)

  • Apache Spark Key Terms, Explained

    (http://www.kdnuggets.com/2016/06/spark-key-terms-explained.html)

  • Cloud Computing Key Terms, Explained

    (http://www.kdnuggets.com/2016/06/cloud-computing-key-terms-explained.html)

  • Seven Steps to Understanding NoSQL Databases

    (http://www.kdnuggets.com/2016/07/seven-steps-understanding-nosql-databases.html)

  • Seven Steps to Mastering the SQL You Need for Data Science

    (http://www.kdnuggets.com/2016/06/seven-steps-mastering-sql-data-science.html)


Data Engineer


This is the non-analytics big data career path. Remember the data infrastructure mentioned in the previous career path? It needs to be designed and implemented, and data engineers do that part of the job. If a data management professional is an auto mechanic, then a data engineer is an automotive engineer. Make no mistake, though: both roles are critical to keeping your car running and getting you from point A to point B.


To be honest, the technologies and skills required of data engineers and data management professionals are similar; the two roles simply understand and use the same concepts at different levels. I won't repeat the items listed for the previous career path (all of which matter for data engineers), but I'll add some further reading specifically for data engineers.

 

Further reading:

  • Top NoSQL Database Engines

    (http://www.kdnuggets.com/2016/06/top-nosql-database-engines.html)

  • Top Big Data Processing Frameworks

    (http://www.kdnuggets.com/2016/03/top-big-data-processing-frameworks.html)

  • Top Spark Ecosystem Projects

    (http://www.kdnuggets.com/2016/03/top-spark-ecosystem-projects.html)

  • Hadoop and Big Data: Answers to the Top Six Questions

    (http://www.kdnuggets.com/2016/01/hadoop-and-big-data-questions.html)

  • Why data scientists and data engineers need to understand virtualization in the cloud

    (http://www.kdnuggets.com/2017/01/data-scientist-engineer-understand-virtualization-cloud.html)


Business Analyst


In this article, business analyst refers to a role centered on data analysis and data presentation, including reports, dashboards, and anything else called "business intelligence". The role often requires interacting with (or querying) relational and non-relational databases as well as big data frameworks.

 

While the first two roles deal with designing the infrastructure that manages the data and with actually managing the data, the business analyst is primarily concerned with extracting information from the data more or less as it stands. This contrasts with the following two roles (machine learning researcher/practitioner and data-oriented professional), both of which focus on drawing insight from the data beyond what it shows at face value. Business analysts therefore need a skill set distinct from the other roles presented here.
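To make "extracting information from the data as it stands" concrete, here is a minimal sketch of the kind of summary query that sits behind a report or dashboard. It uses Python's built-in sqlite3 module as a stand-in for a real database; the `sales` table, its columns, and the figures are all hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical monthly sales figures; table and column names are illustrative only.
ROWS = [
    ("north", "2017-01", 120.0),
    ("north", "2017-02", 135.5),
    ("south", "2017-01", 80.0),
    ("south", "2017-02", 95.25),
    ("west", "2017-01", 60.0),
]


def regional_report(rows):
    """Return (region, months_reported, total_revenue), highest revenue first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, month TEXT, revenue REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            """
            SELECT region, COUNT(*) AS months, SUM(revenue) AS total
            FROM sales
            GROUP BY region
            ORDER BY total DESC
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


report = regional_report(ROWS)
for region, months, total in report:
    print(f"{region:<6} {months:>3} {total:>10.2f}")
```

Nothing here predicts or models anything: the query only aggregates and orders what is already stored, which is the distinction the paragraph above draws between this role and the two that follow.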

 

Key technologies and skills to focus on:

  • SQL and relational databases

  • NoSQL databases

  • Commonly used business reporting and dashboard packages

  • Ad hoc reporting: reports have no fixed format, so the key is getting up to speed quickly with whatever tool is in use

  • database


Further reading:

  • 10 Business Intelligence Trends for 2016

    (http://www.kdnuggets.com/2015/12/10-business-intelligence-trends-2016.html)

  • Embedded Analytics: The Future of Business Intelligence

    (http://www.kdnuggets.com/2016/09/embedded-analytics-future-business-intelligence.html)

  • Build vs. Buy: Analytics Dashboards

    (http://www.kdnuggets.com/2016/07/build-buy-analytics-dashboards.html)


Machine Learning Researcher/Practitioner


Machine learning researchers and practitioners are the people who build and use the predictive and correlative tools that put data to work. Machine learning algorithms allow statistical analysis to be applied at high speed, and those who wield these algorithms are not content to let the data speak for itself in its current form. Interrogating the data is how the machine learning proponent works, but with enough statistical understanding to know when they have pushed too far and when the answers provided are not to be trusted.
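One common way to tell whether a model's answers can be trusted is to compare its error on the data it was trained on with its error on data it has never seen. The sketch below is an illustration under stated assumptions, not a method taken from this article: the data is synthetic (a noisy line, generated with a fixed seed), and a 1-nearest-neighbor predictor is used only because it memorizes its training data so visibly.

```python
import random


def make_data(n, rng):
    """Noisy samples of y = 2x; the relationship is an assumption for illustration."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 3)) for x in xs]


def nearest_neighbor_predict(train, x):
    """Predict y by copying the y of the closest training x (1-nearest-neighbor)."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def mean_squared_error(train, data):
    """Average squared error of the 1-NN predictor built on `train`, scored on `data`."""
    errors = [(nearest_neighbor_predict(train, x) - y) ** 2 for x, y in data]
    return sum(errors) / len(errors)


rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_data(50, rng)
held_out = make_data(50, rng)

train_mse = mean_squared_error(train, train)
held_out_mse = mean_squared_error(train, held_out)
print(f"training MSE: {train_mse:.2f}, held-out MSE: {held_out_mse:.2f}")
```

The training error is zero because each training point is its own nearest neighbor, yet the held-out error is not. A perfect score on data the model has already seen says nothing about whether its answers deserve trust.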

 

Statistics and programming are the greatest assets of machine learning researchers and practitioners.

 

Key technologies and skills to focus on:

  • Statistics!

  • Algebra and Calculus (intermediate level for practitioners, advanced level for researchers)

  • Programming skills: Python, C++ or some other general purpose language

  • Learning theory (intermediate level for practitioners, advanced level for researchers)

  • Understand the inner workings of machine learning algorithms (the more algorithms, the better!)


Further reading:

  • Machine Learning vs. Statistics

    (http://www.kdnuggets.com/2016/11/machine-learning-vs-statistics.html)

  • Machine Learning Key Terms, Explained

    (http://www.kdnuggets.com/2016/05/machine-learning-key-terms-explained.html)

  • 7 Steps to Mastering Machine Learning with Python

    (http://www.kdnuggets.com/2015/11/seven-steps-machine-learning-python.html)

  • 5 Free Ebooks to Read Before Starting a Machine Learning Career

    (http://www.kdnuggets.com/2016/10/5-free-ebooks-machine-learning-career.html)

  • Machine Learning Algorithms: A Short Technical Overview

    (https://www.linkedin.com/pulse/machine-learning-algorithms-concise-technical-overview-matthew-mayo)

  • 10 Algorithms Machine Learning Engineers Need to Know

    (http://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html)

  • The Great Algorithm Tutorial Roundup

    (http://www.kdnuggets.com/2016/09/great-algorithm-tutorial-roundup.html)

  • 10 Data Mining Algorithms

    (http://www.kdnuggets.com/2015/05/top-10-data-mining-algorithms-explained.html)

  • 15 Math MOOCs for Data Science

    (http://www.kdnuggets.com/2015/09/15-math-mooc-data-science.html)

 

Data-Oriented Professional


This is the best description I can come up with for what can be called a "real" data scientist. You know, unicorns. Except, there are no unicorns, and anyone who says otherwise is lying.

 

Data management professionals and data engineers focus on data infrastructure. Business analysts focus on extracting facts from data. Machine learning researchers and practitioners are concerned with advancing and using the tools that leverage data for predictive and correlative analysis. Both of these roles (researcher and practitioner) are algorithm-centric: they develop algorithms, apply them, or both. Data-oriented professionals focus primarily on the data itself and the story it can tell, rather than on the techniques or tools needed to perform the task.


Data-oriented professionals may use any of the technologies listed for any of the roles above, depending on their specific responsibilities. This is one of the biggest problems with "data science": the term has no specific practical meaning, yet taken as a whole it covers everything. This role is the jack-of-all-trades of the data world: knowing (probably) how to get a Hadoop ecosystem up and running; how to execute queries on the data stored in it; how to extract data and load it into a non-relational database; how to take non-relational data and extract it to a flat file; how to wrangle that data in R or Python; how to engineer features after an initial exploratory descriptive analysis; how to run a predictive analysis on the data; how to statistically analyze the results of the predictive task; how to visualize the results so that non-technical people can use them; and how to use the end result of the data processing pipeline just described to tell managers a convincing story.
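The pipeline described above can be sketched end to end in a few lines, under heavy simplifying assumptions. Everything specific below is hypothetical: an in-memory SQLite table stands in for the data store, an in-memory CSV buffer stands in for the flat file, the `listings` table and its numbers are invented, and a plain least-squares line stands in for whatever predictive method a real project would choose.

```python
import csv
import io
import sqlite3

# 1. Query: pull raw records from a data store (hypothetical `listings` table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (width REAL, depth REAL, price REAL)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [(2, 3, 19), (4, 5, 61), (3, 3, 28), (5, 6, 91), (1, 2, 7)],
)
rows = conn.execute("SELECT width, depth, price FROM listings").fetchall()
conn.close()

# 2. Extract to a flat file (an in-memory CSV buffer instead of a file on disk).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["width", "depth", "price"])
writer.writerows(rows)

# 3. Load the flat file and engineer a feature: area = width * depth.
buffer.seek(0)
records = list(csv.DictReader(buffer))
areas = [float(r["width"]) * float(r["depth"]) for r in records]
prices = [float(r["price"]) for r in records]


# 4. Predictive step: ordinary least squares for price = slope * area + intercept.
def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# 5. Statistical check: R^2, the share of price variance the line explains.
def r_squared(xs, ys, slope, intercept):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


slope, intercept = fit_line(areas, prices)
r2 = r_squared(areas, prices, slope, intercept)

# 6. Tell the story in one plain sentence for a non-technical reader.
print(f"Each extra unit of area adds about {slope:.2f} to the price (R^2 = {r2:.3f}).")
```

The invented prices were generated as exactly 3 × area + 1, so the fit is essentially perfect here; real data would not be so tidy, which is where the statistical judgment the role calls for comes in.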

 

These are just some of the skills a data scientist might have. In any case, the point of this role is the data and what can be gained from it. Domain knowledge also carries a lot of weight in this role, and that is obviously not something that can be taught here.

 

Key technologies and skills to focus on:

  • Statistics

  • Programming languages: Python, R, SQL

  • Data visualization

  • Communication skills


Further reading:

  • R Learning Path: From Beginner to Expert in 7 Steps

    (http://www.kdnuggets.com/2016/03/datacamp-r-learning-path-7-steps.html)

  • Introduction to Data Science: Basic Concepts for Beginners

    (https://www.linkedin.com/pulse/data-science-primer-basic-concepts-beginners-matthew-mayo)

  • Statistics for Data Science 101

    (http://www.kdnuggets.com/2016/07/data-science-statistics-101.html)

  • What statistical topics are needed to excel in data science?

    (http://www.kdnuggets.com/2016/08/statistics-topics-needed-excelling-data-science.html)

  • Top algorithms and methods used by data scientists

    (http://www.kdnuggets.com/2016/09/poll-algorithms-used-data-scientists.html)

 

Because this is an introductory article, I have intentionally left out the Internet of Things (IoT). There are two reasons: first, I don't want to add confusion for people trying to absorb all this new information; second, IoT data is just a special case of data. With some refinements, these roles can all be applied to IoT data, but the essence stays the same.

 

I hope this introduction will be helpful to those who want to pursue a career in "data science" or "big data" but don't know where or how to start. Remember that none of the descriptions here is exhaustive for any of the roles mentioned. It is, however, a good starting point for someone who doesn't know much about the data professions.

 

If you're interested in a different take on the topic, read Zachary Lipton's "Will the Real Data Scientists Please Stand Up?" (http://www.kdnuggets.com/2015/05/data-science-machine-learning-scientist-definition-jargon.html).


About the translators:

Lu Miaomiao: Currently studying English at Beijing Language and Culture University. A liberal arts student with a scientific mindset who loves thinking and analysis, has plenty of big ideas, and likes to find hidden connections in complex things. Likes to look up at the stars while keeping both feet on the ground. As an active member of the Data Pi community, hopes to learn from everyone.

Liang Fuqi: Undergraduate in software engineering, majoring in big data analysis, who likes to search for and collect all kinds of information. Hopes to meet more friends interested in data analysis on the THU Data School platform and to study how to mine useful models and information from data.

This article is reposted from the Data Pi THU public account.
