DS/ML: Translation and Commentary on "Top 19 Skills You Need to Know in 2023 to Be a Data Scientist"

Link: Top 19 Skills You Need to Know in 2023 to Be a Data Scientist - KDnuggets

Date: April 5, 2023

Author: Nate Rosidi, KDnuggets

An overview of the ten most important skills.

Times are changing. If you want to be a data scientist in 2023, there are several new skills you should add to your roster, as well as the slew of existing skills you should have already mastered.

Why such an extensive set of skills? Part of the problem is job scope creep. Nobody knows what a data scientist is, or what one should do, least of all your future employer. So anything that has data gets stuck in the data science category for you to deal with.

You’re expected to know how to clean, transform, statistically analyze, visualize, communicate, and predict data. Not only that but new technology (or technology that has recently reached the mainstream) could also be added to your job responsibilities.

In this article, I’ll break down the top 19 skills you need to know in 2023 to be a data scientist.

Here’s an overview of the ten most important skills.

| Skill | What | Why is it important? |
| --- | --- | --- |
| Big Data Processing | Processing, storing, and analyzing large amounts of data with Hadoop, Spark, or similar. | The volume of data that needs to be processed keeps increasing. |
| Cloud Computing | Using cloud-based technologies to store and process data. | A cost-efficient and scalable solution for dealing with the increasing amount of data, which many companies prefer over investing in on-site hardware. |
| SQL & Database Management | Organizing, storing, and accessing data stored in databases. | Keeping track of all the data you need and retrieving it when required. |
| Data Warehousing & ETL | Extracting data from different sources, transforming it, and loading it into a data warehouse. | Increasing volume of data, number of data sources, and number of data users. |
| Data Modeling & Management | Creating models representing data, which involves entities, relationships, and attributes, as well as processes for data validation, integrity, and security. | Ensuring data is organized, structured, and shared in an adequate manner. |
| Data Cleaning and Wrangling | Transforming raw data into a format that can be used for analysis. It deals with missing values, duplicates, inconsistent data, and formatting data. | Making your dataset reliable and your analysis trustworthy. |
| Data Mining | Extracting useful information from data through clustering, classification, and association rules. | Identifying patterns across large sets of data. |
| Data Visualization | Visual representation of findings in the form of graphs and charts. | Communicating with stakeholders, most of whom have no technical background. |
| Machine Learning | The application of algorithms and statistical models to make predictions and decisions based on data. | A growing field used in various industries to solve an increasing number of business problems, such as image classification, speech recognition, and building recommendation systems. |
| Deep Learning | Creating algorithms that can learn patterns in data through multiple layers of artificial neural networks. | Processing vast amounts of data, especially if it's unstructured. |

1. Big Data Processing

What is Big Data Processing?
Why Does It Matter in Becoming a Data Scientist in 2023?
Where Can You Learn it?

Big data is a buzzword, yes, but it’s also a real concept - Oracle defines it as “data that contains greater variety, arriving in increasing volumes and with more velocity,” or data with the three V’s.

Big data processing is the ability to process, store, and analyze large amounts of data using technologies like Hadoop and Spark.

In 2023, the ability to process big data is critical for data scientists. The volume of data being generated continues to grow at an exponential rate, and being able to handle and analyze this data effectively is essential for making informed decisions and gaining valuable insights. Data scientists who have a deep understanding of big data processing techniques will be able to work with large data sets with ease and make the most out of the information they contain.

Also, thanks to its buzz-wordiness, it never hurts to whack “big data” on your resume.

I love Simplilearn’s YouTube tutorial series on this concept.

BigData: An introduction to big data development and its core knowledge (Linux basics + Java/Python + distributed systems (Hadoop/Hive, HBase, MongoDB, Spark, Storm, MaxCompute + Docker)): a detailed guide

https://yunyaniu.blog.csdn.net/article/details/107577570
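
The split, process, and combine pattern behind tools like Hadoop and Spark can be sketched in plain Python. This is only an illustration of the map and reduce steps on a toy in-memory dataset. It does not use the Hadoop or Spark APIs, and the `partitions` data and the function names are made up for the example.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Map step: count the words inside one partition of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


def word_count(partitions):
    # Each partition is processed independently, so on a real cluster
    # this step could run on different machines at the same time.
    partial_counts = [map_partition(p) for p in partitions]
    return reduce(merge_counts, partial_counts, Counter())


# Hypothetical data, already split into two partitions.
partitions = [
    ["big data is big", "data arrives fast"],
    ["data has variety"],
]
totals = word_count(partitions)
print(totals.most_common(2))  # [('data', 3), ('big', 2)]
```

Because each partition is counted on its own and the partial results are merged afterwards, the same logic keeps working when the data no longer fits on one machine, which is the core idea the big data frameworks build on.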

2. Cloud Computing

What is Cloud Computing?
Why Does It Matter in Becoming a Data Scientist in 2023?
Where Can You Learn It?

It’s funny – as more products and services move into the cloud, cloud computing becomes a job requirement for pretty much every techy job, whether it’s DevOps or a data scientist.

Cloud computing is the use of cloud-based technologies and platforms like AWS, Azure, or Google Cloud to store and process data. It’s kind of like having a virtual storage room that you can access from anywhere at any time. Instead of storing data and computing resources on local machines or servers, cloud computing allows organizations – and data scientists – to access these resources through the internet.

As I keep highlighting, the amount of data you’re expected to work with as a data scientist is growing. More companies will be sticking it in the cloud rather than dealing with it on-prem. It's becoming increasingly important to have the ability to store and process this data in a scalable and efficient manner.

Cloud computing provides an effective solution for this, allowing data scientists to access vast amounts of computing resources and data storage without needing pricy hardware and infrastructure.

The good news is that, because each cloud is owned by a company, many of those companies have a vested interest in teaching you cloud computing for free, so that you learn to use theirs. Google, Microsoft, and Amazon all have great cloud computing resources.

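Cloud Computing: An introduction to cloud computing, why it is necessary, and how secure cloud migration relates to enterprise digital transformation (the future of cloud computing as seen through the TV drama 《输赢》, starring Chen Kun and Xin Zhilei)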

https://yunyaniu.blog.csdn.net/article/details/123418987

3. SQL & Database Management

What is SQL and Database Management?
Why Does It Matter in Becoming a Data Scientist in 2023?
Where Can You Learn This Key Skill?

SQL stands for Structured Query Language. Data scientists use SQL to query SQL databases, manage them, and perform data storage tasks.

SQL is a very popular language that lets you access and manipulate structured data. It goes hand in hand with database management, which is commonly done in SQL. Database management is basically how you can organize, store, and fetch data from a place.  SQL databases are one of the top backend technologies to learn in 2023, so it’s not just for data science.

As a data scientist, you have to keep track of all the data, make sure it's organized, and retrieve it when someone needs it. That’s what SQL and database management let you do.

Coursera has a ton of great, well-priced database management/admin courses you can try. You can also get a sneak preview of some SQL interview questions here, which can be useful for testing your knowledge.

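DBMS/Database: An introduction to database management, installation (with caveats), and a learning path (a deep dive into SQL statements based on SQL Server, from beginner to intermediate to advanced, with dozens of code examples): a detailed guide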

https://yunyaniu.blog.csdn.net/article/details/107589721

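Database: An introduction to the five major database types (NDBMS/HDBMS/OODBMS/RDBMS/NoSQL) and a comparison of the two mainstream kinds, relational vs. non-relational databases (storage method, storage structure, storage conventions, and so on): a detailed guide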

https://yunyaniu.blog.csdn.net/article/details/107588917
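
To make "organize, store, and fetch" concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `orders` table and its rows are invented for the example. Any SQL database would follow the same create, insert, and query steps, though dialects differ in details.

```python
import sqlite3

# An in-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")

# Organize: define the structure of the data.
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer TEXT NOT NULL,"
    " amount REAL NOT NULL)"
)

# Store: insert a few hypothetical rows using parameterized queries.
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)],
)

# Fetch: total spend per customer, largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total"
    " FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
print(rows)  # [('alice', 50.0), ('bob', 12.5)]
conn.close()
```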

4. Data Warehousing & ETL

What Are Data Warehousing and ETL?
Why Does It Matter in Becoming a Data Scientist in 2023?
Where Can You Learn It?

“Wait, didn’t we just cover databases? What’s a data warehouse?” I hear you ask.

I get you. Sometimes it feels like the most critical data science skill is keeping all the acronyms and jargon straight.

First, let’s differentiate data warehouses from databases.

A database stores the current data required to power a single application or project, whereas a data warehouse stores current and historical data for one or more systems, in a predefined and fixed schema, so that the data can be analyzed.

In short, you’d use a data warehouse for data for lots of different projects together, whereas a database mostly stores one single project’s data.

ETL, short for extract, transform, and load, is a process used in data warehousing. An ETL tool will extract data from any data source systems you want, transform it in the staging area (usually cleaning, manipulating, or “munging” it), and then load it into a data warehouse.

I feel like I’ve repeated this point in every skill, but data is growing. Companies are hungry for it, and they’ll expect you to manage it. Knowing how to manage data in buildable pipelines is critical.

I recommend learning how to do a proper ETL with a specific language, like SQL or Python. DataCamp has a good course that uses Python. Microsoft offers a more intermediate-level tutorial that walks through a SQL option.

BigData/ETL: An introduction to data warehouses and ETL (Extract, Transform, Load), common tools, and applications: a detailed guide

https://yunyaniu.blog.csdn.net/article/details/124656886

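Bigdata: Alibaba Cloud certification training, "Big Data Development Engineer | Data Middle Platform L2 Training": introduction, course outline, and learning content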

https://yunyaniu.blog.csdn.net/article/details/127276338

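Bigdata: Alibaba Cloud certification training, "Big Data Development Engineer | Data Middle Platform L2 Training": Hologres (interactive analytics, a real-time data warehouse with second-level latency): introduction, application scenarios, computational storage, typical business scenarios, typical interactive-analytics scenarios, system architecture, and technical highlights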

https://yunyaniu.blog.csdn.net/article/details/127276953
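
The extract, transform, and load steps described in this section can be sketched as three small Python functions. This is a toy pipeline under stated assumptions: the CSV text, the `fact_sales` table, and the choice to drop rows with missing revenue are all invented for illustration, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system.
RAW_CSV = """date,store,revenue
2023-01-01,north, 100.50
2023-01-01,south,
2023-01-02,north,80
"""


def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without revenue and normalize the rest."""
    cleaned = []
    for row in rows:
        revenue = row["revenue"].strip()
        if not revenue:
            continue  # skip rows with a missing revenue value
        cleaned.append((row["date"], row["store"].strip().upper(), float(revenue)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales"
        " (sale_date TEXT, store TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute(
    "SELECT COUNT(*), SUM(revenue) FROM fact_sales"
).fetchone()
print(loaded)  # (2, 180.5)
```

Keeping the three steps as separate functions makes each one easy to test and swap out, which is a common reason pipelines are structured this way.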

5. Data Modeling & Management

What is Data Modeling And Management?
Why Does It Matter in Becoming a Data Scientist in 2023?
Where Can You Learn It?

Every data scientist is a model specialist. I’m not talking about Gisele Bündchen. I mean creating a model of how data is stored and organized in a system.

(Translator’s note: Gisele Bündchen is a famous Brazilian supermodel; the author is joking about the other kind of model that data scientists create, the data model.)

Data modeling and management is the process of creating mathematical models to represent data, as well as the management of data to maintain its quality, accuracy, and usefulness.

This involves defining data entities, relationships, and attributes, as well as implementing processes for data validation, integrity, and security.

In simpler terms, data modeling basically means you’re creating a blueprint for how data is organized and connected in your employer’s systems. You can think of it like drafting a blueprint of a house. Just like a blueprint shows the different rooms and how they're connected, data modeling shows how different pieces of information are related and connected to each other.

This helps ensure that data is stored and used in a consistent and effective way.

As a data scientist, you’ll be responsible for making sure data is organized and structured in an accessible way. Data modeling and management help you work with data, share it, make sure it’s accurate, and make decisions based on it.

Microsoft has a good intro on their blog, just half an hour long and highly rated. It’s a good place to start.

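AI&BigData: Digital intelligence and the data middle platform (Alibaba's OneData): an overview of data model design (2-3-4) and standards (4-6-10: four conventions and six layering principles for model design, plus ten development standards): a detailed guide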

https://yunyaniu.blog.csdn.net/article/details/129106195

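AI&BigData: Digital intelligence and the data middle platform (Alibaba's OneData): introduction (explained with an insurance company as the example), technical architecture, and usage: a detailed guide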

https://yunyaniu.blog.csdn.net/article/details/123990924

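AI&BigData: Digital intelligence: internal class notes on "Digital Transformation of the Insurance Industry in the AI Era at an Insurance Company", covering data-driven operations, the data middle platform construction plan, the data intelligence blueprint, standards, problems and directions, and the current state of the systems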

https://yunyaniu.blog.csdn.net/article/details/120230386
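
One way to see entities, relationships, attributes, and integrity rules working together is a small relational schema. The sketch below uses Python's built-in `sqlite3` module. The `customer` and `policy` tables are hypothetical and chosen only for illustration, and other modeling approaches (for example dimensional models) would look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    -- Entity: customer, with its attributes.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE
    );
    -- Entity: policy. The foreign key models the relationship
    -- "each policy belongs to exactly one customer".
    CREATE TABLE policy (
        policy_id   INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        premium     REAL NOT NULL CHECK (premium > 0)  -- validation rule
    );
    """
)

conn.execute("INSERT INTO customer (customer_id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO policy (customer_id, premium) VALUES (1, 250.0)")

# Integrity: a policy that points at a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO policy (customer_id, premium) VALUES (99, 100.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```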

6. Data Cleaning and Wrangling

What is Data Cleaning and Wrangling?
Why Does it Matter in Becoming a Data Scientist in 2023?
Where Can You Learn This Key Skill?

While it’s not 80% of a data scientist’s job, data cleaning and wrangling is still one of the most important skills a data scientist can master in 2023.

Data cleaning and wrangling are the processes of transforming raw data into a format that can be used for analysis. This involves handling missing values, removing duplicates, dealing with inconsistent data, and formatting the data in a way that makes it ready for analysis.

Cleaning the data usually refers to getting rid of bad/inaccurate values, filling in any blanks, finding duplicates, and otherwise making sure your data set is as spotless and reliably accurate as can be expected. Wrangling it (or munging it, massaging it, or any other weird verb like that) means getting it into an analyzable shape. You convert it or map it into another, easier-to-look-at format.

Ask any data scientist what they do, and one of the first things they mention will be data cleaning and wrangling. Data never comes into your hands in a nice, clean, analyzable shape, so it’s super important to know how to get it tidy.

The ability to clean and wrangle data ensures that your analysis results are trustworthy, and helps to avoid incorrect conclusions being drawn.
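
The four problems named in this section (inconsistent data, formatting, duplicates, and missing values) can each be handled in a few lines. The sketch below uses only the Python standard library. The records are made up, and filling missing ages with the median is just one possible strategy, picked here as an assumption rather than a recommendation.

```python
from statistics import median

# Hypothetical raw records with typical problems.
raw = [
    {"name": " Alice ", "city": "new york", "age": "34"},
    {"name": "Bob", "city": "New York", "age": ""},        # missing age
    {"name": "alice", "city": "NEW YORK", "age": "34"},    # duplicate of row 1
    {"name": "Carol", "city": "boston", "age": "29"},
]


def clean(records):
    # 1. Formatting and inconsistent data: trim, fix casing, convert types.
    normalized = [
        {
            "name": r["name"].strip().title(),
            "city": r["city"].strip().title(),
            "age": int(r["age"]) if r["age"].strip() else None,
        }
        for r in records
    ]
    # 2. Duplicates: keep the first occurrence of each record.
    seen, unique = set(), []
    for r in normalized:
        key = (r["name"], r["city"], r["age"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # 3. Missing values: fill with the median of the known ages.
    fill_value = median(r["age"] for r in unique if r["age"] is not None)
    for r in unique:
        if r["age"] is None:
            r["age"] = fill_value
    return unique


cleaned = clean(raw)
for record in cleaned:
    print(record)
```

In practice a library such as pandas is the usual tool for this work; the plain-Python version is shown only to keep the steps visible.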

There are plenty of great options to learn data cleaning and wrangling. Harvard offers a course on edX. You can also practice on your own by cleaning and wrangling free, raw datasets like the Common Crawl (web crawl data composed of over 50 billion web pages) or Brazil’s weather data.

7. Data Mining

What is Data Mining?
Why Does It Matter in Becoming a Data Scientist in 2023?
Where Can You Learn It?

Many data science terms have simply been stolen from other professions, like modeling and mining. Let’s get into what data mining means and why it matters.

Data mining is the process of extracting useful information from data through techniques like clustering, classification, and association rules. You’re sifting through the veritable flood of data to find useful golden nuggets. (Maybe data panning would have been a better name for this skill!)

Imagine it: you’re a data scientist in 2023. You have data coming in from ten thousand different sources. What skill do you use to identify patterns across all these data fountains?

It’s data mining.
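
As one concrete instance of the techniques named above, association rules can be scored with two simple ratios, support and confidence. The sketch below computes them for pairs of items in a handful of made-up shopping baskets. Real data mining libraries implement more complete algorithms such as Apriori; this only shows the underlying counting.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each basket is the set of items bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(basket), 2))


def rule_stats(antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    support = pair_counts[pair] / len(baskets)
    confidence = pair_counts[pair] / item_counts[antecedent]
    return support, confidence


# "Customers who buy butter also buy bread."
print(rule_stats("butter", "bread"))  # (0.5, 1.0)
```

Here the rule holds in half of all baskets (support 0.5) and in every basket that contains butter (confidence 1.0).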

Data mining is typically covered in courses on big data or data analytics, since it’s a pretty critical component of those two skills. edX offers a couple of options to learn data mining.

8. Data Visualization

What is Data Visualization?
Why Does it Matter in Becoming a Data Scientist in 2023?
Where Can You Learn This Key Skill?

This skill is pretty self-explanatory. When you analyze numbers, key stakeholders will want to understand your findings with pretty graphs and charts.

Data visualization is the creation of charts, graphs, and other graphics to help make data easier to understand. You take the numbers you’ve just cleaned, wrangled, or predicted and you put them into some kind of visual format, either to communicate trends with others or to make trends easier to spot.

In 2023, being able to visualize data is crucial for a data scientist. It's like having a secret superpower for uncovering hidden patterns and trends in the data that might not be obvious at first glance. And the best part? You get to share your findings with others in a way that's both engaging and memorable.  As a data scientist, you’ll work with groups of all different experience levels, but a picture is much more easily understood than a row of numbers.

So, if you want to be a data scientist who can effectively communicate your insights and discoveries, it's important to master the art of data visualization.

Here’s a list of free places to learn data viz.

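DataScience: An introduction to data visualization (significance and advantages) and common methods (tool-based visualization with Tableau, Power BI, QlikView, etc.; programmatic visualization with matplotlib, seaborn, etc.): a detailed guide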

https://yunyaniu.blog.csdn.net/article/details/105509230
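
The point that a picture is easier to read than a row of numbers shows up even in a tiny text-only bar chart. The sketch below is dependency-free on purpose and the `monthly_sales` figures are invented. For real work you would normally reach for a plotting library such as matplotlib or seaborn, or a tool like Tableau.

```python
def text_bar_chart(data, width=20):
    """Render a dict of label -> value as horizontal bars of '#' characters."""
    peak = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Scale each bar relative to the largest value.
        bar = "#" * round(value / peak * width)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical monthly sales figures.
monthly_sales = {"Jan": 120, "Feb": 180, "Mar": 90}
chart = text_bar_chart(monthly_sales)
print(chart)
```

Even in this crude form, the February peak and the March dip are visible at a glance, without reading any of the numbers.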

9. Machine Learning

What is Machine Learning?
Why Does It Matter in Becoming a Data Scientist in 2023?
Where Can You Learn This Key Skill?

No, it’s not just a buzzword! Machine learning is a very important skill for any future data scientist to know.

Machine learning is the application of algorithms and statistical models to make predictions and decisions based on data.

It’s a subfield of artificial intelligence that enables computers to improve their performance on a specific task by learning from data, without being explicitly programmed. It helps with automation. You’ll find it in any industry.

You need to know about machine learning in 2023 because it’s a rapidly growing field that has become a crucial tool for solving complex problems and making predictions in various industries.

Machine learning algorithms can be used to classify images, recognize speech, do natural language processing, and create recommendation systems. You’ll be hard-pressed to find an industry that doesn’t do (or doesn’t want to do) those ML-assisted tasks.

Being proficient in machine learning allows a data scientist to extract valuable insights from large and complex data sets, and to develop predictive models that can drive better business decisions.
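
The idea of learning from data instead of being explicitly programmed can be shown with the simplest predictive model, a straight line fitted by least squares. The numbers below are made up, and the sketch uses plain Python so the arithmetic stays visible. A library such as scikit-learn would normally do this for you.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: advertising spend vs. resulting sales.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

# "Training": the parameters are derived from the data, not hard-coded.
slope, intercept = fit_line(ad_spend, sales)

# "Prediction": apply the learned parameters to an unseen input.
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)  # 2.0 1.0 11.0
```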

We’ve got a repository of over thirty machine learning projects on StrataScratch that you can use to show this skill off on your resume. TensorFlow also has a set of great free resources for learning machine learning.

10. Deep Learning

What is Deep Learning?
Why Does It Matter in Becoming a Data Scientist in 2023?
Where can you learn it?

Deep learning is subtly different from machine learning! Deep learning is a subfield of machine learning.

Deep learning is a facet of machine learning that focuses on creating algorithms that can learn patterns in data through multiple layers of artificial neural networks. (Artificial neural networks, by the way, are a type of machine learning algorithm modeled to be similar to the structure and function of the human brain.)

Artificial intelligence is getting more sophisticated in 2023. It’s not enough to know the basics of AI and ML – you should be familiar with the cutting edge, too, because it won’t be cutting edge tomorrow. Deep learning was novel a few years ago, and now it’s a necessity.

Data scientists will be expected to use deep learning when companies have access to a truly vast amount of data. It’s used for image and video processing, or computer vision applications.
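
To illustrate what "multiple layers" buys you, the sketch below runs a forward pass through a tiny two-layer network that computes XOR, a function no single linear layer can represent. Note the assumptions: the weights are hand-picked for the example rather than learned, training (backpropagation) is left out entirely, and real work would use a framework such as TensorFlow or PyTorch.

```python
def relu(value):
    """Rectified linear unit: pass positives through, clamp negatives to zero."""
    return max(0.0, value)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: weighted sum plus bias, then activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked (not learned) parameters for a 2-2-1 network.
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [[1.0, -2.0]]
OUTPUT_BIASES = [0.0]


def forward(x):
    hidden = dense(x, HIDDEN_WEIGHTS, HIDDEN_BIASES, relu)
    # The output layer is linear (identity activation).
    return dense(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES, lambda v: v)[0]


for pair in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    print(pair, forward(pair))
```

The hidden layer with its nonlinearity is what makes the XOR pattern expressible; stacking more such layers is what lets deep networks capture far more complex patterns in images, video, and text.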

I like Simplilearn’s tutorial as a starting point.

11. What Other Skills Do You Need to Know to Become a Data Scientist in 2023?

There are plenty of up-and-coming technologies and techniques that are useful to know. These are either even more advanced, like generative adversarial networks, or more soft-skills-based, like data storytelling, or specialized to a field like time series forecasting. I’ll briefly summarize these here:

11.1 Natural Language Processing (NLP)

A subfield of AI that handles processing and understanding of human language. Chatbots use this.

11.2 Time Series Analysis & Forecasting

Time Series Analysis & Forecasting: The study of data over time and the use of statistical models to make predictions about future events. You might use this skill to do sales or revenue analysis.

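Math/ML: An introduction to time series datasets and time series forecasting tasks, common algorithms and tools, and case studies: a detailed guide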

https://yunyaniu.blog.csdn.net/article/details/127156732
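
A minimal example of using a statistical model to predict a future value is simple exponential smoothing, sketched below in plain Python. The revenue figures and the smoothing factor `alpha=0.5` are assumptions chosen for the example. Real forecasting work would also account for trend and seasonality.

```python
def exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast from simple exponential smoothing.

    alpha close to 1 trusts recent observations more;
    alpha close to 0 smooths more heavily over the history.
    """
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical monthly revenue.
monthly_revenue = [100.0, 110.0, 105.0, 120.0]
forecast = exponential_smoothing(monthly_revenue, alpha=0.5)
print(forecast)  # 112.5
```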

11.3 Experimental Design & A/B Testing

Experimental Design & A/B Testing: The process of designing and conducting controlled experiments to test hypotheses and make decisions based on data.

Internet: A/B testing, i.e. controlled experiments (a data-driven decision-making method): introduction, principles, and case studies: a detailed guide

https://yunyaniu.blog.csdn.net/article/details/129769649
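
One common way to test a hypothesis in an A/B experiment on conversion rates is a two-proportion z-test. The sketch below implements it with the standard library only. The visitor and conversion counts are invented, and the test assumes large, independent samples; it is one option among several, not the only valid analysis.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: variant A converts 100 of 1000 visitors,
# variant B converts 130 of 1000.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone; the threshold for "small" (often 0.05) should be fixed before the experiment runs.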

11.4 Data Storytelling

Data Storytelling: The ability to effectively communicate data insights and findings to non-technical stakeholders. More and more stakeholders are taking an interest in the why behind data-based decisions, so this is critical.

11.5 Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs): A type of deep learning architecture where two neural networks, a generator and a discriminator, are trained against each other to generate new data that resembles a given dataset.

DL/GAN: An introduction to generative adversarial networks (GANs), applications, and classic examples: a detailed guide

https://yunyaniu.blog.csdn.net/article/details/79561020

11.6 Transfer Learning

Transfer Learning: A machine learning technique where a model is pre-trained on one task and is fine-tuned on a related task, improving performance and reducing the amount of training data needed. Smaller companies that are more resource-limited will find this useful.

11.7 Automated Machine Learning (AutoML)

Automated Machine Learning (AutoML): A method of automating the process of selecting, training, and deploying machine learning models.

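AI/MLOps: Six core engineering technologies for data science and machine learning: MLOps, model development (pipelines, parallel processing, persistence, interpretability), model deployment (cloud servers), model monitoring, model management, and automation: a detailed guide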

https://yunyaniu.blog.csdn.net/article/details/130120082

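AI/AutoML: Automated machine learning: introduction (preprocessing → algorithm design → model training → parameter optimization) and common tools and frameworks (with a comparison): a detailed guide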

https://yunyaniu.blog.csdn.net/article/details/129779086

11.8 Hyperparameter Tuning

Hyperparameter Tuning: Another ML subcategory. This is the process of optimizing the performance of a machine learning model by adjusting the parameters that are not learned from the data, such as the learning rate or the number of hidden layers.

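ML/DL model tuning: An introduction to hyperparameter optimization in machine learning and deep learning (evaluation metrics, overfitting) and common tuning methods (manual tuning, random search, grid search, Bayesian optimization): a detailed guide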

https://yunyaniu.blog.csdn.net/article/details/125548554

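ML/DL model tuning: An introduction to parameter tuning for deep neural networks and a summary of hyperparameter grid-search tips (n_jobs, batch_size, epoch, lr, weight initialization, optimizer, activation function, Dropout regularization, number of neurons)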

https://yunyaniu.blog.csdn.net/article/details/104833424
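
Grid search is the most straightforward way to tune values that are not learned from the data. The sketch below tries every combination of learning rate and epoch count for a one-parameter model trained by gradient descent, then keeps the combination with the lowest validation error. The datasets and grid values are made up, and the model is deliberately trivial so the loop stays readable.

```python
from itertools import product


def train(learning_rate, epochs, data):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    for _ in range(epochs):
        gradient = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * gradient
    return w


def mse(w, data):
    """Mean squared error of the model y = w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


# Hypothetical data following y = 2x, split into training and validation sets.
train_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
valid_data = [(4.0, 8.0), (5.0, 10.0)]

# The hyperparameter grid: these values are chosen by us, not learned.
grid = {"learning_rate": [0.001, 0.01, 0.1], "epochs": [10, 50]}

results = []
for learning_rate, epochs in product(grid["learning_rate"], grid["epochs"]):
    w = train(learning_rate, epochs, train_data)
    results.append((mse(w, valid_data), learning_rate, epochs))

best_loss, best_lr, best_epochs = min(results)
print(f"best learning_rate={best_lr}, epochs={best_epochs}, loss={best_loss:.2e}")
```

Scoring each combination on held-out validation data rather than on the training data is what keeps the search from simply rewarding overfitting.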

11.9 Explainable AI (XAI)

Explainable AI (XAI): A branch of AI focused on creating algorithms and models that are transparent and interpretable, so their decision-making processes can be understood by humans. Again, helping stakeholders understand what’s happening.

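XAI/ML: An introduction to explainable AI (XAI) and machine learning model interpretability (background, terminology, core ideas, significance, methods, techniques, cases), common toolkits, and case studies (across DS/CV/NLP): a detailed guide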

https://yunyaniu.blog.csdn.net/article/details/126576465

Reposted from blog.csdn.net/qq_41185868/article/details/130312528