The Right Way to Learn Python for Data Science

Most aspiring data scientists start learning Python by taking programming courses aimed at developers, and they begin solving Python programming puzzles on sites like LeetCode. They believe they must be familiar with programming concepts before they can start analyzing data with Python.

Manu Jeevan, a senior data analyst, believes this is a huge mistake, because data scientists use Python to retrieve, clean, visualize, and model data, not to develop software applications. To accomplish these tasks, you should spend most of your time learning Python modules and libraries.

Follow the steps below to learn Python for data science.

Set Up Your Programming Environment

Jupyter Notebook is a powerful programming environment for developing and presenting data science projects.

The easiest way to install Jupyter Notebook on your computer is through Anaconda. Anaconda is the most widely used Python distribution for data science, and it comes preloaded with all of the most popular libraries.

You can read the blog post titled "A Beginner's Guide to Installing Jupyter Notebook Using Anaconda Distribution" to learn how to install Anaconda. When installing Anaconda, select the latest Python 3 version.

After installing Anaconda, read this Codecademy article to learn how to use Jupyter Notebook.
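
Once Anaconda is installed, a quick check run in a notebook cell can confirm that the environment is ready. The sketch below is only an illustration: the list of package names is an assumption based on the libraries discussed in this article, and `missing_packages` is a hypothetical helper, not part of Anaconda or Jupyter.

```python
import importlib.util
import sys

# Libraries discussed in this article; the list is an illustrative assumption.
REQUIRED = ["numpy", "pandas", "matplotlib", "statsmodels", "sklearn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version:", sys.version.split()[0])
print("Missing packages:", missing_packages(REQUIRED) or "none")
```

If the second line reports any missing packages, they can usually be added from the Anaconda package manager.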

Learn Only the Basics of Python

Codecademy has an excellent Python course that takes about 20 hours to complete. You do not have to upgrade to the Pro version, because your goal is just to become familiar with the basics of the Python programming language.

NumPy and Pandas: Great Learning Resources

Python is slow when running computationally intensive algorithms and processing large amounts of data. You may ask, then, why Python is the most popular programming language for data science.

The answer is that in Python it is easy to offload number-crunching tasks to a lower layer in the form of C or Fortran extensions. This is what NumPy and Pandas do.

First, you should learn NumPy. It is the most fundamental module for scientific computing with Python. NumPy supports highly optimized multidimensional arrays, which are the most basic data structure for most machine learning algorithms.
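
To make the idea of a multidimensional array concrete, here is a minimal sketch. The values are made up for illustration; the point is that operations apply to whole arrays at once and run in compiled code instead of a Python loop.

```python
import numpy as np

# A 2-D array: 3 samples (rows) with 2 features (columns); values are made up.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

print(X.shape)                # (rows, columns)

col_means = X.mean(axis=0)    # mean of each column
centered = X - col_means      # broadcasting subtracts the means from every row

# Vectorized sum of squares over the whole array, with no explicit Python loop.
sum_sq = np.sum(X ** 2)
print(col_means, sum_sq)
```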

Next, you should learn Pandas. Data scientists spend most of their time cleaning data, a process also known as data munging.

Pandas is the most popular Python library for manipulating data. Pandas is an extension of NumPy: its underlying code makes extensive use of the NumPy library. The main data structure in Pandas is called a data frame.

Wes McKinney, the creator of Pandas, wrote a great book called "Python for Data Analysis". You can learn Pandas and NumPy in chapters 4, 5, 7, 8, and 10 of the book. These chapters cover the most commonly used features of NumPy and Pandas for processing data.
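
As a rough illustration of data cleaning on a data frame, the sketch below uses a small invented table. The column names and values are assumptions chosen to show typical problems, not data from any real source.

```python
import numpy as np
import pandas as pd

# A small, made-up table with typical problems: missing values,
# inconsistent capitalization, and a duplicated row.
raw = pd.DataFrame({
    "city": ["Paris", "paris", "Berlin", "Berlin", "Rome"],
    "sales": [100.0, 150.0, np.nan, np.nan, 80.0],
})

clean = (
    raw.assign(city=raw["city"].str.title())   # normalize capitalization
       .drop_duplicates()                      # drop exact duplicate rows
       .dropna(subset=["sales"])               # drop rows without a sales figure
       .reset_index(drop=True)
)

# Aggregate the cleaned data: total sales per city.
totals = clean.groupby("city")["sales"].sum()
print(totals)
```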

Learn Data Visualization with Matplotlib

Matplotlib is the basic Python package for creating visualizations. You should learn how to use Matplotlib to create some of the most common charts, such as line charts, bar charts, scatter plots, histograms, and box plots.

Seaborn is another good plotting library, built on top of Matplotlib and closely integrated with Pandas. At this stage, I suggest you quickly learn how to create basic charts in Matplotlib rather than focusing on Seaborn.

I wrote a four-part tutorial on how to develop basic plots with Matplotlib.
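
A minimal sketch of four of these chart types on one figure might look like the following. The data is synthetic, and the `Agg` backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the figure is reproducible
x = np.arange(10)
y = x ** 2
samples = rng.normal(size=200)      # synthetic data for the histogram

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot(x, y)               # line chart
axes[0, 0].set_title("Line")

axes[0, 1].bar(x, y)                # bar chart
axes[0, 1].set_title("Bar")

axes[1, 0].scatter(x, y)            # scatter plot
axes[1, 0].set_title("Scatter")

axes[1, 1].hist(samples, bins=20)   # histogram
axes[1, 1].set_title("Histogram")

fig.tight_layout()
```

In a Jupyter Notebook the figure is displayed inline after the cell runs; in a script you could call `fig.savefig(...)` with a file name of your choice.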

  • Part I: Drawing basic figures with Matplotlib
  • Part II: Controlling the style and color of figures, including markers, line thickness, line patterns, and color maps
  • Part III: Annotations, control of the axis range, the aspect ratio, and the coordinate system
  • Part IV: Working with complex figures

You can master the basics of Matplotlib by going through these tutorials.

In short, you do not need to spend too much time learning Matplotlib, because companies have now started adopting tools such as Tableau and Qlik to create interactive visualizations.

How to Use SQL with Python

In organizations, data resides in databases. Therefore, you need to know how to retrieve data with SQL and analyze it in Jupyter Notebook with Python.

Data scientists use both SQL and Pandas to manipulate data. Some data manipulation tasks are easy to perform in SQL, and others can be done more efficiently in Pandas. I personally prefer to retrieve data with SQL and manipulate it in Pandas.

Today, companies use analytics platforms such as Mode Analytics and Databricks to work easily with Python and SQL.

So you should know how to use SQL and Python together effectively. To learn this, you can install the SQLite database on your computer, store a CSV file in it, and then analyze the data with Python and SQL.

Here is a great blog post that shows you how to do it: Programming with Databases in Python using SQLite.

Before reading the blog post above, you should understand the basics of SQL. Mode Analytics has a good SQL tutorial: Introduction to SQL. Work through its basic SQL section to learn the fundamentals; every data scientist should know how to retrieve data effectively with SQL.
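
The workflow just described can be sketched roughly as follows. This is an assumption-laden illustration: the `orders` table, its columns, and the CSV contents are invented, and an in-memory database stands in for a SQLite file on disk.

```python
import csv
import io
import sqlite3

import pandas as pd

# Stand-in for a CSV file on disk; the columns and values are invented.
csv_text = """customer,amount
alice,20.5
bob,10.0
alice,4.5
"""
rows = list(csv.DictReader(io.StringIO(csv_text)))

conn = sqlite3.connect(":memory:")   # pass a file path for a persistent database
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (:customer, :amount)", rows
)
conn.commit()

# Retrieve and aggregate with SQL...
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY customer
"""
df = pd.read_sql_query(query, conn)
conn.close()

# ...then continue the analysis in Pandas.
df["share"] = df["total"] / df["total"].sum()
print(df)
```

The split follows the preference stated above: SQL does the retrieval and aggregation, and Pandas handles the follow-up manipulation.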

Learn Basic Statistics with Python

Most aspiring data scientists jump directly into machine learning without learning the basics of statistics.

Do not make this mistake, because statistics is the backbone of data science. On the other hand, many aspiring data scientists who do study statistics learn only the theoretical concepts rather than the practical ones.

By practical concepts, I mean that you should know what kinds of problems statistics can solve and what challenges you can overcome by using it.

Here are some of the basic statistical concepts you should know:

  • Sampling, frequency distributions, mean, median, mode, measures of variability, probability basics, significance testing, standard deviation, z-scores, confidence intervals, and hypothesis testing (including A/B testing).
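
Several of these concepts can be tried out with nothing more than Python's standard library. The sketch below uses a made-up sample; the confidence interval uses the normal critical value 1.96 as a simplifying assumption, and for a sample this small a t critical value would be more accurate.

```python
import math
import statistics

# Made-up sample of 10 measurements.
sample = [12, 15, 11, 15, 14, 13, 15, 16, 12, 17]

mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)
stdev = statistics.stdev(sample)    # sample standard deviation (divides by n - 1)

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(value - mean) / stdev for value in sample]

# Approximate 95% confidence interval for the mean (normal approximation).
margin = 1.96 * stdev / math.sqrt(len(sample))
ci = (mean - margin, mean + margin)

print(mean, median, mode, round(stdev, 3), ci)
```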

To learn these concepts, there is a very good book you can look at: "Practical Statistics for Data Scientists: 50 Essential Concepts". Unfortunately, the code examples in this book are written in R, while many people, myself included, use Python.

I suggest you read the first four chapters of this book to understand the basic statistical concepts mentioned earlier. You can ignore the code examples and just focus on understanding the concepts. The remaining chapters of the book focus on machine learning, which I discuss in the next section.

Most people recommend Think Stats for learning statistics with Python, but the book's author teaches his own custom functions rather than using standard Python libraries to explain statistics. For that reason, I do not recommend this book.

Next, your goal is to implement these basic concepts in Python. StatsModels is a popular Python library for building statistical models. The StatsModels website provides excellent tutorials on how to implement statistical concepts in Python.

Alternatively, you can watch Gaël Varoquaux's video. He shows you how to do inferential and exploratory statistics using Pandas and StatsModels.

Machine Learning with Scikit-Learn

Scikit-Learn is one of the most popular machine learning libraries in Python. Your goal is to learn how to implement some of the most common machine learning algorithms with Scikit-Learn.

Here is how to do it.

  • First, watch the videos for weeks 1, 2, 3, 6, 7, and 8 of Andrew Ng's machine learning course on Coursera. I skipped the sections on neural networks because, as a beginner, you should focus on the most common machine learning techniques.
  • After that, read the book "Hands-On Machine Learning with Scikit-Learn and TensorFlow". You only need to go through the first part of the book (about 300 pages); it is one of the most practical machine learning books.
  • By completing the coding exercises in this book, you will learn how to implement in Python the theoretical concepts you learned in Andrew Ng's course.
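
To give a feel for what a common algorithm looks like in Scikit-Learn, here is a minimal sketch of the usual split, fit, and evaluate loop. The choice of the built-in iris dataset and of logistic regression is an assumption made for illustration; it is not taken from the course or the book.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Small built-in dataset, used here only to keep the example self-contained.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to estimate performance on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Most Scikit-Learn estimators follow the same `fit` and `predict` pattern, so swapping in another algorithm usually changes only the line that creates the model.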

Conclusion

The final step is to do a data science project that covers all the steps above. You can find a dataset you like, come up with interesting business questions, and then answer them through analysis. However, do not choose a generic dataset like Titanic.

Another approach is to apply data science to an area you are interested in. For example, if you want to predict stock prices, you can pull real-time data from Yahoo Finance, store it in a SQL database, and then use machine learning to predict stock prices.

Origin blog.51cto.com/14510224/2438396