Hands-On Machine Learning with Scikit-Learn & TensorFlow

Website for learning the Python programming language:

http://learnpython.org

Website for downloading the datasets and code examples:

https://github.com/ageron/handson-ml

Where you can find large datasets open to the public:

UC Irvine Machine Learning Repository:

http://archive.ics.uci.edu/ml/index.php

Kaggle datasets:

https://www.kaggle.com/datasets

Amazon’s AWS datasets:

https://registry.opendata.aws/

Meta portals (they list open data repositories):

http://dataportals.org/

http://opendatamonitor.eu/

http://quandl.com/

Other pages listing many popular open data repositories:

Wikipedia’s list of Machine Learning datasets:

https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research

Quora.com question:

https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public

Datasets subreddit:

https://www.reddit.com/r/datasets

Supervised learning

Some of the most important supervised algorithms:

k-Nearest Neighbors

Linear Regression

Logistic Regression

Support Vector Machines

Decision Trees and Random Forests

Neural networks

Unsupervised learning

Some of the most important unsupervised algorithms:

Clustering:

k-Means, Hierarchical Cluster Analysis (HCA), Expectation Maximization

Visualization and dimensionality reduction:

Principal Component Analysis (PCA), Kernel PCA, Locally-Linear Embedding (LLE), t-distributed Stochastic Neighbor Embedding (t-SNE)

Association rule learning: Apriori, Eclat

Semisupervised learning

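The training data is mostly unlabeled with a few labeled instances: group the data with an unsupervised step first, then use the few labels to name the groups.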

Reinforcement learning

Batch learning(offline learning)

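Incoming data cannot change the trained model: the system is trained offline on all available data, and learning from new data means retraining from scratch.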

Online learning

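Incoming data can change the model: the system is updated incrementally as new instances arrive, one at a time or in small groups (mini-batches).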
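A minimal sketch of the difference, assuming scikit-learn's SGDRegressor as the incremental learner. The synthetic data, the make_batch helper, and the learning-rate settings are illustrative assumptions, not taken from the book.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor

rng = np.random.default_rng(42)


def make_batch(n):
    """Hypothetical data source: y = 3x + 2 plus a little noise."""
    X = rng.uniform(0, 1, size=(n, 1))
    y = 3 * X[:, 0] + 2 + rng.normal(0, 0.05, size=n)
    return X, y


# Batch (offline) learning: train once on everything that is available.
X_all, y_all = make_batch(500)
batch_model = LinearRegression().fit(X_all, y_all)

# Online learning: keep updating the same model as small chunks arrive.
online_model = SGDRegressor(learning_rate="constant", eta0=0.05, random_state=42)
for _ in range(300):
    X_chunk, y_chunk = make_batch(20)
    online_model.partial_fit(X_chunk, y_chunk)  # one incremental update pass

print("batch :", batch_model.coef_, batch_model.intercept_)
print("online:", online_model.coef_, online_model.intercept_)
```

Both models should end up near slope 3 and intercept 2; only the online one could keep learning if more chunks arrived.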

Instance-Based learning

The system learns the examples by heart, then generalizes to new cases using a similarity measure.

Model-Based learning

Build a model from the examples, then use that model to make predictions.
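A small sketch contrasting the two approaches on made-up data (y = 2x). It assumes KNeighborsRegressor as the instance-based learner and LinearRegression as the model-based one; the numbers are for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Toy training set (made up for illustration): y = 2x exactly.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
X_new = np.array([[10.0]])  # far outside the training range

# Instance-based: memorize the examples, answer from the 3 most similar ones
# (similarity is Euclidean distance, the KNeighborsRegressor default).
knn = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_train)

# Model-based: fit the parameters of a line, then predict from the line.
lin = LinearRegression().fit(X_train, y_train)

print(knn.predict(X_new))  # mean target of the neighbors x = 3, 4, 5
print(lin.predict(X_new))  # extrapolates along the fitted line
```

The k-NN answer stays inside the range of targets it has memorized, while the linear model extrapolates.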

Machine learning process

get the data -> visualize the data to gain insights -> prepare the data -> select a model and train it -> fine-tune your model -> present your solution -> launch, monitor, and maintain your system
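The technical steps of this process can be sketched roughly as follows. This is only an outline under assumptions: the diabetes dataset bundled with scikit-learn stands in for real data, Ridge is an arbitrary model choice, and the presenting, launching, and monitoring steps are outside the code.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Get the data (a small dataset that ships with scikit-learn).
X, y = load_diabetes(return_X_y=True)

# 2. Look at the data before modeling (plots would normally go here).
print(X.shape, y.mean())

# 3. Prepare the data: hold out a test set; scaling lives in the pipeline.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 4. Select a model (training happens inside the search below).
pipeline = make_pipeline(StandardScaler(), Ridge())

# 5. Fine-tune: try a few regularization strengths with cross-validation.
search = GridSearchCV(pipeline, {"ridge__alpha": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# 6. Evaluate the tuned model on the held-out test set (R^2 score).
test_r2 = search.score(X_test, y_test)
print(search.best_params_, test_r2)
```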

Classification

Multiclass Classification

Multilabel Classification

Multioutput Classification
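The three variants differ mainly in the shape of the target. A minimal sketch, assuming KNeighborsClassifier (which accepts all three target shapes) and made-up integer data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up 1-D inputs: the integers 0..9.
X = np.arange(10).reshape(-1, 1)

# Multiclass: one label per instance, more than two possible classes.
y_multiclass = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])

# Multilabel: several binary labels per instance (here: ">= 5" and "odd").
y_multilabel = np.c_[X[:, 0] >= 5, X[:, 0] % 2 == 1].astype(int)

# Multioutput: several labels per instance, each of which may be multiclass.
y_multioutput = np.c_[y_multiclass, X[:, 0] % 2]

clf_multiclass = KNeighborsClassifier(n_neighbors=1).fit(X, y_multiclass)
clf_multilabel = KNeighborsClassifier(n_neighbors=1).fit(X, y_multilabel)
clf_multioutput = KNeighborsClassifier(n_neighbors=1).fit(X, y_multioutput)

print(clf_multiclass.predict([[8]]))   # a single class label
print(clf_multilabel.predict([[7]]))   # one row with two binary labels
print(clf_multioutput.predict([[4]]))  # one row: [class, parity]
```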

Performance Measures

Cross-Validation

Confusion Matrix

Precision and Recall

ROC Curve
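A sketch of how these measures can be computed with scikit-learn. The breast cancer dataset and the logistic regression classifier are assumptions chosen only to have a binary problem to score.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Cross-validation: accuracy on each of 5 held-out folds.
accuracies = cross_val_score(clf, X, y, cv=5, scoring="accuracy")

# Out-of-fold predictions: each instance is scored by a model that never saw it.
y_pred = cross_val_predict(clf, X, y, cv=5)
y_scores = cross_val_predict(clf, X, y, cv=5, method="decision_function")

cm = confusion_matrix(y, y_pred)        # rows: actual class, columns: predicted
precision = precision_score(y, y_pred)  # TP / (TP + FP)
recall = recall_score(y, y_pred)        # TP / (TP + FN)
auc = roc_auc_score(y, y_scores)        # area under the ROC curve

print(accuracies.mean(), precision, recall, auc)
print(cm)
```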

Linear Regression

Gradient Descent

Batch Gradient Descent

Stochastic Gradient Descent

Mini-batch Gradient Descent
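The three variants differ only in how many instances are used for each gradient step. A NumPy sketch for linear regression with an MSE cost, using made-up data; the constant learning rates and epoch counts are assumptions picked so this toy problem converges, not tuned values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (made up): y = 4 + 3x + noise.
m = 200
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.5, size=m)
X_b = np.c_[np.ones(m), X]  # add the bias feature x0 = 1


def gradient_descent(X_b, y, batch_size, eta, n_epochs, seed=0):
    """Minimize MSE; batch_size selects the variant (m: batch, 1: stochastic)."""
    shuffler = np.random.default_rng(seed)
    theta = np.zeros(X_b.shape[1])
    for _ in range(n_epochs):
        order = shuffler.permutation(len(y))  # reshuffle every epoch
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            error = X_b[idx] @ theta - y[idx]
            gradient = 2 / len(idx) * X_b[idx].T @ error
            theta -= eta * gradient
    return theta


theta_batch = gradient_descent(X_b, y, batch_size=m, eta=0.1, n_epochs=1000)
theta_sgd = gradient_descent(X_b, y, batch_size=1, eta=0.01, n_epochs=50)
theta_mini = gradient_descent(X_b, y, batch_size=20, eta=0.05, n_epochs=200)

# Closed-form least-squares solution, for comparison only.
theta_exact = np.linalg.lstsq(X_b, y, rcond=None)[0]
print(theta_exact, theta_batch, theta_sgd, theta_mini)
```

Batch gradient descent settles on the least-squares solution, while the stochastic and mini-batch variants keep bouncing around it because each step sees only part of the data.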

Polynomial Regression

Regularized Linear Models

Ridge Regression

Lasso Regression

Elastic Net

Early Stopping
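A short sketch of how the three penalties affect the learned weights, on made-up data where only two of five features matter. The alpha values are arbitrary assumptions; early stopping is not covered by this example.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

rng = np.random.default_rng(1)

# Made-up data: 5 features, only the first two influence y.
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, size=100)

models = {
    "plain": LinearRegression(),                         # no penalty
    "ridge": Ridge(alpha=10.0),                          # L2 penalty shrinks weights
    "lasso": Lasso(alpha=0.1),                           # L1 penalty can zero weights out
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),  # mix of L1 and L2
}

# Fit every model on the same data and collect its weight vector.
coefs = {name: model.fit(X, y).coef_ for name, model in models.items()}
for name, coef in coefs.items():
    print(f"{name:12s}", np.round(coef, 2))
```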

Logistic Regression

Estimating Probabilities

Training and Cost Function

Decision Boundaries

Softmax Regression
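The pieces named above fit in a few lines of NumPy. This is a sketch with hypothetical parameter values (theta0, theta1) for a one-feature binary model, not a trained classifier.

```python
import numpy as np


def sigmoid(t):
    """Logistic function: maps a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))


def log_loss(p, y):
    """Average logistic regression cost for probabilities p and labels y in {0, 1}."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def softmax(scores):
    """Turn per-class scores (one row per instance) into probabilities."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)


# Hypothetical parameters: score = theta0 + theta1 * x.
theta0, theta1 = -4.0, 2.0
x = np.array([0.0, 2.0, 3.0])
y = np.array([0, 1, 1])

p = sigmoid(theta0 + theta1 * x)  # estimated probabilities
y_pred = (p >= 0.5).astype(int)   # predict class 1 when p >= 0.5
boundary = -theta0 / theta1       # the x at which p is exactly 0.5

print(p, y_pred, boundary, log_loss(p, y))
print(softmax(np.array([[2.0, 1.0, 0.0]])))  # multiclass probabilities
```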

Example 1-1. Training and running a linear model using Scikit-Learn

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model  # a bare "import sklearn" does not load this submodule
# Load the data
oecd_bli = pd.read_csv("oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("gdp_per_capita.csv", thousands=',', delimiter='\t',
                             encoding='latin1', na_values="n/a")
# Prepare the data
# prepare_country_stats() is not a library function: it is a helper defined in
# the book's notebook (see the GitHub repository above) and must be defined first.
country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
X = np.c_[country_stats["GDP per capita"]]
y = np.c_[country_stats["Life satisfaction"]]
# Visualize the data
country_stats.plot(kind='scatter', x="GDP per capita", y='Life satisfaction')
plt.show()
# Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression()
# Train the model
lin_reg_model.fit(X, y)
# Make a prediction for Cyprus
X_new = [[22587]]  # Cyprus' GDP per capita
print(lin_reg_model.predict(X_new))  # outputs [[ 5.96242338]]

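Alternatively, use the k-Nearest Neighbors regression algorithm by replacing this line:

lin_reg_model = sklearn.linear_model.LinearRegression()

with this one (it also needs import sklearn.neighbors at the top; the variable name is kept so the rest of the example runs unchanged):

lin_reg_model = sklearn.neighbors.KNeighborsRegressor(n_neighbors=3)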
