Hands-On Machine Learning with Scikit-Learn & TensorFlow

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/u013452217/article/details/85785095

Website for learning the Python programming language:

http://learnpython.org

Website for downloading the data and code examples:

https://github.com/ageron/handson-ml

Where you can find large datasets open to the public:

UC Irvine Machine Learning Repository:

http://archive.ics.uci.edu/ml/index.php

Kaggle datasets:

https://www.kaggle.com/datasets

Amazon’s AWS datasets:

https://registry.opendata.aws/

Meta portals:

http://dataportals.org/

http://opendatamonitor.eu/

http://quandl.com/

 

Other pages listing many popular open data repositories:

Wikipedia’s list of Machine Learning datasets:

https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research

Quora.com question:

https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public

Datasets subreddit:

https://www.reddit.com/r/datasets

 

Supervised

 

Some of the most important supervised algorithms:

    k-Nearest Neighbors

    Linear Regression

    Logistic Regression

    Support Vector Machines

    Decision Trees and Random Forests

    Neural networks

 

Unsupervised

Some of the most important unsupervised algorithms:

    Clustering:

        k-Means, Hierarchical Cluster Analysis (HCA), Expectation Maximization

    Visualization and dimensionality reduction:

        Principal Component Analysis (PCA), Kernel PCA, Locally-Linear Embedding (LLE), t-distributed Stochastic Neighbor Embedding (t-SNE)

    Association rule learning (Apriori, Eclat)
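
The two main unsupervised tasks above can be sketched as follows. This is a minimal illustration on made-up data (two synthetic blobs; the variable names are arbitrary), assuming scikit-learn's KMeans and PCA classes: no labels are passed to either algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Two synthetic, well-separated blobs in 5 dimensions (made-up data)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 5))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 5))
X = np.vstack([blob_a, blob_b])

# Clustering: group the instances without using any labels
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)

# Dimensionality reduction: project the 5D data to 2D, e.g. for plotting
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(cluster_ids)
print(X_2d.shape)
print(pca.explained_variance_ratio_)
```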

 

Semisupervised learning

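Train on mostly unlabeled data plus a small amount of labeled data, e.g. group the instances with an unsupervised algorithm, then label each group.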

 

Reinforcement learning

 

Batch learning (offline learning)

The system is trained on all available data at once; new incoming data does not change the model until it is retrained from scratch.

 

Online learning

The system learns incrementally: incoming data, fed one instance or one mini-batch at a time, updates the model on the fly.
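
As a rough sketch of online learning, the snippet below feeds a simulated data stream to scikit-learn's SGDRegressor through partial_fit, which updates the model with each new mini-batch instead of retraining from scratch. The stream (y = 3x + 2 plus noise), the batch size, and the learning rate are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
online_model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)

# Simulate a stream of mini-batches drawn from y = 3x + 2 + noise
for _ in range(500):
    X_batch = rng.uniform(-1, 1, size=(20, 1))
    y_batch = 3 * X_batch[:, 0] + 2 + rng.normal(scale=0.1, size=20)
    # Each call refines the current parameters using only the new batch
    online_model.partial_fit(X_batch, y_batch)

# The learned slope and intercept should end up close to 3 and 2
print(online_model.coef_, online_model.intercept_)
```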

 

Instance-Based learning

The system learns the examples by heart, then generalizes to new cases using a similarity measure.
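
A minimal sketch of this idea, assuming Euclidean distance as the similarity measure and a tiny made-up training set (the helper name predict_nearest is hypothetical): the "model" is simply the stored examples, and a new case receives the label of its closest example.

```python
import numpy as np

# Memorized training examples (made-up): feature vectors and their labels
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(["small", "small", "large", "large"])

def predict_nearest(x_new):
    """Return the label of the most similar stored example (1-nearest neighbor)."""
    distances = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances
    return y_train[np.argmin(distances)]

print(predict_nearest(np.array([0.9, 1.1])))  # small
print(predict_nearest(np.array([4.8, 5.3])))  # large
```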

 

Model-Based learning

Build a model from the examples, then use that model to make predictions.

 

Machine learning process

Get the data -> visualize the data to gain insights -> prepare the data -> select a model and train it

-> fine-tune your model -> present your solution -> launch, monitor, and maintain your system

 

Classification

    Multiclass Classification

    Multilabel Classification

    Multioutput Classification
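
To make the distinction concrete: a multiclass classifier outputs one class out of several, while a multilabel classifier outputs several binary labels per instance. The sketch below illustrates the multilabel case on a toy dataset (the digits 0 to 9 as a single feature, with two assumed labels, "large" and "odd"), using KNeighborsClassifier, which accepts a 2D target array.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy dataset: the digits 0..9 as a single feature
X = np.arange(10).reshape(-1, 1)
digits = X.ravel()
# Two binary labels per instance: "is large (>= 7)" and "is odd"
y_multilabel = np.c_[digits >= 7, digits % 2 == 1]

knn_clf = KNeighborsClassifier(n_neighbors=1)
knn_clf.fit(X, y_multilabel)

# One prediction row holds one value per label
pred = knn_clf.predict([[5]])
print(pred)
```

For the digit 5 the classifier returns one row with two entries: not large, but odd.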

 

Performance Measures

    Cross-Validation

    Confusion Matrix

    Precision and Recall

    ROC Curve
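
These measures can be computed together as in the sketch below. It assumes a synthetic binary problem and a LogisticRegression classifier (both chosen only for illustration) and uses cross_val_predict so that every instance is scored by a model that never saw it during training.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
# Synthetic binary problem: class 1 has larger feature values on average
X = np.vstack([rng.normal(0, 1, size=(100, 2)), rng.normal(2, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression()
# Out-of-fold predictions and decision scores from 3-fold cross-validation
y_pred = cross_val_predict(clf, X, y, cv=3)
y_scores = cross_val_predict(clf, X, y, cv=3, method="decision_function")

cm = confusion_matrix(y, y_pred)  # rows: actual class, columns: predicted class
tn, fp, fn, tp = cm.ravel()
print(cm)
print("precision:", precision_score(y, y_pred))  # TP / (TP + FP)
print("recall:   ", recall_score(y, y_pred))     # TP / (TP + FN)
print("ROC AUC:  ", roc_auc_score(y, y_scores))  # area under the ROC curve
```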

 

Linear Regression

 

Gradient Descent

    Batch Gradient Descent

    Stochastic Gradient Descent

    Mini-batch Gradient Descent
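
A minimal NumPy sketch of Batch Gradient Descent for linear regression with an MSE cost, on made-up data generated from y = 4 + 3x plus noise; the learning rate and iteration count are assumptions. Each step uses the full training set. Stochastic and Mini-batch Gradient Descent differ only in computing the gradient on one random instance or on a small random subset per step.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X + rng.normal(size=(m, 1))  # true parameters: intercept 4, slope 3
X_b = np.c_[np.ones((m, 1)), X]          # add the bias feature x0 = 1

# Closed-form least-squares solution, used here only as a reference
theta_exact, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# Batch Gradient Descent: every step uses the whole training set
eta = 0.1                  # learning rate
theta = np.zeros((2, 1))   # initial parameters
for _ in range(1000):
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)  # gradient of the MSE
    theta -= eta * gradients

print(theta.ravel(), theta_exact.ravel())
```

With these settings the iterative result should agree with the closed-form solution to many decimal places.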

 

Polynomial Regression
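
Polynomial Regression fits nonlinear data with a linear model by adding powers of each feature as new features. A sketch, assuming synthetic quadratic data (y = 0.5x² + x + 2 plus noise) and scikit-learn's PolynomialFeatures:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = 6 * rng.random((200, 1)) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(scale=0.5, size=200)

# Expand the single feature x into the two features [x, x^2]
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# An ordinary linear model is then trained on the expanded features
lin_reg = LinearRegression().fit(X_poly, y)
print(lin_reg.intercept_, lin_reg.coef_)
```

The estimated intercept and coefficients should land near 2, 1, and 0.5.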

 

Regularized Linear Models

    Ridge Regression

    Lasso Regression

    Elastic Net

    Early Stopping
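
The three penalized models can be compared side by side as below. This is a sketch on made-up data where only the first two of five features matter; the alpha and l1_ratio values are arbitrary assumptions. Ridge (L2 penalty) shrinks all weights, Lasso (L1 penalty) tends to set the weights of useless features exactly to zero, and Elastic Net mixes the two penalties. Early stopping is not shown.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features matter; the other three are pure noise
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

models = {
    "plain": LinearRegression(),
    "ridge": Ridge(alpha=10.0),                          # L2 penalty
    "lasso": Lasso(alpha=0.5),                           # L1 penalty
    "elastic net": ElasticNet(alpha=0.5, l1_ratio=0.5),  # mix of L1 and L2
}
for name, model in models.items():
    model.fit(X, y)
    print(name, np.round(model.coef_, 2))
```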

 

Logistic Regression

    Estimating Probabilities

    Training and Cost Function

    Decision Boundaries

    Softmax Regression
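
A sketch of probability estimation with scikit-learn's LogisticRegression on the iris dataset bundled with the library, using petal length and width as the two features. With the default solver, recent scikit-learn versions treat the three-class case as multinomial (softmax) regression; older versions may default to one-versus-rest, so take that as an assumption about your installed version. The value C=10 is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, 2:]  # petal length and petal width (cm)
y = iris.target       # three classes: 0, 1, 2

softmax_reg = LogisticRegression(C=10, max_iter=1000)
softmax_reg.fit(X, y)

# Estimated class probabilities for a flower with 5 cm x 2 cm petals
probs = softmax_reg.predict_proba([[5.0, 2.0]])
predicted_class = softmax_reg.predict([[5.0, 2.0]])
print(probs)
print(predicted_class)
```

The probabilities in each row sum to 1, and the predicted class is the one with the highest estimated probability.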

 

Example 1-1. Training and running a linear model using Scikit-Learn

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model  # "import sklearn" alone is not guaranteed to load this submodule
# Load the data
oecd_bli = pd.read_csv("oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("gdp_per_capita.csv", thousands=',', delimiter='\t',
                             encoding='latin1', na_values="n/a")
# Prepare the data
# prepare_country_stats() is assumed to be already defined (it comes with the book's code examples)
country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
X = np.c_[country_stats["GDP per capita"]]
y = np.c_[country_stats["Life satisfaction"]]
# Visualize the data
country_stats.plot(kind='scatter', x="GDP per capita", y='Life satisfaction')
plt.show()
# Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression()
# Train the model
lin_reg_model.fit(X, y)
# Make a prediction for Cyprus
X_new = [[22587]]  # Cyprus' GDP per capita
print(lin_reg_model.predict(X_new))  # outputs [[ 5.96242338]]

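Alternatively, use the k-Nearest Neighbors regression algorithm by replacing this line:

lin_reg_model = sklearn.linear_model.LinearRegression()

with this one (it also requires import sklearn.neighbors; the variable name is kept so the rest of the code runs unchanged):

lin_reg_model = sklearn.neighbors.KNeighborsRegressor(n_neighbors=3)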
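
The swap works because both estimators share the same fit/predict interface. The sketch below shows this on made-up toy data (not the OECD/IMF figures; the points lie exactly on y = 2x + 1) so that it runs without the CSV files or prepare_country_stats().

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Made-up toy data, NOT the OECD/IMF figures: y = 2x + 1 exactly
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = 2 * X[:, 0] + 1

X_new = [[3.4]]
predictions = {}
for model in (LinearRegression(), KNeighborsRegressor(n_neighbors=3)):
    model.fit(X, y)  # same training call for both estimators
    predictions[type(model).__name__] = model.predict(X_new)[0]

print(predictions)
```

The linear model interpolates along the fitted line (7.8 here), while the k-NN regressor averages the targets of the three nearest training points, x = 2, 3, and 4, which gives 7.0.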


 
