Python Notes (Reading Data from a txt File)

In machine learning, you often need to read data from a txt file. This note covers two ways of doing so.

Data contents

  • There are four columns of data: the first three are feature values and the last is the class label
40920   8.326976    0.953952    3
14488   7.153469    1.673904    2
26052   1.441871    0.805124    1
75136   13.147394   0.428964    1
38344   1.669788    0.134296    1
72993   10.141740   1.032955    1
35948   6.830792    1.213192    3
42666   13.276369   0.543880    3
67497   8.631577    0.749278    1
35483   12.273169   1.508053    3
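To make the column layout concrete, here is a minimal sketch that parses a single record from the sample above. It assumes the fields are separated by whitespace (tabs or spaces); the names `sample_line`, `features`, and `label` are illustrative only.

```python
# One record copied from the sample data; fields are assumed whitespace-separated.
sample_line = "40920   8.326976    0.953952    3"

fields = sample_line.split()                 # split on any run of spaces or tabs
features = [float(x) for x in fields[:3]]    # first three columns: feature values
label = int(fields[-1])                      # last column: class label

print(features, label)
```

The same split-then-convert step is what a line-by-line reader applies to every row of the file.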

Method 1: Reading manually

import numpy as np

def file2matrix(filename):
    # Read the file once; the with-block closes it automatically
    with open(filename) as fr:
        lines = fr.readlines()
    numberOfLines = len(lines)                  # number of lines in the file
    returnMat = np.zeros((numberOfLines, 3))    # feature matrix to return
    classLabelVector = []                       # class labels to return
    for index, line in enumerate(lines):
        listFromLine = line.strip().split()     # split on tabs or spaces
        returnMat[index, :] = listFromLine[0:3] # first three columns: features
        classLabelVector.append(int(listFromLine[-1]))  # last column: label
    return returnMat, classLabelVector

dataMat,dataLabel = file2matrix("datingTestSet2.txt")

print(dataMat, dataLabel)
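As a quick sanity check of what the manual reader returns, the sketch below writes three rows from the sample data to a temporary file and reads them back. The `file2matrix` here is a condensed restatement written only so the example is self-contained; the tab separator in the temporary file and the names `demo_mat` and `demo_labels` are assumptions for illustration.

```python
import os
import tempfile

import numpy as np


def file2matrix(filename):
    """Condensed restatement of the manual reader, for a self-contained demo."""
    with open(filename) as fr:
        rows = [line.split() for line in fr if line.strip()]
    features = np.array([row[:3] for row in rows], dtype=float)
    labels = [int(row[-1]) for row in rows]
    return features, labels


# Three rows copied from the sample data, written to a temporary file.
sample = (
    "40920\t8.326976\t0.953952\t3\n"
    "14488\t7.153469\t1.673904\t2\n"
    "26052\t1.441871\t0.805124\t1\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

try:
    demo_mat, demo_labels = file2matrix(tmp_path)
finally:
    os.remove(tmp_path)  # clean up the temporary file

print(demo_mat.shape)    # one row per line, three feature columns
print(demo_labels)       # one integer label per line
```

The feature matrix has one row per line in the file, and the label list has the same length, so the two can be indexed together.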

Method 2: Using pandas

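import pandas as pd

# The file has no header row, so header=None keeps the first line as data
df_news = pd.read_table('datingTestSet2.txt', header=None)
df_news  # displays the DataFrame in a notebook; use print(df_news) in a script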
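Once the data is in a DataFrame, the features and labels can be separated by column position. The following is a minimal sketch, assuming a tab-separated layout like the sample data; it reads from an in-memory string rather than the real file, and the names `sample`, `df`, `features`, and `labels` are illustrative.

```python
import io

import pandas as pd

# Three rows copied from the sample data, assumed to be tab-separated.
sample = (
    "40920\t8.326976\t0.953952\t3\n"
    "14488\t7.153469\t1.673904\t2\n"
    "26052\t1.441871\t0.805124\t1\n"
)

# read_table accepts a file-like object as well as a path.
df = pd.read_table(io.StringIO(sample), header=None)

features = df.iloc[:, :3].to_numpy()   # first three columns as a NumPy array
labels = df.iloc[:, -1].tolist()       # last column as a list of integer labels

print(features.shape)
print(labels)
```

This yields the same kind of feature matrix and label list as the manual approach, without an explicit loop.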

For details, see the pandas documentation.

Reposted from www.cnblogs.com/zou107/p/11904371.html