pandas read_csv() turned the first row of data into column names: how to fix it

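Preface

Sometimes we run into data like this: the first row of the CSV file is not the row of column names we would expect. pandas read_csv() still treats that first row as the header, so the column names are wrong and one row of data goes missing, which causes problems in later processing.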
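To make the problem concrete, here is a minimal sketch. The three-row sample is hypothetical and is read from an in-memory string rather than a real file; it only illustrates the default behaviour described above.

```python
import io

import pandas as pd

# Hypothetical header-less data: three rows, two columns.
raw = "1.0,a\n2.0,b\n3.0,a\n"

# With the default header='infer' and no names given,
# the first line is consumed as the column labels.
df = pd.read_csv(io.StringIO(raw))

print(list(df.columns))  # ['1.0', 'a']
print(len(df))           # 2, one row of data has been lost
```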

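Solution 1

Use the csv library and write the file-reading code yourself.

The csv library is part of Python's standard library.

If all the data are strings

In this case the problem is very simple: just read the file with csv.reader(), which returns an iterator over the rows.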
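As a rough sketch of the all-strings case, assuming a small hypothetical sample held in memory instead of a real file:

```python
import csv
import io

# Hypothetical sample standing in for a real file; every field is text.
sample = io.StringIO("alice,london\nbob,paris\n")

# csv.reader yields each line as a list of strings. The first line is
# treated as data like any other, never as a header.
rows = list(csv.reader(sample))

print(rows)  # [['alice', 'london'], ['bob', 'paris']]
```

For a file on disk, the same call works on the object returned by open(filename, newline='').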

If the data contain numbers as well as strings

Here is one way to handle it.

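import csv


def float_test(data: str):
    """Return data as a float if it parses as one, otherwise return it unchanged."""
    try:
        return float(data)
    except ValueError:
        return data


def read(filename):
    """
    Read a header-less CSV file whose last column holds the labels.

    :param filename: path to the CSV file
    :return: (rows, labels), where rows is a list of feature tuples
    """
    values = []
    # newline='' is what the csv module documentation recommends for file objects
    with open(filename, newline='') as f:
        r = csv.reader(f)
        for row in r:
            # Convert each field to float where possible
            values.append(list(map(float_test, row)))
    # Transpose rows into columns, then split off the last column as the labels
    *data, label = list(map(list, zip(*values)))
    # Transpose the remaining columns back into one tuple per row
    return list(zip(*data)), label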

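This comes from an article I wrote earlier, "Machine Learning Algorithms [Perceptron Learning Algorithm, PLA] [5-minute read]".
In the code above I needed to read the data used to train a perceptron model, but the data I was given had no column names. I did not want to modify the data, so I wrapped the reading logic like this for the time being.
In this data, every value in a row is a float, except that the last column may hold a non-numeric value. That is why I call float_test on each field: it tries the conversion and falls back to the original string.

What are the last two lines, including the return statement, doing? I want to split off the last column and restore everything else to a two-dimensional matrix in which each row is one sample X.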
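The transpose-and-split step at the end of read can be traced on a tiny example. The values below are hypothetical (two float features followed by a label) and stand in for what the loop would have collected:

```python
# Hypothetical parsed rows: two float features followed by a label.
values = [
    [1.0, 2.0, "a"],
    [3.0, 4.0, "b"],
    [5.0, 6.0, "a"],
]

# zip(*values) transposes rows into columns:
# [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0], ['a', 'b', 'a']]
columns = list(map(list, zip(*values)))

# Star-unpacking keeps the last column as the labels
# and collects all the other columns in data.
*data, label = columns

# Transposing the remaining columns again gives one tuple per sample.
X = list(zip(*data))

print(X)      # [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
print(label)  # ['a', 'b', 'a']
```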

Solution 2

Set the parameters!

See the API documentation that pandas provides for the read_csv function:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html

It contains the following:

header : int or list of ints, default ‘infer’
- Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
names : array-like, default None
- List of column names to use. If file contains no header row, then you should explicitly pass header=None. Duplicates in this list will cause a UserWarning to be issued.

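The description of the names parameter says that when the file contains no header row, you need to state this explicitly through the header parameter, that is, pass header=None.

That is the correct explanation, so the right way to do it is (taking a file named 1.csv as an example):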

import pandas as pd

df = pd.read_csv('1.csv', header=None, names=['test'])

The column that had no name is then given the name test, and the first row stays in the data.
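For a file with more than one column, the same idea might look like the sketch below. The sample data and the column names x and label are hypothetical, chosen only for illustration, and the data is read from an in-memory string rather than from 1.csv.

```python
import io

import pandas as pd

# Hypothetical header-less data: three rows, two columns.
raw = "1.0,a\n2.0,b\n3.0,a\n"

# header=None alone: every line is data, and the columns
# receive integer labels starting from 0.
df = pd.read_csv(io.StringIO(raw), header=None)
print(list(df.columns))  # [0, 1]
print(len(df))           # 3

# header=None together with names: one (hypothetical) name per column.
named = pd.read_csv(io.StringIO(raw), header=None, names=["x", "label"])
print(list(named.columns))  # ['x', 'label']
print(named["x"].tolist())  # [1.0, 2.0, 3.0]
```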
