querycsv.py: querying CSV files with SQL statements

Download, installation and configuration

Download querycsv.py from https://pypi.org/project/querycsv/#files. Extract the package, change into its directory, and run:

python setup.py install

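To run the installed querycsv.py script from any directory, add the Scripts folder under the Python installation directory to the PATH environment variable. The figure below shows the configuration for Python 2.7: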

Once installation and configuration are done, run querycsv.py -h at the command line to display the usage help:

Examples

Preparing the data

The examples use two CSV files, Demo.csv and Demo2.csv, with the following contents:

Demo.csv

ID,NAME,AGE,MARKS
1,Test1,21,88
2,Test2,22,89
3,Test3,23,90
4,Test4,24,91
5,Test5,25,92
6,Test6,26,93
7,Test7,27,94
8,Test8,28,95
9,Test9,29,96
10,Test10,30,97
11,Test11,31,98

Demo2.csv

ID,JP,ADRR
1,Level 1,China 1
2,Level 2,China 2
3,Level 3,China 3
4,Level 4,China 4
5,Level 5,China 5
6,Level 6,China 6
7,Level 7,China 7
8,Level 8,China 8
9,Level 9,China 9
10,Level 10,China 10
11,Level 11,China 11
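To reproduce the examples, the short Python 3 script below writes both sample files with exactly the rows listed above. It uses only the standard csv module; lineterminator="\n" is chosen here so the files use plain Unix line endings.

```python
import csv

# Demo.csv: ID 1..11, NAME TestN, AGE 20+N, MARKS 87+N (matches the listing above).
with open("Demo.csv", "w", newline="") as f:
    w = csv.writer(f, lineterminator="\n")
    w.writerow(["ID", "NAME", "AGE", "MARKS"])
    for i in range(1, 12):
        w.writerow([i, "Test%d" % i, 20 + i, 87 + i])

# Demo2.csv: ID 1..11, JP "Level N", ADRR "China N".
with open("Demo2.csv", "w", newline="") as f:
    w = csv.writer(f, lineterminator="\n")
    w.writerow(["ID", "JP", "ADRR"])
    for i in range(1, 12):
        w.writerow([i, "Level %d" % i, "China %d" % i])
```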

Querying one or more CSV files

Query all information for the customer whose ID is 6 in Demo.csv:


querycsv.py -i Demo.csv "select * from Demo  where  id='6'"
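querycsv.py executes the SQL with SQLite, and, judging from the commands in this article, each -i file becomes a table named after the file without its extension (Demo.csv becomes Demo). The sketch below imitates that step under stated assumptions: table_name and load_csv are hypothetical helpers, not querycsv's actual code, and every column is assumed to be stored as TEXT. It runs the same query as above against a two-row excerpt of Demo.csv.

```python
import csv
import io
import os
import sqlite3

def table_name(path):
    """Hypothetical: table name = file name without extension ('Demo.csv' -> 'Demo')."""
    return os.path.splitext(os.path.basename(path))[0]

def load_csv(conn, table, f):
    """Hypothetical helper: copy CSV rows from file object f into a new table.
    Assumption: every column is declared TEXT, so values are stored as strings."""
    reader = csv.reader(f)
    header = next(reader)
    cols = ", ".join('"%s" TEXT' % h for h in header)
    conn.execute('CREATE TABLE "%s" (%s)' % (table, cols))
    marks = ", ".join("?" * len(header))  # one placeholder per column
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, marks), reader)

conn = sqlite3.connect(":memory:")
sample = "ID,NAME,AGE,MARKS\n5,Test5,25,92\n6,Test6,26,93\n"
load_csv(conn, table_name("Demo.csv"), io.StringIO(sample))
rows = conn.execute("select * from Demo where id='6'").fetchall()
print(rows)  # [('6', 'Test6', '26', '93')]
```

SQLite matches table and column names case-insensitively, which is why the lowercase id in the query finds the ID column.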

Query all customer information from Demo.csv and Demo2.csv for IDs greater than 5:

querycsv.py -i Demo.csv -i Demo2.csv "select t.id,t.name,t.age,t.marks,t2.jp,t2.adrr from Demo t left join Demo2 t2 on t2.id=t.id  where t.id>'5'"

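Each -i option creates one table, and the query left-joins the two CSV files on ID and keeps the rows whose id is greater than '5'. The IDs are compared as strings, character by character by ASCII code: '5' has code 53, while '10' and '11' start with '1', whose code is 49. Since 49 is less than 53, the rows with ID '10' and '11' are not returned.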
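The string comparison can be checked directly with Python's sqlite3 module. The sketch assumes, as the explanation above does, that IDs are stored as text. The CAST variant is an addition not taken from the original article; it shows one way to compare numerically so that IDs 10 and 11 are included.

```python
import sqlite3

print(ord("1"), ord("5"))  # 49 53

conn = sqlite3.connect(":memory:")
# Assumption: IDs are stored as TEXT, as the string comparison above implies.
conn.execute("CREATE TABLE Demo (ID TEXT)")
conn.executemany("INSERT INTO Demo VALUES (?)", [(str(i),) for i in range(1, 12)])

# Text comparison: '10' and '11' sort before '5', so they are excluded.
as_text = [r[0] for r in conn.execute(
    "SELECT ID FROM Demo WHERE ID > '5' ORDER BY rowid")]
# Numeric comparison: CAST converts the text to an integer before comparing.
as_int = [r[0] for r in conn.execute(
    "SELECT ID FROM Demo WHERE CAST(ID AS INTEGER) > 5 ORDER BY rowid")]

print(as_text)  # ['6', '7', '8', '9']
print(as_int)   # ['6', '7', '8', '9', '10', '11']
```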

Using an SQL script file

The example script file is named Demo.sql. Each SQL statement must end with a semicolon (;), otherwise an error is reported.

select t.id,t.name,t.age,t.marks,t2.jp,t2.adrr 
from Demo t left join Demo2 t2 on t2.id=t.id 
where t.id>'5';

The command is:

querycsv.py -i Demo.csv -i Demo2.csv -s Demo.sql
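The article does not say why the trailing semicolon is required. One plausible mechanism (an assumption, not confirmed from querycsv.py's source) is that the script is split into statements with sqlite3.complete_statement, which only reports a statement as complete once it ends with ';'. The hypothetical split_statements helper below shows the effect on the contents of Demo.sql: with the semicolon the query is recognized, without it no statement is found.

```python
import sqlite3

def split_statements(script_text):
    """Hypothetical helper: group script lines into complete SQL statements.
    Returns (statements, leftover); leftover is text never terminated by ';'."""
    statements, buf = [], ""
    for line in script_text.splitlines(True):  # keep line endings
        buf += line
        if sqlite3.complete_statement(buf):
            statements.append(buf.strip())
            buf = ""
    return statements, buf.strip()

script = ("select t.id,t.name,t.age,t.marks,t2.jp,t2.adrr \n"
          "from Demo t left join Demo2 t2 on t2.id=t.id \n"
          "where t.id>'5';\n")
no_semicolon = script.rstrip(";\n")

print(len(split_statements(script)[0]))        # 1
print(len(split_statements(no_semicolon)[0]))  # 0
```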

Writing the results to a CSV file:

Use the -o option to write the output to result.csv:

querycsv.py -i Demo.csv -i Demo2.csv -o result.csv "select t.id,t.name,t.age,t.marks,t2.jp,t2.adrr from Demo t left join Demo2 t2 on t2.id=t.id  where t.id>'5'"

The -o option must be given before -s, otherwise an error occurs.
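What -o produces can be approximated as a header row taken from cursor.description followed by the result rows. write_result is a hypothetical sketch, not querycsv's implementation, and it writes to an in-memory buffer instead of result.csv so that it runs on its own.

```python
import csv
import io
import sqlite3

def write_result(cursor, out):
    """Hypothetical sketch of -o: column names first, then every result row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([d[0] for d in cursor.description])
    writer.writerows(cursor)  # a sqlite3 cursor iterates over its rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Demo (ID TEXT, NAME TEXT)")
conn.executemany("INSERT INTO Demo VALUES (?, ?)",
                 [("5", "Test5"), ("6", "Test6")])

buf = io.StringIO()
write_result(conn.execute("SELECT ID, NAME FROM Demo WHERE ID > '5'"), buf)
print(buf.getvalue())
```

For a real file, pass open("result.csv", "w", newline="") instead of the StringIO buffer.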

Set the default encoding to "utf-8"; otherwise you get "UnicodeEncodeError: 'ascii'....range(128)". The following code (Python 2) fixes this:

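# Python 2 only: site.py deletes sys.setdefaultencoding at startup,
# and reload(sys) makes it available again.
import sys
reload(sys)
sys.setdefaultencoding('utf-8')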
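The error appears whenever non-ASCII text, such as Chinese data, is encoded with the 'ascii' codec. The Python 3 snippet below reproduces the message with a hypothetical sample string and shows that encoding the same text as UTF-8 works. Python 3 has no sys.setdefaultencoding, so there an explicit encoding is the usual route.

```python
text = u"中国"  # hypothetical non-ASCII sample value

try:
    text.encode("ascii")  # the implicit conversion Python 2 performs by default
    message = None
except UnicodeEncodeError as e:
    message = str(e)

print(message)  # 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
print(text.encode("utf-8"))  # b'\xe4\xb8\xad\xe5\x9b\xbd'
```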

If setting utf-8 then produces "IOError: [Errno 0] Error", append .decode('utf-8') at the end of def pp(cursor). The complete adjusted function is:

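## pp() function by Aaron Watters, posted to a mailing list 1999-01-18
## modified version
## Taken from sqliteplus.py by Florent Xicluna
def pp(cursor):
    """Format a query result as an aligned text table (Python 2)."""
    rows = cursor.fetchall()
    desc = cursor.description
    if not desc:
        return rows
    names = [d[0] for d in desc]
    rcols = range(len(desc))
    rrows = range(len(rows))
    # Width of each column: the longest of its header and all of its cells.
    maxen = [max(0, len(names[j]), *(len(str(rows[i][j]))
             for i in rrows)) for j in rcols]
    names = ' ' + ' | '.join(
            [names[j].ljust(maxen[j]) for j in rcols])
    # Separator line; sum(maxen) is the total width of all columns.
    sep = '=' * (sum(maxen) + 3*len(desc) - 1)
    rows = [names, sep] + [' ' + ' | '.join(
            [str(rows[i][j]).ljust(maxen[j])
            for j in rcols]) for i in rrows]
    # .decode('utf-8') applies only to the parenthesized suffix; adding that
    # unicode string to the byte string makes Python 2 convert the whole
    # result to unicode using the default encoding (utf-8 after the fix above).
    return '\n'.join(rows) + (
           len(rows) == 2 and '\n no row selected\n' or '\n').decode('utf-8')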
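On Python 3, str is already Unicode, so the .decode('utf-8') step is neither needed nor possible on str objects. pp3 below is a hypothetical Python 3 port of the same formatting logic (same column widths, separator length and "no row selected" message), not code shipped with querycsv, exercised against an in-memory SQLite table.

```python
import sqlite3

def pp3(cursor):
    """Hypothetical Python 3 port of pp(): text table, no decode step."""
    rows = cursor.fetchall()
    desc = cursor.description
    if not desc:
        return rows
    names = [d[0] for d in desc]
    # Column width = longest of the header and every cell in that column.
    widths = [max([len(names[j])] + [len(str(r[j])) for r in rows])
              for j in range(len(desc))]
    header = " " + " | ".join(n.ljust(w) for n, w in zip(names, widths))
    sep = "=" * (sum(widths) + 3 * len(desc) - 1)  # same length rule as pp()
    body = [" " + " | ".join(str(v).ljust(w) for v, w in zip(r, widths))
            for r in rows]
    lines = [header, sep] + body
    return "\n".join(lines) + ("\n no row selected\n" if not rows else "\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Demo (ID TEXT, NAME TEXT)")
conn.execute("INSERT INTO Demo VALUES ('6', 'Test6')")
print(pp3(conn.execute("SELECT ID, NAME FROM Demo")))
```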