The way Python handles CSV, JSON, and XML data is fantastic

Python's simplicity and ease of use have made it a first choice for network programming and for data work. Its major data and algorithm libraries have also made it the preferred entry-level language for data science.

In day-to-day work, three data formats dominate: CSV, JSON, and XML. Below I will share quick ways to process each of the three in Python.

CSV data

CSV is the most common way of storing data. Most of the data in Kaggle competitions is stored this way. CSV files can be read and written with Python's built-in csv library, and we typically read the data into a list of lists.

Take a look at the code below. When we call csv.reader(), all of the CSV data becomes accessible. Calling next(csvreader) reads one line from the CSV; each call moves to the next line. We can also loop over every row with for row in csvreader. Make sure each row has the same number of columns, otherwise you might end up with bugs when working with the list of lists.

import csv

filename = "my_data.csv"

fields = []
rows = []
# Reading csv file (newline='' lets the csv module handle line endings itself)
with open(filename, 'r', newline='') as csvfile:
    # Creating a csv reader object
    csvreader = csv.reader(csvfile)

    # Extracting field names from the first row
    fields = next(csvreader)

    # Extracting each data row one by one 
    for row in csvreader: 
        rows.append(row)  
# Printing out the first 5 rows 
for row in rows[:5]: 
    print(row)
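
The warning about rows with differing column counts can be checked before any processing. The following is a minimal sketch, not part of the original code: the helper name find_ragged_rows and the sample data are hypothetical, and an in-memory string stands in for a real file.

```python
import csv
import io

# Hypothetical sample data; the last data row is missing a column.
sample = "Name,Goals,Assists\nEmily,12,18\nKatie,8,24\nJohn,16\n"

def find_ragged_rows(csvfile):
    """Return (line_number, row) pairs whose length differs from the header."""
    reader = csv.reader(csvfile)
    header = next(reader)
    bad_rows = []
    for row in reader:
        if len(row) != len(header):
            # reader.line_num counts the source lines read so far
            bad_rows.append((reader.line_num, row))
    return bad_rows

ragged = find_ragged_rows(io.StringIO(sample))
print(ragged)
```

An empty result means every row matches the header; anything else points at the lines worth inspecting by hand.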

Writing CSV in Python is just as easy. Put the field names in a single list and the data rows in a list of lists. We then create a writer() object and use it to write our data to the file, in much the same way as we read it.

import csv 

# Field names 
fields = ['Name', 'Goals', 'Assists', 'Shots'] 

# Rows of data in the csv file 
rows = [ ['Emily', '12', '18', '112'], 
         ['Katie', '8', '24', '96'], 
         ['John', '16', '9', '101'], 
         ['Mike', '3', '14', '82']]

filename = "soccer.csv"

# Writing to csv file 
with open(filename, 'w', newline='') as csvfile:
    # Creating a csv writer object 
    csvwriter = csv.writer(csvfile) 

    # Writing the fields 
    csvwriter.writerow(fields) 

    # Writing the data rows 
    csvwriter.writerows(rows)
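
To confirm what was written, the rows can be read back as dictionaries with csv.DictReader from the same built-in module. This is only a sketch: an in-memory buffer stands in for soccer.csv so that it runs without touching the disk, and only two of the rows are used.

```python
import csv
import io

# In-memory stand-in for soccer.csv (an assumption made for this sketch).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['Name', 'Goals', 'Assists', 'Shots'])
writer.writerows([['Emily', '12', '18', '112'],
                  ['Katie', '8', '24', '96']])

# Rewind the buffer and read it back.
buffer.seek(0)
# DictReader uses the first row as the keys for every following row.
records = [dict(row) for row in csv.DictReader(buffer)]
print(records)
```

Note that every value comes back as a string, because CSV itself carries no type information.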

We can use pandas to convert the data to a list of dictionaries in one quick line. Once the data is formatted as a list of dictionaries, we use the dicttoxml library to convert it to XML format. We also save it as a JSON file!

import pandas as pd
from dicttoxml import dicttoxml
import json

# Building our dataframe
data = {'Name': ['Emily', 'Katie', 'John', 'Mike'],
        'Goals': [12, 8, 16, 3],
        'Assists': [18, 24, 9, 14],
        'Shots': [112, 96, 101, 82]
        }

df = pd.DataFrame(data, columns=data.keys())

# Converting the dataframe to a dictionary
# Then save it to file
data_dict = df.to_dict(orient="records")
with open('output.json', "w+") as f:
    json.dump(data_dict, f, indent=4)

# Converting the dataframe to XML
# Then save it to file
xml_data = dicttoxml(data_dict).decode()
with open("output.xml", "w+") as f:
    f.write(xml_data)
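
If installing dicttoxml is not an option, a similar conversion can be sketched with the built-in ElementTree module. The function name records_to_xml and the tag names "root" and "item" are choices made for this illustration, the output is not meant to match dicttoxml's byte for byte, and the sketch assumes every dictionary key is a valid XML tag name.

```python
import xml.etree.ElementTree as ET

def records_to_xml(records, root_tag="root", item_tag="item"):
    """Build an XML string with one item element per dictionary."""
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for key, value in record.items():
            # Each key becomes a child tag; values are stored as text.
            child = ET.SubElement(item, key)
            child.text = str(value)
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(root, encoding="unicode")

records = [{'Name': 'Emily', 'Goals': 12}, {'Name': 'Katie', 'Goals': 8}]
xml_text = records_to_xml(records)
print(xml_text)
```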

JSON data

JSON provides a concise and easy-to-read format that maintains a dictionary-like structure. Just like CSV, Python has a built-in json module that makes reading and writing very fast! We can read a JSON file straight into a list of dictionaries, and then write that same data back out to a file.

import json
import pandas as pd

# Read the data from file
# We now have a list of Python dictionaries
with open('data.json') as f:
    data_listofdict = json.load(f)

# We can do the same thing with pandas
data_df = pd.read_json('data.json', orient='records')

# We can write a dictionary to JSON like so
# Use 'indent' and 'sort_keys' to make the JSON
# file look nice
with open('new_data.json', 'w+') as json_file:
    json.dump(data_listofdict, json_file, indent=4, sort_keys=True)

# And again the same thing with pandas
# (to_json returns None when given a path, so there is nothing to assign)
data_df.to_json('new_data.json', orient='records')
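
The same round trip can be tried without any files by using json.dumps() and json.loads(), which work on strings. This is a small sketch with made-up records rather than the contents of data.json.

```python
import json

# Hypothetical records standing in for the contents of data.json.
records = [{'Name': 'Emily', 'Goals': 12}, {'Name': 'Katie', 'Goals': 8}]

# dumps() returns a string instead of writing to a file object.
text = json.dumps(records, indent=4, sort_keys=True)
print(text)

# loads() parses the string back into Python objects.
restored = json.loads(text)
```

Unlike CSV, JSON keeps the numbers as numbers, so restored compares equal to the original list.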

As we saw before, once we have our data it can easily be converted to CSV, either via pandas or with Python's built-in csv module. To convert to XML, you can use the dicttoxml library as shown earlier. The code for the CSV conversion is as follows:

import json
import pandas as pd
import csv

# Read the data from file
# We now have a list of Python dictionaries
with open('data.json') as f:
    data_listofdict = json.load(f)

# Writing a list of dicts to CSV
keys = data_listofdict[0].keys()
with open('saved_data.csv', 'w', newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(data_listofdict)
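
The code above takes the column names from the first record only, so it assumes every record has the same keys. When that is not guaranteed, one option is to collect the union of all keys first. The following is a sketch with hypothetical records, writing to an in-memory buffer instead of saved_data.csv.

```python
import csv
import io

# Hypothetical records; the second one lacks 'Assists'.
records = [{'Name': 'Emily', 'Goals': 12, 'Assists': 18},
           {'Name': 'Katie', 'Goals': 8}]

# Collect every key in first-seen order so no column is dropped.
fieldnames = []
for record in records:
    for key in record:
        if key not in fieldnames:
            fieldnames.append(key)

output = io.StringIO()
# restval is the value written for keys missing from a record.
writer = csv.DictWriter(output, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(records)
csv_text = output.getvalue()
print(csv_text)
```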

XML data

XML is a little different from CSV and JSON. CSV and JSON are widely used because of their simplicity: both are easy and fast to read, write, and interpret. XML is heavier. It takes up more memory, needs more bandwidth to transfer and more storage space, and takes longer to process. But XML also has some extra features compared with JSON and CSV: namespaces that you can use to build and share standard structures, better representation, and industry-standard ways of describing data, such as XML Schema and DTDs.
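
To make the size difference concrete, here is a rough sketch that encodes one made-up record in all three formats using only the standard library and compares the lengths. It is an illustration on a single tiny record, not a benchmark; the XML tag name "item" is an arbitrary choice.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical record; all values are strings so the three encodings match.
record = {'Name': 'Emily', 'Goals': '12', 'Assists': '18', 'Shots': '112'}

# CSV: one header line and one data line
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
csv_text = csv_buffer.getvalue()

# JSON: default compact encoding on one line
json_text = json.dumps(record)

# XML: one child element per field
root = ET.Element('item')
for key, value in record.items():
    ET.SubElement(root, key).text = value
xml_text = ET.tostring(root, encoding='unicode')

sizes = {'csv': len(csv_text), 'json': len(json_text), 'xml': len(xml_text)}
print(sizes)
```

For this record the CSV is the shortest and the XML the longest, since every XML field name is spelled out twice, once in the opening tag and once in the closing tag.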

To read in XML data, we'll use the ElementTree submodule of Python's built-in xml package. From there, we can convert the ElementTree object to a dictionary using the xmltodict library, as follows:

import xml.etree.ElementTree as ET
import xmltodict
import json

tree = ET.parse('output.xml')
xml_data = tree.getroot()

xmlstr = ET.tostring(xml_data, encoding='utf8', method='xml')


data_dict = dict(xmltodict.parse(xmlstr))

print(data_dict)

with open('new_data_2.json', 'w+') as json_file:
    json.dump(data_dict, json_file, indent=4, sort_keys=True)
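
For simple, flat XML, the dictionaries can also be built with ElementTree alone, without xmltodict. This sketch assumes a hypothetical document with a root element containing item elements whose children hold plain text; real files such as output.xml may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML with the same kind of data as the soccer example.
xml_text = (
    "<root>"
    "<item><Name>Emily</Name><Goals>12</Goals></item>"
    "<item><Name>Katie</Name><Goals>8</Goals></item>"
    "</root>"
)

root = ET.fromstring(xml_text)

# Each item element becomes a dict mapping child tag names to their text.
records = [{child.tag: child.text for child in item}
           for item in root.findall("item")]
print(records)
```

As with CSV, all values come back as strings, so numeric fields need converting before any arithmetic.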

Origin blog.csdn.net/weixin_56659172/article/details/124266901