Sesame HTTP: Scrapy Tips - MySQL Storage - Code World


2022-04-28 19:28:28

Over the past two days I took over some work from a colleague, and in the crawler they left behind I found a very interesting way of splicing SQL statements.

As long as your Scrapy Field names are the same as the database column names, congratulations: you can copy this SQL splicing script and use it to store your items in MySQL.
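Because the script writes the item's keys straight into the SQL text as column names, it may be worth checking them before the statement is built. The following is only a minimal sketch of such a check: `check_columns`, the column set, and the sample item are hypothetical names made up for illustration and are not part of the original crawler.

```python
import re

# Hypothetical helper, not from the original pipeline: a key is usable only if
# it is a plain identifier and matches a known column of the target table.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_columns(item_keys, table_columns):
    """Return the item keys that cannot be used as column names."""
    bad = []
    for key in item_keys:
        if not IDENTIFIER.fullmatch(key) or key not in table_columns:
            bad.append(key)
    return bad


# Example data, assumed for illustration only.
table_columns = {"match_id", "home_team", "away_team"}
item = {"match_id": "1001", "home_team": "Arsenal", "score": "2-1"}
print(check_columns(item.keys(), table_columns))  # prints ['score']
```

An empty result means every key can be used as a column name as it stands.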

The specific splicing code is as follows:

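    def process_item(self, item, spider):
        if isinstance(item, WhoscoredNewItem):
            # The item carries its own target table; every other key must match a column name.
            table_name = item.pop('table_name')
            col_str = ''
            row_str = ''
            for key in item.keys():
                col_str = col_str + " " + key + ","
                # Cast to str so non-string values work, then escape single quotes.
                row_str = "{}'{}',".format(row_str, str(item[key]).replace("'", "\\'"))
            # Build the INSERT part once, after the loop; the slices trim the leading space and trailing commas.
            sql = "insert INTO {} ({}) VALUES ({}) ON DUPLICATE KEY UPDATE ".format(table_name, col_str[1:-1], row_str[:-1])
            # six.iteritems needs `import six` at module level.
            for (key, value) in six.iteritems(item):
                sql += "{} = '{}', ".format(key, str(value).replace("'", "\\'"))
            sql = sql[:-2]  # drop the trailing ", "
            self.cursor.execute(sql)  # execute the SQL
            self.cnx.commit()  # commit the write
        return item  # Scrapy expects process_item to return the item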

This splicing builds a single SQL statement that updates the row if it already exists and inserts it if it does not.

The first for loop builds the column list and the value list, which make up the insert into XXXXX (...) values (...) on duplicate key update part.

The second for loop appends the field name = value pairs that follow on duplicate key update.
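The same two-part statement can also be built with placeholders instead of hand-escaped values, so that the database driver does the quoting. This is a sketch under assumptions, not the original author's code: `build_upsert` is a hypothetical helper, the table and row are invented, and it assumes a driver that uses the `%s` placeholder style (as mysql-connector-python and PyMySQL do). Table and column names still go into the SQL text directly, so they must come from trusted code.

```python
def build_upsert(table_name, row):
    """Build a parameterized INSERT ... ON DUPLICATE KEY UPDATE statement.

    `row` maps column names to values. Returns (sql, params).
    """
    columns = list(row.keys())
    values = [row[c] for c in columns]
    col_str = ", ".join(columns)
    # One %s placeholder per inserted value; the driver handles the quoting.
    placeholders = ", ".join(["%s"] * len(columns))
    # The update part repeats each column as "col = %s".
    update_str = ", ".join("{} = %s".format(c) for c in columns)
    sql = "INSERT INTO {} ({}) VALUES ({}) ON DUPLICATE KEY UPDATE {}".format(
        table_name, col_str, placeholders, update_str
    )
    # The values are needed twice: once for VALUES (...), once for UPDATE.
    return sql, values + values


# Example data, assumed for illustration only.
sql, params = build_upsert("matches", {"match_id": 1001, "home_team": "O'Higgins"})
print(sql)
print(params)
```

In a pipeline like the one above, the result would be passed as `self.cursor.execute(sql, params)`, which removes the need for the manual `replace("'", "\\'")` step.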

I have to hand it to the colleague who wrote this splicing: it is pretty generic.

I don't know whether you had thought of this approach, but I hadn't.
