Web Crawler Project Walkthrough, Case II: Data Processing

Objective: the data collected in the previous part has already been passed along as items; those items are now handled inside pipelines.py.

The data now needs to be processed. How should it be handled?
First, define the pipeline class:
import os

class StockPipeline(object):
    def __init__(self):
        # "a+": open for reading and writing; the file is created if it
        # does not exist, otherwise new data is appended to it
        self.file = open("executive_prep.csv", "a+")

    def process_item(self, item, spider):
        # The file is created when the class is instantiated.
        # If the file is still empty, write the header row first.
        if os.path.getsize("executive_prep.csv") == 0:
            self.file.write("executive name,sex,age,ticker symbol,position\n")
            self.file.flush()
        # Then append the rows carried by this item.
        self.write_content(item)
        return item
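The "write the header only when the file is empty, otherwise append" idea can be tried outside of Scrapy. The following is a minimal sketch, not the author's code: the function name `append_with_header`, the `HEADER` constant, and the sample rows are made up for illustration.

```python
import os
import tempfile

# Hypothetical header row, modeled on the columns used in this article.
HEADER = "executive name,sex,age,ticker symbol,position\n"

def append_with_header(path, line):
    """Append `line` to `path`, writing HEADER first if the file is empty."""
    # "a+" creates the file if it is missing and always writes at the end.
    with open(path, "a+", encoding="utf-8") as f:
        if os.path.getsize(path) == 0:
            f.write(HEADER)
        f.write(line)

if __name__ == "__main__":
    # Made-up sample rows written to a temporary file.
    path = os.path.join(tempfile.mkdtemp(), "demo.csv")
    append_with_header(path, "Alice,F,45,600000,CEO\n")
    append_with_header(path, "Bob,M,50,600000,CFO\n")
    with open(path, encoding="utf-8") as f:
        print(f.read())
```

Because the file is closed after every call, the size check on the next call sees everything written so far, so the header appears exactly once.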
In this way, the pipeline writes the CSV header once and then appends the data to the file, the intent being that nothing is written twice. The code that writes the data is encapsulated as follows:
    def write_content(self, item):
        names = item["names"]
        sexes = item["sexes"]
        ages = item["ages"]  # key name assumed from the pattern of the other fields
        codes = item["codes"]
        leaders = item["leaders"]
        # all of the collected data is available at this point
        result = ""
        for i in range(len(names)):
            # build one CSV line per executive and write it out
            result = names[i] + "," + sexes[i] + "," + ages[i] + "," + codes[i] + "," + leaders[i] + "\n"
            self.file.write(result)
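Joining fields with "," by hand breaks as soon as a value contains a comma, and it does not by itself prevent duplicate rows. As a hedged alternative, the sketch below uses the standard library `csv` module and skips rows that are already in the file. The function name `append_new_rows` is hypothetical, and the item keys simply follow the field names used in this article.

```python
import csv
import os
import tempfile

def append_new_rows(path, item):
    """Append rows from `item` to the CSV at `path`, skipping rows already there.

    Returns the number of rows actually written.
    """
    # Load rows already in the file (if any) so duplicates can be detected.
    seen = set()
    if os.path.exists(path):
        with open(path, newline="", encoding="utf-8") as f:
            seen = {tuple(row) for row in csv.reader(f)}

    # zip pairs up the i-th entry of each parallel list into one row.
    rows = zip(item["names"], item["sexes"], item["ages"],
               item["codes"], item["leaders"])
    written = 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # quotes fields that contain commas
        for row in rows:
            # Compare as strings, since that is how rows read back from disk.
            row = tuple(str(value) for value in row)
            if row not in seen:
                writer.writerow(row)
                seen.add(row)
                written += 1
    return written
```

Calling the function twice with the same item writes the rows only once; the second call returns 0. Reading the whole file on every call is acceptable for small files, while a long-running spider would more likely keep the `seen` set in memory.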
This completes the data processing. Open the file named above and you will find that the data has been written to it. With that, the first project is complete.

Origin www.cnblogs.com/jxxgg/p/11666852.html