Perl Project Improvement (3): Perl and XML, IDE, Regex, and SQS

1 XML Perl Operation
Search for lines containing <referencenumber>, strip the optional <![CDATA[ ]]> wrapper, and write the extracted values to one file.
> time perl -ne 'if (/referencenumber/){ s/<!\[CDATA\[//; s/]]>//; s/.*?>//; s/<.*//; print;}' 1052.xml > referencenumber.xml
real 0m7.773s
user 0m7.084s
sys 0m0.564s
time simply measures how long the command takes to run. Searching a 2 GB XML file this way takes only about 7 seconds.
The output file has 749,999 lines, 749,999 words, and 24,625,055 bytes:
> wc referencenumber.xml
  749999  749999 24625055 referencenumber.xml
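
For readers who prefer a script over a one-liner, the same extraction can be sketched as a small Perl sub. The name extract_reference_number and the sample lines are assumptions made for illustration; the substitutions mirror the ones in the command above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text inside <referencenumber>...</referencenumber>
# with an optional CDATA wrapper removed, or undef if the line has no such tag.
sub extract_reference_number {
    my ($line) = @_;
    return unless $line =~ /referencenumber/;
    $line =~ s/<!\[CDATA\[//;   # drop the CDATA opener, if present
    $line =~ s/]]>//;           # drop the CDATA closer, if present
    $line =~ s/.*?>//;          # drop everything up to and including the opening tag
    $line =~ s/<.*//;           # drop the closing tag and anything after it
    chomp $line;
    return $line;
}

# Sample input lines (made up for this example)
my @lines = (
    "  <referencenumber><![CDATA[ABC-123]]></referencenumber>\n",
    "  <referencenumber>XYZ-9</referencenumber>\n",
    "  <title>ignored</title>\n",
);

for my $line (@lines) {
    my $ref = extract_reference_number($line);
    print "$ref\n" if defined $ref;
}
```

Run against the three sample lines, the loop prints only the two reference numbers and skips the title line.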

Another approach uses grep and awk; on the same 1052.xml file it takes about 44 seconds:
> grep 'referencenumber' /data/12001.xml | awk -F"</?referencenumber>" '{ print $2}'

> time grep 'referencenumber' /data/1052.xml | awk -F"</?referencenumber>" '{ print $2}' > /data/referencenumber.xml

real 0m44.050s
user 0m45.492s
sys 0m0.809s
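
The awk call treats the opening or closing tag as the field separator and prints the second field, which is the text between the tags. The same idea can be sketched in Perl with split; second_field is a hypothetical name, and unlike the one-liner in this section the sketch does not strip a CDATA wrapper.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring awk -F"</?referencenumber>" '{ print $2 }':
# split the line on the opening or closing tag and return the second field.
sub second_field {
    my ($line) = @_;
    chomp $line;
    # limit -1 keeps trailing empty fields, as awk does
    my @fields = split m{</?referencenumber>}, $line, -1;
    return $fields[1];    # awk's $2 is index 1 in Perl
}

print second_field("  <referencenumber>XYZ-9</referencenumber>\n"), "\n";
```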

2 IDE Environment Setup
Perl plugin for Eclipse
http://www.epic-ide.org/download.php
Download a small, Java-only Eclipse package
http://www.eclipse.org/downloads/
Set Up the Plugin
http://www.epic-ide.org/running_perl_scripts_within_eclipse/eclipse-runperl-figure4.png

Once the latest Java-only Eclipse is installed, add the Perl plugin from this update site:
http://www.epic-ide.org/updates/testing

After installing it, set the Perl executable in the Eclipse preferences:
Perl executable: "/Users/carl/tool/perl-5.16.3/bin/perl"

Then select the Project Properties and set:
Perl Include Path -> Add to List ${project_loc}

Set up the unit tests
[Run] -> [External Tools] -> [External Tools Configurations] -> [Program] -> New
RunAllTest
      - Location: /Users/carl/tool/perl-5.16.3/prove
      - Working Directory: ${workspace_loc:/jobs-producer-perl}
      - Arguments: ${build_files:t/*}
SingleTest
      - Location: /Users/carl/tool/perl-5.16.3/perl
      - Working Directory: ${workspace_loc:/jobs-producer-perl}
      - Arguments: t/NumberUtil.t
PerlApp
      - Location: /Users/carl/tool/perl-5.16.3/perl
      - Working Directory: ${workspace_loc:/jobs-producer-perl}
      - Arguments: JobProducerApp.pl
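
To check that the test configurations work, a minimal test file is enough. The sketch below is not the project's t/NumberUtil.t; is_positive_integer is a hypothetical function defined inline so that the file runs on its own under prove or perl.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical function under test, defined inline so the file is self-contained
sub is_positive_integer {
    my ($value) = @_;
    return (defined $value && $value =~ /^[1-9][0-9]*$/) ? 1 : 0;
}

ok(  is_positive_integer('42'),  '42 is a positive integer' );
ok( !is_positive_integer('0'),   '0 is rejected' );
ok( !is_positive_integer('4.2'), '4.2 is rejected' );
ok( !is_positive_integer(undef), 'undef is rejected' );

# done_testing() prints the plan after all assertions have run
done_testing();
```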

3 Perl Regex and Shell Command Support
A Perl method that generates the difference files:
sub generateReferenceNumbers {
    die "Wrong arguments" if @_ != 2;

    #services
    my $logger = &loadLogger();

    my $hugeFileName = $_[0];
    my $source_id = $_[1];

    #prepare 2 arrays
    my @redisArray = ();
    my @xmlArray;

    #TODO: the big file location should come from the parameters,
    #and the output reference number file should be in the same directory
    my $bigFile = "/data/1052.xml";
    my $referencenumberFile = "/data/referencenumber.xml";

    #shell command that extracts the reference numbers with regex substitutions
    `perl -ne "if (/referencenumber/){ s/<referencenumber>//; s/<\\/referencenumber>//; s/<!\\[CDATA\\[//; s/]]>//; s/\\s*\\t*//; print; }" $bigFile > $referencenumberFile`;

    #read and trim the reference numbers from file to array
    open(my $fileHandler, "<", $referencenumberFile) or die "Failed to open file: $!\n";
    while(<$fileHandler>) {
        chomp;
        push @xmlArray, $_;
    }
    close $fileHandler;

    #find the differences
    my @differencesArray = lib::CollectionUtil::differenceInArrays(\@xmlArray,\@redisArray);

    #logging and testing the difference
    #$logger->info("the difference array = @differencesArray");
    #my $first = $differencesArray[0];
    #$logger->info("===$first==");

    #output the difference to XML and send to the next steps

}

The most important part is this:
`perl -ne "if (/referencenumber/){ s/<referencenumber>//; s/<\\/referencenumber>//; s/<!\\[CDATA\\[//; s/]]>//; s/\\s*\\t*//; print; }" $bigFile > $referencenumberFile`

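-n wraps the code in a loop over every input line without printing automatically, and -e passes the code on the command line. The backslashes are doubled because the command sits inside Perl backticks.
s/<referencenumber>// means that once a line matches, <referencenumber> is replaced with an empty string
s/<\\/referencenumber>// replaces </referencenumber> with an empty string
s/<!\\[CDATA\\[// replaces <![CDATA[ with an empty string
s/]]>// replaces ]]> with an empty string
s/\\s*\\t*// removes leading blank and tab characters; without the g flag only the first match, at the start of the line, is replaced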
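
Applied to a single sample line, the substitution chain behaves as sketched below. The sub name strip_reference_tags and the sample line are invented for this example; the regexes are the ones from the command, with single backslashes because they are no longer inside backticks.

```perl
use strict;
use warnings;

# Apply the same substitutions as the one-liner to one line of input
sub strip_reference_tags {
    my ($line) = @_;
    $line =~ s/<referencenumber>//;
    $line =~ s/<\/referencenumber>//;
    $line =~ s/<!\[CDATA\[//;
    $line =~ s/]]>//;
    $line =~ s/\s*\t*//;    # no /g: only the first match (leading whitespace) is removed
    return $line;
}

my $sample = "\t  <referencenumber><![CDATA[ABC 123]]></referencenumber>\n";

# The space inside "ABC 123" and the trailing newline both survive
print strip_reference_tags($sample);
```

The trailing newline is why the calling code still needs chomp when it reads the output file back in.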

Read the file line by line and push each line onto the array:
open(my $fileHandler, "<", $referencenumberFile) or die "Failed to open file: $!\n";
while(<$fileHandler>) {
    chomp;
    push @xmlArray, $_;
}
close $fileHandler;
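
Once both arrays are filled, the method hands them to lib::CollectionUtil::differenceInArrays, which is not shown in this post. A common way to compute such a difference is a hash lookup. The sketch below assumes the function takes two array references and returns the elements of the first that are missing from the second; the name difference_in_arrays and that contract are assumptions, not the project's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for lib::CollectionUtil::differenceInArrays:
# return the elements of @$left that do not occur in @$right.
sub difference_in_arrays {
    my ($left, $right) = @_;
    my %seen = map { $_ => 1 } @$right;    # index the second array for O(1) lookups
    return grep { !$seen{$_} } @$left;     # keep what the second array lacks
}

# Sample data (made up): reference numbers from the XML file and from Redis
my @xml   = qw(A1 B2 C3 D4);
my @redis = qw(B2 D4);

my @diff = difference_in_arrays(\@xml, \@redis);
print "@diff\n";
```

With about 750,000 reference numbers, the hash keeps the comparison linear instead of comparing every pair.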

4 SQS
http://search.cpan.org/~penfold/Amazon-SQS-Simple-2.04/lib/Amazon/SQS/Simple.pm
http://search.cpan.org/~penfold/Amazon-SQS-Simple-2.04/
> cpan -fi Amazon::SQS::Simple

Error Message:
ERROR [try ]: On calling SendMessage: 501 Protocol scheme 'https' is not supported (LWP::Protocol::https not installed) at lib/QueueClientHandler.pm line 39.

Solution:
> cpan -fi LWP::Protocol::https

Error Message:
t/QueueClientHandler.t (Wstat: 0 Tests: 2 Failed: 0)
  Parse errors: No plan found in TAP output

Solution:
Write the output through the logger instead of print; text printed to STDOUT ends up in the TAP stream that prove parses.
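
Inside a test file, Test::More offers note and diag as another way to keep diagnostics out of the parsed TAP lines. The sketch below is a generic illustration with an invented placeholder assertion, not the project's test.

```perl
use strict;
use warnings;
use Test::More;

# note() writes "# ..." lines, which the TAP parser treats as comments.
# A bare print without a trailing newline can glue itself to the plan line.
note('sending a test message');

ok( 1, 'placeholder assertion' );

# diag() also prefixes "# " but writes to STDERR, so it shows even without prove -v
diag('message sent');

# done_testing() emits the plan (1..N) after all tests have run
done_testing();
```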

The core class, QueueClientHandler.pm:
use strict;
use warnings;

use lib::CollectionUtil;
use Amazon::SQS::Simple;

package lib::QueueClientHandler;

sub init {
  my $configService = &loadService('configService');
  my $logger = &loadLogger();


  $logger->debug("init SQS connection-----");

  $logger->debug("--------------------------");

  my $access_key = 'AKIAIMxxxxxxx'; # Your AWS Access Key ID
  my $secret_key = 'BIr5Xlu1xxxxxxxx'; # Your AWS Secret Key

  my $register = IOC::Registry->instance();
  my $container = $register->getRegisteredContainer('JobsProducer');


  my $queueClient = new Amazon::SQS::Simple($access_key, $secret_key);

  $container->register(IOC::Service->new('queueService'
               => sub { $queueClient }));

  return 1;
}

sub sendMessage {
  my $queueService = &loadService('queueService');
  my $endpoint = 'https://sqs.us-east-1.amazonaws.com/216323611345/stage-tasks';
  my $taskQueue = $queueService->GetQueue($endpoint);
  my $response = $taskQueue->SendMessage('Hello world!');
}

sub fetchMessage {
  # Retrieve a message
  my $queueService = &loadService('queueService');
  my $logger = &loadLogger();

  my $endpoint = 'https://sqs.us-east-1.amazonaws.com/216323611345/stage-tasks';
  my $taskQueue = $queueService->GetQueue($endpoint);
  my $msg = $taskQueue->ReceiveMessage();

  if($msg){
    $logger->info("Message I get is = ". $msg->MessageBody());
    # Delete the message
    $taskQueue->DeleteMessage($msg->ReceiptHandle);
  }

}

sub loadService {
   #check parameters
   die "Wrong arguments" if @_ != 1;

   my $serviceName = $_[0];
   my $register = IOC::Registry->instance();

   my $service = $register->searchForService($serviceName)
        || die "Failed to find the service name = " . $serviceName . " in QueueClientHandler.";

   return $service;
}

sub loadLogger {
   my $logger = Log::Log4perl::get_logger("lib::RedisClientHandler");
   return $logger;
}

1;

__END__
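
fetchMessage follows the usual queue pattern: receive a message, process it, and delete it only after processing. The sketch below shows that pattern against a hypothetical in-memory stand-in (FakeQueue, FakeMessage, process_pending) so it runs without AWS credentials. Only the method names ReceiveMessage, DeleteMessage, MessageBody, and ReceiptHandle are taken from the code above; the stand-in does not model SQS behavior such as visibility timeouts.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the queue and message objects (not part of Amazon::SQS::Simple)
{
    package FakeMessage;
    sub new           { my ($class, %args) = @_; return bless {%args}, $class }
    sub MessageBody   { $_[0]{body} }
    sub ReceiptHandle { $_[0]{handle} }
}
{
    package FakeQueue;
    sub new {
        my ($class, @bodies) = @_;
        my $n = 0;
        my @msgs = map { FakeMessage->new(body => $_, handle => 'h' . ++$n) } @bodies;
        return bless { messages => \@msgs }, $class;
    }
    # Return the first pending message without removing it, or undef when empty
    sub ReceiveMessage { $_[0]{messages}[0] }
    # Remove the message whose receipt handle matches
    sub DeleteMessage {
        my ($self, $handle) = @_;
        $self->{messages} = [ grep { $_->ReceiptHandle ne $handle } @{ $self->{messages} } ];
    }
}

# Receive, handle, then delete, until the queue is empty; returns the message count
sub process_pending {
    my ($queue, $handler) = @_;
    my $count = 0;
    while (my $msg = $queue->ReceiveMessage()) {
        $handler->($msg->MessageBody());
        $queue->DeleteMessage($msg->ReceiptHandle());   # delete only after the handler returned
        $count++;
    }
    return $count;
}

my $queue = FakeQueue->new('task-1', 'task-2');
my @seen;
my $processed = process_pending($queue, sub { push @seen, $_[0] });
print "processed $processed messages: @seen\n";
```

Deleting after the handler returns means a message whose handler dies stays in the queue and can be picked up again.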

Test class that sends the messages, QueueClientHandler.t:
use strict;
use warnings;

use Test::More qw(no_plan);

use Log::Log4perl::Level;
use Log::Log4perl qw(:easy);

use YAML::XS qw(LoadFile);
use Data::Dumper;
use Cwd;

use IOC;

# Verify module can be included via "use" pragma
BEGIN { use_ok('lib::QueueClientHandler') };

# Verify module can be included via "require" pragma
require_ok( 'lib::QueueClientHandler' );

#init the test class
#logging
Log::Log4perl->init(cwd() ."/conf/log4perl-test.conf");
our $logger = Log::Log4perl::get_logger("JobsProducer");

#load configuration
my $config = LoadFile(cwd() .'/conf/config.yaml');
$logger->debug("----init configuration --------");
$logger->debug(Dumper($config));
$logger->debug("-------------------------------");

my $container = IOC::Container->new('JobsProducer');

$container->register(IOC::Service->new('configService'
               => sub { $config } ));

my $register = IOC::Registry->new();
$register->registerContainer($container);

# Test the Init Operation
lib::QueueClientHandler::init();

lib::QueueClientHandler::sendMessage();

#lib::QueueClientHandler::fetchMessage();

Consumer that pulls the messages, TaskConsumerApp.pl:
# import advertiser job feeds
#
# usage: $0  stop  stop after current batch
#        $0  start import loop

use strict;
use warnings;

use IOC;

use Log::Log4perl::Level;
use Log::Log4perl qw(:easy);

use YAML::XS qw(LoadFile);
use Data::Dumper;
use lib::MysqlDAOHandler;
use lib::RedisClientHandler;
use lib::FeedFileHandler;
use lib::JobImportHandler;
use lib::StringUtil;
use lib::NumberUtil;
use lib::QueueClientHandler;

use threads;
use threads::shared;
use Time::Piece;
use Cwd;

use constant FLAG_PID => 'JOBS_PRODUCER_RUNNING';

my $runningEnv =  $ENV{'RUNNING_ENV'};

#logging
Log::Log4perl->init(cwd() . "/conf/log4perl-${runningEnv}.conf");
my $logger = Log::Log4perl::get_logger("JobsProducer");

#IOC
my $container = IOC::Container->new('JobsProducer');
my $register = IOC::Registry->new();
$register->registerContainer($container);

#configuration
my $config = LoadFile(cwd() . "/conf/config-${runningEnv}.yaml");
$logger->debug("----init configuration --------");
$logger->debug(Dumper($config));
$logger->debug("-------------------------------");
$container->register(IOC::Service->new('configService'
               => sub { $config } ));

#receive params
my $pidFileName = $config->{pidFilePath} . FLAG_PID;

# data file path
my $dataFilePath = $config->{dataFilePath};

# php script path
my $phpScriptPath = $config->{phpScriptPath};

#my $MAX_SPLIT_SIZE = 100_000_000; #max split file size
my $MAX_SPLIT_SIZE = $config->{maxSplitFileSize};

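if (@ARGV == 1) {
  if ($ARGV[0] eq 'stop') {
    #create the flag file so the running instance stops after its current loop
    system 'touch ' . $pidFileName;
    $logger->info("Application is stopping.");
    #exit here, otherwise the unlink below would remove the flag again
    exit 0;
  }
  $logger->info("Application is running on $runningEnv\n");
} else {
  print "Usage: $0 start/stop\n";
  exit 1;
}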

unlink $pidFileName;

#init database connection
lib::MysqlDAOHandler::init();

#init redis connection
lib::RedisClientHandler::init();

#init queue connection
lib::QueueClientHandler::init();

#main thread pulling from mysql
#multiple thread downloading the file
#single thread split the file
#multiple threads execute the php import

##################################################################
# Main Processor
##################################################################

$logger->info("Start the Main thread.");

while (!-f $pidFileName) {
  #keep running in main thread

  $logger->info("Main-Thread - Scanning for tasks");

  lib::QueueClientHandler::fetchMessage();

  sleep 15;
}

$logger->info("Main-Thread - JobsProducerApp stop running.");

__END__
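
The start/stop mechanism rests on a flag file: stop creates it, and the main loop exits once it exists. The sketch below shows that idea in isolation, using a temporary directory and a loop counter instead of the 15-second sleep. The sub names request_stop and run_until_stopped are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

my $dir      = tempdir(CLEANUP => 1);
my $flagFile = File::Spec->catfile($dir, 'JOBS_PRODUCER_RUNNING');

# Equivalent of "touch": create the flag file if it does not exist
sub request_stop {
    my ($path) = @_;
    open(my $fh, '>>', $path) or die "Cannot create $path: $!";
    close $fh;
}

# Run $work until the flag file appears; return the number of iterations
sub run_until_stopped {
    my ($path, $work) = @_;
    my $iterations = 0;
    until (-f $path) {
        $iterations++;
        $work->($iterations);
    }
    return $iterations;
}

# Simulate a stop request arriving during the third iteration
my $iterations = run_until_stopped($flagFile, sub {
    my ($n) = @_;
    request_stop($flagFile) if $n == 3;
});
print "loop ended after $iterations iterations\n";
```

The loop checks the flag only between iterations, so the current batch always finishes before the process exits.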

References:
http://sillycat.iteye.com/blog/2304196
http://sillycat.iteye.com/blog/2304197

Reposted from sillycat.iteye.com/blog/2308430