Crawling data from a mobile app with a Python crawler

1. Capture the app's data packets

    For the packet-capture method, see this blog post: http://my.oschina.net/jhao104/blog/605963

    The capture gives the login address of the Super Curriculum app: http://120.55.151.61/V2/StudentSkip/loginCheckV4.action

    The captured form contains the user name and password, both of which are already encrypted, plus some device information. It can be POSTed to the server exactly as captured.
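As a rough illustration, the form body can be assembled from a dictionary instead of being typed as one long string. The sketch below assumes Python 3 (`urllib.parse`), unlike the Python 2 code in this article. The field names are the ones in the captured request; the account and password values are placeholders standing in for the encrypted strings, and `build_form_body` is a hypothetical helper name.

```python
from urllib.parse import urlencode, parse_qs

# Field names come from the captured request; the two "<...>" values are
# placeholders for the encrypted strings, not real credentials.
login_fields = {
    'phoneBrand': 'Meizu',
    'platform': '1',
    'deviceCode': '868033014919494',
    'account': '<encrypted account from capture>',
    'phoneVersion': '16',
    'password': '<encrypted password from capture>',
    'channel': 'MXMarket',
    'phoneModel': 'M040',
    'versionNumber': '7.2.1',
}


def build_form_body(fields):
    """Encode a dict as an application/x-www-form-urlencoded byte string."""
    return urlencode(fields).encode('utf-8')


body = build_form_body(login_fields)
# Decoding the body again gives back the original fields (as lists of values).
decoded = parse_qs(body.decode('utf-8'))
print(decoded['phoneModel'])
```

The captured body ends with a trailing `&`, which this sketch does not reproduce; whether the server requires it is untested here.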

    The request headers must be sent as well. My first attempt without them returned a login error, so the header information has to be included.
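A minimal sketch of attaching the captured headers to a request, assuming Python 3's `urllib.request` (the article's own code uses the Python 2 `urllib2`). `make_request` is a hypothetical helper, and the short body is only a stand-in for the real form. `Host` and `Content-Length` are left out on purpose because `urllib` fills them in when the request is sent; whether the server also insists on the other captured headers has not been checked here. The request is built but not sent.

```python
from urllib.request import Request

LOGIN_URL = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'


def make_request(url, body, extra_headers=None):
    """Build (but do not send) a POST request carrying the captured headers."""
    headers = {
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
        'Connection': 'Keep-Alive',
    }
    headers.update(extra_headers or {})
    return Request(url, data=body, headers=headers)


# Stand-in body; the real one is the full captured form.
req = make_request(LOGIN_URL, b'platform=1&phoneModel=M040')
print(req.get_method())              # a request with a body is sent as POST
print(req.get_header('User-agent'))  # urllib stores header names capitalized
```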

 


2. Login

Login code:

import urllib2
from cookielib import CookieJar

loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
# Headers copied from the captured request; the login fails without them
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'Keep-Alive',
    'Accept-Encoding': 'gzip',
    'Content-Length': '207',
    }
# Form body copied from the captured request (account and password are already encrypted)
loginData = 'phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
# The cookie jar keeps the session cookie for later requests
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
print loginResult

 

A successful login returns a string of JSON data containing the account information.

It matches the data returned during packet capture, which confirms that the login succeeded.
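One way to check the returned JSON programmatically is sketched below, assuming Python 3. Because the request advertises `Accept-Encoding: gzip`, the server may send a compressed body, and `urllib` does not decompress it automatically, so the sketch inspects the gzip magic bytes first. `decode_body` and `login_succeeded` are hypothetical helpers, the sample response (including the `nickName` field) is invented for the demonstration, and treating a non-empty `data` field as success is an assumption about the API.

```python
import gzip
import json


def decode_body(raw):
    """Return parsed JSON from a response body that may be gzip-compressed."""
    # gzip streams start with the two magic bytes 0x1f 0x8b.
    if raw[:2] == b'\x1f\x8b':
        raw = gzip.decompress(raw)
    return json.loads(raw.decode('utf-8'))


def login_succeeded(result):
    """Treat a non-empty 'data' field as a successful login (assumption)."""
    return bool(result.get('data'))


# Hypothetical response, compressed to mimic a gzip-encoded reply.
sample = gzip.compress(
    json.dumps({'status': 1, 'data': {'nickName': 'demo'}}).encode('utf-8'))
result = decode_body(sample)
print(login_succeeded(result))
```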

 


3. Capture data

    Use the same packet-capture method to get the URL and POST parameters for the topics.

    The procedure is the same as simulating a login to a website. For details, see: http://my.oschina.net/jhao104/blog/547311

    The final code is below. It fetches the home page and then loads further updates the way pull-down loading does, so topic content can be loaded indefinitely.
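The pull-down loading amounts to cursor-style pagination: each response carries a `timestampLong` value that is sent as the `timestamp` of the next request. A minimal sketch of that idea as a loop follows, assuming Python 3. The field names `status`, `data`, `timestampLong` and `messageBOs` are the ones used in the final code; `iter_pages`, `fake_fetch` and the `max_pages` cap are hypothetical, and `fake_fetch` stands in for the real network call.

```python
def iter_pages(fetch_page, first_timestamp=0, max_pages=5):
    """Yield topic lists page by page, following the returned timestamp.

    fetch_page(timestamp) must return the parsed JSON dict for that page.
    """
    timestamp = first_timestamp
    for _ in range(max_pages):
        page = fetch_page(timestamp)
        if page.get('status') != 1:
            break  # the server reported a failure, stop paging
        data = page['data']
        yield data['messageBOs']
        timestamp = data['timestampLong']  # cursor for the next request


# Stand-in for the network call: three fake pages, then a failure status.
def fake_fetch(timestamp):
    if timestamp >= 3:
        return {'status': 0}
    return {'status': 1,
            'data': {'timestampLong': timestamp + 1,
                     'messageBOs': [{'content': 'topic %d' % timestamp}]}}


pages = list(iter_pages(fake_fetch))
print(len(pages))
```

A loop with an explicit page cap keeps a long crawl bounded, whereas a function that calls itself for every page would eventually reach Python's recursion limit.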

 

#!/usr/local/bin/python2.7
# -*- coding: utf8 -*-
"""
  Fetch topics from the Super Curriculum (超级课程表) app
"""
import urllib2
from cookielib import CookieJar
import json


# Extract the topics from the parsed JSON data
def fetch_data(json_data):
    data = json_data['data']
    timestampLong = data['timestampLong']
    messageBO = data['messageBOs']
    topicList = []
    for each in messageBO:
        topicDict = {}
        if each.get('content', False):
            topicDict['content'] = each['content']
            topicDict['schoolName'] = each['schoolName']
            topicDict['messageId'] = each['messageId']
            topicDict['gender'] = each['studentBO']['gender']
            topicDict['time'] = each['issueTime']
            print each['schoolName'],each['content']
            topicList.append(topicDict)
    return timestampLong, topicList


# Load more (the next page after the given timestamp)
def load(timestamp, headers, url):
    loadData = 'timestamp=%s&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&' % timestamp
    # Content-Length must match the body, whose length depends on the timestamp
    headers['Content-Length'] = str(len(loadData))
    req = urllib2.Request(url, loadData, headers)
    loadResult = opener.open(req).read()
    loadStatus = json.loads(loadResult).get('status', False)
    if loadStatus == 1:
        print 'load successful!'
        timestamp, topicList = fetch_data(json.loads(loadResult))
        load(timestamp, headers, url)
    else:
        print 'load fail'
        print loadResult
        return False

loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
topicUrl = 'http://120.55.151.61/V2/Treehole/Message/getMessageByTopicIdV3.action'
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'Keep-Alive',
    'Accept-Encoding': 'gzip',
    'Content-Length': '207',
    }

# --- Login ---
loginData = 'phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
loginStatus = json.loads(loginResult).get('data', False)
# Check the parsed 'data' field; the raw response string is always truthy
if loginStatus:
    print 'login successful!'
else:
    print 'login fail'
    print loginResult

# --- Fetch topics ---
topicData = 'timestamp=0&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
headers['Content-Length'] = '147'
topicRequest = urllib2.Request(topicUrl, topicData, headers)
topicHtml = opener.open(topicRequest).read()
topicJson = json.loads(topicHtml)
topicStatus = topicJson.get('status', False)
print topicJson
if topicStatus == 1:
    print 'fetch topic success!'
    timestamp, topicList = fetch_data(topicJson)
    load(timestamp, headers, topicUrl)


If you reprint this article, please credit the source: http://my.oschina.net/jhao104/blog/606922
