Product review analysis (1)

Compared with customer behavior (click habits, browsing depth, related needs, etc.) or customer attributes (age group, gender, region, etc.), what customers actively ask and the feedback they leave are particularly important. At present, active inquiries come in roughly two forms: customer calls, which are converted from speech to text and archived automatically, and human-computer interaction (ASR or online text chat). Feedback covers user suggestions, comments, questions and so on.

  This article focuses on feedback analysis, for two reasons: active-inquiry data is usually available only inside the company, and feedback analysis is valuable in its own right. In my view, feedback analysis tells a business what its users care about. For competitors, who can rarely obtain inquiry data, analyzing a rival's user feedback is a way to understand that rival. For ordinary shoppers, negative reviews are often few or even hidden, so reading other buyers' feedback and pinpointing the negative comments helps with the purchase decision (a product may have many strengths and still have a detail you cannot accept).

  The idea is to collect the reviews of one Tmall product and use them to analyze differences between product specifications, user concerns, product problems and so on.

  Steps: collecting comments - data processing - data analysis

I. Collect comment text

There are many ways to collect reviews: a crawler (via the review URL), a scraping tool (such as octopus), or an automation script, for example in python (selenium, pyautogui). The author uses uibot's built-in functions plus a custom plug-in to make collection more efficient.

1. UIbot custom plug-in

    uibot supports the IE browser (Chrome and Firefox are also supported, but I have not tried them). Plug-ins can be written in python: put the custom functions in the extend folder and the python packages they depend on in the lib\site-packages folder. Restart uibot after adding a plug-in, because otherwise it sometimes reports that the plug-in cannot be found. The author's custom functions find the product's window title (with a regular expression) and normalize the content collected by uibot so that it is easy to write into excel.

Plugin code:

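import time

import pygetwindow


def getallwt():
    # Return the titles of all open windows.
    return pygetwindow.getAllTitles()


def arrtmall(arr):
    # Keep only the first 3 columns of each scraped row (comment, category,
    # buyer) so the result is a regular N x 3 array that can be written to excel.
    return [list(row[:3]) for row in arr]


def sleeptime(t):
    # Pause for t seconds; called from uibot between page turns.
    time.sleep(t)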
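
The uibot script later in this article calls getwindow.getwt(title), which is not part of the plug-in code listed here. Below is a minimal sketch of what such a helper might look like, assuming it matches a regular expression against the list of window titles returned by getallwt(). The name find_window_title and the sample titles are hypothetical, and the author's actual implementation may differ.

```python
import re


def find_window_title(pattern, titles):
    """Return the first window title matching the regex `pattern`, or None.

    `titles` is a plain list of strings, for example the result of
    pygetwindow.getAllTitles(); passing it in keeps the function easy to test.
    """
    regex = re.compile(pattern)
    for title in titles:
        if regex.search(title):
            return title
    return None


# Hypothetical window titles, for illustration only.
sample_titles = [
    "",
    "Untitled - Notepad",
    "Example product - tmall.com - Internet Explorer",
]
match = find_window_title(r"tmall\.com", sample_titles)
print(match)
```

Returning None when nothing matches lets the caller decide whether to retry or stop, instead of passing an empty title on to the window selector.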

2. Collect comments

The approach is to collect the comments on the first page, then collect the following pages in a loop.

Problems encountered and their solutions:

(1) After uibot finds the "next page" element, the view just stays at the bottom of the page and the mouse click does not register. The solution is to hover over the element first, then simulate a scroll, then click.

(2) Turning pages too fast triggers Tmall's anti-crawling mechanism (a verification box pops up). The solution is to add a delay between pages. With the delay set to 5 s, the verification box did not appear.
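
The delay-between-pages idea can be sketched in Python as follows. This is only an illustration, not the author's uibot flow: collect_pages, fetch_page and the random jitter are assumptions added here, and the 5 s base delay is simply the value reported above.

```python
import random
import time


def collect_pages(fetch_page, last_page, base_delay=5.0, jitter=1.0, sleep=time.sleep):
    """Call fetch_page(n) for pages 1..last_page, pausing between pages.

    fetch_page is a hypothetical callable that returns the rows of one page.
    The pause before each page after the first is base_delay plus a random
    extra of up to `jitter` seconds.
    """
    rows = []
    for page in range(1, last_page + 1):
        if page > 1:
            sleep(base_delay + random.uniform(0, jitter))
        rows.extend(fetch_page(page))
    return rows


def fake_page(n):
    # Stand-in for a real page scrape: one row of comment, category, buyer.
    return [[f"comment {n}", "spec A", "buyer***"]]


# Demo with the fake page source; sleep is replaced so nothing actually waits.
demo = collect_pages(fake_page, last_page=3, sleep=lambda seconds: None)
print(len(demo))  # prints 3
```

Passing sleep in as a parameter keeps the pacing logic testable without real waiting. Whether a fixed or a jittered delay works better against a given site's verification is not something the source reports.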

Result: 99 pages were collected (99 seems to be the maximum), with 3 fields: comment, category and buyer (the time can also be collected later if needed).

uibot code:

title=XX // browser window title of the opened product page
kfExcelWorkBook = Excel.OpenExcel("D:\\tmall评论爬取.xlsx",true)
win=getwindow.getwt(title)
TracePrint win
Window.SetActive({"wnd":[{"cls":"IEFrame","title":win,"app":"iexplore"}]})
tmall = UiElement.DataScrap({"html":{"attrMap":{"parentid":"J_Reviews","tag":"TABLE"},"index":0,"tagName":"TABLE"},"wnd":[{"app":"iexplore","cls":"IEFrame","title":win},{"cls":"Internet Explorer_Server"}]},{"Columns":[],"ExtractTable":1},{"objNextLinkElement":"","iMaxNumberOfPage":5,"iMaxNumberOfResult":-1,"iDelayBetweenMS":1000,"bContinueOnError":false})
// TracePrint tmall
arr=arrtmall1.arrtmall(tmall)
// TracePrint arr
rows = Excel.GetRowsCount(kfExcelWorkBook,"Sheet1")
Excel.WriteRange(kfExcelWorkBook,"Sheet1","A"&rows+1,arr,true)
Excel.Save(kfExcelWorkBook)
// number of pages to scrape
page=100
c=2
Do While c<=page
    Window.SetActive({"wnd":[{"cls":"IEFrame","title":win,"app":"iexplore"}]})
    Mouse.Hover({"html":[{"aaname":"下一页>>","parentid":"J_Reviews","tag":"A"}],"wnd":[{"app":"iexplore","cls":"IEFrame","title":win},{"cls":"Internet Explorer_Server"}]},10000,{"bContinueOnError":false,"iDelayAfter":300,"iDelayBefore":200,"bSetForeground":true,"sCursorPosition":"Center","iCursorOffsetX":0,"iCursorOffsetY":0,"sKeyModifiers":[],"sSimulate":"simulate"})
    Mouse.Wheel(1,"down", [],{"iDelayAfter":300,"iDelayBefore":200})
    Mouse.Action({"html":[{"aaname":"下一页>>","parentid":"J_Reviews","tag":"A"}],"wnd":[{"app":"iexplore","cls":"IEFrame","title":win},{"cls":"Internet Explorer_Server"}]},"left","click",10000,{"bContinueOnError":false,"iDelayAfter":300,"iDelayBefore":200,"bSetForeground":true,"sCursorPosition":"Center","iCursorOffsetX":0,"iCursorOffsetY":0,"sKeyModifiers":[],"sSimulate":"simulate"})
    arrtmall1.sleeptime(5)
    tmall = UiElement.DataScrap({"html":{"attrMap":{"parentid":"J_Reviews","tag":"TABLE"},"index":0,"tagName":"TABLE"},"wnd":[{"app":"iexplore","cls":"IEFrame","title":win},{"cls":"Internet Explorer_Server"}]},{"Columns":[],"ExtractTable":1},{"objNextLinkElement":"","iMaxNumberOfPage":5,"iMaxNumberOfResult":-1,"iDelayBetweenMS":1000,"bContinueOnError":false})
    // TracePrint tmall
    arr=arrtmall1.arrtmall(tmall)
    // TracePrint arr
    rows = Excel.GetRowsCount(kfExcelWorkBook,"Sheet1")
    Excel.WriteRange(kfExcelWorkBook,"Sheet1","A"&rows+1,arr,true)
    Excel.Save(kfExcelWorkBook)
    c=c+1
Loop

Follow-up: https://blog.csdn.net/m0_49621298/article/details/107585855

Origin blog.csdn.net/m0_49621298/article/details/107603652