[Big data project training] E-commerce recommendation system


Preface

Main content

  • Project framework
  • Data source analysis
  • Statistical recommendation module
  • Offline recommendation module based on LFM
  • Real-time recommendation module based on custom model
  • Other forms of offline similar recommendation modules
    • Content-based recommendation module
    • Item-based collaborative filtering recommendation module

1. Project Framework

Big data processing pipeline

  • Data sources: structured data (relational data), semi-structured data (log data), unstructured data (pictures and videos)
  • Data collection: ETL tools, Scribe, Flume, Kafka, Sqoop
  • Data storage: Oracle, Greenplum, Cassandra, HBase, HDFS
  • Data computing: Mahout, Storm, Flink, Spark, MapReduce
  • Data applications: business applications, Tableau, BI analysis, visualization (ECharts, D3)

Real-time processing flow

  • User interface (business request)
  • Backend server (front-end/back-end event tracking)
  • Log files
  • Log collection (Flume)
  • Data bus (Kafka message queue)
  • Real-time computation
  • Data storage
  • Data visualization

Offline processing flow
User interface -> Backend server -> Log file -> Log collection -> Log storage -> Log cleaning -> Data loading -> Data warehouse -> Data calculation -> Data storage -> Data visualization


2. Project system design

System module design

  • Real-time recommendations
  • Offline recommendations
  • Popular recommendations
  • Tags
  • Similar recommendations

Project system architecture

Business system composition

  • User visualization: AngularJS
    • Recommendation result display
    • Product search
    • Product details
    • Product tags
    • Product ratings
  • Comprehensive business services: Spring
    • Recommendation result query
    • Product search
    • Product details
    • Product tags
    • Product ratings
  • Business database: MongoDB (widely used, handles large data volumes; a document database that stores JSON-like documents)
  • Offline statistics service: historical popular product statistics, recent popular product statistics, product average score statistics
  • Offline recommendation service:
    • ALS - LFM – UserRecs – ProductRecs
    • TF-IDF –
  • Cache database: Redis

Recommendation system composition

Offline recommendation (offline):

  • Offline statistics service: Scala, Spark SQL
  • Offline recommendation service: Scala, Spark MLlib

Real-time recommendation (online):

  • Log collection service: Flume-ng
  • Message buffering service: Kafka
  • Real-time recommendation service: Spark Streaming

Project data flow diagram

Data source analysis

  • Product information: products.csv
    • Product ID (productId)
    • Product name (name)
    • Categories
    • Product image URL (imageUrl)
    • Product tags
  • User rating data: ratings.csv
    • User ID (uid)
    • Product ID (productId)
    • Product rating (score)
    • Rating time (timestamp)
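
The two files above can be modelled as plain case classes. The following is a minimal sketch, not the project's actual code: the field names `categories` and `tags`, the integer ID types, and the comma separator in ratings.csv are all assumptions, since the source only lists the columns.

```scala
// Field names follow the lists above; `categories` and `tags` are assumed names.
final case class Product(productId: Int, name: String, categories: String,
                         imageUrl: String, tags: String)
final case class Rating(uid: Int, productId: Int, score: Double, timestamp: Long)

object DataSource {
  // Assumed layout: one record per line, four comma-separated fields, no quoting.
  // Returns None for lines that do not match that layout.
  def parseRating(line: String): Option[Rating] =
    line.split(",").map(_.trim) match {
      case Array(uid, pid, score, ts) =>
        scala.util.Try(Rating(uid.toInt, pid.toInt, score.toDouble, ts.toLong)).toOption
      case _ => None
    }
}
```

Returning `Option` lets a loader drop malformed lines with `flatMap` instead of failing the whole job.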

Main data model

  • Product information table

  • User rating information table

  • User table

  • Historical popular product statistics table

  • Recent popular product statistics table

  • Product average rating statistics table

  • Offline (LFM-based) user recommendation list

  • Offline (based on LFM) product similarity table (prepared for subsequent real-time recommendations)

  • Offline (content-based) product similarity table

  • Offline (based on Item-CF) product similarity table

  • Real-time user recommendation list

Module implementation

Statistical recommendation module

Historical popular product statistics

  • Count how many ratings each product has received across all historical data; a higher count means a more popular product
  • select productId, count(productId) as count from ratings group by productId order by count desc => RateMoreProducts
  • RateMoreProducts data structure: productId, count
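
To make the query concrete, here is a small sketch of the same aggregation over an in-memory collection, without Spark. The `Rating` case class and the tie-breaking by productId are assumptions added for illustration.

```scala
object HistoricalPopularity {
  final case class Rating(uid: Int, productId: Int, score: Double, timestamp: Long)

  // Mirrors: select productId, count(productId) as count
  //          from ratings group by productId order by count desc
  def rateMoreProducts(ratings: Seq[Rating]): Seq[(Int, Int)] =
    ratings
      .groupBy(_.productId)
      .map { case (productId, rs) => (productId, rs.size) }
      .toSeq
      // Ties are broken by productId (an assumption) so the order is deterministic.
      .sortBy { case (productId, count) => (-count, productId) }
}
```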

Recent popular product statistics

  • Count the number of product ratings per month, representing the recent popularity of the product
  • select productId, score, changeDate(timestamp) as yearmonth from ratings => ratingOfMonth
  • select productId, count(productId) as count, yearmonth from ratingOfMonth group by yearmonth, productId order by yearmonth desc, count desc => RateMoreRecentlyProducts
  • changeDate: a UDF that uses SimpleDateFormat to convert the rating timestamp into a yyyyMM string
  • RateMoreRecentlyProducts data structure: productId, count, yearmonth
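
The conversion and the monthly grouping can be sketched in plain Scala as follows. This assumes the timestamp is Unix time in seconds and fixes the time zone to UTC; the source states neither, so adjust both to match the real data.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

object RecentPopularity {
  final case class Rating(uid: Int, productId: Int, score: Double, timestamp: Long)

  // Assumption: `timestamp` is in seconds; UTC keeps the result host-independent.
  def changeDate(timestamp: Long): Int = {
    val fmt = new SimpleDateFormat("yyyyMM") // not thread-safe, so one instance per call
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    fmt.format(new Date(timestamp * 1000L)).toInt
  }

  // Returns (productId, count, yearmonth), newest month first, then by count descending.
  def rateMoreRecentlyProducts(ratings: Seq[Rating]): Seq[(Int, Int, Int)] =
    ratings
      .groupBy(r => (changeDate(r.timestamp), r.productId))
      .map { case ((yearmonth, productId), rs) => (productId, rs.size, yearmonth) }
      .toSeq
      .sortBy { case (productId, count, yearmonth) => (-yearmonth, -count, productId) }
}
```

Storing yearmonth as an integer keeps the descending sort a plain numeric comparison.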

Product average rating statistics

  • select productId, avg(score) as avg from ratings group by productId order by avg desc => AverageProducts
  • AverageProducts data structure: productId, avg
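
The same average can be computed over an in-memory collection. A minimal sketch, with the `Rating` case class and the tie-breaking rule as illustrative assumptions:

```scala
object AverageRating {
  final case class Rating(uid: Int, productId: Int, score: Double, timestamp: Long)

  // Mirrors: select productId, avg(score) as avg
  //          from ratings group by productId order by avg desc
  def averageProducts(ratings: Seq[Rating]): Seq[(Int, Double)] =
    ratings
      .groupBy(_.productId)
      .map { case (productId, rs) => (productId, rs.map(_.score).sum / rs.size) }
      .toSeq
      // Highest average first; equal averages are ordered by productId (an assumption).
      .sortWith((a, b) => a._2 > b._2 || (a._2 == b._2 && a._1 < b._1))
}
```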

Offline recommendation module based on LFM

  • Train the latent factor model (LFM) with the ALS algorithm

    • val model = ALS.train(trainData, rank, iterations, lambda)
    • Required data structure: RDD (the spark.mllib ALS.train takes an RDD[Rating]) or DataFrame (the spark.ml ALS estimator)
    • trainData: training data
    • rank: number of latent features k
    • iterations: number of iterations
    • lambda: regularization coefficient
    • RMSE: root mean square error
    • Parameter tuning: train with several combinations of parameter values, compute the RMSE for each, and keep the combination with the smallest RMSE
  • Calculate user recommendation matrix

  • Calculate product similarity matrix
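
The three steps above reduce to a few vector operations once the model has produced latent feature vectors. The sketch below is plain Scala with no Spark dependency. It assumes that product similarity is measured by cosine similarity, which the source does not state for this module. `evaluate` is a hypothetical callback that trains a model for one (rank, lambda) pair and returns its RMSE.

```scala
object LfmSketch {
  type Vec = Array[Double]

  def dot(a: Vec, b: Vec): Double = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  // In a latent factor model the predicted rating is the dot product
  // of the user's and the product's feature vectors.
  def predict(userFeatures: Vec, productFeatures: Vec): Double =
    dot(userFeatures, productFeatures)

  // Root mean square error over (actual, predicted) rating pairs.
  def rmse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one pair")
    math.sqrt(pairs.map { case (actual, predicted) =>
      math.pow(actual - predicted, 2) }.sum / pairs.size)
  }

  // Cosine similarity; defined as 0.0 when either vector is all zeros.
  def cosineSim(a: Vec, b: Vec): Double = {
    val norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    if (norm == 0.0) 0.0 else dot(a, b) / norm
  }

  // Keeps the (rank, lambda) candidate whose RMSE is smallest.
  def bestParams(candidates: Seq[(Int, Double)],
                 evaluate: (Int, Double) => Double): (Int, Double) =
    candidates.minBy { case (rank, lambda) => evaluate(rank, lambda) }
}
```

With Spark, `predict` and the feature vectors would come from the trained model itself; the functions here only show what is being computed.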

Model-based real-time recommendation module

  • Computation must be fast
  • Results do not need to be especially accurate
  • Relies on a pre-designed recommendation model

Recommendation priority calculation

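  • Basic principle: a user's tastes stay similar over a short recent period, so products similar to those the user recently rated highly should rank higher
  • Inputs: the similarity between products and the user's rating scores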
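
One plausible way to combine similarity and rating score is a similarity-weighted average of the user's most recent ratings. The original formula is not recoverable from this text, so the sketch below is an assumption and not the author's exact model. `sim` is a hypothetical lookup into the offline product similarity table, and `minSim` is an assumed cutoff.

```scala
object RealtimePriority {
  // recentRatings: (productId, score) for the user's K most recent ratings.
  // sim: hypothetical lookup into the offline product similarity table.
  // minSim: assumed cutoff; recent ratings at or below it are ignored.
  def priority(candidate: Int,
               recentRatings: Seq[(Int, Double)],
               sim: (Int, Int) => Double,
               minSim: Double = 0.0): Double = {
    val weighted = recentRatings
      .map { case (productId, score) => (sim(candidate, productId), score) }
      .filter { case (s, _) => s > minSim }
    val simSum = weighted.map(_._1).sum
    // No sufficiently similar recent rating: the candidate gets priority 0.0.
    if (simSum == 0.0) 0.0
    else weighted.map { case (s, score) => s * score }.sum / simSum
  }
}
```

Candidates are then ranked by this value, so a product close to several recently well-rated items rises to the top.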

Other forms of offline similar recommendations

Content-based recommendations

  • Extract a feature vector for each product from the tags users have given it, using the TF-IDF algorithm
  • Compute the cosine similarity between feature vectors to obtain a list of similar products for each product
  • In practice, similar products are usually recommended on the product details page or the purchase page
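
The two computation steps above can be sketched without Spark as follows. Each product's tag list is treated as one document. The smoothed IDF variant log((n + 1) / (df + 1)) is a choice made here for illustration, not something the source specifies.

```scala
object ContentBased {
  // tagsByProduct: productId -> tags attached to that product (repeats allowed).
  // Returns productId -> sparse TF-IDF vector (tag -> weight).
  def tfIdf(tagsByProduct: Map[Int, Seq[String]]): Map[Int, Map[String, Double]] = {
    val n = tagsByProduct.size.toDouble
    // Document frequency: number of products on which each tag appears.
    val df: Map[String, Int] = tagsByProduct.values.toSeq
      .flatMap(_.distinct)
      .groupBy(identity)
      .map { case (tag, occurrences) => tag -> occurrences.size }
    tagsByProduct.map { case (productId, tags) =>
      val tf = tags.groupBy(identity).map { case (tag, occ) => tag -> occ.size.toDouble }
      // Smoothed IDF (an assumption); a tag present on every product gets weight 0.
      productId -> tf.map { case (tag, count) =>
        tag -> count * math.log((n + 1) / (df(tag) + 1)) }
    }
  }

  // Cosine similarity of two sparse vectors; 0.0 when either has zero norm.
  def cosineSim(a: Map[String, Double], b: Map[String, Double]): Double = {
    val dot = a.collect { case (tag, w) if b.contains(tag) => w * b(tag) }.sum
    val norm = math.sqrt(a.values.map(w => w * w).sum) *
      math.sqrt(b.values.map(w => w * w).sum)
    if (norm == 0.0) 0.0 else dot / norm
  }
}
```

Products that share rare tags score close to 1.0, while products with no tags in common score 0.0.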

Item-based collaborative filtering recommendations

  • Item-based collaborative filtering (Item-CF) only needs to collect users' routine behavioral data (such as clicks, favorites, and purchases) to obtain the similarity between items, and it is widely used in real projects.
  • "Co-occurrence similarity" - using behavioral data to calculate the similarity between different products
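
A common definition of co-occurrence similarity divides the number of users who interacted with both products by the geometric mean of each product's user count: |Ni ∩ Nj| / sqrt(|Ni| · |Nj|). The source does not show its formula, so the sketch below assumes this definition. The input shape, (userId, productId) pairs, is also an assumption.

```scala
object ItemCF {
  // behaviors: (userId, productId) pairs, one per recorded interaction.
  // Returns ((productA, productB), similarity) for every ordered pair of
  // distinct products that share at least one user.
  def cooccurrenceSim(behaviors: Seq[(Int, Int)]): Map[(Int, Int), Double] = {
    // Set of users who interacted with each product.
    val usersByProduct: Map[Int, Set[Int]] = behaviors
      .groupBy { case (_, productId) => productId }
      .map { case (productId, rows) => productId -> rows.map(_._1).toSet }
    val products = usersByProduct.keys.toSeq
    (for {
      a <- products
      b <- products
      if a != b
      common = (usersByProduct(a) intersect usersByProduct(b)).size
      if common > 0
    } yield (a, b) ->
      common / math.sqrt(usersByProduct(a).size.toDouble * usersByProduct(b).size)).toMap
  }
}
```

This compares every pair of products, which is fine for a sketch. On a real catalogue the pairs would be generated per user instead, so that only co-occurring products are ever compared.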

Hybrid recommendation: partitioned mixing

  • Model-based recommendations
  • Recommendations based on collaborative filtering
  • Content-based recommendations
  • Statistics-based recommendations

3. Project framework construction

Origin blog.csdn.net/Lenhart001/article/details/131505843