2020 MCM Problem C: Approach and Interpretation

Approach and Interpretation

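Core of the problem: modeling star ratings from review data
Approach in brief: think of it as review data from an online store such as JD.com or Taobao. Explain where a 4.8-star score comes from, and how much a single written review of a product moves that rating.
  • My own habit for big-data problems is to give data cleaning its own chapter (Chapter 4); for the detailed workflow, see the national-prize-winning graduate mathematical modeling paper shared in the group.
  • Start by analyzing the three attachments: extract the useful variables, delete records with missing fields, normalize the categorical variables, and so on. Further processing steps are to be updated.
  1. Basic star-rating model.
    First select the useful variables from the 15 in the data and analyze the distribution of each, then model how STAR_RATING relates to vote counts, purchase type, customer source, and so on. Your model may be qualitative as well as quantitative.
    This part ultimately comes down to pattern analysis for the three products. After some simple Excel processing every team can produce a fair amount of work; what stands out is how valid the mined patterns are and how polished the charts look.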

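  2. Knowledge-mining model (natural language processing)
    The second part is a heavier workload. For each product you need to analyze ① the main factors that drive the star rating (principal component analysis, hierarchical clustering, factor analysis, and similar models), ② how reputation changes over time under each pattern (time series analysis), and ③ a text-based measure (note that it is a measure, so explicit rules have to be defined) that signals likely success or failure. (For example: frequency of certain words in the reviews + a certain time-series pattern = the product is about to flop.)
    Part 2 can be split into several separate models.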

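  3. Write a one- to two-page letter to the Marketing Director presenting your text-mining results, backed by data (charts).
    What separates a Meritorious (M) award: based on the model results, give recommendations for improving a product's reputation or sales.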
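
The cleaning steps and the basic star-rating relationships described in the list above can be sketched roughly as follows. This is a minimal illustration, not the author's method: it uses only the Python standard library, the sample rows are made up, and the helper names `load_clean_rows` and `mean_rating_by` are hypothetical. Only column names from the problem's data set definitions are used, and only a handful of them.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Tiny made-up sample in the same tab-separated layout as the attachments.
# Only a few of the 15 columns are kept; the values are illustrative, not real data.
SAMPLE_TSV = (
    "star_rating\thelpful_votes\ttotal_votes\tvine\tverified_purchase\n"
    "5\t3\t4\tN\tY\n"
    "1\t10\t12\tN\tY\n"
    "4\t0\t0\tY\tN\n"
    "\t1\t2\tN\tY\n"  # missing star_rating: dropped during cleaning
    "2\t1\t1\tn\ty\n"
)


def load_clean_rows(handle):
    """Read a TSV, drop rows with missing fields, and encode Y/N flags as 1/0."""
    rows = []
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for raw in reader:
        # Skip rows with too many, too few, or empty fields.
        if None in raw or any(v is None or not v.strip() for v in raw.values()):
            continue
        rows.append({
            "star_rating": int(raw["star_rating"]),
            "helpful_votes": int(raw["helpful_votes"]),
            "total_votes": int(raw["total_votes"]),
            "vine": int(raw["vine"].upper() == "Y"),
            "verified_purchase": int(raw["verified_purchase"].upper() == "Y"),
        })
    return rows


def mean_rating_by(rows, key):
    """Average star rating for each distinct value of one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["star_rating"])
    return {value: mean(stars) for value, stars in groups.items()}


rows = load_clean_rows(io.StringIO(SAMPLE_TSV))
by_verified = mean_rating_by(rows, "verified_purchase")
by_vine = mean_rating_by(rows, "vine")
print(len(rows), by_verified, by_vine)
```

For the real attachments, the `io.StringIO(SAMPLE_TSV)` handle would be replaced by an opened file such as `hair_dryer.tsv`; the file's encoding is not stated in the problem, so it would need to be checked first.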

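This problem is relatively hard. Big-data problems are appealing, but because the text variables are so central, teams without natural language processing experience may struggle to produce results; weigh the choice carefully.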
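
A text-based measure does not have to start from a heavy NLP model. As a rough sketch of the "word frequency + time-series pattern" rule mentioned in part 2, the example below computes, per month, the mean star rating and the share of reviews containing a negative keyword, then flags months where the rating fell while the negative share rose. Everything here is an assumption made for illustration: the keyword list, the made-up reviews, the `M/D/YYYY` date format, and the helper names `monthly_measures` and `warning_months`.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical keyword list; in practice it would be derived from the data
# (e.g. words over-represented in 1-2 star reviews), not fixed by hand.
NEGATIVE_WORDS = {"broke", "disappointed", "return", "stopped"}

# Made-up reviews: (review_date, star_rating, review_body).
REVIEWS = [
    ("1/5/2015", 5, "Love it, works great"),
    ("1/20/2015", 4, "Good dryer for the price"),
    ("2/3/2015", 2, "Stopped working after a week, disappointed"),
    ("2/14/2015", 5, "Great product"),
    ("3/1/2015", 1, "It broke, had to return it"),
    ("3/9/2015", 2, "Very disappointed"),
]


def monthly_measures(reviews):
    """Per month: (mean star rating, share of reviews with a negative word)."""
    buckets = defaultdict(list)
    for date_text, stars, body in reviews:
        month = datetime.strptime(date_text, "%m/%d/%Y").strftime("%Y-%m")
        words = set(re.findall(r"[a-z']+", body.lower()))
        buckets[month].append((stars, bool(words & NEGATIVE_WORDS)))
    result = {}
    for month in sorted(buckets):
        items = buckets[month]
        result[month] = (
            sum(stars for stars, _ in items) / len(items),
            sum(negative for _, negative in items) / len(items),
        )
    return result


def warning_months(measures):
    """Months where the mean rating fell and the negative share rose vs. the previous month."""
    months = list(measures)
    flagged = []
    for prev, cur in zip(months, months[1:]):
        prev_star, prev_neg = measures[prev]
        cur_star, cur_neg = measures[cur]
        if cur_star < prev_star and cur_neg > prev_neg:
            flagged.append(cur)
    return flagged


measures = monthly_measures(REVIEWS)
flags = warning_months(measures)
print(measures)
print(flags)
```

A month-over-month comparison like this is noisy on small samples; a real rule would likely need smoothing or a minimum review count per month before it is trusted.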

Translation of the Original Problem


Data Description

Original Text

2020 MCM Weekend 2
Problem C: A Wealth of Data
In the online marketplace it created, Amazon provides customers with an opportunity to rate and
review purchases. Individual ratings - called “star ratings” – allow purchasers to express their
level of satisfaction with a product using a scale of 1 (low rated, low satisfaction) to 5 (highly
rated, high satisfaction). Additionally, customers can submit text-based messages – called
“reviews” – that express further opinions and information about the product. Other customers
can submit ratings on these reviews as being helpful or not – called a “helpfulness rating” –
towards assisting their own product purchasing decision. Companies use these data to gain
insights into the markets in which they participate, the timing of that participation, and the
potential success of product design feature choices.
Sunshine Company is planning to introduce and sell three new products in the online
marketplace: a microwave oven, a baby pacifier, and a hair dryer. They have hired your team as
consultants to identify key patterns, relationships, measures, and parameters in past customer-supplied ratings and reviews associated with other competing products to 1) inform their online
sales strategy and 2) identify potentially important design features that would enhance product
desirability. Sunshine Company has used data to inform sales strategies in the past, but they have
not previously used this particular combination and type of data. Of particular interest to
Sunshine Company are time-based patterns in these data, and whether they interact in ways that
will help the company craft successful products.
To assist you, Sunshine’s data center has provided you with three data files for this project:
hair_dryer.tsv, microwave.tsv, and pacifier.tsv. These data represent customer-supplied
ratings and reviews for microwave ovens, baby pacifiers, and hair dryers sold in the Amazon
marketplace over the time period(s) indicated in the data. A glossary of data label definitions is
provided as well. THE DATA FILES PROVIDED CONTAIN THE ONLY DATA YOU
SHOULD USE FOR THIS PROBLEM.
Requirements

  1. Analyze the three product data sets provided to identify, describe, and support with
    mathematical evidence, meaningful quantitative and/or qualitative patterns, relationships,
    measures, and parameters within and between star ratings, reviews, and helpfulness ratings that
    will help Sunshine Company succeed in their three new online marketplace product offerings.
  2. Use your analysis to address the following specific questions and requests from the Sunshine
    Company Marketing Director:
    a. Identify data measures based on ratings and reviews that are most informative for
    Sunshine Company to track, once their three products are placed on sale in the online
    marketplace.
    b. Identify and discuss time-based measures and patterns within each data set that might
    suggest that a product’s reputation is increasing or decreasing in the online marketplace.
    c. Determine combinations of text-based measure(s) and ratings-based measures that best
    indicate a potentially successful or failing product.
    d. Do specific star ratings incite more reviews? For example, are customers more likely to
    write some type of review after seeing a series of low star ratings?
    e. Are specific quality descriptors of text-based reviews such as ‘enthusiastic’,
    ‘disappointed’, and others, strongly associated with rating levels?
  3. Write a one- to two-page letter to the Marketing Director of Sunshine Company summarizing
    your team’s analysis and results. Include specific justification(s) for the result that your team
    most confidently recommends to the Marketing Director.
    Your submission should consist of:
    • One-page Summary Sheet
    • Table of Contents
    • One- to Two-page Letter
    • Your solution of no more than 20 pages, for a maximum of 24 pages with your summary
    sheet, table of contents, and two-page letter.
    Note: Reference List and any appendices do not count toward the page limit and should appear
    after your completed solution. You should not make use of unauthorized images and materials
    whose use is restricted by copyright laws. Ensure you cite the sources for your ideas and the
    materials used in your report.
    Glossary
    Helpfulness Rating: an indication of how valuable a particular product review is when
    making a decision whether or not to purchase that product.
    Pacifier: a rubber or plastic soothing device, often nipple shaped, given to a baby to suck
    or bite on.
    Review: a written evaluation of a product.
    Star Rating: a score given in a system that allows people to rate a product with a number
    of stars.
    Attachments: The Problem Datasets
    Problem_C_Data.zip
    The three data sets provided contain product user ratings and reviews extracted from the
    Amazon Customer Reviews Dataset thru Amazon Simple Storage Service (Amazon S3).
    hair_dryer.tsv
    microwave.tsv
    pacifier.tsv
    Data Set Definitions: Each row represents data partitioned into the following columns.
    ● marketplace (string): 2 letter country code of the marketplace where the review was
    written.
    ● customer_id (string): Random identifier that can be used to aggregate reviews written by
    a single author.
    ● review_id (string): The unique ID of the review.
    ● product_id (string): The unique Product ID the review pertains to.
    ● product_parent (string): Random identifier that can be used to aggregate reviews for the
    same product.
    ● product_title (string): Title of the product.
    ● product_category (string): The major consumer category for the product.
    ● star_rating (int): The 1-5 star rating of the review.
    ● helpful_votes (int): Number of helpful votes.
    ● total_votes (int): Number of total votes the review received.
    ● vine (string): Customers are invited to become Amazon Vine Voices based on the trust
    that they have earned in the Amazon community for writing accurate and insightful
    reviews. Amazon provides Amazon Vine members with free copies of products that have
    been submitted to the program by vendors. Amazon doesn’t influence the opinions of
    Amazon Vine members, nor do they modify or edit reviews.
    ● verified_purchase (string): A “Y” indicates Amazon verified that the person writing the
    review purchased the product at Amazon and didn’t receive the product at a deep
    discount.
    ● review_headline (string): The title of the review.
    ● review_body (string): The review text.
    ● review_date (bigint): The date the review was written.
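
As a small illustration of how the vote columns defined above might be combined with the star rating, the sketch below derives a helpfulness ratio from `helpful_votes` and `total_votes` and a helpfulness-weighted mean rating. The `Review` class, the `weighted_mean_rating` helper, the `1 + helpful_votes` weighting, and the sample values are all assumptions made for illustration; the problem statement does not prescribe any of them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    """A few of the columns from the data set definitions, with parsed types."""
    star_rating: int
    helpful_votes: int
    total_votes: int
    vine: bool
    verified_purchase: bool

    @property
    def helpfulness(self) -> Optional[float]:
        """Share of votes marked helpful; None when nobody has voted."""
        if self.total_votes == 0:
            return None
        return self.helpful_votes / self.total_votes


def weighted_mean_rating(reviews):
    """Mean star rating where each review counts 1 + its helpful votes (assumed weighting)."""
    total_weight = sum(1 + r.helpful_votes for r in reviews)
    return sum(r.star_rating * (1 + r.helpful_votes) for r in reviews) / total_weight


# Made-up sample values, not taken from the attachments.
sample = [
    Review(5, 3, 4, False, True),
    Review(1, 9, 10, False, True),
    Review(4, 0, 0, True, False),
]
weighted = weighted_mean_rating(sample)
print([r.helpfulness for r in sample], weighted)
```

With these sample values the widely upvoted 1-star review pulls the weighted mean well below the plain average, which is the kind of effect a helpfulness-aware measure is meant to expose.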

Reposted from blog.csdn.net/gzn00417/article/details/104692139