Analysis of Question C of the 2023 Huawei Cup Graduate Mathematical Modeling Competition


Question 1: At each review stage, works are usually distributed randomly, and each work must be reviewed independently by multiple judges. To increase the comparability of the scores given by different review experts, the sets of works reviewed by different experts should overlap to some extent; however, if some of these intersections are large, others must be small, and comparability becomes weaker. Given 3,000 participating teams, 125 review experts, and the requirement that each work be reviewed by 5 experts, establish a mathematical model to determine the optimal "cross-distribution" plan, and discuss the relevant indicators (defined by yourself) and the implementation details of the plan.

Problem one mainly involves establishing an optimal "cross-distribution" plan for 3,000 participating teams and 125 review experts. The key is to ensure that each work is reviewed by 5 experts and that the sets of works reviewed by different experts overlap to a comparable extent. This can be regarded as a combinatorial optimization problem: one option is to use a graph-theoretic model, treating the assignment as a vertex coloring problem, and solve it to obtain the optimal cross-distribution plan.

We define a binary decision variable $x_{ij}$, which equals 1 if the $i$-th expert reviews the $j$-th work and 0 otherwise.

Our objective is to make the overlap of works among experts as large and as even as possible. Since the total pairwise overlap is fixed once each work is reviewed by exactly 5 experts, a natural formulation is to maximize the minimum pairwise intersection size,

$$\max \; \min_{1 \le p < q \le 125} \; \sum_{j=1}^{3000} x_{pj}\, x_{qj},$$

where the inner sum counts the works reviewed by both expert $p$ and expert $q$.

The constraints are that each work is reviewed by exactly 5 experts, $\sum_{i=1}^{125} x_{ij} = 5$ for every work $j$, and that the workload is distributed evenly so that no expert is overloaded or underused; since $3000 \times 5 / 125 = 120$, this means $\sum_{j=1}^{3000} x_{ij} = 120$ works per expert.

This is an NP-hard problem, so heuristic algorithms such as genetic algorithms or simulated annealing can be applied. These algorithms are well suited to searching the solution space of large-scale combinatorial optimization problems and can find satisfactory solutions within a reasonable time.
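As a rough illustration of the simulated-annealing approach, the sketch below works on a scaled-down instance (300 works, 25 experts) with made-up parameters and maximizes the minimum pairwise overlap; the neighbour move used here does not preserve the workload-balance constraint, which a full model would enforce with a penalty term or paired swaps.

```python
# Minimal simulated-annealing sketch for the cross-distribution problem.
# Sizes, temperatures, and the move are illustrative assumptions, not the
# original post's method.
import math
import random
from itertools import combinations

N_WORKS, N_EXPERTS, REVIEWS_PER_WORK = 300, 25, 5   # scaled-down instance
WORKS_PER_EXPERT = N_WORKS * REVIEWS_PER_WORK // N_EXPERTS  # = 60 here

def random_assignment():
    """Assign each work to REVIEWS_PER_WORK experts, keeping workloads balanced."""
    load = {e: 0 for e in range(N_EXPERTS)}
    assignment = []
    for _ in range(N_WORKS):
        chosen = sorted(load, key=lambda e: (load[e], random.random()))[:REVIEWS_PER_WORK]
        for e in chosen:
            load[e] += 1
        assignment.append(set(chosen))
    return assignment

def min_pairwise_overlap(assignment):
    """Objective: the smallest number of works shared by any pair of experts."""
    overlap = {pair: 0 for pair in combinations(range(N_EXPERTS), 2)}
    for experts in assignment:
        for pair in combinations(sorted(experts), 2):
            overlap[pair] += 1
    return min(overlap.values())

def anneal(steps=5000, t0=2.0, cooling=0.999):
    assignment = random_assignment()
    current = best = min_pairwise_overlap(assignment)
    t = t0
    for _ in range(steps):
        # Neighbour move: replace one expert on a random work with another expert.
        # Note: this changes expert workloads; a full model would penalize imbalance.
        j = random.randrange(N_WORKS)
        old = random.choice(tuple(assignment[j]))
        new = random.randrange(N_EXPERTS)
        if new in assignment[j]:
            continue
        assignment[j].remove(old)
        assignment[j].add(new)
        candidate = min_pairwise_overlap(assignment)
        if candidate >= current or random.random() < math.exp((candidate - current) / t):
            current = candidate
            best = max(best, candidate)
        else:  # reject: undo the move
            assignment[j].remove(new)
            assignment[j].add(old)
        t *= cooling
    return best, assignment  # a full implementation would also keep the best assignment

if __name__ == "__main__":
    best_overlap, _ = anneal()
    print("best minimum pairwise overlap found:", best_overlap)
```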

Question 2: The review uses a ranking method based on standard scores (Appendix 1), which assumes that the academic-level distributions of the sets of works reviewed by different experts are the same. However, in the review of a large-scale innovation competition, any two experts usually have only a small number of works in common, the vast majority of their works are different (see Question 1), and each expert sees only a small subset of all works. The assumptions behind the standard-score scheme may therefore not hold, and new evaluation schemes need to be explored. Please select two or more existing or self-designed review schemes and, using the data attached to the question, analyze the distribution characteristics of each expert's and each work's original scores and adjusted scores (such as standard scores), rank the works under the different schemes, and compare the pros and cons of these schemes. Then design a new standard-score (formula) calculation model for the review of large-scale innovation competitions. In addition, it is generally believed that award-winning papers agreed upon unanimously by multiple experts have the greatest credibility. For Data 1 provided in Appendix 2, the ranking of the first-prize works selected in the second review stage was reached through expert consultation; please use this batch of data to improve your standard-score calculation model.

Question two involves comparing and analyzing different review schemes and designing a new standard-score calculation model based on the given data. We can analyze several existing review schemes and compare their advantages and disadvantages using descriptive statistics and hypothesis testing. Statistics such as the mean, median, and standard deviation describe the distribution characteristics of each expert's and each work's original and adjusted scores, and visualizing the score distributions under the different schemes makes the differences between them easier to see.

To determine whether the differences between schemes are significant, we can use hypothesis testing: ANOVA (analysis of variance) to compare whether the mean scores under the different schemes differ significantly, and the chi-square test or Fisher's exact test to compare differences in the distribution of award grades under the different schemes.
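For illustration, the sketch below uses made-up scores from three hypothetical experts: a one-way ANOVA tests whether their raw scoring levels differ significantly, and a per-expert z-score shows the kind of adjustment the standard-score scheme applies.

```python
# Sketch with made-up scores: compare experts' raw score levels with one-way
# ANOVA, then standardize each expert's scores (z-scores).
import numpy as np
from scipy import stats

expert_a = np.array([82, 75, 90, 68, 77])
expert_b = np.array([60, 65, 72, 58, 70])
expert_c = np.array([88, 85, 91, 80, 86])

f_stat, p_value = stats.f_oneway(expert_a, expert_b, expert_c)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Standard score within each expert removes expert-level mean/scale differences.
for name, s in [("A", expert_a), ("B", expert_b), ("C", expert_c)]:
    z = (s - s.mean()) / s.std(ddof=1)
    print(name, np.round(z, 2))
```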

Based on these analysis results, we then design a new standard-score calculation model. For this problem we can consider regression analysis; in addition, we can build an optimization model to solve for the best standard-score calculation method. The objective function of such a model can be to minimize the variance of the standard scores of all works so as to reduce the differences between schemes, and the constraints can include maintaining fairness in scoring and preserving a certain degree of diversity.
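As one concrete (and deliberately simplified) illustration of the regression idea, the sketch below fits a two-way model, raw score ≈ expert bias + work quality, by least squares on synthetic triples; the fitted work-quality term plays the role of an adjusted score. All numbers and dimensions are assumptions for illustration only.

```python
# Sketch (synthetic data): estimate expert bias and work quality jointly by
# least squares; the fitted work quality serves as a regression-based
# adjusted score.
import numpy as np

# (expert index, work index, raw score) triples with partial overlap
records = [(0, 0, 85), (0, 1, 70), (0, 2, 78),
           (1, 1, 62), (1, 2, 70), (1, 3, 55),
           (2, 0, 90), (2, 3, 66), (2, 2, 83)]
n_experts, n_works = 3, 4

# Design matrix: one dummy column per expert bias and one per work quality
X = np.zeros((len(records), n_experts + n_works))
y = np.zeros(len(records))
for r, (i, j, s) in enumerate(records):
    X[r, i] = 1.0               # expert bias term
    X[r, n_experts + j] = 1.0   # work quality term
    y[r] = s

# The model is only identified up to a constant shift; lstsq returns the
# minimum-norm solution, which is enough for ranking the works.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
work_quality = coef[n_experts:]
print("estimated work quality (adjusted scores):", np.round(work_quality, 1))
```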

Question 3: The defining characteristic of the "Innovation" competition is innovation, that is, there is no standard answer. Because the problems in this type of competition are difficult, they can generally only be partially solved through innovation during the competition, and it is hard to agree on the extent of a work's innovation and the prospects for follow-up research; even if the experts communicate face to face, they may not reach a unified opinion because each holds their own view. In addition, graduate students' papers are not always well written and the review experts look at them from different perspectives, so the scores given by several experts to the same work can differ considerably (a large range). Large ranges are therefore a characteristic of large-scale innovation competitions, and works with relatively large ranges generally fall in either the high-score or the low-score segment. Low-segment works belong to the elimination range; their large ranges arise either because some experts give very low scores to works that violate the rules or contain major errors, or because all the reviewing experts agree that the quality is not high but one (or some) of them rate the work even lower. Although the range is large in these cases, the works fall within the non-award-winning category and generally do not need to be adjusted. High-segment works must also go through the more authoritative second-stage review (in the attached data table, the same row represents the results of the same work in the two stages; works without second-stage review scores only took part in the first-stage review). In the second review stage there are still some works with large ranges, and because this is the final review, errors may affect the award level, so some works with large ranges need to be reconsidered and adjusted (this is recorded in the attached data, where the revised review score is the expert's final standard score and replaces the original standard score). Note that the number of review experts per work differs between the two stages. The rules by which experts adjust "large ranges" in the second stage can serve as a reference for establishing a range model.

Based on the simulated data 2.1 and 2.2 given in the question, please discuss the overall changes in scores between the two stages and the overall changes in the ranges between the two stages, and analyze the advantages and disadvantages of the two-stage review plan compared with a non-staged review plan. Note that there is a certain relationship between the two characteristics of a large range and strong innovation. In order to discover innovative papers, please establish a "range" model (including analysis, classification, adjustment, etc.), and try to give a programmed (no manual intervention) method for handling the "large range" of works in the given data's first review stage that fall in neither the high nor the low segment.

For question three, we focus on comparing the two-stage review plan with the non-staged review plan and on establishing the "range" model. We need to analyze the changes in scores and in ranges across the two stages, and explore how to handle a "large range".

To compare the two-stage and non-staged review plans, we can use analysis of variance (ANOVA) to test whether there is a significant difference in the mean scores under the two plans and whether any differences can be attributed to the review scheme used.

We then calculate their means, standard deviations, interquartile ranges, and other descriptive statistics to understand the difference in score distributions between the two plans in more detail, and we can demonstrate these differences with visualization tools such as box plots and histograms.
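A minimal sketch (with randomly generated scores rather than the attachment data) of computing these descriptive statistics and drawing box plots for the two stages:

```python
# Sketch with synthetic scores: summarize and visualize first- vs second-stage
# score distributions; the columns and distributions are placeholders.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "stage1": rng.normal(70, 10, 200),
    "stage2": rng.normal(74, 7, 200),   # assumed: second stage is less dispersed
})

print(df.describe().loc[["mean", "std", "25%", "75%"]])
df.plot.box()                            # box plots of the two stages
plt.ylabel("score")
plt.tight_layout()
plt.show()
```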

To build a range model, we can use either classification or clustering. With a classification model, we predict the size of a work's range: given features of the work (such as the experts' preliminary scores, the type of work, etc.), the model predicts whether the range of its scores will exceed a certain threshold. Decision trees, random forests, or support vector machines can be used as the algorithm, with cross-validation to select the best model and parameters.
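The snippet below is a sketch of this classification idea on entirely synthetic features and labels: a random forest with 5-fold cross-validation predicts whether a work's score range will exceed a threshold (the features and the 10-point threshold are assumptions for illustration only).

```python
# Sketch (synthetic features/labels): classify whether a work's score range
# will exceed a threshold, using a random forest with cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([
    rng.normal(70, 10, n),   # mean of the experts' preliminary scores
    rng.normal(8, 3, n),     # std of the preliminary scores
    rng.integers(0, 3, n),   # encoded work type (hypothetical feature)
])
score_range = np.abs(rng.normal(0, 1, n)) * X[:, 1]   # toy range signal
y = (score_range > 10).astype(int)                    # 1 = "large range"

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(3))
```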

With cluster analysis, we can group works with similar range characteristics into the same category, which helps us understand which works are more likely to produce large ranges. K-means or hierarchical clustering can be used as the clustering algorithm.
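A corresponding clustering sketch, again on synthetic data, groups works by their mean score and score range with K-means and reports each cluster's average range, so high-range groups stand out:

```python
# Sketch (synthetic data): cluster works by mean score and score range with
# K-means to see which groups tend to have large ranges.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
features = np.column_stack([
    rng.normal(70, 12, 300),          # mean score of a work
    np.abs(rng.normal(8, 5, 300)),    # range (max - min) of its scores
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
for k in range(3):
    members = features[kmeans.labels_ == k]
    print(f"cluster {k}: mean score {members[:, 0].mean():.1f}, "
          f"mean range {members[:, 1].mean():.1f}, size {len(members)}")
```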

Question 4: For the "Innovation" competition, give a complete review model (hint: for example, an optimization model) and study how to solve it based on the given data. Specific suggestions for improving the current review process can also be given (including what data should be collected in the future).


