Complete ideas and code for the 2023 China Graduate Mathematical Modeling Competition (Huawei Cup) Problem C: research on review schemes for large-scale innovation competitions

There are many innovation competitions nowadays, and the larger ones generally adopt a two-stage review (online review, then on-site review) or a three-stage review (online review, on-site review, then defense). The defining characteristic of innovation competitions is that there is no standard answer: review experts must judge independently, guided only by the review framework (or suggestions) proposed by the problem setters. As a result, different judges may score the same work quite differently. In fact, when the competition is large and the number of judges is large, the problem of large ranges (see Appendix 1 for the definition) becomes more prominent. Clearly, simply ranking works by the sum of several judges' scores is not a good judging method for innovation competitions. Exploring review schemes that are impartial, fair, and scientific for large-scale innovation competitions is therefore of far-reaching significance.

Currently, various innovation competitions are exploring and adjusting their own review schemes. Existing schemes include: (1) standardize each review expert's scores (see Appendix 1 for the formula), sum the standard scores of each work to obtain its total score, and rank by total score; (2) for each work, remove the highest and the lowest score, sum the remaining scores, and rank by the resulting total; (3) when experts' scores for the same work differ greatly (a large range), organize the relevant experts to negotiate and adjust their scores, then sum the adjusted scores and rank by the total; (4) when the competition is large, first shortlist works using scheme (1), (2), or (3), then organize experts to review the shortlisted works (a second review stage) or hold a defense stage, and thereby determine the list of winners. These schemes are reasonable to some extent, but they also have limitations. In particular, for the review of large-scale innovation competitions, existing schemes are relatively simple and little research exists. A toy illustration of schemes (1) and (2) follows.
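As a quick illustration (with made-up scores, not data from the attachment), here is how schemes (1) and (2) act on raw scores in Python:

```python
import numpy as np

# Toy raw scores that five experts give one work.
scores = np.array([78.0, 85.0, 92.0, 60.0, 88.0])

# Scheme (2): drop the single highest and lowest score, sum the rest.
trimmed_total = scores.sum() - scores.max() - scores.min()
print(trimmed_total)  # 78 + 85 + 88 = 251

# Scheme (1) operates per expert rather than per work: each expert's scores
# over all the works they reviewed are standardized before the per-work sum.
expert_scores = np.array([70.0, 75.0, 92.0, 64.0, 81.0])  # one expert, toy values
z = (expert_scores - expert_scores.mean()) / expert_scores.std(ddof=0)
print(z.round(2))
```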

In large-scale innovation competitions, increasing the number of experts reviewing each work obviously helps the fairness and justice of the review. However, for various reasons, the number of experts available is limited, and with fewer review experts the review error grows. That said, since the proportion of winners in large-scale innovation competitions is usually below 50%, some errors do not affect whether a work wins at all. Therefore, to adapt to the reality of a small number of review experts without affecting the award levels, many competitions adopt a two-stage review method.

To explore good judging methods for large-scale innovation competitions, the attachment provides data simulating such a competition. The review has two stages. In the first stage, five experts evaluate each work; their scores are converted to standard scores, the five standard scores are averaged, and the works are ranked. The top-ranked works, selected according to a pre-agreed ratio, enter the second stage. In the second stage, three experts evaluate each work and standard scores are again taken. After necessary adjustments to the standard scores of the few works with large ranges, the standard scores of the five first-stage experts and the three second-stage experts (eight standard scores per work) are averaged, and the works are ranked by this final total score. Please use this batch of data to build a mathematical model and explore a more reasonable and fair evaluation scheme.
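A minimal Python sketch of this two-stage pipeline, using synthetic scores; the 20% advancement ratio and the simplification that the same experts score every work are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
n_works = 200
raw1 = rng.normal(70, 10, size=(n_works, 5))   # stage 1: 5 experts per work

def zscore(cols):
    """Per-expert standard scores: standardize each expert's column."""
    return (cols - cols.mean(axis=0)) / cols.std(axis=0, ddof=0)

z1 = zscore(raw1)
stage1 = z1.mean(axis=1)                        # mean of the 5 standard scores

# Advance a pre-agreed ratio (assumed 20% here) to the second stage.
top = np.argsort(stage1)[::-1][: int(0.2 * n_works)]

raw2 = rng.normal(72, 8, size=(len(top), 3))    # stage 2: 3 experts per work
z2 = zscore(raw2)

# Final score: average of the 8 standard scores (5 stage-1 + 3 stage-2).
final = (z1[top].sum(axis=1) + z2.sum(axis=1)) / 8.0
print("final top-10 works:", top[np.argsort(final)[::-1]][:10])
```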

Question 1:  At each review stage, works are usually distributed randomly, and each work must be reviewed independently by multiple judges. To make the scores given by different review experts more comparable, the sets of works reviewed by different experts should overlap to some extent. But if some intersections are large, others must be small, and comparability weakens. Please establish a mathematical model to determine an optimal "cross-distribution" plan for 3,000 participating teams and 125 review experts, with each work reviewed by 5 experts, and discuss the relevant indicators (defined by yourself) and implementation details of the plan.

Answer:

The core of this problem is to balance the intersections between the sets of works reviewed by different experts so that all experts' scores are as comparable as possible. A concrete modeling approach follows.

The complete code is given in the appendix; a minimal sketch of the idea appears below.
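One possible sketch (an assumption about the approach, not the appendix code itself): assign experts to works greedily so that every expert's load is exactly 3000 × 5 / 125 = 120 works, then measure the pairwise intersection sizes |S_i ∩ S_j| as the comparability indicator to balance:

```python
import random
from itertools import combinations

import numpy as np

N_WORKS, N_EXPERTS, K = 3000, 125, 5        # works, experts, reviews per work
LOAD = N_WORKS * K // N_EXPERTS             # 120 works per expert

def balanced_assignment(seed=0):
    """Assign K distinct experts to every work so that each expert ends up
    with exactly LOAD works (greedy rule: always pick the K experts with
    the most remaining capacity, breaking ties at random)."""
    rng = random.Random(seed)
    remaining = [LOAD] * N_EXPERTS
    plan = []
    for _ in range(N_WORKS):
        experts = sorted(range(N_EXPERTS),
                         key=lambda e: (-remaining[e], rng.random()))[:K]
        for e in experts:
            remaining[e] -= 1
        plan.append(experts)
    return plan

def overlap_stats(plan):
    """Distribution of pairwise intersection sizes |S_i ∩ S_j|:
    a more uniform distribution means more comparable score scales."""
    sets = [set() for _ in range(N_EXPERTS)]
    for work, experts in enumerate(plan):
        for e in experts:
            sets[e].add(work)
    sizes = [len(sets[i] & sets[j])
             for i, j in combinations(range(N_EXPERTS), 2)]
    return np.mean(sizes), np.std(sizes), min(sizes), max(sizes)

print("overlap mean/std/min/max:", overlap_stats(balanced_assignment()))
```

The standard deviation and the minimum of the overlap sizes are natural self-defined indicators here: a good cross-distribution plan keeps the minimum overlap positive while keeping the spread small.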

Question 2:  The review uses a ranking method based on standard scores (Appendix 1), which assumes that the academic-level distributions of the sets of works reviewed by different experts are the same. However, in the review of a large-scale innovation competition, any two experts usually share only a small portion of the works they review, the vast majority being different (see Question 1), and each expert sees only a small collection of works. The assumptions of the standard-score scheme may therefore not hold, and new evaluation schemes need to be explored. Please select two or more existing or self-designed review schemes, analyze with the attached data the distribution characteristics of each expert's and each work's raw scores and adjusted scores (such as standard scores), rank the works under the different schemes, and try to compare the pros and cons of these schemes. Then design a new standard-score (formula) calculation model for the review of large-scale innovation competitions. In addition, award-winning papers on which multiple experts agree unanimously are generally considered the most credible. For data 1 provided in Appendix 2, the ranking of the first-prize works selected in the second review stage was reached through expert consultation. Please use this batch of data to improve your standard-score calculation model.

Answer:

Analyzing the problems of the standard-score scheme
Different sets of works being judged: in large-scale innovation competitions, the sets of works judged by different review experts are often very different. If the works each expert judges vary in difficulty, the standard scores may be distorted, because they are computed from each expert's own mean and standard deviation.

Expert-dependent standardization: computing a standard score requires each expert's mean and standard deviation, but because each expert reviews a different set of works, these means and standard deviations may themselves differ substantially.

Two candidate evaluation methods:
Fixed reference set: select a subset of works as a common reference that every judge must evaluate; the scores on this subset are used to calibrate each judge's scores on their other works (see the sketch after this list).

Two-stage review: all works receive an initial review, then some high-scoring and controversial works are selected for a second round in which all reviewers participate.
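A minimal sketch of the fixed-reference-set idea (the function name and all scores below are hypothetical): fit a per-expert linear map on the shared reference works, then apply it to the rest of that expert's scores:

```python
import numpy as np

def calibrate_expert(ref_raw, ref_consensus, all_raw):
    """Fit a linear map a*x + b (least squares) aligning one expert's scores
    on the shared reference works with the consensus scores of those works,
    then apply the map to all of that expert's scores."""
    a, b = np.polyfit(ref_raw, ref_consensus, deg=1)
    return a * np.asarray(all_raw, dtype=float) + b

# Hypothetical example: an expert scores 5 reference works plus 3 others.
expert_ref = [62, 70, 75, 80, 88]       # expert's scores on the reference set
consensus  = [65, 72, 74, 83, 90]       # consensus scores of the same works
print(calibrate_expert(expert_ref, consensus,
                       [62, 70, 75, 80, 88, 58, 77, 91]).round(1))
```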

Analyzing the provided data
The attachment includes raw scores and standard scores from different experts on different works. The following analyses can be performed on these data (a sketch follows the list):

Calculate each expert's mean and standard deviation to understand the central tendency and dispersion of that expert's ratings.
Compare each expert's ratings with those of other experts, looking for systematic differences or biases.
Compare each work's raw score with its standard score to understand how standardization affects the work's ranking.
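A sketch of these three analyses in Python, on a synthetic long-format table (the column names expert_id / work_id / raw are assumptions; the real attachment layout may differ):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Toy stand-in for the attachment: 10 experts, 50 scores each.
df = pd.DataFrame({
    "expert_id": np.repeat(np.arange(10), 50),
    "work_id":   rng.integers(0, 200, size=500),
    "raw":       rng.normal(70, 10, size=500).round(1),
})

# (1) Each expert's central tendency and dispersion.
print(df.groupby("expert_id")["raw"].agg(["mean", "std"]))

# (2) Standard score within each expert's own pool of works; systematic
# severity or leniency shows up as differences in the per-expert means above.
df["z"] = df.groupby("expert_id")["raw"].transform(
    lambda s: (s - s.mean()) / s.std(ddof=0))

# (3) Raw average vs. standardized average per work: how standardization
# reshuffles the ranking.
per_work = df.groupby("work_id").agg(raw_mean=("raw", "mean"),
                                     z_mean=("z", "mean"))
print(per_work.sort_values("z_mean", ascending=False).head())
```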
Designing a new standard score calculation model
Considering that in large-scale innovation competitions, each judge can only see a part of the works, the following standard score calculation method can be considered:

$$z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}$$

where $x_{ij}$ is the raw score that expert $j$ gives work $i$, and $\mu_j$ and $\sigma_j$ are the mean and standard deviation of expert $j$'s raw scores over all works that expert reviewed.

In this way, each work's standard score reflects its position relative to all works reviewed by that reviewer.

Improving the model with the data provided in Appendix 2:
Weighted average of the two review stages: the results of the first and the second review can be combined as a weighted average to improve accuracy (see the sketch after this list).
Incorporating re-review scores: for works that were re-reviewed, the re-review score can be combined with the original score, for example as a simple or weighted average.
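A minimal sketch of the weighted combination; the weight 0.6 for the second stage is an assumed value, chosen only to reflect the higher authority of the second-stage panel:

```python
import numpy as np

def combined_score(z_stage1, z_stage2=None, w2=0.6):
    """Blend the mean standard scores of the two stages; works that never
    reached stage 2 keep their stage-1 mean."""
    s1 = float(np.mean(z_stage1))
    if z_stage2 is None:
        return s1
    return (1 - w2) * s1 + w2 * float(np.mean(z_stage2))

print(combined_score([0.8, 1.1, 0.9, 1.3, 0.7], [1.2, 0.9, 1.0]))  # both stages
print(combined_score([0.2, -0.1, 0.4, 0.0, 0.3]))                   # stage 1 only
```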

Question 3:  The defining feature of an "innovation" competition is innovation itself: there is no standard answer. Because the problems in this type of competition are difficult, they can generally only be partially solved with innovative ideas during the competition. The degree of innovation of a work and the prospects for follow-up research are hard to agree on; even in face-to-face discussion, experts may not reach consensus because of their own viewpoints. In addition, graduate students' papers vary in how well they are written, and review experts look at them from different perspectives, so the scores several experts give the same work can differ substantially (a large range). Large ranges are thus a characteristic of large-scale innovation competitions. Works with large ranges generally fall into the high-score or low-score segments. Low-segment works are in the elimination range; their large ranges arise either because some experts give very low scores to works that violate regulations or contain major errors, or because all experts agree the quality is not high while one or a few experts assess the work differently. So although the range is large, these works fall in the non-award-winning category and generally need no adjustment. High-segment works go on to the more authoritative second-stage review (in the attached data table, the same row records the same work's results in the two stages; works without second-stage scores participated only in the first stage). In the second stage there are still some works with large ranges, and since this is the final review, errors may affect the award level, so some large-range works need to be re-reviewed and adjusted (the attachment records these: the re-review score is the expert's final standard score and replaces the original standard score). The rules by which experts adjusted "large ranges" in the second stage (note that the number of review experts per work differs between the two stages) can serve as a reference for establishing a range model.

Based on the simulation data 2.1 and 2.2 given with the problem, please discuss the overall changes in scores between the two stages and the overall changes in ranges between the two stages, and analyze the advantages and disadvantages of the two-stage review scheme relative to a single-stage scheme. Note that large range and strong innovation are related characteristics. To help discover innovative papers, please establish a "range" model (including analysis, classification, adjustment, etc.), and, for the given data, propose a programmatic (no manual intervention) method of handling, in the first review stage, the "large range" of works whose scores are neither high nor low.
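One possible programmatic (intervention-free) adjustment rule, as a sketch; the threshold of 1.5 standard-score units and the replace-by-median rule are illustrative assumptions, not the contest's actual rules:

```python
import numpy as np

def adjust_large_range(z, range_thresh=1.5):
    """If the range (max - min) of one work's expert standard scores exceeds
    range_thresh, replace the single score farthest from the median with the
    median of the remaining scores, pulling the outlier toward consensus."""
    z = np.asarray(z, dtype=float)
    if z.max() - z.min() <= range_thresh:
        return z
    k = int(np.argmax(np.abs(z - np.median(z))))   # most deviant expert
    adjusted = z.copy()
    adjusted[k] = np.median(np.delete(z, k))
    return adjusted

print(adjust_large_range([0.2, 0.4, 2.3, 0.1, 0.3]))  # outlier 2.3 is pulled in
```

Applying this rule only to middle-segment works (mean standard score in a band around the advancement cutoff) matches the requirement to treat only works that are "neither high nor low", while leaving large ranges in the high segment to the second-stage experts, where they may signal strong innovation.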

Question 4:  For the "innovation" competition, give a complete review model (hint: e.g., an optimization model), and study how to solve it based on the given data. You may also give specific suggestions for improving the current review scheme (including what data should be collected in the future).
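One commonly used formulation that fits this hint (an illustrative assumption, not the official solution): treat the review as a calibration problem in which expert $j$'s raw score of work $i$ decomposes into work quality and expert bias,

$$x_{ij} = \theta_i + b_j + \varepsilon_{ij},$$

and estimate the calibrated qualities $\theta$ and biases $b$ from the observed (work, expert) pairs $\Omega$ by ridge-regularized least squares,

$$\min_{\theta,\,b}\; \sum_{(i,j)\in\Omega} \left(x_{ij} - \theta_i - b_j\right)^2 \;+\; \lambda \sum_j b_j^2,$$

where the small penalty $\lambda > 0$ removes the otherwise non-identifiable common shift between $\theta$ and $b$. Ranking works by the fitted $\theta_i$ uses every score while correcting for each expert's severity, and the cross-distribution from Question 1 ensures the pairs in $\Omega$ connect all experts, making the model solvable.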

All questions are updated! See the appendix for the complete materials!

Appendix:


Source: blog.csdn.net/weixin_52051317/article/details/133174825