Suggestions on topic selection for the 2023 Higher Education Society Cup National College Student Mathematical Modeling Competition

The following are Mr. C’s suggestions for topic selection for the 2023 Higher Education Society Cup National Undergraduate Mathematical Modeling Contest (National Competition):

Tip: In Mr. C's opinion, difficulty: C < B < A; openness: B < A < C.

Between questions D and E, topic E is recommended. The paper and ideas for E will be posted directly later, so no topic selection analysis is given for them here. The following are topic selection suggestions and preliminary analysis for questions A, B, and C.

Question A: Optimal design of heliostat field

Question A is a physics-type question of a kind that is very common in mathematical modeling competitions, and it requires learning a lot of background knowledge. Some of the numerical calculations may also call for multi-objective programming from operations research.

Here is a brief outline of the approach to the first question. Question 1 asks for the annual average optical efficiency, the annual average output thermal power, and the annual average output thermal power per unit mirror area of the heliostat field. It can be solved with the following steps:

1 Determine the heliostat position: According to the given heliostat center position, determine the coordinates of each heliostat in the circular heliostat field.

2 Calculate the sun's altitude and azimuth: According to the geographic location and date time, use the formula to calculate the sun's altitude and azimuth to obtain the direction of the incident light.

3 Calculate the normal direct radiation irradiance: Use the obtained solar altitude angle and azimuth angle, combined with the formula of the solar radiation energy received on the unit area of the plane perpendicular to the sun's rays, to calculate the normal direct radiation irradiance.

4 Calculate the optical efficiency of the heliostat: Use the optical efficiency formula to calculate the shadow occlusion efficiency, cosine efficiency, atmospheric transmittance and collector truncation efficiency respectively, and multiply them to get the optical efficiency of the heliostat.

5 Calculate the output thermal power of the heliostat field: Calculate the output thermal power of each heliostat from the normal direct radiation irradiance and the optical efficiency of that heliostat, then add them up to obtain the output thermal power of the heliostat field.

6 Calculate the annual average thermal output power per unit mirror area: Divide the annual average output thermal power of the heliostat field by the total mirror area of the heliostats to obtain the annual average thermal output power per unit mirror area.
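The steps above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the official solution: the declination is a simple sinusoidal approximation, the DNI value is passed in rather than computed, shading, atmospheric transmittance and truncation are lumped into one hypothetical `other_eff` factor, and all function names and the sample numbers are made up for the example. Averaging over the sampled dates and times of the year is left to the caller.

```python
import math

def sun_position(lat_deg, days_from_equinox, solar_time_h):
    """Return (altitude, azimuth) of the sun in radians.

    Azimuth is measured clockwise from north. The declination uses a
    simple sinusoidal approximation (an assumption of this sketch).
    """
    phi = math.radians(lat_deg)
    sin_delta = (math.sin(2 * math.pi * days_from_equinox / 365)
                 * math.sin(math.radians(23.45)))
    cos_delta = math.sqrt(1 - sin_delta ** 2)
    omega = math.pi / 12 * (solar_time_h - 12)  # hour angle
    sin_alt = cos_delta * math.cos(phi) * math.cos(omega) + sin_delta * math.sin(phi)
    alt = math.asin(sin_alt)
    cos_az = (sin_delta - sin_alt * math.sin(phi)) / (math.cos(alt) * math.cos(phi))
    az = math.acos(max(-1.0, min(1.0, cos_az)))  # clamp against rounding error
    if omega > 0:  # afternoon: the sun is in the western half of the sky
        az = 2 * math.pi - az
    return alt, az

def sun_vector(alt, az):
    """Unit vector from the ground towards the sun in (east, north, up)."""
    return (math.cos(alt) * math.sin(az),
            math.cos(alt) * math.cos(az),
            math.sin(alt))

def cosine_efficiency(mirror_pos, receiver_pos, sun_vec):
    """Cosine of the incidence angle for a mirror aimed at the receiver."""
    to_recv = [r - m for r, m in zip(receiver_pos, mirror_pos)]
    norm = math.sqrt(sum(c * c for c in to_recv))
    dot = sum(s * c / norm for s, c in zip(sun_vec, to_recv))
    # The mirror normal bisects the sun and receiver directions, so the
    # incidence angle is half the angle between them.
    return math.sqrt(max(0.0, (1 + dot) / 2))

def field_power(dni, mirrors, receiver_pos, sun_vec, other_eff=1.0):
    """Return (field thermal power, power per unit mirror area).

    mirrors is a list of ((x, y, z), area). other_eff is a hypothetical
    lumped factor for shading, atmospheric transmittance and truncation.
    """
    total_area = sum(area for _, area in mirrors)
    power = dni * sum(
        area * cosine_efficiency(pos, receiver_pos, sun_vec) * other_eff
        for pos, area in mirrors
    )
    return power, power / total_area

# Hypothetical example: one 36 m^2 mirror 100 m east of a tower with the
# receiver 80 m up, at solar noon on the spring equinox, latitude 39.4 N.
alt, az = sun_position(lat_deg=39.4, days_from_equinox=0, solar_time_h=12)
power, per_area = field_power(
    dni=1.0,
    mirrors=[((100.0, 0.0, 4.0), 36.0)],
    receiver_pos=(0.0, 0.0, 80.0),
    sun_vec=sun_vector(alt, az),
)
print(power, per_area)
```

Keeping each efficiency term in its own function makes it easier to swap in the exact formulas from the problem statement once they are confirmed.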

Numerical computation and optimization algorithms may be needed along the way to handle the more complex calculations. For example, numerical integration methods can be used to estimate normal direct radiation irradiance, and iterative or optimization algorithms can be used to determine the optimal location and size of the heliostats.

This question is highly specialized, and the detailed analysis and modeling will be covered in the follow-up posts on the specific ideas for this question. The degree of openness is low and the difficulty is moderate. However, such questions usually have a high threshold, so beginners and students from unrelated majors should choose carefully. Mr. C suggests checking your answers at the end, because whether the answers are correct has a large impact on the final score. Recommended for students majoring in physics, electrical engineering, automation, and related fields.

Question B: Multi-beam line survey problem

This year's national competition questions are rather strange, probably because of the popularity of AI tools such as ChatGPT. Like Question A, Question B is a physics question, and the two are very similar in type. In previous years there was usually a more interesting topic. Still, Question B is clearly friendlier to students in mathematics and statistics related majors. It requires many simulation-related algorithms, and LINGO is recommended for solving it.

No more detailed analysis is given here. The specific ideas will be released in the evening, so stay tuned.

This question has an optimal solution, with a low degree of openness and moderate difficulty. If you choose it, it is best to compare answers with others, online and offline, after finishing. Recommended for students majoring in statistics, mathematics, physics, and similar fields.

Question C: Automatic pricing and replenishment decisions for vegetable commodities

This is the type of question that many students practice during training. It is a big data and data analysis topic, and it is also a topic that our team is good at. It requires a certain amount of modeling ability, as other types of competition questions do. It is recommended for everyone, whatever the major.

The topic requires building a mathematical model, and you can use evaluation algorithms, such as the gray comprehensive evaluation method and the fuzzy comprehensive evaluation method, to establish connections between the various indicators.
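As an illustration of how such an evaluation method links indicators, here is a minimal sketch of grey relational analysis, which gray comprehensive evaluation is commonly built on. It assumes min-max normalization and the usual resolution coefficient of 0.5; the indicator names and numbers are hypothetical and not taken from the competition data.

```python
def min_max(series):
    """Scale a series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]

def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate series against the reference.

    rho is the resolution coefficient (0.5 is a common default).
    """
    ref = min_max(reference)
    deltas = {
        name: [abs(r - c) for r, c in zip(ref, min_max(values))]
        for name, values in candidates.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:  # every candidate matches the reference exactly
        return {name: 1.0 for name in deltas}
    grades = {}
    for name, ds in deltas.items():
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
        grades[name] = sum(coeffs) / len(coeffs)
    return grades

# Hypothetical indicators observed over four periods.
sales = [10, 20, 30, 40]
grades = grey_relational_grades(
    sales,
    {"price": [1, 2, 3, 4], "loss_rate": [4, 1, 3, 2]},
)
print(grades)
```

A grade closer to 1 means the candidate series moves more closely with the reference series after normalization.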

Before starting the first question, everyone needs to analyze and numerically process the data, which is EDA (Exploratory Data Analysis). For numerical data, you can preprocess with normalization, removal of outliers, and so on. To quantify non-numeric data, you can use the following methods:

1 Label encoding

Label encoding is a method of quantifying non-numeric data by converting a set of possible values into integers. For example, in machine learning, for a variable with multiple categories, we can assign a unique integer value to each category so that it becomes numerical data.

2 One-hot encoding

One-hot encoding is a method of converting multiple possible values into a binary array. In one-hot encoding, each possible value corresponds to a binary array whose length is the total number of possible values, in which only one element is 1 and the rest are 0. For example, for a gender variable, one-hot encoding can be used to convert "male" and "female" to [1, 0] and [0, 1] respectively.

3 Category counts

Categorical counts are an easy way to convert non-numeric data into numeric data. In categorical counting, we classify data according to some specific attributes (such as education, occupation, etc.), and then count the number or frequency of each category. For example, in a survey questionnaire, we can classify the responses to a question into the categories "yes", "no", and "not sure" and count the number or frequency of each category.

4 Principal component analysis

Principal component analysis is a method of converting multidimensional data into a low-dimensional representation. It reduces the dimensionality of the raw data by finding the principal components that best explain the variation in the data. It operates on numeric input, so non-numeric variables need to be encoded first (for example with one-hot encoding); the encoded columns can then be compressed into a few numeric components.
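The first three methods above (label encoding, one-hot encoding, and category counts) can be sketched in plain Python as follows. The helper names and sample values are hypothetical, and in practice a data analysis library would usually be used instead; note also that label encoding imposes an arbitrary order on the categories, which matters for some models.

```python
from collections import Counter

def label_encode(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """Turn each value into a 0/1 list with a single 1 at its label index."""
    codes, mapping = label_encode(values)
    width = len(mapping)
    return [[1 if i == code else 0 for i in range(width)] for code in codes]

def category_frequencies(values):
    """Return {category: (count, relative frequency)}."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, n / total) for cat, n in counts.items()}

# Hypothetical sample data.
genders = ["male", "female", "male"]
answers = ["yes", "no", "yes", "not sure"]

codes, mapping = label_encode(genders)
print(codes, mapping)
print(one_hot_encode(genders))
print(category_frequencies(answers))
```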

For the first question, visualization is recommended. Common EDA visualization methods include:

- Histogram and density plot: shows the distribution of numerical variables.

- Scatterplot: shows the relationship between two continuous variables.

- Boxplot: shows the distribution and outliers of numerical variables.

- Bar and pie charts: show the distribution of categorical variables.

- Line chart: shows the trend over time or sequence.

- Heat map: shows the correlation between different variables.

- Scatter matrix plot: shows a matrix of scatter plots between multiple variables.

- Geographic map: shows geographic location data and spatial distribution information.

The first question is a good one to hand to the beginners on the team; specific ideas for each question will be posted later. It calls for a correlation analysis: check whether the correlation coefficients between the indicators are high. A high coefficient suggests a strong relationship and a low one a weak relationship, and a heat map is a convenient way to visualize this. For the distribution patterns, my suggestion is to keep it simple and use descriptive statistics: compute the total, average, maximum, and minimum sales volume of each vegetable category and each single product to understand the overall situation.
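A minimal sketch of the correlation analysis and descriptive statistics described above, using only the standard library. The category names and daily sales figures are hypothetical placeholders, not the competition data, and the Pearson coefficient only measures linear association.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sx == 0 or sy == 0:  # a constant series has no defined correlation
        return float("nan")
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Nested dict of pairwise correlations, suitable for drawing a heat map."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

def describe(values):
    """Total, mean, maximum and minimum of one sales series."""
    return {
        "total": sum(values),
        "mean": sum(values) / len(values),
        "max": max(values),
        "min": min(values),
    }

# Hypothetical daily sales (kg) for three vegetable categories.
sales = {
    "leafy": [12, 15, 11, 18, 20],
    "peppers": [6, 7.5, 5.5, 9, 10],
    "mushrooms": [9, 7, 10, 5, 4],
}
print(correlation_matrix(sales))
print({name: describe(values) for name, values in sales.items()})
```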

If possible, you can also use a clustering algorithm: based on the sales characteristics of the vegetable categories or single products, use a cluster analysis method (such as K-means clustering) to divide them into groups, and then examine how sales volume is distributed across the different groups and time periods.
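A minimal K-means sketch in plain Python, to show the idea. It assumes each product is described by a hypothetical two-number feature vector (for example, average daily sales and its spread), and it initializes the centroids with the first k points for reproducibility, which is a simplification; a library implementation such as scikit-learn's `KMeans` with better initialization would normally be preferred.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Cluster points into k groups; returns (labels, centroids).

    Centroids start at the first k points (a simplifying assumption).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical features per product: (average daily sales, spread).
features = [(1.0, 1.0), (9.0, 9.5), (1.2, 0.8), (8.8, 9.0), (0.9, 1.1)]
labels, centroids = kmeans(features, k=2)
print(labels, centroids)
```

Features on very different scales should be normalized first, since K-means relies on Euclidean distance.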

Since this article is only about topic selection, I won't go into details here; see my follow-up articles and videos for detailed ideas. How to analyze the data set, the visualization code, and so on will be posted later. This topic is relatively open and not very difficult, and it is the first choice for the undergraduate group in this competition. With its low threshold and relatively high degree of openness, it is recommended for students of all majors.



Origin blog.csdn.net/weixin_43345535/article/details/132743996