The overall idea of mathematical modeling


Preface

This article walks through the whole thought process and the steps of mathematical modeling in the traditional sense. Taking the MathorCup and the Huawei Cup, both of which I have competed in, as examples, I briefly sort out the process and the ideas, hoping to give novice players something to work with. (Go easy on me. I am just a newbie who has barely got one and a half feet ashore, and I hope more newbies make it across too.)

1. How to choose the topic

Choosing a topic is actually a skilled job. After all, there are only three days, so there is no time to try the problems one by one; you have to judge before you commit. Ideally you can decide which problem to pick at a glance.
For the gods with strong ability, of course, it does not matter which problem they pick. But if this is your first competition, or if you used to be carried by teammates and have now inexplicably become the team's main carry, you need to weigh the problems carefully.
Here are two methods:
1. Work your contacts and find out, as fast as you can, which problems most people intend to pick. Generally speaking, most people pick a problem that matches their own ability, and such a problem is usually one that most people can finish. I doubt anyone enters just to donate the entry fee to the organizers. So this kind of gossip is well worth collecting.
2. If your network is not as good as you thought, and your teammates are lone wolves just like you, then filter quickly by the structure of the problem. The routine problems nowadays are generally built on big data and ask you to design a reasonably reliable model that solves some practical problem. Note that I said routine, which means there is also a pile of unconventional problems, and those naturally fall outside our consideration (for example, Question A of this year's Huawei Cup, on designing chip algorithms; I had no idea at all).
So how do we filter out, from the routine problems, the ones we can finish well? First, skip the background section of the problem, because the so-called background gives us almost no information that helps solve it; the useful information is basically all in the questions themselves. Do not worry about how many questions there are. Instead, check whether the questions follow the coherence I describe next. If they do, this is the most conventional kind of modeling problem and the easiest to solve.
What do I mean by coherence? The problem first provides a pile of big data and asks us to remove null values and outliers. After preprocessing, we screen the main independent variables out of the data, use them to build a prediction model, and verify with plots and data whether the prediction model is accurate. Last comes an extension question, such as giving a reasonable pricing strategy (2020 MathorCup Question A) or an octane loss optimization strategy (2020 Huawei Cup Mathematical Modeling Question B).
Whatever the extension question is, as long as the earlier parts of the problem in front of you follow this chain of preprocessing, main variable screening and prediction modeling, choose it without hesitation, because it is easy to do and fits the routine mathematical modeling approach.

2. Data processing methods

For data preprocessing, I think everyone has their own tools: visual inspection (that is for the truly gifted; I would not dare to pretend, since hundreds of thousands or even millions of records are a bit beyond the human limit), Python's pandas (my main method), and apparently the R language can handle data too (about the others I am clueless; after all, I have only used pandas).
Here are some simple operations in pandas:
1. The data is generally stored in several Excel workbooks, so we naturally need to read Excel files.

import pandas as pd
df = pd.read_excel('testProject.xlsx')  # by default this reads the first sheet of the Excel file
print(df)  # print the data that was read; it is a two-dimensional table (a DataFrame)

As a basic test, the data read in looks like this:
[Figure: pandas read data style]
Generally, most data processing comes down to finding null values and then either dropping them or filling them in.

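df_no_null = df.dropna()  # drop rows that contain null values (returns a new DataFrame)
df_filled = df.ffill(axis=1)  # forward-fill null values horizontally; axis=1 is horizontal (along each row), axis=0 is vertical (down each column)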

For special outliers, first check whether the problem gives a valid value range. If it does, you can apply that limit directly. If not, I like to look at the data as a line chart, for example:
[Figure: Input data sample graph]
You can see a sudden spike. After observing the approximate range of the data, we can define a limit. Of course, this only applies when such a jump is an isolated value. If a large share of the data looks like this, it should be treated as normal data (though when I see jumps that big, most of the time I suspect something is wrong with the problem data).
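The range limit described above could be sketched roughly as follows. This is only an illustration under assumptions: the sample values, the names `readings`, `LOWER` and `UPPER`, and the choice between interpolating and clamping are all hypothetical and not taken from any competition data.

```python
import pandas as pd

# Hypothetical sample: a mostly smooth series with one sudden spike.
readings = pd.Series([10.1, 10.3, 9.8, 10.0, 95.0, 10.2, 9.9])

# Assumed valid range, chosen after looking at the line chart.
LOWER, UPPER = 0.0, 20.0

# Option A: mark out-of-range values as missing, then fill them
# from their neighbours by linear interpolation.
masked = readings.where(readings.between(LOWER, UPPER))
repaired = masked.interpolate()

# Option B: clamp values into the valid range instead.
clipped = readings.clip(lower=LOWER, upper=UPPER)

print(repaired.tolist())
print(clipped.tolist())
```

Option A suits an isolated spike in a smooth series, while Option B keeps the point but caps it; which one fits depends on the problem.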
There are many basic data processing methods; these are just the ones I like to use, and you can find plenty more online. This article is meant to give newbie players a complete line of thinking for the problem.

3. Main variable screening

The purpose of main variable screening is to obtain a linear model relating the independent variables to the dependent variable. Since the final goal is clear, the goal of this step is clear too: pick out the variables that are usable in a linear model.
There are many methods here, such as the entropy method. However, if the data is too large (for example, the Huawei Cup data had more than 300 dimensions), the computational cost grows accordingly, and in a race against the clock, spending too much time is bound to hurt. So here I suggest a quick, no-brainer method: use the SPSS data analysis software to compute the correlation between each independent variable and the dependent variable, then take the highly correlated variables as the inputs of the linear model.
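For readers without SPSS, the same screening idea can be approximated in pandas. The sketch below is not the SPSS workflow itself; it assumes the Pearson correlation is the measure wanted, and the helper `top_correlated`, the column names and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd

def top_correlated(df, target, k=3):
    """Return the k columns most correlated (by |Pearson r|) with `target`."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False).head(k)

# Hypothetical data: y depends strongly on x1, moderately on x2, not on x3.
rng = np.random.default_rng(0)
n = 1000
data = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),
})
data["y"] = 3.0 * data["x1"] + 1.0 * data["x2"] + rng.normal(scale=0.5, size=n)

selected = top_correlated(data, target="y", k=2)
print(selected)
```

Ranking by absolute value keeps strongly negative correlations too, since they are just as useful to a linear model as positive ones.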

4. Building the linear model

This step belongs to machine learning. Many years ago it would definitely have been a big hurdle, but now it has become the simplest part.
First, we need to decide which linear model we want. There are many choices. To spare everyone the agony of choosing, I will talk about only one, the one I basically always use: the multiple linear regression model. Its accuracy is very good, and its core method is the classic least squares method. For linear data, I think it is the most cost-effective model.
The verification result of the model looks like this:
[Figure: Image verification of multiple linear regression model]
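For reference, the standard textbook form of the multiple linear regression model and its least squares solution is written out below. The symbols ($p$ predictors, design matrix $X$ with a leading column of ones, coefficient vector $\beta$) are notation introduced here, not taken from the original post.

```latex
% Model: the dependent variable is a linear combination of p independent variables plus noise
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon

% Least squares: choose the coefficients that minimise the sum of squared residuals
\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2
            = (X^{\top} X)^{-1} X^{\top} y
\quad \text{(when } X^{\top} X \text{ is invertible)}
```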
So what is the question now? How do we build this model?
Everyone probably knows the answers: clone from GitHub, search Baidu, or call a library. (As for the big shots who code everything by hand, please go out and turn right. You are too strong, and I am afraid this article cannot withstand your fighting spirit.) These are indeed the fastest and best methods, and I have my own opinion on each of the three.
First, the code on GitHub is mostly complete projects. It may take a newbie a lot of effort to cut out the core code, so I do not recommend this method (of course, students with strong code-reading ability would do well to use it; besides solving the problem, it also helps a lot in improving your own skills).
As for Baidu, how should I put it? It means a lot of copy and paste, and sometimes a lot of errors. It is terrible just to think about: modeling people are already close to breaking down, and a pile of bugs on top of that is death on the spot.
As for the third method, calling a library, there are quite a lot of machine learning packages out there. To ease everyone's difficulty in choosing, I again suggest only one: sklearn (Python). It is quite classic, and you can find detailed documentation by searching for its Chinese documentation directly.
So students who, like me, do not want to rack their brains can go straight to the third method, which is simple and blunt.
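A minimal sketch of the library route with sklearn might look like this. The data is synthetic and the coefficients in `true_coef` are made up for illustration; in a real competition `X` would hold the screened main variables and `y` the dependent variable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for the screened main variables:
# y = 2*x1 - 1*x2 + 0.5*x3 + 4 + noise
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + 4.0 + rng.normal(scale=0.1, size=400)

# Hold out part of the data to check the fit on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()  # ordinary least squares
model.fit(X_train, y_train)

score = r2_score(y_test, model.predict(X_test))
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("R^2 on held-out data:", score)
```

Checking R^2 on held-out data rather than on the training data gives a fairer picture of whether the prediction model is accurate.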

5. Solving the extension problem

For the final extension question, it is really hard for me to analyze in detail, because different problems extend in different directions. So my advice is to build up your skills in ordinary times so that you are ready for unexpected needs, for example by learning intelligent optimization algorithms.
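As one example of what an intelligent optimization algorithm can look like, here is a small simulated annealing sketch. It is an assumption-laden illustration, not a method from the original post: the function `simulated_annealing`, the linear cooling schedule, the step size and the toy objective `predicted_loss` are all hypothetical choices.

```python
import math
import random

def simulated_annealing(cost, start, lower, upper, steps=5000, seed=0):
    """Minimise `cost` over the box [lower, upper] by simulated annealing."""
    rng = random.Random(seed)
    current = list(start)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        # Temperature decays linearly from 1 towards 0.
        temperature = max(1e-9, 1.0 - step / steps)
        # Propose a small random move, clamped to the allowed box.
        candidate = [
            min(hi, max(lo, x + rng.gauss(0.0, 0.1)))
            for x, lo, hi in zip(current, lower, upper)
        ]
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost

# Hypothetical objective: a stand-in for "loss predicted by the model".
def predicted_loss(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

best_x, best_loss = simulated_annealing(
    predicted_loss, start=[0.0, 0.0], lower=[-5.0, -5.0], upper=[5.0, 5.0]
)
print(best_x, best_loss)
```

In an extension question, the toy objective would typically be replaced by the fitted prediction model, with the box bounds taken from the valid ranges of the operating variables.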

Summary

Finally, I should mention that when forming a team, you need one teammate who is good at coding and one who is good at writing the paper, so that the project is completed as fully and as quickly as possible.
In mathematical modeling competitions, as long as you complete as many of the questions as possible, winning a prize is not hard. As for what the prize is for, of course it is for the bonus points toward a scholarship!! With the money you can buy more modeling books, enter more competitions, win more prizes, and buy more reference books...
Happy just thinking about it, haha. Let's go buy food first; staying alive and eating comes first.
Enough talking, the computer is out of ink...

Origin blog.csdn.net/jiljdlawjdlada/article/details/108714144