ASE: The First Pair Programming Assignment

Problem Definition

The golden point game comes from a quiz devised by the economist Richard Thaler and run as a public contest in the London Financial Times in 1997. The game has some very interesting properties, so the instructor chose it as the topic of the first pair programming assignment.

Problem Definition

In the golden point game, assume there are N players. In each round, every player writes one or two rational numbers between 0 and 100 (excluding 0 and 100) and submits them to the server. At the end of the round, the server computes the average of all submitted numbers and multiplies it by 0.618 (the so-called golden constant) to obtain the value G. The player whose submitted number is closest to G (by absolute difference) gets N points, the player farthest from G gets -2 points, and all other players get 0 points. When only one player participates, no points are awarded.
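To make the scoring rule concrete, here is a minimal sketch of one round in Python. The function names and the handling of ties and of two numbers per player are our own assumptions for illustration; the actual game server may resolve these cases differently.

```python
from statistics import mean

GOLDEN_CONSTANT = 0.618


def golden_point(submissions):
    """submissions maps a player name to a list of one or two numbers in (0, 100)."""
    numbers = [x for nums in submissions.values() for x in nums]
    return mean(numbers) * GOLDEN_CONSTANT


def score_round(submissions):
    """Return a dict of per-player scores for one round."""
    n = len(submissions)
    scores = {player: 0 for player in submissions}
    if n < 2:
        return scores  # a lone player does not score

    g = golden_point(submissions)
    # One (distance, player) entry per submitted number.
    distances = [(abs(x - g), player)
                 for player, nums in submissions.items()
                 for x in nums]
    closest = min(d for d, _ in distances)
    farthest = max(d for d, _ in distances)

    # Assumption: every player tied for closest gets N points,
    # and every player tied for farthest gets -2.
    for player in {p for d, p in distances if d == closest}:
        scores[player] += n
    for player in {p for d, p in distances if d == farthest}:
        scores[player] -= 2
    return scores
```

For example, with submissions of 10, 20 and 90 from three players, the average is 40 and G is about 24.72, so the player who submitted 20 gets 3 points and the player who submitted 90 gets -2.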

Difficulties of the problem

  • The game is winner-take-all: in each round only the closest player is rewarded and only the farthest player is penalized. Trying to predict the golden point as precisely as possible in every round is therefore risky, and actually landing closest to it is very difficult.
  • The game environment is unknown and the outcome depends heavily on other players' strategies. The same model can perform very differently against different opponents.
  • The only information the bot can use is essentially the history of past golden points. With so little information available, it is hard to learn a good model.
  • During training we could essentially only tune the model against the few rule-based bots in room0 and room1, which leaves a large gap between training and the actual competition.
  • Short-term and long-term gains have to be balanced: short-term means scoring as much as possible in the current round, long-term means keeping a high score over a period of time.

Modeling method

Model introduction and motivation

The core algorithm we used is Q-learning. First, time was tight and the team members were in different countries. Second, the task provides little data, and online feedback is limited and slow, so training a more complex model would have required a lot of time collecting match data. Because we had not yet onboarded, there were also no other students to play similar games against. We therefore chose Q-learning without a neural network, which is simple and avoids the underfitting that sparse data would cause.

Briefly, Q-learning uses a Q-table to record the value of each action in each state, and then updates the table using the rewards that the environment returns for the actions taken.

The algorithm is as follows:

flow chart
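As a rough illustration of the table update described above, a tabular Q-learning agent could look like the sketch below. The class name, the epsilon-greedy exploration, and the values of alpha, gamma and epsilon are assumptions for illustration, not the settings we used in the competition.

```python
import random
from collections import defaultdict


class QTable:
    """Minimal tabular Q-learning agent (illustrative sketch)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor for long-term reward (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = random.Random(seed)
        # Unseen states start with a value of 0.0 for every action.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def choose(self, state):
        """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, reward, next_state):
        """Q(s, a) += alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

The state can be any hashable summary of the recent golden point history, and the reward is the score obtained in the round. The discount factor gamma is what trades off the current round's score against scores in later rounds.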

Specific improvements

Because the task has no labels, we naturally chose reinforcement learning. Because our model has no trainable parameters, the most important questions are how to design the actions and how to learn a good distribution over them.

For the former, we mainly improved on the eight actions given in the demo. First, we changed the sizes and ranges of their values, because the original action design was quite unreasonable. We then observed that the golden point eventually converges to a relatively small value, so we designed actions that sample randomly within various ranges, which prevents an action from producing a value with a large deviation.
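The idea of range-based actions can be sketched as follows. The concrete ranges, and the choice to express them as multipliers of the previous golden point, are hypothetical values picked for illustration and are not the ranges we actually submitted.

```python
import random

# Hypothetical action set: each action is a (low, high) multiplier range
# applied to the previous round's golden point.
ACTION_RANGES = [
    (0.55, 0.65),
    (0.65, 0.80),
    (0.80, 0.95),
    (0.95, 1.05),
    (1.05, 1.30),
]


def sample_submission(action, last_golden_point, rng=random):
    """Sample one number to submit for the chosen action."""
    low, high = ACTION_RANGES[action]
    value = last_golden_point * rng.uniform(low, high)
    # The rules require a number strictly between 0 and 100.
    return min(max(value, 0.001), 99.999)
```

Because each action only picks a narrow range around the recent golden point, a poorly chosen action still produces a number in a plausible region instead of a large outlier.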

For the latter, we had only half a day and no time to train, so we tried to have the bot predict the other players' values and ran matches to generate multiple sets of data, to make up for the lack of time and data.

Result analysis

In the first competition we placed sixth. Based mainly on the plot of how the golden point changed, we adjusted the numeric size of each action's sampling range and added several new hybrid actions, and in the second competition we placed fourth. Relative to the time invested, this is an acceptable result. The rise in rank came first from our own improvements, and second from the fact that other students' rushed changes may have backfired.

Reflection summary

1. Do the results of the golden point game match your expectations?

The results of the first competition did not really match our expectations. We then improved the actions a little, which helped somewhat, but the result is still not very satisfactory, for three reasons: 1. Too few training iterations. 2. Insufficient representational capacity of the model. 3. The action design still needs improvement.

2. Before the official competition, what strategy did you use to evaluate the quality of your model?

We ran the bot in room0 and room1 for a while and looked at the scores it obtained. Because we had not yet onboarded, we could not find other students to play against.

3. If the number of values submitted per round were changed to 3, or more participants joined the competition, would your method still be suitable?

The design idea still applies, but the specific way actions are selected would need some changes, and there would probably be more ideas and methods aimed at disrupting other players.

4. Evaluate your pair programming partners, following the sandwich method discussed in the reference, and suggest areas where they could improve.

My partners are Wu Ziwei and Wu Xueqing, and both teammates are very strong.

Wu Xueqing has strong coding and organizational skills. Although we were all busy, she quickly produced a demo and implemented many improvements.

Wu Ziwei efficiently made improvements based on the results and pointed out more promising directions, and is of course also a very strong coder.

I think our main problems were the tight schedule and being in different places: some of us had just returned and were jet-lagged, some were on a plane, and some were in the US. We also had not yet onboarded, so some requirements were not very clear, and we had little opportunity to discuss with other students and the instructors.

5. Some thoughts.

Dividing the score by the number of team members seems very unreasonable to me. Adding one more person cannot increase the output linearly, and in fact what one person produces is not much worse than what three people produce. I hope the divisor can be replaced by a logarithm, or that for the same project, where others get 10 points, we get 8 or 9 points.

 
 

 


Origin www.cnblogs.com/wtxwtx/p/11565891.html