Stats 101C Kaggle Competition Final Project

There are two competitions:
- A classification competition
- A regression competition
Each competition accounts for 16% of your final course grade. Each competition is scored separately.
Grading of each competition:
- 4% Competition performance and R Script verification
- 12% Report
Competition performance grading:
First place and ties for first place: 4 points
Last place and ties for last place: 0 points
Everyone between first and last place earns points scaled linearly by leaderboard position.
For example, suppose that at the end of the competition there are 12 unique positions on the leaderboard.
There might be more than 12 teams, but with ties there are only 12 unique positions.
Having 12 unique positions means there will be 11 gaps, each worth 4/11 = 0.3636 points. The
scoring will be as follows:
- 1st place (and any ties): 4 points
- 2nd place (and any ties): 3.636 points
- 3rd place (and any ties): 3.273 points
- Etc.
- 10th place (and any ties): 0.727 points
- 11th place (and any ties): 0.364 points
- 12th place (and any ties): 0 points
Each competition is scored separately. It is possible for a team/individual to get first place in one
competition and earn 4 points in that competition while ending up in last place for the other
competition and getting 0 points for that one.
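The scaling rule above can be written as a short function. This is only an illustrative sketch in Python (the course itself uses R); the function name and arguments are made up for the example, and it assumes at least two unique leaderboard positions, since the handout does not say what happens when every team ties.

```python
def competition_points(position, unique_positions, max_points=4.0):
    """Points for a 1-based unique leaderboard position, scaled linearly.

    First place earns max_points, last place earns 0, and each gap between
    adjacent unique positions is worth max_points / (unique_positions - 1).
    """
    if unique_positions < 2:
        # Assumption: the all-tied case is not defined in the handout.
        raise ValueError("need at least two unique positions")
    if not 1 <= position <= unique_positions:
        raise ValueError("position must be between 1 and unique_positions")
    return max_points * (unique_positions - position) / (unique_positions - 1)


# Reproduce the 12-position example from the text.
for place in (1, 2, 3, 10, 11, 12):
    print(place, round(competition_points(place, 12), 3))
```

Tied teams share the same unique position, so they would all be passed the same `position` value.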
R Script verification:
You will submit an R script that shows how your predictions were made. I have provided a starting
template that imports the data and produces the necessary output file to submit to Kaggle.
Your R script will be run to verify that it does indeed produce the predictions you submitted to Kaggle.
If the predictions you submitted to Kaggle do not match the output produced by your R script, you will
get a 0 for the competition performance portion of your project grade. This rule is to prevent students
from making a model in R and then manually changing the predictions in the submission file to get a
higher score in the competition.
Similarly, your R script should not make predictions manually. It must use the trained model for making
predictions.
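Before submitting, it may help to check for yourself that the file your script writes matches the file you uploaded. The sketch below is one possible way to do that, written in Python for illustration only. The column names `id` and `prediction` and the tolerance are assumptions, not the format of the provided template; adjust them to the actual submission file.

```python
import csv
import io
import math


def load_predictions(handle, id_col="id", pred_col="prediction"):
    """Read a submission CSV into a dict mapping id -> prediction string."""
    return {row[id_col]: row[pred_col] for row in csv.DictReader(handle)}


def predictions_match(submitted, reproduced, tol=1e-8):
    """Return True if both files cover the same ids with matching predictions.

    Numeric predictions are compared within a tolerance; values that are
    not numbers (for example class labels) must match exactly.
    """
    if submitted.keys() != reproduced.keys():
        return False
    for key, value in submitted.items():
        other = reproduced[key]
        try:
            same = math.isclose(float(value), float(other),
                                rel_tol=tol, abs_tol=tol)
        except ValueError:
            same = value == other
        if not same:
            return False
    return True


# Tiny in-memory demo; real use would pass open file handles instead.
kaggle_file = io.StringIO("id,prediction\n1,0.25\n2,0.75\n")
script_output = io.StringIO("id,prediction\n1,0.250\n2,0.75\n")
edited_output = io.StringIO("id,prediction\n1,0.25\n2,0.90\n")

submitted = load_predictions(kaggle_file)
print(predictions_match(submitted, load_predictions(script_output)))
print(predictions_match(submitted, load_predictions(edited_output)))
```

If the model involves any randomness, the script also has to fix its random seed, otherwise a rerun may not reproduce the submitted file.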
Report guidelines:
You will submit a PDF report explaining the model you fit. The report is worth 12 points.
The report will describe anything that is done to the data before the model is fit. This includes any data
cleaning, data manipulation, or data transformation that was performed. It includes any variable
selection or dimension reduction process or any new variables that were created. You don’t need to do
any of the above things in your script to get full credit, but if you do any of the above steps, they must
be explained in the report.
The report will describe what kind of model was chosen for the final prediction and submission.
The report will explain why you think your model is a good choice and/or any shortcomings of the model
and areas of improvement. This section should include how you evaluated your model performance.
(Your evaluation of model performance should not be, “I submitted the predictions to Kaggle and got a
score.”)
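One common way to evaluate a model without relying on the Kaggle score is k-fold cross-validation on the training data. The sketch below is an assumption about what such an evaluation could look like, not a required method, and it is written in Python for illustration only. The `fit` and `predict` arguments are hypothetical placeholders for whatever model you use, and the mean-only model is just a baseline to make the example run.

```python
import math
import random


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices once and split them into k nearly equal folds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_rmse(x, y, fit, predict, k=5, seed=0):
    """Average held-out RMSE over k folds for any fit/predict pair."""
    fold_errors = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([x[i] for i in train], [y[i] for i in train])
        squared = [(predict(model, x[i]) - y[i]) ** 2 for i in held_out]
        fold_errors.append(math.sqrt(sum(squared) / len(squared)))
    return sum(fold_errors) / len(fold_errors)


# Placeholder model: always predict the training mean (a baseline only).
def fit_mean(x_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x_row):
    return model


# Synthetic data purely for the demo.
x = [[float(i)] for i in range(20)]
y = [2.0 * row[0] + 1.0 for row in x]
print(cross_validated_rmse(x, y, fit_mean, predict_mean, k=5))
```

For the classification competition, the same fold structure applies with a misclassification rate in place of RMSE.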
The report should be about 2 pages long.
Grading rubric for the report:
Overall writing
- Good [minus ~0 pts]: Explanations are correct, complete, and convincing. Assumptions are made explicit and given justification.
- Basic [minus ~1 pt]: Explanations are partially correct but incomplete or unconvincing. Assumptions are made explicit but not justified.
- Needs Improvement [minus ~3 pts]: Explanations are illogical, incorrect, or incoherent. Assumptions are not made explicit.
Description of things done to the data before fitting the model
- Good [minus ~0 pts]: Explanation of any data manipulation is complete and without mistakes. Any and all steps that are performed in the script are explained. Reasons for each step are provided and are justifiable.
- Basic [minus ~1 pt]: Any and all steps that are performed in the script are explained. Reasons for each step are unconvincing or questionable.
- Needs Improvement [minus ~3 pts]: Explanation of any data manipulation is not complete. There are steps performed in the script that are not explained. Reasons for each step are not provided or are not justified.
Description of final model
- Good [minus ~0 pts]: Explanation of model is complete and without mistakes. Report describes how many / what variables are used. Report describes properties of the model (e.g. parametric vs nonparametric). Report provides reasons for using this particular model.
- Basic [minus ~1 pt]: Explanation of model is complete but has minor mistakes. Report describes how many / what variables are used. Report describes properties of the model. Report provides reasons for using this particular model.
- Needs Improvement [minus ~3 pts]: Explanation of model contains serious mistakes. The model used is not adequately described. Report does not provide reasons for using this particular model.
Discussion of model strengths and weaknesses and model performance
- Good [minus ~0 pts]: Evaluation of model performance is complete and reasonable. Report discusses model strengths and weaknesses / possible improvement. Discussion is correct and justifiable.
- Basic [minus ~1 pt]: Evaluation of model performance is provided but contains minor mistakes. Report discusses model strengths and weaknesses / possible improvement.
- Needs Improvement [minus ~3 pts]: Evaluation of model performance is missing or contains serious mistakes. Discussion of model strengths and weaknesses is missing or contains serious mistakes.
