Term Project - Part 1 MATH 3560H - Wesley Burr
Problems
Your term project is intended to be a summary of all of the practical skills with regression and
modeling you’ve gained across the semester. The goal is to pick a data set that is of interest to you,
munge¹ the data into R, and then perform a full-fledged data analysis using the linear modeling
framework we’ve learned about.
This initial part of the project (Part 1) is due on February 27th. It simply requires that you pick your
data set and perform your first, flailing import of the data into R. It’s entirely fine if this import does
not work or runs into problems: that’s part of the “fun”.
You will hand in a short PDF rendered from R Markdown, of no more than 2 pages, which discusses
your data set of choice, its source and provenance, and the reason it is interesting to you. You should
also state one thing you’d like to explore or discover from this data: a hypothesis, if you will. As
mentioned above, you should then try to import the data into R and see if you can manage it. This
mini-report is worth 10% of your final term project grade, or 2% of your final grade.
The rules for the project are as follows:
1. Your data set must contain at least 100 observations of at least 3 variables. You do not have to use
them all, but it must be at least that big when you begin.
2. All analyses should be done using R as the framework. You may use other tools as required,
but they must be interfaced through, and analyzed by, R.
3. The final term project report will be delivered as a worked analysis in R Markdown, no more
than 20 pages in length and a minimum of 5 pages (realistically, you won’t be able to fit it in 5
pages; this is just to keep you sensible).
4. A rubric for evaluation of the final project will be posted this month.
5. All steps taken to clean up and organize your data must be documented and reproducible.
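As a rough illustration of Rules 1 and 5, the R sketch below checks an imported data frame against the minimum size and records a cleaning step in code rather than by hand. It uses R's built-in `airquality` data set purely as a stand-in; the helper `meets_size_rule()` and the file name `my_data.csv` are hypothetical and not part of the course materials.

```r
# A minimal sketch, assuming your data arrives as a data frame.
# meets_size_rule() is a hypothetical helper, not a course-provided function.
meets_size_rule <- function(df, min_obs = 100, min_vars = 3) {
  # Rows are observations, columns are variables (Rule 1)
  nrow(df) >= min_obs && ncol(df) >= min_vars
}

# Stand-in data: the built-in airquality data set
data(airquality)
meets_size_rule(airquality)

# Record each cleaning step in code so it is reproducible (Rule 5);
# here we simply drop rows containing any missing value.
clean <- na.omit(airquality)
meets_size_rule(clean)

# For your own data, a first (possibly flailing) import might look like
# this; "my_data.csv" is a hypothetical file name:
# raw <- read.csv("my_data.csv", stringsAsFactors = FALSE)
# str(raw)               # inspect the column types R guessed
# meets_size_rule(raw)
```

Keeping checks and cleaning steps in a script (or in R Markdown chunks) means anyone can rerun them from the raw file and get the same data set.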
Here is a list of suggested places where you can find data to start with:
1. Kaggle Competition Data Sets:
2. Canadian Government Open Data:
3. City of Toronto Open Data:
4. Environment and Climate Change Canada’s National Air Pollution Surveillance network data:
http://maps-cartes.ec.gc.ca/rnspa-naps/data.aspx. Air pollution of all sorts.
5. Environment and Climate Change Canada’s Climate Data (Meteorology):
6. The Bank for International Settlements (BIS): programmatic API access to historical data is available
via an R package,
BIS.html
¹ Data “munging” is the process of organizing, cleaning and importing the data into a formatted, ready-to-be-analyzed
data set.
7. Old Textbook Data (not preferred, but ok as a final choice):
8. Data Sets available as part of an R Package (also not preferred, but ok as a final choice):
Do not feel constrained by these, they’re just intended as a starting point and inspiration for you.
I strongly encourage you to find your own data set which you find interesting, and start from there. I
am available to help you search if you have something in mind and need a starting point.