Univariate Statistics and Methodology using R

Coursework
Univariate Statistics and Methodology using R - 2017/2018
Martin Corley and Milan Valášek
Read this whole document before you do anything else.
Overview

For the course assignment, you will be expected to retrieve, clean, and analyse a data set. In this
document we provide the primary research questions to be answered, information on the structure
and format of the final report, information on code that should be submitted, and a brief overview
of the marking criteria. You can find the codebook for the data set and the R script template on
LEARN. The data for this assignment come from the Timed picture naming in seven languages study
(Bates et al., 2003), available as part of the International Picture Naming Project
(https://crl.ucsd.edu/experiments/ipnp/).
It can be tempting to over-complicate assessments like this, particularly if you have a long time to
complete them. The labs have been designed to prepare you for this assignment: to explore data,
to conduct appropriate analyses for given data types, and to make decisions that you can justify.
Bear in mind that completing this assessment does not require any knowledge that wasn’t covered
in lectures, labs, and readings.
What you need to submit
For your assessment you need to submit two documents: your report and your R code. More instructions
on how to submit are below. Here, we provide more detail on what to submit.
Report
You need to produce a report answering the assignment questions below. Your report should include
appropriate analyses to provide answers to these questions while describing the process and utilising
graphics where necessary to illustrate your points.
Your report should clearly identify the decisions you made in analysing the data, as well as
summarising what can be concluded from your analysis.
Figures and tables should be numbered and captioned, and referred to in the text; important
statistical outcomes should be summarised in the text.
Reporting should follow APA 6th Edition guidelines for the presentation of tables, figures, and
statistical results (see the final lecture for more information). An alternative style is acceptable
so long as it is clear and consistent.
Your report should be a maximum of 4 sides of A4 (including tables and figures), in a standard
font, size 12, with normal 1 inch margins.
Code
Your report must be accompanied by an R script (a text file with the extension .R, the default file
type when saving a script from RStudio) which can be used to exactly reproduce the results set out
in your submitted report. It should include all steps taken in data cleaning and all analyses. Every
answer to the assignment tasks/questions given below must be accompanied by code used to find
out the answer. You should provide clear and informative comments within the file describing the
steps taken. Please download the script template from LEARN and use it to write your script.
Important: Do not edit the lines of code in the script template that read in the data sets!
These lines obtain the data to be used for this assignment from the internet and assign them to data
frames.
We will check that the code runs and produces the results presented in your report.
Any code copied and pasted or otherwise adapted from internet examples should be cited appropriately
in the comments. An appropriate citation should include the URL where the code was found,
the name of the website or blog, and the original author’s name. In the absence of a proper name,
you can cite the contributor’s nickname or alias.
You can work on the R script in small groups (no more than 4 students) if preferred. If you do this, it is
important that you take the following steps:
1. At the very start of your script include a comment line (line starting with #) which includes the
exam numbers (not the names) of those you worked with. For example:
# Produced in collaboration with students B045329 and B018429
2. Within the script point out (again using comments) which blocks of code are shared.
3. Please ensure that your acknowledgements match those of others in your group (if you say you
produced the script in collaboration with B045329, we expect B045329 to acknowledge you).
Important: While the code can be worked on in small groups, the written report must be produced
entirely independently. It is not OK to include sections in the written report that are written collaboratively.
Submission and Marking
Submitting your work
All coursework must be submitted before 12:00 (noon) on Monday the 21st of January 2019 via Turnitin.
You can access it by clicking on the “Assessment details and submission” tab of the course page
on LEARN. There are two sections there, one for each of the two files you are required to submit.
You will be asked to provide your name and submission title. The submission title must be your exam
number (and nothing more). Your name will not appear anywhere in the documents accessed by
the markers. To ensure that the marking is entirely anonymous, please do not include your name or
student number anywhere in either of the submitted files.
Remember, the files you are required to submit are:
Report, as described above. The filename must be your exam number with whatever extension
is provided by your chosen word processor (e.g., ‘B045329.docx’). The file you create should
have your exam number on each page (e.g., in the header or footer).
R script which runs all of the data cleaning and the final analyses reported. The filename must
be your exam number with the .R extension (e.g., ‘B045329.R’).
Please ensure that you name your documents exactly as above. File names such as ‘R Script for
B04329.R’ or ‘B044329 Report final.docx’ slow down document matching and marking and will result
in a loss of marks.
Please check LEARN for detailed instructions on the submission process prior to submitting.
Marking Criteria
The code is worth 30% of the coursework marks, and the report is worth 70% of the coursework marks.
Work will automatically fail (max mark of 30%) unless both components are submitted.
You will be assessed on the following:
1. Appropriate cleaning of the data set and key variables of interest, making appropriate and
justified decisions on the steps you take.
2. Selection of appropriate statistical tests and variables to answer the primary research question
and the justifications provided for your selections.
3. Interpretation of the results of the selected analyses.
4. R code that runs without errors all the way through and is clear and appropriately commented.
For handy tips on writing good code, see http://adv-r.had.co.nz/Style.html (no need to stick
religiously to the guidelines but following them does make code nice and tidy).
5. Last but not least: Clarity of writing and formatting. The report should conform to the APA
6th Edition style guidelines for formatting text, tables, and figures, reporting results of statistical
analyses, writing style, etc. However, an alternative style is acceptable provided it is comparably
clear and consistent. For a useful resource, see
https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_style_introduction.html.
Data
You are given four separate data sets:
df_e is a data set of 520 pictures and their associated variables in an English-language picture
naming study
df_c, df_h, and df_s are data sets of 173 pictures in a Chinese, Hungarian, and Spanish
language picture naming study, respectively. Each data set uses a different picture set.
The code book for the data sets can be found on LEARN.
Assignment Questions
Question 1
Is there a relationship between the frequency of a target word in the English corpus and reaction
time (RT) on a picture-naming task? Once you are content that the data are appropriately cleaned,
run the following model:
m1 <- lm(rttar ~ lnfreq, data = df_e)
Question 1.1
Concisely report and interpret the results of the model.
Question 1.2
What is the predicted RT for a word with a frequency of 20?
Hint: Don’t forget that the frequency variable is log-transformed (see codebook for details) and that,
in R, exp() is the inverse function of log().
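The hint above can be illustrated with a minimal sketch on simulated data. The data frame sim and model m_sim are made up for illustration and stand in for df_e and m1. The sketch assumes lnfreq is the plain natural log of the raw frequency; the codebook defines the actual transformation, so adjust the log() call if it differs.

```r
# Simulated stand-in for df_e (illustration only, not the assignment data)
set.seed(1)
sim <- data.frame(lnfreq = runif(200, 0, 8))
sim$rttar <- 1100 - 40 * sim$lnfreq + rnorm(200, sd = 80)

m_sim <- lm(rttar ~ lnfreq, data = sim)

# Assumption: lnfreq = log(raw frequency). If the codebook says otherwise
# (e.g. a constant is added before logging), transform raw_freq the same way.
raw_freq <- 20
new_obs <- data.frame(lnfreq = log(raw_freq))

# Predicted RT from the fitted model
pred <- predict(m_sim, newdata = new_obs)

# The same value by hand: intercept + slope * log(raw frequency)
by_hand <- coef(m_sim)[["(Intercept)"]] +
  coef(m_sim)[["lnfreq"]] * log(raw_freq)

# exp() undoes log(), recovering the raw frequency
back <- exp(new_obs$lnfreq)
```

The key point is that the value passed to predict() must be on the same scale as the predictor the model was fitted on.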
Question 1.3
Produce and interpret a diagnostic plot of the model that shows whether or not the model residuals
are normally distributed.
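One plot commonly used for this purpose is a normal Q-Q plot of the residuals. The sketch below uses simulated data (sim and m_sim are hypothetical stand-ins for df_e and m1) and shows two ways of producing it in base R.

```r
# Simulated stand-in for df_e (illustration only)
set.seed(2)
sim <- data.frame(lnfreq = runif(200, 0, 8))
sim$rttar <- 1100 - 40 * sim$lnfreq + rnorm(200, sd = 80)
m_sim <- lm(rttar ~ lnfreq, data = sim)

# Normal Q-Q plot of standardised residuals (panel 2 of plot.lm)
plot(m_sim, which = 2)

# A similar plot built by hand from the standardised residuals
res <- rstandard(m_sim)
qqnorm(res)
qqline(res)
```

Points lying close to the reference line suggest approximately normal residuals; systematic curvature or heavy tails suggest a departure from normality.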
Question 2
Do target word length and the number of synonyms a word has have additional effects on RT above
and beyond that of word frequency in the English language data set?
Question 2.1
Fit an appropriate model to test this question.
Question 2.2
Run model diagnostics and, if needed, re-fit the model.
Question 2.3
Report and interpret the results of the final model.
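One common way to ask whether predictors explain variance "above and beyond" another is to compare nested models with an incremental F-test. The sketch below is an illustration on simulated data, not the required analysis. The column names len and nsyn are hypothetical; use the variable names given in the codebook.

```r
# Simulated data; len and nsyn are hypothetical column names
set.seed(3)
n <- 300
sim <- data.frame(
  lnfreq = runif(n, 0, 8),
  len    = sample(1:4, n, replace = TRUE),  # hypothetical: word length
  nsyn   = rpois(n, 1)                      # hypothetical: number of synonyms
)
sim$rttar <- 1000 - 35 * sim$lnfreq + 30 * sim$len +
  25 * sim$nsyn + rnorm(n, sd = 80)

# Baseline model with frequency only, then the model with added predictors
m_base <- lm(rttar ~ lnfreq, data = sim)
m_full <- lm(rttar ~ lnfreq + len + nsyn, data = sim)

# Incremental F-test: do the added predictors reduce residual variance?
cmp <- anova(m_base, m_full)
print(cmp)

# Change in R-squared attributable to the added predictors
r2_change <- summary(m_full)$r.squared - summary(m_base)$r.squared
```

Both models must be fitted to exactly the same rows for the comparison to be valid, so handle missing values before fitting.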
Question 3
Do variables entered as predictors in the Question 2 model predict whether or not at least one participant
will produce an error response on a picture naming task in the English language data set?
Pictures for which there are any invalid or incorrect responses should be coded as containing errors.
Question 3.1
Fit, report, and interpret an appropriate model to test this question.
Question 3.2
What is the predicted probability of a correct response on a picture whose name has a frequency of
12, is 3 syllables long and has only one form?
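For a binary outcome, a logistic regression fitted with glm() is one option, and predict() can return probabilities directly. The sketch below uses simulated data; any_error, len, and nsyn are hypothetical names, and the predictor values in new_obs are purely illustrative. Note that if the model predicts the probability of an error, the probability of a correct response is its complement.

```r
# Simulated data; any_error, len and nsyn are hypothetical column names
set.seed(4)
n <- 400
sim <- data.frame(
  lnfreq = runif(n, 0, 8),
  len    = sample(1:4, n, replace = TRUE),
  nsyn   = rpois(n, 1)
)
p_err <- plogis(-1 - 0.3 * sim$lnfreq + 0.4 * sim$len + 0.5 * sim$nsyn)
sim$any_error <- rbinom(n, size = 1, prob = p_err)  # 1 = at least one error

m_logit <- glm(any_error ~ lnfreq + len + nsyn,
               data = sim, family = binomial)

# Illustrative predictor values (assumes lnfreq is a plain natural log)
new_obs <- data.frame(lnfreq = log(12), len = 3, nsyn = 0)

# type = "response" gives a probability; type = "link" gives log-odds
p_error <- predict(m_logit, newdata = new_obs, type = "response")
eta     <- predict(m_logit, newdata = new_obs, type = "link")

# Probability of the complementary outcome (no error)
p_correct <- 1 - p_error
```

Here plogis(eta) reproduces p_error, which is a useful check that the link and response scales are being read correctly.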
Question 4
Does the effect of target word frequency on RT vary significantly between Chinese, Hungarian, and
Spanish?
Hint: You will need to construct a single data frame to answer this question.
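A minimal sketch of one way to build such a data frame: add a language label to each data set and stack them with rbind(). The data frames sim_c, sim_h, and sim_s and the helper make_lang() are made up for illustration and stand in for df_c, df_h, and df_s.

```r
# Hypothetical helper producing a small simulated data set
make_lang <- function(n) {
  data.frame(lnfreq = runif(n, 0, 8), rttar = rnorm(n, 1000, 100))
}

set.seed(5)
sim_c <- make_lang(5)
sim_h <- make_lang(5)
sim_s <- make_lang(5)

# Label each data set before stacking so the origin of every row is kept
sim_c$language <- "Chinese"
sim_h$language <- "Hungarian"
sim_s$language <- "Spanish"

# rbind() requires the data frames to share the same column names
sim_all <- rbind(sim_c, sim_h, sim_s)
sim_all$language <- factor(sim_all$language)

str(sim_all)
```

If the real data sets contain columns that differ between languages, select the shared columns first so that rbind() does not fail.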
Question 4.1
Fit an appropriate model to test this question. Run model diagnostics and re-fit the model if needed.
Question 4.2
Report and interpret the results of the final model.
Question 4.3
Which language has the weakest effect of frequency on RT? Describe it in terms of unit change in RT
as a result of a unit change in log-frequency.
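Whether a slope differs between groups is typically tested with an interaction term, and per-group slopes can then be recovered from the coefficients. The sketch below is an illustration on simulated data with made-up slopes; sim_all is hypothetical, and the coefficient names assume R's default treatment contrasts with Chinese as the reference level.

```r
# Simulated data with a different frequency slope per language (made up)
set.seed(6)
slopes_true <- c(Chinese = -20, Hungarian = -40, Spanish = -60)
sim_all <- do.call(rbind, lapply(names(slopes_true), function(lang) {
  lnfreq <- runif(150, 0, 8)
  data.frame(
    language = lang,
    lnfreq   = lnfreq,
    rttar    = 1100 + slopes_true[[lang]] * lnfreq + rnorm(150, sd = 60)
  )
}))
sim_all$language <- factor(sim_all$language)

# Additive model vs. model where the frequency slope varies by language
m_add <- lm(rttar ~ lnfreq + language, data = sim_all)
m_int <- lm(rttar ~ lnfreq * language, data = sim_all)
print(anova(m_add, m_int))

# Per-language slopes: reference slope plus the interaction adjustment
b <- coef(m_int)
slopes <- c(
  Chinese   = b[["lnfreq"]],
  Hungarian = b[["lnfreq"]] + b[["lnfreq:languageHungarian"]],
  Spanish   = b[["lnfreq"]] + b[["lnfreq:languageSpanish"]]
)
print(slopes)

# Language whose slope is closest to zero
weakest <- names(which.min(abs(slopes)))
```

Each slope is the change in RT per one-unit change in log-frequency for that language, which is the quantity Question 4.3 asks you to describe.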
Remember:
Explore and describe the data.
Build appropriate models, evaluate them and their associated assumptions, and interpret the
results.
Let your models be informed by the research questions they are supposed to address. There is
seldom a need for mind-bogglingly complex and borderline uninterpretable 6-way interaction
models.
GOOD LUCK!
