Exploring BRFSS data with data visualization in R

Original link: http://tecdat.cn/?p=9266

 

Setup

Load packages

In this analysis, we use the dplyr package to explore the data and the ggplot2 package to visualize it.

library(ggplot2)
library(dplyr)

Load data

load("brfss2013.RData")
dim(brfss2013)
## [1] 491775    330

The output shows the dimensions of the data set: 491,775 observations (rows) and 330 variables (columns).


Part 1: Data

About BRFSS

The Behavioral Risk Factor Surveillance System (BRFSS) is an annual telephone survey in the United States that completes more than 400,000 adult interviews each year. It collects data on US residents' health-related risk behaviors, chronic health conditions, and use of preventive services. As the name suggests, BRFSS is designed to identify risk factors and report emerging health trends in the adult population.

Data collection methods

Data are collected through telephone interviews with residents of US states, the District of Columbia and participating US territories. In 2011, more than 500,000 such interviews were conducted. The survey uses both a landline sample and a cell phone sample: the landline sample is drawn by stratified sampling based on telephone density within each state, while the cell phone sample is drawn by random sampling.

To maintain consistency across states, BRFSS follows standard data collection protocols. These cover random sampling of eligible households, questionnaire design, conducting the phone interviews, procedures to protect the confidentiality of respondents, and quality assurance for the interview process. Each month's sample is interviewed within that same month.

Implications of the data collection method for the scope of inference

The BRFSS survey covers the 50 states and US territories and includes more than 500,000 telephone interviews with randomly selected households. Because the data are a random sample collected under strict procedures designed to yield a representative sample of the population, the findings can reasonably be generalized to the US adult population.

However, this is an observational study: respondents were not randomly assigned to treatment and control groups, so we cannot infer causal relationships between variables.


Part 2: Research questions

Research Question 1:

Is there an association between sleep and physical and mental health?

This question concerns the effect of sleep on human health. The exploration will focus on finding interesting correlations in the data. The variables being considered are:

  • physhlth: number of days physical health was not good
  • menthlth: number of days mental health was not good
  • sleptim1: hours of sleep
  • sex: sex of the respondent

Research Question 2:

Are income level and employment status associated with better health?

Income level and employment status have a huge impact on a person's sense of self-worth and psychological state. Financial insecurity can cause great mental distress, and we expect people in that situation to have poorer health.

Variables to consider are:

  • genhlth: general health
  • employ1: employment status
  • income2: income level

Research Question 3:

Is obesity (high BMI) associated with a higher risk of heart attack and of high cholesterol?

This question looks at how obesity relates to the risk of heart attack. Heart disease is one of the most common diseases and affects people of all backgrounds. We will look for relationships between high cholesterol, BMI and an elevated risk of heart attack.

The variables being considered are:

  • X_bmi5cat: body mass index category
  • toldhi2: ever told that blood cholesterol is high
  • cvdinfr4: ever diagnosed with a heart attack

Part 3: Exploratory Data Analysis

Research Question 1:

V1<-brfss2013%>%
  filter(!is.na(physhlth),!is.na(sleptim1),!is.na(menthlth),!is.na(sex))%>%
  select(physhlth,sleptim1,menthlth,sex)

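We create a new data frame, V1, containing the four variables of interest: three numeric variables and sex, which is categorical. After removing the rows that contain NA values, we plot the data colored by sex.

# color is mapped to sex, so the manual palette is set on the color scale
ggplot(data=V1,aes(x=sleptim1,y=physhlth,color=sex))+
  geom_point()+scale_color_manual(values =c("red","seagreen3"))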
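
Because sleptim1 and physhlth take whole-number values, many points in a scatter plot fall on top of each other. One possible complement, sketched here under assumptions, is to average the poor-health days for each number of sleep hours. The data frame toy_v1 below is made up for illustration and is not BRFSS data; the names sleep_summary and p_sleep are likewise hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for V1 (not real BRFSS values)
toy_v1 <- data.frame(
  sleptim1 = c(4, 4, 6, 6, 8, 8, 8, 10),
  physhlth = c(20, 10, 5, 7, 0, 2, 1, 15),
  menthlth = c(15, 12, 4, 6, 0, 1, 2, 10),
  sex      = factor(c("Male", "Female", "Male", "Female",
                      "Male", "Female", "Male", "Female"))
)

# Mean number of poor physical and mental health days
# for each combination of sleep hours and sex
sleep_summary <- toy_v1 %>%
  group_by(sleptim1, sex) %>%
  summarise(
    mean_phys = mean(physhlth),
    mean_ment = mean(menthlth),
    n = n(),
    .groups = "drop"
  )

# Line plot of the averages, one line per sex
p_sleep <- ggplot(sleep_summary,
                  aes(x = sleptim1, y = mean_phys, color = sex)) +
  geom_line() +
  geom_point() +
  labs(x = "Hours of sleep", y = "Mean poor physical health days")
```

With the real data, toy_v1 would be replaced by V1. The column n is worth keeping, since averages for rarely reported sleep durations rest on few respondents.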


Research Question 2:

 

We clean the target variables and store the result in a new data frame, V2.
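
The code for this step is not shown in the post. The following is only a sketch of how such a cleanup might look, following the same filter-and-select pattern used for V1. It assumes the variables genhlth, employ1 and income2 and runs on a small made-up data frame (toy_brfss) instead of brfss2013; the names V2_sketch and p_income and all values are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for brfss2013 (not real BRFSS values)
toy_brfss <- data.frame(
  genhlth = factor(c("Excellent", "Good", "Poor", NA, "Good", "Poor"),
                   levels = c("Excellent", "Good", "Poor")),
  employ1 = factor(c("Employed for wages", "Employed for wages",
                     "Unable to work", "Retired", NA, "Out of work")),
  income2 = factor(c("$75,000 or more", "Less than $25,000",
                     "Less than $25,000", "$75,000 or more",
                     "Less than $25,000", NA))
)

# Keep only the rows where all three target variables are present
V2_sketch <- toy_brfss %>%
  filter(!is.na(genhlth), !is.na(employ1), !is.na(income2)) %>%
  select(genhlth, employ1, income2)

# Share of each general health rating within each income level;
# position = "fill" rescales every bar to a height of 1
p_income <- ggplot(V2_sketch, aes(x = income2, fill = genhlth)) +
  geom_bar(position = "fill") +
  labs(x = "Income level", y = "Proportion", fill = "General health")
```

Swapping income2 for employ1 on the x axis would give the corresponding view by employment status.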

 

 

Research Question 3:

 

The new data frame V3 stores the three target variables.
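
The construction of V3 is not shown in the post. As a rough sketch, assuming the same NA-filtering pattern used for V1, it might look like the following. The data frame toy_brfss is made up for illustration, and the names V3_sketch and heart_counts are hypothetical.

```r
library(dplyr)

# Hypothetical toy data standing in for brfss2013 (not real BRFSS values)
toy_brfss <- data.frame(
  X_bmi5cat = factor(c("Normal weight", "Overweight", "Obese",
                       NA, "Obese", "Overweight")),
  toldhi2   = factor(c("No", "Yes", "Yes", "No", NA, "No")),
  cvdinfr4  = factor(c("No", "No", "Yes", "No", "Yes", "No"))
)

# Drop rows with a missing value in any of the three target variables
V3_sketch <- toy_brfss %>%
  filter(!is.na(X_bmi5cat), !is.na(toldhi2), !is.na(cvdinfr4)) %>%
  select(X_bmi5cat, toldhi2, cvdinfr4)

# Number of respondents by heart attack status
heart_counts <- count(V3_sketch, cvdinfr4)
```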

count(V3,cvdinfr4)
## # A tibble: 2 x 2
##   cvdinfr4      n
##     <fctr>  <int>
## 1      Yes  26935
## 2       No 370021
ggplot(data=V3,aes(x=cvdinfr4,fill=X_bmi5cat))+
  geom_bar()

Overweight and obese respondents appear to be the most likely to report a heart attack.
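
A stacked bar chart of raw counts also reflects how many respondents fall into each BMI category, so it can help to compare shares within each category. The following is a minimal sketch on made-up data (toy_v3 is hypothetical, not BRFSS data) that computes the share of respondents reporting a heart attack per BMI category.

```r
library(dplyr)

# Hypothetical toy data standing in for V3 (not real BRFSS values)
toy_v3 <- data.frame(
  X_bmi5cat = factor(c("Normal weight", "Normal weight",
                       "Normal weight", "Normal weight",
                       "Obese", "Obese", "Obese", "Obese")),
  cvdinfr4  = factor(c("No", "No", "No", "Yes",
                       "No", "No", "Yes", "Yes"),
                     levels = c("Yes", "No"))
)

# Share of respondents reporting a heart attack within each BMI category
heart_share_by_bmi <- toy_v3 %>%
  group_by(X_bmi5cat) %>%
  summarise(
    n = n(),
    heart_attack_share = mean(cvdinfr4 == "Yes"),
    .groups = "drop"
  )
```

Even with such shares, the result describes an association only, since the data come from an observational survey.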

 

High cholesterol appears to be most common among overweight and obese respondents.
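
The plot behind this observation is not included in the post. One way it could be drawn, shown here only as a sketch on made-up data, is a proportional bar chart of toldhi2 within each BMI category. The data frame toy_chol and the plot name p_chol are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data standing in for V3 (not real BRFSS values)
toy_chol <- data.frame(
  X_bmi5cat = factor(
    c("Normal weight", "Normal weight", "Normal weight",
      "Overweight", "Overweight", "Overweight",
      "Obese", "Obese", "Obese"),
    levels = c("Normal weight", "Overweight", "Obese")
  ),
  toldhi2 = factor(c("No", "No", "Yes",
                     "No", "Yes", "Yes",
                     "Yes", "Yes", "No"))
)

# Proportion of respondents told they have high cholesterol,
# shown separately for each BMI category
p_chol <- ggplot(toy_chol, aes(x = X_bmi5cat, fill = toldhi2)) +
  geom_bar(position = "fill") +
  labs(x = "BMI category", y = "Proportion",
       fill = "Told cholesterol is high")
```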

 

If you have any questions, please leave a comment below. 

 

 


Origin www.cnblogs.com/tecdat/p/11995494.html