SPSS Learning 4: Data Preprocessing (for Complete Beginners)


Preface

This chapter explains data preprocessing in SPSS. It covers:

  • sorting cases
  • identifying duplicate cases
  • computing variables
  • selecting cases
  • counting values within cases
  • classification and summary (aggregation)
  • data grouping
  • data transposition
  • weighting cases
  • splitting files


1. Data preprocessing

1.1 Sorting of data

  • Why sorting matters in data analysis:
    it makes possible outliers quick to find;
    some operations, such as merging files, require sorted data as a prerequisite.
  • Sorting rearranges all cases in ascending or descending order of the values of one or more user-specified variables.
    (1) Sort order: ascending or descending
    (2) Multi-key sorting: the order in which the variables are selected matters. Cases are sorted by the first variable, and later variables only break ties.
    Example 1
    (1) Data description: Use [Employee Data] (the data has been uploaded as a resource; download it if needed)
    (2) Requirement: Sort by [Professional Title] in descending order, then by [Basic Salary] in ascending order
    (3) Operation: Click [Data]-----[Sort Cases]
    Example 2
    (1) Data description: Use [College Student Career Planning] (the data has been uploaded as a resource; download it if needed)
    (2) Requirement: Sort by [Professional Classification] in ascending order first, then by [Q5 Intention after Graduation] in descending order
    (3) Operation: Click [Data]-----[Sort Cases]
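The two-key sort in Example 1 can be sketched in Python to show how the primary and secondary keys interact. This illustrates the sorting rule only, not how SPSS works internally. The records are hypothetical, as are the field names `zc` (title code) and `salary` (basic salary).

```python
# Hypothetical employee records: "zc" is a numeric title code and
# "salary" is the basic salary (both names are assumptions for illustration).
employees = [
    {"id": 1, "zc": 2, "salary": 650},
    {"id": 2, "zc": 3, "salary": 720},
    {"id": 3, "zc": 2, "salary": 600},
    {"id": 4, "zc": 3, "salary": 690},
]


def sort_cases(cases):
    """Sort by zc descending (primary key), then salary ascending (secondary key)."""
    # Negating zc makes that key descending while salary stays ascending.
    return sorted(cases, key=lambda c: (-c["zc"], c["salary"]))


for case in sort_cases(employees):
    print(case["id"], case["zc"], case["salary"])
```

Swapping the two keys in the tuple gives a different order. This is why the order in which variables are selected in the dialog matters.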

1.2 Find duplicate cases

  • In most analyses, no two cases should share the same value of a key variable (such as an ID number).
  • Duplicate cases usually result from carelessness or poorly designed coding during data entry.
  • With large datasets, duplicates must be found automatically rather than by eye.
    Example 1
    (1) Data description: Use [Employee Data] after the vertical merge (adding cases) described in SPSS Learning 3
    (2) Requirement: Find duplicate cases
    (3) Operation: Click [Data]-----[Identify Duplicate Cases]
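The idea behind identifying duplicates can be sketched as marking each case with an indicator: 1 for the first case with a given key value, 0 for later repeats. SPSS can create a similar indicator variable, but the function name, the "first case is primary" rule and the records below are assumptions made for illustration.

```python
def flag_primary(cases, key):
    """Return 1 for the first case seen with each key value and 0 for later repeats."""
    seen = set()
    flags = []
    for case in cases:
        value = case[key]
        flags.append(0 if value in seen else 1)
        seen.add(value)
    return flags


# Hypothetical merged records in which employee numbers 101 and 102 appear twice.
records = [{"id": 101}, {"id": 102}, {"id": 101}, {"id": 103}, {"id": 102}]
flags = flag_primary(records, "id")
print(flags)  # [1, 1, 0, 1, 0]

# The cases flagged 0 are the repeats to inspect.
duplicates = [r["id"] for r, f in zip(records, flags) if f == 0]
print(duplicates)  # [101, 102]
```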

1.3 Variable calculation

  • Using an arithmetic expression supplied by the user, SPSS processes all or some of the cases to create new variables or to transform existing ones. Typical uses include:
    forecasting;
    creating ratio variables;
    transforming skewed data toward normality;
    smoothing a time series.
    (1) SPSS arithmetic expression:
    an expression composed of arithmetic operators (+, -, *, /, **), SPSS functions and SPSS variable names
    (2) SPSS functions

  • These include arithmetic, statistical, distribution, logical, string, missing-value, and date and time functions, among others

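  • Arithmetic functions, for example ABS (absolute value), SQRT (square root), LN (natural logarithm), RND (round) and TRUNC (truncate)

  • Statistical functions, for example SUM, MEAN, SD, MIN and MAX, which are computed across several variables within each case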
    (3) SPSS conditional expression

  • An expression composed of SPSS relational operators, logical operators, SPSS functions and SPSS variable names.
    Relational operators: > (greater than), < (less than), = (equal to), ~= or <> (not equal to), >= (greater than or equal to), <= (less than or equal to). For example: nl>32, sr<=700
    (variables from the [Employee Data] example)
    Logical operators: & or AND (and), | or OR (or), ~ or NOT (not)
    For example: (nl>32) AND (sr<=700)
    For example: (nl=32) | (sr<>700)
    For example: NOT xb=1
    Example 1
    (1) Data description: Use [College Student Career Planning]
    (2) Requirement: Create a new variable (level of professional awareness) from Q61, Q62, Q63 and Q64
    (3) Operation: Click [Transform]-----[Compute Variable]
    To compute the new variable for only some cases (for example, only for male respondents), click [If] in the lower-left corner of the dialog and enter a condition.
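Here is a minimal Python sketch of this example. It assumes the new variable is the plain sum Q61 + Q62 + Q63 + Q64, as section 1.7 defines x1. Missing answers are represented as `None`. The `condition` argument plays the role of the [If] setting. The field `gender`, its coding (1 = male) and the respondent data are hypothetical.

```python
from typing import Callable, Optional


def item_sum(case: dict, items: list) -> Optional[float]:
    """Add the item scores; as with the + operator, one missing item makes the result missing."""
    values = [case.get(name) for name in items]
    if any(v is None for v in values):
        return None
    return sum(values)


def compute_variable(cases: list, items: list, new_name: str,
                     condition: Callable[[dict], bool] = lambda c: True) -> list:
    """Store the item sum in new_name for cases meeting condition; other cases get None (missing)."""
    for case in cases:
        case[new_name] = item_sum(case, items) if condition(case) else None
    return cases


# Hypothetical respondents; "gender" coded 1 = male is an assumption.
students = [
    {"gender": 1, "Q61": 3, "Q62": 4, "Q63": 2, "Q64": 5},
    {"gender": 2, "Q61": 1, "Q62": None, "Q63": 2, "Q64": 0},
    {"gender": 2, "Q61": 5, "Q62": 5, "Q63": 4, "Q64": 3},
]
items = ["Q61", "Q62", "Q63", "Q64"]

compute_variable(students, items, "x1")
print([s["x1"] for s in students])  # [14, None, 17]

# Counterpart of an [If] condition such as gender = 1.
compute_variable(students, items, "x1_male", condition=lambda c: c["gender"] == 1)
print([s["x1_male"] for s in students])  # [14, None, None]
```

In SPSS, writing Q61+Q62+Q63+Q64 gives a missing result when any item is missing. The SUM function instead adds whichever valid values are present. The sketch follows the + behavior.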

1.4 Case selection

  • Case selection (data selection) extracts part of the collected data (a sample) from the whole (the population) according to given rules, so that only those cases take part in the analysis
  • Selecting cases can make analysis more efficient and can be used to test models
  • Case selection methods include:
    (1) Conditional selection
    (2) Random selection
    (3) Selection of cases within a specified range
  • All subsequent operations use only the selected cases.
    Examples
    (1) Data description: Use [College Student Career Planning]
    (2) Requirement: Select cases by Q3 (whether the student has taken part in career-planning courses or guidance), removing students who have not taken part
    (3) Operation: Click [Data]-----[Select Cases]
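The three selection methods listed above can be sketched as three small Python functions. The function names and the coding of Q3 (1 = took part, 2 = did not) are hypothetical. The random version keeps each case with a fixed probability, so the sample size is only approximately the requested fraction. Each function returns a new list and leaves the original data untouched. This loosely resembles SPSS filtering out unselected cases instead of deleting them.

```python
import random


def select_if(cases, condition):
    """Conditional selection: keep cases for which condition(case) is true."""
    return [c for c in cases if condition(c)]


def select_random(cases, fraction, seed=None):
    """Random selection: keep each case independently with probability `fraction`."""
    rng = random.Random(seed)
    return [c for c in cases if rng.random() < fraction]


def select_range(cases, first, last):
    """Range selection: keep cases first..last (1-based, inclusive), like case numbers."""
    return cases[first - 1:last]


# Hypothetical data: Q3 coded 1 = took part in career-planning guidance, 2 = did not.
students = [{"id": i, "Q3": 1 if i % 3 else 2} for i in range(1, 11)]

participants = select_if(students, lambda c: c["Q3"] == 1)
print([c["id"] for c in participants])  # [1, 2, 4, 5, 7, 8, 10]
print(len(select_random(students, 0.5, seed=42)))  # about half of the 10 cases
print([c["id"] for c in select_range(students, 3, 5)])  # [3, 4, 5]
```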

1.5 Counting values within cases

  • For all or some cases, count how many of several variables have values within a specified range, and store the count in a new variable.
  • Specify the variables to be counted
  • The user names the new variable that stores the count.
  • Specifying the values to count is the key step. In SPSS, any of the following can serve as counting criteria:
    a single value;
    system-missing values;
    a range with given minimum and maximum values.
    Example
    (1) Data description: Use [College Student Career Planning]
    (2) Requirement: For Q61 through Q616, count the items answered 0 and store the count in a new variable
    (3) Operation: Click [Transform]-----[Count Values within Cases]
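Counting values within a case amounts to checking, for one case, how many of the listed variables fall in the counting interval. Below is a minimal Python sketch, assuming missing answers are stored as `None` and skipped. The function name and respondent data are hypothetical; the item names Q61 to Q616 follow the example.

```python
def count_values(case, variables, low, high=None):
    """Count how many of `variables` have a non-missing value in [low, high].

    With only `low` given, a single value is counted (low == high).
    """
    high = low if high is None else high
    return sum(
        1
        for name in variables
        if case.get(name) is not None and low <= case[name] <= high
    )


# Items Q61 ... Q616 (16 variables), named as in the example.
items = [f"Q6{i}" for i in range(1, 17)]

# Hypothetical respondent: 0 on Q61, Q65 and Q610, 2 on Q62, Q616 skipped, 3 elsewhere.
answers = {name: 3 for name in items}
answers.update({"Q61": 0, "Q62": 2, "Q65": 0, "Q610": 0, "Q616": None})

print(count_values(answers, items, 0))     # 3 (single value 0)
print(count_values(answers, items, 0, 2))  # 4 (range 0 through 2)
```

Counting system-missing values, which SPSS also supports, would need a separate check for `None`.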

1.6 Classification and summary

  • Group the cases by the values of a specified grouping variable
  • Calculate basic statistics of the summary variables separately for each group.
    Example: A typical task is comparing the average age and average salary of male and female employees. In the example below:
    (1) Data description: Use [College Student Career Planning]
    (2) Requirement: Summarize x1 (professional and career cognition score) by group
    (3) Operation: Click [Data]-----[Aggregate]
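Classification and summary can be sketched in Python for the comparison mentioned above (average age and average salary by gender). The function name is hypothetical. The meanings assigned to `xb` (gender code), `nl` (age) and `sr` (salary) are assumptions; only the variable names appear in the [Employee Data] examples. The sketch returns one row of statistics per group. SPSS can alternatively add the group statistics back onto every case.

```python
from collections import defaultdict
from statistics import mean


def aggregate_means(cases, break_var, summary_vars):
    """Group cases by break_var and return the mean of each summary variable per group."""
    groups = defaultdict(list)
    for case in cases:
        groups[case[break_var]].append(case)
    return {
        key: {var: mean(c[var] for c in members) for var in summary_vars}
        for key, members in groups.items()
    }


# Hypothetical employees: xb = gender code, nl = age, sr = salary (meanings assumed).
employees = [
    {"xb": 1, "nl": 30, "sr": 600},
    {"xb": 1, "nl": 40, "sr": 800},
    {"xb": 2, "nl": 28, "sr": 700},
    {"xb": 2, "nl": 36, "sr": 900},
]

summary = aggregate_means(employees, "xb", ["nl", "sr"])
print(summary)  # group 1: nl 35, sr 700; group 2: nl 32, sr 800
```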

1.7 Data grouping

  • Purpose: to better understand the distribution of a continuous variable
  • Method: group the values into intervals of equal width (grouping by class interval):
    (1) Specify which variable to group by
    (2) Define the grouping interval (no duplication or omission)
    (3) Specify the variable that stores the grouping results (the group indicator)
    Example
    (1) Data description: Use [College Student Career Planning]
    (2) Requirement: Group cases by the professional and career cognition score
  • x1 = "Professional and career cognition score" = Q61+Q62+Q63+Q64
    Q61, Q62, Q63 and Q64 each range from 0 to 5 points,
    so x1 ranges from 0 to 20.
    If any of the four items is missing, x1 is not computed and stays missing.
  • Group x1 with a class width of 5, so that every score falls into exactly one group:
    x1<5 → 1
    5≤x1<10 → 2
    10≤x1<15 → 3
    x1≥15 → 4
  • The grouped variable can then be used for frequency analysis, histograms, and so on.
    (3) Operation: Click [Transform]-----[Recode into Different Variables]
    (4) Frequency analysis: Click [Analyze]-----[Descriptive Statistics]-----[Frequencies]
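The recode and the frequency count can be sketched together in Python. This sketch assumes the boundary scores 5, 10 and 15 belong to the higher group, so every score falls in exactly one group and none is omitted. The function name and the scores are hypothetical. Percentages are computed over valid (non-missing) cases only, similar to SPSS's "Valid Percent" column.

```python
from collections import Counter


def recode_x1(x1):
    """Map x1 (0-20) to groups of width 5; a missing score stays missing."""
    if x1 is None:
        return None
    if x1 < 5:
        return 1
    if x1 < 10:
        return 2
    if x1 < 15:
        return 3
    return 4  # 15 through 20


# Hypothetical x1 scores, including one missing value.
scores = [3, 5, 9, 10, 12, 14, 15, 20, None, 7]
groups = [recode_x1(s) for s in scores]
print(groups)  # [1, 2, 2, 3, 3, 3, 4, 4, None, 2]

# Frequency table of the valid group codes.
freq = Counter(g for g in groups if g is not None)
total = sum(freq.values())
for code in sorted(freq):
    print(code, freq[code], f"{100 * freq[code] / total:.1f}%")
```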

1.8 Data transposition

Data transposition: swap rows and columns in the Data Editor window, so that cases become variables and variables become cases.
(1) Data description: Use [Employee Data]
(2) Requirement: Swap rows and columns
(3) Operation: Click [Data]-----[Transpose]
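Transposition can be sketched in Python with `zip`. Each original variable becomes one row, and its name is kept as the first entry so the information is not lost. SPSS likewise records the original variable names in a new variable. The function name and the small dataset are hypothetical.

```python
def transpose(var_names, rows):
    """Swap rows and columns; each original variable becomes one row, led by its name."""
    columns = zip(*rows)  # the i-th column collects variable i across all cases
    return [[name, *col] for name, col in zip(var_names, columns)]


# Hypothetical three-case dataset with variables id, nl and sr.
var_names = ["id", "nl", "sr"]
rows = [
    [1, 30, 600],
    [2, 40, 800],
    [3, 28, 700],
]

for new_row in transpose(var_names, rows):
    print(new_row)
# ['id', 1, 2, 3]
# ['nl', 30, 40, 28]
# ['sr', 600, 800, 700]
```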

1.9 Weighted processing

Weighting is extremely common in statistical analysis, for example when computing weighted averages.
As another example, suppose a website runs an online survey asking viewers how satisfied they were with the Spring Festival Gala. The ratings break down as follows:
  • 10% of viewers gave it 5 points
  • 25% gave it 4 points
  • 40% gave it 3 points
  • 25% gave it 2 points
How should these scores be used to evaluate the show? A natural choice is a weighted average, with each percentage serving as the weight of its score.
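Using the survey percentages as weights, the weighted average works out as follows. This is a worked version of the calculation just described. Because the weights sum to 1, the denominator is simply 1.

```latex
\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}
          = \frac{0.10 \times 5 + 0.25 \times 4 + 0.40 \times 3 + 0.25 \times 2}{0.10 + 0.25 + 0.40 + 0.25}
          = \frac{0.5 + 1.0 + 1.2 + 0.5}{1.00}
          = 3.2
```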

(1) Data description: Use [blood pressure and age]
(2) Requirement: Weight cases by the number of people

After opening the data, look at the lower-right corner of the window: the status bar shows that the data is already weighted. To set up the weighting ourselves, we first need to turn the existing weighting off.

(3) Operation: Click [Data]-----[Weight Cases]

  • Step 1: Turn weighting off (choose "Do not weight cases"). After you click OK, the "Weight On" indicator in the lower-right corner disappears.
  • Step 2: Weight by the number of people (choose "Weight cases by" and move the number-of-people variable into the frequency variable box). After you click OK, "Weight On" appears in the lower-right corner again.
    (4) What is the weighted data used for? One use is building a crosstab.
    Operation: Click [Analyze]-----[Descriptive Statistics]-----[Crosstabs]

If you turn off weighting by the number of people and run the same crosstab, the results are wrong: each row is counted as a single case instead of as the number of people it represents.
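The difference between a weighted and an unweighted crosstab can be sketched in Python. This assumes each row of the data is a summary row holding one combination of age group and blood-pressure level plus the number of people in it. That layout, the category labels and the counts are all hypothetical. With weighting, a row contributes its head count to its cell; without it, a row counts as one case.

```python
from collections import Counter

# Hypothetical summary rows: (age group, blood-pressure level, number of people).
rows = [
    ("under 40", "normal", 120),
    ("under 40", "high", 30),
    ("40 and over", "normal", 70),
    ("40 and over", "high", 80),
]


def crosstab(rows, weighted=True):
    """Count cases per (age group, blood pressure) cell, optionally weighted by head count."""
    table = Counter()
    for age_group, bp, n_people in rows:
        # Weighted: a row stands for n_people cases. Unweighted: a row is a single case.
        table[(age_group, bp)] += n_people if weighted else 1
    return table


weighted = crosstab(rows)
unweighted = crosstab(rows, weighted=False)
print(weighted[("40 and over", "high")])    # 80
print(unweighted[("40 and over", "high")])  # 1
print(sum(weighted.values()), sum(unweighted.values()))  # 300 4
```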

1.10 Data splitting

  • Splitting sorts the data by the split variable
  • It also groups the data by that variable
  • Subsequent statistical analyses are then run separately for each group, which makes group comparisons convenient
  • To analyze all the data as a whole again, you must turn file splitting off
    Example
    (1) Data Description: Use [Employee Data]
    (2) Requirement: Split the file by [zc] (Professional Title)
    (3) Operation: Click [Data]-----[Split File] (near the bottom of the [Data] menu). When it is done, the lower-right corner shows "Split by zc".
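Splitting can be sketched in Python as "sort, then run the same analysis once per group". `itertools.groupby` only merges adjacent equal keys, so the sort step is essential, which mirrors the first bullet in the list above. The function name is hypothetical, as is the per-group analysis (case count and mean salary). The meaning of `sr` as salary and the records are assumptions.

```python
from itertools import groupby
from statistics import mean


def split_and_analyze(cases, split_var, analysis):
    """Sort by split_var, then run `analysis` separately on each group (like Split File)."""
    ordered = sorted(cases, key=lambda c: c[split_var])  # splitting sorts the data first
    return {
        key: analysis(list(group))
        for key, group in groupby(ordered, key=lambda c: c[split_var])
    }


# Hypothetical employees; zc is the title code and sr is assumed to be salary.
employees = [
    {"zc": 3, "sr": 720},
    {"zc": 1, "sr": 500},
    {"zc": 3, "sr": 690},
    {"zc": 2, "sr": 610},
    {"zc": 1, "sr": 540},
]

results = split_and_analyze(employees, "zc", lambda g: (len(g), mean(c["sr"] for c in g)))
for zc, (n, avg_sr) in results.items():
    print(f"zc={zc}: n={n}, mean sr={avg_sr}")
```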


Origin blog.csdn.net/weixin_61472217/article/details/131026525