Creating and Using SAS Indexes (SAS INDEX)

I. Overview

When combining data sets you can use a DATA step, but the data must first be sorted by the key variable, and the key variable must have the same name in both data sets. You can also use PROC SQL, which needs no sorting or renaming step. With a small amount of data, either approach runs efficiently enough, but with around ten million rows and hundreds of variables, both slow down sharply. In that situation, using an index can make the code run much faster.
Indexes are either simple (built on one variable) or composite (built on two or more variables). An index can also be temporary or permanent: an index on a data set in the WORK library is deleted at the end of the session, while an index on a data set in a permanent library is kept. SAS stores the index in a separate file with the extension .sas7bndx, next to the data file.
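To check whether SAS actually uses an index, a minimal sketch like the one below can help. It assumes a hypothetical data set WORK.DEMO with a simple index on ID. MSGLEVEL=I asks SAS to write INFO notes about index usage to the log, and PROC CONTENTS lists the indexes defined on a data set. Whether the index is chosen for a particular WHERE clause is decided by SAS, so read the log rather than assuming it.

```sas
/* Write INFO notes about index selection to the log */
options msglevel = i;

/* Hypothetical data set WORK.DEMO with a simple index on ID */
data work.demo (index = (id));
   do id = 1 to 100000;
      value = id * 2;
      output;
   end;
run;

/* A WHERE condition on the indexed variable is a candidate for
   index optimization; the log should say whether index ID was selected */
data work.one_row;
   set work.demo;
   where id = 500;
run;

/* PROC CONTENTS lists the indexes defined on the data set */
proc contents data = work.demo;
run;
```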

II. Methods for creating an index

1. Creating an index in the DATA step

Use the INDEX= data set option in the DATA step:
Explicit (unique) index: INDEX = (ID / UNIQUE)
Implicit index: INDEX = (ID)
NOTE: With an explicit index, if the key values are not unique, SAS writes an error message to the log. (Using an explicit index is recommended.)
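Because an explicit (UNIQUE) index fails when key values repeat, it can help to check the key before choosing between the two forms. A minimal sketch, assuming a hypothetical data set WORK.DEMO in which ID 2 appears twice: compare the number of rows with the number of distinct IDs.

```sas
/* Hypothetical data set; ID 2 appears twice */
data work.demo;
   input id $ score;
   datalines;
1 80
2 85
2 70
3 60
;
run;

/* If the two counts match, every row has a distinct, non-missing ID and
   INDEX = (ID / UNIQUE) is safe; otherwise use INDEX = (ID) */
proc sql noprint;
   select count(*), count(distinct id)
      into :n_rows trimmed, :n_ids trimmed
      from work.demo;
quit;

%put NOTE: rows=&n_rows distinct IDs=&n_ids;
```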

A simple index:

DATA score (INDEX = (student_id));
SET test;
RUN;
Several simple indexes can be created at once:

DATA score (INDEX = (student_id class));
SET test;
RUN;
Composite index:

DATA score (INDEX = (index_name = (class id) / UNIQUE));
SET test;
RUN; * index_name is the name of the composite index;
2. Creating an index with PROC DATASETS (this adds an index to an existing SAS data set and runs quickly, because the data set is not rewritten and only the key values are read)

PROC DATASETS LIBRARY=libref;
MODIFY data_set_name;
INDEX CREATE var/UNIQUE NOMISS; *var is the key variable to index;
INDEX CREATE index_name=(var1 var2)/UNIQUE;
QUIT;

NOTE: In PROC DATASETS, remove an index with the INDEX DELETE statement.
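A concrete version of the template above, as a sketch under assumptions: the data set WORK.DEMO and the index name ID_SCORE are hypothetical. It adds a simple and a composite index to an existing data set, then removes the composite one with INDEX DELETE.

```sas
/* Hypothetical existing data set */
data work.demo;
   input id $ score;
   datalines;
1 80
2 85
3 60
;
run;

/* Add indexes to the existing data set without rewriting it */
proc datasets library = work nolist;
   modify demo;
   index create id / unique nomiss;      /* simple index: named after the variable */
   index create id_score = (id score);   /* composite index with its own name */
quit;

/* Remove an index that is no longer needed */
proc datasets library = work nolist;
   modify demo;
   index delete id_score;
quit;
```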
3. Creating an index with PROC SQL

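PROC SQL;
CREATE <UNIQUE> INDEX index_name ON table_name (column_name); * UNIQUE is optional;
QUIT;
NOTE: Remove an index with the DROP INDEX statement. For a simple index, index_name must be the same as column_name; for a composite index, list two or more columns separated by commas.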
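The same operations in PROC SQL, as a minimal sketch; the data set WORK.DEMO and the index name CLASS_ID are hypothetical. It creates a simple unique index and a composite index, then drops the composite one with DROP INDEX ... FROM.

```sas
/* Hypothetical data set */
data work.demo;
   input id $ class $ score;
   datalines;
1 A 80
2 A 85
3 B 60
;
run;

proc sql;
   /* Simple index: the index name must be the same as the column name */
   create unique index id on work.demo (id);
   /* Composite index: its own name plus two or more columns */
   create index class_id on work.demo (class, id);
   /* Remove the composite index again */
   drop index class_id from work.demo;
quit;
```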
III. Combining data sets using an index

Dataset 1 and Dataset 2 below are used in all of the following steps.
Dataset 1: SCORE

DATA score;
input ID $ SCORES;
DATALINES;
1 80
2 85
3 60
4 75
5 90
6 99
;
RUN;
Dataset 2: AGES

DATA AGES;
INPUT ID $ AGE;
DATALINES;
2 18
3 19
4 16
7 20
8 19
9 15
;
RUN;
Create an index on ID in each data set:

DATA SCORE(INDEX = (ID));SET SCORE;RUN;
DATA AGES (INDEX = (ID));SET AGES;RUN;
Intersection (IDs present in both SCORE and AGES):

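DATA S_AND_A;
SET SCORE;
SET AGES KEY = ID/UNIQUE; /* direct lookup of the current ID through the AGES index */
_ERROR_ = 0; /* clear the error flag that a failed lookup sets */
IF _IORC_ = 0; /* keep only matched IDs */
RUN;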
All IDs from SCORE (AGE set to 0 when there is no match in AGES):

DATA SCORE_ONLY;
SET SCORE;
SET AGES KEY = ID/UNIQUE;
_ERROR_ = 0;
IF _IORC_ NE 0 THEN AGE = 0; /* no match: AGE would otherwise keep the previous value */
RUN;
All IDs from AGES (SCORES set to 0 when there is no match in SCORE):

DATA AGES_ONLY;
SET AGES;
SET SCORE KEY = ID/UNIQUE;
_ERROR_ = 0;
IF _IORC_ NE 0 THEN SCORES = 0;
RUN;
Only the records whose ID is in SCORE but not in AGES:

DATA S_NOTIN_A;
SET SCORE;
SET AGES KEY = ID/UNIQUE;
_ERROR_ = 0;
IF _IORC_ NE 0; /* keep only IDs with no match in AGES */
AGE = 0;
RUN;
Only the records whose ID is in AGES but not in SCORE:

DATA A_NOTIN_S;
SET AGES;
SET SCORE KEY = ID/UNIQUE;
_ERROR_ = 0;
IF _IORC_ NE 0; /* keep only IDs with no match in SCORE */
SCORES = 0;
RUN;
Union (all IDs from either data set):

DATA SOA; SET SCORE(KEEP = ID) AGES(KEEP = ID);RUN;
PROC SORT DATA =SOA NODUPKEY; BY ID;RUN;

DATA SORA;
SET SOA;
SET SCORE KEY = ID/UNIQUE;
_ERROR_ = 0;
IF _IORC_ NE 0 THEN SCORES = 0;
SET AGES KEY = ID/UNIQUE;
_ERROR_ = 0;
IF _IORC_ NE 0 THEN AGE = 0;
RUN;
Notes:

1: _ERROR_ is reset to 0 so that a failed lookup does not create an error condition that writes the contents of the PDV to the SAS log.
2: _IORC_ is an automatic variable in the program data vector (PDV). It is used with an indexed data set to check whether a direct read found a matching observation: _IORC_ = 0 for a matched observation, and _IORC_ NE 0 otherwise.
3: An index cannot be created and used in the same DATA step.
4: When a DATA step with SET overwrites the original data set, the original index is lost and has to be created again.
5: Use a LENGTH statement to prevent character values from being truncated.
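Notes 3 to 5 can be illustrated together in one short sketch. The data set WORK.DEMO and its 12-character IDs are hypothetical. List input reads a $ variable with length 8 by default, so the LENGTH statement keeps the IDs whole; the plain DATA step that rewrites WORK.DEMO drops the index; and the index is rebuilt in a separate step with PROC DATASETS.

```sas
/* Note 5: without LENGTH, list input would truncate these IDs to 8 characters */
data work.demo (index = (id));
   length id $ 12;
   input id $ score;
   datalines;
STUDENT_0001 80
STUDENT_0002 85
;
run;

/* Note 4: rewriting the data set with a plain DATA step replaces it,
   and the index on ID is not carried over */
data work.demo;
   set work.demo;
   score = score + 5;
run;

/* Rebuild the index in its own step (note 3: an index cannot be
   created and used in the same DATA step) */
proc datasets library = work nolist;
   modify demo;
   index create id / unique;
quit;
```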
---------------------


Source: www.cnblogs.com/ly570/p/11161456.html