SAS data manipulation: the SET statement and its related options, plus widgets

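The SET statement:
What is the SET statement for?
Suppose you want to add a column to a data set (a constant column or a computed column), add a new variable, or create a subset.
Below are a DATA step and a PROC SQL way to create a computed column and add a constant column.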
data me(keep=name newVariable total);
set sashelp.class;
if sex='M';  /* sashelp.class codes sex as 'M' or 'F' */
newVariable=.;
total = height+weight;
run;
proc print noobs;run;
proc sql;
select name, . as newVariable, height+weight as total from sashelp.class
where sex='M';
quit;
Set statement
 Type: Executable
  Syntax
    SET <SAS-data-set(s) <(data-set-option(s))>> <options>;
  Without Arguments
    When you do not specify an argument, the SET statement reads an observation from the most recently created data set.
  Arguments
    SAS-data-set(s): specifies a one-level name, a two-level name, or one of the special SAS data set names.
    (data-set-options): specifies actions SAS is to take when it reads variables or observations into the program data vector for processing.
      DROP=  KEEP=  RENAME=  (execution sequence: drop>keep>rename)
      FIRSTOBS=(first obs to be read) 
      OBS=(last obs to be read)   IN=   WHERE=
  Options
    END=: creates and names a temporary variable that contains an end-of-file indicator. The variable, which is initialized to zero, is set to 1 when SET reads the last observation of the last data set listed. This variable is not added to any new data set.
    NOBS=: creates and names a temporary variable whose value is usually the total number of observations in the input data set or data sets. If more than one data set is listed in the SET statement, NOBS= is the total number of observations in the data sets that are listed. The number of observations includes those observations that are marked for deletion but are not yet deleted.
    POINT=: specifies a temporary variable whose numeric value determines which observation is read. POINT= causes the SET statement to use random (direct) access to read a SAS data set.
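A minimal sketch of the END= option described above: the step sums height over sashelp.class and writes a single observation once the last observation has been read. The names height_total, total_height and eof are illustrative assumptions, not from the original notes.

```sas
data height_total(keep=total_height);
  set sashelp.class end=eof;   /* eof is set to 1 on the last observation */
  total_height + height;       /* sum statement: value is retained across iterations */
  if eof then output;          /* write one row only at end of file */
run;
```

Because eof is a temporary variable, it does not appear in height_total even without the KEEP= option.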
  Details
    What Set Does?
      Each time the SET statement is executed, SAS reads one observation into the program data vector. SET reads all variables and all observations from the input data sets unless you tell SAS to do otherwise. A SET statement can contain multiple data sets; a DATA step can contain multiple SET statements.
Data sets 1 through n are stacked vertically in the order in which they are listed.
Internally, SAS runs the step as follows:
1: Compilation phase
SAS reads the descriptor information of each data set that is named in the SET statement and then creates a program data vector that contains all the variables from all data sets as well as variables created by the DATA step.
2:SAS reads the first observation from the first data set into the program data vector. It processes the first observation and executes other statements in the DATA step. It then writes the contents of the program data vector to the new data set.
The SET statement does not reset the values in the program data vector to missing, except for variables whose value is calculated or assigned during the DATA step. Variables that are created by the DATA step are set to missing at the beginning of each iteration of the DATA step. Variables that are read from a data set are not. (SAS does not clear the PDV values that came from the input data set; newly created variables are the exception.)
3: SAS continues to read one observation at a time from the first data set until it finds an end-of-file indicator. The values of the variables in the program data vector are then set to missing, and SAS begins reading observations from the second data set, and so on, until it reads all observations from all data sets.
For SET data1-datan with a BY statement:
1: In addition to the steps above, SAS creates the FIRST.variable and LAST.variable for each variable listed in the BY statement.
2: Variables are reset to missing differently: the values of the variables in the program data vector are set to missing each time SAS starts to read a new data set and when the BY group changes.
The observations are then written in BY-variable order (the data sets are interleaved); each input data set must already be sorted by the BY variables.
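A minimal sketch of FIRST.variable and LAST.variable in a SET step with a BY statement. It counts the observations in each BY group of sashelp.class; the names class_sorted, sex_count and n are illustrative assumptions, not from the original notes.

```sas
/* BY processing needs sorted input, so sort a copy first */
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

data sex_count(keep=sex n);
  set class_sorted;
  by sex;
  if first.sex then n = 0;   /* first observation of a BY group: reset the counter */
  n + 1;                     /* sum statement keeps n across iterations */
  if last.sex then output;   /* last observation of a BY group: write the count */
run;
```

FIRST.sex and LAST.sex are temporary, like the END= variable, so they are never written to sex_count.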
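For two data sets that are each already sorted, there are two ways to combine them so that the result is still sorted.
Method 1: set data1 data2; then run PROC SORT.
Method 2: set data1 data2; by variable; The book says this is more efficient than method 1, although I do not know exactly why. My guess is that it comes down to how many times the data is read: method 2 reads it once, method 1 reads it twice.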
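The second method can be sketched as follows, assuming two small illustrative data sets (data1 and data2, with made-up id and score values) that are each already sorted by id. SET with BY interleaves them in a single pass, so the output arrives in id order (1, 2, 3, 4, 5) without a PROC SORT.

```sas
data data1;
  input id score;
  cards;
1 10
3 30
5 50
;
run;

data data2;
  input id score;
  cards;
2 20
4 40
;
run;

/* both inputs are sorted by id, so SET with BY interleaves them */
data both;
  set data1 data2;
  by id;
run;

proc print data=both noobs; run;
```

If either input were not sorted by id, the BY statement would stop the step with an error rather than silently produce unsorted output.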
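The SET statement reads observations from one or more SAS data sets and concatenates them vertically. Each time a SET statement executes, SAS reads one observation into the PDV. A DATA step can contain several SET statements, and each SET statement can list several SAS data sets; each SET statement has its own data pointer.
SET reads all observations and all variables from the input data sets unless you tell it to do otherwise.
SET <SAS-data-set(s) <(data-set-option(s))>> <options>;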
(data-set-options) specifies actions SAS is to take when it reads variables or observations into the program data vector for processing.
Tip:Data set options that apply to a data set list apply to all of the data sets in the list. Data set options specify actions that apply only to the SAS data set with which they appear. They let you perform the following operations:
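The four main functions are listed below, with examples where available.
renaming variables  ex--> set sashelp.class(rename = (name = name_new));
selecting only the first or last n observations for processing  ex--> set sashelp.class(firstobs=3 obs=6); selecting by condition  ex--> set sashelp.class(where =(sex='M')); the values of WHERE= and RENAME= must be enclosed in parentheses
dropping variables from processing or from the output data set  ex--> set sashelp.class(drop =name sex); set sashelp.class(keep=name sex);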
specifying a password for a data set
Writing two output data sets
data d1(keep = name) d2(keep = name sex);
  set sashelp.class(keep = name sex);
run;
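Using the IN= option
IN= names a temporary variable that is not written to the output data set; to keep its value, copy it into a new variable with an assignment statement, as flag does below. The main use of IN= is to identify which data set an observation came from.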
data one;
input x y$;
cards;
1 a
2 b
;
run;
data two;
input x z$;
cards;
3 c
2 d
;
run;
data me;
set one(in=ina)two(in=inb);
if ina=1 then flag=1;else flag=0;
run;
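res: flag is 1 for the two observations read from one and 0 for the two observations read from two.
data me;
set sashelp.class(firstobs=3 obs=6);  /* read observations 3 through 6 */
run;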
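*get the total number of observations in the data set;
data me(keep = total);
set sashelp.Slkwxl nobs=total_obs;  /* improvement: if 0 then set sashelp.Slkwxl nobs=total_obs; because SAS compiles before it executes, the SET never has to execute; the compile-time information is enough */
total = total_obs;
output;
stop; *STOP is used here because reading just the first observation from SET is enough: total is output and then the step ends;
run;
The SET flow works like this: SAS reads the first observation, executes total=total_obs; and continues down the step. When it reaches STOP the step ends; otherwise, barring errors, control returns to the top of the DATA step and SET reads the second observation. So if you leave out the STOP statement, the output contains as many observations as the input, each with total equal to the observation count.
1: At compile time SAS first processes the NOBS= option. The count is stored in the data set header (descriptor), and nobs=total_obs passes the total number of observations to the temporary variable total_obs.
2: SAS reads the data set's descriptor and places all of its variables in the PDV.
... (omitted)
The POINT= option reads a specified observation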
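data me;
n=3;
set sashelp.class point=n;
output;
do n=3,6,10;
set sashelp.class point=n;
output; *read several specified observations;
end;
set sashelp.class nobs = last point=last;
output; *read the last observation;
stop;
run;
POINT= must name a variable; a numeric constant cannot be given directly. If STOP is omitted the program loops forever: with POINT= SAS never encounters an end-of-file condition, so it cannot tell that the pointer has reached the last observation. Without OUTPUT no observations are written to the data set, because STOP ends the step before the implicit output. POINT= and STOP are generally used together.
Using _N_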
data d1 d2;
set sashelp.class;
if _n_ le 10 then output d1;
else output d2;
run;
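Notes on reading lists of data sets with SET:
set goods1:; *reads every data set whose name begins with goods1, such as goods12 and goods13;
set sales1-sales4; set sales1 sales2 sales3 sales4; *these two statements are equivalent;
/*
set sales1-sales99; *valid;
set sales001-sales99; *invalid: if the first suffix starts with 0, the suffix of the last name must have at least as many digits as that of the first;
*/
set cost1-cost4 cost2: cost33-37; *valid;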
/* these two lines are the same */
set sales1 - sales4;
set 'sales1'n - 'sales4'n;
/* blanks in these statements will cause errors */
set sales 1 - sales 4;
set 'sales 1'n - 'sales 4'n;
/* trailing blanks in this statement will be ignored */
set 'sales1  'n - 'sales4  'n;
set data1 data2;
Execution order: SAS reads data1 first, up to its last observation, then reads data2, and concatenates the two vertically.
set data1;set data2;
data a;
input x $ @@;
cards;
a1 a2 a3
;
run;
data b;
input y $ @@;
cards;
b1 b2
;
run;
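/* after compilation there are two data pointers in memory, one for a and one for b, and a single PDV */
/* the first observation of a is read into the PDV, then the first observation of b, and the result is output; control returns to the top of the DATA step and the cycle repeats; on the third iteration a's third observation is read, but b's pointer is already at end of file, so the DATA step ends */
data ab;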
set a;set b;
run;
data ba;
set b;set a;
run;
data c;
set a b;
run;
Widgets:
  1. Get the count of observations.
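%macro Get_Obs_Cnt(dsName);
%global n_obs; /* make n_obs visible after the macro ends */
data _null_;
if 0 then set &dsName nobs=last; /* never executes; NOBS= is filled in at compile time */
call symputx('n_obs', last);
stop;
run;
%mend Get_Obs_Cnt;
%Get_Obs_Cnt(sashelp.class)
%put n_obs=&n_obs;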
  2. select random observations.
%macro Generate_Random_Obs(inData, outData, num);
data &outData;
do i = 1 to &num;
rand_num = ceil(totalObs*ranuni(totalObs)); /* draw a new row number on every pass; rows can repeat */
set &inData nobs=totalObs point=rand_num;
if _error_ then abort;
output;
end;
stop;
run;
%mend Generate_Random_Obs;

Published: 2024-09-22 07:15:24

Article link: https://www.17tex.com/xueshu/505565.html
