Comparison of Stata and SAS commands

Read 556 times | 2010-5-8 10:56 | Personal category: Stata | System category: Research notes | Keywords: SAS, STATA
SAS
Stata

In SAS, operators can be symbols or their mnemonic equivalents, such as:
& or and
In many situations in SAS the order of the symbols doesn't matter:
<= can be: =<
and
>= can be: =>


Most operators are the same in Stata as in SAS, but in Stata operators do not have mnemonic equivalents.  For example, you have to use the ampersand ( & ) and not the word "and".

This works:
var_a >= 1 & var_b <= 10
where this does not:
var_a >= 1 and var_b <= 10
These are the operators that are different in Stata:
Symbol Definition
& and
| or
>= greater than or equal to
<= less than or equal to
== equality (for equality testing)
!= does not equal
! not
^ power
Note:  Symbols have to be in the order shown: " >= " not " => " .


/* this is a comment */
* this is also a comment ;


/* this is a comment */
* this is also a comment
// this is a comment as well
To continue a command to the next line (line continuation):
/// you can comment here as well
For example:
list id state gender age income ///
     race income date


Range of values:
if 1 <= var_a <= 10
or:
if var_a in(1,2,3,4,5,6,7,8,9,10)
or a list of character values:
if state in("NC","AZ","TX","NY","MA","CA","NJ")


if var_a >= 1 & var_a <= 10
or:
if inrange(var_a,1,10)
or:
if inlist(var_a,1,2,3,4,5,6,7,8,9,10)
or a list of string values:
if inlist(state,"NC","AZ","TX","NY","MA","CA","NJ")
Stata has a limit of 10 arguments to inlist() (which includes the string variable) when the arguments are strings.  More than one variable can be specified.
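When a list of string values exceeds the inlist() limit, one common workaround is to split the list across two inlist() calls joined with the or operator ( | ).  The following is a minimal sketch on hypothetical toy data; the variable name state and its values are made up for illustration:

```stata
* hypothetical toy data: a 2-character string variable named state
clear
input str2 state
"NC"
"AZ"
"FL"
"WA"
"ZZ"
end

* 11 string values to match, which is more than one inlist() call accepts
* (the first call already uses 10 arguments: state plus 9 values)
list if inlist(state,"NC","AZ","TX","NY","MA","CA","NJ","FL","GA") ///
      | inlist(state,"WA","OR")
```

Each inlist() call stays within the 10-argument limit, and an observation is listed if either call matches.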


Referencing multiple variables at a time:
Say the following variables are in a data file in the order shown:
var1 var2 var3 age var4 var5
Then you could code them as:
var1--var5
To SAS, this means "all variables that are positionally between var1 and var5," which would include the variable age.


Referencing multiple variables at a time:
var1-var5
To Stata, this means "all variables that are positionally between var1 and var5."  Notice that there is only one dash ( - ).


Referencing multiple variables at a time:
In SAS, the single-dash form
var1-var5
is the same as:
var1 var2 var3 var4 var5
no matter where the variables are positioned in the observation.

Using a colon selects variables containing the same prefix:
var: could represent:
var1 var2 var10 variable varying var_1


Referencing multiple variables at a time:
var?
The question mark ( ? ) is a wild card that represents one character in the variable name.  It could be a number, a letter, or an underscore ( _ ).
var*
The asterisk/star ( * ) is a wild card that represents many characters in the variable name.  They could be numbers, letters, or underscores.  Thus:
var*
could represent:
var1 var2 var10 variable varying var_1


To save the contents of the Log window and/or Output window, go to that window and click on the menu bar's "File", "Save".  In SAS batch mode these files are automatically generated for you.


To save the contents of the results window, start logging to a log file BEFORE you submit commands that you want logged.  Open a log file by clicking on the icon in the tool bar that looks like a scroll and a traffic light.  A "*.log" file is a simple ASCII text file; a "*.smcl" file is formatted with html-like tags. 

You can also use the log command:
log using "D:\mydata\mydofile.log", replace
Note: The replace option simply tells Stata to overwrite the log file if it already exists.  This is helpful when you have to run a do-file over and over again.

libname in "D:\mydata\";
data new;
  set in.mySASfile;
run;
or, starting in SAS 8:
data new;
  set "D:\mydata\mysasfile.sas7bdat";
run;

use "D:\mydata\myStataFile.dta"
You can also click on the "open file" icon and select your dataset.


Save the dataset newer to "D:\mydata\":
libname in "D:\mydata\";
data in.newer;
  set new;
run;

save "D:\mydata\newer.dta"
To overwrite the dataset newer if it already exists:
save "D:\mydata\newer.dta" , replace
You can also click on the "save" icon to save your dataset.

proc contents; On selected variables: proc contents data = in.newer (keep= id state gender age income); run;

describe
On selected variables:
describe id state gender age income


proc means; On selected variables: proc means; var age income; run; or proc univariate; var age income; run;


summarize
On selected variables:
summarize age income
If you want variable labels and a proc univariate style output try:
summarize age income, detail
or:
codebook age income

proc freq; table var1; run;

tabulate var1
or, for just checking out your dataset, try the codebook command.


A series of 1-way tables: proc freq; tables var1 var2; run;


A series of 1-way tables: tab1 var1 var2


A 2-way table: proc freq; tables var1*var2; run;


A 2-way table: tab2 var1 var2

proc print;
On selected variables in this order:
proc print; var id age income; run;
On selected variables and a limited range of observations:
proc print data = new (firstobs = 1 obs = 20); var id age income; run;

list
On selected variables in this order:
list id age income
On selected variables and a limited range of observations:
list id age income in 1/20


Create a numeric variable with a default length of 8 bytes: var1 = 1234;

Create a numeric variable with the minimum allowable length (3 bytes): length var1 3; var1 = 1234;


generate var1 = 1234 Note:  the default numeric data type is "float."  The statement above is relying on that default. 
It could have been written explicitly as: generate float var1 = 1234 "float" stands for "floating point decimal."
You could more wisely save storage space by specifying: gen int var1 = 1234
"int" stands for "integer data type."


Create a character variable with a length of 3 bytes: name = "Bob";


Generate a string variable with a length of 3 bytes: gen str3 name = "Bob"


Increase the variable length to allow for 5 characters:
data new;
  length name $5;
  set new;
  *Change the values of numeric
  * and character variables:
  *;
  var1 = 123456;
  name = "Bobby";
run;

replace var1 = 123456
Stata automatically increases the storage type if necessary.  To change the storage of a variable manually, use the recast command.
replace name = "Bobby"
Stata automatically increases the length to 5.
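As a small illustration of the recast command, the sketch below builds a hypothetical three-observation dataset and then changes the storage types by hand.  The variable names simply mirror the examples above:

```stata
* hypothetical toy data
clear
set obs 3
gen int  var1 = 1234
gen str3 name = "Bob"

* manually promote the storage types
recast long  var1    // int -> long
recast str10 name    // str3 -> str10

* confirm the new storage types
describe var1 name
```

recast refuses to narrow a type when doing so would lose information, unless the force option is specified.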


Example of an if-then statement: if var1 = 123456 then var2 = 1;


The condition follows the command:
replace var2 = 1 if var1 == 123456
Notice that Stata requires two equals signs when testing equality.


Example of an if-then do loop: if age <= 10 then do; child = 1; parent = 0; end;


replace child = 1 if age <= 10
replace parent = 0 if age <= 10
Since each command is executed on all observations before the next command is executed, the if-then-do loop is not an option.  Stata does have excellent looping tools: foreach, forvalues, and while.
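For reference, here is a minimal sketch of two of the looping tools named above, forvalues and while, on hypothetical toy data.  The variable names var1 through var4 and the value 99 are only illustrative:

```stata
* hypothetical toy data: var1-var4 all equal to 99
clear
set obs 5
forvalues k = 1/4 {
    gen var`k' = 99
}

* forvalues: loop over consecutive numbers
forvalues k = 1/2 {
    replace var`k' = . if var`k' == 99
}

* while: loop until a condition fails
local k = 3
while `k' <= 4 {
    replace var`k' = . if var`k' == 99
    local k = `k' + 1
}
```

Both loops repeat whole commands; each command still runs over all observations before the next one starts.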


Example of an if-then-else: if 0 <= age <= 2 then agegp = 1; else if 2 < age <= 10 then agegp = 2; else if 10 < age <= 20 then agegp = 3; else if 20 < age <= 40 then agegp = 4; else agegp = . ;


For the same reason if-then-do loops (above) are not possible in Stata, the same goes for if-then-else.  But here is a way of doing the same thing.  In this example "missing(agegp)" is used to simply highlight the fact that it has not been assigned a value, just like the else does in if-then-else:
gen agegp = .
replace agegp = 1 if missing(agegp) ///
    & age >= 0 & age <= 2
replace agegp = 2 if missing(agegp) ///
    & age > 2 & age <= 10
replace agegp = 3 if missing(agegp) ///
    & age > 10 & age <= 20
replace agegp = 4 if missing(agegp) ///
    & age > 20 & age <= 40
The cond() function can also be used:
// nest cond() functions
gen agegp = cond(missing(age),., /// else
            cond(age >= 0 & age <= 2 ,1, /// else
            cond(age > 2 & age <= 10,2, /// else
            cond(age > 10 & age <= 20,3, /// else
            cond(age > 20 & age <= 40,4,.)))))
Check out this example of cond() in the Stata code examples page.

Better done with the recode command which can also create value labels:
recode age ( 0/2.9999 = 1 "0 to 2 year olds") ///
           ( 3/10.9999 = 2 "3 to 10 year olds") ///
           (11/20.9999 = 3 "11 to 20 year olds") ///
           (21/40.9999 = 4 "21 to 40 year olds") ///
           ( else = . ) , gen(agegp) test
The test option checks to see if the ranges overlap.
Since recode's ranges are >= and <= , adding .9999 to the upper range ensures that fractional values are handled correctly.


Drop variables var1, var2, and var3: data new(drop= var1 var2 var3); set new; run;


Drop variables var1, var2, and var3: drop var1 var2 var3


Keep variables var1, var2, and var3: data new(keep= var1 var2 var3); set new; run;


Keep variables var1, var2, and var3: keep var1 var2 var3


Keep observations / subsetting if statement: data new; set new; if var1 = 1 then output new; run;


Keep observations:
keep if var1 == 1


Delete observations: data new; set new; if var1 = 1 then delete; run;


Drop observations: drop if var1 == 1


Loop over a variable list (varlist):
data new(drop= i); set new; array raymond {4} var1 var2 var3 var4; do i = 1 to 4; if raymond{i} = 99 then raymond{i} = . ; end; run;
Check out this array example in the SAS programming examples page.

foreach i of varlist var1 var2 var3 var4 {
    replace `i' = . if `i' == 99
}
Note:  Notice that the quote to the left of the local macro variable i is a left quote ( ` ).  The left quote is located at the top of your keyboard next to the ( ! 1 ) key.  In this example i is a local macro variable that exists only for the duration of the foreach command so it does not need to be dropped like the variable i in the SAS code.


Create variable labels:

label age = "age in years"
      income = "salary plus bonuses" ;



label var age "age in years"
label var income "salary plus bonuses"


Define a format:
proc format;
  value yesno 1 = "yes"
              2 = "no" ;
run;
Assign the format to a variable:
data newer; set newer; format smokes yesno.; run;


Define a format.  These are called "value labels":
label define yesno 1 "yes" 2 "no"

Assign the value label to a variable: label value smokes yesno

Remove formats from a variable: data newer; set newer; ** just do not specify a format **; format smokes ; run;

label value smokes .


Assign formats defined by SAS to a variable: format interview_date mmddyy8.;


Assign formats defined by Stata to a variable:
format interview_date %tdNN/DD/YY
/* pre Stata 10 the format did not start
 * with the letter "t" and did not
 * need two letters for each part of the date: */
format interview_date %dN/D/Y
Note:  The letter N in %tdNN/DD/YY stands for "number of the month".  Specifying Mon in %tdDDMonCCYY uses the three letter abbreviation of the name of the month.  So %tdNN/DD/YY displays as "11/06/45" and %tdDDMonCCYY displays as "06Nov1945".

title "Number of Companies That Got Acquired";


Since the Results window/log file is a mix of both the log and the Output window, Stata doesn't need a title statement.  Titling can be accomplished with a comment:
/* Number of Companies That Got Acquired */

proc sort data = new out = newer; by id; run;

sort id


proc sort data= sashelp.shoes (keep= region product subsidiary stores sales inventory)
          out= work.shoes;
  by region subsidiary product;
run;
/* fix flaw in dataset
 * where the Copenhagen subsidiary
 * has 2 obs for product = "Sport Shoe" **/
proc summary nway data= work.shoes;
  /* the by statement fixes
   * the variable order in work.shoes **/
  by region subsidiary product;
  var stores sales inventory;
  output out= work.shoes (drop= _TYPE_ _FREQ_)
         sum=stores sales inventory;
run;
/* long to wide because:
 * there are repeats of by-variable values **/
proc transpose data= work.shoes out= shoes_wide prefix=prodnum;
  by region subsidiary;
  var product;
run;

keep region subsidiary product
bysort region subsidiary (product) : gen prodnum = _n
reshape wide product, ///
    i(region subsidiary) j(prodnum)

The xpose command is similar but only works with numeric data.  It will turn string variables into missing values.


/* wide to long because:
 * there are no repeats of by-variable values **/
proc transpose data= work.shoes_wide out= shoes_long name=prodnum;
  by region subsidiary;
  var prodnum: ;
run;


// "j(prodnum)" just names the _j variable prodnum
reshape long product, i(region subsidiary) j(prodnum)
Check out this reshape example in the Stata code examples page.

Using by-groups:
data newer;
  set newer;
  by id;
  if first.id = 1 then f_num = 1;
  if first.id = 1 and last.id = 1 then s_num = 1;
  if last.id = 1 then l_num = 1;
run;

by id: gen f_num = 1 if _n == 1
by id: gen s_num = 1 if _n == 1 & _N == 1
by id: gen l_num = 1 if _n == _N
Stata's _n is equivalent to SAS's _n_ in that it is equal to the observation number; but when inside a by command _n is equal to 1 for the first observation of the by-group, 2 for the second observation of the by-group, etc.

Stata's _N is equal to the number of observations in the dataset except in a by command when it is equal to the total number of observations in the by-group.


Count the total number of observations within each ID group, and add that total to each observation:
proc summary data= new nway;
  class id;
  var age;
  output out= temp(drop= _type_ _freq_) n= totboys;
run;
proc sort data= temp; by id; run;
proc sort data= new; by id; run;
data newer;
  merge new temp;
  by id;
run;


bysort id: egen totboys = count(age)
Note:  in both SAS and Stata, the count will be the number of observations where the variable being counted has a non-missing value.  Here we used the variable age.


Create a cumulative/running sum of boys within each ID group: data new; set newer; by id; retain count 0; if first.id then count = 0; if gender = 1 and age <= 18 then count = count + 1; run;
bysort id: gen count = sum(gender == 1 & age <= 18)

data both;
  merge in.new(in = a) in.newer(in = b);
  by id;
  if a = 1 and b = 1;
run;
Check out this merge example in the SAS programming examples page.

use "D:\mydata\new.dta"
sort id
/* Starting in Stata 11 you have to specify
 * what type of merge you are doing, but you no longer
 * have to sort your datasets before the merge.
 * This is a one-to-one merge: */
merge 1:1 id using "D:\mydata\newer.dta"
// or in previous versions of Stata:
merge id using "D:\mydata\newer.dta"
keep if _merge == 3
Stata automatically creates the variable _merge after a merge.  Stata will not merge on another dataset if the variable _merge already exists in one of the datasets.
The dataset in memory is the "master" dataset.  The dataset that is being merged on is the "using" dataset.  Unlike SAS, variables shared by the master dataset and the using dataset will not be updated (values overwritten) by the using dataset.  Like SAS, the formats, labels, and informats of variables shared by the master dataset and the using dataset will be defined by the master dataset.  Remember that the master always wins.  Use the update option to overwrite missing data in master file.
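To make the update option concrete, here is a minimal sketch using two hypothetical toy datasets built in the do-file itself (the ids and ages are made up).  With update, only missing values in the master are filled in from the using dataset; non-missing master values are kept:

```stata
* hypothetical "using" dataset, saved to a temporary file
clear
input id age
1 30
2 50
3 20
end
tempfile using_data
save "`using_data'"

* hypothetical "master" dataset with a missing age for id 1
clear
input id age
1 .
2 45
end

* fill in missing master values from the using dataset
merge 1:1 id using "`using_data'", update

* _merge codes: 1 master only, 2 using only, 3 matched,
*               4 matched and missing value updated,
*               5 matched but non-missing values conflict
list id age _merge
```

Adding the replace option (update replace) would also overwrite non-missing master values with non-missing using values.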


Concatenate two datasets / add observations to a dataset:
data both;
  set in.new in.newer;
run;


use "D:\mydata\new.dta"
append using "D:\mydata\newer.dta"
/* Starting in Stata 11 you can use append without
 * having a dataset already in memory: */
append using "D:\mydata\new.dta" "D:\mydata\newer.dta"


Sort datasets in order to prepare them for a merge:

Sort permanently stored datasets and create new, sorted copies in the WORK library:
proc sort data = in.company out = work.company; by id; run;
proc sort data = in.firm out = work.firm; by id; run;
data temp2;
  merge firm(in = a) company(in = b);
  by id;
run;


Sorting datasets in order to prepare them for a merge is only required if you are using a version of Stata prior to Stata 11:

Create a local macro variable to represent a filename for Stata to use in temporarily storing a data file on the computer's hard drive if requested to do so later:
tempfile company
use "D:\mydata\company.dta"
sort id
Save the dataset that's currently in memory to a temporary filename in Stata's temp directory.  This file will be deleted automatically when the do-file or program that created it finishes (or when Stata is exited, if it was created interactively), much like a dataset in SAS's WORK library:
save "`company'"
use "D:\mydata\firm.dta"
// pre Stata 11 code:
sort id
merge id using "`company'"
/* Starting in Stata 11 the data does not need to
 * be sorted but the type of merge needs to be
 * specified like in this one-to-one merge: */
merge 1:1 id using "`company'"

proc surveymeans; cluster sampunit; strata stratum; var age income; weight sampwt; run;

svyset sampunit [pweight = sampwt], strata(stratum)
svy: mean age income


Analyze a subpopulation by implementing the domain option: proc surveymeans; cluster sampunit; strata stratum; domain female; var age income; weight sampwt; run;


Analyze a subpopulation by implementing the subpop option:
svy: mean age income, subpop(females)
Note:  options come after a comma ( , ).


Starting in SAS 9:
proc surveyfreq; cluster sampunit; strata stratum; tables females*var1*var2; weight sampwt; run; When using proc surveyfreq the domain/subpop variable needs to be included in the tables statement.


svyset sampunit [pweight = sampwt], strata(stratum)
svy: tab var1 var2, subpop(females)
svy: tab var1 , subpop(females)

proc surveyreg;
  cluster sampunit;
  strata stratum;
  model depvar = indvar1 indvar2 indvar3;
  weight sampwt;
run;
In older SAS releases the surveyreg procedure does not have a way of dealing with subpopulations (later releases add a domain statement).  Using by or where will not suffice as they will compute incorrect standard errors.

svyset sampunit [pweight = sampwt], strata(stratum)
svy: regress depvar indvar1 indvar2 indvar3, ///
    subpop(females)


Starting in SAS 9:

proc surveylogistic;
  cluster sampunit;
  strata stratum;
  model depvar = indvar1 indvar2 indvar3;
  weight sampwt;
run;
In older SAS releases the surveylogistic procedure does not have a way of dealing with subpopulations (later releases add a domain statement).  Using by or where will not suffice as they will compute incorrect standard errors.


svyset sampunit [pweight = sampwt], strata(stratum)
svy: logit depvar indvar1 indvar2 indvar3, ///
    subpop(females)


Create a local macro variable ver: %let ver = 7; version = &ver.;
Technically, SAS macro variables begin with an ampersand ( & ) and end with a period ( . ).  It's good practice to end your macro variables with a period.


local ver = 7
gen version = `ver'
Notice that to evaluate the local macro variable ver a left quote ( ` ) is used and then a right quote ( ' ).  The left quote is located on your keyboard next to the ( ! 1 ) key.

Print a subset of observations when a condition is true, just to see examples (not all situations) where the condition exists in your data:
/** WHERE subsets the data
 * before OBS subsets the data */
proc print data= sashelp.shoes (where=(stores < 20) obs = 10);
run;
The above code lists the first 10 observations where (stores < 20).

list in 1/10 if stores < 20
// the order of if and in does not matter:
list if stores < 20 in 1/10
Both will first subset the data to the first 10 observations and then subset those 10 observations based on the condition "if stores < 20".  So, a hack way of doing the same in Stata is to use the sum() function.  The sum() function creates a running sum that adds up the true conditions, because true conditions evaluate to 1 (one) and false conditions evaluate to 0 (zero).  The condition has to be repeated outside of sum() so that only the observations meeting the condition are listed:
list if sum((stores < 20)) <= 10 & stores < 20

If the condition is long you could mess up typing it twice so put it in a local macro variable:
local cond stores < 20
list if sum((`cond')) <= 10 & `cond'
This is what the Stata command ifwins does.

Get a frequency count for each combination of a set of multiple categorical variables: ** example of a 3-way table **; proc freq data= sashelp.shoes; tables region * product * stores / list; run;

There is no built-in Stata command to do this, but the contract command can be used like so:
preserve
contract region product stores , ///
    freq(frequency) ///
    percent(percentage) ///
    cfreq(cumulative_freq) ///
    cpercent(cumulative_pct)
list
restore
Source: 李琦

Published: 2024-09-21 00:27:51

Article link: https://www.17tex.com/xueshu/505580.html
