
Commonly Used Stata Commands

The two most important commands are help and search. Even people who use Stata all the time cannot remember every detail of the commonly used commands, let alone the rarely used ones. So when you run into trouble and no expert is around to ask, the help files that ship with Stata are your best option. Stata's help files are detailed and exhaustive, which is both a blessing and a nuisance: faced with a long help file, it is easy to lose confidence that you will find the relevant information quickly.

Enough chatter. Both help and search look up help files; the difference is that help requires an exact command name, while search does a fuzzy keyword search. If you know the name of a command and want to learn its exact usage, simply type help, a space, and the command name in Stata's command window, and the full help file for that command will be displayed. If you want to run some estimation or calculation in Stata but do not know which command does it, you need search. It is used the same way as help, except that the exact command name is replaced by a keyword. The results window then lists all help files related to that keyword, with links. Find the most relevant entry in the list, click it, and the help file opens in a pop-up window. With a little patience and a few tries, you can usually find what you need quite quickly.
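For example (a minimal sketch; summarize and "panel data" are only illustrative arguments, not anything prescribed above):

    help summarize       // exact command name known: open its help file
    search panel data    // only a topic in mind: keyword search, results listed with links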

Now to the actual data work. In my experience, the best practice is to write down everything you do in Stata's do-file editor. Almost no empirical study is finished in one pass, so being able to repeat your earlier work the next time you sit down is essential. Sometimes, because of some small difference, you will find that the original results cannot be reproduced; a do file that records your previous work will then take you from hell to heaven, because you do not have to redo everything step by step. There is a small button standing by itself in the Stata toolbar at the top of the window; hover the mouse over it and "Bring do-file editor to front" appears. Click it and the do-file editor opens.
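If you prefer the keyboard, the built-in doedit command opens the same editor, and do runs a saved do file (the file name below is only a placeholder):

    doedit myanalysis.do    // open the do-file editor with this file
    do myanalysis.do        // run the saved do file from the command window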

For a do file to run properly, it generally needs a standard "head" and "tail". Here are the head and tail I use:

/* (Label: jot down here what this do file is supposed to do.) */
capture clear (clear any data currently in memory)
capture log close (close any log file that may still be open)
set mem 128m (set the amount of memory Stata may use; only needed in older versions of Stata)
set more off (turn off the -more- option. With it on, results are shown one screenful at a time and you press the space bar to see the next screen until everything has been printed; with it off, output scrolls continuously to the end.)
set matsize 4000 (set the maximum matrix size; the value I use may well be larger than necessary)
cd D: (change to the drive and folder where the data are stored; this works much like the DOS command line)
log using (file name).log, replace (open the log file, overwriting the old one. The log file records every result produced after this line; if you modify the do file later, the replace option updates the log to the most recent run.)
use (file name), clear (open the data file)
(body of the do file)
log close (close the log file)
exit, clear (exit and clear the data in memory)

Empirical work usually starts from raw data. These data have not been cleaned, and they contain errors and inconsistencies. For example, missing observations of a variable are sometimes recorded as -9 and sometimes as -99. If you use such observations in a regression, you will often get badly wrong results. Also, the same variable may appear under different names in different data files, which causes trouble when merging the data. Therefore, after getting the raw data you usually need to build a new, cleaned data set and work only with that new data set from then on. This part of the work is not difficult, but it is fundamental: if you are careless here, the work that follows is often done in vain.

If you already know which variables you need, the task now is to check the data, generate the required variables, and save them as a data set for later use. The important commands for checking data are codebook, su (summarize), ta (tabulate), des (describe) and list. Among them, codebook gives the most comprehensive information; its drawback is that it does not accept if restrictions, so it sometimes needs help from the other commands. su followed by a variable name reports the number of non-missing observations of that variable, together with its mean, standard deviation, minimum and maximum. ta followed by one (or two) variable names reports the one-way (or two-way) frequencies, percentages and cumulative percentages of the variable's values, excluding missing values. des can be followed by any variable names, as long as they are in the data set; it reports each variable's storage type, display format and label. The definition and unit of a variable are usually recorded in its label. list reports the values of the variables and can be restricted with if or in. All of these commands can also be issued without any variable names, in which case they report the corresponding information for every variable in the data set. Rather than take my word for it, open Stata and see for yourself.
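A minimal sketch of these checks (wage, edu and female are purely illustrative variable names):

    codebook                // full report on every variable
    su wage                 // number of non-missing obs, mean, sd, min, max
    ta edu                  // one-way frequency table
    ta edu female           // two-way frequency table
    des wage edu            // storage type, display format and label
    list wage edu in 1/10   // values of the first ten observations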

A small digression. Apart from codebook, the data-checking commands above are all r-class commands (also called general commands). After running one of them, you can type return list to see the statistics it has stored in r(). The most typical r-class command is summarize: it stores the sample mean, standard deviation, variance, minimum, maximum, sum and other statistics. After running su, just type return list and you get all of this information. Analogously to the return command for general commands, estimation commands (also called e-class commands) have an ereturn command, which likewise reports the stored results. In more complex programming, such as regression decompositions, being able to pull these stored statistics directly into your calculations is essential.
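For instance (wage and edu are again illustrative):

    su wage
    return list                     // shows r(N), r(mean), r(sd), r(min), r(max), r(sum), ...
    gen wage_dm = wage - r(mean)    // use a stored result directly, e.g. to demean

    reg wage edu
    ereturn list                    // stored results of the estimation command, e.g. e(N), e(r2)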

When checking the data, first use codebook to look at the range of each variable and its units. If you see values such as -9 or -99, check how missing values were recorded in the questionnaire. Make sure they really are missing values, and then recode them as Stata's missing value, the dot. The command is replace (variable name) = . if (variable name) == -9. Also look at how many observations are recorded as missing; this is one basis for deciding which variables to use.
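A minimal sketch, assuming a variable named income in which -9 and -99 stand for missing values (mvdecode is a built-in command that recodes several such codes at once):

    replace income = . if income == -9
    replace income = . if income == -99
    * or, equivalently, in a single step:
    mvdecode income, mv(-9 -99)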

Starting from the usable data, I add labels to variables that have none and rename variables according to a unified naming rule. The command for renaming a variable is ren (old variable name) (new variable name). The command for defining a label is label var (variable name) "(label text)". Uniform variable names are easier to remember, and concise labels make information such as the variable's units clear.
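For example (the variable names and the label text are illustrative):

    ren wage_hour hrwage
    label var hrwage "hourly wage, in yuan per hour"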

If you need new variables derived from the original ones, you should know about three commands: gen, egen and replace. gen and replace are often used together. Their basic syntax is gen (or replace) (variable name) = (expression). The difference between the two is that gen creates a new variable, while replace redefines an existing one.
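A small sketch (age and edu are illustrative variables):

    gen exper = age - edu - 6         // gen creates a new variable (potential experience)
    replace exper = 0 if exper < 0    // replace redefines the existing one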

Dummy variables are derived variables we often need. If you only need a few of them, there are two ways to generate them. The concise way is gen (variable name) = ((condition)), where the outer parentheses belong to the command and the inner parentheses are not required; they only mark off the condition. If an observation satisfies the condition, its dummy equals 1; otherwise it equals 0. The slightly more laborious way is:

gen (variable name) = 1 if (condition)
replace (the same variable name) = 0 if (the opposite condition)

The two methods look alike, but there is a small difference. If the variables used in the condition contain no missing values, the two methods give the same result. If there are missing values, the first method assigns those observations a dummy of 0, whereas the second method lets the dummy take three kinds of values: 1, 0, or missing. The second method thus keeps information that is actually unknown from quietly entering the regression as if it were known.
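A sketch of the two approaches, assuming an illustrative variable edu (years of schooling); note that Stata treats missing values as larger than any number, so be careful with conditions such as >= :

    * method 1: observations with missing edu get 0 here
    gen hs = (edu == 12)
    * method 2: observations with missing edu stay missing
    gen hs2 = 1 if edu == 12
    replace hs2 = 0 if edu != 12 & edu < .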

Sometimes, though, you need to generate hundreds of dummy variables at once. A large number of dummy variables is usually generated from the values of an existing variable. For example, you may want to control for each observation's community in a regression, with a dummy variable marking each community. There may be hundreds of communities, and repeating the gen command hundreds of times would be far too clumsy. The command for generating many dummy variables at once is:

ta (variable name), gen(prefix)

The variable name before the comma is the existing variable, in the example above the community code. The name inside gen() is the common prefix of the newly generated dummy variables, to which a number is appended to distinguish them. If I type d here, the command generates d1, d2, and so on, until every community has its own dummy variable.

To control for communities in the regression, simply add these variables. One problem is that there are too many of them to type one by one. One solution is the wildcard: d* stands for all variables whose names begin with the letter d. Another is the dash: d1-d150 stands for the dummies of the first through the 150th community (assuming there are 150 communities in total).
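A sketch, assuming the community code is stored in a variable named commid and y, x are the regression variables:

    ta commid, gen(d)      // creates d1, d2, ..., one per community
    reg y x d*             // wildcard: every variable whose name starts with d
    reg y x d1-d150        // dash: the first through the 150th dummy (if there are 150)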

There is also a way to control for the dummies in the regression without actually generating them, using the areg command. Its syntax is:

areg (dependent variable) (explanatory variables), absorb(variable name)

The variable inside absorb() is the same variable that appeared before the comma in the ta command above, in our example the community code. The regression results are the same as those from reg with the corresponding dummy variables entered directly.
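The same illustrative example written with areg:

    areg y x, absorb(commid)    // same coefficient on x as reg y x d1-d150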

The last command for generating variables is egen. Like gen, egen creates new variables, but its distinguishing feature is a much more powerful set of functions: it supports additional functions beyond what gen offers, so when gen will not do the job, you have to turn to egen. I am lazy, though, and so far I have only used it for simple things such as taking group means.
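For example, a group mean with egen (commid and hrwage are again illustrative):

    egen avg_wage = mean(hrwage), by(commid)    // community-level mean of hrwage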

Sometimes the data are messier and the variables you need cannot be generated directly; some processing is required first. I have run into oddly recorded dates in raw data. For example, October 23, 1991 was recorded as 19911023. I wanted to extract the year and the month and turn them into dummy variables. Here is my way:

gen yr = int(date/10000)
gen mo = int((date - yr*10000)/100)
ta yr, gen(yd)
ta mo, gen(md)
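An alternative sketch using Stata's date functions, assuming the raw value is a numeric yyyymmdd stored in a variable called date:

    gen datestr = string(date, "%8.0f")    // 19911023 -> "19911023"
    gen ddate = date(datestr, "YMD")       // convert to a Stata daily date
    format ddate %td
    gen yr2 = year(ddate)                  // 1991
    gen mo2 = month(ddate)                 // 10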

Once you have created all the variables you need, the most important thing is to save your work. The command is save (file name), replace. As mentioned earlier, the replace option overwrites the old file with your changes, so it must be used with care. It is best to save into a new file: if you overwrite the original data and cannot get them back, no amount of crying to heaven and earth will help.
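For instance (the file name is a placeholder):

    save cleaned_data, replace    // writes cleaned_data.dta, overwriting any earlier version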

