Commonly Used Stata Commands
The two most important commands are help and search. Even people who use Stata often cannot remember every detail of the common commands, let alone the uncommon ones. So when you run into trouble and have no expert to consult for free, the help files that ship with Stata are your best option. Stata's help files are detailed and exhaustive, which is both a blessing and a curse: faced with a long help file, do you still feel confident you can find the relevant information quickly?
Enough digression. Both help and search look up help files; the difference is that help looks up an exact command name, while search does a fuzzy keyword search. If you know the name of a command and want to know how to use it, simply type help, a space, and the command name in the command window. Press Enter and the screen shows the full help file for that command. If you want to perform some estimation or calculation in Stata but do not know which command does it, use search instead. It works just like help, except that you replace the exact command name with a keyword. Press Enter and the Results window lists the names of, and links to, all the help files related to that keyword. Find the most relevant entries in the list, click them, and the corresponding help files open in a pop-up window for you to read. With a little patience and repeated tries, you can usually find what you need quickly.
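For example (help and search are the real commands; the keyword is just an illustration):

```stata
help summarize       // exact lookup: opens the help file for the summarize command
search panel data    // fuzzy lookup: lists help files related to "panel data"
```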
Now for the actual data processing. In my experience, the best practice is to write down everything you do in Stata's do-file editor. Empirical work is rarely finished in a single pass, so being able to repeat your earlier steps the next time you sit down to work is very important. Sometimes, because of some small difference, you will find that the original results cannot be reproduced; a do-file recording your previous work will then take you from hell to heaven, because you will not have to redo everything from scratch. There is a small button on the Stata toolbar at the top of the window; hover the mouse over it and "Bring do-file editor to front" appears; click it and the do-file editor opens.
To make a do-file run properly, you generally need to fit it with a standard "head" and "tail". Here are the head and tail I use:
/* (A comment. Jot down what this file is for.) */
capture clear (clear any data in memory)
capture log close (close any open log file)
set mem 128m (set the amount of memory Stata may use; recent versions of Stata manage memory automatically and no longer need this)
set more off (turn off the more option. With it on, output pauses after every screenful and you press the space bar for the next screen until everything has been shown; with it off, all output scrolls by in one go)
set matsize 4000 (set the maximum matrix dimension; the value I use may be bigger than you need)
cd D:\ (change to the drive and folder where the data live; works much like the DOS cd command)
log using (filename).log, replace (open a log file, overwriting any old one. The log file records all the results produced by the commands that follow; if you modify the file, the replace option updates the log to the latest run)
use (filename), clear (open the data file)
(body of the file)
log close (close the log file)
exit, clear (exit and clear the data in memory)
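Putting the head and tail together, a minimal do-file might look like this (the file names mydata.dta, analysis.log, and the folder D:\project are hypothetical):

```stata
/* analysis.do: note the task of this file here */
capture clear
capture log close
set more off
cd D:\project
log using analysis.log, replace
use mydata, clear
* ... data work and regressions go here ...
log close
exit, clear
```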
Empirical work constantly involves raw data. Raw data have not been cleaned, and they contain errors and inconsistencies. For example, missing observations of a variable are sometimes recorded as blank, sometimes as -9, sometimes as -99. If you use such observations in a regression, you will often get badly wrong results. Also, the same variable sometimes goes by different names in different data files, which causes trouble when merging data. Therefore, once you get the raw data, you usually need to generate a new, cleaned database and work only with that new database from then on. This part of the work is not difficult, but it is fundamental: if you are careless here, everything you do afterwards tends to be wasted effort.
Assuming you already know which variables you need, the task now is to check the data, generate the necessary variables, and build a database for future use. The important commands for checking data include codebook, su, ta, des, and list. Among them, codebook gives the most comprehensive information; its drawback is that its output cannot easily be limited, so it sometimes needs help from the other commands. su followed by a variable name reports the number of non-missing observations of that variable, along with its mean, standard deviation, minimum, and maximum. ta followed by one variable name (or two) reports the frequencies, percentages, and cumulative percentages of that variable's values (or the two-way tabulation of the two variables), excluding missing values. des can be followed by any variable names, as long as they are in the data; it reports each variable's storage type, display format, and label. The definition and units of a variable are generally recorded in its label. list reports the values of variables, and its range can be limited with if or in. All of these commands can also be issued without any variable names, in which case they report the corresponding information for every variable in the data. Words alone are feeble here; open Stata and see for yourself.
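A quick tour of these commands (auto.dta is an example dataset shipped with Stata, and the variable names below come from it):

```stata
sysuse auto, clear       // load the example dataset shipped with Stata
codebook price           // comprehensive information on one variable
su price mpg             // obs, mean, sd, min, max
ta foreign               // frequencies, percentages, cumulative percentages
ta foreign rep78         // two-way tabulation of two variables
des price mpg foreign    // storage type, display format, label
list make price in 1/5   // values for the first five observations
```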
A small digression. Apart from codebook, the statistical commands above are r-class commands (also called general commands). After one of them runs, you can type return list to see the statistical results it has stored in r() (its "report"). The most typical r-class command is summarize: it stores the sample mean, standard deviation, variance, minimum, maximum, sum, and other statistics. After running su, simply type return list and you get all of that information. Parallel to the return command for general commands, estimation commands (also called e-class commands) have an ereturn command that serves the same purpose of reporting stored information. In more complex programming, such as writing programs that decompose regressions or compute statistics directly, these stored results are essential.
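For example, retrieving stored results after summarize and after a regression (auto.dta and its variables again serve as the illustration):

```stata
sysuse auto, clear
su price
return list        // shows r(N), r(mean), r(sd), r(min), r(max), ...
display r(mean)    // use a stored result directly

reg price mpg
ereturn list       // shows e(N), e(r2), e(b), e(V), ...
display e(r2)
```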
When checking the data, first use codebook to look at each variable's range of values and its units. If you see values such as -9 or -99, check the questionnaire for how missing values were recorded. Once you are sure they are missing values, replace them with Stata's missing-value code, the dot. The command is replace (variable name) = . if (variable name) == -9. Then look at how many observations are recorded as missing, and use that as one basis for choosing your variables.
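A sketch of this recoding (income is a hypothetical variable name; adjust the sentinel codes to match your questionnaire):

```stata
codebook income                                      // inspect the range of values
replace income = . if income == -9 | income == -99   // recode sentinel codes to missing
count if missing(income)                             // how many observations are missing
```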
Starting from the usable data, I add labels to any variables that lack them, and I rename variables according to a uniform naming rule. The command to change a variable's name is ren (old variable name) (new variable name). The command to define a label is label var (variable name) "(label text)". Uniform variable names help your memory, and concise labels help keep each variable's definition and units clear.
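For instance (the variable names here are hypothetical):

```stata
ren v23 income                                      // give a raw variable a memorable name
label var income "Annual household income (yuan)"   // record definition and units
```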
If you need new variables derived from the original ones, you need to know the three commands gen, egen, and replace. gen and replace are often used together. Their basic syntax is gen (or replace), a space, then (variable name) = (expression). The difference between the two is that gen creates a new variable, while replace redefines an existing one.
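For example (age and agesq are hypothetical variable names):

```stata
gen agesq = age^2               // create a new variable from an existing one
replace agesq = . if age < 0    // redefine some values of an existing variable
```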
Dummy variables are derived variables we often need. There are two ways to generate one. The concise way:
gen (variable name) = ((condition))
The outer parentheses here are just my notation, not part of the command; the inner pair is required and encloses the condition. If an observation satisfies the condition, its dummy equals 1; otherwise it equals 0. The slightly more troublesome way:
gen (variable name) = 1 if (condition for equalling one)
replace (same variable name) = 0 if (condition for equalling zero)
The two methods look alike, but they differ in one small respect: they give different results when a variable used in the conditions has missing values. Under the first method, every observation gets a 0 or a 1, including those with missing values (beware that Stata treats a missing value as larger than any number, so a condition such as x > 5 is true when x is missing). Under the second method, the dummy can take three kinds of values: 1, 0, or missing. That keeps observations whose information is genuinely unknown from slipping into the regression. Next, how to generate hundreds of dummy variables with ease.
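The difference is easy to see in a small sketch (x, d1, and d2 are hypothetical names):

```stata
* Method 1: condition in parentheses; never produces missing
gen d1 = (x > 5)        // careful: a missing x counts as > 5, so d1 = 1 there

* Method 2: keeps missing as missing
gen d2 = 1 if x > 5 & !missing(x)
replace d2 = 0 if x <= 5    // missing x fails x <= 5 too, so d2 stays missing
```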
Large numbers of dummy variables are usually generated from the values of some known variable. For example, suppose you want to control for each observation's community in a regression, with a dummy variable marking each community. There may be hundreds of community codes; repeating the generation command hundreds of times would be too stupid. The command for generating dummies in bulk is:
ta (variable name), gen ((prefix))
The first variable name is the known variable, in this example the community code. The name in parentheses after gen is the common prefix of the newly generated dummies; a number is appended to the prefix to distinguish them. If I enter d here, the command generates d1, d2, and so on, until every community has its own dummy variable.
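A one-line sketch (commcode is a hypothetical community-code variable):

```stata
ta commcode, gen(d)    // creates d1, d2, ..., one dummy per community code
```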
To control for community in the regression, simply add these dummies. One problem: there are too many to type one by one. What to do? One solution is the wildcard: d* stands for all variables whose names begin with the letter d. Another is the dash: d1-d150 stands for the dummies of the first through the 150th community (assuming 150 communities in total).
There is also a way to control for the dummies in a regression without actually generating them, using the areg command. Its syntax is
areg (dependent variable) (explanatory variables), absorb (variable name)
The variable name following absorb is the same known variable discussed above, in this example the community code. The regression results are the same as those from reg with the corresponding dummies entered directly.
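For example, these two regressions give the same coefficient on the explanatory variable (wage, educ, and commcode are hypothetical names):

```stata
ta commcode, gen(d)
reg wage educ d*                    // explicit community dummies (one is dropped as the base)
areg wage educ, absorb(commcode)    // same slope on educ, dummies absorbed
```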
The last variable-generating command is egen. Like gen, egen generates new variables, but egen is characterized by its more powerful functions: it supports many functions that gen does not, such as statistics computed within groups. When gen cannot do something, you may have to turn to egen. I am lazy, though, and so far have only used it for group means and a few other simple functions.
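A typical use, computing a group mean (income, avginc, and commcode are hypothetical names):

```stata
egen avginc = mean(income), by(commcode)   // mean income within each community
```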
Sometimes the data are messier, and the variables you need cannot be generated directly; some processing is required. I have met strange date formats in raw data. For example, October 23, 1991 was recorded as 19911023. I wanted to extract the year and the month from it and generate dummy variables. Here is my way:
gen yr = int(date/10000)
gen mo = int((date - yr*10000)/100)
ta yr, gen(yd)
ta mo, gen(md)
Once you have created all the necessary variables, the most important thing is to save your work. The command is save, a space, (file name), replace. As mentioned, the replace option overwrites the old file with your changes, so it must be used carefully. It is best to save to a new database; if you change the original library and cannot turn back, your cries to heaven will go unanswered.
Published: 2024-09-21 21:57:28
Link: https://www.17tex.com/fanyi/41909.html