In R, which function gives the sample standard errors of the parameter estimates in a probit regression?

Time: 2016-12-10 07:20

Computing standard deviations in R

R subselect package: help documentation for the wald.coef() function
wald.coef(subselect)
wald.coef() belongs to the R package: subselect
Wald statistic for variable selection in generalized linear models
Translator: 生物统计家园网 robot LoveR
----------Description----------
Computes the value of Wald's statistic, testing the significance of the excluded variables, in the context of variable subset selection in generalized linear models.
----------Usage----------
----------Arguments----------
Argument: mat
An estimate (FI) of Fisher's information matrix for the full model variable-coefficient estimates.
Argument: H
A matrix product of the form FI %*% b %*% t(b) %*% FI, where b is a vector of variable-coefficient estimates.
Argument: indices
A numerical vector, matrix or 3-d array of integers giving the indices of the variables in the subset. If a matrix is specified, each row is taken to represent a different k-variable subset. If a 3-d array is given, it is assumed that the third dimension corresponds to different cardinalities.
Argument: tolval
The tolerance level to be used in checks for ill-conditioning and positive-definiteness of the Fisher information and the auxiliary (H) matrices. Values smaller than tolval are considered equivalent to zero.
Argument: tolsym
The tolerance level for symmetry of the Fisher information and the auxiliary (H) matrices. If corresponding matrix entries differ by more than this value, the input matrices will be considered asymmetric and execution will be aborted. If corresponding entries are different, but by less than this value, the input matrix will be replaced by its symmetric part, i.e., input matrix A becomes (A+t(A))/2.
----------Details----------
Variable selection in the context of generalized linear models is typically based on the minimization of statistics that test the significance of excluded variables. In particular, the likelihood ratio, Wald's and Rao's statistics, and some adaptations of them, are often proposed as comparison criteria for variable subsets of the same dimensionality. All these statistics are asymptotically equivalent and can be converted into information criteria, such as the AIC, that are also able to compare subsets of different dimensionalities (see references [1] and [2] for further details).
Among these criteria, Wald's statistic has some computational advantages because it can always be derived from the same (full-model) maximum likelihood and Fisher information estimates. In particular, if W_{allv} is the value of the Wald statistic testing the significance of the full covariate vector, b and FI are the coefficient and Fisher information estimates, and H is an auxiliary rank-one matrix given by H = FI %*% b %*% t(b) %*% FI, it follows that the value of Wald's statistic for the excluded variables (W_{excv}) in a given subset is given by $W_{excv} = W_{allv} - \mathrm{tr}(FI_{indices}^{-1} H_{indices})$, where FI_{indices} and H_{indices} are the portions of the FI and H matrices associated with the selected variables.
The FI and H matrices can be retrieved (from a glm object) by the glmHmat function and may be used as input to the search functions anneal, genetic, improve and eleaps. The wald.coef function computes the value of the Wald statistic from these matrices for a subset specified by indices.
The fact that indices can be a matrix or 3-d array allows for the computation of the Wald statistic values of subsets produced by the search functions anneal, genetic, improve and eleaps (whose output option $subsets are matrices or 3-d arrays), using a different criterion (see the example below).
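The trace identity in the Details can be checked numerically. The following is a minimal sketch in base R, assuming a made-up positive-definite matrix in place of a real Fisher information estimate; FI, b and the helper wald_excluded() are hypothetical names used only for illustration and are not part of subselect.

```r
# Toy inputs: a symmetric positive-definite "Fisher information" matrix
# and a coefficient vector (both invented for this illustration).
set.seed(1)
A  <- matrix(rnorm(16), 4, 4)
FI <- crossprod(A) + diag(4)          # positive definite by construction
b  <- c(0.5, -1.2, 0.3, 0.8)

H      <- FI %*% b %*% t(b) %*% FI    # auxiliary rank-one matrix
W_allv <- drop(t(b) %*% FI %*% b)     # Wald statistic for all variables

# W_excv = W_allv - tr(FI_indices^{-1} H_indices)
wald_excluded <- function(FI, H, W_allv, indices) {
  FI_s <- FI[indices, indices, drop = FALSE]
  H_s  <- H[indices, indices, drop = FALSE]
  W_allv - sum(diag(solve(FI_s, H_s)))
}

kept   <- c(1, 3)                     # variables retained in the subset
W_excv <- wald_excluded(FI, H, W_allv, kept)

# Direct Wald statistic for the excluded variables, for comparison:
# b_e' [ (FI^{-1})_ee ]^{-1} b_e
excl     <- setdiff(seq_along(b), kept)
V        <- solve(FI)
W_direct <- drop(t(b[excl]) %*% solve(V[excl, excl]) %*% b[excl])

print(c(W_excv = W_excv, W_direct = W_direct))
```

Both routes give the same number, and keeping every variable leaves nothing excluded, so the statistic drops to zero.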
----------Value----------
The value of the Wald statistic.
----------References----------
[1] Regression Models, Biometrics, Vol. 34, 318-327.
[2] Program for Generalized Models I. Statistical and Computational Background, Computer Methods and Programs in Biomedicine, Vol. 24, 117-124.
----------Examples----------
## ---------------------------------------------------------------
##  An example of variable selection in the context of binary response
##  regression models. The logarithms and original physical measurements
##  of the "Leptograpsus variegatus crabs" considered in the MASS crabs
##  data set are used to fit a logistic model that takes the sex of each crab
##  as the response variable.
library(subselect)
library(MASS)
data(crabs)
lFL <- log(crabs$FL)
lRW <- log(crabs$RW)
lCL <- log(crabs$CL)
lCW <- log(crabs$CW)
logrfit <- glm(sex ~ FL + RW + CL + CW + lFL + lRW + lCL + lCW,
crabs,family=binomial)
## Warning message:
## fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y,
## weights = weights, start = start, etastart = etastart,
lHmat <- glmHmat(logrfit)
wald.coef(lHmat$mat,lHmat$H,c(1,6,7),tolsym=1E-06)
## [1] 2.286739
## Warning message:
## The covariance/total matrix supplied was slightly asymmetric:
## symmetric entries differed by up to 6.93e-14.
## (less than the 'tolsym' parameter).
## It has been replaced by its symmetric part.
## in: validmat(mat, p, tolval, tolsym)
## ---------------------------------------------------------------
## 2) An example computing the value of the Wald statistic in a logistic
##  model for five subsets produced when a probit model was originally
##  considered
library(MASS)
data(crabs)
lFL <- log(crabs$FL)
lRW <- log(crabs$RW)
lCL <- log(crabs$CL)
lCW <- log(crabs$CW)
probfit <- glm(sex ~ FL + RW + CL + CW + lFL + lRW + lCL + lCW,
crabs,family=binomial(link=probit))
## Warning message:
## fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y,
## weights = weights, start = start, etastart = etastart)
pHmat <- glmHmat(probfit)
probresults <- eleaps(pHmat$mat,kmin=3,kmax=3,nsol=5,criterion="Wald",H=pHmat$H,
r=1,tolsym=1E-10)
## Warning message:
## The covariance/total matrix supplied was slightly asymmetric:
## symmetric entries differed by up to 3.64e-12.
## (less than the 'tolsym' parameter).
## It has been replaced by its symmetric part.
## in: validmat(mat, p, tolval, tolsym)
logrfit <- glm(sex ~ FL + RW + CL + CW + lFL + lRW + lCL + lCW,
crabs,family=binomial)
## Warning message:
## fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y,
## weights = weights, start = start, etastart = etastart)
lHmat <- glmHmat(logrfit)
wald.coef(lHmat$mat,H=lHmat$H,probresults$subsets,tolsym=1e-06)
##              Card.3
## Solution 1 2.286739
## Solution 2 2.595165
## Solution 3 2.585149
## Solution 4 2.669059
## Solution 5 2.690954
## Warning message:
## The covariance/total matrix supplied was slightly asymmetric:
## symmetric entries differed by up to 6.93e-14.
## (less than the 'tolsym' parameter).
## It has been replaced by its symmetric part.
## in: validmat(mat, p, tolval, tolsym)
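When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).
Note 1: For ease of study, this document was translated by the 生物统计家园网 robot LoveR. It is intended for personal reference when learning R; 生物统计家园 reserves the copyright.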
[Data Analysis: R in Action] Study notes, Chapter 7: Hypothesis testing and its implementation in R
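7.1 Overview of hypothesis testing
A statement about the specific value of a population parameter is called a hypothesis. Using sample information to judge whether the hypothesis holds is called hypothesis testing.
7.1.1 Theoretical basis
Hypothesis testing rests on the small-probability principle. A small-probability event is almost impossible in a single trial, so if it does occur we have reason to reject the null hypothesis; if it does not occur, the null hypothesis is not rejected. The threshold for "small probability" is fixed in advance by the researcher as the significance level α (0 < α < 1). The choice of α depends on the nature of the problem; common choices are α = 0.1, 0.05 or 0.01. Hypothesis testing is also called significance testing.
7.1.2 Testing procedure
(1) State the hypotheses
(2) Choose the test statistic and compute its value
(3) Fix the significance level and establish the decision rule
(4) Make the statistical decision
Critical-value rule:
Two-sided test: reject H0 when |statistic| > critical value
Left-tailed test: reject H0 when statistic < lower critical value
Right-tailed test: reject H0 when statistic > critical value
In a hypothesis test, the smallest significance level at which the null hypothesis H0 would be rejected is called the p-value of the test. The p-value tells us how likely it is, if the null hypothesis is true, to obtain a sample statistic at least as extreme as the one observed; if this probability is small, the null hypothesis should be rejected. In other words, the smaller the p-value, the stronger the evidence against H0. At significance level α the p-value rule is: if P ≤ α, reject H0; if P > α, do not reject the null hypothesis.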
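The two decision rules above can be compared side by side. This is a minimal sketch for a statistic that is standard normal under H0; the value of z is an illustrative number and the variable names are chosen here, not taken from the notes.

```r
alpha <- 0.05
z     <- 0.6894                      # illustrative observed z statistic

# Critical-value rule: compare the statistic with normal quantiles
crit_two   <- qnorm(1 - alpha / 2)   # two-sided: compare |z| with this
crit_left  <- qnorm(alpha)           # left-tailed: lower quantile
crit_right <- qnorm(1 - alpha)       # right-tailed: upper quantile
reject_by_critical <- c(two.sided = abs(z) > crit_two,
                        less      = z < crit_left,
                        greater   = z > crit_right)

# p-value rule: reject when P <= alpha
p <- c(two.sided = 2 * pnorm(-abs(z)),
       less      = pnorm(z),
       greater   = pnorm(z, lower.tail = FALSE))
reject_by_p <- p <= alpha

print(rbind(reject_by_critical, reject_by_p))
```

For a continuous statistic the two rules always lead to the same decision; the p-value additionally shows how far the data are from the rejection boundary.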
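7.1.3 Two types of error
7.2 Tests for a single normal population
Hypothesis-testing methods for a single normal population:
7.2.1 Test of the mean μ
(1) σ² known
Base R provides only the t-test function t.test() and no function for the z-test, so we write our own function z.test() to compute the z statistic and the P value: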
> z.test=function(x,mu,sigma,alternative="two.sided"){
n=length(x)
result=list()
#create an empty list to hold the output
mean=mean(x)
z=(mean-mu)/(sigma/sqrt(n))
#compute the z statistic
options(digits=4)
#print results with 4 significant digits
result$mean=mean;result$z=z
#store the mean and the z value in the result
result$P=2*pnorm(abs(z),lower.tail=FALSE)
#two-sided P value computed from z
#for a one-sided test, recompute the P value
if(alternative=="greater") result$P=pnorm(z,lower.tail=FALSE)
else if(alternative=="less") result$P=pnorm(z)
result
}
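The BSDA package provides the function z.test(), which performs one-sample and two-sample tests based on the normal distribution. Its usage is:
z.test(x,y=NULL,alternative="two.sided",mu=0,sigma.x=NULL,
sigma.y=NULL,conf.level=0.95)
Here x and y are numeric vectors; by default y=NULL, which gives a one-sample test. alternative specifies the alternative hypothesis and hence the type of confidence interval: the default "two.sided" gives a two-sided interval, "less" gives an upper confidence bound and "greater" a lower confidence bound. mu is the mean under the null hypothesis, 0 by default. sigma.x and sigma.y are the population standard deviations of the two samples.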
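The monthly new-home price index for Beijing in 2012 is available from the 东方财富 (Eastmoney) data center. Does it follow a normal distribution with mean 102.4 and variance 0.45 (standard deviation 0.67)?
> bj=c(102.5,102.4,102.0,101.8,101.8,102.1,102.3,102.5,102.6,102.8,103.4,104.2)
> z.test(x=bj,mu=102.4,sigma=0.67,alternative="two.sided")
$z
[1] 0.6894
$P
[1] 0.4906
Using the function z.test() from the package BSDA, whose standard-deviation argument is named sigma.x:
> library(BSDA)
> z.test(x=bj,mu=102.4,sigma.x=0.67,alternative="two.sided")
This reports the same z statistic (0.6894) and P value (0.4906).
Since P = 0.4906 > α = 0.05, the null hypothesis cannot be rejected at the 0.05 significance level: the data are consistent with a mean of 102.4 for the 2012 monthly Beijing new-home price index.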
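The same two-sided decision can be read off a confidence interval for the mean. A minimal sketch using the data and the assumed σ = 0.67 from the example; the names half_width, ci and mu0_inside are chosen here for illustration.

```r
bj <- c(102.5, 102.4, 102.0, 101.8, 101.8, 102.1,
        102.3, 102.5, 102.6, 102.8, 103.4, 104.2)
mu0   <- 102.4     # mean under the null hypothesis
sigma <- 0.67      # population standard deviation, assumed known
alpha <- 0.05
n     <- length(bj)

# z statistic and two-sided P value
z <- (mean(bj) - mu0) / (sigma / sqrt(n))
p <- 2 * pnorm(-abs(z))

# (1 - alpha) confidence interval for the mean with known sigma
half_width <- qnorm(1 - alpha / 2) * sigma / sqrt(n)
ci <- mean(bj) + c(-1, 1) * half_width

# H0 is not rejected exactly when mu0 lies inside the interval
mu0_inside <- ci[1] < mu0 && mu0 < ci[2]
print(list(z = z, p = p, ci = ci, mu0_inside = mu0_inside))
```

Here 102.4 falls inside the interval, which matches P > α.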
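(2) σ² unknown
Call the t-test function t.test() directly:
t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95, ...)
Here x is the sample data: if only x is given, a one-sample t-test is performed; if both x and y are given, a two-sample t-test is performed. alternative specifies the alternative hypothesis and hence the type of confidence interval: the default "two.sided" gives a two-sided interval, "less" gives an upper confidence bound and "greater" a lower confidence bound. mu is the mean assumed under the null hypothesis, 0 by default. paired is a logical value indicating whether to perform a paired t-test, FALSE by default. var.equal is also a logical value, indicating whether the two population variances are treated as equal in a two-sample test. The function also computes a confidence interval directly; conf.level sets its confidence level.
> t.test(x=bj,mu=102.4,alternative="less")
One Sample t-test
t = 0.67, df = 11, p-value = 0.7
alternative hypothesis: true mean is less than 102.4
95 percent confidence interval:
-Inf 102.9
sample estimates: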
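To see what t.test() computes in the one-sample, left-tailed case, the statistic, P value and upper confidence bound can be reproduced by hand. A minimal sketch on the same data; the variable names are chosen here for illustration.

```r
bj <- c(102.5, 102.4, 102.0, 101.8, 101.8, 102.1,
        102.3, 102.5, 102.6, 102.8, 103.4, 104.2)
mu0 <- 102.4
n   <- length(bj)
se  <- sd(bj) / sqrt(n)                    # estimated standard error

t_stat <- (mean(bj) - mu0) / se            # t statistic with n - 1 df
p_less <- pt(t_stat, df = n - 1)           # left-tailed P value
upper  <- mean(bj) + qt(0.95, df = n - 1) * se   # 95% upper bound

ref <- t.test(bj, mu = mu0, alternative = "less")
print(c(t = t_stat, p = p_less, upper = upper))
```

The hand-computed values agree with the components ref$statistic, ref$p.value and ref$conf.int returned by t.test().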
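7.2.2 Test of the variance σ²
(1) μ known
(2) μ unknown
R has no built-in function for the chi-square test of a single sample variance (chisq.test() tests distributions, not a variance), so we put both cases above into one function, chisq.var.test(), which can be called for any one-sample variance test. Applied to the 2012 Beijing new-home price index: if the variance stays within a certain range, prices are fairly stable, so at the 0.05 significance level we test whether the population variance is below 0.25.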
> chisq.var.test=function(x,var,mu=Inf,alternative="two.sided"){
n=length(x)
df=n-1
#degrees of freedom when the mean is unknown
v=var(x)
#variance estimate when the mean is unknown
#case where the population mean is known
if(mu<Inf){df=n;v=sum((x-mu)^2)/n}
chi2=df*v/var
#chi-square statistic
options(digits=4)
result=list()
#list that holds the results
result$df=df;result$var=v;result$chi2=chi2;
result$P=2*min(pchisq(chi2,df),pchisq(chi2,df,lower.tail=F))
#for a one-sided test, recompute the P value
if(alternative=="greater") result$P=pchisq(chi2,df,lower.tail=F)
else if(alternative=="less") result$P=pchisq(chi2,df)
result
}
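> chisq.var.test(bj,0.25,alternative="less")
$var
[1] 0.4752
$P
[1] 0.9656
The P value is very large, far above α = 0.05, so the null hypothesis cannot be rejected: there is no evidence that the variance of the new-home price index is below 0.25. The sample variance, 0.4752, is in fact well above 0.25, which points to considerable fluctuation.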
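7.3 Tests for two normal populations
Hypothesis-testing methods for two normal populations:
7.3.1 Test of the difference in means
(1) Both population variances known
We write a normal-theory test function for the difference in means, z.test2():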
> z.test2=function(x,y,sigma1,sigma2,alternative="two.sided"){
n1=length(x);n2=length(y)
result=list()
#create an empty list to hold the output
mean=mean(x)-mean(y)
z=mean/sqrt(sigma1^2/n1+sigma2^2/n2)
#compute the z statistic
options(digits=4)
#print results with 4 significant digits
result$mean=mean;result$z=z
#store the difference in means and the z value in the result
result$P=2*pnorm(abs(z),lower.tail=FALSE)
#two-sided P value computed from z
#for a one-sided test, recompute the P value
if(alternative=="greater") result$P=pnorm(z,lower.tail=FALSE)
else if(alternative=="less") result$P=pnorm(z)
result
}
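The function z.test() in the package BSDA quickly carries out the test of the difference between two population means when the variances are known.
Take the data of Bamberger's department store as an example. The company implemented a plan to extend its opening hours. Assuming the population standard deviations of sales before and after the change are known to be 8 and 12, test whether the measure had a significant effect on sales.
> sales=read.table("D:/Program Files/RStudio/sales.txt",header=T)
> attach(sales)
> z.test2(prior,post,8,12,alternative="less")
$mean
[1] -24.54
$z
[1] -8.843
$P
[1] 4.678e-19
The function z.test() gives the same result and also reports a confidence interval estimate.
> z.test(prior,post,sigma.x=8,sigma.y=12,alternative="less")
Two-sample z-Test
data:  prior and post
z = -8.8, p-value < 2e-16
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
sample estimates:
mean of x mean of y
This indicates that sales were higher after opening hours were extended.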
(2) Both population variances unknown but equal
(3) Both population variances unknown and unequal
> t.test(prior,post,var.equal=FALSE,alternative="less")
Welch Two Sample t-test
data:  prior and post
t = -8.4, df = 44, p-value = 6e-11
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf -19.62
sample estimates:
mean of x mean of y
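The fractional degrees of freedom in the Welch output come from the Welch-Satterthwaite approximation. The sales data file is not reproduced here, so the sketch below uses synthetic data as a stand-in; the sample sizes, means and standard deviations are assumptions made for illustration only.

```r
set.seed(42)
x <- rnorm(27, mean = 50, sd = 8)     # synthetic "before" sample
y <- rnorm(27, mean = 70, sd = 12)    # synthetic "after" sample

vx <- var(x) / length(x)              # squared standard error of mean(x)
vy <- var(y) / length(y)              # squared standard error of mean(y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat   <- (mean(x) - mean(y)) / sqrt(vx + vy)
df_welch <- (vx + vy)^2 /
  (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
p_less   <- pt(t_stat, df = df_welch)   # left-tailed P value

ref <- t.test(x, y, var.equal = FALSE, alternative = "less")
print(c(t = t_stat, df = df_welch, p = p_less))
```

Because the two variances are not pooled, the degrees of freedom are generally not an integer and lie between min(n1, n2) - 1 and n1 + n2 - 2.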
7.3.2 t-test for paired data
> x=c(117,127,141,107,110,114,115,138,127,122)
> y=c(113,108,120,107,104,98,102,132,120,114)
> t.test(x,y,paired=TRUE,alternative="greater")
Paired t-test
t = 4.6, df = 9, p-value = 7e-04
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
sample estimates:
mean of the differences
p is far below α = 0.05, so the null hypothesis is rejected: the mean in the treated group is significantly lower, which indicates that the drug lowers blood pressure.
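A paired t-test is a one-sample t-test on the within-pair differences. A minimal sketch that reproduces the statistic by hand from the data above; d, t_stat and p_greater are names chosen here for illustration.

```r
x <- c(117, 127, 141, 107, 110, 114, 115, 138, 127, 122)
y <- c(113, 108, 120, 107, 104,  98, 102, 132, 120, 114)

d <- x - y                                        # within-pair differences
t_stat    <- mean(d) / (sd(d) / sqrt(length(d)))  # one-sample t on d
p_greater <- pt(t_stat, df = length(d) - 1, lower.tail = FALSE)

ref <- t.test(x, y, paired = TRUE, alternative = "greater")
print(c(t = t_stat, p = p_greater))
```

The hand-computed statistic, about 4.59 on 9 degrees of freedom, is the value that the output above rounds to 4.6.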
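7.3.3 Test of two population variances
The R function var.test() performs the F test comparing two variances and gives the corresponding interval estimate.
> var.test(prior,post)
F test to compare two variances
data:  prior and post
F = 0.39, num df = 26, denom df = 26, p-value = 0.02
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
sample estimates:
ratio of variances
The test gives P = 0.01914 < α = 0.05, so the null hypothesis is rejected: the variance of sales differs before and after the extension of opening hours.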
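7.4 Tests for proportions
7.4.1 Binomial test for a proportion
In R this is done with the function binom.test():
binom.test(x,n,p=0.5,alternative=c("two.sided","less","greater"),conf.level = 0.95)
Among 2000 households, 214 are hardship households with less than 5 square meters of living space per person. The government hopes to keep the proportion of hardship households in the population at about 10%. Determine whether this target has been met.
> binom.test(214,2000,p=0.1)
Exact binomial test
data:  214 and 2000
number of successes = 210, number of trials = 2000,
p-value = 0.3
alternative hypothesis: true probability of success is not equal to 0.1
95 percent confidence interval:
sample estimates:
probability of success
Since p = 0.2966 > α = 0.05, the null hypothesis cannot be rejected: the data are consistent with a hardship-household proportion of about 10%. The output also gives a confidence interval and the sample proportion estimate, 0.107.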
7.4.2 Approximate test for a proportion
For large samples, a normal approximation can be used instead of the binomial distribution:
> prop.test(214,2000,p=0.1)
1-sample proportions test with continuity correction
data:  214 out of 2000, null probability 0.1
X-squared = 1, df = 1, p-value = 0.3
alternative hypothesis: true p is not equal to 0.1
95 percent confidence interval:
sample estimates:
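The X-squared value reported by prop.test() can be reproduced by hand as a chi-square statistic with Yates' continuity correction. A minimal sketch for the 214-out-of-2000 example; the variable names are chosen here for illustration.

```r
x  <- 214      # observed number of hardship households
n  <- 2000     # number of households
p0 <- 0.1      # proportion under the null hypothesis

observed <- c(x, n - x)
expected <- n * c(p0, 1 - p0)

# Continuity correction, capped at the observed deviation
yates <- min(0.5, abs(x - n * p0))

x_squared <- sum((abs(observed - expected) - yates)^2 / expected)
p_value   <- pchisq(x_squared, df = 1, lower.tail = FALSE)

ref <- prop.test(x, n, p = p0)
print(c(X_squared = x_squared, p = p_value))
```

The statistic is 1.0125, which the output above prints as 1 because of the reduced number of digits in effect.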
7.5 Nonparametric tests
7.5.1 χ² test of a population distribution
(1) Theoretical distribution known
R provides the function chisq.test() for Pearson's chi-square goodness-of-fit test. Its call format is
chisq.test(x, y = NULL, correct = TRUE,p = rep(1/length(x), length(x)), rescale.p = FALSE,simulate.p.value = FALSE, B = 2000)
> bj=c(102.5,102.4,102.0,101.8,101.8,102.1,102.3,102.5,102.6,102.8,103.4,104.2)
> hist(bj)
The function cut() divides the range of a variable into intervals. Its call format is
cut(x, breaks, labels = NULL,
    include.lowest = FALSE, right = TRUE, dig.lab = 3,
    ordered_result = FALSE, ...)
The function table() counts the observations at each factor level and displays the frequency of each interval as a contingency table.
table(..., exclude = if (useNA == "no") c(NA, NaN), useNA = c("no",
"ifany", "always"), dnn = list.names(...), deparse.level = 1)
> A=table(cut(bj,breaks=c(101.4,101.9,102.4,102.9,104.5)))
#the two functions are nested
> A
(101.4,101.9] (101.9,102.4] (102.4,102.9] (102.9,104.5]
            2             4             4             2
> br=c(101.5,102,102.5,103,104.5)
> p=pnorm(br,mean(bj),sd(bj))
#note that pnorm() returns the cumulative distribution function
> p=c(p[1],p[2]-p[1],p[3]-p[2],1-p[3])
> options(digits=2)
> p
[1] 0.067 0.153 0.261 0.519
> chisq.test(A,p=p)
Chi-squared test for given probabilities
X-squared = 7, df = 3, p-value = 0.06
The chi-square test of the population distribution gives P = 0.05849 > α = 0.05, so at the 0.05 significance level the null hypothesis cannot be rejected, and the Beijing new-home price index can be regarded as normally distributed.
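7.5.2 Kolmogorov-Smirnov test
(1) One-sample KS test
The Kolmogorov-Smirnov test checks whether the empirical distribution of the observed data matches a known theoretical distribution; when the gap between the two is small, the sample can be regarded as drawn from that theoretical distribution. The KS statistic is the supremum of the difference between the empirical distribution function and the hypothesized distribution function, so the test can be applied to any type of distribution:
ks.test(x, y, ...,
        alternative = c("two.sided", "less", "greater"),
        exact = NULL)
A device undergoes a life test in which 10 failure-free operating times are recorded. Test whether they follow an exponential distribution with parameter 1/1500.
> X=c(420,500,920,50,00,2350)
> ks.test(X,"pexp",1/1500)
#pexp is the name of the exponential cumulative distribution function; 1/1500 is its rate parameter
One-sample Kolmogorov-Smirnov test
D = 0.3, p-value = 0.3
alternative hypothesis: two-sided
The one-sample KS test gives P = 0.2654, which is greater than the significance level 0.05, so the null hypothesis cannot be rejected: the lifetime of the device is consistent with an exponential distribution with λ = 1/1500.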
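The D statistic is the largest vertical gap between the empirical distribution function and the hypothesized one, checked just before and just after each jump of the empirical function. The sketch below computes it by hand on synthetic exponential data; the sample is simulated for illustration and is not the lifetime data from the example.

```r
set.seed(1)
rate <- 1 / 1500
x <- sort(rexp(10, rate = rate))      # synthetic lifetimes, sorted
n <- length(x)
i <- seq_len(n)

Fx <- pexp(x, rate = rate)            # hypothesized CDF at each observation

# The empirical CDF jumps from (i - 1)/n to i/n at the i-th order statistic,
# so the gap is examined on both sides of every jump.
D <- max(pmax(i / n - Fx, Fx - (i - 1) / n))

ref <- ks.test(x, "pexp", rate)
print(c(D = D, p = ref$p.value))
```

The hand-computed D matches the statistic returned by ks.test() for the two-sided alternative.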
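(2) Two-sample KS test
Given two samples from two independent populations, the two-sample KS test checks whether the underlying population distributions are the same. The principle is the same as in the one-sample case: the distribution in the null hypothesis is simply replaced by the empirical distribution of the other sample.
Two random samples of 25 and 20 observations are drawn from two populations. Determine whether they come from the same distribution.
> xx=c(0.61,0.29,0.06,0.59,-1.73,-0.74,0.51,-0.56,0.39,1.64,0.05,-0.06,0.64,-0.82,0.37,1.77,1.09,-1.28,2.36,1.31,1.05,-0.32,-0.40,1.06,-2.47)
> yy=c(2.20,1.66,1.38,0.20,0.36,0.00,0.96,1.56,0.44,1.50,-0.30,0.66,2.31,3.29,-0.27,-0.37,0.38,0.70,0.52,-0.71)
> ks.test(xx,yy)
Two-sample Kolmogorov-Smirnov test
D = 0.2, p-value = 0.5
alternative hypothesis: two-sided
(3) Comparison of the KS test and the chi-square test
The KS test and the chi-square test are alike in that both are based on the difference between observed and expected frequencies. They differ in that the chi-square test must first group the data to obtain observed frequencies, whereas the KS test works directly on the n raw observations and therefore uses the data more completely. In terms of scope, the chi-square test is mainly used for categorical data, while the KS test is mainly used for continuous, quantitative data measured on a scale.
As a nonparametric method, the KS test is robust. It does not depend on the location of the mean and is insensitive to the units of the data, and in general it is more efficient than the chi-square test. Unlike parametric tests, the KS test applies very broadly and is not restricted to normal data in the way the t-test is (the t-test breaks down when the data depart strongly from normality).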