<html lang="en"> <head> <title>Descriptive Statistics - Untitled</title> <meta http-equiv="Content-Type" content="text/html"> <meta name="description" content="Untitled"> <meta name="generator" content="makeinfo 4.13"> <link title="Top" rel="start" href="index.html#Top"> <link rel="up" href="Statistics.html#Statistics" title="Statistics"> <link rel="next" href="Basic-Statistical-Functions.html#Basic-Statistical-Functions" title="Basic Statistical Functions"> <link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage"> <meta http-equiv="Content-Style-Type" content="text/css"> <style type="text/css"><!-- pre.display { font-family:inherit } pre.format { font-family:inherit } pre.smalldisplay { font-family:inherit; font-size:smaller } pre.smallformat { font-family:inherit; font-size:smaller } pre.smallexample { font-size:smaller } pre.smalllisp { font-size:smaller } span.sc { font-variant:small-caps } span.roman { font-family:serif; font-weight:normal; } span.sansserif { font-family:sans-serif; font-weight:normal; } --></style> </head> <body> <div class="node"> <a name="Descriptive-Statistics"></a> <p> Next: <a rel="next" accesskey="n" href="Basic-Statistical-Functions.html#Basic-Statistical-Functions">Basic Statistical Functions</a>, Up: <a rel="up" accesskey="u" href="Statistics.html#Statistics">Statistics</a> <hr> </div> <h3 class="section">25.1 Descriptive Statistics</h3> <p>Octave can compute various statistics such as the moments of a data set. <!-- ./statistics/base/mean.m --> <p><a name="doc_002dmean"></a> <div class="defun"> — Function File: <b>mean</b> (<var>x, dim, opt</var>)<var><a name="index-mean-1819"></a></var><br> <blockquote><p>If <var>x</var> is a vector, compute the mean of the elements of <var>x</var> <pre class="example"> mean (x) = SUM_i x(i) / N </pre> <p>If <var>x</var> is a matrix, compute the mean for each column and return them in a row vector. 
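<p>As a small illustration of the vector and matrix cases described above (the sample values are arbitrary and chosen only for this sketch):
<pre class="example">     x = [1 2 3 4];
     m = mean (x);         % (1 + 2 + 3 + 4) / 4 = 2.5
     A = [1 2; 3 4; 5 6];
     c = mean (A);         % one mean per column: [3 4]
</pre>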
<p>With the optional argument <var>opt</var>, the kind of mean computed can be selected. The following options are recognized: <dl> <dt><code>"a"</code><dd>Compute the (ordinary) arithmetic mean. This is the default. <br><dt><code>"g"</code><dd>Compute the geometric mean. <br><dt><code>"h"</code><dd>Compute the harmonic mean. </dl> <p>If the optional argument <var>dim</var> is supplied, work along dimension <var>dim</var>. <p>Both <var>dim</var> and <var>opt</var> are optional. If both are supplied, either may appear first. </p></blockquote></div> <!-- ./statistics/base/median.m --> <p><a name="doc_002dmedian"></a> <div class="defun"> — Function File: <b>median</b> (<var>x, dim</var>)<var><a name="index-median-1820"></a></var><br> <blockquote><p>If <var>x</var> is a vector, compute the median value of the elements of <var>x</var>. If the elements of <var>x</var> are sorted, the median is defined as
<pre class="example">                 x(ceil(N/2)),             N odd
     median(x) =
                 (x(N/2) + x((N/2)+1))/2,  N even
</pre>
<p>If <var>x</var> is a matrix, compute the median value for each column and return them in a row vector. If the optional <var>dim</var> argument is given, operate along this dimension. <!-- Texinfo @sp should work but in practice produces ugly results for HTML. --> <!-- A simple blank line produces the correct behavior. --> <!-- @sp 1 --> <p class="noindent"><strong>See also:</strong> <a href="doc_002dstd.html#doc_002dstd">std</a>, <a href="doc_002dmean.html#doc_002dmean">mean</a>. 
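<p>The odd and even cases of the median definition, and the column-wise behavior for matrices, can be sketched as follows (the inputs are arbitrary illustrative values):
<pre class="example">     odd_case  = median ([3 1 2]);     % sorted: 1 2 3, middle element is 2
     even_case = median ([4 1 3 2]);   % sorted: 1 2 3 4, (2 + 3) / 2 = 2.5
     A = [1 2; 3 4; 5 6];
     by_col = median (A);              % one median per column: [3 4]
     by_row = median (A, 2);           % one median per row: [1.5; 3.5; 5.5]
</pre>
<p>The last call passes <var>dim</var> = 2, so the median is taken across each row instead of down each column.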
</p></blockquote></div> <!-- ./statistics/base/quantile.m --> <p><a name="doc_002dquantile"></a> <div class="defun"> — Function File: <var>q</var> = <b>quantile</b> (<var>x, p</var>)<var><a name="index-quantile-1821"></a></var><br> — Function File: <var>q</var> = <b>quantile</b> (<var>x, p, dim</var>)<var><a name="index-quantile-1822"></a></var><br> — Function File: <var>q</var> = <b>quantile</b> (<var>x, p, dim, method</var>)<var><a name="index-quantile-1823"></a></var><br> <blockquote><p>For a sample, <var>x</var>, calculate the quantiles, <var>q</var>, corresponding to the cumulative probability values in <var>p</var>. All non-numeric values (NaNs) of <var>x</var> are ignored. <p>If <var>x</var> is a matrix, compute the quantiles for each column and return them in a matrix, such that the i-th row of <var>q</var> contains the <var>p</var>(i)th quantiles of each column of <var>x</var>. <p>The optional argument <var>dim</var> determines the dimension along which the quantiles are calculated. If <var>dim</var> is omitted, and <var>x</var> is a vector or matrix, it defaults to 1 (column-wise quantiles). If <var>x</var> is an N-d array, <var>dim</var> defaults to the first dimension whose size is greater than one. <p>The methods available to calculate sample quantiles are the nine methods used by R (http://www.r-project.org/). The default is <var>method</var> = 5. <p>Discontinuous sample quantile methods 1, 2, and 3 <ol type=1 start=1> <li>Method 1: Inverse of empirical distribution function. <li>Method 2: Similar to method 1 but with averaging at discontinuities. <li>Method 3: SAS definition: nearest even order statistic. </ol> <p>Continuous sample quantile methods 4 through 9, where p(k) is the linear interpolation function respecting each method's representative cdf. <ol type=1 start=4> <li>Method 4: p(k) = k / n. That is, linear interpolation of the empirical cdf. <li>Method 5: p(k) = (k - 0.5) / n. 
That is, a piecewise linear function whose knots are the values midway through the steps of the empirical cdf. <li>Method 6: p(k) = k / (n + 1). <li>Method 7: p(k) = (k - 1) / (n - 1). <li>Method 8: p(k) = (k - 1/3) / (n + 1/3). The resulting quantile estimates are approximately median-unbiased regardless of the distribution of <var>x</var>. <li>Method 9: p(k) = (k - 3/8) / (n + 1/4). The resulting quantile estimates are approximately unbiased for the expected order statistics if <var>x</var> is normally distributed. </ol> <p>Hyndman and Fan (1996) recommend method 8. Maxima, S, and R (versions prior to 2.0.0) use 7 as their default. Minitab and SPSS use method 6. <span class="sc">matlab</span> uses method 5. <p>References: <ul> <li>Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth &amp; Brooks/Cole. <li>Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages, American Statistician, 50, 361–365. <li>R: A Language and Environment for Statistical Computing; <a href="http://cran.r-project.org/doc/manuals/fullrefman.pdf">http://cran.r-project.org/doc/manuals/fullrefman.pdf</a>. </ul> </p></blockquote></div> <!-- ./statistics/base/prctile.m --> <p><a name="doc_002dprctile"></a> <div class="defun"> — Function File: <var>y</var> = <b>prctile</b> (<var>x, p</var>)<var><a name="index-prctile-1824"></a></var><br> — Function File: <var>y</var> = <b>prctile</b> (<var>x, p, dim</var>)<var><a name="index-prctile-1825"></a></var><br> <blockquote><p>For a sample <var>x</var>, compute the quantiles, <var>y</var>, corresponding to the cumulative probability values, <var>p</var>, in percent. All non-numeric values (NaNs) of <var>x</var> are ignored. <p>If <var>x</var> is a matrix, compute the percentiles for each column and return them in a matrix, such that the i-th row of <var>y</var> contains the <var>p</var>(i)th percentiles of each column of <var>x</var>. 
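<p>A minimal sketch of the matrix case, assuming the default method 5 described for <code>quantile</code>, p(k) = (k - 0.5) / n. The data and variable names are illustrative only:
<pre class="example">     x = [1 10; 2 20; 3 30; 4 40];   % two samples stored as columns
     p = [25 50 75];                 % percentages, not probabilities
     y = prctile (x, p);
     % y is 3-by-2: row i holds the p(i)-th percentile of each column.
     % With n = 4 the knots sit at 12.5%, 37.5%, 62.5% and 87.5%, so
     % each requested percentage falls halfway between two data points:
     %      1.5   15
     %      2.5   25
     %      3.5   35
     q = quantile (x, p / 100);      % same request stated as probabilities
</pre>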
<p>The optional argument <var>dim</var> determines the dimension along which the percentiles are calculated. If <var>dim</var> is omitted, and <var>x</var> is a vector or matrix, it defaults to 1 (column-wise percentiles). If <var>x</var> is an N-d array, <var>dim</var> defaults to the first dimension whose size is greater than one. </p></blockquote></div> <!-- ./statistics/base/meansq.m --> <p><a name="doc_002dmeansq"></a> <div class="defun"> — Function File: <b>meansq</b> (<var>x</var>)<var><a name="index-meansq-1826"></a></var><br> — Function File: <b>meansq</b> (<var>x, dim</var>)<var><a name="index-meansq-1827"></a></var><br> <blockquote><p>For vector arguments, return the mean square of the values. For matrix arguments, return a row vector containing the mean square of each column. With the optional <var>dim</var> argument, return the mean square of the values along this dimension. </p></blockquote></div> <!-- ./statistics/base/std.m --> <p><a name="doc_002dstd"></a> <div class="defun"> — Function File: <b>std</b> (<var>x</var>)<var><a name="index-std-1828"></a></var><br> — Function File: <b>std</b> (<var>x, opt</var>)<var><a name="index-std-1829"></a></var><br> — Function File: <b>std</b> (<var>x, opt, dim</var>)<var><a name="index-std-1830"></a></var><br> <blockquote><p>If <var>x</var> is a vector, compute the standard deviation of the elements of <var>x</var>. <pre class="example"> std (x) = sqrt (sumsq (x - mean (x)) / (n - 1)) </pre> <p>If <var>x</var> is a matrix, compute the standard deviation for each column and return them in a row vector. <p>The argument <var>opt</var> determines the type of normalization to use. 
Valid values are <dl> <dt>0:<dd>Normalize with N-1. This provides the square root of the best unbiased estimator of the variance [default]. <br><dt>1:<dd>Normalize with N. This provides the square root of the second moment around the mean. </dl> <p>The third argument <var>dim</var> determines the dimension along which the standard deviation is calculated. <!-- Texinfo @sp should work but in practice produces ugly results for HTML. --> <!-- A simple blank line produces the correct behavior. --> <!-- @sp 1 --> <p class="noindent"><strong>See also:</strong> <a href="doc_002dmean.html#doc_002dmean">mean</a>, <a href="doc_002dmedian.html#doc_002dmedian">median</a>. </p></blockquote></div> <!-- ./statistics/base/var.m --> <p><a name="doc_002dvar"></a> <div class="defun"> — Function File: <b>var</b> (<var>x, opt, dim</var>)<var><a name="index-var-1831"></a></var><br> <blockquote><p>For vector arguments, return the (real) variance of the values. For matrix arguments, return a row vector containing the variance for each column. <p>The optional argument <var>opt</var> determines the type of normalization to use. Valid values are <dl> <dt>0:<dd>Normalize with N-1. This provides the best unbiased estimator of the variance [default]. <br><dt>1:<dd>Normalize with N. This provides the second moment around the mean. </dl> <p>The optional third argument <var>dim</var> determines the dimension along which the variance is calculated. </p></blockquote></div> <!-- ./statistics/base/mode.m --> <p><a name="doc_002dmode"></a> <div class="defun"> — Function File: [<var>m</var>, <var>f</var>, <var>c</var>] = <b>mode</b> (<var>x, dim</var>)<var><a name="index-mode-1832"></a></var><br> <blockquote><p>Compute the most frequently occurring value. <code>mode</code> counts the frequency along the first non-singleton dimension and, if two or more values have the same frequency, returns the smallest of them in <var>m</var>. The dimension along which to count can be specified by the <var>dim</var> parameter. 
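<p>The tie-breaking rule and the effect of <var>dim</var> can be seen in a short sketch (inputs chosen only for illustration):
<pre class="example">     m = mode ([1 2 2 3 3]);   % 2 and 3 both occur twice; the smaller, 2, is returned
     A = [1 2; 1 3; 4 3];
     by_col = mode (A);        % counts down each column: [1 3]
     by_row = mode (A, 2);     % counts across each row: [1; 1; 3]
</pre>
<p>In the row-wise call every value in a row occurs exactly once, so each result is simply the smallest entry of that row.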
<p>The variable <var>f</var> holds the frequency of the most frequently occurring elements. The cell array <var>c</var> contains all of the elements with the maximum frequency. </p></blockquote></div> <!-- ./statistics/base/cov.m --> <p><a name="doc_002dcov"></a> <div class="defun"> — Function File: <b>cov</b> (<var>x, y</var>)<var><a name="index-cov-1833"></a></var><br> <blockquote><p>Compute covariance. <p>If each row of <var>x</var> and <var>y</var> is an observation and each column is a variable, the (<var>i</var>, <var>j</var>)-th entry of <code>cov (</code><var>x</var><code>, </code><var>y</var><code>)</code> is the covariance between the <var>i</var>-th variable in <var>x</var> and the <var>j</var>-th variable in <var>y</var>. If called with one argument, compute <code>cov (</code><var>x</var><code>, </code><var>x</var><code>)</code>. </p></blockquote></div> <!-- ./statistics/base/cor.m --> <p><a name="doc_002dcor"></a> <div class="defun"> — Function File: <b>cor</b> (<var>x, y</var>)<var><a name="index-cor-1834"></a></var><br> <blockquote><p>Compute correlation. <p>The (<var>i</var>, <var>j</var>)-th entry of <code>cor (</code><var>x</var><code>, </code><var>y</var><code>)</code> is the correlation between the <var>i</var>-th variable in <var>x</var> and the <var>j</var>-th variable in <var>y</var>. <pre class="example"> cor (x, y) = cov (x, y) / (std (x) * std (y)) </pre> <p>For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors. <p><code>cor (</code><var>x</var><code>)</code> is equivalent to <code>cor (</code><var>x</var><code>, </code><var>x</var><code>)</code>. <p>Note that the <code>corrcoef</code> function does the same as <code>cor</code>. 
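<p>To make the correlation formula concrete, the following sketch evaluates it by hand for two vectors rather than calling <code>cor</code>. It assumes the covariance is normalized by N-1, matching the default normalization of <code>std</code>; the variable names are illustrative:
<pre class="example">     x = [1 2 3 4];
     y = [2 4 6 8];                        % y rises linearly with x
     z = [8 6 4 2];                        % z falls linearly with x
     n = numel (x);
     % Sample covariances written out explicitly (N-1 normalization assumed)
     cxy = sum ((x - mean (x)) .* (y - mean (y))) / (n - 1);
     cxz = sum ((x - mean (x)) .* (z - mean (z))) / (n - 1);
     r_up   = cxy / (std (x) * std (y));   % close to  1
     r_down = cxz / (std (x) * std (z));   % close to -1
</pre>
<p>Values near 1 or -1 indicate a nearly exact linear relationship. The hand-written covariance is only meant to illustrate the formula and is not claimed to match the internals of <code>cor</code>.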
</p></blockquote></div> <!-- ./statistics/base/corrcoef.m --> <p><a name="doc_002dcorrcoef"></a> <div class="defun"> — Function File: <b>corrcoef</b> (<var>x, y</var>)<var><a name="index-corrcoef-1835"></a></var><br> <blockquote><p>Compute correlation. <p>If each row of <var>x</var> and <var>y</var> is an observation and each column is a variable, the (<var>i</var>, <var>j</var>)-th entry of <code>corrcoef (</code><var>x</var><code>, </code><var>y</var><code>)</code> is the correlation between the <var>i</var>-th variable in <var>x</var> and the <var>j</var>-th variable in <var>y</var>. <pre class="example"> corrcoef(x,y) = cov(x,y)/(std(x)*std(y)) </pre> <p>If called with one argument, compute <code>corrcoef (</code><var>x</var><code>, </code><var>x</var><code>)</code>. </p></blockquote></div> <!-- ./statistics/base/kurtosis.m --> <p><a name="doc_002dkurtosis"></a> <div class="defun"> — Function File: <b>kurtosis</b> (<var>x, dim</var>)<var><a name="index-kurtosis-1836"></a></var><br> <blockquote><p>If <var>x</var> is a vector of length N, return the kurtosis <pre class="example"> kurtosis (x) = N^(-1) std(x)^(-4) sum ((x - mean(x)).^4) - 3 </pre> <p class="noindent">of <var>x</var>. If <var>x</var> is a matrix, return the kurtosis over the first non-singleton dimension. The optional argument <var>dim</var> can be given to compute the kurtosis along that dimension instead. </p></blockquote></div> <!-- ./statistics/base/skewness.m --> <p><a name="doc_002dskewness"></a> <div class="defun"> — Function File: <b>skewness</b> (<var>x, dim</var>)<var><a name="index-skewness-1837"></a></var><br> <blockquote><p>If <var>x</var> is a vector of length N, return the skewness <pre class="example"> skewness (x) = N^(-1) std(x)^(-3) sum ((x - mean(x)).^3) </pre> <p class="noindent">of <var>x</var>. If <var>x</var> is a matrix, return the skewness along the first non-singleton dimension of the matrix. If the optional <var>dim</var> argument is given, operate along this dimension. 
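<p>The skewness formula can be checked by evaluating it directly, as in this sketch. It deliberately does not call <code>skewness</code> itself, and it uses <code>std</code> with its default N-1 normalization as written in the formula; the data are arbitrary:
<pre class="example">     x = [1 2 3 10];                          % one large value, long right tail
     N = numel (x);
     d = x - mean (x);                        % deviations: [-3 -2 -1 6]
     s = sum (d .^ 3) / (N * std (x) ^ 3);    % positive, since sum (d .^ 3) = 180

     y = [1 2 3 4 5];                         % symmetric about its mean
     e = y - mean (y);                        % deviations: [-2 -1 0 1 2]
     t = sum (e .^ 3) / (numel (y) * std (y) ^ 3);   % cubes cancel, giving 0
</pre>
<p>A positive value indicates a longer right tail, while data that are symmetric about their mean give zero.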
</p></blockquote></div> <!-- ./statistics/base/statistics.m --> <p><a name="doc_002dstatistics"></a> <div class="defun"> — Function File: <b>statistics</b> (<var>x</var>)<var><a name="index-statistics-1838"></a></var><br> <blockquote><p>If <var>x</var> is a matrix, return a matrix with the minimum, first quartile, median, third quartile, maximum, mean, standard deviation, skewness and kurtosis of the columns of <var>x</var> as its columns. <p>If <var>x</var> is a vector, calculate the statistics along the non-singleton dimension. </p></blockquote></div> <!-- ./statistics/base/moment.m --> <p><a name="doc_002dmoment"></a> <div class="defun"> — Function File: <b>moment</b> (<var>x, p, opt, dim</var>)<var><a name="index-moment-1839"></a></var><br> <blockquote><p>If <var>x</var> is a vector, compute the <var>p</var>-th moment of <var>x</var>. <p>If <var>x</var> is a matrix, return the row vector containing the <var>p</var>-th moment of each column. <p>With the optional string <var>opt</var>, the kind of moment to be computed can be specified. If <var>opt</var> contains <code>"c"</code> or <code>"a"</code>, central and/or absolute moments are returned. For example, <pre class="example"> moment (x, 3, "ac") </pre> <p class="noindent">computes the third central absolute moment of <var>x</var>. <p>If the optional argument <var>dim</var> is supplied, work along dimension <var>dim</var>. </p></blockquote></div> </body></html>