<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>4.2. Clustering — scikits.learn v0.6.0 documentation</title> <link rel="stylesheet" href="../_static/nature.css" type="text/css" /> <link rel="stylesheet" href="../_static/pygments.css" type="text/css" /> <script type="text/javascript"> var DOCUMENTATION_OPTIONS = { URL_ROOT: '../', VERSION: '0.6.0', COLLAPSE_INDEX: false, FILE_SUFFIX: '.html', HAS_SOURCE: true }; </script> <script type="text/javascript" src="../_static/jquery.js"></script> <script type="text/javascript" src="../_static/underscore.js"></script> <script type="text/javascript" src="../_static/doctools.js"></script> <link rel="shortcut icon" href="../_static/favicon.ico"/> <link rel="author" title="About these documents" href="../about.html" /> <link rel="top" title="scikits.learn v0.6.0 documentation" href="../index.html" /> <link rel="up" title="4. Unsupervised learning" href="../unsupervised_learning.html" /> <link rel="next" title="4.3. Decomposing signals in components (matrix factorization problems)" href="decompositions.html" /> <link rel="prev" title="4.1. 
Gaussian mixture models" href="mixture.html" /> </head> <body> <div class="header-wrapper"> <div class="header"> <p class="logo"><a href="../index.html"> <img src="../_static/scikit-learn-logo-small.png" alt="Logo"/> </a> </p><div class="navbar"> <ul> <li><a href="../install.html">Download</a></li> <li><a href="../support.html">Support</a></li> <li><a href="../user_guide.html">User Guide</a></li> <li><a href="../auto_examples/index.html">Examples</a></li> <li><a href="../developers/index.html">Development</a></li> </ul> <div class="search_form"> <div id="cse" style="width: 100%;"></div> <script src="http://www.google.com/jsapi" type="text/javascript"></script> <script type="text/javascript"> google.load('search', '1', {language : 'en'}); google.setOnLoadCallback(function() { var customSearchControl = new google.search.CustomSearchControl('016639176250731907682:tjtqbvtvij0'); customSearchControl.setResultSetSize(google.search.Search.FILTERED_CSE_RESULTSET); var options = new google.search.DrawOptions(); options.setAutoComplete(true); customSearchControl.draw('cse', options); }, true); </script> </div> </div> <!-- end navbar --></div> </div> <div class="content-wrapper"> <!-- <div id="blue_tile"></div> --> <div class="sphinxsidebar"> <div class="rel"> <a href="mixture.html" title="4.1. Gaussian mixture models" accesskey="P">previous</a> | <a href="decompositions.html" title="4.3. Decomposing signals in components (matrix factorization problems)" accesskey="N">next</a> | <a href="../genindex.html" title="General Index" accesskey="I">index</a> </div> <h3>Contents</h3> <ul> <li><a class="reference internal" href="#">4.2. Clustering</a><ul> <li><a class="reference internal" href="#affinity-propagation">4.2.1. Affinity propagation</a></li> <li><a class="reference internal" href="#mean-shift">4.2.2. Mean Shift</a></li> <li><a class="reference internal" href="#k-means">4.2.3. K-means</a></li> <li><a class="reference internal" href="#spectral-clustering">4.2.4. 
Spectral clustering</a></li> </ul> </li> </ul> </div> <div class="content"> <div class="documentwrapper"> <div class="bodywrapper"> <div class="body"> <div class="section" id="clustering"> <span id="id1"></span><h1>4.2. Clustering<a class="headerlink" href="#clustering" title="Permalink to this headline">¶</a></h1> <p><a class="reference external" href="http://en.wikipedia.org/wiki/Cluster_analysis">Clustering</a> of unlabeled data can be performed with the module <cite>scikits.learn.cluster</cite>.</p> <p>Each clustering algorithm comes in two variants: a class that implements the <cite>fit</cite> method to learn the clusters from training data, and a function that, given training data, returns an array of integer labels corresponding to the different clusters. For the class, the labels of the training data are stored in the <cite>labels_</cite> attribute.</p> <p>This section only describes the algorithms themselves. For usage examples, click on a class name to read its reference documentation.</p> <div class="section" id="affinity-propagation"> <h2>4.2.1. Affinity propagation<a class="headerlink" href="#affinity-propagation" title="Permalink to this headline">¶</a></h2> <p><tt class="xref py py-class docutils literal"><span class="pre">AffinityPropagation</span></tt> clusters data by diffusion in the similarity matrix. This algorithm automatically determines the number of clusters. 
It has difficulty scaling to thousands of samples.</p> <div class="figure align-center"> <a class="reference external image-reference" href="../auto_examples/cluster/plot_affinity_propagation.html"><img alt="auto_examples/cluster/images/plot_affinity_propagation.png" src="auto_examples/cluster/images/plot_affinity_propagation.png" /></a> </div> <div class="topic"> <p class="topic-title first">Examples:</p> <ul class="simple"> <li><a class="reference internal" href="../auto_examples/cluster/plot_affinity_propagation.html#example-cluster-plot-affinity-propagation-py"><em>Demo of affinity propagation clustering algorithm</em></a>: Affinity Propagation on a synthetic 2D dataset with 3 classes.</li> <li><a class="reference internal" href="../auto_examples/applications/stock_market.html#example-applications-stock-market-py"><em>Finding structure in the stock market</em></a>: Affinity Propagation on financial time series to find groups of companies.</li> </ul> </div> </div> <div class="section" id="mean-shift"> <h2>4.2.2. Mean Shift<a class="headerlink" href="#mean-shift" title="Permalink to this headline">¶</a></h2> <p><tt class="xref py py-class docutils literal"><span class="pre">MeanShift</span></tt> clusters data by estimating <em>blobs</em> in a smooth density of points. This algorithm automatically determines the number of clusters. 
It has difficulty scaling to thousands of samples.</p> <div class="figure align-center"> <a class="reference external image-reference" href="../auto_examples/cluster/plot_mean_shift.html"><img alt="auto_examples/cluster/images/plot_mean_shift.png" src="auto_examples/cluster/images/plot_mean_shift.png" /></a> </div> <div class="topic"> <p class="topic-title first">Examples:</p> <ul class="simple"> <li><a class="reference internal" href="../auto_examples/cluster/plot_mean_shift.html#example-cluster-plot-mean-shift-py"><em>A demo of the mean-shift clustering algorithm</em></a>: Mean Shift clustering on a synthetic 2D dataset with 3 classes.</li> </ul> </div> </div> <div class="section" id="k-means"> <h2>4.2.3. K-means<a class="headerlink" href="#k-means" title="Permalink to this headline">¶</a></h2> <p>The <tt class="xref py py-class docutils literal"><span class="pre">KMeans</span></tt> algorithm clusters data by trying to separate samples into n groups of equal variance, minimizing a criterion known as the ‘inertia’ of the groups (the within-cluster sum of squared distances to the cluster centers). This algorithm requires the number of clusters to be specified. It scales well to a large number of samples, but its results may depend on the initialisation.</p> </div> <div class="section" id="spectral-clustering"> <h2>4.2.4. Spectral clustering<a class="headerlink" href="#spectral-clustering" title="Permalink to this headline">¶</a></h2> <p><tt class="xref py py-class docutils literal"><span class="pre">SpectralClustering</span></tt> computes a low-dimensional embedding of the affinity matrix between samples, followed by a KMeans in the low-dimensional space. It is especially efficient if the affinity matrix is sparse and the <a class="reference external" href="http://code.google.com/p/pyamg/">pyamg</a> module is installed. SpectralClustering requires the number of clusters to be specified. 
It works well for a small number of clusters but is not advised when using many clusters.</p> <p>For two clusters, it solves a convex relaxation of the <a class="reference external" href="http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf">normalised cuts</a> problem on the similarity graph: cutting the graph in two so that the weight of the edges cut is small compared to the weight of the edges inside each cluster. This criterion is especially interesting when working on images: graph vertices are pixels, and edges of the similarity graph are a function of the gradient of the image.</p> <div class="figure align-center"> <a class="reference external image-reference" href="../auto_examples/cluster/plot_segmentation_toy.html"><img alt="auto_examples/cluster/images/plot_segmentation_toy.png" src="auto_examples/cluster/images/plot_segmentation_toy.png" /></a> </div> <div class="topic"> <p class="topic-title first">Examples:</p> <ul class="simple"> <li><a class="reference internal" href="../auto_examples/cluster/plot_lena_segmentation.html#example-cluster-plot-lena-segmentation-py"><em>Segmenting the picture of Lena in regions</em></a>: Spectral clustering to split the image of Lena into regions.</li> <li><a class="reference internal" href="../auto_examples/cluster/plot_segmentation_toy.html#example-cluster-plot-segmentation-toy-py"><em>Spectral clustering for image segmentation</em></a>: Segmenting objects from a noisy background using spectral clustering.</li> </ul> </div> </div> </div> </div> </div> </div> <div class="clearer"></div> </div> </div> <div class="footer"> <p style="text-align: center">This documentation is relative to scikits.learn version 0.6.0<p> © 2010, scikits.learn developers (BSD License). Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.0.5. Design by <a href="http://webylimonada.com">Web y Limonada</a>. </div> </body> </html>