<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>Roget — NetworkX 1.8.1 documentation</title> <link rel="stylesheet" href="../../_static/networkx.css" type="text/css" /> <link rel="stylesheet" href="../../_static/pygments.css" type="text/css" /> <script type="text/javascript"> var DOCUMENTATION_OPTIONS = { URL_ROOT: '../../', VERSION: '1.8.1', COLLAPSE_INDEX: false, FILE_SUFFIX: '.html', HAS_SOURCE: false }; </script> <script type="text/javascript" src="../../_static/jquery.js"></script> <script type="text/javascript" src="../../_static/underscore.js"></script> <script type="text/javascript" src="../../_static/doctools.js"></script> <link rel="search" type="application/opensearchdescription+xml" title="Search within NetworkX 1.8.1 documentation" href="../../_static/opensearch.xml"/> <link rel="top" title="NetworkX 1.8.1 documentation" href="../../index.html" /> <link rel="up" title="Graph" href="index.html" /> <link rel="next" title="Unix Email" href="unix_email.html" /> <link rel="prev" title="Napoleon Russian Campaign" href="napoleon_russian_campaign.html" /> </head> <body> <div style="color: black;background-color: white; font-size: 3.2em; text-align: left; padding: 15px 10px 10px 15px"> NetworkX </div> <div class="related"> <h3>Navigation</h3> <ul> <li class="right" style="margin-right: 10px"> <a href="../../genindex.html" title="General Index" accesskey="I">index</a></li> <li class="right" > <a href="../../py-modindex.html" title="Python Module Index" >modules</a> |</li> <li class="right" > <a href="unix_email.html" title="Unix Email" accesskey="N">next</a> |</li> <li class="right" > <a href="napoleon_russian_campaign.html" title="Napoleon Russian Campaign" accesskey="P">previous</a> |</li> <li><a href="http://networkx.github.com/">NetworkX Home </a> | </li> 
<li><a href="http://networkx.github.com/documentation.html">Documentation </a>| </li> <li><a href="http://networkx.github.com/download.html">Download </a> | </li> <li><a href="http://github.com/networkx">Developer (Github)</a></li> <li><a href="../index.html" >NetworkX Examples</a> »</li> <li><a href="index.html" accesskey="U">Graph</a> »</li> </ul> </div> <div class="sphinxsidebar"> <div class="sphinxsidebarwrapper"> <h4>Previous topic</h4> <p class="topless"><a href="napoleon_russian_campaign.html" title="previous chapter">Napoleon Russian Campaign</a></p> <h4>Next topic</h4> <p class="topless"><a href="unix_email.html" title="next chapter">Unix Email</a></p> <div id="searchbox" style="display: none"> <h3>Quick search</h3> <form class="search" action="../../search.html" method="get"> <input type="text" name="q" /> <input type="submit" value="Go" /> <input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="area" value="default" /> </form> <p class="searchtip" style="font-size: 90%"> Enter search terms or a module, class or function name. </p> </div> <script type="text/javascript">$('#searchbox').show(0);</script> </div> </div> <div class="document"> <div class="documentwrapper"> <div class="bodywrapper"> <div class="body"> <div class="section" id="roget"> <span id="graph-roget"></span><h1>Roget<a class="headerlink" href="#roget" title="Permalink to this headline">¶</a></h1> <p>[<a class="reference external" href="../../_static/examples/roget.py">source code</a>]</p> <div class="highlight-python"><div class="highlight"><pre><span class="c">#!/usr/bin/env python</span> <span class="sd">"""</span> <span class="sd">Build a directed graph of 1022 categories and</span> <span class="sd">5075 cross-references as defined in the 1879 version of Roget's Thesaurus</span> <span class="sd">contained in the datafile roget_dat.txt. 
This example is described in</span> <span class="sd">Section 1.2 in Knuth's book [1,2].</span> <span class="sd">Note that one of the 5075 cross references is a self loop yet</span> <span class="sd">it is included in the graph built here because</span> <span class="sd">the standard networkx DiGraph class allows self loops.</span> <span class="sd">(cf. 400pungency:400 401 403 405).</span> <span class="sd">References.</span> <span class="sd">----------</span> <span class="sd">[1] Donald E. Knuth,</span> <span class="sd"> "The Stanford GraphBase: A Platform for Combinatorial Computing",</span> <span class="sd"> ACM Press, New York, 1993.</span> <span class="sd">[2] http://www-cs-faculty.stanford.edu/~knuth/sgb.html</span> <span class="sd">"""</span> <span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span> <span class="n">__author__</span> <span class="o">=</span> <span class="s">"""Brendt Wohlberg</span><span class="se">\n</span><span class="s">Aric Hagberg (hagberg@lanl.gov)"""</span> <span class="n">__date__</span> <span class="o">=</span> <span class="s">"$Date: 2005-04-01 07:56:22 -0700 (Fri, 01 Apr 2005) $"</span> <span class="n">__credits__</span> <span class="o">=</span> <span class="s">""""""</span> <span class="n">__revision__</span> <span class="o">=</span> <span class="s">""</span> <span class="c"># Copyright (C) 2004 by</span> <span class="c"># Aric Hagberg <hagberg@lanl.gov></span> <span class="c"># Dan Schult <dschult@colgate.edu></span> <span class="c"># Pieter Swart <swart@lanl.gov></span> <span class="c"># All rights reserved.</span> <span class="c"># BSD license.</span> <span class="kn">from</span> <span class="nn">networkx</span> <span class="kn">import</span> <span class="o">*</span> <span class="kn">import</span> <span class="nn">re</span> <span class="kn">import</span> <span class="nn">sys</span> <span class="k">def</span> <span class="nf">roget_graph</span><span 
class="p">():</span> <span class="sd">""" Return the thesaurus graph from the roget.dat example in</span> <span class="sd"> the Stanford GraphBase.</span> <span class="sd"> """</span> <span class="c"># open file roget_dat.txt.gz (or roget_dat.txt)</span> <span class="kn">import</span> <span class="nn">gzip</span> <span class="n">fh</span><span class="o">=</span><span class="n">gzip</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s">'roget_dat.txt.gz'</span><span class="p">,</span><span class="s">'r'</span><span class="p">)</span> <span class="n">G</span><span class="o">=</span><span class="n">DiGraph</span><span class="p">()</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">fh</span><span class="o">.</span><span class="n">readlines</span><span class="p">():</span> <span class="n">line</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span> <span class="k">if</span> <span class="n">line</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s">"*"</span><span class="p">):</span> <span class="c"># skip comments</span> <span class="k">continue</span> <span class="k">if</span> <span class="n">line</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s">" "</span><span class="p">):</span> <span class="c"># this is a continuation line, append</span> <span class="n">line</span><span class="o">=</span><span class="n">oldline</span><span class="o">+</span><span class="n">line</span> <span class="k">if</span> <span class="n">line</span><span class="o">.</span><span class="n">endswith</span><span class="p">(</span><span class="s">"</span><span class="se">\\\n</span><span class="s">"</span><span class="p">):</span> <span class="c"># continuation line, buffer, goto next</span> <span 
class="n">oldline</span><span class="o">=</span><span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">(</span><span class="s">"</span><span class="se">\\\n</span><span class="s">"</span><span class="p">)</span> <span class="k">continue</span> <span class="p">(</span><span class="n">headname</span><span class="p">,</span><span class="n">tails</span><span class="p">)</span><span class="o">=</span><span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">":"</span><span class="p">)</span> <span class="c"># head</span> <span class="n">numfind</span><span class="o">=</span><span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s">r"^\d+"</span><span class="p">)</span> <span class="c"># re to find the number of this word</span> <span class="n">head</span><span class="o">=</span><span class="n">numfind</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="n">headname</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="c"># get the number</span> <span class="n">G</span><span class="o">.</span><span class="n">add_node</span><span class="p">(</span><span class="n">head</span><span class="p">)</span> <span class="k">for</span> <span class="n">tail</span> <span class="ow">in</span> <span class="n">tails</span><span class="o">.</span><span class="n">split</span><span class="p">():</span> <span class="k">if</span> <span class="n">head</span><span class="o">==</span><span class="n">tail</span><span class="p">:</span> <span class="k">print</span><span class="p">(</span><span class="s">"found self loop"</span><span class="p">,</span><span class="n">head</span><span class="p">,</span><span class="n">tail</span><span class="p">,</span> <span class="nb">file</span><span class="o">=</span><span class="n">sys</span><span 
class="o">.</span><span class="n">stderr</span><span class="p">)</span> <span class="n">G</span><span class="o">.</span><span class="n">add_edge</span><span class="p">(</span><span class="n">head</span><span class="p">,</span><span class="n">tail</span><span class="p">)</span> <span class="k">return</span> <span class="n">G</span> <span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">'__main__'</span><span class="p">:</span> <span class="kn">from</span> <span class="nn">networkx</span> <span class="kn">import</span> <span class="o">*</span> <span class="n">G</span><span class="o">=</span><span class="n">roget_graph</span><span class="p">()</span> <span class="k">print</span><span class="p">(</span><span class="s">"Loaded roget_dat.txt containing 1022 categories."</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="s">"digraph has </span><span class="si">%d</span><span class="s"> nodes with </span><span class="si">%d</span><span class="s"> edges"</span>\ <span class="o">%</span><span class="p">(</span><span class="n">number_of_nodes</span><span class="p">(</span><span class="n">G</span><span class="p">),</span><span class="n">number_of_edges</span><span class="p">(</span><span class="n">G</span><span class="p">)))</span> <span class="n">UG</span><span class="o">=</span><span class="n">G</span><span class="o">.</span><span class="n">to_undirected</span><span class="p">()</span> <span class="k">print</span><span class="p">(</span><span class="n">number_connected_components</span><span class="p">(</span><span class="n">UG</span><span class="p">),</span><span class="s">"connected components"</span><span class="p">)</span> </pre></div> </div> </div> </div> </div> </div> <div class="clearer"></div> </div> <div class="related"> <h3>Navigation</h3> <ul> <li class="right" style="margin-right: 10px"> <a href="../../genindex.html" title="General Index" >index</a></li> <li class="right" > 
<a href="../../py-modindex.html" title="Python Module Index" >modules</a> |</li> <li class="right" > <a href="unix_email.html" title="Unix Email" >next</a> |</li> <li class="right" > <a href="napoleon_russian_campaign.html" title="Napoleon Russian Campaign" >previous</a> |</li> <li><a href="http://networkx.github.com/">NetworkX Home </a> | </li> <li><a href="http://networkx.github.com/documentation.html">Documentation </a>| </li> <li><a href="http://networkx.github.com/download.html">Download </a> | </li> <li><a href="http://github.com/networkx">Developer (Github)</a></li> <li><a href="../index.html" >NetworkX Examples</a> »</li> <li><a href="index.html" >Graph</a> »</li> </ul> </div> <div class="footer"> © Copyright 2013, NetworkX Developers. Last updated on Oct 23, 2013. Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.1.3. </div> </body> </html>