  IMDbPY'S NEW HTML PARSERS
  =========================

Since version 3.7, IMDbPY has moved its parsers for the HTML of
the IMDb website from a set of subclasses of SGMLParser (these
were finite-state machines, since SGMLParser is an event-driven,
SAX-style parser) to a set of parsers based on the libxml2 library
or on the BeautifulSoup module (and so using a DOM/XPath-based approach).
The idea and the implementation of these new parsers are mostly the
work of H. Turgut Uyar; they can lead to parsers that are shorter,
easier to write and maybe even faster.

The old set of parsers was removed in IMDbPY 4.0.


  LIBXML AND/OR BEAUTIFULSOUP
  ===========================

To use "lxml", you need the libxml2 library installed (along with
its python-lxml binding).  If it's not present on your system, IMDbPY
falls back to BeautifulSoup, which is distributed alongside IMDbPY, so
you don't need to install anything.
However, beware that, being pure Python, BeautifulSoup is much
slower than lxml, so install lxml if you can.

If for some reason you can't get lxml and BeautifulSoup is too
slow for your needs, consider the use of the 'mobile' data
access system.
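
The decision described above can be sketched in a few lines.  This is
only an illustration, not part of IMDbPY: the function names and the
'bsoup_fast_enough' flag are hypothetical, and IMDbPY performs the
lxml/BeautifulSoup fallback on its own.

```python
def lxml_available():
    """Return True if the lxml package can be imported on this system."""
    try:
        import lxml  # imported only to test availability
    except ImportError:
        return False
    return True


def suggested_access_system(bsoup_fast_enough=True):
    """Suggest an access system name: 'http' or 'mobile'.

    bsoup_fast_enough is a hypothetical flag: set it to False if the
    bundled BeautifulSoup parser is too slow for your needs.
    """
    if lxml_available() or bsoup_fast_enough:
        return 'http'
    # Neither lxml nor an acceptable BeautifulSoup: use 'mobile'.
    return 'mobile'


print(suggested_access_system())
```

The returned name is meant to be passed as the first argument of
IMDb(), in the same way 'http' is used elsewhere in this document.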


  GETTING LIBXML, LIBXSLT AND PYTHON-LXML
  =======================================

If you're in a Microsoft Windows environment, all you need is
python-lxml (it includes all the required libraries), which can
be downloaded from here:
  http://pypi.python.org/pypi/lxml/

Otherwise, if you're in a Unix environment, you can download libxml2
and libxslt from here (you need both, to install python-lxml):
  http://xmlsoft.org/downloads.html
  http://xmlsoft.org/XSLT/downloads.html

The python-lxml package can be found here:
  http://codespeak.net/lxml/index.html#download

Obviously you should first check if these libraries are already
packaged for your distribution/operating system.

IMDbPY was tested with libxml2 2.7.1, libxslt 1.1.24 and
python-lxml 2.1.1.  Older versions may work, too; if you have
problems, submit a bug report specifying your versions.
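
To collect the version numbers for such a bug report, something like
the following sketch may help.  It relies on the version tuples that
lxml exposes in its etree module; the helper name 'parser_versions'
is made up for this example.

```python
def parser_versions():
    """Return lxml, libxml2 and libxslt versions, or None without lxml."""
    try:
        from lxml import etree
    except ImportError:
        return None

    def dotted(version_tuple):
        # lxml reports versions as tuples of integers, e.g. (2, 7, 1).
        return '.'.join(str(part) for part in version_tuple)

    return {
        'lxml': dotted(etree.LXML_VERSION),
        'libxml2': dotted(etree.LIBXML_VERSION),
        'libxslt': dotted(etree.LIBXSLT_VERSION),
    }


print(parser_versions())
```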

You can also get the latest version of BeautifulSoup from here:
  http://www.crummy.com/software/BeautifulSoup/
but since it's distributed with IMDbPY, you don't need it.  To use
a newer version you would have to overwrite the '_bsoup.py' file in
the imdb/parser/http directory, which is probably not a good idea,
since newer versions of BeautifulSoup behave in new and unexpected ways.


  USING THE OLD PARSERS
  =====================

The old set of parsers was removed in IMDbPY 4.0.


  FORCING LXML OR BEAUTIFULSOUP
  =============================

By default, IMDbPY uses python-lxml, if it's installed.
You can force the use of a given parser by passing the 'useModule'
parameter.  Valid values are 'lxml' and 'BeautifulSoup'.  E.g.:
    from imdb import IMDb
    ia = IMDb('http', useModule='BeautifulSoup')
    ...

useModule can also be a list/tuple of strings, to specify the
preferred order.
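
As a rough illustration of what a "preferred order" means, the sketch
below picks the first name in the list that is actually available.
It is not IMDbPY's own code: 'resolve_parser' and its 'available'
argument are hypothetical names used only to show the idea.

```python
def resolve_parser(preferred, available):
    """Return the first preferred parser name found in 'available'.

    preferred may be a single string or a list/tuple of strings,
    mirroring the values accepted by 'useModule'.
    """
    if isinstance(preferred, str):
        preferred = [preferred]
    for name in preferred:
        if name in available:
            return name
    # None of the requested parsers can be used.
    return None


# lxml is tried first; BeautifulSoup is used if lxml is missing.
print(resolve_parser(['lxml', 'BeautifulSoup'], {'BeautifulSoup'}))

# With IMDbPY itself, the same preference would be written as:
#     ia = IMDb('http', useModule=['lxml', 'BeautifulSoup'])
```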