Sophie

Sophie

distrib > Fedora > 18 > i386 > by-pkgid > 125a65453a9c15180d517fd989836236 > files > 142

python-imdb-4.9-1.fc18.i686.rpm

  IMDbPY FAQS
  ===========

Q1: Since version 3.7, parsing the data from the IMDb web site is slow,
    sloow, slooow!  Why?

A1: if python-lxml is not installed in your system, IMDbPY uses the
    pure-python BeautifulSoup module as a fall-back; BeautifulSoup does
    an impressive job, but it can't be as fast as a parser written in C.
    You can install python-lxml following the instructions in the
    README.newparsers file.


Q2: why the movieID (and other IDs) used in the 'sql' database are not
    the same used on the IMDb.com site?

A2: first, a bit of nomenclature: we'll call "movieID" (or things like
    "personID", for instance of the Person class) a unique identifier used
    by IMDbPY to manage a single movie (or other kinds of object).
    We'll call "imdbID" a unique identifier used, for the same kind
    of data, by the IMDb.com site (i.e.: the 7-digit number in tt0094226,
    as seen in the URL for "The Untouchables").

    Using IMDbPY to access the web ('http' and 'mobile' data access
    systems), movieIDs and imdbIDs are the same thing - beware that
    in this case a movieID is a string, with the leading zeroes.

    Unfortunately, populating a sql database with data from the plain
    text data files, we don't have access to imdbIDs - since they are
    not distributed at all - and so we have to made them by ourselves
    (they are the 'id' column in tables like 'title' or 'name').
    This mean that these values are valid only for your current database:
    if you update it with a newer set of plain text data files, these IDs
    will surely change (and, by the way, they are integers).
    It's also obvious, now, that you can't exchange IDs between the
    'http' (or 'mobile') data access system and 'sql', and in the same
    way you can't use imdbIDs with your local database or vice-versa.


Q3: using a sql database, what's the imdb_id (or something like that)
    column in tables like 'title', 'name' and so on?

A3: it's internally used by IMDbPY to remember the imdbID (the one
    used by the web site - accessing the database you'll use the numeric
    value of the 'id' column, as movieID) of a movie, once it stumbled
    upon.  This way, if IMDbPY is asked again about the imdbID of
    a movie (or person, or ...), it doesn't have to contact again to
    the web site.  Notice that you have to access the sql database using
    a user with write permission, to update it.

    As a bonus, when possible, the values of these imdbIDs are saved
    between updates of the sql database (using the imdbpy2sql.py script).
    Beware that it's tricky and not always possible, but the script does
    its best to succeed.


Q4: but what if I really need the imdbIDs, to use my database?

A4: no, you don't.  Search for a title, get its information.  Be happy!


Q5: I have a great idea: write a script to fetch all the imdbID from the
    web site!  Can't you do it?

A5: yeah, I can.  But I won't. :-)
    It would be somewhat easy to map every title on the web to its
    imdbID, but there are still a lot of problems.
    First of all, every user will end up doing it for its own copy
    of the plain text data files (and this will make the imdbpy2sql.py
    script painfully slow and prone to all sort of problems).
    Moreover, the imdbIDs are unique and never reused, true, but movie
    title _do_ change: to fix typos, override working titles, to cope
    with a new movie with the same title release in the same year (not
    to mention cancelled or postponed movies).

    Besides that, we'd have to do the same for persons, characters and
    companies.  Believe me: it doesn't make sense.
    Work on your local database using your movieIDs (or even better:
    don't mind about movieIDs and think in terms of searches and Movie
    instances!) and retrieve the imdbID only in the rare circumstances
    when you really need them (see the next FAQ).
    Repeat with me: I DON'T NEED ALL THE imdbIDs. :-)


Q6: using a sql database, how can I convert a movieID (whose value
    is valid only locally) to an imdbID (the ID used by the imdb.com site)?

A6: various functions can be used to convert a movieID (or personID or
    other IDs) to the imdbID used by the web site.
    Example of code:

      from imdb import IMDb
      ia = IMDb('sql', uri=URI_TO_YOUR_SQL_DATABASE)
      movie = ia.search_movie('The Untouchables')[0] # a Movie instance.
      print 'The movieID for The Untouchables:', movie.movieID
      print 'The imdbID used by the site:', ia.get_imdbMovieID(movie.movieID)
      print 'Same ID, smarter function:', ia.get_imdbID(movie)

    It goes without saying that get_imdbMovieID has some sibling
    methods: get_imdbPersonID, get_imdbCompanyID and get_imdbCharacterID.
    Also notice that the get_imdbID method is smarter, and takes any kind
    of instance (the other functions need a movieID, personID, ...)

    Another method that will try to retrieve the imdbID is get_imdbURL,
    which works like get_imdbID but returns an URL.

    In case of problems, these methods will return None.


Q7: I have a movie title (in the format used by the plain text data files)
    or other kind of data (like a person/character/company name) and I want
    to get its imdbID.  How can I do?

A7: the safest thing, is probably to do a normal search on IMDb (using the
    'http' or 'mobile' data access system of IMDbPY) and see if the first
    item is the correct one.
    You can also try the 'title2imdbID' method (and similar) of the IMDb
    instance (no matter if you're using 'http', 'mobile' or 'sql'), but
    expect some failures - it returns None in this case.


Q8: I have an URL (of a movie, person or something else); how can I
    get a Movie/Person/... instance?

A8: import the imdb.helpers module and use the get_byURL function.


Q9: I'm writing an interface based on IMDbPY and I have problems handling
    encoding, chars conversions, replacements of references and so on.

A9: see the many functions in the imdb.helpers module.