TODO for IMDbPY =============== See the code, and search for XXX, FIXME and TODO. NOTE: it's always time to clean the code! <g> [general] * improve the logging facility. * ability to silent warnings/logging. * create mobile versions for smart phones (with GUI). * Write better summary() methods for Movie, Person, Character and Company classes. * Some portions of code are poorly commented. * The documentation is written in my funny Anglo-Bolognese. * a better test-suite is really needed. * Compatibility with Python 2.2 and previous versions is no more assured for every data access system (the imdbpy2sql.py script for sure requires at least Python 2.3). Be sure to keep 2.2 compatibility at least for 'http' and 'mobile', since they are used by mobile devices. * The analyze_title/build_title functions are grown too complex and beyond their initial goals. [searches] * Support advanced query for movie titles and person/character/company names - if possible this should be available in every data access systems. [xml] * bugs of the helpers.parseXML function: * doesn't handle Movie instances as keys of a dictionary; person->episodes, character->episodes, character->quotes. * changes tuples to lists (not a big deal). * person/movieRefs are lost. * uses only BeautifulSoup; maybe it can work with lxml, too. * use the ID type, instead of CDATA, for the 'id' attribute? * keywords (and hence tags) generated by SQL may be slightly different, and must be integrate in the DTD. [Movie objects] * Define invariable names for the sections (the keys you use to access info stored in a Movie object). * Should a movie object automatically build a Movie object, when an 'episode of' dictionary is in the data? * Should isSameTitle() check first the accessSystem and the movieID, and use 'title' only if movieID is None? * For TV series the list of directors/writers returned by 'sql' is a long list with every single episodes listed in the 'notes' attribute (i.e.: the same person is listed more than one time, just with a different note). For 'http' and 'mobile' there's a list with one item for every person, with a long 'notes' listing every episode. It's not easy to split these information since they can contain notes ("written by", "as Aka Name", ...) * The 'laserdisc' information for 'sql' is probabily wrong: I think they merge data from different laserdisc titles. Anyway these data are no more updated by IMDb, and so... * there are links to hollywoodreporter.com that are not gathered in the "external reviews" page. [Person objects] * Define invariable names for the sections (the keys you use to access info stored in a Person object). * Should isSameName() check first the accessSystem and the personID, and use 'name' only if personID is None? * Fetching data from the web ('http' and 'mobile'), the filmography for a given person contains a list named "himself" or "herself" for movies/shows where they were not acting. In 'sql', these movies/shows are listed in the "actor" or "actress" list. Check if this is still true, with the new IMDb's schema. [Character objects] * Define invariable names for the sections (the keys you use to access info stored in a Character object). [Company objects] * Define invariable names for the sections (the keys you use to access info stored in a Company object). [http data access system] * Serious confusion about handling XML/HTML/SGML char references; there are too many fixes for special cases, and a better understanding of how lxml and BeautifulSoup behave is required. * If the access through the proxy fails, is it possible to automatically try without? It doesn't seem easy... * Access to the "my IMDb" functions for registered users would be really cool. * Gather more movies' data: user comments, laserdisc details, trailers, posters, photo gallery, on tv, schedule links, showtimes, message boards. * Gather more people's data: photo gallery. [httpThin data access system] * It should be made _really_ faster than 'http'. [mobile data access system] * General optimization. * Make find() methods case insensitive. [sql data access system] NOTE NOTE NOTE: this is still beta code and I'm not a database guru; moreover I'm short of time and so I will be happy to fix every bug you'll find, but if you're about to write me an email like "ehi, the database access should be faster", "the imdbpy2sql.py script must run with 64 MB of RAM and complete in 2 minutes" or "your database layout sucks: I've an idea for a better structure...", well, consider that _these_ kinds of email will be probably immediately discarded. I _know_ these are important issues, but I've neither the time nor the ability to fix these problems by myself, sorry. Obviously if you want to contribute with patches, new code, new SQL queries and new database structures you're welcome and I will be very grateful for your work. Again: if there's something that bother you, write some code. It's free software, after all. Things to do: * The imdbpy2sql.py script MUST be run on a database with empty tables; unfortunately so far a SQL installation can't be "updated" without recreating the database from scratch. IMDb releases also "diff" files to keep the plain text files updated; it would be wonderful to directly use these diff files to upgrade the SQL database, but I think this is a nearly impossible task. A lot of attempts were made in this direction, always failing. * There are a lot of things to do to improve SQLAlchemy support (especially in terms of performances); see FIXME/TODO/XXX notices in the code. * The pysqlite2.dbapi2.OperationalError exception is raise when SQLite is used with SQLAlchemy (but only if the --sqlite-transactions command line argument is used). * With 0.5 branch of SQLAlchemy, it seems that there are serious problems using SQLite; try switching to SQLObject, as a temporary solution.