<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>SQLObject and Database Programming in Python</title> <meta name="version" content="S5 1.0" /> <link rel="stylesheet" href="ui/slides.css" type="text/css" media="projection" id="slideProj" /> <link rel="stylesheet" href="ui/opera.css" type="text/css" media="projection" id="operaFix" /> <link rel="stylesheet" href="ui/print.css" type="text/css" media="print" id="slidePrint" /> <link rel="stylesheet" href="ui/custom.css" type="text/css"/> <script src="ui/slides.js" type="text/javascript"></script> </head> <body> <div class="layout"> <div id="currentSlide"></div> <div id="header"></div> <div id="footer"> <h1>ChiPy, 11 November 2004</h1> <h2>SQLObject and Database Programming in Python</h2> <div id="controls"></div> </div> </div> <div class="presentation"> <div class="slide"> <h1>SQLObject and Database Programming in Python</h1> <h3>Ian Bicking</h3> <h4>Imaginary Landscape</h4> </div> <!-- ======================================== --> <div class="slide"> <h1>The Ongoing Example</h1> <h2>Address book:</h2> <p> <ul> <li><code>person</code>: <ul> <li><code>first_name</code></li> <li><code>last_name</code></li> <li><code>email</code> (many emails for one person)</li> </ul></li> <li><code>email</code>: <ul> <li><code>type</code></li> <li><code>address</code></li> </ul></li> </ul> </p> </div> <!-- ======================================== --> <div class="slide"> <h1>The DB API</h1> <p>Database programming in Python is based on the <b><a href="http://www.python.org/peps/pep-0249.html">DB-API</a></b>. Some supported databases: <ul> <li>MySQL</li> <li>PostgreSQL (multiple drivers)</li> <li>SQLite</li> <li>Firebird (Interbase)</li> <li>MAXDB (SAP DB)</li> <li>Oracle, Sybase, MS SQL Server</li> </p> </div> <!-- ======================================== --> <div class="slide"> <h1>DB API Example</h1> <p> Example: <pre> import MySQLdb conn = MySQLdb.connect( db=<span class="string">'test'</span>, username=<span class="string">'ianb'</span>) cur = conn.cursor() cur.execute( <span class="sql">"SELECT id, first_name, last_name FROM person"</span>) result = cur.fetchall() for id, first_name, last_name in result: print <span class="string">"%2i %s, %s"</span> % (id, last_name, first_name) </pre> </p> </div> <!-- ======================================== --> <div class="slide"> <h1>DB API</h1> <ul> <li> Import the database module (<code>MySQLdb</code>, <code>psycopg</code>, <code>sqlite</code>, etc) </li> <li> Use <code>module.connect(...)</code> to create a connection. </li> <li> Use <code>connection.cursor()</code> to get a cursor. Cursors do all the work. </li> <li> Use <code>cursor.execute(sql_query)</code> to run something. </li> <li> Use <code>cursor.fetchall()</code> to get results. </li> </div> <!-- ======================================== --> <div class="slide"> <h1>DB API problems</h1> <ol> <li> Connections and cursors are tedious. </li> <li> The databases don't all work the same; multiple SQL dialects. </li> <li> Lots of time spent writing SQL. </li> <li> Database updates can effect all the SQL you've written. </li> <li> Unless you make abstractions... </li> </div> <!-- ======================================== --> <div class="slide"> <h1>Abstractions</h1> <p> <ul> <li> Every significant database application has a database abstraction layer. </li> <li> You write one whether you mean to or not. </li> </p> </div> <!-- ======================================== --> <div class="slide"> <h1>Object-Relational Mappers</h1> <p> The table: <table border=1> <tr><th colspan=3>person</th></tr> <tr> <th>id</th> <th>first_name</th> <th>last_name</th> </tr> <tr> <td>1</td> <td>John</td> <td>Doe</td> </tr> <tr> <td>2</td> <td>Tom</td> <td>Jones</td> </tr> <tr> <td colspan=3 align=center>...</td> </tr> </table> <p> </div> <!-- ======================================== --> <div class="slide"> <h1>ORMs: Classes</h1> <p> The class: <table width="100%"> <tr><td> <table border=1> <tr class="code-accent"><th colspan=3>person</th></tr> <tr class="code-accent"> <th>id</th> <th>first_name</th> <th>last_name</th> </tr> <tr> <td>1</td> <td>John</td> <td>Doe</td> </tr> <tr> <td>2</td> <td>Tom</td> <td>Jones</td> </tr> <tr> <td colspan=3 align=center>...</td> </tr> </table> </td> <td> <pre> class Person(SQLObject): <span class="comment"># id is implicit</span> first_name = StringCol() last_name = StringCol() </pre> </td></tr></table> </div> <!-- ======================================== --> <div class="slide"> <h1>ORMs: Instances</h1> <p> An instance: <table width="100%"> <tr><td> <table border=1> <tr><th colspan=3>person</th></tr> <tr> <th>id</th> <th>first_name</th> <th>last_name</th> </tr> <tr class="code-accent"> <td>1</td> <td>John</td> <td>Doe</td> </tr> <tr> <td>2</td> <td>Tom</td> <td>Jones</td> </tr> <tr> <td colspan=3 align=center>...</td> </tr> </table> </td><td> <pre> <span class="prompt">>>></span> john = Person.get(1) <span class="prompt">>>></span> john.first_name <span class="output">'John'</span> </pre> </td></tr></table> </p> </div> <!-- ======================================== --> <div class="slide"> <h1>ORM: summary</h1> <ul> <li>Every table is a class</li> <li>Every row is an instance</li> <li>Columns become attributes</li> </ul> </div> <!-- ======================================== --> <div class="slide"> <h1>Usage</h1> <pre> __connection__ = <span class="string">'postgres://pgsql@localhost/test'</span> class Person(SQLObject): first_name = StringCol() last_name = StringCol() emails = MultipleJoin('Email') </pre> </div> <!-- ======================================== --> <div class="slide"> <h1>Usage (classes)</h1> <pre> class Email(SQLObject): person = ForeignKey(<span class="string">'Person'</span>) type = EnumCol(['home', 'work']) address = StringCol() </pre> </div> <!-- ======================================== --> <div class="slide"> <h1>Usage (creating tables)</h1> <pre> def createTables(): for table in (Person, Email): table.createTable(ifNotExists=True) </pre> </div> <!-- ======================================== --> <div class="slide"> <h1>Usage (instances)</h1> Using your classes: <pre> <span class="prompt">>>></span> john = Person(first_name=<span class="string">'John'</span>, last_name=<span class="string">'Doe'</span>) <span class="prompt">>>></span> email = Email(person=john, type=<span class="string">'home'</span>, <span class="prompt">...</span> address=<span class="string">'john@work.com'</span>) <span class="prompt">>>></span> john.emails <span class="output">[]</span> </pre> </div> <!-- ======================================== --> <div class="slide"> <h1>Usage (instances)</h1> <pre> <span class="prompt">>>></span> tom = Person(first_name=<span class="string">'Tom'</span>, last_name=<span class="string">'Jones'</span>) <span class="prompt">>>></span> tom is Person.get(tom.id) <span class="output">True</span> <span class="prompt">>>></span> list(Person.selectBy(first_name=<span class="string">'John'</span>)) <span class="output">[]</span> </pre> </div> <!-- ======================================== --> <div class="slide"> <h1>Defining Classes</h1> <ul> <li> The class name matches the table name </li> <li> The attributes match the column names </li> <li> (kind of...) </li> </div> <!-- ======================================== --> <div class="slide"> <h1>Defining Classes...</h1> <p> You can add extra methods: <pre> class Person(SQLObject): ... def _get_name(self): return self.first_name + ' ' + self.last_name </pre> <pre> <span class="prompt">>>></span> tom.name <span class="output">'Tom Jones'</span> </pre> </div> <!-- ======================================== --> <div class="slide"> <h1>Automatic properties</h1> <ul> <li> <code>_get_attr()</code> methods are called whenever the <code>obj.attr</code> attribute is called </li> <li> <code>_set_attr()</code> methods are called whenever the <code>obj.attr = value</code> statement is run </li> <li> <code>_set_attr()</code> is optional </li> </ul> </div> <!-- ======================================== --> <div class="slide"> <h1>Defining classes...</h1> <p> You can override columns: <pre> class Person(SQLObject): ... last_name_searchable = StringCol() def _set_last_name(self, value): self._SO_set_last_name(self, value) self.last_name_lower = re.sub(<span class="string">r'[^a-zA-Z]'</span>, <span class="string">''</span>, value).lower() </pre> </div> <!-- ======================================== --> <div class="slide"> <h1>Defining classes...</h1> <p> You can fiddle with the naming: <pre> class Person(SQLObject): _table = <span class="string">"people"</span> first_name = StringCol(dbName=<span class="string">"fname"</span>) ... </pre> </p> </div> <!-- ======================================== --> <div class="slide"> <h1>Connecting classes</h1> <p> Foreign Keys: <pre> class Email(SQLObject): person = ForeignKey(<span class="string">'Person'</span>) ... </pre> Note we use a string for <code class="string">'Person'</code>. </div> <!-- ======================================== --> <div class="slide"> <h1>Backreferences</h1> <p> The other side of one-to-many: <pre> class Person(SQLObject): ... emails = MultipleJoin(<span class="string">'Email'</span>) </pre> </div> <!-- ======================================== --> <div class="slide"> <h1>Many-to-many</h1> <p> Many to many relationships imply a "hidden" table: <pre> class Address(SQLObject): people = RelatedJoin(<span class="string">'Person'</span>) ... class Person(SQLObject): addresses = RelatedJoin(<span class="string">'Address'</span>) </pre> </div> <!-- ======================================== --> <div class="slide"> <h1>Many-to-many...</h1> <p> The intermediate table created: <pre> CREATE TABLE address_person ( address_id INT NOT NULL, person_id INT NOT NULL ); </pre> </p> </div> <!-- ======================================== --> <div class="slide"> <h1>Lazy classes</h1> <p> SQLObject can use reflection to figure out what columns your table has: <pre> class Person(SQLObject): _fromDatabase = True </pre> You can start with <code>_fromDatabase = True</code> and add in explicit columns, overriding defaults. </p> </div> <!-- ======================================== --> <div class="slide"> <h1>Instances</h1> <ul> <li>Use <code>Person.get(id)</code> to retrieve a row</li> <li>Use <code>Person(first_name=...)</code> to insert a row (the new object is returned)</li> <li>Use <code>aPerson.destroySelf()</code> to delete a row</li> </ul> </div> <!-- ======================================== --> <div class="slide"> <h1>Updating</h1> <ul> <li><code>aPerson.first_name = <span class="string">"new_name"</span></code><br> <code class="sql">UPDATE</code> is executed immediately </li> <li> Update multiple columns at once: <br> <code>aPerson.set(first_name=<span class="string">"new_fname"</span>, last_name=<span class="string">"new_lname"</span>)</code> </li> </ul> </div> <!-- ======================================== --> <div class="slide"> <h1>Selecting</h1> <p> You can use Python expressions (kind of): <pre> query = Person.q.first_name == <span class="string">"Joe"</span> Person.select(query) </pre> </p> </div> <!-- ======================================== --> <div class="slide"> <h1>Selecting...</h1> <p> <ul> <li><code>Class.<b style="color: #990000">q</b>.column_name</code> creates a magical object that creates queries.</li> <li><code>sqlrepr()</code> turns these query objects into SQL (mostly hidden)</li> <li><code>sqlrepr(Person.q.first_name == <span class="string">"Joe"</span>, <span class="string">'mysql')</code> creates the SQL <code class="sql">person.first_name = 'Joe'</code></li> </p> </div> <!-- ======================================== --> <div class="slide"> <h1>Selecting...</h1> <p> Complicated joins are possible: <pre> Person.select((Person.q.id == Email.q.personID) & (Email.q.address.startswith('joe'))) </pre> Becomes: <pre> SELECT person.id, person.first_name, person.last_name FROM person, email WHERE person.id = email.person_id AND email.address LIKE 'joe%' </pre> </p> </div> <!-- ======================================== --> <div class="slide"> <h1>SelectResult</h1> Select results are more than just lists: <ul> <li> You can slice them, and <code class="sql">LIMIT</code> and <code class="sql">OFFSET</code> statements are added to your query</li> <li> You can iterate through them, and avoid loading all objects into memory </li> <li> You can do things like <code>select_result.count()</code> to run aggregate functions </li> <li> To get a real list, you must do <code>list(select_result)</code></li> </ul> </div> <!-- ======================================== --> <div class="slide"> <h1>Caching</h1> <ul> <li>SQLObject caches everything</li> <li>Maybe too much (bad if you have multiple updaters)</li> <li>Needs to because there's no joins</li> <li>Has potential to be faster than ad hoc queries</li> <li>Thread safety or thread confusion?</li> </ul> </div> <!-- ======================================== --> <div class="slide"> <h1>Performance</h1> <ul> <li> Not good if you are dealing with lots and lots of rows </li> <li> Inserts and selects substantially slower </li> <li> But for many applications you'll pay that price sooner or later anyway </li> </ul> </div> <!-- ======================================== --> <div class="slide"> <h1>The problem with ORMs</h1> <p> ORMs live in the world between Object Oriented programmers (and programs) and Relational programmers (and programs). Neither will be happy. </p> </div> <!-- ======================================== --> <div class="slide"> <h1>ORM Mismatch</h1> <ul> <li> Python classes aren't tables</li> <li> Python instances aren't rows</li> <ul> </div> <!-- ======================================== --> <div class="slide"> <h1>ORM Mismatch: Classes</h1> <ul> <li> Databases don't have inheritance </li> <li> Even when they do, no one uses it </li> <li> Some tables are too boring (e.g., intermediate tables for many-to-many relationships), or don't warrant the overhead </li> </ul> </div> <!-- ======================================== --> <div class="slide"> <h1>ORM Mismatch: Instances</h1> <ul> <li> Instances get garbage collected, rows are forever </li> <li> Rows exist even when instances don't </li> <li> Instances may still exist when the row is gone </li> <li> In the relational calculus, rows don't have real identity </li> </div> <!-- ======================================== --> <div class="slide"> <h1>ORM Mismatch: Relations</h1> <ul> <li> Things like joins don't make sense for objects, but are central to relations </li> <li> Views and stored procedures also don't make sense </li> <li> Aggregate functions and set operations aren't natural in Python (the <code>for</code> loop is natural) </li> </ul> </div> <!-- ======================================== --> <div class="slide"> <h1>Solving the question by avoidance</h1> <ul> <li> SQLObject doesn't let you forget SQL </li> <li> You still have to think relationally </li> <li> But many relational concepts don't work either </li> <li> Advantage: it's easy </li> </ul> </div> <!-- ======================================== --> <div class="slide"> <h1>Solving the Mismatch</h1> SQLObject's answer: don't try too hard. <ul> <li> There exist database restrictions; not every database schema maps well. (But most are fine) </li> <li> Sometimes you have to do things more slowly or more imperatively, compared to a purely relational technique </li> <li> There exist object restrictions; not every object-oriented concept maps well; e.g., no inheritance </li> <li> Inheritance isn't planned; this isn't transparent persistence </li> </ul> </div> <!-- ======================================== --> <div class="slide"> <h1>Some other alternatives</h1> Several Python ORMs exist; contrasts: <ul> <li> SQLObject doesn't try to be too OO, so it works well with normal database schemas </li> <li> SQLObject doesn't have a compile step </li> <li> SQLObject isn't <i>just</i> a SQL wrapper </li> <li> SQLObject is reasonably complete </li> </ul> </div> <!-- ======================================== --> <div class="slide"> <h1>Other alternatives...</h1> <p> See the Python Wiki at: <a href="http://python.org/moin">python.org/moin</a><br> The page is <a href="http://python.org/moin/HigherLevelDatabaseProgramming">HigherLevelDatabaseProgramming</a>. </p> </div> <!-- ======================================== --> </div> </body> </html>